API reference
The recommended surface is intentionally small. Most workflows start with
WalkForwardPolicy, TrainHistoryPolicy or
DriftMonitoringPolicy, then drop down to explicit simulation,
planning or splitter objects only when lower-level control is needed.
Public inputs can come from pandas, numpy or
polars. When the source is not pandas, Jano normalizes it at the boundary
and keeps the same split and reporting surface.
Main workflow
- class jano.workflows.WalkForwardPolicy(time_col, *, partition, step, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None)[source]
Recommended high-level entry point for production-like walk-forward evaluation.
- Parameters:
time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec. Use TemporalSemanticsSpec when ordering, reporting and segment eligibility need different timestamp columns.
partition (TemporalPartitionSpec) – Train/test or train/validation/test layout to move through time.
step – Amount by which the simulation advances after each fold. It must use the same unit family as the partition sizes.
strategy (str) – Movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep a final fold whose last segment exceeds the available timeline.
engine (str) – Internal partition engine preference. "auto" keeps native Polars and NumPy paths when safe; "pandas", "polars" and "numpy" force a specific engine.
start_at (object | None) – Optional lower timestamp bound applied before folds are planned.
end_at (object | None) – Optional upper timestamp bound applied before folds are planned.
max_folds (int | None) – Optional maximum number of folds to keep.
- plan(X, title=None)[source]
Return the precomputed walk-forward geometry.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
title (str | None) – Optional title attached to the returned plan.
- Returns:
A SimulationPlan with fold boundaries and row counts, but without materialized train/test slices.
- Return type:
SimulationPlan
- run(X, output_path=None, title=None)[source]
Materialize the walk-forward simulation.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
output_path (str | None) – Optional filesystem path for the HTML report.
title (str | None) – Optional title used in reports.
- Returns:
A SimulationResult with materialized folds, tabular summary, chart data and rendered HTML.
- Return type:
SimulationResult
- as_splitter()[source]
Expose the underlying splitter for manual control.
- Return type:
TemporalBacktestSplitter
- property simulation: TemporalSimulation
Expose the underlying simulation object.
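The difference between the "single", "rolling" and "expanding" strategies can be illustrated without the library. The sketch below does not call Jano: the helper name fold_bounds and the use of integer row counts for train_size, test_size and step are assumptions made for illustration only, and Jano's own planning may differ in details such as gaps and partial folds.

```python
def fold_bounds(n_rows, train_size, test_size, step, strategy="rolling"):
    """Return (train_start, train_end, test_end) row positions per fold.

    Hypothetical helper for illustration; not part of Jano.
    Train covers [train_start, train_end) and test covers [train_end, test_end).
    """
    if strategy not in {"single", "rolling", "expanding"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    folds = []
    train_end = train_size
    # Stop as soon as the test segment would run past the available rows.
    while train_end + test_size <= n_rows:
        # Rolling keeps a fixed-size train window; expanding always starts at row 0.
        train_start = 0 if strategy == "expanding" else train_end - train_size
        folds.append((train_start, train_end, train_end + test_size))
        if strategy == "single":
            break
        train_end += step
    return folds


rolling = fold_bounds(10, train_size=4, test_size=2, step=2, strategy="rolling")
expanding = fold_bounds(10, train_size=4, test_size=2, step=2, strategy="expanding")
single = fold_bounds(10, train_size=4, test_size=2, step=2, strategy="single")
```

In this sketch both moving strategies produce the same test segments; only the start of the train segment differs.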
- class jano.workflows.TrainHistoryPolicy(time_col, *, cutoff, train_sizes, test_size, gap_before_test=None)[source]
Recommended entry point for fixed-test, growing-train history studies.
- Parameters:
time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
cutoff – Boundary where train ends and the fixed test horizon begins after any gap_before_test.
train_sizes (Sequence[object]) – Candidate duration windows to evaluate by looking backward from cutoff.
test_size – Duration of the fixed test window.
gap_before_test – Optional duration gap between the train end and test start.
- evaluate(X, *, model, target_col, feature_cols=None, metrics=None)[source]
Evaluate all configured train-history variants against one fixed test slice.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (ColumnRef) – Target column name or position.
feature_cols (Sequence[ColumnRef] | None) – Optional feature column names or positions. If omitted, all non-temporal, non-target columns are used.
metrics (MetricSpec) – Mapping of metric names to user-provided callables, such as {"business_cost": cost_fn}.
- Return type:
TrainGrowthResult
- find_optimal_train_size(X, **kwargs)[source]
Return the smallest train window that stays within tolerance of the best score.
- Return type:
dict[str, object]
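The selection rule described for find_optimal_train_size (the smallest train window that stays within tolerance of the best score) can be sketched as follows. This is a library-independent illustration: the helper name and the dictionary input are hypothetical, and the tolerance and relative names mirror the parameters documented for RollingTrainHistoryPolicy.evaluate. Whether find_optimal_train_size accepts the same keywords is an assumption.

```python
def smallest_within_tolerance(scores, tolerance=0.0, relative=True, lower_is_better=True):
    """Pick the smallest train size whose score is within tolerance of the best.

    scores maps a train size (here, a number of days) to a metric value.
    Hypothetical helper for illustration; not Jano's implementation.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best = min(scores.values()) if lower_is_better else max(scores.values())
    # A relative tolerance is a fraction of the best score; otherwise it is absolute.
    margin = abs(best) * tolerance if relative else tolerance
    for size in sorted(scores):
        gap = scores[size] - best if lower_is_better else best - scores[size]
        if gap <= margin:
            return size


rmse_by_days = {30: 12.0, 60: 10.4, 90: 10.0, 180: 10.1}
choice_strict = smallest_within_tolerance(rmse_by_days, tolerance=0.0)
choice_loose = smallest_within_tolerance(rmse_by_days, tolerance=0.05)
```

With zero tolerance the best-scoring window (90 days) is chosen. With a 5% relative tolerance the 60-day window is close enough to the best score and is preferred because it is shorter.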
- class jano.workflows.DriftMonitoringPolicy(time_col, *, cutoff, train_size, test_size, step, gap_before_test=None, max_windows=None)[source]
Recommended entry point for fixed-train, moving-test decay monitoring.
- Parameters:
time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
cutoff – Boundary where the fixed train window ends.
train_size – Duration of the fixed train window looking backward from cutoff.
test_size – Duration of each forward test window.
step – Duration by which the test window advances after each evaluation.
gap_before_test – Optional duration gap between train end and first test start.
max_windows (int | None) – Optional maximum number of test windows to evaluate.
- evaluate(X, *, model, target_col, feature_cols=None, metrics=None)[source]
Evaluate how performance evolves as the test window moves forward.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (ColumnRef) – Target column name or position.
feature_cols (Sequence[ColumnRef] | None) – Optional feature column names or positions. If omitted, all non-temporal, non-target columns are used.
metrics (MetricSpec) – Mapping of metric names to user-provided callables.
- Return type:
PerformanceDecayResult
- find_drift_onset(X, **kwargs)[source]
Return the first test window whose performance crosses the chosen threshold.
- Return type:
dict[str, object] | None
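The idea behind find_drift_onset (the first test window whose performance crosses a chosen threshold, or nothing when no window does) can be sketched in a few lines. The helper below is hypothetical and independent of Jano. In particular, this reference does not say whether the library's threshold is absolute or relative to a baseline, so the absolute threshold used here is an assumption.

```python
def first_crossing(window_scores, threshold, lower_is_better=True):
    """Return (index, score) of the first window that crosses threshold, else None.

    Hypothetical helper for illustration; not Jano's implementation.
    """
    for index, score in enumerate(window_scores):
        # For a loss, crossing means rising above the threshold;
        # for a score, it means falling below it.
        crossed = score > threshold if lower_is_better else score < threshold
        if crossed:
            return index, score
    return None


weekly_rmse = [1.00, 1.02, 1.05, 1.31, 1.28, 1.40]
onset = first_crossing(weekly_rmse, threshold=1.25)
stable = first_crossing(weekly_rmse, threshold=2.0)
```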
- class jano.workflows.RollingTrainHistoryPolicy(time_col, *, partition, step, train_sizes, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None)[source]
Run train-history optimization inside each outer walk-forward iteration.
This policy answers questions such as: how much training history is required on average if the optimal train window is allowed to vary over time?
- Parameters:
time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
partition (TemporalPartitionSpec) – Outer walk-forward partition that defines the moving train/test windows.
step – Amount by which the outer walk-forward process advances.
train_sizes (Sequence[object]) – Candidate train-history durations tested inside each outer fold.
strategy (str) – Outer movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether the outer plan can keep a final partial fold.
engine (str) – Internal partition engine preference for the outer plan.
start_at (object | None) – Optional lower timestamp bound for the outer plan.
end_at (object | None) – Optional upper timestamp bound for the outer plan.
max_folds (int | None) – Optional maximum number of outer folds.
- plan(X, title=None)[source]
Return the outer walk-forward plan used by the composed policy.
- Parameters:
title (str | None)
- Return type:
SimulationPlan
- evaluate(X, *, model, target_col, feature_cols=None, metrics=None, metric='rmse', tolerance=0.0, relative=True, title=None)[source]
Choose an optimal train-history size for each outer walk-forward iteration.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (ColumnRef) – Target column name or position.
feature_cols (Sequence[ColumnRef] | None) – Optional feature column names or positions.
metrics (MetricSpec) – Mapping of metric names to user-provided callables.
metric (str) – Metric column used to choose the optimal train size.
tolerance (float) – Allowed distance from the best score.
relative (bool) – Whether tolerance is proportional instead of absolute.
title (str | None) – Optional title attached to the outer plan.
- Return type:
RollingTrainHistoryResult
- class jano.workflows.RollingTrainHistoryResult(records, metric)[source]
Per-iteration optimal training-history choices over a walk-forward plan.
- Parameters:
records (DataFrame)
metric (str)
- records
DataFrame with one row per outer walk-forward iteration and the selected train-history window for that iteration.
- Type:
pandas.DataFrame
- metric
Metric used to choose the optimal train-history size.
- Type:
str
- records: DataFrame
- metric: str
- to_frame()[source]
Return one row per outer iteration with the chosen optimal train size.
- Return type:
DataFrame
- summary()[source]
Return compact aggregate statistics for the chosen train windows.
- Return type:
dict[str, object]
- class jano.simulation.TemporalSimulation(time_col, partition, step, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None)[source]
High-level interface for executing a complete temporal simulation.
- Parameters:
time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position or TemporalSemanticsSpec describing the timeline, ordering column and per-segment eligibility columns.
partition (TemporalPartitionSpec) – High-level definition of the train/test or train/validation/test layout.
step – Amount by which the simulation advances after each fold.
strategy (str) – Simulation policy. Use "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep the last fold when the final evaluation segment would otherwise run past the end of the dataset.
engine (str) – Internal partition engine preference. Use "auto" to let Jano choose the safest native backend, or force "pandas", "polars" or "numpy".
start_at (object | None) – Optional lower bound for the simulation timeline. Rows strictly before this timestamp are excluded before folds are generated.
end_at (object | None) – Optional upper bound for the simulation timeline. Rows strictly after this timestamp are excluded before folds are generated.
max_folds (int | None) – Optional maximum number of folds to materialize.
- property time_col
Return the timeline column configured for the simulation.
- property partition
Return the validated partition configuration used by the simulation.
- property temporal_semantics: TemporalSemanticsSpec
Return the temporal semantics used by the simulation.
- as_splitter()[source]
Return the underlying low-level splitter.
- Return type:
TemporalBacktestSplitter
- run(X, output_path=None, title=None)[source]
Execute the configured simulation over X and materialize its folds.
- Parameters:
X (DataFrame) – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
output_path (str | Path | None) – Optional filesystem path where the rendered HTML report should be written.
title (str | None) – Optional title used in the returned report outputs.
- Returns:
A SimulationResult containing the materialized folds and their summary.
- Return type:
SimulationResult
- plan(X, title=None)[source]
Precompute the simulation geometry before materializing any folds.
- Parameters:
X (DataFrame) – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
title (str | None) – Optional title used when the plan is later described or rendered.
- Returns:
A SimulationPlan with fold boundaries and row counts.
- Return type:
SimulationPlan
- class jano.simulation.SimulationResult(frame, splits, summary, engine_metadata)[source]
Materialized result of running a temporal simulation over a dataset.
- Parameters:
frame (DataFrame)
splits (List[TimeSplit])
summary (SimulationSummary)
engine_metadata (PartitionEngineMetadata)
- frame
Source dataset used to build the simulation.
- Type:
pandas.DataFrame
- splits
Materialized fold objects.
- Type:
List[jano.splits.TimeSplit]
- summary
Structured report for the simulation.
- Type:
jano.reporting.SimulationSummary
- engine_metadata
Internal partition engine selected for the simulation.
- Type:
jano.engines.PartitionEngineMetadata
- frame: DataFrame
- splits: List[TimeSplit]
- summary: SimulationSummary
- engine_metadata: PartitionEngineMetadata
- property total_folds: int
Return the number of materialized folds.
- property chart_data: SimulationChartData
Return plot-ready chart data for the simulation.
- property html: str
Return the rendered HTML report.
- to_frame()[source]
Return fold-level simulation metadata as a pandas DataFrame.
- Return type:
DataFrame
- to_dict()[source]
Return a serializable dictionary representation.
- Return type:
dict[str, object]
- write_html(path)[source]
Write the rendered HTML report to disk.
- Parameters:
path (str | Path)
- Return type:
Path
- iter_splits()[source]
Iterate over materialized fold objects.
- Return type:
Iterator[TimeSplit]
- class jano.runner.WalkForwardRunner(*, model, target_col=None, feature_cols=None, retrain=True, retrain_interval=None, retrain_policy=None, metrics=None, metric_directions=None, primary_metric=None, evaluation=None, prediction_column='prediction')[source]
Run an estimator over temporal folds while applying a retrain policy.
- Parameters:
feature_cols (Sequence[object] | None)
retrain (bool | str)
retrain_interval (int | None)
retrain_policy (RetrainPolicy | None)
metrics (MetricSpec)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)
evaluation (EvaluationProfile | None)
prediction_column (str)
- run(workflow, X, y=None)[source]
Execute the configured estimator over a temporal workflow.
- Return type:
WalkForwardRunResult
- class jano.runner.WalkForwardRunResult(records, predictions, metric_directions, retrain_policy, primary_metric=None)[source]
Materialized execution of a temporal workflow with an estimator.
- Parameters:
records (DataFrame)
predictions (DataFrame)
metric_directions (dict[str, str])
retrain_policy (str)
primary_metric (str | None)
- records: DataFrame
- predictions: DataFrame
- metric_directions: dict[str, str]
- retrain_policy: str
- primary_metric: str | None = None
- to_frame()[source]
Return one row per evaluated fold.
- Return type:
DataFrame
- predictions_frame()[source]
Return row-level predictions across all test folds.
- Return type:
DataFrame
- property metric_names: list[str]
Return metric columns recorded for each evaluated fold.
- fold_summary()[source]
Return fold geometry and retraining metadata without metric columns.
- Return type:
DataFrame
- metric_trajectory()[source]
Return metrics in long format, one row per fold and metric.
- Return type:
DataFrame
- retrain_events()[source]
Return the subset of folds where the estimator was retrained.
- Return type:
DataFrame
- summary()[source]
Return compact aggregate execution statistics.
- Return type:
dict[str, object]
- report_data(*, include_predictions=False)[source]
Return JSON-ready execution data for notebooks, agents or custom reports.
- Parameters:
include_predictions (bool)
- Return type:
dict[str, object]
- to_dict(*, include_predictions=False)[source]
Return a serializable representation of the execution result.
- Parameters:
include_predictions (bool)
- Return type:
dict[str, object]
- class jano.online.OnlineTemporalRunner(*, model, time_col, target_col, initial_train_size, update_size=1, feature_cols=None, update_strategy=None, metrics=None, metric_directions=None, primary_metric=None, evaluation=None, include_predictions=True, prediction_column='prediction', retrain_trigger=None)[source]
Run prequential online temporal evaluation over events or micro-batches.
The runner first initializes a model on an initial train window. It then repeats the production-like sequence predict -> observe target -> update model for each future event or micro-batch.
- Parameters:
model – Estimator implementing predict plus the method required by update_strategy. Use PartialFitUpdateStrategy for incremental estimators or RefitUpdateStrategy for standard fit estimators.
time_col (str | int | TemporalSemanticsSpec) – Timeline column used to order events.
target_col (ColumnRef) – Target column name or position.
initial_train_size – Initial history used before the first prediction. Supports duration strings, integer row counts and fractions.
update_size – Event or micro-batch size. Use 1 for event-level updates, an integer for row batches, or a duration string such as "1D".
feature_cols (Sequence[ColumnRef] | None) – Optional feature columns. If omitted, all non-temporal, non-target columns are used.
update_strategy (OnlineUpdateStrategy | None) – Strategy that initializes and updates the model.
metrics (MetricSpec) – Mapping of metric names to user-provided callables.
metric_directions (dict[str, str] | None) – Optional metric direction overrides.
primary_metric (str | None) – Primary metric used by downstream analysis.
evaluation (EvaluationProfile | None) – Optional explicit EvaluationProfile.
include_predictions (bool) – Whether row-level predictions should be stored.
retrain_trigger (OnlineRetrainTrigger | None) – Optional callable evaluated after each batch is scored. It receives history (all records up to the current batch) and latest (the current batch record). Return True, a reason string, or a dictionary such as {"retrain": True, "reason": "..."} to mark that batch as a retraining checkpoint.
prediction_column (str)
- run(X)[source]
Execute prequential evaluation over X.
- Return type:
OnlineRunResult
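The predict -> observe target -> update model sequence described for OnlineTemporalRunner can be made concrete with a small, self-contained loop. This is only a sketch of prequential evaluation under simplifying assumptions. It does not use Jano, the RunningMean model and the prequential function are hypothetical, and only integer row counts are supported for initial_train_size and update_size.

```python
class RunningMean:
    """Toy incremental model that predicts the mean of the targets seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def predict(self, n_rows):
        mean = self.total / self.count if self.count else 0.0
        return [mean] * n_rows

    def update(self, targets):
        self.total += sum(targets)
        self.count += len(targets)


def prequential(targets, initial_train_size, update_size=1):
    """Score each micro-batch before the model is allowed to learn from it."""
    model = RunningMean()
    model.update(targets[:initial_train_size])  # initial train window
    records = []
    for start in range(initial_train_size, len(targets), update_size):
        batch = targets[start:start + update_size]
        preds = model.predict(len(batch))  # 1. predict
        # 2. observe the targets and score the batch
        mae = sum(abs(p - t) for p, t in zip(preds, batch)) / len(batch)
        model.update(batch)  # 3. update the model
        records.append({"start": start, "rows": len(batch), "mae": mae})
    return records


records = prequential([1.0, 3.0, 2.0, 4.0, 6.0], initial_train_size=2, update_size=2)
```

Each record is computed from predictions made before the corresponding targets were seen, which is what keeps the evaluation free of look-ahead.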
- class jano.online.OnlineRunResult(records, predictions, metric_directions, update_strategy, primary_metric=None)[source]
Materialized online temporal evaluation result.
- Parameters:
records (DataFrame)
predictions (DataFrame)
metric_directions (dict[str, str])
update_strategy (str)
primary_metric (str | None)
- records
One row per evaluated event or micro-batch.
- Type:
pandas.DataFrame
- predictions
Optional row-level predictions for all evaluated rows.
- Type:
pandas.DataFrame
- metric_directions
Mapping from metric name to "min" or "max".
- Type:
dict[str, str]
- update_strategy
Name of the strategy used to update the model.
- Type:
str
- primary_metric
Primary metric used by downstream analysis.
- Type:
str | None
- records: DataFrame
- predictions: DataFrame
- metric_directions: dict[str, str]
- update_strategy: str
- primary_metric: str | None = None
- to_frame()[source]
Return one row per evaluated event or micro-batch.
- Return type:
DataFrame
- predictions_frame()[source]
Return row-level predictions across all online evaluation batches.
- Return type:
DataFrame
- property metric_names: list[str]
- metric_trajectory()[source]
Return metrics in long format, one row per batch and metric.
- Return type:
DataFrame
- summary()[source]
Return compact aggregate statistics for the online run.
- Return type:
dict[str, object]
- report_data(*, include_predictions=False)[source]
Return JSON-ready data for notebooks, agents and custom reports.
- Parameters:
include_predictions (bool)
- Return type:
dict[str, object]
- to_dict(*, include_predictions=False)[source]
Return a serializable representation of the online run.
- Parameters:
include_predictions (bool)
- Return type:
dict[str, object]
- retrain_checkpoints()[source]
Return batches where the user-defined online retrain trigger fired.
- Return type:
DataFrame
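A retrain trigger is a user-defined callable, so its shape can be sketched from the contract given for the retrain_trigger parameter: it receives history and latest and may return True, a reason string, or a dictionary such as {"retrain": True, "reason": "..."}. The example below is a hedged illustration. The "mae" key, the limit and warmup arguments, the assumption that latest is mapping-like and that history supports len(), and the use of False to mean "no checkpoint" are all assumptions not confirmed by this reference.

```python
def mae_spike_trigger(history, latest, *, limit=2.0, warmup=3):
    """Flag a retraining checkpoint when the latest batch error exceeds limit.

    Hypothetical trigger for illustration. Assumes latest is mapping-like with
    an "mae" entry and that history supports len().
    """
    # Skip the first few batches, where the error estimate is still noisy.
    if len(history) < warmup:
        return False
    if latest["mae"] > limit:
        reason = f"mae {latest['mae']:.2f} above {limit:.2f}"
        return {"retrain": True, "reason": reason}
    return False


history = [{"mae": 1.0}, {"mae": 1.1}, {"mae": 2.6}]
decision = mae_spike_trigger(history, history[-1])
early = mae_spike_trigger(history[:1], history[0])
```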
- class jano.online.OnlineUpdatePolicyStudy(*, model, time_col, target_col, initial_train_size, policies, feature_cols=None, metrics=None, metric_directions=None, primary_metric=None, evaluation=None)[source]
Compare online update policies over the same temporal stream.
The study runs one OnlineTemporalRunner per candidate policy and returns policy-level metrics plus detailed per-policy runs. It is useful for comparing update cadences such as every event, every N rows, every day, or refit strategies with different retained-history caps.
- Parameters:
time_col (str | int | TemporalSemanticsSpec)
target_col (ColumnRef)
policies (Sequence[OnlineUpdatePolicy])
feature_cols (Sequence[ColumnRef] | None)
metrics (MetricSpec)
metric_directions (dict[str, str] | None)
primary_metric (str | None)
evaluation (EvaluationProfile | None)
- run(X)[source]
Evaluate all candidate online update policies over X.
- Return type:
OnlineUpdatePolicyStudyResult
- class jano.online.OnlineUpdatePolicyStudyResult(records, runs, metric_directions, primary_metric=None)[source]
Comparison result for multiple online update policies.
- Parameters:
records (DataFrame)
runs (dict[str, OnlineRunResult])
metric_directions (dict[str, str])
primary_metric (str | None)
- records: DataFrame
- runs: dict[str, OnlineRunResult]
- metric_directions: dict[str, str]
- primary_metric: str | None = None
- to_frame()[source]
Return one row per evaluated online update policy.
- Return type:
DataFrame
- run(policy)[source]
Return the detailed run for a named policy.
- Parameters:
policy (str)
- Return type:
OnlineRunResult
- metric_trajectory()[source]
Return long-format metric trajectories for all policies.
- Return type:
DataFrame
- find_optimal_policy(metric=None, *, update_cost_weight=0.0)[source]
Return the best policy after optional update-cost penalization.
- Parameters:
metric (str | None) – Metric column to optimize. Defaults to primary_metric.
update_cost_weight (float) – Penalty applied to total_update_cost. For lower-is-better metrics the penalty is added; for higher-is-better metrics it is subtracted.
- Return type:
dict[str, object]
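The penalization rule stated for update_cost_weight can be written out as a short worked example. The functions and the policy table below are hypothetical and do not call Jano. They only reproduce the documented arithmetic: the weighted total_update_cost is added for lower-is-better metrics and subtracted for higher-is-better metrics.

```python
def penalized_score(metric_value, total_update_cost, weight, direction="min"):
    """Apply the update-cost penalty in the direction that makes the score worse."""
    penalty = weight * total_update_cost
    return metric_value + penalty if direction == "min" else metric_value - penalty


def best_policy(rows, weight=0.0, direction="min"):
    """rows maps a policy name to (metric_value, total_update_cost)."""
    scored = {
        name: penalized_score(value, cost, weight, direction)
        for name, (value, cost) in rows.items()
    }
    pick = min if direction == "min" else max
    return pick(scored, key=scored.get)


# Illustrative numbers only: an error metric and an abstract update cost.
policies = {
    "every_event": (0.90, 100.0),
    "daily": (1.00, 10.0),
    "weekly": (1.30, 2.0),
}
unpenalized = best_policy(policies, weight=0.0)
penalized = best_policy(policies, weight=0.01)
```

Without a penalty the most frequently updated policy wins on raw error. With a weight of 0.01 its update cost outweighs the small accuracy gain and the daily policy is selected.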
Evaluation profiles
- class jano.evaluation.EvaluationProfile(metrics=None, metric_directions=None, primary_metric=None)[source]
Define how a temporal run should be measured.
- Parameters:
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-defined callables. None means no metrics are computed by Jano.
metric_directions (Mapping[str, str] | None) – Optional mapping from metric name to "min" or "max". Custom metrics default to "min" unless explicitly overridden.
primary_metric (str | None) – Metric used as the default optimization or retraining signal.
- metrics: Mapping[str, Callable[[ndarray, ndarray], float]] | None = None
- metric_directions: Mapping[str, str] | None = None
- primary_metric: str | None = None
- resolve()[source]
Return normalized metric functions, directions and primary metric.
- Return type:
ResolvedEvaluationProfile
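The shape of the metrics and metric_directions mappings can be sketched with plain Python. The metric callables below take two sequences and return a float, matching the documented Callable[[ndarray, ndarray], float] shape, with lists standing in for arrays. The resolve_directions helper is hypothetical and only illustrates the documented default of "min" for metrics without an explicit direction. It is not Jano's resolve() implementation.

```python
def mean_absolute_error(y_true, y_pred):
    """Lower is better."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Higher is better."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def resolve_directions(metrics, metric_directions=None):
    """Fill in "min" for every metric that has no explicit direction."""
    overrides = dict(metric_directions or {})
    for name, direction in overrides.items():
        if direction not in {"min", "max"}:
            raise ValueError(f"invalid direction for {name!r}: {direction!r}")
    return {name: overrides.get(name, "min") for name in metrics}


metrics = {"mae": mean_absolute_error, "accuracy": accuracy}
directions = resolve_directions(metrics, {"accuracy": "max"})
mae_value = metrics["mae"]([1, 2, 3], [1, 2, 5])
```

Because the default is "min", a higher-is-better metric such as accuracy needs an explicit "max" override.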
- class jano.evaluation.ResolvedEvaluationProfile(metrics, metric_directions, primary_metric)[source]
Normalized metrics and metadata consumed by execution layers.
- Parameters:
metrics (dict[str, Callable[[ndarray, ndarray], float]])
metric_directions (dict[str, str])
primary_metric (str | None)
- metrics: dict[str, Callable[[ndarray, ndarray], float]]
- metric_directions: dict[str, str]
- primary_metric: str | None
- class jano.evaluation.RegressionProfile(metrics=None, *, metric_directions=None, primary_metric=None)[source]
Convenience profile for user-provided regression-style losses.
- Parameters:
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)
- class jano.evaluation.ClassificationProfile(metrics=None, *, metric_directions=None, primary_metric=None)[source]
Convenience profile for user-provided classification-style scores.
- Parameters:
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)
- class jano.evaluation.OrdinalClassificationProfile(metrics, *, metric_directions=None, primary_metric=None)[source]
Profile for ordered classes where user-defined costs usually matter.
- Parameters:
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)
- class jano.evaluation.RankingProfile(metrics, *, metric_directions=None, primary_metric=None)[source]
Profile for ranking or retrieval evaluations with custom metrics.
- Parameters:
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)
- class jano.planning.SimulationPlan(partition_plan, title)[source]
High-level simulation plan with helpers for reporting and materialization.
- Parameters:
partition_plan (PartitionPlan)
title (str)
- partition_plan
Lower-level partition plan with fold boundaries and counts.
- Type:
jano.planning.PartitionPlan
- title
Report title used when the plan is described or written as HTML.
- Type:
str
- partition_plan: PartitionPlan
- title: str
- property total_folds: int
- to_frame()[source]
Return one row per planned fold with boundaries and row counts.
- Return type:
DataFrame
- select_iterations(iterations)[source]
Return a simulation plan containing only the selected iteration numbers.
- Parameters:
iterations (Sequence[int])
- Return type:
SimulationPlan
- select_from_iteration(iteration)[source]
Return a simulation plan starting at iteration.
- Parameters:
iteration (int)
- Return type:
SimulationPlan
- select_until_iteration(iteration)[source]
Return a simulation plan ending at iteration.
- Parameters:
iteration (int)
- Return type:
SimulationPlan
- exclude_windows(*, train=None, validation=None, test=None)[source]
Return a simulation plan after removing folds that overlap excluded windows.
- Parameters:
train (Sequence[tuple[object, object]] | None)
validation (Sequence[tuple[object, object]] | None)
test (Sequence[tuple[object, object]] | None)
- Return type:
SimulationPlan
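Removing folds that overlap excluded windows comes down to an interval-overlap test per fold. The sketch below shows that test in isolation. It does not call Jano, the fold dictionaries and helper names are hypothetical, and treating every interval as half-open is an assumption, since this reference does not state how exclude_windows handles boundaries.

```python
from datetime import date


def overlaps(start, end, window):
    """True when half-open [start, end) intersects half-open [window[0], window[1])."""
    return start < window[1] and window[0] < end


def drop_overlapping(folds, excluded_test_windows):
    """Keep only folds whose test segment avoids every excluded window."""
    return [
        fold
        for fold in folds
        if not any(
            overlaps(fold["test_start"], fold["test_end"], window)
            for window in excluded_test_windows
        )
    ]


folds = [
    {"iteration": 0, "test_start": date(2024, 1, 1), "test_end": date(2024, 1, 8)},
    {"iteration": 1, "test_start": date(2024, 1, 8), "test_end": date(2024, 1, 15)},
    {"iteration": 2, "test_start": date(2024, 1, 15), "test_end": date(2024, 1, 22)},
]
outage = [(date(2024, 1, 10), date(2024, 1, 12))]
kept = [fold["iteration"] for fold in drop_overlapping(folds, outage)]
```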
- materialize()[source]
Materialize the plan into a SimulationResult.
- Return type:
SimulationResult
- describe()[source]
Materialize the plan and return its structured summary.
- Return type:
SimulationSummary
- write_html(path)[source]
Materialize the plan and write its rendered HTML report.
- Parameters:
path (str | Path)
- Return type:
Path
- class jano.types.TemporalPartitionSpec(layout, train_size, test_size=None, validation_size=None, gap_before_train=None, gap_before_validation=None, gap_before_test=None, gap_after_test=None, calendar_frequency=None)[source]
High-level description of a temporal partition layout.
- Parameters:
layout (str)
train_size (str | int | float | Timedelta)
test_size (str | int | float | Timedelta | None)
validation_size (str | int | float | Timedelta | None)
gap_before_train (str | int | float | Timedelta | None)
gap_before_validation (str | int | float | Timedelta | None)
gap_before_test (str | int | float | Timedelta | None)
gap_after_test (str | int | float | Timedelta | None)
calendar_frequency (str | None)
- layout
Either train_test or train_val_test.
- Type:
str
- train_size
Size of the train segment.
- Type:
str | int | float | pandas.Timedelta
- test_size
Size of the test segment when present.
- Type:
str | int | float | pandas.Timedelta | None
- validation_size
Size of the validation segment when present.
- Type:
str | int | float | pandas.Timedelta | None
- gap_before_train
Optional gap inserted before train.
- Type:
str | int | float | pandas.Timedelta | None
- gap_before_validation
Optional gap inserted before validation.
- Type:
str | int | float | pandas.Timedelta | None
- gap_before_test
Optional gap inserted before test.
- Type:
str | int | float | pandas.Timedelta | None
- gap_after_test
Optional trailing gap after test.
- Type:
str | int | float | pandas.Timedelta | None
- calendar_frequency
Optional pandas-compatible frequency used to align duration windows to calendar boundaries. For example, "D" makes daily windows run from midnight to midnight instead of from the first observed timestamp.
- Type:
str | None
- layout: str
- train_size: str | int | float | Timedelta
- test_size: str | int | float | Timedelta | None = None
- validation_size: str | int | float | Timedelta | None = None
- gap_before_train: str | int | float | Timedelta | None = None
- gap_before_validation: str | int | float | Timedelta | None = None
- gap_before_test: str | int | float | Timedelta | None = None
- gap_after_test: str | int | float | Timedelta | None = None
- calendar_frequency: str | None = None
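The effect described for calendar_frequency="D" (daily windows running from midnight to midnight instead of from the first observed timestamp) can be illustrated with the standard library alone. The daily_windows helper is hypothetical and covers only the daily case. It is a sketch of the alignment idea, not of how Jano computes its boundaries.

```python
from datetime import datetime, timedelta


def daily_windows(first_seen, n_windows, align_to_midnight):
    """Return n_windows consecutive (start, end) one-day windows."""
    if align_to_midnight:
        # Floor the first observed timestamp to the start of its calendar day.
        start = first_seen.replace(hour=0, minute=0, second=0, microsecond=0)
    else:
        start = first_seen
    return [
        (start + timedelta(days=i), start + timedelta(days=i + 1))
        for i in range(n_windows)
    ]


first_seen = datetime(2024, 3, 5, 9, 30)
aligned = daily_windows(first_seen, 2, align_to_midnight=True)
unaligned = daily_windows(first_seen, 2, align_to_midnight=False)
```

With alignment the first window starts at 2024-03-05 00:00. Without it every window boundary falls at 09:30, the time of the first observation.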
- class jano.types.TemporalSemanticsSpec(timeline_col, order_col=None, segment_time_cols=<factory>)[source]
Temporal semantics for ordering, reporting and segment eligibility.
- Parameters:
timeline_col (str | int)
order_col (str | int | None)
segment_time_cols (Mapping[str, str | int])
- timeline_col
Column used to anchor the global simulation timeline and reports.
- Type:
str | int
- order_col
Optional column used to sort the dataset internally. Defaults to timeline_col.
- Type:
str | int | None
- segment_time_cols
Optional per-segment timestamp mapping. Use this when a segment should be sliced by a different temporal column than the global timeline. For example, train can be filtered by arrived_at while test stays anchored on departured_at.
- Type:
Mapping[str, str | int]
- timeline_col: str | int
- order_col: str | int | None = None
- segment_time_cols: Mapping[str, str | int]
- property effective_order_col: str | int
Return the ordering column used by the engine.
- column_for_segment(name)[source]
Return the timestamp column used to assign rows to name.
- Parameters:
name (str)
- Return type:
str | int
- class jano.types.FeatureLookbackSpec(default_lookback=None, group_lookbacks=<factory>, feature_groups=<factory>)[source]
Lookback requirements for feature groups within the same fold.
- Parameters:
default_lookback (str | int | float | Timedelta | None)
group_lookbacks (Mapping[str, str | int | float | Timedelta])
feature_groups (Mapping[str, Sequence[str | int]])
- default_lookback
Optional fallback lookback applied to features that do not belong to an explicit group.
- Type:
str | int | float | pandas.Timedelta | None
- group_lookbacks
Mapping from feature-group name to the temporal lookback needed to build that group.
- Type:
Mapping[str, str | int | float | pandas.Timedelta]
- feature_groups
Mapping from group name to the feature columns that belong to it.
- Type:
Mapping[str, Sequence[str | int]]
All lookbacks must use duration-based sizes.
- default_lookback: str | int | float | Timedelta | None = None
- group_lookbacks: Mapping[str, str | int | float | Timedelta]
- feature_groups: Mapping[str, Sequence[str | int]]
- normalized_group_lookbacks()[source]
Return validated duration lookbacks for each explicit feature group.
- Return type:
dict[str, SizeSpec]
- normalized_default_lookback()[source]
Return the validated duration lookback for ungrouped features.
- Return type:
SizeSpec | None
- class jano.splitters.TemporalBacktestSplitter(time_col, partition, step, strategy='rolling', allow_partial=False, engine='auto')[source]
Flexible temporal splitter for single or repeated temporal backtests.
- Parameters:
time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position or TemporalSemanticsSpec describing the timeline, ordering column and per-segment eligibility columns.
partition (TemporalPartitionSpec) – High-level definition of the train/test or train/validation/test layout.
step – Amount by which the simulation advances after each fold. It must use the same unit family as the partition sizes.
strategy (str) – Simulation policy. Use "single" for one split, "rolling" for fixed-size windows or "expanding" for growing training history.
allow_partial (bool) – Whether to keep the last fold when the final evaluation segment would otherwise run past the end of the dataset.
engine (str) – Internal partition engine preference. Use "auto" to let Jano choose the safest native backend, or force "pandas", "polars" or "numpy".
- split(X, y=None, groups=None)[source]
Yield each fold as a plain tuple of positional index arrays.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
y – Unused placeholder for scikit-learn compatibility.
groups – Unused placeholder for scikit-learn compatibility.
- Yields:
Tuples of NumPy arrays ordered by the configured segment names.
- Return type:
Iterator[Tuple[ndarray, …]]
- iter_splits(X, y=None, groups=None)[source]
Yield rich TimeSplit objects for each fold in the simulation.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
y – Unused placeholder for scikit-learn compatibility.
groups – Unused placeholder for scikit-learn compatibility.
- Yields:
TimeSplit instances containing segment indices, boundaries and metadata.
- Return type:
Iterator[TimeSplit]
- get_n_splits(X=None, y=None, groups=None)[source]
Return the number of valid folds generated for X.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
y – Unused placeholder for scikit-learn compatibility.
groups – Unused placeholder for scikit-learn compatibility.
- Returns:
Total number of folds that would be produced by iter_splits.
- Return type:
int
- plan(X)[source]
Precompute the temporal geometry without materializing train/test slices.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
- Returns:
A PartitionPlan containing fold boundaries and row counts.
- Return type:
PartitionPlan
- describe_simulation(X, output_path=None, title=None, output='summary')[source]
Describe a simulation over a concrete dataset.
- Parameters:
X (DataFrame) – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
output_path (str | Path | None) – Optional filesystem path where the rendered HTML report should be written.
title (str | None) – Optional title used in the returned report outputs.
output (str) – Output mode. Use "summary" for SimulationSummary, "html" for a rendered HTML string or "chart_data" for plot-ready Python data.
- Returns:
A SimulationSummary, raw HTML string or SimulationChartData depending on output.
- Return type:
SimulationSummary | SimulationChartData | str
- class jano.planning.PartitionPlan(frame, temporal_semantics, strategy, size_kind, folds, engine=None)[source]
Precomputed temporal plan that can be inspected and materialized later.
A plan contains fold boundaries and row counts before the actual train/test slices are materialized. This makes it cheap to inspect, filter or subset a simulation.
- Parameters:
frame (Any)
temporal_semantics (TemporalSemanticsSpec)
strategy (str)
size_kind (str)
folds (List[PlannedFold])
engine (PartitionEngine | None)
- frame
Source dataset used to compute row counts and later materialize folds.
- Type:
Any
- temporal_semantics
Timeline, ordering and per-segment timestamp semantics.
- Type:
jano.types.TemporalSemanticsSpec
- strategy
Movement strategy used to generate folds.
- Type:
str
- size_kind
Unit family used by the partition: "duration", "rows" or "fraction".
- Type:
str
- folds
Precomputed fold geometry.
- Type:
List[jano.planning.PlannedFold]
- engine
Internal partition engine used to materialize folds. When omitted, Jano builds one from frame for backward compatibility.
- Type:
jano.engines.PartitionEngine | None
- frame: Any
- temporal_semantics: TemporalSemanticsSpec
- strategy: str
- size_kind: str
- folds: List[PlannedFold]
- engine: PartitionEngine | None = None
- property total_folds: int
- property time_col
- to_frame()[source]
Return one row per planned fold with boundaries and row counts.
- Return type:
DataFrame
- select_iterations(iterations)[source]
Return a plan containing only the selected iteration numbers.
- Parameters:
iterations (Sequence[int])
- Return type:
PartitionPlan
- select_from_iteration(iteration)[source]
Return a plan containing iterations greater than or equal to iteration.
- Parameters:
iteration (int)
- Return type:
PartitionPlan
- select_until_iteration(iteration)[source]
Return a plan containing iterations less than or equal to iteration.
- Parameters:
iteration (int)
- Return type:
PartitionPlan
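The three selection helpers above amount to filtering planned folds by their iteration number. A minimal sketch of that idea, using plain dictionaries in place of PlannedFold objects (the function names keep_iterations, keep_from and keep_until are hypothetical and not part of Jano):

```python
from typing import Dict, Iterable, List

Fold = Dict[str, int]


def keep_iterations(folds: List[Fold], iterations: Iterable[int]) -> List[Fold]:
    """Keep only folds whose iteration number was explicitly requested."""
    wanted = set(iterations)
    return [fold for fold in folds if fold["iteration"] in wanted]


def keep_from(folds: List[Fold], iteration: int) -> List[Fold]:
    """Keep folds with iteration >= the given bound (inclusive)."""
    return [fold for fold in folds if fold["iteration"] >= iteration]


def keep_until(folds: List[Fold], iteration: int) -> List[Fold]:
    """Keep folds with iteration <= the given bound (inclusive)."""
    return [fold for fold in folds if fold["iteration"] <= iteration]


planned = [{"iteration": i} for i in range(5)]  # zero-based, as in PlannedFold
middle = keep_until(keep_from(planned, 1), 3)
```

Because each helper returns a new list, the filters can be chained without touching the original plan, which mirrors how each method above returns a new PartitionPlan.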
- exclude_windows(*, train=None, validation=None, test=None)[source]
Return a plan with folds removed when segment boundaries overlap exclusions.
- Parameters:
train (Sequence[tuple[object, object]] | None) – Optional excluded windows applied to train segment boundaries.
validation (Sequence[tuple[object, object]] | None) – Optional excluded windows applied to validation boundaries.
test (Sequence[tuple[object, object]] | None) – Optional excluded windows applied to test segment boundaries.
- Return type:
PartitionPlan
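Dropping folds whose segment boundaries overlap an excluded window reduces to an interval-overlap test. The sketch below assumes both the segment and the excluded window are closed-open [start, end) intervals; how Jano treats the end of an exclusion window is not stated here, so treat that as an assumption. The names overlaps and drop_overlapping are hypothetical.

```python
from datetime import datetime
from typing import List, Sequence, Tuple

Window = Tuple[datetime, datetime]


def overlaps(segment: Window, excluded: Window) -> bool:
    """True when two closed-open [start, end) windows share any instant."""
    return segment[0] < excluded[1] and excluded[0] < segment[1]


def drop_overlapping(segments: Sequence[Window], excluded: Sequence[Window]) -> List[Window]:
    """Keep only segments that touch none of the excluded windows."""
    return [s for s in segments if not any(overlaps(s, e) for e in excluded)]


weekly_tests = [
    (datetime(2024, 1, 1), datetime(2024, 1, 8)),
    (datetime(2024, 1, 8), datetime(2024, 1, 15)),
    (datetime(2024, 1, 15), datetime(2024, 1, 22)),
]
outage = [(datetime(2024, 1, 10), datetime(2024, 1, 12))]
kept = drop_overlapping(weekly_tests, outage)
```

Only the second weekly window intersects the outage, so the first and third survive.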
- materialize()[source]
Materialize the planned fold boundaries into TimeSplit objects.
- Return type:
list[TimeSplit]
- iter_splits()[source]
Iterate over materialized TimeSplit objects.
- Return type:
Iterator[TimeSplit]
- property engine_metadata: PartitionEngineMetadata
Return the internal engine metadata used by the plan.
- source_frame()[source]
Return the source data as pandas for reporting and user-facing slices.
- Return type:
DataFrame
- class jano.planning.PlannedFold(iteration, boundaries, counts, metadata=<factory>)[source]
Precomputed temporal geometry for one simulation iteration.
- Parameters:
iteration (int)
boundaries (Dict[str, SegmentBoundaries])
counts (Dict[str, int])
metadata (Dict[str, object])
- iteration
Zero-based simulation iteration.
- Type:
int
- boundaries
Mapping from segment name to closed-open temporal boundaries.
- Type:
Dict[str, jano.types.SegmentBoundaries]
- counts
Mapping from segment name to the number of rows in that segment.
- Type:
Dict[str, int]
- metadata
Additional planning metadata such as is_partial.
- Type:
Dict[str, object]
- iteration: int
- boundaries: Dict[str, SegmentBoundaries]
- counts: Dict[str, int]
- metadata: Dict[str, object]
- property fold: int
- property is_partial: bool
- property simulation_start: Timestamp
- property simulation_end: Timestamp
- to_dict()[source]
Return the fold as one serializable row for DataFrame/report output.
- Return type:
dict[str, object]
Retrain policies¶
- class jano.runner.RetrainPolicy[source]
Base interface for deciding whether a runner should retrain.
- should_retrain(context)[source]
- Parameters:
context (RetrainContext)
- Return type:
bool
- class jano.runner.AlwaysRetrain[source]
Retrain before every fold.
- should_retrain(context)[source]
- Parameters:
context (RetrainContext)
- Return type:
bool
- class jano.runner.NeverRetrain[source]
Train once and reuse the fitted model across all folds.
- should_retrain(context)[source]
- Parameters:
context (RetrainContext)
- Return type:
bool
- class jano.runner.PeriodicRetrain(every)[source]
Retrain every N folds after the previous retrain, where N is the value of every.
- Parameters:
every (int)
- should_retrain(context)[source]
- Parameters:
context (RetrainContext)
- Return type:
bool
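A periodic rule of this kind can be sketched as a counter over folds. The sketch assumes the first fold always triggers a fit and that a retrain happens once the given number of folds has elapsed since the last one; the exact bookkeeping inside RetrainContext is not documented here, and periodic_decisions is a hypothetical name.

```python
from typing import List, Optional


def periodic_decisions(n_folds: int, every: int) -> List[bool]:
    """Return one retrain decision per fold for a periodic schedule."""
    if every < 1:
        raise ValueError("every must be a positive integer")
    decisions: List[bool] = []
    last_retrain: Optional[int] = None
    for fold in range(n_folds):
        # Assumption: the very first fold always fits the model.
        retrain = last_retrain is None or fold - last_retrain >= every
        if retrain:
            last_retrain = fold
        decisions.append(retrain)
    return decisions


schedule = periodic_decisions(7, every=3)
```

With every=1 the schedule degenerates to retraining before every fold, which matches the description of AlwaysRetrain.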
- class jano.runner.FunctionRetrainPolicy(rule)[source]
Delegate retraining decisions to a user-provided callable.
- Parameters:
rule (Callable[[RetrainContext], bool])
- should_retrain(context)[source]
- Parameters:
context (RetrainContext)
- Return type:
bool
- class jano.runner.DriftBasedRetrain(*, metric=None, threshold=0.05, baseline='last_retrain', relative=True)[source]
Retrain when previously observed degradation crosses a threshold.
The decision for fold k is based on metrics observed through fold k-1.
- Parameters:
metric (str | None)
threshold (float)
baseline (str)
relative (bool)
- should_retrain(context)[source]
- Parameters:
context (RetrainContext)
- Return type:
bool
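The "decide fold k from folds up to k-1" rule can be sketched as a replay over a list of already observed scores. This is an illustration under stated assumptions, not Jano's logic: the metric is lower-is-better, the baseline is the score of the fold fitted at the last retrain, the first fold always fits, and a strict greater-than comparison is used. The name drift_retrain_decisions is hypothetical.

```python
from typing import List, Sequence


def drift_retrain_decisions(
    scores: Sequence[float],
    threshold: float = 0.05,
    relative: bool = True,
) -> List[bool]:
    """Replay retrain decisions; the decision for fold k reads scores[:k] only."""
    decisions: List[bool] = []
    last_retrain = 0
    for k in range(len(scores)):
        if k == 0:
            decisions.append(True)  # assumption: the first fold always fits
            continue
        baseline = scores[last_retrain]  # score of the fold fitted at the last retrain
        latest = scores[k - 1]           # most recent score seen before fold k
        degradation = latest - baseline
        if relative:
            degradation /= abs(baseline)  # assumes a non-zero baseline
        retrain = degradation > threshold
        if retrain:
            last_retrain = k
        decisions.append(retrain)
    return decisions


observed_rmse = [1.00, 1.02, 1.10, 1.00, 1.03, 1.20]
decisions = drift_retrain_decisions(observed_rmse)
```

Note that the jump to 1.20 on the last fold cannot trigger a retrain for that same fold, because the score is only observed after the decision has been made.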
Online update strategies¶
- class jano.online.OnlineUpdateStrategy[source]
Base interface for updating a model during online temporal evaluation.
- name = 'OnlineUpdateStrategy'
- initialize(model, X_initial, y_initial)[source]
Fit the initial model state before the first prediction batch.
- Parameters:
X_initial (DataFrame)
y_initial (Series)
- update(model, X_batch, y_batch)[source]
Update the model after a prediction batch has been observed.
- Parameters:
X_batch (DataFrame)
y_batch (Series)
- class jano.online.OnlineUpdatePolicy(name, update_size, update_strategy=None, update_cost=1.0)[source]
Candidate online update policy evaluated by OnlineUpdatePolicyStudy.
- Parameters:
name (str) – Stable label used in result frames.
update_size (object) – Event, row-batch, duration or fraction cadence passed to OnlineTemporalRunner.
update_strategy (OnlineUpdateStrategy | Callable[[], OnlineUpdateStrategy] | None) – Strategy instance or factory. When omitted, the runner defaults to PartialFitUpdateStrategy.
update_cost (float) – Relative cost per update. Use this to compare predictive quality against operational cost.
- name: str
- update_size: object
- update_strategy: OnlineUpdateStrategy | Callable[[], OnlineUpdateStrategy] | None = None
- update_cost: float = 1.0
- build_strategy()[source]
Return a fresh update strategy for one candidate run.
- Return type:
OnlineUpdateStrategy | None
- class jano.online.PartialFitUpdateStrategy(classes=None)[source]
Update models that implement scikit-learn-style partial_fit.
- Parameters:
classes (Sequence[object] | None) – Optional class labels passed to the first partial_fit call for classifiers that require the full label set up front.
- name = 'PartialFitUpdateStrategy'
- initialize(model, X_initial, y_initial)[source]
Call partial_fit on the initial train window.
- Parameters:
X_initial (DataFrame)
y_initial (Series)
- update(model, X_batch, y_batch)[source]
Call partial_fit after predictions have been scored.
- Parameters:
X_batch (DataFrame)
y_batch (Series)
- class jano.online.RefitUpdateStrategy(max_train_rows=None)[source]
Update any fit/predict estimator by refitting on observed history.
This strategy is slower than PartialFitUpdateStrategy but works with standard estimators that only implement fit. After each prediction batch is observed, the batch is appended to the internal history and the estimator is fitted again.
- Parameters:
max_train_rows (int | None) – Optional rolling cap for the number of most recent observed rows used on each refit. When omitted, history expands.
- name = 'RefitUpdateStrategy'
- initialize(model, X_initial, y_initial)[source]
Fit the model on the initial train window.
- Parameters:
X_initial (DataFrame)
y_initial (Series)
- update(model, X_batch, y_batch)[source]
Append the observed batch and refit the model on retained history.
- Parameters:
X_batch (DataFrame)
y_batch (Series)
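The append-then-refit behaviour with an optional rolling cap can be sketched with plain lists. This is a simplified stand-in rather than the library class: RefitSketch and MeanModel are hypothetical names, and lists replace the DataFrame and Series inputs so the example needs no third-party packages.

```python
from typing import List, Optional


class MeanModel:
    """Toy fit/predict estimator that predicts the mean of the training target."""

    def fit(self, X: List[list], y: List[float]) -> "MeanModel":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: List[list]) -> List[float]:
        return [self.mean_ for _ in X]


class RefitSketch:
    """Keep observed history and refit the model from scratch on each update."""

    def __init__(self, max_train_rows: Optional[int] = None) -> None:
        if max_train_rows is not None and max_train_rows < 1:
            raise ValueError("max_train_rows must be positive when given")
        self.max_train_rows = max_train_rows
        self._X: List[list] = []
        self._y: List[float] = []

    def _trim(self) -> None:
        # With a cap, keep only the most recent rows; without one, history expands.
        if self.max_train_rows is not None:
            self._X = self._X[-self.max_train_rows:]
            self._y = self._y[-self.max_train_rows:]

    def initialize(self, model: MeanModel, X_initial: List[list], y_initial: List[float]) -> None:
        self._X, self._y = list(X_initial), list(y_initial)
        self._trim()
        model.fit(self._X, self._y)

    def update(self, model: MeanModel, X_batch: List[list], y_batch: List[float]) -> None:
        self._X.extend(X_batch)
        self._y.extend(y_batch)
        self._trim()
        model.fit(self._X, self._y)


model = MeanModel()
strategy = RefitSketch(max_train_rows=3)
strategy.initialize(model, [[0], [1]], [1.0, 2.0])
strategy.update(model, [[2], [3]], [3.0, 4.0])
```

After the update, only the three most recent targets (2.0, 3.0, 4.0) remain in history, so the refitted mean is 3.0 rather than the full-history mean of 2.5.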
Temporal policies¶
- class jano.policies.TrainGrowthPolicy(time_col, *, cutoff, train_sizes, test_size, gap_before_test=None)[source]
Evaluate whether adding more training history improves a fixed test slice.
This policy keeps the test window fixed and grows the train window backward in time. It is useful when you want to understand how much historical data is actually needed to match the best achievable test performance.
- Parameters:
time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
cutoff – Boundary where candidate train windows end and the fixed test horizon begins after any configured gap.
train_sizes (Sequence[object]) – Candidate duration windows evaluated backward from cutoff.
test_size – Duration of the fixed test window.
gap_before_test – Optional duration gap between train end and test start.
- evaluate(X, *, model, target_col, feature_cols=None, metrics=None)[source]
Run the fixed-test evaluation over all configured train sizes.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (str | int) – Target column name or position.
feature_cols (Sequence[str | int] | None) – Optional feature column names or positions. If omitted, all non-temporal, non-target columns are used.
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-provided callables.
- Returns:
A TrainGrowthResult containing one metric row per candidate train size.
- Return type:
TrainGrowthResult
- find_optimal_train_size(X, *, model, target_col, feature_cols=None, metrics=None, metric='rmse', tolerance=0.0, relative=True)[source]
Return the smallest train size that stays within tolerance of the best score.
- Parameters:
X – Input dataset.
model – Estimator with fit and predict methods.
target_col (str | int) – Target column name or position.
feature_cols (Sequence[str | int] | None) – Optional feature column names or positions.
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-provided callables.
metric (str) – Metric column used to choose the optimal train size.
tolerance (float) – Allowed distance from the best score.
relative (bool) – Whether tolerance is proportional instead of absolute.
- Return type:
dict[str, object]
- class jano.policies.TrainGrowthResult(records, metric_directions)[source]
Evaluated records for a fixed-test, growing-train temporal hypothesis.
- Parameters:
records (DataFrame)
metric_directions (dict[str, str])
- records
DataFrame with one row per candidate train window and metric columns.
- Type:
pandas.DataFrame
- metric_directions
Mapping from metric name to optimization direction: "min" for lower-is-better metrics and "max" for higher-is-better metrics.
- Type:
dict[str, str]
- records: DataFrame
- metric_directions: dict[str, str]
- to_frame()[source]
Return the evaluated train variants as a pandas DataFrame.
- Return type:
DataFrame
- find_optimal_train_size(metric='rmse', tolerance=0.0, relative=True)[source]
Return the smallest train window whose score is within tolerance of the best.
- Parameters:
metric (str) – Metric column used to compare train variants.
tolerance (float) – Allowed distance from the best score.
relative (bool) – Whether tolerance is interpreted proportionally instead of absolutely.
- Return type:
dict[str, object]
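The "smallest train window within tolerance of the best score" rule can be sketched over a list of plain records. The record keys (train_days, rmse), the direction argument and the function name smallest_within_tolerance are assumptions made for illustration; the real method works on the records DataFrame and metric_directions described above.

```python
from typing import Dict, List


def smallest_within_tolerance(
    records: List[Dict[str, float]],
    metric: str = "rmse",
    direction: str = "min",
    tolerance: float = 0.0,
    relative: bool = True,
) -> Dict[str, float]:
    """Return the record with the smallest train window whose score is close enough to the best."""
    scores = [record[metric] for record in records]
    best = min(scores) if direction == "min" else max(scores)
    slack = tolerance * abs(best) if relative else tolerance

    def close_enough(score: float) -> bool:
        if direction == "min":
            return score <= best + slack
        return score >= best - slack

    candidates = [record for record in records if close_enough(record[metric])]
    return min(candidates, key=lambda record: record["train_days"])


growth = [
    {"train_days": 7, "rmse": 1.30},
    {"train_days": 14, "rmse": 1.04},
    {"train_days": 28, "rmse": 1.00},
    {"train_days": 56, "rmse": 1.01},
]
strict = smallest_within_tolerance(growth)
relaxed = smallest_within_tolerance(growth, tolerance=0.05)
```

With zero tolerance only the best-scoring window (28 days) qualifies; allowing 5% relative slack accepts the 14-day window, which is the kind of trade-off this policy is meant to expose.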
- class jano.policies.PerformanceDecayPolicy(time_col, *, cutoff, train_size, test_size, step, gap_before_test=None, max_windows=None)[source]
Evaluate how long a fixed train window stays useful as test moves forward.
This policy keeps train fixed and repeatedly shifts the test window into the future. It is useful when you want to estimate when performance decay or drift becomes operationally relevant without retraining the model at every step.
- Parameters:
time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
cutoff – Boundary where the fixed train window ends.
train_size – Duration of the fixed train window looking backward from cutoff.
test_size – Duration of each test window.
step – Duration by which the test window advances.
gap_before_test – Optional duration gap between train end and first test start.
max_windows (int | None) – Optional maximum number of test windows to evaluate.
- evaluate(X, *, model, target_col, feature_cols=None, metrics=None)[source]
Run the fixed-train evaluation over moving test windows.
- Parameters:
X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (str | int) – Target column name or position.
feature_cols (Sequence[str | int] | None) – Optional feature column names or positions. If omitted, all non-temporal, non-target columns are used.
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-provided callables.
- Returns:
A PerformanceDecayResult containing one metric row per test window.
- Return type:
PerformanceDecayResult
- find_drift_onset(X, *, model, target_col, feature_cols=None, metrics=None, metric='rmse', threshold=0.1, baseline='first', relative=True)[source]
Return the first test window whose metric crosses the degradation threshold.
- Parameters:
X – Input dataset.
model – Estimator with fit and predict methods.
target_col (str | int) – Target column name or position.
feature_cols (Sequence[str | int] | None) – Optional feature column names or positions.
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-provided callables.
metric (str) – Metric column used to detect degradation.
threshold (float) – Allowed degradation before a window is flagged.
baseline (str | float) – "first", "best" or an explicit numeric baseline.
relative (bool) – Whether threshold is proportional instead of absolute.
- Return type:
dict[str, object] | None
- class jano.policies.PerformanceDecayResult(records, metric_directions)[source]
Evaluated records for a fixed-train, moving-test temporal hypothesis.
- Parameters:
records (DataFrame)
metric_directions (dict[str, str])
- records
DataFrame with one row per moving test window and metric columns.
- Type:
pandas.DataFrame
- metric_directions
Mapping from metric name to optimization direction.
- Type:
dict[str, str]
- records: DataFrame
- metric_directions: dict[str, str]
- to_frame()[source]
Return the evaluated test windows as a pandas DataFrame.
- Return type:
DataFrame
- find_drift_onset(metric='rmse', threshold=0.1, baseline='first', relative=True)[source]
Return the first evaluation window where performance becomes problematic.
- Parameters:
metric (str) – Metric column used to detect degradation.
threshold (float) – Allowed degradation from the baseline before the window is flagged.
baseline (str | float) – "first", "best" or an explicit numeric baseline.
relative (bool) – Whether threshold is interpreted proportionally instead of absolutely.
- Return type:
dict[str, object] | None
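Drift-onset detection with the three baseline options can be sketched as a scan over per-window scores. Several details are assumptions rather than documented behaviour: the scores are taken in window order, "best" is computed over all evaluated windows, the comparison is strict, and the function returns a window position instead of a record dictionary. The name first_drift_window is hypothetical.

```python
from typing import Optional, Sequence, Union


def first_drift_window(
    scores: Sequence[float],
    threshold: float = 0.1,
    baseline: Union[str, float] = "first",
    relative: bool = True,
    direction: str = "min",
) -> Optional[int]:
    """Return the position of the first window whose degradation exceeds threshold."""
    if baseline == "first":
        reference = scores[0]
    elif baseline == "best":
        reference = min(scores) if direction == "min" else max(scores)
    else:
        reference = float(baseline)  # explicit numeric baseline
    for position, score in enumerate(scores):
        # Positive values mean "worse than the reference" for either direction.
        worse_by = score - reference if direction == "min" else reference - score
        if relative:
            worse_by /= abs(reference)  # assumes a non-zero reference
        if worse_by > threshold:
            return position
    return None  # no window crossed the threshold


decay_rmse = [1.00, 1.04, 1.08, 1.15, 1.30]
onset = first_drift_window(decay_rmse)
```

Returning None when nothing crosses the threshold mirrors the optional return type listed above.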
Fold objects¶
- class jano.splits.TimeSplit(fold, segments, boundaries, metadata=<factory>)[source]
A single temporal partition with named segments and metadata.
- Parameters:
fold (int)
segments (Dict[str, ndarray])
boundaries (Dict[str, SegmentBoundaries])
metadata (Dict[str, object])
- fold
Zero-based fold number.
- Type:
int
- segments
Mapping from segment name to positional NumPy indices.
- Type:
Dict[str, numpy.ndarray]
- boundaries
Mapping from segment name to temporal boundaries.
- Type:
Dict[str, jano.types.SegmentBoundaries]
- metadata
Additional metadata such as strategy or size kind.
- Type:
Dict[str, object]
- fold: int
- segments: Dict[str, ndarray]
- boundaries: Dict[str, SegmentBoundaries]
- metadata: Dict[str, object]
- slice(X)[source]
Slice a DataFrame into segment-specific DataFrames.
- Parameters:
X (DataFrame)
- Return type:
Dict[str, DataFrame]
- slice_xy(X, y)[source]
Slice features and target into segment-specific objects.
- Parameters:
X (DataFrame)
y (Series)
- Return type:
Dict[str, DataFrame | Series]
- summary()[source]
Return a serializable summary of the fold and its segments.
- Return type:
Dict[str, object]
- feature_history_bounds(lookbacks, *, segment_name='train')[source]
Return per-group historical windows needed to build feature groups.
The returned windows end at the start of segment_name and extend backward according to each configured feature-group lookback.
- Parameters:
lookbacks (FeatureLookbackSpec)
segment_name (str)
- Return type:
Dict[str, SegmentBoundaries]
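The geometry described above, one backward-looking window per feature group ending at the segment start, can be sketched with the standard library. The group names and the function history_windows are hypothetical, and plain (start, end) tuples stand in for SegmentBoundaries.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


def history_windows(
    segment_start: datetime,
    lookbacks: Dict[str, timedelta],
) -> Dict[str, Tuple[datetime, datetime]]:
    """Map each feature group to the (start, end) window that ends at segment_start."""
    return {
        group: (segment_start - lookback, segment_start)
        for group, lookback in lookbacks.items()
    }


train_start = datetime(2024, 3, 1)
windows = history_windows(
    train_start,
    {"short_term": timedelta(days=7), "seasonal": timedelta(days=28)},
)
```

Every window shares the same end, so groups with longer lookbacks simply reach further into the past without changing the fold itself.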
- slice_feature_history(X, lookbacks, *, time_col, segment_name='train')[source]
Slice historical context windows needed by feature groups.
This helper is useful when the fold itself is fixed, but different feature groups need different amounts of past data to be engineered.
- Parameters:
X (DataFrame)
lookbacks (FeatureLookbackSpec)
time_col (str)
segment_name (str)
- Return type:
Dict[str, DataFrame]
Reporting objects¶
- class jano.reporting.SimulationSummary(title, time_col, dataset_start, dataset_end, total_rows, total_folds, strategy, size_kind, folds, segment_order, chart_data, html)[source]
Structured description of a temporal simulation over a dataset.
- Parameters:
title (str)
time_col (str)
dataset_start (Timestamp)
dataset_end (Timestamp)
total_rows (int)
total_folds (int)
strategy (str)
size_kind (str)
folds (List[Dict[str, object]])
segment_order (List[str])
chart_data (SimulationChartData)
html (str)
- title
Report title.
- Type:
str
- time_col
Name of the timestamp column used in the dataset.
- Type:
str
- dataset_start
Earliest timestamp present in the dataset.
- Type:
pandas.Timestamp
- dataset_end
Latest timestamp present in the dataset.
- Type:
pandas.Timestamp
- total_rows
Number of rows in the source dataset.
- Type:
int
- total_folds
Number of simulated folds.
- Type:
int
- strategy
Split strategy used to build the simulation.
- Type:
str
- size_kind
Unit family used by the partition sizes.
- Type:
str
- folds
Fold-by-fold segment metadata.
- Type:
List[Dict[str, object]]
- segment_order
Ordered list of segment names.
- Type:
List[str]
- chart_data
Plot-ready representation of the same simulation.
- Type:
jano.reporting.SimulationChartData
- html
Rendered HTML report.
- Type:
str
- title: str
- time_col: str
- dataset_start: Timestamp
- dataset_end: Timestamp
- total_rows: int
- total_folds: int
- strategy: str
- size_kind: str
- folds: List[Dict[str, object]]
- segment_order: List[str]
- chart_data: SimulationChartData
- html: str
- to_dict()[source]
Return a serializable dictionary representation.
- Return type:
Dict[str, object]
- to_frame()[source]
Convert fold summaries into a tabular pandas DataFrame.
- Return type:
DataFrame
- write_html(path)[source]
Write the rendered HTML report to path.
- Parameters:
path (str | Path)
- Return type:
Path
- class jano.reporting.SimulationChartData(title, time_col, dataset_start, dataset_end, total_rows, total_folds, strategy, size_kind, segment_order, segment_colors, segment_stats, folds)[source]
Plot-ready description of a temporal simulation timeline.
- Parameters:
title (str)
time_col (str)
dataset_start (Timestamp)
dataset_end (Timestamp)
total_rows (int)
total_folds (int)
strategy (str)
size_kind (str)
segment_order (List[str])
segment_colors (Dict[str, str])
segment_stats (Dict[str, Dict[str, object]])
folds (List[Dict[str, object]])
- title
Report title.
- Type:
str
- time_col
Name of the timestamp column used in the dataset.
- Type:
str
- dataset_start
Earliest timestamp present in the dataset.
- Type:
pandas.Timestamp
- dataset_end
Latest timestamp present in the dataset.
- Type:
pandas.Timestamp
- total_rows
Number of rows in the source dataset.
- Type:
int
- total_folds
Number of simulated folds.
- Type:
int
- strategy
Split strategy used to build the simulation.
- Type:
str
- size_kind
Unit family used by the partition sizes.
- Type:
str
- segment_order
Ordered list of segment names.
- Type:
List[str]
- segment_colors
Color associated with each segment.
- Type:
Dict[str, str]
- segment_stats
Aggregate per-segment row statistics across folds.
- Type:
Dict[str, Dict[str, object]]
- folds
Fold-level timeline payload ready for plotting.
- Type:
List[Dict[str, object]]
- title: str
- time_col: str
- dataset_start: Timestamp
- dataset_end: Timestamp
- total_rows: int
- total_folds: int
- strategy: str
- size_kind: str
- segment_order: List[str]
- segment_colors: Dict[str, str]
- segment_stats: Dict[str, Dict[str, object]]
- folds: List[Dict[str, object]]
- to_dict()[source]
Return a serializable dictionary representation.
- Return type:
Dict[str, object]
MCP helper functions¶
These functions power the optional local MCP server and are useful for understanding the exact tool contract exposed to AI clients.
- jano.mcp_tools.preview_dataset(dataset_path, *, dataset_format='auto', sample_rows=5)[source]
Return a compact preview of a local dataset for an MCP client.
- Parameters:
dataset_path (str) – Local dataset path.
dataset_format (str) – Explicit format or "auto".
sample_rows (int) – Number of rows to include in the preview.
- Returns:
JSON-ready dictionary with dataset path, column names and preview rows.
- Return type:
dict[str, Any]
- jano.mcp_tools.plan_walk_forward(dataset_path, *, partition, step, time_col=None, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, title=None, preview_rows=20)[source]
Return a precomputed walk-forward plan as JSON-ready data.
- Parameters:
dataset_path (str) – Local dataset path.
partition (dict[str, Any]) – Dictionary accepted by TemporalPartitionSpec.
step (object) – Step size used by the walk-forward simulation.
time_col (str | int | None) – Timeline column name or position.
strategy (str) – Movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep a final partial fold.
engine (str) – Internal partition engine preference: "auto", "pandas", "polars" or "numpy".
start_at (object | None) – Optional lower timestamp bound.
end_at (object | None) – Optional upper timestamp bound.
max_folds (int | None) – Optional maximum number of folds.
dataset_format (str) – Explicit format or "auto".
order_col (str | int | None) – Optional column used to sort the dataset.
train_time_col (str | int | None) – Optional timestamp column used to assign train rows.
validation_time_col (str | int | None) – Optional timestamp column used to assign validation rows.
test_time_col (str | int | None) – Optional timestamp column used to assign test rows.
title (str | None) – Optional report title.
preview_rows (int) – Number of planned folds to include in the returned preview.
- Returns:
JSON-ready dictionary with fold count, plan columns and preview rows.
- Return type:
dict[str, Any]
- jano.mcp_tools.run_walk_forward(dataset_path, *, partition, step, time_col=None, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, title=None, preview_rows=20)[source]
Run a walk-forward simulation and return a compact summary.
- Parameters:
dataset_path (str) – Local dataset path.
partition (dict[str, Any]) – Dictionary accepted by TemporalPartitionSpec.
step (object) – Step size used by the walk-forward simulation.
time_col (str | int | None) – Timeline column name or position.
strategy (str) – Movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep a final partial fold.
engine (str) – Internal partition engine preference: "auto", "pandas", "polars" or "numpy".
start_at (object | None) – Optional lower timestamp bound.
end_at (object | None) – Optional upper timestamp bound.
max_folds (int | None) – Optional maximum number of folds.
dataset_format (str) – Explicit format or "auto".
order_col (str | int | None) – Optional column used to sort the dataset.
train_time_col (str | int | None) – Optional timestamp column used to assign train rows.
validation_time_col (str | int | None) – Optional timestamp column used to assign validation rows.
test_time_col (str | int | None) – Optional timestamp column used to assign test rows.
title (str | None) – Optional report title.
preview_rows (int) – Number of summary rows to include in the response.
- Returns:
JSON-ready dictionary with fold summary, chart data and rendered HTML.
- Return type:
dict[str, Any]
- jano.mcp_tools.run_walk_forward_baseline(dataset_path, *, partition, step, time_col=None, target_col=None, feature_cols=None, model='mean', metrics=None, metric_directions=None, retrain='always', retrain_interval=None, drift_metric='rmse', drift_threshold=0.05, drift_baseline='last_retrain', drift_relative=True, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, include_predictions=False, preview_rows=20, prediction_preview_rows=20)[source]
Run a model-free baseline over temporal folds for MCP clients.
- Parameters:
dataset_path (str) – Local dataset path.
partition (dict[str, Any]) – Dictionary accepted by TemporalPartitionSpec.
step (object) – Step size used by the walk-forward simulation.
time_col (str | int | None) – Timeline column name or position.
target_col (str | int | None) – Target column name or position.
feature_cols (list[str | int] | None) – Optional feature columns. Baseline models do not require features, but the argument keeps the MCP surface aligned with WalkForwardRunner.
model (str) – Baseline estimator: "mean" for numeric regression or "majority_class" for classification.
metrics (dict[str, Any] | None) – Mapping of metric names to user-provided callables.
metric_directions (dict[str, str] | None) – Optional mapping declaring "min" or "max" per metric.
retrain (bool | str) – Retraining policy: "always", "never", "periodic", "on_drift", True or False.
retrain_interval (int | None) – Fold interval required by retrain="periodic".
drift_metric (str) – Metric monitored when retrain="on_drift".
drift_threshold (float) – Degradation threshold for drift-based retraining.
drift_baseline (str) – Drift baseline: "last_retrain", "first", "best" or "previous_fold".
drift_relative (bool) – Whether drift threshold is relative or absolute.
strategy (str) – Movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep a final partial fold.
engine (str) – Internal partition engine preference.
start_at (object | None) – Optional lower timestamp bound.
end_at (object | None) – Optional upper timestamp bound.
max_folds (int | None) – Optional maximum number of folds.
dataset_format (str) – Explicit format or "auto".
order_col (str | int | None) – Optional column used to sort the dataset.
train_time_col (str | int | None) – Optional timestamp column used to assign train rows.
validation_time_col (str | int | None) – Optional timestamp column used to assign validation rows.
test_time_col (str | int | None) – Optional timestamp column used to assign test rows.
include_predictions (bool) – Whether to include a bounded prediction preview.
preview_rows (int) – Number of fold/metric rows returned in previews.
prediction_preview_rows (int) – Number of predictions returned when requested.
- Returns:
JSON-ready dictionary with runner summary, fold preview, metric trajectory preview, retraining events and optional prediction preview.
- Return type:
dict[str, Any]
- jano.mcp_tools.compare_retrain_policies(dataset_path, *, partition, step, time_col=None, target_col=None, feature_cols=None, model='mean', metrics=None, policies=None, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, preview_rows=20)[source]
Compare retraining policies over the same walk-forward fold geometry.
The MCP surface intentionally uses built-in baseline models instead of accepting arbitrary Python estimators. Use the Python API for custom models.
- Parameters:
dataset_path (str)
partition (dict[str, Any])
step (object)
time_col (str | int | None)
target_col (str | int | None)
feature_cols (list[str | int] | None)
model (str)
metrics (dict[str, Any] | None)
policies (list[dict[str, Any]] | None)
strategy (str)
allow_partial (bool)
engine (str)
start_at (object | None)
end_at (object | None)
max_folds (int | None)
dataset_format (str)
order_col (str | int | None)
train_time_col (str | int | None)
validation_time_col (str | int | None)
test_time_col (str | int | None)
preview_rows (int)
- Return type:
dict[str, Any]
- jano.mcp_tools.find_train_history_window(dataset_path, *, time_col=None, cutoff=None, train_sizes=None, test_size=None, target_col=None, feature_cols=None, model='mean', metrics=None, metric='rmse', tolerance=0.0, relative=True, gap_before_test=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, preview_rows=20)[source]
Evaluate train-history candidates against one fixed test window.
- Parameters:
dataset_path (str)
time_col (str | int | None)
cutoff (object | None)
train_sizes (list[object] | None)
test_size (object | None)
target_col (str | int | None)
feature_cols (list[str | int] | None)
model (str)
metrics (dict[str, Any] | None)
metric (str)
tolerance (float)
relative (bool)
gap_before_test (object | None)
dataset_format (str)
order_col (str | int | None)
train_time_col (str | int | None)
validation_time_col (str | int | None)
test_time_col (str | int | None)
preview_rows (int)
- Return type:
dict[str, Any]
- jano.mcp_tools.monitor_decay(dataset_path, *, time_col=None, cutoff=None, train_size=None, test_size=None, step=None, target_col=None, feature_cols=None, model='mean', metrics=None, metric='rmse', threshold=0.1, baseline='first', relative=True, gap_before_test=None, max_windows=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, preview_rows=20)[source]
Evaluate when fixed-train performance decay crosses a threshold.
- Parameters:
dataset_path (str)
time_col (str | int | None)
cutoff (object | None)
train_size (object | None)
test_size (object | None)
step (object | None)
target_col (str | int | None)
feature_cols (list[str | int] | None)
model (str)
metrics (dict[str, Any] | None)
metric (str)
threshold (float)
baseline (str | float)
relative (bool)
gap_before_test (object | None)
max_windows (int | None)
dataset_format (str)
order_col (str | int | None)
train_time_col (str | int | None)
validation_time_col (str | int | None)
test_time_col (str | int | None)
preview_rows (int)
- Return type:
dict[str, Any]
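As a rough illustration of what "decay crosses a threshold" can mean, the sketch below takes one error value per test window and reports the first window whose degradation against a baseline reaches the threshold. It is a standalone approximation under stated assumptions, not Jano's code: the function name `first_decay_crossing` is hypothetical, the metric is assumed to be an error where higher is worse, and the handling of `baseline`, `relative` and `threshold` is inferred from the parameter names only.

```python
def first_decay_crossing(window_scores, threshold=0.1, baseline="first", relative=True):
    """Return the index of the first window whose decay reaches `threshold`.

    `baseline="first"` uses the first window's score as the reference;
    a numeric baseline is used as given. Returns None if nothing crosses.
    """
    if not window_scores:
        return None
    reference = window_scores[0] if baseline == "first" else float(baseline)
    if relative and reference == 0:
        raise ValueError("relative decay is undefined for a zero baseline")
    for index, score in enumerate(window_scores):
        decay = score - reference
        if relative:
            decay = decay / reference
        if decay >= threshold:
            return index
    return None


# RMSE of a fixed-train model on successive test windows.
rmse_per_window = [2.0, 2.1, 2.15, 2.3, 2.8]
crossed_at = first_decay_crossing(rmse_per_window, threshold=0.1)
print(crossed_at)
```

With a relative threshold of 0.1 the fourth window (index 3) is the first one at least 10% worse than the first window.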
Type and validation helpers¶
- class jano.engines.PartitionEngineMetadata(engine, input_backend, converted)[source]
Execution metadata for the internal partition engine.
- Parameters:
engine (str)
input_backend (str)
converted (bool)
- engine
Internal engine selected to compute temporal boundaries and indices.
- Type:
str
- input_backend
Backend detected from the user-provided input.
- Type:
str
- converted
Whether the full dataset was converted before planning.
- Type:
bool
- engine: str
- input_backend: str
- converted: bool
- to_dict()[source]
Return metadata as a serializable dictionary.
- Return type:
dict[str, object]
- class jano.types.SizeSpec(value, kind)[source]
Normalized specification for segment sizes.
- Parameters:
value (Timedelta | int | float)
kind (str)
- value
Parsed size value as Timedelta, integer row count or fraction.
- Type:
pandas.Timedelta | int | float
- kind
Unit family for the value: duration, rows or fraction.
- Type:
str
- value: Timedelta | int | float
- kind: str
- classmethod from_value(value)[source]
Normalize a raw size value into a typed SizeSpec.
- Parameters:
value (str | int | float | Timedelta)
- Return type:
SizeSpec
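The three unit families can be pictured with a small stand-in. The sketch below is not the Jano class: `SizeSketch` and `classify_size` are hypothetical names, and the mapping (strings and timedeltas to `duration`, integers to `rows`, floats to `fraction`) as well as the range checks are assumptions inferred from the accepted types. Unlike the real `from_value`, it leaves string values unparsed rather than converting them to a `Timedelta`.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SizeSketch:
    """Hypothetical stand-in for SizeSpec, used only for illustration."""
    value: object
    kind: str


def classify_size(value):
    """Assign a raw size value to one of the three unit families."""
    if isinstance(value, bool):
        # bool is a subclass of int, so reject it before the int branch.
        raise TypeError("booleans are not valid sizes")
    if isinstance(value, (str, timedelta)):
        # pandas.Timedelta subclasses datetime.timedelta, so it lands here too.
        return SizeSketch(value, "duration")
    if isinstance(value, int):
        if value <= 0:
            raise ValueError("row counts must be positive")
        return SizeSketch(value, "rows")
    if isinstance(value, float):
        if not 0 < value < 1:
            raise ValueError("fractions must lie strictly between 0 and 1")
        return SizeSketch(value, "fraction")
    raise TypeError(f"unsupported size type: {type(value).__name__}")


kinds = [classify_size(v).kind for v in ("7D", timedelta(days=7), 500, 0.2)]
print(kinds)
```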
- class jano.types.SegmentBoundaries(start, end)[source]
Closed-open boundaries for a named temporal segment.
- Parameters:
start (Timestamp)
end (Timestamp)
- start: Timestamp
- end: Timestamp
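Closed-open means that `start` is included and `end` is excluded, so adjacent segments can share a boundary timestamp without sharing rows. The following standard-library sketch shows that convention; `rows_in_segment` is a hypothetical helper, not part of Jano.

```python
from datetime import datetime


def rows_in_segment(timestamps, start, end):
    """Indices of timestamps inside the closed-open interval [start, end)."""
    return [i for i, ts in enumerate(timestamps) if start <= ts < end]


# One observation per day for the first week of January 2024.
timestamps = [datetime(2024, 1, day) for day in range(1, 8)]

# The train segment ends exactly where the test segment starts.
boundary = datetime(2024, 1, 5)
train_rows = rows_in_segment(timestamps, datetime(2024, 1, 1), boundary)
test_rows = rows_in_segment(timestamps, boundary, datetime(2024, 1, 8))
print(train_rows, test_rows)
```

The row stamped at the shared boundary belongs to the test segment only, which is what keeps train and test disjoint.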
- class jano.validation.ValidatedPartitionSpec(layout, segments, gaps, tail_gap, size_kind, calendar_frequency=None)[source]
Partition specification after normalization.
- Parameters:
layout (str)
segments (Dict[str, SizeSpec])
gaps (Dict[str, SizeSpec])
tail_gap (SizeSpec | None)
size_kind (str)
calendar_frequency (str | None)
- layout: str
- segments: Dict[str, SizeSpec]
- gaps: Dict[str, SizeSpec]
- tail_gap: SizeSpec | None
- size_kind: str
- calendar_frequency: str | None = None
- jano.validation.validate_strategy(strategy)[source]
Validate and normalize a split strategy name.
- Parameters:
strategy (str)
- Return type:
str
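A validator of this shape typically checks the name against the supported strategies (`"single"`, `"rolling"` and `"expanding"`, as listed for `WalkForwardPolicy`) and returns a canonical form. The sketch below is a hypothetical equivalent, not the Jano function: `normalize_strategy` is an illustrative name, and trimming whitespace and lowercasing are assumptions about what "normalize" means here.

```python
ALLOWED_STRATEGIES = ("single", "rolling", "expanding")


def normalize_strategy(strategy):
    """Return the canonical strategy name or raise for unsupported input."""
    if not isinstance(strategy, str):
        raise TypeError("strategy must be a string")
    normalized = strategy.strip().lower()
    if normalized not in ALLOWED_STRATEGIES:
        allowed = ", ".join(ALLOWED_STRATEGIES)
        raise ValueError(f"unknown strategy {strategy!r}; expected one of: {allowed}")
    return normalized


print(normalize_strategy(" Rolling "))
```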
- jano.validation.validate_partition_spec(partition)[source]
Validate a high-level partition spec and normalize its sizes.
- Parameters:
partition (TemporalPartitionSpec)
- Return type:
ValidatedPartitionSpec