API Reference

The recommended surface is deliberately small. Most workflows start with WalkForwardPolicy, TrainHistoryPolicy or DriftMonitoringPolicy, and only drop down to explicit simulation, planning or the splitter when they need lower-level control.

Public inputs can come from pandas, NumPy or Polars. When the source is not pandas, Jano normalizes it at the boundary and keeps the same split and reporting surface.

Main workflow

class jano.workflows.WalkForwardPolicy(time_col, *, partition, step, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None)

Recommended high-level entry point for production-like walk-forward evaluation.

Parameters:

time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec. Use TemporalSemanticsSpec when ordering, reporting and segment eligibility need different timestamp columns.
partition (TemporalPartitionSpec) – Train/test or train/validation/test layout to move through time.
step – Amount by which the simulation advances after each fold. It must use the same unit family as partition sizes.
strategy (str) – Movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep a final fold whose last segment exceeds the available timeline.
engine (str) – Internal partition engine preference. "auto" keeps native Polars and NumPy paths when safe; "pandas", "polars" and "numpy" force a specific engine.
start_at (object | None) – Optional lower timestamp bound applied before folds are planned.
end_at (object | None) – Optional upper timestamp bound applied before folds are planned.
max_folds (int | None) – Optional maximum number of folds to keep.
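The three strategy values and the allow_partial and max_folds switches can be pictured with a small stand-alone sketch. It does not call Jano and is not its engine: the function name sketch_folds, the use of integer row counts for every size, and the rule that a partial fold ends the plan are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Fold = Tuple[range, range]  # (train row positions, test row positions)


def sketch_folds(
    n_rows: int,
    train_size: int,
    test_size: int,
    step: int,
    strategy: str = "rolling",
    allow_partial: bool = False,
    max_folds: Optional[int] = None,
) -> List[Fold]:
    """Illustrative fold geometry over row positions (hypothetical helper)."""
    if strategy not in {"single", "rolling", "expanding"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if step <= 0:
        raise ValueError("step must be positive")
    folds: List[Fold] = []
    offset = 0
    while max_folds is None or len(folds) < max_folds:
        train_end = offset + train_size
        test_end = train_end + test_size
        if train_end >= n_rows:
            break  # no rows left for a test segment
        partial = test_end > n_rows
        if partial and not allow_partial:
            break
        # "expanding" keeps the train start pinned at row 0; the others slide it.
        train_start = 0 if strategy == "expanding" else offset
        folds.append(
            (range(train_start, train_end), range(train_end, min(test_end, n_rows)))
        )
        if partial or strategy == "single":
            break
        offset += step
    return folds


rolling = sketch_folds(10, train_size=4, test_size=2, step=2)
expanding = sketch_folds(10, train_size=4, test_size=2, step=2, strategy="expanding")
```

In this sketch both plans share the same test segments; only the train segment differs, sliding forward under "rolling" and growing from row 0 under "expanding".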

plan(X, title=None)

Return the precomputed walk-forward geometry.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
title (str | None) – Optional title attached to the returned plan.

Returns:

A SimulationPlan with fold boundaries and row counts, but without materialized train/test slices.

Return type:

SimulationPlan

run(X, output_path=None, title=None)

Materialize the walk-forward simulation.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
output_path (str | None) – Optional filesystem path for the HTML report.
title (str | None) – Optional title used in reports.

Returns:

A SimulationResult with materialized folds, tabular summary, chart data and rendered HTML.

Return type:

SimulationResult

as_splitter()

Expose the underlying splitter for manual control.

Return type: TemporalBacktestSplitter

property simulation: TemporalSimulation

Expose the underlying simulation object.

class jano.workflows.TrainHistoryPolicy(time_col, *, cutoff, train_sizes, test_size, gap_before_test=None)

Recommended entry point for fixed-test, growing-train history studies.

Parameters:

time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
cutoff – Boundary where train ends and the fixed test horizon begins after any gap_before_test.
train_sizes (Sequence[object]) – Candidate duration windows to evaluate by looking backward from cutoff.
test_size – Duration of the fixed test window.
gap_before_test – Optional duration gap between the train end and test start.
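As a rough picture of the fixed-test, growing-train layout, the sketch below derives one train window per candidate size, all ending at cutoff, plus a single shared test window. It uses only the standard library. train_history_windows is a hypothetical helper, and the half-open [start, end) convention is an assumption rather than something this reference states.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Window = Tuple[datetime, datetime]  # assumed half-open: [start, end)


def train_history_windows(
    cutoff: datetime,
    train_sizes: Sequence[timedelta],
    test_size: timedelta,
    gap_before_test: timedelta = timedelta(0),
) -> List[Tuple[Window, Window]]:
    """Pair every candidate train window with the same fixed test window."""
    test_start = cutoff + gap_before_test
    test_window = (test_start, test_start + test_size)
    # Each train window looks backward from the cutoff by its own duration.
    return [((cutoff - size, cutoff), test_window) for size in train_sizes]


cutoff = datetime(2024, 3, 1)
windows = train_history_windows(
    cutoff,
    train_sizes=[timedelta(days=7), timedelta(days=30)],
    test_size=timedelta(days=7),
    gap_before_test=timedelta(days=1),
)
```

Every pair shares the same test window, so differences in score between variants can be attributed to the amount of training history alone.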

evaluate(X, *, model, target_col, feature_cols=None, metrics=None)

Evaluate all configured train-history variants against one fixed test slice.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (ColumnRef) – Target column name or position.
feature_cols (Sequence[ColumnRef] | None) – Optional feature column names or positions. If omitted, all non-temporal, non-target columns are used.
metrics (MetricSpec) – Mapping of metric names to user-provided callables, such as {"business_cost": cost_fn}.

Return type:

TrainGrowthResult

find_optimal_train_size(X, **kwargs)

Return the smallest train window that stays within tolerance of the best score.

Return type: dict[str, object]
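The selection rule can be sketched without Jano. The helper below is hypothetical and assumes a lower-is-better score keyed by train size in days. Its tolerance and relative arguments mirror the parameters documented for RollingTrainHistoryPolicy.evaluate; whether find_optimal_train_size accepts them through **kwargs is not stated in this reference.

```python
from typing import Mapping


def smallest_within_tolerance(
    scores: Mapping[int, float],
    tolerance: float = 0.0,
    relative: bool = True,
) -> int:
    """Return the smallest train size whose score is close enough to the best.

    Assumes lower scores are better (for example an error metric).
    """
    best = min(scores.values())
    # Relative tolerance scales with the best score; absolute is used as-is.
    slack = abs(best) * tolerance if relative else tolerance
    eligible = [size for size, score in scores.items() if score <= best + slack]
    return min(eligible)


# Hypothetical scores per train-history length in days.
scores = {7: 1.30, 14: 1.04, 30: 1.00, 60: 1.01}
strict = smallest_within_tolerance(scores)
relaxed = smallest_within_tolerance(scores, tolerance=0.05)
```

With zero tolerance the best-scoring window wins outright; a 5% relative tolerance lets the shorter 14-day window through because its score is within 5% of the best.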

class jano.workflows.DriftMonitoringPolicy(time_col, *, cutoff, train_size, test_size, step, gap_before_test=None, max_windows=None)

Recommended entry point for fixed-train, moving-test decay monitoring.

Parameters:

time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
cutoff – Boundary where the fixed train window ends.
train_size – Duration of the fixed train window looking backward from cutoff.
test_size – Duration of each forward test window.
step – Duration by which the test window advances after each evaluation.
gap_before_test – Optional duration gap between train end and first test start.
max_windows (int | None) – Optional maximum number of test windows to evaluate.

evaluate(X, *, model, target_col, feature_cols=None, metrics=None)

Evaluate how performance evolves as the test window moves forward.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (ColumnRef) – Target column name or position.
feature_cols (Sequence[ColumnRef] | None) – Optional feature column names or positions. If omitted, all non-temporal, non-target columns are used.
metrics (MetricSpec) – Mapping of metric names to user-provided callables.

Return type:

PerformanceDecayResult

find_drift_onset(X, **kwargs)

Return the first test window whose performance crosses the chosen threshold.

Return type: dict[str, object] | None
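One way to read "first test window whose performance crosses the chosen threshold" is the scan below. It is a sketch only: first_crossing, its higher_is_worse flag and the keys of the returned dictionary are hypothetical, and the keyword arguments that find_drift_onset actually accepts are not listed in this reference.

```python
from typing import Dict, Optional, Sequence


def first_crossing(
    window_scores: Sequence[float],
    threshold: float,
    higher_is_worse: bool = True,
) -> Optional[Dict[str, object]]:
    """Return the first window whose score crosses the threshold, else None."""
    for index, score in enumerate(window_scores):
        crossed = score > threshold if higher_is_worse else score < threshold
        if crossed:
            return {"window": index, "score": score}
    return None


# Hypothetical per-window error for a model trained once and never refreshed.
onset = first_crossing([0.9, 1.0, 1.4, 1.2], threshold=1.3)
no_onset = first_crossing([0.9, 1.0], threshold=1.3)
```

Returning None when nothing crosses matches the documented return type, which allows a missing onset.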

class jano.workflows.RollingTrainHistoryPolicy(time_col, *, partition, step, train_sizes, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None)

Run train-history optimization inside each outer walk-forward iteration.

This policy answers questions such as: how much training history is required on average if the optimal train window is allowed to vary over time?

Parameters:

time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
partition (TemporalPartitionSpec) – Outer walk-forward partition that defines the moving train/test windows.
step – Amount by which the outer walk-forward process advances.
train_sizes (Sequence[object]) – Candidate train-history durations tested inside each outer fold.
strategy (str) – Outer movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether the outer plan can keep a final partial fold.
engine (str) – Internal partition engine preference for the outer plan.
start_at (object | None) – Optional lower timestamp bound for the outer plan.
end_at (object | None) – Optional upper timestamp bound for the outer plan.
max_folds (int | None) – Optional maximum number of outer folds.

plan(X, title=None)

Return the outer walk-forward plan used by the composed policy.

Parameters: title (str | None)
Return type: SimulationPlan

evaluate(X, *, model, target_col, feature_cols=None, metrics=None, metric='rmse', tolerance=0.0, relative=True, title=None)

Choose an optimal train-history size for each outer walk-forward iteration.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (ColumnRef) – Target column name or position.
feature_cols (Sequence[ColumnRef] | None) – Optional feature column names or positions.
metrics (MetricSpec) – Mapping of metric names to user-provided callables.
metric (str) – Metric column used to choose the optimal train size.
tolerance (float) – Allowed distance from the best score.
relative (bool) – Whether tolerance is proportional instead of absolute.
title (str | None) – Optional title attached to the outer plan.

Return type:

RollingTrainHistoryResult

class jano.workflows.RollingTrainHistoryResult(records, metric)

Per-iteration optimal training-history choices over a walk-forward plan.

Parameters:

records (DataFrame)
metric (str)

records

DataFrame with one row per outer walk-forward iteration and the selected train-history window for that iteration.

Type: pandas.DataFrame

metric

Metric used to choose the optimal train-history size.

Type: str

records: DataFrame

metric: str

to_frame()

Return one row per outer iteration with the chosen optimal train size.

Return type: DataFrame

summary()

Return compact aggregate statistics for the chosen train windows.

Return type: dict[str, object]

class jano.simulation.TemporalSimulation(time_col, partition, step, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None)

High-level interface for executing a complete temporal simulation.

Parameters:

time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position or TemporalSemanticsSpec describing the timeline, ordering column and per-segment eligibility columns.
partition (TemporalPartitionSpec) – High-level definition of the train/test or train/validation/test layout.
step – Amount by which the simulation advances after each fold.
strategy (str) – Simulation policy. Use "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep the last fold when the final evaluation segment would otherwise run past the end of the dataset.
engine (str) – Internal partition engine preference. Use "auto" to let Jano choose the safest native backend, or force "pandas", "polars" or "numpy".
start_at (object | None) – Optional lower bound for the simulation timeline. Rows strictly before this timestamp are excluded before folds are generated.
end_at (object | None) – Optional upper bound for the simulation timeline. Rows strictly after this timestamp are excluded before folds are generated.
max_folds (int | None) – Optional maximum number of folds to materialize.

property time_col

Return the timeline column configured for the simulation.

property partition

Return the validated partition configuration used by the simulation.

property temporal_semantics: TemporalSemanticsSpec

Return the temporal semantics used by the simulation.

as_splitter()

Return the underlying low-level splitter.

Return type: TemporalBacktestSplitter

run(X, output_path=None, title=None)

Execute the configured simulation over X and materialize its folds.

Parameters:

X (DataFrame) – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
output_path (str | Path | None) – Optional filesystem path where the rendered HTML report should be written.
title (str | None) – Optional title used in the returned report outputs.

Returns:

A SimulationResult containing the materialized folds and their summary.

Return type:

SimulationResult

plan(X, title=None)

Precompute the simulation geometry before materializing any folds.

Parameters:

X (DataFrame) – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
title (str | None) – Optional title used when the plan is later described or rendered.

Returns:

A SimulationPlan with fold boundaries and row counts.

Return type:

SimulationPlan

class jano.simulation.SimulationResult(frame, splits, summary, engine_metadata)

Materialized result of running a temporal simulation over a dataset.

Parameters:

frame (DataFrame)
splits (List[TimeSplit])
summary (SimulationSummary)
engine_metadata (PartitionEngineMetadata)

frame

Source dataset used to build the simulation.

Type: pandas.DataFrame

splits

Materialized fold objects.

Type: List[jano.splits.TimeSplit]

summary

Structured report for the simulation.

Type: jano.reporting.SimulationSummary

engine_metadata

Internal partition engine selected for the simulation.

Type: jano.engines.PartitionEngineMetadata

frame: DataFrame

splits: List[TimeSplit]

summary: SimulationSummary

engine_metadata: PartitionEngineMetadata

property total_folds: int

Return the number of materialized folds.

property chart_data: SimulationChartData

Return plot-ready chart data for the simulation.

property html: str

Return the rendered HTML report.

to_frame()

Return fold-level simulation metadata as a pandas DataFrame.

Return type: DataFrame

to_dict()

Return a serializable dictionary representation.

Return type: dict[str, object]

write_html(path)

Write the rendered HTML report to disk.

Parameters: path (str | Path)
Return type: Path

iter_splits()

Iterate over materialized fold objects.

Return type: Iterator[TimeSplit]

class jano.runner.WalkForwardRunner(*, model, target_col=None, feature_cols=None, retrain=True, retrain_interval=None, retrain_policy=None, metrics=None, metric_directions=None, primary_metric=None, evaluation=None, prediction_column='prediction')

Run an estimator over temporal folds while applying a retrain policy.

Parameters:

feature_cols (Sequence[object] | None)
retrain (bool | str)
retrain_interval (int | None)
retrain_policy (RetrainPolicy | None)
metrics (MetricSpec)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)
evaluation (EvaluationProfile | None)
prediction_column (str)

run(workflow, X, y=None)

Execute the configured estimator over a temporal workflow.

Return type: WalkForwardRunResult

class jano.runner.WalkForwardRunResult(records, predictions, metric_directions, retrain_policy, primary_metric=None)

Materialized execution of a temporal workflow with an estimator.

Parameters:

records (DataFrame)
predictions (DataFrame)
metric_directions (dict[str, str])
retrain_policy (str)
primary_metric (str | None)

records: DataFrame

predictions: DataFrame

metric_directions: dict[str, str]

retrain_policy: str

primary_metric: str | None = None

to_frame()

Return one row per evaluated fold.

Return type: DataFrame

predictions_frame()

Return row-level predictions across all test folds.

Return type: DataFrame

property metric_names: list[str]

Return metric columns recorded for each evaluated fold.

fold_summary()

Return fold geometry and retraining metadata without metric columns.

Return type: DataFrame

metric_trajectory()

Return metrics in long format, one row per fold and metric.

Return type: DataFrame

retrain_events()

Return the subset of folds where the estimator was retrained.

Return type: DataFrame

summary()

Return compact aggregate execution statistics.

Return type: dict[str, object]

report_data(*, include_predictions=False)

Return JSON-ready execution data for notebooks, agents or custom reports.

Parameters: include_predictions (bool)
Return type: dict[str, object]

to_dict(*, include_predictions=False)

Return a serializable representation of the execution result.

Parameters: include_predictions (bool)
Return type: dict[str, object]

class jano.online.OnlineTemporalRunner(*, model, time_col, target_col, initial_train_size, update_size=1, feature_cols=None, update_strategy=None, metrics=None, metric_directions=None, primary_metric=None, evaluation=None, include_predictions=True, prediction_column='prediction', retrain_trigger=None)

Run prequential online temporal evaluation over events or micro-batches.

The runner first initializes a model on an initial train window. It then repeats the production-like sequence predict -> observe target -> update model for each future event or micro-batch.
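The predict -> observe target -> update model sequence can be illustrated with a toy loop. Nothing below comes from Jano: RunningMean is a stand-in model, prequential is a hypothetical helper, sizes are plain row counts, and the record keys are chosen only for this example.

```python
from typing import Dict, List, Sequence


class RunningMean:
    """Toy incremental model that predicts the mean of all targets seen so far."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, targets: Sequence[float]) -> None:
        self.total += sum(targets)
        self.count += len(targets)

    def predict(self, n_rows: int) -> List[float]:
        mean = self.total / self.count if self.count else 0.0
        return [mean] * n_rows


def prequential(
    targets: Sequence[float], initial_train_size: int, update_size: int = 1
) -> List[Dict[str, float]]:
    model = RunningMean()
    model.update(targets[:initial_train_size])  # initial train window
    records = []
    for start in range(initial_train_size, len(targets), update_size):
        batch = targets[start:start + update_size]
        predictions = model.predict(len(batch))  # 1. predict before seeing targets
        mae = sum(abs(p - t) for p, t in zip(predictions, batch)) / len(batch)  # 2. observe
        records.append({"start": start, "rows": len(batch), "mae": mae})
        model.update(batch)  # 3. update with the batch that was just scored
    return records


records = prequential([1.0, 1.0, 1.0, 3.0, 3.0], initial_train_size=2)
```

Each batch is scored strictly before the model learns from it, which is what makes the evaluation prequential.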

Parameters:

model – Estimator implementing predict plus the method required by update_strategy. Use PartialFitUpdateStrategy for incremental estimators or RefitUpdateStrategy for standard fit estimators.
time_col (str | int | TemporalSemanticsSpec) – Timeline column used to order events.
target_col (ColumnRef) – Target column name or position.
initial_train_size – Initial history used before the first prediction. Supports duration strings, integer row counts and fractions.
update_size – Event or micro-batch size. Use 1 for event-level updates, an integer for row batches, or a duration string such as "1D".
feature_cols (Sequence[ColumnRef] | None) – Optional feature columns. If omitted, all non-temporal, non-target columns are used.
update_strategy (OnlineUpdateStrategy | None) – Strategy that initializes and updates the model.
metrics (MetricSpec) – Mapping of metric names to user-provided callables.
metric_directions (dict[str, str] | None) – Optional metric direction overrides.
primary_metric (str | None) – Primary metric used by downstream analysis.
evaluation (EvaluationProfile | None) – Optional explicit EvaluationProfile.
include_predictions (bool) – Whether row-level predictions should be stored.
retrain_trigger (OnlineRetrainTrigger | None) – Optional callable evaluated after each batch is scored. It receives history (all records up to the current batch) and latest (the current batch record). Return True, a reason string, or a dictionary such as {"retrain": True, "reason": "..."} to mark that batch as a retraining checkpoint.
prediction_column (str)

run(X)

Execute prequential evaluation over X.

Return type: OnlineRunResult

class jano.online.OnlineRunResult(records, predictions, metric_directions, update_strategy, primary_metric=None)

Materialized online temporal evaluation result.

Parameters:

records (DataFrame)
predictions (DataFrame)
metric_directions (dict[str, str])
update_strategy (str)
primary_metric (str | None)

records

One row per evaluated event or micro-batch.

Type: pandas.DataFrame

predictions

Optional row-level predictions for all evaluated rows.

Type: pandas.DataFrame

metric_directions

Mapping from metric name to "min" or "max".

Type: dict[str, str]

update_strategy

Name of the strategy used to update the model.

Type: str

primary_metric

Primary metric used by downstream analysis.

Type: str | None

records: DataFrame

predictions: DataFrame

metric_directions: dict[str, str]

update_strategy: str

primary_metric: str | None = None

to_frame()

Return one row per evaluated event or micro-batch.

Return type: DataFrame

predictions_frame()

Return row-level predictions across all online evaluation batches.

Return type: DataFrame

property metric_names: list[str]

metric_trajectory()

Return metrics in long format, one row per batch and metric.

Return type: DataFrame

summary()

Return compact aggregate statistics for the online run.

Return type: dict[str, object]

report_data(*, include_predictions=False)

Return JSON-ready data for notebooks, agents and custom reports.

Parameters: include_predictions (bool)
Return type: dict[str, object]

to_dict(*, include_predictions=False)

Return a serializable representation of the online run.

Parameters: include_predictions (bool)
Return type: dict[str, object]

retrain_checkpoints()

Return batches where the user-defined online retrain trigger fired.

Return type: DataFrame

class jano.online.OnlineUpdatePolicyStudy(*, model, time_col, target_col, initial_train_size, policies, feature_cols=None, metrics=None, metric_directions=None, primary_metric=None, evaluation=None)

Compare online update policies over the same temporal stream.

The study runs one OnlineTemporalRunner per candidate policy and returns policy-level metrics plus detailed per-policy runs. It is useful for comparing update cadences such as every event, every N rows, every day, or refit strategies with different retained-history caps.

Parameters:

time_col (str | int | TemporalSemanticsSpec)
target_col (ColumnRef)
policies (Sequence[OnlineUpdatePolicy])
feature_cols (Sequence[ColumnRef] | None)
metrics (MetricSpec)
metric_directions (dict[str, str] | None)
primary_metric (str | None)
evaluation (EvaluationProfile | None)

run(X)

Evaluate all candidate online update policies over X.

Return type: OnlineUpdatePolicyStudyResult

class jano.online.OnlineUpdatePolicyStudyResult(records, runs, metric_directions, primary_metric=None)

Comparison result for multiple online update policies.

Parameters:

records (DataFrame)
runs (dict[str, OnlineRunResult])
metric_directions (dict[str, str])
primary_metric (str | None)

records: DataFrame

runs: dict[str, OnlineRunResult]

metric_directions: dict[str, str]

primary_metric: str | None = None

to_frame()

Return one row per evaluated online update policy.

Return type: DataFrame

run(policy)

Return the detailed run for a named policy.

Parameters: policy (str)
Return type: OnlineRunResult

metric_trajectory()

Return long-format metric trajectories for all policies.

Return type: DataFrame

find_optimal_policy(metric=None, *, update_cost_weight=0.0)

Return the best policy after optional update-cost penalization.

Parameters:

metric (str | None) – Metric column to optimize. Defaults to primary_metric.
update_cost_weight (float) – Penalty applied to total_update_cost. For lower-is-better metrics the penalty is added; for higher-is-better metrics it is subtracted.

Return type:

dict[str, object]
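The penalization rule described for update_cost_weight can be written out as a short sketch. It assumes policy-level rows held as plain dictionaries; pick_policy and the "policy" key are hypothetical, while total_update_cost is the column named in the parameter description above.

```python
from typing import Dict, List


def pick_policy(
    rows: List[Dict[str, object]],
    metric: str,
    direction: str,
    update_cost_weight: float = 0.0,
) -> Dict[str, object]:
    """Choose the best row after penalizing each score by its update cost."""

    def adjusted(row: Dict[str, object]) -> float:
        penalty = update_cost_weight * float(row["total_update_cost"])
        score = float(row[metric])
        # Lower-is-better: add the penalty. Higher-is-better: subtract it.
        return score + penalty if direction == "min" else score - penalty

    chooser = min if direction == "min" else max
    best = chooser(rows, key=adjusted)
    return {"policy": best["policy"], "adjusted_score": adjusted(best)}


# Hypothetical policy-level results.
rows = [
    {"policy": "every_event", "mae": 1.00, "total_update_cost": 100.0},
    {"policy": "daily", "mae": 1.10, "total_update_cost": 10.0},
]
unpenalized = pick_policy(rows, "mae", "min")
penalized = pick_policy(rows, "mae", "min", update_cost_weight=0.01)
```

With a zero weight the most accurate policy wins; once update cost carries weight, the cheaper daily cadence overtakes it in this made-up data.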

Evaluation profiles

class jano.evaluation.EvaluationProfile(metrics=None, metric_directions=None, primary_metric=None)

Define how a temporal run should be measured.

Parameters:

metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-defined callables. None means no metrics are computed by Jano.
metric_directions (Mapping[str, str] | None) – Optional mapping from metric name to "min" or "max". Custom metrics default to "min" unless explicitly overridden.
primary_metric (str | None) – Metric used as the default optimization or retraining signal.
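The three arguments are plain Python objects, so they can be prepared without importing Jano. The sketch below builds a metrics mapping, a directions mapping and a primary metric name. The metric functions, their cost weights and the use of plain sequences instead of ndarray are illustrative assumptions.

```python
from typing import Callable, Dict, Sequence


def business_cost(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    under_cost: float = 5.0,
    over_cost: float = 1.0,
) -> float:
    """Asymmetric cost: under-prediction is penalized more than over-prediction."""
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = predicted - actual
        total += over_cost * error if error > 0 else under_cost * -error
    return total / len(y_true)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of exact matches; higher is better."""
    hits = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return hits / len(y_true)


metrics: Dict[str, Callable[..., float]] = {
    "business_cost": business_cost,
    "accuracy": accuracy,
}
# Custom metrics default to "min" per the parameter description above,
# so a higher-is-better score needs an explicit "max".
metric_directions = {"business_cost": "min", "accuracy": "max"}
primary_metric = "business_cost"
```

These three objects line up with the metrics, metric_directions and primary_metric parameters listed above.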

metrics: Mapping[str, Callable[[ndarray, ndarray], float]] | None = None

metric_directions: Mapping[str, str] | None = None

primary_metric: str | None = None

resolve()

Return normalized metric functions, directions and primary metric.

Return type: ResolvedEvaluationProfile

class jano.evaluation.ResolvedEvaluationProfile(metrics, metric_directions, primary_metric)

Normalized metrics and metadata consumed by execution layers.

Parameters:

metrics (dict[str, Callable[[ndarray, ndarray], float]])
metric_directions (dict[str, str])
primary_metric (str | None)

metrics: dict[str, Callable[[ndarray, ndarray], float]]

metric_directions: dict[str, str]

primary_metric: str | None

class jano.evaluation.RegressionProfile(metrics=None, *, metric_directions=None, primary_metric=None)

Convenience profile for user-provided regression-style losses.

Parameters:

metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)

class jano.evaluation.ClassificationProfile(metrics=None, *, metric_directions=None, primary_metric=None)

Convenience profile for user-provided classification-style scores.

Parameters:

metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)

class jano.evaluation.OrdinalClassificationProfile(metrics, *, metric_directions=None, primary_metric=None)

Profile for ordered classes where user-defined costs usually matter.

Parameters:

metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)

class jano.evaluation.RankingProfile(metrics, *, metric_directions=None, primary_metric=None)

Profile for ranking or retrieval evaluations with custom metrics.

Parameters:

metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None)
metric_directions (Mapping[str, str] | None)
primary_metric (str | None)

class jano.planning.SimulationPlan(partition_plan, title)

High-level simulation plan with helpers for reporting and materialization.

Parameters:

partition_plan (PartitionPlan)
title (str)

partition_plan

Lower-level partition plan with fold boundaries and counts.

Type: jano.planning.PartitionPlan

title

Report title used when the plan is described or written as HTML.

Type: str

partition_plan: PartitionPlan

title: str

property total_folds: int

to_frame()

Return one row per planned fold with boundaries and row counts.

Return type: DataFrame

select_iterations(iterations)

Return a simulation plan containing only the selected iteration numbers.

Parameters: iterations (Sequence[int])
Return type: SimulationPlan

select_from_iteration(iteration)

Return a simulation plan starting at iteration.

Parameters: iteration (int)
Return type: SimulationPlan

select_until_iteration(iteration)

Return a simulation plan ending at iteration.

Parameters: iteration (int)
Return type: SimulationPlan

exclude_windows(*, train=None, validation=None, test=None)

Return a simulation plan after removing folds that overlap excluded windows.

Parameters:

train (Sequence[tuple[object, object]] | None)
validation (Sequence[tuple[object, object]] | None)
test (Sequence[tuple[object, object]] | None)

Return type:

SimulationPlan
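Removing folds that overlap excluded windows amounts to an interval-overlap filter. The sketch below shows that idea on folds stored as dictionaries of integer day offsets. drop_overlapping and overlaps are hypothetical helpers, and treating windows as half-open [start, end) intervals is an assumption, since the boundary semantics are not spelled out here.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Interval = Tuple[int, int]  # assumed half-open: [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals share at least one point."""
    return a[0] < b[1] and b[0] < a[1]


def drop_overlapping(
    folds: List[Dict[str, object]],
    *,
    train: Optional[Sequence[Interval]] = None,
    validation: Optional[Sequence[Interval]] = None,
    test: Optional[Sequence[Interval]] = None,
) -> List[Dict[str, object]]:
    """Keep only folds whose segments avoid every excluded window."""
    excluded = {"train": train or [], "validation": validation or [], "test": test or []}
    kept = []
    for fold in folds:
        hit = any(
            segment in fold and overlaps(fold[segment], window)
            for segment, windows in excluded.items()
            for window in windows
        )
        if not hit:
            kept.append(fold)
    return kept


folds = [
    {"iteration": 0, "train": (0, 10), "test": (10, 12)},
    {"iteration": 1, "train": (2, 12), "test": (12, 14)},
    {"iteration": 2, "train": (4, 14), "test": (14, 16)},
]
without_bad_test_day = drop_overlapping(folds, test=[(12, 13)])
```

Only the fold whose test segment touches day 12 is dropped; the excluded window applies per segment, so the same day appearing in another fold's train segment does not remove that fold.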

materialize()

Materialize the plan into a SimulationResult.

Return type: SimulationResult

describe()

Materialize the plan and return its structured summary.

Return type: SimulationSummary

write_html(path)

Materialize the plan and write its rendered HTML report.

Parameters: path (str | Path)
Return type: Path

class jano.types.TemporalPartitionSpec(layout, train_size, test_size=None, validation_size=None, gap_before_train=None, gap_before_validation=None, gap_before_test=None, gap_after_test=None, calendar_frequency=None)

High-level description of a temporal partition layout.

Parameters:

layout (str)
train_size (str | int | float | Timedelta)
test_size (str | int | float | Timedelta | None)
validation_size (str | int | float | Timedelta | None)
gap_before_train (str | int | float | Timedelta | None)
gap_before_validation (str | int | float | Timedelta | None)
gap_before_test (str | int | float | Timedelta | None)
gap_after_test (str | int | float | Timedelta | None)
calendar_frequency (str | None)

layout

Either train_test or train_val_test.

Type: str

train_size

Size of the train segment.

Type: str | int | float | pandas.Timedelta

test_size

Size of the test segment when present.

Type: str | int | float | pandas.Timedelta | None

validation_size

Size of the validation segment when present.

Type: str | int | float | pandas.Timedelta | None

gap_before_train

Optional gap inserted before train.

Type: str | int | float | pandas.Timedelta | None

gap_before_validation

Optional gap inserted before validation.

Type: str | int | float | pandas.Timedelta | None

gap_before_test

Optional gap inserted before test.

Type: str | int | float | pandas.Timedelta | None

gap_after_test

Optional trailing gap after test.

Type: str | int | float | pandas.Timedelta | None

calendar_frequency

Optional pandas-compatible frequency used to align duration windows to calendar boundaries. For example, "D" makes daily windows run from midnight to midnight instead of from the first observed timestamp.

Type: str | None
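The difference between calendar-aligned and observation-anchored daily windows is easy to see in a small standard-library sketch. daily_windows is a hypothetical helper that only mimics the "D" example above; it is not how Jano resolves frequencies.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def daily_windows(
    first_seen: datetime, n_windows: int, align_to_midnight: bool
) -> List[Tuple[datetime, datetime]]:
    """Build consecutive one-day windows, optionally floored to midnight."""
    start = first_seen
    if align_to_midnight:
        # Floor the anchor to the start of its calendar day.
        start = first_seen.replace(hour=0, minute=0, second=0, microsecond=0)
    one_day = timedelta(days=1)
    return [(start + i * one_day, start + (i + 1) * one_day) for i in range(n_windows)]


first_seen = datetime(2024, 1, 1, 9, 30)  # first observed timestamp
aligned = daily_windows(first_seen, 2, align_to_midnight=True)
unaligned = daily_windows(first_seen, 2, align_to_midnight=False)
```

Here the aligned windows run midnight to midnight, while the unaligned ones run 09:30 to 09:30 because they start at the first observed timestamp.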

layout: str

train_size: str | int | float | Timedelta

test_size: str | int | float | Timedelta | None = None

validation_size: str | int | float | Timedelta | None = None

gap_before_train: str | int | float | Timedelta | None = None

gap_before_validation: str | int | float | Timedelta | None = None

gap_before_test: str | int | float | Timedelta | None = None

gap_after_test: str | int | float | Timedelta | None = None

calendar_frequency: str | None = None

class jano.types.TemporalSemanticsSpec(timeline_col, order_col=None, segment_time_cols=<factory>)

Temporal semantics for ordering, reporting and segment eligibility.

Parameters:

timeline_col (str | int)
order_col (str | int | None)
segment_time_cols (Mapping[str, str | int])

timeline_col

Column used to anchor the global simulation timeline and reports.

Type: str | int

order_col

Optional column used to sort the dataset internally. Defaults to timeline_col.

Type: str | int | None

segment_time_cols

Optional per-segment timestamp mapping. Use this when a segment should be sliced by a different temporal column than the global timeline. For example, train can be filtered by arrived_at while test stays anchored on departured_at.

Type: Mapping[str, str | int]
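The arrived_at / departured_at example can be made concrete with a few in-memory rows. The sketch below is independent of Jano: rows_in_segment is a hypothetical helper, the data is invented, and falling back to the timeline column for segments missing from the mapping is an assumption about how such a mapping would typically be applied.

```python
from datetime import datetime
from typing import Dict, List

rows: List[Dict[str, object]] = [
    {"id": 1, "departured_at": datetime(2024, 1, 1, 8), "arrived_at": datetime(2024, 1, 1, 20)},
    {"id": 2, "departured_at": datetime(2024, 1, 2, 22), "arrived_at": datetime(2024, 1, 3, 6)},
    {"id": 3, "departured_at": datetime(2024, 1, 3, 9), "arrived_at": datetime(2024, 1, 3, 15)},
]

timeline_col = "departured_at"
segment_time_cols = {"train": "arrived_at"}  # test is not listed, so it uses the timeline


def rows_in_segment(segment: str, start: datetime, end: datetime) -> List[object]:
    """Return ids of rows whose segment-specific timestamp is in [start, end)."""
    column = segment_time_cols.get(segment, timeline_col)
    return [row["id"] for row in rows if start <= row[column] < end]


train_ids = rows_in_segment("train", datetime(2024, 1, 1), datetime(2024, 1, 3))
test_ids = rows_in_segment("test", datetime(2024, 1, 3), datetime(2024, 1, 4))
```

Row 2 departs inside the train window but arrives after it ends, so slicing train by arrived_at leaves it out, while test is still assigned by departured_at.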

timeline_col: str | int

order_col: str | int | None = None

segment_time_cols: Mapping[str, str | int]

property effective_order_col: str | int

Return the ordering column used by the engine.

column_for_segment(name)

Return the timestamp column used to assign rows to name.

Parameters: name (str)
Return type: str | int

class jano.types.FeatureLookbackSpec(default_lookback=None, group_lookbacks=<factory>, feature_groups=<factory>)

Lookback requirements for feature groups within the same fold.

Parameters:

default_lookback (str | int | float | Timedelta | None)
group_lookbacks (Mapping[str, str | int | float | Timedelta])
feature_groups (Mapping[str, Sequence[str | int]])

default_lookback

Optional fallback lookback applied to features that do not belong to an explicit group.

Type: str | int | float | pandas.Timedelta | None

group_lookbacks

Mapping from feature-group name to the temporal lookback needed to build that group.

Type: Mapping[str, str | int | float | pandas.Timedelta]

feature_groups

Mapping from group name to the feature columns that belong to it.

Type: Mapping[str, Sequence[str | int]]

All lookbacks must use duration-based sizes.

default_lookback: str | int | float | Timedelta | None = None

group_lookbacks: Mapping[str, str | int | float | Timedelta]

feature_groups: Mapping[str, Sequence[str | int]]

normalized_group_lookbacks()

Return validated duration lookbacks for each explicit feature group.

Return type: dict[str, SizeSpec]

normalized_default_lookback()

Return the validated duration lookback for ungrouped features.

Return type: SizeSpec | None

class jano.splitters.TemporalBacktestSplitter(time_col, partition, step, strategy='rolling', allow_partial=False, engine='auto')

Flexible temporal splitter for single or repeated temporal backtests.

Parameters:

time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position or TemporalSemanticsSpec describing the timeline, ordering column and per-segment eligibility columns.
partition (TemporalPartitionSpec) – High-level definition of the train/test or train/validation/test layout.
step – Amount by which the simulation advances after each fold. It must use the same unit family as the partition sizes.
strategy (str) – Simulation policy. Use "single" for one split, "rolling" for fixed-size windows or "expanding" for growing training history.
allow_partial (bool) – Whether to keep the last fold when the final evaluation segment would otherwise run past the end of the dataset.
engine (str) – Internal partition engine preference. Use "auto" to let Jano choose the safest native backend, or force "pandas", "polars" or "numpy".

split(X, y=None, groups=None)

Yield each fold as a plain tuple of positional index arrays.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
y – Unused placeholder for scikit-learn compatibility.
groups – Unused placeholder for scikit-learn compatibility.

Yields:

Tuples of NumPy arrays ordered by the configured segment names.

Return type:

Iterator[Tuple[ndarray, …]]

iter_splits(X, y=None, groups=None)

Yield rich TimeSplit objects for each fold in the simulation.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
y – Unused placeholder for scikit-learn compatibility.
groups – Unused placeholder for scikit-learn compatibility.

Yields:

TimeSplit instances containing segment indices, boundaries and metadata.

Return type:

Iterator[TimeSplit]

get_n_splits(X=None, y=None, groups=None)[source]

Return the number of valid folds generated for X.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
y – Unused placeholder for scikit-learn compatibility.
groups – Unused placeholder for scikit-learn compatibility.

Returns:

Total number of folds that would be produced by iter_splits.

Return type:

int

plan(X)[source]

Precompute the temporal geometry without materializing train/test slices.

Parameters:: X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
Returns:: A PartitionPlan containing fold boundaries and row counts.
Return type:: PartitionPlan

describe_simulation(X, output_path=None, title=None, output='summary')[source]

Describe a simulation over a concrete dataset.

Parameters:

X (DataFrame) – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
output_path (str | Path | None) – Optional filesystem path where the rendered HTML report should be written.
title (str | None) – Optional title used in the returned report outputs.
output (str) – Output mode. Use "summary" for SimulationSummary, "html" for a rendered HTML string or "chart_data" for plot-ready Python data.

Returns:

A SimulationSummary, raw HTML string or SimulationChartData depending on output.

Return type:

SimulationSummary | SimulationChartData | str

class jano.planning.PartitionPlan(frame, temporal_semantics, strategy, size_kind, folds, engine=None)[source]

Precomputed temporal plan that can be inspected and materialized later.

A plan contains fold boundaries and row counts before the actual train/test slices are materialized. This makes it cheap to inspect, filter or subset a simulation.

Parameters:

frame (Any)
temporal_semantics (TemporalSemanticsSpec)
strategy (str)
size_kind (str)
folds (List[PlannedFold])
engine (PartitionEngine | None)

frame

Source dataset used to compute row counts and later materialize folds.

Type:: Any

temporal_semantics

Timeline, ordering and per-segment timestamp semantics.

Type:: jano.types.TemporalSemanticsSpec

strategy

Movement strategy used to generate folds.

Type:: str

size_kind

Unit family used by the partition: "duration", "rows" or "fraction".

Type:: str

folds

Precomputed fold geometry.

Type:: List[jano.planning.PlannedFold]

engine

Internal partition engine used to materialize folds. When omitted, Jano builds one from frame for backward compatibility.

Type:: jano.engines.PartitionEngine | None

frame: Any

temporal_semantics: TemporalSemanticsSpec

strategy: str

size_kind: str

folds: List[PlannedFold]

engine: PartitionEngine | None = None

property total_folds: int

property time_col

to_frame()[source]

Return one row per planned fold with boundaries and row counts.

Return type:: DataFrame

select_iterations(iterations)[source]

Return a plan containing only the selected iteration numbers.

Parameters:: iterations (Sequence[int])
Return type:: PartitionPlan

select_from_iteration(iteration)[source]

Return a plan containing only the folds whose iteration number is greater than or equal to iteration.

Parameters:: iteration (int)
Return type:: PartitionPlan

select_until_iteration(iteration)[source]

Return a plan containing only the folds whose iteration number is less than or equal to iteration.

Parameters:: iteration (int)
Return type:: PartitionPlan

exclude_windows(*, train=None, validation=None, test=None)[source]

Return a plan without the folds whose segment boundaries overlap an excluded window.

Parameters:

train (Sequence[tuple[object, object]] | None) – Optional excluded windows applied to train segment boundaries.
validation (Sequence[tuple[object, object]] | None) – Optional excluded windows applied to validation boundaries.
test (Sequence[tuple[object, object]] | None) – Optional excluded windows applied to test segment boundaries.

Return type:

PartitionPlan

materialize()[source]

Materialize the planned fold boundaries into TimeSplit objects.

Return type:: list[TimeSplit]

iter_splits()[source]

Iterate over materialized TimeSplit objects.

Return type:: Iterator[TimeSplit]

property engine_metadata: PartitionEngineMetadata

Return the internal engine metadata used by the plan.

source_frame()[source]

Return the source data as pandas for reporting and user-facing slices.

Return type:: DataFrame

class jano.planning.PlannedFold(iteration, boundaries, counts, metadata=<factory>)[source]

Precomputed temporal geometry for one simulation iteration.

Parameters:

iteration (int)
boundaries (Dict[str, SegmentBoundaries])
counts (Dict[str, int])
metadata (Dict[str, object])

iteration

Zero-based simulation iteration.

Type:: int

boundaries

Mapping from segment name to closed-open temporal boundaries.

Type:: Dict[str, jano.types.SegmentBoundaries]

counts

Mapping from segment name to the number of rows in that segment.

Type:: Dict[str, int]

metadata

Additional planning metadata such as is_partial.

Type:: Dict[str, object]

iteration: int

boundaries: Dict[str, SegmentBoundaries]

counts: Dict[str, int]

metadata: Dict[str, object]

property fold: int

property is_partial: bool

property simulation_start: Timestamp

property simulation_end: Timestamp

to_dict()[source]

Return the fold as one serializable row for DataFrame/report output.

Return type:: dict[str, object]
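
PlannedFold boundaries are closed-open, so a timestamp that sits exactly on the border between two adjacent segments belongs to the later one only. A minimal sketch of that membership rule follows; the helpers in_segment and assign_segment are hypothetical, and plain (start, end) tuples stand in for SegmentBoundaries.

```python
from datetime import datetime


def in_segment(boundaries, timestamp):
    """Closed-open membership: start is included, end is excluded."""
    start, end = boundaries
    return start <= timestamp < end


def assign_segment(fold_boundaries, timestamp):
    """Return the name of the segment containing timestamp, or None."""
    for name, boundaries in fold_boundaries.items():
        if in_segment(boundaries, timestamp):
            return name
    return None


fold = {
    "train": (datetime(2024, 1, 1), datetime(2024, 2, 1)),
    "test": (datetime(2024, 2, 1), datetime(2024, 3, 1)),
}
on_border = assign_segment(fold, datetime(2024, 2, 1))
```

Because the train end and the test start are the same instant, the closed-open convention prevents any row from being counted in both segments.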

Retraining policies¶

class jano.runner.RetrainPolicy[source]

Base interface for deciding whether a runner should retrain.

should_retrain(context)[source]

Parameters:: context (RetrainContext)
Return type:: bool

class jano.runner.AlwaysRetrain[source]

Retrain before every fold.

should_retrain(context)[source]

Parameters:: context (RetrainContext)
Return type:: bool

class jano.runner.NeverRetrain[source]

Train once and reuse the fitted model across all folds.

should_retrain(context)[source]

Parameters:: context (RetrainContext)
Return type:: bool

class jano.runner.PeriodicRetrain(every)[source]

Retrain every N folds after the previous retrain, where N is the every argument.

Parameters:: every (int)

should_retrain(context)[source]

Parameters:: context (RetrainContext)
Return type:: bool
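
A periodic rule can be pictured as a counter of folds since the last retrain. The sketch below is an assumption-laden illustration rather than the PeriodicRetrain source: the function name periodic_retrain_flags is hypothetical, and it assumes the first fold always trains because no fitted model exists yet.

```python
def periodic_retrain_flags(n_folds, every):
    """Return one boolean per fold: True when the model is retrained."""
    if every < 1:
        raise ValueError("every must be a positive integer")
    flags = []
    folds_since_retrain = None  # None means no model has been trained yet
    for _ in range(n_folds):
        retrain = folds_since_retrain is None or folds_since_retrain >= every
        flags.append(retrain)
        # Reset the counter on retrain, otherwise count this fold.
        folds_since_retrain = 1 if retrain else folds_since_retrain + 1
    return flags


flags = periodic_retrain_flags(7, every=3)
retrained_folds = [fold for fold, flag in enumerate(flags) if flag]
```

With seven folds and every=3, this sketch retrains on folds 0, 3 and 6; with every=1 it behaves like retraining before every fold.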

class jano.runner.FunctionRetrainPolicy(rule)[source]

Delegate retraining decisions to a user-provided callable.

Parameters:: rule (Callable[[RetrainContext], bool])

should_retrain(context)[source]

Parameters:: context (RetrainContext)
Return type:: bool

class jano.runner.DriftBasedRetrain(*, metric=None, threshold=0.05, baseline='last_retrain', relative=True)[source]

Retrain when previously observed degradation crosses a threshold.

The decision for fold k is based on metrics observed through fold k-1.

Parameters:

metric (str | None)
threshold (float)
baseline (str)
relative (bool)

should_retrain(context)[source]

Parameters:: context (RetrainContext)
Return type:: bool
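
The threshold and relative parameters can be read as a degradation test against a baseline score. The following sketch shows one plausible reading and is not the DriftBasedRetrain implementation: the helper names, the lower_is_better flag and the strict greater-than comparison are assumptions. The decision uses only metrics that were already observed, matching the rule that fold k is decided from folds up to k-1.

```python
def degradation(current, baseline, *, lower_is_better=True, relative=True):
    """Positive values mean the metric got worse than the baseline."""
    delta = current - baseline if lower_is_better else baseline - current
    if relative:
        if baseline == 0:
            raise ValueError("relative degradation needs a non-zero baseline")
        return delta / abs(baseline)
    return delta


def should_retrain(observed, baseline, *, threshold=0.05,
                   relative=True, lower_is_better=True):
    """Decide for the next fold using only already observed metrics.

    observed: metric values for folds 0..k-1, in order.
    baseline: reference score, e.g. the score right after the last retrain.
    """
    if not observed:
        return False  # nothing observed yet; the initial fit is handled elsewhere
    latest = degradation(observed[-1], baseline,
                         lower_is_better=lower_is_better, relative=relative)
    return latest > threshold


stable = should_retrain([1.02, 1.04], baseline=1.0)
drifted = should_retrain([1.02, 1.10], baseline=1.0)
```

Here an RMSE that rises 4% above the baseline stays under the default 5% threshold, while a 10% rise triggers a retrain.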

Online update strategies¶

class jano.online.OnlineUpdateStrategy[source]

Base interface for updating a model during online temporal evaluation.

name = 'OnlineUpdateStrategy'

initialize(model, X_initial, y_initial)[source]

Fit the initial model state before the first prediction batch.

Parameters:

X_initial (DataFrame)
y_initial (Series)

update(model, X_batch, y_batch)[source]

Update the model after a prediction batch has been observed.

Parameters:

X_batch (DataFrame)
y_batch (Series)

class jano.online.OnlineUpdatePolicy(name, update_size, update_strategy=None, update_cost=1.0)[source]

Candidate online update policy evaluated by OnlineUpdatePolicyStudy.

Parameters:

name (str) – Stable label used in result frames.
update_size (object) – Event, row-batch, duration or fraction cadence passed to OnlineTemporalRunner.
update_strategy (OnlineUpdateStrategy | Callable[[], OnlineUpdateStrategy] | None) – Strategy instance or factory. When omitted, the runner defaults to PartialFitUpdateStrategy.
update_cost (float) – Relative cost per update. Use this to compare predictive quality against operational cost.

name: str

update_size: object

update_strategy: OnlineUpdateStrategy | Callable[[], OnlineUpdateStrategy] | None = None

update_cost: float = 1.0

build_strategy()[source]

Return a fresh update strategy for one candidate run.

Return type:: OnlineUpdateStrategy | None

class jano.online.PartialFitUpdateStrategy(classes=None)[source]

Update models that implement scikit-learn-style partial_fit.

Parameters:: classes (Sequence[object] | None) – Optional class labels passed to the first partial_fit call for classifiers that require the full label set up front.

name = 'PartialFitUpdateStrategy'

initialize(model, X_initial, y_initial)[source]

Call partial_fit on the initial train window.

Parameters:

X_initial (DataFrame)
y_initial (Series)

update(model, X_batch, y_batch)[source]

Call partial_fit after predictions have been scored.

Parameters:

X_batch (DataFrame)
y_batch (Series)

class jano.online.RefitUpdateStrategy(max_train_rows=None)[source]

Update any fit/predict estimator by refitting on observed history.

This strategy is slower than PartialFitUpdateStrategy but works with standard estimators that only implement fit. After each prediction batch is observed, the batch is appended to the internal history and the estimator is fitted again.

Parameters:: max_train_rows (int | None) – Optional rolling cap for the number of most recent observed rows used on each refit. When omitted, history expands.

name = 'RefitUpdateStrategy'

initialize(model, X_initial, y_initial)[source]

Fit the model on the initial train window.

Parameters:

X_initial (DataFrame)
y_initial (Series)

update(model, X_batch, y_batch)[source]

Append the observed batch and refit the model on retained history.

Parameters:

X_batch (DataFrame)
y_batch (Series)
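
The refit behaviour and the rolling cap can be sketched with a toy estimator. This is a simplified illustration, not the RefitUpdateStrategy source: the classes MeanModel and RollingRefit are hypothetical, and plain lists replace the DataFrame and Series inputs. The loop predicts on each batch first and only then learns from it.

```python
from collections import deque


class MeanModel:
    """Toy estimator exposing only fit and predict."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


class RollingRefit:
    """Refit on the most recent rows, capped at max_train_rows."""

    def __init__(self, max_train_rows=None):
        # deque(maxlen=None) is unbounded, so history expands without a cap.
        self._X = deque(maxlen=max_train_rows)
        self._y = deque(maxlen=max_train_rows)

    def _refit(self, model, X, y):
        self._X.extend(X)
        self._y.extend(y)
        model.fit(list(self._X), list(self._y))

    def initialize(self, model, X_initial, y_initial):
        self._refit(model, X_initial, y_initial)

    def update(self, model, X_batch, y_batch):
        self._refit(model, X_batch, y_batch)


model = MeanModel()
strategy = RollingRefit(max_train_rows=4)
strategy.initialize(model, [[0], [1], [2], [3]], [1.0, 1.0, 1.0, 1.0])

batches = [([[4], [5]], [3.0, 3.0]), ([[6], [7]], [5.0, 5.0])]
predictions = []
for X_batch, y_batch in batches:
    predictions.extend(model.predict(X_batch))  # score the batch first
    strategy.update(model, X_batch, y_batch)    # then learn from it
```

With a cap of four rows, each update pushes the two oldest rows out of the history, so the fitted mean tracks the most recent observations.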

Temporal policies¶

class jano.policies.TrainGrowthPolicy(time_col, *, cutoff, train_sizes, test_size, gap_before_test=None)[source]

Evaluate whether adding more training history improves a fixed test slice.

This policy keeps the test window fixed and grows the train window backward in time. It is useful when you want to understand how much historical data is actually needed to match the best achievable test performance.

Parameters:

time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
cutoff – Boundary where candidate train windows end and the fixed test horizon begins after any configured gap.
train_sizes (Sequence[object]) – Candidate duration windows evaluated backward from cutoff.
test_size – Duration of the fixed test window.
gap_before_test – Optional duration gap between train end and test start.

evaluate(X, *, model, target_col, feature_cols=None, metrics=None)[source]

Run the fixed-test evaluation over all configured train sizes.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (str | int) – Target column name or position.
feature_cols (Sequence[str | int] | None) – Optional feature column names or positions. If omitted, all non-temporal, non-target columns are used.
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-provided callables.

Returns:

A TrainGrowthResult containing one metric row per candidate train size.

Return type:

TrainGrowthResult

find_optimal_train_size(X, *, model, target_col, feature_cols=None, metrics=None, metric='rmse', tolerance=0.0, relative=True)[source]

Return the smallest train size that stays within tolerance of the best score.

Parameters:

X – Input dataset.
model – Estimator with fit and predict methods.
target_col (str | int) – Target column name or position.
feature_cols (Sequence[str | int] | None) – Optional feature column names or positions.
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-provided callables.
metric (str) – Metric column used to choose the optimal train size.
tolerance (float) – Allowed distance from the best score.
relative (bool) – Whether tolerance is proportional instead of absolute.

Return type:

dict[str, object]

class jano.policies.TrainGrowthResult(records, metric_directions)[source]

Evaluated records for a fixed-test, growing-train temporal hypothesis.

Parameters:

records (DataFrame)
metric_directions (dict[str, str])

records

DataFrame with one row per candidate train window and metric columns.

Type:: pandas.DataFrame

metric_directions

Mapping from metric name to optimization direction: "min" for lower-is-better metrics and "max" for higher-is-better metrics.

Type:: dict[str, str]

records: DataFrame

metric_directions: dict[str, str]

to_frame()[source]

Return the evaluated train variants as a pandas DataFrame.

Return type:: DataFrame

find_optimal_train_size(metric='rmse', tolerance=0.0, relative=True)[source]

Return the smallest train window whose score is within tolerance of the best.

Parameters:

metric (str) – Metric column used to compare train variants.
tolerance (float) – Allowed distance from the best score.
relative (bool) – Whether tolerance is interpreted proportionally instead of absolutely.

Return type:

dict[str, object]
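
The selection rule, the smallest train window whose score stays within tolerance of the best, can be sketched over a list of records. The function name, the train_days key and the direction argument below are assumptions for illustration; the actual column names in TrainGrowthResult.records are not documented here.

```python
def smallest_size_within_tolerance(records, metric, *, size_key="train_days",
                                   tolerance=0.0, relative=True,
                                   direction="min"):
    """Return the record with the smallest size whose score is near the best."""
    if direction not in {"min", "max"}:
        raise ValueError("direction must be 'min' or 'max'")
    scores = [row[metric] for row in records]
    best = min(scores) if direction == "min" else max(scores)
    # Relative tolerance is a share of the best score; absolute is used as-is.
    slack = tolerance * abs(best) if relative else tolerance

    def acceptable(score):
        if direction == "min":
            return score <= best + slack
        return score >= best - slack

    candidates = [row for row in records if acceptable(row[metric])]
    return min(candidates, key=lambda row: row[size_key])


# Made-up scores for illustration only.
records = [
    {"train_days": 7, "rmse": 1.20},
    {"train_days": 14, "rmse": 1.02},
    {"train_days": 28, "rmse": 1.00},
    {"train_days": 56, "rmse": 1.01},
]
strict = smallest_size_within_tolerance(records, "rmse")
relaxed = smallest_size_within_tolerance(records, "rmse", tolerance=0.05)
```

With zero tolerance the 28-day window wins because it has the best score; allowing 5% slack selects the 14-day window, which is half the history at nearly the same error.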

class jano.policies.PerformanceDecayPolicy(time_col, *, cutoff, train_size, test_size, step, gap_before_test=None, max_windows=None)[source]

Evaluate how long a fixed train window stays useful as test moves forward.

This policy keeps train fixed and repeatedly shifts the test window into the future. It is useful when you want to estimate when performance decay or drift becomes operationally relevant without retraining the model at every step.

Parameters:

time_col (str | int | TemporalSemanticsSpec) – Timeline column name, column position, or TemporalSemanticsSpec.
cutoff – Boundary where the fixed train window ends.
train_size – Duration of the fixed train window looking backward from cutoff.
test_size – Duration of each test window.
step – Duration by which the test window advances.
gap_before_test – Optional duration gap between train end and first test start.
max_windows (int | None) – Optional maximum number of test windows to evaluate.

evaluate(X, *, model, target_col, feature_cols=None, metrics=None)[source]

Run the fixed-train evaluation over moving test windows.

Parameters:

X – Input dataset as pandas.DataFrame, numpy.ndarray or polars.DataFrame.
model – Estimator with fit and predict methods.
target_col (str | int) – Target column name or position.
feature_cols (Sequence[str | int] | None) – Optional feature column names or positions. If omitted, all non-temporal, non-target columns are used.
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-provided callables.

Returns:

A PerformanceDecayResult containing one metric row per test window.

Return type:

PerformanceDecayResult

find_drift_onset(X, *, model, target_col, feature_cols=None, metrics=None, metric='rmse', threshold=0.1, baseline='first', relative=True)[source]

Return the first test window whose metric crosses the degradation threshold.

Parameters:

X – Input dataset.
model – Estimator with fit and predict methods.
target_col (str | int) – Target column name or position.
feature_cols (Sequence[str | int] | None) – Optional feature column names or positions.
metrics (Mapping[str, Callable[[ndarray, ndarray], float]] | None) – Mapping of metric names to user-provided callables.
metric (str) – Metric column used to detect degradation.
threshold (float) – Allowed degradation before a window is flagged.
baseline (str | float) – "first", "best" or an explicit numeric baseline.
relative (bool) – Whether threshold is proportional instead of absolute.

Return type:

dict[str, object] | None

class jano.policies.PerformanceDecayResult(records, metric_directions)[source]

Evaluated records for a fixed-train, moving-test temporal hypothesis.

Parameters:

records (DataFrame)
metric_directions (dict[str, str])

records

DataFrame with one row per moving test window and metric columns.

Type:: pandas.DataFrame

metric_directions

Mapping from metric name to optimization direction.

Type:: dict[str, str]

records: DataFrame

metric_directions: dict[str, str]

to_frame()[source]

Return the evaluated test windows as a pandas DataFrame.

Return type:: DataFrame

find_drift_onset(metric='rmse', threshold=0.1, baseline='first', relative=True)[source]

Return the first evaluation window where performance becomes problematic.

Parameters:

metric (str) – Metric column used to detect degradation.
threshold (float) – Allowed degradation from the baseline before the window is flagged.
baseline (str | float) – "first", "best" or an explicit numeric baseline.
relative (bool) – Whether threshold is interpreted proportionally instead of absolutely.

Return type:

dict[str, object] | None
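
Drift onset detection amounts to scanning the test windows in order and returning the first one whose degradation exceeds the threshold. The sketch below is one plausible reading, not the library's code: the function name first_degraded_window, the returned keys, the lower_is_better flag and the choice to compute "best" over all windows are assumptions.

```python
def first_degraded_window(scores, *, threshold=0.1, baseline="first",
                          relative=True, lower_is_better=True):
    """Return details of the first window past the threshold, or None."""
    if not scores:
        return None
    if baseline == "first":
        reference = scores[0]
    elif baseline == "best":
        reference = min(scores) if lower_is_better else max(scores)
    else:
        reference = float(baseline)  # explicit numeric baseline
    if relative and reference == 0:
        raise ValueError("relative threshold needs a non-zero baseline")

    for index, score in enumerate(scores):
        delta = score - reference if lower_is_better else reference - score
        if relative:
            delta /= abs(reference)
        if delta > threshold:
            return {"window": index, "score": score, "baseline": reference}
    return None


# Made-up RMSE values for four consecutive test windows.
rmse_by_window = [1.0, 1.05, 1.08, 1.2]
onset = first_degraded_window(rmse_by_window)
never = first_degraded_window(rmse_by_window, threshold=0.5)
```

Against the first window, degradation reaches 5%, 8% and then 20%, so the default 10% threshold flags the fourth window. When no window crosses the threshold the result is None, which mirrors the documented optional return type.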

Fold objects¶

class jano.splits.TimeSplit(fold, segments, boundaries, metadata=<factory>)[source]

A single temporal partition with named segments and metadata.

Parameters:

fold (int)
segments (Dict[str, ndarray])
boundaries (Dict[str, SegmentBoundaries])
metadata (Dict[str, object])

fold

Zero-based fold number.

Type:: int

segments

Mapping from segment name to positional NumPy indices.

Type:: Dict[str, numpy.ndarray]

boundaries

Mapping from segment name to temporal boundaries.

Type:: Dict[str, jano.types.SegmentBoundaries]

metadata

Additional metadata such as strategy or size kind.

Type:: Dict[str, object]

fold: int

segments: Dict[str, ndarray]

boundaries: Dict[str, SegmentBoundaries]

metadata: Dict[str, object]

slice(X)[source]

Slice a DataFrame into segment-specific DataFrames.

Parameters:: X (DataFrame)
Return type:: Dict[str, DataFrame]

slice_xy(X, y)[source]

Slice features and target into segment-specific objects.

Parameters:

X (DataFrame)
y (Series)

Return type:

Dict[str, DataFrame | Series]

summary()[source]

Return a serializable summary of the fold and its segments.

Return type:: Dict[str, object]

feature_history_bounds(lookbacks, *, segment_name='train')[source]

Return per-group historical windows needed to build feature groups.

The returned windows end at the start of segment_name and extend backward according to each configured feature-group lookback.

Parameters:

lookbacks (FeatureLookbackSpec)
segment_name (str)

Return type:

Dict[str, SegmentBoundaries]
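
Each history window ends where the chosen segment starts and reaches back by that group's lookback. A minimal sketch with standard-library types follows; the helpers history_bounds and slice_history, the ts key and the group names are hypothetical, and the windows are assumed to be closed-open.

```python
from datetime import datetime, timedelta


def history_bounds(segment_start, group_lookbacks):
    """Map each feature group to a (start, end) window ending at segment_start."""
    return {
        group: (segment_start - lookback, segment_start)
        for group, lookback in group_lookbacks.items()
    }


def slice_history(rows, bounds, time_key="ts"):
    """Select, per group, the rows whose timestamp falls in its window."""
    return {
        group: [row for row in rows if start <= row[time_key] < end]
        for group, (start, end) in bounds.items()
    }


train_start = datetime(2024, 3, 1)
bounds = history_bounds(train_start, {
    "short_term": timedelta(days=7),
    "long_term": timedelta(days=30),
})
rows = [
    {"ts": datetime(2024, 1, 15), "value": 1},
    {"ts": datetime(2024, 2, 10), "value": 2},
    {"ts": datetime(2024, 2, 25), "value": 3},
    {"ts": datetime(2024, 3, 1), "value": 4},  # belongs to train, not history
]
history = slice_history(rows, bounds)
```

The fold itself is unchanged; only the amount of past context differs per group, so a 7-day feature group sees one row here while the 30-day group sees two.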

slice_feature_history(X, lookbacks, *, time_col, segment_name='train')[source]

Slice historical context windows needed by feature groups.

This helper is useful when the fold itself is fixed, but different feature groups need different amounts of past data to be engineered.

Parameters:

X (DataFrame)
lookbacks (FeatureLookbackSpec)
time_col (str)
segment_name (str)

Return type:

Dict[str, DataFrame]

Reporting objects¶

class jano.reporting.SimulationSummary(title, time_col, dataset_start, dataset_end, total_rows, total_folds, strategy, size_kind, folds, segment_order, chart_data, html)[source]

Structured description of a temporal simulation over a dataset.

Parameters:

title (str)
time_col (str)
dataset_start (Timestamp)
dataset_end (Timestamp)
total_rows (int)
total_folds (int)
strategy (str)
size_kind (str)
folds (List[Dict[str, object]])
segment_order (List[str])
chart_data (SimulationChartData)
html (str)

title

Report title.

Type:: str

time_col

Name of the timestamp column used in the dataset.

Type:: str

dataset_start

Earliest timestamp present in the dataset.

Type:: pandas.Timestamp

dataset_end

Latest timestamp present in the dataset.

Type:: pandas.Timestamp

total_rows

Number of rows in the source dataset.

Type:: int

total_folds

Number of simulated folds.

Type:: int

strategy

Split strategy used to build the simulation.

Type:: str

size_kind

Unit family used by the partition sizes.

Type:: str

folds

Fold-by-fold segment metadata.

Type:: List[Dict[str, object]]

segment_order

Ordered list of segment names.

Type:: List[str]

chart_data

Plot-ready representation of the same simulation.

Type:: jano.reporting.SimulationChartData

html

Rendered HTML report.

Type:: str

title: str

time_col: str

dataset_start: Timestamp

dataset_end: Timestamp

total_rows: int

total_folds: int

strategy: str

size_kind: str

folds: List[Dict[str, object]]

segment_order: List[str]

chart_data: SimulationChartData

html: str

to_dict()[source]

Return a serializable dictionary representation.

Return type:: Dict[str, object]

to_frame()[source]

Convert fold summaries into a tabular pandas DataFrame.

Return type:: DataFrame

write_html(path)[source]

Write the rendered HTML report to path.

Parameters:: path (str | Path)
Return type:: Path

class jano.reporting.SimulationChartData(title, time_col, dataset_start, dataset_end, total_rows, total_folds, strategy, size_kind, segment_order, segment_colors, segment_stats, folds)[source]

Plot-ready description of a temporal simulation timeline.

Parameters:

title (str)
time_col (str)
dataset_start (Timestamp)
dataset_end (Timestamp)
total_rows (int)
total_folds (int)
strategy (str)
size_kind (str)
segment_order (List[str])
segment_colors (Dict[str, str])
segment_stats (Dict[str, Dict[str, object]])
folds (List[Dict[str, object]])

title

Report title.

Type:: str

time_col

Name of the timestamp column used in the dataset.

Type:: str

dataset_start

Earliest timestamp present in the dataset.

Type:: pandas.Timestamp

dataset_end

Latest timestamp present in the dataset.

Type:: pandas.Timestamp

total_rows

Number of rows in the source dataset.

Type:: int

total_folds

Number of simulated folds.

Type:: int

strategy

Split strategy used to build the simulation.

Type:: str

size_kind

Unit family used by the partition sizes.

Type:: str

segment_order

Ordered list of segment names.

Type:: List[str]

segment_colors

Color associated with each segment.

Type:: Dict[str, str]

segment_stats

Aggregate per-segment row statistics across folds.

Type:: Dict[str, Dict[str, object]]

folds

Fold-level timeline payload ready for plotting.

Type:: List[Dict[str, object]]

title: str

time_col: str

dataset_start: Timestamp

dataset_end: Timestamp

total_rows: int

total_folds: int

strategy: str

size_kind: str

segment_order: List[str]

segment_colors: Dict[str, str]

segment_stats: Dict[str, Dict[str, object]]

folds: List[Dict[str, object]]

to_dict()[source]

Return a serializable dictionary representation.

Return type:: Dict[str, object]

MCP helper functions¶

These functions back the optional local MCP server and are useful for understanding the exact tool contract exposed to AI clients.

jano.mcp_tools.preview_dataset(dataset_path, *, dataset_format='auto', sample_rows=5)[source]

Return a compact preview of a local dataset for an MCP client.

Parameters:

dataset_path (str) – Local dataset path.
dataset_format (str) – Explicit format or "auto".
sample_rows (int) – Number of rows to include in the preview.

Returns:

JSON-ready dictionary with dataset path, column names and preview rows.

Return type:

dict[str, Any]

jano.mcp_tools.plan_walk_forward(dataset_path, *, partition, step, time_col=None, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, title=None, preview_rows=20)[source]

Return a precomputed walk-forward plan as JSON-ready data.

Parameters:

dataset_path (str) – Local dataset path.
partition (dict[str, Any]) – Dictionary accepted by TemporalPartitionSpec.
step (object) – Step size used by the walk-forward simulation.
time_col (str | int | None) – Timeline column name or position.
strategy (str) – Movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep a final partial fold.
engine (str) – Internal partition engine preference: "auto", "pandas", "polars" or "numpy".
start_at (object | None) – Optional lower timestamp bound.
end_at (object | None) – Optional upper timestamp bound.
max_folds (int | None) – Optional maximum number of folds.
dataset_format (str) – Explicit format or "auto".
order_col (str | int | None) – Optional column used to sort the dataset.
train_time_col (str | int | None) – Optional timestamp column used to assign train rows.
validation_time_col (str | int | None) – Optional timestamp column used to assign validation rows.
test_time_col (str | int | None) – Optional timestamp column used to assign test rows.
title (str | None) – Optional report title.
preview_rows (int) – Number of planned folds to include in the returned preview.

Returns:

JSON-ready dictionary with fold count, plan columns and preview rows.

Return type:

dict[str, Any]

jano.mcp_tools.run_walk_forward(dataset_path, *, partition, step, time_col=None, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, title=None, preview_rows=20)[source]

Run a walk-forward simulation and return a compact summary.

Parameters:

dataset_path (str) – Local dataset path.
partition (dict[str, Any]) – Dictionary accepted by TemporalPartitionSpec.
step (object) – Step size used by the walk-forward simulation.
time_col (str | int | None) – Timeline column name or position.
strategy (str) – Movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep a final partial fold.
engine (str) – Internal partition engine preference: "auto", "pandas", "polars" or "numpy".
start_at (object | None) – Optional lower timestamp bound.
end_at (object | None) – Optional upper timestamp bound.
max_folds (int | None) – Optional maximum number of folds.
dataset_format (str) – Explicit format or "auto".
order_col (str | int | None) – Optional column used to sort the dataset.
train_time_col (str | int | None) – Optional timestamp column used to assign train rows.
validation_time_col (str | int | None) – Optional timestamp column used to assign validation rows.
test_time_col (str | int | None) – Optional timestamp column used to assign test rows.
title (str | None) – Optional report title.
preview_rows (int) – Number of summary rows to include in the response.

Returns:

JSON-ready dictionary with fold summary, chart data and rendered HTML.

Return type:

dict[str, Any]

jano.mcp_tools.run_walk_forward_baseline(dataset_path, *, partition, step, time_col=None, target_col=None, feature_cols=None, model='mean', metrics=None, metric_directions=None, retrain='always', retrain_interval=None, drift_metric='rmse', drift_threshold=0.05, drift_baseline='last_retrain', drift_relative=True, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, include_predictions=False, preview_rows=20, prediction_preview_rows=20)[source]

Run a built-in baseline estimator over temporal folds for MCP clients.

Parameters:

dataset_path (str) – Local dataset path.
partition (dict[str, Any]) – Dictionary accepted by TemporalPartitionSpec.
step (object) – Step size used by the walk-forward simulation.
time_col (str | int | None) – Timeline column name or position.
target_col (str | int | None) – Target column name or position.
feature_cols (list[str | int] | None) – Optional feature columns. Baseline models do not require features, but the argument keeps the MCP surface aligned with WalkForwardRunner.
model (str) – Baseline estimator: "mean" for numeric regression or "majority_class" for classification.
metrics (dict[str, Any] | None) – Mapping of metric names to user-provided callables.
metric_directions (dict[str, str] | None) – Optional mapping declaring "min" or "max" per metric.
retrain (bool | str) – Retraining policy: "always", "never", "periodic", "on_drift", True or False.
retrain_interval (int | None) – Fold interval required by retrain="periodic".
drift_metric (str) – Metric monitored when retrain="on_drift".
drift_threshold (float) – Degradation threshold for drift-based retraining.
drift_baseline (str) – Drift baseline: "last_retrain", "first", "best" or "previous_fold".
drift_relative (bool) – Whether drift threshold is relative or absolute.
strategy (str) – Movement strategy: "single", "rolling" or "expanding".
allow_partial (bool) – Whether to keep a final partial fold.
engine (str) – Internal partition engine preference.
start_at (object | None) – Optional lower timestamp bound.
end_at (object | None) – Optional upper timestamp bound.
max_folds (int | None) – Optional maximum number of folds.
dataset_format (str) – Explicit format or "auto".
order_col (str | int | None) – Optional column used to sort the dataset.
train_time_col (str | int | None) – Optional timestamp column used to assign train rows.
validation_time_col (str | int | None) – Optional timestamp column used to assign validation rows.
test_time_col (str | int | None) – Optional timestamp column used to assign test rows.
include_predictions (bool) – Whether to include a bounded prediction preview.
preview_rows (int) – Number of fold/metric rows returned in previews.
prediction_preview_rows (int) – Number of predictions returned when requested.

Returns:

JSON-ready dictionary with runner summary, fold preview, metric trajectory preview, retraining events and optional prediction preview.

Return type:

dict[str, Any]

jano.mcp_tools.compare_retrain_policies(dataset_path, *, partition, step, time_col=None, target_col=None, feature_cols=None, model='mean', metrics=None, policies=None, strategy='rolling', allow_partial=False, engine='auto', start_at=None, end_at=None, max_folds=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, preview_rows=20)[source]

Compare retraining policies over the same walk-forward fold geometry.

The MCP surface intentionally uses built-in baseline models instead of accepting arbitrary Python estimators. Use the Python API for custom models.

Parameters:

dataset_path (str)
partition (dict[str, Any])
step (object)
time_col (str | int | None)
target_col (str | int | None)
feature_cols (list[str | int] | None)
model (str)
metrics (dict[str, Any] | None)
policies (list[dict[str, Any]] | None)
strategy (str)
allow_partial (bool)
engine (str)
start_at (object | None)
end_at (object | None)
max_folds (int | None)
dataset_format (str)
order_col (str | int | None)
train_time_col (str | int | None)
validation_time_col (str | int | None)
test_time_col (str | int | None)
preview_rows (int)

Return type:

dict[str, Any]

jano.mcp_tools.find_train_history_window(dataset_path, *, time_col=None, cutoff=None, train_sizes=None, test_size=None, target_col=None, feature_cols=None, model='mean', metrics=None, metric='rmse', tolerance=0.0, relative=True, gap_before_test=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, preview_rows=20)[source]

Evaluate train-history candidates against one fixed test window.

Parameters:

dataset_path (str)
time_col (str | int | None)
cutoff (object | None)
train_sizes (list[object] | None)
test_size (object | None)
target_col (str | int | None)
feature_cols (list[str | int] | None)
model (str)
metrics (dict[str, Any] | None)
metric (str)
tolerance (float)
relative (bool)
gap_before_test (object | None)
dataset_format (str)
order_col (str | int | None)
train_time_col (str | int | None)
validation_time_col (str | int | None)
test_time_col (str | int | None)
preview_rows (int)

Return type:

dict[str, Any]

jano.mcp_tools.monitor_decay(dataset_path, *, time_col=None, cutoff=None, train_size=None, test_size=None, step=None, target_col=None, feature_cols=None, model='mean', metrics=None, metric='rmse', threshold=0.1, baseline='first', relative=True, gap_before_test=None, max_windows=None, dataset_format='auto', order_col=None, train_time_col=None, validation_time_col=None, test_time_col=None, preview_rows=20)[source]

Detect when the performance of a model trained on a fixed window decays past a threshold.

Parameters:

dataset_path (str)
time_col (str | int | None)
cutoff (object | None)
train_size (object | None)
test_size (object | None)
step (object | None)
target_col (str | int | None)
feature_cols (list[str | int] | None)
model (str)
metrics (dict[str, Any] | None)
metric (str)
threshold (float)
baseline (str | float)
relative (bool)
gap_before_test (object | None)
max_windows (int | None)
dataset_format (str)
order_col (str | int | None)
train_time_col (str | int | None)
validation_time_col (str | int | None)
test_time_col (str | int | None)
preview_rows (int)

Return type:

dict[str, Any]
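
One plausible reading of threshold, baseline and relative is: compare each later test window against a reference score and report the first window that is worse by more than the threshold. The sketch below shows that reading in plain Python. It is an assumption, not Jano's implementation: first_decay_window is a hypothetical helper, the metric is taken to be lower-is-better, and the scores are illustrative.

```python
from typing import Optional, Sequence, Union


def first_decay_window(
    scores: Sequence[float],
    threshold: float = 0.1,
    baseline: Union[str, float] = "first",
    relative: bool = True,
) -> Optional[int]:
    """Return the index of the first window whose error exceeds the allowed decay.

    Hypothetical helper for a lower-is-better metric such as RMSE.
    """
    if not scores:
        return None
    # "first" uses the earliest window as the reference; a number is used as given.
    reference = scores[0] if baseline == "first" else float(baseline)
    limit = reference * (1.0 + threshold) if relative else reference + threshold
    for index, score in enumerate(scores):
        if score > limit:
            return index
    return None  # the threshold was never crossed


# Illustrative RMSE per consecutive test window for one fixed training window.
window_rmse = [1.00, 1.03, 1.08, 1.15, 1.30]
crossed_at = first_decay_window(window_rmse, threshold=0.1)
```

Returning None when no window crosses the limit keeps "no decay observed" distinct from "decay at window 0".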

Type and validation helpers

class jano.engines.PartitionEngineMetadata(engine, input_backend, converted)

Execution metadata for the internal partition engine.

Parameters:

engine (str)
input_backend (str)
converted (bool)

engine

Internal engine selected to compute temporal boundaries and indices.

Type: str

input_backend

Backend detected from the user-provided input.

Type: str

converted

Whether the full dataset was converted before planning.

Type: bool

engine: str

input_backend: str

converted: bool

to_dict()

Return metadata as a serializable dictionary.

Return type: dict[str, object]
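
A record with these three fields and a to_dict() method can be modeled as a small dataclass. The sketch below uses a hypothetical stand-in, EngineMetadataSketch, rather than the real jano.engines.PartitionEngineMetadata, whose internals are not shown here. It only illustrates why a flat dictionary of strings and booleans is convenient for logging or JSON reports.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EngineMetadataSketch:
    """Hypothetical stand-in with the same three documented fields."""

    engine: str         # engine that computed boundaries and indices
    input_backend: str  # backend detected from the user's input
    converted: bool     # whether the full dataset was converted first

    def to_dict(self) -> dict:
        # asdict() yields plain built-in types, so the result is JSON-safe.
        return asdict(self)


meta = EngineMetadataSketch(engine="pandas", input_backend="polars", converted=True)
payload = json.dumps(meta.to_dict(), sort_keys=True)
```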

class jano.types.SizeSpec(value, kind)

Normalized specification for segment sizes.

Parameters:

value (Timedelta | int | float)
kind (str)

value

Parsed size value as Timedelta, integer row count or fraction.

Type: pandas.Timedelta | int | float

kind

Unit family for the value: duration, rows or fraction.

Type: str

value: Timedelta | int | float

kind: str

classmethod from_value(value)

Normalize a raw size value into a typed SizeSpec.

Parameters: value (str | int | float | Timedelta)
Return type: SizeSpec
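
The three unit families (duration, rows, fraction) can be told apart from the Python type of the raw value. The sketch below shows one way such a normalization might look, using only the standard library. SizeSketch is a hypothetical stand-in, not jano.types.SizeSpec. The mapping of strings to durations, the two supported suffixes, and the range checks are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Union

# Only two suffixes are handled here; a real parser would accept many more.
_UNIT_KEYWORDS = {"D": "days", "h": "hours"}


@dataclass(frozen=True)
class SizeSketch:
    """Hypothetical stand-in for a normalized size; not jano.types.SizeSpec."""

    value: Union[timedelta, int, float]
    kind: str  # "duration", "rows" or "fraction"

    @classmethod
    def from_value(cls, value: Union[str, int, float, timedelta]) -> "SizeSketch":
        if isinstance(value, bool):  # bool is an int subclass; reject it early
            raise TypeError("bool is not a valid size")
        if isinstance(value, timedelta):
            return cls(value, "duration")
        if isinstance(value, str):
            amount, unit = value[:-1], value[-1:]
            if unit not in _UNIT_KEYWORDS or not amount.isdigit():
                raise ValueError(f"unsupported duration string: {value!r}")
            return cls(timedelta(**{_UNIT_KEYWORDS[unit]: int(amount)}), "duration")
        if isinstance(value, int):
            if value <= 0:
                raise ValueError("row counts must be positive")
            return cls(value, "rows")
        if isinstance(value, float):
            if not 0.0 < value < 1.0:
                raise ValueError("fractions must lie strictly between 0 and 1")
            return cls(value, "fraction")
        raise TypeError(f"unsupported size type: {type(value).__name__}")


sizes = [SizeSketch.from_value(raw) for raw in ("30D", 500, 0.2)]
kinds = [size.kind for size in sizes]
```

Carrying the kind next to the value makes it cheap to check that, for example, a step and the partition sizes belong to the same unit family before any fold is planned.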

class jano.types.SegmentBoundaries(start, end)

Closed-open boundaries for a named temporal segment.

Parameters:

start (Timestamp)
end (Timestamp)

start: Timestamp

end: Timestamp
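
Closed-open means a timestamp belongs to a segment when start <= timestamp < end, so a row that falls exactly on a shared boundary lands in the later segment only. The sketch below demonstrates this with standard-library datetimes. in_segment and the dates are illustrative and not part of Jano.

```python
from datetime import datetime


def in_segment(ts: datetime, start: datetime, end: datetime) -> bool:
    """Closed-open membership test: start is included, end is excluded."""
    return start <= ts < end


# Two adjacent segments that share the boundary 2024-02-01.
train = (datetime(2024, 1, 1), datetime(2024, 2, 1))
test = (datetime(2024, 2, 1), datetime(2024, 3, 1))

boundary = datetime(2024, 2, 1)
in_train = in_segment(boundary, *train)  # the end bound is excluded
in_test = in_segment(boundary, *test)    # the start bound is included
```

Because each boundary row is assigned to exactly one segment, adjacent train and test segments never overlap.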

class jano.validation.ValidatedPartitionSpec(layout, segments, gaps, tail_gap, size_kind, calendar_frequency=None)

Partition specification after normalization.

Parameters:

layout (str)
segments (Dict[str, SizeSpec])
gaps (Dict[str, SizeSpec])
tail_gap (SizeSpec | None)
size_kind (str)
calendar_frequency (str | None)

layout: str

segments: Dict[str, SizeSpec]

gaps: Dict[str, SizeSpec]

tail_gap: SizeSpec | None

size_kind: str

calendar_frequency: str | None = None

jano.validation.validate_strategy(strategy)

Validate and normalize a split strategy name.

Parameters: strategy (str)
Return type: str
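
As a rough illustration of "validate and normalize", the sketch below checks a name against the three strategies documented for WalkForwardPolicy ("single", "rolling", "expanding"). The normalization step (trimming whitespace and lowercasing) and the error types are assumptions. validate_strategy_sketch is a hypothetical function, not the library's own.

```python
VALID_STRATEGIES = ("single", "rolling", "expanding")


def validate_strategy_sketch(strategy: str) -> str:
    """Return a canonical strategy name or raise if it is not recognized."""
    if not isinstance(strategy, str):
        raise TypeError("strategy must be a string")
    # Assumed normalization: ignore surrounding whitespace and letter case.
    normalized = strategy.strip().lower()
    if normalized not in VALID_STRATEGIES:
        raise ValueError(
            f"unknown strategy {strategy!r}; expected one of {VALID_STRATEGIES}"
        )
    return normalized


normalized = validate_strategy_sketch(" Rolling ")
```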

jano.validation.validate_partition_spec(partition)

Validate a high-level partition spec and normalize its sizes.

Parameters: partition (TemporalPartitionSpec)
Return type: ValidatedPartitionSpec
