Temporal Discretization Clocks

Jano always partitions temporal processes. What changes is the clock used to advance the evaluation:

Calendar clock: hours, days, weeks or months.
Row/event clock: every event or every N observed rows.
Micro-batch clock: each observed batch in an online stream.
Business clock: a user-defined trigger that marks retraining checkpoints.

All clocks are causal: data observed later must not influence decisions that would have been made earlier. Event-based updates are therefore not outside time; they are another way to discretize the same timeline. When an event triggers retraining, Jano records the associated timestamp, turning accumulated evidence into an auditable temporal checkpoint.
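The causal rule can be made concrete with a small guard. This is a plain-Python sketch, not Jano code: `causal_slice` and `assert_causal` are hypothetical helpers, rows are assumed to be dicts with a `timestamp` key, and a row stamped exactly at the checkpoint is assumed to be not yet observed.

```python
from datetime import datetime, timedelta


def causal_slice(rows, decision_time):
    """Return only the rows observed strictly before decision_time.

    Assumption: a row stamped exactly at decision_time counts as not yet observed.
    """
    return [row for row in rows if row["timestamp"] < decision_time]


def assert_causal(train_rows, decision_time):
    """Raise if any training row was observed at or after the decision time."""
    leaked = [r for r in train_rows if r["timestamp"] >= decision_time]
    if leaked:
        raise ValueError(f"{len(leaked)} rows observed at or after {decision_time}")


# Hypothetical stream: one row per hour for two days.
start = datetime(2024, 1, 1)
rows = [{"timestamp": start + timedelta(hours=h), "target": float(h)} for h in range(48)]

# An event triggers retraining at noon on day 2: train only on what came before it.
checkpoint = datetime(2024, 1, 2, 12)
train = causal_slice(rows, checkpoint)
assert_causal(train, checkpoint)
print(len(train), train[-1]["timestamp"])  # 36 2024-01-02 11:00:00
```

Passing the full `rows` list to `assert_causal` with the same checkpoint raises `ValueError`, which is the kind of leak a causal clock is meant to rule out.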

Calendar-Driven Partitioning

Calendar-driven partitioning is the default mode for backtesting tabular ML systems. It answers questions such as:

What would performance have looked like if the model had been retrained every day?
How much history should the train window contain?
How does a fixed model decay across future time windows?

Use TemporalBacktestSplitter directly when you want manual control over the fold loop:

from jano import TemporalBacktestSplitter

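splitter = TemporalBacktestSplitter(
    time_col="timestamp",
    train_size="30D",    # each fold trains on 30 days of history
    test_size="7D",      # and is evaluated on the following 7 days
    step="7D",           # the fold window then advances by 7 days
    strategy="rolling",  # fixed-length train window that moves with each step
)

# frame: a DataFrame with a "timestamp" column; split() yields positional indices
for train_idx, test_idx in splitter.split(frame):
    train = frame.iloc[train_idx]
    test = frame.iloc[test_idx]
    # fit the model on train, then predict and score on test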
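The fold layout that these parameters describe can be sketched in plain Python. This illustrates a rolling window, not Jano's implementation: `rolling_folds` is a hypothetical helper, the half-open intervals and the choice to drop a final partial test window are assumptions, and the `expanding` flag only shows the usual contrast with an expanding train window.

```python
from datetime import date, timedelta


def rolling_folds(start, end, train_size, test_size, step, expanding=False):
    """Yield (train_start, train_end, test_start, test_end) as half-open intervals.

    Hypothetical sketch: each fold trains on [train_start, train_end) and tests on
    the window that follows immediately. Folds whose test window would run past
    `end` are dropped. With expanding=True the train window keeps its first start.
    """
    train_start = start
    train_end = start + train_size
    while train_end + test_size <= end:
        yield train_start, train_end, train_end, train_end + test_size
        train_end += step
        if not expanding:
            train_start += step  # rolling: the window keeps a fixed length


folds = list(rolling_folds(
    start=date(2024, 1, 1),
    end=date(2024, 3, 1),
    train_size=timedelta(days=30),
    test_size=timedelta(days=7),
    step=timedelta(days=7),
))
for train_start, train_end, test_start, test_end in folds:
    print(f"train {train_start} -> {train_end} | test {test_start} -> {test_end}")
```

Over January and February 2024 this produces four folds, the first training on 2024-01-01 to 2024-01-31 and testing on the week that follows.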

Use WalkForwardPolicy or TemporalSimulation when you want Jano to build a plan, run the folds, and expose auditable outputs.

Observation-Driven Online Partitioning

Observation-driven online partitioning is not a separate non-temporal mode. It is a causal online evaluation pattern over the same timeline: initialize a model, predict the next event or micro-batch, observe the target, update the model, and repeat.

This is useful when the operational clock is not only calendar time, but also the amount of evidence accumulated since the last update.

Use OnlineTemporalRunner with PartialFitUpdateStrategy when the model supports real incremental updates through partial_fit:

from jano import OnlineTemporalRunner, PartialFitUpdateStrategy

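# model must support partial_fit; mae and rmse are metric functions you provide
runner = OnlineTemporalRunner(
    model=model,
    time_col="timestamp",
    target_col="target",
    feature_cols=["feature_a", "feature_b"],
    initial_train_size="30D",  # the first 30 days initialize the model
    update_size=1,             # one update per observed event
    metrics={"mae": mae, "rmse": rmse},
    update_strategy=PartialFitUpdateStrategy(),
)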

run = runner.run(frame)
print(run.to_frame().head())
print(run.metric_trajectory().head())
print(run.summary())

The sequence is deliberately causal:

initialize the model on the initial train window,
predict the next event or micro-batch,
score the prediction once the target is observed,
update the model with that observed batch,
repeat.
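These five steps amount to a test-then-train (prequential) loop. The sketch below is an illustration, not Jano's runner: `RunningMeanModel` and `prequential` are hypothetical names, and the toy `predict(n)` takes a batch length instead of features so the example stays self-contained.

```python
class RunningMeanModel:
    """Hypothetical incremental model: predicts the mean of every target seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def partial_fit(self, targets):
        self.total += sum(targets)
        self.count += len(targets)
        return self

    def predict(self, n):
        mean = self.total / self.count if self.count else 0.0
        return [mean] * n


def prequential(model, initial, stream, update_size):
    """Score each batch before the model is allowed to learn from its targets."""
    model.partial_fit(initial)                    # initialize on the initial window
    errors = []
    for i in range(0, len(stream), update_size):
        batch = stream[i:i + update_size]
        preds = model.predict(len(batch))         # predict the next batch
        mae = sum(abs(p - y) for p, y in zip(preds, batch)) / len(batch)
        errors.append(mae)                        # score once the targets are observed
        model.partial_fit(batch)                  # only now update with that batch
    return errors                                 # repeated until the stream ends


errors = prequential(RunningMeanModel(), initial=[1.0, 2.0, 3.0], stream=[4.0, 5.0, 6.0], update_size=1)
print(errors)  # [2.0, 2.5, 3.0]
```

Because scoring happens before `partial_fit`, each error reflects only what the model knew at prediction time.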

update_size=1 means one temporal tick per observed event. You can also use row batches such as update_size=100 or duration batches such as update_size="1D". This lets you compare event-level, row-batch and calendar-batch clocks without changing the rest of the runner configuration.
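The difference between an integer and a duration `update_size` is how the same stream is cut into update batches. The helpers below are hypothetical and only illustrate the idea; in particular, skipping empty calendar buckets (so a quiet day triggers no update) is an assumption about the behaviour, not documented Jano semantics. Events are assumed to be sorted by time.

```python
from datetime import datetime, timedelta


def row_batches(timestamps, size):
    """Group consecutive events into batches of `size` rows (the last may be shorter)."""
    return [timestamps[i:i + size] for i in range(0, len(timestamps), size)]


def duration_batches(timestamps, width, origin):
    """Group events into calendar buckets [origin + k*width, origin + (k+1)*width).

    Assumption: empty buckets are skipped, so a period without events yields no batch.
    """
    buckets = {}
    for ts in timestamps:
        k = (ts - origin) // width  # timedelta // timedelta gives an integer bucket index
        buckets.setdefault(k, []).append(ts)
    return [buckets[k] for k in sorted(buckets)]


origin = datetime(2024, 1, 1)
# Hypothetical irregular stream: 5 events on day 1, none on day 2, 2 on day 3.
events = [origin + timedelta(hours=h) for h in (1, 2, 3, 4, 5, 49, 50)]

print([len(b) for b in row_batches(events, 1)])  # event clock: [1, 1, 1, 1, 1, 1, 1]
print([len(b) for b in row_batches(events, 3)])  # row-batch clock: [3, 3, 1]
print([len(b) for b in duration_batches(events, timedelta(days=1), origin)])  # calendar clock: [5, 2]
```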

User-Defined Retrain Checkpoints

Online evaluation can also mark the exact temporal checkpoint where your own logic says retraining has become necessary. Pass retrain_trigger to OnlineTemporalRunner. The trigger receives the accumulated online history and the latest scored batch:

def should_retrain(history, latest):
    # latest holds the metrics of the most recent scored batch; 0.15 is an example tolerance
    if latest["mae"] > 0.15:
        return {
            "retrain": True,
            "reason": "mae crossed production tolerance",
            "score": latest["mae"],
        }
    return False

runner = OnlineTemporalRunner(
    model=model,
    time_col="timestamp",
    target_col="target",
    feature_cols=["feature_a", "feature_b"],
    initial_train_size="30D",
    update_size=100,
    metrics={"mae": mae},
    update_strategy=PartialFitUpdateStrategy(),
    retrain_trigger=should_retrain,
)

run = runner.run(frame)
print(run.retrain_checkpoints())

To mark a checkpoint, the trigger can return True, a reason string, or a dictionary such as {"retrain": True, "reason": "...", "score": 0.23}; returning False means no retraining is needed. Jano records the batch number, timestamps, row counts, metrics and optional trigger metadata. The drift or business-loss rule remains user-owned; Jano makes the resulting retraining checkpoint explicit and reproducible.
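Because the rule is user-owned, it can be more conservative than a single-batch threshold. The sketch below is a hypothetical trigger that fires only after several consecutive breaches, so one noisy batch does not create a checkpoint. It relies only on what the example above shows (`latest["mae"]` and the dictionary return form); passing it as `retrain_trigger` assumes Jano calls the trigger once per scored batch, in order, so the object's internal streak counter stays meaningful.

```python
class ConsecutiveBreachTrigger:
    """Hypothetical trigger: retrain after `patience` consecutive batches above `threshold`.

    The streak resets after firing, so the next checkpoint needs a fresh run of bad batches.
    """

    def __init__(self, metric="mae", threshold=0.15, patience=3):
        self.metric = metric
        self.threshold = threshold
        self.patience = patience
        self.streak = 0

    def __call__(self, history, latest):
        # `history` is accepted to match the trigger signature but is not used here.
        value = latest[self.metric]
        self.streak = self.streak + 1 if value > self.threshold else 0
        if self.streak >= self.patience:
            self.streak = 0
            return {
                "retrain": True,
                "reason": f"{self.metric} above {self.threshold} for {self.patience} batches",
                "score": value,
            }
        return False


trigger = ConsecutiveBreachTrigger(threshold=0.15, patience=2)
decisions = [trigger(None, {"mae": m}) for m in (0.20, 0.10, 0.18, 0.19, 0.12)]
print([d if d is False else d["score"] for d in decisions])  # [False, False, False, 0.19, False]
```

The isolated breach at 0.20 is ignored; the checkpoint is marked only when 0.18 and 0.19 arrive back to back.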

Not every estimator supports partial_fit. For regular fit/predict models, use RefitUpdateStrategy instead:

from jano import OnlineTemporalRunner, RefitUpdateStrategy

runner = OnlineTemporalRunner(
    model=model,
    time_col="timestamp",
    target_col="target",
    feature_cols=["feature_a", "feature_b"],
    initial_train_size="30D",
    update_size="1D",
    metrics={"mae": mae},
    update_strategy=RefitUpdateStrategy(max_train_rows=10_000),
)

This strategy refits the model after each observed batch. It is more expensive than partial_fit, but it works with standard fit/predict estimators, and max_train_rows caps how much history each refit uses.
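A bounded refit can be pictured as a sliding buffer of recent rows that is refit from scratch after every batch. This is a hypothetical sketch: `MeanModel` and `BoundedRefit` are illustrative names, and the assumption that `max_train_rows` keeps the most recent rows (dropping the oldest first) is not confirmed by the text above.

```python
from collections import deque


class MeanModel:
    """Hypothetical fit/predict estimator without partial_fit: predicts the training mean."""

    def fit(self, targets):
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, n):
        return [self.mean_] * n


class BoundedRefit:
    """Refit from scratch after each batch on at most `max_train_rows` rows.

    Assumption: when the buffer is full, the oldest rows are discarded first.
    """

    def __init__(self, model, initial, max_train_rows):
        self.model = model
        self.window = deque(initial, maxlen=max_train_rows)
        self.model.fit(list(self.window))

    def update(self, batch):
        self.window.extend(batch)          # oldest rows fall off the left end
        self.model.fit(list(self.window))  # full refit: cost grows with the window size


refit = BoundedRefit(MeanModel(), initial=[1.0, 2.0, 3.0, 4.0], max_train_rows=4)
refit.update([10.0, 10.0])
print(list(refit.window), refit.model.predict(1))  # [3.0, 4.0, 10.0, 10.0] [6.75]
```

The cap keeps each refit's cost bounded, at the price of forgetting anything older than the buffer.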

Finding an Observation-Driven Update Clock

OnlineUpdatePolicyStudy compares multiple update cadences over the same temporal stream. That lets you ask whether model updates should be triggered by calendar time, row count, or accumulated evidence:

from jano import OnlineUpdatePolicy, OnlineUpdatePolicyStudy, RefitUpdateStrategy

study = OnlineUpdatePolicyStudy(
    model=model,
    time_col="timestamp",
    target_col="target",
    feature_cols=["feature_a", "feature_b"],
    initial_train_size="30D",
    policies=[
        OnlineUpdatePolicy("every-event", update_size=1, update_strategy=RefitUpdateStrategy()),
        OnlineUpdatePolicy("every-100-events", update_size=100, update_strategy=RefitUpdateStrategy()),
        OnlineUpdatePolicy("daily", update_size="1D", update_strategy=RefitUpdateStrategy()),
    ],
    metrics={"mae": mae},
)

comparison = study.run(frame)

print(comparison.to_frame())
print(comparison.metric_trajectory().head())
print(comparison.find_optimal_policy(metric="mae", update_cost_weight=0.01))

The optional update_cost_weight penalizes policies that update too often. This keeps the output grounded in the measured results while making the tradeoff explicit: a policy can win because it predicts better, because it updates less often, or because it offers the best cost-adjusted compromise.
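To see how a cost weight can change the winner, here is a sketch with a linear penalty, metric plus weight times the number of updates. Both the formula and the numbers are hypothetical illustrations; they are not Jano's scoring rule or real benchmark results.

```python
def cost_adjusted(results, update_cost_weight):
    """Hypothetical score per policy: mae + weight * n_updates (lower is better)."""
    return {
        name: r["mae"] + update_cost_weight * r["n_updates"]
        for name, r in results.items()
    }


# Made-up per-policy summaries, chosen only to illustrate the tradeoff.
results = {
    "every-event": {"mae": 0.100, "n_updates": 5000},
    "every-100-events": {"mae": 0.104, "n_updates": 50},
    "daily": {"mae": 0.120, "n_updates": 30},
}

winners = []
for weight in (0.0, 0.0001, 0.01):
    scores = cost_adjusted(results, weight)
    winners.append(min(scores, key=scores.get))
print(winners)  # ['every-event', 'every-100-events', 'daily']
```

With no penalty the most frequent updater wins on accuracy alone; as the weight grows, cheaper cadences overtake it.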
