Benchmark

This page summarizes a local benchmark of Jano’s adaptive partition engine.

The goal is to measure the cost of temporal partitioning itself: computing fold boundaries, row counts and split indices. It does not include model training time.

What Was Measured

Two operations were measured:

splitter.plan(data)
sum(1 for _ in splitter.split(data))

plan() precomputes fold geometry and row counts. split() yields the positional index arrays for every fold.
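The measurement itself can be reproduced with a small standard-library harness. The sketch below is not the benchmark script used for this page. The helper names `median_ms` and `count_folds` are hypothetical. The defaults mirror the repetition scheme listed under Configuration Used (median of 7 timed runs after 2 warmups).

```python
import statistics
import time


def median_ms(operation, repeats=7, warmups=2):
    """Return the median wall-clock time of `operation` in milliseconds."""
    for _ in range(warmups):
        operation()  # untimed runs, so caches and lazy imports do not skew the result
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def count_folds(folds):
    """Exhaust a fold iterator and return how many folds it produced."""
    return sum(1 for _ in folds)
```

With a configured splitter and a dataset in hand, the two measured operations would be timed as median_ms(lambda: splitter.plan(data)) and median_ms(lambda: count_folds(splitter.split(data))).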

Configuration Used

The same splitter configuration was used for every input backend:

  • strategy: rolling

  • layout: train_test

  • train_size="2D"

  • test_size="12h"

  • step="12h"

  • dataset frequency: one row per minute

  • repetitions: median of 7 timed runs after 2 warmups

  • dataset sizes: 10k, 100k and 500k rows
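The fold geometry implied by this configuration can be worked out by hand. The sketch below is an illustration, not Jano's implementation. It assumes that only complete folds are emitted and that every minute has exactly one row, and the names `rows_in` and `rolling_folds` are hypothetical. Under those assumptions it reproduces the fold counts reported in Raw Results.

```python
from datetime import timedelta

ROW_INTERVAL = timedelta(minutes=1)  # dataset frequency: one row per minute


def rows_in(window):
    """Number of one-minute rows covered by a time window."""
    return window // ROW_INTERVAL


def rolling_folds(n_rows, train=timedelta(days=2), test=timedelta(hours=12),
                  step=timedelta(hours=12)):
    """Yield (train_range, test_range) positional ranges for complete folds only."""
    train_rows, test_rows, step_rows = rows_in(train), rows_in(test), rows_in(step)
    start = 0
    # Assumption: a fold is kept only if its whole test window fits in the data.
    while start + train_rows + test_rows <= n_rows:
        split_at = start + train_rows
        yield range(start, split_at), range(split_at, split_at + test_rows)
        start += step_rows


for n_rows in (10_000, 100_000, 500_000):
    print(n_rows, sum(1 for _ in rolling_folds(n_rows)))
# 10000 9
# 100000 134
# 500000 690
```

Each fold covers 2,880 training rows followed by 720 test rows, and consecutive folds start 720 rows apart.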

The benchmark compares:

  • engine="auto": Jano selects the safe native path for the input type.

  • engine="pandas": Jano forces the stable pandas path.

For Polars and NumPy inputs, engine="pandas" reproduces the previous behavior: the input is converted to pandas before partitioning, and that conversion time is included in the measurement. This makes it a useful baseline.

Raw Results

Rows     Input   Engine arg  Selected engine  Converted  Folds  Plan ms  Split ms
10,000   pandas  auto        pandas           no         9      0.26     0.33
10,000   pandas  pandas      pandas           no         9      0.26     0.32
10,000   polars  auto        polars           no         9      0.26     0.32
10,000   polars  pandas      pandas           yes        9      6.09     6.07
10,000   numpy   auto        numpy            no         9      0.26     0.31
10,000   numpy   pandas      pandas           yes        9      0.36     0.41
100,000  pandas  auto        pandas           no         134    1.71     2.53
100,000  pandas  pandas      pandas           no         134    1.70     2.53
100,000  polars  auto        polars           no         134    1.71     2.51
100,000  polars  pandas      pandas           yes        134    56.82    58.95
100,000  numpy   auto        numpy            no         134    1.86     2.59
100,000  numpy   pandas      pandas           yes        134    1.90     2.72
500,000  pandas  auto        pandas           no         690    8.65     12.40
500,000  pandas  pandas      pandas           no         690    8.61     12.33
500,000  polars  auto        polars           no         690    8.58     12.37
500,000  polars  pandas      pandas           yes        690    296.48   304.73
500,000  numpy   auto        numpy            no         690    9.14     13.00
500,000  numpy   pandas      pandas           yes        690    9.42     13.40

Speedup Versus Forced Pandas

Rows     Input   Plan speedup  Split speedup  Engine path
10,000   pandas  0.98x         0.98x          pandas -> pandas
10,000   polars  23.28x        19.17x         pandas -> polars
10,000   numpy   1.38x         1.32x          pandas -> numpy
100,000  pandas  0.99x         1.00x          pandas -> pandas
100,000  polars  33.29x        23.45x         pandas -> polars
100,000  numpy   1.02x         1.05x          pandas -> numpy
500,000  pandas  1.00x         0.99x          pandas -> pandas
500,000  polars  34.57x        24.64x         pandas -> polars
500,000  numpy   1.03x         1.03x          pandas -> numpy
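The page does not state the speedup formula. The values are consistent with dividing the forced-pandas time by the engine="auto" time for the same input and size. The sketch below applies that assumed formula to the rounded Polars timings at 500,000 rows from Raw Results. The function name `speedup` is hypothetical.

```python
def speedup(baseline_ms, auto_ms):
    """How many times faster the auto engine is than the forced pandas baseline."""
    return baseline_ms / auto_ms


# (plan ms, split ms) for Polars input at 500,000 rows, copied from Raw Results
forced_pandas = (296.48, 304.73)
auto = (8.58, 12.37)

plan_x = speedup(forced_pandas[0], auto[0])
split_x = speedup(forced_pandas[1], auto[1])
print(f"plan {plan_x:.2f}x, split {split_x:.2f}x")
```

This gives about 34.55x and 24.63x, against the published 34.57x and 24.64x. The small difference is what would be expected if the published ratios were computed from unrounded timings, which is an assumption here.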

Visual Summary

The bars below show split-time speedup for engine="auto" against the forced pandas baseline. Higher is better.

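10k rows
  pandas   0.98x  #
  numpy    1.32x  #
  polars  19.17x  ###################

100k rows
  pandas   1.00x  #
  numpy    1.05x  #
  polars  23.45x  #######################

500k rows
  pandas   0.99x  #
  numpy    1.03x  #
  polars  24.64x  #########################

(each # is about 1x, rounded to the nearest whole number)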

How To Read These Results

The main improvement is for Polars inputs. Before the adaptive engine, Polars inputs had to be fully converted to pandas before partitioning. With engine="auto", Jano keeps Polars column extraction native for planning and split-index generation.

The pandas path is intentionally unchanged. Pandas remains the stable baseline and the engine selected for pandas input.

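The NumPy path improves only modestly in this benchmark (about 1.3x at 10k rows and 1.02x to 1.05x at 100k and 500k rows) because the measured structured-array input already converts to pandas cheaply. NumPy remains useful as a low-overhead input path, but the largest observed gain comes from avoiding the Polars-to-pandas conversion, which is worth roughly 23x to 25x on split time at 100k and 500k rows.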

These timings are local and should be read as directional. They are most useful for understanding partition-engine overhead; full model simulations will also include feature engineering, model fitting and prediction time.