Benchmark
=========

This page summarizes a local benchmark of temporal partition generation
across the currently supported tabular backends.

What was measured
-----------------

The benchmark measures the total time needed to materialize all folds from:

.. code-block:: python

   list(splitter.iter_splits(data))

Configuration used
------------------

The same splitter configuration was used for every backend:

- strategy: ``rolling``
- layout: ``train_test``
- ``train_size="3D"``
- ``test_size="12h"``
- ``gap_before_test="30min"``
- ``step="6h"``
- dataset frequency: one row per minute
- metric: wall-clock runtime over the full split iteration
- repetitions: 3 per backend and dataset size

The benchmark was run locally on the current implementation, where pandas
is still the internal execution engine. The ``numpy`` and ``polars`` timings
therefore include the cost of normalizing those inputs to pandas before
splitting. Read these numbers as an end-to-end benchmark of the public API
as it behaves today, not as a native backend-versus-backend comparison.

Results
-------

.. list-table::
   :header-rows: 1

   * - Backend
     - Rows
     - Folds
     - Mean (ms)
     - Min (ms)
     - Max (ms)
   * - pandas
     - 10,000
     - 14
     - 7.581
     - 5.478
     - 11.634
   * - numpy
     - 10,000
     - 14
     - 4.536
     - 4.208
     - 5.181
   * - polars
     - 10,000
     - 14
     - 10.767
     - 10.657
     - 10.825
   * - pandas
     - 100,000
     - 264
     - 10.544
     - 8.264
     - 14.778
   * - numpy
     - 100,000
     - 264
     - 19.789
     - 18.930
     - 20.940
   * - polars
     - 100,000
     - 264
     - 65.366
     - 62.823
     - 69.806
   * - pandas
     - 500,000
     - 1,375
     - 24.592
     - 22.083
     - 29.403
   * - numpy
     - 500,000
     - 1,375
     - 94.801
     - 91.859
     - 97.771
   * - polars
     - 500,000
     - 1,375
     - 294.843
     - 289.612
     - 299.288
   * - pandas
     - 1,000,000
     - 2,764
     - 44.719
     - 38.910
     - 47.781
   * - numpy
     - 1,000,000
     - 2,764
     - 183.353
     - 177.574
     - 190.390
   * - polars
     - 1,000,000
     - 2,764
     - 587.886
     - 583.358
     - 592.276

Visual summary
--------------

Below, each bar shows the mean runtime for a given dataset size.
The scale is relative within each row so the shape remains readable.

.. raw:: html
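
The fold counts above follow directly from the window arithmetic of the
configuration. The sketch below is illustrative, not part of the library:
``rolling_folds`` and ``time_folds`` are hypothetical helper names, and the
window sizes are the configured durations converted to row counts at one
row per minute. The timing loop mirrors the benchmark's methodology
(wall-clock over the full iteration, three repetitions).

```python
import statistics
import time

# Configured durations expressed in rows (one row per minute).
TRAIN = 3 * 24 * 60   # train_size="3D"        -> 4320 rows
TEST = 12 * 60        # test_size="12h"        ->  720 rows
GAP = 30              # gap_before_test="30min" ->  30 rows
STEP = 6 * 60         # step="6h"              ->  360 rows


def rolling_folds(n_rows):
    """Yield (train, test) index ranges for a left-aligned rolling split."""
    start = 0
    while start + TRAIN + GAP + TEST <= n_rows:
        train = range(start, start + TRAIN)
        test = range(start + TRAIN + GAP, start + TRAIN + GAP + TEST)
        yield train, test
        start += STEP


def time_folds(n_rows, reps=3):
    """Wall-clock the full fold iteration, repeated `reps` times."""
    timings = []
    for _ in range(reps):
        t0 = time.perf_counter()
        folds = list(rolling_folds(n_rows))
        timings.append((time.perf_counter() - t0) * 1000)  # milliseconds
    return len(folds), statistics.mean(timings), min(timings), max(timings)


# Fold counts reproduce the table: 14, 264, 1375, 2764.
for n in (10_000, 100_000, 500_000, 1_000_000):
    n_folds, mean_ms, min_ms, max_ms = time_folds(n)
    print(f"{n:>9} rows -> {n_folds} folds")
```

This toy harness only counts pure-Python fold enumeration, so its timings
are not comparable to the table; it is meant to show why each dataset size
yields the fold count it does.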