Skip to content

Prediction Output Structure

When you run the canari_ml predict command, the output is written to the outputs/ directory under the corresponding training run folder. The prediction outputs are organised by forecast date, model run, and include logs, raw predictions, and a reference to the input dataset used for inference.


Overview

The prediction pipeline performs the following steps:

  • Loads the trained model checkpoint from a specified training run.
  • Loads the corresponding prediction dataset (preprocessed but not yet seen by the model).
  • Generate predictions from the trained model.
  • Saves raw predictions in .npy format, with one file per forecast start date.
  • Logs the prediction generation and symlinks back to the dataset used for traceability.

Simplified Output Tree

outputs/
└── demo_train/                      # Parent directory corresponding to the training run
    └── prediction/
        └── 1979-01-26/              # Prediction name defined in the config
            ├── 42/                  # Seed for this prediction (e.g. from training)
               ├── raw_predictions/
                  └── 1979_01_26.npy  # NumPy array of raw model outputs for this forecast
               └── predict_2025-09-16_11-08-52.log  # Log file with timestamp of prediction run
            └── cache_dir -> ../../../../preprocessed_data/predict_1976_example/03_cache_1976_example/

Where:

  • demo_train/ is the name of the original training run the model was trained under.
  • 1979-01-26/ is the name of the prediction run, specified via the training config (predict.name) or CLI flag.
  • 42/ is the seed number (Matches the training seed).
  • Symlinks are created to the cached input dataset (cache_dir) used during prediction.

Output Breakdown

prediction/<forecast_date>/

  • Each forecast date is organised into its own directory under prediction/.

42/

  • A single prediction run corresponding to a trained model. This is the same as the seed used for the trained model.

raw_predictions/

  • Contains .npy files of the model's raw output.
  • File naming format: YYYY_MM_DD.npy, where the date corresponds to the forecast initialisation date.

predict_<timestamp>.log

  • Log file for the prediction run.
  • Includes configuration used, model checkpoint, input paths, and prediction status.

cache_dir -> preprocessed_data/predict_1976_example/03_cache_1976_example/

  • Points to the prediction dataset used during inference.
  • Ensures that the same inputs can be used for postprocessing or re-running prediction.

Summary

After running canari_ml predict:

  • Prediction results are stored under outputs/<train.name>/prediction/<predict.name>/.
  • All prediction data, logs, and references are grouped by the forecast start date.
  • Each run includes:
    • A .npy file with the raw predictions.
    • A symlink back to the input dataset (cache_dir) used for inference.
    • A detailed prediction log for traceability and debugging.