Prediction Output Structure
Running the canari_ml predict command writes its output to the outputs/ directory, under the folder of the corresponding training run. Prediction outputs are organised by forecast date and model run, and include logs, raw predictions, and a reference to the input dataset used for inference.
Overview
The prediction pipeline performs the following steps:
- Loads the trained model checkpoint from a specified training run.
- Loads the corresponding prediction dataset (preprocessed but not yet seen by the model).
- Generates predictions from the trained model.
- Saves raw predictions in .npy format, with one file per forecast start date.
- Logs the prediction run and creates a symlink back to the dataset used, for traceability.
Simplified Output Tree
outputs/
└── demo_train/                                  # Parent directory corresponding to the training run
    └── prediction/
        └── 1979-01-26/                          # Prediction name defined in the config
            ├── 42/                              # Seed for this prediction (e.g. from training)
            │   ├── raw_predictions/
            │   │   └── 1979_01_26.npy           # NumPy array of raw model outputs for this forecast
            │   └── predict_2025-09-16_11-08-52.log  # Log file with timestamp of prediction run
            └── cache_dir -> ../../../../preprocessed_data/predict_1976_example/03_cache_1976_example/
Where:
- demo_train/ is the name of the training run that produced the model.
- 1979-01-26/ is the name of the prediction run, specified via the training config (predict.name) or a CLI flag.
- 42/ is the seed number (it matches the training seed).
- cache_dir is a symlink to the cached input dataset used during prediction.
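As an illustration, the layout above can be reproduced with pathlib. This is a minimal sketch, not part of canari_ml: the helper name prediction_paths and its parameters are hypothetical, and only the directory names are taken from the tree shown here.

```python
from pathlib import Path


def prediction_paths(train_name: str, predict_name: str, seed: int,
                     root: Path = Path("outputs")) -> dict[str, Path]:
    """Build the expected locations for one prediction run.

    Hypothetical helper: the layout is assumed from the output tree above.
    """
    run_dir = root / train_name / "prediction" / predict_name
    seed_dir = run_dir / str(seed)
    return {
        "run_dir": run_dir,                            # outputs/<train>/prediction/<predict>/
        "seed_dir": seed_dir,                          # .../<seed>/
        "raw_predictions": seed_dir / "raw_predictions",
        "cache_dir": run_dir / "cache_dir",            # symlink to the input dataset
    }


paths = prediction_paths("demo_train", "1979-01-26", 42)
print(paths["raw_predictions"].as_posix())
# outputs/demo_train/prediction/1979-01-26/42/raw_predictions
```

Keeping the path logic in one function means downstream scripts do not need to hard-code the directory depth.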
Output Breakdown
prediction/<forecast_date>/
- Each forecast date is organised into its own directory under prediction/.
42/
- A single prediction run for one trained model. The directory name is the seed used to train that model.
raw_predictions/
- Contains .npy files of the model's raw output.
- File naming format: YYYY_MM_DD.npy, where the date corresponds to the forecast initialisation date.
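The naming convention makes it straightforward to recover the forecast initialisation date when loading the files. The sketch below assumes NumPy is installed and that the files can be read with numpy.load; both function names are hypothetical, and nothing is assumed about the shape or dtype of the arrays, which this page does not specify.

```python
from datetime import date, datetime
from pathlib import Path


def forecast_date_from_filename(path: Path) -> date:
    """Parse the forecast initialisation date from a YYYY_MM_DD.npy file name."""
    return datetime.strptime(path.stem, "%Y_%m_%d").date()


def load_raw_predictions(raw_dir: Path) -> dict:
    """Load every .npy file in raw_dir, keyed by forecast initialisation date."""
    import numpy as np  # imported lazily so the date helper works without NumPy

    return {
        forecast_date_from_filename(f): np.load(f)
        for f in sorted(raw_dir.glob("*.npy"))
    }


print(forecast_date_from_filename(Path("1979_01_26.npy")))  # 1979-01-26
```

For the example tree, load_raw_predictions(Path("outputs/demo_train/prediction/1979-01-26/42/raw_predictions")) would return a dictionary with a single entry.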
predict_<timestamp>.log
- Log file for the prediction run.
- Includes configuration used, model checkpoint, input paths, and prediction status.
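If prediction is re-run for the same seed, several log files may accumulate in the seed directory. The following sketch picks the most recent one by parsing the timestamp in the file name. The format predict_YYYY-MM-DD_HH-MM-SS.log is inferred from the single example in the tree above, and the function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def log_timestamp(path: Path) -> datetime:
    """Parse the run timestamp from a predict_YYYY-MM-DD_HH-MM-SS.log file name."""
    return datetime.strptime(path.stem, "predict_%Y-%m-%d_%H-%M-%S")


def latest_log(seed_dir: Path) -> Optional[Path]:
    """Return the most recent prediction log in seed_dir, or None if there is none."""
    logs = list(seed_dir.glob("predict_*.log"))
    return max(logs, key=log_timestamp, default=None)


print(log_timestamp(Path("predict_2025-09-16_11-08-52.log")))  # 2025-09-16 11:08:52
```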
Symlinks
cache_dir -> preprocessed_data/predict_1976_example/03_cache_1976_example/
- Points to the prediction dataset used during inference.
- Ensures that the same inputs can be used for postprocessing or re-running prediction.
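Before postprocessing, it can be useful to check that the symlink still points to an existing dataset. The sketch below uses only the standard library; resolve_cache_dir is a hypothetical helper, and the location of cache_dir inside the prediction run directory is taken from the output tree.

```python
from pathlib import Path


def resolve_cache_dir(run_dir: Path) -> Path:
    """Return the dataset directory that run_dir/cache_dir points to."""
    link = run_dir / "cache_dir"
    if not link.is_symlink():
        raise FileNotFoundError(f"{link} is not a symlink")
    # strict=True raises FileNotFoundError if the target was moved or deleted
    return link.resolve(strict=True)
```

For the example tree, resolve_cache_dir(Path("outputs/demo_train/prediction/1979-01-26")) would return the absolute path of the 03_cache_1976_example/ directory, or raise if the preprocessed data is no longer there.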
Summary
After running canari_ml predict:
- Prediction results are stored under outputs/<train.name>/prediction/<predict.name>/.
- All prediction data, logs, and references are grouped by the forecast start date.
- Each run includes:
  - A .npy file with the raw predictions.
  - A symlink back to the input dataset (cache_dir) used for inference.
  - A detailed prediction log for traceability and debugging.