network_dataset
canari_ml.data.network_dataset
canari_ml.data.network_dataset.logger = logging.getLogger(__name__)
module-attribute
canari_ml.data.network_dataset.SplittingMixin
Reads train, val, and test datasets from TFRecord protocol buffer files,
and splits and shuffles the data if requested.
Example
This mixin is not meant to be used directly; the snippet only gives an idea of its use:
    # Initialise SplittingMixin
    split_dataset = SplittingMixin()
canari_ml.data.network_dataset.SplittingMixin.batch_size
property
The dataset's batch size.
canari_ml.data.network_dataset.SplittingMixin.dtype
property
The dataset's data type.
canari_ml.data.network_dataset.SplittingMixin.lead_time
property
The number of time steps to forecast.
canari_ml.data.network_dataset.SplittingMixin.num_channels
property
The number of channels in the dataset.
canari_ml.data.network_dataset.SplittingMixin.shape
property
The shape of the dataset.
canari_ml.data.network_dataset.SplittingMixin.shuffling
property
A flag for whether training dataset(s) are marked to be shuffled.
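The read-only properties listed above follow a common mixin pattern: the mixin exposes the values, while the class that inherits from it sets the underlying private attributes. The following is a minimal sketch of that pattern, not the canari_ml implementation. `ToySplittingMixin` and `ToyDataset` are hypothetical names introduced only for illustration.

```python
class ToySplittingMixin:
    """Hypothetical stand-in for a mixin that exposes read-only properties."""

    @property
    def batch_size(self) -> int:
        # Reads a private attribute that the host class is expected to set.
        return self._batch_size

    @property
    def shuffling(self) -> bool:
        return self._shuffling


class ToyDataset(ToySplittingMixin):
    """Hypothetical host class; it owns the private attributes."""

    def __init__(self, batch_size: int = 4, shuffling: bool = False) -> None:
        self._batch_size = batch_size
        self._shuffling = shuffling


dataset = ToyDataset(batch_size=8)
print(dataset.batch_size, dataset.shuffling)
```

Because the properties define no setter, assigning to `dataset.batch_size` raises `AttributeError`. This is one reason a mixin like this is not instantiated on its own: without a host class, the private attributes it reads do not exist.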
canari_ml.data.network_dataset.IceNetDataSet(configuration_path, *args, batch_size=4, path=os.path.join('.', 'network_datasets'), shuffling=False, **kwargs)
Bases: SplittingMixin, DataCollection
Initialises and configures a dataset.
It loads a JSON configuration file, updates the _config attribute with the
result, creates a data loader, and provides methods to access the dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
| `_config` | | A dict used to store configuration loaded from JSON file. |
| `_configuration_path` | | The path to the JSON configuration file. |
| `_batch_size` | | The batch size for the data loader. |
| `_counts` | | A dict with number of elements in train, val, test. |
| `_dtype` | | The type of the dataset. |
| `_loader_config` | | The path to the data loader configuration file. |
| `_generate_workers` | | An integer representing number of workers for parallel processing with Dask. |
| `_lead_time` | | An integer representing number of days to predict for. |
| `_num_channels` | | An integer representing number of channels (input variables) in the dataset. |
| `_shape` | | The shape of the dataset. |
| `_shuffling` | | A flag indicating whether to shuffle the data or not. |
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `configuration_path` | `str` | The path to the JSON configuration file. | required |
| `*args` | | Additional positional arguments. | `()` |
| `batch_size` | optional | How many samples to load per batch. Defaults to 4. | `4` |
| `path` | optional | The path to the directory where the processed tfrecord protocol buffer files will be stored. Defaults to './network_datasets'. | `os.path.join('.', 'network_datasets')` |
| `shuffling` | optional | Flag indicating whether to shuffle the data. Defaults to False. | `False` |
| `**kwargs` | | Additional keyword arguments. | `{}` |
Source code in src/canari_ml/data/network_dataset.py
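To illustrate the initialisation steps described above (load a JSON file into `_config`, then set attributes from it and from the constructor arguments), here is a minimal, self-contained sketch. It is not the canari_ml source: `ToyConfiguredDataset` is a hypothetical class, and the config keys `counts`, `num_channels`, and `shape` are assumptions chosen to mirror the documented attribute names.

```python
import json
import os
import tempfile


class ToyConfiguredDataset:
    """Hypothetical stand-in for a JSON-configured dataset class."""

    def __init__(
        self,
        configuration_path,
        batch_size=4,
        path=os.path.join(".", "network_datasets"),
        shuffling=False,
    ):
        self._configuration_path = configuration_path
        # Load the JSON file and keep the result as the configuration dict.
        with open(configuration_path, "r") as fh:
            self._config = json.load(fh)

        # Values taken from the constructor arguments.
        self._batch_size = batch_size
        self._path = path
        self._shuffling = shuffling

        # Values taken from the configuration; the key names are assumptions.
        self._counts = self._config["counts"]
        self._num_channels = self._config["num_channels"]
        self._shape = tuple(self._config["shape"])


# Write a small, made-up configuration file and load it back.
example_config = {
    "counts": {"train": 10, "val": 3, "test": 2},
    "num_channels": 2,
    "shape": [432, 432],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, "dataset_config.json")
    with open(config_path, "w") as fh:
        json.dump(example_config, fh)
    toy = ToyConfiguredDataset(config_path, batch_size=2)

print(toy._batch_size, toy._shape, toy._counts)
```

The sketch converts `shape` to a tuple because JSON has no tuple type and stores it as a list. Whether the real class does the same is not stated in this page.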
canari_ml.data.network_dataset.IceNetDataSet.loader_config
property
The path to the JSON loader configuration file stored in the dataset config file.
canari_ml.data.network_dataset.IceNetDataSet.channels
property
The list of channels (variable names) specified in the dataset config file.
canari_ml.data.network_dataset.IceNetDataSet.counts
property
A dict with the number of elements in each of train, val, and test, as recorded in the config file.
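As a rough illustration of how properties such as `loader_config`, `channels`, and `counts` can be backed by a loaded configuration dict, consider the sketch below. `ToyConfigView`, its `batches_per_split` helper, the key names, and all example values are hypothetical and are not taken from the canari_ml source.

```python
import math


class ToyConfigView:
    """Hypothetical read-only view over a dataset configuration dict."""

    def __init__(self, config, batch_size=4):
        self._config = config
        self._batch_size = batch_size

    @property
    def loader_config(self):
        # Path to the loader configuration, as stored in the dataset config.
        return self._config["loader_config"]

    @property
    def channels(self):
        # List of channel (variable) names.
        return self._config["channels"]

    @property
    def counts(self):
        # Mapping of split name to number of elements.
        return self._config["counts"]

    def batches_per_split(self):
        # Illustrative helper: number of batches per split, assuming the
        # final partial batch is kept rather than dropped.
        return {
            split: math.ceil(n / self._batch_size)
            for split, n in self.counts.items()
        }


view = ToyConfigView(
    {
        "loader_config": "loader.example.json",
        "channels": ["var_a", "var_b"],
        "counts": {"train": 10, "val": 3, "test": 2},
    },
    batch_size=4,
)
print(view.channels)
print(view.batches_per_split())
```

Combining `counts` with the batch size in this way is one plausible use of the property, for example to estimate steps per epoch. It is shown here only as an assumption about how a caller might consume the values.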