Downstream App: F10.7

Data Module

class sdofmv2.tasks.f107.f107_datamodule.EmbSolarProxyDataset(*args: Any, **kwargs: Any)

Bases: SDOMLDataset

A dataset class for solar proxy prediction using SDO multi-instrument data.

This class extends SDOMLDataset to include the F10.7 solar proxy as the target variable for supervised learning tasks. It retrieves aligned image data from AIA and HMI instruments and pairs them with the corresponding normalized F10.7 index.

Parameters:

aligndata (pd.DataFrame) – Aligned temporal indexes and proxy values. Must contain an ‘f107_norm’ column for the target variable.
hmi_data (zarr.hierarchy.Group) – Zarr dataset containing HMI magnetogram observations.
aia_data (zarr.hierarchy.Group) – Zarr dataset containing AIA EUV/UV image observations.
eve_data (zarr.hierarchy.Group) – Zarr dataset containing EVE irradiance observations.
components (list[str]) – List of magnetic components to load for HMI (e.g., [‘Bx’, ‘By’, ‘Bz’]).
wavelengths (list[str] or list[int]) – List of channels to load for AIA (e.g., [171, 193, 211]).
ions (list[str]) – List of spectral lines/ions to load for EVE.
freq (str) – The temporal cadence used for rounding and aligning the time series (e.g., ‘12min’).
months (list[int]) – List of valid months (1-12) to include in the dataset.
normalization (dict, optional) – The normalization strategy to apply during data loading. Defaults to None.
normalization_stat (dict, optional) – Pre-computed statistics required for the chosen normalization. Defaults to None.
mask (torch.Tensor, optional) – HMI limb mask to apply to the spatial data. Defaults to None.
num_frames (int, optional) – The number of consecutive temporal frames to load per sequence sample. Defaults to 1.
drop_frame_dim (bool, optional) – If True and num_frames is 1, drops the temporal dimension. Defaults to False.
min_date (str or datetime, optional) – The earliest date boundary to include in the dataset. Defaults to None.
max_date (str or datetime, optional) – The latest date boundary to include in the dataset. Defaults to None.
get_header (bool or list, optional) – Whether to retrieve and return header metadata alongside the image tensors. Defaults to False.
precision (str, optional) – The floating-point precision for the output tensors (e.g., “32”, “16”). Defaults to “32”.

Returns:

A tuple containing:

image_stack (torch.Tensor): Multimodal image data tensor.
timestamps (int or np.ndarray): Unix timestamps for the frames.
target (torch.Tensor): Normalized F10.7 solar proxy values.

Return type:

tuple
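
The freq and aligndata parameters above describe rounding observation times to a fixed cadence and pairing them with a normalized proxy value. A minimal sketch of that idea with pandas follows. The timestamps and proxy values are invented for illustration, and the way the library actually builds aligndata is not shown in this reference, so treat the frame layout as an assumption.

```python
import pandas as pd

# Hypothetical observation times; real values come from the Zarr datasets.
obs_times = pd.to_datetime(
    ["2014-01-01 00:05:10", "2014-01-01 00:07:00", "2014-01-01 00:23:59"]
)

# Round each timestamp to the nearest 12-minute mark, as in freq='12min'.
rounded = obs_times.round("12min")

# Assumed layout: rounded times as the index, with the required
# 'f107_norm' target column (values here are made up).
aligndata = pd.DataFrame({"f107_norm": [0.10, 0.12, 0.15]}, index=rounded)

print(aligndata)
```

Rounding to a shared cadence lets observations from different instruments be matched on identical index values.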

class sdofmv2.tasks.f107.f107_datamodule.EmbSolarProxyDataModule(*args: Any, **kwargs: Any)

Bases: SDOMLDataModule

PyTorch Lightning DataModule for solar proxy prediction using SDO data.

This class manages the loading, preprocessing, and splitting of multi-instrument SDO data (HMI, AIA, EVE) paired with F10.7 solar proxy values. It handles temporal alignment between the SDO observations and the proxy data provided in a CSV file.

Parameters:

hmi_path (str) – Path to the HMI Zarr dataset.
aia_path (str) – Path to the AIA Zarr dataset.
eve_path (str) – Path to the EVE Zarr dataset.
components (list[str]) – List of HMI magnetic components to load.
wavelengths (list[str] or list[int]) – List of AIA wavelengths to load.
ions (list[str]) – List of EVE spectral lines/ions to load.
frequency (str) – Temporal cadence for data alignment (e.g., ‘12min’).
batch_size (int, optional) – Number of samples per batch. Defaults to 32.
num_workers (int, optional) – Number of subprocesses for data loading. Defaults to None.
pin_memory (bool, optional) – If True, copies tensors into CUDA pinned memory before returning them. Defaults to False.
persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. Defaults to False.
val_months (list[int], optional) – Months to use for the validation set. Defaults to [10, 1].
test_months (list[int], optional) – Months to use for the test set. Defaults to [11, 12].
holdout_months (list[int], optional) – Months to exclude from all sets. Defaults to [].
normalization (bool or str, optional) – Normalization strategy to apply. Defaults to False.
cache_dir (str, optional) – Directory to store cached normalization statistics. Defaults to “”.
norm_stat_tag (str, optional) – Tag for identifying specific normalization statistics. Defaults to “”.
apply_mask (bool, optional) – Whether to apply the HMI limb mask. Defaults to True.
num_frames (int, optional) – Number of consecutive frames per sample. Defaults to 1.
drop_frame_dim (bool, optional) – Whether to drop the temporal dimension if num_frames is 1. Defaults to False.
min_date (str or datetime, optional) – Earliest date to include. Defaults to None.
max_date (str or datetime, optional) – Latest date to include. Defaults to None.
precision (str, optional) – Floating-point precision (“32” or “16”). Defaults to “32”.
ds_data_path (str, optional) – Path to the CSV file containing F10.7 proxy data. Defaults to None.

Returns:

The class provides methods (train_dataloader, val_dataloader, test_dataloader) that return PyTorch DataLoaders yielding batches of (image_stack, timestamps, target).

Return type:

DataLoader

Methods

`setup`([stage])
`test_dataloader`()
`train_dataloader`()
`val_dataloader`()
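
The val_months, test_months, and holdout_months parameters describe a split by calendar month. The sketch below illustrates one way such a split could work using only the standard library. The helper split_by_month is hypothetical, and the rule that every remaining month goes to the training set is an assumption, not something this reference states.

```python
from datetime import datetime


def split_by_month(timestamps, val_months, test_months, holdout_months=()):
    """Assign each timestamp to a split based on its calendar month."""
    splits = {"train": [], "val": [], "test": []}
    for ts in timestamps:
        if ts.month in holdout_months:
            continue  # held-out months are excluded from every split
        if ts.month in val_months:
            splits["val"].append(ts)
        elif ts.month in test_months:
            splits["test"].append(ts)
        else:
            # Assumption: months not listed elsewhere are used for training.
            splits["train"].append(ts)
    return splits


# One sample per month, using the documented default month lists.
timestamps = [datetime(2014, month, 15) for month in range(1, 13)]
splits = split_by_month(timestamps, val_months=[10, 1], test_months=[11, 12])

for name, items in splits.items():
    print(name, [ts.month for ts in items])
```

Splitting on whole months rather than random samples keeps temporally adjacent frames in the same split, which limits leakage between training and evaluation data.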

Model Module

class sdofmv2.tasks.f107.f107_module.MultiLayerPerceptron(*args: Any, **kwargs: Any)

Bases: BaseModule

Multi-layer perceptron head for processing backbone features.

This class implements a regression or classification head that sits on top of a pre-trained backbone. It extracts latent representations from the backbone, aggregates patch tokens using both mean and max pooling, and processes the combined features through a series of fully connected layers.

Parameters:

backbone (nn.Module) – The feature extraction model containing an autoencoder.
freeze (bool) – Whether to freeze the backbone parameters to prevent training.
input_dim (int) – The dimensionality of the backbone’s latent features. The internal MLP input dimension is twice this value due to the concatenation of mean and max pooled features.
output_dim (int, optional) – The number of output units. Defaults to 1.
hidden_layer_dims (list[int], optional) – Dimensions of the hidden MLP layers. Defaults to [512, 512, 512].
dropout (float, optional) – Dropout probability for regularization. Defaults to 0.0.
mask_ratio (float, optional) – Fraction of input patches to mask during the forward pass. Defaults to 0.0.
optimizer_dict (dict, optional) – Configuration for the optimizer. Defaults to None.
scheduler_dict (dict, optional) – Configuration for the learning rate scheduler. Defaults to None.

Returns:

The output logits or predictions from the final linear layer.

Return type:

torch.Tensor
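
The description above says patch tokens are aggregated with both mean and max pooling, which is why the MLP input is twice input_dim. The following is a minimal PyTorch sketch of that pattern, not the library's implementation. The class name PooledMLPHead, the ReLU activation, and the token layout (batch, num_patches, input_dim) are assumptions, and the backbone is replaced by random tokens.

```python
import torch
from torch import nn


class PooledMLPHead(nn.Module):
    """Hypothetical head: mean+max pooling over patches, then an MLP."""

    def __init__(self, input_dim, output_dim=1,
                 hidden_layer_dims=(512, 512, 512), dropout=0.0):
        super().__init__()
        layers = []
        in_dim = 2 * input_dim  # mean and max features are concatenated
        for hidden in hidden_layer_dims:
            # ReLU is an assumed activation; the reference does not name one.
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, output_dim))
        self.mlp = nn.Sequential(*layers)

    def forward(self, tokens):
        # tokens: (batch, num_patches, input_dim), standing in for backbone output
        mean_pooled = tokens.mean(dim=1)
        max_pooled = tokens.max(dim=1).values
        return self.mlp(torch.cat([mean_pooled, max_pooled], dim=-1))


head = PooledMLPHead(input_dim=16, hidden_layer_dims=(32, 32))
tokens = torch.randn(4, 10, 16)  # stand-in for backbone patch tokens
out = head(tokens)
print(out.shape)
```

Concatenating the two pooled vectors keeps both the average response across patches and the strongest single-patch response.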

Methods

`forward`(x)	Processes input through the backbone and MLP head.
`on_before_optimizer_step`(optimizer)
`on_train_start`()
`test_step`(batch, batch_idx)
`training_step`(batch, batch_idx)	Perform a single training step.
`validation_step`(batch, batch_idx)	Perform a single validation step.

forward(x)

Processes input through the backbone and MLP head.

Parameters: x (torch.Tensor) – Input image or data tensor.
Returns: Output logits of shape (batch_size, output_dim).
Return type: torch.Tensor

training_step(batch, batch_idx)

Perform a single training step.

Parameters:

batch – The training batch data.
batch_idx – The index of the current batch.

Raises:

NotImplementedError – Subclasses must implement this method.

validation_step(batch, batch_idx)

Perform a single validation step.

Parameters:

batch – The validation batch data.
batch_idx – The index of the current batch.

Raises:

NotImplementedError – Subclasses must implement this method.