Getting Started

This guide explains how to set up your environment, download the required data, and run the SDOFMv2 scripts.

Environment Setup

Prerequisites

  • Linux or macOS

  • Python 3.11+

  • NVIDIA GPU + CUDA toolkit (recommended for training)

Installation

We use mamba (or conda) for fast dependency resolution.

Note

sdofmv2_environment.yml is configured for CUDA 12.8 by default. If your system requires a different CUDA version (e.g., 11.8), edit the pip section of sdofmv2_environment.yml before creating the environment: change cu128 to the matching tag (e.g., cu118).
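If you would rather script the tag change than edit the file by hand, the following is a minimal sketch. It assumes the CUDA version appears in the file only as wheel tags of the form cuNNN; the helper name retarget_cuda_tag and the sample text are hypothetical, and the real pip section may look different.

```python
import re


def retarget_cuda_tag(yaml_text: str, new_tag: str) -> str:
    """Replace every wheel tag of the form cuNNN (e.g. cu128) with new_tag."""
    if not re.fullmatch(r"cu\d{3}", new_tag):
        raise ValueError(f"expected a tag like 'cu118', got {new_tag!r}")
    return re.sub(r"\bcu\d{3}\b", new_tag, yaml_text)


# Hypothetical excerpt, used only to illustrate the substitution.
sample = "- pip:\n  - --extra-index-url https://download.pytorch.org/whl/cu128\n"
print(retarget_cuda_tag(sample, "cu118"))
```

To apply it to the real file, read sdofmv2_environment.yml with pathlib.Path.read_text(), pass the text through the helper, and write the result back. Review the diff before creating the environment.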

# Clone the repository
git clone https://github.com/Joaggi/sdofmv2.git
cd sdofmv2

# Create and activate the environment
# (installs PyTorch and the local package automatically)
mamba env create -f sdofmv2_environment.yml
mamba activate sdofmv2
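After activating the environment, a quick sanity check can confirm that the prerequisites are met. This is a minimal sketch rather than part of the repository; the function name check_environment is hypothetical. It checks only the Python version, whether PyTorch is importable, and whether PyTorch can see a CUDA device.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report whether the basic prerequisites from this guide are satisfied."""
    report = {
        "python_ok": sys.version_info >= (3, 11),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check still runs without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a machine with an NVIDIA GPU, the CUDA tag in sdofmv2_environment.yml is a likely first thing to check.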

Data Preparation

SDOFMv2 uses the SDOMLv2 dataset, a curated, multi-instrument dataset for the Solar Dynamics Observatory, hosted on NASA’s HDRL S3 bucket. The data is stored in the Zarr format and streamed via s3fs.
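To illustrate what streaming a Zarr store from S3 looks like, here is a minimal sketch. It assumes anonymous read access and a layout of one store per component at <root>/<component>.zarr; both are assumptions, as are the helper names component_url and open_component. The bucket root is not stated in this guide, so you must supply it yourself.

```python
def component_url(root: str, component: str) -> str:
    """Build the store URL for one component (assumed layout, not confirmed)."""
    if component not in ("aia", "hmi"):
        raise ValueError(f"unknown component: {component!r}")
    return f"{root.rstrip('/')}/{component}.zarr"


def open_component(root: str, component: str, anon: bool = True):
    """Open a component's Zarr group read-only; chunks are fetched on access."""
    import zarr  # lazy import; s3fs must also be installed for s3:// URLs

    return zarr.open_group(
        component_url(root, component),
        mode="r",
        storage_options={"anon": anon},  # anonymous access is an assumption
    )


# Placeholder root, shown only to illustrate the URL shape.
print(component_url("s3://example-bucket/sdomlv2", "aia"))
```

Opening the group reads metadata only. Array data is transferred when you slice into it, which is why streaming works without a full local copy.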

Dataset Components

Component | Instrument | Data Type    | Description
--------- | ---------- | ------------ | -----------
aia       | AIA        | EUV Images   | 9 extreme ultraviolet channels (94 Å, 131 Å, 171 Å, 193 Å, 211 Å, 304 Å, 335 Å, 1600 Å, 1700 Å), capturing the solar atmosphere
hmi       | HMI        | Magnetograms | 3-component vector magnetic field (Bx, By, Bz) for the solar photosphere

Warning

Zarr datasets require significant local disk space. Verify your target drive has sufficient capacity before downloading.
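One way to check capacity before starting a download is sketched below, using only the Python standard library. The helper names are hypothetical, and the required size is left as a parameter because this guide does not state the dataset's size.

```python
import shutil


def has_free_space(target: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `target` has enough free space.

    `target` must be an existing path, for example the storage directory.
    """
    return shutil.disk_usage(target).free >= required_bytes


def format_gib(n_bytes: int) -> str:
    """Format a byte count in GiB for display."""
    return f"{n_bytes / 2**30:.1f} GiB"


if __name__ == "__main__":
    free = shutil.disk_usage(".").free
    print(f"Free space on the current drive: {format_gib(free)}")
```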

Downloading the Data

The download script is resumable: it checks for existing local files and fetches only what is missing.

# Download AIA only
python scripts/download_data.py --target /path/to/your/storage --component aia

# Download HMI only
python scripts/download_data.py --target /path/to/your/storage --component hmi

# Download the full dataset
python scripts/download_data.py --target /path/to/your/storage --component both
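The resumable behaviour can be pictured with the sketch below. It is not the code in scripts/download_data.py; it only illustrates the idea of comparing a remote listing against local files. The function name missing_files is hypothetical, and the sketch checks file existence only, whereas a real implementation might also compare sizes or checksums.

```python
import tempfile
from pathlib import Path


def missing_files(remote_keys, local_root):
    """Return the remote keys that have no corresponding local file."""
    root = Path(local_root)
    return [key for key in remote_keys if not (root / key).is_file()]


# Demonstration with a temporary directory standing in for the target drive.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "aia").mkdir()
    (Path(tmp) / "aia" / "chunk_0").write_bytes(b"x")
    print(missing_files(["aia/chunk_0", "aia/chunk_1"], tmp))  # ['aia/chunk_1']
```

Under this scheme, re-running an interrupted download fetches only the keys returned by missing_files, so completed files are never transferred twice.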

Training & Evaluation

Pretraining

python scripts/pretrain.py --config-name pretrain_mae_AIA.yaml

Evaluation

python scripts/test.py --config-name pretrain_mae_AIA.yaml

Downstream Finetuning

# Example: solar wind forecasting
python scripts/finetuning_solarwind.py --config-name finetune_solarwind_config.yaml

Configuration files for all tasks are in configs/downstream/. Notebook-based walkthroughs are available in notebooks/downstream_apps/.