OmegaConf

OmegaConf is a library to handle configurations in Python, typically making a bridge between YAML configuration files, command line arguments, and the Python code that uses them.

Typically, a YAML configuration is loaded with OmegaConf, possibly overridden with command line arguments, and then used in the code. OmegaConf can also handle command line arguments without any configuration file, merge multiple sources of configuration, etc.

Example of a YAML configuration file:

config.yml
model:
  type: "resnet50"
  learning_rate: 0.001
  batch_size: 32
dataset:
  name: cifar10
logging:
  iterations: [100, 200, 1000]
  name: "bs_${model.batch_size}" # see the tutorial for more on interpolation
  # (with defaults, env vars, etc...)
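
As a sketch of those interpolation features (assuming OmegaConf's built-in oc.env resolver), the logging name could for instance include an environment variable with a fallback value:

config.yml (interpolation sketch)
logging:
  # resolves to the USER environment variable, or "anonymous" if it is not set
  name: "${oc.env:USER,anonymous}_bs_${model.batch_size}"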

Examples of command line invocations:

Terminal window
pip install omegaconf
python train.py
python train.py model.learning_rate=2e-4
python train.py dataset.name=CIFAR100 model.type=resnet18

Example of Python code using OmegaConf

train.py
from omegaconf import OmegaConf
cfg = OmegaConf.load("config.yml")
cfg.merge_with_cli()
print(cfg.model.type)
print(cfg.model["learning_rate"])
print(cfg.dataset.name)
print("# Modifying and dumping to yaml")
cfg.dataset.name = cfg.dataset.name.lower()
print(OmegaConf.to_yaml(cfg, resolve=True)) # resolve/interpolate
Example output

When run with:

python train.py model.learning_rate=2e-4 model.batch_size=8

The above code, with the above config.yml, will output:

resnet50
0.0002
cifar10
# Modifying and dumping to yaml
model:
  type: resnet50
  learning_rate: 0.0002
  batch_size: 8
dataset:
  name: cifar10
logging:
  iterations:
  - 100
  - 200
  - 1000
  name: bs_8
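
As mentioned above, OmegaConf can also merge several sources of configuration explicitly. A minimal sketch using OmegaConf.create, OmegaConf.load, OmegaConf.from_cli and OmegaConf.merge (the seed key and the file name are just illustrative):

merge.py
from omegaconf import OmegaConf

programmatic = OmegaConf.create({"seed": 42})   # illustrative programmatic defaults
from_file = OmegaConf.load("config.yml")        # file-based configuration
from_cli = OmegaConf.from_cli()                 # key=value command line overrides
cfg = OmegaConf.merge(programmatic, from_file, from_cli)  # later sources take precedence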

Hydra

Hydra builds on top of OmegaConf and handles the main entry point of your application. It provides a decorator: you can annotate your main function with @hydra.main to wrap it so that it automatically loads a config file, handles command line arguments, etc.

Compared to using OmegaConf directly, Hydra provides:

- composition of the configuration from multiple files (config groups)
- command line overrides, including adding new keys with a + prefix
- multiruns / parameter sweeps (-m)
- launchers, e.g. to submit jobs to a slurm cluster (submitit plugin)
- structured configs (dataclasses) for typing and validation
- an output directory and logging set up for each run

💚
Example of Python code using Hydra
train.py
import hydra
from omegaconf import DictConfig, OmegaConf
@hydra.main(config_path="conf/", config_name="config", version_base="1.3")
def train(cfg: DictConfig) -> None:
    if 'quiet' in cfg and cfg.quiet:
        return
    print("# Accessing config values")
    print(cfg.model.type)
    print(cfg.model["learning_rate"])
    print(cfg.dataset.name)
    print("# Modifying and dumping to yaml")
    cfg.dataset.name = cfg.dataset.name.lower()
    print(repr(cfg))
    print(OmegaConf.to_yaml(cfg, resolve=True))  # resolve/interpolate

if __name__ == "__main__":
    train()

One can run the above code, using the conf/config.yaml configuration file, with:

Terminal window
python train.py
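
For reference, conf/config.yaml can simply be the config.yml shown in the OmegaConf section, moved into the conf/ folder:

conf/config.yaml
model:
  type: "resnet50"
  learning_rate: 0.001
  batch_size: 32
dataset:
  name: cifar10
logging:
  iterations: [100, 200, 1000]
  name: "bs_${model.batch_size}"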

Hydra multiruns (parameter sweeps), from the command line

Hydra provides an easy way to launch multiple runs with different configurations (parameter sweeps). You can do it from the command line, using the -m option and specifying a comma-separated list of values for some keys.

For instance, with:

Terminal window
python train.py -m model.learning_rate=0.001,0.0001 model.batch_size=16,32,64,128,256 +quiet=True

Hydra will then run all 10 combinations (2 learning rates × 5 batch sizes). Note the +quiet=True override: the + prefix adds a key that does not exist in the base config.
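
For larger sweeps, recent Hydra versions (1.1+) also offer an extended override grammar, with for instance range() and choice(); a sketch, to be checked against your Hydra version:

Terminal window
python train.py -m 'model.batch_size=range(16,257,16)' 'model.learning_rate=choice(0.001,0.0001)'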

Hydra multirun from the config file

If your sweeps are a core part of your workflow and you don’t want to pass them on the command line every time, you can also specify them in a config file, under the hydra key.

conf/config.yaml
# ...
hydra:
  sweeper:
    params:
      model.learning_rate: 0.1,0.01,0.001,0.0001

Still running with -m:

Terminal window
python train.py -m

Hydra multirun from config groups (chunks, facets)

You can also use Hydra config groups to specify sweeps. For instance, you can create a file conf/hydra/sweeplr.yaml (the folder name is very important, see config groups) with the following content:

conf/hydra/sweeplr.yaml
sweeper:
  params:
    model.learning_rate: 0.1,0.01,0.001,0.0001

Then, you can launch the multirun by asking to add the config with:

Terminal window
python train.py -m +hydra=sweeplr

To combine several chunks, you can pass a list:

Terminal window
python train.py -m +hydra='[sweeplr,sweepbatch]'

Above, the files are in the conf/hydra/ folder and can only contribute to the hydra key of the configuration. To decouple these two aspects (directory name and config key), you can use @package directives in files that contribute to the main configuration, for instance in conf/sweeps/lr.yaml and conf/sweeps/batch.yaml:

💚
conf/sweeps/lr.yaml
# @package hydra.sweeper.params
model.learning_rate: 0.1,0.01,0.001,0.0001
conf/sweeps/batch.yaml
# @package _global_
hydra:
  sweeper:
    params:
      model.batch_size: 16,32,64,128,256

Then you can launch the multirun with:

💚
Terminal window
python train.py -m +sweeps=lr
python train.py -m +sweeps='[batch,lr]'
python train.py -m -cn config2 +sweeps='[batch,lr]'
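
Here config2 refers to a hypothetical alternative primary config (selected with -cn). It could for instance reuse the base config and change a few values; a sketch, with _self_ placed last so that the values below override the base:

conf/config2.yaml
defaults:
  - config   # reuse the base config...
  - _self_   # ...and let the values below override it
model:
  type: "resnet18"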

Hydra structured configs

To have better type checking and autocompletion, you can define structured configs with python dataclasses (or pydantic models). See the structured config tutorial for all details.

We introduce here only one approach, which replaces the base YAML config file with a Python dataclass. Still, all overrides, multiruns, config groups, etc. work as before.

The main Python file imports the config dataclass and uses it as the type of the config argument of the main function, so that tools can type-check it.

train.py
from config import Config ######## <<<<-----
import hydra
from omegaconf import OmegaConf
@hydra.main(config_path="conf/", config_name="config", version_base="1.3")
def train(cfg: Config) -> None:  ###### <<<<-----
    if 'quiet' in cfg and cfg.quiet:
        return
    # print(cfg.problem)  # a type checker would flag this: Config has no 'problem' field
    print(OmegaConf.to_yaml(cfg, resolve=True))  # resolve/interpolate

if __name__ == "__main__":
    train()

The configuration dataclass itself looks like this (it needs to be defined and then registered with Hydra's ConfigStore).

💚
config.py
from dataclasses import dataclass, field
#from pydantic.dataclasses import dataclass  # for more pydantic features

@dataclass
class ModelConfig:
    type: str = "resnet50"
    learning_rate: float = 0.001
    batch_size: int = 32

@dataclass
class DatasetConfig:
    name: str = "cifar10"

@dataclass
class LoggingConfig:
    # mutable defaults need a default_factory in dataclasses
    iterations: list[int] = field(default_factory=lambda: [100, 200, 1000])
    name: str = "bs_${model.batch_size}"

@dataclass
class Config:
    model: ModelConfig = field(default_factory=ModelConfig)
    dataset: DatasetConfig = field(default_factory=DatasetConfig)
    logging: LoggingConfig = field(default_factory=LoggingConfig)
    quiet: bool | None = None

# Register it
from hydra.core.config_store import ConfigStore
cs = ConfigStore.instance()
cs.store(name="config", node=Config)
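
With the dataclass registered under the name "config", the usual invocations and overrides shown earlier keep working, for instance:

Terminal window
python train.py
python train.py model.learning_rate=2e-4
python train.py -m +sweeps=lr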
Example with pydantic BaseModels

It might be better to use BaseModel from pydantic instead of dataclass for better validation
BUT… it seems omegaconf does not accept pydantic models directly.

config.py
from pydantic import BaseModel

class ModelConfig(BaseModel):
    type: str = "resnet50"
    learning_rate: float = 0.001
    batch_size: int = 32

class DatasetConfig(BaseModel):
    name: str = "cifar10"

class LoggingConfig(BaseModel):
    iterations: list[int] = [100, 200, 1000]
    name: str = "bs_${model.batch_size}"

class Config(BaseModel):
    model: ModelConfig = ModelConfig()
    dataset: DatasetConfig = DatasetConfig()
    logging: LoggingConfig = LoggingConfig()
    quiet: bool | None = None

# Register it
from hydra.core.config_store import ConfigStore
cs = ConfigStore.instance()
cs.store(name="config", node=Config)

If one wants to have the config file (the Python dataclass) in the same folder as the usual config files (yaml), it can be done: one just has to place it in conf/ and import it accordingly.

train.py
from conf.config import Config ######## <<<<-----
# ... the rest is unchanged...

Hydra typing but keeping yaml base config file

This approach is hybrid: you keep the base config file as YAML, but you define a dataclass for typing only. It is not DRY (Don’t Repeat Yourself), but it might be preferred in some cases; indeed, the typing dataclass becomes much cleaner:

cleaner config.py (using dataclasses, so not validating)
config.py
from dataclasses import dataclass

@dataclass
class ModelConfig:
    type: str
    learning_rate: float
    batch_size: int

@dataclass
class DatasetConfig:
    name: str

@dataclass
class LoggingConfig:
    iterations: list[int]
    name: str

@dataclass
class Config:
    model: ModelConfig
    dataset: DatasetConfig
    logging: LoggingConfig
    quiet: bool | None

Then the main code is unchanged (it imports the Config dataclass from config.py).

It is recommended to use BaseModel from pydantic instead of dataclass for better validation.

💚
config.py
from pydantic import BaseModel

class ModelConfig(BaseModel):
    type: str
    learning_rate: float
    batch_size: int

class DatasetConfig(BaseModel):
    name: str

class LoggingConfig(BaseModel):
    iterations: list[int]
    name: str

class Config(BaseModel):
    model: ModelConfig
    dataset: DatasetConfig
    logging: LoggingConfig
    quiet: bool | None

And in the main code, to explicitly validate the config (to catch e.g. typos):

💚
train.py
...
def train(cfg: Config) -> None:
    Config.model_validate(cfg)  # raises a ValidationError if the config does not match the schema
    ...
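
Note that cfg here is still an OmegaConf DictConfig, not a plain dict. If pydantic complains about the input type, a common workaround (a sketch, not the only option) is to convert it to plain Python containers first:

train.py
from omegaconf import OmegaConf
...
def train(cfg: Config) -> None:
    # convert the DictConfig into plain dicts/lists before validating with pydantic
    Config.model_validate(OmegaConf.to_container(cfg, resolve=True))
    ...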

Running slurm with Hydra (launchers)

See the [Hydra submitit launcher](https://hydra.cc/docs/plugins/submitit_launcher/) documentation for all details.

For this step, we will typically run the script from the slurm frontend (labslurm), which has shared file-system access to the compute nodes (labcompute-<id>).

In addition to our config (which we keep, to still be able to run fully normally), we can create a new config that includes it and overrides the Hydra launcher to use the submitit slurm launcher:

conf/slurm24.yaml
defaults:
  - config
  - override hydra/launcher: submitit_slurm
hydra:
  launcher:
    nodes: 1
    name: ${hydra.job.name}
    _target_: hydra_plugins.hydra_submitit_launcher.submitit_launcher.SlurmLauncher
    partition: "GPU,GPU-DEPINFO"
    gres: "gpu:1"
    cpus_per_task: 2
    mem_per_cpu: 32G
    timeout_min: 1200
    constraint: "[gpu24G]"

We can also have some specific “groups” for typical overrides, for instance:

conf/slurm/GPU.yaml
# @package hydra.launcher
partition: "GPU"

Then we can launch a slurm job, from the front, with:

Terminal window
pip install omegaconf hydra-core
pip install hydra-submitit-launcher --upgrade
pip install setuptools
python train.py -m -cn slurm24
python train.py -m -cn slurm24 +slurm=GPU
python train.py -m -cn slurm24 +slurm=GPU +sweeps=batch

(use uv run if using uv)

Running different configs/entry-points with slurm

The previous example used one slurm config (slurm24.yaml). However, it requires creating a new config file (for slurm) every time we have a new base config. To separate concerns, we can instead put all the slurm configuration in a config-group file, typically conf/slurm/gpu24.yaml:

💚
conf/slurm/gpu24.yaml
# @package _global_
defaults:
  - override /hydra/launcher: submitit_slurm
hydra:
  launcher:
    nodes: 1
    name: ${hydra.job.name}
    _target_: hydra_plugins.hydra_submitit_launcher.submitit_launcher.SlurmLauncher
    partition: "GPU,GPU-DEPINFO"
    gres: "gpu:1"
    cpus_per_task: 2
    mem_per_cpu: 32G
    timeout_min: 1200
    constraint: "[gpu24G]"

Then, we can launch any config with that slurm config, for instance:

💚
Terminal window
pip install omegaconf hydra-core setuptools
pip install hydra-submitit-launcher --upgrade
python train.py -m +slurm=gpu24
python train.py -m +slurm=gpu24 +sweeps=batch
python train.py -m -cn config2 +slurm=gpu24 +sweeps=batch