Training Loop Functions

These functions train the neural network.

mentat_lss.training_loops.train_galaxy_ps_one_epoch(emulator: ps_emulator, train_loader: DataLoader, bin_idx: list)[source]

Runs one epoch of training for one sub-network in the galaxy_ps model.

Parameters:

emulator (ps_emulator) – emulator object to train
train_loader (torch.utils.data.DataLoader) – training data to loop through
bin_idx (list) – bin index [ps, z] or [z] identifying the sub-network to train.

Returns:

Average training-set loss, used for backpropagation.

Return type:

avg_loss (torch.Tensor)
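
As an illustration of the general pattern, here is a minimal sketch of a one-epoch loop in plain PyTorch. It is not the mentat_lss implementation: `train_one_epoch`, the `nn.Linear` model, the SGD optimizer, and the synthetic data are all assumptions made for the example, and the sub-network selection done through `bin_idx` is left out. The sketch backpropagates on each batch and returns the mean batch loss as a tensor.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model: nn.Module, optimizer: torch.optim.Optimizer,
                    train_loader: DataLoader, loss_fn: nn.Module) -> torch.Tensor:
    """Hypothetical helper: one pass over train_loader, returns the mean batch loss."""
    model.train()
    total_loss = torch.zeros(())
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()          # backpropagate the batch loss
        optimizer.step()
        total_loss += loss.detach()
    return total_loss / len(train_loader)


# Synthetic regression data standing in for a real training set.
torch.manual_seed(0)
x = torch.randn(64, 4)
y = x.sum(dim=1, keepdim=True)
loader = DataLoader(TensorDataset(x, y), batch_size=16)

model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
losses = [train_one_epoch(model, optimizer, loader, nn.MSELoss()) for _ in range(5)]
```

Each call returns a zero-dimensional tensor, so the caller can log it or compare it across epochs.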

mentat_lss.training_loops.train_on_multiple_devices(gpu_id: int, net_indeces: list, config_dir: str)[source]

Trains the given network on multiple GPU devices by splitting its sub-networks across them.

This function is called in parallel using multiprocessing. It trains specific sub-networks on separate GPUs, each saving to a separate sub-directory. After 25 epochs have passed on GPU 0, the results from all GPUs are compiled together and saved in the base save directory.

Parameters:

gpu_id (int) – GPU number, used for logging and for organizing the save location.
net_indeces (list) – List of sub-network indices to train on the given GPU. This is different for each GPU.
config_dir (str) – Location of the input network config file.
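
Since each GPU receives its own `net_indeces` list, the caller has to divide the sub-networks among the devices before launching the worker processes. The sketch below shows one possible way to do that, a round-robin split. The helper `split_net_indices` is hypothetical, and plain integers are used as indices for simplicity; how mentat_lss actually assigns sub-networks is not stated here.

```python
def split_net_indices(num_nets: int, num_gpus: int) -> list[list[int]]:
    """Hypothetical helper: assign sub-network indices to GPUs round-robin."""
    return [list(range(gpu_id, num_nets, num_gpus)) for gpu_id in range(num_gpus)]


# Five sub-networks shared between two GPUs.
assignments = split_net_indices(num_nets=5, num_gpus=2)

# Each (gpu_id, net_indeces) pair would be handed to one worker process.
worker_args = list(enumerate(assignments))
```

A round-robin split keeps the number of sub-networks per GPU within one of each other, which helps when every sub-network takes roughly the same time to train.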

mentat_lss.training_loops.train_on_single_device(emulator: ps_emulator, trial=None)[source]

Trains the emulator on a single device (CPU or GPU).

Parameters:

emulator (ps_emulator) – network object to train.
trial (optuna.trial.Trial, optional) – If not None, the current trial information from Optuna. Default None.
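
An optional `trial` argument is commonly used to report intermediate results to Optuna so that unpromising runs can be stopped early. The sketch below illustrates that pattern under stated assumptions: `run_training` and `RecordingTrial` are hypothetical, and the stand-in class only mimics the `report(value, step)` and `should_prune()` methods of `optuna.trial.Trial` so that the example runs without Optuna installed. Whether `train_on_single_device` uses the trial this way is not stated here.

```python
class RecordingTrial:
    """Stand-in exposing the two optuna.trial.Trial methods used below."""

    def __init__(self, prune_above: float):
        self.prune_above = prune_above
        self.reported = []

    def report(self, value: float, step: int) -> None:
        self.reported.append((step, value))

    def should_prune(self) -> bool:
        # Toy rule: prune once the latest reported loss exceeds a threshold.
        return self.reported[-1][1] > self.prune_above


def run_training(epoch_losses, trial=None) -> float:
    """Hypothetical loop: track the best loss, optionally reporting to a trial."""
    best = float("inf")
    for epoch, loss in enumerate(epoch_losses):
        best = min(best, loss)
        if trial is not None:
            trial.report(loss, epoch)
            if trial.should_prune():
                # A real Optuna objective would typically raise optuna.TrialPruned here.
                break
    return best


epoch_losses = [0.5, 0.4, 0.7, 0.3]
best_without_trial = run_training(epoch_losses)
stub = RecordingTrial(prune_above=0.6)
best_with_trial = run_training(epoch_losses, trial=stub)
```

With `trial=None` the loop runs every epoch; with the stand-in trial it stops at the third epoch, when the reported loss crosses the threshold.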