Training Loop Functions
These functions are used to train the neural network.
- mentat_lss.training_loops.train_galaxy_ps_one_epoch(emulator: ps_emulator, train_loader: DataLoader, bin_idx: list)[source]
Runs through one epoch of training for one sub-network in the galaxy_ps model.
- Parameters:
emulator (ps_emulator) – emulator object to train
train_loader (torch.utils.data.DataLoader) – training data to loop through
bin_idx (list) – bin index [ps, z] identifying the sub-network to train.
- Returns:
Average training-set loss, used for backpropagation.
- Return type:
avg_loss (torch.Tensor)
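The general shape of a one-epoch loop that returns an average loss can be sketched as follows. This is a minimal illustration in plain PyTorch, not the library's implementation: `train_one_epoch`, the `nn.Linear` stand-in model, and the synthetic data are all hypothetical, and the sketch calls `backward()` on each batch loss, which may differ from how `train_galaxy_ps_one_epoch` handles the `emulator` and `bin_idx` arguments.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, optimizer, loss_fn, train_loader):
    """Hypothetical helper: one pass over train_loader, returns the mean batch loss."""
    model.train()
    total_loss = torch.zeros(())
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()   # backpropagate the loss of this batch
        optimizer.step()
        total_loss += loss.detach()
    # Average over the number of batches in the epoch.
    return total_loss / len(train_loader)


# Synthetic regression data standing in for a real training set (assumption).
torch.manual_seed(0)
x = torch.randn(64, 3)
y = x.sum(dim=1, keepdim=True)
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

losses = [train_one_epoch(model, optimizer, loss_fn, loader) for _ in range(5)]
print([float(loss) for loss in losses])
```

Returning the loss as a `torch.Tensor` rather than a Python float matches the return type documented above.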
- mentat_lss.training_loops.train_on_multiple_devices(gpu_id: int, net_indeces: list, config_dir: str)[source]
Trains the given network on multiple GPU devices by splitting its sub-networks across them.
This function is called in parallel using multiprocessing, and works by training specific sub-networks on separate GPUs, each saving to a separate sub-directory. After 25 epochs have passed on GPU 0, the results from all GPUs are compiled together and saved in the base save directory.
- Parameters:
gpu_id (int) – GPU number for logging and organizing save location.
net_indeces (list) – List of sub-network indices to train on the given GPU. This list differs for each GPU.
config_dir (str) – Location of the input network config file.
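One way to build a different index list for each GPU is a round-robin split over all [ps, z] pairs. The sketch below is an assumption for illustration only: `split_net_indices` and its arguments are hypothetical names, and the documentation does not state how mentat_lss itself divides the sub-networks.

```python
from itertools import product


def split_net_indices(num_spectra, num_zbins, num_gpus):
    """Hypothetical helper: assign every [ps, z] sub-network index to one GPU, round-robin."""
    all_indices = [[ps, z] for ps, z in product(range(num_spectra), range(num_zbins))]
    # GPU k takes every num_gpus-th index, starting at position k.
    return [all_indices[gpu_id::num_gpus] for gpu_id in range(num_gpus)]


assignments = split_net_indices(num_spectra=3, num_zbins=2, num_gpus=2)
for gpu_id, net_indices in enumerate(assignments):
    print(f"gpu {gpu_id}: {net_indices}")
```

Each inner list could then be passed as `net_indeces`, together with the matching `gpu_id`, to one worker process.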
- mentat_lss.training_loops.train_on_single_device(emulator: ps_emulator)[source]
Trains the emulator on a single device (CPU or GPU).
- Parameters:
emulator (ps_emulator) – network object to train.