Training Loop Functions

These functions are used to train the neural network.

mentat_lss.training_loops.train_galaxy_ps_one_epoch(emulator: ps_emulator, train_loader: DataLoader, bin_idx: list)

Runs one epoch of training for a single sub-network in the galaxy_ps model.

Parameters:
  • emulator (ps_emulator) – emulator object to train

  • train_loader (torch.utils.data.DataLoader) – training data to loop through

  • bin_idx (list) – bin index [ps, z] identifying the sub-network to train.

Returns:

Average training-set loss, used for backpropagation.

Return type:

avg_loss (torch.Tensor)
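As an illustration of what a one-epoch loop of this kind typically looks like, here is a minimal PyTorch sketch. It is not the mentat_lss implementation: the function name `train_one_epoch`, the `nn.Linear` stand-in model, the MSE loss, and the toy data are all assumptions made for the example, and the sketch ignores the `bin_idx` sub-network selection.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model: nn.Module, optimizer: torch.optim.Optimizer,
                    train_loader: DataLoader) -> torch.Tensor:
    """Run one pass over train_loader and return the mean batch loss."""
    model.train()
    loss_fn = nn.MSELoss()  # assumed loss; the real emulator may use another
    total_loss = torch.zeros(())
    for inputs, target in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), target)
        loss.backward()   # backpropagate the batch loss
        optimizer.step()  # update the weights
        total_loss += loss.detach()
    # Average over the number of batches in the epoch.
    return total_loss / len(train_loader)


# Toy data standing in for (input parameters, power spectrum) pairs.
torch.manual_seed(0)
dataset = TensorDataset(torch.randn(64, 4), torch.randn(64, 8))
loader = DataLoader(dataset, batch_size=16)
model = nn.Linear(4, 8)  # hypothetical stand-in for one sub-network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
avg_loss = train_one_epoch(model, optimizer, loader)
```

In this sketch the returned value is a zero-dimensional `torch.Tensor`, matching the return type documented above.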

mentat_lss.training_loops.train_on_multiple_devices(gpu_id: int, net_indeces: list, config_dir: str)

Trains the given network on multiple GPU devices by splitting its sub-networks across them.

This function is called in parallel using multiprocessing, and works by training specific sub-networks on separate GPUs, each saving to a separate sub-directory. After 25 epochs have passed on GPU 0, the results from all GPUs are compiled together and saved in the base save directory.

Parameters:
  • gpu_id (int) – GPU number, used for logging and for organizing the save location.

  • net_indeces (list) – List of sub-network indices to train on the given GPU. This is different for each GPU.

  • config_dir (str) – Location of the input network config file.
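Since each GPU receives its own list of sub-network indices, the caller has to partition the full set of sub-networks before launching the worker processes. The sketch below shows one possible round-robin split. The helper `split_net_indices` is hypothetical and not part of mentat_lss, and it assumes each entry follows the [ps, z] form used for `bin_idx`.

```python
from itertools import product


def split_net_indices(num_ps: int, num_z: int, num_gpus: int) -> list:
    """Assign every [ps, z] sub-network index to one GPU, round-robin."""
    all_indices = [[ps, z] for ps, z in product(range(num_ps), range(num_z))]
    # GPU k takes entries k, k + num_gpus, k + 2 * num_gpus, ...
    return [all_indices[gpu_id::num_gpus] for gpu_id in range(num_gpus)]


chunks = split_net_indices(num_ps=2, num_z=3, num_gpus=2)
for gpu_id, net_indices in enumerate(chunks):
    # Each chunk is what one worker process would receive as net_indeces.
    print(gpu_id, net_indices)
# prints:
# 0 [[0, 0], [0, 2], [1, 1]]
# 1 [[0, 1], [1, 0], [1, 2]]
```

A round-robin split keeps the number of sub-networks per GPU within one of each other, which is a reasonable default when the sub-networks cost roughly the same to train.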

mentat_lss.training_loops.train_on_single_device(emulator: ps_emulator)

Trains the emulator on a single device (CPU or GPU).

Parameters:

emulator (ps_emulator) – network object to train.