Megatron-LM utilities
MegatronLMPlugin
class accelerate.utils.MegatronLMPlugin
< source >( tp_degree: int = None pp_degree: int = None num_micro_batches: int = None gradient_clipping: float = None sequence_parallelism: bool = None recompute_activations: bool = None use_distributed_optimizer: bool = None pipeline_model_parallel_split_rank: int = None num_layers_per_virtual_pipeline_stage: int = None is_train_batch_min: str = True train_iters: int = None train_samples: int = None weight_decay_incr_style: str = 'constant' start_weight_decay: float = None end_weight_decay: float = None lr_decay_style: str = 'linear' lr_decay_iters: int = None lr_decay_samples: int = None lr_warmup_iters: int = None lr_warmup_samples: int = None lr_warmup_fraction: float = None min_lr: float = 0 consumed_samples: list = None no_wd_decay_cond: typing.Optional[typing.Callable] = None scale_lr_cond: typing.Optional[typing.Callable] = None lr_mult: float = 1.0 megatron_dataset_flag: bool = False seq_length: int = None encoder_seq_length: int = None decoder_seq_length: int = None tensorboard_dir: str = None set_all_logging_options: bool = False eval_iters: int = 100 eval_interval: int = 1000 return_logits: bool = False custom_train_step_class: typing.Optional[typing.Any] = None custom_train_step_kwargs: typing.Optional[dict[str, typing.Any]] = None custom_model_provider_function: typing.Optional[typing.Callable] = None custom_prepare_model_function: typing.Optional[typing.Callable] = None custom_megatron_datasets_provider_function: typing.Optional[typing.Callable] = None custom_get_batch_function: typing.Optional[typing.Callable] = None custom_loss_function: typing.Optional[typing.Callable] = None other_megatron_args: typing.Optional[dict[str, typing.Any]] = None )
Parameters
- tp_degree (`int`, defaults to `None`) — Tensor parallelism degree.
- pp_degree (`int`, defaults to `None`) — Pipeline parallelism degree.
- num_micro_batches (`int`, defaults to `None`) — Number of micro-batches.
- gradient_clipping (`float`, defaults to `None`) — Gradient clipping value based on global L2 norm (0 to disable).
- sequence_parallelism (`bool`, defaults to `None`) — Enable sequence parallelism.
- recompute_activations (`bool`, defaults to `None`) — Enable selective activation recomputation.
- use_distributed_optimizer (`bool`, defaults to `None`) — Enable distributed optimizer.
- pipeline_model_parallel_split_rank (`int`, defaults to `None`) — Rank where encoder and decoder should be split.
- num_layers_per_virtual_pipeline_stage (`int`, defaults to `None`) — Number of layers per virtual pipeline stage.
- is_train_batch_min (`str`, defaults to `True`) — If both train & eval dataloaders are specified, this decides the `micro_batch_size`.
- train_iters (`int`, defaults to `None`) — Total number of iterations to train over all training runs. Note that either `train_iters` or `train_samples` should be provided when using `MegatronLMDummyScheduler`.
- train_samples (`int`, defaults to `None`) — Total number of samples to train over all training runs. Note that either `train_iters` or `train_samples` should be provided when using `MegatronLMDummyScheduler`.
- weight_decay_incr_style (`str`, defaults to `'constant'`) — Weight decay increment function; choices are `"constant"`, `"linear"`, and `"cosine"`.
- start_weight_decay (`float`, defaults to `None`) — Initial weight decay coefficient for L2 regularization.
- end_weight_decay (`float`, defaults to `None`) — End-of-run weight decay coefficient for L2 regularization.
- lr_decay_style (`str`, defaults to `'linear'`) — Learning rate decay function; choices are `"constant"`, `"linear"`, and `"cosine"`.
- lr_decay_iters (`int`, defaults to `None`) — Number of iterations for learning rate decay. If `None`, defaults to `train_iters`.
- lr_decay_samples (`int`, defaults to `None`) — Number of samples for learning rate decay. If `None`, defaults to `train_samples`.
- lr_warmup_iters (`int`, defaults to `None`) — Number of iterations to linearly warm up the learning rate over.
- lr_warmup_samples (`int`, defaults to `None`) — Number of samples to linearly warm up the learning rate over.
- lr_warmup_fraction (`float`, defaults to `None`) — Fraction of lr-warmup-(iters/samples) to linearly warm up the learning rate over.
- min_lr (`float`, defaults to `0`) — Minimum value for the learning rate. The scheduler clips values below this threshold.
- consumed_samples (`list`, defaults to `None`) — Number of samples consumed for each dataloader, in the same order as the dataloaders passed to the `accelerator.prepare` call.
- no_wd_decay_cond (`Optional`, defaults to `None`) — Condition to disable weight decay.
- scale_lr_cond (`Optional`, defaults to `None`) — Condition to scale the learning rate.
- lr_mult (`float`, defaults to `1.0`) — Learning rate multiplier.
- megatron_dataset_flag (`bool`, defaults to `False`) — Whether the dataset follows the Megatron-LM Indexed/Cached/MemoryMapped format.
- seq_length (`int`, defaults to `None`) — Maximum sequence length to process.
- encoder_seq_length (`int`, defaults to `None`) — Maximum sequence length to process for the encoder.
- decoder_seq_length (`int`, defaults to `None`) — Maximum sequence length to process for the decoder.
- tensorboard_dir (`str`, defaults to `None`) — Path to save TensorBoard logs.
- set_all_logging_options (`bool`, defaults to `False`) — Whether to set all logging options.
- eval_iters (`int`, defaults to `100`) — Number of iterations to run evaluation for on the validation/test sets.
- eval_interval (`int`, defaults to `1000`) — Interval between evaluation runs on the validation set.
- return_logits (`bool`, defaults to `False`) — Whether to return logits from the model.
- custom_train_step_class (`Optional`, defaults to `None`) — Custom train step class.
- custom_train_step_kwargs (`Optional`, defaults to `None`) — Custom train step kwargs.
- custom_model_provider_function (`Optional`, defaults to `None`) — Custom model provider function.
- custom_prepare_model_function (`Optional`, defaults to `None`) — Custom prepare model function.
- custom_megatron_datasets_provider_function (`Optional`, defaults to `None`) — Custom Megatron train/valid/test datasets provider function.
- custom_get_batch_function (`Optional`, defaults to `None`) — Custom get-batch function.
- custom_loss_function (`Optional`, defaults to `None`) — Custom loss function.
- other_megatron_args (`Optional`, defaults to `None`) — Other Megatron-LM arguments. Please refer to the Megatron-LM documentation.
Plugin for Megatron-LM that enables tensor, pipeline, sequence, and data parallelism, as well as selective activation recomputation and optimized fused kernels.
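For illustration, a minimal sketch of constructing the plugin and handing it to `Accelerator`; the degrees, micro-batch count, and flags below are arbitrary example values, not recommendations:

```python
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin

# Example values only: 2-way tensor parallelism, 2-way pipeline parallelism,
# 4 micro-batches per global batch, with sequence parallelism, selective
# activation recomputation, and the distributed optimizer enabled.
megatron_lm_plugin = MegatronLMPlugin(
    tp_degree=2,
    pp_degree=2,
    num_micro_batches=4,
    gradient_clipping=1.0,
    sequence_parallelism=True,
    recompute_activations=True,
    use_distributed_optimizer=True,
)

accelerator = Accelerator(megatron_lm_plugin=megatron_lm_plugin)
```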
MegatronLMDummyScheduler
class accelerate.utils.MegatronLMDummyScheduler
< source >( optimizer total_num_steps = None warmup_num_steps = 0 **kwargs )
Dummy scheduler that stands in for a real learning rate scheduler; it is primarily used to keep a conventional training loop when the actual scheduler is created internally by Megatron-LM from the plugin's scheduler arguments.
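A short sketch, assuming `model`, `optimizer`, and an `accelerator` configured with a Megatron-LM plugin already exist; `total_num_steps` and `warmup_num_steps` are placeholder values:

```python
from accelerate.utils import MegatronLMDummyScheduler

# Placeholder values; with the dummy scheduler, Megatron-LM builds the real
# learning rate scheduler from these and the plugin's scheduler arguments.
lr_scheduler = MegatronLMDummyScheduler(
    optimizer=optimizer,
    total_num_steps=1000,
    warmup_num_steps=100,
)

model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)
```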
MegatronLMDummyDataLoader
class accelerate.utils.MegatronLMDummyDataLoader
< source >( **dataset_kwargs )
Dummy dataloader that stands in for real dataloaders; it is primarily used to keep a conventional training loop when the actual train/validation/test datasets are built by Megatron-LM from the provided `dataset_kwargs`.
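A sketch, assuming the data has already been preprocessed into Megatron-LM's indexed/cached/memory-mapped format; the path, split string, and sizes below are illustrative:

```python
from accelerate.utils import MegatronLMDummyDataLoader

# The kwargs below follow Megatron-LM's dataset arguments; values are examples.
megatron_dataloader = MegatronLMDummyDataLoader(
    data_path=["my-gpt2_text_document"],
    splits_string="949,50,1",
    seq_length=1024,
    micro_batch_size=4,
)

# The same dummy dataloader is passed once per split (train/valid/test).
model, optimizer, lr_scheduler, train_dl, valid_dl, test_dl = accelerator.prepare(
    model, optimizer, lr_scheduler, megatron_dataloader, megatron_dataloader, megatron_dataloader
)
```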
AbstractTrainStep
class accelerate.utils.AbstractTrainStep
< source >( name )
Abstract base class for the batching, forward pass, and loss handling of a train step.
GPTTrainStep
class accelerate.utils.GPTTrainStep
< source >( accelerator args )
GPT train step class.
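The train step classes can be subclassed and registered through the plugin's `custom_train_step_class` and `custom_train_step_kwargs`. A sketch of a custom loss is shown below; the constructor mirrors the `( accelerator args )` signature above, the inner `loss_func(inputs, loss_mask, output_tensor)` shape follows the pattern from the Accelerate Megatron-LM guide, and both may differ across Accelerate versions:

```python
from accelerate.utils import GPTTrainStep, avg_losses_across_data_parallel_group


class GPTTrainStepWithCustomLoss(GPTTrainStep):
    def __init__(self, accelerator, args, **kwargs):
        super().__init__(accelerator, args)
        self.kwargs = kwargs

    def get_loss_func(self, *args, **kwargs):
        # Accept whatever the base class passes; signatures vary by version.
        def loss_func(inputs, loss_mask, output_tensor):
            # Mask out padding and average the LM loss over valid tokens.
            losses = output_tensor.float()
            loss_mask = loss_mask.view(-1).float()
            loss = (losses.view(-1) * loss_mask).sum() / loss_mask.sum()
            # Reduce across the data parallel group so all ranks log the same value.
            averaged_loss = avg_losses_across_data_parallel_group([loss])
            return loss, {"lm loss": averaged_loss[0]}

        return loss_func
```

The subclass would then be registered via the plugin, e.g. `MegatronLMPlugin(custom_train_step_class=GPTTrainStepWithCustomLoss, custom_train_step_kwargs={})`.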
BertTrainStep
class accelerate.utils.BertTrainStep
< source >( accelerator args )
Bert train step class.
T5TrainStep
class accelerate.utils.T5TrainStep
< source >( accelerator args )
T5 train step class.
avg_losses_across_data_parallel_group
accelerate.utils.avg_losses_across_data_parallel_group
< source >( losses )
Average losses across data parallel group.
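A small usage sketch, assuming `loss` is a scalar tensor computed inside a loss function on each data parallel rank (as in the train step classes above):

```python
from accelerate.utils import avg_losses_across_data_parallel_group

# Reduce the local scalar loss across the data parallel group so that
# every rank reports the same averaged value for logging.
averaged = avg_losses_across_data_parallel_group([loss])
lm_loss = averaged[0]
```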