
Update training/data_efficiency/variable_batch_size_and_lr/README.md
tjruwase authored Mar 11, 2025
1 parent 7670822 commit a4a058e
Showing 1 changed file with 1 addition and 1 deletion.
@@ -6,7 +6,7 @@ In many use cases, particularly LLMs, one is faced with inputs (sentences) of va
batch contained a set of sentence pairs containing approximately 25000 source tokens and 25000
target tokens.
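The token-count packing described above can be sketched as follows. This is an illustrative greedy sketch, not the DeepSpeed implementation; the names `pack_by_tokens` and `max_tokens` are hypothetical:

```python
def pack_by_tokens(lengths, max_tokens):
    """Greedily pack sample indices into batches whose summed token
    count stays within max_tokens (illustrative sketch)."""
    batches, current, current_tokens = [], [], 0
    # Sort by length so each batch holds similarly sized sentences,
    # which reduces padding waste within a batch.
    for idx in sorted(range(len(lengths)), key=lambda i: lengths[i]):
        n = lengths[idx]
        if current and current_tokens + n > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

# Six sentences of varying length, packed under a budget of 20 tokens.
lengths = [5, 12, 7, 20, 3, 9]
batches = pack_by_tokens(lengths, max_tokens=20)
```

Note that the resulting batches have a variable number of samples but a roughly constant token count per batch, which is what the Attention-is-all-you-need setup above achieves with its ~25000-token batches.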

Dynamic batch sizes have been requested in [DeepSpeed issue 1051](https://github.com/microsoft/DeepSpeed/issues/1051), [DeepSpeed issue 3455](https://github.com/microsoft/DeepSpeed/issues/3455), [PyTorch Lightning issue 16914](https://github.com/Lightning-AI/pytorch-lightning/issues/16914), and [Hugging Face Accelerate issue 2647](https://github.com/huggingface/accelerate/issues/2647), and are already available in many libraries, e.g. [NVIDIA Triton](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher) and [Meta FairSeq](https://github.com/facebookresearch/fairseq) (implementation [here](https://github.com/facebookresearch/fairseq/blob/34973a94d09ecc12092a5ecc8afece5e536b7692/fairseq/data/fairseq_dataset.py#L104)). Dynamic batching support is available in DeepSpeed versions >= [0.16.5](https://github.com/deepspeedai/DeepSpeed/releases/tag/v0.16.5).

The immediate use case is maximizing GPU utilization. Moreover, this is particularly relevant for curriculum learning, where a `BxSxE` (Batch x Sequence Length x Embedding) -shaped input should ideally have high `B` and low `S` at the early curriculum steps (many short sentences packed together as a batch), and low `B` and high `S` at the late steps (few long sentences in the batch). A dynamic size `S` is already supported by DeepSpeed, e.g. in the documentation for pipeline parallelism's [reset_activation_shape()](https://deepspeed.readthedocs.io/en/stable/pipeline.html#deepspeed.runtime.pipe.engine.PipelineEngine.reset_activation_shape):
> For curriculum learning that changes the seqlen of each sample, we need to call this whenever the seqlen is going to change.
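Since this tutorial pairs a variable batch size with a variable learning rate, a common heuristic (a generic sketch, not necessarily the exact rule the README's implementation uses) is to scale a reference learning rate with the ratio of the current batch's token count to a reference token count, either linearly or by its square root:

```python
def scaled_lr(base_lr, base_batch_tokens, batch_tokens, method="linear"):
    """Scale a reference learning rate for a batch of a different size.
    'linear' follows the linear-scaling heuristic; 'sqrt' is a gentler
    alternative often preferred with Adam-style optimizers."""
    ratio = batch_tokens / base_batch_tokens
    if method == "linear":
        return base_lr * ratio
    if method == "sqrt":
        return base_lr * ratio ** 0.5
    raise ValueError(f"unknown method: {method}")

# A batch with twice the reference token count gets twice the LR
# under linear scaling, and about 1.41x under sqrt scaling.
lr_linear = scaled_lr(1e-4, 25000, 50000)
lr_sqrt = scaled_lr(1e-4, 25000, 50000, method="sqrt")
```

Under this heuristic the effective LR rises and falls with each batch's size, so early curriculum steps (many short sentences, large token count per batch) and late steps (few long sentences) receive proportionally adjusted update magnitudes.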
