
[Distributed] Add unbalanced batch for virtual pp #58383

Merged

Conversation

ForFishes (Member)

PR types

New features

PR changes

Others

Description

[Distributed] Add unbalanced batch for virtual pp
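The change lets the interleaved (virtual) pipeline-parallel scheduler accept micro-batch configurations that are not evenly balanced across pipeline stages. As one illustration of the idea, here is a minimal sketch of remainder-spreading when a batch does not divide evenly; the helper name is hypothetical and not code from this PR:

```python
# Hypothetical helper, not from this PR: spread the remainder over the
# first few micro-batches when the batch size does not divide evenly.
def split_batch(batch_size: int, num_micro_batches: int) -> list[int]:
    base, rem = divmod(batch_size, num_micro_batches)
    return [base + (1 if i < rem else 0) for i in range(num_micro_batches)]

# 10 samples over 4 micro-batches -> sizes 3, 3, 2, 2
assert split_batch(10, 4) == [3, 3, 2, 2]
```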

paddle-bot (bot) commented on Oct 25, 2023:

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot added the contributor (External developers) label on Oct 25, 2023.
sneaxiy merged commit ca3fe11 into PaddlePaddle:incubate/new_frl on Oct 26, 2023.
ForFishes deleted the add_unbalance_batchsize branch on October 26, 2023 at 13:16.
gongweibao (Contributor) left a comment:

Is there a test of this new class PipelineParallelWithInterleaveFthenB?
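For illustration, a skeleton of the distributed setup such a test would need. The launch command and the training-loop outline are assumptions for this sketch, not code from this PR; accumulate_steps is chosen so the micro-batch count is not a multiple of the stage count, i.e. the unbalanced case:

```python
# Launched on two GPUs, e.g.:
#   python -m paddle.distributed.launch --devices 0,1 test_vpp_fthenb.py
import paddle.distributed.fleet as fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {"dp_degree": 1, "mp_degree": 1, "pp_degree": 2}
# accumulate_steps deliberately not a multiple of pp_degree: the
# unbalanced configuration this PR targets.
strategy.pipeline_configs = {"accumulate_steps": 3, "micro_batch_size": 2}
fleet.init(is_collective=True, strategy=strategy)

# A PipelineLayer model built with num_virtual_pipeline_stages > 1 would
# then be wrapped with fleet.distributed_model(...) and trained for a few
# steps, comparing losses against a single-card baseline.
```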



class PipelineParallelWithInterleaveFthenB(PipelineParallelWithInterleave):
    def __init__(self, layers, hcg, strategy):
An inline comment from the same review (Contributor):

Add some comments to explain why this is done and how it differs from PipelineParallelWithInterleave.

ForFishes (Member, Author) replied:

OK, I will fix it in the next PR.
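To sketch what such an explanatory comment might say, based only on the class name and the PR description (the wording is an assumption, not the docstring that eventually landed):

```python
class PipelineParallelWithInterleaveFthenB(PipelineParallelWithInterleave):
    """Interleaved (virtual) pipeline schedule that runs all forward
    micro-batches first and all backward micro-batches afterwards
    (F-then-B), instead of the parent class's steady-state 1F1B
    interleaving.

    Dropping the 1F1B steady state relaxes the balance constraints on
    the number of micro-batches per stage, at the cost of holding more
    activations in memory.
    """
```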

paddle-bot removed the contributor (External developers) label on Nov 3, 2023.
ForFishes added a commit to ForFishes/Paddle that referenced this pull request on Nov 7, 2023:
* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp
hitywt pushed commits to hitywt/Paddle that referenced this pull request between Nov 27 and Dec 5, 2023, each with the same message:

[Distributed] Add unbalanced batch for virtual pp (PaddlePaddle#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp
zhiqiu pushed a commit that referenced this pull request on Dec 6, 2023:
* part-3 cherry from: add check for cembedding (#55621)

* part-3 fix cherry from: add check for cembedding

* part-3 fix c_embedding

* fix test_gpt_with_pir caused by pir

* part-3 cherry from: [Distributed] Support dp/sharding overlap in virtual pp (#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log

* part-3 cherry from: [cherry-pick] Integration flash attention 2 (#56015)

* [FlashAttn] add flash randomness control (#52902)

* add flash randomness control

* fix VLOG undefined

* [WIP] Integration flash attention 2 (#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Anyway I pass paddle's utest, except return softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>

* part-4 cherry from: fix codestyle (#56066)

* part-4 cherry from (no change): Add assert for static and other platform (#56044)

* part-4 cherry-pick from: dp and sharding coexist (#56096)

* dp and sharding coexist

* dp

* part-4 cherry from: [Distributed] Add debug information for processgroupnccl (#56441)

* add debug information

* fix log

* fix log

* add detach for pp

* part-4 cherry from: [BugFix] Fix bug in paddle.device.cuda.synchronize() (#56451)

* fix bug in synchronize

* fix bug in synchronize

* part-4 cherry from: add fused gradient (#57048)

* part-4 cherry from: [Distributed] add eager_communication_connection for eager mode in nccl (#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection

* part-4 cherry from: Add auto growth allocator for CUDA pinned allocator (#57625)

* fix h2d bandwidth

* remove useless flags

* fix cherry-pick #56066

* part-5 cherry from: Add allocation debug FLAGS (#57797)

* Add allocation debug FLAGS

* add sync after value set

* refine flags

* part-5 cherry from: fix softmax backward (#57971)

* part-5 cherry from: [Distributed]Optimize memory in processgroup (#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* part-5 cherry from: [Distributed] Add unbalanced batch for virtual pp (#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* fix

* fix comments

* fix kunlun compatibility issues

* fix test_fused_rotary_position_embedding.py

* fix allocator.h

* tinyfix

* fix conflicts

* fix new ir translator c_embedding failure

---------

Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: umiswing <umiswing@foxmail.com>
Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>