
[Distributed] Add unbalanced batch for virtual pp #58383

Merged

Conversation

ForFishes (Member)

PR types

New features

PR changes

Others

Description

[Distributed] Add unbalanced batch for virtual pp
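The change lets the interleaved (virtual) pipeline-parallel scheduler accept micro-batch configurations that are not evenly balanced across pipeline stages. As one illustration of the idea, here is a minimal sketch of remainder-spreading when a batch does not divide evenly; the helper name is hypothetical and not code from this PR:

```python
# Hypothetical helper, not from this PR: spread the remainder over the
# first few micro-batches when the batch size does not divide evenly.
def split_batch(batch_size: int, num_micro_batches: int) -> list[int]:
    base, rem = divmod(batch_size, num_micro_batches)
    return [base + (1 if i < rem else 0) for i in range(num_micro_batches)]

# 10 samples over 4 micro-batches -> sizes 3, 3, 2, 2
assert split_batch(10, 4) == [3, 3, 2, 2]
```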

paddle-bot (bot) commented on Oct 25, 2023:

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

paddle-bot added the contributor (External developers) label on Oct 25, 2023.
sneaxiy merged commit ca3fe11 into PaddlePaddle:incubate/new_frl on Oct 26, 2023.
ForFishes deleted the add_unbalance_batchsize branch on October 26, 2023 at 13:16.
gongweibao (Contributor) left a comment:

Is there a test of this new class PipelineParallelWithInterleaveFthenB?
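For illustration, a skeleton of the distributed setup such a test would need. The launch command and the training-loop outline are assumptions for this sketch, not code from this PR; accumulate_steps is chosen so the micro-batch count is not a multiple of the stage count, i.e. the unbalanced case:

```python
# Launched on two GPUs, e.g.:
#   python -m paddle.distributed.launch --devices 0,1 test_vpp_fthenb.py
import paddle.distributed.fleet as fleet

strategy = fleet.DistributedStrategy()
strategy.hybrid_configs = {"dp_degree": 1, "mp_degree": 1, "pp_degree": 2}
# accumulate_steps deliberately not a multiple of pp_degree: the
# unbalanced configuration this PR targets.
strategy.pipeline_configs = {"accumulate_steps": 3, "micro_batch_size": 2}
fleet.init(is_collective=True, strategy=strategy)

# A PipelineLayer model built with num_virtual_pipeline_stages > 1 would
# then be wrapped with fleet.distributed_model(...) and trained for a few
# steps, comparing losses against a single-card baseline.
```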



class PipelineParallelWithInterleaveFthenB(PipelineParallelWithInterleave):
    def __init__(self, layers, hcg, strategy):
An inline comment from the same review (Contributor):

Add some comments to explain why this is done and how it differs from PipelineParallelWithInterleave.

ForFishes (Member, Author) replied:

OK, I will fix it in the next PR.
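To sketch what such an explanatory comment might say, based only on the class name and the PR description (the wording is an assumption, not the docstring that eventually landed):

```python
class PipelineParallelWithInterleaveFthenB(PipelineParallelWithInterleave):
    """Interleaved (virtual) pipeline schedule that runs all forward
    micro-batches first and all backward micro-batches afterwards
    (F-then-B), instead of the parent class's steady-state 1F1B
    interleaving.

    Dropping the 1F1B steady state relaxes the balance constraints on
    the number of micro-batches per stage, at the cost of holding more
    activations in memory.
    """
```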

paddle-bot removed the contributor (External developers) label on Nov 3, 2023.
ForFishes added a commit to ForFishes/Paddle that referenced this pull request on Nov 7, 2023:
* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp
hitywt pushed commits to hitywt/Paddle that referenced this pull request between Nov 27 and Dec 5, 2023, each with the same message:

[Distributed] Add unbalanced batch for virtual pp (PaddlePaddle#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp
zhiqiu pushed a commit that referenced this pull request on Dec 6, 2023:
* part-3 cherry from: add check for cembedding (#55621)

* part-3 fix cherry from: add check for cembedding

* part-3 fix c_embedding

* fix test_gpt_with_pir caused by pir

* part-3 cherry from: [Distributed] Support dp/sharding overlap in virtual pp (#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log

* part-3 cherry from: [cherry-pick] Integration flash attention 2 (#56015)

* [FlashAttn] add flash randomness control (#52902)

* add flash randomness control

* fix VLOG undefined

* [WIP] Integration flash attention 2 (#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Anyway I pass paddle's utest, except return softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>

* part-4 cherry from: fix codestyle (#56066)

* part-4 cherry from (no change): Add assert for static and other platform (#56044)

* part-4 cherry-pick from: dp and sharding coexist (#56096)

* dp and sharding coexist

* dp

* part-4 cherry from: [Distributed] Add debug information for processgroupnccl (#56441)

* add debug information

* fix log

* fix log

* add detach for pp

* part-4 cherry from: [BugFix] Fix bug in paddle.device.cuda.synchronize() (#56451)

* fix bug in synchronize

* fix bug in synchronize

* part-4 cherry from: add fused gradient (#57048)

* part-4 cherry from: [Distributed] add eager_communication_connection for eager mode in nccl (#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection

* part-4 cherry from: Add auto growth allocator for CUDA pinned allocator (#57625)

* fix h2d bandwidth

* remove useless flags

* fix cherry-pick #56066

* part-5 cherry from: Add allocation debug FLAGS (#57797)

* Add allocation debug FLAGS

* add sync after value set

* refine flags

* part-5 cherry from: fix softmax backward (#57971)

* part-5 cherry from: [Distributed]Optimize memory in processgroup (#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* part-5 cherry from: [Distributed] Add unbalanced batch for virtual pp (#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* fix

* fix comments

* fix kunlun compatibility issues

* fix test_fused_rotary_position_embedding.py

* fix allocator.h

* tinyfix

* fix conflicts

* fix new ir translator c_embedding failure

---------

Co-authored-by: ShenLiang <1422485404@qq.com>
Co-authored-by: umiswing <umiswing@foxmail.com>
Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
Co-authored-by: niuliling123 <51102941+niuliling123@users.noreply.github.com>
Co-authored-by: liuzhenhai93 <liuzhenhai93@outlook.com>
Co-authored-by: sneaxiy <32832641+sneaxiy@users.noreply.github.com>