
Merge into develop part-5 #59644

Merged
merged 27 commits into PaddlePaddle:develop from merge_into_develop_1204_5 on Dec 6, 2023

Conversation

@hitywt hitywt commented Dec 4, 2023

PR types

Others

PR changes

Others

Description

Pcard-70448

  1. Add a vocab_size check for c_embedding (a conceptual sketch follows this list)
  2. Support dp/sharding overlap in virtual pp
  3. Integrate flash-attention-2 into PaddlePaddle
  4. Expose softmax_lse & seed_offset in FlashAttention
  5. Support sharding and dp parallelism being used at the same time
  6. Add debug information for processgroupnccl
  7. Support detach of EagerParamBase in recompute
  8. Add FLAGS_benchmark_nccl for blocking NCCL communication
  9. Add eager_communication_connection for eager mode in NCCL
  10. Add an auto-growth allocator for the CUDA pinned allocator
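
The vocab_size check in item 1 matters because c_embedding is the vocab-parallel embedding used under tensor parallelism: each rank only owns a contiguous slice of the table, so an id outside [0, vocab_size) can silently land in the wrong shard instead of failing. Below is a minimal NumPy sketch of the idea, purely conceptual; the helper name local_embedding_lookup and the layout are illustrative assumptions, not Paddle's actual c_embedding kernel.

import numpy as np

def local_embedding_lookup(table_shard, ids, vocab_start, vocab_size):
    # table_shard: (shard_rows, hidden) slice of the global table owned by this rank
    # vocab_start: first global row index owned by this rank
    # vocab_size:  size of the global vocabulary
    # The kind of check item 1 refers to: every id must lie inside the global vocab.
    if (ids < 0).any() or (ids >= vocab_size).any():
        raise ValueError("token id out of range [0, vocab_size)")

    local_ids = ids - vocab_start
    in_shard = (local_ids >= 0) & (local_ids < table_shard.shape[0])

    out = np.zeros((ids.shape[0], table_shard.shape[1]), dtype=table_shard.dtype)
    out[in_shard] = table_shard[local_ids[in_shard]]
    # Rows owned by other ranks stay zero; the real op sums partial results across ranks.
    return out

vocab_size, hidden = 8, 4
shard = np.arange(4 * hidden, dtype=np.float32).reshape(4, hidden)  # global rows 4..7
print(local_embedding_lookup(shard, np.array([4, 6]), vocab_start=4, vocab_size=vocab_size))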


paddle-bot bot commented Dec 4, 2023

Your PR has been submitted successfully — thank you for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@hitywt hitywt changed the title from "Merge into develop 1204 5" to "Merge into develop part-5" on Dec 4, 2023
@hitywt hitywt force-pushed the merge_into_develop_1204_5 branch from 3fd5ddb to 00d77de on December 5, 2023, 01:38
ForFishes and others added 27 commits December 5, 2023 17:12
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
…lePaddle#56015)

* [FlashAttn] add flash randomness control (PaddlePaddle#52902)

* add flash randomness control

* fix VLOG undefined

* [WIP] Integration flash attention 2 (PaddlePaddle#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Paddle's unit tests now pass, except for returning softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
…oupnccl (PaddlePaddle#56441)

* add debug information

* fix log

* fix log

* add detach for pp
…for eager mode in nccl (PaddlePaddle#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection
* Add allocation debug FLAGS

* add sync after value set

* refine flags
…dlePaddle#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl
…addlePaddle#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp
@hitywt hitywt force-pushed the merge_into_develop_1204_5 branch from 00d77de to bfa1210 on December 5, 2023, 09:13
Contributor

@jeff41404 jeff41404 left a comment

LGTM for synchronize API

#if defined(PADDLE_WITH_NCCL)
  if (allocation != nullptr) {
    if (FLAGS_sync_after_alloc || FLAGS_alloc_fill_value >= 0) {
      cudaDeviceSynchronize();
Collaborator

Why was PADDLE_ENFORCE_GPU_SUCCESS removed? Same below.

Author

Why was PADDLE_ENFORCE_GPU_SUCCESS removed? Same below.

It kept raising errors when left in and the root cause has not been found yet, so it was removed temporarily; it will be added back once testing passes.
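
For context, FLAGS_sync_after_alloc and FLAGS_alloc_fill_value in the hunk above are the allocation-debug switches this change introduces. A minimal sketch of turning them on from Python, assuming they follow the usual Paddle convention of being read from FLAGS_* environment variables at startup (how they are consumed is not shown in this diff):

import os

# Assumption: like other Paddle FLAGS_* switches, these are read from the
# environment when the framework initializes, so set them before importing paddle.
os.environ["FLAGS_sync_after_alloc"] = "1"   # synchronize the device after each allocation
os.environ["FLAGS_alloc_fill_value"] = "0"   # fill freshly allocated memory with a fixed value

import paddle

x = paddle.randn([2, 3])  # allocations from here on run with the debug behavior above
print(x)

Synchronizing after every allocation is costly, so these are debugging aids rather than settings to keep enabled in training runs.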

@@ -134,6 +141,31 @@ using AllocationPtr = phi::Allocator::AllocationPtr;
using DecoratedAllocationPtr =
    std::unique_ptr<Allocation, phi::Allocator::DeleterType>;

template <typename T>
static T&& FillValue(T&& allocation) {
#if defined(PADDLE_WITH_NCCL)
Collaborator

#if defined(PADDLE_WITH_CUDA)?

Author

#if defined(PADDLE_WITH_CUDA)?

Got it; I will submit a follow-up fix PR to resolve this.

@@ -910,7 +921,7 @@ def __init__(self, layers, hcg, strategy):
        self._virtual_pp_rank = 0
        self._reset_counter()

        self._check_sanity()
        self._assign_vpp_info(self.model_chunks)
Collaborator

Is the assign_vpp_info function no longer needed?

Author

Is the assign_vpp_info function no longer needed?

You mean keep self._check_sanity() and delete self._assign_vpp_info(self.model_chunks) here, right?

Contributor

@lanxianghit lanxianghit left a comment

LGTM for flag

hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 6, 2023
Member

@ForFishes ForFishes left a comment

LGTM

@@ -934,7 +934,6 @@
  kernel :
    func : flash_attn
    data_type : q
  intermediate : softmax_lse, seed_offset
Contributor

These output parameters do not seem to be used externally through the Python interface, so why was the intermediate configuration removed?

Author

These output parameters do not seem to be used externally through the Python interface, so why was the intermediate configuration removed?

They will be used externally by upcoming features.

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

Contributor

@jzhang533 jzhang533 left a comment

LGTM

Contributor

@zhiqiu zhiqiu left a comment

LGTM

@zhiqiu zhiqiu merged commit a5a124e into PaddlePaddle:develop Dec 6, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 6, 2023
sneaxiy pushed a commit that referenced this pull request Dec 11, 2023
* tinyfix for PR #59644

* tinyfix

* tinyfix

* update
ForFishes pushed a commit that referenced this pull request Dec 12, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 13, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 13, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 13, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 13, 2023
sneaxiy pushed a commit that referenced this pull request Dec 14, 2023
* Fix comments for PR #59644 (#59885)

* update

* update

* Fix comments for PR #59644 (#59750)

* tinyfix for PR #59644

* tinyfix

* tinyfix

* update

* update
@hitywt hitywt mentioned this pull request Dec 16, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 25, 2023