
Merge into develop part-5 #59644

Merged
merged 27 commits into PaddlePaddle:develop from merge_into_develop_1204_5 on Dec 6, 2023

Conversation

@hitywt hitywt commented Dec 4, 2023

PR types

Others

PR changes

Others

Description

Pcard-70448

  1. Add a vocab_size check for c_embedding (a conceptual sketch follows this list)
  2. Support dp/sharding overlap in virtual pp
  3. Integrate flash-attention-2 into PaddlePaddle
  4. Expose softmax_lse & seed_offset in FlashAttention
  5. Support sharding and dp parallelism being used at the same time
  6. Add debug information for processgroupnccl
  7. Support detach of EagerParamBase in recompute
  8. Add FLAGS_benchmark_nccl for blocking NCCL communication
  9. Add eager_communication_connection for eager mode in NCCL
  10. Add an auto-growth allocator for the CUDA pinned allocator
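
The vocab_size check in item 1 matters because c_embedding is the vocab-parallel embedding used under tensor parallelism: each rank only owns a contiguous slice of the table, so an id outside [0, vocab_size) can silently land in the wrong shard instead of failing. Below is a minimal NumPy sketch of the idea, purely conceptual; the helper name local_embedding_lookup and the layout are illustrative assumptions, not Paddle's actual c_embedding kernel.

import numpy as np

def local_embedding_lookup(table_shard, ids, vocab_start, vocab_size):
    # table_shard: (shard_rows, hidden) slice of the global table owned by this rank
    # vocab_start: first global row index owned by this rank
    # vocab_size:  size of the global vocabulary
    # The kind of check item 1 refers to: every id must lie inside the global vocab.
    if (ids < 0).any() or (ids >= vocab_size).any():
        raise ValueError("token id out of range [0, vocab_size)")

    local_ids = ids - vocab_start
    in_shard = (local_ids >= 0) & (local_ids < table_shard.shape[0])

    out = np.zeros((ids.shape[0], table_shard.shape[1]), dtype=table_shard.dtype)
    out[in_shard] = table_shard[local_ids[in_shard]]
    # Rows owned by other ranks stay zero; the real op sums partial results across ranks.
    return out

vocab_size, hidden = 8, 4
shard = np.arange(4 * hidden, dtype=np.float32).reshape(4, hidden)  # global rows 4..7
print(local_embedding_lookup(shard, np.array([4, 6]), vocab_start=4, vocab_size=vocab_size))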


paddle-bot bot commented Dec 4, 2023

Your PR has been submitted successfully — thank you for your contribution to the open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@hitywt hitywt changed the title from "Merge into develop 1204 5" to "Merge into develop part-5" on Dec 4, 2023
@hitywt hitywt force-pushed the merge_into_develop_1204_5 branch from 3fd5ddb to 00d77de on December 5, 2023, 01:38
ForFishes and others added 27 commits December 5, 2023 17:12
…ual pp (PaddlePaddle#55651)

* Add virtual pp and dp overlap

* add sharding/dp overlap

* add dp/vpp overlap

* fix code

* fix log
…lePaddle#56015)

* [FlashAttn] add flash randomness control (PaddlePaddle#52902)

* add flash randomness control

* fix VLOG undefined

* [WIP] Integration flash attention 2 (PaddlePaddle#55758)

* Work for fa-2 padded fwd. Code to be cleaned.

* Work for fa2 unpadded fwd.

* Work for padded-bwd, dk get small diff on np.random.seed(0)

* Paddle's unit tests now pass, except for returning softmax without dropout.

* Clean code.

* Modify interface.

* Clean code and add some check.

* Easy compile for dev.

* Fix ci.

* Fix ci-build.

* Add std c++17 option again.

* Limit max job when compiling fa2.

* Remove const_cast

* Add fwd params, to be cleaned.

* Clean code.

* Add bwd params.

* Clean code.

* Add enforce.

* Use v2.0.4

* Pass RNG state to fa2 capi

* Fix review.

* Add assert

* Skip compile for sm less than 80.

---------

Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>
…oupnccl (PaddlePaddle#56441)

* add debug information

* fix log

* fix log

* add detach for pp
…for eager mode in nccl (PaddlePaddle#57517)

* add eager_nccl_connection

* add eager_connection

* add eager_connection
* Add allocation debug FLAGS

* add sync after value set

* refine flags
…dlePaddle#58299)

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl

* optimize memory in processgroupnccl
…addlePaddle#58383)

* add unbalanced batch for vpp

* add unbalanced batch for vpp

* add unbalanced batch for vpp
@hitywt hitywt force-pushed the merge_into_develop_1204_5 branch from 00d77de to bfa1210 on December 5, 2023, 09:13
Contributor

@jeff41404 jeff41404 left a comment

LGTM for synchronize API

#if defined(PADDLE_WITH_NCCL)
  if (allocation != nullptr) {
    if (FLAGS_sync_after_alloc || FLAGS_alloc_fill_value >= 0) {
      cudaDeviceSynchronize();
Collaborator

Why was PADDLE_ENFORCE_GPU_SUCCESS removed? Same below.

Author

Why was PADDLE_ENFORCE_GPU_SUCCESS removed? Same below.

It kept raising errors when left in and the root cause has not been found yet, so it was removed temporarily; it will be added back once testing passes.
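
For context, FLAGS_sync_after_alloc and FLAGS_alloc_fill_value in the hunk above are the allocation-debug switches this change introduces. A minimal sketch of turning them on from Python, assuming they follow the usual Paddle convention of being read from FLAGS_* environment variables at startup (how they are consumed is not shown in this diff):

import os

# Assumption: like other Paddle FLAGS_* switches, these are read from the
# environment when the framework initializes, so set them before importing paddle.
os.environ["FLAGS_sync_after_alloc"] = "1"   # synchronize the device after each allocation
os.environ["FLAGS_alloc_fill_value"] = "0"   # fill freshly allocated memory with a fixed value

import paddle

x = paddle.randn([2, 3])  # allocations from here on run with the debug behavior above
print(x)

Synchronizing after every allocation is costly, so these are debugging aids rather than settings to keep enabled in training runs.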

@@ -134,6 +141,31 @@ using AllocationPtr = phi::Allocator::AllocationPtr;
using DecoratedAllocationPtr =
    std::unique_ptr<Allocation, phi::Allocator::DeleterType>;

template <typename T>
static T&& FillValue(T&& allocation) {
#if defined(PADDLE_WITH_NCCL)
Collaborator

#if defined(PADDLE_WITH_CUDA)?

Author

#if defined(PADDLE_WITH_CUDA)?

Got it; I will submit a follow-up fix PR to resolve this.

@@ -910,7 +921,7 @@ def __init__(self, layers, hcg, strategy):
        self._virtual_pp_rank = 0
        self._reset_counter()

        self._check_sanity()
        self._assign_vpp_info(self.model_chunks)
Collaborator

Is the assign_vpp_info function no longer needed?

Author

Is the assign_vpp_info function no longer needed?

You mean keep self._check_sanity() and delete self._assign_vpp_info(self.model_chunks) here, right?

Contributor

@lanxianghit lanxianghit left a comment

LGTM for flag

hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 6, 2023
Member

@ForFishes ForFishes left a comment

LGTM

@@ -934,7 +934,6 @@
  kernel :
    func : flash_attn
    data_type : q
  intermediate : softmax_lse, seed_offset
Contributor

These output parameters do not seem to be used externally through the Python interface, so why was the intermediate configuration removed?

Author

These output parameters do not seem to be used externally through the Python interface, so why was the intermediate configuration removed?

They will be used externally by upcoming features.

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

LGTM

Contributor

@jzhang533 jzhang533 left a comment

LGTM

Contributor

@zhiqiu zhiqiu left a comment

LGTM

@zhiqiu zhiqiu merged commit a5a124e into PaddlePaddle:develop Dec 6, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 6, 2023
sneaxiy pushed a commit that referenced this pull request Dec 11, 2023
* tinyfix for PR #59644

* tinyfix

* tinyfix

* update
ForFishes pushed a commit that referenced this pull request Dec 12, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 13, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 13, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 13, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 13, 2023
sneaxiy pushed a commit that referenced this pull request Dec 14, 2023
* Fix comments for PR #59644 (#59885)

* update

* update

* Fix comments for PR #59644 (#59750)

* tinyfix for PR #59644

* tinyfix

* tinyfix

* update

* update
@hitywt hitywt mentioned this pull request Dec 16, 2023
hitywt pushed a commit to hitywt/Paddle that referenced this pull request Dec 25, 2023