Merge into develop part-5 #59644
Conversation
Your PR was submitted successfully. Thank you for your contribution to the open source project!
Force-pushed from 3fd5ddb to 00d77de
…ual pp (PaddlePaddle#55651)
* Add virtual pp and dp overlap
* add sharding/dp overlap
* add dp/vpp overlap
* fix code
* fix log

…lePaddle#56015)
* [FlashAttn] add flash randomness control (PaddlePaddle#52902)
* add flash randomness control
* fix VLOG undefined
* [WIP] Integration flash attention 2 (PaddlePaddle#55758)
* Work for fa-2 padded fwd. Code to be cleaned.
* Work for fa2 unpadded fwd.
* Work for padded-bwd, dk get small diff on np.random.seed(0)
* Anyway I pass paddle's utest, except return softmax without dropout.
* Clean code.
* Modify interface.
* Clean code and add some check.
* Easy compile for dev.
* Fix ci.
* Fix ci-build.
* Add std c++17 option again.
* Limit max job when compiling fa2.
* Remove const_cast
* Add fwd params, to be cleaned.
* Clean code.
* Add bwd params.
* Clean code.
* Add enforce.
* Use v2.0.4
* Pass RNG state to fa2 capi
* Fix review.
* Add assert
* Skip compile for sm less than 80.
Co-authored-by: Chitsing KUI <kuizhiqing@msn.com>

* dp and sharding coexist
* dp

…oupnccl (PaddlePaddle#56441)
* add debug information
* fix log
* fix log
* add detach for pp

…() (PaddlePaddle#56451)
* fix bug in synchronize
* fix bug in synchronize

…for eager mode in nccl (PaddlePaddle#57517)
* add eager_nccl_connection
* add eager_connection
* add eager_connection

…or (PaddlePaddle#57625)
* fix h2d bandwidth
* remove useless flags

* Add allocation debug FLAGS
* add sync after value set
* refine flags

…dlePaddle#58299)
* optimize memory in processgroupnccl
* optimize memory in processgroupnccl
* optimize memory in processgroupnccl
* optimize memory in processgroupnccl

…addlePaddle#58383)
* add unbalanced batch for vpp
* add unbalanced batch for vpp
* add unbalanced batch for vpp
Force-pushed from 00d77de to bfa1210
LGTM for synchronize API
#if defined(PADDLE_WITH_NCCL)
  if (allocation != nullptr) {
    if (FLAGS_sync_after_alloc || FLAGS_alloc_fill_value >= 0) {
      cudaDeviceSynchronize();
Why was PADDLE_ENFORCE_GPU_SUCCESS removed? Same question below.
With it in place, errors kept being reported and we haven't found the cause yet, so it was removed temporarily; we'll add it back once it tests clean.
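For reference, a minimal sketch of the guarded synchronization with the check restored (assuming the usual PADDLE_ENFORCE_GPU_SUCCESS macro; illustrative only, not the code actually merged):

#if defined(PADDLE_WITH_NCCL)
  if (allocation != nullptr) {
    if (FLAGS_sync_after_alloc || FLAGS_alloc_fill_value >= 0) {
      // Surface any pending CUDA error from the debug fill/sync path
      // instead of silently dropping it -- this is the check the
      // review asks about.
      PADDLE_ENFORCE_GPU_SUCCESS(cudaDeviceSynchronize());
    }
  }
#endif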
@@ -134,6 +141,31 @@ using AllocationPtr = phi::Allocator::AllocationPtr;
using DecoratedAllocationPtr =
    std::unique_ptr<Allocation, phi::Allocator::DeleterType>;

template <typename T>
static T&& FillValue(T&& allocation) {
#if defined(PADDLE_WITH_NCCL)
Should this be #if defined(PADDLE_WITH_CUDA)?
Got it; I'll open a follow-up PR with the fix.
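A minimal sketch of the corrected guard, assuming the fix is simply swapping the macro (the helper body below is a guess for illustration; the real FillValue implementation may differ):

template <typename T>
static T&& FillValue(T&& allocation) {
#if defined(PADDLE_WITH_CUDA)  // was PADDLE_WITH_NCCL; only CUDA is actually required here
  if (allocation != nullptr && FLAGS_alloc_fill_value >= 0) {
    // Hypothetical body: poison freshly allocated device memory with a
    // known byte pattern so use-before-initialize bugs become reproducible.
    cudaMemset(allocation->ptr(), FLAGS_alloc_fill_value, allocation->size());
  }
#endif
  return std::forward<T>(allocation);
}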
@@ -910,7 +921,7 @@ def __init__(self, layers, hcg, strategy):
        self._virtual_pp_rank = 0
        self._reset_counter()

        self._check_sanity()
        self._assign_vpp_info(self.model_chunks)
Is the assign_vpp_info function no longer needed?
So the change here keeps self._check_sanity() and deletes self._assign_vpp_info(self.model_chunks), right?
LGTM for FLAGS
LGTM
@@ -934,7 +934,6 @@
  kernel :
    func : flash_attn
    data_type : q
  intermediate : softmax_lse, seed_offset
Looking at the Python interface, these output parameters aren't used externally either, so why was the intermediate configuration removed?
They will be used externally by upcoming features.
LGTM
LGTM
LGTM
* update
* update

* tinyfix for PR PaddlePaddle#59644
* tinyfix
* tinyfix
* update
PR types: Others
PR changes: Others
Description: Pcard-70448