-
Notifications
You must be signed in to change notification settings - Fork 802
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release tensor by ReleaseTensor instruction #4737
Conversation
…leasing when exiting Signed-off-by: daquexian <daquexian566@gmail.com>
@@ -77,6 +77,12 @@ EagerMirroredTensorImpl::EagerMirroredTensorImpl( | |||
eager_blob_object_(eager_blob_object) { | |||
dtype_ = CHECK_JUST(DType::GetDTypeByDataType(eager_blob_object->blob_desc().data_type())); | |||
tensor_storage_ = std::make_shared<TensorStorage>(eager_blob_object->tensor_buffer()); | |||
tensor_storage_->set_releaser_hook( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto& eager_blob_object = this->eager_blob_object_;
const auto& parallel_desc = this->parallel_desc();
@@ -77,6 +77,12 @@ EagerMirroredTensorImpl::EagerMirroredTensorImpl( | |||
eager_blob_object_(eager_blob_object) { | |||
dtype_ = CHECK_JUST(DType::GetDTypeByDataType(eager_blob_object->blob_desc().data_type())); | |||
tensor_storage_ = std::make_shared<TensorStorage>(eager_blob_object->tensor_buffer()); | |||
tensor_storage_->set_releaser_hook( | |||
[this](const std::shared_ptr<eager::TensorBuffer>& tensor_buffer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[eager_blob_object, parallel_desc]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this只是裸指针,不能保证执行的时候该指针还有效。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有道理
@@ -77,6 +77,12 @@ EagerMirroredTensorImpl::EagerMirroredTensorImpl( | |||
eager_blob_object_(eager_blob_object) { | |||
dtype_ = CHECK_JUST(DType::GetDTypeByDataType(eager_blob_object->blob_desc().data_type())); | |||
tensor_storage_ = std::make_shared<TensorStorage>(eager_blob_object->tensor_buffer()); | |||
tensor_storage_->set_releaser_hook( | |||
[this](const std::shared_ptr<eager::TensorBuffer>& tensor_buffer) { | |||
PhysicalRun([this](const std::shared_ptr<InstructionsBuilder>& builder) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[&]
* release tensor by instructions, update shut_down_util, skip tensor releasing when exiting Signed-off-by: daquexian <daquexian566@gmail.com> * Captures shared_ptr instread of raw pointer Co-authored-by: lixinqi <lixinqi0703106@163.com> Former-commit-id: 5170ffa
1. <del>#4737 这个很早之前的 PR 在 python 退出后会跳过 tensor 的析构,导致 oom,当时这么做的原因已经不记得了,现在来看不跳过也想不到会有什么问题,所以本 PR 恢复为正常执行析构。</del> 更新:CI 遇到问题,待进一步定位,这个 PR 先不包含这部分改动了。 原 oom 问题可能和 #9681 这个 PR 把设置 is_shutting_down 的时机提前了有关 3. 修复线程不安全导致的 python stack getter 偶发 segfault 的 bug --------- Signed-off-by: daquexian <daquexian566@gmail.com> Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
添加对 ReleaseTensor 指令的调用、在 python 退出时跳过 tensor 析构