release tensor by ReleaseTensor instruction #4737

daquexian · 2021-04-26T11:35:26Z

Add a call to the ReleaseTensor instruction, and skip tensor destruction when Python exits.

…leasing when exiting Signed-off-by: daquexian <daquexian566@gmail.com>
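For readers unfamiliar with the "skip releasing when exiting" idea, here is a minimal sketch of how a releaser hook might be guarded by a shutdown flag. It is not the code in this PR: `ShuttingDownFlag`, `SetShuttingDown`, `IsShuttingDown`, `Storage`, and `CountReleases` are all hypothetical names, and the real hook receives a tensor buffer and issues a ReleaseTensor instruction rather than bumping a counter.

```cpp
#include <atomic>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical process-wide flag; stands in for whatever shut_down_util exposes.
std::atomic<bool>& ShuttingDownFlag() {
  static std::atomic<bool> flag{false};
  return flag;
}
void SetShuttingDown(bool value) { ShuttingDownFlag().store(value); }
bool IsShuttingDown() { return ShuttingDownFlag().load(); }

// Hypothetical storage type that runs a hook when it is destroyed.
class Storage {
 public:
  void set_releaser_hook(std::function<void()> hook) { hook_ = std::move(hook); }
  ~Storage() {
    // Skip the hook once the process is exiting: the machinery that would
    // execute the release instruction may already have been torn down.
    if (hook_ && !IsShuttingDown()) { hook_(); }
  }

 private:
  std::function<void()> hook_;
};

// Creates one storage, destroys it, and reports how many times the hook ran.
int CountReleases(bool shutting_down) {
  auto counter = std::make_shared<int>(0);
  {
    Storage storage;
    storage.set_releaser_hook([counter]() { ++*counter; });
    SetShuttingDown(shutting_down);
  }  // storage is destroyed here
  SetShuttingDown(false);  // restore the flag for later callers
  return *counter;
}
```

With the flag clear, `CountReleases(false)` runs the hook once; with the flag set, `CountReleases(true)` skips it, which is the trade-off discussed later in #9955 (memory is not released through the hook at exit).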

lixinqi · 2021-04-26T13:42:00Z

oneflow/core/framework/tensor_impl.cpp

@@ -77,6 +77,12 @@ EagerMirroredTensorImpl::EagerMirroredTensorImpl(
      eager_blob_object_(eager_blob_object) {
  dtype_ = CHECK_JUST(DType::GetDTypeByDataType(eager_blob_object->blob_desc().data_type()));
  tensor_storage_ = std::make_shared<TensorStorage>(eager_blob_object->tensor_buffer());
+  tensor_storage_->set_releaser_hook(


const auto& eager_blob_object = this->eager_blob_object_;
const auto& parallel_desc = this->parallel_desc();

lixinqi · 2021-04-26T13:42:34Z

oneflow/core/framework/tensor_impl.cpp

@@ -77,6 +77,12 @@ EagerMirroredTensorImpl::EagerMirroredTensorImpl(
      eager_blob_object_(eager_blob_object) {
  dtype_ = CHECK_JUST(DType::GetDTypeByDataType(eager_blob_object->blob_desc().data_type()));
  tensor_storage_ = std::make_shared<TensorStorage>(eager_blob_object->tensor_buffer());
+  tensor_storage_->set_releaser_hook(
+      [this](const std::shared_ptr<eager::TensorBuffer>& tensor_buffer) {


[eager_blob_object, parallel_desc]

`this` is only a raw pointer; there is no guarantee that it is still valid when the hook runs.
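A minimal, self-contained sketch of the suggestion above, assuming simplified stand-in types: `BlobObject`, `Storage`, `TensorImpl`, and `ReleaseAfterOwnerIsGone` are hypothetical and are not the real OneFlow classes, and the hook here takes no arguments whereas the one in the diff receives a `std::shared_ptr<eager::TensorBuffer>`. The point is only that the hook copies the `shared_ptr` it needs instead of capturing `this`, so it stays valid even if it runs after the owning object is gone.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the eager blob object.
struct BlobObject {
  std::string name;
};

// Hypothetical stand-in for the tensor storage: runs the hook on destruction.
class Storage {
 public:
  void set_releaser_hook(std::function<void()> hook) { hook_ = std::move(hook); }
  ~Storage() {
    if (hook_) { hook_(); }
  }

 private:
  std::function<void()> hook_;
};

class TensorImpl {
 public:
  TensorImpl(std::shared_ptr<BlobObject> blob, std::shared_ptr<std::string> log)
      : blob_(std::move(blob)), storage_(std::make_shared<Storage>()) {
    assert(blob_ != nullptr);
    // Bind a local name, then capture it by value. The lambda stores its own
    // shared_ptr copy, so the BlobObject outlives this TensorImpl if needed.
    // Capturing `this` would store a raw pointer that dangles after destruction.
    const auto& blob_object = this->blob_;
    storage_->set_releaser_hook(
        [blob_object, log]() { *log += "released " + blob_object->name; });
  }

  std::shared_ptr<Storage> storage() const { return storage_; }

 private:
  std::shared_ptr<BlobObject> blob_;
  std::shared_ptr<Storage> storage_;
};

// Destroys the TensorImpl before the storage, then returns what the hook logged.
std::string ReleaseAfterOwnerIsGone() {
  auto log = std::make_shared<std::string>();
  std::shared_ptr<Storage> storage;
  {
    TensorImpl impl(std::make_shared<BlobObject>(BlobObject{"blob0"}), log);
    storage = impl.storage();
  }  // impl is destroyed here; the storage is still alive through `storage`
  storage.reset();  // last owner released: the hook runs now, without `impl`
  return *log;
}
```

Because the storage is shared, it can outlive the `TensorImpl` that installed the hook, which is exactly the case in which a captured `this` would be dangling.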

lixinqi · 2021-04-26T13:42:46Z

oneflow/core/framework/tensor_impl.cpp

@@ -77,6 +77,12 @@ EagerMirroredTensorImpl::EagerMirroredTensorImpl(
      eager_blob_object_(eager_blob_object) {
  dtype_ = CHECK_JUST(DType::GetDTypeByDataType(eager_blob_object->blob_desc().data_type()));
  tensor_storage_ = std::make_shared<TensorStorage>(eager_blob_object->tensor_buffer());
+  tensor_storage_->set_releaser_hook(
+      [this](const std::shared_ptr<eager::TensorBuffer>& tensor_buffer) {
+        PhysicalRun([this](const std::shared_ptr<InstructionsBuilder>& builder) {


* release tensor by instructions, update shut_down_util, skip tensor releasing when exiting

Signed-off-by: daquexian <daquexian566@gmail.com>

* Captures shared_ptr instead of raw pointer

Co-authored-by: lixinqi <lixinqi0703106@163.com>
Former-commit-id: 5170ffa

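1. <del>#4737, a much earlier PR, skips tensor destruction after Python exits, which causes OOM. The reason for doing so at the time is no longer remembered, and looking at it now I cannot think of any problem with not skipping it, so this PR restores normal destruction.</del> Update: CI ran into problems that still need to be tracked down, so this PR does not include that change for now. The original OOM problem may be related to #9681, which moved the point at which is_shutting_down is set earlier.
3. Fix a bug where a thread-safety issue caused occasional segfaults in the python stack getter.

---------

Signed-off-by: daquexian <daquexian566@gmail.com>
Co-authored-by: oneflow-ci-bot <ci-bot@oneflow.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>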

release tensor by instructions, update shut_down_util, skip tensor re…

b114c2e

…leasing when exiting Signed-off-by: daquexian <daquexian566@gmail.com>

daquexian added eager enhancement labels Apr 26, 2021

daquexian requested a review from oneflow-ci-bot April 26, 2021 11:38

oneflow-ci-bot removed their request for review April 26, 2021 12:00

lixinqi reviewed Apr 26, 2021

Captures shared_ptr instead of raw pointer

b3d44eb

lixinqi approved these changes Apr 26, 2021

daquexian requested a review from oneflow-ci-bot April 26, 2021 14:23

daquexian added the automerge label Apr 26, 2021

oneflow-ci-bot removed their request for review April 26, 2021 14:28

oneflow-ci-bot merged commit 5170ffa into master Apr 26, 2021

oneflow-ci-bot deleted the skip_releasing_tensor_when_exiting branch April 26, 2021 15:07

daquexian mentioned this pull request Mar 7, 2023

fix segfault and infinite loop in python stack getter #9955

Merged
