fix flow.set_grad_mode when directly calling #10059

marigoold · 2023-03-29T04:48:38Z

#close https://github.com/Oneflow-Inc/OneCloud/issues/203#issuecomment-1473171630

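Previously this differed from torch's implementation. We changed the grad mode through the RAII lifetime of the C++ AutoGradMode object, whereas torch changes it by explicitly calling set_grad_mode. As a result, OneFlow could not change a thread's grad mode globally. It could only be changed inside a decorator or a with statement.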

BBuf · 2023-03-29T06:32:31Z

python/oneflow/autograd/autograd_mode.py

-                return func(*args, **kwargs)
-
+                result = func(*args, **kwargs)
+            oneflow._oneflow_internal.autograd.set_grad_enabled(self.prev_mode)


Could this be exported as an API? torch has torch.set_grad_enabled: https://pytorch.org/docs/stable/_modules/torch/autograd/grad_mode.html#set_grad_enabled

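Exporting it as an API won't quite work: this class has other uses as well, and if it were exported only as an API it could only be called as a function.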

wyg1997 · 2023-03-29T06:34:58Z

python/oneflow/test/modules/test_autograd_mode.py

+
+        with flow.set_grad_enabled(True):
+            test_case.assertTrue(flow.is_grad_enabled())
+            flow.set_grad_enabled(False)


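This class has quite a few uses: it can serve as a decorator, as a with block, or be called directly. Consider also testing the decorator usage here, to make sure the prev_mode restored when __call__ is invoked is correct.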


Added.
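A unittest sketch of the decorator check requested above, kept self-contained. grad_mode_decorator, its capture_at_call flag, and the thread-local flag are hypothetical, not OneFlow's actual test code. The second test deliberately uses a variant that restores the mode captured at decoration time, to show the kind of stale prev_mode bug the review is guarding against.

```python
import functools
import threading
import unittest

# Hypothetical per-thread flag standing in for the real grad-mode state.
_local = threading.local()


def is_grad_enabled() -> bool:
    return getattr(_local, "grad_enabled", True)


def _set_flag(mode: bool) -> None:
    _local.grad_enabled = bool(mode)


def grad_mode_decorator(mode: bool, capture_at_call: bool = True):
    """Hypothetical decorator factory; capture_at_call=False reproduces a stale-prev bug."""
    decoration_time_prev = is_grad_enabled()

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            prev = is_grad_enabled() if capture_at_call else decoration_time_prev
            _set_flag(mode)
            try:
                return func(*args, **kwargs)
            finally:
                _set_flag(prev)

        return wrapper

    return decorate


class TestDecoratorRestoresPrevMode(unittest.TestCase):
    def tearDown(self):
        # Keep tests independent by resetting the shared flag.
        _set_flag(True)

    def test_restores_mode_seen_at_call_time(self):
        @grad_mode_decorator(True)
        def f():
            return is_grad_enabled()

        _set_flag(False)  # the mode changes after decoration
        self.assertTrue(f())
        self.assertFalse(is_grad_enabled())  # the caller's mode is restored

    def test_stale_prev_restores_wrong_mode(self):
        @grad_mode_decorator(True, capture_at_call=False)
        def f():
            return is_grad_enabled()

        _set_flag(False)
        f()
        # The buggy variant restores the decoration-time value (True), not False.
        self.assertTrue(is_grad_enabled())
```

Run it with python -m unittest on the file containing it.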

BBuf

LGTM

github-actions · 2023-03-29T10:29:05Z

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.


github-actions · 2023-03-29T14:12:17Z

Speed stats:

github-actions · 2023-03-30T02:11:44Z

Speed stats:

github-actions · 2023-03-30T04:46:14Z

CI failed when running job: cuda-module. PR label automerge has been removed

github-actions · 2023-03-30T04:48:21Z

Speed stats:

GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.3ms (= 14128.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 145.0ms (= 14498.2ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.03 (= 145.0ms / 141.3ms)

OneFlow resnet50 time: 83.0ms (= 8303.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 87.5ms (= 8751.4ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.05 (= 87.5ms / 83.0ms)

OneFlow resnet50 time: 51.1ms (= 10229.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 59.5ms (= 11897.3ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.16 (= 59.5ms / 51.1ms)

OneFlow resnet50 time: 34.7ms (= 6932.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 47.3ms (= 9462.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.36 (= 47.3ms / 34.7ms)

OneFlow resnet50 time: 26.4ms (= 5281.4ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 40.6ms (= 8117.8ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.54 (= 40.6ms / 26.4ms)

OneFlow swin dataloader time: 0.242s (= 48.468s / 200, num_workers=1)
PyTorch swin dataloader time: 0.155s (= 31.072s / 200, num_workers=1)
Relative speed: 0.641 (= 0.155s / 0.242s)

OneFlow swin dataloader time: 0.071s (= 14.202s / 200, num_workers=4)
PyTorch swin dataloader time: 0.042s (= 8.382s / 200, num_workers=4)
Relative speed: 0.590 (= 0.042s / 0.071s)

OneFlow swin dataloader time: 0.045s (= 9.019s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.314s / 200, num_workers=8)
Relative speed: 0.478 (= 0.022s / 0.045s)

❌ OneFlow resnet50 time: 154.0ms (= 15400.8ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 166.3ms (= 16626.6ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.08 (= 166.3ms / 154.0ms)

OneFlow resnet50 time: 93.5ms (= 9350.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.0ms (= 10397.4ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.11 (= 104.0ms / 93.5ms)

OneFlow resnet50 time: 61.6ms (= 12325.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 82.8ms (= 16553.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.34 (= 82.8ms / 61.6ms)

OneFlow resnet50 time: 43.9ms (= 8782.3ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 72.5ms (= 14492.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.65 (= 72.5ms / 43.9ms)

OneFlow resnet50 time: 37.4ms (= 7479.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 74.4ms (= 14886.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.99 (= 74.4ms / 37.4ms)

github-actions · 2023-03-30T04:59:26Z

CI failed when running job: cpu-module. PR label automerge has been removed

github-actions · 2023-03-30T05:01:12Z

Speed stats:

github-actions · 2023-03-30T05:22:58Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10059/

github-actions · 2023-03-30T05:29:36Z

Speed stats:

GPU Name: GeForce GTX 1080 

❌ OneFlow resnet50 time: 141.3ms (= 14132.1ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 143.6ms (= 14359.9ms / 100, input_shape=[16, 3, 224, 224])
❌ Relative speed: 1.02 (= 143.6ms / 141.3ms)

OneFlow resnet50 time: 83.3ms (= 8329.5ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 86.5ms (= 8654.1ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.04 (= 86.5ms / 83.3ms)

OneFlow resnet50 time: 51.3ms (= 10265.6ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 64.1ms (= 12824.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.25 (= 64.1ms / 51.3ms)

OneFlow resnet50 time: 34.0ms (= 6800.6ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 48.7ms (= 9748.6ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.43 (= 48.7ms / 34.0ms)

OneFlow resnet50 time: 25.8ms (= 5165.1ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 38.4ms (= 7678.1ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.49 (= 38.4ms / 25.8ms)

OneFlow swin dataloader time: 0.238s (= 47.689s / 200, num_workers=1)
PyTorch swin dataloader time: 0.146s (= 29.216s / 200, num_workers=1)
Relative speed: 0.613 (= 0.146s / 0.238s)

OneFlow swin dataloader time: 0.068s (= 13.676s / 200, num_workers=4)
PyTorch swin dataloader time: 0.041s (= 8.264s / 200, num_workers=4)
Relative speed: 0.604 (= 0.041s / 0.068s)

OneFlow swin dataloader time: 0.041s (= 8.228s / 200, num_workers=8)
PyTorch swin dataloader time: 0.022s (= 4.490s / 200, num_workers=8)
Relative speed: 0.546 (= 0.022s / 0.041s)

❌ OneFlow resnet50 time: 153.7ms (= 15367.4ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 165.5ms (= 16548.2ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
❌ Relative speed: 1.08 (= 165.5ms / 153.7ms)

OneFlow resnet50 time: 94.0ms (= 9398.5ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 104.5ms (= 10451.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.11 (= 104.5ms / 94.0ms)

OneFlow resnet50 time: 61.9ms (= 12385.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 78.7ms (= 15733.1ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.27 (= 78.7ms / 61.9ms)

OneFlow resnet50 time: 44.2ms (= 8839.6ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 71.3ms (= 14252.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.61 (= 71.3ms / 44.2ms)

OneFlow resnet50 time: 36.2ms (= 7235.9ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 68.1ms (= 13623.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.88 (= 68.1ms / 36.2ms)

add set_grad_mode in __init__

200a6e5

marigoold requested review from BBuf and daquexian as code owners March 29, 2023 04:48

marigoold changed the title from "add set_grad_mode in __init__" to "fix flow.set_grad_mode when directly calling" Mar 29, 2023

refine code

38e91f8

BBuf reviewed Mar 29, 2023


wyg1997 reviewed Mar 29, 2023


marigoold added 2 commits March 29, 2023 14:42

fix bug in __call__

5163f03

refine

5d68045

BBuf approved these changes Mar 29, 2023


refine

6c8c8c1

marigoold requested a review from oneflow-ci-bot March 29, 2023 10:27

marigoold added enhancement test api labels Mar 29, 2023

auto format by CI

efabb1f

marigoold requested review from oneflow-ci-bot and removed request for oneflow-ci-bot March 29, 2023 10:29

marigoold added 2 commits March 29, 2023 20:06

refine unittest

ba6a4e9

Merge branch 'dev_fix_set_grad_enabled' of https://github.com/Oneflow-Inc/oneflow into dev_fix_set_grad_enabled

d2bca29

Flowingsun007 approved these changes Mar 29, 2023


marigoold added the automerge label Mar 30, 2023

Merge branch 'master' into dev_fix_set_grad_enabled

d057db0

Merge branch 'master' into dev_fix_set_grad_enabled

0269c51

marigoold enabled auto-merge (squash) March 30, 2023 03:50

github-actions bot removed the automerge label Mar 30, 2023

marigoold merged commit 37ec204 into master Mar 30, 2023

marigoold deleted the dev_fix_set_grad_enabled branch March 30, 2023 06:00
