[CodeGen]Support Convert NVVM IR to Cubin With LibDevice Linked #10200

howin98 · 2023-04-26T03:32:47Z

No description provided.

howin98 · 2023-04-26T03:35:03Z

github-actions · 2023-04-26T03:36:53Z

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

oneflow/ir/test/GPU/test_nvvm_to_cubin.mlir

oneflow/ir/include/OneFlow/OneFlowPasses.td

oneflow/ir/lib/OneFlow/Conversion/NVVMToCubin.cpp

github-actions · 2023-04-26T04:15:04Z

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

github-actions · 2023-04-26T05:45:04Z

CI failed when running job: Build cu116. PR label automerge has been removed

github-actions · 2023-04-26T07:37:07Z

Code got formatted by CI. Please request CI again if you still want to have this PR merged. If the PR is from a forked repo, please download the patch files from the GitHub Actions web page and apply them locally.

oneflow/ir/lib/OneFlow/Conversion/NVVMToCubin.cpp

github-actions · 2023-04-27T10:03:44Z

CI failed when running job: Build cpu. PR label automerge has been removed

github-actions · 2023-04-27T11:24:57Z

CI failed when running job: Build cpu. PR label automerge has been removed

github-actions · 2023-04-27T13:04:00Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10200/

github-actions · 2023-04-27T13:09:23Z

Speed stats:

GPU Name: NVIDIA GeForce RTX 3080 Ti 

❌ OneFlow resnet50 time: 43.2ms (= 4322.7ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.2ms (= 5716.1ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.32 (= 57.2ms / 43.2ms)

OneFlow resnet50 time: 26.2ms (= 2623.0ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.6ms (= 3760.9ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.43 (= 37.6ms / 26.2ms)

OneFlow resnet50 time: 19.7ms (= 3948.7ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 36.4ms (= 7270.8ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.84 (= 36.4ms / 19.7ms)

OneFlow resnet50 time: 18.9ms (= 3784.0ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 32.0ms (= 6394.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.69 (= 32.0ms / 18.9ms)

OneFlow resnet50 time: 18.9ms (= 3783.7ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 30.4ms (= 6084.3ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.61 (= 30.4ms / 18.9ms)

OneFlow swin dataloader time: 0.202s (= 40.342s / 200, num_workers=1)
PyTorch swin dataloader time: 0.129s (= 25.718s / 200, num_workers=1)
Relative speed: 0.638 (= 0.129s / 0.202s)

OneFlow swin dataloader time: 0.053s (= 10.664s / 200, num_workers=4)
PyTorch swin dataloader time: 0.032s (= 6.468s / 200, num_workers=4)
Relative speed: 0.607 (= 0.032s / 0.053s)

OneFlow swin dataloader time: 0.031s (= 6.162s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.345s / 200, num_workers=8)
Relative speed: 0.543 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 47.5ms (= 4749.5ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.5ms (= 6552.3ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.38 (= 65.5ms / 47.5ms)

OneFlow resnet50 time: 31.6ms (= 3155.1ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 44.2ms (= 4417.3ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.40 (= 44.2ms / 31.6ms)

OneFlow resnet50 time: 24.0ms (= 4793.7ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 40.8ms (= 8168.6ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.70 (= 40.8ms / 24.0ms)

OneFlow resnet50 time: 22.3ms (= 4467.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.7ms (= 7748.9ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.73 (= 38.7ms / 22.3ms)

OneFlow resnet50 time: 21.4ms (= 4280.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 35.1ms (= 7021.7ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.64 (= 35.1ms / 21.4ms)
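
The figures in the report above appear to follow simple arithmetic: each per-iteration time is the total elapsed time divided by the iteration count, and the relative speed is the PyTorch time divided by the OneFlow time, so values above 1 mean OneFlow was faster. A minimal Python sketch of that calculation follows. The function names are hypothetical and not taken from the CI script, and the bot's own rounding may differ in the last digit.

```python
def per_iteration_ms(total_ms: float, iterations: int) -> float:
    """Average time per iteration, in milliseconds."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return total_ms / iterations


def relative_speed(pytorch_time: float, oneflow_time: float) -> float:
    """Ratio of PyTorch time to OneFlow time; above 1 means OneFlow was faster."""
    if oneflow_time <= 0:
        raise ValueError("oneflow_time must be positive")
    return pytorch_time / oneflow_time


# Totals taken from the resnet50 row with input_shape=[16, 3, 224, 224] above.
oneflow_ms = per_iteration_ms(4322.7, 100)
pytorch_ms = per_iteration_ms(5716.1, 100)
speed = relative_speed(pytorch_ms, oneflow_ms)

print(f"OneFlow {oneflow_ms:.1f}ms, PyTorch {pytorch_ms:.1f}ms, relative speed {speed:.2f}")
```

The same ratio applies to the dataloader rows, where values below 1 indicate that the OneFlow dataloader was slower in that run.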

github-actions · 2023-04-27T13:43:12Z

View latest API docs preview at: https://staging.oneflow.info/docs/Oneflow-Inc/oneflow/pr/10200/

github-actions · 2023-04-27T13:51:29Z

Speed stats:

GPU Name: NVIDIA GeForce RTX 3090 

❌ OneFlow resnet50 time: 42.7ms (= 4269.9ms / 100, input_shape=[16, 3, 224, 224])
PyTorch resnet50 time: 57.7ms (= 5766.3ms / 100, input_shape=[16, 3, 224, 224])
✔️ Relative speed: 1.35 (= 57.7ms / 42.7ms)

OneFlow resnet50 time: 26.4ms (= 2637.2ms / 100, input_shape=[8, 3, 224, 224])
PyTorch resnet50 time: 37.6ms (= 3759.8ms / 100, input_shape=[8, 3, 224, 224])
✔️ Relative speed: 1.43 (= 37.6ms / 26.4ms)

OneFlow resnet50 time: 18.4ms (= 3670.3ms / 200, input_shape=[4, 3, 224, 224])
PyTorch resnet50 time: 35.3ms (= 7067.4ms / 200, input_shape=[4, 3, 224, 224])
✔️ Relative speed: 1.93 (= 35.3ms / 18.4ms)

OneFlow resnet50 time: 17.9ms (= 3577.4ms / 200, input_shape=[2, 3, 224, 224])
PyTorch resnet50 time: 31.0ms (= 6191.5ms / 200, input_shape=[2, 3, 224, 224])
✔️ Relative speed: 1.73 (= 31.0ms / 17.9ms)

OneFlow resnet50 time: 16.1ms (= 3212.9ms / 200, input_shape=[1, 3, 224, 224])
PyTorch resnet50 time: 28.8ms (= 5766.4ms / 200, input_shape=[1, 3, 224, 224])
✔️ Relative speed: 1.79 (= 28.8ms / 16.1ms)

OneFlow swin dataloader time: 0.201s (= 40.172s / 200, num_workers=1)
PyTorch swin dataloader time: 0.130s (= 26.005s / 200, num_workers=1)
Relative speed: 0.647 (= 0.130s / 0.201s)

OneFlow swin dataloader time: 0.054s (= 10.857s / 200, num_workers=4)
PyTorch swin dataloader time: 0.033s (= 6.697s / 200, num_workers=4)
Relative speed: 0.617 (= 0.033s / 0.054s)

OneFlow swin dataloader time: 0.031s (= 6.127s / 200, num_workers=8)
PyTorch swin dataloader time: 0.017s (= 3.450s / 200, num_workers=8)
Relative speed: 0.563 (= 0.017s / 0.031s)

❌ OneFlow resnet50 time: 49.0ms (= 4899.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 65.0ms (= 6496.9ms / 100, input_shape=[16, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.33 (= 65.0ms / 49.0ms)

OneFlow resnet50 time: 36.7ms (= 3666.0ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 46.8ms (= 4684.9ms / 100, input_shape=[8, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.28 (= 46.8ms / 36.7ms)

OneFlow resnet50 time: 28.7ms (= 5741.0ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 39.4ms (= 7873.9ms / 200, input_shape=[4, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.37 (= 39.4ms / 28.7ms)

OneFlow resnet50 time: 25.6ms (= 5118.0ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 38.5ms (= 7708.5ms / 200, input_shape=[2, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.51 (= 38.5ms / 25.6ms)

OneFlow resnet50 time: 24.5ms (= 4907.3ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
PyTorch resnet50 time: 36.6ms (= 7316.1ms / 200, input_shape=[1, 3, 224, 224], ddp, world size=2)
✔️ Relative speed: 1.49 (= 36.6ms / 24.5ms)

fix

ef90a10

howin98 requested review from daquexian, jackalcooper, hjchen2 and BBuf as code owners April 26, 2023 03:32

howin98 and others added 3 commits April 26, 2023 11:33

Merge branch 'master' into support-nvvm-to-cubin-serial-passes

bd91bb4

trim

9d54af1

Merge branch 'support-nvvm-to-cubin-serial-passes' of github.com:Oneflow-Inc/oneflow into support-nvvm-to-cubin-serial-passes

e6f2e65

howin98 requested a review from oneflow-ci-bot April 26, 2023 03:35

howin98 added the feature, automerge, and ir labels Apr 26, 2023

auto format by CI

4b6bef0

howin98 added 2 commits April 26, 2023 11:54

auto fetch version

10dd5b4

Merge branch 'support-nvvm-to-cubin-serial-passes' of github.com:Oneflow-Inc/oneflow into support-nvvm-to-cubin-serial-passes

d1bdd38

jackalcooper reviewed Apr 26, 2023

oneflow/ir/test/GPU/test_nvvm_to_cubin.mlir

trim

6e8b08f

jackalcooper reviewed Apr 26, 2023

oneflow/ir/include/OneFlow/OneFlowPasses.td

jackalcooper reviewed Apr 26, 2023

oneflow/ir/lib/OneFlow/Conversion/NVVMToCubin.cpp

howin98 and others added 3 commits April 26, 2023 12:05

beautify

2041a67

fix

00255b5

auto format by CI

391cb48

github-actions bot removed the automerge label Apr 26, 2023

howin98 and others added 3 commits April 26, 2023 15:34

fix version bug

76f5294

Merge branch 'support-nvvm-to-cubin-serial-passes' of github.com:Oneflow-Inc/oneflow into support-nvvm-to-cubin-serial-passes

9c87825

auto format by CI

45e9814

mosout reviewed Apr 26, 2023

oneflow/ir/lib/OneFlow/Conversion/NVVMToCubin.cpp (outdated)

oneflow/ir/lib/OneFlow/Conversion/NVVMToCubin.cpp

howin98 and others added 3 commits April 26, 2023 20:20

fix

2dd6ec2

fix

90f6df7

Merge branch 'master' into support-nvvm-to-cubin-serial-passes

7da6adf

howin98 added the automerge label Apr 27, 2023

mosout approved these changes Apr 27, 2023

BBuf approved these changes Apr 27, 2023

mergify bot added 2 commits April 27, 2023 09:04

Merge branch 'master' into support-nvvm-to-cubin-serial-passes

78d3494

Merge branch 'master' into support-nvvm-to-cubin-serial-passes

b075327

github-actions bot removed the automerge label Apr 27, 2023

howin98 added 3 commits April 27, 2023 18:43

fix cpu

f38d98a

Merge branch 'support-nvvm-to-cubin-serial-passes' of github.com:Oneflow-Inc/oneflow into support-nvvm-to-cubin-serial-passes

124d8bf

brace

df7e0ba

howin98 added the automerge label Apr 27, 2023

add flag in cmake

1158867

github-actions bot removed the automerge label Apr 27, 2023

howin98 and others added 2 commits April 27, 2023 20:32

fix lit cfg support in cpu

9e17dad

Merge branch 'master' into support-nvvm-to-cubin-serial-passes

35bdfba

Merge branch 'master' into support-nvvm-to-cubin-serial-passes

34d58ab

howin98 added the automerge label Apr 27, 2023

mergify bot merged commit a22327b into master Apr 27, 2023

mergify bot deleted the support-nvvm-to-cubin-serial-passes branch April 27, 2023 14:12
