PP-HumanMatting has non-optimal graph #2276

Closed
jakpiase opened this issue Jun 30, 2022 · 3 comments
Labels: question (Further information is requested), stale (Long time without interaction)

Comments

@jakpiase

While optimizing the PP-HumanMatting model, I found that its computational graph contains an odd, non-optimal pattern. Instead of a single pad2d op, it uses a combination of unsqueeze2 + pad3d + squeeze2 ops, which behaves like pad2d but slows the model down significantly. I have written oneDNN kernels for both pad3d and pad2d in PR #43990. That PR sped up the HumanMatting model by 30%, but achieving even better performance requires replacing the unsqueeze2 + pad3d + squeeze2 patterns with pad2d. Doing so would improve the model's performance under oneDNN by another 13% compared to the current profiling listed below.
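To illustrate why the three-op chain behaves like pad2d, here is a minimal pure-Python sketch on nested lists. It is not Paddle or oneDNN code: the helper names (`pad_plane`, `pad2d`, `unsqueeze_depth`, `pad3d_no_depth`, `squeeze_depth`) and the (left, right, top, bottom) padding order are assumptions made for this illustration, and it assumes constant zero padding on NCHW data with no padding along the inserted depth axis.

```python
def pad_plane(plane, left, right, top, bottom, value=0.0):
    """Pad one H x W plane (a list of rows) with a constant value."""
    width = len(plane[0]) + left + right
    rows = [[value] * left + list(row) + [value] * right for row in plane]
    top_rows = [[value] * width for _ in range(top)]
    bottom_rows = [[value] * width for _ in range(bottom)]
    return top_rows + rows + bottom_rows


def pad2d(x, pads):
    """x: [N][C][H][W]; pads: (left, right, top, bottom)."""
    return [[pad_plane(plane, *pads) for plane in sample] for sample in x]


def unsqueeze_depth(x):
    """[N][C][H][W] -> [N][C][1][H][W]."""
    return [[[plane] for plane in sample] for sample in x]


def pad3d_no_depth(x, pads):
    """x: [N][C][D][H][W]; pads only H and W (depth padding is zero)."""
    return [[[pad_plane(plane, *pads) for plane in volume] for volume in sample]
            for sample in x]


def squeeze_depth(x):
    """[N][C][1][H][W] -> [N][C][H][W]."""
    return [[volume[0] for volume in sample] for sample in x]


x = [[[[1.0, 2.0], [3.0, 4.0]]]]  # N=1, C=1, H=2, W=2
pads = (1, 0, 0, 2)               # left, right, top, bottom (assumed order)

direct = pad2d(x, pads)
via_3d = squeeze_depth(pad3d_no_depth(unsqueeze_depth(x), pads))
assert direct == via_3d  # both routes produce the same padded tensor
```

The two extra ops only add and remove a size-1 axis, so any time spent in them is overhead on top of the padding itself.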

Pattern spotted in humanmatting_model.zip:
[image: the spotted pattern]
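Replacing the pattern amounts to a small graph rewrite. The sketch below shows the idea on a toy op list, not on Paddle's real IR or pass infrastructure: the dict keys (`type`, `input`, `output`, `paddings`), the function names, and the (left, right, top, bottom, front, back) padding order are all hypothetical. It also assumes the three ops are adjacent in the list and that the intermediate tensors have no other consumers.

```python
def is_pad2d_chain(window):
    """True if `window` is unsqueeze2 -> pad3d -> squeeze2 with no depth padding."""
    if [op["type"] for op in window] != ["unsqueeze2", "pad3d", "squeeze2"]:
        return False
    unsq, pad, sq = window
    connected = unsq["output"] == pad["input"] and pad["output"] == sq["input"]
    # Toy convention: the last two entries are (front, back) depth padding.
    no_depth_padding = list(pad["paddings"][4:]) == [0, 0]
    return connected and no_depth_padding


def fuse_pad_pattern(ops):
    """Replace each matching three-op chain with a single pad2d op."""
    fused = []
    i = 0
    while i < len(ops):
        window = ops[i:i + 3]
        if is_pad2d_chain(window):
            unsq, pad, sq = window
            fused.append({
                "type": "pad2d",
                "input": unsq["input"],
                "output": sq["output"],
                # Keep only the H/W part of the toy padding list.
                "paddings": list(pad["paddings"][:4]),
            })
            i += 3
        else:
            fused.append(ops[i])
            i += 1
    return fused


graph = [
    {"type": "conv2d", "input": "x", "output": "a"},
    {"type": "unsqueeze2", "input": "a", "output": "b"},
    {"type": "pad3d", "input": "b", "output": "c", "paddings": [1, 1, 2, 2, 0, 0]},
    {"type": "squeeze2", "input": "c", "output": "d"},
    {"type": "relu", "input": "d", "output": "y"},
]
fused_graph = fuse_pad_pattern(graph)
print([op["type"] for op in fused_graph])
```

A real pass would also have to check the unsqueeze/squeeze axes and the padding mode before fusing. The same change could alternatively be made at model export time, so that pad2d is emitted directly.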

Profiling measured on Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz after #43990:

------------------------- Overhead Summary -------------------------

Total time: 6985.58
Computation time     Total: 6877.93   Ratio: 98.4589%
Framework overhead   Total: 107.654   Ratio: 1.5411%

------------------------- Event Summary -------------------------

Event                                Calls  Total      Min.       Max.       Ave.       Ratio.
thread0::conv2d                      900    3625.98    0.061789   52.8846    4.02886    0.519066
thread0::bilinear_interp_v2          200    746.636    0.152614   36.586     3.73318    0.106882
thread0::Executor::Run               1      665.823    665.823    665.823    665.823    0.0953138
Executor::RunPartialPreparedContext  1      665.703    665.703    665.703    665.703    0.0952967
load_combine                         1      665.569    665.569    665.569    665.569    0.0952776
thread0::concat                      170    551.391    0.019189   39.612     3.24347    0.0789327
thread0::squeeze2                    30     449.727    1.65939    109.896    14.9909    0.0643793
thread0::unsqueeze2                  30     444.685    1.78944    110.22     14.8228    0.0636576
thread0::arg_max                     10     153.128    13.949     16.8544    15.3128    0.0219206
thread0::pad3d                       30     143.108    0.613291   16.0822    4.77026    0.0204862
thread0::nearest_interp_v2           10     91.2025    6.93993    22.8784    9.12025    0.0130558
thread0::slice                       80     35.9588    0.009735   3.30152    0.449485   0.00514757
thread0::relu                        10     20.5054    1.76941    2.98284    2.05054    0.00293538
thread0::pool2d                      70     19.2899    0.070717   0.963652   0.27557    0.00276139
thread0::softmax                     10     15.4916    1.3888     1.69225    1.54916    0.00221766
thread0::equal                       20     6.84826    0.267666   0.417393   0.342413   0.000980343
thread0::elementwise_add             40     5.39096    0.066211   0.4031     0.134774   0.000771727
thread0::cast                        20     5.02093    0.183324   0.665627   0.251047   0.000718757
thread0::elementwise_mul             10     1.52323    0.11234    0.188003   0.152323   0.000218053
thread0::sigmoid                     10     1.40188    0.096007   0.208529   0.140188   0.000200681
thread0::shape                       40     0.898588   0.010386   0.084814   0.0224647  0.000128635
thread0::scale                       20     0.694454   0.00987    0.102248   0.0347227  9.94125e-05
thread0::fill_constant               40     0.637646   0.00751    0.045786   0.0159411  9.12803e-05
thread0::elementwise_floordiv        20     0.241182   0.007927   0.023915   0.0120591  3.45257e-05
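As a rough cross-check, the share of time spent in the three pattern ops can be computed from the event summary above. The numbers are copied from the profile and the variable names are only illustrative. The result is an upper bound on what fusing could save, not a prediction, since the replacement pad2d op still has a cost of its own.

```python
# Totals copied from the profile above (same time unit as "Total time").
total_time = 6985.58
pattern_ops = {
    "squeeze2": 449.727,
    "unsqueeze2": 444.685,
    "pad3d": 143.108,
}

pattern_time = sum(pattern_ops.values())
pattern_share = pattern_time / total_time

print(f"pattern time: {pattern_time:.2f}")
print(f"share of total time: {pattern_share:.1%}")

# squeeze2 and unsqueeze2 each cost roughly three times as much as pad3d.
reshape_time = pattern_ops["squeeze2"] + pattern_ops["unsqueeze2"]
print(f"squeeze2 + unsqueeze2 vs pad3d: {reshape_time / pattern_ops['pad3d']:.1f}x")
```

The three ops together account for a little under 15% of the measured total, most of it in squeeze2 and unsqueeze2 rather than in the padding itself.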
@jakpiase added the question label Jun 30, 2022
@wuyefeilin
Collaborator

Thank you for your suggestion

@wrobcio789

Really looking forward to that performance boost

@github-actions

github-actions bot commented Dec 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@github-actions bot added the stale label Dec 9, 2022