While optimizing the PP-HumanMatting model, I found that the computational graph contains an odd, suboptimal pattern. Instead of a single pad2d op, it uses a combination of unsqueeze2 + pad3d + squeeze2 ops, which behaves like pad2d but slows the model down significantly. I have written oneDNN kernels for both pad3d and pad2d in PR #43990. That PR sped up the HumanMatting model by 30%, but getting better performance still requires replacing the unsqueeze2 + pad3d + squeeze2 patterns with pad2d. Doing so would improve the model's performance under oneDNN by another 13% compared to the current profiling listed below.
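To illustrate why the three-op pattern can be collapsed, here is a minimal NumPy sketch (not Paddle or oneDNN code). It assumes an NCHW input, constant padding, and that unsqueeze2 inserts a singleton depth axis at position 2 that pad3d leaves unpadded; the actual axis and padding mode used in the model may differ. The function names `pad2d_direct` and `pad2d_via_pad3d` are hypothetical and exist only for this example.

```python
import numpy as np


def pad2d_direct(x, left, right, top, bottom, value=0.0):
    """Pad the H and W axes of an NCHW array in a single step."""
    widths = ((0, 0), (0, 0), (top, bottom), (left, right))
    return np.pad(x, widths, mode="constant", constant_values=value)


def pad2d_via_pad3d(x, left, right, top, bottom, value=0.0):
    """Emulate the unsqueeze2 + pad3d + squeeze2 pattern (assumed axis: 2)."""
    # unsqueeze2: NCHW -> NCDHW with a singleton depth axis (D = 1)
    x5 = np.expand_dims(x, axis=2)
    # pad3d: pad H and W only; the depth axis gets zero padding
    widths = ((0, 0), (0, 0), (0, 0), (top, bottom), (left, right))
    x5 = np.pad(x5, widths, mode="constant", constant_values=value)
    # squeeze2: drop the singleton depth axis again, NCDHW -> NCHW
    return np.squeeze(x5, axis=2)


x = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)
direct = pad2d_direct(x, left=1, right=2, top=3, bottom=4)
roundabout = pad2d_via_pad3d(x, left=1, right=2, top=3, bottom=4)

# Both paths produce the same padded tensor
print(direct.shape, np.array_equal(direct, roundabout))
```

Under these assumptions the two paths give identical results, so the extra reshape ops add work without changing the output.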
Spotted pattern on humanmatting_model.zip:

Profiling measured on Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz after #43990:
------------------------- Overhead Summary -------------------------
Total time: 6985.58
Computation time Total: 6877.93 Ratio: 98.4589%
Framework overhead Total: 107.654 Ratio: 1.5411%
------------------------- Event Summary -------------------------