Performance regression in TV-L1 optical flow #2459

willprice · 2020-03-11T19:27:52Z

System information (version)

OpenCV => 4.10
Operating System / Platform => Linux 64 Bit
Compiler => GCC/nvcc

Detailed description

The CUDA implementation of TV-L1 seems to be significantly slower in OpenCV 4.x
than in OpenCV 2.x.

Steps to reproduce

I've created a repository here
with two versions of a codebase I use for computing optical flow (it's rather
simple: it just reads in frames from a video, resizes them, and runs either
TV-L1 or Brox flow on them).

There are Docker base image definitions, both built on Ubuntu 16.04 with CUDA 8.0,
on top of which two application Docker images are built, one for each version of OpenCV.

Simply run make to build the Docker images, run the ffmpeg command to dump
some frames from a video, then run the commands under "Run speed test" to get
some timing results.

Under OpenCV 4 it takes 152s to compute flow for 240 frames compared to 15s for
OpenCV 2. This is an order of magnitude slower under OpenCV 4.

Any ideas on what could be causing this? I've tried my best to make both OpenCV builds as comparable as possible so I'm reasonably confident this is an issue in OpenCV itself rather than my application code.

hanzhn · 2020-03-12T04:12:12Z

I hit the same problem with the CUDA version of TV-L1. Inside a while loop, the first several iterations seem to run faster, but it then gets slower and slower, and one call even took more than 18 seconds. I have no idea why.

willprice · 2020-03-13T00:08:51Z

I've just run both codebases under nvprof on 20-frame and 50-frame sample videos. The files can be found here:
https://drive.google.com/open?id=1Fz_ZTODkLYYoAFeSNJ85rZjBIBO2RhWn

OpenCV 2 - 20 frame

OpenCV 4 - 50 frame

josh-gleason · 2021-11-08T01:06:40Z

It's worth pointing out that two additional parameters were introduced in OpenCV 3.X: scaleStep and gamma. Their default values are 0.8 and 0.0 respectively. Comparing the source code of OpenCV 2.4 and OpenCV 3.X shows that the values matching the OpenCV 2.4 behaviour are actually scaleStep = 0.5 and gamma = 0.0. Setting scaleStep to 0.5 in OpenCV 3.X/4.X does result in a fairly significant speedup; however, even with this correction the latest code still runs at about 50% of the speed of OpenCV 2.4, at least in my experiments.
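
For a rough feel of why scaleStep matters for runtime, one can count the pixels processed across the image pyramid. This is only an illustrative sketch, not OpenCV code: it assumes each pyramid level is the input frame scaled by scaleStep raised to the level index, a 5-level pyramid, and a hypothetical 340x256 frame. The helper name pyramid_pixels is made up for this example.

```python
def pyramid_pixels(width, height, scale_step, nscales=5):
    """Total pixel count over all pyramid levels.

    Assumption: level k is the input frame scaled by scale_step ** k,
    and every level is processed (no early stop for tiny levels).
    """
    total = 0
    for level in range(nscales):
        factor = scale_step ** level
        w = max(1, round(width * factor))
        h = max(1, round(height * factor))
        total += w * h
    return total


# Hypothetical frame size, chosen only for illustration.
WIDTH, HEIGHT = 340, 256
base = WIDTH * HEIGHT

for step in (0.8, 0.5):
    total = pyramid_pixels(WIDTH, HEIGHT, step)
    print(f"scaleStep={step}: {total} pixels, {total / base:.2f}x the input frame")

ratio = pyramid_pixels(WIDTH, HEIGHT, 0.8) / pyramid_pixels(WIDTH, HEIGHT, 0.5)
print(f"scaleStep=0.8 touches about {ratio:.2f}x as many pixels as scaleStep=0.5")
```

Under these assumptions the 0.8 pyramid covers a little under twice as many pixels as the 0.5 one, which fits with a noticeable speedup from changing scaleStep but would not by itself account for a 10x gap.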

Really hope this gets addressed especially considering that OpenCV 2.4 is no longer supported and can't be used with newer GPUs.

willprice mentioned this issue May 12, 2020

Support video resizing before computing flow willprice/flowty#39
