[Feature] Demo stdet (#547)

* resolve comments
* update changelog
* init stdet_demo
* frame extraction & human det
* update code
* rename label_map.txt as k400_label_map.txt
* finish demo
* after check
* resolve comments & + docstring
open-mmlab · Jan 18, 2021 · 9723036
1 parent afd7cc9

Showing 11 changed files with 657 additions and 27 deletions.
diff --git a/demo/README.md b/demo/README.md
@@ -36,19 +36,19 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
 1. Recognize a video file as input by using a TSN model on cuda by default.
 
     ```shell
-    # The demo.mp4 and label_map.txt are both from Kinetics-400
+    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
     python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
         checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-        demo/demo.mp4 demo/label_map.txt
+        demo/demo.mp4 demo/label_map_k400.txt
     ```
 
 2. Recognize a video file as input by using a TSN model on cuda by default, loading checkpoint from url.
 
     ```shell
-    # The demo.mp4 and label_map.txt are both from Kinetics-400
+    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
     python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
         https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-        demo/demo.mp4 demo/label_map.txt
+        demo/demo.mp4 demo/label_map_k400.txt
     ```
 
 3. Recognize a list of rawframes as input by using a TSN model on cpu.
@@ -62,10 +62,10 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
 4. Recognize a video file as input by using a TSN model and then generate an mp4 file.
 
     ```shell
-    # The demo.mp4 and label_map.txt are both from Kinetics-400
+    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
     python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
         checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-        demo/demo.mp4 demo/label_map.txt --out-filename demo/demo_out.mp4
+        demo/demo.mp4 demo/label_map_k400.txt --out-filename demo/demo_out.mp4
     ```
 
 5. Recognize a list of rawframes as input by using a TSN model and then generate a gif file.
@@ -79,30 +79,30 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
 6. Recognize a video file as input by using a TSN model, then generate an mp4 file with a given resolution and resize algorithm.
 
     ```shell
-    # The demo.mp4 and label_map.txt are both from Kinetics-400
+    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
     python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
         checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-        demo/demo.mp4 demo/label_map.txt --target-resolution 340 256 --resize-algorithm bilinear \
+        demo/demo.mp4 demo/label_map_k400.txt --target-resolution 340 256 --resize-algorithm bilinear \
         --out-filename demo/demo_out.mp4
     ```
 
     ```shell
-    # The demo.mp4 and label_map.txt are both from Kinetics-400
+    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
     # If either dimension is set to -1, the frames are resized by keeping the existing aspect ratio
     # For --target-resolution 170 -1, original resolution (340, 256) -> target resolution (170, 128)
     python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
         checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-        demo/demo.mp4 demo/label_map.txt --target-resolution 170 -1 --resize-algorithm bilinear \
+        demo/demo.mp4 demo/label_map_k400.txt --target-resolution 170 -1 --resize-algorithm bilinear \
         --out-filename demo/demo_out.mp4
     ```
 
 7. Recognize a video file as input by using a TSN model, then generate an mp4 file with a label in a red color and 10px fontsize.
 
     ```shell
-    # The demo.mp4 and label_map.txt are both from Kinetics-400
+    # The demo.mp4 and label_map_k400.txt are both from Kinetics-400
     python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
         checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-        demo/demo.mp4 demo/label_map.txt --font-size 10 --font-color red \
+        demo/demo.mp4 demo/label_map_k400.txt --font-size 10 --font-color red \
         --out-filename demo/demo_out.mp4
     ```
 
@@ -181,7 +181,7 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
 
     ```shell
     python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
-      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth demo/label_map.txt --average-size 5 \
+      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth demo/label_map_k400.txt --average-size 5 \
       --threshold 0.2 --device cpu
     ```
 
@@ -191,15 +191,15 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
     ```shell
     python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
       https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-      demo/label_map.txt --average-size 5 --threshold 0.2 --device cpu
+      demo/label_map_k400.txt --average-size 5 --threshold 0.2 --device cpu
     ```
 
 3. Recognize the action from web camera as input by using a I3D model on gpu by default, averaging the score per 5 times
     and outputting result labels with score higher than 0.2.
 
     ```shell
     python demo/webcam_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
-      checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth demo/label_map.txt \
+      checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth demo/label_map_k400.txt \
       --average-size 5 --threshold 0.2
     ```
 
@@ -237,7 +237,7 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
 
     ```shell
     python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
-      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO \
+      checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth PATH_TO_LONG_VIDEO demo/label_map_k400.txt PATH_TO_SAVED_VIDEO \
       --input-step 3 --device cpu --threshold 0.2
     ```
 
@@ -247,7 +247,7 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
     ```shell
     python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
       https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-      PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
+      PATH_TO_LONG_VIDEO demo/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
     ```
 
 3. Predict different labels in a long video from web by using a TSN model on cpu, with 3 frames for input steps (that is, random sample one from each 3 frames)
@@ -257,12 +257,12 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
     python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
       https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
       https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4 \
-      demo/label_map.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
+      demo/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
     ```
 
 4. Predict different labels in a long video by using a I3D model on gpu, with input_step=1 and threshold=0.01 as default.
 
     ```shell
     python demo/long_video_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
-      checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO \
+      checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO demo/label_map_k400.txt PATH_TO_SAVED_VIDEO \
     ```
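
Note on the rename: every README hunk above swaps `demo/label_map.txt` for `demo/label_map_k400.txt`. A rename like this is easy to leave half-finished, so a quick scan for leftover references can help. The following is only a sketch and not part of this commit; the helper name `find_stale_references` and the default list of file suffixes are assumptions made for illustration.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple


def find_stale_references(
    root: str,
    old_name: str = "label_map.txt",
    suffixes: Iterable[str] = (".md", ".py", ".ipynb"),
) -> List[Tuple[str, int, str]]:
    """Return (file, line number, line text) for each line still naming old_name.

    Hypothetical helper: it is not part of the repository touched by this diff.
    """
    # Match the old name only when it is not embedded in a longer file name,
    # so "label_map_k400.txt" or "my_label_map.txt" are not reported.
    pattern = re.compile(r"(?<![\w.-])" + re.escape(old_name) + r"(?![\w-])")
    wanted = tuple(suffixes)
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in wanted:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Scan the demo directory from the repository root (path is an assumption).
    for file_name, lineno, line in find_stale_references("demo"):
        print(f"{file_name}:{lineno}: {line}")
```

An empty result suggests that the scanned files no longer mention the old file name; it says nothing about files with other extensions.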
diff --git a/demo/demo.ipynb b/demo/demo.ipynb
@@ -54,7 +54,7 @@
    "source": [
     "# test a single video and show the result:\n",
     "video = 'demo.mp4'\n",
-    "label = 'label_map.txt'\n",
+    "label = 'label_map_k400.txt'\n",
     "results = inference_recognizer(model, video, label)"
    ]
   },
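
Note on the label file: the notebook cell above passes `label_map_k400.txt` to `inference_recognizer` alongside the video. As a rough illustration of what such a file is for, the sketch below reads a label map and pairs class scores with class names. It assumes the file holds one class name per line, with the line index serving as the class id; the helpers `load_label_map` and `top_k_labels` are hypothetical and are not claimed to match how MMAction2 implements this internally.

```python
import heapq
from pathlib import Path
from typing import List, Sequence, Tuple


def load_label_map(path: str) -> List[str]:
    """Read class names, assuming one name per line (line index = class id)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def top_k_labels(
    scores: Sequence[float], labels: Sequence[str], k: int = 5
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    if len(scores) != len(labels):
        raise ValueError(
            f"got {len(scores)} scores for {len(labels)} labels; "
            "the label map may not match the model's dataset"
        )
    # enumerate() yields (class id, score); rank by the score component.
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [(labels[class_id], score) for class_id, score in best]
```

The length check is the practical reason for putting the dataset name in the file name: a label map from another dataset would have a different number of classes and would fail here instead of silently mislabeling predictions.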