[Feature] Demo stdet (#547)
* resolve comments

* update changelog

* init stdet_demo

* frame extraction & human det

* update code

* rename label_map.txt as k400_label_map.txt

* finish demo

* after check

* resolve comments & + docstring
kennymckormick authored Jan 18, 2021
1 parent afd7cc9 commit 9723036
Showing 11 changed files with 657 additions and 27 deletions.
38 changes: 19 additions & 19 deletions demo/README.md
@@ -36,19 +36,19 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
1. Recognize a video file as input by using a TSN model on cuda by default.

```shell
-# The demo.mp4 and label_map.txt are both from Kinetics-400
+# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-demo/demo.mp4 demo/label_map.txt
+demo/demo.mp4 demo/label_map_k400.txt
```

2. Recognize a video file as input by using a TSN model on cuda by default, loading checkpoint from url.

```shell
-# The demo.mp4 and label_map.txt are both from Kinetics-400
+# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-demo/demo.mp4 demo/label_map.txt
+demo/demo.mp4 demo/label_map_k400.txt
```

3. Recognize a list of rawframes as input by using a TSN model on cpu.
@@ -62,10 +62,10 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
4. Recognize a video file as input by using a TSN model and then generate an mp4 file.

```shell
-# The demo.mp4 and label_map.txt are both from Kinetics-400
+# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-demo/demo.mp4 demo/label_map.txt --out-filename demo/demo_out.mp4
+demo/demo.mp4 demo/label_map_k400.txt --out-filename demo/demo_out.mp4
```

5. Recognize a list of rawframes as input by using a TSN model and then generate a gif file.
@@ -79,30 +79,30 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
6. Recognize a video file as input by using a TSN model, then generate an mp4 file with a given resolution and resize algorithm.

```shell
-# The demo.mp4 and label_map.txt are both from Kinetics-400
+# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-demo/demo.mp4 demo/label_map.txt --target-resolution 340 256 --resize-algorithm bilinear \
+demo/demo.mp4 demo/label_map_k400.txt --target-resolution 340 256 --resize-algorithm bilinear \
--out-filename demo/demo_out.mp4
```

```shell
-# The demo.mp4 and label_map.txt are both from Kinetics-400
+# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
# If either dimension is set to -1, the frames are resized by keeping the existing aspect ratio
# For --target-resolution 170 -1, original resolution (340, 256) -> target resolution (170, 128)
python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-demo/demo.mp4 demo/label_map.txt --target-resolution 170 -1 --resize-algorithm bilinear \
+demo/demo.mp4 demo/label_map_k400.txt --target-resolution 170 -1 --resize-algorithm bilinear \
--out-filename demo/demo_out.mp4
```

7. Recognize a video file as input by using a TSN model, then generate an mp4 file with a label in a red color and 10px fontsize.

```shell
-# The demo.mp4 and label_map.txt are both from Kinetics-400
+# The demo.mp4 and label_map_k400.txt are both from Kinetics-400
python demo/demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-demo/demo.mp4 demo/label_map.txt --font-size 10 --font-color red \
+demo/demo.mp4 demo/label_map_k400.txt --font-size 10 --font-color red \
--out-filename demo/demo_out.mp4
```
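
The `--target-resolution 170 -1` comment in example 6 describes an aspect-ratio rule that can be sketched in a few lines. This is only an illustration of the arithmetic stated there; the function name `resolve_target_resolution` and the use of `round` are assumptions, not taken from `demo/demo.py`.

```python
def resolve_target_resolution(orig_w, orig_h, target_w, target_h):
    """Fill in a -1 dimension so the original aspect ratio is kept.

    Hypothetical helper: it mirrors the rule described in the README
    comment, not the actual code in demo/demo.py.
    """
    if target_w == -1 and target_h == -1:
        raise ValueError("at least one target dimension must be given")
    if target_w == -1:
        # Width follows from the requested height.
        target_w = round(orig_w * target_h / orig_h)
    elif target_h == -1:
        # Height follows from the requested width.
        target_h = round(orig_h * target_w / orig_w)
    return target_w, target_h


# The README's own example: (340, 256) with --target-resolution 170 -1.
resolved = resolve_target_resolution(340, 256, 170, -1)
print(resolved)
```

With the README's numbers, the width is halved from 340 to 170, so the height is halved from 256 to 128.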

@@ -181,7 +181,7 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,

```shell
python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
-checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth demo/label_map.txt --average-size 5 \
+checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth demo/label_map_k400.txt --average-size 5 \
--threshold 0.2 --device cpu
```

@@ -191,15 +191,15 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
```shell
python demo/webcam_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-demo/label_map.txt --average-size 5 --threshold 0.2 --device cpu
+demo/label_map_k400.txt --average-size 5 --threshold 0.2 --device cpu
```

3. Recognize the action from web camera as input by using an I3D model on gpu by default, averaging the scores over the last 5 predictions
and outputting result labels with score higher than 0.2.

```shell
python demo/webcam_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
-checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth demo/label_map.txt \
+checkpoints/i3d_r50_32x2x1_100e_kinetics400_rgb_20200614-c25ef9a4.pth demo/label_map_k400.txt \
--average-size 5 --threshold 0.2
```
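
The `--average-size 5 --threshold 0.2` behaviour described in the webcam examples can be pictured as a sliding-window average followed by a cut-off. The sketch below is one plausible reading of that description; `averaged_labels`, the toy class names, and the score values are all hypothetical and are not taken from `demo/webcam_demo.py`.

```python
from collections import deque


def averaged_labels(score_window, labels, threshold):
    """Average per-class scores over the window and keep those above threshold."""
    count = len(score_window)
    # zip(*window) regroups the scores class by class.
    averages = [sum(per_class) / count for per_class in zip(*score_window)]
    return [(label, avg) for label, avg in zip(labels, averages) if avg > threshold]


labels = ["arm wrestling", "juggling", "yoga"]  # toy label set
window = deque(maxlen=5)  # plays the role of --average-size 5

# Six predictions arrive; the deque silently drops the oldest one.
predictions = [
    [1.0, 0.0, 0.0],
    [0.5, 0.25, 0.0],
    [0.5, 0.25, 0.0],
    [0.5, 0.25, 0.0],
    [0.5, 0.25, 0.0],
    [0.5, 0.25, 0.5],
]
for scores in predictions:
    window.append(scores)

kept = averaged_labels(window, labels, threshold=0.2)
print(kept)
```

Averaging over several predictions smooths out single-frame spikes: the last prediction scores "yoga" at 0.5, but its windowed average stays below the 0.2 threshold, so it is not reported.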

@@ -237,7 +237,7 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
```shell
python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
-checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO \
+checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth PATH_TO_LONG_VIDEO demo/label_map_k400.txt PATH_TO_SAVED_VIDEO \
--input-step 3 --device cpu --threshold 0.2
```
@@ -247,7 +247,7 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
```shell
python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
-PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
+PATH_TO_LONG_VIDEO demo/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
```
3. Predict different labels in a long video from web by using a TSN model on cpu, with 3 frames for input steps (that is, randomly sample one frame from every 3 frames)
@@ -257,12 +257,12 @@ or use checkpoint url from `configs/` to directly load corresponding checkpoint,
python demo/long_video_demo.py configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py \
https://download.openmmlab.com/mmaction/recognition/tsn/tsn_r50_1x1x3_100e_kinetics400_rgb/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth \
https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4 \
-demo/label_map.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
+demo/label_map_k400.txt PATH_TO_SAVED_VIDEO --input-step 3 --device cpu --threshold 0.2
```
4. Predict different labels in a long video by using an I3D model on gpu, with input_step=1 and threshold=0.01 as default.
```shell
python demo/long_video_demo.py configs/recognition/i3d/i3d_r50_video_inference_32x2x1_100e_kinetics400_rgb.py \
-checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO demo/label_map.txt PATH_TO_SAVED_VIDEO \
+checkpoints/i3d_r50_256p_32x2x1_100e_kinetics400_rgb_20200801-7d9f44de.pth PATH_TO_LONG_VIDEO demo/label_map_k400.txt PATH_TO_SAVED_VIDEO \
```
2 changes: 1 addition & 1 deletion demo/demo.ipynb
@@ -54,7 +54,7 @@
"source": [
"# test a single video and show the result:\n",
"video = 'demo.mp4'\n",
-"label = 'label_map.txt'\n",
+"label = 'label_map_k400.txt'\n",
"results = inference_recognizer(model, video, label)"
]
},
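
For readers unfamiliar with the renamed file, the sketch below shows how a label map of this kind is typically consumed. It assumes the file is plain text with one class name per line, where the line index is the class id; the helpers `load_label_map` and `top_k_labels`, the three sample class names, and the scores are illustrative and are not the code behind `inference_recognizer`.

```python
import tempfile
from pathlib import Path


def load_label_map(path):
    """Read one class name per line; the line index is the class id (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def top_k_labels(scores, labels, k=5):
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [(labels[idx], score) for idx, score in ranked[:k]]


# A three-line stand-in for demo/label_map_k400.txt, written to a temp dir.
with tempfile.TemporaryDirectory() as tmp:
    label_file = Path(tmp) / "label_map_k400.txt"
    label_file.write_text(
        "abseiling\nair drumming\nanswering questions\n", encoding="utf-8"
    )
    labels = load_label_map(label_file)

demo_top = top_k_labels([0.1, 0.7, 0.2], labels, k=2)
print(demo_top)
```

Because class ids are implied by line order in this layout, a label map must match the dataset the checkpoint was trained on, which is presumably why the file name now carries the `k400` suffix.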