
Commit 060e265

BEV support by @Arthur151

Authored by Yu Sun, committed by YuliangXiu
1 parent b8153de commit 060e265

File tree

7 files changed: +138 -18 lines

.gitignore (+5)

````diff
@@ -9,3 +9,8 @@ results/*
 !.gitignore
 force_push.sh
 .idea
+smplx/
+human_det/
+kaolin/
+neural_voxelization_layer/
+pytorch3d/
````

README.md (+6 -5)

````diff
@@ -39,6 +39,7 @@
 <br />

 ## News :triangular_flag_on_post:
+- [2022/05/16] <a href="https://github.com/Arthur151/ROMP">BEV</a> is supported as optional HPS by <a href="https://scholar.google.com/citations?hl=en&user=fkGxgrsAAAAJ">Yu Sun</a>.
 - [2022/05/15] Training code is released, please check [Training Instruction](docs/training.md).
 - [2022/04/26] <a href="https://github.com/Jeff-sjtu/HybrIK">HybrIK (SMPL)</a> is supported as optional HPS by <a href="https://jeffli.site/">Jiefeng Li</a>.
 - [2022/03/05] <a href="https://github.com/YadiraF/PIXIE">PIXIE (SMPL-X)</a>, <a href="https://github.com/mkocabas/PARE">PARE (SMPL)</a>, <a href="https://github.com/HongwenZhang/PyMAF">PyMAF (SMPL)</a> are all supported as optional HPS.
@@ -114,9 +115,9 @@
 ## TODO

 - [x] testing code and pretrained models (*self-implemented version)
-- [x] ICON (w/ & w/o global encoder, w/ PyMAF/HybrIK/PIXIE/PARE as HPS)
+- [x] ICON (w/ & w/o global encoder, w/ PyMAF/HybrIK/BEV/PIXIE/PARE as HPS)
 - [x] PIFu* (RGB image + predicted normal map as input)
-- [x] PaMIR* (RGB image + predicted normal map as input, w/ PyMAF/PARE as HPS)
+- [x] PaMIR* (RGB image + predicted normal map as input, w/ PyMAF/HybrIK/BEV/PIXIE/PARE as HPS)
 - [x] colab notebook <a href='https://colab.research.google.com/drive/1-AWeWhPvCTBX0KfMtgtMk10uPU05ihoA?usp=sharing' style='padding-left: 0.5rem;'>
 <img src='https://colab.research.google.com/assets/colab-badge.svg' alt='Google Colab'>
 </a>
@@ -153,10 +154,10 @@ python infer.py -cfg ../configs/pifu.yaml -gpu 0 -in_dir ../examples -out_dir ..
 python infer.py -cfg ../configs/pamir.yaml -gpu 0 -in_dir ../examples -out_dir ../results

 # ICON w/ global filter (better visual details --> lower Normal Error)
-python infer.py -cfg ../configs/icon-filter.yaml -gpu 0 -in_dir ../examples -out_dir ../results -hps_type {pixie/pymaf/pare/hybrik}
+python infer.py -cfg ../configs/icon-filter.yaml -gpu 0 -in_dir ../examples -out_dir ../results -hps_type {pixie/pymaf/pare/hybrik/bev}

 # ICON w/o global filter (higher evaluation scores --> lower P2S/Chamfer Error)
-python infer.py -cfg ../configs/icon-nofilter.yaml -gpu 0 -in_dir ../examples -out_dir ../results -hps_type {pixie/pymaf/pare/hybrik}
+python infer.py -cfg ../configs/icon-nofilter.yaml -gpu 0 -in_dir ../examples -out_dir ../results -hps_type {pixie/pymaf/pare/hybrik/bev}
 ```

@@ -197,7 +198,7 @@ Here are some great resources we benefit from:
 - [PaMIR](https://github.com/ZhengZerong/PaMIR), [PIFu](https://github.com/shunsukesaito/PIFu), [PIFuHD](https://github.com/facebookresearch/pifuhd), and [MonoPort](https://github.com/Project-Splinter/MonoPort) for Benchmark
 - [SCANimate](https://github.com/shunsukesaito/SCANimate) and [AIST++](https://github.com/google/aistplusplus_api) for Animation
 - [rembg](https://github.com/danielgatis/rembg) for Human Segmentation
-- [smplx](https://github.com/vchoutas/smplx), [PARE](https://github.com/mkocabas/PARE), [PyMAF](https://github.com/HongwenZhang/PyMAF), [PIXIE](https://github.com/YadiraF/PIXIE), and [HybrIK](https://github.com/Jeff-sjtu/HybrIK) for Human Pose & Shape Estimation
+- [smplx](https://github.com/vchoutas/smplx), [PARE](https://github.com/mkocabas/PARE), [PyMAF](https://github.com/HongwenZhang/PyMAF), [PIXIE](https://github.com/YadiraF/PIXIE), [BEV](https://github.com/Arthur151/ROMP), and [HybrIK](https://github.com/Jeff-sjtu/HybrIK) for Human Pose & Shape Estimation
 - [CAPE](https://github.com/qianlim/CAPE) and [THuman](https://github.com/ZhengZerong/DeepHuman/tree/master/THUmanDataset) for Dataset
 - [PyTorch3D](https://github.com/facebookresearch/pytorch3d) for Differential Rendering
````

docs/dataset.md (+32)

````diff
@@ -44,3 +44,35 @@ You could check the visibility computing status from `log/vis/thuman2-{num_views
 |---|---|---|---|---|---|
 |RGB Image|Normal(Front)|Normal(Back)|Normal(SMPL, Front)|Normal(SMPL, Back)|Visibility|

+## Citation
+If you use this dataset for your research, please consider citing:
+```
+@InProceedings{tao2021function4d,
+  title={Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors},
+  author={Yu, Tao and Zheng, Zerong and Guo, Kaiwen and Liu, Pengpeng and Dai, Qionghai and Liu, Yebin},
+  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR2021)},
+  month={June},
+  year={2021},
+}
+```
+This `PyTorch Dataloader` benefits a lot from [MonoPortDataset](https://github.com/Project-Splinter/MonoPortDataset), so please consider citing:
+
+```
+@inproceedings{li2020monoport,
+  title={Monocular Real-Time Volumetric Performance Capture},
+  author={Li, Ruilong and Xiu, Yuliang and Saito, Shunsuke and Huang, Zeng and Olszewski, Kyle and Li, Hao},
+  booktitle={European Conference on Computer Vision},
+  pages={49--67},
+  year={2020},
+  organization={Springer}
+}
+
+@incollection{li2020monoportRTL,
+  title={Volumetric human teleportation},
+  author={Li, Ruilong and Olszewski, Kyle and Xiu, Yuliang and Saito, Shunsuke and Huang, Zeng and Li, Hao},
+  booktitle={ACM SIGGRAPH 2020 Real-Time Live},
+  pages={1--1},
+  year={2020}
+}
+```
+
````
docs/installation.md (+55)

````diff
@@ -67,6 +67,61 @@ Optional:
 bash fetch_hps.sh
 ```

+## Citation
+:+1: Please consider citing these awesome HPS approaches
+
+<details><summary>PyMAF, PARE, PIXIE, HybrIK, BEV</summary>
+
+```
+@inproceedings{pymaf2021,
+  title={PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop},
+  author={Zhang, Hongwen and Tian, Yating and Zhou, Xinchi and Ouyang, Wanli and Liu, Yebin and Wang, Limin and Sun, Zhenan},
+  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
+  year={2021}
+}
+
+@inproceedings{Kocabas_PARE_2021,
+  title = {{PARE}: Part Attention Regressor for {3D} Human Body Estimation},
+  author = {Kocabas, Muhammed and Huang, Chun-Hao P. and Hilliges, Otmar and Black, Michael J.},
+  booktitle = {Proc. International Conference on Computer Vision (ICCV)},
+  pages = {11127--11137},
+  month = oct,
+  year = {2021},
+  doi = {},
+  month_numeric = {10}
+}
+
+@inproceedings{PIXIE:2021,
+  title={Collaborative Regression of Expressive Bodies using Moderation},
+  author={Yao Feng and Vasileios Choutas and Timo Bolkart and Dimitrios Tzionas and Michael J. Black},
+  booktitle={International Conference on 3D Vision (3DV)},
+  year={2021}
+}
+
+@inproceedings{li2021hybrik,
+  title={HybrIK: A Hybrid Analytical-Neural Inverse Kinematics Solution for 3D Human Pose and Shape Estimation},
+  author={Li, Jiefeng and Xu, Chao and Chen, Zhicun and Bian, Siyuan and Yang, Lixin and Lu, Cewu},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={3383--3393},
+  year={2021}
+}
+
+@InProceedings{BEV,
+  author = {Sun, Yu and Liu, Wu and Bao, Qian and Fu, Yili and Mei, Tao and Black, Michael J.},
+  title = {Putting People in their Place: Monocular Regression of 3D People in Depth},
+  booktitle = {CVPR},
+  year = {2022}
+}
+
+@InProceedings{ROMP,
+  author = {Sun, Yu and Bao, Qian and Liu, Wu and Fu, Yili and Black, Michael J. and Mei, Tao},
+  title = {Monocular, One-stage, Regression of Multiple 3D People},
+  booktitle = {ICCV},
+  year = {2021}
+}
+```
+</details>
+

 ## Tree structure of **data** folder
````

lib/dataset/TestDataset.py (+32 -8)

````diff
@@ -40,7 +40,7 @@
 from lib.pymaf.models import pymaf_net
 from lib.pymaf.core import path_config
 from lib.pymaf.utils.imutils import process_image
-from lib.pymaf.utils.geometry import rotation_matrix_to_angle_axis
+from lib.pymaf.utils.geometry import rotation_matrix_to_angle_axis, batch_rodrigues

 # for pare
 from lib.pare.pare.core.tester import PARETester
@@ -112,6 +112,20 @@ def __init__(self, cfg, device):
             self.hps = HybrIKBaseSMPLCam(cfg_file=path_config.HYBRIK_CFG, smpl_path=smpl_path, data_path=path_config.hybrik_data_dir)
             self.hps.load_state_dict(torch.load(path_config.HYBRIK_CKPT, map_location='cpu'), strict=False)
             self.hps.to(self.device)
+        elif self.hps_type == 'bev':
+            try:
+                import bev
+            except ImportError:
+                print('Could not find bev, installing via pip install simple-romp==1.0.3')
+                os.system('pip install simple-romp==1.0.3')
+                import bev
+            settings = bev.main.default_settings
+            # change the argparse settings of bev here if you prefer other settings
+            settings.mode = 'image'
+            settings.GPU = int(str(self.device).split(':')[1])
+            settings.show_largest = True
+            # settings.show = True  # uncomment this to show the original BEV predictions
+            self.hps = bev.BEV(settings)

         print(colored(f"Using {self.hps_type} as HPS Estimator\n", "green"))
````
````diff
@@ -185,7 +199,7 @@ def __getitem__(self, index):
         img_path = self.subject_list[index]
         img_name = img_path.split("/")[-1].rsplit(".", 1)[0]
-        img_icon, img_hps, img_ori, img_mask, uncrop_param = process_image(img_path, self.det, self.hps_type, 512)
+        img_icon, img_hps, img_ori, img_mask, uncrop_param = process_image(img_path, self.det, self.hps_type, 512, device=self.device)

         data_dict = {
             'name': img_name,
@@ -195,7 +209,7 @@ def __getitem__(self, index):
             'uncrop_param': uncrop_param
         }
         with torch.no_grad():
-            preds_dict = self.hps.forward(img_hps.to(self.device))
+            preds_dict = self.hps.forward(img_hps)

         data_dict['smpl_faces'] = torch.Tensor(
             self.faces.astype(np.int16)).long().unsqueeze(0).to(
@@ -231,9 +245,19 @@ def __getitem__(self, index):
             data_dict['smpl_verts'] = preds_dict['pred_vertices']
             scale, tranX, tranY = preds_dict['pred_camera'][0, :3]
             scale = scale * 2
-
+
+        elif self.hps_type == 'bev':
+            data_dict['betas'] = torch.from_numpy(preds_dict['smpl_betas'])[[0], :10].to(self.device).float()
+            pred_thetas = batch_rodrigues(torch.from_numpy(preds_dict['smpl_thetas'][0]).reshape(-1, 3)).float()
+            data_dict['body_pose'] = pred_thetas[1:][None].to(self.device)
+            data_dict['global_orient'] = pred_thetas[[0]][None].to(self.device)
+            data_dict['smpl_verts'] = torch.from_numpy(preds_dict['verts'][[0]]).to(self.device).float()
+            tranX = preds_dict['cam_trans'][0, 0]
+            tranY = preds_dict['cam'][0, 1] + 0.28
+            scale = preds_dict['cam'][0, 0] * 1.1
+
         data_dict['scale'] = scale
-        data_dict['trans'] = torch.tensor([tranX, tranY, 0.0]).to(self.device)
+        data_dict['trans'] = torch.tensor([tranX, tranY, 0.0]).to(self.device).float()

         # data_dict info (key-shape):
         # scale, tranX, tranY - tensor.float
@@ -317,8 +341,8 @@ def visualize_alignment(self, data):
         {
             'image_dir': "../examples",
             'has_det': True,     # w/ or w/o detection
-            'hps_type': 'hybrik'     # pymaf/pare/pixie/hybrik
+            'hps_type': 'bev'    # pymaf/pare/pixie/hybrik/bev
         }, device)

-
-    dataset.visualize_alignment(dataset[1])
+    for i in range(len(dataset)):
+        dataset.visualize_alignment(dataset[i])
````
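The BEV branch of `__getitem__` relies on `batch_rodrigues` to turn BEV's axis-angle pose vector into per-joint rotation matrices before splitting off the root. A short sketch of that conversion, assuming the standard SMPL layout of 24 joints times 3 axis-angle components (a zero pose stands in for real predictions):

```python
import torch
from lib.pymaf.utils.geometry import batch_rodrigues

smpl_thetas = torch.zeros(72)  # stands in for preds_dict['smpl_thetas'][0]
rotmats = batch_rodrigues(smpl_thetas.reshape(-1, 3)).float()  # (24, 3, 3)

global_orient = rotmats[[0]][None]  # (1, 1, 3, 3): root joint only
body_pose = rotmats[1:][None]       # (1, 23, 3, 3): the 23 body joints
```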

lib/pymaf/utils/imutils.py (+7 -5)

````diff
@@ -85,7 +85,7 @@ def image_to_hybrik_tensor(img):
     return [image_to_tensor, mask_to_tensor, image_to_pymaf_tensor, image_to_pixie_tensor, image_to_hybrik_tensor]


-def process_image(img_file, det, hps_type, input_res=512):
+def process_image(img_file, det, hps_type, input_res=512, device=None):
     """Read image, do preprocessing and possibly crop it according to the bounding box.
     If there are bounding box annotations, use them to crop the image.
     If no bounding box is specified but openpose detections are available, use them to get the bounding box.
@@ -141,12 +141,14 @@ def process_image(img_file, det, hps_type, input_res=512):
     img_hps = img_np.astype(np.float32) / 255.
     img_hps = torch.from_numpy(img_hps).permute(2, 0, 1)

-    if hps_type == 'hybrik':
-        img_hps = image_to_hybrik_tensor(img_hps).unsqueeze(0)
+    if hps_type == 'bev':
+        img_hps = img_np[:, :, [2, 1, 0]]
+    elif hps_type == 'hybrik':
+        img_hps = image_to_hybrik_tensor(img_hps).unsqueeze(0).to(device)
     elif hps_type != 'pixie':
-        img_hps = image_to_pymaf_tensor(img_hps).unsqueeze(0)
+        img_hps = image_to_pymaf_tensor(img_hps).unsqueeze(0).to(device)
     else:
-        img_hps = image_to_pixie_tensor(img_hps).unsqueeze(0)
+        img_hps = image_to_pixie_tensor(img_hps).unsqueeze(0).to(device)

     # uncrop params
     uncrop_param = {'center': center,
````
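The BEV branch is the odd one out here: instead of a normalized CHW tensor, simple-romp expects the raw crop as an OpenCV-style BGR uint8 array, so only the channel order is swapped. A minimal sketch of the dispatch, with a generic ImageNet normalization standing in for the backend-specific `image_to_*_tensor` transforms (the real code picks per-backend statistics):

```python
import numpy as np
import torch

def prepare_hps_input(img_np, hps_type, device):
    # Sketch only, not the repo's helper: img_np is the HxWx3 uint8 RGB crop.
    if hps_type == 'bev':
        # BEV consumes a raw BGR uint8 array: swap channels, skip normalization.
        return img_np[:, :, [2, 1, 0]]
    img = torch.from_numpy(img_np.astype(np.float32) / 255.).permute(2, 0, 1)
    # Placeholder normalization; the real code dispatches to
    # image_to_hybrik_tensor, image_to_pymaf_tensor, or image_to_pixie_tensor.
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    return ((img - mean) / std).unsqueeze(0).to(device)
```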

requirements.txt (+1)

````diff
@@ -24,6 +24,7 @@ cython==0.29.20
 rembg>=2.0.3
 opencv-python
 opencv_contrib_python
+simple-romp==1.0.3
 git+https://github.com/Project-Splinter/human_det
 git+https://github.com/YuliangXiu/smplx.git
 git+https://github.com/facebookresearch/pytorch3d.git
````
