Xinyu Zhan*
·
Lixin Yang*
·
Yifei Zhao
·
Kangrui Mao
·
Hanlin Xu
·
Zenan Lin
·
Kailin Li
·
Cewu Lu†
This repo contains the OakInk2 dataset toolkit (oakink2_toolkit) -- a Python package that provides data loading, splitting, and visualization.
Download tarballs from [huggingface](https://huggingface.co/datasets/kelvin34501/OakInk-v2).
You will need the data tarball and the preview version annotation tarball for at least one sequence, the object_raw tarball, the object_repair tarball and the program tarball.
Organize these files as follows:
```
data
|-- data
| `-- scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS
|-- anno_preview
| `-- scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.pkl
|-- object_raw
|-- object_repair
|-- object_affordance
`-- program
```
- Install the package.

  ```bash
  pip install .
  ```

  Optionally, install it with the editable flag:

  ```bash
  pip install -e .
  ```

- Check the installation.

  ```bash
  python -c 'from oakink2_toolkit.dataset import OakInk2__Dataset'
  ```

  If the command runs without error, the installation is successful.
- Setup the environment.

  - Create a virtual env of python 3.10. This can be done by either `conda` or the python package `venv`.

    - `conda` approach

      ```bash
      conda create -p ./.conda python=3.10
      conda activate ./.conda
      ```

    - `venv` approach

      First use `pyenv` or other tools to install a python interpreter of version 3.10. Here 3.10.16 is used as an example:

      ```bash
      pyenv install 3.10.16
      pyenv shell 3.10.16
      ```

      Then create a virtual environment:

      ```bash
      python -m venv .venv --prompt oakink2_preview
      . .venv/bin/activate
      ```
  - Install the dependencies.

    Make sure all bundled dependencies are there:

    ```bash
    git submodule update --init --recursive --progress
    ```

    Use `pip` to install the packages:

    ```bash
    pip install -r req_preview.txt
    ```

    Note that `oakink2_preview` is compatible with `torch` versions higher than the one specified in `req_preview.txt`. Choose the most appropriate version for your environment.
  - Download the SMPL-X model (version v1.1) and place the files at `asset/smplx_v1_1`. The directory structure should be like:

    ```
    asset
    `-- smplx_v1_1
        `-- models
            |-- SMPLX_NEUTRAL.npz
            `-- SMPLX_NEUTRAL.pkl
    ```

- Launch the preview tool:

  ```bash
  python -m oakink2_preview.launch.viz.gui --cfg config/gui__preview.yml
  ```

  Or use the shortcut:

  ```bash
  oakink2_viz_gui --cfg config/gui__preview.yml
  ```
- (Optional) Preview task in segments.

  - Download the MANO model (version v1.2) and place the files at `asset/mano_v1_2`. The directory structure should be like:

    ```
    asset
    `-- mano_v1_2
        `-- models
            |-- MANO_LEFT.pkl
            `-- MANO_RIGHT.pkl
    ```

  - Launch the preview segment tool (press enter to proceed). Note that `seq_key` should contain '/' rather than '++' as the directory separator.

    ```bash
    python -m oakink2_preview.launch.viz.seg_3d --seq_key scene_0x__y00z/00000000000000000000__YYYY-mm-dd-HH-MM-SS
    ```

    Or use the shortcut:

    ```bash
    oakink2_viz_seg3d --seq_key scene_0x__y00z/00000000000000000000__YYYY-mm-dd-HH-MM-SS
    ```

- (Optional) View the introductory video on YouTube.
- `data/scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS`

  This stores the captured multi-view image streams. Streams from different cameras are stored in different subdirectories.

  ```
  scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS
  |-- <serial 0>
  |   |-- <frame id 0>.jpg
  |   |-- <frame id 1>.jpg
  |   |-- ...
  |   `-- <frame id N>.jpg
  |-- ...
  `-- <serial 3>
      |-- <frame id 0>.jpg
      |-- <frame id 1>.jpg
      |-- ...
      `-- <frame id N>.jpg
  ```
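  A minimal sketch (standard library only) of walking this layout; the sequence directory below is a placeholder and `frames_per_serial` is just a local variable for illustration, not a toolkit API:

  ```python
  from pathlib import Path

  # Placeholder sequence directory following the tarball layout above.
  seq_dir = Path("data/data/scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS")

  # Collect frame ids per camera serial from the image filenames (frame ids are integers).
  frames_per_serial = {
      cam_dir.name: sorted(int(p.stem) for p in cam_dir.glob("*.jpg"))
      for cam_dir in sorted(seq_dir.iterdir())
      if cam_dir.is_dir()
  }
  for serial, frame_ids in frames_per_serial.items():
      print(serial, len(frame_ids), "frames")
  ```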
- `anno/scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.pkl`

  This pickle stores a dictionary in the following format:

  ```
  {
      'cam_def':             dict[str, str],                      # camera serial to name mapping
      'cam_selection':       list[str],                           # selected camera names
      'frame_id_list':       list[int],                           # image frame id list in current seq
      'cam_intr':            dict[str, dict[int, np.ndarray]],    # camera intrinsic matrix [3, 3]
      'cam_extr':            dict[str, dict[int, np.ndarray]],    # camera extrinsic matrix [4, 4]
      'mocap_frame_id_list': list[int],                           # mocap frame id list in current seq
      'obj_list':            list[str],                           # object part id list in current seq
      'obj_transf':          dict[str, dict[int, np.ndarray]],    # object transformation matrix [4, 4]
      'raw_smplx':           dict[int, dict[str, torch.Tensor]],  # raw smplx data
      'raw_mano':            dict[int, dict[str, torch.Tensor]],  # raw mano data
  }
  ```

  The raw smplx data is structured as follows:

  ```
  {
      'body_shape':      torch.Tensor[1, 300],
      'expr_shape':      torch.Tensor[1, 10],
      'jaw_pose':        torch.Tensor[1, 1, 4],
      'leye_pose':       torch.Tensor[1, 1, 4],
      'reye_pose':       torch.Tensor[1, 1, 4],
      'world_rot':       torch.Tensor[1, 4],
      'world_tsl':       torch.Tensor[1, 3],
      'body_pose':       torch.Tensor[1, 21, 4],
      'left_hand_pose':  torch.Tensor[1, 15, 4],
      'right_hand_pose': torch.Tensor[1, 15, 4],
  }
  ```

  where `world_rot`, `body_pose`, and `{left,right}_hand_pose` are quaternions in `[w,x,y,z]` format. The lower body of `body_pose`, `jaw_pose`, and `{l,r}eye_pose` are not used.

  The raw mano data is structured as follows:

  ```
  {
      'rh__pose_coeffs': torch.Tensor[1, 16, 4],
      'lh__pose_coeffs': torch.Tensor[1, 16, 4],
      'rh__tsl':         torch.Tensor[1, 3],
      'lh__tsl':         torch.Tensor[1, 3],
      'rh__betas':       torch.Tensor[1, 10],
      'lh__betas':       torch.Tensor[1, 10],
  }
  ```

  where `{lh,rh}__pose_coeffs` are quaternions in `[w,x,y,z]` format.
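  A minimal sketch of reading this pickle directly, assuming `cam_intr`/`cam_extr` are keyed by the names in `cam_selection` and `raw_smplx` is keyed by mocap frame id (the path is a placeholder; for regular use, prefer the `oakink2_toolkit` loader described below):

  ```python
  import pickle

  import numpy as np
  from scipy.spatial.transform import Rotation

  # Placeholder path; substitute a real sequence name.
  anno_path = "data/anno_preview/scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.pkl"
  with open(anno_path, "rb") as f:
      anno = pickle.load(f)

  frame_id = anno["frame_id_list"][0]
  cam = anno["cam_selection"][0]          # assumed to match the keys of cam_intr/cam_extr
  intr = anno["cam_intr"][cam][frame_id]  # [3, 3]
  extr = anno["cam_extr"][cam][frame_id]  # [4, 4]

  # Convert a [w, x, y, z] quaternion (e.g. world_rot) into a rotation matrix.
  # scipy expects scalar-last [x, y, z, w], so reorder the components first.
  mocap_frame_id = anno["mocap_frame_id_list"][0]
  world_rot_wxyz = anno["raw_smplx"][mocap_frame_id]["world_rot"].numpy()[0]  # [4]
  R = Rotation.from_quat(np.roll(world_rot_wxyz, -1)).as_matrix()             # [3, 3]
  ```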
- `object_{raw,scan}/obj_desc.json`

  This stores the object description in the following format:

  ```
  {
      obj_id: {
          "obj_id": str,
          "obj_name": str,
      }
  }
  ```

- `object_{raw,scan}/align_ds`

  This directory stores the object models.

  ```
  align_ds
  |-- obj_id
  |   |-- *.obj/ply
  |   |-- ...
  `-- ...
  ```
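  A minimal sketch of loading one of these models, assuming `trimesh` is installed; the root path and `<obj_id>` are placeholders:

  ```python
  import glob
  import os

  import trimesh

  obj_root = "data/object_raw/align_ds"  # placeholder root following the layout above
  obj_id = "<obj_id>"                    # placeholder, substitute a real object id

  mesh_files = sorted(
      glob.glob(os.path.join(obj_root, obj_id, "*.obj"))
      + glob.glob(os.path.join(obj_root, obj_id, "*.ply"))
  )
  # force="mesh" concatenates multi-geometry files into a single mesh.
  mesh = trimesh.load(mesh_files[0], force="mesh")
  # A per-frame pose from anno['obj_transf'][obj_id][frame_id] ([4, 4]) can then be
  # applied with mesh.apply_transform(...).
  print(mesh.vertices.shape, mesh.faces.shape)
  ```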
- `object_affordance/affordance_part`

  This directory stores the object affordance part models.

  ```
  affordance_part
  |-- obj_part_id
  |   |-- *.obj
  |   |-- ...
  `-- ...
  ```

- `object_affordance/affordance_label.json`

  This stores the available object affordance labels in the following format:

  ```
  {
      'all_label': list[str],                       # list of all labels (including affordance & instantiation)
      'affordance_label': list[str],                # list of affordance labels (reflecting part functions)
      'affordance_instantiation_label': list[str],  # list of affordance instantiation labels (reflecting interactions & primitive tasks)
  }
  ```

- `object_affordance/instance_id.json`

  This stores the object part ids that map to full object instances in the following format:

  ```
  [
      obj_part_id,  # object part id that maps to a full object instance
      ...
  ]
  ```

- `object_affordance/object_affordance.json`

  This stores the object affordance annotations in the following format:

  ```
  {
      obj_part_id: {
          "obj_part_id": str,                     # object part id
          "is_instance": bool,                    # whether the part id maps to an instance
          "has_model": bool,                      # whether the part id has a model, i.e. object segmentation
          "affordance": list[str],                # list of affordance labels
          "affordance_instantiation": list[str],  # list of affordance instantiation labels
      }
  }
  ```
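  A minimal sketch of querying this file; the path and `<affordance_label>` are placeholders:

  ```python
  import json

  with open("data/object_affordance/object_affordance.json") as f:
      object_affordance = json.load(f)

  # Collect every part id annotated with a given affordance label.
  target_label = "<affordance_label>"  # placeholder, pick one from affordance_label.json
  parts_with_label = [
      part_id
      for part_id, entry in object_affordance.items()
      if target_label in entry["affordance"]
  ]
  ```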
- `object_affordance/object_part_tree.json`

  This stores the object part tree in the following format:

  ```
  {
      obj_part_id: list[str],  # list of object part ids that are children of the current part id
  }
  ```
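  A minimal sketch of traversing the tree; the path is a placeholder and `iter_descendants` is a hypothetical helper, not a toolkit function:

  ```python
  import json

  with open("data/object_affordance/object_part_tree.json") as f:
      part_tree = json.load(f)

  def iter_descendants(part_id):
      """Yield all descendant part ids of part_id, depth-first."""
      for child in part_tree.get(part_id, []):
          yield child
          yield from iter_descendants(child)

  # Roots are part ids that never appear as another part's child.
  all_children = {c for children in part_tree.values() for c in children}
  roots = [p for p in part_tree if p not in all_children]
  ```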
- `object_affordance/part_desc.json`

  This stores the object part description in the following format:

  ```
  {
      obj_id: {
          "obj_id": str,
          "obj_name": str,
      }
  }
  ```
- `program/program_info/scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.json`

  ```
  {
      (str(lh_interval), str(rh_interval)): {
          "primitive": str,
          "obj_list": list[str],
          "interaction_mode": str,  # [lh_main, rh_main, bh_main]
          "primitive_lh": str,
          "primitive_rh": str,
          "obj_list_lh": list[str],
          "obj_list_rh": list[str],
      }
  }
  ```

  - {lh,rh}_interval: the interval of the primitive in the sequence. If `None`, the corresponding hand is not available (e.g. doing something else) in the current primitive.
  - primitive: the primitive id.
  - obj_list: the object list involved in the primitive.
  - interaction_mode: the interaction mode of the primitive. `lh_main` means the left hand is the main hand for affordance implementation. Similarly, `rh_main` means the right hand is the main hand, and `bh_main` means both hands are main hands.
  - primitive_{lh,rh}: the primitive id for the left/right hand.
  - obj_list_{lh,rh}: the object list involved in the left/right hand.
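  Since JSON keys are strings, a sketch like the following can recover the intervals, assuming each key is the Python `str()` of a `(str(lh_interval), str(rh_interval))` tuple (this parsing, and `parse_interval`, are assumptions for illustration, not a toolkit API):

  ```python
  import ast
  import json

  seq_name = "scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS"  # placeholder
  with open(f"data/program/program_info/{seq_name}.json") as f:
      program_info = json.load(f)

  def parse_interval(s):
      # Each element is assumed to be the str() of an interval: None or a (start, end) pair.
      return ast.literal_eval(s) if isinstance(s, str) else s

  for key, seg in program_info.items():
      lh_raw, rh_raw = ast.literal_eval(key)
      lh_interval, rh_interval = parse_interval(lh_raw), parse_interval(rh_raw)
      print(lh_interval, rh_interval, seg["primitive"], seg["interaction_mode"])
  ```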
- `program/desc_info/scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.json`

  ```
  {
      (str(lh_interval), str(rh_interval)): {
          "seg_desc": str,  # textual description of the current primitive
      }
  }
  ```
- `program/initial_condition_info/scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.json`

  ```
  {
      (str(lh_interval), str(rh_interval)): {
          "initial_condition": list[str],  # initial conditions for the complex task
          "recipe": list[str],             # requirements to be completed for the complex task
      }
  }
  ```
- `program/pdg/scene_0x__y00z++00000000000000000000__YYYY-mm-dd-HH-MM-SS.json`

  ```
  {
      "id_map": dict[interval, int],  # map from interval to primitive id
      "v": list[int],                 # list of vertices
      "e": list[list[int]],           # list of edges
  }
  ```
- How to load the dataset with the `oakink2_toolkit` library?

  ```python
  from oakink2_toolkit.dataset import OakInk2__Dataset

  # Load the dataset
  oi2_data = OakInk2__Dataset(
      dataset_prefix='data',
      return_instantiated=True,  # set to False if only metainfo wanted
      anno_offset='anno_preview',
      obj_offset='object_repair',  # set to 'object_raw' for downsampled object raw scans
      affordance_offset="object_affordance",
  )

  # Load sequence
  # complex_task_data = oi2_data.load_complex_task(seq_key)
  # primitive_task_data_list = oi2_data.load_primitive_task(complex_task_data)
  ```
- `oakink2_viz_gui` fails to create a context, reporting `libGL error: failed to load driver: swrast`.

  Please rerun with the environment variable `LIBGL_DEBUG=verbose` to get more information.

  If the error is due to `libffi.so.7` having wrong symbols when using a conda environment, downgrade the `libffi` package to version 3.3:

  ```bash
  conda install libffi=3.3
  ```