
Exploring the Potential of Encoder-free Architectures in 3D LMMs

Official repository for the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs".

[📖 Paper] [🤗 HF Checkpoints for stage1]

🏠 About

[Figure: solution teaser]
We introduce ENEL, an Encoder-free 3D Large Language Model that addresses two key limitations of encoder-based architectures: the inability to adapt to varying point cloud resolutions, and the mismatch between encoder-extracted point features and the semantic needs of Large Language Models. Building upon PointLLM, we conduct a comprehensive investigation into how the LLM itself can assume the role of the 3D encoder. Trained on the PointLLM dataset, our 7B model is evaluated on three benchmark tasks: generative 3D object classification, 3D object captioning, and 3D VQA, with assessments performed using GPT-4 scoring and traditional metrics.

🔥 News

  • [2025-02-13] We release the training code for the pre-training stage with corresponding checkpoints, along with the evaluation code.
  • [2025-02-13] We release the paper of ENEL.

📋 Contents

  • 💬 Dialogue Examples
  • 🔍 Overview
  • 📦 Training and Evaluation
  • 📝 TODO List
  • 🔗 Citation
  • 📄 License
  • 👏 Acknowledgements

💬 Dialogue Examples

[Figure: Dialogue 1]

🔍 Overview

Model

The encoder-free 3D LMM directly utilizes a token embedding module to convert point cloud data into discrete point tokens, which are then concatenated with text tokens to serve as input to the LLM. To assume the role of the encoder, the LLM is guided to extract high-level semantic features of the point clouds and acquire multi-level knowledge from both global and local perspectives.
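As a purely illustrative sketch (not the repository's actual module), the snippet below shows how such a token embedding module could feed an LLM; the class name, dimensions, and mean-pooling grouping are all placeholder assumptions, and the real model uses the LLM's hidden size and its own embedding design.

# Illustrative only: a stand-in token embedding module for an encoder-free 3D LMM.
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    def __init__(self, in_dim=6, dim=512, num_tokens=512):
        super().__init__()
        # A small point-wise MLP instead of a pre-trained 3D encoder.
        self.proj = nn.Sequential(nn.Linear(in_dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.num_tokens = num_tokens

    def forward(self, points):                       # points: (B, N, 6), xyz + rgb
        feats = self.proj(points)                    # (B, N, d) per-point features
        b, n, d = feats.shape
        group = n // self.num_tokens                 # points pooled into each token
        feats = feats[:, : group * self.num_tokens]  # drop the remainder, if any
        tokens = feats.view(b, self.num_tokens, group, d).mean(dim=2)
        return tokens                                # (B, T, d) discrete point tokens

# Point tokens are concatenated with text tokens to form the LLM input.
points = torch.rand(1, 8192, 6)                      # a point cloud, xyz + rgb in [0, 1]
text_emb = torch.rand(1, 32, 512)                    # stand-in text token embeddings
llm_input = torch.cat([TokenEmbedding()(points), text_emb], dim=1)  # (1, 544, 512)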

Experiment Results

Please refer to our paper for more results.

Notes on the Model Zoo

For the checkpoints at https://huggingface.co/IvanTang/ENEL/tree/main, adapt them to your local paths by modifying two attributes: _name_or_path in config.json and special_tokens_map_file in tokenizer_config.json.
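For example, a minimal Python sketch that patches both files, assuming the usual Hugging Face checkpoint layout (the local path below is a placeholder):

# Patch the checkpoint configs to point at your local paths (placeholder path below).
import json
from pathlib import Path

ckpt_dir = Path("/path/to/ENEL_checkpoint")  # placeholder: your local checkpoint folder

cfg_path = ckpt_dir / "config.json"
cfg = json.loads(cfg_path.read_text())
cfg["_name_or_path"] = str(ckpt_dir)
cfg_path.write_text(json.dumps(cfg, indent=2))

tok_path = ckpt_dir / "tokenizer_config.json"
tok = json.loads(tok_path.read_text())
tok["special_tokens_map_file"] = str(ckpt_dir / "special_tokens_map.json")
tok_path.write_text(json.dumps(tok, indent=2))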

📦 Training and Evaluation

Installation

To start:

  1. Clone this repository.
git clone https://github.com/Ivan-Tang-3D/ENEL.git
cd ENEL
  2. Install packages
conda create -n ENEL python=3.10 -y
conda activate ENEL
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

# * for training
pip install ninja
pip install flash-attn

# * for chamfer_dist
git clone https://github.com/Pang-Yatian/Point-MAE.git
cd Point-MAE/extensions/chamfer_dist
python setup.py install --user
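Optionally, a quick Python check that the training dependencies import cleanly; note that chamfer is the module name Point-MAE's setup.py registers, so treat it as an assumption:

# Sanity-check the environment after installation.
import torch
print("CUDA available:", torch.cuda.is_available())

import flash_attn  # needed for training
import chamfer     # built by Point-MAE's chamfer_dist (module name is an assumption)
print("flash-attn and chamfer_dist imported successfully")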

Data Preparation

Objaverse Training Data

  1. Download the two compressed files of 660K Objaverse colored point clouds here. They require about 77GB of storage space.
  2. Run the following command to merge the two files into one and uncompress it. This will produce a folder named 8192_npy containing 660K point cloud files named {Objaverse_ID}_8192.npy. Each file is a numpy array of shape (8192, 6), where the first three columns are xyz and the last three are rgb values in the [0, 1] range (see the loading sketch after this list).
cat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz
tar -xvf Objaverse_660K_8192_npy.tar.gz
  3. In the ENEL folder, create a data directory and add a soft link to the uncompressed folder.
cd ENEL
mkdir data
ln -s /path/to/8192_npy data/objaverse_data
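To sanity-check the data format described in step 2, a short loading sketch (run from the ENEL folder; the Objaverse ID is a placeholder):

# Load one point cloud and verify the (8192, 6) xyz + rgb layout.
import numpy as np

pc = np.load("data/objaverse_data/<Objaverse_ID>_8192.npy")  # placeholder ID
assert pc.shape == (8192, 6)
xyz, rgb = pc[:, :3], pc[:, 3:]                              # xyz and rgb in [0, 1]
print("xyz bounds:", xyz.min(), xyz.max(), "| rgb bounds:", rgb.min(), rgb.max())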

Instruction-Following Data

  1. In ENEL/data folder, create a directory named anno_data.
  2. Our instruction-following data, including both the simple-description data and the complex instructions, can be downloaded here. If you have difficulty downloading the data (e.g. network issues), please email the authors.
  • The simple-description data contains 660K samples and the complex instructions contain 70K samples.
  • Both training sets are based on the Objaverse dataset.
  • The complex instructions are generated with GPT-4.
  3. Put the data files in the anno_data directory. The directory should look like this:
ENEL/data/anno_data
├── PointLLM_brief_description_660K_filtered.json
├── PointLLM_brief_description_660K.json
└── PointLLM_complex_instruction_70K.json
  4. Note: PointLLM_brief_description_660K_filtered.json is filtered from PointLLM_brief_description_660K.json by removing the 3,000 objects we reserved as the validation set.
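To verify the downloads, a schema-agnostic sketch that counts samples and prints the top-level keys of the first record (run from the ENEL folder; assumes each file is a JSON list of records, as in PointLLM):

# Inspect the instruction-following data without assuming a particular schema.
import json

for name in [
    "PointLLM_brief_description_660K_filtered.json",
    "PointLLM_brief_description_660K.json",
    "PointLLM_complex_instruction_70K.json",
]:
    with open(f"data/anno_data/{name}") as f:
        data = json.load(f)
    print(f"{name}: {len(data)} samples, first-record keys: {list(data[0].keys())}")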

Evaluation Data

  1. Download the reference ground truth PointLLM_brief_description_val_200_GT.json, which we use for the benchmarks on the Objaverse dataset, here, and put it in ENEL/data/anno_data.

Training

Download the Initial LLM Weight

  1. In ENEL folder, create a directory named checkpoints.
  2. Download the pre-trained LLM weights PointLLM_7B_v1.1_init and put them in the checkpoints directory.

Start Training

  1. For stage-1 training, simply run:
cd ENEL
scripts/ENEL_train_stage1.sh

📝 TODO List

  • Add training code for stage 1 with checkpoints.
  • Add evaluation and inference code.
  • Add training code for stage 2.

🔗 Citation

If you find our work and this codebase helpful, please consider starring this repo 🌟 and citing:

@misc{tang2025exploringpotentialencoderfreearchitectures,
      title={Exploring the Potential of Encoder-free Architectures in 3D LMMs}, 
      author={Yiwen Tang and Zoey Guo and Zhuhao Wang and Ray Zhang and Qizhi Chen and Junli Liu and Delin Qu and Zhigang Wang and Dong Wang and Xuelong Li and Bin Zhao},
      year={2025},
      eprint={2502.09620},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.09620}, 
}

📄 License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

👏 Acknowledgements

Our codebase builds upon PointLLM and uses the chamfer_dist implementation from Point-MAE; we thank the authors for their open-source contributions.
