
Exploring the Potential of Encoder-free Architectures in 3D LMMs

Official repository for the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs".

[📖 Paper] [🤗 HF Checkpoints for stage1]

🏠 About

[Figure: solution teaser]
We introduce ENEL, an Encoder-free 3D Large Language Model that addresses two key limitations of encoder-based architectures: the inability to adapt to varying point cloud resolutions, and the mismatch between encoder-extracted point features and the semantic needs of Large Language Models. Building upon PointLLM, we conduct a comprehensive investigation into how the LLM itself can assume the role of the 3D encoder. Trained on the PointLLM dataset, our 7B model is evaluated on three benchmark tasks: generative 3D object classification, 3D object captioning, and 3D VQA, with assessments performed using GPT-4 scoring and traditional metrics.

🔥 News

  • [2025-02-13] We release the training code for the pre-training stage with corresponding checkpoints, along with the evaluation code.
  • [2025-02-13] We release the paper of ENEL.

📋 Contents

  • 💬 Dialogue Examples
  • 🔍 Overview
  • 📦 Training and Evaluation
  • 📝 TODO List
  • 🔗 Citation
  • 📄 License
  • 👏 Acknowledgements

💬 Dialogue Examples

[Figure: Dialogue 1]

🔍 Overview

Model

The encoder-free 3D LMM directly utilizes a token embedding module to convert point cloud data into discrete point tokens, which are then concatenated with text tokens to serve as input to the LLM. To assume the role of the encoder, the LLM is guided to extract high-level semantic features of the point clouds and acquire multi-level knowledge from both global and local perspectives.
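As a purely illustrative sketch (not the repository's actual module), the snippet below shows how such a token embedding module could feed an LLM; the class name, dimensions, and mean-pooling grouping are all placeholder assumptions, and the real model uses the LLM's hidden size and its own embedding design.

# Illustrative only: a stand-in token embedding module for an encoder-free 3D LMM.
import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    def __init__(self, in_dim=6, dim=512, num_tokens=512):
        super().__init__()
        # A small point-wise MLP instead of a pre-trained 3D encoder.
        self.proj = nn.Sequential(nn.Linear(in_dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.num_tokens = num_tokens

    def forward(self, points):                       # points: (B, N, 6), xyz + rgb
        feats = self.proj(points)                    # (B, N, d) per-point features
        b, n, d = feats.shape
        group = n // self.num_tokens                 # points pooled into each token
        feats = feats[:, : group * self.num_tokens]  # drop the remainder, if any
        tokens = feats.view(b, self.num_tokens, group, d).mean(dim=2)
        return tokens                                # (B, T, d) discrete point tokens

# Point tokens are concatenated with text tokens to form the LLM input.
points = torch.rand(1, 8192, 6)                      # a point cloud, xyz + rgb in [0, 1]
text_emb = torch.rand(1, 32, 512)                    # stand-in text token embeddings
llm_input = torch.cat([TokenEmbedding()(points), text_emb], dim=1)  # (1, 544, 512)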

Experiment Results

Please refer to our paper for more results.

Notes on the Model Zoo

For the checkpoints at https://huggingface.co/IvanTang/ENEL/tree/main, adapt them to your local paths by modifying two attributes: _name_or_path in config.json and special_tokens_map_file in tokenizer_config.json.
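For example, a minimal Python sketch that patches both files, assuming the usual Hugging Face checkpoint layout (the local path below is a placeholder):

# Patch the checkpoint configs to point at your local paths (placeholder path below).
import json
from pathlib import Path

ckpt_dir = Path("/path/to/ENEL_checkpoint")  # placeholder: your local checkpoint folder

cfg_path = ckpt_dir / "config.json"
cfg = json.loads(cfg_path.read_text())
cfg["_name_or_path"] = str(ckpt_dir)
cfg_path.write_text(json.dumps(cfg, indent=2))

tok_path = ckpt_dir / "tokenizer_config.json"
tok = json.loads(tok_path.read_text())
tok["special_tokens_map_file"] = str(ckpt_dir / "special_tokens_map.json")
tok_path.write_text(json.dumps(tok, indent=2))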

📦 Training and Evaluation

Installation

To start:

  1. Clone this repository.
git clone https://github.com/Ivan-Tang-3D/ENEL.git
cd ENEL
  2. Install packages
conda create -n ENEL python=3.10 -y
conda activate ENEL
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

# * for training
pip install ninja
pip install flash-attn

# * for chamfer_dist
git clone https://github.com/Pang-Yatian/Point-MAE.git
cd Point-MAE/extensions/chamfer_dist
python setup.py install --user
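Optionally, a quick Python check that the training dependencies import cleanly; note that chamfer is the module name Point-MAE's setup.py registers, so treat it as an assumption:

# Sanity-check the environment after installation.
import torch
print("CUDA available:", torch.cuda.is_available())

import flash_attn  # needed for training
import chamfer     # built by Point-MAE's chamfer_dist (module name is an assumption)
print("flash-attn and chamfer_dist imported successfully")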

Data Preparation

Objaverse Training Data

  1. Download the two compressed files of 660K Objaverse colored point clouds here. They require about 77GB of storage space.
  2. Run the following command to merge the two files into one and uncompress it. This will produce a folder named 8192_npy containing 660K point cloud files named {Objaverse_ID}_8192.npy. Each file is a numpy array of shape (8192, 6), where the first three columns are xyz and the last three are rgb values in the [0, 1] range (see the loading sketch after this list).
cat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz
tar -xvf Objaverse_660K_8192_npy.tar.gz
  3. In the ENEL folder, create a data directory and add a soft link to the uncompressed folder.
cd ENEL
mkdir data
ln -s /path/to/8192_npy data/objaverse_data
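To sanity-check the data format described in step 2, a short loading sketch (run from the ENEL folder; the Objaverse ID is a placeholder):

# Load one point cloud and verify the (8192, 6) xyz + rgb layout.
import numpy as np

pc = np.load("data/objaverse_data/<Objaverse_ID>_8192.npy")  # placeholder ID
assert pc.shape == (8192, 6)
xyz, rgb = pc[:, :3], pc[:, 3:]                              # xyz and rgb in [0, 1]
print("xyz bounds:", xyz.min(), xyz.max(), "| rgb bounds:", rgb.min(), rgb.max())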

Instruction-Following Data

  1. In ENEL/data folder, create a directory named anno_data.
  2. Our instruction-following data, including both the simple-description data and the complex instructions, can be downloaded here. If you have difficulty downloading the data (e.g. network issues), please email the authors.
  • The simple-description data contains 660K samples and the complex instructions contain 70K samples.
  • Both training sets are based on the Objaverse dataset.
  • The complex instructions are generated with GPT-4.
  3. Put the data files in the anno_data directory. The directory should look like this:
ENEL/data/anno_data
├── PointLLM_brief_description_660K_filtered.json
├── PointLLM_brief_description_660K.json
└── PointLLM_complex_instruction_70K.json
  4. Note: PointLLM_brief_description_660K_filtered.json is filtered from PointLLM_brief_description_660K.json by removing the 3,000 objects we reserved as the validation set.
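To verify the downloads, a schema-agnostic sketch that counts samples and prints the top-level keys of the first record (run from the ENEL folder; assumes each file is a JSON list of records, as in PointLLM):

# Inspect the instruction-following data without assuming a particular schema.
import json

for name in [
    "PointLLM_brief_description_660K_filtered.json",
    "PointLLM_brief_description_660K.json",
    "PointLLM_complex_instruction_70K.json",
]:
    with open(f"data/anno_data/{name}") as f:
        data = json.load(f)
    print(f"{name}: {len(data)} samples, first-record keys: {list(data[0].keys())}")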

Evaluation Data

  1. Download the reference ground truth PointLLM_brief_description_val_200_GT.json, which we use for the benchmarks on the Objaverse dataset, here, and put it in ENEL/data/anno_data.

Training

Download the Initial LLM Weight

  1. In ENEL folder, create a directory named checkpoints.
  2. Download the pre-trained LLM weights PointLLM_7B_v1.1_init and put them in the checkpoints directory.

Start Training

  1. For stage-1 training, simply run:
cd ENEL
scripts/ENEL_train_stage1.sh

📝 TODO List

  • Add training code for stage 1 with checkpoints.
  • Add evaluation and inference code.
  • Add training code for stage 2.

🔗 Citation

If you find our work and this codebase helpful, please consider starring this repo 🌟 and citing:

@misc{tang2025exploringpotentialencoderfreearchitectures,
      title={Exploring the Potential of Encoder-free Architectures in 3D LMMs}, 
      author={Yiwen Tang and Zoey Guo and Zhuhao Wang and Ray Zhang and Qizhi Chen and Junli Liu and Delin Qu and Zhigang Wang and Dong Wang and Xuelong Li and Bin Zhao},
      year={2025},
      eprint={2502.09620},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.09620}, 
}

📄 License

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

👏 Acknowledgements

Our codebase builds upon PointLLM and uses the chamfer_dist implementation from Point-MAE; we thank the authors for their open-source contributions.
