Official repository for the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs".
[📖 Paper] [🤗 HF Checkpoints for stage1]
We introduce ENEL, an Encoder-free 3D Large Language Model that overcomes key limitations of encoder-based architectures: the inability to adapt to varying point cloud resolutions, and the failure of encoder-extracted point features to meet the semantic needs of Large Language Models. Building upon PointLLM, we conduct a comprehensive investigation into how the LLM itself can assume the role of the 3D encoder. Trained on the PointLLM dataset, our 7B model is evaluated across three benchmark tasks: generative 3D object classification, 3D object captioning, and 3D VQA, with assessments performed using GPT-4 scoring and traditional metrics.
- [2025-02-13] We release the training code for the pre-training stage with corresponding checkpoints, along with the evaluation code.
- [2025-02-13] We release the ENEL paper.
- 💬 Dialogue Examples
- 🔍 Overview
- 📦 Training and Evaluation
- 📝 TODO List
- 🔗 Citation
- 📄 License
- 👏 Acknowledgements
*Dialogue 1 (example figure)*
Please refer to our paper for more results.
When using the checkpoints from https://huggingface.co/IvanTang/ENEL/tree/main, adapt them to your local paths by modifying two attributes: `_name_or_path` in `config.json` and `special_tokens_map_file` in `tokenizer_config.json`.
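As a convenience, here is a minimal sketch that rewrites both attributes in place (the checkpoint directory name and the `special_tokens_map.json` filename are assumptions; adjust them to your local layout):

```python
import json
from pathlib import Path

# Hypothetical local checkpoint directory; point this at your download.
ckpt = Path("checkpoints/ENEL_7B_stage1")

# Point _name_or_path at the local checkpoint directory.
cfg_path = ckpt / "config.json"
cfg = json.loads(cfg_path.read_text())
cfg["_name_or_path"] = str(ckpt)
cfg_path.write_text(json.dumps(cfg, indent=2))

# Point special_tokens_map_file at the local special tokens map.
tok_path = ckpt / "tokenizer_config.json"
tok = json.loads(tok_path.read_text())
tok["special_tokens_map_file"] = str(ckpt / "special_tokens_map.json")
tok_path.write_text(json.dumps(tok, indent=2))
```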
To start:
- Clone this repository.
```bash
git clone https://github.com/Ivan-Tang-3D/ENEL.git
cd ENEL
```
- Install packages
```bash
conda create -n ENEL python=3.10 -y
conda activate ENEL
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

# * for training
pip install ninja
pip install flash-attn

# * for chamfer_dist
git clone https://github.com/Pang-Yatian/Point-MAE.git
cd Point-MAE/extensions/chamfer_dist
python setup.py install --user
```
- Download the two compressed files of 660K Objaverse colored point clouds here. They require about 77GB of storage space.
- Run the following command to merge the two files into one and uncompress it. This will produce a folder named `8192_npy` containing 660K point cloud files named `{Objaverse_ID}_8192.npy`. Each file is a numpy array of shape (8192, 6), where the first three dimensions are `xyz` and the last three are `rgb` in the [0, 1] range (a quick sanity check follows the commands below).
```bash
cat Objaverse_660K_8192_npy_split_a* > Objaverse_660K_8192_npy.tar.gz
tar -xvf Objaverse_660K_8192_npy.tar.gz
```
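To verify the download, here is a minimal sketch that loads one file and checks the layout described above (substitute a real Objaverse ID for the placeholder):

```python
import numpy as np

# Placeholder filename; substitute a real {Objaverse_ID} from the 8192_npy folder.
pc = np.load("8192_npy/{Objaverse_ID}_8192.npy")

assert pc.shape == (8192, 6)     # 8192 points, 6 channels
xyz, rgb = pc[:, :3], pc[:, 3:]  # first three: coordinates, last three: colors
assert 0.0 <= rgb.min() and rgb.max() <= 1.0  # colors are in [0, 1]
print("bounding box:", xyz.min(axis=0), xyz.max(axis=0))
```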
- In the `ENEL` folder, create a folder `data` and create a soft link to the uncompressed folder inside it (a quick file count check follows the commands below).
```bash
cd ENEL
mkdir data
ln -s /path/to/8192_npy data/objaverse_data
```
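If you want to confirm the link resolves and the data is complete, a small sketch (run from the `ENEL` folder):

```python
from pathlib import Path

data_dir = Path("data/objaverse_data")
print("resolves to:", data_dir.resolve())  # should be /path/to/8192_npy

# Count the point cloud files; expect roughly 660K.
n = sum(1 for _ in data_dir.glob("*_8192.npy"))
print(f"{n} point cloud files found")
```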
- In the `ENEL/data` folder, create a directory named `anno_data`.
- Our instruction-following data, including both the simple descriptions and the complex instructions, can be downloaded here. If you have difficulty downloading the data (e.g., network issues), please email the authors.
- The simple-description data has 660K samples and the complex instructions have 70K samples.
- Both training sets are based on the Objaverse dataset.
- The complex instructions are generated with GPT-4.
- Put the data files in the `anno_data` directory. The directory should look like this (a quick load check follows):

```
ENEL/data/anno_data
├── PointLLM_brief_description_660K_filtered.json
├── PointLLM_brief_description_660K.json
└── PointLLM_complex_instruction_70K.json
```
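To take a quick look at the annotations, a minimal sketch assuming each file is a JSON list of sample dicts (as in PointLLM's released data):

```python
import json

with open("data/anno_data/PointLLM_brief_description_660K_filtered.json") as f:
    samples = json.load(f)

print(len(samples))        # number of training samples
print(sorted(samples[0]))  # top-level keys of the first sample
```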
- Note: `PointLLM_brief_description_660K_filtered.json` is filtered from `PointLLM_brief_description_660K.json` by removing the 3,000 objects we reserved as the validation set.
- Download the reference GT file `PointLLM_brief_description_val_200_GT.json` that we use for the benchmarks on the Objaverse dataset here, and put it in `ENEL/data/anno_data`.
- In the `ENEL` folder, create a directory named `checkpoints`.
- Download the pre-trained LLM `PointLLM_7B_v1.1_init` and put it in the `checkpoints` directory.
- For stage-1 training, simply run:
```bash
cd ENEL
scripts/ENEL_train_stage1.sh
```
- Add training code for stage 1 with checkpoints.
- Add evaluation & inference code.
- Add training code for stage 2.
If you find our work and this codebase helpful, please consider starring this repo 🌟 and cite:
@misc{tang2025exploringpotentialencoderfreearchitectures,
title={Exploring the Potential of Encoder-free Architectures in 3D LMMs},
author={Yiwen Tang and Zoey Guo and Zhuhao Wang and Ray Zhang and Qizhi Chen and Junli Liu and Delin Qu and Zhigang Wang and Dong Wang and Xuelong Li and Bin Zhao},
year={2025},
eprint={2502.09620},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.09620},
}
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.