This repository contains the code and datasets for our AAAI 2023 paper Visually Grounded Commonsense Knowledge Acquisition.
In this work, we propose to formulate Commonsense Knowledge Extraction (CKE) as a distantly supervised multi-instance learning problem. Given an entity pair (such as person-bottle) and a bag of associated images, our model first understands the entity interactions in each image, and then selects the informative ones to summarize the commonsense relations. We present CLEVER, a dedicated CKE framework that integrates vision-language pre-trained (VLP) models with contrastive attention to handle complex commonsense relation learning. You can find more details in our paper.
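For intuition, the sketch below shows plain attention pooling over a bag of images in PyTorch. It is a simplified stand-in, not CLEVER's actual implementation: the learnable query, layer shapes, and example dimensions are illustrative assumptions, and the paper's contrastive attention further contrasts informative instances against uninformative ones when learning the scores.

import torch
import torch.nn as nn

class AttentionBagAggregator(nn.Module):
    # Simplified sketch: score each image of a bag for an entity pair,
    # then attention-pool instance features into bag-level relation logits.
    def __init__(self, feat_dim: int, num_relations: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(feat_dim))  # learnable scoring query (assumption)
        self.classifier = nn.Linear(feat_dim, num_relations)

    def forward(self, instance_feats: torch.Tensor) -> torch.Tensor:
        # instance_feats: (num_images, feat_dim) entity-pair features from a VLP encoder
        scores = instance_feats @ self.query   # informativeness score per image
        attn = torch.softmax(scores, dim=0)    # down-weights noisy, uninformative images
        bag_feat = attn @ instance_feats       # weighted bag summary, shape (feat_dim,)
        return self.classifier(bag_feat)       # bag-level relation logits

# Example: a bag of 8 images with 768-d features and 20 candidate relations.
logits = AttentionBagAggregator(768, 20)(torch.randn(8, 768))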
Check INSTALL.md for installation instructions.
Check DATASET.md for data preparation.
# Prepare the dataset according to the 'Data Preparation' section
cd src/Oscar
bash train.sh
We directly use RTP to extract triplets from Conceptual Captions, which contains more than 3 million image captions. Triplets are sorted by frequency for evaluation.
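As a rough illustration of the frequency ranking (the input format and helper name here are assumptions; the real pipeline first parses captions with RTP), triplets can be counted and sorted like so:

from collections import Counter

def rank_triplets(triplets):
    # Sort (subject, predicate, object) triplets by corpus frequency, most frequent first.
    return Counter(triplets).most_common()

# Toy parsed triplets standing in for RTP output over Conceptual Captions.
parsed = [("person", "hold", "bottle"), ("dog", "on", "grass"),
          ("person", "hold", "bottle")]
for triplet, freq in rank_triplets(parsed):
    print(freq, triplet)  # 2 ('person', 'hold', 'bottle') / 1 ('dog', 'on', 'grass')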
# Vanilla-FT
cd src
python vanilla_ft.py
# LAMA and Prompt-FT
cd src
conda activate CLEVER_prompt_env # to resolve dependency conflicts
python prompt_ft.py
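For reference, LAMA-style probing and prompt-based fine-tuning cast relation prediction as masked-token filling. Below is a minimal sketch with Hugging Face Transformers; the template and the bert-base-uncased checkpoint are illustrative assumptions, not necessarily what prompt_ft.py uses:

from transformers import pipeline

# Probe a masked LM for the relation between an entity pair (hypothetical template).
fill = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill("A person can [MASK] a bottle."):
    print(pred["token_str"], round(pred["score"], 3))  # e.g. 'hold', 'open', ...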
cd Oscar
bash run_instance_pred_cls.sh
bash run_VRD_baseline.sh
You can download the commonsense knowledge triplets extracted by CLEVER on the test split from here. The data structure is:
[
    (subject, object, predicate, commonsense_confidence),
    ...
]
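Assuming the download is a pickled Python list in the tuple layout above (the filename and confidence threshold below are illustrative; adjust them to the released file), it can be loaded and filtered like this:

import pickle

# Hypothetical filename; substitute the actual downloaded file.
with open("clever_test_triplets.pkl", "rb") as f:
    triplets = pickle.load(f)

# Keep only high-confidence commonsense triplets (0.9 is an illustrative threshold).
confident = [(s, o, p, c) for (s, o, p, c) in triplets if c > 0.9]
print(len(confident), confident[:3])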
Please consider citing this paper if you use the code:
@inproceedings{yao2023clever,
  title={Visually Grounded Commonsense Knowledge Acquisition},
  author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Li, Mengdi and Xie, Ruobing and Weber, Cornelius and Liu, Zhiyuan and Zheng, Haitao and Wermter, Stefan and Chua, Tat-Seng and Sun, Maosong},
  booktitle={Proceedings of AAAI},
  year={2023}
}
CLEVER is released under the MIT license. See LICENSE for details.
Our implementation is based on the fantastic code of Oscar.