coling18-multimodalSurvey

This repository contains additional information on the analyses in our COLING 2018 survey "Multimodal Grounding for Language Processing".

Abstract: This survey discusses how recent developments in multimodal processing facilitate conceptual grounding of language. We categorize the information flow in multimodal processing with respect to cognitive models of human information processing and analyze different methods for combining multimodal representations. Based on this methodological inventory, we discuss the benefit of multimodal grounding for a variety of language processing tasks and the challenges that arise. We particularly focus on multimodal grounding of verbs, which play a crucial role for multimodal compositionality.

Please use the following citation:

@InProceedings{beinborn2018multimodal,
  title = {{Multimodal Grounding for Language Processing}},
  author = {Beinborn, Lisa and Botschen, Teresa and Gurevych, Iryna},
  publisher = {Association for Computational Linguistics},
  booktitle = {Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics: Technical Papers},
  pages = {to appear},
  month = {aug},
  year = {2018},
  location = {Santa Fe, USA},
}

Contact person: Teresa Botschen (botschen@aiphes.tu-darmstadt.de), Lisa Beinborn (lisa.beinborn@uni-due.de)

https://www.ukp.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Analyses

We present first steps towards an investigation of verb grounding and analyze the quality of verb representations in the most common publicly available approaches for multimodal representations.

Data

This project uses several pretrained word embeddings, which can be found here: https://fileserver.ukp.informatik.tu-darmstadt.de/coling18-multimodalSurvey
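
For orientation, here is a minimal sketch of loading one of these embedding files, assuming the common whitespace-separated text format (one word followed by its vector per line); the exact file names on the server may differ:

```python
import numpy as np

def load_embeddings(path):
    """Read a text embedding file into a {word: vector} dict."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# hypothetical file name, for illustration only
glove = load_embeddings("glove.300d.txt")
```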

Code

Run the script 'coling18-multimodalSurvey_experiments.py' to reproduce the results reported in the paper step by step. Use the datasets and embeddings mentioned below.

Resources

We used the following resources to create multimodal representations.

Visual resources

Google dataset with existing image embeddings as provided by Kiela et al. (2016) (paper: https://aclweb.org/anthology/D/D16/D16-1043.pdf, data: http://www.cl.cam.ac.uk/~dk427/cnnexpts.html). The images in the Google dataset were obtained via Google image search; the embeddings were computed with GoogLeNet.

visual Google representations: 1024 dim

imSitu dataset by Yatskar et al. (2016) (paper: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Yatskar_Situation_Recognition_Visual_CVPR_2016_paper.pdf, data: http://imsitu.org/)

We obtained the embeddings by applying a pretrained VGG19 network for image classification (Simonyan and Zisserman (2014), paper: https://arxiv.org/pdf/1409.1556.pdf) to the visual resources; a sketch follows below.

visual imSitu representations: 4096 dim
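
The paper does not pin down the extraction framework, so the following is only a sketch of how such 4096-dim VGG19 features could be computed, here using torchvision as a stand-in; the layer choice (the penultimate fully-connected layer) matches the 4096-dim size reported above:

```python
import torch
from torchvision import models, transforms
from PIL import Image

vgg = models.vgg19(pretrained=True)
# drop the final 1000-way classification layer -> 4096-dim output
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:-1])
vgg.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def image_embedding(path):
    """Return the 4096-dim VGG19 feature vector for one image."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return vgg(x).squeeze(0).numpy()
```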

Textual resources

GloVe embeddings by Pennington et al. (2014) (paper: http://aclweb.org/anthology/D14-1162)

GloVe representations: 300 dim

Mapped representations

We mapped from textual to visual embeddings by applying the mapping method of Collell et al. (2017) (paper: http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14811/14042).

Mapped embeddings for Google dataset: 1024 dim

Mapped embeddings for imSitu dataset: 4096 dim
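
Collell et al. (2017) learn a mapping from the textual to the visual space on words that have both representations, and then predict ("imagine") visual vectors for the remaining words. A minimal sketch of the linear variant (they also report a perceptron variant), assuming the {word: vector} dictionaries from the loader above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_mapping(textual, visual):
    """Learn a linear text -> visual mapping on the shared vocabulary."""
    shared = sorted(set(textual) & set(visual))
    X = np.stack([textual[w] for w in shared])  # (n, 300)
    Y = np.stack([visual[w] for w in shared])   # (n, 1024) or (n, 4096)
    return LinearRegression().fit(X, Y)

# mapped visual vector for any word with a GloVe entry, e.g.:
# mapping = fit_mapping(glove, google_visual)
# mapped_vec = mapping.predict(glove["jump"].reshape(1, -1))[0]
```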

Verb annotations

In line with previous work, the quality of the representations is evaluated as the Spearman correlation between the cosine similarity of two verb embeddings and the corresponding similarity rating in the SimVerb dataset. We evaluate the representations on 3498 verb pairs.

SimVerb dataset by Gerz et al. (2016) (paper: http://www.aclweb.org/anthology/D16-1235)
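
A minimal sketch of this evaluation, assuming SimVerb pairs as (verb1, verb2, gold_score) triples and skipping pairs not covered by the embedding vocabulary:

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def evaluate(embeddings, simverb_pairs):
    """Spearman correlation of predicted vs. gold verb-pair similarities."""
    predicted, gold = [], []
    for v1, v2, score in simverb_pairs:
        if v1 in embeddings and v2 in embeddings:
            predicted.append(cosine(embeddings[v1], embeddings[v2]))
            gold.append(score)
    return spearmanr(predicted, gold).correlation
```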

From a multimodal perspective, verbs can be categorized according to their degree of embodiment. This measure indicates to which extent a verb's meaning involves bodily experience. We obtain embodiment ratings for 1163 pairs. The class 'high embodiment' contains pairs like 'fall-dive', in which the embodiment ratings of both verbs fall into the highest quartile (135 pairs); the class 'low embodiment' contains pairs like 'know-decide', with embodiment ratings of both verbs in the lowest quartile (81 pairs).

Embodiment ratings for verbs by Sidhu et al. (2014) (paper: http://iranarze.ir/wp-content/uploads/2017/08/7298-English-IranArze.pdf)
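
A sketch of the quartile split described above, assuming a {verb: embodiment_rating} dictionary built from the Sidhu et al. (2014) norms (the resulting pair counts depend on rating coverage):

```python
import numpy as np

def embodiment_classes(pairs, ratings):
    """Split verb pairs into high/low embodiment classes by quartile."""
    values = np.array(list(ratings.values()))
    q1, q3 = np.percentile(values, [25, 75])
    covered = [(a, b) for a, b in pairs if a in ratings and b in ratings]
    high = [(a, b) for a, b in covered
            if ratings[a] >= q3 and ratings[b] >= q3]
    low = [(a, b) for a, b in covered
           if ratings[a] <= q1 and ratings[b] <= q1]
    return high, low
```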
