This is the dataset builder for Tonguescape, introduced in "Tonguescape: Exploring Language Models Understanding of Vowel Articulation" (NAACL 2025).
The dataset is built by segmenting rtMRIDB recordings by phoneme. The phoneme definitions are those of the original database.

rmri-extract/
: The directory for extracting the base of this dataset from rtMRIDB

scripts/
: The directory for the dataset builder scripts (e.g., removing audio or overlaying shapes)

data/
: The directory for the dataset index

assets/
: The directory for the dataset annotations