Here we explain the data format for LogicNLG. Note that we pre-select a subset of columns beforehand to reduce irrelevant input and to keep the input within the model's length limit. The method used to link this subset of columns is described in the parser documentation.
The three files (train_lm, val_lm, test_lm) are used for training and testing all the models, in the following format:
{
    table_id: [
        [
            sent1,
            linked columns1,
            table title,
            template1
        ],
        [
            sent2,
            linked columns2,
            table title,
            template2
        ],
        ...
    ],
    table_id: [
        ...
    ]
}
The template sentence is generated using the entity linking file, which is not 100% accurate and may miss some numbers or entities. In addition, to speed up data loading, we preprocess the training file into train_lm_preprocessed.json, which appends the linearized table to each sentence.
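The per-table layout above can be traversed with a few lines of Python. The sketch below builds a miniature in-memory example in the documented format (the table id, sentence, linked columns, title, and template values are hypothetical) and flattens it into per-sentence tuples; with a real file you would replace `json.loads` with `json.load` on train_lm.json:

```python
import json

# Miniature, made-up example mimicking the train_lm.json layout:
# each table id maps to a list of [sentence, linked columns, title, template].
sample = json.loads("""
{
  "2-12345678-1.html.csv": [
    ["the team scored the most points in 2004",
     [0, 2],
     "team season results",
     "the team scored the most [ENT] in [ENT]"]
  ]
}
""")

def iter_examples(data):
    """Yield one (table_id, sentence, linked_columns, title, template) per entry."""
    for table_id, entries in data.items():
        for sent, cols, title, template in entries:
            yield table_id, sent, cols, title, template

examples = list(iter_examples(sample))
print(len(examples))  # 1
```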
The files val_lm_pos_neg.json and test_lm_pos_neg.json are used for adversarial evaluation, where each sentence is paired with an adversarial example containing a minor modification, to test the model's sensitivity to logic errors. The data is in the following format:
{
    table_id: [
        {
            pos: [
                sent1,
                linked columns1,
                table title,
                template1
            ],
            neg: [
                sent1-adv,
                linked columns1,
                table title,
                template1-adv
            ]
        },
        {
            pos: [
                sent2,
                linked columns2,
                table title,
                template2
            ],
            neg: [
                sent2-adv,
                linked columns2,
                table title,
                template2-adv
            ]
        },
        ...
    ],
    table_id: [
        {
            ...
        },
        {
            ...
        },
        ...
    ],
    ...
}
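The pos/neg pairing above is what an adversarial evaluation loop consumes: for each pair, a model should prefer the positive sentence over its minimally modified negative. A minimal traversal sketch, again on a hypothetical in-memory miniature of the format (all field values are made up):

```python
import json

# Made-up miniature of the *_lm_pos_neg.json layout; each pair holds a
# "pos" and "neg" entry of [sentence, linked columns, title, template].
sample = json.loads("""
{
  "2-12345678-1.html.csv": [
    {
      "pos": ["the team won 10 games", [1], "team season results", "the team won [ENT] games"],
      "neg": ["the team won 12 games", [1], "team season results", "the team won [ENT] games"]
    }
  ]
}
""")

def iter_pairs(data):
    """Yield (table_id, positive_entry, negative_entry) for each adversarial pair."""
    for table_id, table_pairs in data.items():
        for pair in table_pairs:
            yield table_id, pair["pos"], pair["neg"]

pairs = list(iter_pairs(sample))
```

A typical metric is then the fraction of pairs where the model assigns a higher score (e.g. sentence log-likelihood) to `pos` than to `neg`.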
vocab.json and full_vocab.json are for the Transformer model with the copy mechanism, freq_list.json and stop_words.json are for the entity linking model, and tabfact_bootstrap.json is for training the semantic parser.
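As an illustration of how a vocabulary file like vocab.json is typically consumed, the sketch below encodes tokens into ids with an out-of-vocabulary fallback. This assumes a token-to-id mapping with an `<unk>` entry; the actual schema of vocab.json in the repository may differ:

```python
import json

# Hypothetical token -> id vocabulary (not the real contents of vocab.json).
vocab = json.loads('{"<unk>": 0, "the": 1, "team": 2, "scored": 3}')

def encode(tokens, vocab):
    """Map tokens to integer ids, falling back to the <unk> id for OOV words."""
    unk = vocab["<unk>"]
    return [vocab.get(t, unk) for t in tokens]

ids = encode(["the", "team", "won"], vocab)
print(ids)  # [1, 2, 0]
```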