This repository includes the released FLD corpora.
See the entry-point repository for an overview of the whole FLD project.
- (NEW!) FLDx2 (Formal Logic Deduction Diverse), our newest and most advanced corpus, which substantially improves the reasoning capabilities of LLMs. Published alongside our NeurIPS 2024 paper.
- JFLD, the Japanese version of FLD, described here. Published alongside our LREC-COLING 2024 paper.
- The first FLD corpora, FLD (FLD.3) and FLD★ (FLD.4), published alongside our ICML 2023 paper.
    - Note that these are version 2.0, described in Appendix H of the paper.
First, install the datasets library:

```bash
pip install datasets
```
Then, you can load the FLD corpora as follows:
```python
from datasets import load_dataset

FLD = load_dataset('hitachi-nlp/FLDx2', name='default')
```
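As a quick sanity check, you can inspect the splits and look at a single example. This is a minimal sketch; the split name `train` is an assumption here, so check the printed `DatasetDict` for the actual split names:

```python
# Minimal sketch: inspect the loaded corpus.
# The split name 'train' is an assumption; print(FLD) shows the actual splits.
print(FLD)  # available splits and their sizes

example = FLD['train'][0]
print(sorted(example.keys()))  # fields available in each example
```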
A deduction example from our dataset is conceptually illustrated in the figure below:
That is, given a set of facts and a hypothesis, a model must generate a proof sequence and determine an answer marker (proved, disproved, or unknown).
The most important fields are the following (see the sketch after this list):

- `context` (or `facts` in later versions of the corpora): a set of facts.
- `hypothesis`: a hypothesis.
- `proofs`: gold proofs. Each proof consists of a series of logical steps derived from the facts, leading towards the hypothesis. Currently, each example has at most one proof.
- `world_assump_label`: an answer, which is either `PROVED`, `DISPROVED`, or `UNKNOWN`.
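For illustration, here is how you might read these fields from a single example. This is a sketch assuming the field names above; whether a given corpus version uses `context` or `facts` is best checked against the schema on the Hub:

```python
example = FLD['train'][0]

# Later corpus versions name the fact set 'facts'; earlier ones use 'context'.
facts = example.get('facts') or example.get('context')

print(facts)                          # the set of facts
print(example['hypothesis'])          # the hypothesis to be judged
print(example['proofs'])              # gold proofs (currently at most one per example)
print(example['world_assump_label'])  # 'PROVED', 'DISPROVED', or 'UNKNOWN'
```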
To train an LLM (a data-preparation sketch follows this list):

- Use `prompt_serial` as the prompt, which is the serialized representation of the facts and the hypothesis.
- Use `proof_serial` as the output to be generated, which is the serialized representation of the proof and the answer.
- Note that, for the FLDx2 corpus, `proof_serial` sometimes includes both the proof and the answer and sometimes only the answer, which works as a sort of data augmentation.
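As a concrete illustration, the sketch below turns examples into plain (input, output) text pairs for supervised fine-tuning. The pairing of `prompt_serial` with `proof_serial` follows the notes above; the surrounding code is an assumption, not the exact recipe used in the training repository:

```python
def to_training_pair(example):
    """Build one supervised fine-tuning pair from an FLDx2 example.

    - input:  'prompt_serial', the serialized facts and hypothesis.
    - output: 'proof_serial', the serialized proof and/or answer.
    """
    return {
        'input': example['prompt_serial'],
        'output': example['proof_serial'],
    }

train_pairs = [to_training_pair(ex) for ex in FLD['train']]
print(train_pairs[0]['input'][:200])  # peek at the first prompt
```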
For more details on training, see the training repository.
The actual schema can be viewed on the Hugging Face Hub.