veld_data__apis_oebl__ner_gold

Randomized sentences from historical German biographies, containing named entities.

Source of data

The original data was extracted from the Austrian Biographical Lexicon (ÖBL) in the context of the Austrian Prosopographical Information System (APIS) project.

From there, samples were drawn at random and annotated for Named Entity Recognition (NER); these annotated samples form this dataset.

The texts comprise numerous short biographies covering the period from the 19th to the early 20th century in historical Austria-Hungary, and were produced by the Austrian Academy of Sciences between 1957 and 2023.

The language is highly condensed and contains many domain-specific abbreviations.

The contained NER tags are PER (Person), ORG (Organisation), LOC (Location).

Extracted and transformed from: https://gitlab.oeaw.ac.at/acdh-ch/apis/spacy-ner in the context of a VELD chain: https://github.com/steffres/veld_chain_6__apis_ner_transform_to_conll2003

In the original spacy-ner repo, several data sets are split into corpus / training / eval sets, scattered across folders with their respective NLP models, and spread across different formats (txt, json, pickle). To make this data reusable, it was extracted, harmonized, cleaned, deduplicated, and transformed into one consistent data source as json files, in which only the texts and the indices and tags of their named entities are persisted.
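As a rough illustration of what "texts plus the indices and tags of their named entities" can look like in practice, here is a minimal Python sketch. The record layout (keys `text` and `entities`, with each entity given as `[start, end, tag]` character offsets) and the sample sentences are assumptions made for this example; the actual key names and structure of the json files in this repository may differ, so check a file before relying on it. The helper names `deduplicate` and `entity_spans` are likewise hypothetical.

```python
import json

# Hypothetical records: the real json files may use different keys or nesting.
SAMPLE = json.loads("""
[
  {"text": "Er studierte in Wien.", "entities": [[16, 20, "LOC"]]},
  {"text": "Er studierte in Wien.", "entities": [[16, 20, "LOC"]]},
  {"text": "Mitglied der Akad. der Wiss.", "entities": [[13, 28, "ORG"]]}
]
""")

# The three tags named in this README.
ALLOWED_TAGS = {"PER", "ORG", "LOC"}


def deduplicate(records):
    """Drop records whose text and entity annotations are identical, keeping order."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["text"], tuple(tuple(ent) for ent in rec["entities"]))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def entity_spans(record):
    """Return (surface string, tag) pairs, validating offsets and tags."""
    text = record["text"]
    spans = []
    for start, end, tag in record["entities"]:
        if tag not in ALLOWED_TAGS:
            raise ValueError(f"unexpected tag: {tag!r}")
        if not 0 <= start < end <= len(text):
            raise ValueError(f"offsets out of range: {start}-{end}")
        spans.append((text[start:end], tag))
    return spans


if __name__ == "__main__":
    for rec in deduplicate(SAMPLE):
        print(rec["text"], entity_spans(rec))
```

Validating offsets against the text length before slicing is a cheap way to catch annotations that were shifted during cleaning or format conversion.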

