subgenus = Incertae sedis then name string doesn't parse, also strange looking quality values #277

debpaul · 2025-01-22T20:36:48Z

Raw data (unparsed): beulah-first-5000-name-strings-unparsed.csv

Modified GNParsed Data Set: beulath-taxonnames-gnparsed-first-5000-rows.txt

- added a family column, value = Carabidae
- opened the file in Notepad++
- changed CRLF line endings to Unix (LF), because upload to TW batch requires this
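The two manual steps above (adding a constant family column and switching to LF line endings) can also be scripted. This avoids opening the file in an editor or spreadsheet that might alter values. The sketch below is a minimal example, assuming the gnparser output is comma-delimited with a header row. The function name `add_constant_column` is hypothetical and not part of gnparser or TaxonWorks.

```python
import csv


def add_constant_column(in_path, out_path, column, value, delimiter=","):
    """Append a column holding the same value on every row and write LF line endings.

    Hypothetical helper: assumes the first row of the input is a header row.
    """
    # newline="" lets the csv module handle CRLF or LF input line endings itself
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = list(csv.reader(src, delimiter=delimiter))
    if not rows:
        return
    header, body = rows[0], rows[1:]
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        # lineterminator="\n" forces Unix (LF) endings instead of the csv default "\r\n"
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        writer.writerow(header + [column])
        for row in body:
            writer.writerow(row + [value])
```

For example, `add_constant_column("parsed.csv", "parsed-lf.csv", "family", "Carabidae")` would produce the batch-ready file. All values are copied as text, so nothing is reinterpreted as a number along the way.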

Noticed:

- The Quality values look strange. Maybe on import into Excel I need to select a certain data type for this field?
- See also line 11 above, where the value pseudoflavipes appears changed to pseudoflavipe0s in the CanonicalFull column (also lines 116, 117).
  - I don't know where that 0 comes from.
- See also the Author Year values with leading and trailing 0s. Not sure where they are coming from either.
- More 0 issues (and a delimiter issue?), origin uncertain.
- Some names did not parse (not sure why). See the screenshot next. Maybe this is because all these names have subgenus = (Incertae sedis) and GN doesn't recognize this value at this rank?
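Until names with an "(Incertae sedis)" subgenus are handled by the parser, one possible stopgap is to strip that placeholder before parsing. The remaining genus, species, and authorship can then be processed. The sketch below is a hypothetical pre-processing step, not a gnparser feature. It does change the name string, so the original verbatim string should be kept alongside the cleaned one.

```python
import re

# Hypothetical workaround: match a parenthesised "Incertae sedis" placeholder,
# including the whitespace before it, in any letter case.
INCERTAE_SEDIS = re.compile(r"\s*\(\s*incertae\s+sedis\s*\)", re.IGNORECASE)


def strip_incertae_sedis(name):
    """Return the name string with any "(Incertae sedis)" subgenus placeholder removed."""
    return INCERTAE_SEDIS.sub("", name)


print(strip_incertae_sedis("Abacetus (Incertae sedis) artus Andrewes, 1942"))
```

Running this prints `Abacetus artus Andrewes, 1942`. Names without the placeholder pass through unchanged.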

In general, subgenus is missing from all parsed values.

Maybe in the future: an option to parse (further atomize) down to the lowest rank provided?


dimus · 2025-01-22T21:39:46Z

Thanks @debpaul, interesting.

- It looks like I am missing the case where the subgenus is Incertae sedis. I do agree that names like these should be parsed. I will make a separate issue about it.
- The strange Quality results are an artefact of postprocessing; it is impossible to get a quality of 10. The '0' in the middle of the canonical form also seems to be a postprocessing problem. Try running this name by itself in the parser.
- Subgenus is provided, just not in the CSV format. If you pick the JSON format in the web UI, you will see the subgenus results.

debpaul · 2025-01-22T21:50:04Z

@dimus thanks! I did notice that on import, Excel asks about modifying or removing leading zeroes. Not sure why. I told it not to modify the data. I'll test again as you suggest.
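One way to tell whether odd values come from the parser or from a spreadsheet round trip is to check the file with a tool that reads every field as plain text. The sketch below flags rows whose Quality is outside the range 0 to 4 and rows whose CanonicalFull contains a digit, as in "pseudoflavipe0s". It assumes gnparser's CSV column names `Quality` and `CanonicalFull` and a quality scale where 0 means "not parsed" and 1 to 4 mean increasingly problematic parses. Both should be checked against the gnparser documentation. The function name is hypothetical.

```python
import csv
import re

# Assumed gnparser quality scale: 0 = not parsed, 1-4 = parsed with increasing problems.
VALID_QUALITY = {"0", "1", "2", "3", "4"}
DIGIT = re.compile(r"\d")


def suspicious_lines(path, delimiter=","):
    """Return source line numbers of rows whose Quality or CanonicalFull looks damaged."""
    flagged = []
    # The csv module returns every field as a string, so nothing is coerced into a number.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        for row in reader:
            quality = (row.get("Quality") or "").strip()
            canonical = row.get("CanonicalFull") or ""
            if quality not in VALID_QUALITY or DIGIT.search(canonical):
                # line_num counts physical lines read so far, header included
                flagged.append(reader.line_num)
    return flagged
```

Running `suspicious_lines` on the file straight from gnparser, and again after it has passed through Excel, shows whether the extra zeros were introduced by the import.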

dimus · 2025-01-22T21:58:53Z

This is what I get without preprocessing:

beulah-parsed.txt

@debpaul, can you also try LibreOffice? It consistently gives me better results than Excel.

dimus mentioned this issue Jan 22, 2025

Names like "Abacetus (Incertae sedis) artus Andrewes, 1942" are not parsed #278

Open
