Exact authorship match with multiple authorship #126

BenMerSci · 2024-11-25T19:15:49Z

I am not sure if this is normal behaviour or not and so I just want to confirm.

The plant Agrostis tenuis isn't valid as per the VASCAN database, and it links to two different species depending on the authorship attached to Agrostis tenuis (https://data.canadensys.net/vascan/name/Agrostis%20tenuis):

Agrostis tenuis Sibthorp links to Agrostis capillaris Linnaeus
and
Agrostis tenuis Vasey links to Agrostis idahoensis Nash

When we query specifically for Agrostis tenuis Sibthorp from VASCAN:
https://verifier.globalnames.org/api/v1/verifications/Agrostis+tenuis+Sibthorp?capitalize=true&all_matches=true&data_sources=147

we get two results, one for Agrostis capillaris and another for Agrostis idahoensis.
I would have expected only one result since the authorship is provided and is supposed to only link to Agrostis capillaris Linnaeus.
Is this normal behaviour?
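
For reference, the query above can be rebuilt programmatically. This is only a minimal sketch: the helper name `verification_url` is hypothetical, and the only parameters used are the ones visible in the URL above (`capitalize`, `all_matches`, `data_sources`).

```python
from urllib.parse import quote_plus, urlencode

BASE_URL = "https://verifier.globalnames.org/api/v1/verifications/"


def verification_url(name, data_source, all_matches=False, capitalize=True):
    """Build the GET URL for one name against one data source.

    Hypothetical helper: it only mirrors the parameters seen in the
    example URL and does not send any request.
    """
    params = {
        "capitalize": str(capitalize).lower(),
        "all_matches": str(all_matches).lower(),
        "data_sources": data_source,
    }
    # quote_plus turns spaces into "+", as in the example URL.
    return BASE_URL + quote_plus(name) + "?" + urlencode(params)


url = verification_url("Agrostis tenuis Sibthorp", 147, all_matches=True)
```

Passing `all_matches=False` gives the variant of the query that asks only for the best result.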


dimus · 2024-11-25T19:53:11Z

We use authorship matching for sorting of returned results. So Agrostis capillaris Linnaeus should always be the first 'best' result. However, when 'all-matches' is set to true, all results are visible. We assume all-matches is for people who want to see everything.

For example here we see only 'best' results for each name, and they look as expected:

https://verifier.globalnames.org/?capitalize=on&ds=147&format=html&names=Agrostis+tenuis+Vasey%0D%0AAgrostis+tenuis+Sibthorp%0D%0A

It also manages to figure out the 'best' result if authors are abbreviated:

https://verifier.globalnames.org/?capitalize=on&ds=147&format=html&names=Agrostis+tenuis+Vasey%0D%0AAgrostis+tenuis+Sibthorp%0D%0AAgrostis+tenuis+Vas.%0D%0AAgrostis+tenuis+Sibth.%0D%0A

I was thinking of adding another 'all-matches' option that would return the best match for each data-source, but was not sure whether it would be confusing or helpful.

BenMerSci · 2024-11-25T20:15:39Z

Got it!
I'm just trying to find a way to generalize all my queries, so that taxons with or without authorship, and taxons with an incorrect (fuzzy) spelling, still return the matches I need to resolve the taxonomy.

Here in my example with Agrostis tenuis, if I have the authorship (whether Sibthorp or Vasey), it works if I remove the argument all_matches=true, because it then returns only the match with the correct authorship, and I'm not "interested" in the other match since it's a "bad" match (wrong species).
But if we only have, let's say, Agrostis tenuis without authorship, or even a name written incorrectly (fuzzy), we could want all the possible matches so we can decide afterward.

I guess that for some taxons I would want all the matches and for others not, but I can't know in advance (querying for a lot of different taxons)...

dimus · 2024-11-26T12:21:55Z

Would it work for you to pick the first result for each data-source from the returned list, if you do want to keep all-matches for all your queries? Results are not grouped by data-source, only sorted by the quality algorithm, but the first result for each data-source is always the 'best' result for that data-source.

https://verifier.globalnames.org/?all_matches=on&capitalize=on&ds=147&ds=197&ds=196&format=html&names=Agrostis+tenuis+Vasey%0D%0AAgrostis+tenuis+Vas.%0D%0AAgrostis+tenuis+Sibthorp%0D%0AAgrostis+tenuis+Sib.%0D%0AAgrostis+tenuis%0D%0A
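
A rough sketch of that "first result per data-source" idea, assuming the all-matches results have already been parsed into a list of dicts in the order the API returned them. The function name is hypothetical, the `dataSourceId` and `currentName` keys are assumptions about the JSON output, and the sample records are made up for illustration.

```python
def best_per_data_source(results):
    """Return the first result seen for each data source, keeping list order."""
    best = {}
    for result in results:
        # setdefault only stores a value the first time a key is seen,
        # so later (lower-ranked) results for the same source are ignored.
        best.setdefault(result["dataSourceId"], result)
    return list(best.values())


# Made-up, already-sorted results for illustration only.
sample = [
    {"dataSourceId": 147, "currentName": "Agrostis capillaris Linnaeus"},
    {"dataSourceId": 197, "currentName": "placeholder record"},
    {"dataSourceId": 147, "currentName": "Agrostis idahoensis Nash"},
]
picked = best_per_data_source(sample)
```

This relies entirely on the ordering described above: if the list were re-sorted before the call, the first entry per data-source would no longer be the 'best' one.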

dimus · 2024-11-26T12:38:10Z

Maybe I do need to add a flag 'best-by-data-source' or something of this sort?

BenMerSci · 2024-11-26T14:23:39Z

The thing is, we always want to keep all the results for all the data sources in the query, to have the synonyms, or in case the taxon we queried for has a fuzzy spelling, etc., unless the result is simply "wrong", as in my example (Agrostis tenuis Sibthorp matching to Agrostis idahoensis Nash is wrong based on VASCAN).

But we can't know before querying that the taxon may have a "wrong" match and that we should use only the best match.
I don't know if that makes sense?

dimus · 2024-11-26T15:55:18Z

I can imagine 2 things that might help:

1. Looking at ScoreDetails->AuthorMatchScore. If it is zero, the authors did not match at all; if it is above zero but less than 0.3, one or both authorships were absent; anything higher means the authors matched to some degree.
2. Preparsing names and running only the names without authorship with the all-matches option.

https://parser.globalnames.org/?code=&format=csv&names=Agrostis+tenuis+Vasey%0D%0AAgrostis+tenuis&with_details=on
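
The score thresholds mentioned above could be applied with something like the sketch below. The function name and the returned labels are hypothetical. Treating a score of exactly 0.3 as a match is also an assumption, since the comment only says "less than 0.3" and "everything higher".

```python
def author_match_status(author_match_score):
    """Classify an AuthorMatchScore using the thresholds from the comment above.

    Assumption: a score of exactly 0.3 is counted as a match.
    """
    if author_match_score == 0:
        return "mismatch"  # authors did not match at all
    if author_match_score < 0.3:
        return "authorship absent"  # one or both authorships missing
    return "match"  # authors matched to some degree
```

A result classified as "mismatch" would be the kind of "wrong" match discussed earlier, such as Agrostis tenuis Sibthorp resolving to Agrostis idahoensis Nash.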

BenMerSci · 2024-11-26T16:04:17Z

Yes but we want the matches for all the datasources queried, even for matches with an authorship.

I think we'll manage to work around this on our side after the query, using the authorship and another field that we have (parent_scientific_name, which is either the kingdom or the phylum): we will go through the results returned by the API with all_matches=true and keep the ones where the authorship matches and our parent_scientific_name appears in the classificationPath key.
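
A possible shape for that post-query filter, as a sketch only. It assumes each result is a dict with a `scoreDetails.authorMatchScore` number and a pipe-delimited `classificationPath` string. The function names, the 0.3 cut-off (borrowed from the comment above) and the sample records are all illustrative assumptions.

```python
def keep_result(result, parent_scientific_name, min_author_score=0.3):
    """True if the authorship matched and the parent taxon is in the path."""
    score = result.get("scoreDetails", {}).get("authorMatchScore", 0.0)
    path = result.get("classificationPath", "")
    # Assumption: classificationPath looks like "Plantae|...|Agrostis".
    taxa = {part.strip() for part in path.split("|") if part.strip()}
    return score >= min_author_score and parent_scientific_name in taxa


def filter_results(results, parent_scientific_name):
    """Keep only the results that pass keep_result, preserving order."""
    return [r for r in results if keep_result(r, parent_scientific_name)]


# Made-up records for illustration only.
sample = [
    {"currentName": "Agrostis capillaris Linnaeus",
     "classificationPath": "Plantae|Poaceae|Agrostis",
     "scoreDetails": {"authorMatchScore": 1.0}},
    {"currentName": "Agrostis idahoensis Nash",
     "classificationPath": "Plantae|Poaceae|Agrostis",
     "scoreDetails": {"authorMatchScore": 0.0}},
]
kept = filter_results(sample, "Plantae")
```

With this cut-off, results whose authorship is simply absent would also be dropped, so names queried without authorship would need a lower `min_author_score` or a separate code path.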

I think we can close the issue.
Again, thank you @dimus for taking the time to go through this!

dimus · 2024-11-26T17:27:38Z

#127

dimus closed this as completed Nov 26, 2024
