
FAIR Principles Hierarchy - discussion requested #34

Open · allysonlister opened this issue Feb 11, 2025 · 6 comments

@allysonlister

FAIRsharing is where OSTrails Dimensions (aka Principles) are registered.
We have already had an offline discussion about what the 'homepage' of such records should be (the IRIs from https://peta-pico.github.io/FAIR-nanopubs/principles/index-en.html).

I'm now determining what hierarchy should look like for the FAIR principles prior to creating the records in production. I would really appreciate your opinions here, please, especially @markwilkinson @pabloalarconm @dgarijo . I also cc @SusannaSansone and @knirirr so that they are aware of this discussion.

We'll start with a list of the FAIR principles as published, which is quite flat; this is also how they are represented in e.g. the GO FAIR Foundation webpages, the vocabulary linked above, etc.:

FAIR Principles record in FAIRsharing: https://doi.org/10.25504/FAIRsharing.WWI10U
F

  • F1. (meta)data are assigned a globally unique and persistent identifier (*, FUJI)
  • F2. data are described with rich metadata (defined by R1 below)
  • F3. metadata clearly and explicitly include the identifier of the data it describes
  • F4. (meta)data are registered or indexed in a searchable resource

A

  • A1. (meta)data are retrievable by their identifier using a standardized communications protocol
    • A1.1 the protocol is open, free, and universally implementable (*)
    • A1.2 the protocol allows for an authentication and authorization procedure, where necessary
  • A2. metadata are accessible, even when the data are no longer available

I

  • I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. (*)
  • I2. (meta)data use vocabularies that follow FAIR principles
  • I3. (meta)data include qualified references to other (meta)data

R

  • R1. meta(data) are richly described with a plurality of accurate and relevant attributes
    • R1.1. (meta)data are released with a clear and accessible data usage license
    • R1.2. (meta)data are associated with detailed provenance
    • R1.3. (meta)data meet domain-relevant community standards

Thought 1: (Meta)data. Do metadata and data ever need to be tested separately, and therefore need separate sub-principles?

In multiple locations, the FAIR principles describe both metadata and data as separate concepts within a single sub-principle.

In the "yes" camp, we would say that, for instance, metadata and data can often be licenced separately (R1.1); in theory we want to have a way to say such things separately, e.g. R1.1A (for metadata's usage licence) and R1.1B (for data's usage licence).

In the "no" camp, what is gained by having separate sub-principles for metadata licencing and data licencing - wouldn't the metric and test look the same? But e.g. would a community benchmark wish to explicitly link to different metrics/requirements for metadata vs data licencing?

Thought 2: Compound Principles. Further sub-principles beyond what has been divided so far

Irrespective of the '(meta)data' question, there are three sub-principles that could and perhaps should be further subdivided, marked with a (*) in the above list. Take 'F1. (meta)data are assigned a globally unique and persistent identifier'. In OSTrails, I would need to separate this into two concepts: the concept of globally unique identifiers and the concept of persistent identifiers. If we're also considering Thought 1, we'd have 4 permutations of sub-principles: globally unique id for metadata, globally unique id for data, and the same for persistent ids. Certainly other developers have seen this issue; FUJI has also further split sub-principles (marked with FUJI in the above list).
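
To make the combinatorics concrete, here is a minimal sketch of what F1 might look like if both Thought 1 and Thought 2 were applied; the F1A-F1D labels are invented for illustration only and are not registered sub-principles anywhere:

```python
# Hypothetical expansion of F1 under Thought 1 + Thought 2.
# The F1A-F1D labels are invented for this illustration only.
f1_expanded = {
    "F1": {
        "F1A": "metadata are assigned a globally unique identifier",
        "F1B": "data are assigned a globally unique identifier",
        "F1C": "metadata are assigned a persistent identifier",
        "F1D": "data are assigned a persistent identifier",
    }
}
```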

Thought 3: A sub-principle for F, A, I and R

Do we want essentially container sub-principles for F, A, I and R? This would allow us to e.g. usefully query to discover 'all metrics related to I', or similar. Or do we want to go straight from the FAIR Principles record to the individual sub-principles such as F1?
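
Purely as an illustration of what a container record would buy us, here is a toy traversal over a hypothetical parent/child structure; the field names and metric ids below are made up and do not reflect the FAIRsharing data model or API:

```python
# Toy parent/child links between principle records and the metrics
# registered against them; invented for illustration only.
principles = {
    "I":  {"parent": "FAIR", "metrics": []},
    "I1": {"parent": "I", "metrics": ["example-metric-i1"]},
    "I2": {"parent": "I", "metrics": ["example-metric-i2"]},
    "I3": {"parent": "I", "metrics": ["example-metric-i3"]},
}

def metrics_under(principle_id):
    """Collect metrics attached to a principle and to all of its descendants."""
    found = list(principles[principle_id]["metrics"])
    for child_id, record in principles.items():
        if record["parent"] == principle_id:
            found.extend(metrics_under(child_id))
    return found

print(metrics_under("I"))  # 'all metrics related to I'
```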

What do I advise?

If we fully expand all of these according to all of the above thoughts, we run the risk of creating an unwieldy number of sub-principles. Here's my advice:

  • Do not implement Thought 1 - for now. However, if we have a distinct use case, I will change my vote! This advice is mostly for simplicity. If necessary, metrics can state whether they are testing the metadata or the data aspect, and multiple metrics can point to the same principle. It would be easy enough to add additional child principles at a later date if required.
  • We SHOULD implement Thought 2 for any compound principles. What would we use for the homepages, though?
  • I am agnostic about Thought 3, except that a grouping principle record, e.g. a record for 'F', is likely to be useful, so I lean towards its implementation. However, what would we use for the homepages?

Associated questions:

  • Remind me, can metrics only link to 'leaf' sub-principles? Might any metric wish to link to e.g. R1 rather than R1.2? IIRC metrics would only link to leaf principles, yes?
  • What happens with connected sub-principles? Will any metrics attempt to refer to both R1.3 and F2, which have explicit links to each other? How would this work in practice within OSTrails?

Thanks!

@allysonlister commented Feb 24, 2025

Good afternoon everyone! Just a reminder to please provide your comments on the three "thoughts" I have above, as I cannot start creating Principles records and associated metrics and benchmarks records until this is agreed on.

Thanks!

@pabloalarconm

Hi @allysonlister, thank you for the reminder. I'm happy to contribute to this discussion:

Thought 1:
Although I'm not the biggest expert in this FAIR room, I believe that dividing FAIR data and metadata should not happen (neither in this question nor in question 2). FAIR evaluation is for metadata, not for data. Although data is targeted in this evaluation through metadata, it is not the data itself that is directly evaluated. (I would love to hear the opinions of others on this.)

Thought 2:
Although granularity is always valuable, I believe it is not entirely necessary in this case. As far as I know, we gain no benefit from further subdividing our dimensions into more specific subprinciples. The only aspect that would benefit from this level of granularity would be the FAIR Metrics (which describe the how) compared to Dimensions (which describe the what).

A parallel consideration: the final assessment score of a FAIR evaluation will not be divided into individual F, A, I, and R categories (at least, this was the conclusion I reached in my discussion with @markwilkinson). So, referencing the dimension/subdimension URL won't require deeply granular consideration in the FAIR Metric metadata; it would be useful mainly for classification.

Thought 3:
Sorry, Allyson, I didn't understand what you meant by "container". However, I believe it would greatly benefit the ability to search for all FAIR Metrics associated with a particular Principle. This would allow specific communities to easily create their own benchmarks by exploring the available FAIR Metrics in FAIRSharing.

Related to additional questions:
I see no issue with a particular case where multiple Dimensions are attached to a single FAIR Metric. However, I believe this will not be common, as most of the leaf FAIR Metrics (the ones with tests attached to them) are already granular enough, making it unlikely for them to be associated with multiple subprinciples from different FAIR principles.

Hope it helps!!

@allysonlister

Thank you, this is very helpful!

Thought 1:
Although I'm not the biggest expert in this FAIR room, I believe that dividing FAIR data and metadata should not happen (neither in this question nor in question 2). FAIR evaluation is for metadata, not for data. Although data is targeted in this evaluation through metadata, it is not the data itself that is directly evaluated. (I would love to hear the opinions of others on this.)

This makes sense. Accordingly, we are leaning towards not implementing Thought 1, thank you!

Thought 2:
Although granularity is always valuable, I believe it is not entirely necessary in this case. As far as I know, we gain no benefit from further subdividing our dimensions into more specific subprinciples. The only aspect that would benefit from this level of granularity would be the FAIR Metrics (which describe the how) compared to Dimensions (which describe the what).

A parallel consideration: the final assessment score of a FAIR evaluation will not be divided into individual F, A, I, and R categories (at least, this was the conclusion I reached in my discussion with @markwilkinson). So, referencing the dimension/subdimension URL won't require deeply granular consideration in the FAIR Metric metadata; it would be useful mainly for classification.

The main question here would be: is there a programmatic decision that would lead to the creation of different Metrics and/or Tests, one checking e.g. the global uniqueness of an id and another checking its persistence?

Take, for example, a UniProt accession number such as P12345 (https://www.uniprot.org/uniprotkb/P12345). This is not globally unique, but it is persistent. With its URL it becomes globally unique, but there is no guarantee of persistence of the URL as a whole. Therefore it is conceivable (and FUJI has certainly implemented it this way) that there need to be separate tests for global uniqueness and for persistence. If this is the case, then we should have sub-principles underneath F1 for these two separate concepts and link Metrics to those separated concepts.

However, if we feel that we don't need that much granularity, we can just create F1 on its own. Additional child principles could always be added in future if required. What do you think?
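
To illustrate why the compound form is awkward, here is a toy sketch of what separate checks might look like; the heuristics below are invented for this example and are nothing like the real FUJI or OSTrails test implementations:

```python
import re

# Toy heuristics, invented purely to illustrate the point above.
def is_globally_unique(identifier):
    # Treat only absolute URIs as globally unique; a bare accession such
    # as "P12345" is only unique within UniProt's own namespace.
    return bool(re.match(r"^https?://\S+$", identifier))

def uses_persistent_scheme(identifier):
    # Accept only identifiers minted under a recognised persistence scheme.
    return identifier.startswith(
        ("https://doi.org/", "https://w3id.org/", "https://identifiers.org/")
    )

identifier = "https://www.uniprot.org/uniprotkb/P12345"

# A compound F1 test yields a single pass/fail and hides which facet failed:
print("F1 compound:", is_globally_unique(identifier) and uses_persistent_scheme(identifier))

# Separate sub-principles let each facet be reported (and registered) on its own:
print("globally unique:", is_globally_unique(identifier))
print("persistent scheme:", uses_persistent_scheme(identifier))
```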

Thought 3:
Sorry, Allyson, I didn't understand what you meant by "container". However, I believe it would greatly benefit the ability to search for all FAIR Metrics associated with a particular Principle. This would allow specific communities to easily create their own benchmarks by exploring the available FAIR Metrics in FAIRSharing.

You got the meaning! Sorry I wasn't clear. This does help, so I would lean towards implementing principle records for F, A, I and R for ease of searching and categorisation.

@dgarijo commented Feb 24, 2025

Hello, I will contribute to this conversation (i.e., go through the specific thoughts) after March 4th, since I have an urgent deadline. For now I will just say that I was under the impression that we would use the w3ids from https://w3id.org/fair/principles/ to refer to dimensions in general.

These are community-owned and persistent, and include sub-principle ids, e.g. https://w3id.org/fair/principles/terms/A1. They all share the same landing page and, in theory, support machine-readability (I have not checked this, though). We can have FAIRsharing records as an archival page and continue using these. Or have FAIRsharing for the general one and use these as sub-principles (the way they are structured makes quite a lot of sense to me).
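
As a side note, a quick, untested way to probe that machine-readability question would be content negotiation against one of the w3id IRIs, along these lines; whether the server actually returns an RDF serialisation is exactly the open question:

```python
import requests  # assumes the requests package is installed

# Ask the (real) w3id IRI for Turtle; whether it honours this Accept
# header has not been verified here.
resp = requests.get(
    "https://w3id.org/fair/principles/terms/A1",
    headers={"Accept": "text/turtle"},
    allow_redirects=True,
    timeout=30,
)
print(resp.status_code, resp.headers.get("Content-Type"))
```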

@allysonlister

We are planning to use those ids! See the start of this ticket:

We have already had an offline discussion about what the 'homepage' of such records should be (the IRIs from https://peta-pico.github.io/FAIR-nanopubs/principles/index-en.html).

If we decide to create sub-principles (Thought 2) where required, or if we decide to create 'useful' categorisation principles for F, A, I, and R (Thought 3), then we will NOT have such ids and would have to come up with other homepage URLs to use. It's only for these 'extra' items within the principles hierarchy that the w3ids you refer to wouldn't be appropriate.

@allysonlister commented Feb 27, 2025

FYI, relating to Thought 2: I am leaning more and more towards deconstructing the following two composite sub-principles within our FAIR principles hierarchy:

  1. F1. (meta)data are assigned a globally unique and persistent identifier: If a test ran this as a compound principle and the code failed on one aspect, we wouldn't know which aspect it had failed on. Further, the tests actually implemented for global uniqueness and for persistence would quite often be different. These should be separate sub-principles.
  2. I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation: same reasoning as for F1; tests for a language being formal are completely different from those testing its accessibility, and from those testing its 'applicability' to a particular research community. These should be separated into three sub-principles.

Whereas the other two composite sub-principles (A1.1 and R1.1) could in theory be subdivided, in practice the metrics and tests that would be implemented would generally test everything at once, which isn't the case for the examples above.
