Updates to Coding in gks-core-source.yaml #107

mbrush · 2024-12-27T23:10:40Z

These changes aim to address feedback in the gks va-spec ballot release discussion here: ga4gh/va-spec#234 (comment)

fixes a non-resolving url example in the Coding definition
updates recommendations in system and code attribute descriptions
adds two exploratory, optional attributes that would allow complete specification of prefix and prefix expansion where useful (systemPrefix and systemPrefixExpansion) .

- fixes the non-resolving url example in the Coding definition - updates recommendations in several Coding attribute descriptions - adds two exploratory, optional attributes that would allow complete specification of prefix and prefix expansion where useful. These changes aim to address feedback in the gks va-spec ballot release discussion here: ga4gh/va-spec#234 (comment)

korikuzma · 2024-12-27T23:54:32Z

schema/gks-core/gks-core-source.yaml

+        description: >-
+          A symbol uniquely identifying the concept, as in a syntax defined by the code system.
+          CURIE format is preferred where possible (e.g. 'SO:0000704' is the CURIE form of the
+          Sequence Ontology code for 'gene'). See https://www.w3.org/TR/curie/.
      label:
        type: string
        description: The human-readable name for the coded concept, as defined by the code system.
      system:


IMO, system is very broad and can lead to ambiguity / inconsistencies since it can be free-text name, IRI, or URL. Have you considered splitting this out into multiple fields (i.e systemName, systemIri, and systemUrl) and making at least of of them required?

Yeah, I kind of like this idea. although we may not need both systemIri and systemUrl attributes.

Beyond this, I think we need to have a broader discussion about what it is we want the Coding object to do and not to do, and if/how we want to commit to use of CURIEs as values in this object. And generally how we want to handle enumerations that may be wrapped in MappableConcepts and Codings. It still feels a bit complicated and unclear to me.

The use of CURIEs in the linked data / linkML world provides a way to bypass the need to explicitly list out the system and code separately in the data - and relies on a prefix mapping document that lives outside the data for this. We are recommending use of CURIEs here in the context of a more FHIR-like MappableConcept and Coding objects that also capture the system url and code, and now possibly a prefix and prefix expansion. This all seems duplicative and likely to lead to confusion. I think we need to tease apart what we want here and implement something clearer to support it.

I wonder if there could/should be multiple types of Coding? IE one for CURIEs and another for FHIR-style. Flexibility to support either seems like a good thing, but unless you think there are cases where a user wants to provide representation for both or a mixture therein, it might be simpler to just force them to choose one of the two types.

I appreciate the message compactness that a separate prefix expansion document provides -- it might be good to codify this in some kind of GKS object, though, just so that there's an expected standard for how to access it

Yes, this is one option for how to approach the issue, that favors flexibility over interoperability (i.e. enforcing a single way of doing things that would ensure interoperable data) . . . The broader discussion I want to have is what we prioritize in the short and long term to ensure success of the va-spec. Our initial spec leans more toward flexibility over interoperability - to lower the barrier to early adoption. But in the long run if we true interoperability, we will either need to converge on a single way of doing things, or provide tooling that supports automated transformations between alternate representations that can be used to derive interoperable data from divergent representations. The linkML framework is doing some interesting work in this area that I am keen to explore.

korikuzma · 2024-12-28T00:15:40Z

schema/gks-core/gks-core-source.yaml

      systemVersion:
        type: string
        description: Version of the terminology or code system that provided the code.
-      code:
-        $ref: '#/$defs/code'
+      systemPrefix: 


I think I'm a little confused on the purpose of systemPrefix if code is preferred to be a CURIE (systemPrefix:identifier).

systemPrefix and systemPrefixExpansion would just allow data providers to be explicit about what the prefix is, and what it expands to in order to generate a resolvable url - if they wanted to. This kind of info is often captured in prefix mapping sections of schema in the linkML / linked data world. Not sure we need it here right now - but in combination with the other proposed changes it lets a data provider be very clear about the system, the code from that system, and how to resolve this code to find more info about it (in cases where the code is a CURIE with a url prefix that lets it resolve). Whether this is useful/necessary to support is another question.

korikuzma · 2024-12-28T00:17:58Z

schema/gks-core/gks-core-source.yaml

We also want to make sure the json/rst files are updated too

Updating the examples would also be really helpful

Per discussion here: ga4gh/gks-core#107

korikuzma · 2025-01-15T20:41:04Z

schema/gks-core/gks-core-source.yaml

+        description: >-
+          A symbol uniquely identifying the concept, as in a syntax defined by the code system.
+          CURIE format is preferred where possible (e.g. 'SO:0000704' is the CURIE form of the
+          Sequence Ontology code for 'gene'). See https://www.w3.org/TR/curie/.
      label:
        type: string
        description: The human-readable name for the coded concept, as defined by the code system.
      system:
        type: string
        description: >-
          The terminology/code system that defined the code. May be reported as a free-text name


@mbrush , @ahwagner and I had a discussion regarding system when reviewing my draft PR in va-spec where I initially used free text label for this field. However, @ahwagner believes URI should be used to align with FHIR.

Tagging @larrybabb to follow too

korikuzma · 2025-01-15T20:48:54Z

schema/gks-core/gks-core-source.yaml

+          A symbol uniquely identifying the concept, as in a syntax defined by the code system.
+          CURIE format is preferred where possible (e.g. 'SO:0000704' is the CURIE form of the
+          Sequence Ontology code for 'gene'). See https://www.w3.org/TR/curie/.


@mbrush "A symbol uniquely identifying the concept, as in a syntax defined by the code system." and "CURIE format is preferred where possible" seem to conflict. For example, NCIt code syntax is C\d+. But to me, the CURIE format part could mean that we should use ncit:C\d+.

A previous description version states that code should be "Symbol in syntax defined by the terminology system."

I think we should consider updating the description to:

Suggested change

A symbol uniquely identifying the concept, as in a syntax defined by the code system.

CURIE format is preferred where possible (e.g. 'SO:0000704' is the CURIE form of the

Sequence Ontology code for 'gene'). See https://www.w3.org/TR/curie/.

A symbol uniquely identifying the concept, as in a syntax defined by the code system.

(e.g. 'SO:0000704' is the proper code syntax defined by Sequence Ontology for 'gene'

and 'C287' is the proper code syntax defined by NCIt for 'aspirin').

Tagging @ahwagner @larrybabb for comment

Thanks Kori. I think a broader discussion is needed around if and how strongly we recommend the use of CURIEs - so we can address issues like the one you raise, and others that have come up due to our trying to support use of CURIES in a FHIR-like Coding object that was not designed to support them. This will definitely be on the agenda to address over the next few weeks as we finalize things for the 1.0 release.

mbrush · 2025-01-28T19:37:48Z

Longer term (or perhaps even now), esp. if we decide to capture more information about a 'system' that provided a code, it may be worth considering crating a Class to represent such systems. In other models I have seen/developed, there is the notion of an 'InformationResource. For example, the Biolink model defines an InformationResource as:

"A database or knowledgebase and its supporting ecosystem of interfaces and services that deliver content to consumers (e.g. web portals, APIs, query endpoints, streaming services, data downloads, etc.). A single Information Resource by this definition may span many different datasets or databases, and include many access endpoints and user interfaces".

And in projects that use Biolink, such as the NCATS Data Translator, the InformationResource model and corresponding 'InfoRes Catalog' is used to capture provenance of where knowledge and terms come from.

Terminology systems like thesauri, ontologies, etc, represent one kind of InformationResource. We could define a model for describing such resources, that can be populated to capture info about a term / code system within a Coding. And even use the Biolink InfoRes Catalog to provide identifiers and metadata.

e.g. a model may look like:

InformationResource:
  inherits: Entity
  type: object
  properties:
    name:
    homepage_url:
    content_url:
    prefix:
    prefixExpansions:
    version:

Then a Coding instance may look like:

- coding: 
    code: SO:0000704
    label: Gene
    system: 
      id:  infores:so
      name: Sequence Ontology
      homepage_url: http://www.sequenceontology.org/
      content_url: http://purl.obolibrary.org/obo/so.owl
      version: 2.1.3
      prefix: SO
      prefixExpansions: 
          - http://purl.obolibrary.org/obo/SO_
          - http://purl.bioontology.org/ontology/SO/SO
          - http://www.sequenceontology.org/miso/current_release/term/SO:

We likely wouldn't want or need to pass along all this metadata about the system in the message every time a term form the system was used - so we could establish or re-use infrastructure that lets us provide a url that resolves to such info. And only include critical info about the system in the message itself.

korikuzma reviewed Dec 27, 2024

View reviewed changes

korikuzma reviewed Dec 28, 2024

View reviewed changes

korikuzma mentioned this pull request Dec 30, 2024

feat!: use preferred formats for MappableConcept.mappings cancervariants/disease-normalization#215

Merged

korikuzma added a commit to cancervariants/gene-normalization that referenced this pull request Dec 30, 2024

fix!: system should use source homepage

e954ccd

Per discussion here: ga4gh/gks-core#107

korikuzma added a commit to cancervariants/gene-normalization that referenced this pull request Dec 30, 2024

fix!: system should use source homepage

1a141d5

Per discussion here: ga4gh/gks-core#107

korikuzma mentioned this pull request Dec 30, 2024

fix!: system should use source homepage cancervariants/gene-normalization#389

Merged

korikuzma added a commit to cancervariants/gene-normalization that referenced this pull request Dec 30, 2024

fix!: system should use source homepage (#389)

e2eb35c

Per discussion here: ga4gh/gks-core#107

korikuzma reviewed Jan 15, 2025

View reviewed changes

mbrush mentioned this pull request Jan 16, 2025

Clarity about attributes of Coding ga4gh/va-spec#259

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updates to Coding in gks-core-source.yaml #107

Updates to Coding in gks-core-source.yaml #107

mbrush commented Dec 27, 2024

korikuzma Dec 27, 2024

mbrush Dec 28, 2024

jsstevenson Dec 30, 2024 •

edited

Loading

mbrush Dec 30, 2024

korikuzma Dec 28, 2024

mbrush Dec 28, 2024

korikuzma Dec 28, 2024

korikuzma Dec 28, 2024

korikuzma Jan 15, 2025

korikuzma Jan 15, 2025

mbrush Jan 16, 2025

mbrush commented Jan 28, 2025

Updates to Coding in gks-core-source.yaml #107

Are you sure you want to change the base?

Updates to Coding in gks-core-source.yaml #107

Conversation

mbrush commented Dec 27, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsstevenson Dec 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mbrush commented Jan 28, 2025

jsstevenson Dec 30, 2024 •

edited

Loading