-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updates to Coding in gks-core-source.yaml #107
base: 1.x
Are you sure you want to change the base?
Conversation
- fixes the non-resolving url example in the Coding definition - updates recommendations in several Coding attribute descriptions - adds two exploratory, optional attributes that would allow complete specification of prefix and prefix expansion where useful. These changes aim to address feedback in the gks va-spec ballot release discussion here: ga4gh/va-spec#234 (comment)
description: >- | ||
A symbol uniquely identifying the concept, as in a syntax defined by the code system. | ||
CURIE format is preferred where possible (e.g. 'SO:0000704' is the CURIE form of the | ||
Sequence Ontology code for 'gene'). See https://www.w3.org/TR/curie/. | ||
label: | ||
type: string | ||
description: The human-readable name for the coded concept, as defined by the code system. | ||
system: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, system
is very broad and can lead to ambiguity / inconsistencies since it can be free-text name, IRI, or URL. Have you considered splitting this out into multiple fields (i.e systemName
, systemIri
, and systemUrl
) and making at least of of them required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I kind of like this idea. although we may not need both systemIri
and systemUrl
attributes.
Beyond this, I think we need to have a broader discussion about what it is we want the Coding
object to do and not to do, and if/how we want to commit to use of CURIEs as values in this object. And generally how we want to handle enumerations that may be wrapped in MappableConcepts
and Codings
. It still feels a bit complicated and unclear to me.
The use of CURIEs in the linked data / linkML world provides a way to bypass the need to explicitly list out the system and code separately in the data - and relies on a prefix mapping document that lives outside the data for this. We are recommending use of CURIEs here in the context of a more FHIR-like MappableConcept
and Coding
objects that also capture the system url and code, and now possibly a prefix and prefix expansion. This all seems duplicative and likely to lead to confusion. I think we need to tease apart what we want here and implement something clearer to support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if there could/should be multiple types of Coding
? IE one for CURIEs and another for FHIR-style. Flexibility to support either seems like a good thing, but unless you think there are cases where a user wants to provide representation for both or a mixture therein, it might be simpler to just force them to choose one of the two types.
I appreciate the message compactness that a separate prefix expansion document provides -- it might be good to codify this in some kind of GKS object, though, just so that there's an expected standard for how to access it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is one option for how to approach the issue, that favors flexibility over interoperability (i.e. enforcing a single way of doing things that would ensure interoperable data) . . . The broader discussion I want to have is what we prioritize in the short and long term to ensure success of the va-spec. Our initial spec leans more toward flexibility over interoperability - to lower the barrier to early adoption. But in the long run if we true interoperability, we will either need to converge on a single way of doing things, or provide tooling that supports automated transformations between alternate representations that can be used to derive interoperable data from divergent representations. The linkML framework is doing some interesting work in this area that I am keen to explore.
systemVersion: | ||
type: string | ||
description: Version of the terminology or code system that provided the code. | ||
code: | ||
$ref: '#/$defs/code' | ||
systemPrefix: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm a little confused on the purpose of systemPrefix
if code
is preferred to be a CURIE (systemPrefix:identifier
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
systemPrefix
and systemPrefixExpansion
would just allow data providers to be explicit about what the prefix is, and what it expands to in order to generate a resolvable url - if they wanted to. This kind of info is often captured in prefix mapping sections of schema in the linkML / linked data world. Not sure we need it here right now - but in combination with the other proposed changes it lets a data provider be very clear about the system, the code from that system, and how to resolve this code to find more info about it (in cases where the code is a CURIE with a url prefix that lets it resolve). Whether this is useful/necessary to support is another question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also want to make sure the json/rst files are updated too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating the examples would also be really helpful
Per discussion here: ga4gh/gks-core#107
Per discussion here: ga4gh/gks-core#107
Per discussion here: ga4gh/gks-core#107
description: >- | ||
A symbol uniquely identifying the concept, as in a syntax defined by the code system. | ||
CURIE format is preferred where possible (e.g. 'SO:0000704' is the CURIE form of the | ||
Sequence Ontology code for 'gene'). See https://www.w3.org/TR/curie/. | ||
label: | ||
type: string | ||
description: The human-readable name for the coded concept, as defined by the code system. | ||
system: | ||
type: string | ||
description: >- | ||
The terminology/code system that defined the code. May be reported as a free-text name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbrush , @ahwagner and I had a discussion regarding system
when reviewing my draft PR in va-spec where I initially used free text label for this field. However, @ahwagner believes URI should be used to align with FHIR.
Tagging @larrybabb to follow too
A symbol uniquely identifying the concept, as in a syntax defined by the code system. | ||
CURIE format is preferred where possible (e.g. 'SO:0000704' is the CURIE form of the | ||
Sequence Ontology code for 'gene'). See https://www.w3.org/TR/curie/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbrush "A symbol uniquely identifying the concept, as in a syntax defined by the code system." and "CURIE format is preferred where possible" seem to conflict. For example, NCIt code syntax is C\d+
. But to me, the CURIE format part could mean that we should use ncit:C\d+
.
A previous description version states that code should be "Symbol in syntax defined by the terminology system."
I think we should consider updating the description to:
A symbol uniquely identifying the concept, as in a syntax defined by the code system. | |
CURIE format is preferred where possible (e.g. 'SO:0000704' is the CURIE form of the | |
Sequence Ontology code for 'gene'). See https://www.w3.org/TR/curie/. | |
A symbol uniquely identifying the concept, as in a syntax defined by the code system. | |
(e.g. 'SO:0000704' is the proper code syntax defined by Sequence Ontology for 'gene' | |
and 'C287' is the proper code syntax defined by NCIt for 'aspirin'). |
Tagging @ahwagner @larrybabb for comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kori. I think a broader discussion is needed around if and how strongly we recommend the use of CURIEs - so we can address issues like the one you raise, and others that have come up due to our trying to support use of CURIES in a FHIR-like Coding object that was not designed to support them. This will definitely be on the agenda to address over the next few weeks as we finalize things for the 1.0 release.
Longer term (or perhaps even now), esp. if we decide to capture more information about a 'system' that provided a code, it may be worth considering crating a Class to represent such systems. In other models I have seen/developed, there is the notion of an 'InformationResource. For example, the Biolink model defines an InformationResource as:
And in projects that use Biolink, such as the NCATS Data Translator, the InformationResource model and corresponding 'InfoRes Catalog' is used to capture provenance of where knowledge and terms come from. Terminology systems like thesauri, ontologies, etc, represent one kind of InformationResource. We could define a model for describing such resources, that can be populated to capture info about a term / code system within a Coding. And even use the Biolink InfoRes Catalog to provide identifiers and metadata. e.g. a model may look like:
Then a Coding instance may look like:
We likely wouldn't want or need to pass along all this metadata about the system in the message every time a term form the system was used - so we could establish or re-use infrastructure that lets us provide a url that resolves to such info. And only include critical info about the system in the message itself. |
These changes aim to address feedback in the gks va-spec ballot release discussion here: ga4gh/va-spec#234 (comment)
system
andcode
attribute descriptionssystemPrefix
andsystemPrefixExpansion
) .