Proposal: Add optional flag_methods variable attribute #205
Comments
Dear Jessica and others Thank you for this proposal. I'm not familiar with the technical details of your example. My understanding is that your
and thus we would not need a new attribute. By the way, the Best wishes Jonathan |
@JonathanGregory Thanks for your comments. I do not think it makes sense to encode the information into |
I think that the values for the flag_methods attribute need more clarification, if this addition is going to be useful. At the moment the values are wide open, but with the intention of making them machine readable. I'm not sure those two options are compatible. Can you flesh out the potential values, and the governance of the vocabulary if we choose to go that route? |
I think overloading all this into flag_meanings can quickly make the flag_meanings elements nearly incomprehensible. @MaartenSneepKNMI has a valid point. If we allow free-form content at the beginning, it may be hard to impose further rigor later. It would require some clear signal that the content was from a controlled vocabulary so that machine understanding could be successfully applied. This is doable, but we need to be clear on what that looks like up front. For example, if the value of the attribute was a JSON block containing controlled vocabulary elements vs one that was not, it would be a clear signal. |
Dear Jessica Your answer to my comment indicates that I don't really understand your example - which indeed I suspected. My understanding is that you want to add more information about what the Best wishes Jonathan |
I realize this might be "crazy talk" but commenting anyway because it might be interesting to think about. When looking at the example, for some (irrational) reason I had the desire for "all" the QC information to be in a single variable and started to think about what that might look like. I couldn't reason through a way of using things like flag masks. Then I wondered, "just how many of these QARTOD tests are there?". The answer for salinity is 13 of them. That would be a lot of QC variables lying around, not that there is really anything wrong with that on a technical level. What if all the QC flags were in a single variable, but the last dimension corresponded to the QC test method? Instead of a new attribute, there is a new standard name "flag_method". You would probably want a variable with this name to be an actual "coordinate variable" holding the labels of all the QC tests. Here is a hand-modified version of the CDL from the original proposal for what this might look like:
|
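A rough Python sketch of the layout described in this comment, under stated assumptions: it is not the commenter's CDL, and the test labels, flag values, and the names flag_method, salinity_qc and flags_for are all hypothetical. It only illustrates the idea of one QC variable whose last dimension is indexed by a list of test labels.

```python
# Hypothetical labels for the QC tests; in the comment's design these would be
# the values of a coordinate variable with the suggested "flag_method" standard name.
flag_method = ["qartod_aggregate", "qartod_range", "qartod_flat_line"]

# Illustrative flags with shape (time, flag_method): one row per time step,
# one column per QC test, in the same order as flag_method.
salinity_qc = [
    [1, 1, 1],
    [4, 4, 1],
    [3, 1, 3],
]


def flags_for(method):
    """Return the flag time series for one QC test, selected by its label."""
    column = flag_method.index(method)  # raises ValueError for an unknown label
    return [row[column] for row in salinity_qc]


print(flags_for("qartod_aggregate"))
```

With this layout a reader selects a test by label rather than by variable name, at the cost of every test sharing one set of flag_values and flag_meanings.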
Dear Jessica, I like the idea of providing a bit more detail about what is meant by There are several cases in the standard name table in which a specific term is defined for use with a set of values, such as |
Thank you everyone for all the feedback! We appreciate you taking the time. There is a lot of great stuff for us to think through here. We did imagine that the value of
The whole point of our proposal is to try to define a way for a script to figure out which dataset variable corresponds to which QARTOD test, so I agree that without a controlled vocabulary this is not possible. In terms of how to define or enforce this within the CF conventions, I'm not sure of the best way that should be accomplished. Does it make sense to have an additional attribute -- say,
(Side note: @martinjuckes, You mentioned the broken links to documentation and questioned how stable this library is, which is a completely valid point. Alongside this work to nail down a new IOOS metadata profile, we are also doing a bunch of work to get the ioos_qc library to 1.0, including moving its location in GitHub as we consolidate it with other QARTOD libraries. Once we do release 1.0, the links should be stable and if we do move documentation we would add redirects so no links were broken.)
@DocOtak, your idea is not crazy at all. I've seen examples of people encoding values for all the QARTOD tests in a single variable, with a
@JonathanGregory, Yes, the idea is to add more information about what the
This is an approach we considered, but then discarded because we weren't sure that the qartod test names would be appropriate entries in the standard name table. However based on this conversation, and the recently accepted So to sum up, it sounds like we have two options:
Based on the feedback so far, we're leaning towards (2). But we'll see if there are any other comments people have here before making that proposal. |
Submitted an issue proposing a new set of standard names: #216 If that is accepted, I will close this issue. |
I'm not quite sure if you're proposing that CF be responsible for the "IOOS QARTOD Tests" vocabulary, or if that would be maintained by IOOS. In the latter case, CF would just be accepting flag_methods as a variable attribute with a specific meaning, and the onus would be on IOOS (or other projects) to make sure it was used by their data providers as they want it used. I like your proposal '...to have an additional attribute -- say, flag_methods_vocabulary -- with a value of the vocabulary name?' That would also allow you to make the list of acceptable values a little less cumbersome: climatology, flat_line, range, attenuated_signal, and aggregate, for example. Then these could be more generally useful to different communities using different specifics but the same general vocabulary. |
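A minimal sketch of how a reader might use the flag_methods_vocabulary idea from this comment to tell controlled values from free-form text. The attribute is only a suggestion in this thread, and the vocabulary name "ioos_qartod", the KNOWN_VOCABULARIES table and the function interpret_flag_method are hypothetical; the terms are the examples listed in the comment above.

```python
# Hypothetical registry: vocabulary name -> allowed flag_methods terms.
# "ioos_qartod" is an assumed name; the terms are the examples from the comment.
KNOWN_VOCABULARIES = {
    "ioos_qartod": {
        "climatology", "flat_line", "range", "attenuated_signal", "aggregate",
    },
}


def interpret_flag_method(attrs):
    """Classify a variable's flag_methods value.

    Returns None when flag_methods is absent, (None, text) for a free-form
    description, and (vocabulary, term) for a value checked against a
    named vocabulary.
    """
    method = attrs.get("flag_methods")
    if method is None:
        return None
    vocabulary = attrs.get("flag_methods_vocabulary")
    if vocabulary is None:
        # No vocabulary declared: treat the value as human-readable text only.
        return (None, method)
    terms = KNOWN_VOCABULARIES.get(vocabulary)
    if terms is None:
        raise ValueError(f"unknown vocabulary: {vocabulary!r}")
    if method not in terms:
        raise ValueError(f"{method!r} is not a term of {vocabulary!r}")
    return (vocabulary, method)


print(interpret_flag_method(
    {"flag_methods": "aggregate", "flag_methods_vocabulary": "ioos_qartod"}
))
```

The presence of the vocabulary attribute acts as the explicit signal, requested earlier in the thread, that the value is meant to be machine readable.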
@ngalbraith To be honest, we are not quite sure either! We are hoping that through these proposals we can figure out the best way. Originally, we were thinking IOOS would maintain this vocabulary, and it would be specified using |
@davidhassell This has been superseded by #216 and can be closed. |
It seems that this stale issue is still causing confusion even though it was superseded by another one about 3 years ago. Can we now close this issue as already suggested? |
Last input was 2y ago - closing, please re-open if necessary :) |
Title: Add flag_methods variable attribute
Requirement Summary: Optional flag_methods attribute in section 3.5. Flags
Technical Proposal Summary: A new variable attribute called flag_methods, that specifies the process used to calculate the flag values.
Benefits: This brings clarity around how flags are calculated, and allows for machine-to-machine understanding.
Status Quo: One can use ancillary_variables to link data to flags, flag_values and flag_meanings to describe the flags themselves, and references to provide links to code or documentation. However, there is no clear way to indicate which method was used to calculate flags.
This proposal was developed with @mwengren @kwilcox and @kevin-obrien
Detailed Proposal:
Motivation
The U.S. Integrated Ocean Observing System Program (IOOS) is working with NOAA NDBC to develop standards for ingesting real-time environmental sensor data into the GTS from partners across the US via ERDDAP. NDBC wishes to exclude any values that fail data quality control tests, as specified by the QARTOD guidelines (available here: https://ioos.noaa.gov/project/qartod/) from their ingest process. To achieve this, we need consistency in how data providers specify which flag variable indicates the QARTOD "rollup" or "aggregate" flag. This single flag indicates whether a data point has failed any of the applicable QARTOD tests. While IOOS maintains their own metadata profile (that extends CF and ACDD), we feel that standards around how to specify the results of QC tests would be useful for a wider community. Based on recent discussions on the CF-metadata mailing list, we are not alone in thinking about these ideas (see Related Discussions below).
What we propose
We propose a new variable attribute called flag_methods, that specifies the process used to calculate the flag values.
In other words: the flag_values and flag_meanings describe what the flags are, and the flag_methods attribute would describe how they were determined (i.e. a specific test name) or should be interpreted (i.e. as an aggregate of multiple tests).
Ideally, the value for flag_methods would be part of a known vocabulary, and associated with a codebase, documentation or publication via the references attribute. This enables machine-to-machine communication and thus fits our intended use well. However, the value could also be a human-readable description of the methods used to calculate the flags.
Example:
In this example the flag_methods indicate which QARTOD test was used to calculate each flag. The global references attribute links to the code documentation for the particular library used.
This example is adapted from a live dataset on the CeNCOOS RA ERDDAP server, with modifications to match this proposal. Going back to the NDBC use case: NDBC wishes to pull all values of salinity where the rollup/aggregate flag is not QC fail. They could check for flag_methods=qartod_aggregate to determine this.
Related Discussions:
quality_flag attribute in addition to the existing status_flag. We see this proposal as compatible with ours.
flag_references is discussed in much the same way as we use flag_methods here. The concepts are not exactly the same, but it seems like we are trying to solve the same problem.
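The NDBC use case in the example above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not an IOOS or NDBC implementation: the dataset is a plain dictionary standing in for a CF file, and the variable names, data values, the qartod_range test name, the flag_meanings strings and both helper functions are hypothetical. Only flag_methods=qartod_aggregate comes from the proposal.

```python
# Hypothetical in-memory stand-in for a CF dataset: variable name -> attrs + data.
dataset = {
    "salinity": {
        "attrs": {"ancillary_variables": "salinity_qc_range salinity_qc_agg"},
        "data": [33.1, 33.4, 99.9, 33.2],
    },
    "salinity_qc_range": {
        "attrs": {
            "flag_values": [1, 2, 3, 4, 9],
            "flag_meanings": "PASS NOT_EVALUATED SUSPECT FAIL MISSING",
            "flag_methods": "qartod_range",  # assumed test name
        },
        "data": [1, 1, 4, 1],
    },
    "salinity_qc_agg": {
        "attrs": {
            "flag_values": [1, 2, 3, 4, 9],
            "flag_meanings": "PASS NOT_EVALUATED SUSPECT FAIL MISSING",
            "flag_methods": "qartod_aggregate",  # value used in the proposal
        },
        "data": [1, 1, 4, 3],
    },
}


def find_flag_variable(ds, data_var, method):
    """Return the ancillary variable of data_var whose flag_methods equals method."""
    names = ds[data_var]["attrs"].get("ancillary_variables", "").split()
    for name in names:
        if ds.get(name, {}).get("attrs", {}).get("flag_methods") == method:
            return name
    return None


def values_not_failing(ds, data_var, method="qartod_aggregate", fail_meaning="FAIL"):
    """Keep the data values whose flag for the given method is not the fail value."""
    flag_name = find_flag_variable(ds, data_var, method)
    if flag_name is None:
        raise KeyError(f"no flag variable with flag_methods={method!r}")
    attrs = ds[flag_name]["attrs"]
    # flag_values and flag_meanings are parallel lists in CF.
    meanings = attrs["flag_meanings"].split()
    fail_value = attrs["flag_values"][meanings.index(fail_meaning)]
    flags = ds[flag_name]["data"]
    return [v for v, f in zip(ds[data_var]["data"], flags) if f != fail_value]


print(values_not_failing(dataset, "salinity"))
```

The lookup goes through ancillary_variables first and only then compares flag_methods, so it relies on attributes CF already defines plus the one proposed here.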