Proposal: Add optional `flag_methods` variable attribute #205

jessicaaustin · 2019-09-10T23:42:32Z

Title: Add flag_methods variable attribute

Requirement Summary: Optional flag_methods attribute in section 3.5, Flags

Technical Proposal Summary: A new variable attribute, flag_methods, that specifies the process used to calculate the flag values.

Benefits: This brings clarity to how flags are calculated, and allows for machine-to-machine understanding.

Status Quo: One can use ancillary_variables to link data to flags, flag_values and flag_meanings to describe the flags themselves, and references to provide links to code or documentation. However, there is no clear way to indicate which method was used to calculate flags.

This proposal was developed with @mwengren @kwilcox and @kevin-obrien

Detailed Proposal:

Motivation

The U.S. Integrated Ocean Observing System Program (IOOS) is working with NOAA NDBC to develop standards for ingesting real-time environmental sensor data from partners across the US into the GTS via ERDDAP. NDBC wishes to exclude from its ingest process any values that fail the data quality control tests specified by the QARTOD guidelines (available here: https://ioos.noaa.gov/project/qartod/). To achieve this, we need consistency in how data providers specify which flag variable holds the QARTOD "rollup" or "aggregate" flag, a single flag that indicates whether a data point has failed any of the applicable QARTOD tests. While IOOS maintains its own metadata profile (which extends CF and ACDD), we feel that standards for specifying the results of QC tests would be useful to a wider community. Based on recent discussions on the CF-metadata mailing list, we are not alone in thinking about these ideas (see Related Discussions below).

What we propose

We propose a new variable attribute, flag_methods, that specifies the process used to calculate the flag values.

In other words, flag_values and flag_meanings describe what the flags are, while the flag_methods attribute would describe how they were determined (e.g. a specific test name) or how they should be interpreted (e.g. as an aggregate of multiple tests).

Ideally, the value for flag_methods would be part of a known vocabulary, and associated with a codebase, documentation or publication via the references attribute. This enables machine-to-machine communication and thus fits our intended use well. However, the value could also be a human-readable description of the methods used to calculate the flags.

Example:

In this example, flag_methods indicates which QARTOD test was used to calculate each flag variable. The global references attribute links to the code documentation for the particular library used.

This example is adapted from a live dataset on the CeNCOOS RA ERDDAP server, with modifications to match this proposal. Returning to the NDBC use case: NDBC wishes to pull all salinity values whose rollup/aggregate flag is not QC FAIL. To find that flag, they could look for the ancillary variable with flag_methods = "qartod_aggregate".

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:missing_value = -9999.0;
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_flat_line_test";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity QARTOD Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "status_flag";
        sea_water_practical_salinity_qc_agg:missing_value = 2;
        sea_water_practical_salinity_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_agg:flag_values = "1, 2, 3, 4, 9";
        sea_water_practical_salinity_qc_agg:flag_methods = "qartod_aggregate";
        sea_water_practical_salinity_qc_agg:references = "https://axiom-data-science.github.io/ioos_qc/aggregate_flag";

    int sea_water_practical_salinity_qc_flat_line_test(time, z);
        sea_water_practical_salinity_qc_flat_line_test:long_name = "Salinity QARTOD Flat Line Test Flag";
        sea_water_practical_salinity_qc_flat_line_test:standard_name = "status_flag";
        sea_water_practical_salinity_qc_flat_line_test:missing_value = 2;
        sea_water_practical_salinity_qc_flat_line_test:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_flat_line_test:flag_values = "1, 2, 3, 4, 9";
        sea_water_practical_salinity_qc_flat_line_test:flag_methods = "qartod_flat_line";
        sea_water_practical_salinity_qc_flat_line_test:references = "https://axiom-data-science.github.io/ioos_qc/qartod_flat_line";

global attributes:

    :references = "http://www.cencoos.org/data/shore/humboldt,https://axiom-data-science.github.io/ioos_qc/";
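The NDBC use case described above could be automated roughly as follows. This is a minimal Python sketch under stated assumptions, not part of the proposal: the dataset is modelled as plain dicts (standing in for attributes read from a netCDF file or ERDDAP), the data and flag arrays are made-up sample values, flag_values is written as an integer list rather than a string, and the helper names `find_flag_by_method` and `values_not_failed` are hypothetical.

```python
# Hypothetical in-memory model of the example dataset: each variable has an
# "attrs" dict (its netCDF attributes) and a flat "data" list.
# The data and flag values below are made up for illustration.
FLAG_MEANINGS = "PASS NOT_EVALUATED SUSPECT FAIL MISSING"
FLAG_VALUES = [1, 2, 3, 4, 9]

variables = {
    "sea_water_practical_salinity": {
        "attrs": {"ancillary_variables": "sea_water_practical_salinity_qc_agg "
                                         "sea_water_practical_salinity_qc_flat_line_test"},
        "data": [33.1, 33.2, 40.5, 33.3],
    },
    "sea_water_practical_salinity_qc_agg": {
        "attrs": {"flag_methods": "qartod_aggregate",
                  "flag_meanings": FLAG_MEANINGS, "flag_values": FLAG_VALUES},
        "data": [1, 1, 4, 3],
    },
    "sea_water_practical_salinity_qc_flat_line_test": {
        "attrs": {"flag_methods": "qartod_flat_line",
                  "flag_meanings": FLAG_MEANINGS, "flag_values": FLAG_VALUES},
        "data": [1, 1, 1, 1],
    },
}


def find_flag_by_method(variables, data_name, method):
    """Return the ancillary variable of `data_name` whose flag_methods equals `method`."""
    for name in variables[data_name]["attrs"].get("ancillary_variables", "").split():
        if variables[name]["attrs"].get("flag_methods") == method:
            return name
    return None


def values_not_failed(variables, data_name):
    """Keep the values of `data_name` whose QARTOD aggregate flag is not FAIL."""
    flag_name = find_flag_by_method(variables, data_name, "qartod_aggregate")
    if flag_name is None:
        raise ValueError(f"no qartod_aggregate flag found for {data_name}")
    attrs = variables[flag_name]["attrs"]
    # Look up the numeric code for FAIL from the flag attributes instead of hard-coding it.
    fail = attrs["flag_values"][attrs["flag_meanings"].split().index("FAIL")]
    data = variables[data_name]["data"]
    flags = variables[flag_name]["data"]
    return [v for v, f in zip(data, flags) if f != fail]


print(values_not_failed(variables, "sea_water_practical_salinity"))  # [33.1, 33.2, 33.3]
```

The sketch ignores missing_value handling; with real files the same attribute lookups would be done on values read through a netCDF library or from ERDDAP metadata.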

Related Discussions:

The "New standard_name of quality_flag for corresponding quality control variables" mailing list thread is related. In that thread, @kenkehoe proposes a new quality_flag standard name in addition to the existing status_flag. We see this proposal as compatible with ours.
See "PID to external description of Quality and Status states" on the CF-metadata mailing list. In that thread, flag_references is discussed in much the same way as we use flag_methods here. The concepts are not exactly the same, but it seems like we are trying to solve the same problem.
This issue contains some common themes: Add attribute citation_id #160


JonathanGregory · 2019-09-11T16:56:32Z

Dear Jessica and others

Thank you for this proposal. I'm not familiar with the technical details of your example. My understanding is that your flag_method of qartod_flat_line is intended to clarify all the flag_meanings, by indicating how they are defined - is that right? If it is right, could this information not be put in the flag_meanings? Although it's cumbersome, you could have something like

flag_meanings = "qartod_flat_line_PASS NOT_EVALUATED qartod_flat_line_SUSPECT qartod_flat_line_FAIL MISSING"

and thus we would not need a new attribute.

By the way, the flag_values attribute should be a vector of the possible values, of the same type as the data variable, int in your case, not a string.

Best wishes

Jonathan
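Jonathan's point about flag_values can be checked mechanically. The Python sketch below is illustrative only: `flag_table` is a hypothetical helper, and it assumes the attributes have already been read into plain Python values (a netCDF library may instead return, for example, NumPy arrays). It pairs each flag value with its meaning and rejects the string form used in the original example.

```python
def flag_table(flag_values, flag_meanings, value_type=int):
    """Map each flag value to its meaning.

    Enforces that flag_values is a sequence of numbers of the data variable's
    type (here `value_type`), not a single string such as "1, 2, 3, 4, 9".
    """
    if isinstance(flag_values, str):
        raise TypeError("flag_values must be a vector of numbers, not a string")
    if not all(isinstance(v, value_type) for v in flag_values):
        raise TypeError(f"flag_values must all be of type {value_type.__name__}")
    meanings = flag_meanings.split()
    if len(meanings) != len(flag_values):
        raise ValueError("flag_values and flag_meanings have different lengths")
    return dict(zip(flag_values, meanings))


table = flag_table([1, 2, 3, 4, 9], "PASS NOT_EVALUATED SUSPECT FAIL MISSING")
print(table[4])  # FAIL
```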

jessicaaustin · 2019-10-21T17:47:26Z

@JonathanGregory Thanks for your comments.

I do not think it makes sense to encode the information into flag_meanings like you've suggested, because the meaning could be (and in this case, is) independent of how it was calculated. In the case of QC, PASS means PASS, regardless of which test implementation was used.

MaartenSneepKNMI · 2019-10-22T11:34:12Z

I think that the values for the flag_methods attribute need more clarification, if this addition is going to be useful. At the moment the values are wide open, but with the intention of making them machine readable. I'm not sure those two options are compatible. Can you flesh out the potential values, and the governance of the vocabulary if we choose to go that route?

JimBiardCics · 2019-10-22T12:49:50Z

I think overloading all this into flag_meanings can quickly make the flag_meanings elements nearly incomprehensible. @MaartenSneepKNMI has a valid point. If we allow free-form content at the beginning, it may be hard to impose further rigor later. It would require some clear signal that the content was from a controlled vocabulary so that machine understanding could be successfully applied. This is doable, but we need to be clear on what that looks like up front. For example, if the value of the attribute was a JSON block containing controlled vocabulary elements vs one that was not, it would be a clear signal.
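One way to read Jim's suggestion is sketched below in Python. The JSON layout (the keys "vocabulary" and "terms") and the function name are hypothetical, invented for illustration; nothing in the thread defines them. A value that parses as a JSON object with those keys is treated as controlled-vocabulary content, and anything else is treated as free text intended for humans.

```python
import json


def classify_flag_methods(value):
    """Classify a flag_methods string as controlled-vocabulary JSON or free text.

    Returns ("controlled", vocabulary_name, terms) or ("free_text", None, value).
    """
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return ("free_text", None, value)
    # Valid JSON is only a clear signal if it has the agreed (hypothetical) shape.
    if isinstance(parsed, dict) and "vocabulary" in parsed and "terms" in parsed:
        return ("controlled", parsed["vocabulary"], parsed["terms"])
    return ("free_text", None, value)


print(classify_flag_methods('{"vocabulary": "IOOS QARTOD Tests", "terms": ["qartod_aggregate"]}'))
print(classify_flag_methods("human-readable description of the method"))
```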

JonathanGregory · 2019-10-22T13:32:51Z

Dear Jessica

Your answer to my comment indicates that I don't really understand your example - which indeed I suspected. My understanding is that you want to add more information about what the flag_meanings mean, in effect - is that right? Does this extra information necessarily apply to all the flag values of the variable? If it does, I would argue that it is information about the variable as a whole, rather than about the flag meanings. For example, it's a description of the quality-control process (in the case of a variable that records QC information), and not necessarily related to the use of flag values to encode the variable. It would apply even if the other flag attributes were not used.

Best wishes

Jonathan

DocOtak · 2019-10-22T14:43:54Z

I realize this might be "crazy talk" but commenting anyway because it might be interesting to think about.

When looking at the example, for some (irrational) reason I wanted "all" the QC information to be in a single variable and started to think about what that might look like. I couldn't reason through a way of doing it with things like flag masks. Then I wondered, "just how many of these QARTOD tests are there?" The answer for salinity is 13. That would be a lot of QC variables lying around, not that there is really anything wrong with that on a technical level.

What if all the QC flags were in a single variable, but the last dimension corresponded to the QC test method? Instead of a new attribute, there would be a new standard name, "flag_method". A variable with this standard name would probably be an actual coordinate variable holding the labels of all the QC tests.

Here is a hand-modified version of the CDL from the original proposal showing what this might look like:

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:missing_value = -9999.0;
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc";

    int sea_water_practical_salinity_qc(time, z, qartod_method);
        sea_water_practical_salinity_qc:long_name = "Salinity QARTOD Flag";
        sea_water_practical_salinity_qc:standard_name = "status_flag";
        sea_water_practical_salinity_qc:missing_value = 2;
        sea_water_practical_salinity_qc:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc:flag_values = 1, 2, 3, 4, 9;

    string qartod_method(qartod_method); //or char array
        qartod_method:standard_name = "flag_method";
        qartod_method:references = "https://axiom-data-science.github.io/ioos_qc/";

martinjuckes · 2019-10-22T22:37:47Z

Dear Jessica,

I like the idea of providing a bit more detail about what is meant by PASS etc., but I'm slightly concerned that the references given for QARTOD are broken links. How stable are these terms?

There are several cases in the standard name table in which a specific term is defined for use with a set of values, such as atmosphere_stability_showalter_index and beaufort_wind_force. Following the same pattern, we could introduce new standard names such as qartod_aggregate_status_flag and qartod_flat_line_status_flag. This would, I think, be in line with Jonathan's suggestion that the general CF approach would be to put details of the process associated with a particular term in the definition of the standard name. If we do introduce a new mechanism for specifying a "method", I feel that it should be done in a way that could apply to other variables, not just flags, e.g. by using a methods attribute rather than flag_methods.
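Under the standard-name approach, a script would not need a new attribute: it could match ancillary variables on standard_name. The Python sketch below assumes attributes have already been read into dicts; qartod_aggregate_status_flag and qartod_flat_line_status_flag are only the names floated in this comment, not entries in the CF standard name table, and `find_by_standard_name` is a hypothetical helper. Because a CF standard_name attribute holds one standard name, optionally followed by a modifier, the sketch compares only the first whitespace-separated token.

```python
def find_by_standard_name(attrs_by_var, data_name, standard_name):
    """Return ancillary variables of `data_name` whose standard name matches.

    A CF standard_name attribute is one standard name, optionally followed by
    a modifier, so only its first whitespace-separated token is compared.
    """
    names = attrs_by_var[data_name].get("ancillary_variables", "").split()
    return [
        n for n in names
        if attrs_by_var[n].get("standard_name", "").split()[:1] == [standard_name]
    ]


attrs_by_var = {
    "sea_water_practical_salinity": {
        "standard_name": "sea_water_practical_salinity",
        "ancillary_variables": "sea_water_practical_salinity_qc_agg "
                               "sea_water_practical_salinity_qc_flat_line_test",
    },
    # Proposed (hypothetical) standard names from this discussion.
    "sea_water_practical_salinity_qc_agg": {
        "standard_name": "qartod_aggregate_status_flag"},
    "sea_water_practical_salinity_qc_flat_line_test": {
        "standard_name": "qartod_flat_line_status_flag"},
}
print(find_by_standard_name(attrs_by_var, "sea_water_practical_salinity",
                            "qartod_aggregate_status_flag"))
```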

jessicaaustin · 2019-10-25T22:13:34Z

Thank you everyone for all the feedback! We appreciate you taking the time. There is a lot of great stuff for us to think through here.

We did imagine that the value of flag_methods would refer to a vocabulary; at least, that is what we would enforce within the IOOS community. For example, to satisfy our QARTOD use case, we would define a vocabulary called "IOOS QARTOD Tests", with initial valid values of:

qartod_aggregate
qartod_attenuated_signal_test
qartod_climatology_test
qartod_flat_line_test
qartod_gross_range_test
etc

The whole point of our proposal is to define a way for a script to figure out which dataset variable corresponds to which QARTOD test, so I agree that without a controlled vocabulary this is not possible.

In terms of how to define or enforce this within the CF conventions, I'm not sure what the best approach is. Does it make sense to have an additional attribute -- say, flag_methods_vocabulary -- with a value of the vocabulary name?

(Side note: @martinjuckes, you mentioned the broken links to documentation and questioned how stable this library is, which is a completely valid point. Alongside this work to nail down a new IOOS metadata profile, we are also doing a lot of work to get the ioos_qc library to 1.0, including moving its location on GitHub as we consolidate it with other QARTOD libraries. Once we release 1.0, the links should be stable, and if we move documentation we will add redirects so that no links break.)

@DocOtak, your idea is not crazy at all. I've seen examples of people encoding values for all the QARTOD tests in a single variable, with a qartod_tests dimension, just like you describe. So in that case, we (IOOS) would enforce that the values of the qartod_tests string should be part of the "IOOS QARTOD Tests" vocabulary. All that said, I don't think this would be a valid approach for us. Asking people to update their dataset with an additional attribute is not a huge deal, and can be accomplished on top of the existing data (for example with an ncml file). But asking them to add a new dimension and restructure their data would be too much.

@JonathanGregory, yes, the idea is to add more information about what the flag_values mean. In our example, it is indeed "a description of the quality-control process". And you make a valid point: it really is information about the variable itself, not the flags. Which leads me to @martinjuckes's other comment, where he suggested doing away with the new flag_methods attribute and instead adding to the standard name table. So, to make sure I understand what you are saying, the example would look like:

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:missing_value = -9999.0;
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_flat_line_test";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity QARTOD Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "qartod_aggregate_status_flag";
        sea_water_practical_salinity_qc_agg:missing_value = 2;
        sea_water_practical_salinity_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_agg:flag_values = 1, 2, 3, 4, 9;
        sea_water_practical_salinity_qc_agg:references = "https://ioos.github.io/ioos_qc/api/ioos_qc.html#ioos_qc.qartod.qartod_compare";

    int sea_water_practical_salinity_qc_flat_line_test(time, z);
        sea_water_practical_salinity_qc_flat_line_test:long_name = "Salinity QARTOD Flat Line Test Flag";
        sea_water_practical_salinity_qc_flat_line_test:standard_name = "qartod_flat_line_status_flag";
        sea_water_practical_salinity_qc_flat_line_test:missing_value = 2;
        sea_water_practical_salinity_qc_flat_line_test:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_flat_line_test:flag_values = 1, 2, 3, 4, 9;
        sea_water_practical_salinity_qc_flat_line_test:references = "https://ioos.github.io/ioos_qc/api/ioos_qc.html#ioos_qc.qartod.flat_line_test";

This is an approach we considered but then discarded, because we weren't sure that the QARTOD test names would be appropriate entries in the standard name table. However, based on this conversation and the recently accepted quality_flag name, maybe we were wrong. QARTOD is a community standard, after all.

So, to sum up, it sounds like we have two options:

1. Continue down the flag_methods route, and figure out a way to make it clear that there is a standard vocabulary associated with it, or
2. Propose a new set of standard names for the QARTOD tests; if that is accepted, no change is needed to the CF conventions themselves.

Based on the feedback so far, we're leaning towards (2). But we'll see if there are any other comments people have here before making that proposal.

jessicaaustin · 2019-11-07T23:27:22Z

Submitted an issue proposing a new set of standard names: #216

If that is accepted, I will close this issue.

ngalbraith · 2019-11-08T18:39:53Z

I'm not quite sure if you're proposing that CF be responsible for the "IOOS QARTOD Tests" vocabulary, or if that would be maintained by IOOS.

In the latter case, CF would just be accepting flag_methods as a variable attribute with a specific meaning, and the onus would be on IOOS (or other projects) to make sure it was used by their data providers as they want it used.

I like your proposal '...to have an additional attribute -- say, flag_methods_vocabulary -- with a value of the vocabulary name?' That would also allow you to make the list of acceptable values a little less cumbersome: climatology, flat_line, range, attenuated_signal, and aggregate, for example. Then these could be more generally useful to different communities using different specifics but the same general vocabulary.
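The flag_methods_vocabulary idea could be checked by a script along the lines below. This Python sketch is hypothetical throughout: the registry, the second vocabulary name ("generic QC tests", holding the shorter terms suggested here) and the helper `check_flag_methods` are illustrative, and the "IOOS QARTOD Tests" set contains only the terms listed earlier in the thread, which was explicitly left open-ended.

```python
# Hypothetical registry mapping a vocabulary name to its allowed terms.
VOCABULARIES = {
    # Only the terms listed in the thread; the real list was left open ("etc").
    "IOOS QARTOD Tests": {
        "qartod_aggregate", "qartod_attenuated_signal_test",
        "qartod_climatology_test", "qartod_flat_line_test",
        "qartod_gross_range_test",
    },
    # Shorter, community-neutral terms as suggested in this comment.
    "generic QC tests": {
        "aggregate", "attenuated_signal", "climatology", "flat_line", "range",
    },
}


def check_flag_methods(attrs, vocabularies=VOCABULARIES):
    """Return True/False if flag_methods can be checked against a named
    vocabulary, or None if no vocabulary is named (free text)."""
    vocab_name = attrs.get("flag_methods_vocabulary")
    if vocab_name is None:
        return None
    if vocab_name not in vocabularies:
        raise KeyError(f"unknown vocabulary: {vocab_name!r}")
    return attrs.get("flag_methods") in vocabularies[vocab_name]


print(check_flag_methods({"flag_methods": "flat_line",
                          "flag_methods_vocabulary": "generic QC tests"}))   # True
print(check_flag_methods({"flag_methods": "qartod_flat_line",
                          "flag_methods_vocabulary": "IOOS QARTOD Tests"}))  # False
```

The second call returns False because the original example used qartod_flat_line, while the vocabulary list given earlier spells the term qartod_flat_line_test; this is the kind of mismatch a named vocabulary would let a machine catch.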

jessicaaustin · 2019-11-09T02:35:20Z

@ngalbraith To be honest, we are not quite sure either! We are hoping that through these proposals we can figure out the best way.

Originally, we were thinking IOOS would maintain this vocabulary, and it would be specified using flag_methods (and possibly also flag_methods_vocabulary). We thought this approach would be useful for a wider community. On the other hand, having CF maintain the list by incorporating these tests into its standard name table would also be fine with us, and in some ways is much simpler. So we'll see how people respond to #216.

mwengren · 2020-05-11T16:36:47Z

@davidhassell This has been superseded by #216 and can be closed.

larsbarring · 2022-10-05T11:40:48Z

It seems that this stale issue is still causing confusion even though it was superseded by another one about 3 years ago. Can we now close this issue as already suggested?

erget · 2022-10-05T12:16:47Z

Last input was 2y ago - closing, please re-open if necessary :)

jessicaaustin added the enhancement label (Proposals to add new capabilities, improve existing ones in the conventions, improve style or format) Sep 10, 2019

jessicaaustin mentioned this issue Nov 7, 2019: Proposal: Add QARTOD quality flag names to standard name list #216 (Closed)

davidhassell mentioned this issue May 11, 2020: Planning for the 2020 CF meeting: Santander, 9-11 June cf-convention/discuss#35 (Closed)

David-Rayner-GVC mentioned this issue Oct 5, 2022: Any convention on what flag_meaning to use to QC pass? cf-convention/discuss#184 (Open)

erget closed this as completed Oct 5, 2022

JonathanGregory added the agreement not to change label (Issue closed with agreement not to make a change to the conventions) Oct 5, 2022
