Proposal: Add optional flag_methods variable attribute #205

Closed
jessicaaustin opened this issue Sep 10, 2019 · 14 comments
Labels
agreement not to change Issue closed with agreement not to make a change to the conventions enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@jessicaaustin

Title: Add flag_methods variable attribute

Requirement Summary: Optional flag_methods attribute in section 3.5. Flags

Technical Proposal Summary: A new variable attribute called flag_methods, that specifies the process used to calculate the flag values.

Benefits: This brings clarity around how flags are calculated, and allows for machine-to-machine understanding

Status Quo: One can use ancillary_variables to link data to flags, flag_values and flag_meanings to describe the flags themselves, and references to provide links to code or documentation. However, there is no clear way to indicate which method was used to calculate flags.

This proposal was developed with @mwengren @kwilcox and @kevin-obrien

Detailed Proposal:

Motivation

The U.S. Integrated Ocean Observing System Program (IOOS) is working with NOAA NDBC to develop standards for ingesting real-time environmental sensor data into the GTS from partners across the US via ERDDAP. NDBC wishes to exclude any values that fail data quality control tests, as specified by the QARTOD guidelines (available here: https://ioos.noaa.gov/project/qartod/) from their ingest process. To achieve this, we need consistency in how data providers specify which flag variable indicates the QARTOD "rollup" or "aggregate" flag. This single flag indicates whether a data point has failed any of the applicable QARTOD tests. While IOOS maintains their own metadata profile (that extends CF and ACDD), we feel that standards around how to specify the results of QC tests would be useful for a wider community. Based on recent discussions on the CF-metadata mailing list, we are not alone in thinking about these ideas (see Related Discussions below).

What we propose

We propose a new variable attribute called flag_methods, that specifies the process used to calculate the flag values.

In other words: the flag_values and flag_meanings describe what the flags are, and the flag_methods attribute would describe how they were determined (i.e. a specific test name) or should be interpreted (i.e. as an aggregate of multiple tests).

Ideally, the value for flag_methods would be part of a known vocabulary, and associated with a codebase, documentation or publication via the references attribute. This enables machine-to-machine communication and thus fits our intended use well. However, the value could also be a human-readable description of the methods used to calculate the flags.

Example:

In this example the flag_methods indicate which QARTOD test was used to calculate each flag. The global references attribute links to the code documentation for the particular library used.

This example is adapted from a live dataset on the CeNCOOS RA ERDDAP server, with modifications to match this proposal. Going back to the NDBC use case: NDBC wishes to pull all values of salinity where the rollup/aggregate flag is not QC fail. They could check for flag_methods=qartod_aggregate to determine this.

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:missing_value = -9999.0;
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_flat_line_test";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity QARTOD Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "status_flag";
        sea_water_practical_salinity_qc_agg:missing_value = 2;
        sea_water_practical_salinity_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_agg:flag_values = "1, 2, 3, 4, 9";
        sea_water_practical_salinity_qc_agg:flag_methods = "qartod_aggregate";
        sea_water_practical_salinity_qc_agg:references = "https://axiom-data-science.github.io/ioos_qc/aggregate_flag";

    int sea_water_practical_salinity_qc_flat_line_test(time, z);
        sea_water_practical_salinity_qc_flat_line_test:long_name = "Salinity QARTOD Flat Line Test Flag";
        sea_water_practical_salinity_qc_flat_line_test:standard_name = "status_flag";
        sea_water_practical_salinity_qc_flat_line_test:missing_value = 2;
        sea_water_practical_salinity_qc_flat_line_test:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_flat_line_test:flag_values = "1, 2, 3, 4, 9";
        sea_water_practical_salinity_qc_flat_line_test:flag_methods = "qartod_flat_line";
        sea_water_practical_salinity_qc_flat_line_test:references = "https://axiom-data-science.github.io/ioos_qc/qartod_flat_line";

global attributes:

    :references = "http://www.cencoos.org/data/shore/humboldt,https://axiom-data-science.github.io/ioos_qc/";
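As a rough illustration of the NDBC use case, the sketch below assumes the variable attributes have already been read into plain Python dictionaries (for example from a server's metadata response). The helper names `find_flag_variable` and `keep_not_failed` and the sample data values are hypothetical and not part of any library. It looks up the ancillary variable whose proposed flag_methods attribute is qartod_aggregate, then drops values flagged FAIL.

```python
# Hypothetical sketch: attributes are modeled as plain dicts, not read from a real file.
FAIL = 4  # the value paired with "FAIL" in the example's flag_meanings

attrs = {
    "sea_water_practical_salinity": {
        "ancillary_variables": "sea_water_practical_salinity_qc_agg "
                               "sea_water_practical_salinity_qc_flat_line_test",
    },
    "sea_water_practical_salinity_qc_agg": {"flag_methods": "qartod_aggregate"},
    "sea_water_practical_salinity_qc_flat_line_test": {"flag_methods": "qartod_flat_line"},
}


def find_flag_variable(attrs, data_var, method):
    """Return the ancillary variable of data_var whose flag_methods equals method."""
    for name in attrs[data_var].get("ancillary_variables", "").split():
        if attrs.get(name, {}).get("flag_methods") == method:
            return name
    return None


def keep_not_failed(values, flags):
    """Keep the data values whose flag is anything other than FAIL."""
    return [v for v, f in zip(values, flags) if f != FAIL]


agg_name = find_flag_variable(attrs, "sea_water_practical_salinity", "qartod_aggregate")
salinity = [33.1, 33.2, 99.9, 33.3]   # made-up sample values
agg_flags = [1, 3, 4, 1]              # made-up aggregate flags
good = keep_not_failed(salinity, agg_flags)
```

The lookup goes through ancillary_variables rather than scanning every variable, so a flag variable belonging to a different data variable is never picked up by mistake.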

Related Discussions:

@jessicaaustin jessicaaustin added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Sep 10, 2019
@JonathanGregory
Contributor

Dear Jessica and others

Thank you for this proposal. I'm not familiar with the technical details of your example. My understanding is that your flag_method of qartod_flat_line is intended to clarify all the flag_meanings, by indicating how they are defined - is that right? If it is right, could this information not be put in the flag_meanings? Although it's cumbersome, you could have something like

flag_meanings = "qartod_flat_line_PASS NOT_EVALUATED qartod_flat_line_SUSPECT qartod_flat_line_FAIL MISSING"

and thus we would not need a new attribute.

By the way, the flag_values attribute should be a vector of the possible values, of the same type as the data variable, int in your case, not a string.

Best wishes

Jonathan
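To make the point about flag_values concrete, here is a minimal sketch of how a reader might pair flag values with their meanings. It assumes the attributes arrive as ordinary Python objects; the function names are hypothetical. The numeric sequence is the form CF expects, and the comma-separated string is tolerated only because it appears in the example above.

```python
def parse_flag_values(raw):
    """Return flag_values as a list of ints.

    Accepts a numeric sequence (the form CF expects) or, leniently,
    a comma-separated string such as "1, 2, 3, 4, 9".
    """
    if isinstance(raw, str):
        return [int(tok) for tok in raw.split(",")]
    return [int(v) for v in raw]


def flag_table(flag_values, flag_meanings):
    """Map each flag value to its meaning; flag_meanings is blank-separated."""
    values = parse_flag_values(flag_values)
    meanings = flag_meanings.split()
    if len(values) != len(meanings):
        raise ValueError("flag_values and flag_meanings differ in length")
    return dict(zip(values, meanings))


table = flag_table("1, 2, 3, 4, 9", "PASS NOT_EVALUATED SUSPECT FAIL MISSING")
```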

@jessicaaustin
Author

@JonathanGregory Thanks for your comments.

I do not think it makes sense to encode the information into flag_meanings like you've suggested, because the meaning could be (and in this case, is) independent of how it was calculated. In the case of QC, PASS means PASS, regardless of which test implementation was used.

@MaartenSneepKNMI

I think that the values for the flag_methods attribute need more clarification, if this addition is going to be useful. At the moment the values are wide open, but with the intention of making them machine readable. I'm not sure those two options are compatible. Can you flesh out the potential values, and the governance of the vocabulary if we choose to go that route?

@JimBiardCics
Contributor

I think overloading all this into flag_meanings can quickly make the flag_meanings elements nearly incomprehensible. @MaartenSneepKNMI has a valid point. If we allow free-form content at the beginning, it may be hard to impose further rigor later. It would require some clear signal that the content was from a controlled vocabulary so that machine understanding could be successfully applied. This is doable, but we need to be clear on what that looks like up front. For example, if the value of the attribute was a JSON block containing controlled vocabulary elements vs one that was not, it would be a clear signal.
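One way the "clear signal" idea could look in practice is sketched below. This is purely an assumption for illustration: nothing in the thread defines the JSON layout, and the key names `vocabulary` and `term` as well as the function `read_flag_methods` are invented here. A value that parses as a JSON object with both keys is treated as controlled; anything else is treated as free text.

```python
import json


def read_flag_methods(raw):
    """Classify a flag_methods value as controlled-vocabulary or free text.

    Assumption: a controlled value is a JSON object carrying "vocabulary"
    and "term" keys. These key names are invented for this sketch.
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {"controlled": False, "text": raw}
    if isinstance(parsed, dict) and set(parsed) >= {"vocabulary", "term"}:
        return {"controlled": True, **parsed}
    return {"controlled": False, "text": raw}


free = read_flag_methods("qartod_flat_line")
controlled = read_flag_methods(
    '{"vocabulary": "IOOS QARTOD Tests", "term": "qartod_flat_line_test"}'
)
```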

@JonathanGregory
Contributor

Dear Jessica

Your answer to my comment indicates that I don't really understand your example - which indeed I suspected. My understanding is that you want to add more information about what the flag_meanings mean, in effect - is that right? Does this extra information necessarily apply to all the flag values of the variable? If it does, I would argue that it is information about the variable as a whole, rather than about the flag meanings. For example, it's a description of the quality-control process (in the case of a variable that records QC information), and not necessarily related to the use of flag values to encode the variable. It would apply even if the other flag attributes were not used.

Best wishes

Jonathan

@DocOtak
Member

DocOtak commented Oct 22, 2019

I realize this might be "crazy talk" but commenting anyway because it might be interesting to think about.

When looking at the example, for some (irrational) reason I had the desire for "all" the QC information to be in a single variable and started to think about what that might look like. I couldn't reason through a way of using things like flag masks. Then I wondered, "just how many of these QARTOD tests are there?". The answer for salinity is 13 of them. That would be a lot of QC variables lying around, not that there is really anything wrong with that on a technical level.

What if all the QC flags were in a single variable, with the last dimension corresponding to the QC test method? Instead of a new attribute, there would be a new standard name, "flag_method". A variable with this standard name would probably be an actual "coordinate variable" holding the labels of all the QC tests.

Here is a hand-modified version of the CDL from the original proposal showing what this might look like:

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:missing_value = -9999.0;
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc";

    int sea_water_practical_salinity_qc(time, z, qartod_method);
        sea_water_practical_salinity_qc:long_name = "Salinity QARTOD Flag";
        sea_water_practical_salinity_qc:standard_name = "status_flag";
        sea_water_practical_salinity_qc:missing_value = 2;
        sea_water_practical_salinity_qc:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc:flag_values = "1, 2, 3, 4, 9";

    string qartod_method(qartod_method); //or char array
        qartod_method:standard_name = "flag_method";
        qartod_method:references = "https://axiom-data-science.github.io/ioos_qc/";
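For readers who want to see how a consumer would use this layout, here is a small sketch with plain nested lists standing in for the (time, z, qartod_method) array. The test labels and flag values are made up for illustration, and `flags_for_method` is a hypothetical helper.

```python
# Labels that the qartod_method coordinate variable would hold (illustrative only).
qartod_method = ["qartod_aggregate", "qartod_flat_line_test", "qartod_gross_range_test"]

# Flags shaped (time, z, qartod_method): here 2 times x 1 depth x 3 tests.
qc = [
    [[1, 1, 1]],
    [[4, 1, 4]],
]


def flags_for_method(qc, labels, method):
    """Return the (time, z) slice of flags for one test label."""
    k = labels.index(method)  # position of the test along the last dimension
    return [[cell[k] for cell in row] for row in qc]


agg = flags_for_method(qc, qartod_method, "qartod_aggregate")
flat = flags_for_method(qc, qartod_method, "qartod_flat_line_test")
```

The test is selected by label rather than by a fixed index, so the order of the tests along the last dimension does not matter to the consumer.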

@martinjuckes
Contributor

Dear Jessica,

I like the idea of providing a bit more detail about what is meant by PASS etc, but I'm slightly concerned that the references given for QARTOD are broken links. How stable are these terms?

There are several cases in the standard name table in which a specific term is defined for use with a set of values, such as atmosphere_stability_showalter_index and beaufort_wind_force. Following the same pattern, we could introduce new standard names such as qartod_aggregate_status_flag and qartod_flat_line_status_flag. This would, I think, be in line with Jonathan's suggestion that the general CF approach would be to put details of the process associated with a particular term in the definition of the standard name. If we do introduce a new mechanism for specifying a "method", I feel that it should be done in a way which could be applicable to other variables, not just flags, e.g. by using a methods attribute rather than flag_methods.

@jessicaaustin
Author

Thank you everyone for all the feedback! We appreciate you taking the time. There is a lot of great stuff for us to think through here.

We did imagine that the value of flag_methods would be in reference to a vocabulary -- at least that is what we would enforce within the IOOS community. For example, to satisfy our QARTOD use case, we would define a vocabulary called "IOOS QARTOD Tests", with initial valid values of:

  • qartod_aggregate
  • qartod_attenuated_signal_test
  • qartod_climatology_test
  • qartod_flat_line_test
  • qartod_gross_range_test
  • etc

The whole point of our proposal is to try to define a way for a script to figure out which dataset variable corresponds to which qartod test, so I agree without a controlled vocabulary this is not possible.
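A script-side check against such a vocabulary could be as simple as the sketch below. It assumes attributes are available as plain dictionaries and includes only the terms listed above; `unknown_flag_methods` is a hypothetical helper, not an existing IOOS tool.

```python
# Only the terms listed above; the full vocabulary would be longer.
IOOS_QARTOD_TESTS = {
    "qartod_aggregate",
    "qartod_attenuated_signal_test",
    "qartod_climatology_test",
    "qartod_flat_line_test",
    "qartod_gross_range_test",
}


def unknown_flag_methods(attrs, vocabulary=IOOS_QARTOD_TESTS):
    """Return {variable: flag_methods} for values outside the vocabulary."""
    return {
        name: a["flag_methods"]
        for name, a in attrs.items()
        if "flag_methods" in a and a["flag_methods"] not in vocabulary
    }


attrs = {
    "sea_water_practical_salinity": {"units": "1"},
    "sea_water_practical_salinity_qc_agg": {"flag_methods": "qartod_aggregate"},
    "sea_water_practical_salinity_qc_flat_line_test": {"flag_methods": "qartod_flat_line"},
}
problems = unknown_flag_methods(attrs)
```

Note that the value qartod_flat_line used in the original example would be reported here, because the vocabulary term is qartod_flat_line_test. That kind of mismatch is exactly what a controlled vocabulary lets a script catch.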

In terms of how to define or enforce this within the CF conventions, I'm not sure of the best way to accomplish that. Does it make sense to have an additional attribute -- say, flag_methods_vocabulary -- with a value of the vocabulary name?

(Side note: @martinjuckes, You mentioned the broken links to documentation and questioned how stable this library is, which is a completely valid point. Alongside this work to nail down a new IOOS metadata profile, we are also doing a bunch of work to get the ioos_qc library to 1.0, including moving its location in github as we consolidate it with other qartod libraries. Once we do release 1.0, the links should be stable and if we do move documentation we would add redirects so no links were broken.)

@DocOtak, your idea is not crazy at all. I've seen examples of people encoding values for all the QARTOD tests in a single variable, with a qartod_tests dimension, just like you describe. So in that case, we (IOOS) would enforce that the values of the qartod_tests string should be part of the "IOOS QARTOD Tests" vocabulary. All that said, I don't think this would be a valid approach for us. Asking people to update their dataset with an additional attribute is not a huge deal, and can be accomplished on top of the existing data (for example with an ncml file). But asking them to add a new dimension and restructure their data would be too much.

@JonathanGregory, Yes, the idea is to add more information about what the flag_values mean. In our example, it is indeed "a description of the quality-control process". And you make a valid point: it really is information about the variable itself, not the flags. Which leads me to @martinjuckes's other comment, where he suggested doing away with this new flag_methods attribute, and instead adding to the standard_name table. So to make sure I understand what you are saying, the example would look like:

variables:

    float sea_water_practical_salinity(time, z);
        sea_water_practical_salinity:units = "1";
        sea_water_practical_salinity:long_name = "Salinity";
        sea_water_practical_salinity:standard_name = "sea_water_practical_salinity";
        sea_water_practical_salinity:missing_value = -9999.0;
        sea_water_practical_salinity:ancillary_variables = "sea_water_practical_salinity_qc_agg sea_water_practical_salinity_qc_flat_line_test";

    int sea_water_practical_salinity_qc_agg(time, z);
        sea_water_practical_salinity_qc_agg:long_name = "Salinity QARTOD Aggregate Flag";
        sea_water_practical_salinity_qc_agg:standard_name = "status_flag qartod_aggregate_status_flag";
        sea_water_practical_salinity_qc_agg:missing_value = 2;
        sea_water_practical_salinity_qc_agg:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_agg:flag_values = "1, 2, 3, 4, 9";
        sea_water_practical_salinity_qc_agg:references = "https://ioos.github.io/ioos_qc/api/ioos_qc.html#ioos_qc.qartod.qartod_compare";

    int sea_water_practical_salinity_qc_flat_line_test(time, z);
        sea_water_practical_salinity_qc_flat_line_test:long_name = "Salinity QARTOD Flat Line Test Flag";
        sea_water_practical_salinity_qc_flat_line_test:standard_name = "status_flag qartod_flat_line_status_flag";
        sea_water_practical_salinity_qc_flat_line_test:missing_value = 2;
        sea_water_practical_salinity_qc_flat_line_test:flag_meanings = "PASS NOT_EVALUATED SUSPECT FAIL MISSING";
        sea_water_practical_salinity_qc_flat_line_test:flag_values = "1, 2, 3, 4, 9";
        sea_water_practical_salinity_qc_flat_line_test:references = "https://ioos.github.io/ioos_qc/api/ioos_qc.html#ioos_qc.qartod.flat_line_test";

This is an approach we considered but then discarded, because we weren't sure that the QARTOD test names would be appropriate entries in the standard name table. However, based on this conversation and the recently accepted quality_flag name, maybe we were wrong. QARTOD is a community standard, after all.

So to sum up, it sounds like we have two options:

  1. Continue down this flag_methods route, and figure out a way to make it clear that there is a standard vocabulary associated with this, or
  2. Propose a new set of standard names for the QARTOD tests, and if that is accepted then there is no change needed to the CF conventions themselves

Based on the feedback so far, we're leaning towards (2). But we'll see if there are any other comments people have here before making that proposal.

@jessicaaustin
Author

Submitted an issue proposing a new set of standard names: #216

If that is accepted, I will close this issue.

@ngalbraith

I'm not quite sure if you're proposing that CF be responsible for the "IOOS QARTOD Tests" vocabulary, or if that would be maintained by IOOS.

In the latter case, CF would just be accepting flag_methods as a variable attribute with a specific meaning, and the onus would be on IOOS (or other projects) to make sure it was used by their data providers as they want it used.

I like your proposal '...to have an additional attribute -- say, flag_methods_vocabulary -- with a value of the vocabulary name?' That would also allow you to make the list of acceptable values a little less cumbersome: climatology, flat_line, range, attenuated_signal, and aggregate, for example. Then these could be more generally useful to different communities using different specifics but the same general vocabulary.

@jessicaaustin
Author

@ngalbraith To be honest, we are not quite sure either! We are hoping that through these proposals we can figure out the best way.

Originally, we were thinking IOOS would maintain this vocabulary, and it would be specified using flag_methods (and possibly also flag_methods_vocabulary). We thought this approach would be useful for a wider community. But on the other hand, having CF maintain the list by incorporating these tests into their standard name list would also be fine with us, and in some ways is much simpler. So we'll see how people respond to #216

@mwengren
Contributor

@davidhassell This has been superseded by #216 and can be closed.

@larsbarring
Contributor

It seems that this stale issue is still causing confusion even though it was superseded by another one about 3 years ago. Can we now close this issue as already suggested?

@erget
Member

erget commented Oct 5, 2022

Last input was 2y ago - closing, please re-open if necessary :)

@erget erget closed this as completed Oct 5, 2022
@JonathanGregory JonathanGregory added the agreement not to change Issue closed with agreement not to make a change to the conventions label Oct 5, 2022