
Planning: Custom attributes and OAFeat-style attribute filters? #39

Open
dblodgett-usgs opened this issue Feb 8, 2025 · 9 comments

@dblodgett-usgs
Member

I wonder if there is a realistic path to implementing support for attribute filters on a couple of endpoints.

For each layer, a more or less full-featured OAFeat endpoint could be exposed.

e.g. https://api.water.usgs.gov/nldi/linked-data/geoconnex-demo?f=json could allow filtering by spatial or attribute queries.

Below that layer, for navigation in particular, we could support similar filtering.

The use case is where a layer has its own type(s), like active gage vs. inactive gage or water rights vs. time series of withdrawals, and there is an interest in being able to limit responses to one type or the other.

This would have impacts on crawler configuration, the database / service SQL, and the service API documentation.

@webb-ben and @gzt5142 -- I feel like I'm imagining something that's not all that far-fetched given where we are. Can you flesh out a couple of details of the implementation that would clarify how much effort this would be and what we would need to account for in planning?

@gzt5142
Collaborator

gzt5142 commented Feb 9, 2025

At first blush, a simple interface might be possible with the addition of a "filter=" query parameter to the endpoint path. Something like:

.../linked-data/geoconnex-demo?filter='type=hydrolocation'&f=json

The filter syntax looks a lot like SQL. It would amount to an extra WHERE clause on our queries. It'd then be possible to build that into any query we do against the feature table.

However -- there is a real danger here of SQL injection attacks if we merely pass a user-supplied string along as part of our SQL. We should absolutely not implement it that way.

What I think this means is that we would have to adopt (or build) a parser for allowable filter expressions -- to sanitize and then extend the SQL expressions we're currently using. GDAL and the libraries based on it use a simplified SQL dialect (https://gdal.org/en/stable/user/ogr_sql_dialect.html#where) to compose such filters -- presumably translating to the underlying technology when reading features.
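
To illustrate the concern, here is a minimal sketch (hypothetical table and column names; SQLAlchemy used only for parameter binding) contrasting raw string interpolation with the kind of restricted handling a parser would enable:

```python
from sqlalchemy import text

# Hypothetical user-supplied value from a "filter=" query parameter.
user_filter = "type='hydrolocation'; DROP TABLE feature;--"

# UNSAFE: splicing the raw string into SQL executes whatever the caller sends.
unsafe_sql = f"SELECT * FROM feature WHERE {user_filter}"

# Safer: validate the column name against an allow-list and bind the value as a
# parameter, so the database never interprets user input as SQL.
safe_stmt = text("SELECT * FROM feature WHERE type = :type").bindparams(type="hydrolocation")
```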

@gzt5142
Collaborator

gzt5142 commented Feb 9, 2025

A more secure option with less development overhead would be to allow a query param for every column in the feature table -- by name. There's a little bit of undesirable coupling here, in that changes to the feature table would require concomitant changes to the filter logic (to allow for renamed, added, or deleted columns). Much lower risk, and we'd not need to include a parser for ad-hoc style filter queries.
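
A rough sketch of that option, assuming a hypothetical allow-list of feature-table columns and a dict of incoming query parameters:

```python
# Hypothetical allow-list of feature-table columns exposed as query parameters.
FILTERABLE_COLUMNS = {"identifier", "comid", "reachcode", "name", "measure"}

def column_filters(query_params: dict) -> dict:
    """Keep only parameters whose key names a known feature-table column."""
    return {k: v for k, v in query_params.items() if k in FILTERABLE_COLUMNS}

# e.g. ?reachcode=01020001002199&f=json -> {"reachcode": "01020001002199"},
# which could then be applied with session.query(Feature).filter_by(**filters).
filters = column_filters({"reachcode": "01020001002199", "f": "json"})
```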

@dblodgett-usgs
Member Author

Right -- absolutely. I was imagining we would steal the pattern pygeoapi uses for the OAFeat filter spec. @webb-ben will have ideas there.

@webb-ben
Member

Definitely not something that is too far off from where we are currently. I do have a couple of clarifying questions:

  1. Would this mean adding columns to the SQL tables? For a given feature, the response properties are a mix of the Crawler Source and Feature tables:
{
    "identifier": "ME00213",  # Feature Table
    "navigation": "https://api.water.usgs.gov/nldi/linked-data/geoconnex-demo/ME00213/navigation",  # n/a
    "measure": 0,  # Feature Table
    "reachcode": "01020001002199",  # Feature Table
    "name": "Dole Brook",  # Feature Table
    "source": "geoconnex-demo",  # Crawler Source
    "sourceName": "geoconnex contribution demo sites",  # Crawler source
    "comid": "1699108", # Feature Table
    "type": "hydrolocation", # Crawler Source
    "uri": "https://geoconnex.us/iow/demo/ME00213", # Feature Table
    "mainstem": "https://geoconnex.us/ref/mainstems/2246076" # Mainstem table
}

Is the idea that the existing properties (type) would all be tied to individual features? Or would we be adding properties to this table that would allow for JSON-like addition of attributes?

  2. As for the API filtering mechanics, I do think following OAFeat would be a good idea - do we want to implement more than one way of filtering responses? The first is a simple KVP query argument. In pygeoapi this is implemented such that the key has to match one of the queryables, the list of which is generated at runtime. The second would be to introduce CQL filtering to the NLDI. https://github.com/geopython/pygeofilter should have bindings from CQL filters to the SQLAlchemy query (a rough sketch follows this list).
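
If we went the pygeofilter route, the translation into a SQLAlchemy clause might look roughly like this (a sketch only; the FeatureModel mapping and field names are assumptions, not the actual NLDI schema):

```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base
from pygeofilter.parsers.ecql import parse
from pygeofilter.backends.sqlalchemy.evaluate import to_filter

Base = declarative_base()

class FeatureModel(Base):
    """Hypothetical SQLAlchemy mapping of the feature table."""
    __tablename__ = "feature"
    identifier = Column(String, primary_key=True)
    comid = Column(Integer)
    reachcode = Column(String)

# Map filterable field names to mapped columns, parse the CQL text,
# and convert the resulting AST into a SQLAlchemy filter clause.
field_mapping = {
    "identifier": FeatureModel.identifier,
    "comid": FeatureModel.comid,
    "reachcode": FeatureModel.reachcode,
}
ast = parse("comid = 1699108 AND reachcode LIKE '0102%'")
clause = to_filter(ast, field_mapping)
# session.query(FeatureModel).filter(clause) would then return only matching features.
```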

@dblodgett-usgs
Member Author

The concept would be to allow the crawler to ingest additional attributes at the feature level and store them in the feature table.

KVP would be fine initially, but if we were to just use pygeofilter, CQL would be in the cards, right?

@webb-ben
Member

Yup, exactly. Could we add a properties column that is JSON?
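
A minimal sketch of what that could look like with a PostgreSQL JSONB column (table and column names are illustrative, not the current schema):

```python
from sqlalchemy import Column, String
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Feature(Base):
    """Hypothetical feature table with a free-form properties column."""
    __tablename__ = "feature"
    identifier = Column(String, primary_key=True)
    properties = Column(JSONB)   # e.g. {"type": "hydrolocation", "active": true}

# Filtering on a key inside the JSON document:
#   session.query(Feature).filter(Feature.properties["type"].astext == "hydrolocation")
```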

@dblodgett-usgs
Member Author

I was imagining some structure initially to keep things simple.

Like if people have a "type" column or an "active/inactive" flag they could add that and we could filter on it. I would think we would link it to some common graph patterns that we want to encourage people to put in?

@gzt5142
Collaborator

gzt5142 commented Feb 19, 2025

Discussion w/ Dave -- implement feature filtering using the current schema for the feature table. A future enhancement would allow more dynamic filtering capabilities, but would require changes to the schema.

@gzt5142
Collaborator

gzt5142 commented Feb 19, 2025

I will work to implement a pygeofilter parser to read a "filter" query parameter and pass that on as a SQLAlchemy filter when we select features.
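
Sketching how that might be wired up (hypothetical function and names; reuses the parse/to_filter/field_mapping pieces from the pygeofilter sketch above):

```python
from sqlalchemy import select
from pygeofilter.parsers.ecql import parse
from pygeofilter.backends.sqlalchemy.evaluate import to_filter

def filtered_features(session, model, field_mapping, filter_text=None):
    """Select features, optionally narrowed by a CQL "filter" query parameter."""
    stmt = select(model)
    if filter_text:
        # e.g. ?filter=comid = 1699108 on the request query string
        stmt = stmt.where(to_filter(parse(filter_text), field_mapping))
    return session.execute(stmt).scalars().all()
```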
