telecon notes
https://github.com/hapi-server/tasks
- Make plan for finishing linkages; white paper to help guide document decisions. See our and ChatGPT's proposal
- approve PR on addition of new units schema in list of allowed schemas
- done - no more PRs
- short discussion on units - where things stand
- IHDEA Units working group will recommend VOUnits for machine-readable unit strings in Heliophysics
- if you don't have a units schema already, start using VOUnits; SPASE and ISTP, for example, could start using this
- note: the CODATA meeting is this October in Australia - maybe send a HAPI person? (or see if IHDEA can send someone)
- request from Jon: need a list of all the time series data that NSF cares about (for the NSF proposal):
- most of what's in Madrigal, SuperMAG, AMPERE, MANGO, earthquakes (SEED file format; see https://ds.iris.edu/ds/ for some), stream flow, atmospheric info(?)
- Discussion about conda-forge package:
- there is one already! maintained by D. Stansby; it comes from a fully separate repo with its own copy of the client
- hapi-client has an issue about this: https://github.com/hapi-server/client-python/issues/68
- S. Poulson was doing a build for PyHC and the hapi-client unit tests were failing; one failure was in the time parser tests: some time strings that used to parse OK are now considered errors
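For reference, a minimal sketch of the kind of parsing involved (this assumes hapiclient's `hapitime2datetime` helper; the exact set of accepted forms varies by version, which is what the failing tests caught):

```python
import numpy as np
from hapiclient.hapitime import hapitime2datetime

# Restricted ISO 8601 forms that HAPI allows; a parser update can start
# rejecting forms it previously accepted, which is what the tests caught.
print(hapitime2datetime(np.array(["1999-01-01T00:00:00.000Z"])))  # calendar form
print(hapitime2datetime(np.array(["1999-032T00:00:00.000Z"])))    # day-of-year form
```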
- Discuss Bob's Unit email
- the IHDEA units working group is now recommending that if you don't have a schema for units, then use VOUnits as the default schema
- several tickets created...
- SuperMAG update - status of HAPI server
- Jon to talk to M. Friel today
- confirm 3.3 ready for release
- vote on license PR
- one more ticket
- HAPI server for ephem and pointing data
- underlying SPICE-specific server (not a HAPI interface)
- HAPI server wraps underlying server and provides simple table view of pre-canned values (PSP position, planets, other S/C, etc, and also velocities?) and cadences and frames
- one thing to address is how a science user should know which ephemeris source is the best to use: a metadata / provenance problem, and the way different communities handle updates to this kind of auxiliary information
- proposal to NSF for HAPI data package ingester / standardizer for Zenodo or similar
- we will meet Tue, 3:30pm Eastern for last ticket for 3.3 release: https://github.com/hapi-server/data-specification/issues/251
- creating a conda package?
- conda becoming the de-facto way of managing different Python projects with different dependencies - this is a topic of debate
- conda allows binaries (which we don't need?)
- pip and conda incompatibility is less of a problem now - Bob uses conda as his primary mode of package management
- conda package needs a maintainer (password for conda account, updates for new HAPI versions, unit tests)
- PyOpenSci has an automated PyPI-to-conda conversion process (Bernie uses this, but it has been fragile as the YAML requirements at conda-forge change from underneath you); see the sketch after this list
- Sandy: PyOpenSci's advice on going from in-PyPi to Conda forge: https://www.pyopensci.org/python-package-guide/tutorials/publish-conda-forge.html
- Bernie: this is the same procedure (and grayskull package) that is used for SPDF packages
- action item (Jon): research / survey people to find out more about popularity of conda versus pip
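A rough sketch of the automated conversion step mentioned above (this assumes the grayskull CLI is installed; the package name is the real hapiclient, but treat the invocation as illustrative):

```python
# Equivalent to running "grayskull pypi hapiclient" in a shell: generates
# a conda recipe (meta.yaml) from the PyPI package metadata.
import subprocess

subprocess.run(["grayskull", "pypi", "hapiclient"], check=True)
```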
- what to present / showcase / teach at combined CEDAR & GEM meetings (June 22-27, 2025, Des Moines, IA)?
- GEM tutorials day (Bob): 1-2 hours of HAPI in Python notebooks, or similar (like what SPEDAS does routinely); focus is just a "taste test" - not enough time for a full tutorial; see the notebook sketch after this list
- also focus on mini-GEM tutorial on day before Fall AGU 2025; ways to combine Gannon storm data from NASA and ESA missions, SuperMAG, etc
- meetings this week to continue the release path for 3.3
- question: what is the KNMI plotting library? Ans: d3js (which does SVG - slower?); HighCharts; Plotly (NOAA / NESDIS? spot plotting system)
- for Sandy: send slides on HC access to Bobby
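For the notebook idea above, one possible "taste test" cell, following the documented hapiclient/hapiplot pattern (the OMNI dataset and Dst parameter are the standard demo choices, shown here as an illustration):

```python
from hapiclient import hapi
from hapiplot import hapiplot

server     = "https://cdaweb.gsfc.nasa.gov/hapi"
dataset    = "OMNI2_H0_MRG1HR"
parameters = "DST1800"
start      = "2003-09-01T00:00:00"
stop       = "2003-12-01T00:00:00"

data, meta = hapi(server, dataset, parameters, start, stop)  # fetch the data
hapiplot(data, meta)                                         # quick-look plot
```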
Went over detailed wording in two open pull requests regarding location and vectorComponents
- https://github.com/hapi-server/data-specification/pull/255
- https://github.com/hapi-server/data-specification/pull/252
- review of HAPI offsite meeting last week
- version 3.3 nearly ready for release: go over last steps
- proposal ideas
- meetings this year
To Do list for releasing 3.3:
- make change log - use some AI?
- new schema for 3.3 for verifier
- test server with 3.3 features?
- go through the dev ops process to execute the release:
- make PDF (Bob)
- update lots of links
- release to Zenodo, including re-vamp of the current Zenodo DOI page, or start with a new all-versions Zenodo entry
- slides about new release
Other to-do items:
- make 2 slides for Jesper at ISWAT meeting
- close issue on adding location (make branch from ticket, add mods using ideas from ticket, make PR, present to HAPI telecon on 1/27, possibly approve PR)
Preparation: Study linkages
- Finish linkages; white paper to help guide document decisions. See our and ChatGPT's proposal
- Long-term plans and proposals
- servers repo (indexing metadata); need schema for all.json and ways to keep the .txt and .json in sync
- need schema update for the modified `info` response that now includes `altitude`, `location`, and `geoLocation`; see issues 232 and mostly 165: https://github.com/hapi-server/data-specification/issues/165
- make official / clean up / finish instructions for adding a new server into our official list; requirements: pass validation; stability requirements? active person behind contact email? acknowledge that we will be displaying status info for your server; future updates when the HAPI spec changes; rules / guidelines for when your data changes?; FAIR version of HAPI
- Status page; mechanism for contacting admins.
- Funding for keeping servers running - what mechanism. Discuss ideas with Brian. (IDIQs?)
- add recent presentations to GitHub repo
- look at time range issue by Eelco: https://github.com/hapi-server/data-specification/issues/97
- SuperMAG - server now updated after some down-time and external certificate issues; see https://github.com/hapi-server/tasks/issues/9
- Jon to meet with Jesper
- Prepare agenda for
- Jeremy's pull request for https://github.com/hapi-server/data-specification/issues/232
- Jon report on meeting with Jesper
- If SuperMAG not up by end of month, use pass-through?
- Discuss 165
- Jon to submit to EGU meeting (virtual participation)
- Paper ideas (announce when SuperMAG magnetometer data is available; open standards and documentation to facilitate AI generation of data access ... more of an idea for a presentation)
- Space Weather Workshop, abstract deadline February 17, 2025, registration deadline of March 5, 2025 https://cpaess.ucar.edu/meetings/space-weather-workshop-2025, March 17-21, 2025 in Boulder, CO
- Meet with Masha about serving simulation data and challenge results via HAPI
- Discuss 165
- discuss SuperMAG status
- add recent presentations to GitHub repo
- News:
- summary of PyHC meeting: new levels coming for PyHC projects: bronze, silver, gold, each with increasing requirements for things like testing, documentation, interoperability, etc;
- there was also an Open Science Data Systems Workshop; summary: there will be some papers on best practices, common terminology, and examples of science data systems; Earth science has hySDS, an open-source and highly scalable system for managing the largest Earth science missions (75 TB/day plus surge capacity)
- status of FAIR pull request:
- https://github.com/hapi-server/data-specification/pull/224
- we had a large discussion on provenance
- summary: just go with a very simple free-text `provenance` attribute at the dataset level for now; need to talk to use-case owners (Baptiste) about anything more detailed; the linkages endpoint could support provenance info that is based on the time range of the request - this is linked to the addition of a files endpoint, which could support fine-grained provenance for file-based data
- https://github.com/hapi-server/data-specification/issues/218
- Wed. 1pm meeting to close out the FAIR pull request (by adding the agreed-upon `provenance` attribute and cleaning up and finalizing FAIR language)
- during AGU, we'd like to have a 6-hour block of time for an in-person HAPI meeting, probably Mon or Wed
- Discuss http://radiojove.net/archive.html
- no meeting next week - because of holiday for some and PyHC meeting
- proposed spec changes to support FAIR principles are ready for some review:
- `license` keyword will be a short phrase from SPDX.org
- `provenance` info – need something simple for now; will eventually link to file listings
- upcoming talks:
- HAPI at PyHC next week: I’d like comments on my slides (unbelievably, a draft is ready now for quick review)
- HAPI at the Open Science Data System Workshop (Nov 14-15); ideas welcome
- planning for the future – we need to prioritize our current efforts and have some more intensive meeting episodes
- work on caching is ramping up ahead of AGU
- web page work (by Bob's student): start with current content and revamp CSS; presentation list to be auto-linked to github repo listing presentations
Here’s a list of some things we have going right now that need to be organized / prioritized:
- central location to find known HAPI servers and datasets
- adding of JSON-LD to central landing page of known HAPI datasets
- centrally track any outages of known HAPI servers
- adding ability to link HAPI datasets together – especially for different cadences
- HAPI caching mechanism
- HAPI amalgamation / local data serving tool
- generic plotting client via adapting of KNMI tool (Eelco Doornbos)
- pursuing standard designation via IHDEA / PyHC
- mapping of HAPI metadata to SPASE
- standard templates for specific kinds of HAPI output
- file listings
- event lists
- data availability info
- pursuing many other data centers for addition of HAPI:
- U Calgary site with lots of ground-based data
- Madrigal – the 600 lb gorilla of the CEDAR community (NSF)
- ESAC is moving to a centralized Heliophysics data system for all its missions that will include HAPI
- integration within HelioCloud – using HAPI for cloud-to-cloud data transfers
Discuss https://docs.google.com/document/d/1j8lMkvwFuJI8pdFK5K42XQL0Ua21aO3UkH8JtZSGqdQ/edit?tab=t.0
Action items:
- Jon and Bob to meet Tue 4pm for finishing FAIR updates to spec
- HAPI Browse Tool needs official name
- Bob to add status info to the HAPI Browse Tool page (info for each server)
- DASH outcomes and upcoming PyHC meeting on standards
- Bob: focus on search could be done by making JSON-LD for every HAPI dataset; the focus here is creating a rudimentary search mechanism based on the 10,000 HAPI records that now exist; moving to JSON-LD too, so that search engines (Google, etc.) can find it (see the sketch below); SPASE allows faceted search, which is hard for general Heliophysics - specific communities often have their own search GUIs
- Bob's student has a new web site HAPI home page - info density is a little low for people's liking right now; Bob will work with him on this; Bobby is using the "astro" framework for new static page generation
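A hypothetical sketch of what a per-dataset JSON-LD record could look like, using schema.org's Dataset type (the dataset id, URLs, and field choices are illustrative, not an agreed-upon format):

```python
import json

# Hypothetical JSON-LD for one HAPI dataset; ids, URLs, and fields are
# illustrative, not an agreed-upon format.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "AC_H0_MFI",
    "description": "ACE magnetic field data, served via HAPI",
    "url": "https://cdaweb.gsfc.nasa.gov/hapi/info?dataset=AC_H0_MFI",
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://cdaweb.gsfc.nasa.gov/hapi/data?dataset=AC_H0_MFI",
    },
}
print(json.dumps(record, indent=2))
```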
Topics:
- addressing HAPI uptime - Bob has a HAPI server that captures uptime; possibly write client code to leverage this info so other client writers can use it easily (and report server down time as needed)
- Potential other HAPI servers: Joey at SwRI for TRACERS (would be a C/C++ based implementation)
- SOAR server at ESAC - this will become the main server for Heliophysics data at ESAC
- Sandy: HAPI would benefit from more outreach to scientists
Actions:
- Jon and Sandy meet to talk about outreach to scientists; Eric: replicate a paper - use a specific MMS one that SPEDAS has done before; see webinars for MMS examples for SPEDAS - how to load and plot data from a specific study; https://www.science.org/doi/10.1126/science.aaf2939 lots of examples are created for or signed off by instrument teams; use data from specific events on Oct 6, 2015; try these: FPI and FGM and plasma beta calculations (that use FPI and FGM)
- Jon and Bob - prep for next week - finalize the FAIR pull request; go over recent surge-work on linkages
Agenda: Highlights from IHDEA:
- potential push to add HAPI as an officially recommended standard by IHDEA, which would first require IHDEA to create a standards promotion process, and HAPI could be the first project to go through this
- new HAPI servers: Ralf Kiel has one in Germany for internal use; SciQLop (via the Speasy tool by Alexis J.; Speasy will become a HAPI client first, then get used in a server to expose data that is not otherwise available via HAPI)
- consider supporting Japanese effort by making a pass-through HAPI server over this data:
- At IHDEA, Emoto Masahiko presented on Data Repository of Aurora Imaging Spectroscopy (DRAIS) and has an Open Data Server with an API for access to their time series data: HySCAI data from Japanese spectral observations; see https://projects.nifs.ac.jp/aurora/en/repository.html (can't get this site to respond)
We need to start dealing with the fact that we keep on saying "Search and Discovery is for a different project" but such a project has not emerged. (Prompted by Speasy discussion).
Follow up with Ryan
Agenda:
- Review of the FAIR updates from last week (I updated that branch with my version of Rebecca’s comments)
- A review of some things from the HAPI focus week (which was 2 weeks ago):
- an idea for linkages using a separate endpoint
- there is a need to capture a schema (even an informal one) for file listings, event lists, and availability info
- A preview of DASH and IHDEA talks about HAPI
- Allow unitsSchema to apply to only certain parameters? Will know more if needed after the CDAWeb units discussion.
- Discuss https://iswat-cospar.org/clusters_teams
- 2025 Space Weather Workshop March 17-21, 2025 Boulder, CO - Save The Date!
- Discuss Julie's/Alex's email about tutorials
Notes from today:
- Some links from the Zoom chat:
- From Bernie: https://citeas.org/api
- From Jon: HAPI spec mods for FAIR: https://github.com/hapi-server/data-specification/blob/217-needed-elements-for-fair/hapi-dev/HAPI-data-access-spec-dev.md#86-fair
- From Rebecca: https://datacite-metadata-schema.readthedocs.io/en/4.5/properties/subject/#subject
- From Bobby: https://github.com/IHDE-Alliance/ISTP_metadata/tree/main/v1.0.0
- From Rebecca: example of qualified ref: https://datacite-metadata-schema.readthedocs.io/en/4.5/properties/relatedidentifier/#b-relationtype
- Jon's notes about Rebecca's answers to our FAIR questions (these have also been injected into section 8.6 FAIR on Bob's branch and PR for the relevant issue):
Notes about FAIR and HAPI:
on this branch:
https://github.com/hapi-server/data-specification/blob/217-needed-elements-for-fair/hapi-dev/HAPI-data-access-spec-dev.md
add 8.6 FAIR to the TOC
list of known hapi servers could be mentioned in 8.6
CDAWeb references several thousand datasets
more description in point 4 under Accessible
(HAPI is a service)
HPDE.io has SPASE metadata registries for Helio data, for example.
Interoperable
mention that the language is JSON
indicate where JSON can be found:
1. info
2. catalog?include=all
3. or with the data
links to the document!
and mention that we have a schema
Interoperable:
fix spelling of 'acessiable' (should be 'accessible')
Rebecca: what is definition of 'shared' and 'broadly applicable'?
ans: a lot of people use this!
2. (Meta)data use vocabularies that follow FAIR principles
Rebecca: If we wanted to use vocabularies, what would it look like?
To use FAIR vocab, you need to use something like:
https://datacite-metadata-schema.readthedocs.io/en/4.5/properties/subject/#subject
note: this is mostly for the values of the attributes
subjectScheme: name of vocab
schemeURI:
valueURI: MMS
classificationCode:
This is mostly for being able to link keywords about a dataset (not datasets names) so that datasets can be linked more easily.
Since it is focused on discovery, it's not as relevant for HAPI as a service focused on access.
FAIR vocabulary is more about search-related terms
really it's a scope decision - does HAPI really need to use this?
best example of how to use this is how the Subject
HAPI uses some controlled vocab, but not really in the formal way required by DataCite:
(Meta)data include qualified references to other (meta)data
Other metadata can be referenced using additionalMetadata. For units, an external schema can be referenced using unitsSchema. Rebecca: what does "qualified" mean?
Ans:
from DataCite:
qualified ref. means that you choose the relationship type:
this id is related to this other id by a defined relationship
https://datacite-metadata-schema.readthedocs.io/en/4.5/properties/relatedidentifier/#b-relationtype
Provenance block could mimic the "related dataset" relationship mechanism by using
lots of prov. is taken care of by
list of relationships allowed can constrain the metadata
just used "derivedFrom" (..other files. for example)
versus including cameFrom (..mission..) which would be harder to implement
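To make "qualified" concrete, a sketch of a DataCite-style related identifier (the relationType values come from the DataCite 4.5 documentation linked above; the DOI is made up):

```python
# A qualified reference = identifier + explicit relationship type
# (relationType values come from the DataCite 4.5 docs linked above;
# the DOI here is made up).
related_identifier = {
    "relatedIdentifier": "10.1234/example-dataset",
    "relatedIdentifierType": "DOI",
    "relationType": "IsDerivedFrom",  # the chosen relationship is what "qualifies" the reference
}
```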
for #4:
(Meta)data meet domain-relevant community standards
HAPI is a community standard.
Rebecca: yes, but what other standards are you using / leveraging?
Bob: can we map this to something like OpenAPI mechanisms
Bob: there is not another standard for timeseries in JSON,
Point to specific pieces: it's a REST API; it uses JSON with schemas; we use HTTP; we use ISO 8601; we follow what the community was already using for data server access (hence)
We do these things to support FAIR, and here is where we end (our scope). We are a data access service, and most of the FAIR aspects that we don't cover are covered by the formal pros.
working meetings this week:
- Mon 9am to 11am (Jon, Sandy, Jeremy, Bob)
- Tues 9am to noon (Jon, Jeremy, Bob)
- Wed 9am to noon (Jon, Bob, not much Jeremy)
- Thu 9am to 11am - Jon and Jeremy
- Fri 9am to noon? wait and see if needed
Topics for this week:
- settle presentations for DASH and IHDEA
- finish off the section on FAIR and how HAPI metadata is related
- CDAWeb server (Jeremy and Bob)
- SuperMAG server (Sandy and Jon): https://github.com/hapi-server/tasks/issues/9 and https://github.com/hapi-server/tasks/issues/20
- KNMI timeline viewer was already doing SuperMAG as a pass-through
- dev ops and releases (Jon): https://github.com/hapi-server/tasks/issues/22 also issue 3 and also 4
- urgent issues: point releases; patches; Zenodo IDs; larger issue is how to manage releases
- closure on servers.all (Bob and Jeremy): https://github.com/hapi-server/tasks/issues/11
- recommendations / clarification on how to deal with: cadences, file listings, availability in the spec for now (without solving the full, general linkages problem)
- for consideration (but risky as quagmire...):
- linkages between datasets related by cadence; how the linkages are expressed will affect how to communicate that there are other types of things available for a dataset, such as images, file listings, data availability, semantic types of science data, etc: https://github.com/hapi-server/tasks/issues/19
For HAPI meeting at noon:
- can SPDF or HDRL help manage one-off (pass-through) HAPI servers?
- HDRL might be willing to float specific, small requests for service sustainment (of HAPI servers)
- justifications would be needed: how much usage? what are costs, and why is it running? NASA dataset or personal? (NASA gets priority) Is it hosted some other way (if it is nowhere else right now, HDRL more interested)? How often used by community (metrics tbd)?
- heliophysics.net or helioanalytics.io could possibly host this
- 100,000 requests in a month with 50GB downloaded is $5/month, and this is a very reasonable level
- (it's better for the archives to pull in services so that they become responsible for them working, especially when breaking changes get made - the other services need to be part of the test process after changes!)
- Rebecca again visits at noon to talk about FAIR w.r.t HAPI
- Jon gives updates on completed tasks
- Bob to summarize his meeting with Rebecca and https://github.com/hapi-server/data-specification/pull/224
- Rebecca will join meeting at 12:30 pm to answer questions
- Sandy update on SuperMag and also using Heliocloud for datashop, etc.
Notes: Rebecca joined at 12:30pm; lots of discussion about details of FAIR as it applies to services versus data. There are at least two kinds of identifiers, for example: a persistent one, like a DOI, and HAPI should not use those as dataset identifiers. A persistent identifier should be provided in the optional `resourceID` field in the `info` response. Some FAIR principles apply to just this identifier (i.e., there should be one). Other principles related to data identifiers apply across both the persistent id and the local HAPI server id for the dataset.
For provenance, Rebecca suggested just going with something simple: Dataset Version and also HAPI version.
The approach we are taking for a HAPI server to be FAIR:
- make a few changes to the spec to allow for FAIR items not currently present; the main ones seem to be: add `dataLicense`, add `provenanceDetails`
- for HAPI to be FAIR, the underlying data must also already be FAIR, and HAPI can't really make up the difference if this is not true
- Bob is nearly done with an Appendix to describe how to express FAIR data using HAPI
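An illustrative fragment of an `info` response with the items discussed here (`dataLicense` and `provenanceDetails` are the proposed, not-yet-released names; `resourceID` is the existing optional field mentioned above):

```python
# Hypothetical info-response fragment; dataLicense and provenanceDetails
# follow the proposal above and may change before release.
info_fragment = {
    "HAPI": "3.2",
    "resourceID": "spase://NASA/NumericalData/Example",  # persistent id (DOI or SPASE id), not the HAPI dataset id
    "dataLicense": "CC0-1.0",                            # SPDX short identifier
    "provenanceDetails": "dataset version 2.1, served via HAPI 3.2",  # simple free text, per Rebecca's suggestion
}
```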
Action items:
- Bob works on appendix and we go over it next time with Rebecca starting at noon eastern
- Jon to email Jeremy and Bob about potential longer meetings next week for making further progress on specific issues
- Bob to meet with Rebecca about FAIR request
- merge in Bob's changes for notes and warnings messages.
- Discuss https://github.com/hapi-server/data-specification/pull/223
- From Julie: "Any updates you’d like to present for HAPI at the upcoming PyHC Fall Meeting (looking for it on Day 1, Nov 11th)? Purely updates, no overview." Jon will respond to her email.
- From Rebecca: We discussed a few weeks ago some proposed changes to the HAPI metadata schema. I recall those changes being received well. Do you have any idea what the timeline looks like for a new version of the HAPI metadata schema to be released with those changes? Bob emailed Rebecca about a meeting this week to clarify some points.
- Can we get SuperMAG up? (We were a month from having it running in 2020, and it would be good to have this finished finally!)
- When the new CDAWeb server is up, all SPASE records need to have the ProductKey for the HAPI server updated. Some will change.
- Discuss https://ivoa.net/documents/VOUnits/20231215/REC-VOUnits-1.1.html#tth_sEc2.4
- Link in https://hapi-server.github.io/docs/2021_COSPAR.pdf is broken, but the text is correct. The problem is with their publication.
- Bob will ask student about creating a GitHub runner to generate the list of presentations.
How are SPASE records synced with master CDFs? Is there an automated process?
Discuss prep things needed for 3-day meeting.
Mandate uploading at least 2 presentations/abstracts to https://github.com/hapi-server/presentations
- Zach to present on verification script that goes through SPASE and tests each mention of HAPI
- name of repo for HAPI tools (since `hapitools` is already taken): decided on `hapiutils` for PyPI. Won't change tools-python to utils-python; tools-python may not be a single package, and we may want "tools-matlab", so don't try to keep the repo name as similar as possible to the package name.
- confirm DASH / IHDEA submissions
- Discuss Simon's email. No updates. Bob sent clarification about interpretation of 1404 and 1405
Notes:
- Zach presented on his FAIR analysis script, which also checks HAPI Servers from NASA's SPASE records
- his code is available in this repo: https://github.com/Kurokio/HDRL-internship-2024
- next working meeting: this Fri 9am - 11am Eastern
- instead of `hapitools` (which is taken) we will use `hapiutils` for the package name (different than the repo name)
- Bob suggests having development of the HAPI amalgamation tool go through some code review / technique analysis to ensure it is re-using code from previously done solutions to common problems; have Calley reach out when starting a new problem? have a discussion first on the interface for any new capability (similar to what we did for caching)
- consider looking at SpacePy since those developers have solutions for similar issues
- consider a larger group discussion to generate ideas for functionality
- DASH/IHDEA talks:
- DASH: Jon - talk on forward looking view of HAPI; plans and needs to enhance interoperability
- DASH: Jeremy: poster / talk about two servers in development: SPDF and ESAC (same new Java server) and NOAA SWPC (their own, from scratch); also the new ones for mag data: SuperMAG and WDC
- DASH: Sandy and Calley: Poster for HAPI Amalgamator basics (need poster representative)
- DASH: poster by Nobes? or extra slide in Jeremy's talk?
- IHDEA: Jon HAPI status update (short)
- IHDEA: Bob covers more general topic about metadata
- Consider Week of Sep 23 as in-person HAPI dev meeting; need venue in remote B&B with good wifi...
- today at 1pm: status of WDC server with Adam and Oli; they've been working on this for a month on their own, integrating with their existing system
TODO:
- Jon: email Zach and Rebecca about next week
Agenda items:
- Schedule meeting with WDC developers at 1 pm Eastern, July 29 (extending the regular Monday meeting). They have a running HAPI server and want feedback.
- talks at AGU (Dec 9-13) and DASH/IHDEA (Oct 14-16; IHDEA Oct 17-18 both in Madrid, Spain - with some remote participation):
- focus at AGU - networking with people who could use HAPI (NOAA AI center folks, etc); set up several 1 hr meetings
- Jon: AGU: HAPI overview in one of these:
- I'd like to present more than the overview / update to also cover applications / analysis and instantiation of HAPI servers
- IN039 Big Data and Open Sci. in Helio and Planetary https://agu.confex.com/agu/agu24/prelim.cgi/Session/226235
- P042 Machine Learning and Data Science Methods for Planetary Science https://agu.confex.com/agu/agu24/prelim.cgi/Session/226555
- R2O2R Space Weather Session SH034 https://agu.confex.com/agu/agu24/prelim.cgi/Session/227464
- Jon: DASH and IHDEA options
- forward-looking talk: what could take HAPI to the next level? problems we've encountered; how to enable search; servers going up and down (true for any distributed system with an API approach); people building visualizations - need more of that!! and this helps you future-proof your app - your own viz tools may age, but if you are accessible via a standard, other people's viz apps might work; also looking at making things more FAIR; managing multiple server versions (handled automatically in clients that we distribute); the metadata problem - caching of metadata, and fixing / patching of metadata with a possibly centralized overlay of metadata fixes; a compelling story of things we can build on from here - what do we focus on next now that reading is becoming more standardized? Tools for data amalgamation are in a fledgling state - the next step is a more feature-rich set of HAPI-based data manipulations
- Getting HAPI up and running on your Data Center (no specific session)
- Current Reach and Potential of HAPI
- Automated Testing of HAPI servers
- Using HAPI Metadata to Populate other Metadata content (would require some work to have something to say)
- Bob: AGU
- Bob: DASH - latest capabilities: new data centers / sources and some of the new tools (relatively simple poster)
- Bob: IHDEA - creation of updated HAPI metadata; this has revealed opportunities for improvement in the metadata world; related to SPASE directions; what often gets missed and the implications; the process is similar to writing code - it needs evolution and lots of checks, esp. overcoming human factors; summary of HAPI experience and recommendations; have Rebecca on this too (for use with her search interface); search is only useful if it really has access to everything, and it also needs to not be broken!!, and needs to be more compelling than Google! Also mention the FAIR aspects of HAPI
- Nobes / Jeremy: IN-018 Data Deluge (UCAR and HDF) https://agu.confex.com/agu/agu24/prelim.cgi/Session/226709 Caching presentation for DASH; generic capability that can assist clients in any language; also separate presentation at DASH
- Calley with Sandy? - for DASH only (since deadline is not until Aug 10)?
- Jeremy: AGU - IN039 Accessing PDS3 and PDS4 in Autoplot, plus some about HAPI
- Jeremy: DASH
- for later:
- Sandy / Calley: AGU or DASH or IHDEA: HAPI Amalgamation
- FYI for AGU - Jon also doing HelioCloud poster in SM033 Dayside Magnetosphere Interactions
Action items:
- Need to revisit (came up with WDC folks): OpenAPI is catching on and allows auto-generation of clients in any language; but it only deals in JSON, and does pagination for long responses, so an OpenAPI client might be good for interacting with metadata, and we'd still need custom code to deal with our long JSON responses and especially for binary and CSV
Notes:
- web page is updated; looks much better; we could eventually hire a professional web developer? (see the HDF Group page for one example: https://www.hdfgroup.org)
Presentation by Rebecca on FAIR:
- specific citation for the data (not the instrument paper, but a data citation) should be a structured element that includes 4 key / required fields: Author, Publisher, Year, Dataset Name; optional: Description, Data Version number
- consider adding License for data usage; could be per dataset?
- suggested default is Creative Commons Zero v1.0
- https://spdx.org/licenses/
- For `license` content, she suggests using a link like this: https://spdx.org/licenses/CC0-1.0.html
- Three parts for HAPI team:
- update schema to reflect ways to capture all FAIR info
- update verifier to allow for check for FAIR
- work with existing servers to add updated info for licensing, extra data
Action Items:
- create ticket to update the citation element (probably use the same element structure for citation in both `about` and dataset `info`)
- add to DevOps page about new releases: the way to add a new PDF version to the Zenodo HAPI entry
Agenda:
- Presentation by Rebecca: FAIR principles
- Sandy: talk about HAPI web page at github.io via astro or jekyll
- To discuss: conferences
- 2024 Open Source Science Data Repositories Workshop (see email sent to Jon and Bob); sponsored by NASA’s Science Mission Directorate, Caltech in Pasadena, California, Sep 25-27 (Wed to Fri)
- DASH (Oct 14-18)
- AGU (Dec 9-13)
Notes:
- discussion with Zach and Rebecca; main presentation put off to Aug 5 telecon; Zach working in his own repo for now; we will consider linking, using, or moving to another repo under HAPI later once it's more mature; current location is: https://github.com/Kurokio/HDRL-Internship-2024
- Bob talked to WDC people about their data and API; they have coarse data going back to 1900; their new API is close but still very programmer-focused; they are open to adding other endpoints and so will likely implement it themselves; Bob gave them the start of a command-line program for them to flesh out
- HAPI needs provenance info - see ticket 186; longer discussion needed; including provenance in the HAPI info response makes that response time-dependent, which can mess up caching; also, not every dataset has files, so listing files might not be appropriate for every dataset
- Sandy - we need a repo location for one-off, small tools (Python only); proposal is to create a `hapitools` package where people could integrate elements as part of the package or maybe make a sub-package; Jeremy: call it `tools-python` to follow conventions: the Python client is in `client-python` and the import is `import hapiclient`, so the tools import would be `import hapitools`
- DASH - could we get a 60 to 90 minute tutorial deep dive into HAPI, possibly in another room? Sandy will email the committee about demo rooms
- DASH - main HAPI presentation ideas: the process that we go through to get people up and running (the social side of it); some of the challenges (provenance, tracking users - should these be in HAPI or SPASE); a good place to ask and discuss cache maintenance / behavior - get other people's perspectives - get feedback on what they have implemented
- IHDEA - community and standards and challenges (provenance, interoperability)
Action items:
- create list of presentations you want to make at DASH and AGU
- Jon to contact SWPC people about HAPI server status
- Sandy and Bob to meet on web site stuff
- Jeremy, Bob and Bobby to meet on CDAWeb HAPI server
- Nobes and Jeremy to meet on caching
- To discuss: 2024 Open Source Science Data Repositories Workshop (see email sent to Jon and Bob)
- WDC update;
- https://wdc.bgs.ac.uk/dataportal/webservice_doc.html
- they have data back to the 1800s; some differences from SuperMAG and INTERMAGNET (those seem to start in 1991, so after the 1989 Quebec storm)
- Bob will meet with WDC on Wed 9:30 - others can attend; they are revising their API (far along already)
- Rebecca on July 8 - also with intern;
- WDC group on July 15th? Will decide this Wednesday if they want to meet more and pursue HAPI
- where is the global list of ground magnetometer stations and the place(s) to get data from them?
- JAXA / ARASE in Japan has some ground stations (some with elec. field data)
TODO:
- Jon to email Rebecca to confirm July 8
- Bob to send invite to HAPI developers for WDC meeting
- Sandy to investigate ways to update HAPI web page; better HTML exists - need way to get this through the markdown-heavy approach at github.io using Jekyll (seems complex) or Astro (used by SPDF web developer)
- SuperMAG metadata notes
- Rebecca's email about metadata
- SPASE units
No meeting held.
- Jeremy is working on the new CDAWeb server which is to replace Nand's server
- Jon, Jeremy, Baptiste and Eelco to talk about "relations"
- multi-resolution
- other associations to files and other things
Bob - report on Simon's question related to https://github.com/hapi-server/data-specification/issues/105
Bob - report on INTERMAGNET updates
Bob - discuss whether servers should accept a blank list of parameters, as in `parameters=""` in the request URL (which is like requesting all parameters); see the sketch below
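To make the question concrete, a small sketch of the two request forms (server and dataset names are hypothetical):

```python
# Should the second URL behave like the first, i.e., return all parameters?
base = "https://example.org/hapi/data?dataset=DS1&start=1999-01-01Z&stop=1999-01-02Z"
url_all_implicit = base                   # no parameters option: all parameters returned
url_all_explicit = base + "&parameters="  # blank parameter list: proposed to mean the same
```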
- Discuss https://github.com/hapi-server/data-specification/pulls
- Remaining 3.2 tasks: verifier | schema
- Add https://spdf.gsfc.nasa.gov/pub/catalogs/spdf-plotwalk-catalog.json to template repo?
- Update on https://github.com/hapi-server/data-specification/issues/176
Notes:
TOPCAT presentation by Mark T.
Questions: Jeremy: all versions? Mark - yes!
Action items:
- Keep Mark in the loop on new versions so he can keep TOPCAT up to date!
We are now going to track our TODO items as tickets in a separate repo; here are the ones due for next week:
Assignments
- We all should look through this page and find things that need to be followed up on or that were not finished.
- Discuss Rebecca's email about ESIP Schema.org Cluster
- Discuss how to track to-do assignments
Discussion:
- talked about a SPASE generation tool redo, which could also spit out HAPI metadata; meeting Apr 2 at noon about this
- put off talking about ESIP time series folks
- to track to-do items: make new repo for "api-tasks" and everyone look over the last few meeting notes in the wiki and move your items into api-tasks as an issue (assigned to you and with an expected due date, etc)
- looking over issues for 3.3 and beyond; some are easy; deciding about categorizing all the others
- nominal target date for 3.3 release is July 1
- for a next step, look at all issues related to making an "association" or linkage between HAPI datasets;
- TOPCAT person Mark can present - maybe at next week's HAPI dev meeting
- SPASE asks if HAPI has a service description; answer is no, but they can make one if they want to
- Bob reached out to Das2 HAPI Server person Chris P. who will get it up and running again at UIowa
- two low hanging fruit servers: DataShop and SuperMAG
- discussion of next priorities
- briefly discussed possibility of a bundle with defined characteristics that allow it to be served from HAPI; needs more development with concepts like the URI template and a HAPI-JSON from CDF mechanism; could be an HTM proposal
- Bob - magneto-telluric data (as presented at PyHC last week) is something he uses a lot; EarthScope is a project via several universities and a larger data effort; Bob would prefer others to pursue this to keep scientific distance; we make a sample data server to get them interested and then there are options: 1) they like it and pick it up themselves; 2) they like it and propose it (as leads) on a proposal with us to help develop it; 3) or we propose it with them as helpers; FYI there is already a Java API for getting to the data
- go through list of next features for HAPI 3.3 or 4.x (Bob led this)
Action items
- Jon to release PDF of 3.2 to Zenodo
- Jon to add Zenodo push to release process
- Bob to check on verifier status - does it handle 3.2 yet? (Yes.) Jeremy will try his 3.2 server with the verifier
- Jon to work with Sandy to convince Jesper to pull the trigger on HAPI support
- Jon to see if Nobes can update DataShop server
- Jon can ask about HelioCloud offering HAPI-ready environment for people with data; or an aspect of HelioCloud could be to provide a HAPI service for data that meets certain ingest criteria
- Jeremy to poke USGS about stream height data (lots of these)
- Jon and Bob to find a way to create a separate TODO list (separate from Issues related to the spec)
- Jon to delete issue 168 after moving it to new TODO list
- another issue (157) needs to go on project TODO list
- Bob to send email to have people vote on issues for 3.3 - we should pick the top 3 or 4 to focus on (decided this did not make sense; will discuss)
- decide if we want to do larger releases or small releases
Jon's list of things that HAPI can serve that could be described semantically:
- file list
- event list
- availability data
- different cadence of other dataset
- geographically co-located list of something (either fixed or moving)
Maybe just have elements within a dataset that identify themselves as being specific types of things. Ryan M and Rebecca R and Baptiste C have already looked at semantic linkages
- release of 3.2 ready to go - we can merge the PR if no objections
- discussion with SWPC developer(s) on HAPI server options
- location in GitHub for HAPI presentations (SunPy has this)
- FYI - caching meetings on pause for this week (and next?)
- list out new features of interest for next release:
- multi-resolution links to different datasets
- federation of HAPI servers or better way to track / report known HAPI servers
- availability info for a dataset
- lists of files (as a provenance mechanism)
- find a way to include provenance in the `info` response; address how HAPI communicates provenance for underlying data that is changing; make sure references follow conventions (some now recommend not using URLs but paper titles to go into a search engine)
- possibly a more generic way of semantic communication of HAPI content
Actions:
- check with Bob about Verifier status for version 3.2
- Jon to add repo for talks and presentations
- Sandy and Jeremy to look at Python server (load capacity / multi-threading); also advertise this more!
- eventually have Sandy present about HAPI amalgamation
- also consider Brent or Darren present about using HAPI for model output (spacecraft fly-throughs)
- release of 3.2 is almost ready; there's a branch for the new 3.2 directory
- one clarification needs to go in 3.2: Bob created a ticket and will update the text as usual; Jon will get this into the 3.2 release files
- next week's meeting will start at 1pm to avoid colliding with the PyHC spring meeting (online only; 9am-11am eastern M-Th)
- next week we will vote on the release of HAPI 3.2
- Nobes and Jeremy to meet this week on caching items
- CDAWeb server has trouble with a Voyager dataset; Bernie looked and it has an empty directory; this Voyager dataset seems to have problems with the current HAPI server (so it's not a great dataset for testing); best idea is to move to the new Java server that ESAC and NOAA/SWPC are using
- TOPCAT is now reading HAPI data; Sandy replied to Tess about this and asked to talk about HAPI as a potential VO standard with her sometime
- Bob will contact the TOPCAT developer to see how HAPI interacts with these services; we talked about TAP servers; actually it is EPN-TAP that is mostly what Heliophysics / planetary sites use in Europe
- Jon to send around list of upcoming data and software-related meetings to hapi-dev group
- Review pull requests.
- Jon to integrate changelog info into the spec document
- Bob maybe finish TestData3.2 with FITS.
- Discuss Workshop for Collaborative and Open-Source Science Data Systems April 29 – May 1, 2024 at the Laboratory for Atmospheric and Space Physics in Boulder, CO. (Greg Lucas and others)
- Follow-up Meeting is at U of Iowa Aug 12-14 is Open Source SDC and Aug 15-16 is Rebecca's meeting (TWSC funded, if it wins)
- Also in Boulder on May. 29 to May. 31, 2024: Innovations in Open Science (IOS) Planning Workshop: Community Expectations for a Geoscience Data Commons; https://www2.cisl.ucar.edu/events/innovations-open-science-ios-planning-workshop-community-expectations-geoscience-data
- Discussion about the all.json contents and related federated info that is now being collected in the "servers" Github project; eventually we may want another layer of services on top of this that presents it more as a service and less of a raw set of Github files, but for now we are collecting HAPI server data and can add services later
- Talk about different portals: Heliophysics Data Portal (SPASE driven) at https://heliophysicsdata.gsfc.nasa.gov/websearch/dispatcher which is what Aaron started and runs at SPDF; a newer portal is by Rebecca R. and aims to include more solar datasets, which are not necessarily in SPASE; earthdata.nasa.gov is for non-experts to find and learn about what is available (theme-based: atm., ocean, etc); it leads to hand-selected data products suitable for non-experts and entry into the field; NASA HQ wants other divisions to have a similar thing; the key task is metadata creation - it will need significant effort to get solar data into SPASE; people are working on this, but it's taking a while and it's being done by hand; the difficulty with this is that hand-edited metadata becomes obsolete and so not useful. Daniel (at GSFC) works on this under HDRL - progress is slow-ish; nothing public yet; the Earth data one looks to have had a lot of financial support over multiple years
- Focus for us for HAPI data - how best to make it searchable in a federated mechanism that knows about all the HAPI resources relevant for a project; might be possible to make SPASE records from HAPI sources (keep it automated so we can adjust to changes in SPASE)
- action item - (Bob) talk to Brian about getting more SPASE from HAPI or other ideas about SPASE evolution
Agenda for Thur 3-5pm (Eastern) meeting on HAPI 3.2 release progress:
- any comments on the `stringType` wording updates
- flesh out changelog entries for 3.0, 3.1, 3.2
- check_array unit tests - are they right and complete?
- more test cases for 1) schema validation 2) test data servers (esp. for 3.2 and 3.3 test data)
- discuss what goes in `all.json` and how to arrange it
- clarification of tasks at upcoming PyHC events: spring meeting in March online only - Jon to provide HAPI updates; summer school May 20-24 in person - Bob to be there in person for tutorial and roulette/tasking game;
- Mark from NOAA / Colorado U - using the server-java implementation with Eelco's timeline viewer; he's got data going through by creating products by hand, and using server-java as a stand-alone mechanism and providing it data; he does have to make the JSON info for the catalog and info responses
- check up on various other projects - specific meetings scheduled this week (Wed for caching and Fri for 3.2 release push)
Agenda:
- PyHC summer school (May 20-24 in Boulder, CO) needs some HAPI support: "Combining PyHC packages examples"; HAPI adapters seem appropriate, either Kamodo (except there are two versions now; someone made a fork, and that's what you get with `pip install kamodo`!) or maybe better SpacePy or SunPy (a bit of a stretch); need someone to help out with this (Jon to find someone - maybe Jon N.)
- COSPAR (in July) presentation by Sandy "HAPI in Analysis Codes" with themes of data amalgamation, TOPS, PyHC, ML, serverless ops; abstract due end of this week
- UCAR / NCAR workshop on data and standards May 29-31
- SWPC is going to make a HAPI Server and use Eelco's tool
- action item run down
Discussion and items that still need action:
- Bob still needs to know what needs to be added to 3.2, so Jeremy to get him a short list of things to be added to 3.1 and also 3.2
- updates to check_array.js: the major task is to move semantic checking (that can't be done with JSON schema) into a package separate from the verifier; this would give people the ability to fully check the JSON from a server without running the full verifier; the verifier can then use this separate capability internally; Jon to look at the code to see if it has all the semantic checks that are needed (see the sketch after this list)
- Jon - Add section to hapi-dev for 3.2 change list (with links to associated issues!)
- Jon - Check that all 3.1 changes actually ended up in the 3.1 change list
- From last week: Jon: remove comment about constraining strings to enums
- From last week: Jon: create pull request for adding error 1412 "unsupported depth value" (comments about this also in the uber ticket above)
- From last week: Jeremy: come up with schema for all.txt
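As an illustration of the kind of "semantic" check meant above (a constraint a JSON schema alone cannot express), a hypothetical rule relating a parameter's units array to its size:

```python
# A constraint that JSON Schema alone can't express: when units is an
# array, its length should match the parameter's size. Rule shown only
# to illustrate the proposed verifier/package split.
def check_units_size(parameter):
    units = parameter.get("units")
    size = parameter.get("size", [1])
    if isinstance(units, list) and len(units) != size[0]:
        return f"units array has {len(units)} entries but size[0] is {size[0]}"
    return None  # no error

print(check_units_size({"name": "B", "size": [3], "units": ["nT", "nT"]}))
```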
News:
- ESAC server for Solar Orbiter coming along
- SPDF server (having memory issues?) down on weekend due to memory leaks
- for next week: update on the Java server being worked on for ESA that will eventually be used at GSFC
Action items for getting 3.2 ready:
- Jeremy: look over the 3.1 schema and let Bob know of anything that needs to be added still (do this manually, and put the result in the ticket); Jeremy has schema-sorting code that may help; Bob's formatting standardizer has made comparisons harder - we need to move to this longer term, but for now, we have just 20 lines of JSON to add into 3.1 (and then 3.2); see this ticket: https://github.com/hapi-server/data-specification-schema/issues/1
- Jeremy: also do this for 3.2 (the current 3.2 is a copy, possibly lightly edited, of 3.1)
- Jon and Jeremy: review the tests for units and labels (see the same ticket for a description); look at verifier-nodejs / lib / checkArray_test.js and make sure the test cases (units strings and array sizes) are consistent
- Jon: more reviewing and checking of schema for general errors
- Jon (and see if Nobes wants to help): make lots of examples in data-specification-schema / test / 3.0 (and other versions) - create examples of info responses that flesh out the core aspects and also some corner cases of the spec; remove scripts that are mixed in with the test cases
- Jeremy(?): some clients (the TOPCAT person, for example) use our test datasets, so we need to make sure to capture the new features of a given version of the spec so people can have confidence that their client can handle everything, even the corner cases
- Bob and Jeremy: Jeremy to ask Chris P to send / show Bob the Python code for plotting time-varying bins, and Bob can see about adding that
- Jon: remove comment about constraining strings to enums
- Bob: move issue about error 1412 from the schema repo to the specification repo. Done: https://github.com/hapi-server/data-specification/issues/187
- Jon: create pull request for adding error 1412 "unsupported depth value" (comments about this also in the uber ticket above)
- Someday: now that the catalog can also list all the info metadata for each dataset, we need a way to manage the schema objects jointly; to the outside world, there needs to be one schema for each JSON response (one per endpoint), and this will look like there is a lot of copy/paste going on, since the entire info response can be inside parts of the catalog response. So for the inside (the JSON schemas that we maintain), there needs to be no copy/paste, and as little code as possible, but somehow (are there JSON #include options? see the `$ref` sketch after this list) incorporate certain schemas within others; one option is to have our own way of storing things, and then a set of scripts to create all the endpoint-specific schemas out of our own lean content (but then we have to maintain those scripts)
- Bob and Jeremy: make a JSON schema for the all.txt (shouldn't take long - maybe 30 min) and deprecate the all.txt and encourage people to use the JSON version; there was discussion about having a service to harvest "about/" info and present a cached version of it to people, since if a server goes down, you won't even be able to get to its about page; if a server does not have an about/ response (or it's weak), then we can offer to write an about/ kind of info for any server below 3.2 (and then our cache may or may not have harvested recent info from other servers' about/ endpoints)
- Nobes: come up with a list of parameters for data caching - options for interrogating the cache and changing settings; Jeremy suggests a no-op implementation that has the right API to the cache to flesh that out (a sketch of possible modes follows below); Sandy's three use cases are a good starting point: don't use the cache (always get live data); only use the cache (don't refresh anything; should a cache miss then be a failure?); use the cache but check every time so that I'm always using the latest data; Nobes will send a meeting invite to Jeremy and Bob for brainstorming ideas (mostly the parameters / arguments) on Tue or Wed
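A no-op sketch along the lines Jeremy suggested, with Sandy's three use cases as modes (all names are hypothetical, meant only to seed the interface discussion):

```python
from enum import Enum

class CacheMode(Enum):
    OFF = "off"           # use case 1: always fetch live data
    ONLY = "only"         # use case 2: cache only; is a miss then a failure?
    REVALIDATE = "check"  # use case 3: use cache, but check freshness every time

def get_data(dataset, start, stop, mode=CacheMode.REVALIDATE):
    """No-op placeholder with the right shape, to flesh out the options."""
    raise NotImplementedError("interface sketch only")
```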
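On the JSON "#include" question in the schema-management item above: JSON Schema's `$ref` keyword is the standard answer, letting one shared definition of the info response be referenced instead of copy/pasted. A minimal sketch (the schema content is illustrative, not our real schema):

```python
from jsonschema import validate  # pip install jsonschema

schema = {
    "definitions": {
        # single source of truth for the info response
        "info": {"type": "object", "required": ["HAPI", "parameters"]}
    },
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "info": {"$ref": "#/definitions/info"},  # reuse instead of copy/paste
    },
}

# A catalog entry whose embedded info is checked against the shared definition:
validate({"id": "DS1", "info": {"HAPI": "3.2", "parameters": []}}, schema)
```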
Bob and Jon met with Nga Chung at JPL. She leads the https://sdap.apache.org/ effort and will also be working on GRACE Follow-On. They've looked into OGC, which has a draft standard; it is complex and not much software is written for it. She thinks HAPI is a good candidate and will report back to us in 3 months.