1 Significant Changes to Specification
1.1 v2 to v3 API Changes
1.2 v2 to v3 Schema Changes
1.3 3.X Changes
2 Introduction
2.1 Overview
2.2 Facilitating Adoption
2.3 Limitations
3 Endpoints
3.1 Overview
3.2 hapi
3.3 about
3.4 capabilities
3.5 catalog
3.6 info
3.6.1 Request Parameters
3.6.2 info Response Object
3.6.3 unitsSchema Details
3.6.4 coordinateSystemSchema Details
3.6.5 additionalMetadata Object
3.6.6 stringType Object
3.6.7 parameter Object
3.6.8 size Details
3.6.9 fill Details
3.6.10 units and label Array
3.6.11 vectorComponents Object
3.6.12 location and geoLocation Details
3.6.13 bins Object
3.6.14 JSON References
3.7 data
3.7.1 Request Parameters
3.7.2 Response
3.7.3 Examples
3.7.3.1 Data with Header
3.7.3.2 Data Only
3.7.4 Response formats
3.7.4.1 CSV
3.7.4.2 Binary
3.7.4.3 JSON
3.7.5 Errors While Streaming
3.7.6 Representation of Time
3.7.6.1 Incoming time values
3.7.6.2 Outgoing time values
3.7.6.3 Time Range With No Data
3.7.6.4 Time Range With All Fill Values
4 Status Codes
4.1 status Object
4.2 status Error Codes
4.3 Client Error Handling
5 Implementation Details
5.1 Cross-Origin Resource Sharing
5.2 Security Notes
5.3 HEAD Requests and Efficiency
6 References
7 Contact
8 Appendix
8.1 Sample Landing Page
8.2 Allowed Characters in id, dataset, and parameters
8.3 JSON Object of Status Codes
8.4 Examples
8.5 Robot clients
8.6 FAIR
8.6.1 Findable
8.6.2 Accessible
8.6.3 Interoperable
8.6.4 Reusable
Version 3.3.0-dev | Heliophysics Data and Model Consortium (HDMC)
This is the development version of the HAPI Data Access Specification.
The most recent stable release is Version 3.3.0.
Note that version 3.0 introduced some breaking changes. They are listed here and are documented as issues in the 3.0 Milestone list.
Non-backward compatible changes introduced to the request interface in HAPI 3.0:
- The URL parameter `id` was replaced with `dataset`.
- `time.min` and `time.max` were replaced with `start` and `stop`, respectively.
- Addition of a new endpoint, `about`, for server description metadata.
These changes were discussed in issue #77. HAPI 3 servers must accept both the old and these new parameter names, but the HAPI 2 specification requires an error response if the new URL parameter names are used. In a future version, the deprecated older names will no longer be valid.
This list captures new optional elements that can be included in the `info` response as of 3.0:

- Ability to specify time-varying bins (#83)
- Ability to use JSON references in the `info` response (#82)
- Ability to indicate a units schema (if one is being used for `units` strings) (#81)
3.0, 3.1, 3.2, and 3.3 changes are documented in the changelog. We follow Semantic Versioning conventions, so all 3.X versions are backward compatible with version 3.0.
This document describes the Heliophysics Application Programmer's Interface (HAPI) specification, which is an API, metadata, and data streaming format specification for time-series data. The intent of this specification is to enhance interoperability among time series data providers. The objective is to capture features available from many existing data providers and to codify implementation details so that providers can use a common API. This makes it possible to obtain time series science data seamlessly from many sources using a variety of programming languages.
This document is intended to be used by two groups of people: first by data providers who want to make time-series data available through a HAPI server, and second by data users who want to understand how data is made available from a HAPI server, or perhaps to write client software to obtain data from an existing HAPI server.
HAPI constitutes a minimum but complete set of capabilities needed for a server to allow access to the time series data values within one or more data collections. Because of its focus on data access, the HAPI metadata standard is not intended for complex search and discovery. However, the metadata schema allows ways to indicate where further descriptive details for any dataset could be found and the metadata contains enough information to enable its use by complex search and discovery software.
The HAPI API is based on REpresentational State Transfer (RESTful) principles, which emphasize that URLs are stable endpoints through which clients can request data. Because it is based on well-established HTTP request and response rules, a wide range of HTTP clients can be used to interact with HAPI servers.
Key definitions for terms used in this document include:

- parameter – a measured science quantity or a related ancillary quantity at one instant in time; may be a scalar or a multi-dimensional array; must have units and must have a fill value that indicates no measurement was available or absent information
- record – all the parameters and an associated time value
- dataset – a collection of records with the same parameters; a HAPI service presents a dataset as a seamless collection of time-ordered records such that it or a subset of it can be retrieved without knowledge of the actual storage details
- catalog – a collection of datasets
- request parameter – keywords that appear after the `?` in a URL

For example, in the request (see change notes)

http://server/hapi/data?dataset=alpha&start=2016-07-13&stop=2016-07-14

the three request parameters are `dataset` (corresponding to the identifier of the dataset), `start`, and `stop`. They have values of `alpha`, `2016-07-13`, and `2016-07-14`, respectively. This document will always use the full phrase "request parameter" to refer to these URL elements to distinguish them from the parameters in a dataset.

In the above URL, the segment represented as `server` captures the hostname for the HAPI server as well as any prefix path elements before the required `hapi` element. For example, in `http://example.com/public/data/hapi`, the `server` element is `example.com/public/data`.
The text in this section was taken from Weigel et al., 2021.
The HAPI specification is designed to standardize the streaming services that many time-series data providers already offer. Because it draws on the common aspects of existing Heliophysics data services, data providers can, with minimal effort, modify existing streaming servers to make them HAPI compliant.
In addition, several tools have been developed to facilitate adoption.
- Although the HAPI streaming formats are simple enough that only 10-20 lines of code are needed to read a basic HAPI response, there are error conditions and other optimizations to consider (such as caching and parallelization of requests); a robust and comprehensive client implementation is beyond what should be expected of science users or even software developers who want to create tools that use HAPI data. Mature HAPI client libraries are available from open-source HAPI projects for Java, IDL, MATLAB, and Python, so most users can begin with these libraries. These clients read a HAPI response into a data structure appropriate for the given language (e.g., in Python, the response is read into a NumPy N-D array with timestamps converted to Python `datetime` objects). Work is underway to incorporate HAPI readers into existing popular analysis frameworks, such as SPEDAS in IDL and Python, Autoplot in Java, SunPy in Python, and SpacePy in Python. (A minimal reader is sketched after this list.)
- To assist server developers with adoption, a cross-platform reference server has been developed. With this server, a data provider only needs to provide HAPI JSON metadata and a command-line program that emits HAPI CSV given inputs of a dataset identifier and start/stop times. The server handles HAPI error responses, request validation, and logging.
- A validator/verifier has been developed that runs a comprehensive suite of tests for compliance with the specification. By either downloading and running the test software locally, or by entering the URL of a HAPI server into a website, data providers can test most aspects of their HAPI implementation, with detailed results indicating warnings or errors. This has proven to be a very effective way to assist developers, and it helps ensure that new HAPI servers are fully functional. The tests include checks of JSON responses against the HAPI metadata schema, checks that the server handles the different allowed start/stop time representations (e.g., year and day-of-year, or year, month, and day, with varying amounts of additional time precision), checks of error responses and messages, checks of timeouts, and checks that output is independent of the output format. The validator/verifier also makes recommendations. For example, if a server does not include cross-origin resource sharing headers or does not respond with compressed data when gzip is specified in the `Accept-Encoding` request header, it emits a message noting the advantages of these features and a link to information on how to enable them.
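As a rough illustration of the "10-20 lines" claim in the first item above, here is a minimal sketch in Python (not one of the official client libraries). The server URL and dataset name are placeholders taken from the request example earlier in this document, and the sketch assumes all non-time columns are numeric; a real client would also handle fill values, string parameters, and error responses.

```python
import csv
import io
import urllib.request

# Placeholder server, dataset, and time range (from the earlier example URL).
url = ("http://server/hapi/data"
       "?dataset=alpha&start=2016-07-13&stop=2016-07-14")

with urllib.request.urlopen(url) as response:
    text = response.read().decode("utf-8")

records = []
for row in csv.reader(io.StringIO(text)):
    if row:  # skip any blank trailing line
        # First column is the ISO 8601 timestamp; the remaining columns
        # are assumed here to be numeric parameter values.
        records.append((row[0], [float(v) for v in row[1:]]))

print(len(records), "records")
```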
Because HAPI requires a single time column to be the first column, each record (one row of data) must be associated with one time value (the first value in the row). This has implications for serving files with multiple time arrays. Suppose a file contains 1-second data, 3-second data, and 5-second data, all from the same measurement but averaged differently. A HAPI server could expose this data, but not as a single dataset. Instead, each time resolution would be presented as a separate dataset, each with its own time array.
Note that there are only a few supported data types: `isotime`, `string`, `integer`, and `double`. This is intended to keep the client code simple in terms of dealing with the data stream. However, the spec may be expanded to include other types, such as 4-byte floating-point values (which would be called `float`) or 2-byte integers (which would be called `short`).
The HAPI specification has five required endpoints that give clients a precise way to first determine the data holdings of the server and then to request data. The functionality of the required endpoints is as follows:
- `/hapi/capabilities` lists the output formats the server can stream (`csv`, `binary`, or `json`, described below).
- `/hapi/about` lists the server id and title, contact information, and a brief description of the datasets served (this endpoint is new in the version 3 HAPI specification).
- `/hapi/catalog` lists the catalog of available datasets; each dataset is associated with a unique id and may optionally have a title.
- `/hapi/info` lists information about a dataset with a given id; a primary component of the description is the list of parameters in the dataset.
- `/hapi/data` streams data for a dataset of a given id and over a given time range; a subset of parameters in a dataset may be requested (default is all parameters).
There is also an optional landing page endpoint `/hapi` that returns human-readable HTML. Although there is recommended content for this landing page, it is not essential to the functioning of the server.
Servers may have non-standard endpoints (an endpoint not given above) under `/hapi/` if they are prefixed by `x_`, e.g., `/hapi/x_custom_endpoint`. This will communicate to the user that this is not a standard endpoint and will prevent a name conflict if the HAPI standard adds `custom_endpoint` to the specification.
The five required endpoints are RESTful in the sense that the HTTP response is the complete response for each endpoint. In particular, the `/data` endpoint does not return URLs of files or links from which the data can be downloaded; instead, it streams the data in the HTTP response body. The full specification for each endpoint is described below.
All endpoints must have a `/hapi` path element in the URL, and only the `/info` and `/data` endpoints take query parameters:
http://server/hapi (Optional HTML landing page)
http://server/hapi/capabilities
http://server/hapi/about
http://server/hapi/catalog
http://server/hapi/info?dataset=...[&...]
http://server/hapi/data?dataset=...&...
We recommend that any requests with a trailing slash be handled by sending a `301` redirect, e.g.,
http://server/hapi/ => 301 redirect to http://server/hapi
http://server/hapi/info/?dataset=... => 301 redirect to http://server/hapi/info?dataset=...
Requests to a HAPI server must not change the server state. Therefore, all HAPI endpoints must respond only to HTTP HEAD and GET requests.
The request parameters and their allowed values must be strictly enforced by the server. HAPI servers must not add request parameters beyond those in the specification. If a request URL contains any unrecognized or misspelled request parameters, a HAPI server must respond with an error status (see HAPI Status Codes for more details). The principle being followed is that the server must not silently ignore unrecognized request parameters, because this would falsely indicate to clients that the request parameter was understood and was taken into account when creating the output. That is, if a server is given a request parameter that is not part of the HAPI specification, such as `averagingInterval=5s`, the server must report an error for two reasons: (1) additional request parameters are not allowed, and (2) the server will not do any averaging.
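As a server-side sketch of this rule (a hypothetical helper, not taken from any particular HAPI server implementation), the check below rejects unrecognized request parameters with a HAPI 1401 status ("Bad request - unknown API parameter name", defined in the Status Codes section) rather than ignoring them. The allowed set includes the deprecated v2 names, which HAPI 3 servers must still accept:

```python
# Allowed request parameters for the data endpoint, including the deprecated
# HAPI 2 names (id, time.min, time.max) that HAPI 3 servers must still accept.
ALLOWED_DATA_PARAMS = {"dataset", "id", "start", "stop", "time.min",
                       "time.max", "parameters", "include", "format"}

def check_request_params(query_params):
    """Return a HAPI status object; 1401 flags unknown request parameters."""
    unknown = sorted(set(query_params) - ALLOWED_DATA_PARAMS)
    if unknown:
        # Served with HTTP status 400 (Bad Request).
        return {"code": 1401,
                "message": "Bad request - unknown API parameter name: "
                           + ",".join(unknown)}
    return {"code": 1200, "message": "OK"}

# check_request_params({"dataset": "alpha", "averagingInterval": "5s"})
# -> {"code": 1401, "message": "... averagingInterval"}
```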
The outputs from a HAPI server to the `about`, `catalog`, `capabilities`, and `info` endpoints are JSON objects, the formats of which are described below in the sections detailing each endpoint. The `data` endpoint must be able to deliver Comma Separated Value (CSV) data following the RFC 4180 standard [1], but may optionally deliver data content in binary format or JSON format. The response stream formats are described in the Data Stream Content section.
The following is the detailed specification for the five main HAPI endpoints as well as the optional landing page endpoint.
This root endpoint is optional and should provide a human-readable landing page for the server. Unlike the other endpoints, there is no strict definition for the output, but if present, it should include a brief description of the data and other endpoints and links to documentation on how to use the server. An example landing page that can be easily customized for a new server is in the Appendix.
There are many options for landing page content, such as an HTML view of the catalog or links to commonly requested data.
Sample Invocation
http://server/hapi
Request Parameters
None
Response
The response is in HTML format with a mime type of `text/html`. There is no specification for the content, but it should provide an overview that is useful for science users.
Example
Retrieve the landing page for this server.
http://server/hapi
Example Response
See the Appendix.
Sample Invocation
http://server/hapi/about
Request Parameters
None
Response
The server's response to this endpoint must be in JSON format [3] as defined by RFC 7159, and the response must indicate a mime type of `application/json`. Server attributes are described using keyword-value pairs, with the required and optional keywords described in the following table.
About Object
| Name | Type | Description |
|---|---|---|
| `HAPI` | string | Required. The version number of the HAPI specification this description complies with. |
| `status` | Status object | Required. Server response status for this request. |
| `id` | string | Required. A unique ID for the server. Ideally, this ID has the organization name in it, e.g., NASA/SPDF/SSCWeb, NASA/SPDF/CDAWeb, INTERMAGNET, UIowa/RPWG, LASP/TSI, etc. A list of known ids is available in the servers repository. |
| `title` | string | Required. A short human-readable name for the server that defines an acronym in the id or provides additional context about the data served (e.g., LASP Total Solar Irradiance (TSI) data). The suggested maximum length is 40 characters. We recommend against including "HAPI" and/or the HAPI version of the server in the title. |
| `contact` | string | Required. Contact information or email address for server issues. HAPI clients should show this contact information when it is certain that an error is due to a problem with the server (as opposed to the client). Ideally, a HAPI client will recommend that the user check their connection and try again at least once before contacting the server contact. |
| `description` | string | Optional. A brief description of the type of data the server provides. |
| `contactID` | string | Optional. The identifier in the discovery system for information about the contact. For example, a SPASE ID of a person identified in the contact string. |
| `resourceID` | string | Optional. An identifier associated with all datasets. |
| `citation` | string | Optional. Deprecated; use `serverCitation`. |
| `serverCitation` | string | Optional. How to cite the HAPI server. An actionable DOI is preferred (e.g., https://doi.org/...). This `serverCitation` differs from the `datasetCitation` in an `/info` response; here the `serverCitation` is for the entity that maintains the data server. |
| `note` | string or array of strings | Optional. General notes about the server that are not appropriate to include in `description`. For example, a change log that lists added datasets or parameters in datasets. If an array of strings is used to describe datestamped notes, we recommend prefixing each note with a HAPI restricted ISO 8601 date/time, e.g., `["2024-01-01T00: Note on this date/time", "2024-04-02T00: Note on this date/time"]`. |
| `warning` | string or array of strings | Optional. Temporary warnings about the data server, such as scheduled downtime and known general problems. Dataset-specific warnings should be placed in the `warning` element of the `/info` response. |
| `dataTest` | `DataTest` | Optional. Information that a client can use to check that a server is operational. The data response should contain more than zero records. See below for the definition of this object. |
`DataTest` Object

| Name | Type | Description |
|---|---|---|
| `query` | `Query` | Required. See the `Query` object definition below. |
| `name` | string | Optional. Name of the test. |
A server test query then has the following definition:
`Query` Object

| Name | Type | Description |
|---|---|---|
| `dataset` | string | Required. String identifier for the dataset to be used in the test. |
| `start` | string | Required. String value of the test data start time in restricted ISO 8601 format. |
| `stop` | string | Required. String value of the test data stop time in restricted ISO 8601 format. |
| `parameters` | string | Optional. A comma-separated list of parameters to include in the response (allowed characters). If not provided, all parameters will be included. |
Example
http://server/hapi/about
Example Response:
{
"HAPI": "3.3",
"status": {"code": 1200, "message": "OK"},
"id": "TestData3.3",
"title": "HAPI 3.3 Test Data and Metadata",
"contact": "examplel@example.org",
"dataTest": {
"name": "Ping Test",
"query": {
"dataset": "dataset1",
"start": "2023-01-01T12:00:00Z",
"stop": "2023-01-01T14:00:01Z",
"parameters": "parameter1,parameter2"
}
}
}
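A monitoring client might use `dataTest` along these lines (a sketch; the server URL is a placeholder and error handling is omitted):

```python
import json
import urllib.parse
import urllib.request

server = "http://server/hapi"  # placeholder

with urllib.request.urlopen(server + "/about") as r:
    about = json.load(r)

test = about.get("dataTest")
if test:
    # Build a /data request from the test query (dataset, start, stop, and
    # optionally parameters) and count the csv records that come back.
    url = server + "/data?" + urllib.parse.urlencode(test["query"])
    with urllib.request.urlopen(url) as r:
        body = r.read().decode("utf-8")
    n_records = sum(1 for line in body.splitlines() if line.strip())
    print("server operational:", n_records > 0)
```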
This endpoint describes relevant implementation capabilities for this server. Currently, the only possible variability from server to server is the list of output formats that are supported.
A server must support the `csv` output format, but the `binary` and `json` output formats may optionally be supported. Servers may support custom output formats, which would be advertised here. All custom formats listed by a server must begin with the string `x_` to indicate that they are custom formats and avoid naming conflicts with possible future additions to the specification.
Sample Invocation
http://server/hapi/capabilities
Request Parameters
None
Response
The server's response to this endpoint must be in JSON format [3] as defined by RFC 7159, and the response must indicate a mime type of `application/json`. Server capabilities are described using keyword-value pairs, with `outputFormats` being the only keyword currently in use.
Capabilities Object
| Name | Type | Description |
|---|---|---|
| `HAPI` | string | Required. The version number of the HAPI specification this description complies with. |
| `status` | Status object | Required. Server response status for this request. |
| `outputFormats` | string array | Required. The list of output formats the server can emit. All HAPI servers must support at least the `csv` output format, with `binary` and `json` output formats being optional. Any custom formats not in this list must begin with a prefix of `x_` as shown in the example below. |
| `catalogDepthOptions` | string array | Optional. A list of options for `/catalog?depth=` requests. It can include one or more of `dataset` and `all`. See the catalog endpoint description. |
Example
Retrieve a listing of the capabilities of this server.
http://server/hapi/capabilities
Example Response:
{
"HAPI": "3.3",
"status": {"code": 1200, "message": "OK"},
"outputFormats": ["csv", "binary", "json", "x_hdf"]
}
Note the non-standard format `x_hdf` for HDF data, which has the required `x_` prefix.
If a server only reports an output format of `csv`, then requesting `binary` data should cause the server to respond with an error status of 1409 ("Bad request - unsupported output format") with a corresponding HTTP response code of 400. See below for more about error responses.
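A client can avoid the 1409 error by consulting `/capabilities` first, roughly as sketched below (placeholder server URL and dataset name):

```python
import json
import urllib.request

server = "http://server/hapi"  # placeholder

with urllib.request.urlopen(server + "/capabilities") as r:
    caps = json.load(r)

# csv is always supported, so fall back to it when binary is not offered.
fmt = "binary" if "binary" in caps["outputFormats"] else "csv"
url = (server + "/data?dataset=dataset1"
       "&start=2023-01-01Z&stop=2023-01-02Z&format=" + fmt)
```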
This endpoint provides a list of datasets available from the server.
Sample Invocation
http://server/hapi/catalog
Request Parameters
| Name | Description |
|---|---|
| `depth` | Optional. Possible values are `dataset` (the default) and `all`. Servers may choose to implement the `all` option, which allows all of the metadata from a server to be obtained in a single request. If this request parameter is supported, the `/capabilities` endpoint must return `catalogDepthOptions=["dataset", "all"]`. |
| `resolve_references` | Optional. If `false`, allow the server to send responses with references that need resolution by the client. Default is `true`. See JSON References. |
Response
The response is in JSON format [3] as defined by RFC 7159 and has a MIME type of `application/json`. The catalog is a simple listing of identifiers for the datasets available from the server. Additional metadata about each dataset is available through the `info` endpoint (described below). A request to the catalog endpoint without any of the optional request parameters always lists the full catalog.
Catalog Object
| Name | Type | Description |
|---|---|---|
| `HAPI` | string | Required. The version number of the HAPI specification this description complies with. |
| `status` | object | Required. Server response status for this request (see HAPI Status Codes). |
| `catalog` | array of Dataset | Required. A list of datasets available from this server. |
Dataset Object
| Name | Type | Description |
|---|---|---|
| `id` | string | Required. The computer-friendly identifier (allowed characters) that the host system uses to locate the dataset. Each identifier must be unique within the HAPI server where it is provided. |
| `title` | string | Optional. A short label for the dataset usable in selecting a dataset from a list. If none is given, it defaults to the id. The suggested maximum length is 40 characters. |
| `info` | object | Optional, but required when `depth=all` is used in the request. This object should be identical in content to what is returned by an `/info?dataset=id` request, except that the `HAPI` and `status` nodes should be omitted inside the `info` objects. (The `status` does not belong there, and the HAPI version should be the same for all `info` elements, so it is redundant information.) |
The identifiers must be unique within a single HAPI server. Also, dataset identifiers in the catalog should be stable over time. Including rapidly changing version numbers or other revolving elements (dates, processing ids, etc.) in the dataset identifiers should be avoided. The intent of the HAPI specification is to allow data to be referenced using RESTful URLs with a reasonable lifetime.
Identifiers must be limited to upper and lower case letters, numbers, and the following characters: comma, colon, slash, minus, and plus. See issue #89 for a discussion of this.
Example
Retrieve a listing of datasets available from a server.
http://server/hapi/catalog
Example Response:
{
"HAPI" : "3.3",
"status": {"code": 1200, "message": "OK"},
"catalog":
[
{"id": "ACE_MAG", title:"ACE Magnetometer data"},
{"id": "data/IBEX/ENA/AVG5MIN"},
{"id": "data/CRUISE/PLS"},
{"id": "any_identifier_here"}
]
}
Example
Retrieve a listing of all the datasets and embed the `info` metadata for each in the dataset object.
First, verify that this is supported:
http://server/hapi/capabilities
{
"HAPI": "3.3",
"status": {"code": 1200, "message": "OK"},
"outputFormats": ["csv", "binary", "json"],
"catalogDepthOptions": ["dataset", "all"]
}
The `/capabilities` response indicates that the `/catalog` endpoint allows the `depth=all` option. Then, a subsequent request for the entire catalog will succeed:
http://server/hapi/catalog?depth=all
Example Response:
(where `...` is a placeholder for additional metadata that has been omitted for clarity)
{
"HAPI" : "3.3",
"status": {"code": 1200, "message": "OK"},
"catalog":
[
{
"id": "MAG1", title:"Magnetometer data",
"info": {
"startDate": "2010-01-01T00:00:00Z",
"stopDate": "2020-01-02T00:00:00Z",
"parameters": [{"name": "Time", ...}]
...
}
},
{
"id": "MAG2", title:"Magnetometer data",
"info": {
"startDate": "2000-01-02T00:00:00Z",
"stopDate": "2010-01-01T00:00:00Z",
"parameters": [{"name": "Time", ...}]
...
}
},
{
...
}
]
}
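A client wanting the full metadata tree can check for `depth=all` support as in the example above and fall back to per-dataset `/info` requests when it is absent; a sketch (placeholder server URL):

```python
import json
import urllib.parse
import urllib.request

server = "http://server/hapi"  # placeholder

def get_json(url):
    with urllib.request.urlopen(url) as r:
        return json.load(r)

caps = get_json(server + "/capabilities")
if "all" in caps.get("catalogDepthOptions", []):
    # One request returns the catalog with embedded info objects.
    catalog = get_json(server + "/catalog?depth=all")["catalog"]
else:
    # Fall back to one /info request per dataset.
    catalog = get_json(server + "/catalog")["catalog"]
    for entry in catalog:
        entry["info"] = get_json(server + "/info?dataset="
                                 + urllib.parse.quote(entry["id"], safe=""))
```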
This endpoint provides a data header for a given dataset. The header is expressed in JSON format [3] as defined by RFC 7159 and has a MIME type of `application/json`. The specification for the header is that it provides a minimal amount of metadata that allows for the automated reading by a client of the data content streamed via the `data` endpoint. The header must include a list of the parameters in the dataset and the date range covered by the dataset. There are also optional metadata elements for capturing other high-level information, such as a brief description of the dataset, the nominal cadence of the data, and ways to learn more about a dataset. The table below lists all required and optional dataset attributes in the header.
Servers may include additional custom (server-specific) keywords or keyword/value pairs in the header, provided that the keywords begin with the prefix `x_`. While a HAPI server must check all request parameters (servers must return an error code given any unrecognized request parameter, as described earlier), the JSON content output by a HAPI server may contain additional, user-defined metadata elements. All non-standard metadata keywords must begin with the prefix `x_` to indicate to HAPI clients that these are extensions. Custom clients could use the additional keywords, but standard clients would ignore the extensions. By using the standard prefix, the custom keywords will not conflict with any future keywords added to the HAPI standard. Servers using these extensions may wish to include additional, domain-specific characters after the `x_` to avoid possible collisions with extensions from other servers.
Each parameter listed in the header must itself be described by specific metadata elements, and a separate table below describes the required and optional parameter attributes.
By default, the parameter list in the `info` response will include all parameters in the dataset. However, a client may request a header for just a subset of the parameters. The subset of interest is specified as a comma-separated list via the request parameter called `parameters`. (Note that the client would have to obtain the parameter names from a prior request.) There must not be any duplicates in the subset list, and the subset list must be arranged according to the ordering in the original, full list of parameters. The reduced header is useful because it is also possible to request a subset of parameters when asking for data (see the `data` endpoint), and a reduced header can be requested that would then match the subset of parameters in the data. This correspondence of reduced header and reduced data ensures that a data request for a subset of parameters can be properly interpreted even if additional subset requests are made with no header. (Although the safest approach for a client is to always request the full header and rely on its parameter ordering to determine the data column ordering.)
Note that the `data` endpoint may optionally prepend the `info` header to the data stream when `include=header` is included in the request URL. In cases where the `data` endpoint response includes a header followed by `csv` or `binary` data, the header must always end with a newline. This enables the end of the JSON header to be more easily detected.
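For `csv` output with `include=header`, each line of the prepended JSON header is prefixed with `#` (see the Data with Header example in section 3.7.3.1), so a client can split header from data along these lines (a sketch):

```python
import csv
import io
import json

def split_header_and_data(text):
    """Split a HAPI csv response that was requested with include=header.

    Header lines are '#'-prefixed JSON; everything else is csv data.
    """
    header_lines, data_lines = [], []
    for line in text.splitlines():
        (header_lines if line.startswith("#") else data_lines).append(line)
    header = (json.loads("\n".join(h[1:] for h in header_lines))
              if header_lines else None)
    records = list(csv.reader(io.StringIO("\n".join(data_lines))))
    return header, records
```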
Sample Invocation
http://server/hapi/info?dataset=ACE_MAG
Items with a * superscript in the following table have been modified from version 2 to 3; see change notes.
| Name | Description |
|---|---|
| `dataset`* | Required. The identifier for the dataset (allowed characters). |
| `parameters` | Optional. A comma-separated list of parameters to include in the response (allowed characters). Default is all; `...&parameters=&...` in the URL should be interpreted as meaning all parameters. See "Subsetting Parameters" in this subsection for additional examples. |
| `resolve_references` | Optional. If `false`, allow the server to send responses with references that need resolution by the client. Default is `true`. See JSON References. |
Response
The response is in JSON format [3] and provides metadata about one dataset.
Subsetting Parameters
Clients may request an `info` response that includes only a subset of the parameters or a data stream for a subset of parameters (via the `data` endpoint, described next). The logic on the server is the same for `info` and `data` requests in terms of which dataset parameters are included in the response. The primary time parameter (always required to be the first parameter in the list) is always included, even if not requested. These examples clarify the way a server must respond to various types of dataset parameter subsetting requests:
- request: do not ask for any specific parameters (i.e., there is no request parameter called `parameters`);
  example: `http://server/hapi/data?dataset=MY_MAG_DATA&start=1999Z&stop=2000Z`;
  response: all columns
- request: ask for just the primary time parameter;
  example: `http://server/hapi/data?dataset=MY_MAG_DATA&parameters=Epoch&start=1999Z&stop=2000Z`;
  response: just the primary time column
- request: ask for a single parameter other than the primary time column (like `parameters=Bx`);
  example: `http://server/hapi/data?dataset=MY_MAG_DATA&parameters=Bx&start=1999Z&stop=2000Z`;
  response: the primary time column and the one requested data column
- request: ask for two or more parameters other than the primary time column;
  example: `http://server/hapi/data?dataset=MY_MAG_DATA&parameters=Bx,By&start=1999Z&stop=2000Z`;
  response: the primary time column followed by the requested parameters in the order they occurred in the original, non-subsetted dataset header (note that the parameter ordering in the request must match the original ordering anyway - see just below)
Note that the order in which parameters are listed in the request must not differ from the order in which they appear in the response. For a dataset with parameters `Time,param1,param2,param3`, this subset request

`?dataset=ID&parameters=Time,param1,param3`

is acceptable, because `param1` is before `param3` in the `parameters` array (as determined by the `/info` response). However, asking for a subset of parameters in a different order, as in

`?dataset=ID&parameters=Time,param3,param1`

is not allowed, and servers must respond with an error status. See HAPI Status Codes for more about error conditions and codes.
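A simple way for a client to satisfy this ordering rule is to order any requested subset by the parameter list from a prior `/info` response, as in this sketch (hypothetical helper name):

```python
def ordered_subset(info, wanted):
    """Order a set of parameter names by their position in the /info
    parameters array, as required for the `parameters` request parameter."""
    names = [p["name"] for p in info["parameters"]]
    return ",".join(name for name in names if name in wanted)

# With parameters Time,param1,param2,param3:
# ordered_subset(info, {"param3", "param1"}) -> "param1,param3"
```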
| Dataset Attribute | Type | Description |
|---|---|---|
| `HAPI` | string | Required. The version number of the HAPI specification with which this description complies. |
| `status` | object | Required. Server response status for this request; see HAPI Status Codes. |
| `format` | string | Required (when the header is prefixed to the data stream). Format of the data: `csv`, `binary`, or `json`. |
| `parameters` | array of Parameter | Required. Description of the parameters in the data. |
| `startDate` | string | Required. Restricted ISO 8601 date/time of the first record of data in the entire dataset. |
| `stopDate` | string | Required. Restricted ISO 8601 date/time of the last record of data in the entire dataset. For actively growing datasets, the end date can be approximate, but it is the server's job to report an accurate end date. |
| `timeStampLocation` | string | Optional. Indicates the positioning of the timestamp within the measurement window. Must be one of `begin`, `center`, `end`, or `other`. If this attribute is absent, clients are to assume a default value of `center`, which is meant to indicate the exact middle of the measurement window. A value of `other` indicates that the location of the time stamp in the measurement window is not known or that these options cannot be used for an accurate description. See also HAPI convention notes. (Note: version 2.0 indicated that these labels were in all upper case. Starting with version 2.1, servers should use all lower case. Clients, however, should be able to handle both all upper case and all lower case versions of these labels.) |
| `cadence` | string | Optional. Time difference between records as an ISO 8601 duration. This is meant as a guide to the nominal cadence of the data and not a precise statement about the time between measurements. See also HAPI convention notes. |
| `sampleStartDate` | string | Optional. Restricted ISO 8601 date/time of the start of a sample time period for a dataset, where the time period must contain a manageable, representative example of valid, non-fill data. Required if `sampleStopDate` is given. |
| `sampleStopDate` | string | Optional. Restricted ISO 8601 date/time of the end of a sample time period for a dataset, where the time period must contain a manageable, representative example of valid, non-fill data. Required if `sampleStartDate` is given. |
| `maxRequestDuration` | string | Optional. An ISO 8601 duration indicating the maximum duration for a request. This duration should be interpreted by clients as a limit above which a request for all parameters will very likely be rejected with a HAPI 1408 error; requests for fewer parameters and a longer duration may or may not be rejected. |
| `description` | string | Optional. A detailed description of the dataset content, caveats, relationships to other data, references, and links -- the information users need to know for a basic interpretation of the data. Suggested length is a few lines of text, e.g., 1000 characters; extended details can be referenced with links. |
| `unitsSchema` | string | Optional. The name of the units convention that describes how to parse all units strings in this dataset. Currently, the only allowed values are `udunits2`, `astropy3`, and `cdf-cluster`. See unitsSchema Details for additional information about these conventions. The list of allowed unit specifications is expected to grow to include other well-documented unit standards. |
| `coordinateSystemSchema` | string | Optional. The name of the schema or convention that contains a list of coordinate system names and definitions. If this keyword is provided, any `coordinateSystemName` keyword given in a parameter definition should follow this schema. See coordinateSystemSchema Details for additional information. |
| `location` | object | Optional. A way to specify where a dataset's measurements were made. See location and geoLocation Details for an explanation of the two ways to indicate location: one for a fixed location and one for when the measurement location changes with time. |
| `geoLocation` | array of double | Optional. A simplified, compact way of indicating a fixed location for a dataset in Earth-based coordinates. The array has 2 or 3 elements: [longitude in degrees, latitude in degrees, (optional) altitude in meters]. For information and examples, see location and geoLocation Details. |
| `resourceURL` | string | Optional. URL linking to more detailed information about this dataset. |
| `resourceID` | string | Optional. An identifier by which this data is known in another setting (e.g., DOI). |
| `creationDate` | string | Optional. Restricted ISO 8601 date/time of the dataset creation. |
| `citation` | string | Optional. Deprecated; use `datasetCitation`. |
| `datasetCitation` | string | Optional. How to cite the dataset. An actionable DOI is preferred (e.g., https://doi.org/...). Note that there is a `serverCitation` in an `/about` response for citing the data server, but `datasetCitation` is for the dataset citation, which is typically different. |
| `licenseURL` | string or array of strings | Optional. A URL or array of URLs to a license landing page. If the license is in the spdx.org list, provide a link to it. A list of licenses implies that users may choose any one from the list. |
| `provenance` | string | Optional. A description of the provenance of this dataset. |
| `modificationDate` | string | Optional. Restricted ISO 8601 date/time of the modification of any content in the dataset. |
| `contact` | string | Optional. Relevant contact person name (and possibly contact information) for science questions about the dataset. |
| `contactID` | string | Optional. The identifier in the discovery system for information about the contact. For example, the SPASE ID or ORCID of the person. |
| `additionalMetadata` | object | Optional. A way to include a block of other (non-HAPI) metadata. See below for a description of the object, which can directly contain the metadata or point to it via a URL. |
| `definitions` | object | Optional. An object containing definitions that are referenced using a JSON reference. |
| `note` | string or array of strings | Optional. General notes about the dataset that are not appropriate to include in `description`. For example, a change log that lists added parameters. If an array of strings is used to describe datestamped notes, we recommend prefixing each note with a HAPI restricted ISO 8601 date/time, e.g., `["2024-01-01T00: Note on this date/time", "2024-04-02T00: Note on this date/time"]`. |
| `warning` | string or array of strings | Optional. Temporary warnings about the dataset, such as "dataset stopDate is typically updated continuously, but ..." |
One optional attribute is `unitsSchema`. This allows a server to specify, for each dataset, what convention is followed for the `units` strings in the parameters of the dataset. Currently, the only allowed values for `unitsSchema` are `udunits2`, `astropy3`, and `cdf-cluster`. These represent the currently known set of unit conventions with software available for parsing and interpreting unit strings. Only major version numbers (if available) are indicated in the convention name. It is expected that this list will grow over time as needed. The current locations of the official definitions and software tools for interpreting the various unit conventions are in the following table:
| Convention Name | Current URL | Description (context help if link is broken) |
|---|---|---|
| `udunits2` | https://www.unidata.ucar.edu/software/udunits | Unidata from UCAR; a C library for units of physical quantities |
| `astropy3` | https://docs.astropy.org/en/stable/units/ | package inside astropy that handles defining, converting between, and performing arithmetic with physical quantities, such as meters, seconds, Hz, etc. |
| `cdf-cluster` | https://caa.esac.esa.int/documents/DS-QMW-TN-0010.pdf (referenced on this page: https://www.cosmos.esa.int/web/csa/documentation) | conventions created and used by ESA's Cluster mission |
| `vounits1.1` | https://www.ivoa.net/documents/VOUnits/ | IVOA Recommendation for Units in the VO |
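For example, a client that sees `"unitsSchema": "astropy3"` in an `info` response could delegate units handling to the astropy package, roughly as follows (a sketch; requires astropy):

```python
from astropy import units as u

def parse_units(units_string):
    """Map a HAPI units string to an astropy unit; HAPI uses null or
    "dimensionless" for dimensionless quantities."""
    if units_string is None or units_string == "dimensionless":
        return u.dimensionless_unscaled
    return u.Unit(units_string)

# e.g., convert a value of 5 nT (a typical magnetometer unit) to tesla:
value_in_tesla = (5 * parse_units("nT")).to(u.T)
```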
The `parameter` object (described below) allows for a `coordinateSystemName` to be associated with any parameter. To allow a precise and computer-readable meaning for these coordinate system names, a dataset can have a `coordinateSystemSchema`. This schema should contain a computer-readable and publicly available list of coordinate system names and definitions. Few such schemas exist. One suggested schema designation for Heliophysics is provided in the following table. If more community-approved schemas emerge for coordinate systems, they could be included in the table, and if these schemas become widely adopted, it might make sense in the future to restrict `coordinateSystemSchema` to a set of enumerated values, but for now it is unconstrained.
| Schema Name | URL | Description |
|---|---|---|
| `spase2.4.1` | https://spase-group.org/data/schema/spase-2.4.1.xsd | Schema containing the names and definitions of coordinate systems commonly used in Heliophysics. See also an HTML version of the relevant section of the schema. |
This schema captures a Heliophysics-specific list of coordinate systems and is part of the larger SPASE metadata model (also Heliophysics-specific). It serves as an example of how a publicly accessible list of coordinate frames can be referenced by the `coordinateSystemSchema` keyword. Different communities can reference their own schemas.
Within the `parameter` object is the `coordinateSystemName` keyword, which contains the name of the coordinate system (which must be from the schema). There is also the `vectorComponents` keyword to capture the details about which coordinate components are present in the parameter. See the description below (the `vectorComponents` section) for details on how to describe parameters that contain directional (i.e., vector) quantities.
HAPI allows for bulk inclusion of additional metadata that may exist for a dataset. Additional metadata keywords can be inserted by prefixing them with `x_` (which indicates that the element is not part of the HAPI standard), but this means the keywords of any original metadata would have to be modified.
The `additionalMetadata` object is a list of objects represented by the table below and allows for one or more sets of additional metadata, which can be isolated from each other and from HAPI keywords.
{
"additionalMetadata" : [ md1, md2, md3 ]
}
There can be one or more metadata objects (md1, md2, md3, etc. above) in the list, and the keywords for these objects are as follows:

| Keyword | Type | Description |
|---|---|---|
| `name` | string | Optional. The name of the additional metadata. |
| `content` | string or JSON object | Required (if no `contentURL`). Either a string with the metadata content (for XML), or a JSON object representing the object tree for the additional metadata. |
| `contentURL` | string | Required (if no `content`). URL pointing to the additional metadata. |
| `schemaURL` | string | Optional. Points to a computer-readable schema for the additional metadata. |
| `aboutURL` | string | Optional. Points to a human-readable explanation of the metadata. |
The `name` is appropriate if the additional metadata follows a known standard that people know about. One of `content` or `contentURL` must be present. The `content` can be a string version of the actual metadata or a JSON object tree. If there is a schema reference embedded in the metadata (easy to do with XML and JSON), clients can figure that out, but if no internal schema is in the metadata, then the `schemaURL` can point to an external schema. The `aboutURL` is for humans to learn about the given type of additional metadata.
For the `name`, please use one of these if appropriate: `CF`, `FITS`, `ISTP`, `SPASE`. Other fields beyond Heliophysics will likely have their own metadata names, which could be listed here if requested.
Example
{
"additionalMetadata" : [
{ "name": "SPASE",
"contentURL": "https://hpde.io/NASA/DisplayData/ACE/MAG/27-Day.xml",
"aboutURL": "http://spase-group.org"
},
{ "name": "CF",
"content": { "keyword" : "value1", "keyword2": "value2" },
"aboutURL": "https://cfconventions.org/"
},
{ "name": "ISTP",
"content": { "json object (not shown) representing the tree of ISTP keyword-value pairs" },
"aboutURL": "https://spdf.gsfc.nasa.gov/istp_guide/variables.html"
},
{ "name": "FITS",
"content": { "fits_header_keyword1" : "value1", "fits_header_keyword2": "value2" },
"aboutURL": "http://fits.gsfc.nasa.gov"
}
]
}
Note that no single dataset would likely have this variety of additional metadata -- these are provided for illustration.
`stringType` is an optional element within each `parameter` object, and it allows servers to indicate that a string parameter has a special interpretation.
Currently, the only special `stringType` allowed is a URI, and it can be used to indicate that a string parameter contains a time series of links to resources (pointed to by the URIs).
The main use of HAPI is serving numeric data, but the ability to also serve URIs that point to data opens up two use cases for HAPI servers.
- Serving image URIs. In this case, the images should be in a widely recognized format that can be easily interpreted by libraries available to many clients, such as JPG, PNG, etc.
- Serving data file URIs to provide a list of files used to construct a HAPI numeric data response. For example, if a server has a dataset named `dataset1`, the files used to construct a response for `dataset1` could be provided in a dataset named `dataset1Files`. In this case, a user can request `dataset1` over a time range and determine what files `dataset1` came from using a request for `dataset1Files` over the same time range.

It is emphasized that a HAPI server providing only datasets of data file URIs, where the files contain time series data that could instead be served as HAPI numeric data, is not recommended. HAPI clients should only need to read a HAPI stream and not have to read and parse data in arbitrary file formats.
A recommended practice in both cases is also to include columns that provide metadata values.
The `stringType` attribute can either have a simple value that is just the string `uri`, or it can be an object that is a dictionary with `uri` as the key and a value that is another object with three optional elements: `mediaType`, `scheme`, and `base`.
Thus a `stringType` will have one of the following forms:
"stringType": "uri"
or
"stringType": {
"uri": {
"mediaType": "image/png",
"scheme": "https",
"base": "https://cdaweb.gsfc.nasa.gov/pub/pre_generated_plots/de/pwi/"
}
}
The `uri` object attributes are:
| stringType Attribute | Type | Description |
|---|---|---|
| `mediaType` | string | Optional. Indicates the content type behind the URI (also referred to as MIME type). |
| `scheme` | string | Optional. The access protocol for the URI. |
| `base` | string | Optional. Allows each URI string value to be relative to a base URI. |
The media type indicates the type of data each URI points to. HAPI places no constraints on the values for `mediaType`, but servers should only use standard values, such as `image/fits`, `image/png`, or `application/x-cdf`. There are standard lists of media types available, and we do not repeat them in the HAPI specification.
The `scheme` describes the access protocol. Again, there are no restrictions, but there is an expectation that it should be a well-known protocol, such as `http`, `https`, `ftp`, `doi`, or `s3` (used for Amazon object stores).
The `base` allows the individual string values for the parameter to be relative to a base URI, typically a web-accessible location ending in a slash.
URIs should follow the syntax outlined in RFC 3986. The basic pattern is:
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
URI strings should not be encoded because this is what most clients expect, and clients typically do their own encoding of a URI before issuing a request to retrieve the content.
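When a `base` is present, a client resolves each string value against it; Python's standard `urljoin` does this correctly for a base ending in a slash. In this sketch, the base is taken from the `stringType` example above and the relative value is hypothetical:

```python
from urllib.parse import urljoin

base = "https://cdaweb.gsfc.nasa.gov/pub/pre_generated_plots/de/pwi/"
relative_value = "1981/de1_pwi_19810925.png"  # hypothetical parameter value

full_uri = urljoin(base, relative_value)
# -> https://cdaweb.gsfc.nasa.gov/pub/pre_generated_plots/de/pwi/1981/de1_pwi_19810925.png
```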
The units for a string parameter that is a URI should be `null`. The units value here should not be used to try to describe the contents behind the URIs. URI content is likely too variable to be uniformly handled by this simple units indicator.
Example:
"parameters":
[ { "name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24
},
{ "name": "solar_images",
"description": "full-disk images of the Sun by SDO/AIA at wavelength of 304 angstroms",
"type": "string",
"length": 64,
"stringType": {"uri": {"mediaType": "image/fits", "scheme": "https"}},
"units": null,
"fill": null
},
{ "name": "cadence",
"description": "time between images",
"type": "double",
"units": "sec",
"fill": "-999"
},
{ "name": "wavelength",
"description": "which wavelength filter was active for this image",
"type": "double",
"units": "angstrom",
"fill": "NaN"
},
{ "name": "contains_active_region",
"description": "boolean indicator of active region presence: 0=no, 1=yes",
"type": "integer",
"units": null,
"fill": "NaN"
}
]
This example shows what the `parameters` portion of a HAPI `info` response would look like for a set of solar images.
The parameter named `solar_images` is given a `stringType` of `uri` and has a `mediaType` and `scheme` specified.
No `base` is given, so the URIs must be fully qualified. There are also other parameters (`cadence`, `wavelength`, and `contains_active_region`) that could be used on the client side for filtering the images based on the values of those parameters.
The approach shown here emphasizes a useful way for HAPI to provide image lists. HAPI queries can only constrain a set of images by time, but if the response contains metadata values in other columns, clients can restrict the image list further by filtering on values in the metadata columns.
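Continuing the example, a client-side filter over the metadata columns might look like this sketch (hypothetical values, with rows already parsed from the csv response):

```python
# Each row: (time, image_uri, cadence, wavelength, contains_active_region).
rows = [
    ("2024-01-01T00:00Z", "https://example.org/img1.fits", 12.0, 304.0, 1),
    ("2024-01-01T00:12Z", "https://example.org/img2.fits", 12.0, 171.0, 0),
]

# Keep only 304-angstrom images that contain an active region.
selected = [uri for (_, uri, _, wavelength, has_ar) in rows
            if wavelength == 304.0 and has_ar == 1]
```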
The focus of the header is to list the parameters in a dataset. The first parameter in the list must be a time value. This time column serves as the independent variable for the dataset. The time column parameter may have any name, but its type must be `isotime`, and there must not be any fill values in the data stream for this column. Note that unless the optional `timeStampLocation` attribute is given, the HAPI specification does not specify whether the time values are the start, middle, or end of the measurement intervals. There can be other parameters of type `isotime` in the parameter list. The table below describes the `parameter` object attributes.
| Parameter Attribute | Type | Description |
|---|---|---|
| `name` | string | Required. A short name for this parameter. Each parameter must have a unique name. It is recommended that all parameter names start with a letter or underscore, followed by letters, underscores, or numbers. This allows the parameter names to become variable names in computer languages. Parameter names in a dataset must be unique, and names are not allowed to differ only by having a different case. Note that because parameter names can appear in URLs that can serve as permanent links to data, changing them will have negative implications, such as breaking links to data. Therefore, parameter names should be stable over time. |
| `type` | string | Required. One of `string`, `double`, `integer`, or `isotime`. Binary content for `double` is always 8 bytes in IEEE 754 format; `integer` is 4 bytes, signed, and little-endian. There is no default length for `string` and `isotime` types. String parameters may include UTF-8 encoded Unicode characters. Strings may be flagged as representing URIs - see the optional `stringType` attribute in this table. |
| `length` | integer | Required for type `string` and `isotime`; not allowed for others. The maximum number of bytes that the string may contain. If the response format is binary and a string has fewer than this maximum number of bytes, the string must be padded with ASCII null bytes. If the string parameter contains only ASCII characters, `length` means the maximum number of ASCII characters. If a string parameter contains UTF-8 encoded Unicode characters, `length` means the maximum number of bytes required to represent all of the characters. For example, if a string parameter can be `A` or `α`, `length: 2` is required because `α` in Unicode requires two bytes when encoded as UTF-8. HAPI clients that read CSV output from a HAPI server will generally not need to use the `length` parameter. However, for HAPI binary, the `length` parameter is needed for parsing the stream. See also the description of HAPI binary. |
| `size` | array of integers | Required for array parameters; not allowed for others. Must be a 1-D array whose values are the number of array elements in each dimension of this parameter. For example, `"size": [7]` indicates that the value in each record is a 1-D array of length 7. For the `csv` and `binary` output, there must be 7 columns for this parameter -- one column for each array element, effectively unwinding this array. The `json` output for this data parameter must contain an actual JSON array (whose elements would be enclosed by `[ ]`). For arrays 2-D and higher, such as `"size": [2,3]`, the later indices are the fastest moving, so that the CSV and binary columns for such a 2 by 3 would be `[0,0]`, `[0,1]`, `[0,2]` and then `[1,0]`, `[1,1]`, `[1,2]`. Note that `"size": [1]` is allowed but discouraged because clients may interpret it as either an array of length 1 or as a scalar. Similarly, an array size of 1 in any dimension is discouraged because of ambiguity in how clients would treat this structure. Array sizes of arbitrary dimensionality are allowed, but from a practical view, clients typically support up to 3-D or 4-D arrays. See the size Details section for more about array sizes. |
| `units` | null OR string OR array of string | Required. The units for the data values represented by this parameter. For dimensionless quantities, the value can be the literal string `"dimensionless"` or the special JSON value `null`. Note that an empty string `""` is not allowed. For `isotime` parameters, the units must be `UTC`. If a parameter is a scalar, the units must be a single string. For an array parameter, a units value that is a single string means that the same units apply to all elements in the array. If the elements in the array parameter have different units, then units can be an array of strings to provide specific units strings for each element in the array. Individual values for elements in the array can also be `"dimensionless"` or `null` (but not an empty string) to indicate no units for that element. The shape of such a units array must match the shape given by the `size` of the parameter, and the ordering of multi-dimensional arrays of unit strings is as discussed in the `size` attribute definition above. See the following example responses to an info query for examples of single string and string array units, and additional details in Units and Label Arrays. |
| `fill` | string | Required. A fill value indicates no valid data. If a parameter has no fill present for any records in the dataset, this can be indicated by using a JSON null for this attribute, as in `"fill": null`. See the fill Details section for more about fill values, including the issues related to specifying numeric fill values as strings. Since the primary time column cannot have fill values, it must specify `"fill": null` in the header. |
| `description` | string | Optional. A brief, one-sentence description of the parameter. |
| `label` | string OR array of string | Optional. A word or very short phrase that could serve as a label for this parameter (as on a plot axis or in a selection list of parameters). It is intended to be less cryptic than the parameter name. If the parameter is a scalar, this label must be a single string. If the parameter is an array, a single string label or an array of string labels is allowed. `null` and the empty string `""` are not allowed. A single label string will be applied to all elements in the array, whereas an array of label strings specifies a different label string for each element in the array parameter. The shape of the array of label strings must match the `size` attribute, and the ordering of multi-dimensional arrays of label strings is as discussed in the `size` attribute definition above. See also the following example responses to an info query for examples of single string and string array labels, and additional details in Units and Label Arrays. |
| `stringType` | string or object | Optional. A string parameter can have a specialized type. Currently, the only supported specialized type is a URI. See the stringType Object section for more details on syntax and allowed values for `stringType`. |
| `coordinateSystemName` | string | Optional. Some parameters contain directional or position information, such as look direction, spacecraft location, or a measured vector quantity. This keyword specifies the name of the coordinate system for these quantities. If `coordinateSystemName` is given, `vectorComponents` should be given (but this is not required). If a `coordinateSystemSchema` was given for this dataset, then the `coordinateSystemName` must come from the schema. See the vectorComponents Object section for more about coordinate systems. |
| `vectorComponents` | string or array of strings | Optional. The name or list of names of the vector-related elements. For a scalar parameter, only a single string indicating its component type is allowed. For an array parameter, an array of corresponding component names is expected. If a parameter has a `coordinateSystemName` and three elements, and they represent Cartesian (x,y,z) values, then as a convenience, the `vectorComponents` may be omitted. For any other situation (not three elements, not Cartesian, not ordered as x,y,z), the `vectorComponents` must be specified explicitly. See the vectorComponents Object section for additional details. |
| `bins` | array of Bins objects | Optional. For array parameters, each object in the `bins` array corresponds to one of the dimensions of the array and describes values associated with each element in the corresponding dimension of the array. The bins object table below describes all required and optional attributes within each `bins` object. For example, if the parameter represents a 1-D frequency spectrum, the `bins` array will have one object describing the frequency values for each frequency bin; within that object, the `centers` attribute points to an array of values to use for the central frequency of each channel, and the `ranges` attribute specifies a range associated with each channel. |
Example
These examples show an info
response for a hypothetical magnetic field dataset.
http://server/hapi/info?dataset=ACE_MAG
Example Response:
{
"HAPI": "3.3",
"status": { "code": 1200, "message": "OK"},
"startDate": "1998-001Z",
"stopDate" : "2017-100Z",
"coordinateSystemSchema" : { "schemaName": "SPASE", "schemaURI": "TBD"},
"parameters": [
{ "name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24 },
{ "name": "radial_position",
"type": "double",
"units": "km",
"fill": null,
"description": "radial position of the spacecraft",
"label": "R Position"},
{ "name": "quality_flag",
"type": "integer",
"units": "none",
"fill": null,
"description ": "0=OK and 1=bad"},
{ "name": "mag_GSE",
"type": "double",
"units": "nT",
"fill": "-1e31",
"size" : [3],
"coordinateSystemName": "GSE",
"description": "hourly average Cartesian magnetic field in nT in GSE",
"label": "B field in GSE"}
]
}
The size
attribute is required for array parameters and not allowed for others. The length of the size
array indicates the number of dimensions, and each element in the size
array indicates the number of elements in that dimension. For example, the size attribute for a 1-D array would be a 1-D JSON array of length one, with the one element in the JSON array indicating the number of elements in the data array. For a spectrum, this number of elements is the number of wavelengths or energies in the spectrum. Thus "size": [9]
refers to a data parameter that is a 1-D array of length 9, and in the csv
and binary
output formats, there will be 9 columns for this data parameter. In the json
output for this data parameter, each record will contain a JSON array of 9 elements (enclosed in brackets [ ]
).
For arrays of size 2-D or higher, the column orderings need to be specified for the csv
and binary
output formats. For both of these formats, the array needs to be "unrolled" into individual columns. The mapping of 2-D array element to an unrolled column index is done so that the later array elements change the fastest. For example, given a 2-D array of "size": [2, 5]
, the 5-item index changes the most quickly. Items in each record will be like this: [0,0] [0,1] [0,2] [0,3] [0,4] [1,0] [1,1] [1,2] [1,3] [1,4]
. The ordering is similar for higher dimensions.
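The unrolling described here is row-major (C) order. As a minimal, non-normative sketch (assuming NumPy is available), C-order flattening of a "size": [2, 5] array reproduces the HAPI column ordering:

```python
import numpy as np

# "size": [2, 5] -> unrolled so the last index varies fastest (C order).
a = np.arange(10).reshape(2, 5)      # a[i][j] holds the value 5*i + j
print(a.flatten(order="C"))          # [0 1 2 3 4 5 6 7 8 9]
# Column k of the csv/binary output corresponds to element [k // 5, k % 5].
```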
No unrolling is needed for JSON arrays because JSON syntax can represent arrays of any dimension. The following example shows one record of data with a time parameter and a single data parameter "size":[2,5]
(of type double):
["2017-11-13T12:34:56.789Z", [ [0.0, 1.1, 2.2, 3.3, 4.4], [5.0, 6.0, 7.0, 8.0, 9.0] ] ]
Time-Varying size
If the size of a dimension in a multi-dimensional parameter changes over time, the only way to represent this in HAPI is to define the parameter as having the largest potential size
, and then use a fill
value for any data elements which are no longer actually being provided.
If this size-changing parameter has bins, then the number of bins would also presumably change over time. Servers can indicate the absence of one or more bins by using the time-varying bin mechanism described below (see Time-Varying Bins) and then providing all fill values for the ranges
and centers
of the records where those bins are absent.
The following example shows a conceptual data block (not in HAPI format) where there is an array parameter whose values are in columns data0
through data3
. The corresponding bin centers are in the columns center0
through center3
. The data block shows what happens in the data if the size of the parameter changes from 4 to 3 after the third time step. The data values change to fill (-1.0e31
in this case), and the values for the centers also change to fill to indicate that the corresponding array elements are no longer valid elements in the array.
time data0 data1 data2 data3 center0 center1 center2 center3
2019-01-01T14:10:30.5 1.2 3.4 5.4 8.9 10.0 20.0 30.0 40.0
2019-01-01T14:10:31.5 1.1 3.6 5.8 8.4 10.0 20.0 30.0 40.0
2019-01-01T14:10:32.5 1.4 3.8 5.9 8.3 10.0 20.0 30.0 40.0
2019-01-01T14:10:33.5 1.3 3.1 5.3 -1.0e31 15.0 25.0 35.0 -1.0e31
2019-01-01T14:10:34.5 1.2 3.0 5.4 -1.0e31 15.0 25.0 35.0 -1.0e31
2019-01-01T14:10:35.5 1.2 3.0 5.4 -1.0e31 15.0 25.0 35.0 -1.0e31
Note that a fill value in the bin centers column indicates that the data3
array element is absent, which is a stronger statement than a fill value in data3
alone. Some fill values in an array parameter would not necessarily indicate that the column was permanently gone, while a fill bin center indicates that the array size has effectively changed. If a bin center is fill, the corresponding data column should also be fill, even though this is duplicate information (a fill center3
in a record already indicates a non-usable data3
in that record).
Recall that the static centers
and ranges
objects in the JSON info
header cannot contain null or fill values.
Note that fill values for all types must be specified as a string (not just as ASCII within the JSON, but as a literal JSON string inside quotes). For double
and integer
types, the string should correspond to a numeric value. In other words, using a string like invalid_int
would not be allowed for an integer fill value. Care should be taken to ensure that the string value given will have an exact numeric representation and special care should be taken for double
values, which can suffer from round-off problems. For integers, string fill values must correspond to an integer value small enough to fit into a 4-byte signed integer. For double
parameters, the fill string must parse to an exact IEEE 754 double representation. One suggestion is to use large negative integers, such as -1.0E30
. The string NaN
is allowed, in which case the csv
output should contain the string NaN
for fill values. For binary
data output with double NaN values, the bit pattern for quiet NaN should be used, as opposed to the signaling NaN, which should not be used (see [6]). For string
and isotime
parameters, the string fill
value is used at face value, and it should have a length that fits in the length of the data parameter.
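As a non-normative sketch of these rules, a client might convert a header's fill string into a typed value as follows (parse_fill is a hypothetical helper, not part of the specification):

```python
def parse_fill(fill, param_type):
    """Convert a HAPI fill string to a typed value; None means no fill."""
    if fill is None:                      # "fill": null
        return None
    if param_type == "integer":
        value = int(fill)                 # must fit a 4-byte signed integer
        if not -2**31 <= value < 2**31:
            raise ValueError("integer fill out of 4-byte range")
        return value
    if param_type == "double":
        return float(fill)                # "NaN" parses to a quiet NaN
    return fill                           # string/isotime fill used verbatim
```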
When a scalar units
value is given for an array parameter, the scalar is assumed to apply to all elements in the array -- a kind of broadcast application of the single value to all values in the array. For multi-dimensional arrays, the broadcast applies to all elements in every dimension. A partial broadcast to only one dimension in the array is not allowed. Either a full set of unit strings are given to describe every element in the multi-dimensional array, or a single value is given to apply to all elements. This allows for the handling of special cases while keeping the specification simple. The same broadcast rules govern labels.
The previous example included the optional label
attribute for some parameters. Using a single string for the units
and label
of the array parameter mag_GSE
indicates that all array elements have the same units and label. The next example shows an info
response for a magnetic field dataset where the vector components are assigned distinct units and labels.
Example
{
"HAPI": "3.3",
"status": {"code": 1200, "message": "OK"},
"startDate": "1998-001Z",
"stopDate" : "2017-100Z",
"parameters": [
{ "name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24 },
{ "name": "radial_position",
"type": "double",
"units": "km",
"fill": null,
"description": "radial position of the spacecraft",
"label": "R Position"},
{ "name": "quality_flag",
"type": "integer",
"units": "none",
"fill": null,
"description ": "0=OK and 1=bad " },
{ "name": "mag_GSE",
"description": "B field as magnitude and two angles theta (colatitude) and phi (longitude)",
"type": "double",
"size" : [3],
"coordinateSystemName": "GSE",
"vectorComponents": [ "r", "colatitude", "longitude" ],
"units": ["nT","degrees", "degrees"],
"label": ["B Magnitude", "theta", "phi"],
"fill": "-1e31" }
]
}
Each element in the string array applies to the corresponding element in the mag_GSE
data array.
The following are examples for units
and label
values for a parameter of size=[2,3]
. The parameter is vector velocity from two separate instruments. In a JSON response, the parameter at a given time would have the form
["2017-11-13T12:34:56.789Z", [ [1, 2, 3] [4, 5, 6] ] ]
The [1,2,3]
are measurements from the first instrument and the [4, 5, 6]
are measurements from the second instrument.
Allowed (scalar for units
and for label
applies to all elements in the array)
"size": [2,3]
"units": "m/s",
"label": "velocity"
Allowed (scalar units applied to all six elements in the array; unique label for each element)
"size": [2,3],
"units": "m/s",
"label": [["V1x","V1y","V1z"],["V2x","V2y","V2z"]]
Allowed (array of length 1 is treated like scalar; not preferred but allowed)
"size": [2,3]
"units": ["m/s"],
"label": [["V1x","V1y","V1z"],["V2x","V2y","V2z"]]
Allowed (all elements are properly given their own units
string)
"size": [2,3],
"units": [["m/s","m/s","km/s"],["m/s","m/s","km/s"]],
"label": [["V1x","V1y","V1z"],["V2x","V2y","V2z"]]
Not Allowed (array size does not match parameter size -- must specify all units
elements if not just giving a scalar)
"size": [2,3],
"units": ["m/s","m/s","km/s"],
"label": [["V1x","V1y","V1z"],["V2x","V2y","V2z"]]
Not Allowed (units
array size does not match parameter size)
"size": [2,3]
"units": ["m/s",["m/s","m/s","km/s"]],
"label": [["V1x","V1y","V1z"],["V2x","V2y","V2z"]]
For a parameter
that describes a vector-related quantity (position of spacecraft relative to a body, location of ground station, direction of measured vector quantity, detector look direction), the vectorComponents
keyword indicates the vector-related components present in the data. For a scalar parameter
that is associated with a vector component, this keyword contains a single string describing the component. For non-scalar parameters, this keyword contains an array of strings naming the coordinate values present in the parameter.
The names represent the mathematical components typically found in vector or positional quantities.
Possible component names are constrained to be one of the following:
Component Name | Meaning |
---|---|
x | Cartesian x component of vector |
y | Cartesian y component of vector |
z | Cartesian z component of vector |
r | Magnitude of vector |
rho | Magnitude of radial component (perpendicular distance from z-axis) in a cylindrical coordinate representation |
latitude | Angle relative to the x-y plane, from -90° to 90°, or -π/2 to π/2 (90° corresponds to +z axis) |
colatitude | Angle relative to the +z axis, from 0° to 180°, or 0 to π |
longitude | Angle relative to the +x axis of a projection of the vector into the x-y plane, from -180° to 180°, or -π to π (90° corresponds to +y axis; this is also known as "East longitude") |
longitude0 | Angle relative to the +x axis of a projection of the vector into the x-y plane, from 0° to 360°, or 0 to 2π (270° corresponds to -y axis; this is also known as "East longitude") |
altitude | Altitude above a reference sphere or surface as identified in coordinateSystemName . For example, Earth's locations are often described by longitude, latitude, and altitude. |
other | Any parameter component that cannot be described by a name in this list |
Note that vectorComponents
should be interpreted as describing vector-related elements. In some instances, the vectorComponents
will describe a proper vector (e.g., if vectorComponents = ["x", "y", "z"]
).
Many angular quantities in datasets have different names than the ones here (azimuth instead of longitude, or elevation or inclination instead of latitude), but most quantities directly map to these commonly used component names, which come from spherical or cylindrical coordinate systems. A future version of HAPI may offer a way to represent other angular components.
The following parameters fragment of an /info
response (not a complete /info
response) demonstrates the use of directional quantities.
"parameters": [
{ "name": "spacecraftLocationXYZ",
"description": "S/C location in X,Y,Z; vector direction from center of Earth to spacecraft",
"size": [3],
"units": "km",
"coordinateSystemName": "GSM",
"vectorComponents": ["x", "y", "z"] },
{ "name": "spacecraftLocationSpherical",
"description": "position in R, MLT and magnetic latitude",
"size": [3],
"units": ["Re", "hours", "degrees" ],
"coordinateSystemName": "GSM",
"vectorComponents": ["r", "longitude", "latitude"] },
{ "name": "magnetic_field",
"description": "mag vector in Cartesian coords; components are x,y,z which is the assumed default for size [3]",
"size": [3],
"units": "nT",
"coordinateSystemName": "GSM" },
{ "name": "magnetic_field_cylindrical",
"description": "mag vector in cylindrical coords",
"size": [3],
"units": ["nT","degrees", "nT"],
"coordinateSystemName": "GSM",
"vectorComponents": ["rho", "longitude", "z"] },
{ "name": "Bx",
"description": "mag X component",
"units": "nT",
"coordinateSystemName": "GSE",
"vectorComponents": "x" },
{ "name": "By",
"description": "mag Y component",
"units": "nT",
"coordinateSystemName": "GSE",
"vectorComponents": "y" },
{ "name": "Bz",
"description": "mag Z component",
"units": "nT",
"coordinateSystemName": "GSE",
"vectorComponents": "z" },
]
In this example, the spacecraft position is provided in two different ways, Cartesian and spherical.
The default value for vectorComponents
is ["x", "y", "z"]
, and so it may be included if
desired, as is the case with the first position parameter, spacecraftLocationXYZ
, or omitted
as is shown for the magnetic_field
parameter. If the units
provided is a scalar, it applies
to all components of the parameter (this is true regardless of whether the parameter
is a vector).
The parameter magnetic_field_cylindrical
is indeed in cylindrical coordinates, so it does list
specific vector components of ["rho", "longitude", "z"]
, and since the units of each component are
not the same, those are listed in an array as ["nT", "degrees", "nT"]
.
Note that the longitude
units of the spacecraftLocationSpherical
parameter are hours
. The data
values must be floating point numbers to capture fractions of hours, not strings with hours, minutes, and seconds.
So, within the data values for this parameter component, 20.5 would be a valid value
for 20 hours 30 minutes, but 20:30:00 would not.
The last three parameters in this example (Bx
, By
, and Bz
) illustrate how a vector quantity
can be spread across multiple scalar parameters. The vectorComponents
attribute identifies which
component is in the parameter. All relevant components would need to be manually combined to create the
vector quantity.
Some datasets have parameters whose measurements are associated with a fixed location. Other measurements
are made from locations that change with time.
The location
attribute can express the location of measurements either as a single, fixed location,
or as location values that change with time. An alternative attribute, geoLocation
, can
be used instead as a compact way of describing an Earth-based position for a dataset's measurements.
Only location
or geoLocation
should be used, not both.
For the case of a fixed location on Earth, the geoLocation
attribute concisely
represents a point location with an array of longitude, latitude, and optionally altitude.
"geoLocation": [longitude, latitude]
-OR-
"geoLocation": [longitude, latitude, altitude]
Angles in geoLocation
must be in degrees (deg
), altitude must be in meters (m
), and the coordinate system is WGS84 (see Geodetic Coordinate Systems).
This makes geoLocation
consistent with the GeoJSON specification for a point location.
As an example, the location of measurements made by a hypothetical magnetometer at the peak of Sugarloaf Mountain in Maryland, USA, could be represented as:
"geoLocation": [-77.395, 39.269, 391.0]
For a fixed location expressed in a different coordinate system, or with other vector components for the location,
use a location
object with these four attributes:
location Attribute | Type | Description |
---|---|---|
point | double or double array | Required The values specifying the location. |
vectorComponents | string or string array | Required The kind of vectorComponents values in the point array. The number of values should match what is in point . |
units | string or string array | Required The units string or strings for the vectorComponents values. To see how units are applied to scalars or arrays, see the section about units and arrays. |
coordinateSystemName | string | Required The name of the coordinate system for the vectorComponents quantities. |
If a unitsSchema
has been specified for this dataset, any string given for the units
must be consistent with that schema. Similarly, if a coordinateSystemSchema
has been specified for this dataset, any string given for the coordinateSystemName
must be consistent with that schema.
Examples
A verbose version of "geoLocation": [-77.395, 39.269, 391.0]
:
"location": {
"point": [-77.395, 39.269, 391.0],
"components": ["longitude", "latitude", "altitude"],
"units": ["deg", "deg", "m"],
"coordinateSystemName": "wgs84"
}
Location in a non-WGS84 coordinate system and with Cartesian vector components:
"location": {
    "point": [-4.1452e3, 1.2050e3, 0.10201e3],
    "vectorComponents": ["x", "y", "z"],
    "units": "km",
    "coordinateSystemName": "GSE"
}
Note that in the second example, the units value of km
applies to all vectorComponents elements.
Time-Varying Locations
If a dataset has parameters where the measurement location changes over time, the location
object can indicate
the names of other parameters in the dataset that contain the corresponding time-varying locations. If the time-varying position is
present in more than one coordinate system, each can be referenced. Therefore, the parameters
attribute of the location
object is an array (outer array) consisting of a list of inner arrays of parameter-name strings.
If a parameter containing location info has all the vectorComponents
in it to fully represent the location, then the
inner array will have just one element: the name of the fully sufficient parameter.
If the position info for one coordinate system is spread over multiple parameters, then
each parameter name needs to be in the inner array.
"location": {
"parameters": [ ["param_name_for_location_using_coord_sys_A"], ["param_name_for_location_using_coord_sys_B"] ]
# each parameter here must be a vector and have in its attributes a full set of `vectorComponents` to describe the vector
}
Examples help illustrate:
"location": {
"parameters": [ ["Location_GEO"], ["Location_GSE"] ]
}
In this first example, two parameters provide position info, each in a different coordinate system.
Within the definition of each location parameter, there must be a vectorComponents
description.
"location": {
    "parameters": [ ["Location_GEO_X", "Location_GEO_Y", "Location_GEO_Z"],
["Location_J2000_X", "Location_J2000_Y", "Location_J2000_Z"]
]
}
In this second example, there are also two coordinate systems, but each one is expressed across three parameters, one each for the x, y, and z values of the position.
The bins attribute of a parameter is an array of JSON objects with the following attributes.
Bins Attribute | Type | Description |
---|---|---|
name | string | Required Name for the dimension (e.g., "Frequency"). |
centers | array of n doubles | Required if no ranges The center value for each bin. |
ranges | array of n arrays of 2 doubles | Required if no centers The boundaries of each bin (an array of [min, max] pairs). |
units | string | Required The units for the bin ranges and/or center values. |
label | string | Optional A label appropriate for a plot (use if name is not appropriate). |
description | string | Optional Brief comment explaining what the bins represent. |
Notes:
- At least one of ranges and centers must be given.
- Some dimensions of a multi-dimensional parameter may not represent binned data. Each dimension must be described in the bins object, but any dimension not representing binned data should indicate this by using "centers": null and not including the ranges attribute.
- Bin centers and/or ranges may depend on time (but the number of centers and/or ranges may not change); see the description of time-varying bins later in this section.
- Bin centers and/or ranges may not depend on non-time dimensions. For example, suppose the parameter dimensions are time, energy, and pitch angle. The pitch angle bin centers and/or ranges cannot depend on the energy bins.
Example
"parameters":
[
{
"name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24
},
{
"name": "Protons_10_to_20_keV_pitch_angle_spectrogram",
"type": "double",
"units": "1/(cm^2 s^2 ster keV)",
"fill": "-1.0e31",
"size": [6],
"bins": [{
"name": "angle_bins",
"ranges": [
[0.0, 30.0],
[30.0, 60.0],
[60.0, 90.0],
[90.0, 120.0],
[120.0, 150.0],
[150.0, 180.0]
],
"units": "degrees",
"label": "Pitch Angle"
}]
}
]
The data given for centers
and ranges
must not contain any null
or missing values. The number of valid numbers in the centers
array and the number of valid min/max pairs in the ranges
array must match the size of the parameter dimension being described. So this is not allowed:
Invalid
centers = [2.0, 3.0, 4.0],
ranges = [[1.0, 3.0], null, [3.0, null]]
Invalid
centers = [2.0, 3.0, null]
Valid
centers = [2.0, 3.0, 4.0],
ranges = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0]]
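As a non-normative illustration of these constraints, a validator for one static, numeric bins object might enforce the rules like this (bins_ok is a hypothetical helper; it does not cover the string-valued, time-varying form described next):

```python
def bins_ok(bins_obj, dim_size):
    """Check one bins object against the size of the dimension it describes."""
    centers = bins_obj.get("centers")
    ranges = bins_obj.get("ranges")
    if centers is None and ranges is None:
        return False                      # at least one must be given
    if centers is not None:
        if len(centers) != dim_size or any(c is None for c in centers):
            return False                  # no nulls; count must match size
    if ranges is not None:
        if len(ranges) != dim_size:
            return False
        if any(r is None or len(r) != 2 or None in r for r in ranges):
            return False                  # each range is a [min, max] pair
    return True
```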
Time-Varying Bins
In some datasets, the bin centers and/or ranges may vary with time. The static values in the bins
object definition for ranges
or centers
are fixed arrays and therefore cannot represent bin boundaries that change over time. As of HAPI 3.0, the ranges
and centers
objects can be, instead of a numeric array, a string value that is the name of another parameter in the dataset. This allows the ranges
and centers
objects to point to a parameter that is the source of numbers for the bin centers
or ranges
. The size of the target parameter must match that of the bins being represented. And, of course, each record of data can contain a different value for the parameter, effectively allowing the bin ranges
and centers
to change potentially at every time step.
Due to its complexity, plotting data with time-varying bins may not be supported by all clients. We recommend that a client that does not support time-varying bins communicate this limitation to the user.
The following example shows a dataset of multi-dimensional values: proton intensities over multiple energies and at multiple pitch angles. The data parameter name is proton_spectrum
, and it has bins for both an energy dimension (16 different energy bins) and a pitch angle dimension (3 different pitch angle bins). For the bins in both of these dimensions, a parameter name is given instead of numeric values for the bin locations. The parameter energy_centers
contains an array of 16 values at each time step, and these are to be interpreted as the time-varying centers of the energies. Likewise, there is a pitch_angle_centers
parameter, which serves as the source of numbers for the centers of the other bin dimension. There are also ranges
parameters, which are two-dimensional since each range consists of a low and a high value.
Note that the comments embedded in the JSON (with a prefix of //
) are for human readers only since comments are not supported in JSON.
{
"HAPI": "3.3",
"status": {"code": 1200, "message": "OK"},
"startDate": "2016-01-01T00:00:00.000Z",
"stopDate": "2016-01-31T24:00:00.000Z",
"parameters":
[ { "name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24
},
{ "name": "proton_spectrum",
"type": "double",
"size": [16,3],
"units": "particles/(sec ster cm^2 keV)",
"fill": "-1e31",
"bins":
[
{ "name": "energy",
"units": "keV",
"centers": "energy_centers",
"ranges": "energy_ranges"
},
{ "name": "pitch_angle",
"units": "degrees",
"centers": "pitch_angle_centers",
"ranges": "pitch_angle_ranges"
}
]
},
{
"name": "energy_centers",
"type": "double",
"size": [16], // 16 matches size[0] in #/proton_spectrum/size
"units": "keV", // Should match #/proton_spectrum/units
"fill": "-1e31" // Clients should interpret as meaning no measurement made in bin
},
{ "name": "energy_ranges",
"type": "double",
"size": [16,2], // 16 matches size[0] in #/proton_spectrum/size; size[1] must be 2
"units": "keV", // Should match #/proton_spectrum/units
"fill": "-1e31" // Clients should interpret as meaning no measurement made in bin
},
{ "name": "pitch_angle_centers",
"type": "double",
"size": [3], // 3 matches size[1] in #/proton_spectrum/size
"units": "degrees", // Should match #/proton_spectrum/units
"fill": "-1e31" // Clients should interpret as meaning no measurement made in bin
},
{ "name": "pitch_angle_ranges",
"type": "double",
"size": [3,2], // 3 matches size[1] in #/proton_spectrum/size; size[1] must be 2
"units": "degrees", // Should match #/proton_spectrum/units
"fill": "-1e31" // Clients should interpret as meaning no measurement made in bin
}
]
}
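A client could resolve such named bins on a per-record basis. The sketch below is non-normative (bin_centers is a hypothetical helper); it assumes an info response like the one above and one parsed JSON data record containing all parameters in header order:

```python
def bin_centers(info, record, param_name, dim):
    """Return the bin centers for one dimension of a parameter, for one record."""
    index = {p["name"]: i for i, p in enumerate(info["parameters"])}
    centers = info["parameters"][index[param_name]]["bins"][dim]["centers"]
    if isinstance(centers, str):          # name of another parameter:
        return record[index[centers]]     # time-varying, read from the record
    return centers                        # static numeric array from the header

# e.g., bin_centers(info, record, "proton_spectrum", 0) -> 16 energy centers
```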
If the same information appears more than once within the info
response, it is better to represent this in a structured way rather than to copy and paste duplicate information. Consider a dataset with two parameters -- one for the measurement values and one for the uncertainties. If the two parameters both have bins
associated with them, the bin definitions would likely be identical. Having each bins
entity refer back to a pre-defined, single entity ensures that the bin values are identical, and it also more readily communicates the connection to users, who otherwise would have to do a value-by-value comparison to see if the bin values are the same.
JSON has a built-in mechanism for handling references. HAPI utilizes a subset of these features, focusing on the simple aspects implemented in many existing JSON parsers. Also, using only simple features makes it easier for users to implement custom parsers. Note that familiarity with the full description of JSON references [5] is helpful in understanding the use of references in HAPI described below.
JSON objects placed within a definitions
block can be pointed to using the $ref
notation. We begin with an example. This definitions
block contains an object called angle_bins
that can be referred to using the JSON syntax #/definitions/angle_bins
{
"definitions": {
"pitch_angle_bins": {
"name": "angle_bins",
"ranges": [ [0.0, 30.0],
[30.0, 60.0],
[60.0, 90.0],
[90.0, 120.0],
[120.0, 150.0],
[150.0, 180.0]
],
"units": "degrees",
"label": "Pitch Angle"
}
}
}
Here is a parameter fragment showing the reference used in two places:
{
"parameters": [
{ "name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24
},
{ "name": "Protons_10_to_20_keV_pitch_angle_spectrogram",
"type": "double",
"units": "1/(cm^2 s^2 ster keV)",
"fill": "-1.0e38",
"size": [6],
"bins": [
{"$ref" : "#/definitions/angle_bins"}
]
},
{ "name": "Uncert_for_Protons_10_to_20_keV_pitch_angle_spectrogram",
"type": "double",
"units": "1/(cm^2 s^2 ster keV)",
"fill": "-1.0e38",
"size": [6],
"bins": [
{"$ref" : "#/definitions/angle_bins"}
]
}
]
}
The following rules govern the use of JSON references in a HAPI info response.
- Anything referenced must appear in a top-level node named definitions (this is a JSON Schema convention [5] but a HAPI requirement).
- Objects in the definitions node may not contain references (JSON Schema [5] allows this; HAPI does not).
- Referencing by id is not allowed (JSON Schema [5] allows this; HAPI does not).
- name may not be a reference (names must be unique anyway - this would make HAPI info responses potentially very confusing).
By default, a server resolves these references and excludes the definitions node. Stated more directly, a server should not return a definitions
block unless the request URL includes
resolve_references=false
, in which case the response metadata should contain references to items in the definitions
node. Note these constraints on what can be in the definitions
node:
- Any element found in the definitions node must be used somewhere in the full set of metadata. Note that this full metadata can be obtained via an info request or by a data request to which the header is prepended (using include=header ).
- If an info request or a data request with include=header is for a subset of parameters (e.g., /hapi/info?id=DATASETID&parameters=p1,p2 ), the definitions node may contain objects that are not referenced in the metadata for the requested subset of parameters; removal of unused definitions is optional in this case.
Here, then, is a complete example of an info response with references unresolved, showing a definitions
block and the use of references. The units
string is commonly used, so it is captured as a reference, as is the full bins definition. Note that this example shows how just a part of the bins
object could be represented -- the centers
object in this case. The example is valid HAPI content, but normally, using both approaches in a single info
response would not make sense.
{
"HAPI": "3.3",
"status": {
"code": 1200,
"message": "OK"
},
"startDate": "2016-01-01T00:00:00.000Z",
"stopDate": "2016-01-31T24:00:00.000Z",
"definitions": {
"spectrum_units": "particles/(sec ster cm^2 keV)",
"spectrum_units_explicit": ["particles/(sec ster cm^2 keV)","particles/(sec ster cm^2 keV)","particles/(sec ster cm^2 keV)","particles/(sec ster cm^2 keV)"],
"spectrum_bins_centers": [15, 25, 35, 45],
"spectrum_bins": {
"name": "energy",
"units": "keV",
"centers": [15, 25, 35, 45]
}
},
"parameters": [{
"name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24
},
{
"name": "proton_spectrum",
"type": "double",
"size": [4],
"units": {"$ref": "#/definitions/spectrum_units"},
"fill": "-1e31",
"bins": [{
"name": "energy",
"units": "keV",
"centers": {"$ref": "#/definitions/spectrum_bins_centers"}
}]
},
{
"name": "proton_spectrum2",
"type": "double",
"size": [4],
"units": {"$ref": "#/definitions/spectrum_units"},
"bins": [
{"$ref": "#/definitions/spectrum_bins"}
]
},
{
"name": "proton_spectrum3",
"type": "double",
"size": [4],
"units": {"$ref": "#/definitions/spectrum_units_explicit"},
"bins": [
{"$ref": "#/definitions/spectrum_bins"}
]
}
]
}
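Because references may only point into the top-level definitions node, and objects there may not themselves contain references, resolution is a single substitution pass. A minimal, non-normative sketch (resolve_info is a hypothetical helper, not part of the specification):

```python
def _resolve(node, definitions):
    if isinstance(node, dict):
        if "$ref" in node:                    # e.g. "#/definitions/spectrum_bins"
            key = node["$ref"].rsplit("/", 1)[-1]
            return definitions[key]           # definitions contain no refs
        return {k: _resolve(v, definitions) for k, v in node.items()}
    if isinstance(node, list):
        return [_resolve(v, definitions) for v in node]
    return node

def resolve_info(info):
    """Return an info response with every $ref replaced by its target."""
    definitions = info.pop("definitions", {})
    return _resolve(info, definitions)
```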
This endpoint provides access to a dataset and allows for selecting time ranges and parameters to return. Data is returned as a CSV [2], binary, or JSON stream. The Response Formats section describes the stream structure and layout for each format.
The resulting data stream can be considered a stream of records, where each record contains one value for each dataset parameter. Each data record must contain a data value or a fill value (of the same data type) for each parameter.
Items with a * superscript in the following table have been modified from version 2 to 3; see change notes.
Name | Description |
---|---|
dataset * | Required The identifier for the dataset (allowed characters). |
start * | Required The inclusive begin time for the data to include in the response. |
stop * | Required The exclusive end time for the data to include in the response. |
parameters | Optional A comma-separated list of parameters to include in the response (allowed characters). Default is all; ...&parameters=&... in a URL should be interpreted as meaning all parameters. |
include | Optional Has one possible value of "header" to indicate that the info header should precede the data. The header lines will be prefixed with the "#" character. |
format | Optional The desired format for the data stream. Possible values are "csv", "binary", and "json". |
The data
endpoint response is in one of three formats: CSV format as defined by [2] with a mime type of text/csv
; binary format, where floating-point numbers are in IEEE 754 [4] format, byte order is LSB, and the mime type is application/octet-stream
; or JSON format with the structure as described below and a mime type of application/json
. The default data format is CSV. See the Response Formats section for more details.
If the header is requested, then for binary and CSV formats, each line of the header must begin with a hash (#) character. For JSON output, no prefix character should be used because the data object will be another JSON element within the response. Other than the possible prefix character, the contents of the header should be the same as returned from the info endpoint. When a data stream has an attached header, the header must contain an additional "format" attribute to indicate if the content after the header is csv
, binary
, or json
. Note that when a header is included in a CSV response, the data stream is not strictly in CSV format.
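As a non-normative sketch, a client reading a CSV response requested with include=header could separate and parse the prefixed header like this (split_header is a hypothetical helper):

```python
import json

def split_header(text):
    """Separate '#'-prefixed header lines from CSV data lines; parse the header."""
    lines = text.splitlines()
    header_lines = [l[1:] for l in lines if l.startswith("#")]
    data_lines = [l for l in lines if l and not l.startswith("#")]
    header = json.loads("".join(header_lines))
    return header, data_lines
```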
The first parameter in the data must be a time column (type of isotime
) and this must be the independent variable for the dataset. If a subset of parameters is requested, the time column is always provided, even if it is not requested.
Note that the start
request parameter represents an inclusive lower bound, and the stop
request parameter is the exclusive upper bound. The server must return data records within these time constraints, i.e., no extra records outside the requested time range. This enables the concatenation of results from adjacent time ranges.
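A client can therefore split a long interval into adjacent requests and concatenate the responses without duplicated or missing records. A minimal sketch (the server URL and dataset name are hypothetical; assumes CSV output):

```python
from datetime import datetime, timedelta
import urllib.request

BASE = "http://server/hapi/data?dataset=MY_MAG_DATA"   # hypothetical

def fetch(start, stop):
    """Fetch one chunk of CSV data for [start, stop)."""
    url = (f"{BASE}&start={start:%Y-%m-%dT%H:%M:%SZ}"
           f"&stop={stop:%Y-%m-%dT%H:%M:%SZ}")
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

# Inclusive start / exclusive stop means adjacent chunks concatenate
# with no duplicated and no dropped records.
t0, t1, step = datetime(2016, 1, 1), datetime(2016, 2, 1), timedelta(days=7)
chunks = []
while t0 < t1:
    chunks.append(fetch(t0, min(t0 + step, t1)))
    t0 += step
csv_text = "".join(chunks)
```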
There is a relationship between the info
endpoint and the data
endpoint because the header from the info
endpoint describes the record structure of data emitted by the data
endpoint. Thus, after a single call to the info
endpoint, a client could make multiple calls to the data
endpoint (for multiple time ranges, for example) with the expectation that each data response would contain records described by the single call to the info
endpoint. The data
endpoint can optionally prefix the data stream with header information, potentially obviating the need for the info
endpoint.
Both the info
and data
endpoints take an optional request parameter (recall the definition of request parameter in the introduction) called parameters
that allows users to restrict the dataset parameters listed in the header and data stream, respectively. This enables clients (that already have a list of dataset parameters from a previous info or data request) to request a header for a subset of parameters that will match a data stream for the same subset of parameters. The parameters in the subset request must be ordered according to the original order of the parameters in the metadata, i.e., the subset can contain fewer parameters, but their order must be the same as in the parameters
array in the /info
response.
Consider the following dataset header for a fictional dataset with the identifier MY_MAG_DATA
.
An info
request for this dataset
http://server/hapi/info?dataset=MY_MAG_DATA
results in a header listing of all the dataset parameters:
{
"HAPI": "3.3",
"status": { "code": 1200, "message": "OK"},
"startDate": "2005-01-21T12:05:00.000Z",
"stopDate" : "2010-10-18T00:00:00Z",
"parameters": [
{ "name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24 },
{ "name": "Bx", "type": "double", "units": "nT", "fill": "-1e31"},
{ "name": "By", "type": "double", "units": "nT", "fill": "-1e31"},
{ "name": "Bz", "type": "double", "units": "nT", "fill": "-1e31"},
]
}
An info
request for a single parameter has the form
http://server/hapi/info?dataset=MY_MAG_DATA&parameters=Bx
and would result in the following header:
{
"HAPI": "3.3",
"status": { "code": 1200, "message": "OK"},
"startDate": "2005-01-21T12:05:00.000Z",
"stopDate" : "2010-10-18T00:00:00Z",
"parameters": [
{ "name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24 },
{ "name": "Bx", "type": "double", "units": "nT", "fill": "-1e31" },
]
}
Note that the time parameter is included even though it was not requested.
In this request,
http://server/hapi/info?dataset=MY_MAG_DATA&parameters=By,Bx
the parameters are out of order, so the server should respond with an error code. See HAPI Status Codes for more about error conditions.
Two examples of data requests and responses are given – one with the header and one without.
Note that in the following request, the header is to be included, so the same header from the info
endpoint will be prepended to the data but with a ‘#’ character as a prefix for every header line.
http://server/hapi/data?dataset=path/to/ACE_MAG&start=2016-01-01Z&stop=2016-02-01Z&include=header
Response
#{
# "HAPI": "3.3",
# "status": { "code": 1200, "message": "OK"},
# "format": "csv",
# "startDate": "1998-001Z",
# "stopDate" : "2017-001Z",
# "parameters": [
# { "name": "Time",
# "type": "isotime",
# "units": "UTC",
# "fill": null,
# "length": 24
# },
# { "name": "radial_position",
# "type": "double",
# "units": "km",
# "fill": null,
# "description": "radial position of the spacecraft"
# },
# { "name": "quality flag",
# "type": "integer",
# "units ": null,
# "fill": null,
# "description ": "0=OK and 1=bad "
# },
# { "name": "mag_GSE",
# "type": "double",
# "units": "nT",
# "fill": "-1e31",
# "size" : [3],
# "description": "hourly average Cartesian magnetic field in nT in GSE"
# }
# ]
#}
2016-01-01T00:00:00.000Z,6.848351,0,0.05,0.08,-50.98
2016-01-01T01:00:00.000Z,6.890149,0,0.04,0.07,-45.26
2016-01-01T02:00:00.000Z,8.142253,0,2.74,0.17,-28.62
...
...
The following example is the same, except it lacks the request to include the header.
http://server/hapi/data?dataset=path/to/ACE_MAG&start=2016-01-01&stop=2016-02-01
Response
Consider a dataset that contains a time field, two scalar fields, and one array field of length 3. The response will have the form:
2016-01-01T00:00:00.000Z,6.848351,0,0.05,0.08,-50.98
2016-01-01T01:00:00.000Z,6.890149,0,0.04,0.07,-45.26
2016-01-01T02:00:00.000Z,8.142253,0,2.74,0.17,-28.62
...
...
Note that there is no leading row with column names. The RFC 4180 CSV standard [2] indicates that such a header row is optional. Leaving out this row avoids the complication of naming individual columns representing array elements within an array parameter. Recall that an array parameter has only a single name. HAPI specifies parameter names via the info
endpoint, which also provides size details for each parameter (scalar or array, and array size if needed). The size of each parameter must be used to determine how many columns it will use in the CSV data. By not specifying a row of column names, HAPI avoids the need to have a naming convention for columns representing elements within an array parameter.
The three possible output formats are csv
, binary
, and json
. A HAPI server must support csv
, while binary
and json
are optional. We emphasize that this is a simple and ephemeral streaming transport format and should not be considered or used in the same way as standard file formats such as FITS, HDF, CDF, and netCDF, which were not designed for streaming.
The format of the CSV stream should follow the guidelines for CSV data as described by RFC 4180 [2]. Each CSV record is one line of text, with commas between the values for each dataset parameter. Any value containing a comma must be surrounded with double quotes, and any double-quote within a value must be escaped by a preceding double quote. An array parameter (i.e., the value of a parameter within one record is an array) will have multiple columns resulting from placing each element in the array into its own column. For 1-D arrays, the ordering of the unwound columns is just the index ordering of the array elements. For 2-D arrays or higher, the right-most array index is the fastest moving index when mapping array elements to columns.
It is up to the server to decide how much precision to include in the ASCII values when generating CSV output.
Client programs interpreting the HAPI CSV stream are encouraged to use existing CSV parsing libraries to be able to interpret the full range of possible CSV values, including quoted commas and escaped quotes. However, it is expected that a simple CSV parser would probably handle more than 90% of known cases.
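For example, Python's standard csv module already handles quoted commas and escaped quotes (a short, non-normative illustration with made-up values):

```python
import csv
import io

sample = '2016-01-01T00:00:00.000Z,"a, quoted ""value""",1.5\n'
for record in csv.reader(io.StringIO(sample)):
    print(record)   # ['2016-01-01T00:00:00.000Z', 'a, quoted "value"', '1.5']
```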
The binary data output is best described as a binary translation of the CSV stream, with full numerical precision and no commas or newlines. Recall that the dataset header provides type information for each dataset parameter, and this definitively indicates the number of bytes and the byte structure of each parameter and, thus, of each binary record in the stream. Array parameters are unwound in the same way for binary as for CSV data as described above. All numeric values are little-endian (LSB); integers are always signed four-byte values, and floating-point values are always IEEE 754 double-precision values.
Dataset parameters of type string
and isotime
(strings of ISO 8601 dates) have a maximum length specified in the info header. This length indicates how many bytes to read for each string value. If the string content is less than the length, the remaining bytes must be padded with ASCII null bytes. If a string uses all the bytes specified in the length
, no null terminator or padding is needed.
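As a non-normative sketch, a client could decode one binary record with Python's struct module. The record layout here is hypothetical: a Time parameter of length 24, one double, and one 4-byte signed integer:

```python
import struct

RECORD = struct.Struct("<24sdi")   # little-endian: 24-char time, double, int32

def parse_record(buf):
    """Decode one 36-byte binary record into (time, value, flag)."""
    time_bytes, value, flag = RECORD.unpack(buf)
    time = time_bytes.rstrip(b"\x00").decode("ascii")   # strip null padding
    return time, value, flag
```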
For the JSON output, an additional data
element added to the header contains the array of data records. These records are similar to the CSV output, except that strings must be quoted and arrays must be delimited with array brackets in standard JSON fashion. An example helps to illustrate what the JSON format looks like. Consider a dataset with four parameters: time, a scalar value, a 1-D array value with an array length of 3, and a string value. The header with the data object might look like this:
{ "HAPI": "3.3",
"status": { "code": 1200, "message": "OK"},
"startDate": "2005-01-21T12:05:00.000Z",
"stopDate" : "2010-10-18T00:00:00Z",
"parameters": [
{ "name": "Time", "type": "isotime", "units": "UTC", "fill": null, "length": 24 },
{ "name": "quality_flag", "type": "integer", "description": "0=ok; 1=bad", "fill": null },
{ "name": "mag_GSE", "type": "double", "units": "nT", "fill": "-1e31", "size" : [3],
"description": "hourly average Cartesian magnetic field in nT in GSE" },
{ "name": "region", "type": "string", "length": 20, "fill": "???", "units" : null}
],
"format": "json",
"data" : [
["2010-001T12:01:00Z",0,[0.44302,0.398,-8.49],"sheath"],
["2010-001T12:02:00Z",0,[0.44177,0.393,-9.45],"sheath"],
["2010-001T12:03:00Z",0,[0.44003,0.397,-9.38],"sheath"],
["2010-001T12:04:00Z",1,[0.43904,0.399,-9.16],"sheath"]
]
}
The data element is a JSON array of records. Each record is itself an array of parameters. The time and string values are in quotes, and any data parameter in the record that is an array must be inside square brackets. This data element appears as the last JSON element in the header.
The record-oriented arrangement of the JSON format is designed to allow a streaming client reader to begin reading (and processing) the JSON data stream before it is complete. Note also that servers can stream the data when records are available. In other words, the JSON format can be read and written without holding all the records in memory. This may require a custom JSON formatter, but this streaming capability is important for large responses.
If the server encounters an error after the data has begun and can no longer continue, it must terminate the stream. As a result, clients must detect an abnormally terminated stream and treat this aborted condition the same as an internal server error. See HAPI Status Codes for more about error conditions.
Time values are always strings, and the HAPI Time format is a subset of the ISO 8601 date and time format [1].
The restriction on the ISO 8601 standard is that time must be represented as
yyyy-mm-ddThh:mm:ss.sssZ
or
yyyy-dddThh:mm:ss.sssZ
and the trailing Z
is required. Strings with less precision are allowed as per ISO 8601, e.g., 1999-01Z
and 1999-001Z
. The HAPI JSON schema lists a series of regular expressions that codifies the intention of the HAPI Time specification. The schema allows leap seconds and hour=24, but it should be expected that not all clients will be able to correctly interpret such time stamps.
The name of the time parameter is not constrained. However, the time column name is strongly recommended to be "Time" or "Epoch" or some easily recognizable label.
Note that the ISO 8601 time format allows arbitrary precision on the time values. HAPI servers and clients should be able to interpret times with at least nanosecond precision.
Servers must require incoming time values from clients (i.e., the start
and stop
values on a data request) to be valid ISO 8601 time values. The full ISO 8601 specification allows many esoteric options, but servers must only accept a subset of the full ISO 8601 specification, namely one of either year-month-day (yyyy-mm-ddThh:mm:ss.sssZ
) or day-of-year (yyyy-dddThh:mm:ss.sssZ
). Any date or time elements missing from the string are assumed to take on their smallest possible value. For example, the string 2017-01-15T23:00:00.000Z
could be given in truncated form as 2017-01-15T23Z
. Servers should be able to parse and properly interpret these truncated time strings. When clients provide a date at day resolution only, the T
must not be included, so servers should be able to parse day-level time strings without the T
, as in 2017-01-15Z
.
Note that in the ISO 8601 specification, a trailing Z
on the time string indicates that no time zone offset should be applied (so the time zone is GMT+0). If a server receives an input value without the trailing Z
, it should still interpret the time zone as GMT+0 rather than a local time zone. This is true for time strings with all fields present and for truncated time strings with some fields missing.
Example time range request | Comments |
---|---|
start=2017-01-15T00:00:00.000Z&stop=2017-01-16T00:00:00.000Z | OK - fully specified time value with proper trailing Z |
start=2017-01-15Z&stop=2017-01-16Z | OK - truncated time value that assumes 00:00:00.000 for the time |
start=2017-01-15&stop=2017-01-16 | OK - truncated with missing trailing Z, but GMT+0 should be assumed |
There is no restriction on the earliest date or latest date a HAPI server can accept, but as a practical limit, clients are likely to be written to handle dates only from 1700 to 2100.
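A non-normative sketch of server-side parsing that accepts both truncated forms, fills missing fields with their smallest values, and treats a missing trailing Z as GMT+0 (parse_hapi_time is a hypothetical helper; note that strptime does not handle leap seconds or hour=24):

```python
from datetime import datetime, timezone

FORMATS = ["%Y-%m-%dT%H:%M:%S.%f", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M",
           "%Y-%m-%dT%H", "%Y-%m-%d", "%Y-%m", "%Y",
           "%Y-%jT%H:%M:%S.%f", "%Y-%jT%H:%M:%S", "%Y-%jT%H:%M",
           "%Y-%jT%H", "%Y-%j"]

def parse_hapi_time(s):
    """Parse a (possibly truncated) HAPI time; missing fields become zero."""
    s = s.rstrip("Z")                  # with or without Z, time zone is GMT+0
    for fmt in FORMATS:
        try:
            return datetime.strptime(s, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            pass
    raise ValueError(f"not a valid HAPI time: {s}")
```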
Time values in the outgoing data stream must be ISO 8601 strings. A server may use one of either the yyyy-mm-ddThh:mm:ss.sssZ
or the yyyy-dddThh:mm:ss.sssZ
form, but must use one format and length within any given dataset. The time values must not have any local time zone offset, and they must indicate this by including the trailing Z
. Time or date elements may be omitted from the end to indicate that the missing time or date elements should be given their lowest possible value. For date values at day resolution (i.e., no time values), the T
must be omitted, but the Z
is still required. Note that this implies that clients must be able to parse potentially truncated ISO strings of both Year-Month-Day and Year-Day-of-year styles.
For binary
and csv
data, the length of the time string, truncated or not, is indicated with the length
attribute for the time parameter, which refers to the number of printable characters in the string. Every time string must have the same length, so padding of time strings is unnecessary.
The data returned from a request should strictly fall within the limits of start
and stop
, i.e., servers should not pad the data with extra records outside the requested time range. Furthermore, note that the start
value is inclusive (data at or beyond this time can be included), while stop
is exclusive (data at or beyond this time shall not be included in the response).
The primary time column is not allowed to contain any fill values. Each record must be identified with a valid time value. If other columns contain parameters of type isotime
(i.e., time columns that are not the primary time column), there may be fill values in these columns. Note that the fill
definition is required for all types, including isotime
parameters. The fill value for a (non-primary) isotime
parameter does not have to be a valid time string - it can be any string, but its length must match the parameter's declared length.
HAPI metadata (in the info
header for a dataset) allows a server to specify where timestamps fall within the measurement window. The timeStampLocation
attribute for a dataset is an enumeration with possible values of begin
, center
, end
, or other
. This attribute is optional, but the default value is center
, which refers to the exact middle of the measurement window. If the location of the timestamp is not known or is more complex than any of the allowed options, the server can report other
for the timeStampLocation
. Clients are likely to use center
for other
, simply because there is not much else they can do. Note that the optional cadence
attribute is not meant to be accurate enough to use as a way to compute an alternate time stamp location. In other words, given a timeStampLocation
of begin
and a cadence
of 10 seconds, it may not always work to just add 5 seconds to get to the center of the measurement interval for this dataset. This is because the cadence
provides a nominal duration, and the actual duration of each measurement may vary significantly throughout the dataset. Some datasets may have specific parameters devoted to accumulation time or other measurement window parameters, but HAPI metadata does not capture this level of measurement window details. Suggestions on handling the issues discussed in this paragraph are given on the implementation notes page.
If a request is made for a time range in which there are no data, the server must respond with an HTTP 200 status code. The HAPI status-code must be either HAPI 1201
(the explicit no-data code) or HAPI 1200
(OK). While the more specific HAPI 1201
code is preferred, servers may have a difficult time recognizing the lack of data before issuing the header, in which case the issuing of HAPI 1200
and the subsequent absence of any data records communicates to clients that everything worked but no data was present in the given interval. Any response that includes a header (JSON always does, and CSV and binary when requested) must have this same HAPI status in the header. For CSV or binary responses without a header, the message body should be empty to indicate no data records.
This example clarifies the ideal case where a user has not requested the header. If the server has no data, the HTTP header
HTTP/1.1 200 OK
is acceptable, but a more specific header
HTTP/1.1 200 OK; HAPI 1201 OK - no data for time range
is preferred if the server can detect that there is no data in time and can modify the HTTP status message. This allows clients to verify that the empty body was intended without making a second request that includes the header containing a HAPI status object.
If a request is made with a time range in which the response will contain all fill values, the server must respond with all fill values and not an empty body.
There are two ways that HAPI servers must report errors, and these must be consistent. Because every HAPI server response is an HTTP response, an appropriate HTTP status and message must be set for each response. The HTTP integer status codes to use are the standard ones (200 means OK, 404 means not found, etc), and these are listed below.
The text message in the HTTP status should not just be the standard HTTP message but should include a HAPI-specific message, and this text should include the HAPI integer code along with the corresponding HAPI status message for that code. These HAPI codes and messages are also described below. Note the careful use of "must" and "should" here. The use of the HTTP header message to include HAPI-specific details is optional, but the setting of the HTTP integer code status is required.
As an example, it is recommended that a status message such as
HTTP/1.1 404 Not Found
is modified to include the HAPI error code and error message (as described below)
HTTP/1.1 404 Not Found; HAPI 1402 Bad request - error in start time
Although the HTTP header mechanism is robust, it is more difficult for some clients to access -- a HAPI client using a high-level URL retrieving mechanism may not have easy access to HTTP header content. Therefore the HAPI response itself must also include a status indicator. This indicator appears as a status
object in the HAPI header. The two status indicators (HAPI and HTTP) must be consistent, i.e., if one indicates success, so must the other. Note that some HAPI responses do not include a header, and in these cases, the HTTP header is the only place to obtain the status.
The HAPI status
object is described as follows:
Name | Type | Description |
---|---|---|
code | integer | Specific value indicating the category of the outcome of the request - see HAPI Status Codes. |
message | string | Human-readable description of the status - must conceptually match the intent of the integer code. |
HAPI servers must categorize the response status using at least three status codes: 1200 - OK
, 1400 - Bad Request
, and 1500 - Internal Server Error
. These are intentionally analogous to the similar HTTP codes 200 - OK
, 400 - Bad Request
, and 500 - Internal Server Error
.
HTTP code | HAPI status code | HAPI status message |
---|---|---|
200 | 1200 | OK |
400 | 1400 | Bad request - user input error |
500 | 1500 | Internal server error |
The exact wording in the HAPI message does not need to match what is shown here. The conceptual message must be consistent with the status, but the wording can differ (or be in another language, for example). If the server also includes the HAPI error message in the HTTP status message (recommended, not required), the HTTP status wording should be as similar as possible to the HAPI message wording.
The about
, capabilities
, and catalog
endpoints need only indicate 1200 - OK
or 1500 - Internal Server Error
since they do not take any request parameters. The info
and data
endpoints do take request parameters, so their status response must include 1400 - Bad Request
when appropriate.
A response of 1400 - Bad Request
must also be given when the user requests an endpoint that does not exist.
Servers may optionally provide the more specific codes in the table below. For convenience, a JSON object with these codes and messages is given in the Appendix. It is recommended but not required that a server implement this more complete set of status responses. Servers may add their own codes, but must use numbers outside the 1200
s, 1400
s, and 1500
s to avoid collisions with possible future HAPI codes.
HTTP code | HAPI status code | HAPI status message |
---|---|---|
200 | 1200 | OK |
200 | 1201 | OK - no data for time range |
400 | 1400 | Bad request - user input error |
400 | 1401 | Bad request - unknown API parameter name |
400 | 1402 | Bad request - syntax error in start time |
400 | 1403 | Bad request - syntax error in stop time |
400 | 1404 | Bad request - start equal to or after stop |
400 | 1405 | Bad request - start < startDate and/or stop > stopDate |
404 | 1406 | Bad request - unknown dataset id |
404 | 1407 | Bad request - unknown dataset parameter |
400 | 1408 | Bad request - too much time or data requested |
400 | 1409 | Bad request - unsupported output format |
400 | 1410 | Bad request - unsupported include value |
400 | 1411 | Bad request - out-of-order or duplicate parameters |
400 | 1412 | Bad request - unsupported resolve_references value |
400 | 1413 | Bad request - unsupported depth value |
500 | 1500 | Internal server error |
500 | 1501 | Internal server error - upstream request error |
Note that there is an OK status to indicate that the request was fulfilled correctly but that no data was found. This can be useful feedback to clients and users, who may otherwise suspect server problems if no data is returned.
Error `1405` implies that a HAPI server should not send data outside of the time range of availability indicated by `startDate` and `stopDate` in an `/info` response. The motivation for this is consistency between metadata and data responses and the complexities that could result if servers did not enforce this error condition (see a related issue discussion). For a frequently updated dataset, we recommend that the server set the `stopDate` to a future time if there is a desire not to update the `stopDate` in the `/info` response at the same frequency. We also recommend that the allowed start and stop dates be included in the error message associated with a `1405` error.
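For example, a `1405` response that reports the allowed time range might look like the following (the `startDate`/`stopDate` values shown in the message are illustrative, and the extra wording is optional):

```json
{
  "HAPI": "3.3",
  "status": {
    "code": 1405,
    "message": "Bad request - start < startDate and/or stop > stopDate (allowed range is 2016-01-01T00:00:00Z to 2016-02-01T00:00:00Z)"
  }
}
```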
Note also the response `1408`, which indicates that the server will not fulfill the request because it is too large. This gives a HAPI server a way to inform clients about internal limits within the server.
For errors that prevent any HAPI content from being returned (such as a `400 - Bad Request` or `500 - Internal Server Error`), the HAPI server should return a JSON object that is a HAPI header with just the status information. The JSON object should be returned even if the request was for non-JSON data. Returning server-specified content for an error response is also how HTTP servers handle error messages -- consider the custom HTML content that accompanies the `404 - Not Found` response when a server is asked for a data file that does not exist.
In cases where the server cannot create a full response (such as an `info` or `data` request for an unknown dataset), the JSON header response must include the HAPI version and a HAPI status object indicating that an error has occurred.
```json
{
  "HAPI": "3.3",
  "status": { "code": 1406, "message": "Bad request - unknown dataset id" }
}
```
For the `data` endpoint, clients may request data with no JSON header, in which case the HTTP status is the only place a client can determine the response status.
Because web servers are not required to limit HTTP return codes to those in the above table, HAPI clients should be able to handle the full range of HTTP responses. Also, the HAPI server code may not be the only software that handles a URL-based request to a HAPI server; there may be a load balancer, upstream request router, or caching mechanism in place. Therefore, it is good client-side practice to be able to handle any HTTP error.
Also, recall that in a three-digit HTTP status code, the first digit is the main one client code should examine when determining the response status. Subsequent digits give finer nuance to the error, but the exact values of the second and third digits may vary between HTTP server implementations. HAPI servers can use more specific values for these second and third digits but must keep the first digit consistent with the table above.
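A minimal client-side sketch of this practice, assuming the Python `requests` library (the server URL and dataset name are hypothetical). It branches on the first digit of the HTTP code and falls back to the HAPI status object when a JSON body is present:

```python
import requests

resp = requests.get("http://server/hapi/info", params={"dataset": "MY_MAG_DATA"})

family = resp.status_code // 100  # first digit: 2 = OK, 4 = client error, 5 = server error
if family == 2:
    print("request succeeded")
else:
    try:
        # HAPI servers should return a JSON header with status info even on error.
        status = resp.json().get("status", {})
        print("HAPI error:", status.get("code"), status.get("message"))
    except ValueError:
        # No JSON body (e.g., the response came from a load balancer or proxy).
        print("HTTP error:", resp.status_code, resp.reason)
```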
Consider the HTTP `204` status code, which represents "No Content." A HAPI server can return this code when no data was present over the time range requested, but (per HTTP rules) it must only do so when the HTTP body contains no data. A HAPI header would count as HTTP data, so the HTTP 204 code can only be sent when the client requests CSV or binary data with no header. Here is a sample HTTP status line for this case:

```
HTTP/1.1 204 OK - no data for time range
```

(Note: it may be difficult to change a server's response code message, so including the HAPI message in the status line is optional.)
Regardless of whether the server uses a more specific HTTP code, the HAPI code embedded in the HTTP message must properly indicate the HAPI status.
Because of the increasing importance of JavaScript clients that make AJAX requests, HAPI servers are strongly encouraged to implement Cross-Origin Resource Sharing (CORS). This allows AJAX requests by browser clients from any domain. Enabling CORS is common for servers with only public data, and not implementing CORS limits the types of clients that can interface with a HAPI server. For testing purposes, the following headers have been sufficient for browser clients of HAPI servers:

```
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET
Access-Control-Allow-Headers: Content-Type
```
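One way a server might attach these headers, sketched here with the Flask framework (the framework choice and the endpoint body are illustrative, not part of this specification):

```python
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_cors_headers(response):
    # Attach the CORS headers shown above to every response.
    response.headers["Access-Control-Allow-Origin"] = "*"
    response.headers["Access-Control-Allow-Methods"] = "GET"
    response.headers["Access-Control-Allow-Headers"] = "Content-Type"
    return response

@app.route("/hapi/capabilities")
def capabilities():
    # Minimal illustrative response; a real server would list its actual formats.
    return {"HAPI": "3.3", "status": {"code": 1200, "message": "OK"},
            "outputFormats": ["csv"]}
```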
When the server sees a request parameter that it does not recognize, it should respond with an error. So, given this query

```
http://server/hapi/data?dataset=DATA&start=T1&stop=T2&fields=mag_GSE&avg=5s
```

the server should respond with a status of `1400 - Bad Request` and an HTTP status of `400`, because `fields` and `avg` are not HAPI request parameters. The server could optionally be more specific with `1401 - Bad request - unknown API parameter name`, also with an HTTP code of `400`.
Following general security best practices, HAPI servers should screen incoming request parameter names and values. Unknown request parameters and values, including incorrectly formatted time values, should not be echoed in the error response.
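A minimal sketch of such screening for the `data` endpoint (the helper and its usage are illustrative; note that unknown names and values are not echoed back to the client):

```python
# Request parameter names defined by this specification for the data endpoint,
# including the deprecated HAPI 2 names.
KNOWN_PARAMS = {"dataset", "id", "start", "stop", "time.min", "time.max",
                "parameters", "include", "format"}

def screen_params(names):
    """Return (HTTP code, HAPI status object) for the given request parameter names."""
    if any(name not in KNOWN_PARAMS for name in names):
        # Deliberately avoid echoing the unknown name in the error message.
        return 400, {"code": 1401, "message": "Bad request - unknown API parameter name"}
    return 200, {"code": 1200, "message": "OK"}

# The query above would be rejected because "fields" and "avg" are unknown:
print(screen_params(["dataset", "start", "stop", "fields", "avg"]))
# -> (400, {'code': 1401, 'message': 'Bad request - unknown API parameter name'})
```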
Although HEAD requests are allowed (and required by the HTTP specification), the HAPI specification does not define any additional or new aspects for the response to a HEAD request. Note that many server frameworks respond to a HEAD request by making a GET request and omitting the body in the response, since this is a simple way to guarantee that the meta-information in the HEAD response is the same as that in the GET response (as required by the HTTP specification). As a result, HAPI server developers may want to modify this default behavior to prevent unnecessary processing for HEAD requests.
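A sketch of overriding that default behavior, again assuming Flask (`generate_csv` is a hypothetical data generator):

```python
from flask import Flask, Response, request

app = Flask(__name__)

@app.route("/hapi/data")  # Flask routes HEAD requests to GET views automatically
def data():
    if request.method == "HEAD":
        # Send the same meta-information a GET would produce, but skip
        # generating any data.
        return Response(status=200, mimetype="text/csv")
    return Response(generate_csv(), mimetype="text/csv")  # hypothetical generator
```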
- ISO 8601:2019 Date Time Format Standard, https://www.iso.org/standard/70908.html
- CSV format, https://tools.ietf.org/html/rfc4180
- JSON Format, https://tools.ietf.org/html/rfc7159; http://json-schema.org/
- IEEE Standard for Floating-Point Arithmetic, http://doi.org/10.1109/IEEESTD.2008.4610935
- Understanding JSON Schema - Structuring a Complex Schema, https://json-schema.org/understanding-json-schema/structuring.html
- Jon Vandegriff (jon.vandegriff@jhuapl.edu)
- Robert Weigel (rweigel@gmu.edu)
- Jeremy Faden (faden@cottagesystems.com)
- Sandy Antunes (sandy.antunes@jhuapl.edu)
- Doug Lindholm (doug.lindholm@lasp.colorado.edu)
- Robert Candey (Robert.M.Candey@nasa.gov)
- Bernard Harris (bernard.t.harris@nasa.gov)
For a sample landing page implementation, see https://github.com/hapi-server/server-ui
HAPI allows the use of UTF-8 encoded Unicode characters for `id`, `dataset`, and `parameters`. (`id` is used in the `/catalog` response, and `dataset` and `parameters` are used in `/info` and `/data` requests.)
Not allowed

- Comma (ASCII decimal code 44)

Recommended

- strings that match the regular expression `[_a-zA-Z][_a-zA-Z0-9]{0,30}` (see the sketch after this list) so that URL encoding is not required and names can be mapped directly to a variable name in most programming languages and a file name on modern operating systems;
- strings with any of `a-z`, `A-Z`, `-`, `.`, `_`, and `~` so that URL encoding is not required; and
- short strings - the number of bytes required to write a comma-separated list of all parameters in a dataset and all other parts of a request URL (i.e., `http://.../hapi/data?dataset=...`) should be less than 2048 bytes, which is a limit on URL length in most web browsers.

Allowed

- Any non-control Unicode characters (but we recommend against using Unicode characters that look like a comma)
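A short sketch of checking candidate names against these rules (the names tested are illustrative):

```python
import re

# The recommended identifier pattern from the list above.
RECOMMENDED = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]{0,30}")

for name in ["proton_spectrum", "b-field.x~v2", "bad,name"]:
    if RECOMMENDED.fullmatch(name):
        print(name, "-> matches the recommended pattern")
    elif "," in name:
        print(name, "-> not allowed (contains a comma)")
    else:
        print(name, "-> allowed, but may not map cleanly to a variable name")
```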
For convenience, here is the JSON object of status codes and messages referenced in the Status Codes section:

```json
{
  "1200": {"status": {"code": 1200, "message": "OK"}},
  "1201": {"status": {"code": 1201, "message": "OK - no data for time range"}},
  "1400": {"status": {"code": 1400, "message": "Bad request - user input error"}},
  "1401": {"status": {"code": 1401, "message": "Bad request - unknown API parameter name"}},
  "1402": {"status": {"code": 1402, "message": "Bad request - syntax error in start time"}},
  "1403": {"status": {"code": 1403, "message": "Bad request - syntax error in stop time"}},
  "1404": {"status": {"code": 1404, "message": "Bad request - start equal to or after stop"}},
  "1405": {"status": {"code": 1405, "message": "Bad request - start < startDate and/or stop > stopDate"}},
  "1406": {"status": {"code": 1406, "message": "Bad request - unknown dataset id"}},
  "1407": {"status": {"code": 1407, "message": "Bad request - unknown dataset parameter"}},
  "1408": {"status": {"code": 1408, "message": "Bad request - too much time or data requested"}},
  "1409": {"status": {"code": 1409, "message": "Bad request - unsupported output format"}},
  "1410": {"status": {"code": 1410, "message": "Bad request - unsupported include value"}},
  "1411": {"status": {"code": 1411, "message": "Bad request - out-of-order or duplicate parameters"}},
  "1412": {"status": {"code": 1412, "message": "Bad request - unsupported resolve_references value"}},
  "1413": {"status": {"code": 1413, "message": "Bad request - unsupported depth value"}},
  "1500": {"status": {"code": 1500, "message": "Internal server error"}},
  "1501": {"status": {"code": 1501, "message": "Internal server error - upstream request error"}}
}
```
The following two examples illustrate two different ways to represent a magnetic field dataset. The first lists a time column and three scalar data columns, Bx, By, and Bz, for the Cartesian components.
```json
{
  "HAPI": "3.3",
  "status": { "code": 1200, "message": "OK" },
  "startDate": "2016-01-01T00:00:00.000Z",
  "stopDate": "2016-01-31T24:00:00.000Z",
  "parameters": [
    { "name": "timestamp", "type": "isotime", "units": "UTC", "fill": null, "length": 24 },
    { "name": "bx", "type": "double", "units": "nT", "fill": "-1e31" },
    { "name": "by", "type": "double", "units": "nT", "fill": "-1e31" },
    { "name": "bz", "type": "double", "units": "nT", "fill": "-1e31" }
  ]
}
```
This example shows a header for the same conceptual data (time and three magnetic field components), but with the three components grouped into a one-dimensional array of size 3.
```json
{
  "HAPI": "3.3",
  "status": { "code": 1200, "message": "OK" },
  "startDate": "2016-01-01T00:00:00.000Z",
  "stopDate": "2016-01-31T24:00:00.000Z",
  "parameters": [
    { "name": "timestamp", "type": "isotime", "units": "UTC", "fill": null, "length": 24 },
    { "name": "b_field", "type": "double", "units": "nT", "fill": "-1e31", "size": [3] }
  ]
}
```
These two different representations affect how a subset of parameters could be requested from a server. The first example, by listing Bx, By, and Bz as separate parameters, allows clients to request individual components:
```
http://server/hapi/data?dataset=MY_MAG_DATA&start=2001Z&stop=2010Z&parameters=Bx
```
This request would return just a time column (always included as the first column) and a Bx column. In the second example, however, the components are all inside a single parameter named `b_field`, so a request for this parameter must always return all of its components. There is no way to request individual elements of an array parameter.
The following example shows a proton energy spectrum and illustrates the use of the `bins` element. Note also that the uncertainty of the values associated with the proton spectrum is a separate variable. There is currently no way in the HAPI spec to explicitly link a variable to its uncertainties.
{"HAPI": "3.3",
"status": { "code": 1200, "message": "OK"},
"startDate": "2016-01-01T00:00:00.000Z",
"stopDate": "2016-01-31T24:00:00.000Z",
"parameters": [
{ "name": "Time",
"type": "isotime",
"units": "UTC",
"fill": null,
"length": 24
},
{ "name": "qual_flag",
"type": "int",
"units": null,
"fill": null
},
{ "name": "maglat",
"type": "double",
"units": "degrees",
"fill": null,
"description": "magnetic latitude"
},
{ "name": "MLT",
"type": "string",
"length": 5,
"units": "hours:minutes",
"fill": "??:??",
"description": "magnetic local time in HH:MM"
},
{ "name": "proton_spectrum",
"type": "double",
"size": [3],
"units": "particles/(sec ster cm^2 keV)",
"fill": "-1e31",
"bins": [ {
"name": "energy",
"units": "keV",
"centers": [ 15, 25, 35 ],
} ],
{ "name": "proton_spectrum_uncerts",
"type": "double",
"size": [3],
"units": "particles/(sec ster cm^2 keV)",
"fill": "-1e31",
"bins": [ {
"name": "energy",
"units": "keV",
"centers": [ 15, 25, 35 ],
} ]
}
]
}
This shows how `ranges` can specify the bins:
```json
{
  "HAPI": "3.3",
  "status": { "code": 1200, "message": "OK" },
  "startDate": "2016-01-01T00:00:00.000Z",
  "stopDate": "2016-01-31T24:00:00.000Z",
  "parameters": [
    {
      "length": 24,
      "name": "Time",
      "type": "isotime",
      "fill": null,
      "units": "UTC"
    },
    {
      "bins": [{
        "ranges": [
          [ 0, 30 ],
          [ 30, 60 ],
          [ 60, 90 ],
          [ 90, 120 ],
          [ 120, 150 ],
          [ 150, 180 ]
        ],
        "units": "degrees"
      }],
      "fill": "-1e31",
      "name": "pitchAngleSpectrum",
      "size": [6],
      "type": "double",
      "units": "particles/sec/cm^2/ster/keV"
    }
  ]
}
```
Processes that introduce traffic to servers that does not represent immediate use for science purposes should be identifiable by the servers -- for example, an indexing service that queries a server for metadata, or a process that verifies that a server is responsive. Robot clients making such requests should set the `User-Agent` header. This header should identify both an id and a URL that describes the bot. For example, `hapibot-a` pings all HAPI servers each hour to see whether each is responsive and sets the `User-Agent` header to

```
hapibot-a/1.0; https://github.com/hapi-server/data-specification/wiki/hapi-bots.md#hapibot-a
```

Note that the use of the wiki page to describe bots is encouraged.
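A minimal sketch of such a robot request, assuming the Python `requests` library (the server URL is hypothetical):

```python
import requests

# Identify the bot with an id and a URL describing it, following the example above.
headers = {
    "User-Agent": "hapibot-a/1.0; "
                  "https://github.com/hapi-server/data-specification/wiki/hapi-bots.md#hapibot-a"
}
resp = requests.get("http://server/hapi/capabilities", headers=headers)
print(resp.status_code)
```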
For each of the FAIR principles listed below (copied from https://www.go-fair.org/fair-principles/), we describe its relationship with the HAPI specification.
Some aspects of HAPI, which is an API and metadata standard, directly address FAIR; however, some FAIR principles must be addressed by an external service or the data provider. As such, HAPI supports FAIR principles to the extent that the principles are within its scope.
The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.
1. (Meta)data are assigned a globally unique and persistent identifier
The `resourceID` attribute can be used for a globally unique and persistent identifier. Alternatively,

- If each HAPI dataset does not have a globally unique and persistent identifier but the server does, a data provider can use the `resourceID` in the `/about` response (but data providers are encouraged to have dataset-level `resourceID`s).
- If a dataset is associated with more than one identifier, a data provider can create a dataset of identifiers and serve this dataset as a time series.
2. Data are described with rich metadata (defined by Reusable, item 1. below)
Reusable, item 1: (Meta)data are richly described with a plurality of accurate and relevant attributes
The HAPI metadata specification has accurate and relevant attributes; the data provider needs to ensure the attribute values accurately describe the data and include information required for interpretation.
3. Metadata clearly and explicitly include the identifier of the data they describe
The HAPI metadata specification requires an identifier (`id`) for every dataset. The list of all `id`s is returned by a `/catalog` request. The `id` is also used as the `dataset` request parameter in the URL of an `/info` or `/data` request. (The HAPI `/info` response does not contain the `id` because we have generally avoided duplicating metadata in responses from different endpoints.)
4. (Meta)data are registered or indexed in a searchable resource
This is outside the scope of the HAPI specification. However, there is a way to explore all known HAPI servers at https://hapi-server.org/servers/. We also work with other projects that address registration, indexing, and searching.
Once the user finds the required data, they need to know how they can be accessed, possibly including authentication and authorization.
1. (Meta)data are retrievable by their identifier using a standardized communications protocol
All HAPI endpoints use the HTTP protocol, and HAPI metadata is JSON. The `/info` endpoint takes the dataset `id` and returns JSON metadata. The `/data` endpoint also takes the `id` and returns data in CSV, JSON, or binary form.
2. The protocol is open, free, and universally implementable
HAPI delivers JSON metadata and well-structured data over HTTP(S). The schema for the data and metadata is free, open, and can be implemented in any programming language.
3. The protocol allows for an authentication and authorization procedure, where necessary
This is out of scope for HAPI, which was designed to access open data. The HAPI specification explicitly does not include an option for authentication. Access restrictions can still be implemented using authentication mechanisms outside or independent of HAPI.
4. Metadata are accessible, even when the data are no longer available
This is outside the scope of the HAPI specification. However, an affiliated HAPI project caches metadata from all known HAPI servers nightly.
The data usually needs to be integrated with other data. In addition, the data needs to interoperate with applications or workflows for analysis, storage, and processing.
1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
HAPI metadata are in JSON, and JSON schemas are available for validation. HAPI data are transmitted as JSON or Comma-Separated Values (CSV), both of which are widely used. (HAPI servers may also offer a simple binary format, which uses the IEEE standard for floating-point numbers and a layout that mimics the CSV output.)
2. (Meta)data use vocabularies that follow FAIR principles
HAPI metadata does not use vocabularies directly, but links can be made to external metadata that uses vocabularies (see next item).
3. (Meta)data include qualified references to other (meta)data
Other metadata can be referenced using `additionalMetadata`.
The ultimate goal of FAIR is to optimize the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
1. (Meta)data are richly described with a plurality of accurate and relevant attributes
This is satisfied by the HAPI specification.
2. (Meta)data are released with a clear and accessible data usage license
This can be satisfied by using the `license` attribute.
3. (Meta)data are associated with detailed provenance
This can be satisfied by using the `provenance` attribute.
4. (Meta)data meet domain-relevant community standards
HAPI is built using the widely adopted RESTful approach to web-accessible resources. We use JSON in a way that is common in the community. The time standard is a subset of the ISO 8601 standard for time strings. The design of the HAPI API for requesting and receiving data followed from an analysis of the APIs of many time series data providers, and HAPI is a standard that provides a common set of features.