Help: availability v.1

Description

The irisws-availability web service returns detailed time span information describing which timeseries data is available in the DMC archive.

There are two service query methods:

/extent

Produces lists of available time extents (earliest to latest) for selected channels (network, station, location, channel and quality).

/query

Produces lists of contiguous time spans for selected channels (network, station, location, channel and quality) and time ranges.
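
Both methods are reached with ordinary HTTP GET requests under http://service.iris.edu/irisws/availability/1/. The Python function below is a minimal sketch that builds such a URL from a method name and keyword parameters using only the standard library. availability_url is a hypothetical helper. The orderby and rowlimit values reproduce the /extent link given under Memory Limitations. The net, sta and cha parameter names in the second example are assumptions about the GET interface and are not documented on this page.

```python
from urllib.parse import urlencode

# Base URL taken from the examples on this page.
BASE_URL = "http://service.iris.edu/irisws/availability/1"


def availability_url(method, **params):
    """Build a GET URL for the /extent or /query method.

    Parameters are URL-encoded in the order given; wildcard characters
    such as '?' and '*' are percent-encoded.
    """
    if method not in ("extent", "query"):
        raise ValueError("method must be 'extent' or 'query'")
    url = f"{BASE_URL}/{method}"
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    # Reproduces the /extent link quoted under Memory Limitations.
    print(availability_url("extent", orderby="timespancount_desc", rowlimit=500))
    # Hypothetical parameter names (net, sta, cha): check the service's
    # own parameter list before relying on them.
    print(availability_url("query", net="IU", sta="ANMO", cha="BH?", format="text"))
```

urlencode percent-encodes wildcards (BH? becomes BH%3F), and the server decodes them back to the original characters.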

Limitations

Cache Latency

To keep response times short, the irisws-availability service uses a cache of assembled timespan information. The cache is derived from an internal database that tracks miniSEED data in the DMC archive, and it sutures together the time segment information recorded in that database. The cache takes over an hour to assemble and is refreshed several times per day.

The vast majority of the data in the miniSEED archive does not change between cache refreshes; however, there will always be some disagreement between the cache used by the web service and the archive.

Realtime Data

The irisws-availability service only catalogs data in the archive, not data in the realtime system (BUD). Consequently, it is generally not useful for querying data availability close to realtime. It usually takes between 4 and 26+ hours for data to be copied from the BUD into the DMC archive. Data is archived in 24-hour segments by GMT day, so data from just before the end of a GMT day reaches the archive sooner than data from just after the start of a GMT day.
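
Whole GMT days are archived as a unit. So whether data has likely reached the archive depends on which GMT day it falls in, not only on how old it is. The Python sketch below estimates whether data recorded at a given time has probably been archived. The function names and the default 4-hour margin are assumptions for illustration; the margin echoes the lower bound quoted above. Actual delays vary and can exceed a day, so the result is a hint, not a guarantee.

```python
from datetime import datetime, timedelta, timezone


def gmt_day_end(t):
    """Return the UTC midnight that closes the GMT day containing t.

    t must be timezone-aware; astimezone() would treat a naive value as local time.
    """
    day_start = t.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return day_start + timedelta(days=1)


def probably_archived(t, now, margin=timedelta(hours=4)):
    """Heuristic: has the GMT day containing t closed and had time to archive?

    Assumption: a day is copied from the BUD as a unit roughly `margin`
    after it ends. True is a hint, not proof; False means the
    availability service is unlikely to list that data yet.
    """
    return now >= gmt_day_end(t) + margin


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 5, 0, tzinfo=timezone.utc)
    late = datetime(2024, 1, 1, 23, 30, tzinfo=timezone.utc)   # end of a GMT day
    early = datetime(2024, 1, 2, 0, 30, tzinfo=timezone.utc)   # start of the next day
    print(probably_archived(late, now), probably_archived(early, now))  # True False
```

The two example times are only an hour apart. Even so, the later one must wait for the whole of 2 January to close, which is the asymmetry described above.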

Memory Limitations

A small number of channels cannot be processed by the irisws-availability service’s /query method because they have too many timespans to load into memory. The maximum processable limit is currently set to 1,000,000 timespans; any channel with more timespans than this cannot be processed by the service. The link http://service.iris.edu/irisws/availability/1/extent?orderby=timespancount_desc&rowlimit=500 shows the top 500 channels sorted by number of timespans. As can be seen, only a comparatively small number of channels (fewer than 100) cannot be processed.

By default, channels with too many timespans are ignored by the /query method. Using the excludetoolarge=true option causes an HTTP 413 (request too large) status to be returned if any of these channels are selected.
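
A client may prefer to fail loudly rather than silently lose a channel. It can send excludetoolarge=true and treat 413 as a distinct error. The Python sketch below does this with urllib from the standard library. The query_strict function and TooManyTimespans exception are hypothetical names. Any channel parameter names you put in params, such as net or cha, are assumptions about the GET interface.

```python
import urllib.error
import urllib.parse
import urllib.request

# Endpoint as given on this page.
QUERY_URL = "http://service.iris.edu/irisws/availability/1/query"


class TooManyTimespans(Exception):
    """A selected channel exceeds the service's timespan limit (HTTP 413)."""


def query_strict(params, timeout=60.0):
    """GET /query with excludetoolarge=true and return the body as text.

    A 413 response is converted to TooManyTimespans; other HTTP errors
    propagate unchanged as urllib.error.HTTPError.
    """
    params = dict(params, excludetoolarge="true")
    url = QUERY_URL + "?" + urllib.parse.urlencode(params)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 413:
            raise TooManyTimespans(url) from err
        raise
```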

Restricted Data

Currently, the irisws-availability service does not take into account whether data is restricted. The service may show data as available even though the actual timeseries data is not available to general, non-authenticated users. Note that if a request for restricted data is made via the fdsnws-dataselect web service and the request is not authenticated, or the authentication does not grant access to the data, the data will simply appear not to exist. The web service will not return an access-denied error code (403). This can be confusing if you are not aware of this detail. A future release of the irisws-availability service will take data restrictions into account.

Missing Metadata

There is a small amount of timeseries miniSEED data in the archive for which there is no corresponding metadata. Currently, the irisws-availability service will show this data as available, but attempting to request it using a tool such as fdsnws-dataselect may not work.

Timespan Merging Logic

When the cache of assembled timespans is compiled, timespans from identical network, station, location, channel, sample-rate and quality tuples are merged together where possible.

As illustrated in the following figure, timespan A can be merged with timespan B if the start of B falls within the window from End-of-A + 1/2 sample period to End-of-A + 3/2 sample periods.

By default, timespans from identical network, station, location, channel and sample-rate tuples but different qualities are merged together using the logic above. This can be disabled by setting mergequality=false. If mergesamplerate=true is chosen, the same logic is also applied to timespans with different sample rates, with the sample period taken from timespan B.
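
The merge rule can be expressed as a predicate on two timespans. In this Python sketch, times are plain floats in seconds. Both edges of the window are treated as inclusive, which the description above leaves open. The Timespan type and the function names are illustrative, not part of the service. As described for mergesamplerate, the sample period is taken from timespan B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timespan:
    """Illustrative timespan record; start and end are seconds as floats."""
    net: str
    sta: str
    loc: str
    cha: str
    sample_rate: float  # samples per second
    quality: str
    start: float
    end: float


def in_merge_window(a, b):
    """True if b starts between end(a) + T/2 and end(a) + 3T/2.

    T is the sample period of b; treating both edges as inclusive is an
    assumption.
    """
    period = 1.0 / b.sample_rate
    return a.end + 0.5 * period <= b.start <= a.end + 1.5 * period


def can_merge(a, b, mergequality=True, mergesamplerate=False):
    """Check the tuple-matching rules, then the time window.

    Defaults mirror the text: qualities merge unless mergequality=false,
    and sample rates merge only with mergesamplerate=true.
    """
    if (a.net, a.sta, a.loc, a.cha) != (b.net, b.sta, b.loc, b.cha):
        return False
    if not mergesamplerate and a.sample_rate != b.sample_rate:
        return False
    if not mergequality and a.quality != b.quality:
        return False
    return in_merge_window(a, b)
```

With a 1 Hz channel ending at 100.0 s, a span starting at 101.0 s merges. Spans starting at 100.2 s or 102.0 s do not, because they fall outside the 100.5 to 101.5 s window.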

In general, the distribution of timespans for a network, station, location, channel, sample-rate and quality tuple can be quite complicated. This is illustrated in this figure:

If mergeoverlap=true is selected, timespans that overlap in time will be merged together. Timespans that are separated by less than 1/2 sample period will also be merged.

The mergetolerance=<seconds> option will suture together timespans that are separated by no more than the given time. It can only be used with mergeoverlap=true.
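
mergeoverlap and mergetolerance can be sketched as a single pass over timespans sorted by start time. This Python version works on (start, end) pairs in seconds for one tuple with a fixed sample period. A span is merged into the previous one if it starts no later than 3/2 sample periods after the previous end. That covers overlaps, gaps under 1/2 sample period, and the normal merge window. The text does not say how mergetolerance combines with these rules. Letting the tolerance widen the limit, rather than replace it, is an assumption, and merge_overlapping is a hypothetical name.

```python
def merge_overlapping(spans, sample_period, tolerance=None):
    """Merge (start, end) pairs as described for mergeoverlap=true.

    A span joins the previous one if it starts no later than
    end + 3/2 * sample_period (overlaps, gaps under T/2 and the normal
    merge window). A tolerance in seconds, as with mergetolerance, raises
    that limit; how the service combines the two is assumed here.
    """
    limit = 1.5 * sample_period
    if tolerance is not None:
        limit = max(limit, tolerance)
    merged = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= limit:
            # max() keeps the later end, so spans nested inside are absorbed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(span) for span in merged]
```

With a 1-second sample period, [(0, 10), (5, 8), (10.5, 20), (25, 30)] collapses to [(0, 20), (25, 30)]. Adding tolerance=6 also closes the 5-second gap, giving a single span (0, 30).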

/query and /extent POST queries

/query POST

Requests submitted to /query via HTTP POST allow for arbitrary combinations of network, station, location, channel and time window. All of the parameters that can be submitted with the GET method are allowed in POST. The general form of a POST request is parameter=value pairs, one per line, followed by an arbitrary number of channel and time window selection lines:

parameter=<value>
parameter=<value>
parameter=<value>
<network> <station> <location> <channel> <starttime> <endtime>
<network> <station> <location> <channel> <starttime> <endtime>
...

Example:

$ cat availability.request
mergequality=true
mergesamplerate=true
mergeoverlap=true
format=text
TA A25A -- BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO * BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO 10 HHZ 2010-03-25T00:00:00 2010-04-01T00:00:00
II KURK 00 BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
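
A selection file in this format can also be generated programmatically. The Python function below, build_post_body, is a hypothetical helper. It writes the parameter lines first and then one space-separated line per selection. Passing four fields per selection instead of six gives the /extent form described in the next section.

```python
def build_post_body(params, selections):
    """Return a POST body: parameter=value lines, then selection lines.

    Each selection is a sequence of strings, e.g.
    (network, station, location, channel, starttime, endtime) for /query.
    """
    lines = [f"{key}={value}" for key, value in params.items()]
    lines += [" ".join(fields) for fields in selections]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_post_body(
        {"mergequality": "true", "mergesamplerate": "true",
         "mergeoverlap": "true", "format": "text"},
        [
            ("TA", "A25A", "--", "BH?", "2010-03-25T00:00:00", "2010-04-01T00:00:00"),
            ("IU", "ANMO", "*", "BH?", "2010-03-25T00:00:00", "2010-04-01T00:00:00"),
        ],
    )
    # Save this output as availability.request for use with wget or curl.
    print(body, end="")
```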

/extent POST

Requests submitted to /extent via HTTP POST allow for arbitrary combinations of network, station, location and channel. All of the parameters that can be submitted with the GET method are allowed in POST. The general form of a POST request is parameter=value pairs, one per line, followed by an arbitrary number of channel selection lines:

parameter=<value>
parameter=<value>
parameter=<value>
<network> <station> <location> <channel>
<network> <station> <location> <channel>
...

Example:

$ cat extent.request
format=geocsv
quality=M
TA A25A -- BH?
IU ANMO * BH?
IU ANMO 10 HHZ
II KURK 00 BH?

Submitting POST request files via wget and curl

Requests can be made with a selection file using either the wget or curl Unix command line utilities. The commands below POST the selection file to the server and save the results in a text file:

$ wget --post-file=availability.request -O availability.txt http://service.iris.edu/irisws/availability/1/query
$ curl -L --data-binary @availability.request -o availability.txt http://service.iris.edu/irisws/availability/1/query
$ wget --post-file=extent.request -O extents.txt http://service.iris.edu/irisws/availability/1/extent
$ curl -L --data-binary @extent.request -o extents.txt http://service.iris.edu/irisws/availability/1/extent

We recommend always using the -L option to allow curl to follow HTTP redirections specified by our systems. The DMC uses HTTP redirection during maintenance to keep servicing requests.

When using curl, you may wish to use the -f option. This will cause curl to return an exit code of 22 if data is not found or the request is improperly formatted. See http://curl.haxx.se/docs/manpage.html for more information.
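
The same POST can be sent from Python's standard library. In this sketch, make_post_request and submit are hypothetical names. urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, which plays a role similar to curl's -f. One caveat: urllib's default redirect handling does not re-send a POST body. Unlike curl -L, it may therefore fail during the maintenance redirects mentioned above.

```python
import urllib.request

QUERY_URL = "http://service.iris.edu/irisws/availability/1/query"
EXTENT_URL = "http://service.iris.edu/irisws/availability/1/extent"


def make_post_request(url, body):
    """Build a POST request carrying the selection body verbatim.

    Supplying data= makes urllib use POST; like curl --data-binary
    without -H, the body is sent as application/x-www-form-urlencoded.
    """
    return urllib.request.Request(url, data=body.encode("utf-8"))


def submit(url, body, timeout=300.0):
    """POST body to url and return the response text.

    HTTP error statuses raise urllib.error.HTTPError instead of being
    returned, similar in spirit to curl -f.
    """
    with urllib.request.urlopen(make_post_request(url, body), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, reading availability.request into a string and passing it to submit(QUERY_URL, ...) mirrors the first curl command above. EXTENT_URL does the same for extent.request.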