Help: availability v.1

Description

The irisws-availability web service returns detailed time span information for timeseries data available in the DMC archive.

There are two service query methods:

/extent

Produces lists of available time extents (earliest to latest) for selected channels (network, station, location, channel and quality).

/query

Produces lists of contiguous time spans for selected channels (network, station, location, channel and quality) and time ranges.

Sample queries

/extent Sample queries

Extent information for all network IU, station ANMO channels in text format (default)
http://service.iris.edu/irisws/availability/1/extent?network=IU&station=ANMO

Extent information for all network IU, station ANMO channels in JSON format
http://service.iris.edu/irisws/availability/1/extent?network=IU&station=ANMO&format=json

Extent information for all network IU channels, sorted by number of timespans in descending order, limited to 100 rows (default)
http://service.iris.edu/irisws/availability/1/extent?network=IU&orderby=timespancount_desc&rowlimit=100

Any channel with more than 1,000,000 timespans cannot be processed by the /query method. This query reveals which channels cannot be processed (i.e., those with more than 1 million timespans):
http://service.iris.edu/irisws/availability/1/extent?network=*&orderby=timespancount_desc&rowlimit=500

Extent information for all network IU channels, sorted by latest update date, limited to 100 rows (default)
http://service.iris.edu/irisws/availability/1/extent?network=IU&orderby=latestupdate&rowlimit=100

/query Sample queries

Demonstrations of wildcard and multiple selections via CSV (comma separated values)

All BH channels for a station
http://service.iris.edu/irisws/availability/1/query?start=2010-02-23&end=2011-02-23&network=IU&station=ANMO&channel=BH?

Network IU, stations ANMO and BILL, location 00, channels BH1 and BHE
http://service.iris.edu/irisws/availability/1/query?start=2010-02-23&end=2010-04-20&network=IU&location=00&station=ANMO,BILL&channel=BH1,BHE

Note: the , (comma) and ? (question mark) characters may be displayed as %2C and %3F after you click on the previous two links.
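
For scripted access, GET URLs like the samples above can be assembled with Python's standard urllib.parse module. This is a minimal sketch, not an official client: the helper name availability_url is hypothetical, and it relies on the service accepting percent-encoded commas and question marks, which is what the browser links in the note above end up sending.

```python
from urllib.parse import urlencode

# Base URL taken from the sample queries in this document.
BASE_URL = "http://service.iris.edu/irisws/availability/1"


def availability_url(method: str, **params: str) -> str:
    """Build a GET URL for the /extent or /query method (hypothetical helper).

    Parameter names are passed through unchanged, so they must match the
    service's own names (network, station, channel, start, end, ...).
    """
    if method not in ("extent", "query"):
        raise ValueError("method must be 'extent' or 'query'")
    # urlencode percent-encodes ',' as %2C and '?' as %3F.
    return f"{BASE_URL}/{method}?{urlencode(params)}"


url = availability_url(
    "query",
    start="2010-02-23",
    end="2010-04-20",
    network="IU",
    location="00",
    station="ANMO,BILL",
    channel="BH1,BHE",
)
print(url)
```

Keyword arguments keep their order, so the generated query string lists parameters in the order they are passed.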

Demonstrations of merging

Channel with changing sample rates
http://service.iris.edu/irisws/availability/1/query?nodata=404&network=TA&station=134A&channel=VM1

Same as previous with sample rates merged
http://service.iris.edu/irisws/availability/1/query?nodata=404&network=TA&station=134A&channel=VM1&mergesamplerate=true

Same as previous with overlaps merged
http://service.iris.edu/irisws/availability/1/query?nodata=404&network=TA&station=134A&channel=VM1&mergesamplerate=true&mergeoverlap=true

Same as previous with gaps of one day or less merged
http://service.iris.edu/irisws/availability/1/query?nodata=404&network=TA&station=134A&channel=VM1&mergesamplerate=true&mergeoverlap=true&mergetolerance=86400.0

Demonstration of memory limitation behavior

Two queries demonstrating behavior when too many timespans are present for processing. (See Memory Limitations for more information.)

This query reports no data available because the selected station contains too many timespans:
http://service.iris.edu/irisws/availability/1/query?network=GB&station=DYA&nodata=404

Identical query, but with excludetoolarge=true (the default) explicitly set:
http://service.iris.edu/irisws/availability/1/query?network=GB&station=DYA&nodata=404&excludetoolarge=true

This query reports Error 413 (request too large) because excludetoolarge=false is set:
http://service.iris.edu/irisws/availability/1/query?network=GB&station=DYA&nodata=404&excludetoolarge=false

HTTP POST queries

The /extent and /query methods can also be accessed via HTTP POST. All of the parameters that can be submitted with the GET method are allowed in POST.

/extent POST

Requests submitted to /extent via HTTP POST allow for arbitrary combinations of network, station, location and channel. The general form of a POST request is parameter=value pairs, one per line, followed by an arbitrary number of channel selection lines:

parameter=<value>
parameter=<value>
parameter=<value>
Net Sta Loc Chan
Net Sta Loc Chan
...

Example:

$ cat extent.request
format=geocsv
quality=M
TA A25A -- BH?
IU ANMO * BH?
IU ANMO 10 HHZ
II KURK 00 BH?
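
When many channels are involved, the request file can be generated rather than typed. The sketch below rebuilds the example above; the helper name extent_post_body is hypothetical, and it simply follows the layout described here (parameter lines first, then one Net Sta Loc Chan line per selection).

```python
from typing import Dict, List, Tuple

Selection = Tuple[str, str, str, str]  # (Net, Sta, Loc, Chan)


def extent_post_body(params: Dict[str, str], selections: List[Selection]) -> str:
    """Build an /extent POST body (hypothetical helper).

    parameter=value lines come first, then one 'Net Sta Loc Chan' line per
    selection; '--' stands for an empty location code, as in the example.
    """
    lines = [f"{key}={value}" for key, value in params.items()]
    lines += [" ".join(selection) for selection in selections]
    return "\n".join(lines) + "\n"


body = extent_post_body(
    {"format": "geocsv", "quality": "M"},
    [
        ("TA", "A25A", "--", "BH?"),
        ("IU", "ANMO", "*", "BH?"),
        ("IU", "ANMO", "10", "HHZ"),
        ("II", "KURK", "00", "BH?"),
    ],
)
print(body, end="")
```

Writing body to extent.request produces a file that can be submitted with the wget or curl commands in the section on submitting POST request files.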

/query POST

Requests submitted to /query via HTTP POST allow for arbitrary combinations of network, station, location, channel and time window. The general form of a POST request is parameter=value pairs, one per line, followed by an arbitrary number of channel and time window selection lines:

parameter=<value>
parameter=<value>
parameter=<value>
Net Sta Loc Chan StartTime EndTime
Net Sta Loc Chan StartTime EndTime
...

Example:

$ cat availability.request
mergequality=true
mergesamplerate=true
mergeoverlap=true
format=text
TA A25A -- BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO * BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO 10 HHZ 2010-03-25T00:00:00 2010-04-01T00:00:00
II KURK 00 BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
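
A /query request file differs only in the two trailing time columns. As a sketch under the same assumptions as the /extent case (query_post_body is a hypothetical helper), the example above can be regenerated from Python datetime objects:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (Net, Sta, Loc, Chan, StartTime, EndTime)
QuerySelection = Tuple[str, str, str, str, datetime, datetime]


def query_post_body(params: Dict[str, str], selections: List[QuerySelection]) -> str:
    """Build a /query POST body (hypothetical helper).

    For naive datetimes with whole seconds, isoformat() yields the
    YYYY-MM-DDTHH:MM:SS form used in the example request.
    """
    lines = [f"{key}={value}" for key, value in params.items()]
    for net, sta, loc, chan, start, end in selections:
        lines.append(f"{net} {sta} {loc} {chan} {start.isoformat()} {end.isoformat()}")
    return "\n".join(lines) + "\n"


start = datetime(2010, 3, 25)
end = start + timedelta(days=7)  # 2010-04-01T00:00:00
body = query_post_body(
    {"mergequality": "true", "mergesamplerate": "true", "mergeoverlap": "true", "format": "text"},
    [
        ("TA", "A25A", "--", "BH?", start, end),
        ("IU", "ANMO", "*", "BH?", start, end),
        ("IU", "ANMO", "10", "HHZ", start, end),
        ("II", "KURK", "00", "BH?", start, end),
    ],
)
print(body, end="")
```

Timezone-aware datetimes would gain a "+00:00" suffix from isoformat(), so naive UTC values are used here to match the example exactly.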

Submitting POST request files via wget and curl

Requests can be made with a selection file using either the wget or curl Unix command-line utilities. The commands below POST the selection file to the server and save the results in a text file:

$ wget --post-file=availability.request -O availability.txt http://service.iris.edu/irisws/availability/1/query
$ curl -L --data-binary @availability.request -o availability.txt http://service.iris.edu/irisws/availability/1/query
$ wget --post-file=extent.request -O extents.txt http://service.iris.edu/irisws/availability/1/extent
$ curl -L --data-binary @extent.request -o extents.txt http://service.iris.edu/irisws/availability/1/extent
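
The same POST can be made from Python with the standard urllib.request module. This is a sketch, not a supported client: build_post and fetch are hypothetical names, and urllib's default redirect handling does not re-send a POST body on every kind of redirect, so it is not a full substitute for curl -L.

```python
import urllib.request

# Base URL taken from the wget/curl commands above.
BASE_URL = "http://service.iris.edu/irisws/availability/1"


def build_post(method: str, body: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST of a selection body to /extent or /query."""
    if method not in ("extent", "query"):
        raise ValueError("method must be 'extent' or 'query'")
    # Supplying data makes urllib issue a POST; the body is sent unchanged,
    # much like curl --data-binary.
    return urllib.request.Request(f"{BASE_URL}/{method}", data=body.encode("utf-8"))


def fetch(method: str, body: str, timeout: float = 300.0) -> str:
    """Send the request and return the response text (needs network access)."""
    with urllib.request.urlopen(build_post(method, body), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage, equivalent in spirit to the first curl command above:
# with open("availability.request") as f:
#     print(fetch("query", f.read()))
```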

We recommend always using the -L option to allow curl to follow HTTP redirections specified by our systems. The DMC uses HTTP redirection during maintenance to keep servicing requests.

When using curl, you may wish to use the -f option. This will cause curl to return an exit code of 22 if data is not found or the request is improperly formatted. See http://curl.haxx.se/docs/manpage.html for more information.

Limitations

Cache Latency

For performance, the irisws-availability service uses a cache of assembled timespan information. The cache is derived from an internal database that tracks miniSEED data in the DMC archive, and it sutures together the time segment information recorded in that database. The cache takes over an hour to assemble and is refreshed several times per day.

The vast majority of the data in the miniSEED archive does not change between cache refreshes; however, there will always be some disagreement between the cache used by the web service and the archive.

Realtime Data

The irisws-availability service only catalogs data in the archive, not data in the realtime system (BUD). Consequently, it is generally not useful for querying data availability close to realtime. It usually takes between 4 and 26+ hours for data to be copied from the BUD into the DMC archive. Data is archived in 24-hour segments by GMT day, so data from just before the end of a GMT day reaches the archive sooner than data from just after the start of a GMT day.
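
A client can flag results whose time window ends too recently to trust. The sketch below uses the 26-hour figure quoted above as a rough cutoff; since delays can exceed it, the constant and the helper may_be_incomplete are illustrative assumptions, not service behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Upper end of the typical BUD-to-archive delay quoted above; real delays can be longer.
TYPICAL_MAX_LAG = timedelta(hours=26)


def may_be_incomplete(end: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a window ending at `end` (timezone-aware, UTC) is recent
    enough that the archive, and hence this service, may not yet hold all of it."""
    if now is None:
        now = datetime.now(timezone.utc)
    return end > now - TYPICAL_MAX_LAG
```

A False result only means the window is older than the typical lag; it does not guarantee the data has been archived.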

Memory Limitations

A small number of channels cannot be processed by the irisws-availability service’s /query method because they have too many timespans to load into memory. The maximum processable limit is currently set to 1,000,000 timespans; any channel with more timespans than this cannot be processed by the service. The query http://service.iris.edu/irisws/availability/1/extent?orderby=timespancount_desc&rowlimit=500 shows the top 500 channels sorted by number of timespans. As it shows, only a comparatively small number of channels (fewer than 100) cannot be processed.

By default, channels with too many timespans are ignored by the /query method. Setting excludetoolarge=false causes an HTTP 413 (request too large) error to be returned if any of these channels are selected.
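
Scripts that run the memory-limitation queries shown earlier need to tell "no data" apart from "too large". The sketch below maps status codes to outcomes; describe_status and query_status are hypothetical names, and treating 204 as the default no-data response (when nodata=404 is not given) is an assumption based on FDSN-style conventions, not something stated in this document.

```python
import urllib.error
import urllib.request


def describe_status(code: int) -> str:
    """Map an HTTP status from /query to a short explanation (hypothetical helper)."""
    if code == 200:
        return "ok"
    if code in (204, 404):
        # 404 signals "no data" when nodata=404 is set (a mistyped URL also
        # yields 404); 204 is assumed to be the default no-data response.
        return "no data"
    if code == 413:
        # Returned with excludetoolarge=false when a selected channel has
        # more timespans than the service can load into memory.
        return "request too large"
    return f"unexpected HTTP status {code}"


def query_status(url: str, timeout: float = 300.0) -> str:
    """Issue a GET and describe the outcome (needs network access)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_status(resp.status)
    except urllib.error.HTTPError as err:
        return describe_status(err.code)
```

With the GB.DYA examples above, the default request would be expected to yield "no data" and the excludetoolarge=false request "request too large".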

Restricted Data

Currently, the irisws-availability service does not take into account whether data is restricted. The service may show data as available, but the actual timeseries data may not be available to general, non-authenticated users. Note that if a request for restricted data is made via the fdsnws-dataselect web service and the request is not authenticated, or the authentication does not allow access to the data, the data will simply appear not to exist. The web service will not return an access-denied error code (403). This can be confusing if you are not aware of this detail. A future release of the irisws-availability service will take into account the restriction of data.

Missing Metadata

There is a small amount of timeseries miniSEED data in the archive for which there is no corresponding metadata. Currently, the irisws-availability service will show this data as available, but attempting to request it using a tool such as fdsnws-dataselect may not work.

Latest Update-Date Inaccuracies

In some circumstances, the latest update-dates reported by the /query method may be later (but never earlier) than their actual values.

This is a result of how the irisws-availability service catalogs these dates in its internal cache. In the cache, latest update-dates are stored per GMT calendar day for each network, station, location, channel, quality and sample-rate tuple. Because of the way most data is archived, this caching method usually reports accurate update-dates. However, if the requested time segment does not cover the part of a day that was most recently loaded, the reported time may be later than its actual value.
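
The effect can be reproduced with a toy model of a per-day cache. This is a simplified illustration, not the service's code: the names record_load and reported_latest_update are hypothetical, and a single string label stands in for the full network, station, location, channel, quality and sample-rate tuple.

```python
from datetime import date, datetime
from typing import Dict, List, Tuple

# One latest update-date per (channel tuple, GMT day); a load touching any
# part of a day can only move that day's entry forward in time.
cache: Dict[Tuple[str, date], datetime] = {}


def record_load(channel: str, day: date, loaded_at: datetime) -> None:
    key = (channel, day)
    cache[key] = max(cache.get(key, loaded_at), loaded_at)


def reported_latest_update(channel: str, days: List[date]) -> datetime:
    """Latest update-date reported for a request touching the given GMT days."""
    return max(cache[(channel, d)] for d in days)


chan = "IU.ANMO.00.BHZ.M"  # hypothetical tuple label (sample rate omitted)
day = date(2010, 3, 25)
record_load(chan, day, datetime(2010, 3, 26, 6, 0))  # morning segment loaded
record_load(chan, day, datetime(2010, 3, 27, 6, 0))  # afternoon segment loaded later
# A request covering only the morning still sees the later load time.
print(reported_latest_update(chan, [day]))  # 2010-03-27 06:00:00
```

Because each day's entry is a maximum, the reported value can overshoot the true update time of a partial-day request but never undershoot it.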

Most data is loaded into the archive in a way that makes this rarely an issue. It is worth emphasizing that this behavior never results in update-dates being reported as earlier than their actual values, only later.

Timespan Merging Logic

When the cache of assembled timespans is compiled, timespans from identical network, station, location, channel, sample-rate and quality tuples are merged together where possible.

As illustrated in the following figure, timespan A can be merged with timespan B if the start of B falls within the window from End-of-A + 1/2 sample period to End-of-A + 3/2 sample period.
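
The window is centered on End-of-A + 1 sample period, which is where B's first sample would fall if B continued A exactly, with half a sample period of slack on either side. A minimal sketch of the test, assuming times are epoch seconds of the first and last samples and that both window bounds are inclusive (neither detail is specified here):

```python
from dataclasses import dataclass


@dataclass
class Timespan:
    start: float        # epoch seconds of the first sample
    end: float          # epoch seconds of the last sample
    sample_rate: float  # samples per second


def contiguous(a: Timespan, b: Timespan) -> bool:
    """True if B starts within [End-of-A + T/2, End-of-A + 3T/2].

    T is taken from B's sample rate, which also matches the
    mergesamplerate=true rule; inclusive bounds are an assumption.
    """
    period = 1.0 / b.sample_rate
    offset = b.start - a.end
    return 0.5 * period <= offset <= 1.5 * period
```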

By default, timespans from identical network, station, location, channel and sample-rate tuples but different qualities are merged together using the logic shown above. This can be disabled by setting mergequality=false. If mergesamplerate=true is chosen, the same logic shown above will be applied, with the sample-period taken from Timespan B.
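
Which timespans are even candidates for merging therefore depends on mergequality and mergesamplerate. The sketch below expresses this as a grouping key; merge_key and Segment are hypothetical names, and the defaults (mergequality=true, mergesamplerate=false) are inferred from the text and the sample queries above.

```python
from typing import NamedTuple, Tuple


class Segment(NamedTuple):
    network: str
    station: str
    location: str
    channel: str
    quality: str
    sample_rate: float


def merge_key(seg: Segment, mergequality: bool = True, mergesamplerate: bool = False) -> Tuple:
    """Fields two timespans must share before the contiguity test is applied."""
    key: Tuple = (seg.network, seg.station, seg.location, seg.channel)
    if not mergequality:
        key += (seg.quality,)  # qualities kept apart when mergequality=false
    if not mergesamplerate:
        key += (seg.sample_rate,)  # sample rates kept apart by default
    return key
```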

In general, the distribution of timespans for a network, station, location, channel, sample-rate and quality tuple can be quite complicated, as illustrated in the following figure.

If mergeoverlap=true is selected, timespans that overlap in time will be merged together, as will timespans separated by less than 1/2 sample period.

The mergetolerance=<seconds> option sutures together timespans that are separated by no more than the given number of seconds. It can only be used with mergeoverlap=true.
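
Combining the contiguity window with the mergeoverlap and mergetolerance rules gives a simple sweep over timespans sorted by start time. This is a sketch of the rules as described, not the service's implementation: merge_spans is a hypothetical name, a single fixed sample period is assumed, and the gap is assumed to be measured from A's last sample to B's first sample.

```python
from typing import List, Optional, Tuple

Span = Tuple[float, float]  # (first-sample time, last-sample time) in seconds


def merge_spans(
    spans: List[Span],
    period: float,
    mergeoverlap: bool = False,
    mergetolerance: Optional[float] = None,
) -> List[Span]:
    """Merge timespans of one channel tuple under the rules described above."""
    if mergetolerance is not None and not mergeoverlap:
        raise ValueError("mergetolerance requires mergeoverlap=true")
    merged: List[Span] = []
    for start, end in sorted(spans):
        if merged:
            last_start, last_end = merged[-1]
            gap = start - last_end
            contiguous = 0.5 * period <= gap <= 1.5 * period  # default window
            overlapping = mergeoverlap and gap < 0.5 * period  # overlap or tiny gap
            tolerated = mergetolerance is not None and gap <= mergetolerance
            if contiguous or overlapping or tolerated:
                merged[-1] = (last_start, max(last_end, end))
                continue
        merged.append((start, end))
    return merged


spans = [(0, 10), (11, 20), (15, 30), (30.2, 40), (100, 110)]
print(merge_spans(spans, period=1.0))
print(merge_spans(spans, period=1.0, mergeoverlap=True))
print(merge_spans(spans, period=1.0, mergeoverlap=True, mergetolerance=86400.0))
```

The three calls mirror the progression in the merging sample queries: contiguous spans only, then overlaps, then gaps up to one day.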