The irisws-availability web service returns detailed time span information of timeseries data available in the DMC archive.
There are two service query methods:
/extent: Produces lists of available time extents (earliest to latest) for selected channels (network, station, location, channel and quality).
/query: Produces lists of contiguous time spans for selected channels (network, station, location, channel and quality) and time ranges.
- Sample Queries
- HTTP POST queries
- Timespan Merging Logic
/extent Sample queries
Extent information for all network IU, station ANMO channels in text format (default)
Extent information for all network IU, station ANMO channels in JSON format
Extent information for all network IU, sorted by number of time-spans descending, limited to 100 rows (default)
Any channel that has more than 1,000,000 timespans cannot be processed by the /query method. This query will reveal which channels cannot be processed (i.e. those with more than 1 million timespans).
Extent information for all network IU, sorted by update-date, limited to 100 rows (default)
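As a rough sketch, the /extent sample queries above could be assembled programmatically. The base URL and the orderby=timespancount_desc and rowlimit parameters appear elsewhere in this document; the net and sta parameter names and the format=json value are assumptions made for illustration and should be checked against the service's parameter list.

```python
from urllib.parse import urlencode

# Base URL as it appears in the wget/curl examples in this document.
EXTENT_URL = "http://service.iris.edu/irisws/availability/1/extent"


def extent_url(**params):
    """Return an /extent GET URL for the given query parameters."""
    # Drop parameters left unset so that the service defaults apply.
    query = {key: value for key, value in params.items() if value is not None}
    return EXTENT_URL + "?" + urlencode(query)


# "net", "sta" and "format=json" are assumed parameter names and values.
print(extent_url(net="IU", sta="ANMO"))
print(extent_url(net="IU", sta="ANMO", format="json"))
# orderby and rowlimit are taken from the URL quoted in this document.
print(extent_url(net="IU", orderby="timespancount_desc", rowlimit=100))
```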
/query Sample queries
Demonstrations of wildcard and multiple selections via CSV (comma separated values)
All BH channels for a station
Network IU, stations ANMO and BILL, location 00, and channels BH1 and BHE
Note: the , (comma) and ? (question mark) characters may be displayed as %2C and %3F after you click on the previous two links.
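The encoded forms are standard URL percent-encoding, not a change to the query itself. The sketch below reproduces them with Python's standard library; the net, sta, loc and cha parameter names are assumptions used only for illustration.

```python
from urllib.parse import quote, unquote, urlencode

# Percent-encoded forms of the two characters discussed above.
print(quote(",", safe=""))  # comma
print(quote("?", safe=""))  # question mark

# Assumed parameter names; the values follow the sample query description.
params = {"net": "IU", "sta": "ANMO,BILL", "loc": "00", "cha": "BH1,BHE"}
encoded = urlencode(params)
print(encoded)

# Decoding restores the original comma-separated lists.
print(unquote(encoded))
```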
Demonstrations of merging
Channel with changing sample rates
Same as previous with sample rates merged
Same as previous with overlaps merged
Same as previous with gaps of one day or less merged
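The four merging demonstrations build on one another, each adding one option to the previous query. A minimal sketch of that progression follows. The channel is an arbitrary one chosen for illustration (not necessarily the one used in the links), the net, sta, loc and cha parameter names are assumptions, and "one day" is expressed as 86400 seconds for mergetolerance.

```python
from urllib.parse import urlencode

QUERY_URL = "http://service.iris.edu/irisws/availability/1/query"

# Arbitrary channel for illustration; parameter names here are assumed.
channel = {"net": "IU", "sta": "ANMO", "loc": "10", "cha": "HHZ"}

ONE_DAY_SECONDS = 24 * 60 * 60  # mergetolerance takes a value in seconds

# Each step adds one merge option to the previous step.
steps = [
    {},
    {"mergesamplerate": "true"},
    {"mergesamplerate": "true", "mergeoverlap": "true"},
    {"mergesamplerate": "true", "mergeoverlap": "true",
     "mergetolerance": ONE_DAY_SECONDS},
]

urls = [QUERY_URL + "?" + urlencode({**channel, **options}) for options in steps]
for url in urls:
    print(url)
```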
Demonstration of memory limitation behavior
Two queries demonstrating behavior when too many timespans are present for processing. (See Memory Limitations for more information.)
This query reports no data available because the selected station contains too many timespans:
Identical query, but with excludetoolarge=false explicitly set. This query reports Error 413 (request too large) because of the excludetoolarge=false setting.
HTTP POST queries
The /extent and /query methods can be accessed via HTTP POST. All of the parameters that can be submitted with the GET method are allowed in POST.
Requests submitted to /extent via HTTP POST allow for arbitrary combinations of network, station, location and channel. The general form of a POST request is parameter=value pairs, one per line, followed by an arbitrary number of channel selection lines:
parameter=<value>
parameter=<value>
parameter=<value>
Net Sta Loc Chan
Net Sta Loc Chan
...
$ cat extent.request
format=geocsv
quality=M
TA A25A -- BH?
IU ANMO * BH?
IU ANMO 10 HHZ
II KURK 00 BH?
Requests submitted to /query via HTTP POST allow for arbitrary combinations of network, station, location, channel and time window. The general form of a POST request is parameter=value pairs, one per line, followed by an arbitrary number of channel and time window selection lines:
parameter=<value>
parameter=<value>
parameter=<value>
Net Sta Loc Chan StartTime EndTime
Net Sta Loc Chan StartTime EndTime
...
$ cat availability.request
mergequality=true
mergesamplerate=true
mergeoverlap=true
format=text
TA A25A -- BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO * BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO 10 HHZ 2010-03-25T00:00:00 2010-04-01T00:00:00
II KURK 00 BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
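A request file of this shape can also be generated, which helps when the channel list comes from another program. The sketch below is one possible way to do it; build_post_body is a hypothetical helper name, and the parameters and selections are those of the availability.request example.

```python
def build_post_body(parameters, selections):
    """Return POST request text: parameter=value lines, then selection lines."""
    lines = [f"{name}={value}" for name, value in parameters.items()]
    # Each selection is a sequence of fields joined by single spaces.
    lines += [" ".join(fields) for fields in selections]
    return "\n".join(lines) + "\n"


window = ("2010-03-25T00:00:00", "2010-04-01T00:00:00")
body = build_post_body(
    {"mergequality": "true", "mergesamplerate": "true",
     "mergeoverlap": "true", "format": "text"},
    [
        ("TA", "A25A", "--", "BH?", *window),
        ("IU", "ANMO", "*", "BH?", *window),
        ("IU", "ANMO", "10", "HHZ", *window),
        ("II", "KURK", "00", "BH?", *window),
    ],
)
print(body, end="")
```

For an /extent request, the same helper can be called with four-field selections (no start and end times).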
Submitting POST request files via wget and curl
Requests can be made with a selection file using either the wget or curl Unix command line utilities. The commands below will POST the selection file to the server and save the results in a text file:
$ wget --post-file=availability.request -O availability.txt http://service.iris.edu/irisws/availability/1/query
$ curl -L --data-binary @availability.request -o availability.txt http://service.iris.edu/irisws/availability/1/query
$ wget --post-file=extent.request -O extents.txt http://service.iris.edu/irisws/availability/1/extent
$ curl -L --data-binary @extent.request -o extents.txt http://service.iris.edu/irisws/availability/1/extent
We recommend always using the -L option to allow curl to follow HTTP redirections specified by our systems. The DMC uses HTTP redirection during maintenance to keep servicing requests.
When using curl, you may wish to use the -f option. This will cause curl to return an exit code of 22 if data is not found or the request is improperly formatted. See http://curl.haxx.se/docs/manpage.html for more information.
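The same POST can be issued from a script. The following is a minimal sketch using only Python's standard library, not an official client; build_request and post_selection are hypothetical helper names. Returning the HTTP status alongside the text plays a role similar to checking curl's exit code with -f, so that an error reply is not mistaken for results.

```python
import urllib.error
import urllib.request

QUERY_URL = "http://service.iris.edu/irisws/availability/1/query"


def build_request(url, body):
    """Return a POST request carrying the selection text as its body."""
    return urllib.request.Request(url, data=body.encode("utf-8"), method="POST")


def post_selection(url, body, timeout=60):
    """POST the selection text and return (http_status, response_text)."""
    try:
        with urllib.request.urlopen(build_request(url, body), timeout=timeout) as reply:
            return reply.status, reply.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # Error statuses are returned to the caller instead of raising.
        return error.code, error.read().decode("utf-8", errors="replace")


# Example use (performs a network request, so it is left commented out):
# with open("availability.request") as handle:
#     status, text = post_selection(QUERY_URL, handle.read())
```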
In order to be performant, the irisws-availability service uses a cache of assembled timespan information. The cache is derived from an internal database which tracks miniSEED data in the DMC archive. The cache used by the irisws-availability service sutures together time segment information recorded in the database. The cache takes over an hour to assemble and is refreshed several times per day.
The vast majority of the data contained in the miniSEED archive does not change between cache refreshes. However, there will always be a certain amount of disagreement between the cache used by the web service and the archive.
The irisws-availability service only catalogs data in the archive and not data in the realtime system (BUD). Consequently, it is generally not useful for querying data availability close to realtime. It usually takes between 4 and 26+ hours for data to be copied from the BUD into the DMC archive. Data is archived in 24-hour segments by GMT day. Consequently, data from just before the end of a GMT day is placed into the archive more quickly than data from just after the start of a GMT day.
A small number of channels cannot be processed by the irisws-availability service’s /query method due to having too many timespans to load into memory. The maximum processable limit is currently set to 1,000,000 timespans. Any channel with more timespans than this value cannot be processed by the service. Clicking on the link http://service.iris.edu/irisws/availability/1/extent?orderby=timespancount_desc&rowlimit=500 will show the top 500 channels sorted by number of timespans. As can be seen, only a comparatively small number (fewer than 100) cannot be processed.
By default, channels with too many timespans will be ignored by the /query method. Using the excludetoolarge=false option will cause an HTTP code 413 (request too large) to be returned if any of these channels are selected.
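The behavior described above can be summarized in a small model. This is only an illustration of the rule, not the service's implementation: select_channels and RequestTooLarge are hypothetical names, and the channel identifiers and counts in the example are made up.

```python
MAX_TIMESPANS = 1_000_000  # limit stated in this document


class RequestTooLarge(Exception):
    """Stands in for an HTTP 413 reply in this model."""


def select_channels(timespan_counts, excludetoolarge=True):
    """Return the channels /query would process, given {channel: timespan_count}."""
    too_large = {ch for ch, count in timespan_counts.items() if count > MAX_TIMESPANS}
    if too_large and not excludetoolarge:
        # With excludetoolarge=false, any oversized channel fails the request.
        raise RequestTooLarge(sorted(too_large))
    # By default, oversized channels are silently ignored.
    return [ch for ch in timespan_counts if ch not in too_large]


# Made-up channel identifiers and counts, for illustration only.
counts = {"XX.AAA.00.BHZ": 1_200, "XX.BBB.00.BHZ": 2_500_000}
print(select_channels(counts))
```

If every selected channel is over the limit, the default behavior leaves nothing to process, which matches the "no data available" result described for the sample query.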
Currently, the irisws-availability service does not take into account whether data is restricted. The service may show data as available, but the actual timeseries data may not be available to general, non-authenticated users. Note that if a request is made via the fdsnws-dataselect web service to restricted data and the request is not authenticated or the authentication does not allow access to the data, the data will simply appear to not exist. The web service will not return a denied access error code (403). This can be confusing if you are not aware of this detail. A future release of the irisws-availability service will take into account the restriction of data.
There is a small amount of timeseries miniSEED data in the archive for which there is no corresponding metadata. Currently, the irisws-availability service will show this data as available. Attempting to request this data using a tool such as fdsnws-dataselect may not work.
Latest Update-Date Inaccuracies
In some circumstances the reported latest update-dates returned from the /query method may be later (but never earlier) than their actual values.
This is a result of how the irisws-availability service catalogs these dates in its internal cache. In the cache, latest update-dates are stored per GMT calendar day per network, station, location, channel, quality, sample-rate tuple. Because of the way in which most data is archived, this method of caching results in accurate update-dates being reported. However, if the requested time segment does not cover the part of a day that was most recently loaded, the reported time may be later than its actual value.
Most data is loaded into the archive in a fashion that tends to not make this an issue. It is worth emphasizing that this behavior should never result in update-dates being reported as earlier than their actual values, only later.
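A toy model may make the effect easier to see. The cache structure below is hypothetical and greatly simplified (one entry per GMT day for a single channel tuple), and the dates are invented; it only illustrates why a per-day maximum can overstate, but never understate, the update-date of part of a day.

```python
from datetime import date, datetime

# Hypothetical cache: latest update-date per GMT day for one
# network/station/location/channel/quality/sample-rate tuple.
cache = {}


def record_load(day, updated):
    """Keep only the most recent update time seen for the given day."""
    cache[day] = max(cache.get(day, updated), updated)


def reported_update(day):
    """Any request touching the day reports that day's latest update."""
    return cache[day]


day = date(2010, 3, 25)
morning_update = datetime(2010, 3, 26, 6, 0)    # data for 00:00-12:00 loaded
afternoon_update = datetime(2010, 4, 2, 9, 30)  # data for 12:00-24:00 loaded later
record_load(day, morning_update)
record_load(day, afternoon_update)

# A request for 00:00-06:00 touches only the morning data, yet the
# per-day cached value (the afternoon load time) is what gets reported.
print(reported_update(day))
```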
Timespan Merging Logic
When the cache of assembled timespans is compiled, timespans from identical network, station, location, channel, sample-rate and quality tuples are merged together where possible.
As illustrated in the following figure, timespan A can be merged with timespan B if the start of B is in the window of time shown: End-of-A + 1/2-sample-period to End-of-A + 3/2-sample-period:
By default, timespans from identical network, station, location, channel and sample-rate tuples but different qualities are merged together using the logic shown above. This can be disabled by setting mergequality=false.
When mergesamplerate=true is chosen, the same logic shown above will be applied, with the sample-period taken from Timespan B.
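The merge window can be written as a short predicate. This is a sketch of the rule as stated in the text, not the service's code: can_merge is a hypothetical name, times are plain seconds, and treating both edges of the window as inclusive is an assumption. The sample period is taken from timespan B, which matches the mergesamplerate case and makes no difference when both spans share a sample rate.

```python
def can_merge(end_a, start_b, sample_rate_b):
    """Return True if timespan B may be sutured onto the end of timespan A.

    end_a and start_b are times in seconds; sample_rate_b is in samples
    per second. Inclusive window edges are an assumption.
    """
    period = 1.0 / sample_rate_b
    return end_a + 0.5 * period <= start_b <= end_a + 1.5 * period


# 20 samples per second gives a 0.05 s sample period.
print(can_merge(100.0, 100.05, 20.0))  # B starts one sample period after A ends
print(can_merge(100.0, 100.01, 20.0))  # B starts too early
print(can_merge(100.0, 100.10, 20.0))  # B starts too late
```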
In general, the distribution of timespans for a network, station, location, channel, sample-rate and quality tuple can be quite complicated. This is illustrated in this figure:
When mergeoverlap=true is selected, timespans that overlap in time will be merged together. Timespans that are separated by less than 1/2 sample-period will also be merged.
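Overlap merging is essentially an interval merge with a small gap allowance. The sketch below shows one way to express it, assuming all spans share a single sample rate and times are plain seconds; merge_overlaps is a hypothetical name and this is not the service's implementation.

```python
def merge_overlaps(spans, sample_rate):
    """Merge spans that overlap or are separated by less than half a sample period.

    spans is a list of (start, end) pairs in seconds; sample_rate is in
    samples per second. Returns a new list of merged spans sorted by start.
    """
    half_period = 0.5 / sample_rate
    merged = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] < half_period:
            # Overlapping spans have a negative gap, so they pass this test too.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# At 1 sample per second, gaps shorter than 0.5 s are closed.
print(merge_overlaps([(0, 10), (5, 20), (20.2, 30), (40, 50)], sample_rate=1.0))
# prints [(0, 30), (40, 50)]
```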
The mergetolerance=<seconds> option will suture together timespans that are separated by no more than the given time. It can only be used with