Help: dataselect v.1

Description

The fdsnws-dataselect service provides access to time series data for specified channels and time ranges.

Data are selected using SEED time series identifiers (network, station, location & channel) in addition to time ranges. Data are returned in miniSEED format.

This service is an implementation of the FDSN web service specification version 1.

To retrieve raw waveform data in miniSEED format, submit a request by either of two methods:

  • via HTTP GET: Provide a series of parameter-value pairs in the URL that specify the start time and end time, along with the desired network(s), station(s), location(s) and channel(s). Wildcards are supported. Please visit the fdsnws-dataselect service interface for parameter usage details.
  • via HTTP POST: Submit a pre-formatted request (e.g. a file) to the service containing a list of the desired networks, stations, locations, channels, start times and end times. The POST method is described in more detail on this page.

This service is designed to handle very large data requests (see note 1 under Considerations) and can easily be used with command line programs such as wget, curl or similar utilities.
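For scripted GET requests it can help to assemble the query URL from its parts rather than typing it by hand. The sketch below assumes bash and uses the parameter names (net, sta, loc, cha, starttime, endtime) that appear in the curl query example on this page; build_query_url is a hypothetical helper, not an IRIS tool. It does no URL encoding, so it assumes the values contain only letters, digits, wildcards, commas, hyphens and colons.

```bash
# Sketch only: build_query_url is a hypothetical helper.
# Assumes values need no percent-encoding (letters, digits, * ? , - :).
build_query_url() {
  local net=$1 sta=$2 loc=$3 cha=$4 start=$5 end=$6
  local base="http://service.iris.edu/fdsnws/dataselect/1/query"
  # Join the selection fields as parameter-value pairs after the "?".
  printf '%s?net=%s&sta=%s&loc=%s&cha=%s&starttime=%s&endtime=%s\n' \
    "$base" "$net" "$sta" "$loc" "$cha" "$start" "$end"
}
```

For example, `curl -L -o TA.miniseed "$(build_query_url TA A25A -- BHZ,BHE 2010-03-25 2010-04-01)"` should request the same data as the query-parameter curl example further down.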

Data selection

A data selection is composed of a list of network, station, location, channel, start time and end time entries. Channel codes follow the conventions documented in Appendix A of the SEED Manual. The appendix has been reproduced here to be more easily searched.

Optional parameters allow for more detailed data selection and can include:

  • SEED quality/version: select a specific SEED data quality code (defaults to “best”)
  • Minimum segment length: return only segments longer than a specified value
  • Longest segment only: return only the longest segment per channel

An example selection, submitted using HTTP POST, might look like:

TA A25A -- BHZ 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO * BH? 2010-03-25T00:00:00 2010-04-01T00:00:00
IU ANMO 10 HHZ 2010-03-25T00:00:00 2010-04-01T00:00:00
II KURK 00 BHN 2010-03-25T00:00:00 2010-04-01T00:00:00

  • When the selected SEED quality is “best” (the default), all available qualities are merged together and overlapping data are removed in order of increasing quality preference.
  • Glob expressions (wildcards such as * and ?) are allowed in all fields except the date fields. A location of -- selects an empty (blank) location code.
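When selection files are generated by a script, a small check can catch malformed lines before they reach the service. The bash sketch below (make_selection is a hypothetical helper) prints an optional quality=... line followed by one "NET STA LOC CHA START END" line per entry, and rejects entries that do not have exactly six fields or that put a wildcard in a date field, since globs are only allowed in the other fields.

```bash
# Sketch only: make_selection is a hypothetical helper.
# Usage: make_selection QUALITY "NET STA LOC CHA START END" ...
# Pass an empty QUALITY to omit the quality= line.
make_selection() {
  local quality=$1
  shift
  local entry
  local -a f
  if [ -n "$quality" ]; then
    printf 'quality=%s\n' "$quality"
  fi
  for entry in "$@"; do
    # read -a splits on whitespace without pathname expansion,
    # so * and ? in the entry stay literal.
    read -r -a f <<< "$entry"
    if [ "${#f[@]}" -ne 6 ]; then
      echo "expected 6 fields, got ${#f[@]}: $entry" >&2
      return 1
    fi
    # Wildcards are not allowed in the start or end time fields.
    case "${f[4]}${f[5]}" in
      *[*?]*)
        echo "wildcards are not allowed in date fields: $entry" >&2
        return 1 ;;
    esac
    printf '%s %s %s %s %s %s\n' "${f[@]}"
  done
}
```

Output may be partial when an entry is rejected, so callers should check the exit status before using the file, e.g. `make_selection B "TA A25A -- BHZ 2010-03-25T00:00:00 2010-04-01T00:00:00" > waveform.request || echo "bad selection"`.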

wget example

Requests can be made with a selection file and the wget Unix command line utility.

$ cat waveform.request
quality=B
TA A25A -- BHZ 2010-03-25T00:00:00 2010-04-01T00:00:00
TA A25A -- BHE 2010-03-25T00:00:00 2010-04-01T00:00:00
$ wget --post-file=waveform.request -O TA.miniseed http://service.iris.edu/fdsnws/dataselect/1/query

This will send the request to the server and save the results in a file named TA.miniseed.

cURL example

Requests can also be made with a selection file and the curl Unix command line utility.

$ cat waveform.request
quality=B
TA A25A -- BHZ 2010-03-25T00:00:00 2010-04-01T00:00:00
TA A25A -- BHE 2010-03-25T00:00:00 2010-04-01T00:00:00
$ curl -L --data-binary @waveform.request -o TA.miniseed http://service.iris.edu/fdsnws/dataselect/1/query

Here is the equivalent request using query parameters instead of a selection file:

$ curl -L -o TA.miniseed "http://service.iris.edu/fdsnws/dataselect/1/query?net=TA&sta=A25A&cha=BHZ,BHE&loc=--&starttime=2010-03-25&endtime=2010-04-01"

This will send the request to the server and save the results in a file named TA.miniseed.

We recommend always using the -L option so that curl follows HTTP redirects issued by our systems. The DMC uses HTTP redirection during maintenance to keep servicing requests.

You may wish to use the -f option, which makes curl return exit code 22 when the service responds with an HTTP error, for example when data are not found or the request is improperly formatted.

See http://curl.haxx.se/docs/manpage.html for more information.
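The -L and -f advice above can be wrapped in a small script so that failures are reported instead of silently leaving an error page in the output file. This bash sketch uses a hypothetical helper name, fetch_selection. One assumption worth stating: FDSN services may signal "no data" with HTTP 204 (No Content), which -f does not treat as an error, so the sketch also warns when the output file is empty.

```bash
# Sketch only: fetch_selection is a hypothetical helper.
# With -f, curl exits with code 22 on HTTP error responses (400 or above)
# instead of writing the server's error page to the output file.
fetch_selection() {
  local request=$1 output=$2
  local url="http://service.iris.edu/fdsnws/dataselect/1/query"
  local status=0
  curl -L -f --data-binary @"$request" -o "$output" "$url" || status=$?
  if [ "$status" -eq 22 ]; then
    echo "service returned an HTTP error (no data or bad request?)" >&2
  elif [ "$status" -ne 0 ]; then
    echo "curl failed with exit code $status" >&2
  elif [ ! -s "$output" ]; then
    # Assumption: an empty file after a successful transfer means nothing
    # matched the selection (e.g. an HTTP 204 "No Content" reply).
    echo "warning: $output is empty" >&2
  fi
  # Pass curl's exit status through to the caller.
  return "$status"
}
```

Usage: `fetch_selection waveform.request TA.miniseed || echo "request failed"`.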

Working with miniSEED

A variety of software tools are available from the DMC to assist with organizing and viewing miniSEED data or converting it to another format. Detailed descriptions and usage examples for each piece of software can be found by clicking the links below.

mseed2sac – for converting miniSEED to SAC format
mseed2ascii – for converting miniSEED to ASCII formats
dataselect – for selecting and sorting miniSEED
miniSEED Inspector – for quickly parsing and summarizing miniSEED data
rdseed – for reading and extracting data in SEED volumes. NOTE: A dataless SEED volume must be used in combination with miniSEED for most conversions. A request must be submitted prior to downloading the rdseed software. http://ds.iris.edu/ds/nodes/dmc/forms/rdseed

Accessing restricted data

Requesting restricted data via this web service requires authentication. Authentication uses a standard HTTP mechanism called digest access authentication, a sort of 3-way handshake. To submit a request with authentication credentials, use the queryauth method of the service in place of the query method. You must use a client that supports digest authentication; fortunately, such support is common. All of the IRIS DMC’s clients support accessing restricted data through digest authentication.

For example, requesting this URL submits a request and initiates the authentication handshake:

http://service.iris.edu/fdsnws/dataselect/1/queryauth?net=IU&sta=ANMO&loc=00&cha=BHZ&start=2010-02-27T06:30:00&end=2010-02-27T10:30:00

This request could be submitted, along with authentication credentials, using a command line tool like curl:

$ curl -L --digest --user EMAIL:PASSWORD -o data.mseed 'http://service.iris.edu/fdsnws/dataselect/1/queryauth?net=IU&sta=ANMO&loc=00&cha=BHZ&start=2010-02-27T06:30:00&end=2010-02-27T10:30:00'

where you replace EMAIL and PASSWORD with your own credentials. If you are submitting this request from the command line, for security purposes you may prefer to leave PASSWORD (and the preceding colon) out of the command, since it is optional. If only EMAIL is specified, curl will prompt for your password when the request is submitted.

You may try out authentication in your software with the following test credentials: email=nobody@iris.edu and password=anonymous. A working version of the curl example above using the test credentials would be:

$ curl -L --digest --user nobody@iris.edu:anonymous -o data.mseed 'http://service.iris.edu/fdsnws/dataselect/1/queryauth?net=IU&sta=ANMO&loc=00&cha=BHZ&start=2010-02-27T06:30:00&end=2010-02-27T10:30:00'

Note: A known problem can occur when repeatedly submitting queryauth requests for longer than a minute or so. The symptom is an authentication failure occurring, despite using proper credentials, after one or more successful requests. The work-around is for the client to re-submit the queryauth request. Only a single re-submission should be needed. If authentication repeatedly fails for queryauth requests, it indicates a different problem. The DMC will continue to look for a long-term solution to this issue, but for now, the recommendation of a single retry should work robustly.
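The single-retry work-around described in the note can be automated. In this bash sketch, fetch_queryauth is a hypothetical helper. It assumes curl's -f flag so that a final authentication failure (HTTP 401) yields a non-zero exit status rather than an error page saved as data, and it stops after two attempts because repeated failures point to a different problem.

```bash
# Sketch only: fetch_queryauth is a hypothetical helper.
# Tries a queryauth request at most twice, per the single-retry advice.
fetch_queryauth() {
  local credentials=$1 url=$2 output=$3
  local attempt
  for attempt in 1 2; do
    if curl -L -f --digest --user "$credentials" -o "$output" "$url"; then
      return 0
    fi
    echo "queryauth attempt $attempt failed" >&2
  done
  # Two consecutive failures suggest a different problem
  # (for example wrong credentials), so give up here.
  return 1
}
```

Usage with the test credentials from this page: `fetch_queryauth nobody@iris.edu:anonymous 'http://service.iris.edu/fdsnws/dataselect/1/queryauth?net=IU&sta=ANMO&loc=00&cha=BHZ&start=2010-02-27T06:30:00&end=2010-02-27T10:30:00' data.mseed`.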

Considerations

1 In general, it is preferable not to ask for too much data in a single request. Large requests take longer to complete, and if a large request fails because of a networking issue, it must be resubmitted, which means the entire request is reprocessed and re-transmitted. By breaking large requests into smaller ones, only the smaller pieces need to be resubmitted and re-transmitted if there is a networking problem.

Web service network connections will break after 5 to 10 minutes if no data are transmitted. For large requests, the fdsnws-dataselect web service can take several minutes before it starts returning data. When this happens, the web service may “flush” the HTTP headers with an “optimistic” success (200) code to the client in order to keep the network connection alive. This gives the underlying data retrieval mechanism about 10 minutes to start pulling data out of the IRIS archive. Thus, for larger requests, the HTTP return code can be unreliable.

As data are streamed back to the client, the fdsnws-dataselect service partially buffers the returned data. During periods when the underlying retrieval mechanism stalls, the web service will dribble the partial buffer to the client in an effort to keep the network connection alive.
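Because the HTTP status of a large request may be an optimistic 200, a client can add its own sanity check on the downloaded file. The bash sketch below (looks_like_miniseed is a hypothetical helper) assumes miniSEED 2.x records, whose fixed data header starts with a 6-character ASCII sequence number followed by a one-character data quality indicator (D, R, Q or M). A text or HTML error body will normally fail this check; a transfer that was cut off part-way through will not be detected.

```bash
# Sketch only: looks_like_miniseed is a hypothetical helper.
# Assumes miniSEED 2.x: byte 7 of the first record header is the
# data quality indicator (D, R, Q or M).
looks_like_miniseed() {
  local file=$1 indicator
  # An empty or missing file is rejected outright.
  [ -s "$file" ] || return 1
  indicator=$(head -c 7 "$file" | tail -c 1)
  case "$indicator" in
    D|R|Q|M) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `looks_like_miniseed TA.miniseed || echo "TA.miniseed does not look like miniSEED"`.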

Asking for too little data in each request is also inefficient. Each time a request is made, a network connection must be established and a request processing unit started. For performance reasons, it is better to group selections from the same stations together in the same request, especially selections that cover the same time periods.

As a guideline, a single request to this service should handle a week or a month of data from several stations.
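One way to break a long time window into smaller requests is to split it into fixed-length chunks and submit one selection per chunk. This bash sketch (split_window is a hypothetical helper) assumes GNU coreutils date, which supports -d with both date strings and @epoch values; the chunk length in days is the caller's choice, not a limit set by the service.

```bash
# Sketch only: split_window is a hypothetical helper.
# Usage: split_window START END DAYS  (assumes GNU date for -d)
# Prints one "START END" pair per chunk; the last chunk is clipped to END.
split_window() {
  local start_s end_s step_s t next
  if ! [ "$3" -gt 0 ] 2>/dev/null; then
    echo "chunk length must be a positive number of days" >&2
    return 1
  fi
  start_s=$(date -u -d "$1" +%s) || return 1
  end_s=$(date -u -d "$2" +%s) || return 1
  step_s=$(( $3 * 86400 ))
  t=$start_s
  while [ "$t" -lt "$end_s" ]; do
    next=$(( t + step_s ))
    if [ "$next" -gt "$end_s" ]; then
      next=$end_s
    fi
    printf '%s %s\n' \
      "$(date -u -d "@$t" +%Y-%m-%dT%H:%M:%S)" \
      "$(date -u -d "@$next" +%Y-%m-%dT%H:%M:%S)"
    t=$next
  done
}
```

For example, `split_window 2010-03-25 2010-04-01 3 | while read -r s e; do echo "TA A25A -- BHZ $s $e"; done` produces three selection lines, each of which could go into its own request.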

Merged/manufactured and esoteric data streams

By default the service will return the best time series available (denoted as quality “B” or “M”), which may be a mix of multiple qualities of the data delivered by the contributing network operator. In practice, most requests are simply for the best (latest) quality of data available. For these “best” or “merged” data streams, the DMC will trim the time series to the exact samples selected by the requested time window. For all other qualities (D, R and Q), the request will be trimmed at the nearest data record boundary in order to return the unmanipulated data to the user.

Some data streams are composed solely of state-of-health (SOH) information and do not contain primary data samples; other data streams contain components, such as event detection records, that also have no samples. Historically, such data components have been challenging for the DMC data extraction system, which is optimized for time series data (many non-time series data records become a confetti of discontinuous data points in such a system). For this reason, the extraction system skips any data records that do not contain samples for (the default) quality B or M requests. As a result, detection records and data streams such as ACE or OCF channels, which do not include data samples, are not returned for quality B or M requests. These channels are available if the quality requested is D, R, Q or a wildcard.
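To retrieve sample-less channels such as ACE or OCF, the selection must request a quality other than B or M. This bash sketch (soh_selection is a hypothetical helper) writes a POST selection with quality=D for those two channels; the network, station and time window are placeholders supplied by the caller, and the location is wildcarded on the assumption that the location code for these channels is not known in advance.

```bash
# Sketch only: soh_selection is a hypothetical helper.
# Usage: soh_selection NET STA START END
# quality=D keeps records without samples, which B/M requests skip.
soh_selection() {
  local net=$1 sta=$2 start=$3 end=$4 cha
  printf 'quality=D\n'
  for cha in ACE OCF; do
    # The * below is a literal location wildcard in the selection line.
    printf '%s %s * %s %s %s\n' "$net" "$sta" "$cha" "$start" "$end"
  done
}
```

The output can be saved and POSTed like any other selection file, e.g. `soh_selection IU ANMO 2010-03-25T00:00:00 2010-03-26T00:00:00 > soh.request` followed by the curl --data-binary command shown earlier.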

Usage guidelines & real time data

All usage should follow the usage guidelines; in particular, do not make too many concurrent requests or submit requests too quickly.
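A simple way to avoid concurrent or rapid-fire requests from a script is to submit selection files one at a time with a pause after each. In this bash sketch, fetch_sequentially and the 5-second default pause (overridable through a hypothetical PAUSE_SECONDS variable) are illustrative choices, not values taken from the usage guidelines; adjust them to what the guidelines specify.

```bash
# Sketch only: fetch_sequentially and PAUSE_SECONDS are hypothetical.
# Submits each selection file in turn (never in parallel) and pauses
# after every request. Output for X.request is written to X.mseed.
fetch_sequentially() {
  local pause=${PAUSE_SECONDS:-5}
  local url="http://service.iris.edu/fdsnws/dataselect/1/query"
  local request failures=0
  for request in "$@"; do
    # curl runs in the foreground, so the next request waits for this one.
    if ! curl -L -f --data-binary @"$request" -o "${request%.request}.mseed" "$url"; then
      echo "request failed: $request" >&2
      failures=$((failures + 1))
    fi
    sleep "$pause"
  done
  # Succeed only if every request succeeded.
  [ "$failures" -eq 0 ]
}
```

Usage: `PAUSE_SECONDS=10 fetch_sequentially week1.request week2.request`.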

All open data arriving in (near) real time at the IRIS DMC are available from the DMC’s SeedLink server.

This web service should not be used to retrieve continuous, real time data via repeated polling. Instead, the SeedLink server should be used when continuous data streams are needed.