SetiQuest Data Format

From setiquest wiki

Revision as of 18:35, 30 April 2012 by SigBlips (Talk | contribs)
[Image: Baudline kepler-4 FSK.png]

The setiQuest Data Format consists of quadrature (I/Q) waveform samples: signed 8-bit values with the two channels (I and Q) interleaved, at a sample rate of 8738133 samples per second. The collected data is split into multiple raw files of about 2 GB each for easier consumption. Several auxiliary file formats support the main raw data. The data archive can be accessed at SetiQuest Data Links.

File Specifics

All files associated with a particular observation are grouped together into a sub-directory. A sub-directory contains a README file explaining the file formats. We expand on this information here.

In general, there are four file types:

  1. complex voltage sample files (*1-of-n.dat)
  2. a headers file (*hdrs.dat)
  3. a file containing metadata (*meta.txt)
  4. a file containing antenna tracking coordinates (*.ephem).

Metadata File

Probably the first file you'll need, the metadata file (*meta.txt) contains useful information in simple text format, including the observation source name, frequency, bandwidth (aka sample rate), date and time, and which antennas were used by the beamformer to form the synthetic beam.

A few datasets are missing their *meta.txt file because it was lost in a disk crash.

Sample Rate

Unless otherwise specified, the sample rate is 8738133.333... complex samples per second (about 8.7 MHz). The exceptions are one or two test cases recorded at twice that rate.

Note: some metadata files list a truncated sample rate of 8.7381 MS/s. The precise sample rate is always 8.738133(3) MS/s (the final 3 repeats), or twice this value.
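
As a rough worked example of what the sample rate implies, the sketch below estimates how much observing time a given number of raw bytes covers. It assumes the standard rate (written exactly as 26214400/3 samples per second) and two bytes per complex sample, as described for the raw data files; the names SAMPLE_RATE and duration_seconds are illustrative only.

```python
from fractions import Fraction

# 8738133.333... complex samples per second, written exactly as 26214400 / 3.
SAMPLE_RATE = Fraction(26214400, 3)
# One signed byte for the real part plus one for the imaginary part.
BYTES_PER_COMPLEX_SAMPLE = 2


def duration_seconds(num_bytes, sample_rate=SAMPLE_RATE):
    """Return the time span covered by num_bytes of raw 8-bit I/Q data."""
    num_samples = Fraction(num_bytes, BYTES_PER_COMPLEX_SAMPLE)
    return num_samples / sample_rate


# A full 2048000000-byte chunk at the standard sample rate.
chunk_seconds = duration_seconds(2048000000)
print(float(chunk_seconds))  # 117.1875
```

For the test cases recorded at twice the standard rate, pass sample_rate=2 * SAMPLE_RATE, which halves the duration.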

Complex Voltage Sample Files (aka Raw Data)

The complex voltage sample files are split into manageable chunks. For example, an observation of the Crab Pulsar on 2010-03-26 yielded a large complex voltage sample file that was split at 2048000000-byte boundaries into five chunks, named 2010-03-26-crab-8bit-1-of-5.dat through 2010-03-26-crab-8bit-5-of-5.dat.

Users can choose to download one or more chunks for analysis. No samples (and so no time) are missing between consecutive chunks, so they can be combined in any number of simple ways. For example, in a Unix/Linux environment:

cat 2010-03-26-crab-8bit-{1,2,3,4,5}-of-5.dat > 2010-03-26-crab-8bit-combined.dat
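
Where cat is not available (for example on Windows), the same concatenation can be sketched in Python. This is only an illustrative equivalent of the command above; the function name concatenate_chunks is hypothetical.

```python
import shutil


def concatenate_chunks(chunk_paths, combined_path):
    """Append each chunk, in the order given, to a single output file."""
    with open(combined_path, "wb") as combined:
        for path in chunk_paths:
            with open(path, "rb") as chunk:
                # copyfileobj streams in blocks, so a 2 GB chunk never has
                # to fit in memory at once.
                shutil.copyfileobj(chunk, combined)


# Example call, mirroring the cat command above:
# chunks = [f"2010-03-26-crab-8bit-{n}-of-5.dat" for n in range(1, 6)]
# concatenate_chunks(chunks, "2010-03-26-crab-8bit-combined.dat")
```

The chunks must be listed in order, since the output is simply their bytes laid end to end.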

The complex voltage sample files have no header, containing only signed 8-bit complex coefficients arranged: real, imag, real, imag, ... The first two coefficients represent sample 1, the second two represent sample 2, etc.

For example, if we look at a dump of the first few coefficients of a complex voltage sample file chunk:

xxd 2010-03-26-crab-8bit-1-of-5.dat | head -n 1

0000000: fdea f9c4 1cf7 f3de 10f3 f2fb 0218 04f4 ................

the signed 8-bit values are interpreted as complex values:

(-3-22i) (-7-60i) (28-9i) (-13-34i) (16-13i) (-14-5i) (2+24i) (4-12i) ...
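
The same interpretation can be sketched in a few lines of Python using only the standard library. The function name decode_iq is illustrative; the input bytes are the sixteen bytes from the xxd line above.

```python
import array


def decode_iq(raw):
    """Turn interleaved signed 8-bit (real, imag) bytes into complex numbers."""
    values = array.array("b", raw)  # "b" = signed char, so 0xfd becomes -3
    if len(values) % 2:
        raise ValueError("expected an even number of bytes (real, imag pairs)")
    return [complex(re, im) for re, im in zip(values[0::2], values[1::2])]


# The first 16 bytes of the dump shown above.
first_line = bytes.fromhex("fdeaf9c41cf7f3de10f3f2fb021804f4")
samples = decode_iq(first_line)
print(samples[:3])  # [(-3-22j), (-7-60j), (28-9j)]
```

Python writes the imaginary unit as j rather than i, but the values are otherwise the same as those listed above.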

Headers File

The headers file contains information about each data record in the complex voltage sample files. To access a data record, you must first find its byte offset by reading the corresponding record in the headers file.

As an example of the headers file structure, here is a hex dump of the first two header records from 2011-01-28-exo-gl581_4462_1-hdrs.dat:

[Image: 2011-01-28-gl581-dump-first-two-hdrs dat-records.png]

Each record in the headers file can be represented in C with the following structures:

 #include <stdint.h>   /* uint32_t, uint64_t */

 struct pkt_hdr_fmt            /* 64 bytes */
 {
   unsigned char group;
   unsigned char pkt_version;
   unsigned char bits_per_sample;
   unsigned char binary_point_pos;
   uint32_t endian_magic_num;
   unsigned char pkt_type;
   unsigned char nmbr_of_streams;
   unsigned char pol_code;
   unsigned char hdr_len;
   uint32_t data_source;
   uint32_t channel_num;
   uint32_t sequence_num;      /* byte offset 20 */
   double frequency;           /* byte offset 24 */
   double sample_rate;         /* byte offset 32 */
   float useable_bw;
   unsigned char reserved[4];
   uint64_t timestamp;         /* byte offset 48 */
   uint32_t status_flags;
   uint32_t data_len;
 };

 struct extended_pkt_hdr_fmt   /* 80 bytes: one record in the headers file */
 {
   struct pkt_hdr_fmt pkt_hdr;
   uint64_t zeros;
   uint64_t offset;            /* byte offset 72 */
 };
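
As an illustration, the two structures can be mirrored with Python's struct module. This is a sketch under stated assumptions, not a reference reader: it assumes little-endian byte order and no compiler padding (which gives the 80-byte record size used by the examples on this page). The names HEADER_FORMAT, HeaderRecord, parse_header and read_header are hypothetical.

```python
import struct
from collections import namedtuple

# Assumptions: little-endian byte order and no compiler padding, so one
# extended_pkt_hdr_fmt record is exactly 80 bytes.
HEADER_FORMAT = "<4B I 4B 3I 2d f 4s Q 2I 2Q"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

# Field names follow the C structures, in declaration order.
HeaderRecord = namedtuple("HeaderRecord", [
    "group", "pkt_version", "bits_per_sample", "binary_point_pos",
    "endian_magic_num",
    "pkt_type", "nmbr_of_streams", "pol_code", "hdr_len",
    "data_source", "channel_num", "sequence_num",
    "frequency", "sample_rate", "useable_bw", "reserved",
    "timestamp", "status_flags", "data_len",
    "zeros", "offset",
])


def parse_header(record):
    """Unpack one 80-byte record from a *hdrs.dat file into named fields."""
    if len(record) != HEADER_SIZE:
        raise ValueError(f"expected {HEADER_SIZE} bytes, got {len(record)}")
    return HeaderRecord._make(struct.unpack(HEADER_FORMAT, record))


def read_header(path, index):
    """Read header record number index (0-based) from a headers file."""
    with open(path, "rb") as f:
        f.seek(index * HEADER_SIZE)
        return parse_header(f.read(HEADER_SIZE))


# Synthetic record for illustration only (not real data): every byte is zero
# except a sequence number written at byte offset 20.
demo = bytearray(HEADER_SIZE)
struct.pack_into("<I", demo, 20, 12345)
print(parse_header(bytes(demo)).sequence_num)  # 12345
```

With a real headers file, read_header(path, 0).sequence_num would be expected to return the sequence number of the first record, provided the byte-order assumption holds for that file.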


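The following Perl examples dump individual header record fields. Each header record is 80 bytes long, and the unpack templates skip to a field's byte offset (for example, x20 skips the 20 bytes that precede sequence_num). The examples work on Linux; the dd utility is not available on a standard Windows installation.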

Sequence number of first header record:

$ perl -e 'read(STDIN,$b,80);print(unpack("x20L1",$b),"\n")' <2011-01-28-exo-gl581_4462_1-hdrs.dat
2147919

Sequence number of the eleventh header record (skip=10 skips the first ten records):

$ dd if=2011-01-28-exo-gl581_4462_1-hdrs.dat bs=80 count=1 skip=10 | perl -e 'read(STDIN,$b,80);print(unpack("x20L1",$b),"\n")'
2147929

Frequency (MHz) of first header record:

$ perl -e 'read(STDIN,$b,80);print(unpack("x24d1",$b),"\n")' <2011-01-28-exo-gl581_4462_1-hdrs.dat
1413.4464

Sample rate (MS/s) of first header record:

$ perl -e 'read(STDIN,$b,80);print(unpack("x32d1",$b),"\n")' <2011-01-28-exo-gl581_4462_1-hdrs.dat
8.73813333333333

Timestamp of first header record:

$ perl -e 'read(STDIN,$b,80);@t=unpack("x48L2",$b);printf("%s + %8.6f seconds\n",scalar(gmtime($t[1])),$t[0]/(2**32))' <2011-01-28-exo-gl581_4462_1-hdrs.dat
Fri Jan 28 19:22:24 2011 + 0.120995 seconds

Timestamp of last header record:

$ ls -lt *hdrs.dat
-r--r--r-- 1 setiquest users 552125680 2011-01-28 21:33 2011-01-28-exo-gl581_4462_1-hdrs.dat

$ bc
(552125680/80)-1
6901570
quit

$ dd if=2011-01-28-exo-gl581_4462_1-hdrs.dat bs=80 count=1 skip=6901570 | perl -e 'read(STDIN,$b,80);@t=unpack("x48L2",$b);printf("%s + %8.6f seconds\n",scalar(gmtime($t[1])),$t[0]/(2**32))'
Fri Jan 28 19:35:52 2011 + 0.898761 seconds
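
For readers without dd or Perl, the record count and timestamp lookups above can be sketched in Python. The assumptions are the same ones the Perl one-liners make implicitly on a little-endian machine: the first 32-bit word of the timestamp field is the fraction of a second (in units of 1/2^32) and the second word is a count of seconds that is treated as Unix time. The function names are illustrative.

```python
import os
import struct
from datetime import datetime, timedelta, timezone

RECORD_SIZE = 80       # bytes per header record
TIMESTAMP_OFFSET = 48  # byte offset of the timestamp field within a record


def record_count(path):
    """Number of complete header records in a headers file."""
    return os.path.getsize(path) // RECORD_SIZE


def read_timestamp(path, index):
    """Return the timestamp of header record index (0-based) as a datetime."""
    with open(path, "rb") as f:
        f.seek(index * RECORD_SIZE + TIMESTAMP_OFFSET)
        # Assumption: little-endian; fraction word first, then seconds word.
        fraction, seconds = struct.unpack("<2I", f.read(8))
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole + timedelta(seconds=fraction / 2**32)


# Example calls (the file name is the one used in the examples above):
# path = "2011-01-28-exo-gl581_4462_1-hdrs.dat"
# first = read_timestamp(path, 0)
# last = read_timestamp(path, record_count(path) - 1)
# print(last - first)
```

The last record's index is record_count(path) - 1, matching the (552125680/80)-1 calculation done with bc above.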

Ephemeris File

The ephemeris file (*.ephem) contains antenna tracking information for the observed source.


What to do with setiQuest Data

Viewing Raw data

Linux has great built-in utilities for viewing binary files to see what raw information lies within. Gerry likes xxd.

Such utilities are not built into Windows. Some suggestions for Windows users (please add more!):

 http://www.hexedit.com/
 http://setiquest.org/forum/topic/now-able-partition-large-standard-seti-data-files-and-save-hex-content-text-ascii-format.

Viewing Waterfall Data

Some of the datasets have been processed and waterfall plots are available for browsing.

You can also make waterfalls on your own, using Algorithms, baudline, or other tools described below.

Data Analysis Tools

Algorithms

With the generous support from Google through a Google Summer of Code (GSoC) internship, the SETI Institute has developed an open source package called SETIKit which allows users to string together piped command line programs for very flexible analyses. You can generate waterfall outputs and experiment with new or unusual signal processing algorithms. Take a look!

Baudline

The data can be fed into the baudline signal analyzer with the following command line:

cat *.dat | baudline -stdin -format s8 -channels 2 -quadrature -samplerate 8738133

Examples of datasets analyzed with Baudline can be found at: http://setiquest.org/wiki/index.php/User:SigBlips#setiQuest_datasets.

Matlab and Octave

The data can be read into Matlab/Octave programs with the following commands:

height = 1024;                           % number of complex samples to read
fd = fopen('filename.dat', 'r');
coeffs = fread(fd, height*2, 'int8');    % interleaved real, imag, real, imag, ...
fclose(fd);
cmplx = coeffs(1:2:end) + 1i * coeffs(2:2:end);

Some sample functions for reading and processing are available here:

https://github.com/hartze11/setiQuest-Octave-Code

To load data into Octave:

x=read_seti('2010-05-07-psrb0329+54-8bit-1-of-5.dat');

The fft_avg function performs FFT averaging on the vector x, applying a Hanning window to clean up the FFT. The window length can be specified with win_length.

f=fft_avg(x);

To view the FFT, issue:

plot(fftshift(f));

For a more detailed description and some examples see the setiQuest Octave Tutorial page.
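
To give a feel for what windowed FFT averaging involves, here is a minimal Python sketch using NumPy (a third-party package, assumed to be installed). It is not the fft_avg implementation from the repository above, whose details may differ; it simply averages the power spectra of consecutive Hann-windowed segments. The name averaged_spectrum and the default window length are illustrative.

```python
import numpy as np


def averaged_spectrum(x, win_length=1024):
    """Average the power spectra of consecutive Hann-windowed segments of x."""
    x = np.asarray(x, dtype=np.complex128)
    n_segments = len(x) // win_length
    if n_segments == 0:
        raise ValueError("input is shorter than one window")
    # Drop any leftover samples and stack the segments as rows.
    segments = x[: n_segments * win_length].reshape(n_segments, win_length)
    window = np.hanning(win_length)
    spectra = np.fft.fft(segments * window, axis=1)
    power = np.mean(np.abs(spectra) ** 2, axis=0)
    # Move zero frequency to the centre bin, like fftshift in the plot above.
    return np.fft.fftshift(power)


# Synthetic test tone (not real data): a complex sinusoid 32 bins above centre.
n = np.arange(8 * 256)
tone = np.exp(2j * np.pi * 32 * n / 256)
spectrum = averaged_spectrum(tone, win_length=256)
print(int(np.argmax(spectrum)))  # 160
```

Averaging more segments reduces the variance of the noise floor at the cost of time resolution, which is the usual trade-off when choosing the window length.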
