SetiQuest Data tutorial octave

From setiquest wiki


setiQuest Data/Octave Tutorial

(Under Construction)

This tutorial explains how to:

1. Download the data from the setiQuest Data Links page and view its contents.

2. Understand the different setiQuest data formats.

3. Load the data into Octave.

4. Analyze and graph the data inside Octave.

The setiQuest Data

Don’t Panic!

-Downloading and analyzing the setiQuest data will take a while.

-This tutorial is written for relatively inexperienced Windows users.


1. The Data


- The setiQuest data available for download is collected by the Allen Telescope Array (ATA) at Hat Creek Radio Observatory, an array of 42 radio antennas that work together as an interferometer. Each observation is broken into 4 to 10 separate 1.9 GB raw .dat files for easier downloading and handling, although depending on your internet connection it may still take several hours to download a single file. The primary data format is 8-bit quadrature (I/Q) samples, with the two channels (I and Q) interleaved. The data is available for download at setiQuest Data Links.

- There are four different file types available from each observation:

  1. complex voltage sample files (*1-of-n.dat) 
  2. a headers file (*hdrs.dat) 
  3. a file containing metadata (*meta.txt) 
  4. a file containing antenna tracking coordinates (*.ephem). 

- In the file names, the (*) stands for the part of the name that changes from one data set to the next. Each file type has a different use.

- The bulk of the data is in the complex voltage sample files. These files contain signed 8-bit complex coefficients arranged as real, imaginary, real, imaginary, and so on. Most of the analysis and graphing will use this data format.
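If you want to look at a few samples outside Octave, the byte layout above can be decoded in a few lines. The following is a minimal Python sketch (Python is not otherwise used in this tutorial, and the function name iq_bytes_to_complex is made up for illustration). It assumes the bytes are exactly the interleaved signed 8-bit real/imaginary layout described above, with no header bytes mixed in.

```python
from array import array


def iq_bytes_to_complex(raw: bytes) -> list[complex]:
    """Convert interleaved signed 8-bit bytes (real, imaginary, ...) into complex samples.

    Assumption: the bytes contain only sample data (no headers). A trailing odd
    byte, i.e. half a sample, is ignored.
    """
    values = array("b", raw)  # typecode "b": signed char, range -128..127
    usable = len(values) - len(values) % 2
    return [complex(values[k], values[k + 1]) for k in range(0, usable, 2)]
```

For example, passing in the first 2048 bytes of a .dat file gives its first 1024 complex samples.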

-The header file contains information about each data record in the complex voltage sample files:

[Chart to be added here]

-The metadata files contain information on the observation such as date, time, frequency, and bandwidth.

- The ephem file contains useful tracking information on the antennas.

-Further explanation of the Data is available at setiQuest Data


2. Download and Install instructions


-A few other programs need to be downloaded in order to analyze and graph the setiQuest data. If you have the resources and time, the best program to use would be MATLAB, but Octave is a free, open-source alternative with many of the same capabilities. The Windows Octave download is available at http://octave.en.softonic.com/. Octave will be used to correlate and graph the complex voltage sample files.


To install Octave:

1. Double click on the downloaded file

2. Accept the License Agreement

3. Uncheck all three Bing toolbar boxes and click Next

4. Wait for the program to download and install

5. Click Finish to open the program.


-To view and edit the programs used in Octave (plain-text .m script files), it helps to use a source code editor. Notepad++ is a free text and source code editor for Windows that works with dozens of different programming languages. Notepad++ can be downloaded at http://download.cnet.com/Notepad/3000-2352_4-10327521.html.


To install Notepad++:

1. Double click on the downloaded file

2. Click next step when cnet download helper opens

3. Uncheck install FoxTab box then click next step

4. After the download is complete, click Install Now

5. Click Finish to begin using the program


-The raw data are not graphed in Octave directly. They first need to be autocorrelated and formatted, which is done with prewritten Octave programs found at ftp://ftp.seti.org/gharp/SQMatlab/. Copy all three of these programs (AutoCorrelation.m, PowerSpectrum.m, SQdataToolkit.m) into Notepad++ and save them in the folder you will run Octave from, the same folder that holds the data files.

- Because of the size of the files, many programs cannot read or work with the data directly, and correlating and analyzing that much information takes a very long time. The program hjsplit, found at http://hjsplit.en.softonic.com/, can cut the files into much more manageable chunks. Since most programs handle files of 10 to 20 MB well, it is a good idea to choose a chunk size in that range. The program is easy to use: select the file you want to split, specify the chunk size (10-20 MB), and click Split.


To install hjsplit:

1. Double click on the downloaded file

2. Accept the License Agreement

3. Uncheck all three Bing toolbar boxes and click Next

4. Wait for the program to download and install

5. Click Finish to open the program.
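As an alternative to hjsplit, the splitting step can be scripted. The Python sketch below is hypothetical: plan_chunks, split_file and the .001, .002, ... naming are choices made for this example, not part of hjsplit or the setiQuest tools. It insists on an even chunk size so that no real/imaginary byte pair is cut in half between two pieces; the 10-20 MB sizes suggested above are already even when given in whole megabytes.

```python
import os


def plan_chunks(total_bytes: int, chunk_bytes: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs that cover a file of total_bytes bytes."""
    if chunk_bytes <= 0 or chunk_bytes % 2:
        # An odd chunk size would cut a real/imaginary byte pair across two pieces.
        raise ValueError("chunk_bytes must be a positive, even number of bytes")
    return [(offset, min(chunk_bytes, total_bytes - offset))
            for offset in range(0, total_bytes, chunk_bytes)]


def split_file(path: str, chunk_bytes: int = 10 * 1024 * 1024) -> list[str]:
    """Write path.001, path.002, ... (hypothetical naming) and return the new file names."""
    outputs = []
    with open(path, "rb") as src:
        chunks = plan_chunks(os.path.getsize(path), chunk_bytes)
        for index, (offset, length) in enumerate(chunks, start=1):
            src.seek(offset)
            out_name = f"{path}.{index:03d}"
            with open(out_name, "wb") as dst:
                dst.write(src.read(length))
            outputs.append(out_name)
    return outputs
```

Reading one chunk at a time keeps memory use at about the chunk size, even for a 1.9 GB input file.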


-Once split into more manageable files, the pieces can be opened in the hex editor HxD, which also shows you what the raw files actually look like. The download is found at http://download.cnet.com/HxD-Hex-Editor/3000-2352_4-10891068.html.


To install HxD:

1. Double click on the downloaded file

2. Click next step when cnet download helper opens

3. Uncheck install FoxTab box then click next step

4. After the download is complete, click Install Now

5. Click Finish to begin using the program



3. Using the Data


-To start using the data, first open the Windows command prompt. You can do this by typing command prompt or cmd into the search (or Run) box that appears after clicking the Windows logo. It can also be found in the Accessories folder on most Windows systems.

-Autocorrelation and explanation (skip the next section if you just want to process the data without the explanation):

-Autocorrelation: The program AutoCorrelation.m reads and then processes setiQuest data using an autocorrelation algorithm. It reads the data in blocks, performs a Fourier transform on each block, and averages the resulting power spectra (the squared magnitudes) over all blocks. Finally, the averaged power spectrum is transformed back to the time domain.
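The steps just described (blocks, Fourier transform, averaging, back to the time domain) can be sketched as follows. This is a simplified Python illustration, not the AutoCorrelation.m code: it uses non-overlapping blocks (the program steps half a block at a time), a textbook recursive radix-2 FFT instead of Octave's fft, and returns the plain circular autocorrelation given by the Wiener-Khinchin relation rather than the program's squared, normalized output. The function names are made up for this example.

```python
import cmath


def fft(x, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def averaged_autocorrelation(samples, fft_len):
    """Split into fft_len blocks, FFT each, average |X|^2, transform back."""
    n_blocks = len(samples) // fft_len
    if n_blocks == 0:
        raise ValueError("need at least fft_len samples")
    avg_pwr = [0.0] * fft_len
    for b in range(n_blocks):
        spectrum = fft(samples[b * fft_len:(b + 1) * fft_len])
        for k, value in enumerate(spectrum):
            avg_pwr[k] += abs(value) ** 2 / n_blocks
    # Wiener-Khinchin: the inverse transform of the power spectrum is the circular
    # autocorrelation; divide by fft_len because this inverse FFT is unnormalized.
    return [v / fft_len for v in fft(avg_pwr, inverse=True)]
```

For a test signal with a spike every 4 samples, such as [1, 0, 0, 0] repeated with fft_len = 8, the result has equal peaks at delays 0 and 4 and zeros elsewhere.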


---More advanced explanation of AutoCorrelation.m Program---


(Advanced Autocorrelation and Power Spectral details)


These lines actually read a block of data from the file:


fin = fopen(filename);
[rawbytes, count] = fread(fin, fft_len, "int8");


Then the raw byte data are converted to single precision:


coeffs1 = single(rawbytes);


Then the program reads some more data and converts that to single precision too. This is a subtlety where we try to avoid artifacts in the Fast Fourier Transform function by stepping along the data 1/2 of a block at a time. For now let’s ignore this part.

The single-precision array coeffs1 actually stores complex values: its first two elements are the real and imaginary parts of the first sample, the next two elements store the second complex value, and so on. To convert from this interleaved representation to Octave complex numbers we do this:


cdata(1:fft_len/2) = (coeffs1(1:2:fft_len).+(i * coeffs1(2:2:fft_len)));


Now, the tricky part is that coeffs1 is an array of length fft_len = N, but it contains only half as many samples because there are two elements per sample. So we complete the cdata array (which should have N samples) by adding another N/2 elements taken from the second read, coeffs2:


cdata(fft_len/2+1:fft_len) = (coeffs2(1:2:fft_len).+(i * coeffs2(2:2:fft_len)));


To get started you don't need this complexity. Just read 2N bytes from the file, convert to single precision, and then convert them to a complex-valued array in one simple step:


N = 1048576;  % should be a power of 2, in this case 2^20
fin = fopen(filename, "r");
[rawbytes, count] = fread(fin, 2*N, "int8");  % 2 bytes (real, imaginary) per sample
fclose(fin);
coeffs = single(rawbytes);
cdata = coeffs(1:2:2*N) + 1i * coeffs(2:2:2*N);  % N complex samples


---End---


Inside the command prompt:

   C:\Users\Kyle>

1. Change to the directory that holds the Octave programs and the data files (mine were stored in my Documents directory in a folder named SETI).

To get to mine, I typed: chdir Documents\SETI

   C:\Users\Kyle\Documents\SETI>

2. To run Octave, type the full path of the Octave executable.

Most paths should look like: C:\Octave\3.2.4_gcc-4.4.0\bin\Octave (the version numbers depend on the release you installed)

Thus the partially completed line should look like this (not done with the line yet):

   C:\Users\Kyle\Documents\SETI> C:\Octave\3.2.4_gcc-4.4.0\bin\Octave

-The final “Octave” in the path is the Octave executable itself; everything typed after it on the same line is passed to Octave.

3. Then continue the line by telling Octave to run the program AutoCorrelation.m, followed by the number of iterations and the fft-len (the number of samples per FFT):

   AutoCorrelation.m (program)   100000000 (iterations)   4194304 (fft-len)

   C:\Users\Kyle\Documents\SETI> C:\Octave\3.2.4_gcc-4.4.0\bin\Octave AutoCorrelation.m 100000000 4194304

4. The last thing to add is the file to be autocorrelated.

For instance: 2011-01-28-exo-gl581_4462_1-8bit-01.dat

Thus the complete line should look similar to the following:

   C:\Users\Kyle\Documents\SETI> C:\Octave\3.2.4_gcc-4.4.0\bin\Octave AutoCorrelation.m 100000000 4194304 2011-01-28-exo-gl581_4462_1-8bit-01.dat

-After the Octave path, separate each argument with a single space, and give the program and data file names without any folder path (no slashes), since they are in the current directory.

-After the command starts executing it could take up to several hours depending on processing power.

-The "octave" command runs the autocorrelation script with a very large number of repetitions and an fft-len of 4194304 = 2^22 on a specific data file. The 100000000 value says to repeat many times. I believe this number is larger than the size of all the data, so the program runs until all the data have been analyzed. The 4194304 is a power of 2, and is the fft-len.
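A quick back-of-the-envelope check supports that belief. The Python sketch below assumes that "1.9 GB" means 1.9 x 10^9 bytes and that each iteration handles roughly one block of fft_len samples at 2 bytes per sample; the function name full_blocks is hypothetical.

```python
def full_blocks(file_bytes: int, fft_len: int, bytes_per_sample: int = 2) -> int:
    """Complete fft_len-sample blocks in a file of interleaved 8-bit I/Q data."""
    samples = file_bytes // bytes_per_sample  # one real byte + one imaginary byte
    return samples // fft_len


# Assumption: one "1.9 GB" data file is taken to be 1.9e9 bytes.
print(full_blocks(1_900_000_000, 4194304))  # 226
```

Even ten such files, or twice as many steps if the program advances half a block at a time, come nowhere near 100000000 iterations, so the loop should stop when the data run out.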

-To process all the data files of a data set in one run, you need only replace the file name with `ls *01.dat *02.dat *03.dat *04.dat *05.dat *06.dat *07.dat *08.dat *09.dat *10.dat | xargs`

-Make sure that the data files are stored in the folder you run Octave from. Only keep one data set at a time in this folder; otherwise the ls patterns could pick up files from different observations and mix them up.

-The "ls" command inside the backticks is evaluated before the Octave program is run. This is a feature of the bash shell, so this form only works in a bash shell, not in the standard Windows command prompt. The command generates a list of file names from the data in the directory where the program is run. Most setiQuest data sets are broken into many pieces, with as many as 10 different files, and the patterns pick up all files whose names end in XX.dat with XX = 01 to 10. Because ls prints one file name per line when its output is piped, the list is passed through "xargs", which joins the names onto a single line, before it is handed to the Octave script.

-The script writes its output to a file with the .autocorr extension in the same folder as the original data file. The following is an explanation of the .autocorr file:
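Because the backtick form needs bash, Windows users may prefer to build the file list another way. Here is a hedged Python sketch of the same selection and command line; select_dataset_files and build_command are hypothetical helper names, and the default values simply mirror the command shown earlier. Matching on the 01.dat to 10.dat endings, as the ls patterns do, leaves out the *hdrs.dat headers file, which a plain *.dat pattern would include.

```python
def select_dataset_files(names):
    """Like `ls *01.dat ... *10.dat`: keep the numbered data pieces, sorted by name."""
    endings = tuple(f"{k:02d}.dat" for k in range(1, 11))  # "01.dat" ... "10.dat"
    return sorted(name for name in names if name.endswith(endings))


def build_command(octave_exe, files, iterations=100000000, fft_len=4194304,
                  script="AutoCorrelation.m"):
    """Argument list matching the command line: one list entry per argument."""
    return [octave_exe, script, str(iterations), str(fft_len), *files]
```

To actually launch Octave you could pass the list to the standard library's subprocess.run from inside the data folder, for example subprocess.run(build_command(r"C:\Octave\3.2.4_gcc-4.4.0\bin\Octave", select_dataset_files(os.listdir("."))), check=True). Passing a list avoids any shell quoting issues.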


There are two columns of data in the output - the delay value followed by the autocorrelation value. The delays are calculated with this code:


% calculate delay
delta_t = 1/BW_actual;
delay = [1:fft_len];
delay = (delay - (fft_len/2+1)) * delta_t;


Because the bandwidth is specified in MHz, the delay values are in microseconds. If you follow the code you will see that the delay values are (nearly) symmetric, going from -delta_t * N/2 to delta_t * (N/2-1), where N = fft_len. Make sure that the output file really contains fft_len values (4194304 for the command above, or 1048576 if you used that fft_len). If the file appears to be truncated, try this:


wc -l my_output_filename


which counts the lines in the output (wc is a Unix command). The line count should equal the fft_len you used.
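The delay column can also be reproduced outside Octave to check an output file. This Python sketch mirrors the three Octave delay lines; the function and parameter names, and the 1.0 MHz bandwidth in the example, are made up here. It starts at -delta_t * N/2, reaches zero at the 1-based bin N/2 + 1, and ends at delta_t * (N/2 - 1).

```python
def delays(fft_len: int, bw_actual_mhz: float) -> list[float]:
    """Delay axis following the Octave delay code; MHz bandwidth gives microseconds."""
    delta_t = 1.0 / bw_actual_mhz
    # Octave index k runs 1..fft_len; zero delay falls at k = fft_len/2 + 1.
    return [(k - (fft_len // 2 + 1)) * delta_t for k in range(1, fft_len + 1)]


d = delays(8, 1.0)  # tiny example; real runs use fft_len = 1048576 or 4194304
print(d[0], d[-1], d[4])  # -4.0 3.0 0.0
```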

If the autocorrelation values from your run are all NaN, this usually means a divide by zero error. Consider this code, which is directly above the write out section:


% perform last FFT for AutoCorrelation
autocorr = fft(avg_pwr);
autocorr = autocorr/fft_len;
autocorr = abs(autocorr).**2;
autocorr = autocorr/autocorr(1);
autocorr = fftshift(autocorr);


The averaged power spectrum is Fourier transformed, divided by fft_len (this is actually a redundant operation, because of the normalization that follows), and then its magnitude is squared. There should be no negative numbers in the output after the squaring.

The real normalization is done next, dividing the results by the value in the first bin, which holds the zero-delay autocorrelation. This means that the zero-delay value in the file should be exactly 1, and no value should be larger than 1. Finally we perform an "fftshift". Because of the fftshift, the zero-delay autocorrelation should appear in bin number (N/2 + 1), that is, in the middle of the spectrum. I suggest that you take a look at the center bin and see if it is unity.
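The normalization and fftshift steps can be checked on a tiny made-up array. The Python sketch below re-implements them by hand (fftshift here is a hand-written stand-in for Octave's function that swaps the two halves of the vector); the input values are invented so the arithmetic is exact.

```python
def fftshift(values):
    """Hand-written stand-in for Octave's fftshift on a vector: swap the two halves."""
    half = (len(values) + 1) // 2  # ceil(N/2); equals N/2 for even N
    return values[half:] + values[:half]


def normalize_and_shift(autocorr):
    """Divide by the first (zero-delay) bin, then fftshift, as in the Octave code."""
    zero_delay = autocorr[0]
    if zero_delay == 0:
        # In Octave this division would silently produce NaN/Inf values instead.
        raise ZeroDivisionError("first bin is zero, so normalization would fail")
    return fftshift([v / zero_delay for v in autocorr])


# Made-up input with N = 8; the zero-delay value 8.0 sits in the first bin.
print(normalize_and_shift([8.0, 4.0, 2.0, 1.0, 0.5, 1.0, 2.0, 4.0]))
# [0.0625, 0.125, 0.25, 0.5, 1.0, 0.5, 0.25, 0.125]
```

The 1.0 lands at 0-based index 4, which is Octave bin N/2 + 1 = 5 for N = 8.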


Suggestion: Print out some values from the array avg_pwr and verify that most values are nonzero.


Suggestion: Print out the value of the first element in the autocorrelation array (used as divisor for normalization). This value should never be zero, but if it is, then something is wrong.
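The checks suggested above (line count, NaN values, and a center bin equal to 1) can be bundled into one pass over an output file. The following is a minimal Python sketch under two assumptions: that each line of the .autocorr file holds the delay and the autocorrelation value separated by spaces, tabs or a comma, and that the name check_autocorr_lines is ours, not part of the setiQuest tools.

```python
import math


def check_autocorr_lines(lines, fft_len):
    """Return a list of problems found in the lines of an .autocorr output file.

    Assumption: each line holds two columns, delay then autocorrelation value,
    separated by spaces, tabs or a comma.
    """
    problems = []
    rows = [line.replace(",", " ").split() for line in lines if line.strip()]
    if len(rows) != fft_len:
        problems.append(f"expected {fft_len} lines, found {len(rows)}")
    values = [float(row[1]) for row in rows if len(row) >= 2]
    if any(math.isnan(v) for v in values):
        problems.append("NaN values present (possible divide by zero)")
    centre = fft_len // 2  # 0-based position of Octave bin N/2 + 1
    if len(values) > centre and not math.isclose(values[centre], 1.0, rel_tol=1e-6):
        problems.append(f"centre bin is {values[centre]}, expected 1")
    return problems
```

For example, with open("my_output_filename") as f: print(check_autocorr_lines(f, 4194304)) prints an empty list when nothing looks wrong.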


Finally, I strongly suggest you create your own simple file on which to test the autocorrelation program. This can be done easily in any plain-text (ASCII) editor (remember that each character is stored as one byte). For example, if you type the following text into a file:


10000000


then when this is read by the program, it will see the following complex numbers (the ASCII codes of the characters "1" and "0" are 49 and 48):


49, 48

48, 48

48, 48

48, 48


Notice the first number is different from the rest. The values on each line are the real and imaginary part of the data. Knowing this much, it is fairly easy to prepare a repeating signal that will show up nicely in autocorrelation:


10000000100000001000000010000000100000001000000010000000100000001000000010000000100000001000000010000000100000001000000010000000


Copy the above string into a file. Notice that there are 64 samples in this file (128 bytes). Now use this data file as input and set the fft_len to 64 (or a smaller power of 2). This will generate a short output file. Since the repeat distance is 4 samples, the output should have peaks every 4 samples as well.
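If you would rather not type the 128 characters by hand, the test file can be generated and its expected autocorrelation checked directly. In this Python sketch the helper names are hypothetical, and the autocorrelation is computed the slow, direct way (a plain sum over circular shifts) rather than with FFTs, so it is an independent check of the program's output rather than a copy of it. Writing the bytes from code also avoids the trailing newline that some text editors add.

```python
def make_test_bytes(repeats: int = 16) -> bytes:
    """The test string: "10000000" repeated (8 bytes = 4 complex samples per period)."""
    return b"10000000" * repeats


def circular_autocorrelation(samples):
    """Direct circular autocorrelation r[m] = sum over n of x[n+m] * conj(x[n]); no FFT."""
    n = len(samples)
    return [sum(samples[(k + m) % n] * samples[k].conjugate() for k in range(n))
            for m in range(n)]


def peak_lags(samples):
    """Delays (in samples) at which |r| reaches its maximum."""
    r = [abs(v) for v in circular_autocorrelation(samples)]
    top = max(r)
    return [m for m, v in enumerate(r) if v == top]


raw = make_test_bytes()
# All bytes are ASCII (< 128), so signed and unsigned readings agree:
# "1" = 49 and "0" = 48, giving 49+48j, 48+48j, 48+48j, 48+48j, then repeating.
samples = [complex(raw[k], raw[k + 1]) for k in range(0, len(raw), 2)]
print(len(raw), len(samples))  # 128 64
print(peak_lags(samples))  # peaks at lags 0, 4, 8, ..., 60
```

One thing to expect: because the ASCII characters carry a large constant offset (48 in both the real and imaginary parts), the peaks stand only slightly above the background. In this direct sum the value at delays that are multiples of 4 is 296464, versus 296448 at the other delays, so look closely at the normalized output rather than expecting sharp spikes. To write the file, open("test.dat", "wb").write(make_test_bytes()) is enough.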


-After all the files are correlated they can then be loaded into octave to make graphs and images.


4. Analysis and Graphing in Octave

(coming soon)
