I just wanted to leave a little note describing interrelationships between various algorithms associated with autocorrelation.
Firstly, I point you to the convolution theorem (http://en.wikipedia.org/wiki/Convolution_theorem) which is the heart of a FAST method for performing convolutions.
Algorithm 1) Next I'll point out that an autocorrelation is the same as taking a signal and convolving it with itself. Using FFT (or another approximate Fourier Transform), the autocorrelation of a signal s(t) can be computed like this:
Autocorrelation of time spectrum = | InverseFFT( | FFT( s(t) ) |^2 ) |^2
This is a nested set of operators that start from the inside and moving out. First the signal is FFT'd to the frequncy domain. The result is squared. Then an inverse FFT brings you back to the time domain (delay space) resulting in the complex-valued autocorrelation. While not exactly necessary, the autocorrelation is squared again so that it can be easily plotted on a 2-D graph.
Notice that only 2 operators are required. An FFT (the inverse is essentially another FFT) and the magnitude-squaring operator, or just "squaring" for short. Squaring means that each sample is replaced by the sample's product with its own complex conjugate.
Algorithm 2) These same two operators can be re-ordered to give other quantities of interest. Suppose you would like to see the autocorrelation of the frequency spectrum S(f) = FFT( s(t) ). I'll leave it as an exercise for the student to prove
Autocorrelation of frequency spectrum = | FFT( |s(t)|^2 ) |^2
That is, you take the raw data, square it, and then apply an FFT. For the sake of communication with the human mind, this result is squared to make it plottable on a 2-D graph.
Algorithm 3) Backing up a little, it is of interest to examine the frequency spectrum of Algorithm 1. This is, simply
| FFT ( s(t) ) |^2
(note relationship to algorithm 1). This is called the "power spectrum" of the time series signal.
Algorithm 4) By analogy, it is interesting to examine the power versus time, simply
| s(t) |^2
In this short note, I draw attention to the similarity of the "original 4" analysis methods suggested in the GSoC proposal idea. The last operation is always to square the resulting data. The stuff inside is built of alternating FFT's and squaring operations.
To test your understanding, suppose you have two programs; one that takes the FFT (or inverse FFT) and one that performs magnitude-square operation in place. Write down the four algorithms using only these two operators connected by pipes. Feel free to post your results here.
I'll post more soon on 2-D domains (waterfall and variants) related to these algorithms and other algorithms.
Hi -- a little more discussion:
Detection - When we say we've "detected" a signal, we usually mean that we've run a thresholding algorithm across the signal result from a given algorithm. All thresholds are computed exactly the same way (one more code block). First, characterize the signal by computing its mean and standard deviation (both are really estimates, but they're pretty good estimates). Frequently we call the standard deviation "sigma" because of the Greek letter that is conventionally used to denote this quantity.
Set a threshold like 5-sigma. Any data point that is more than 5 sigma away from the mean value is a "detection." Setting the threshold value (how many sigma) is an art.
Algorithm 4) Pulses: Is useful for discovering strong pulses in the data. Just use a threshold operator; any point in the data which is >5 sigma from the mean value is a "pulse." For the record, pulses that come from far away sometimes suffer "dispersion" which is a topic for another day.
Algorithm 3) Narrow band tones or carrier waves: This is the traditional SETI search, looking for something like a radio station transmitting "dead air," (silence). Narrow tone signals stand out on a background of cosmic noise.
Algorithm 2) Amplitude Modulation: This algorithm is particularly good for spotting AM radio transmitters, especially if they are periodic. Not sensitive to pure phase modulated signals.
Algorithm 1) Phase Modulation: In general, if the phase is modulated in a regular way, then you'll see something here. Our favorite satellite transmitter for phase modulation is GPS (type BPSK into wikipedia). Purely AM modulations do not show up very well, whereas periodic FM modulation shows up nicely.
It is late so check my descriptions. What have I left out?
Do you perform any iterative sigma clipping on your data before ascertaining your mean and standard deviation or are the predicted signals so weak that their impact on the data makes this unnecessary?
I am also interested as to why you choose a 5 sigma detection over the "traditional" 3 sigma threshold used to quantify a statistically relevant result. Is this limit obtained empirically or is there a theoretical framework for it?
Clipping data during digitization: First you have to know that in a radio telescope, the receiver noise (dark noise) ususally dominates any observation. The radiation from the sky is << than this. So the gain of the telescope is set to optimize the measurement of Gaussian distruibuted noise (in the complex domain). Also, each sample of this noise is stored as 8-bit real + 8-bit imaginary value.
(Question from the alert student, why do we use a complex representation when the electric field is intrinsically real-valued? The incoming real-valued data from the digital sampler is processed with a Hilbert transform so that subsequent math operations are "simplified" to use exponential notation.)
With 8 bits in amplitude, you have 255 available values. How does one set the input amplifiers to make the best use of these bits (retain maximal information) assuming the values represent a linear scale? (Nonlinear scales retain more info, but are harder to process). For a Gaussian signal where mean amplitude ~ sigma, the correct answer is to put the gain such that the average value is ~15, depending on some details. This means that you have 15 values to represent the mean value and samples smaller than the mean. On the high end you have 255/15 = 17 sigma headroom for big samples. The chances of a 17 sigma event are negligibly small in our observations, but since man-made transmitters show up from time to time and blast the receiver, we leave extra headroom for RFI events.
That is, the data are clipped at 17 sigma of the mean reciever noise.
Choosing the signal threshold: To understand how to set the detector threshold (how many sigma from the mean is considered an event), we have to look at the probability of false alarm based on reciever noise. This is a lower limit on where we place our threshold. In a 12 second observation at 8 MHz bandwidth (typical), there are 100 million samples. So you have a good chance of seeing a one in 100-million event in every such observation, just from the noise alone. Converting 1/100,000,000 to sigma makes this a 6-sigma event.
To put it another way, for a 3-sigma detection threshold (1/400) you can expect to see 240,000 events in any observation by pure noise alone! These are not real signals, just receiver noise. If there is a real signal in there, then it will be difficult to find. If we set the threshold to 5-sigma (1/2,000,000), then we'll see approximately 48 detections due to noise alone in the same data. This is still probably too much. The point is that you want to identify signals that could not appear due to noise alone. At 6-sigma, every observation has one detection by noise. I think you can see why ordinarily the fully-automated system uses 8-sigma thresholds. This pretty much eliminates noise hits, yet still gives many reports of signals that are evidently aritificial. Most (or all) are human-generated RFI (not accounted for in Gaussian statistics).
In our reductions, we will have to adjust our thresholds to reduce the number of detections to a managable level, in the presence of RFI. That is, there is an "art" to tuning the threshold to prevent you from spending all your time chasing down non-SETI signals. That is just the limit of our ability right now.
As Jill Tarter's overarching philosophy of building detectors for classifying more and more signal types, we may be able to stop hand-tuning and use statistical models to adjust thresholds more scientifically.
Out of curiosity: How does choosing the average as 15 retain maximal information?
There are two things we wish to optimize in our system: The best possible sensitivity to weak signals and the dynamic range to resist the deleterious effects of strong signals (RFI). This trade off is complicated by the fact that we have an analog-to-digital converter between our analog receivers and the digital signal processing.
To most accurately measure a weak signal, we should "use most of the 8 bits most of the time," which allows us to measure small signal variations accurately. To be most resistant to RFI, we should put attenuation in front of the digitizer to bring the RFI "back into range," but this reduces our sensitivity to weak signals.
I can't find the reference I am looking for (will post if I find it), but there is a mathematical theorem which tells you how to optimize the analog signal into your digitizer for the most accurate measurement of weak signals. With a 1-bit digitizer you have only two values to choose from. It turns out that adjusting your gain so that any positive signal is registered as +1.4 and any negative signal is registered as -1.4 will give you the optimal digitization of the signal, as measured in terms of keeping the most information.
With this assignment, 1-bit digitization retains 64%of the information in a cross-correlation figure of merit (not described here). Similar optimizations result in 88% efficiency for 2-bit digitization (4 values) and 96% efficiency for 3-bit digitization (8 different values are recorded). Higher number of bits get you diminishing returrns. Notice that expression of 11 different values requires a bit more than 3 bits (log2(11) = 3.5 bits), giving you something like 98% sensitivity to weak signals.
Now, thanks to the digitizer industry, an 8-bit digitizer is about the smallest (and fastest) digitizer you can buy off the shelf. But we need only 3.5 bits! So we use all the rest of the bits for dynamic range. With 11 = 1 standard deviation, we get 127/11 = 11 standard deviations (setting aside one bit for sign). With an 8-bit digitizer we use 1 bit for sign, 3.5 bits devoted to weak signals and 3.5 bits devoted to resistance against RFI.
I hope this discussion helps to answer your question. If you have more interest, look up (http://adsabs.harvard.edu/full/1970AuJPh..23..521C). Cooper was not worried about RFI, but the small number of bits part is the difficult part.
Is it not simply (in pseudo-code format):
1) cat some_input_data.dat | FFT -stdin | SQUARE -stdin | FFT -stdin -inverse | SQUARE -stdin -o some_output_data.dat
2) cat some_input_data.dat | SQUARE -stdin | FFT -stdin | SQUARE -stdin -o some_output_dat.dat
3) cat some_input_data.dat | FFT -stdin | SQUARE -stdin -o some_output_data.dat
4) cat some_input_data.dat | SQUARE -stdin -o some_output_data.dat
-stdin tells the routines that the information is being read from STDIN (only really required if each routine had the additional option to accept an input file, e.g. with a flag of -i). This is not necessary if we use cat.
-o [FILENAME] tells the routine to write the information to an output file, presumably ASCII.