Frequency Domain measurement of Pulsar Period
From setiquest wiki
TITLE: Frequency Domain measurement of Pulsar Period
AUTHOR: Dave Robinson
DATE: 24th August 2010
CLASSIFICATION: Discussion Document
This paper briefly describes a method for accurately measuring Pulsar periods in the Frequency Domain. The technique relies on specific characteristics of a Pulsar signal within the Fourier domain.
As an example of the process, the data from the first setiQuest Data file (2010-05-07-psrb0329+54-8bit-1-of-5.dat) was processed, giving a result that is within 0.002% of the accepted period. A more accurate result could very probably be obtained from a longer time record, built by concatenating the data from the subsequent files in the sequence.
The process documented here is a two pass operation.
- An approximate repeat period, corresponding to an integer number of samples, is estimated.
- The exact repeat period is then determined using a widened sampling window stepped at the approximate repeat period.
To simplify the sampling arithmetic, time is measured in units called SetiSeconds, where one SetiSecond is the time spanned by 2^23 samples of data. This is approximately 0.96 seconds. Frequency is correspondingly measured in cycles per SetiSecond (cycles/Setisec), which is used in place of Hz.
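The unit conversions implied above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the figure of 0.96 seconds per SetiSecond is the approximate value quoted in the text, not an exact constant.

```python
SAMPLES_PER_SETISEC = 2 ** 23  # definition of the SetiSecond from the text

# Assumption: one SetiSecond lasts 0.96 s (the text says "approximately").
SECONDS_PER_SETISEC = 0.96


def samples_to_setisec(n_samples):
    """Convert a sample count into SetiSeconds."""
    return n_samples / SAMPLES_PER_SETISEC


def cycles_per_setisec_to_hz(freq):
    """Convert a frequency in cycles/Setisec into Hz."""
    return freq / SECONDS_PER_SETISEC


def hz_to_cycles_per_setisec(freq_hz):
    """Convert a frequency in Hz into cycles/Setisec."""
    return freq_hz * SECONDS_PER_SETISEC
```

With these assumptions, a frequency of 128 cycles/Setisec is a little over 133 Hz.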
It is hoped that an extension of this technique can be used as a basis for an automatic Pulsar detection algorithm.
The author welcomes any suggestions, comments or constructive criticism.
Frequency Domain Characteristics of a Pulsar
In general, the time domain representation of a Pulsar signal can be modeled as a finite width pulse occurring at very regular intervals. Invariably this signal is buried in Gaussian noise. What is needed is some characteristic that distinguishes the Pulsar signal from the noise component. If the exact period of the Pulsar is known, techniques such as synchronous integration can be used to pull the signal out of the noise. For the detection of unknown Pulsars, however, the period is not available, and some other feature must be used to separate the signal from the noise.
The technique outlined here exploits the differences between the noise and the signal in the frequency domain.
- The Gaussian noise is essentially white, which means that it is distributed evenly throughout the spectrum.
- The finite width of the individual Pulsar pulses means that the spectrum of the pulse is compact: the longer the pulse lasts, the narrower the frequency band that actually contains the pulse information.
- Because the pulse repeats itself continuously within the measurement frame, the resulting spectrum will be sparse within its limited frequency band. The spectrum will look like a collection of uniformly spaced 'spikes', whose separation is the pulse repetition frequency and so gives a direct measurement of the repeat period of the Pulsar.
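The last two points can be checked on a toy signal. The sketch below is an assumption-laden illustration, not the processing used for the real data: it builds a hypothetical noise-free pulse train (512 samples, one 4-sample pulse every 32 samples) and uses a naive transform written with the standard library, where a real analysis would use an FFT routine.

```python
import cmath
import math


def dft_magnitudes(signal, max_bin):
    """Magnitudes of DFT bins 1..max_bin, computed naively."""
    n = len(signal)
    mags = []
    for k in range(1, max_bin + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


# Hypothetical toy signal: 512 samples, a 4-sample pulse every 32 samples.
N, PERIOD, WIDTH = 512, 32, 4
pulse_train = [1.0 if (t % PERIOD) < WIDTH else 0.0 for t in range(N)]

mags = dft_magnitudes(pulse_train, N // 2)

# Bins that hold non-negligible energy (mags[0] is bin 1, hence start=1).
spike_bins = [k for k, m in enumerate(mags, start=1) if m > 1e-6]

# Expected spike spacing in bins: record length divided by pulse period.
spacing = N // PERIOD
```

In this toy case every entry of `spike_bins` is a multiple of `spacing`, and the strongest spikes sit at the low frequency end, which is the behaviour described in the list above.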
Thus the energy in the received data due to the noise component is spread right across the spectrum, while the energy from the Pulsar itself is concentrated in the low frequency region of the spectrum and limited to a finite number of equally spaced spectral lines. The question then is whether this compaction of the Pulsar energy raises the signal visibly above the ambient noise level. Clearly the process is not going to be as effective as detecting a pure sine-wave, where the energy of the signal is compacted into a single spike (or two if you include the negative frequency region). Nevertheless, the following section shows that for the particular Pulsar analysed here, the answer is an unequivocal yes.
This graph shows the low frequency part of the positive frequency region of the spectrum of the Pulsar. Several interesting features can be seen. The first is a significant drop in the noise level at about sample 16000. This is entirely due to the data having been low-pass filtered at a frequency of 128 cycles/Setisec for a previous experiment. The next features are two sinusoidal spikes, one at about 7000 and another at about 14000, which correspond to frequencies of about 55 Hz and 110 Hz. The source of these signals obviously needs investigation (it is interesting that they appear to be harmonically related, and to be situated just at the point where the Pulsar signal effectively stops). Most important for the purpose of this document is the spiked distribution of the Pulsar itself, exactly in the form that Fourier theory tells us it should take.
Analysis of the Pulsar Data
The first step was to isolate the Pulsar spectrum, and the second was to measure the spacing between its spectral lines. The spectrum was isolated by extracting a sub-matrix running from the start of the spectrum to sample 6800 (just short of the first mystery spike). The data vector was then augmented with a second column containing the sample number. As can be seen from the graph, the largest data values in this region are those of the Pulsar harmonics, so these were gathered together by sorting the two column matrix in descending order of the first column (the amplitude values). Thus the biggest spikes were at the top of the list and the smallest at the bottom. Note, however, that the position of each sample is still tracked by the associated sample number column, which is rearranged along with the amplitude data in the sorting process.
Only the data containing the harmonics are needed, so the rest can be discarded. This is accomplished by creaming off the top section of the sorted data. The retained data can then be put back into its original order by sorting the shortened matrix on the sample number column in ascending order. The information required is the spacing between the spikes, which is obtained by subtracting from each entry in the sample number column the entry that precedes it.
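The sort, truncate, re-sort and difference sequence described above might look like the following sketch. The function name and the `keep` parameter are hypothetical: the text only says that the "top section" of the sorted data is retained, without giving a count.

```python
def harmonic_spacings(amplitudes, keep):
    """Return (bins, spacings) for the `keep` largest samples.

    `keep` is an assumed parameter; the source does not say how many
    of the sorted rows were retained.
    """
    # Pair each amplitude with its sample number (the second column).
    table = [(amp, idx) for idx, amp in enumerate(amplitudes)]
    # Sort on amplitude, largest first; the sample numbers travel along.
    table.sort(key=lambda row: row[0], reverse=True)
    # Cream off the top section, then restore the original ordering.
    bins = sorted(idx for _, idx in table[:keep])
    # Difference each sample number from the one before it.
    spacings = [b - a for a, b in zip(bins, bins[1:])]
    return bins, spacings


# Hypothetical toy spectrum: a flat floor with four spikes 5 bins apart.
spectrum = [0.1] * 25
for position, height in ((5, 9.0), (10, 7.0), (15, 8.0), (20, 6.0)):
    spectrum[position] = height

bins, spacings = harmonic_spacings(spectrum, keep=4)
```

Here `bins` comes back in ascending order even though the spikes have different heights, because the sample number column is used for the second sort.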
Generating an approximate repeat spacing
A 'phantom' spike may be present in this region of the spectrum, for example from pick-up of a sinusoidal source that generates an unrelated spectral line in the sorted data. To deal with this case, a two step process was used. The first step simply took the sample number difference vector and found its mode; that is, the spacing that occurred most often in that data was assumed to be the repeat spacing. This completely removes the effect of a 'phantom' spike, if one exists. However, the technique is limited to an integer sample spacing, and the chances of the Pulsar having a period that is directly related to the length of time over which it was observed (and hence to the spectral resolution of our graph) are very remote indeed.
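A minimal sketch of the mode step, using a made-up difference vector in which a 'phantom' spike has split one 164-sample gap into two shorter gaps. The data values and the function name are illustrative assumptions only.

```python
from statistics import mode


def approximate_spacing(spacings):
    """Most frequently occurring spacing in the difference vector."""
    return mode(spacings)


# Hypothetical difference vector: a phantom spike splits one gap into 100 + 64.
spacings = [164, 164, 163, 164, 100, 64, 164, 164]
approx = approximate_spacing(spacings)
```

The two spurious gaps each occur only once, so they have no influence on the value returned.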
For this set of data the approximate repeat spacing came out at 164 samples. Experimentation shows exactly what we expect: the integer sample spacing, whilst very close to the harmonic spikes, does not correspond exactly to all of them. This was dealt with by a second stage of processing.
Generating the exact Pulsar repeat spacing
A sampling window was constructed, centered on sample 164 and covering a band of 30 samples. This window was searched for the largest amplitude signal, and its position was stored in a two column array together with the measurement number (1 in this case). The measurement window was then moved forward by 164 samples, so that it was centered on sample 328. Again the window was searched for the largest signal, and its position was noted in the second row of the array, along with the measurement number (2 in this case). The process was repeated for all valid data.
The resulting array was used to generate a linear graph, whose slope is equal to the repeat spacing of the Pulsar harmonics. One or two of the values could be out, for example if there was a 'phantom' spike in the capture window, or an atypical noise spike that was larger than the harmonic itself. This simply generates a perturbation on the linear graph, whose effect on the measurement can be largely eliminated by computing the slope of the data with a linear regression algorithm. For the Pulsar used here the resulting linear graph is shown below.
Here the X axis is simply the measurement number, and the Y axis is the peak position within the corresponding measurement window.
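The window search and regression can be sketched as below, run here on a synthetic spectrum and not on the Pulsar data. Several details are assumptions: the function names are hypothetical, the window is taken as `window // 2` samples either side of its centre, and the slope is an ordinary least-squares fit written out by hand.

```python
def track_harmonics(amplitudes, approx_spacing, window=30):
    """Find the peak near each multiple of `approx_spacing`.

    Returns rows of (measurement_number, peak_position).
    """
    half = window // 2
    rows = []
    m = 1
    while True:
        centre = m * approx_spacing
        lo, hi = centre - half, centre + half
        if lo < 0 or hi >= len(amplitudes):
            break  # window no longer fits inside the valid data
        peak = max(range(lo, hi + 1), key=lambda i: amplitudes[i])
        rows.append((m, peak))
        m += 1
    return rows


def regression_slope(rows):
    """Least-squares slope of peak position against measurement number."""
    xs = [m for m, _ in rows]
    ys = [pos for _, pos in rows]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical spectrum: spikes every 20.4 bins, rounded to whole bins.
toy = [0.1] * 215
for m in range(1, 11):
    toy[round(20.4 * m)] = 1.0

rows = track_harmonics(toy, approx_spacing=20, window=8)
slope = regression_slope(rows)
```

In this toy case the integer estimate of 20 is refined to a slope close to the true spacing of 20.4, even though every individual peak position is itself a whole number of bins.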
The slope evaluates to 163.91 samples, which corresponds to a pulse repetition period of 0.71453 seconds. This compares well with the value of 0.71452 seconds quoted by the SETI experts in this Forum.
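The conversion from harmonic spacing to period follows from the spectral resolution: one spectral sample corresponds to a frequency of one cycle per record length, so harmonics spaced by a given number of samples imply a period of the record length divided by that spacing. The sketch below illustrates this. Note that the text does not state the record length; the value of 122 SetiSeconds used here is purely an assumption, chosen because it is consistent with the quoted figures, and the function names are hypothetical.

```python
SECONDS_PER_SETISEC = 0.96  # approximate value given in the text


def period_from_spacing(spacing_bins, record_setisec):
    """Pulse period in seconds from the harmonic spacing in spectral samples.

    One spectral sample is 1 / record_length in frequency, so harmonics
    `spacing_bins` apart repeat every record_length / spacing_bins.
    """
    record_seconds = record_setisec * SECONDS_PER_SETISEC
    return record_seconds / spacing_bins


def percent_error(measured, accepted):
    """Relative error of `measured` against `accepted`, in percent."""
    return abs(measured - accepted) / accepted * 100.0


# Assumption: 122 SetiSeconds of data (the record length is not in the text).
period = period_from_spacing(163.91, record_setisec=122)

# Figures quoted in the text: 0.71453 s measured, 0.71452 s accepted.
quoted_error = percent_error(0.71453, 0.71452)
```

Under that assumed record length the computed period agrees with the quoted 0.71453 seconds to within the rounding of the slope, and the two quoted periods differ by less than the 0.002% stated in the introduction.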