Has anyone thought of techniques to pre-process the raw data using GA/GP techniques in a way similar to a kind of digital Petri dish? It occurs to me that certain elements of pattern could be grown from the seeds of the raw data and then algorithms used to train an 'observer' application to spot the emergence of order from chaos, classify it, and alert a human to look at the evolution of the signal data over time.
In this way, we may be able to trace back to a faint signal from an emergent property of the data which emphasises some aspects, removes others, and enhances possible signals that might be lost in the swamp of information. It might also be easier to spot a signal from the rest of the general mish-mash if it's moving/growing/emerging?
[I think there are some good ideas here, but I seem to be having issues expressing them in suitable terms...]
I have spent many a happy hour discussing these algorithms with my son, who uses them extensively for computational chemistry (pharmacology) applications. To really make them work usefully, you need to understand what sort of signal you are trying to extract. Key to their operation is the so-called fitness function, which you, the searcher, need to define with great precision, based upon intimate knowledge of how your data needs to be processed. If you don't use a meaningful fitness function, the algorithm just won't generate any useful output; in all probability it will generate meaningless output, as the search evolves into 'local potential wells' rather than towards the output that you are actually looking for. Here my son has the advantage over SETI searchers - he understands how his molecules bind together, and so can construct his fitness functions with all of the quantum mechanics that he needs. We have absolutely no idea what a signal from ET looks like, and so have no idea how to generate a meaningful fitness function. He also has the advantage of access to an array processor containing 1024 processors, which I guess very few of us have.
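To make the fitness-function point concrete, here is a minimal genetic-algorithm sketch (my own illustration, not anything from the original posts). Everything hinges on `fitness()`: here it scores closeness to a known target pattern, which is precisely the luxury a SETI search does not have. The target, population size and rates are all arbitrary assumptions.

```python
# Minimal GA sketch: the whole search is only as good as fitness().
import random

random.seed(0)  # reproducible run for this illustration

def fitness(candidate):
    # Toy fitness: similarity to a known 8-bit target pattern.
    # A real SETI search would need a score for "how signal-like is this?",
    # which is exactly the part we don't know how to define.
    target = [1, 0, 1, 1, 0, 1, 0, 0]
    return sum(1 for a, b in zip(candidate, target) if a == b)

def evolve(pop_size=30, genome_len=8, generations=50, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]                 # elitist selection
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, genome_len)       # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < mutation_rate)
                     for bit in child]                  # bit-flip mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
```

Swap `fitness()` for something that does not reflect what you actually want and the same loop converges just as confidently on junk, which is Dave's point about local potential wells.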
Having said that, I agree with you that there are areas where such techniques might be applicable. For example, in automatic pulsar identification you would expect the pulsar signal to be easily identified in a search field of repetition rate against dispersion, and this may well be amenable to a genetic solution; whether that would be the best route forward, I currently don't know.
I agree with Dave: the fitness function is the major requirement for a GA to be worthwhile. I would prefer to experiment with Kohonen SOMs for classifying whether the data contains something of interest, rather than with a modified form of Particle Swarm Optimisation for enhancing potential signals in a waterfall. I'm hoping that in January I will have a chance to test some of these theories.
The use of a SOM seems an interesting possibility. I have used Kohonen-Grossberg SOMs in a Content Based Image Retrieval system which, for the limited number of images I had, worked very reliably. I found that the key to getting the full power out of these things was to pre-transform the source data into a domain where the parameters I was interested in tended to cluster tightly; this often required dimensionality reduction techniques such as Principal Component Analysis. I have spent quite an amount of time trying to cast the waterfall data into such a high-compaction domain, but due to a gross failure of imagination on my part I have been unable to identify one. The best I have been able to come up with was applying the Radon Transform to the data, a solution which Sigblips points out can be carried out much more effectively using his algorithms in Baudline.
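The pre-transform step Dave describes can be sketched in pure NumPy: centre the feature vectors and project them onto their leading principal axes so that similar inputs cluster tightly before being handed to a SOM. The synthetic data below (16-dimensional samples that really only vary along 2 latent axes) is my own stand-in, just to show the mechanics.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top n_components principal axes."""
    Xc = X - X.mean(axis=0)                  # centre each feature
    # SVD of the centred data: rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(42)
# 200 samples of 16-D data with only 2 real degrees of freedom plus noise
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 16))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 16))

reduced = pca_reduce(X, 2)    # shape (200, 2): compact, SOM-ready vectors
```

The hard part, as the post says, is not the PCA itself but finding a domain for the waterfall data in which interesting signals actually do cluster.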
I would be VERY interested if you have solved this problem and can see how to cast the standard SETI data into a form which would allow it to be classified efficiently using a SOM, and I would be happy to discuss it here.
Threads like this one are what I had hoped would be the norm on this forum: ones in which we all get a chance to learn, contribute, and possibly end up generating a unique SETI detection algorithm between us.
The Radon Transform application comment Dave is referring to is described here:
Thank you Dave & Sigblips for the informative feedback.
SOMs have been used quite successfully for handwriting analysis and OCR, but taking a complete data snapshot and trying to classify it in one go is too much, considering the scarce information we have regarding the noise.
My idea was to break the data into small A x B matrices and have the SOM classify these. During experimentation this could be a self-adapting SOM, but for speed a pre-trained SOM would be ideal. This would effectively increase the granularity of the data sample while keeping the important characteristics of each section; the process could also be repeated a number of times to give an overall classification of the complete data snapshot.
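One possible implementation of the tiling idea (my sketch, under the assumption that the snapshot dimensions are exact multiples of the tile size; otherwise pad or crop first): break the 2-D snapshot into non-overlapping A x B tiles, each flattened into one training vector for the SOM.

```python
import numpy as np

def tile(snapshot, a, b):
    """Split a 2-D array into non-overlapping a x b tiles,
    returning one flattened feature vector per tile."""
    h, w = snapshot.shape
    assert h % a == 0 and w % b == 0, "pad or crop the snapshot first"
    # reshape/swapaxes trick: (h//a, a, w//b, b) -> grid of (a, b) tiles
    tiles = snapshot.reshape(h // a, a, w // b, b).swapaxes(1, 2)
    # flatten each tile into a SOM input vector
    return tiles.reshape(-1, a * b)

snap = np.arange(64).reshape(8, 8)
vectors = tile(snap, 4, 4)    # 4 tiles, each a 16-element vector
```

Each row of `vectors` then plays the role of one "character" for the SOM, much as glyph patches do in the OCR applications mentioned above.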
I found this paper that is similar to what I have in mind.
I have downloaded the paper you referenced, but need to study it in detail.
Looking at the images at the bottom of the paper, I can see where you are coming from - some look uncannily like the waterfall display. In one respect the tile problem is more difficult, as the defect can occur in any direction, whereas (I think) the waterfall would have a much smaller range of validity, since valid signals would lie closer to the vertical. What the technique also has going for it is that it can easily cope with Jill's squiggle signals.
Bearing in mind that I have so far only skimmed the paper, and not fully understood it, what concerns me is the amount of processing it entails. First the data needs to be Fourier transformed to get it into the 'waterfall domain'; then it requires application of the Gabor transform, which I guess is of similar complexity, and a Canny edge detection, which as a second-generation edge detector is fairly complicated compared to convolutional edge detectors such as Sobel, and requires a bit of 'tuning' to optimize it for the edge you are trying to detect.
However, none of this really puts the technique out of bounds, because each of these processes can be pipelined, and may be amenable to the 'VisionNet' concept of multiple cheap machines connected via fast Ethernet: the first rank doing the Fourier transform, the second the Gabor transform, and the next the Canny edge detection. I discussed this in the thread highlighted by Sigblips on my Radon Transform thread.
There may of course be mileage in combining this technique with the wavelet-based methodology I outlined in the following blog:
Out of interest, one of the non-SETI projects I have in mind is to build a cheap 'VisionNet' system out of a few Raspberry Pi units; the fact that they can run Python, NumPy, SciPy etc., and come fully equipped with Ethernet ports, suggests that they might make a useful DSP chain. Alas, my pension doesn't stretch to a string of Pentium machines, as we had when we built the original, but at 25 GBP a shot the Pi might make a good experimental basis for one.
My apologies for such a delayed reply, but other commitments have kept me busy.
I have downloaded a few data files as well as Sigblips' excellent Baudline (this thing can do nearly everything but make me coffee in the morning!). There is a lot I need to understand and learn regarding signal processing, so please bear with me if I miss the correct terminology or state the obvious at times. My forte lies more in statistical data analysis, so this is a whole new field for me, but the enjoyment of these types of challenges is what drives me.
I personally think that Sobel or Canny edge detection might be overkill for this type of processing, and I will continue to experiment with a faster alternative. I experimented, with reasonable success, with the following method: number the pixels of a 3x3 tile 1,2,3 / 4,5,6 / 7,8,9 and compute result1 = Abs((1 + 9) - (3 + 7)) and result2 = Abs((2 + 8) - (4 + 6)). These outline the points of interest, but it's still nothing to write home about.
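Patrick's quick edge measure, implemented directly as described (the choice of taking the larger of the two results is my assumption; he doesn't say how the two are combined):

```python
import numpy as np

def quick_edges(img):
    """For each interior 3x3 neighbourhood numbered 1..9 row by row:
       result1 = |(p1 + p9) - (p3 + p7)|   (diagonal difference)
       result2 = |(p2 + p8) - (p4 + p6)|   (cross difference)
    Output is the larger of the two (assumed combination rule)."""
    img = img.astype(float)
    out = np.zeros_like(img)
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            p = img[y - 1:y + 2, x - 1:x + 2]    # 3x3 neighbourhood
            r1 = abs((p[0, 0] + p[2, 2]) - (p[0, 2] + p[2, 0]))
            r2 = abs((p[0, 1] + p[2, 1]) - (p[1, 0] + p[1, 2]))
            out[y, x] = max(r1, r2)
    return out

# A vertical bright line produces a strong response on and beside it:
test_img = np.zeros((5, 5))
test_img[:, 2] = 10.0
edges = quick_edges(test_img)
```

Note that the strongest response lands on the line itself (the cross difference fires on both flanking pixels at once), so this behaves more like a line detector than a classical edge detector, which may actually suit the waterfall application.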
I have now taken some of the interesting sections of the signals (I used Jill's Epsilon Eridani signal and 2010-05-14-kepler04), broken them up into 4x4 tiles which I am currently categorising by hand, and will then train the SOM using them. Once this part is done, it should be easy to apply the trained SOM to most of the signals.
With reference to creating the DSP chain with Raspberry Pis: my feeling is that the more intense DSP only needs to take place on the sections of data that have flagged an interest, instead of all 8 MB of sample space. Thus 95% of the chaff can be eliminated, with the main CPU cycles doing the in-depth analysis on the sections that need it.
I am thinking aloud here, so don't flame me too hard if I have got it wrong. I take it that you are applying your edge detector to the data in what I call the 'waterfall domain', i.e. the frequency domain in the X direction and the time domain in the Y direction?
The edge detectors you are using are relatively small, and thus may be prone to generating a high rate of mis-hits in the high-noise environment of the data. You may do better with a larger array. There is a family of convolutional edge detectors that I have used, whose name has currently slipped into the mists of time, which have the property of differentiating in one direction (i.e. calculating the gradient, which is what you want) whilst integrating in the orthogonal direction (i.e. reducing the noise). Now, you might argue that because your edge detectors appear computationally simpler than convolution they should be faster - but that might not be true. Convolution is a long-winded operation if carried out directly; however, if you check any DSP text you will find a process called Fast Convolution. Here you Fourier transform the filter (a once-only operation) and Fourier transform the waterfall, then simply do a pixel-by-pixel multiplication followed by an inverse transform. This might sound like a lot of work, but thanks to the O(n log n) cost of the FFT it is a lot faster than doing a straight convolution.
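The Fast Convolution step Dave describes can be sketched in NumPy as follows: zero-pad filter and image to the full linear-convolution size, FFT both, multiply point-by-point, and inverse transform. The naive direct version is included only to check the result against.

```python
import numpy as np

def fft_convolve2d(img, kernel):
    """2-D linear convolution via the FFT ("Fast Convolution")."""
    # pad to the full output size to avoid circular wrap-around
    s0 = img.shape[0] + kernel.shape[0] - 1
    s1 = img.shape[1] + kernel.shape[1] - 1
    F_img = np.fft.rfft2(img, s=(s0, s1))
    F_ker = np.fft.rfft2(kernel, s=(s0, s1))  # filter FFT: do once, reuse
    return np.fft.irfft2(F_img * F_ker, s=(s0, s1))

def direct_convolve2d(img, kernel):
    """Naive reference implementation, for verification only."""
    s0 = img.shape[0] + kernel.shape[0] - 1
    s1 = img.shape[1] + kernel.shape[1] - 1
    out = np.zeros((s0, s1))
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            out[i:i + img.shape[0], j:j + img.shape[1]] += kernel[i, j] * img
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))
ker = rng.normal(size=(5, 5))
fast = fft_convolve2d(img, ker)
slow = direct_convolve2d(img, ker)   # agrees with fast to rounding error
```

For a K x K kernel on an N x N image the direct sum costs O(N²K²), while the FFT route costs O(N² log N) regardless of kernel size, which is why larger, noise-integrating kernels come almost for free.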
What is going through my mind is that the 'waterfall domain' data has already been Fourier transformed once, and now we are going to transform it again, ready for the Fast Convolution. With the exception of a scaling factor, double Fourier transforming a real function is the same as doing a forward then a reverse transform - that is to say, it is close to a null operation. So (I haven't tried this) if you take the original time-domain data (converted to magnitude or power to remove the complex number format) and, instead of Fourier transforming its rows, you Fourier transform its columns, you should end up with something not a million miles away from the Fourier transform of the waterfall plot. Now multiply this by the FFT of your convolutional edge detector, then inverse transform the resulting 2D array, and hopefully you should end up with something like what you are looking for.
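The double-transform observation can be checked numerically. For a real length-n sequence x, applying the forward DFT twice returns n times a "flipped" copy of x (index 0 stays put, the rest reverses), i.e. a forward-then-inverse transform up to scaling and a coordinate flip - which is the caveat hiding behind "not a million miles away":

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
double = np.fft.fft(np.fft.fft(x))     # forward DFT applied twice

n = len(x)
flipped = np.concatenate(([x[0]], x[1:][::-1]))   # x[(-k) mod n]
# double == n * flipped, with negligible imaginary residue for real input
```

So the trick saves a transform pass, at the cost of a known scale factor and an index reversal that the bookkeeping has to undo.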
I have not considered some details here, such as configuring your SOM, and haven't given you data on the larger edge detectors. I need to do some archaeology to dig that data out, but I didn't want to go through such a process if I have misunderstood your methodology.
There is a fair whack here for me to chew on, I'll reply as soon as I have something worthwhile.
Thanks for your continued thoughts on the signal detection problem.
Remember that in waterfall plots, signals that are absolutely vertical will be locally generated RFI, because they would have to be locked to the observatory frequency standard in order to avoid the drift that Earth's diurnal rotation impresses on signals from the sky. (There is a very small probability that a signal might have some intrinsic frequency change that exactly cancels the diurnal term over ~100 sec, but we ignore that - and yes, I'm familiar with Paul Horowitz's former META search, which precisely removed Earth's motion with respect to several standards of rest in order to use simple summation in individual channels to trigger events above threshold.) We don't chirp our LO, preferring to use zero drift as a discriminant to clean out the digital clocks and other noise that even a very clean observatory generates. Our current DADD algorithm is optimized for detection of straight-line patterns up to a maximum +/- angle, and it supposes that the next sample in the signal will be moving in the same direction as previous samples - thus it doesn't work very well on squiggles that abruptly change direction. It also doesn't do well as what Ron Bracewell used to call the 'cartoon detection algorithm', unless the pattern, so easily comprehended by the human eye/brain, has linear components with lots of power in them.
Thanks, Jill, for your comments. I had understood that vertical lines were very unlikely to be anything but local signals, and realized that the most interesting targets would be those exhibiting some form of slope. I was interested in your point that you only consider a limited range of slopes - could you let us know what that range is? It may well help Patrick in generating his SOMs, in as much as it could limit his search-space specifications.
In principle there is no reason why a line detector should not be configured to be selective to a sloped line rather than a vertical one. In the case of most convolutional filters, it is just a matter of rotating the filter to the required angle prior to running the convolution.
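One NumPy-only way to get the effect of "rotating the filter" without an interpolation library (my sketch; the size and weighting scheme are assumptions): draw a line at the desired drift slope through a K x K kernel, then weight the remaining pixels so the kernel sums to zero and gives no response to flat background.

```python
import numpy as np

def sloped_line_kernel(size, slope):
    """size x size detector matched to a line x = centre + slope * (y - centre).
    On-line pixels get +1; the rest are set so the kernel is zero-mean."""
    k = np.zeros((size, size))
    c = size // 2
    for y in range(size):
        x = int(round(c + slope * (y - c)))
        if 0 <= x < size:
            k[y, x] = 1.0
    on = k.sum()                     # number of on-line pixels
    off = size * size - on
    k[k == 0] = -on / off            # zero-mean: no response to DC level
    return k

vert = sloped_line_kernel(5, 0.0)    # vertical (zero-drift) line detector
diag = sloped_line_kernel(5, 1.0)    # 45-degree drifting line detector
```

A bank of such kernels, one per candidate drift rate across Jill's allowed slope range, is the convolutional analogue of matched filtering for drifting carriers; the integer rounding of `x` is the price of skipping proper rotation with interpolation.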
I have been playing with a synthetic waterfall plot and various convolutional line detectors, in the hope that the results could be of assistance to Patrick. The waterfall plot consists of three lines, one vertical and two sloped:
I buried the signal in bipolar Gaussian noise and then checked what it looked like. The display is distorted in as much as I have clipped all of the negative signal to zero, so I am only showing the positive 'pixels', although the calculations are done on the full data.
The first filter I tried was a simple 3x3 line detector - note that this is a vertical line detector; I hadn't bothered to rotate it. The filter response was:
Applying this filter to the noisy 'waterfall' generated the following output:
Although the signals are still there, and visible, it is clear that the small size of the filter has done little to enhance the 'visibility' of the signal against the noise. I then tried a larger filter, which performs line detection in one plane whilst effectively integrating out the noise in the other; its characteristics are:
The result of applying this filter is as follows:
This gives a much better result in extracting the lines. However, as you would expect, the bigger the filter, the more angle-selective it becomes, and so it tends to discriminate against the more steeply sloped signal.
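For anyone wanting to repeat the experiment, here is an outline reconstruction: a synthetic waterfall with one vertical and two sloped lines, buried in Gaussian noise, then correlated with a 3x3 vertical line detector. The line positions, slopes, amplitudes and the kernel taps (a standard zero-mean -1, +2, -1 column detector) are all my assumptions; Dave's actual filter responses were shown as images in the original post.

```python
import numpy as np

rng = np.random.default_rng(1)
H, W = 128, 128
waterfall = np.zeros((H, W))
for y in range(H):
    waterfall[y, 32] += 3.0                      # vertical line
    waterfall[y, int(16 + 0.25 * y)] += 3.0      # shallow sloped line
    waterfall[y, int(64 + 0.45 * y)] += 3.0      # steeper sloped line
noisy = waterfall + rng.normal(size=(H, W))      # bury in unit Gaussian noise

# 3x3 vertical line detector; columns sum as -1, +2, -1 (zero-mean rows)
kernel = np.array([[-1.0, 2.0, -1.0]] * 3)

def correlate_same(img, k):
    """Naive 'same'-size correlation, adequate for a small kernel.
    (The kernel above is symmetric, so correlation == convolution.)"""
    out = np.zeros_like(img)
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

response = correlate_same(noisy, kernel)
# The vertical line's column now stands well above the noise floor,
# while the sloped lines are progressively attenuated - Dave's
# angle-selectivity effect, visible even at 3x3.
```

Enlarging the kernel vertically (more rows of -1, +2, -1) integrates more noise away but narrows the accepted slope range, reproducing the trade-off described above.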
I hope that these results are of some use.
Addition to original contribution
PS: Note that the primary purpose of this post was to assist Patrick123 in his experiment evaluating the possibility of using Self Organizing Maps to post-process waterfall plots. It is not meant as a stand-alone methodology - that job can be done much more effectively using the non-linear wavelet filtering discussed in one of my blog entries on wavelets and astigmatic filtering.
Are you getting anywhere with your SOM experiments?