There is a new addition to the Blogs, which can be found here.
It describes a post-processing methodology that uses the Radon Transform to convert the usual frequency/time waterfall plot into a frequency/Doppler-gradient plot. This has the effect of compacting the lines found in the waterfall plot into single bright spots. This effective information compression provides a much higher signal-to-noise ratio and allows the identification of much smaller signals than is possible using the waterfall diagram.
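For anyone wanting to experiment, here is a minimal sketch of the idea in C. This is my own toy illustration, not the code from the blog post; the name `drift_map` and the integer drift rates are assumptions. For each candidate Doppler gradient, power is summed along the corresponding line in the frequency/time plane, so a faint diagonal line in the waterfall collapses to a single bright cell in the output.

```c
/* Brute-force "Doppler gradient" map.  spec is a rows-by-cols
 * (time-by-frequency) spectrogram stored row-major; drifts[k] is the
 * k-th candidate drift rate in bins per time step; out is an
 * ndrift-by-cols map.  out[k][f0] sums the power along the line
 * f(t) = f0 + drifts[k] * t, clipped at the band edges. */
void drift_map(const float *spec, int rows, int cols,
               const int *drifts, int ndrift, float *out)
{
    for (int k = 0; k < ndrift; k++) {
        for (int f0 = 0; f0 < cols; f0++) {
            float acc = 0.0f;
            for (int t = 0; t < rows; t++) {
                int f = f0 + drifts[k] * t;
                if (f >= 0 && f < cols)
                    acc += spec[t * cols + f];
            }
            out[k * cols + f0] = acc;
        }
    }
}
```

A tone drifting one bin per time step, invisible in any single spectrum, accumulates its full power in the one cell whose drift rate and start bin match it.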
Opinions of the technique are welcome.
This is fascinating! I particularly like the image with "bow-ties" at the end of your analysis.
The DADD algorithm (latter stage of SonATA) is more efficient than a conventionally programmed Radon transform for automated detection of Doppler-drifted tones; however, the Radon transform, as you have suggested, may be useful for producing images for humans to search for signals. A concern with waterfall plots, as you also suggest, is that they can contain weak but statistically significant signals that cannot be seen above the noise by the human eye.
As is true for other transforms, we could devise signal types to which the Radon transform would not be sensitive.
Much has been said about the computational load of the Radon Transform, and this is undeniably true. But it is also true that it lends itself to parallelization. Certainly in the SETI case the chunks of frequency/time data can be processed in parallel, but in addition the individual Doppler-gradient rows can be subcontracted to separate computational elements before being stitched back together again.
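As a concrete sketch of that decomposition, here is a toy POSIX-threads version (threads standing in for the separate computational elements; all names are my own assumptions). Each worker computes its own slice of candidate drift rates, summing power along the drift lines, and writes its own rows of the output, so no stitching or locking is needed beyond the final join.

```c
#include <pthread.h>

/* One worker's share of the Doppler-gradient computation. */
typedef struct {
    const float *spec;          /* rows-by-cols spectrogram          */
    int rows, cols;
    const int *drifts;          /* candidate drift rates, bins/step  */
    int k0, k1;                 /* this worker's slice [k0, k1)      */
    float *out;                 /* ndrift-by-cols output map         */
} job_t;

static void *worker(void *arg)
{
    job_t *j = arg;
    for (int k = j->k0; k < j->k1; k++)
        for (int f0 = 0; f0 < j->cols; f0++) {
            float acc = 0.0f;
            for (int t = 0; t < j->rows; t++) {
                int f = f0 + j->drifts[k] * t;
                if (f >= 0 && f < j->cols)
                    acc += j->spec[t * j->cols + f];
            }
            j->out[k * j->cols + f0] = acc;   /* own rows: no locking */
        }
    return 0;
}

/* Split the ndrift gradient rows across up to 16 threads. */
void drift_map_parallel(const float *spec, int rows, int cols,
                        const int *drifts, int ndrift,
                        float *out, int nthreads)
{
    pthread_t tid[16];
    job_t job[16];
    if (nthreads > 16) nthreads = 16;
    int chunk = (ndrift + nthreads - 1) / nthreads, n = 0;
    for (int k0 = 0; k0 < ndrift; k0 += chunk, n++) {
        int k1 = k0 + chunk < ndrift ? k0 + chunk : ndrift;
        job[n] = (job_t){spec, rows, cols, drifts, k0, k1, out};
        pthread_create(&tid[n], 0, worker, &job[n]);
    }
    for (int i = 0; i < n; i++)
        pthread_join(tid[i], 0);
}
```

The same partitioning works across machines: each satellite node would receive the spectrogram chunk plus its slice of drift rates, and return its rows of the map.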
Although it needs a lot of programming skill to use, the standard Pentium processor contains a SIMD unit (Single Instruction, Multiple Data) which consists, if my memory serves me correctly, of a gang of 8 floating-point arithmetic units, all of which simultaneously compute the same operation on different data values. This seems at first sight ideally suited to the Doppler-gradient computations. And I am ignoring the fact that you can now get motherboards that take multiple processor devices, each with multiple cores.
I must admit that I wasn't considering trying to develop a real-time system, merely an automated tool that I could use on the downloaded data. My days of developing real-time image processing systems finished the day I retired.
Would this be something that a GPU would be good at?
Those are good ideas.
SIMD programming has been simplified by compiler extensions such as GCC vector instructions (e.g., http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html). The SonATA channelizer uses these extensions.
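For illustration, this is roughly what those vector extensions look like (a minimal sketch of my own; `add_spectra` and the spectra-summing use case are hypothetical, not SonATA code). Ordinary arithmetic on the vector type is compiled down to SIMD instructions (SSE/AVX, NEON, ...) where the hardware supports them.

```c
#include <string.h>

/* A GCC/Clang vector type: four packed floats in 16 bytes. */
typedef float v4sf __attribute__((vector_size(16)));

/* Sum two power spectra four bins at a time.
 * n must be a multiple of 4 for this simple sketch. */
void add_spectra(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; i += 4) {
        v4sf va, vb, vc;
        memcpy(&va, a + i, sizeof va);   /* safe unaligned load */
        memcpy(&vb, b + i, sizeof vb);
        vc = va + vb;                    /* one element-wise vector add */
        memcpy(out + i, &vc, sizeof vc);
    }
}
```

The appeal is that the `+` carries the SIMD semantics, so the same source compiles to whatever vector width and instruction set the target offers.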
Another idea, if implemented by setiQuest, would be to distribute waterfall post-processing, such as Radon-transform-on-waterfall, into the browser client by exploiting the Adobe Flash virtual machine stack. This would allow citizen scientists to experiment with various methods of analysis using local processing, or server processing augmented by local processing.
I made a crude demo of this technology a couple of years ago: http://ackrman.net/seti/swf-client-demo/
Another more recent example, but not related to SETI: http://ackrman.net/mandelbrot/
I am afraid my memory let me down. It is the integer-arithmetic MMX register set that has a gang of 8 processing elements, not the floating-point SIMD unit, which has 4. It doesn't affect the argument about reforming the calculation into a parallel format in any way, but I felt I ought to mention that I goofed.
I used to be head of research at the UK division of one of the leading machine vision companies. We had developed an interesting architecture that we had christened VisionNet. Its purpose was to allow us to apply numerically intensive algorithms such as Hough transforms, colour image correlation, super-resolution colour cameras, etc., to the task of real-time image processing.
The structure consisted of an image server, which controlled the camera and handled splitting the image into its various subtasks. This data was then transmitted out to the auxiliary processing units via local Gigabit Ethernet. These satellite processors could communicate the processed results back to the server, to each other, or to another machine (the image consolidator), again using Gigabit Ethernet.

The system was very powerful and scalable. Some units with less arduous jobs used a single computer, with the parallel processing done within that machine using its built-in parallel capability, still communicating over sockets via a virtual link (127.0.0.1). For a slightly more extensive task the server would be connected either to a single multiprocessor machine or to an array of small, cheap PC/104-format PCs (in fact it could be expanded much further than we ever reached). The system worked like a dream.

After we installed a couple of these machines at customer sites, the American company sold off the UK branch, my research team was disbanded, and I was made redundant. I have yet to see the concept reinvented elsewhere, although I haven't been actively looking.
That description of your work is very interesting AND impressive. setiQuest processing (yet to happen in earnest) could benefit from a similar distributed, parallel environment.
I like the English term "redundant." It sounds so much better than what happens to us here in the U.S.
Even though I resigned from the SETI Institute first, I was about to be made "redundant."
That's what made me bring it up. The time taken to acquire one of the 1.9-gigabyte blocks is circa 2 minutes (122 SETISecs, as I recall). A machine with a dedicated Gigabit Ethernet link could spit this out to its waiting satellite array in roughly 15 seconds; that's about 8 times faster than it is being acquired, so the server still has time to do some of the processing itself, such as the task configuration. However, it does rely on the rest of the code being splittable into either a parallel or a pipeline configuration for it all to work smoothly.

We found that most of the stuff we were doing (and I believe this would also be true of the SETI calculations) was throughput-critical. In other words, what was crucial was the speed at which you can put data into the computation chain. Latency wasn't a problem (i.e. the time from when data entered the chain to when the results came out), so it was very amenable to this decomposition into distributable tasks.

Powerful computers are now cheap; even I can afford one on my limited pension ;-) When you take into account the cost of a software engineer's time, it is probably more cost-effective to throw in another box than to have the engineer trying to crowbar massive amounts of code into one box if it can easily be spread over two.

We found that we gained in three ways. First, the tasks were easily distributable over a software team: data would come in on a socket and leave the same way, so each engineer had just one task to worry about. Second, the individual tasks were that much simpler; often the programs were not even stressing the individual satellite nodes. Third, as a result, we were able to use really cheap machines and no longer required 8-processor motherboards.
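A quick back-of-the-envelope check on those figures (my own arithmetic, assuming an idealized Gigabit Ethernet link at roughly 125 MB/s with no protocol overhead; real TCP/IP throughput would be somewhat lower):

```c
/* Rough time to ship a data block to the satellite array.
 * block_gigabytes: block size in GB; link_MB_per_s: link capacity. */
double transfer_seconds(double block_gigabytes, double link_MB_per_s)
{
    return block_gigabytes * 1000.0 / link_MB_per_s;
}

/* A 1.9 GB setiQuest block over Gigabit Ethernet (~125 MB/s) takes
 * about 15 seconds, roughly 8x faster than the ~122 s acquisition,
 * so the server keeps a comfortable margin for its own work. */
```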
It was quite a struggle coming from where I started out, where hardware was expensive and the temptation to limit the number of nodes and make each do more work was difficult to overcome; but using VisionNet the productivity of my team was exceptional (often able to beat the sales team's deadlines; not often have I been able to do that in my career ;-)
The baudline 0.92 release back in August 2002 had a new feature called Auto Drift that extracted weak linear drifting signals:
The development of baudline's Auto Drift algorithm was inspired by Project Phoenix's DADD algorithm, and I've used it numerous times in my baudline-setiQuest blog posts. Auto Drift isn't a single algorithm but a collection of three different algorithms, all of order O(n log n). The folding-paste algorithm in the Average window is most like the batch-centric DADD. The other two Auto Drift algorithms operate on a real-time stream and on the spectrogram. The spectrogram version of Auto Drift is very much like the Hough-based Radon transform that Dave's paper discusses. The concept is the same but my frame of reference is different, so instead of the Radon spectrogram's "bow ties" only the lower half of the bow is used. See this example image:
Auto Drift was designed for extracting linear drifting tones, but surprisingly it also works well for the drifting-random-walk "squiggles" found in the setiQuest data and for tracking non-linear curves, as seen in the above image. A spectrogram-based Auto Drift could also be useful for tracking the Doppler-shaped S-curves of LEO fly-bys.
Both Rob (above) and Jill mentioned that speed is a major problem with the Hough transform. That algorithmic efficiency is key matches my experience designing and developing baudline's Auto Drift. I agree that speed is very important, but IMO all this talk about SIMD, GPUs, multi-processor motherboards, and distributed computing is a bit premature. Algorithmic efficiency trumps all those other optimization techniques.
Another key question is how useful this is for SETI. The demonstrated power and potential of these techniques are obvious, but that's not the problem. The challenge of SETI is not detecting weak signals; that's easy. The challenge of SETI is determining which signals have an extraterrestrial origin and which are local RFI.
I would agree with you wholeheartedly about the need to develop efficient algorithms; it's a must. The point I was making is that in many signal processing tasks, very elegant and effective solutions can be developed by decomposing the algorithm into a parallel and pipelined architecture. (In many respects that's what Cooley and Tukey did for the Fourier transform, without which there probably wouldn't have been any digital signal processing.)
As to whether it is too early to be discussing hardware implementations, all I can say in my defence is that the forum has been much busier than it has been in a very long time, and I don't think that is in any way a bad thing. Anything that stops it dying on its feet seems a good way to go.
Do you really feel that having a vote on interstellar flight makes sense if discussing potential hardware implementations is premature? ;-)
Knuth famously said "Premature optimization is the root of all evil." (:
The design and development of a highly optimized, network-distributed, slow O(n^2) algorithm will require a huge expenditure of effort. Most of this work will be made obsolete by the discovery of a faster O(n log n) algorithm. This sort of dramatic increase in efficiency changes all of the optimization decisions.
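To make the O(n log n) point concrete, here is a toy recursive sketch of the doubling idea behind DADD-style de-drifting (my own simplified illustration, not baudline's or SonATA's actual code; `rows` must be a power of two). Each half of the spectrogram is de-drifted once, and those half-sums are shared by every total drift rate that reuses them, giving O(n log n) row combinations instead of the brute-force O(n^2).

```c
#include <stdlib.h>
#include <string.h>

/* out[d*cols + f] approximates the power summed along the drift-d
 * path starting at frequency bin f, for d = 0 .. rows-1; paths that
 * run off the high edge of the band contribute zero from then on. */
void dadd_sum(const float *spec, int rows, int cols, float *out)
{
    if (rows == 1) {                        /* single row: drift 0 only */
        memcpy(out, spec, cols * sizeof(float));
        return;
    }
    int h = rows / 2;
    float *top = malloc(h * cols * sizeof(float));
    float *bot = malloc(h * cols * sizeof(float));
    dadd_sum(spec, h, cols, top);           /* de-drift first half  */
    dadd_sum(spec + h * cols, h, cols, bot);/* de-drift second half */
    for (int d = 0; d < rows; d++) {
        int d2 = d / 2;                     /* drift within each half   */
        int off = d - d2;                   /* offset between the halves */
        for (int f = 0; f < cols; f++) {
            int fb = f + off;
            out[d * cols + f] = top[d2 * cols + f]
                + (fb < cols ? bot[d2 * cols + fb] : 0.0f);
        }
    }
    free(top);
    free(bot);
}
```

The recursion does O(rows * cols) work per level over log2(rows) levels, versus summing every drift path from scratch, which costs O(rows^2 * cols).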
I agree that the discovery of the FFT has had an enormous influence on the field of DSP but I disagree that the field would not exist without it. Baudline wouldn't exist without it but I'd wager that more than 50% of all the DSP in use today is not FFT based. Why? Transforming into the frequency domain isn't always the fastest way to solve a problem. I guess what I'm saying is that it's always important to select the best algorithm for the job.
The interstellar travel poll isn't my favorite and it isn't the most on-topic setiQuest poll but anything that promotes discussion is a good thing ...
About the poll question - Interstellar Travel - I was referring to DARPA's http://www.100yss.org/. Jill went to this symposium, so I assume it warrants some attention.
If you or anyone else has an idea for a poll question, please comment here, or better yet at http://setiquest.org/forum/topic/weekly-polls-website. I need ideas!
Knuth famously said "Premature optimization is the root of all evil."
Well, he did say that in 1974, which in our game is back in the Jurassic period.
Knuth is a man of wisdom and needs to be taken seriously. However, when he first uttered this comment, computers were expensive bits of kit. Now they cost less than a couple of days' salary for a good software engineer. Designing multiprocessor systems the way we did in VisionNet meant we spent quite some time optimizing our structure, but we ended up with a system that not only had code reuse but had hardware/software reuse. For example, structures such as the image server were standard off-the-shelf units. The way the modules interlinked via Gigabit Ethernet meant that we could build a working system now, while the customer was waving the folding green stuff at us. Our shareholders wouldn't have been very impressed if we had told them, "Put the money away; we need to wait until Sigblips invents an O(n log n) algorithm." In fact the architecture was quite robust, because provided we configured the Sigblips algorithm with the socket-in, socket-out structure, it could be plugged straight into the system with very little disturbance to the overall structure.
I think I may have confused you by talking about my O(n^2) Radon transform experiment in the same thread as VisionNet. We didn't use the performance VisionNet offered as an excuse to run obsolete, slow code; each module was programmed using what were then state-of-the-art implementations.
The prime motivation of my Radon Transform blog was not to suggest that it was in any way better than your Auto Drift algorithm. I couldn't possibly say that: as far as I can tell, I can't easily run baudline on my Windows system (put me right if I am wrong on this). All I was trying to do was highlight that there may be better methods of display than the standard waterfall plots, and I illustrated my experimental results using a very inefficient Radon Transform. To be perfectly honest, I have been out of image processing for so long that I didn't know about your O(n log n) methodology.
Sigblips is a man of wisdom, and needs to be taken seriously.
I'm not that wise, I've just done this before.
I think the most important question is how useful this new transform will be for SETI. After playing with baudline's Auto Drift for the past 9 years, I have a mixed and conflicted collection of answers:
First, Auto Drift is extremely cool and powerful if used correctly. Proper operation is very dependent on setup parameters that are not obvious to the inexperienced user. You can't just set it, forget it, and use it. A trained operator is required, one capable of questioning and testing what they see.
Any strong signals will swamp the weak drifting signals, so a noise floor free of RFI is required for best use. Also, the RFI environment of the ATA is already signal-rich; I'm not sure that finding more signals would be considered helpful.
Unfortunately over the past 9 years I've seen very little interest in baudline's Auto Drift. It's a complicated niche feature and the majority of people just don't understand it. Spectrograms and the frequency domain are complex enough concepts to grasp. Auto Drift is a couple levels above that. If most baudline users don't get Auto Drift then I doubt that any setiQuest users will.