Is it not time that the SETI Institute/setiQuest/SETI@home initiatives joined forces, to throw all that distributed computing power at data from the ATA?
It seems to make sense.
Right now, S@H has its hands full with data from Arecibo, I think, but isn't Arecibo supposed to be closing?
setiQuest and SETI@Home are not out of alignment. Our focus is different. SETI@Home is looking to process large amounts of data from Arecibo. setiQuest is looking for help from all of you in interpreting data that cannot be processed by computers alone, and in improving the algorithms the software uses. We do work with SETI@Home. Last week at exactly this time, I was sitting in a conference room at the SETI@Home office with Dan Werthimer and many members of the SETI@Home team. If you have specific ideas on SETI@Home and setiQuest collaboration, I would love to explore them.
And, here is what Dan said about Arecibo in an email exchange a few minutes ago.
> arecibo just received funding for another five years,
> and we expect NSF and NASA will continue
> for several decades.
SETI@Home has one important asset: its installed base of millions of clients. Could we utilize this resource somehow?
How about this: we set up a new BOINC server, dedicated to the setiQuest community. This server will differ from SETI@Home's server in that anyone in our community will be able to propose new experiments (algorithms etc.) to run on the server. There will be a process by which new experiments are submitted, approved and installed on the server. This process will largely be driven by setiQuest volunteers (like, say, me), but UC Berkeley will have the final say in which experiments are accepted for inclusion.
Now, the idea is that, every once in a while, when the UC Berkeley folks feel they have spare computing cycles, their server will redirect connecting SETI@Home clients to our server for new work. Our server will then send algorithms and work units for a currently running experiment back to these clients. Depending on how UC Berkeley manages the redirects, this can potentially be a lot of clients. The clients begin crunching away and return the results to our server. Whoever initiated the experiment can then follow the outcome there.
For instance, imagine that sigblips has seen something unusual in a data file. He wants to run a CPU-intensive analysis on the file to examine the feature, but it would take far too long on his own computer, or even in Amazon's cloud. He therefore sets up the analysis as an experiment on our BOINC server and has it approved. Soon, his analysis is being computed by thousands of SETI@Home clients world-wide, completing it in no time, and perhaps they find something interesting.
This would require some coordination with the UC Berkeley folks, and most probably an update to the SETI@home client. But nothing that isn't doable.
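To make the idea a bit more concrete, here is a minimal sketch of the flow I have in mind, expressed in Python. All of the names (the servers, the experiment, the "spare capacity" knob) are invented for illustration; this is not the real BOINC scheduler API, just the concept written out as code:

```python
# Rough sketch of the proposed redirect / work-distribution flow.
# Every name here is hypothetical -- this is not real BOINC code.

import random

class SetiQuestServer:
    """The proposed community BOINC server."""
    def __init__(self):
        self.approved_experiments = []   # experiments vetted by UC Berkeley
        self.results = {}                # experiment name -> list of results

    def submit_experiment(self, name, work_units):
        # In reality this step would include review and approval;
        # here we simply accept the experiment.
        self.approved_experiments.append({"name": name, "work": list(work_units)})
        self.results[name] = []

    def get_work(self):
        """Hand out a work unit from the currently running experiment."""
        for exp in self.approved_experiments:
            if exp["work"]:
                return exp["name"], exp["work"].pop()
        return None, None

    def report_result(self, experiment, result):
        self.results[experiment].append(result)


class SetiAtHomeScheduler:
    """Stand-in for the Berkeley scheduler deciding when to redirect clients."""
    def __init__(self, community_server, spare_capacity=0.1):
        self.community_server = community_server
        self.spare_capacity = spare_capacity   # fraction of clients redirected

    def assign(self, client):
        if random.random() < self.spare_capacity:
            return self.community_server       # redirect to setiQuest work
        return "regular SETI@home work"


# Example: sigblips submits his CPU-heavy analysis as an experiment.
server = SetiQuestServer()
server.submit_experiment("kepler4b-redux", work_units=range(1000))

scheduler = SetiAtHomeScheduler(server, spare_capacity=1.0)  # force redirect for demo
target = scheduler.assign(client="some-volunteer-pc")
if isinstance(target, SetiQuestServer):
    name, wu = target.get_work()
    target.report_result(name, f"processed work unit {wu}")
```

The point of the sketch is only to show how little machinery is conceptually needed: an experiment queue on our side, and a redirect decision on Berkeley's side.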
This is an old thread, but it looks like a reasonable place to post on this topic. I would be interested in hearing general comparisons and contrasts between the SETI Institute's focus, hardware, data analysis, etc., and those of S@H. Obviously, S@H and SI are looking at separate data sets from different telescopes. Does that make a difference in how the data is treated? Are the S@H "enhanced" and "astropulse" applications similar to the software SI uses to analyze signals? Does SI look for pulses? Does SI do additional types of processing? (More succinctly, I guess I'm asking how SonATA compares to the S@H apps.)
Due to the nature of S@H, much processor time is devoted to cross-validation of results from different clients. How does that compare to SI's validation methods?
Jill Tarter often mentions Moore's Law when speaking publicly. My impression is that processor power is far more of a bottleneck than telescope data at this point, despite the fact that the Allen Telescope Array currently has far fewer antennas than eventually planned. Just what kind of computing horsepower do you have? Is it all CPU? All x86_64? Any GPU or other, more exotic platforms?
I realize that I've asked quite a few (possibly disparate) questions here. I'd be happy to see comments on any subset (or superset) of these subjects, or a pointer to where some of them have already been answered.
Here are a few quick answers to some of your questions:
* The ATA does its processing with a large farm of x86 machines that run Linux. Because of this dedicated farm of trusted machines, "work units" are only processed once (this isn't actually how SonATA works but the general concept is correct). Validation at the ATA refers to a human becoming involved in the decision making process.
* A lack of CPU processing power and the narrow Internet data pipe are limitations at the ATA. Only a small fraction of the processed radio telescope data is exported and some of that makes it to the setiQuest data download page.
* Because of the narrow data pipe the vast majority of the data processing is done on-site at the ATA and in real-time.
* seti@home effectively runs on the world's largest supercomputer, so it can afford to be wasteful: the same work unit is processed multiple times, and the algorithms themselves are less efficient. The benefit of this is that the seti@home algorithms are slightly more sensitive. (A toy contrast of the two validation models is sketched just after this list.)
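Here is that toy contrast in Python. Nothing below is real SETI@home or SonATA code; it just illustrates quorum validation on untrusted volunteer machines versus a single pass on a trusted farm:

```python
# Toy contrast between the two validation models described above.
# Not actual SETI@home or SonATA code.

from collections import Counter

def validate_by_quorum(results, quorum=2):
    """SETI@home style: the same work unit goes to several clients,
    and a result is accepted only when at least `quorum` of them agree."""
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None   # None -> send out more copies

def validate_trusted(result):
    """ATA style: trusted on-site machines, so one pass is enough."""
    return result

# The same work unit crunched by three volunteer machines:
print(validate_by_quorum(["signal@1420.40 MHz", "signal@1420.40 MHz", "noise"]))
# A single pass on the dedicated farm:
print(validate_trusted("signal@1420.40 MHz"))
```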
SETI@home and SonATA search for essentially the same things, namely narrowband signals, though I am sure there are some technical differences. The fundamental difference in philosophy is that the SETI Institute wants to be able to check up on candidates immediately, because this is the only way to determine for certain whether a candidate is RFI, while SETI@home hopes that there are very persistent ET signals out there which can be followed up on long after they were first detected.
SI's approach makes fewer assumptions than SETI@home's, but requires that they be able to analyze the whole data stream in real time (since if the detection does not happen in real time, the follow-up can't be immediate). The requirement for real-time analysis in turn means that the data has to be processed locally at the telescope.
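To give a feel for what "searching for narrowband signals" means in practice, here is a bare-bones sketch (my own simplification, not code from SonATA or the SETI@home client): take a block of time samples, compute a power spectrum, and flag any frequency bin that stands well above the noise floor.

```python
# Bare-bones narrowband detection -- my own simplification, not code from
# SonATA or the SETI@home client.

import numpy as np

def detect_narrowband(samples, sample_rate_hz, threshold_sigma=6.0):
    """Return the frequencies (Hz) of spectral bins rising well above the
    noise floor -- the signature of a narrowband carrier."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate_hz)
    noise = np.median(spectrum)
    sigma = np.std(spectrum)
    return freqs[spectrum > noise + threshold_sigma * sigma]

# Synthetic example: noise plus a weak carrier at 1 kHz.
rate = 8192
t = np.arange(rate) / rate
data = np.random.normal(size=rate) + 0.5 * np.sin(2 * np.pi * 1000 * t)
print(detect_narrowband(data, rate))   # should report a bin near 1000 Hz
```

The real searches of course also handle Doppler drift, multiple beams, RFI excision and so on; the sketch only shows the core idea.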
Thank you, Sigblips and Anders, for the responses. That's an interesting difference. Of course, SI has control of its own telescope now, whereas I believe S@H is still piggybacking on other projects for its data. It's not hard to see how a real ET signal might end up rejected as RFI if the measurement cannot be repeated until long after the original detection. Owning your own telescope makes a lot of difference, I suppose.
My perception was that SI's ~100TB/day of data got spooled to permanent storage for later analysis, and then got analyzed as resources permitted. (And I'm wondering what permanent storage medium one uses for 40 petabytes per year.) But it sounds like it gets triaged immediately for signals which can be detected more quickly, and then stored for more thorough analysis later? 1GB/sec seems like a *lot* of data to do *anything* intensive with in real time! Is it specialized hardware that handles the initial triage? Or am I completely on the wrong track?
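(Sanity-checking my own numbers, which do at least hang together:)

```python
# 100 TB/day expressed per second and per year.
tb_per_day = 100
gb_per_sec = tb_per_day * 1e12 / 86400 / 1e9    # ~1.16 GB/s
pb_per_year = tb_per_day * 365 / 1000           # ~36.5 PB/year, i.e. roughly 40
print(gb_per_sec, pb_per_year)
```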
BTW, and amusingly, at the exact moment that I started reading your message, Seth Shostak said the same thing about SI's real-time analysis, while broadcasting from right in front of the Arecibo dish, in the December 7, 2003 episode of "Are We Alone" that I happened to be listening to. Though he made it a point to mention that SETI researchers don't actually wear headphones, à la Jodie Foster. :-)
No, you are actually right - they currently use a system called Prelude which relies on custom-made hardware components. However, due to Moore's law, it is now becoming possible to phase out the Prelude system in favor of SonATA which is all software on commodity hardware. Until recently, the commodity hardware was indeed too slow to keep up with the astounding data rate.
There is no permanent storage of the raw data (the volumes are too big), but individual signals detected by SonATA are registered in a database, which is used for RFI mitigation. As far as I understand, the database only holds basic characteristics of each signal (frequency, time and so on), so no actual signal processing or analysis is done on these data. Everything is geared to happen in real-time.
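Conceptually (and this is just my guess at the shape of it, not SonATA's actual database schema), each entry is little more than a row of basic signal characteristics:

```python
# A guess at what a minimal signal record might look like -- NOT SonATA's
# actual schema, just the kind of "basic characteristics" mentioned above.

from dataclasses import dataclass

@dataclass
class SignalRecord:
    observation_time_utc: str      # when the signal was seen
    frequency_hz: float            # sky frequency of the detection
    drift_rate_hz_per_s: float     # Doppler drift of the narrowband line
    snr: float                     # signal-to-noise ratio
    beam_ra_deg: float             # pointing (right ascension)
    beam_dec_deg: float            # pointing (declination)
    classification: str            # e.g. "candidate" or "RFI"

example = SignalRecord("2010-01-01T00:00:00Z", 1420405751.77, -0.3,
                       12.4, 285.6, 50.2, "RFI")
```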
> Though he made it a point to mention that SETI researchers don't actually wear headphones, à la Jodie Foster. :-)
Actually, right now I'm wearing my Sennheiser HD 280 Pros while listening to the modulated signal discovered in the Kepler-4b redux data. I'm not a real SETI researcher, though; I'm just an amateur. (:
Here is a perfect idea for collaboration:
SETI@home now has something called the Near Time Persistency Checker (NTPCKR). The NTPCKR identifies candidates in SETI@home's database that appear persistent in the sky across multiple observations.
Currently, when NTPCKR identifies a persistent candidate, it may be months before Arecibo swings past that patch of sky again so the team can verify or falsify the candidate. Why not lend them a hand?
A system could be devised that lets the NTPCKR submit persistent candidates as a recommended target for the ATA. If the target is accepted, SonATA can check it out, as can we in the setiQuest community, if the data is uploaded to the cloud.
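As a strawman, the hand-off could be as simple as the NTPCKR posting a small candidate record that the ATA scheduling side can accept or decline. Everything below (field names, the queue, the acceptance rule) is invented for illustration; nothing here is real NTPCKR or SonATA code:

```python
# Strawman hand-off from NTPCKR to an ATA target queue.
# All names and fields are hypothetical.

candidate = {
    "source": "SETI@home NTPCKR",
    "candidate_id": "example-0001",     # hypothetical identifier
    "ra_deg": 266.41,                   # sky position of the candidate
    "dec_deg": -29.01,
    "frequency_hz": 1420.0e6,           # where the persistent signal sat
    "num_observations": 3,              # how often Arecibo has seen it
}

ata_target_queue = []   # stands in for the ATA's real scheduling system

def propose_target(c):
    """Accept the candidate if it looks worth a follow-up observation."""
    if c["num_observations"] >= 2:
        ata_target_queue.append(c)
        return "accepted -- SonATA (and setiQuest, via the cloud) can check it"
    return "declined"

print(propose_target(candidate))
```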
Not only would this be faster than waiting for Arecibo, it would also constitute the second observation from another location that is necessary for confirmation.
Apart from being meaningful SETI, this would also be a nice gesture towards the community around SETI@home, which is still much larger and more vibrant than ours. It could be a good way to get more people interested in setiQuest.