Some brief answers to many questions

gerryharp
Offline
Joined: 2010-05-15
Posts: 365

Hi All

I was pretty much away all of August, so I'm just getting back up to speed. Avinash sent me a large number of questions which may have originated with Anders. I have taken a first crack at answering them. This information might be helpful to produce a FAQ or populate the Wiki. I'm sure that not all the info is perfectly clear, and you should probably read it backwards. I started off with precise technical jargon, but as I went along, I relaxed and began to use more colloquial language.

Cheers,

Gerry

Firstly, some papers:

Below are several memos/papers that contain lower level descriptions of our techniques and algorithms. Some of these memos were generated while I was learning the ropes (only a few years ago!), and I go back to all of them from time to time to retrieve important results. It might be good to gather these memos onto the wiki (a separate copy in a special place). I wouldn't rely on my Berkeley website for long-term storage; I only have that account by the grace of some friends there. As for the published papers (Jack's), we can link to their presence on arXiv, or otherwise find the best access to the documents.

A low-level description of the poly-phase filter bank implementation of the Fourier Transform (this is better than a straight FFT) is given in:
www.gerryharp.com/documents/seti-memo-3.pdf
This paper is actually describing a prototype implementation of SonATA. Except for some numerical details and hard-core implementation, this is essentially what SonATA does.

A low-level description of the effects of the interstellar medium is contained in:
www.gerryharp.com/documents/seti-memo-5.pdf

A higher level description of the same effects and also about the AutoCorrelation method and conventional SETI is contained here:
www.gerryharp.com/documents/AutoCorrelation%20method%2018.pdf
This one is published in the literature: Harp, G. R., R. F. Ackermann, S. K. Blair, J. Arbunich, P. R. Backus, J. C. Tarter, and the ATA Team. 2010. "A new class of SETI beacons that contain information." In Communication with Extraterrestrial Intelligence, ed. D. A. Vakoch. Albany, NY: State University of New York Press.

A description of some simulations that back up our "multi-beam RFI excision" method is found in:
http://astro.berkeley.edu/~gharp/documents/8p.pdf

Of course, Jack Welch's paper on the ATA is relevant and should be posted for all to see.
  "The Allen Telescope Array: The First Widefield, Panchromatic, Snapshot Radio Camera for Radio Astronomy and SETI," Jack Welch, et al., Proceedings of the IEEE, Special Issue on Advances in Radio Telescopes (2009).

Questions:

1. My favorite topic would be: directions for SETI (algorithm) research - unexplored areas that the speaker with all his or her experience thinks hold promise for SETI and would encourage others to look into.

The kinds of algorithms that rise to the top in my mind have one or two characteristics:

  1. Searches where we have a fast implementation
  2. Searches that depend minimally on the exact form of the signal.

For #1, my favorite example is autocorrelation, as is well known to everyone. But there are many different things one can correlate with another. Different frequencies, different beams and so on. The key to this approach is that the implementation is very fast because we don't have to search over huge multidimensional volumes to get the answer. Autocorrelation is limited, however, by SNR. This is where #2 comes in.
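To make the autocorrelation idea concrete, here is a minimal sketch, not the SonATA implementation. It assumes a noise-like "message" that is sent twice, a fixed number of samples apart, and buried in receiver noise; the function name, the delay of 37 samples and the series length are all made up for illustration. The lag sums are computed directly for readability, whereas a real search would use FFTs to get the N log2(N) scaling.

```python
import random

def autocorrelate(x, max_lag):
    """Return the autocorrelation of x for lags 0..max_lag (direct sums)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) / n
            for lag in range(max_lag + 1)]

random.seed(1)
N, DELAY = 4000, 37  # illustrative values, not ATA parameters

# Receiver noise, plus an arbitrary noise-like message transmitted twice.
noise = [random.gauss(0.0, 1.0) for _ in range(N)]
message = [random.gauss(0.0, 1.0) for _ in range(N)]
signal = [message[i] + (message[i - DELAY] if i >= DELAY else 0.0)
          for i in range(N)]
data = [s + n for s, n in zip(signal, noise)]

acf = autocorrelate(data, 100)
# Skip lag 0, which is just the total power.
peak_lag = max(range(1, 101), key=lambda lag: acf[lag])
print("strongest non-zero lag:", peak_lag)
```

The search never needs to know what the message looks like, only that it repeats, which is why no multidimensional parameter search is required.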

For #2, one example would be developing a variant of the KLT algorithm, which determines the signal strength and its shape with no a priori information (such as that the signal is a sine wave). So far, I haven't seen an interesting implementation of KLT that would not require at least N^4 scaling on the number of samples, N. (Three for the matrix inversion, and one more for the "time folding" big T parameter in Claudio Maccone's work). Compare this to standard SETI or correlation, both of which scale as N log2(N). Someone should think hard about this and come up with a better mousetrap. For example, you could probably reduce KLT to N^3 by computing only a few eigenvalues instead of inverting a matrix, using a Lanczos method for computing eigenvalues / eigenvectors. Still, there is a long way to go.
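To get a feel for how far apart these scalings are, the following back-of-the-envelope sketch compares operation counts. It ignores all constant factors, so the numbers are only order-of-magnitude indications, and the function names are invented for this example.

```python
import math

def ops_klt_naive(n):
    """~N^4: matrix inversion (N^3) times the extra time-folding factor (N)."""
    return n ** 4

def ops_klt_few_eigenvalues(n):
    """~N^3: the estimate above for computing only a few eigenvalues."""
    return n ** 3

def ops_fft(n):
    """~N log2(N): conventional SETI or correlation."""
    return n * math.log2(n)

for n in (2 ** 10, 2 ** 20):
    print(f"N = {n}")
    print(f"  KLT, naive       : {ops_klt_naive(n):.3e}")
    print(f"  KLT, few eigvals : {ops_klt_few_eigenvalues(n):.3e}")
    print(f"  FFT-based        : {ops_fft(n):.3e}")
```

Even at N = 1024 the N^3 estimate is about five orders of magnitude more work than the FFT-based searches, and the gap widens quickly with N.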

Another example is to use non-Gaussianity of the noiselike data to determine if there is embedded redundancy. I have recently been converted to thinking that the single most important characteristic for distinguishing artificial signals (or at least, "interesting" signals) is that they contain redundancy. Methods to compute redundancy, even if they don't tell you about the nature of the redundancy, can help us find arbitrary signals buried deeply in the noise.
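One common measure of non-Gaussianity is excess kurtosis, which is zero for Gaussian data. The sketch below is only an illustration under made-up assumptions (a sine wave of amplitude 2 added to unit-variance noise, 20,000 samples); it shows that the statistic moves away from zero when a non-Gaussian component is present, without saying anything about what that component is.

```python
import math
import random

def excess_kurtosis(x):
    """Fourth standardized moment minus 3; approximately 0 for Gaussian data."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) - 3.0

random.seed(3)
N = 20000  # illustrative length
noise = [random.gauss(0.0, 1.0) for _ in range(N)]
# A sine wave is strongly non-Gaussian, so it pulls the kurtosis negative.
contaminated = [v + 2.0 * math.sin(0.1 * i) for i, v in enumerate(noise)]

k_noise = excess_kurtosis(noise)
k_contaminated = excess_kurtosis(contaminated)
print("noise only      :", round(k_noise, 3))
print("noise plus sine :", round(k_contaminated, 3))
```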

2. The nature of noise in SETI. For 50 years, SETI has seen nothing but noise and will probably continue to do so for at least a while. What is noise? What kind of characteristics does it have? (I here mean noise in the broadest possible sense, as in any data that isn't an ETI signal.) What kinds of deviations from GWN can we expect to see which are not ETI signals (pulsar pulses are a classic example, but are there others?) What are the spectral, temporal, statistical etc. properties of non-GWN noise?

As I mentioned to Avinash, I think we can save ourselves work by using existing encyclopedias, esp. Wikipedia for introductory material such as the properties of Gaussian White Noise. Thus the starting point is:

http://en.wikipedia.org/wiki/White_noise

White noise is a signal that has equal power at all frequencies over an infinite time series. From other Wiki pages you can deduce that Gaussian white noise is a type of white noise for which, if you look at any frequency bin over a finite time period, the amplitude (voltage) in that bin will have a Gaussian distribution; the power, being the squared magnitude, then follows the corresponding chi-squared distribution:

http://en.wikipedia.org/wiki/Gaussian_distribution

To answer the rest of this question: In unpopulated bands (those not in active use by humans), GWN is the rule, only occasionally broken in nature. However, our telescope by its very nature introduces correlations (or redundancy) into our signals. For example, the light-travel time across our array is ~1 microsecond and across 1 dish is ~0.01 microsecond. Therefore, I will hesitate to believe "discoveries" of signals with redundancy on timescales between 0.01 and 1 microsecond, since there are known array artifacts in this range.

Then there are the Walsh function frequencies, which range from about 100 Hz to 20 kHz (gotta double check those numbers). Converting this frequency range to periods, we can expect artifacts over time durations of 50 microseconds to 10,000 microseconds. (This way of looking at things just now occurred to me.)
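The conversions used in the last two paragraphs are simple enough to check in a few lines. This sketch only restates the numbers quoted above (100 Hz to 20 kHz, and light-travel times of 1 and 0.01 microseconds); the function names are invented for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def period_us(freq_hz):
    """Period of a repeating signal, in microseconds."""
    return 1e6 / freq_hz

def light_distance_m(time_us):
    """Distance light travels in the given number of microseconds."""
    return SPEED_OF_LIGHT_M_PER_S * time_us * 1e-6

# Walsh function frequencies quoted above, converted to time durations.
print("100 Hz ->", period_us(100.0), "microseconds")
print("20 kHz ->", period_us(20e3), "microseconds")

# Light-travel times quoted above, converted to the distances they imply.
print("1 microsecond    ->", round(light_distance_m(1.0), 1), "m")
print("0.01 microsecond ->", round(light_distance_m(0.01), 2), "m")
```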

Then there is self-generated RFI. In a very broad sense of RFI, we can consider receiver noise to be RFI. Essentially, we are making measurements of the electric field arriving from the sky using instruments that are "hot." Our receivers are red-hot, no, white-hot at 1.4 GHz. The pitifully small signals we receive from the sky are completely overwhelmed by the self-generated noise of our receivers. To a pretty good approximation, receiver noise is GWN.

There is another kind of self-generated RFI, which is in our signal processing room. Here the signals from the antennas are digitized at very high rates and processed with 2-3 GHz CPUs. In order to run this computing equipment we need lots of clocks ticking along with periods anywhere from 1 second (10^6 microseconds) down to 10^-4 microseconds.

I realized this week that we can make a statistical measurement of such self-generated RFI by looking at data from our imaging correlator, not usually employed for SETI. If we look at a point source in a dark field, we should see a flat response when we plot the correlation amplitude versus baseline length. In my memory, however, I believe we usually see declining amplitude with baseline length, which is just what one would expect from self-generated clocking RFI. This is an example of a statistical analysis that doesn't tell us what the RFI is, but gives us a measure of its total energy.

Finally, there is RFI generated by people other than the ATA. This stuff is all over the place. It looks like sine waves everywhere thanks to the human love of clocks. It looks like dispersed pulsars because radar transmitters often use chirped pulses. It can even look like GWN in constrained frequency ranges. But it isn't noise from the galaxy or cosmic microwave background or even from our antenna receivers. It is satellite signals.

The only reason we have hope of overcoming all these sources of noise is that we can either identify them and throw them away, or search in sky-frequency ranges where they don't appear.

3. The interstellar medium and its meaning for SETI. Two Cornell researchers, Cordes and Lazio, have developed this whole theory on how the ISM would affect an ETI signal. For instance, they've found that the signal will exhibit a quick brightening followed by a huge dimming due to turbulence in the ISM. Among other things, this can possibly be used to discriminate interference from extra-solar signals. It would be interesting if someone could give an overview of this topic in an accessible format, but with enough detail to actually go out and apply it.

A few years ago, I wrote up some notes (for my own records). This brief introduction gives a demonstration of how a broadband pulse is distorted by ISM over short distances (also listed above):

http://astro.berkeley.edu/~gharp/documents/seti-memo-5.pdf

This memo describes only first-order effects (dispersion) in the interstellar medium. But because the ISM is clumpy, there are second-order effects which cause fading. Imagine that you have a plasma "bubble" sitting in between you and a distant source. Then that bubble will act as a lens (much like a crystal ball). If your source is in exactly the right place behind the bubble, you will see a minimally distorted, magnified image of the source. If the source is in the wrong place, you won't see it at all. Now imagine that the ISM between you and the source is full of many bubbles, of all different sizes. It is very hard to predict when you may see the source without detailed knowledge of the ISM. It is very difficult to measure the plasma content of the ISM (except along pencil lines toward pulsars, of which there are too few).
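For the first-order effect, dispersion, the standard cold-plasma delay formula from pulsar astronomy gives a quick estimate of how much later a pulse arrives at a lower frequency. The sketch below is not taken from the memo; the dispersion measure of 10 pc cm^-3 and the 1.4 to 1.5 GHz band are made-up example values.

```python
# Dispersion constant in ms * GHz^2 / (pc cm^-3), as used in pulsar astronomy.
K_DM_MS = 4.148808

def arrival_delay_ms(dm, freq_ghz):
    """Delay relative to infinite frequency, for dispersion measure dm (pc cm^-3)."""
    return K_DM_MS * dm / freq_ghz ** 2

def dispersion_delay_ms(dm, f_low_ghz, f_high_ghz):
    """How much later the low-frequency edge of a pulse arrives than the high edge."""
    return arrival_delay_ms(dm, f_low_ghz) - arrival_delay_ms(dm, f_high_ghz)

# Hypothetical example: DM = 10 pc cm^-3 across a 1.4-1.5 GHz band.
smear = dispersion_delay_ms(10.0, 1.4, 1.5)
print(f"pulse smearing across the band: {smear:.3f} ms")
```

A pulse that starts out much shorter than this smearing time no longer looks pulse-like on arrival unless the search removes the dispersion first.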

Using fading to find SETI signals might be difficult. Human-made signals fade in and out too, for all sorts of reasons. Also the reason SETI signals fade is because the ISM has a lot of structure, none of which we know. So we cannot predict fading for any particular signal, or even make statements about the statistics of the fading. Only when we average over a large ensemble of putative signals can we begin to make statistical inferences, and we don't even have one yet!

I believe the main point of Lazio and Cordes's work is that fading makes SETI a lot harder than it otherwise would be. To overcome fading, the simplest way to start is to look only at nearby stars and relatively low frequencies (the waterhole). For larger distances / higher frequencies, they suggest a cycle of repeated random observing of the same source over and over until we catch it at the right moment.

Having said that, I don't want to discourage you entirely. With some cleverness you may yet find a way to use fading to your advantage.

4. How the ATA works for SETI. Detailed description of how the ATA is designed and operated and how it affects the data products we see in setiQuest.

The place to begin here is with Jack Welch's paper described at the top. I wonder if this paper is available on arXiv?

I think the problem is not the detailed description, of which we have an abundance. What we need is a distilled version of the ATA raw data collection and processing that encapsulates our best knowledge of potential artifacts introduced into the data by our instrument. Put it this way, the ATA measures the flux coming from the sky in a particular direction. To first order, that is all you need to know.

To second order, we need a very basic description of the array that focuses on areas where artifacts are introduced (receiver noise, array time delays, self-generated RFI, etc.). For those with a mastery of this second-order description, we can just data dump the already documented bits of the ATA and suggest that motivated collaborators read Thompson, Moran and Swenson (TMS, http://www.amazon.com/Interferometry-Synthesis-Astronomy-Richard-Thompso...) to put it together. There won't be many users of detailed descriptions who are not professional radio astronomers because they need both the time and the mathematical background to read and understand TMS. It would be pointless and probably harmful if SETI or even a devoted volunteer would attempt to rewrite TMS.

It might be possible for a dedicated volunteer to read and digest some of the most crucial aspects of radio interferometry and to produce a more accessible summary of key points. That might be a 2.5-order step...

5. How is the raw data processed before it is made available to systems such as SonATA and to us here on the website?

Speaking for the setiQuest raw data series, these are produced by a beamformer. To a first approximation, these data contain signals that arrive at our telescope from all directions, but with unequal sensitivity in different directions. Therefore a weak SETI signal may be found in the beam pointing direction and a strong RFI signal (self-generated or not) may be seen coming from any direction. Added to this weighted "sky flux," we have a background level of receiver noise. The good news is that by beamforming, which is essentially an averaging over antennas, the effective receiver noise goes down by a factor of sqrt(Na) relative to the sky flux, where Na is the number of antennas (typically Na = 20, not 42, for a variety of reasons I don't want to cover here).
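The sqrt(Na) improvement can be checked with a toy simulation. This is a sketch under simplifying assumptions, not the ATA beamformer: every antenna is taken to be perfectly phased, the sky contribution is identical in all antennas, and each receiver adds independent unit-variance Gaussian noise. The function and variable names are invented for the example.

```python
import random
import statistics

def beamform(num_antennas, num_samples, sky_amplitude, rng):
    """Average antenna voltages: common sky signal plus independent receiver noise."""
    out = []
    for _ in range(num_samples):
        total = sum(sky_amplitude + rng.gauss(0.0, 1.0)
                    for _ in range(num_antennas))
        out.append(total / num_antennas)
    return out

rng = random.Random(0)
NA = 20  # typical number of antennas quoted above

# With no sky signal, whatever is left in the output is receiver noise.
single = beamform(1, 5000, 0.0, rng)
beam = beamform(NA, 5000, 0.0, rng)
ratio = statistics.pstdev(single) / statistics.pstdev(beam)
print("noise reduction:", round(ratio, 2), " sqrt(Na):", round(NA ** 0.5, 2))
```

The sky amplitude passes through the average unchanged, so the sky-to-receiver-noise ratio improves by the same factor.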

A second-order description would describe how the antennas are phased up with one another (calibrated is the technical term) and the layout of the antennas on the ground. As mentioned above, the antenna layout produces time-delay artifacts when there are calibration errors. Also, there is the Walshing system, also mentioned above. Probably there are a few more details that should be added to complete the second order description.

As you can see, I'm punting on some of the more detailed stuff. If we have our questions and answers organized on the website, then it might be easier for scientists to write 2nd order descriptions of processing when they're broken up into smaller chunks. It would be great if someone would help us to organize this information.

6. What are your RFI mitigation strategies?

A. Look to see if this signal has been previously observed (within the last week) in another pointing direction. As described in C. below, this proves the signal is RFI. All of the signals observed in the prior week are distilled down to a few parameters that are stored in a database. This database needs to be "seeded" initially using steps B and C below, but after seeding it becomes a very effective filter for RFI.

B. Look for persistent signals. Signals that appear only in one observation and never again could be the result of a rare but expected noise spike in the data. Once in a while, the noise in our receivers accidentally lines up in phase just enough to generate the appearance of a signal. Also, human RFI (like a LEO satellite) may accidentally run through our telescope field of view, never to return. For reasons like this, we must insist that the signal appear more than once. I'm told that this form of interference rejection gets rid of most of our signals. I don't know the exact fraction.

C. Point the telescope away from the direction where the signal is observed. Almost all RFI that appears in our telescope is not coming from the direction where we point. This is because it is impossible to build a perfect telescope unless it is infinitely large in diameter. The ratio of sensitivity in our pointing direction to the sensitivity everywhere else grows with the diameter of our telescope. Since the ATA is relatively small (both in dish diameter and in array diameter), we can see some satellites no matter where we point.

The good news is that for such satellites, moving the telescope away from the intended source does not make the RFI go away. So if we move and the signal does not change, we know that the signal is not coming from the pointing direction and we classify it as RFI. This is fair, because if there were a SETI signal as strong as a satellite, it would have been discovered a long time ago!

D. If the signal disappears in the first "off" measurement, perform many on/off trials done a few hundred different ways (100 ways times 100 seconds per trial = about a day) to be really really sure the signal is coming from the target direction. In the history of the SETI Institute, we have never gone beyond this stage. In fact, it is exceedingly rare (once a year?) that any signal persists after 3 on/off trials.

E. Contact other observatories and ask, "Do you see what we see?"

Beyond point D we enter the realm of the international SETI protocol which was developed and is hanging out on the web somewhere. It lays out what to do if you find a very interesting signal.
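Steps A through E above can be read as a decision cascade. The sketch below is only an illustration of that logic and not the SonATA code: the `Candidate` fields and the `classify` function are hypothetical names, and each boolean stands in for what is really a full observation or database lookup.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """Hypothetical summary of the follow-up results for one detected signal."""
    seen_in_other_direction_last_week: bool  # step A: database lookup
    reappears_on_reobservation: bool         # step B: persistence
    persists_when_pointed_away: bool         # step C: first "off" measurement
    passes_extended_on_off_trials: bool      # step D: many on/off trials

def classify(c):
    """Apply steps A-D in order and report where the candidate ends up."""
    if c.seen_in_other_direction_last_week:
        return "RFI (already in database)"
    if not c.reappears_on_reobservation:
        return "rejected (not persistent)"
    if c.persists_when_pointed_away:
        return "RFI (not from pointing direction)"
    if not c.passes_extended_on_off_trials:
        return "rejected (failed on/off trials)"
    return "step E: contact other observatories"

satellite = Candidate(False, True, True, False)
print(classify(satellite))
```

Ordering the cheap checks first (the database lookup costs no telescope time) is what makes the cascade efficient.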

7. What are sources of noise internal to the ATA?

I think I summarized this above. Or rather, I gave an outline of a summary.

8. Also, what would it take to stream data from the ATA live?

A great big bundle of fiber and the electronics to light it up. There are rumors of the existence of enough dark fiber to substantially improve the ATA connectivity, but besides just hooking up the fiber to the ATA, a lot of repeater stations and routers would be required. Also the know-how.

However, we must constrain our dreams. As we continue to improve so-called "setiQuest observing," we can not emit even 8 hours of 8.7 MHz data over a period of a week (that is 0.5 TB per week or 6 Mb/s). Why is the allowed rate so small? Because we're not the only ones using the fiber! Obviously, if we add hardware, fiber, &c, specifically for emitting setiQuest raw data, it would be mostly dedicated to the task.

To go to just 80 MHz (current useful bandwidth on one channel) is a factor of ~10. Then there are 4 tunings (1 beam per tuning) times 2 polarizations for another factor of ~10. We are a factor of 100 away from emitting all the data that can feasibly be captured, and that doesn't include a bunch of hardware / software on site to pump the data.

Jill has set a goal of 4 beams per tuning, increasing the total bandwidth to 400x what we have currently.

The next step would be to capture not 4 beams per tuning, but rather, 42 antennas per tuning. Then we could form beams in any direction offline (usually constrained to the field of view, which covers ~2500 beams). At this stage, with only a factor of 10 in bandwidth we obtain a factor of 2500 in useful data. This brings the total bandwidth to 4000x what we have now.

To push the envelope further, we have high hopes of increasing the number of antennas in the array to 350, or let's just call it a factor of 10. This increases the emitted output to 40,000x current.

Finally, the ATA antennas produce not 4x100 MHz bandwidth, but 100x100 MHz bandwidth. If we had 25x as many downconverters and digitizers, we could capture the entire 10 GHz bandwidth of the receivers. This increases the emitted data by 25x, for a total bandwidth of 1,000,000 times the bandwidth we have now. That is 6 Tb/s if I did the calculations right.
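The chain of factors above is easy to lose track of, so here is the arithmetic as a short sketch. One assumption is not stated in the post: each complex sample is taken to occupy 2 bytes (8 bits real plus 8 bits imaginary), which is what makes 8 hours at 8.7 MHz come out near 0.5 TB. The final figure uses the rounded 6 Mb/s quoted above.

```python
SAMPLE_RATE_HZ = 8.7e6   # complex samples per second for 8.7 MHz of bandwidth
BYTES_PER_SAMPLE = 2     # assumption: 8-bit real + 8-bit imaginary
HOURS_PER_WEEK = 8
SECONDS_PER_WEEK = 7 * 24 * 3600

bytes_per_week = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * HOURS_PER_WEEK * 3600
avg_bits_per_s = bytes_per_week * 8 / SECONDS_PER_WEEK
print(f"{bytes_per_week / 1e12:.2f} TB per week, {avg_bits_per_s / 1e6:.1f} Mb/s average")

# Successive scale-up factors, each rounded the way the text rounds them.
factors = [
    ("80 MHz per channel", 10),
    ("4 tunings x 2 polarizations", 10),
    ("4 beams per tuning", 4),
    ("42 antennas instead of 4 beams", 10),
    ("350 antennas", 10),
    ("full 10 GHz receiver bandwidth", 25),
]
total = 1
cumulative = []
for label, factor in factors:
    total *= factor
    cumulative.append((label, total))
    print(f"{label:32s} -> {total:>9,}x")

full_rate_bits_per_s = 6e6 * total  # using the rounded 6 Mb/s from the text
print(f"full capture: {full_rate_bits_per_s / 1e12:.0f} Tb/s")
```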

9. State-of-the-art in SETI. What algorithms are currently in operation and how do they work?

The conventional sine-wave search is the bread and butter of SETI all over the world. This is implemented by taking time series data (like from a beamformer) and performing a very long Fourier Transform on it. Because of disturbances just in our own solar system, it doesn't make sense to look for signals with frequency bandwidth < 0.01 Hz. That means you want to do the FT over ~100 seconds. At a rate of 100 MHz, that is an FFT of length 10 GSamples.
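As a rough illustration of the sine-wave search and of the length arithmetic above, the sketch below does two things: it computes the transform length implied by a given sample rate and frequency resolution, and it runs a toy search on 512 synthetic samples. It is not how SonATA is implemented; the naive O(N^2) transform, the tone amplitude and the bin number are all choices made for this example, and a real search would use an FFT (or a polyphase filter bank) and also allow for frequency drift.

```python
import cmath
import math
import random

def integration_time_s(resolution_hz):
    """Observation length needed for a given frequency resolution."""
    return 1.0 / resolution_hz

def fft_length(sample_rate_hz, resolution_hz):
    """Number of samples in one transform at that resolution."""
    return round(sample_rate_hz / resolution_hz)

def power_spectrum(x):
    """Naive DFT power spectrum for the first half of the frequency bins."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n // 2)]

print("integration time:", integration_time_s(0.01), "s")
print("FFT length at 100 MHz:", fft_length(100e6, 0.01), "samples")

# Toy search: a tone weaker than the noise, sample by sample, in 512 samples.
random.seed(2)
N, TONE_BIN = 512, 100
data = [random.gauss(0.0, 1.0) + math.cos(2 * math.pi * TONE_BIN * t / N)
        for t in range(N)]
spectrum = power_spectrum(data)
strongest_bin = max(range(1, N // 2), key=lambda k: spectrum[k])
print("strongest bin:", strongest_bin)
```

The tone carries half the power of the noise in the time series, yet it stands far above every other bin after the transform, because the transform concentrates the tone into one bin while spreading the noise over all of them.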

Other examples are 1) look for narrowband pulses (a variant of the sine wave search), 2) look for wideband pulses, 3) look for another very specific kind of signal (e.g. the binary representation of pi), 4) look for repetitious signals (without identifying exactly what they are), and so forth. All of the work looking for signal classes 2, 3, and 4 amounts to less than 10% (perhaps much less) of all SETI observing.

10. What kinds of signals do they find and which ones do they miss?

Conventional SETI: looks for drifting sine waves and nothing else. Sometimes other signal types are strong enough to "contain a component" of a drifting sine wave, or many components. These are classified as RFI and thrown away. Anything that doesn't look like a clean sine wave is thrown away.

Narrowband pulse search: Same as above except that the signal is allowed to switch on and off at a regular pace. Everything else is thrown away.

The other classes of signal have pretty much the same quality (matched filter search (3), autocorrelation search (4), etc.).

You can compute the number of possible signals that can be embedded into a time series of length N. The number is N. For a 100 second observation at 20 MHz, the number of possible signals is 2 billion. Comparatively, a single SETI observation at the Institute might pinpoint a few thousand signals. So most SETI scientists are open minded about searching using other methods. But sine wave searches have won so far.

11. What well-known signal processing algorithms have been considered but rejected and why?

KLT -- takes too long. However, we don't know if this is intrinsic, or whether faster implementations could be found.

Broadband pulse searches -- out of favor because a pulse traveling in the interstellar medium does not stay pulse-like. So the search takes too long (by comparison to the ground you can cover in conventional SETI during the same time period). Again, we're limited by our number crunchers, and pulse searches are not intrinsically slow.

Autocorrelation searches -- As fast as conventional SETI to within 2x, these searches identify redundant information energy (the signal) with less SNR, depending on the setup.

Chirp searches -- same problems as pulse searches.

Non-Gaussianity searches -- Kurtosis is one measure of non-Gaussianity. This is a relatively new direction for SETI and hasn't been applied very much to date. Perhaps it will be used, someday!

AM Radio / FM Radio analogs -- The first is very sensitive to fading and gives lower SNR than a sine wave search. The second one does not fade any worse than a sine wave, but the computation for such a search is long.

Looking for an obviously artificial but specific signal, like the binary representation of Pi. Until now the best search algorithms for such signals (a la David Messerschmitt) are still very slow compared to a sine wave search.

Are you beginning to see a trend?  (^_^)

12. Historical background and lessons learned. What searches have been carried out so far, what were the results and what do they tell us about the nature of ETI signals and RFI?

Jill Tarter is a very good historian. I bet she has a paper relating to this...

One big lesson is that if you find something that looks like a signal, DO NOT LET IT GO! That is, we need to follow up immediately. We hope that ET keeps their transmitter turned in our direction forever, but perhaps they change stars once a week. If we find a signal that lasts at least a week and has been verified at many observatories around the world, then even if it goes away we'll still have something; pretty strong evidence.

RFI is very often persistent. So keeping a database of previously seen RFI is really helpful for identifying candidates.

You need as much sensitivity as you can get. Much of the sky has been covered at very low sensitivity by both professionals and amateurs. If there were a really strong and steady source out there, e.g. as strong as a GPS satellite, then most likely we would know about it by now. The ATA is not a huge telescope, but when it is connected to SonATA, it is big enough to achieve sensitivity to signals billions to trillions of times weaker than GPS. However, it would be a much more effective (~10x better) instrument if we had 350 dishes instead of 42.

13. What does the SETI Institute need to improve their SETI program? What specific development tasks can the open source community help address? What would you like SonATA to be?

Many problems could be solved with more money, but that is not what setiQuest is about. As you suggest, setiQuest is about opening up the search to the entire world and particularly to people who don't have a lot of money but do have talent. I imagine a college student in Africa, a stockbroker in Romania, or a software developer in Ecuador. Or perhaps a highly motivated individual in Sweden...

This is an important question and I can't give a clear answer on short notice. Perhaps Avinash can comment on setiQuest plans.

I hope this posting is partially helpful, and I expect plenty more questions. I hope that we can capture all this info somewhere that is easily found by all the new setiData / setiCloud / setiCode participants.

Gerry

gerryharp
Offline
Joined: 2010-05-15
Posts: 365
link to ATA paper

Hi

I found the arXiv link to the paper on ATA by Jack Welch: http://arxiv.org/ftp/arxiv/papers/0904/0904.0762.pdf

Gerry

Anders Feder
Offline
Joined: 2010-04-22
Posts: 618

Lots of good info, thanks. I'll try and get as much as possible into the wiki.

gerryharp
Offline
Joined: 2010-05-15
Posts: 365
Thanks

Thanks, Anders!

Jill Tarter has suggested that we upload some PowerPoint "tutorials" on interferometry. They may or may not be self-explanatory. Can you suggest where I should post them on the wiki?

Gerry

Anders Feder
Offline
Joined: 2010-04-22
Posts: 618

The [[Astronomy]] section should be good for this.

Anders Feder
Offline
Joined: 2010-04-22
Posts: 618
Memos

Gerry, these memos that you are linking to, is there any chance that you can make them available under a Creative Commons license or similar? Then we can upload them to the wiki in case their current host goes away.