Dr. Tarter points out that we regularly observe signals that look like squiggles in a standard waterfall plot. See her Google Summer of Code project posting for background:
I thought of one approach to searching for these kinds of signals, which uses the property of signal continuity over time. Any connected squiggle can be straightened by comparing adjacent raster lines. Start with a "vertical" integration over the waterfall (that is, summing all raster lines over the time dimension). From the resulting 1-D power-versus-frequency graph, compute a figure of merit (FOM); this could just be the maximum value in the plot, the kurtosis, etc. Our goal is to "collapse" the signal power into as few frequency bins as possible.
With this as a starting point, shift one of the raster lines to the left or right by one (or a few) bins and recompute the FOM. If the FOM goes up, keep the result; if not, go back to the previous solution. Perform this analysis at least once for every raster pair. If the signal is strong and well behaved, this method will lead to a waterfall in which, when the FOM is maximized, the signal appears as a vertical line.
This algorithm is an extension of the regular SETI search algorithm that looks for straight lines in the waterfall. It is known that the straight-line detector algorithm performs well on weak signals.
I'll admit that I haven't read every post concerning these squiggles, so I apologize in advance if my thoughts have already been hashed out.
Do we know the cause of the squiggles, or at least have an idea? If we know it's terrestrial, would the cause be Doppler? The more we can say about what we think the model is, the better our signal processing will be. From the few plots I've seen, it looks a little random-walk-ish. Also, when I look at the plots my gut reaction is to throw a Kalman filter or hidden Markov model at it.
My thoughts on Gerry's Questions:
1. At least the way I understand your algorithm, it seems like a combinatoric problem. Your greedy-approach suggestion seems reasonable. Maybe a branch-and-bound search would help? If we can come up with a heuristic function to gauge how well the search is going, we'll be able to search more efficiently.
2. If the approach works at higher SNRs, then it "should" work at lower SNRs; we just need more integration gain. Suppose you have T raster lines: when we sum over the raster lines we get some integration gain of x dB. If we had 2T raster lines, we'd get an extra 3 dB of gain. So if we're in a low-SNR regime, we just need more lines.
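A quick Monte Carlo illustration of the integration-gain point (a sketch with made-up parameters; note that when summing power spectra incoherently the gain per doubling is closer to 1.5 dB than 3 dB, since the noise fluctuations also grow as the square root of the number of lines):

```python
import random
import statistics

def summed_snr(nlines, nbins=64, tone_bin=20, tone_power=0.5, seed=1):
    """Sum `nlines` simulated power-spectrum rows (exponential noise, mean 1,
    plus a weak tone in one bin) and report the SNR of the tone bin in the
    summed spectrum: (tone - mean_noise) / std_noise."""
    rng = random.Random(seed)
    col = [0.0] * nbins
    for _ in range(nlines):
        for j in range(nbins):
            col[j] += rng.expovariate(1.0) + (tone_power if j == tone_bin else 0.0)
    noise = [v for j, v in enumerate(col) if j != tone_bin]
    return (col[tone_bin] - statistics.mean(noise)) / statistics.stdev(noise)
```

With these numbers, `summed_snr(4)` hovers near the noise floor while `summed_snr(256)` stands well clear of it, which is the "just add more lines" argument in miniature.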
3. Simulated annealing would most definitely help prevent the greedy approach from getting stuck in a local optimum.
4. This gets back to being able to model the cause of the squiggle. If we can assign part of the cause to motion, then let's throw a Kalman filter at it. If it's a thermal or physical issue, then I'd suggest the HMM or a general Bayesian network.
5. If there are multiple squiggles, assuming they are not crossing each other but located in separate swaths of bandwidth, we should bandpass filter, resample, and run the algorithm on each narrower signal.
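In a spectral-domain representation, the "bandpass and resample" step can be as simple as slicing out each signal's columns from the waterfall and running the straightening algorithm on each sub-band independently (a sketch; the band edges are assumed to be known in advance):

```python
def split_bands(waterfall, bands):
    """Cut a waterfall (list of power-spectrum rows) into independent
    sub-band waterfalls, one per (lo_bin, hi_bin) slice in `bands`.
    Each slice can then be straightened on its own."""
    return [[row[lo:hi] for row in waterfall] for (lo, hi) in bands]
```

Each returned slice is a smaller waterfall containing (ideally) one squiggle, so the single-signal algorithm applies unchanged.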
There's my two cents for now.
P.S. Here's a very picky nitpick, and it's probably just something I'll have to live with, but... when I see the words 'waterfall plot' in a communications-system context I think of bit-error-rate curves, as they are referred to in Proakis, etc. What is called a waterfall plot here (and maybe in the radio astronomy / astrophysics community) has, in my experience, always been called a spectrogram.
Instead of answers, some thoughts.
We think of the signal prototype as an uncontrolled oscillator whose frequency varies with time. The frequency vs. time is a continuous function that random-walks with a specific time constant (different for each signal). This is a good explanation, and it is sensible to consider a terrestrial origin first, since such explanations are preferred by Occam's razor.
Integration gain: Good point!
Let's keep thinking.
I've found an incredible number of drifting-random-walks in most of the setiData so I agree that a terrestrial uncontrolled oscillator is a good explanation. But it also seems plausible that the ISM could have a similar effect on a constant tone. We talked about the ISM and the drifting-random-walks a bit last summer in the forums and here are some relevant links:
The large number of drifting-random-walks suggests a terrestrial answer but it seems dangerous to lump this signal in with RFI if it is possible that the ISM could be responsible. We don't want to throw away a SETI beacon signal just because the ISM makes it look like RFI.
Unfortunately there isn't a good way of testing this theory except by using models. Pulsars are our best test-signal source for probing the ISM, but a broadband pulse is very different from a tone. An interesting idea is to use a pulsar's broadband pulse to build an impulse response of the ISM; I'm sure the pulsar researchers have thought of this already. A problem, though, is that this doesn't capture the highly time-variant nature of the ISM that I'm suggesting.
Good point, Sigblips. We also don't want to throw away a SETI beacon just because they have encoded _information_ in the signal, in the form of a relatively slow frequency modulation!
There are examples of sources (like quasars) where there is a drifting cloud of plasma floating in the foreground. This causes scintillation (which would have time variable delay effects simulating frequency modulation in a narrowband signal). I'll look for the reference.
I agree, to get started on finding a SETI signal, we must first reliably identify this kind of modulated signal. Then we can start to look for signals that come from only one direction on the sky...
I assume an AWGN channel is typically used. I wonder if a multipath fading channel really should be used? Then we would not have to find the impulse response of the ISM. I imagine a model of the ISM would be fairly random, since the ISM could be of different sizes, materials, etc. Just as in cellphone communications, we use multipath models to account for the fact that the signal is going to bounce off leaves, trees, buildings, and so on. That seems like a nice analogy to the ISM?
AWGN -- additive white Gaussian noise, had to look that up.
Multipath fading is a likely scenario for human-generated signals. Take the signal Rob showed, where the signal is spread across ~10 Hz of bandwidth at any instant. What is the length scale required for multipath to produce this interference? The light-crossing distance must be approximately 1/10 of the distance light travels in 1 second, i.e. 3 x 10^7 meters. The Earth itself is 1.2 x 10^7 meters in diameter, so scattering from nearby objects (mountains, etc.) couldn't produce this kind of incoherence.
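For the record, the arithmetic behind that estimate (a trivial sketch, taking c = 3 x 10^8 m/s and reading the ~10 Hz incoherence bandwidth as a ~0.1 s delay spread):

```python
c = 3.0e8                      # speed of light, m/s
bandwidth = 10.0               # apparent signal width, Hz
delay_spread = 1.0 / bandwidth     # ~0.1 s of differential delay
path_scale = c * delay_spread      # ~3e7 m of differential path length
earth_diameter = 1.27e7            # m
# The required multipath length scale exceeds the Earth itself:
assert path_scale > 2 * earth_diameter
```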
There is no effective limit to the length scale for ISM scintillation. I know that scintillation on the order of milliarcseconds has been observed (http://adsabs.harvard.edu/abs/2004ApJ...614..607O). That paper suggests that some scintillation (and/or intrinsic variation) may be happening in the vicinity of the source (if I read it correctly).
This scintillation has led some (non-mainstream) astronomers to believe that quasars have proper motion, and to suggest they are much closer than they appear to be based on Hubble's law (http://www.angelfire.com/az/BIGBANGisWRONG/wheresquasars.html). The scintillation explanation is the way out of this conundrum.
Getting back to ISM scintillation, the shortest scintillation timescales are thought to be in the range of minutes (0.01 Hz). This can't explain the 10 Hz width of the signal posted by Rob.
This probably means that Rob's signal is intrinsically unstable at 0.1 second intervals. It is very likely artificial and human made (not that anyone thought otherwise).
After looking at more of the plots, it seems clear that there is some Doppler effect going on. At a bird's eye view the signal is moving in frequency. Is this due to the motion of the Earth and our receiver? Is that type of motion already compensated?
Seems like we should be able to roughly estimate the Doppler and remove it. Then we should be looking at a signal that is basically a straight line in the waterfall plot, with the wiggles. At that point, if the hypothesis is that a random walk is the cause AND it truly is a random walk, then if we integrate many raster lines the Gaussianity should eventually get rid of the wiggle.
However, say there is some extremely slow frequency modulation causing the wiggle ( assuming the transmitted bits are sent to be decoded, i.e. not randomized ), then the integration will not go to zero.
Just two more cents.
>Seems like we should be able to roughly estimate the Doppler and remove it.
The transmitter, terrestrial or ET, may have an acceleration as well which would be unknown to us.
>if the hypothesis is a random walk...
Many of the signals appear to be from uncontrolled oscillators; however, a significant number appear to be oscillators with some degree of feedback control. Here is an interesting signal that behaves like an oscillator trained with an RC filter in the feedback loop. One can see the exponential rise and decay in frequency versus time. Some of the signals that zig-zag, I suspect, are bouncing in a feedback loop between upper and lower frequency thresholds that are, in some cases, themselves drifting. A true random walk should have larger and smaller zigs and zags in a fractal pattern.
>The transmitter, terrestrial or ET, may have an acceleration as well which would be unknown to us.
What if we use a Kalman filter to estimate the motion to yield a Doppler estimate? We need to remove/compensate the Doppler to simplify the signal.
With the Doppler compensated, we should have a signal essentially at a "constant" center frequency, plus the wiggle. At this point, maybe we use another Kalman filter or some model of the wiggle due to clock/oscillator drift.
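To make the Kalman idea concrete, here is a minimal 1-D constant-drift Kalman filter over per-raster-line frequency estimates (e.g. the argmax bin of each line). This is only an illustration of the technique: the state model, noise parameters q and r, and the unit time step between lines are all assumptions, not anything tied to the actual data.

```python
def kalman_track(freq_measurements, q=0.01, r=1.0):
    """Track frequency vs. time with a constant-drift Kalman filter.
    State is [frequency, drift rate]; q is process noise, r is measurement
    noise. Returns the filtered frequency track."""
    f, d = freq_measurements[0], 0.0       # initial state
    P = [[1.0, 0.0], [0.0, 1.0]]           # state covariance
    track = []
    for z in freq_measurements:
        # predict: constant-drift model, unit time step between raster lines
        f, d = f + d, d
        P = [[P[0][0] + 2 * P[0][1] + P[1][1] + q, P[0][1] + P[1][1]],
             [P[1][0] + P[1][1], P[1][1] + q]]
        # update with the measured frequency z (observation picks out f)
        S = P[0][0] + r
        k0, k1 = P[0][0] / S, P[1][0] / S
        resid = z - f
        f, d = f + k0 * resid, d + k1 * resid
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        track.append(f)
    return track
```

Subtracting this track from the raw measurements leaves the residual wiggle, which is the part we would then model (random walk, feedback loop, or slow modulation).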
Ultimately, we have a binary hypothesis test: H0, the signal is terrestrial in nature; H1, the signal is ET in nature. Building a model assuming H0 will be a lot easier than assuming H1. Do we approach the problem by assuming the H0 model and looking for significant differences?
Thanks for the example of a cool signal. I agree that an unconstrained random walk would be fractal. I have always sort of thought that there is a minimum timescale (maximum frequency) for the "random walks" in most of our signals. When we build an algorithm to spot and characterize these squiggles, it will be possible to produce the curve of f versus t. A power spectrum of this curve will not be flat in most cases.
For the signal pointed out by Jill, there appears to be a cutoff frequency (~0.1 Hz?) above which the oscillation power becomes very small. This creates the tell-tale Gibbs oscillations around 0.1 Hz.
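Once a squiggle detector outputs the f-versus-t curve, its flatness can be checked with a simple periodogram. A naive DFT sketch (real data would call for detrending and windowing; this brute-force version is only for illustration):

```python
import math

def periodogram(track):
    """Naive power spectrum of a frequency-vs-time track (mean removed).
    A pure random walk would show power falling off smoothly with k, while
    a cutoff or feedback loop shows up as structure in this spectrum."""
    n = len(track)
    mu = sum(track) / n
    x = [v - mu for v in track]
    power = []
    for k in range(n // 2):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append((re * re + im * im) / n)
    return power
```

A cutoff like the ~0.1 Hz one described above would appear as a knee in this spectrum at the corresponding bin.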
In the signal Rob points out, it looks very much like a filtered sawtooth. The time constant (period) of the sawtooth is about 500 seconds, or 0.002 Hz. It is very regular (zoom out to see). However, the signal also has a small short-time instability (if we wish to represent it as a continuous sine wave). The instability has ~10 Hz bandwidth, judging from the apparent width of the signal.
For signals like the latter, they may be easier to follow if we reduce the frequency resolution to ~10 Hz (just average adjacent frequency points). A harder approach would be to use less frequency resolution and more time resolution. Would 10 raster lines per second cause the signal to drop to a single bin width? Just speculating.
Rob's signal shows that a squiggle detector will have to deal with multi-resolution waterfalls to achieve maximal sensitivity. We'd have to (algorithmically) match the bandwidth of the signal to the representation. This might be achievable by binning the data and then seeing if FOM goes up or down?
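One way to try the binning idea algorithmically: repeatedly halve the frequency resolution by averaging adjacent bins, and keep whichever level maximizes the FOM. A sketch (the function names and the number of levels are arbitrary choices of mine):

```python
def bin_pairs(waterfall):
    """Halve the frequency resolution by averaging adjacent bins."""
    return [[(row[j] + row[j + 1]) / 2 for j in range(0, len(row) - 1, 2)]
            for row in waterfall]

def best_resolution(waterfall, fom, max_levels=4):
    """Try several binning levels and keep the one with the highest FOM."""
    best_wf, best_f = waterfall, fom(waterfall)
    wf = waterfall
    for _ in range(max_levels):
        if len(wf[0]) < 2:
            break
        wf = bin_pairs(wf)
        f = fom(wf)
        if f > best_f:
            best_wf, best_f = wf, f
    return best_wf, best_f
```

A narrowband tone wins at full resolution; a ~10 Hz-wide signal like Rob's should win after a few levels of binning, so the chosen level itself is a crude bandwidth estimate.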
The algorithm Gerry describes should work for a strong signal but my gut feeling is that it will fail for weak signals. I don't think it will converge. It's trying to pull something random out of noise which itself is random. Also multiple signals, such as seen in this example, will cause problems:
I wrote up what I think is a good and easy to implement solution to this problem but I deleted it from this post. I will post it after the GSoC application deadline on April 8th. I don't want to fuel this GSoC frenzy anymore. It has already consumed enough of my time and I'm not a student or a mentor!
Last night I thought about the critiques we've had so far. Here's another approach that uses multiple rasters from the start and may have better convergence properties.
If there are N raster lines in the waterfall, break the waterfall in half, with (approximately) half of the rasters on top and half on the bottom. Sum the rasters in each section. Then cross-correlate the two sums and find the offset that gives the greatest correlation peak. Line up the two blocks of rasters using this offset.
Now take the individual "halves" and divide each in half the same way. Using one quarter, sum the raster lines as before and compare this to the sum of all the other raster lines. Find the best value of the displacement, shift the data, and so on, iterating until you are down to the single-line level.
This has more computation overall, but the early stages compress the signal down with better SNR. If the data are too noisy (e.g. no signal) then we can break out of iterations when the computed FOM is so low that we don't trust the result.
Also, we never have to go beyond 45 degrees (a single pixel shift between two lines) in the algorithm. If we are interested in signals with >45 degree slope, then we rotate the image 90 degrees and send it in.
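A sketch of this recursive halving (pure Python; the brute-force lag search, circular shifts, and the `max_lag` cap are all simplifications for illustration, not the proposed production method):

```python
def cross_correlate(a, b, max_lag):
    """Return the lag in [-max_lag, max_lag] that best aligns b to a."""
    def score(lag):
        return sum(a[j] * b[j - lag] for j in range(len(a))
                   if 0 <= j - lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

def shift_rows(rows, k):
    """Circularly shift every row by k bins (k may be negative)."""
    n = len(rows[0])
    k %= n
    return [row[-k:] + row[:-k] if k else row[:] for row in rows]

def align(waterfall, max_lag=4):
    """Recursive coarse-to-fine alignment: split the raster block in half,
    align the summed bottom half to the summed top half, then recurse."""
    if len(waterfall) < 2:
        return waterfall
    mid = len(waterfall) // 2
    top = align(waterfall[:mid], max_lag)
    bot = align(waterfall[mid:], max_lag)
    nbins = len(waterfall[0])
    sum_top = [sum(r[j] for r in top) for j in range(nbins)]
    sum_bot = [sum(r[j] for r in bot) for j in range(nbins)]
    lag = cross_correlate(sum_top, sum_bot, max_lag)
    return top + shift_rows(bot, lag)
```

The appeal is that the coarsest comparisons happen between large summed blocks, where the SNR is highest, before any single noisy line is aligned on its own.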
We'll have to build a "noisy squiggle" data generator.
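Something like this could serve as a starting point for that generator (a sketch under simple assumptions: exponential noise power with mean 1, and a tone whose frequency does a Gaussian random walk):

```python
import random

def make_squiggle(nlines=64, nbins=128, snr=5.0, walk_step=1.0, seed=None):
    """Synthesize a noisy waterfall containing one random-walk tone.
    Returns (waterfall, track): a list of power-spectrum rows plus the
    true frequency-bin track, so detector output can be scored."""
    rng = random.Random(seed)
    f = nbins / 2.0
    waterfall, track = [], []
    for _ in range(nlines):
        row = [rng.expovariate(1.0) for _ in range(nbins)]   # noise floor ~1
        b = int(round(f)) % nbins
        row[b] += snr                                        # inject the tone
        waterfall.append(row)
        track.append(b)
        f += rng.gauss(0.0, walk_step)                       # random walk in frequency
    return waterfall, track
```

Keeping the true track alongside the waterfall makes it easy to measure how often (and how closely) a candidate squiggle detector recovers the injected signal at various SNRs.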
Multiple non-crossing signals should not be a problem (as shown in your link).
Jill indicated that the current software will detect a signal with a larger bandwidth.
If that's the case, we should be able to use that larger bandwidth to bandpass and resample.
Then we have a single wiggly signal on which to run the algorithm.
If we get information from setiQuest citizens that tell us the bandwidth, then narrowing to include only the signal will be helpful.
Otherwise, we may not know the bandwidth until the (blind) algorithm has completed. BW would be an output from the algorithm.
You are correct that we have the raw data, so we can operate on any bandwidth we like, up to >6 MHz, which is the limit of our collecting capability.
I agree that it's important for the chosen algorithm to have the requirement to handle multiple non-crossing signals. It's a doable problem, it's just a bit more difficult. I was saying that Gerry's original algorithm, as he described it, would have a problem dealing with multiple signals.