(This was just posted on the setiQuest group page on meetup.com; I thought it also might be suitable here, and might perhaps explain my thoughts in somewhat greater detail.)
To all members of setiQuest on meetup.com:
I have been seeking for the past several weeks to be able to open and properly format, for amateur experimental analysis, copies of data files from the setiQuest.org website. While I do have a good deal of experience in scientific programming as well as signal processing, data analysis, and time series, the format of the files involved, some of which I have been very kindly sent a link to by jrseti of setiquest.org (with whom I have spoken and had several e-mail discussions), is nevertheless extremely difficult for me to properly format and access.
In having mentioned the difficulties I have encountered on the setiQuest.org website, I had suggested the idea of trying to convert such data to purely ASCII text format, in order to try to greatly simplify such access for amateur analysis. I have been able to open the files using a freeware version of a hex editor suggested by jrseti, and I do genuinely understand his reasons for preferring that I accustom myself to the binary format presently available. However, after looking at the considerable complexity of the data formats in hex binary, I am still very much of the view that converting the data to ASCII, purely apart from any increased storage requirements, could only aid amateur experimentation with the data. Also, the package github.com which was mentioned for use in trying to create a "fork" of such data onto the PC of an amateur seeking to experiment with it, is actually a fairly sophisticated package, requiring an equally fairly advanced level of technical experience and perspective.
While I will certainly make every effort to do as jrseti suggests, and try to use the binary format, I would merely point out that the file sizes of up to some 1.9-2.0 Gbytes are extremely large. They take an extremely long time to download, even on a broadband cable-TV connection, up to some 20-30 mins each; further, their size, I expect, would greatly tax many scientific software packages I have envisioned using to try to analyze them. I am very much aware of freeware software for both splitting and joining files, that, in my view, could potentially allow such files to be broken up for website download, making them of a reasonable, and relatively manageable size, and thus far more tractable for use with most spreadsheet-based signal, time series, and/or data analysis software.
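As a rough illustration of the splitting idea above, here is a minimal Python sketch of breaking a large file into smaller parts and rejoining them. The function names `split_file` and `join_files`, the `.partNNNN` naming scheme, and the 4 Mbyte default part size are all my own assumptions for illustration; they are not part of any setiQuest tool.

```python
import os


def split_file(path, chunk_bytes=4 * 1024 * 1024, out_dir=None):
    """Split `path` into numbered parts of at most `chunk_bytes` bytes each.

    Returns the list of part paths, in order. The part size and the
    naming scheme are illustrative assumptions only.
    """
    out_dir = out_dir or os.path.dirname(os.path.abspath(path))
    base = os.path.basename(path)
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            block = src.read(chunk_bytes)
            if not block:  # end of file reached
                break
            part_path = os.path.join(out_dir, f"{base}.part{index:04d}")
            with open(part_path, "wb") as dst:
                dst.write(block)
            parts.append(part_path)
            index += 1
    return parts


def join_files(parts, out_path):
    """Concatenate the parts, in the order given, back into one file."""
    with open(out_path, "wb") as dst:
        for part_path in parts:
            with open(part_path, "rb") as src:
                dst.write(src.read())
```

Each part is read and written one block at a time, so the full 1.9-2.0 Gbyte file never has to fit in memory. Note that the parts are still binary; this only addresses download size, not the format itself.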
I entirely realize that what I am suggesting would likely be deemed as an extremely cumbersome attempt to take a step backward from present data formatting approaches used for such SETI data. Further, I am equally well aware that I am obviously a newcomer to amateur SETI data analysis, and am by no means seeking to suggest such an approach in the expectation that it would be at all readily deemed acceptable. However, prior to my now permanent and total disability, by reason of which I can no longer work, I spent many years in electrical engineering (EE), physics, and math, as well as having been preparing for my doctoral-level qualifying exams in math, prior to going, instead, for doctoral-level allied health training in a clinical field, which, unfortunately, due to my disability, I no longer work in.
For those reasons, I spent many years involved with the analysis of many very large, serious, scientific data sets, which, while unrelated to SETI, frequently bore on very similar and/or closely aligned scientific subject areas. I have thus generally found that, where feasible, data should be stored in ASCII format, whether web-based, and/or using some sort of cloud-computing and/or virtual-desktop environment, for which I am aware of several open-source software projects that might be suitable for adaptation to such a purpose.
Also, I am extremely interested in trying to set up a chapter of this meetup.com group related to setiQuest on Long Island. There are, to my knowledge, at least two other amateur scientific groups on meetup.com near me that could, in my view, be more than reasonable possible venues for such a chapter, and/or serious amateur scientific and/or avocational involvement in analysing SETI data through setiQuest. As such, I would very greatly appreciate any possible discussion in that regard, and would be more than entirely willing to devote a good deal of my free time toward such an effort at making a Long Island chapter of setiQuest a reality.
I have, in fact, already spoken with several members of one of those two other meetup.com amateur science groups about the idea, and do genuinely think that, if the SETI data available could be rendered into a more tractable format capable of easier analysis and more readily-apparent format, both of those groups, as well as others on meetup.com, could potentially have adequate interest amongst their members to actually be able to provide a productive and contributing amateur scientific environment to the entire amateur SETI effort. I would thus greatly appreciate hearing from any member here that might possibly be interested in assisting me with such objectives, and would look forward to any possible suggestions and/or encouragement in those regards, whenever might possibly be convenient, whether by e-mail through the meetup.com website, or through additions to this discussion.
We are thinking about a new setiQuest meetup. Maybe in September? Maybe we could have the meetup here locally at SETI, but also over the internet live with other groups.
I would suggest we make this a "working" meeting where we show how to do something, like analyze the data.
Most any day in September is good for me. We should plan an agenda to make this Meetup even more productive.
July 10, 2011
To any members of setiQuest:
I of course understand the various responses I have received regarding my initial comments posted earlier; however, I felt there is something very important that I very definitely needed to clarify. I in no way speak for those I met at the one local meetup.com group near me where I merely initially, and very superficially raised the whole concept of trying to get involved in setiQuest, nor can I tell you if I will in fact obtain any serious interest.
That one group is certainly not mine personally, nor am I a founder of it; thus, I can in no way anticipate their thoughts and/or reactions. I merely suggested the whole concept, since, as turned out to be the case, the mere idea of getting involved in the setiQuest project had evidently never occurred to those with whom I spoke, nor were they even aware of the existence of the entire setiQuest effort, and it simply struck me as a potentially worthwhile possible project for that one local group, and possibly one other I am aware of (but with which I have not yet raised the idea). I was merely seeking to explore the possibility of such a local setiQuest group on Long Island, nothing more; I am in virtually no position of authority, nor did I mean to imply otherwise; I merely had an initial thought and interest, nothing else.
Further, I have looked at many of the e-mails I got back since last night about my thoughts, and will obviously look further at the remainder. While I entirely understand the point about showing me how, for example, in one of them, to cull out some 10 complex-valued data pairs from a SETI data file, I cannot seem to convey my point, which is that the overall process of needing to re-process all of the data to be analyzed on a purely amateur, hobbyist level is, at least in my view (neophyte though I am, certainly), far too needlessly complex at first glance.
Let me then try to paint all of you an ideal, if you all might be good enough to try to follow it, naive though it may initially seem: My ideal scenario, then, goes roughly as follows: No re-processing needed, and no programming, to access the data, no use of Perl, Curl, github.com, nothing. Just a finished, well-organized spreadsheet of ASCII data showing the complex-valued voltages directly, in actual decimal values, with zero intermediate steps needed to obtain them. The ephemeris, header, and metadata information, I need to examine much further, certainly, to try to understand their relevance to the entire subject; I am obviously not a radio astronomer, nor would I even remotely suggest that I am.
However, there are, to my way of thinking, far too many intermediate steps involved here, just to be able to get at the actual complex-valued voltage signals, which, at least to me, naively, is the heart of the matter. Those values represent the material that needs to be re-examined by amateur users, and which could potentially have intelligible SETI content to be checked and re-checked for. Those are the values that I should certainly think setiQuest would want to have the greatest number of pairs of eyes examine, and re-examine, from a purely amateur, hobbyist, or technical non-SETI-staff standpoint. And, unfortunately, my point is that is simply not yet possible, at least from what I have gleaned thus far to be able to do such a re-examination directly, without what seem to be countless intermediate steps to make that data both visible, and tractable.
Let us suppose, purely for sake of intellectual argument, that you got rid of virtually all of the intermediate steps. That is to say, absolutely no binary format usage; no need to read binary data and convert it back to decimal; merely a sequence of complex-valued voltage values, and the dates and times they were recorded, as a true, ASCII time series. Also, the only other thing one would need would be what frequency, or frequencies, such data would be recorded at.
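To make that ideal concrete, here is a minimal Python sketch of the kind of conversion I have in mind. It rests on loudly stated assumptions: that a file is nothing but interleaved signed 8-bit (real, imaginary) pairs with no header, and that the start time and sample rate are known and supplied by the user. I do not know that the actual setiQuest files are laid out this way; the real header, ephemeris, and metadata handling is not reproduced here, and the function name `binary_to_ascii` and the column names are hypothetical.

```python
import csv
import struct
from datetime import datetime, timedelta


def binary_to_ascii(bin_path, csv_path, start_time, sample_rate_hz,
                    max_samples=None):
    """Write complex samples from a binary file as an ASCII (CSV) time series.

    ASSUMPTION: the input is headerless, interleaved signed 8-bit
    (real, imag) pairs. This layout is illustrative, not the confirmed
    setiQuest format.
    """
    written = 0
    with open(bin_path, "rb") as src, \
            open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["time_utc", "real", "imag"])
        while max_samples is None or written < max_samples:
            pair = src.read(2)
            if len(pair) < 2:  # end of file (a trailing odd byte is ignored)
                break
            real, imag = struct.unpack("bb", pair)  # two signed 8-bit values
            # Timestamp of sample n is start_time + n / sample_rate.
            t = start_time + timedelta(seconds=written / sample_rate_hz)
            writer.writerow([t.isoformat(), real, imag])
            written += 1
    return written
```

A call such as `binary_to_ascii("obs.bin", "obs.csv", datetime(2011, 7, 10), 8.0e6, max_samples=100000)` (file names and sample rate hypothetical) would yield a table that a spreadsheet could open directly. The observing frequency could be carried in the file name or in an extra column.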
That is, ideally virtually all I personally would want, and virtually all I would initially need, to at least get started. Given that data, and nothing else, I would neither need to care about the binary data formatting SETI uses, nor the technical characteristics of the ATA; the ATA then simply becomes nothing more than a black box. Certainly, out of purely technical interest, I would be more than motivated to learn about more of those technical details, and being originally an electrical engineer (EE), I would obviously make every possible effort to learn about such aspects.
However, when it comes to actually working on the data itself, and analyzing it according to my own lights and likes, it is merely that time, frequency, and complex-voltage data that I want, and need, nothing else. All the rest, for data-analysis purposes, in the ideal, would then become nothing more than a superfluous, albeit interesting, technical distraction, and diversion. A fascinating one to be sure, but it is the data itself that could possibly contain intelligent SETI patterns and/or Doppler-shifted RF carrier signals. How SETI formats it, and the need to reconstruct that data, merely becomes, regrettably, little more than a time-consuming diversion, and one that clearly seems to need to be gotten through before one can even see such data in any finished form, for any actual SETI observations that have thus far been done.
Mind you, in saying that, please also realize something: I have had to become involved with many similar technical projects, involving comparably complex databases, requiring me to need to learn all the technical details of how those who recorded it processed the information. Obviously, the scientific purposes were entirely different and unrelated to SETI, albeit of comparable scientific complexity. However, the need for me to have to understand a comparable level of technical detail was, nonetheless, eerily similar.
And, while I did become engrossed in such technical details, the sum and substance of what I was required to do was generally the same as with setiQuest: I had time- and/or frequency-series data I needed to analyze, nothing more. The same is true here; all I need, ultimately, with the SETI data, are the dates, times, frequencies, and complex voltages, and that is all.
How I, and other setiQuest hobbyists, would process that information, is utterly beside the point. I gather the data formatting SETI uses is extremely complex; that much is abundantly apparent. That is, assumedly, why setiQuest is seeking to make such data available; to allow us, as uninvolved hobbyist-level participants to be able to re-analyze that data. However, at the same time, desirably, we would, I should think, wish to be as free of as many of the technical details of getting that data to us, as might remotely be feasible.
Mind you, that is by no means meant to minimize the complexity of writing routines to extract such data, and get it into proper, ASCII, tabular, time-series, and frequency-series format. I know, and can obviously tell, that to do so is involved; I very clearly gather that by now. My point is, I merely wish to get about it before the next century, and actually start having a chance to analyze it, and not need to practically do a Ph.D.-level dissertation project merely to be able to even have organized ASCII access to it, so I could even begin.
Obviously, I would gradually seek to become more familiar, and to write and use software with sufficient fluency to contribute. However, I would also like, at least for the moment, to actually have such real, ASCII, spreadsheet data, in the proper format, so I might at least be able to start, in the interim, while I would be teaching myself enough, along the way, to be able to write such routines to be able to extract such data.
In summary, my view is this: I understand what I, as a hobbyist, am apparently expected to do, in order to re-format, and, eventually, be able to access the SETI data. My point is, if I just had the data in ASCII format, in files of reasonable, and manageable size (possibly a few Mbytes), I could do it myself, and, I think reasonably expeditiously, with considerably less delay, and/or minimal need for programming. There is also the consequent frustration of knowing the actual SETI data is there, and yet needing to jump through all of the hoops apparently being required along the way, before I could even get a chance to plot and/or analyze it. I can thus only guess at the comparable frustrations of many other would-be participants like myself who might wish to also be able to get at such data openly, directly, with no such required intermediate steps, other than merely reformatting a spreadsheet, if the data were already in ASCII format, not binary, and thus requiring no such mass reprocessing.
I quite honestly do not know of any other way to convey my thoughts, other than to have explained them as I have here. I am well aware of the many hundreds, if not thousands, of man-hours such data formatting must have taken to develop the present SETI data. I have written a good deal of scientific software, and am by no means naive as to the complexity of the SETI project. However, the goal of setiQuest is to get SETI data into the hands of hobbyists, and/or amateurs, with minimal initial complexity; from the description I have seen, clearly, that seems to not be the case.
I also fully understand that the method(s) and direction(s) provided, and which I have examined thus far, are all very clearly well-intentioned, in terms of trying to make the re-formatting process tractable. If, therefore, I could possibly obtain such pre-processed, ready-to-use raw, ASCII data, in spreadsheet format, I can only say that would obviously make actually beginning analysis far easier, and far less taxing, not merely to myself, but also any other numbers of hobbyists and/or amateurs who might also care to participate.
I entirely realize that my viewpoint very likely runs entirely contrary to the prevailing views of other setiQuest members (it is, in all likelihood, diametrically opposed). However, I merely felt I should explain my thoughts, in order to at least clarify my perspective, for whatever it might be worth. In saying the foregoing, I am of course not suggesting that I could be provided such ASCII spreadsheets of SETI data with the wave of a magic wand. However, as a purely wish-list ambition, I would, in all candor, obviously be most thrilled to actually see it, as, I think, would many other potential setiQuest participants besides myself. I would of course look forward to any further thoughts and/or views, while I seek to digest what I have received thus far, in an effort to make any possible progress in the interim.