Having considerable difficulty accessing data

ENGRPOD
Joined: 2011-06-29
Posts: 11

To any setiQuest member: I recently joined setiQuest.org and have been trying to access data from links I was given. After speaking with one of the SETI staff, I was allowed access to http://184.73.186.167/download/2010-11-06-dorothy_tauceti_1420_1/, and that staff member told me in a separate e-mail that I could share the link in an effort to make progress. Every data file I try to download, each some 1.9-2 Gbytes in size, is given a Nero ShowTime extension, specifically "NeroShowTime.Files7.dat"; a Google search shows that many others also have difficulty opening such files. I suspect this is my PC choosing Nero ShowTime as the likeliest package to open the files, although Nero ShowTime does not, in fact, open them. The SETI staff member I spoke with mentioned a package called octal dump editor (ODE), which I have searched for without success so far. I have been able to get into the files and at least look at the bit content using a free file viewer that essentially acts as a hex viewer and editor (I did not edit the files, merely looked at them). However, every effort I have made to access the data in text format, presumably tabular, using, e.g., Word, has been fruitless: the files are either too large to open, or unintelligible, or both, and I am completely stymied. For those reasons, I am seeking whatever help I can obtain on the forums to access the actual SETI data for my own experimentation. I am also aware of the setiQuest group listed on meetup.com, and have tried to interest others in such hobbyist-level efforts in creating a branch of that group near where I am, on Long Island, outside of NYC.
I would have thought that such a group might exist by now; however, that seems not to be the case, with the only meetup.com group for setiQuest being the one in Mountain View, CA. I am more than willing to devote a good deal of time to creating a branch group here; however, I will need assistance, and would like to speak with those involved in the existing meetup.com group. I would greatly appreciate it if anyone involved with it could get back to me. I no longer work due to total and permanent disability, and am seeking to become more involved in setiQuest purely as a hobbyist. I was originally trained as an electrical engineer (EE), as well as extensively in both physics and math, and was in fact extremely close to my doctoral qualifying exams in math at one point. I have also been extensively clinically trained in a doctoral-level allied-health field (not medicine, albeit comparable); however, as I have said, I can no longer work in any of these areas and am seeking something worthwhile to do with my time. I have very extensive training and experience in signal processing, statistical time-series analysis, and numerous open-source, freeware, and other serious scientific software packages that could be of use to other setiQuest members. I also need considerable help with the use of github.com, including the process of "forking" data, and with downloading SETI data in some intelligible form in which the files can be opened.
While I am reasonably well-versed in many aspects of computer science and information technology, I was away from the field for some years due to my clinical training, so there are areas with which I am unfamiliar and would need help, if any member here is able to assist me. Finally, I suggested to the SETI staff member with whom I had spoken the idea of providing all SETI data in open ASCII text format. I fully understand his objection to text format, principally the resulting size of the files; I would only point out that the principal goal of setiQuest, as I understand it, is free access to SETI data for amateur analysis. github.com is a fairly sophisticated package; while I could certainly adapt to using it over time, and will of course try, having the SETI data available in files of manageable size, preferably in a cloud format, could only help to facilitate its widest possible distribution, freeing users with less spare time from learning the arcane details of downloading it in binary so that they can actually assist in analyzing it. I would be more than willing to assist personally with such a conversion to open text format, if other members care to make use of me toward that end, as I believe such an effort to be both worthwhile and necessary to make the SETI data as simple as possible to download and re-process through amateur efforts. I realize that other members might consider this suggestion wholly impractical, and that I am new to the effort; I merely wanted to offer it for whatever consideration it might be given, as I feel it could only further the efforts of setiQuest as a whole.
If anyone could get back to me here on the setiQuest forum, I would be most appreciative, and would look forward to any responses or thoughts, whether positive or negative, and any assistance in becoming more involved in the project overall. Please feel free to e-mail me at any time with any thoughts or directions. Sincerely, ENGRPOD@AOL.COM

jrseti
Joined: 2010-07-22
Posts: 250

Engrpod,

It is OK to tell everyone the staff member was jrseti :)

In Windows I downloaded the file mentioned above. I also downloaded the program HexEdit.exe. See http://www.hexedit.com/ for the download. There is a free 30-day trial.

I opened the HexEdit.exe program (after installing it). Then I selected File->Open and selected one of the header files (*.hdr). I could view the entire file in a hex representation. The same worked for the complex amplitude files. So, they are easy to view in Windows.

But if I bring up Windows Explorer and double-click on the data files, Windows does not know what to do with them. I think that is where the "Nero" problem you mentioned was coming from.

I added a lot to http://setiquest.org/wiki/index.php/SetiData explaining exactly the format of the data, header files, etc. You may wish to review this to become comfortable with the data.

 

Question: If you had the data in ASCII format, what would you do with it?

 

-jrseti

ENGRPOD
Joined: 2011-06-29
Posts: 11
Copy of thoughts just posted on setiQuest meetup.com site...

(This was just posted on the setiQuest group page on meetup.com; I thought it might also be suitable here, and might perhaps explain my thoughts in somewhat greater detail.)

To all members of setiQuest on meetup.com:

I have been seeking for the past several weeks to open and properly format, for amateur experimental analysis, copies of data files from the setiQuest.org website. While I do have a good deal of experience in scientific programming as well as signal processing, data analysis, and time series, the format of the files involved, to some of which I was very kindly sent a link by jrseti of setiquest.org (with whom I have spoken and had several e-mail discussions), is nevertheless extremely difficult for me to read and access properly.

In mentioning the difficulties I have encountered on the setiQuest.org website, I suggested converting such data to purely ASCII text format, in order to greatly simplify access for amateur analysis. I have been able to open the files using a freeware version of a hex editor suggested by jrseti, and I do genuinely understand his reasons for preferring that I accustom myself to the binary format presently available. However, after looking at the considerable complexity of the data formats in binary, I am still very much of the view that converting the data to ASCII, quite apart from any increased storage requirements, could only aid amateur experimentation with the data. Also, the package github.com, which was mentioned as a way to create a "fork" of such data on the PC of an amateur seeking to experiment with it, is actually a fairly sophisticated package, requiring a fairly advanced level of technical experience and perspective.

While I will certainly make every effort to do as jrseti suggests, and try to use the binary format, I would merely point out that the file sizes of up to some 1.9-2.0 Gbytes are extremely large. They take an extremely long time to download, even on a broadband cable-TV connection, up to some 20-30 mins each; further, their size, I expect, would greatly tax many scientific software packages I have envisioned using to try to analyze them. I am very much aware of freeware software for both splitting and joining files, that, in my view, could potentially allow such files to be broken up for website download, making them of a reasonable, and relatively manageable size, and thus far more tractable for use with most spreadsheet-based signal, time series, and/or data analysis software.

I entirely realize that what I am suggesting would likely be deemed an extremely cumbersome step backward from the data formatting approaches presently used for such SETI data. Further, I am well aware that I am a newcomer to amateur SETI data analysis, and I am by no means suggesting such an approach in the expectation that it would readily be deemed acceptable. However, prior to my now permanent and total disability, by reason of which I can no longer work, I spent many years in electrical engineering (EE), physics, and math, and was preparing for my doctoral-level qualifying exams in math before going, instead, for doctoral-level allied-health training in a clinical field, in which, unfortunately, due to my disability, I no longer work.

For those reasons, I spent many years involved with the analysis of many very large, serious, scientific data sets, which, while unrelated to SETI, frequently bore on very similar and/or closely aligned scientific subject areas. I have thus generally found that, where feasible, data should be stored in ASCII format, whether web-based, and/or using some sort of cloud-computing and/or virtual-desktop environment, for which I am aware of several open-source software projects that might be suitable for adaptation to such a purpose.

Also, I am extremely interested in trying to set up a chapter of this meetup.com group related to setiQuest on Long Island. There are, to my knowledge, at least two other amateur scientific groups on meetup.com near me that could, in my view, be more than reasonable venues for such a chapter, and for serious amateur involvement in analysing SETI data through setiQuest. As such, I would greatly appreciate any discussion in that regard, and would be more than willing to devote a good deal of my free time toward making a Long Island chapter of setiQuest a reality.

I have, in fact, already spoken with several members of one of those two other meetup.com amateur science groups about the idea, and do genuinely think that, if the available SETI data could be rendered into a more tractable and more readily understood format, both of those groups, as well as others on meetup.com, could have enough interest amongst their members to provide a productive amateur scientific environment contributing to the entire amateur SETI effort. I would thus greatly appreciate hearing from any member here who might be interested in assisting me with such objectives, and would look forward to any suggestions or encouragement in those regards, whenever might be convenient, whether by e-mail through the meetup.com website or through additions to this discussion.

Sincerely,

ENGRPOD.

jrseti
Joined: 2010-07-22
Posts: 250

ENGRPOD,

About the download of 1.9 GB files: I really see this as no problem. I have cable internet at home, and it takes 30 to 60 minutes to download a file. I just download them all overnight, and when I wake up they are all there. That is what I would suggest you do.

The data is inherently large, and there is nothing we can do about that. Breaking it up into smaller chunks would not really solve anything; you would still have to piece the smaller chunks back together to have enough data to analyze.

Any modern data analysis package suitable for this type of analysis should not have any problem with 1.9 GB of data.

Can you tell me what you intend to do with the data? How are you intending to analyze it? Looking at the data in tabular form is no good; a human cannot make sense of the numbers without a computer program to help. Loading it into Excel would not work either, as there is too much data. So using a spreadsheet would not be feasible, unless you know things about spreadsheets that I do not.

So, can you tell us how you intend to analyze the data? That would be a good place to start.

 

-jrseti

Dave Robinson
Joined: 2010-04-29
Posts: 196
Accessing the ATA Data

Hi ENGRPOD

I haven't downloaded any data recently; however, what I usually do in Windows is simply copy the file onto my large external 1 TByte hard disc. This is connected via USB, so it is slow and lots of patience is called for. I simply right-click on the filename on the SETI.ORG website and 'copy as' straight into my data directory on my big disc.

As you point out, 1.9 GBytes is a lot of data to play with directly; everything I have (Matlab, Octave or MathCad) simply falls over laughing with an out-of-memory error. To avoid this, I break these files up into what I call SetiSecond chunks, stored as Matlab-readable files. A SetiSecond is the period of time defined by 2^23 samples of the data at the ATA standard sampling rate; it amounts to about 0.96 seconds of data. Each 1.9 GByte file breaks down into 122 SetiSec blocks plus a small amount left over. The leftover I save as what I call a RUMP file, which I keep so that when I download the next file in the sequence, I can use it to ensure that the next set of SETISecond blocks is contiguous with the first set.
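The block arithmetic above can be checked with a few lines of code. The following is only a sketch, written in Python rather than Octave purely for illustration. The file size is a hypothetical example value, and the two-bytes-per-sample layout is an assumption taken from the way the Octave script in this post reads the data.

```python
# Block arithmetic for "SetiSecond" chunks (illustrative sketch only).
BLOCK_SAMPLES = 2 ** 23      # samples per SetiSecond block, as defined above
BYTES_PER_SAMPLE = 2         # assumption: one signed byte for each part of a complex sample
BLOCK_BYTES = BLOCK_SAMPLES * BYTES_PER_SAMPLE

# Sampling rate implied by "2^23 samples is about 0.96 seconds" (approximate).
IMPLIED_RATE_HZ = BLOCK_SAMPLES / 0.96


def split_file(file_bytes, carried_bytes=0):
    """Return (whole_blocks, leftover_bytes) for one file.

    carried_bytes is the size of the rump carried over from the previous file.
    """
    available = file_bytes + carried_bytes
    whole_blocks = available // BLOCK_BYTES
    leftover = available - whole_blocks * BLOCK_BYTES
    return whole_blocks, leftover


# Hypothetical file size of 2,050,000,000 bytes (roughly 1.9 GiB).
blocks, rump = split_file(2_050_000_000)
print(blocks, "blocks,", rump, "bytes left over as the rump")
print("implied sampling rate: about", round(IMPLIED_RATE_HZ / 1e6, 2), "MHz")
```

With that hypothetical size the split comes to 122 whole blocks, which agrees with the figure quoted above; the exact rump size depends on the real file length.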

I usually save these Matlab compatible blocks in a slightly obsolete standard ('-v4' in the save statement) this enables me to read the data in MathCad as a Matlab file. PTC haven't updated their Matlab file reader yet.

The following code has been written in Octave, which as you probably know is a downloadable package largely compatible with Matlab. However I would warn you that the Octave computational speed leaves a great deal to be desired.

Hope this helps

% TITLE                 Block Complete Data
%
% AUTHOR                Dave Robinson
%
% DATE                  21st February 2011
%
% CLASSIFICATION        Script Source Code
%
% VERSION               First Draft
%
% (c) Dave Robinson     21st February 2011
% ********************************************************************
% DOCUMENT DESCRIPTION
%
% This script allows the user to specify a Manifest File, and it then
% Proceeds to extract the data into SETISec Blocks, correctly treating
% the joint between adjacent files
%
% ********************************************************************
% CHANGE CONTROL
%
% First Draft
%
% Dave Robinson                                     21st February 2011
% --------------------------------------------------------------------
% ********************************************************************

% The user completes the following section to select the required
% Manifest File
Manifest_FilePath = 'g:\SETI Data\Raw Data\psrb0329+54\Manifest.mat';

% The user needs to specify the Pathname to the directory to hold the
% Data Blocks
Block_Pathname = 'g:\SETI Data\Octave\Data\Pulsar\B0329+54 Blocks';

% The user needs to specify the Filename Preamble to be attached to each
% Block File
Block_Preamble = 'B0329+54-';

% The user needs to specify the start index to filename numbering
File_Index = 1;

% Define the Block size required for 1 SETISec of data
BlockSize = 2^23;

% Predefine a buffer to hold the extracted data
Data_Buffer = zeros(BlockSize,1);

% Define the Rump Buffer to be an empty array
Rump_Buffer = [];

% Bring in the Manifest data
load(Manifest_FilePath);            % Data in "FilePath_Buffer"

% Get the number of files present in the Manifest
Total_Files = size(FilePath_Buffer,1);

% Outer loop go through the current files we have available
for fileno = 1:Total_Files

    % Get the current filename
    Filename_Current = FilePath_Buffer(fileno,:);
   
    disp(Filename_Current);
   
    % Open the file for reading
    File_Handle = fopen(Filename_Current,'r','native');
   
    % Do we have a Valid file?
    if(File_Handle < 0)

        % No!!!!
        Error_Message = 'Filename not recognized';
        disp(Error_Message);
        Data_Frame = [];
       
    else
   
        % Yes!!!!
       
        % We need to find out how many entries we have in the file, so seek the
        % end of the file
        fseek(File_Handle,0, 'eof'); % Go to the end of the file
      
        % Now we are there get the File Position marker
        End_Position = ftell(File_Handle); % Find the pointer position

        % Now get back to the beginning so we can read the data
        frewind(File_Handle);
   
 
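        % From the End position, we can tell how many bytes are in the file.
        % Samples carried over in the Rump buffer count towards the first
        % block, so include them here (2 bytes per complex sample)
        Rump_Bytes = 2*size(Rump_Buffer,1);
        No_Blocks = floor((End_Position + Rump_Bytes)/(2*BlockSize));  % Remember data is complex
       
        % Get the number of bytes that will be left over for the next Rump
        No_Rump = End_Position + Rump_Bytes - (No_Blocks * 2 * BlockSize);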
       
       
        % Scan through the current file and populate files
        for block = 1:No_Blocks
       
            disp('Block Number'); disp(block);
       
            % Calculate File Pointer position
            Cur = ftell(File_Handle);
       
            % Special case for block 1 in the cycle
            if(size(Rump_Buffer,1) == 0)    % Anything in rump buffer?
       
                % NO!!!!
                % Get the current block
                % Now get the required block of data
                Raw_Data = fread(File_Handle,2*BlockSize,'schar');

                % Convert the Raw data format into Complex notation
                Data_Buffer = 1i*Raw_Data(1:2:2*BlockSize)...
                              + Raw_Data(2:2:2*BlockSize);
           
            else
       
                % YES!!!!
               
                % Calculate the size of data required to fill this buffer
                Residual_Size = 2*(BlockSize - size(Rump_Buffer,1));
               
                % Get the data required to fill the buffer from current file
                % Now get the required block of data
                Raw_Data = fread(File_Handle,Residual_Size,'schar');

                % Convert the Raw data format into Complex notation
                R_Frame = 1i*Raw_Data(1:2:Residual_Size)...
                                            + Raw_Data(2:2:Residual_Size);
               
                Data_Buffer = cat(1,Rump_Buffer, R_Frame);
       
                Rump_Buffer = [];   % Empty Rump Buffer
               
            endif
       
            % Create the filename
            Buffer_Filename = strcat(Block_Preamble,num2str(File_Index),'.mat');
       
            % Create the full filepath to write the blockfile
            Block_FilePath = fullfile(Block_Pathname,Buffer_Filename);
       
            % Now Save the block
            save(Block_FilePath,'Data_Buffer','-v4');
       
            % Increment the File_Index for the next buffer
            File_Index = File_Index + 1;
       
        endfor
       
        % Calculate File Pointer position
        Cur = ftell(File_Handle);
       
        % Now we need to get the Rump data from the file
        % Now get the required block of data
        Raw_Data = fread(File_Handle,No_Rump,'schar');

        % Convert the Raw data format into Complex notation
        Rump_Buffer = 1i*Raw_Data(1:2:end)...
                       + Raw_Data(2:2:end);
       
        fclose(File_Handle);
   
    endif

endfor 

jrseti
Joined: 2010-07-22
Posts: 250

Dave,

Wow, thanks for all the info. I've never used Matlab before, but I have used Octave a little bit. But that was on a computer with a LOT of RAM.

What does the acronym RUMP stand for?

Do you have any simple examples of using these RUMP files in Octave, maybe to create a graph or something? If so, I'll try to get this working and create a simple tutorial on the wiki to show the entire process, from downloading the data to getting a graph to pop up on your screen.

 

-jrseti

Dave Robinson
Joined: 2010-04-29
Posts: 196

Maybe I have not made myself clear here - sorry about that. What I have done is generate what I call a Manifest File, which is simply a text list of the names of the files that I want to convert into my SETISecond block files. Each of these blocks, which contain exactly 1 SETISec of data, gets stored as a standard Matlab/Octave file with a name something like
<Preamble>-0.mat ... <Preamble>-121.mat. However, there is never an exact integer number of SETISeconds in one of your data files. It is also important to note that the next SETI file continues exactly from where the previous one finishes; therefore I need to save the remnant of the first file, so that I can take a further chunk from the start of the next one and concatenate the two to form a complete block. As my ISP gets rather upset if I download more than one of your files a month (I download a lot of other stuff too), what I do is store this little chunk of data as <Preamble>-RUMP.mat. Then, when I go to process the subsequent SETI file, I have the remnant or RUMP of the data left over from processing the previous file, thus avoiding holes in my series of 1 SETISecond blocks.

Hope that makes it a little clearer. Sorry for the confusion I have caused.
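The carry-over idea described above can be sketched independently of the file format. The following is a minimal illustration in Python, not the Octave script itself; the function name `iter_blocks` is hypothetical, and small in-memory byte strings stand in for the large data files.

```python
import io


def iter_blocks(streams, block_bytes):
    """Yield consecutive fixed-size blocks from a sequence of binary streams.

    Bytes left over at the end of one stream (the "rump") are carried
    forward and completed with bytes from the start of the next stream,
    so the series of blocks has no holes at file boundaries.
    A final incomplete rump is simply dropped in this sketch, whereas the
    workflow described above saves it to disk for later.
    """
    rump = b""
    for stream in streams:
        while True:
            chunk = stream.read(block_bytes - len(rump))
            if not chunk:
                break                    # end of this stream; keep the rump
            rump += chunk
            if len(rump) == block_bytes:
                yield rump
                rump = b""


# Two tiny stand-in "files" of 7 and 8 bytes, split into 4-byte blocks.
files = [io.BytesIO(b"abcdefg"), io.BytesIO(b"hijklmno")]
for block in iter_blocks(files, 4):
    print(block)
```

The second block here is assembled from the last three bytes of the first stream and the first byte of the second, which is the role the RUMP file plays between downloads.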

Regards

Dave Robinson

ENGRPOD
Joined: 2011-06-29
Posts: 11
About the difficulties I have had accessing data...

To all setiQuest members:

I have been specifically (and quite understandably) asked by jrseti to be far briefer in my posts, and I will try to be. My point is this: I am finding the entire process of converting the SETI data into some usable, readable format next to impossible. I realize that is clearly a "me thing" and not an "all of you thing"; however, that is genuinely the case. I do not have 1 Tbyte hard drives, and I was a clinician for many years after I stopped working as an engineer. While I can, therefore, write software, I am not quick at it, and will doubtless need time to adapt to the methods all of you seem to take for granted in reading the types of data files to which jrseti was good enough to send me a sample link.

For that reason, I would most earnestly ask whether someone, anyone, might have any ready-made, text-formatted complex-voltage data, along with the times and frequencies of the data samples. That, at least initially, is all I seek. I would like to do ambiguity-function plots, to see if I can find any linear Doppler drift indicating an RF carrier; I would also like to autocorrelate the data, obtain power spectra, and become familiar with it, using the signal-processing and data-analysis packages I told jrseti I have available.
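For readers in the same position, a small text sample can be produced from the binary files directly. The following Python sketch converts raw bytes into tab-separated lines of time, real part, and imaginary part. It rests on two assumptions that should be checked against the data-format wiki page: the byte order within each pair follows the Octave script posted earlier in this thread (imaginary part first), and the sampling rate is inferred from the figure of about 0.96 seconds per 2^23 samples quoted there. The function names and the default sample count are hypothetical.

```python
import struct

# Assumption: inferred from "2^23 samples is about 0.96 s" earlier in the thread.
SAMPLE_RATE_HZ = 2 ** 23 / 0.96


def samples_to_text(raw, first_index=0):
    """Convert raw bytes (pairs of signed 8-bit values) to tab-separated lines.

    Assumption: in each pair the first byte is the imaginary part and the
    second is the real part, as in the Octave script earlier in this thread.
    """
    lines = ["time_s\treal\timag"]
    usable = len(raw) - len(raw) % 2           # ignore a trailing unpaired byte
    for k, (imag, real) in enumerate(struct.iter_unpack("bb", raw[:usable])):
        t = (first_index + k) / SAMPLE_RATE_HZ
        lines.append(f"{t:.9f}\t{real}\t{imag}")
    return lines


def head_as_text(path, n_samples=1000):
    """Read only the first n_samples of a data file and return text lines."""
    with open(path, "rb") as handle:
        return samples_to_text(handle.read(2 * n_samples))


# Four example bytes standing in for two complex samples.
for line in samples_to_text(bytes([1, 255, 128, 3])):
    print(line)
```

Because only the requested number of samples is read, this works on a 1.9 GByte file without loading it into memory. The raw files hold time-domain samples only; frequencies appear once a spectrum is computed from them.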

I realize what I am asking for is highly unusual, and that all of you are wholly conversant with the data formatting, and have far superior hardware and far more current software background. I am only a very small, insignificant would-be hobbyist who has been itching to get his hands on some SETI data for a long time; that is all I am saying. Could someone therefore please find some way, any way, to provide me with a text or ASCII version of the type of SETI data I have mentioned, so that I could just start working with it, with no further preliminaries? I have long wanted to get involved; however, at this rate it will be, I assure all of you, the next millennium before I actually have any data to look at. Some of us who would like to analyze the data merely wish to analyze it, not become engrossed in getting it into the right format; for that reason, I would be most grateful for any possible assistance toward that end. If such data is available, jrseti knows how to get it to me; any such help would be very much appreciated.

Sincerely,

ENGRPOD

Dave Robinson
Joined: 2010-04-29
Posts: 196

Hi Engrpod

 

I am not quite sure that you appreciate the difficulty of the task that you are asking for. Unfortunately, with the exception of a few very loud signals coming from our own artificial satellites, very few of the signals in the SETI data files are plainly visible; most of them are buried in ridiculous quantities of noise. Because of the sampling rate used at the ATA, 1 second's worth of data takes over 8 megasamples to store (and remember that they are complex numbers, so each sample provides 2 numbers). If you convert to ASCII, and take into account the need to allow for negative numbers, then you could require a file of over 16 MBytes just to hold 1 second's worth of data. I know that the latest version of Excel has increased its ability to display long columns of numbers, but I don't know if it can manage that amount of data.
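The size estimate can be made a little more concrete. The sketch below, in Python, assumes the block size of 2^23 complex samples used earlier in this thread, one signed 8-bit value per number in the binary file, and one delimiter character after each number in the text version. These are assumptions for illustration, not a description of any existing text format.

```python
# Rough size of one block of 2^23 complex samples, binary versus text.
SAMPLES_PER_BLOCK = 2 ** 23
VALUES_PER_BLOCK = 2 * SAMPLES_PER_BLOCK     # two numbers per complex sample

BINARY_BYTES = VALUES_PER_BLOCK * 1          # assumption: one signed byte per number


def text_bytes(chars_per_value):
    """Size in bytes if every number takes chars_per_value characters plus one delimiter."""
    return VALUES_PER_BLOCK * (chars_per_value + 1)


smallest = text_bytes(len("0"))              # every value a single digit
largest = text_bytes(len("-128"))            # every value needs a sign and three digits

print("binary:", BINARY_BYTES // 2 ** 20, "MiB")
print("text:  ", smallest // 2 ** 20, "to", largest // 2 ** 20, "MiB")
```

Under these assumptions the text version of one block is between two and five times the size of the binary one, before any time stamps are added.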

 

What can you see in a single second's worth of data? Effectively nothing but noise. To start to see the interesting stuff you need to remove the noise, or at least minimise it. This means (as I am sure you already know) that you need to combine lots of 1-second blocks to average down the random noise and start to see the coherent signal buried in it. For example, I was only just able to see the pulsar in the PSRB0329+54 data set after using all 122 1-second blocks contained in the 1.9 GByte file downloaded from the setiQuest database, and this is one of the strongest pulsar signals visible from the Earth.
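The effect of combining blocks can be demonstrated on synthetic data. The Python sketch below averages the power spectra of many short blocks, each containing a weak complex tone buried in Gaussian noise. This is one simple way of combining blocks, chosen for illustration, and is not necessarily the method used for the pulsar data. The block length, tone amplitude, tone bin, and random seed are all made-up example values.

```python
import cmath
import random

N = 64                                            # samples per block (tiny, for illustration)
TWIDDLE = [cmath.exp(-2j * cmath.pi * m / N) for m in range(N)]


def power_spectrum(block):
    """Return |DFT|^2 for one block, using a direct O(N^2) transform."""
    return [
        abs(sum(x * TWIDDLE[(k * t) % N] for t, x in enumerate(block))) ** 2
        for k in range(N)
    ]


def averaged_spectrum(n_blocks, tone_bin=10, amplitude=0.3, seed=1):
    """Average the power spectra of n_blocks blocks of a weak tone plus noise."""
    rng = random.Random(seed)
    total = [0.0] * N
    for _ in range(n_blocks):
        block = [
            amplitude * TWIDDLE[(tone_bin * t) % N].conjugate()   # weak complex tone
            + complex(rng.gauss(0, 1), rng.gauss(0, 1))           # noise in each part
            for t in range(N)
        ]
        for k, p in enumerate(power_spectrum(block)):
            total[k] += p
    return [p / n_blocks for p in total]


many = averaged_spectrum(200)
peak = max(range(N), key=many.__getitem__)
print("strongest bin after averaging 200 blocks:", peak)
```

In a single block the noise in any one bin fluctuates by about as much as its mean, so a tone this weak is easily lost. Averaging 200 spectra shrinks that fluctuation by roughly the square root of 200, which lets the tone bin stand clear of the noise floor.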

 

I understand exactly what you are trying to do, for like you I am a complete amateur, I have come from a different background of signal processing, and I had some difficulty originally in extracting the data. However, if your machine runs Windows, you can get a free copy of the Matlab clone called Octave, which you can find with a Google search. This will give you a relatively powerful mathematics program that will do virtually everything you need, including reading the raw data files (see my previous response on this thread). If, on the other hand, you run Linux, then getting a copy of Baudline might be your best bet (see the many first-rate contributions that Sigblips has made to this forum).

Provided your machine has a USB port, you can easily equip yourself with a 1 TByte drive for, I would guess, $100; it would plug straight into your machine and be instantly recognized.

 

Hope you find this response useful

 

Regards

 

Dave Robinson

jrseti
Joined: 2010-07-22
Posts: 250

I think that learning how to use Octave may be a big undertaking, but well worth it in the end. Once ENGRPOD learns how to use Octave a whole new world of analysis possibilities will open up.

You made a good point about 1 second of data not being enough to do any good. You need a lot of seconds worth of data to see any signal patterns.

 

-jrseti

ENGRPOD
Joined: 2011-06-29
Posts: 11
Some further explanation on potential usefulness of spreadsheets

(This is a slightly modified copy of an e-mail I have just sent to jrseti, further explaining my rationale on the potential usefulness of spreadsheet approaches to SETI, along with some further suggestions at the bottom, which he has asked me to post here for review. I am aware it is long; however, I ask readers to please be patient and to try to understand the points I am seeking to make. I assure all of you that I have read all of the other responses to my previous thoughts, as posted here; I nevertheless hope that you will bear with me and read what I have written, so that all of you might better understand my turn of mind on the entire topic. I am also aware that jrseti would wish me to be brief, and in future I will of course try to be. However, there is a very definite point I am seeking to make here, which requires an explanation of a certain minimum length, and I can see no way of avoiding that if the point is to be made adequately.)

Dear jrseti:

   Yes, obviously, of course, I have already read it, and have just re-read, the response to me about the impracticality of a spreadsheet approach, just now, to be certain I caught everything. I am extremely well aware of Octave, and have been for quite some time; however, while I of course entirely appreciate why the one who wrote the answer is suggesting that a spreadsheet approach would be useless, I respectfully do not necessarily agree, for a multitude of reasons, which I will also try to explain on the forum.

 

    First, to simply exclude any possibility of spreadsheets being useful, regardless of the data-record length required to be meaningful, is, at least to my way of thinking, a needlessly restrictive mindset. It deprives the average amateur or hobbyist of a method which, however primitive it may appear for the purpose, virtually all serious computer users can readily make use of. You clearly want as many people as possible to examine the SETI data; to hamstring their efforts, which would doubtless include spreadsheets as part of their typical repertoire, merely because such an approach is judged useless at face value, robs the potential amateur or hobbyist, at least from my perspective, of a potentially extremely valuable tool in their frequently limited arsenal.

 

    Further, with the various software packages I mentioned to you that I have available, I might well be able to handle arrays of sufficient length to make spreadsheet use for SETI data practicable. Precisely how is beside the point, at least for the present; suffice it to say, and please believe me on this point, that it is technically possible. I know you would want more explanation, and entirely understand why; however, please leave that discussion for another time, and I will try to explain it to all of you at an appropriate point.

 

    In addition, there are numerous software packages for data analysis, signal processing, and/or time-series analysis that are expressly designed to work with spreadsheet data. Like it or not, believe it or not, dubious or not, cumbersome or not, spreadsheet approaches are absolutely inevitable for what you are seeking to do, at least in terms of making the data available to mass amateur and/or hobbyist participants. In fact, I have already used such software tools, not once, but many times, to manipulate extremely long DNA and/or RNA sequences I have been able to download.

 

    Mind you, I am not speaking here of a few hundred bases in a sequence; I am speaking of hundreds of thousands, and, on occasion, into the millions. I do such experimentation fairly routinely, in an effort to look at autocorrelation signal characteristics of such sequences; that, too, is a separate and equally serious hobby of mine, namely, the application of signal-processing techniques to genomic analysis. However, as Kipling said, that is another story altogether....

 

    The point, then, is this:  I do entirely understand why you and others suggest that spreadsheets seem superficially unsuited to the SETI purpose. However, I can equally well assure you, at least in my view, that I have extremely good reason to believe that such is very definitely not the case, and that their use, or, at minimum, their availability, is absolutely necessary, and requisite, for such amateur and/or hobbyist experimentation. For those reasons (and I will also post this e-mail, or some variant of it, on the forum), I thus beg you, and others, to please trust that I, too, do entirely understand what I am speaking of, and that, regardless of what equally entirely understandable doubts you and/or others on the forum may have, I do know for a fact that I am convinced I am correct, or, at minimum, that I at least have the serious potential to be.

 

    For those reasons, please, so far as possible, trust my good intentions and also not insignificant large-scale data-analysis experience in the matter, and please believe that my intention is to help, not hinder, by making such a request, however unrealistic it might seem, at least at first glance. If you and/or others involved with the data can manage to provide me with realistic volumes of ASCII- and/or text-formatted versions of the complex voltage amplitudes, times, and frequencies, I am telling you, and them, in all seriousness, that I am absolutely convinced that I have a reasonable chance of using such spreadsheet-based signal-processing and/or data-analysis software packages to make such a contribution, which methods I would gladly be willing to share with both you and others on the forum, once I have been able to make such attempts.

 

    In any event, I still tell you, purely out of a desire to be of use and help, that having the SETI data out there in ASCII and/or text format could only serve to advance the cause of the entire project, and make the entire process of being able to then experiment with such data far less painful for the average hobbyist and/or amateur. I look forward to your further thoughts, and would appreciate hearing from you whenever convenient. Meanwhile, I will of course try to become familiar with Octave, as suggested, and also with how the SETI data is actually presently formatted.

 

Sincerely,

ENGRPOD.
 

P.S. I would also very seriously suggest that all of you look at the Dataplot package, provided by the Statistical Engineering Div. of NIST; it could be of significant value to all of you in your efforts, as I will also try to make use of it myself. The same is true of the software package ImageJ, made available from NIH; both ImageJ and Dataplot, so far as I am aware, are either open source, or at least have the possibility of having source code be obtainable. In addition, please also look up the Working Conference on Reverse Engineering (WCRE), of the IEEE; they are specifically involved in efforts very comparable to those of SETI, and could provide an excellent additional forum for such data analysis. I am very well acquainted with them, and it does not take Einstein to realize that SETI represents an attempt to both locate, and also understand, transmissions from an unknown intelligent source, a task not unlike reconstructing an unknown computing architecture from purely bit-sequence-level assembled binary data to be found in any computing environment. To my way of thinking, the analogy is both precise and fundamental; thus, to not seek to make serious use of WCRE expertise for SETI efforts would seem to represent an utter waste of their potential, and, in fact, SETI could well represent a potential task for their attention, entirely deserving of their efforts.

jrseti
Offline
Joined: 2010-07-22
Posts: 250

ENGRPOD,

 

Do you have any computer programming capability? For instance, do you know how to program in C, and do you have a compiler on your Windows machine?

 

-jrseti


ENGRPOD
Offline
Joined: 2011-06-29
Posts: 11
More explanation...

Dear jrseti:

Yes, certainly, by all means, I do have very significant, varied, extensive, and serious scientific programming experience, as well as having a C and C++ compiler available at present. I further did extensive FORTRAN programming years ago, which, while dated, hopefully at least illustrates the point. Also, you had wanted me to make that added point here about the possibility of uploading YouTube videos, wholly apart from a webinar approach, for others on the forum to continue. In fact, I also recently located several extremely useful, and tractable, BASIC interpreters, that I felt, and feel, could be of considerable use to setiQuest amateurs and hobbyists. Believe me, I am real, I am trained, I do know what I am doing; I may be rusty in places, but I am not altogether hopeless. However, if you will permit me, that is not the point I am seeking to make:

In our discussions, you are aware I had mentioned certain other software packages I was aware of, and which I made you aware of, which, while canned, are nevertheless extremely powerful, and sophisticated, for signal processing, time series, statistical, and data analysis purposes. Also, there are several add-ins for Excel, and/or possibly its OpenOffice.org equivalent, that could significantly enhance the computing power of those two packages. I have made a very detailed study of such packages, as well as very extensively researched a whole host of freeware packages, in an effort to avoid needing to re-invent the wheel in many cases. I have frequently found such efforts to be extremely useful, and have in fact saved both myself, and others, considerable time and effort, by such investigations.

My point is this: You all want the SETI data examined by as many people as possible; to do that, there needs, in my view, to be a complete re-think here with regard to flexibility as to how that data is to be formatted, and made available, for amateur and/or hobbyist consumption. I am amply aware of how much noise it contains; that nuance, I assure you, is not lost on me. However, the process of being able to re-format for individual experimentation, at least from what I have seen, and even with the best of intentions on the part of all of you (which I do of course entirely admit), is unbelievably time-consuming, unintentionally frustrating for the less-experienced would-be experimenter, and, were a package to be written to even allow the experimenter to merely feed binary-formatted data through it to convert it to either ASCII and/or text, totally, entirely, and completely unnecessary.

That, to my way of thinking, is at least one area where effort should be expended, and an area I obviously intend to expend a good deal of my time to try to create. I simply point out that all of you are clearly far more familiar with the format of the SETI data than myself, and it should thus certainly not seem to be an insurmountable difficulty to create such a GUI-driven data-conversion utility; that is all I am saying. I do understand the reason for the questions I am being asked, and the suggestions I am being given; I also entirely understand that my viewpoint is also, no doubt, likely highly unusual to all of you.

Nevertheless, I can only give you my thoughts as they occur to me; the data-conversion process, as it stands, is, at least to my way of thinking, far too unwieldy, cumbersome, and time-consuming. It needs to be significantly streamlined for the average would-be amateur and/or hobbyist experimenter, needs to allow for the creation and availability of ASCII and/or text archives, and needs to allow for the potential adaptation of spreadsheet methods, as one potential tool in the would-be experimenter's armamentarium. I realize many, most, and/or all will disagree; however, while I will of course try to adapt to the methods you presently use, those are my thoughts, as I see them, for whatever potential use they may have, or be worth.

Sincerely,

ENGRPOD.

jrseti
Offline
Joined: 2010-07-22
Posts: 250

ENGRPOD,

Since you have access to a C compiler, and know how to program in C, it would be very easy for you to convert the setiQuest data to the ASCII format you want and try to do something with it. The things you discover doing this may be of interest to the people in the project.

Give it a try.

-jrseti


robackrman
Offline
Joined: 2010-04-15
Posts: 235

[no-glossary]
The Apache server hosting the setiQuest data honors the "Range" header; therefore, you can download any portion of any of the posted data files that you desire.  Furthermore, you can convert to ASCII on the fly.  Here is an example, using cURL and Perl, of downloading the first 20 bytes (10 complex values, each a real,imaginary pair of signed 8-bit integers) of the observation file you previously mentioned and converting them to ASCII:

$ curl -s --header "Range:bytes=0-19" http://184.73.186.167/download/2010-11-06-dorothy_tauceti_1420_1/2010-11... | perl -e 'while(read(STDIN,$b,2)==2){printf("%d,%d\n",unpack("c1",$b),unpack("x1c1",$b))}' >test-download.dat
$ cat test-download.dat
-1,8
13,26
13,-5
-2,-18
6,5
8,1
3,4
-19,11
-1,16
16,17

If you are a Windows user, then install curl and Perl under Cygwin (www.cygwin.com), or install the individual binary packages from the cURL and Perl main websites.  Instead of "cat", use "type", or view or analyze the downloaded ASCII file with some other utility.
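
For anyone who would rather not install cURL and Perl, the same idea can be sketched in Python using only the standard library. This is a rough sketch, not a tested tool: it assumes, as the Perl one-liner does, that the file is a plain stream of interleaved signed 8-bit values (real, imaginary) with no headers to skip. The function names are made up for illustration, and `url` must be the full address of a data file, which is truncated in the command above.

```python
import struct
import urllib.request


def range_header(first_sample, count, bytes_per_sample=2):
    """Build an HTTP Range header value covering `count` complex samples."""
    start = first_sample * bytes_per_sample
    end = start + count * bytes_per_sample - 1  # the Range end is inclusive
    return f"bytes={start}-{end}"


def samples_to_csv(raw):
    """Turn interleaved signed 8-bit pairs into "real,imag" text lines."""
    usable = len(raw) - len(raw) % 2  # ignore a trailing odd byte
    return [f"{real},{imag}"
            for real, imag in struct.iter_unpack("bb", raw[:usable])]


def fetch_samples(url, first_sample, count):
    """Download only the requested samples (hypothetical helper, needs network)."""
    request = urllib.request.Request(
        url, headers={"Range": range_header(first_sample, count)})
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server honored the Range header;
        # anything else could mean the whole multi-gigabyte file is coming.
        if response.status != 206:
            raise RuntimeError("server did not return a partial response")
        return samples_to_csv(response.read())
```

Here `range_header(0, 10)` gives "bytes=0-19", the same range used in the curl command above. Writing the lines returned by `fetch_samples` to a file, one per line, produces a comma-separated text file that a spreadsheet can import.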

[/no-glossary]