SetiCloud

From setiquest wiki

Revision as of 18:51, 30 April 2012 by SigBlips (Talk | contribs)

setiCloud is a discontinued platform for testing and executing signal analysis algorithms on data from the Allen Telescope Array in the Amazon Web Services cloud. The company Cloudant[1] donated the software platform and worked in collaboration with the SETI Institute. The setiCloud project was announced and demonstrated at the OSCON 2010 event.[2][3]

Discontinued

The setiCloud project had numerous problems[4][5][6] and Cloudant never supported it. Numerous people in the setiQuest forum complained about this poor situation. The problems, lack of support, and complaints led to the abandonment of the setiCloud project in January 2011.

Architecture

The Cloudant architecture spawned on-demand AWS instances that would run end-user created Matlab (Octave) scripts. Running the scripts in parallel would greatly reduce computation time. This required that the scripts be written to run in isolation on a small chunk of data. There was no message passing and overlapping data was a problem. Another requirement was that the setiQuest Data had to be split into smaller work chunks. This duplication of data wasted the AWS storage resource.
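The chunk-and-process pattern described above can be sketched in Octave roughly as follows. This is only an illustration of the idea, not Cloudant's implementation: the sample data, the tiny chunk size, and the helper name process_chunk are all assumptions made for the example.

```octave
# Illustrative sketch only; chunk_size and process_chunk are hypothetical.
data = (1:10)';                 # stand-in for a long stream of samples
chunk_size = 4;                 # setiCloud used 64 megabyte chops; tiny here

# Each chunk is processed in isolation, with no knowledge of its neighbours.
process_chunk = @(chunk) sum(chunk);

n_chunks = ceil(numel(data) / chunk_size);
results = zeros(n_chunks, 1);
for k = 1:n_chunks
  first = (k - 1) * chunk_size + 1;
  last = min(k * chunk_size, numel(data));
  results(k) = process_chunk(data(first:last));
endfor
```

Because each invocation sees only its own chunk, anything that straddles a chunk boundary is never seen whole by a single invocation, which may be one way the overlap problem mentioned above arises.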

Tutorial

This tutorial serves as a basic introduction to setiCloud and its web user interface. It walks you through the development of the very simple kurtosis program[7] made by Anders Feder, which computes the kurtosis (a measure of how the distribution of a set of data deviates from that of Gaussian white noise) of all the setiQuest data sets that have been uploaded to setiCloud so far.
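For readers unfamiliar with the statistic, here is a minimal sketch of excess kurtosis computed directly from its definition on a small made-up vector. It is meant only to show what the number represents. Whether the built-in kurtosis() in a given Octave version subtracts the 3 is worth checking with "help kurtosis", since conventions differ between versions.

```octave
# Excess kurtosis from its definition: fourth central moment divided by the
# squared variance, minus 3 so that Gaussian data scores approximately 0.
x = [1 2 3 4 5];                # made-up sample data
d = x - mean(x);
m2 = mean(d .^ 2);              # second central moment (population variance)
m4 = mean(d .^ 4);              # fourth central moment
excess_kurtosis = m4 / m2^2 - 3;
```

A clearly negative value, as here, indicates a flatter distribution with lighter tails than a Gaussian, while a positive value indicates heavier tails.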

The tutorial assumes basic knowledge of programming in the GNU Octave[8] language.

1. First of all, log in at http://seticloud.cloudant.com/. If you are unsure about your username and password (e.g. if you log in on this site using OpenID), you have to set a new password on your user profile on the present site.

2. Once you are logged in, you should see a frame on the left-hand side of the browser window. Towards the bottom of the frame, click "New view group".

3. You should now see a view group item named "unnamed_view_group" in the left-hand frame. Under the view group there are two buttons: "New view" and "New attachment". Click "New view".

4. You should now have a new item under the view group, called "unnamed_view". This is a view. The term alludes to how the view defines a way to present data stored in the setiCloud database.

5. Under the new view, you should have two items: "Map" and "Reduce". These two items let you define two different GNU Octave[9] scripts, with two distinct purposes.

6. Click "Map". This should open a blank area in the code frame on the right-hand side of your screen. This is where you enter your "Map" script. The "Map" is probably the most important part of your program. The script you define here is invoked once for every data object that exists in the setiCloud database. The data objects in this case are "chops" - files of raw data from the ATA, similar to those posted on setiquest.org, only segmented into more convenient 64 megabyte partitions - and their related metadata.

7. In most regards, your "Map" script is just like an ordinary Octave script. There is (at least) one important difference, though. Your script must contain at least one function, and that function must be named map() and take only a single string argument. Also, it may only return one string value. Thus, your "Map" will look like this:

function [json] = map(doc)
  #Your code here.
endfunction

Here, doc is a specially-formatted JSON representation of the data object currently being processed, and json is a JSON[10] representation of the resulting output from your script. That is, setiCloud invokes your map() function with a data object filled into the doc variable, and your script returns its output in the json variable.

8. Now let's fill in some code in the "#Your code here." part of your function. Because the doc input variable is written in JSON, we have to parse it before we can use it in our script. For this, we use the special parse_json function:

 extract = parse_json(doc);

This statement will parse the JSON string contained in the doc variable into a GNU Octave data structure[11] called extract which your script can more easily operate on.

9. The extract variable will now contain a list of key-value pairs derived from the setiCloud data object that your script received in the form of the doc string variable. There is no official documentation of the keys used in these setiCloud data objects, but you can look under "Browse data" in the left-hand frame for examples to get an idea of what each of them means.

10. We will now extract a value from the extract variable we created in the previous step for further processing:

 url = extract{1}.url;

This statement picks the first element in the extract variable, extracts the element with the key url, and transfers the value to a new variable called url. In setiCloud, the url element contains the address of the raw data associated with the data object currently being processed.
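Since parse_json is specific to setiCloud, the indexing in this step can be tried locally by building a stand-in value by hand. The sketch below assumes that the parsed result is a cell array whose first element is a struct with a url field, which is what the statement above implies. The URL itself is made up.

```octave
# Hypothetical stand-in for the value returned by parse_json(doc): a cell array
# whose first element is a struct with a url field. The URL is made up.
extract = {struct("url", "http://example.org/chop-0001.dat")};

url = extract{1}.url;           # same indexing as in the tutorial step
```

The curly braces select the first element of the cell array, and the dot then selects the field named url from the struct stored there.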

11. Now we download that raw data to a variable that we can work with in the Octave script using Octave's built-in urlread()[12] function on the address we obtained in the previous step:

 [data,status,errmsg] = urlread(url);

This statement loads the raw data stored at the address contained in the url variable into a new variable called data. If the operation fails, the status and errmsg variables will be set with codes that can be used for diagnostics (see urlread() documentation for details).

12. As Rob Ackermann figured out in the thread linked[13] at the beginning of this tutorial, the urlread() function by default loads the raw data as (unsigned) 8-bit string characters, while the data as output by the ATA actually represents signed 8-bit integers. Additionally, Octave requires double-precision data for most computations. We will perform both conversions (from unsigned character to signed integer to double precision) with this statement:

 x = double(int8(data));
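One caveat worth checking when reusing this conversion: Octave's integer conversions saturate rather than wrap around, so int8() maps every character code above 127 to 127. The sketch below uses made-up byte values to compare the statement above with a bit-for-bit reinterpretation via typecast(). Whether the second form is what the ATA data needs depends on how the samples were encoded, so treat it as something to verify rather than as a correction.

```octave
# Made-up bytes standing in for what urlread() returns (a char array).
data = char([0 1 127 128 255]);

# Conversion used in the tutorial: int8() saturates, so codes above 127 become 127.
saturated = double(int8(data));

# Alternative: reinterpret each byte as a two's-complement signed integer.
reinterpreted = double(typecast(uint8(data), "int8"));
```

With these inputs the two results agree for codes 0 to 127 and differ for 128 and 255, where the reinterpreted values are negative.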

13. Now we can finally perform our analysis on the data:

 output = kurtosis(x);

However, how do we view the result? We can't use disp()[14] or similar functionality since we can't view the terminal output of our Octave script in setiCloud.

14. Instead, we return the result using the return variable json we defined in the beginning of the tutorial. setiCloud only accepts return values in JSON, so we will use the following statement to format the result:

 json = sprintf('[["%s",{"kurtosis":%f,"status":"%d","message":"%s"}]]',url,output,status,errmsg);

I won't go into details about the sprintf()[15] function and JSON formatting (many tutorials on JSON exist around the web), but this statement creates a JSON data structure containing the values of the variables url, output, status, and errmsg.
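To see what string this statement produces, it can be run locally in Octave with placeholder values. The values below are hypothetical stand-ins for the variables set earlier in map(), and disp() is used only because this sketch runs outside setiCloud.

```octave
# Hypothetical values standing in for the variables set earlier in map().
url = "http://example.org/chop-0001.dat";
output = -0.01407;
status = 1;
errmsg = "ok";

# Same format string as in the tutorial; %f prints six decimal places.
json = sprintf('[["%s",{"kurtosis":%f,"status":"%d","message":"%s"}]]', url, output, status, errmsg);
disp(json);
```

Note that sprintf() inserts the strings verbatim, so a url or errmsg containing a double quote or backslash would yield invalid JSON unless it is escaped first.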

Upon completion your function should look like this:

  function [json] = map(doc)
    extract = parse_json(doc);
    url = extract{1}.url;
    [data,status,errmsg] = urlread(url);
    x = double(int8(data));
    output = kurtosis(x);
    json = sprintf('[["%s",{"kurtosis":%f,"status":"%d","message":"%s"}]]',url,output,status,errmsg);  
  endfunction

15. The final step is to save your work to setiCloud by clicking "Save" next to your view group in the left-hand frame. Then click your view ("unnamed_view"). The right-hand frame should read: "The view is building". This means that the cloud is executing your "Map" script on every object in its database. Once it is done, the results should appear in the same frame (as will any errors during execution).

You can close the browser window or shut down your computer if you want, while the script is being executed - the computation job will continue running in the cloud until it is done. The time it takes the job to complete depends on the availability of resources within the cloud.

While the job is executing, clicking unnamed_view should produce messages similar to the following:

  The view is building (0% done).
  The view is building (X% done).   (where X is any value between 0.0 and 99.9)
  The view is building (NaN% done).

A message other than those listed above may also be displayed. If the message indicates an error, verify that your code is correct and repeat this step, starting from (re)saving your code. Upon completion, clicking unnamed_view produces a right-hand display similar to the following:

   View unnamed_view
   > http://s3.amazonaws.cc: {...}
   > http://s3.amazonaws.cc: {...}
   > http://s3.amazonaws.cc: {...}
   ...

Clicking any one of the arrow icons (indicated as > above) or associated ellipses ({...} above) should produce output similar to the following:

  status:   1
  message:  value
  kurtosis: -0.01407

This concludes our tutorial. Slightly more advanced topics are attachments and "Reduce" scripts. Attachments are basically libraries of support functions that all views in the same view group can use. Attachments can define an unlimited number of functions, while "Map" and "Reduce" scripts can only define one each. "Reduce" scripts are similar to "Map" scripts, but instead of operating on input from the setiCloud database, they operate on the output from their associated "Map" script after it has been invoked on every object in the database. That is, data objects are sent from the database to the "Map" script, and all outputs from the "Map" script are then sent to a single invocation of the "Reduce" script. This allows you to aggregate the results from each invocation of "Map" into a single output from "Reduce". For more details, see the examples available under "Public analyses" at the bottom of the left-hand frame.
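The aggregation idea behind "Reduce" can be sketched locally as follows. This does not use setiCloud's actual "Reduce" interface, whose function signature is not documented here. The per-chop values and the field names of the summary are made up purely for illustration.

```octave
# Hypothetical per-chop results, as might be collected from many map() calls.
kurtosis_values = [-0.014, 0.020, 0.350, -0.006];

# A single aggregation pass over all of them, in the spirit of "Reduce".
summary = struct("count", numel(kurtosis_values), ...
                 "mean", mean(kurtosis_values), ...
                 "max", max(kurtosis_values));
```

In this sketch, a chop whose value stands well above the mean (0.350 here) would be the natural candidate for closer inspection.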

Elaborate documentation on the general concepts can also be found on the website of the CouchDB project[16], which setiCloud is based upon.

References

  1. https://cloudant.com/
  2. http://setiquest.org/blog/fun-oscon
  3. http://blog.cloudant.com/cloudant-and-seti-crowdsourcing-the-search-for-e-t/
  4. http://issues.setiquest.org/projects/seticloud/issues
  5. http://setiquest.org/forum/topic/seticloud-buggyness
  6. http://setiquest.org/forum/topic/problems-running-sample-analysis-octave
  7. http://setiquest.org/forum/topic/kurtosis-seticloud-data
  8. http://setiquest.org/forum/topic/using-gnu-octave
  9. http://setiquest.org/forum/topic/using-gnu-octave
  10. http://en.wikipedia.org/wiki/JSON
  11. http://www.gnu.org/software/octave/doc/interpreter/Data-Structures.html
  12. http://www.network-theory.co.uk/docs/octave3/octave_268.html
  13. http://setiquest.org/forum/topic/kurtosis-seticloud-data
  14. http://www.gnu.org/software/octave/doc/interpreter/Terminal-Output.html#doc_002ddisp
  15. http://www.network-theory.co.uk/docs/octave3/octave_140.html
  16. http://couchdb.apache.org/docs/intro.html
