Virtual organization in cosmology and astrophysics

To develop further, VIRGO needs more computing resources for data analysis. This can be achieved through close cooperation between the VIRGO participants and the BITP Laboratory of Distributed Computing (led by E.S. Martynov and S.Ya. Svistunov). The work of this laboratory, together with support from the BITP director A.G. Zagorodny, has led to the creation of the Ukrainian National GRID network.

The term GRID denotes a "network" of computing facilities, by analogy with the electric grid. It gives its users access to a large pool of computing resources, which are dynamically redistributed among the users so that the resources are used more efficiently. Unified access through GRID thus simplifies the launch of computing jobs, supporting both unique projects that require a huge amount of computing resources and "daily" access by a large number of "ordinary" users.

The largest gain in efficiency is obtained when running a large number of very simple (single-CPU) jobs that do not need to interact with each other during the computation. This is the main reason for using GRID to analyze data from the Large Hadron Collider at CERN, the leader in experimental data production. Importantly, the LHC data consist of "events", each of which can be analyzed independently. Modern astronomical observatories (especially in high-energy astrophysics, which is closely related to high-energy physics) produce data with the same structure, so the whole GRID infrastructure (initially optimized for problems in high-energy physics) can be used after a few small modifications.
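The independence of events described above can be illustrated with a minimal Python sketch. It is not taken from any VIRGO or LHC software: the function `analyze_event` and the toy event list are hypothetical stand-ins, and a local thread pool only mimics the role of GRID worker nodes. The point is that, because no event depends on another, the work can be split among any number of workers without changing the result.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_event(event):
    """Hypothetical per-event analysis: the sum of squared values in one event."""
    return sum(x * x for x in event)


def run_serial(events):
    """Analyze all events one after another on a single worker."""
    return [analyze_event(e) for e in events]


def run_pool(events, workers=4):
    """Analyze events on several workers; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_event, events))


# Toy data set: 100 independent "events" of three numbers each (illustrative only).
events = [[i, i + 1, i + 2] for i in range(100)]

if __name__ == "__main__":
    # The two runs agree because events never exchange information.
    print(run_serial(events) == run_pool(events, workers=8))
```

In a real GRID setting each worker would be a separate job on a remote node rather than a thread, but the structure of the problem is the same: one input per job, no communication between jobs, and results collected at the end.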

Another (somewhat unexpected) advantage of GRID is its "universality" for large computing projects that run simultaneously on tens or hundreds of CPUs within a single computing cluster. An example is N-body simulations of large-scale structure formation (the most detailed of them, such as the Millennium simulation, which uses the GADGET-2 code, need hundreds of thousands of CPU-hours on an up-to-date computing cluster). Usually, such projects are run on individual computing clusters that differ in hardware, software and access policies. As a result, computing resources are often used inefficiently, and new (e.g., more important or competitive) projects can be significantly delayed by the lack of available resources and/or by the considerable time needed for "familiarization" with a given cluster.

Therefore, running large computing projects in GRID, although it does not increase the computation speed, significantly reduces the overall time of the data analysis thanks to the unification of project runs. It is also an important next step in building a "complete" GRID infrastructure that allows for a wide range of computations. In turn, the broad participation of our scientists in an international project on cosmology and astroparticle physics motivates them to play a leading role in the "completion" of the GRID infrastructure by obtaining remote access to the computing resources of the Ukrainian Academic GRID, which are little used at the moment. Of course, adapting the existing computing resources for a homogeneous launch of large computing projects requires a significant amount of manpower not directly involved in scientific work. To obtain the corresponding resources, the initiative group of project participants has formulated a proposal for the "creation and maintenance of a GRID-Virtual Organization in cosmology and astrophysics" and has submitted it to the Board of the Ukrainian Governmental R&D Program for GRID-technologies for 2009-2013.

Software used by the VO and its requirements

  1. XMMSAS - XMM-Newton data analysis software
    • 5 GB of local disk space for software and calibration files (no specific libraries needed)
    • Each task uses 1 core (0.5-5 CPU-hours per task); a real analysis (a mass launch of independent tasks) needs from a few hundred to a few thousand CPU-hours, uploads a few GB of data and produces hundreds of GB of products
  2. Fermi - Fermi mission data analysis software
    • 5 GB of local disk space for software and data
    • Real tasks need from hundreds to thousands (max. 30000) of CPU-hours and generate tens of GB of products
  3. CosmoMC - a Fortran 90 Markov-Chain Monte-Carlo (MCMC) engine for exploring cosmological parameter space, together with code for analysing Monte-Carlo samples and importance sampling
    • CFITSIO - A FITS File Subroutine Library
    • WMAP likelihood code and data
    • gsl - the GNU scientific library
    • mpi - the ‘Message Passing Interface’
    • OpenMP
    • LAPACK - Linear Algebra PACKage

    • A typical task needs 10-30 CPUs for a week with OpenMP, i.e. a few thousand CPU-hours
    • Up to 2 GB free disk space
    • ANSI C compiler
    • ANSI C++ compiler
    • Fortran compiler (on PC Linux, the GNU Fortran compiler, gfortran or g77, is recommended)
    • make (GNU make, gmake, is required)
    • perl (5.6.0 or higher recommended)
    • X11 / X-Windows (optional)

    • A typical task needs 5-30 CPU-hours on a single CPU and produces < 1 GB of products. Tasks are often launched in bulk, from a few to a few hundred at a time.
  5. GADGET-2 - a code for cosmological simulations of structure formation
    • mpi - the ‘Message Passing Interface’ (version 1.0 or higher)
    • gsl - the GNU scientific library
    • fftw - the ‘Fastest Fourier Transform in the West’ (version 2.x)
    • hdf5 - the ‘Hierarchical Data Format’ (version 5.0 or higher)
    • Needs a large amount of computing resources (e.g. starting from 50 GB of RAM) and InfiniBand. A typical task runs for tens to hundreds of thousands of CPU-hours
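
The CPU-hour figures in the list above can be turned into rough wall-clock estimates with a short Python sketch. It is only an idealized model under stated assumptions: tasks are independent and equal in length, there is no queueing or data-transfer overhead, and the function names and the sample numbers (1000 tasks of 2 CPU-hours each, chosen to fall within the XMMSAS ranges quoted above) are illustrative rather than measured.

```python
def total_cpu_hours(n_tasks, cpu_hours_per_task):
    """Total CPU time consumed; independent of how many cores are used."""
    return n_tasks * cpu_hours_per_task


def ideal_wall_hours(n_tasks, cpu_hours_per_task, n_cores):
    """Idealized wall-clock time for independent, equal-length single-core tasks.

    Tasks run in batches of at most n_cores; overheads are ignored (assumption).
    """
    if n_tasks < 0 or n_cores < 1:
        raise ValueError("n_tasks must be >= 0 and n_cores must be >= 1")
    batches = -(-n_tasks // n_cores)  # integer ceiling of n_tasks / n_cores
    return batches * cpu_hours_per_task


if __name__ == "__main__":
    n_tasks, hours_each = 1000, 2.0  # illustrative values, not measurements
    print("CPU-hours in total:", total_cpu_hours(n_tasks, hours_each))
    for cores in (1, 50, 200):
        print(cores, "cores ->", ideal_wall_hours(n_tasks, hours_each, cores), "hours")
```

The total CPU time stays the same whatever the number of cores, while the waiting time for the user falls roughly in proportion to the number of cores available. This estimate applies to mass launches of single-core tasks; tightly coupled MPI runs such as GADGET-2 do not scale this simply.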