Astronomers use vast quantities of data for their research. These data are obtained using many different kinds of telescopes situated on the ground and on satellites in space. Because of the rapid advance in telescope and detector technology there has been a phenomenal increase in the quantity of data, both from observations of specific objects and from surveys which cover entire areas of the sky. The data volumes have already reached terabytes and rapid growth is expected over the coming decade.
Because of the large volume of data, and the many different forms in which it is available, the storage and retrieval of data have become difficult tasks. It has also become very challenging for astronomers to use the vast storehouse of data to produce exciting new scientific discoveries. Projects are being undertaken in the USA and some countries in Europe to utilize the data efficiently through the establishment of a Virtual Observatory.
The Virtual Observatory-India (VOI) project seeks to bring together astronomers and software developers with experience in handling large volumes of data, to contribute substantially to the international VO effort, and to make the resulting developments accessible, in the most useful forms, to astronomers in India and in other countries.
The aim over the next few years will be to:
*Undertake research and development for data search and retrieval;
*Develop software for equal and efficient use of the data;
*Enable astronomers and other interested scientists to undertake major scientific projects using the data; and
*Make the technology available for use by other fields, like remote sensing, population studies, bioinformatics, health care etc., which involve large volumes of data.
The project will be undertaken through a collaboration between:
*Astronomers and other scientists working in research institutes and university departments
*Expert computer software developers from the industry.
The project will establish a model for collaboration between experts from academic fields and from industry in the area of information technology. A good part of the developmental and scientific work will be undertaken jointly with the California Institute of Technology and other renowned institutes in the USA. The products which arise from this collaboration will have wide applicability and will enable scientists and technologists all over India to benefit from the rich databases which are being developed all over the world. The project will use Indian scientific and technical expertise to make important contributions to the worldwide effort, and establish India as one of the pioneers of virtual science.
The Ministry of Communication & Information Technology is providing a major part of the funds required for the project. Substantial contribution to the effort, particularly in the form of expert software engineers, will be received from Persistent Systems Pvt. Ltd. and other sources. IUCAA will provide infrastructure, expertise, computing facilities and other resources.
Astronomers carry out their observations using a variety of telescopes, based on the ground or on space platforms. They also use a variety of detectors like photographic plates, radio receivers, CCD cameras, X-ray detectors etc. The type and location of the telescopes and the kind of detectors used depend upon the region of the electromagnetic spectrum in which observations are to be made. There are two basic kinds of observing strategies which are followed:
*Observations of specific targets which are of interest to specific groups of astronomers.
*Observations which survey large portions of the sky and which can be later used in a variety of scientific projects.
Over the last two decades there has been great progress in telescope and detector technology. It has therefore been possible to build many large telescopes and increasingly sensitive detectors. The large installations are extremely expensive and the trend has been to build telescopes and detectors through collaborative efforts and to make them available to a wide community. Astronomers all over the world can therefore use advanced facilities to which they may otherwise not have had access. The data obtained using these facilities is generally archived and made available to the entire community, regardless of who obtained it in the first place.
Modern telescopes generate data at an astounding rate. Data volumes can now range from several hundred gigabytes to a few tens of terabytes. It is expected that some of the surveys which will be initiated over the next few years will generate several terabytes of data per day. Storing, retrieving and scientifically using these vast databases is a formidable task, which cannot be managed by astronomers alone. It will require the joint effort of astronomers and computer scientists to adapt existing hardware and software technology, and to develop new hardware and software to meet the challenging task of making the data available to all potential users.
Computer storage for large volumes of data is still very expensive and therefore the storage will have to be managed using a hierarchy of archiving hardware and advanced data compression techniques. It will not be practical, over the foreseeable future, to maintain copies of all the data in different locations in the world. It will therefore be necessary to develop strategies for distribution of data in the most efficient possible manner and to provide for high speed data transfer between the different data centres. Data obtained in different parts of the electromagnetic spectrum requires vastly different kinds of processing before it is brought to a scientifically usable form. The data is also stored using quite different hardware and software systems, and techniques have to be developed for bringing together the different structures.
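As a rough illustration of why moving whole archives between centres is impractical without high speed networks, the sketch below estimates how long a given data volume takes to cross a link of a given speed. The 2 Mbit/s figure is the line speed quoted for IUCAA's ERNET links later in this document; the decimal definition of a terabyte, the neglect of protocol overhead and the function name are assumptions made only for this example.

```python
def transfer_days(volume_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move volume_bytes over a link, ignoring protocol overhead."""
    seconds = volume_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # 86,400 seconds in a day


TERABYTE = 10**12  # decimal terabyte (assumption for this sketch)

# One terabyte over a single 2 Mbit/s line
print(round(transfer_days(TERABYTE, 2e6), 1))  # about 46.3 days
```

On these assumptions a single terabyte occupies such a line for well over a month, which is why selective storage and faster links both matter.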
A Virtual Observatory (VO) seeks to facilitate the storage of large volumes of data and its use in an efficient manner. Efforts in establishing such structures have been made, with moderate success, by different observatories and institutes from time to time. But the huge increase in the volumes of data now available, and the need to carry out research simultaneously in many different parts of the electromagnetic spectrum, have made collaborative efforts necessary, much in the manner of the joint efforts undertaken to develop major new ground and space based telescopes. The need is to make a comprehensive solution available to help astronomers, regardless of their geographic location or their own area of expertise, to access data generated from different telescopes all over the world and in space.
At the present time devices which can usefully store terabytes of data are very expensive and it is expected that this situation will prevail for some time to come. It is therefore not practical to store all available data in many locations in the world for it to be easily available to astronomers everywhere. It is necessary to selectively store data in strategic locations and to make it available using the Internet as well as other means for data transfer. A VO will seek to make the distributed data seamlessly available to astronomers. This will require the development of highly sophisticated data retrieval software which can federate data stored in many data fields. The data will be available in the form of catalogues, spectra and images. The VO will enable astronomers to use these different kinds of data simultaneously, irrespective of their location and basic nature, for a full multiwavelength analysis.
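One basic operation behind using catalogues from different wavelengths together is positional cross-matching: deciding which entries in two catalogues refer to the same object on the sky. The following is a minimal sketch of that idea, not the project's software. The catalogue layout (dictionaries with `id`, `ra` and `dec` in degrees), the function names and the 2 arcsecond default radius are all assumptions chosen for illustration, and the brute-force loop would need spatial indexing to be usable on real survey volumes.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """Pair each source in cat_a with its nearest cat_b source within the radius.

    Returns a list of (id_a, id_b, separation_arcsec) tuples.
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for a in cat_a:
        best, best_sep = None, radius_deg
        for b in cat_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= best_sep:
                best, best_sep = b, sep
        if best is not None:
            matches.append((a["id"], best["id"], best_sep * 3600.0))
    return matches


# Hypothetical optical and radio catalogues, made up for the example
optical = [{"id": "opt1", "ra": 10.0, "dec": 20.0},
           {"id": "opt2", "ra": 50.0, "dec": -5.0}]
radio = [{"id": "rad1", "ra": 10.0002, "dec": 20.0001},
         {"id": "rad2", "ra": 120.0, "dec": 30.0}]

for id_a, id_b, sep in cross_match(optical, radio):
    print(id_a, id_b, f"{sep:.2f} arcsec")
```

In this made-up example only `opt1` and `rad1` lie within the matching radius of each other, so they are the only pair reported.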
The vast quantities of data will enable astronomers to look for very rare objects, patterns and relationships which remained totally inaccessible when only very limited data were available. Searching for these rare features will require the development of highly sophisticated data mining techniques for the search to be completed in finite time. The features found will have to be subjected to analysis, and compared with the results of numerical simulations. The VO will seek to provide hardware and software platforms on which all these operations can be carried out.
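To make the idea of searching for rare objects concrete, the sketch below flags values that lie far from the bulk of a sample, using the median and the median absolute deviation (MAD) as robust measures of centre and spread. This is one simple, generic technique chosen for illustration; the document does not specify which data mining methods the project will use, and the function name, the threshold of 5 and the sample values are assumptions.

```python
import statistics


def flag_outliers(values, threshold=5.0):
    """Return indices of values whose robust z-score exceeds the threshold.

    The robust z-score is 0.6745 * (value - median) / MAD, where the factor
    0.6745 makes the MAD comparable to a standard deviation for normal data.
    """
    values = list(values)
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical measurements: seven ordinary sources and one unusual one
measurements = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 9.0]
print(flag_outliers(measurements))  # [7]
```

A real search would work in many dimensions at once and over far larger samples, which is where the efficiency of the mining algorithms becomes critical.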
The aims and general tasks to be performed are summed up in the White Paper on the National Virtual Observatory being established in the USA and are as follows:
*Establishment of a common systems approach to data pipelining, archiving and retrieval that will ensure easy access by a large and diverse community of users, at minimum cost and completion times;
*Enabling the distributed development of a suite of commonly usable new software tools to make possible querying, correlation, visualization and statistical comparisons of data;
*Coordinating the establishment of high speed data transfer networks that are essential to providing the connectivity among archives, terascale computing facilities, and the widespread community of users;
*Facilitating productive collaborations among astronomy centers and major academic institutions, both national and international, in order to maximize productivity and minimize infrastructure costs;
*Ensuring communication and possible collaborations with scientists in other disciplines facing similar problems, and with the private sector;
*Maintaining a continuing program of public and educational outreach that capitalizes upon the unique resources, in both data and software, of the VO to provide a unique window into astronomy and scientific methodology.
India has a community of astronomers who work on a variety of theoretical, observational and instrumentation projects. The research produced in the country is of world standard and significant contributions to the development of astronomy have been made over the years. In recent times the observational facilities available to Indian astronomers have been augmented by the installation of moderately sized optical and infrared telescopes and the state-of-the-art Giant Metrewave Radio Telescope. Experiments have been conducted from space, and a satellite for multiwavelength astronomy is to be launched in a few years. While these developments have been commendable and exciting, it has been necessary to supplement observations made from Indian facilities with observations made with international facilities and space borne instruments. The facilities offered by a VO will complement those which have so far been accessible, and will make a continuous stream of high quality data available for exciting research projects.
The work related to a VO will require the development of highly sophisticated tools for data storage, retrieval and data mining. Great expertise in these matters is available in the Indian software community. The expertise developed by the Indian industry in software development and project management will be of immense benefit in the development of the VO. The present project seeks to bring together astronomers and experts in computer science for the development of Virtual Observatory-India.
The developments in India will be undertaken in discussion and collaboration with international groups working on VO. This will mean that the developments will have worldwide impact, and India will become a founding member and partner of the international VO community. The pioneering efforts will enable other, more sophisticated projects to be undertaken over the coming decade.
Deliverables from the project
The project will benefit all scientists and technologists who are
interested in using large volumes of data. Since the project envisages
setting up of virtual structures for remote access, the benefits will be
available even to those who are usually deprived of access to advanced
facilities. A very important contribution of the project will be to
develop a model for collaboration between academics and the software
industry. The benefits to be derived from the project can be summarized as
follows:
1. Astronomers in institutes, universities and colleges will be provided
access to unified data collection, hardware and software for efficient use
of the data.
2. Teachers and students from the university sector will be provided
opportunities to participate in state-of-the-art projects on a continuing
basis. Their access to the VO will be through inexpensive computers and
easily available, low cost internet connections. A whole new world of
research will therefore become open to them.
3. The VO will be developed through collaborations between astronomers and
other scientists in research institutes and universities on the one hand,
and computer science and data mining experts from the industry on the
other. The project hopes to become a model for such interaction.
4. The expertise and programmes developed in setting up the VO will be of
direct and immediate relevance to a host of other fields which involve
large data volumes, including remote sensing, population studies from
census data, meteorology and weather prediction, and health care.
5. The project will be executed in active collaboration with groups in the
USA and Europe. Results of the project will provide direct inputs to
international efforts. This will establish India as one of the pioneers in
the field of virtual observatories, an emerging concept that
promises to be of immense application in the coming decade.
6. The VO will be used in formulating public outreach programmes. Students,
teachers and other interested persons will be able to make use of its
facilities.
Work plan: The tasks to be undertaken as part of the project will be of
two kinds:
*Those which involve software development with specific astronomical applications and
*Applications to other areas.
The steps to be taken in executing these tasks will be as follows:
1. Computer hardware, consisting mainly of mass storage media including
backup devices and servers, will be acquired and installed.
2. Large databases on which the developed programmes are to work will be
acquired.
3. Software development dedicated to the federating of databases, rapid
searches, copying and archiving data efficiently, integration of numerical
and image data etc. will be undertaken.
4. Software will be developed for data mining including cluster analysis,
supervised and unsupervised classification, visualization, statistical
analysis etc. These will lead to research projects and specific results.
5. Programmes for applications to other areas will be developed in
collaboration with scientists from different disciplines.
Execution: The project will be executed in the following manner:
1. The project will be coordinated from IUCAA, where the hardware acquired
for the project will be installed. IUCAA has a highly sophisticated
computer centre, infrastructure and computer engineers to install and
maintain the equipment. It has a highly competent team of astronomers,
which includes members of its faculty as well as a number of visiting
associates from universities who spend a significant amount of time in
IUCAA. Development of the astronomy related software and applications will
be undertaken by these scientists in collaboration with colleagues from
other institutes and university departments.
2. The software development related to the management of databases and data
mining will be undertaken in collaboration with the industry. The main
effort here will be provided by computer scientists and developers from
Persistent Systems Pvt. Ltd., a highly successful and leading
software development house which specializes in data related projects.
Persistent Systems is located close to IUCAA and there will be close
collaboration between personnel in the two organizations. Persistent
Systems will use the hardware installed in IUCAA remotely as well as
during visits of its personnel to the IUCAA campus.
3. The project is a complex one and will require inputs from astronomers,
computer scientists and other experts located in many different
organizations. Their expertise will be used in the execution of specific
tasks and in applying the developed software to different situations. We
have assurances from faculty at IIT, Mumbai, who have expressed an interest
in participating in this activity.
4. Close collaboration is being developed with the California Institute of
Technology (Caltech) in the USA on several aspects of the project. Caltech
will be making major contributions to the VO in the USA. The development
proposed to be carried out in India will be coherent with the development
being carried out in Caltech and other institutions. A vigorous exchange
programme between the VO centres located in different countries will be
developed.
Timetable : It is expected that significant progress will be made in three years from the start of the project, i.e. by November 2004.
IUCAA
The Inter-University Centre for Astronomy and Astrophysics (IUCAA) is an internationally known centre of excellence for research in astronomy, astrophysics and related areas. It has a faculty of 14 astronomers and many research students and post-doctoral fellows. Research in various branches of theoretical astrophysics, observational astronomy and instrumentation development is carried out in IUCAA. The Centre has more than 80 visiting associates from universities and colleges who spend significant periods of time in IUCAA, participating in the research and developmental activities.
IUCAA has very well developed infrastructure and one of its high points is the state-of-the-art computer centre. IUCAA is one of the important nodes of ERNET and is linked by three independent 2 Mbps lines to other nodes, and thus has high speed connectivity to research establishments spread all over India and the world. IUCAA has a data centre which was developed in situ, and houses mirrors of very large astronomical databases and scientific literature. The academics at IUCAA and the computer centre staff are skilled in the use and development of databases. IUCAA is now in the process of setting up a 2m optical telescope facility located about 100 km from Pune which will begin to generate data early in 2002.
Persistent Systems Pvt. Ltd.
Persistent Systems Private Limited (PSPL) is an 11-year-old, 350-person software development company located in Pune. Persistent Systems specializes in developing data infrastructure software and has developed such software for companies including Microsoft, Hewlett-Packard, Agilent Technologies, Informix, i2 Technologies, Engage Technologies etc. More specifically, Persistent Systems has developed data management components such as bit-vector indexes, query optimizers, ETL tools, OLAP tools etc. This expertise would be directly beneficial to the development of data management software for astronomy databases.
Persistent Systems proposes to contribute a team of four software developers at no cost to the Virtual Observatory project. As a company, Persistent Systems has the expertise to build and manage software teams that would make a significant contribution to the project.