Introduction

          Astronomers use vast quantities of data in their research. These data are obtained with many different kinds of telescopes, both on the ground and on satellites in space. Because of rapid advances in telescope and detector technology, there has been a phenomenal increase in the quantity of data, both from observations of specific objects and from surveys that cover entire areas of the sky. Data volumes have already reached terabytes, and much greater growth is expected over the coming decade.

         Because of the large volume of data, and the many different forms in which it is available, the storage and retrieval of data have become difficult tasks. It has also become very challenging for astronomers to use this vast storehouse of data to produce exciting new scientific discoveries. Projects are being undertaken in the USA and some countries in Europe to use the data efficiently by establishing a Virtual Observatory.

The Virtual Observatory-India (VOI) project seeks to bring together astronomers and software developers experienced in handling large volumes of data, to contribute substantially to the international VO effort, and to make the resulting developments available, in the most useful forms, to astronomers in India and other countries. The aims over the next few years will be to:
(1) undertake research and development for data search and retrieval;
(2) develop software for equal and efficient use of the data;
(3) enable astronomers and other interested scientists to undertake major scientific projects using the data; and
(4) make the technology available to other fields that involve large volumes of data, such as remote sensing, population studies, bioinformatics and health care.

The project will be undertaken through a collaboration between
(1) astronomers and other scientists working in research institutes and university departments and
(2) expert computer software developers from industry.
The project will establish a model for collaboration between academic and industry experts in the area of information technology. A good part of the developmental and scientific work will be undertaken jointly with the California Institute of Technology and other renowned institutes in the USA. The products arising from this collaboration will have wide applicability and will enable scientists and technologists all over India to benefit from the rich databases being developed around the world. The project will use Indian scientific and technical expertise to make important contributions to the worldwide effort and to establish India as one of the pioneers of virtual science. The Ministry of Information Technology is providing a major part of the funds required for the project. A substantial contribution to the effort, particularly in the form of expert software engineers, will be received from Persistent Systems Pvt. Ltd. and other sources. IUCAA will provide infrastructure, expertise, computing facilities and other resources.

Background

         Astronomers carry out their observations using a variety of telescopes, based on the ground or on space platforms. They also use a variety of detectors such as photographic plates, radio receivers, CCD cameras and X-ray detectors. The type and location of the telescopes and the kind of detectors used depend on the region of the electromagnetic spectrum in which observations are to be made. Two basic observing strategies are followed:
(1) Observations of specific targets which are of interest to specific groups of astronomers.
(2) Observations which survey large portions of the sky and which can be later used in a variety of scientific projects.

         Over the last two decades there has been great progress in telescope and detector technology, which has made it possible to build many large telescopes and increasingly sensitive detectors. Such large installations are extremely expensive, so telescopes and detectors are increasingly built through collaborative efforts and made available to a wide community. Astronomers all over the world can therefore use advanced facilities to which they would otherwise not have had access. The data obtained with these facilities are generally archived and made available to the entire community, regardless of who obtained them in the first place.

         Modern telescopes generate data at an astounding rate. Data volumes can now range from several hundred gigabytes to a few tens of terabytes. Some of the surveys to be initiated over the next few years are expected to generate several terabytes of data per day. Storing, retrieving and scientifically using these vast databases is a formidable task that astronomers cannot manage alone. It will require the joint effort of astronomers and computer scientists to adapt existing hardware and software technology, and to develop new hardware and software, to meet the challenge of making the data available to all potential users.

         Computer storage for large volumes of data is still very expensive, so storage will have to be managed using a hierarchy of archiving hardware and advanced data compression techniques. It will not be practical, in the foreseeable future, to maintain copies of all the data at different locations around the world. It will therefore be necessary to develop strategies for distributing data as efficiently as possible and to provide high-speed data transfer between the different data centres. Data obtained in different parts of the electromagnetic spectrum require vastly different kinds of processing before they are brought to a scientifically usable form. The data are also stored using quite different hardware and software systems, and techniques have to be developed for bringing these different structures together.



VO Concept

         A Virtual Observatory (VO) seeks to facilitate the storage of large volumes of data and their efficient use. Different observatories and institutes have made efforts to establish such structures from time to time, with moderate success. But the huge increase in the volume of data now available, and the need to carry out research simultaneously in many different parts of the electromagnetic spectrum, have made collaborative efforts necessary, much like the joint efforts undertaken to develop major new ground- and space-based telescopes. What is needed is a comprehensive solution that helps astronomers, regardless of their geographic location or their own area of expertise, to access data generated by different telescopes all over the world and in space.

         At present, devices that can usefully store terabytes of data are very expensive, and this situation is expected to prevail for some time to come. It is therefore not practical to store all available data at many locations around the world so that it is easily available to astronomers everywhere. Instead, data must be stored selectively at strategic locations and made available over the Internet as well as by other means of data transfer. A VO will seek to make this distributed data seamlessly available to astronomers. This will require highly sophisticated data retrieval software that can federate data stored in many data centres. The data will be available in the form of catalogues, spectra and images, and the VO will enable astronomers to use these different kinds of data simultaneously, irrespective of their location and basic nature, for a full multiwavelength analysis.
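One way to picture what federating catalogues involves is positional cross-matching: pairing sources in two catalogues (say, an optical and a radio catalogue) that lie within a small angular distance of each other on the sky. The following is a minimal Python sketch, not the project's method. It assumes catalogues are plain lists of (id, RA, Dec) tuples in degrees; the function names and the 2-arcsecond default radius are hypothetical choices for illustration, and the brute-force search is only suitable for small inputs (large archives would need spatial indexing).

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (degrees) between two sky positions given in degrees.

    Uses the haversine formula, which stays accurate for very small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """For each source in cat_a, find the nearest cat_b source within radius_arcsec.

    Hypothetical helper: catalogues are lists of (id, ra_deg, dec_deg).
    Returns a list of (id_a, id_b, separation_arcsec); unmatched sources are skipped.
    Brute force, O(len(cat_a) * len(cat_b)).
    """
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None  # (id_b, separation_deg) of the closest candidate so far
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches


# Toy example with made-up positions: one optical source has a radio counterpart.
optical = [("opt1", 10.0, 20.0), ("opt2", 150.0, -30.0)]
radio = [("rad1", 10.0002, 20.0001), ("rad2", 200.0, 5.0)]
print(cross_match(optical, radio))
```

With these made-up positions, opt1 should pair with rad1 at a separation of under one arcsecond, while opt2 has no counterpart within the radius.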

         The vast quantities of data will enable astronomers to look for very rare objects, patterns and relationships that remained inaccessible when only very limited data were available. Searching for these rare features will require highly sophisticated data mining techniques so that the search can be completed in a reasonable time. The features found will have to be analysed and compared with the results of numerical simulations. The VO will seek to provide hardware and software platforms on which all these operations can be carried out.



VO Tasks

         The aims and general tasks to be performed are summed up in the White Paper on the National Virtual Observatory being established in the USA and are as follows:

         * Establishment of a common systems approach to data pipelining, archiving and retrieval that will ensure easy access by a large and diverse community of users, at minimum cost and completion times;

         * Enabling the distributed development of a suite of commonly usable new software tools to make possible querying, correlation, visualization and statistical comparisons of data;

         * Coordinating the establishment of high speed data transfer networks that are essential to providing the connectivity among archives, terascale computing facilities, and the widespread community of users;

         * Facilitating productive collaborations among astronomy centers and major academic institutions, both national and international, in order to maximize productivity and minimize infrastructure costs;

         * Ensuring communication and possible collaborations with scientists in other disciplines facing similar problems, and with the private sector;

         * Maintaining a continuing program of public and educational outreach that capitalizes upon the unique resources, in both data and software, of the VO to provide a unique window into astronomy and scientific methodology.



VO-India

         India has a community of astronomers who work on a variety of theoretical, observational and instrumentation projects. The research produced in the country is of world standard, and significant contributions to the development of astronomy have been made over the years. In recent times the observational facilities available to Indian astronomers have been augmented by the installation of moderately sized optical and infrared telescopes and the state-of-the-art Giant Metrewave Radio Telescope. Experiments have been conducted from space, and a satellite for multiwavelength astronomy is to be launched in a few years. While these developments have been commendable and exciting, it has been necessary to supplement observations from Indian facilities with observations from international facilities and space-borne instruments. The facilities offered by a VO will complement well those that have so far been accessible, and will make a continuous stream of high-quality data available for exciting research projects.

         The work related to a VO will require highly sophisticated tools for data storage, retrieval and mining. Great expertise in these matters is available in the Indian software community, and the experience of Indian industry in software development and project management will be of immense benefit in developing the VO. The present project seeks to bring together astronomers and experts in computer science for the development of Virtual Observatory-India.

         The developments in India will be undertaken in discussion and collaboration with international groups working on VOs. As a result, the developments will have worldwide impact, and India will become a founding member and partner of the international VO community. These pioneering efforts will enable other, more sophisticated projects to be undertaken over the coming decade.



Deliverables from the project

         The project will benefit all scientists and technologists who are interested in using large volumes of data. Since the project envisages setting up virtual structures for remote access, the benefits will be available even to those who usually lack access to advanced facilities. A very important contribution of the project will be a model for collaboration between academics and the software industry. The benefits to be derived from the project can be summarized as follows:

1.Astronomers in institutes, universities and colleges will be provided access to unified data collections, and to hardware and software for efficient use of the data.

2.Teachers and students from the university sector will be provided opportunities to participate in state-of-the-art projects on a continuing basis. Their access to the VO will be through inexpensive computers and easily available, low cost internet connections. A whole new world of research will therefore become open to them.

3.The VO will be developed through collaborations between astronomers and other scientists in research institutes and universities on the one hand, and computer science and data mining experts from industry on the other. The project hopes to become a model for such interaction.

4.The expertise and programmes developed in setting up the VO will be of direct and immediate relevance to a host of other fields that involve large data volumes, including remote sensing, population studies from census data, meteorology and weather prediction, and health care.

5.The project will be executed in active collaboration with groups in the USA and Europe, and its results will provide direct inputs to international efforts. This will establish India as one of the pioneers of virtual observatories, an emerging concept that promises to find wide application in the coming decade.

6.The VO will be used in formulating public outreach programmes. Students, teachers and other interested persons will be able to make use of its facilities.



Project Execution

         Work plan: The tasks to be undertaken as part of the project will be of two kinds:
(1) those which involve software development with specific astronomical applications and
(2) applications to other areas.
The steps to be taken in executing these tasks will be as follows:

1.Computer hardware, consisting mainly of mass storage media including backup devices and servers will be acquired and installed.
2.Large databases on which the developed programmes are to work will be acquired.
3.Software will be developed for the federation of databases, rapid searches, efficient copying and archiving of data, integration of numerical and image data, etc.
4.Software will be developed for data mining including cluster analysis, supervised and unsupervised classification, visualization, statistical analysis etc. These will lead to research projects and specific results.
5.Programmes for applications to other areas will be developed in collaboration with scientists from different disciplines.
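As an illustration of the unsupervised classification listed in step 4, the sketch below implements plain k-means clustering (Lloyd's algorithm) in Python. It is a hypothetical example rather than the project's chosen method: the function name, the simplistic deterministic initialization (the first k points serve as starting centroids) and the toy two-colour data are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm.

    Hypothetical helper. Initial centroids are simply the first k points,
    which keeps the result deterministic but is not a robust choice.
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Stop once neither the assignments nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Toy data: two made-up colour indices per source, forming two obvious groups.
sources = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(sources, k=2)
print(labels, centroids)
```

In practice the result depends on the initialization and on the choice of k; more careful schemes such as k-means++ seeding, or several restarts, are commonly used.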

         Execution: The project will be executed in the following manner:

1.The project will be coordinated from IUCAA, where the hardware acquired for the project will be installed. IUCAA has a highly sophisticated computer centre, along with the infrastructure and computer engineers needed to install and maintain the equipment. It has a highly competent team of astronomers, which includes members of its faculty as well as a number of visiting associates from universities who spend a significant amount of time at IUCAA. The astronomy-related software and applications will be developed by these scientists in collaboration with colleagues from other institutes and university departments.

2.The software development related to the management of databases and data mining will be undertaken in collaboration with the industry. The main effort here will be provided by computer scientists and developers from Persistent Systems Pvt. Ltd., which is a highly successful and leading software development house which specializes in data related projects. Persistent Systems is located close to IUCAA and there will be close collaboration between personnel in the two organizations. Persistent Systems will use the hardware installed in IUCAA remotely as well as during visits of its personnel to the IUCAA campus.

3.The project is a complex one and will require inputs from astronomers, computer scientists and other experts located in many different organizations. Their expertise will be used in executing specific tasks and in applying the developed software to different situations. We have received assurances from faculty at IIT Mumbai, who have expressed interest in participating in this activity.

4.Close collaboration is being developed with California Institute of Technology (Caltech) in the USA on several aspects of the project. Caltech will be making major contributions to the VO in the USA. The development proposed to be carried out in India will be coherent with the development being carried out in Caltech and other institutions. A vigorous exchange programme between the VO centres located in different countries will be developed.



Timetable: Significant progress is expected within three years of the start of the project, i.e. by November 2004.



Participating Institutes


Profile of IUCAA

         The Inter-University Centre for Astronomy and Astrophysics (IUCAA) is an internationally known centre of excellence for research in astronomy, astrophysics and related areas. It has a faculty of 14 astronomers and many research students and post-doctoral fellows. Research in various branches of theoretical astrophysics, observational astronomy and instrumentation development is carried out at IUCAA. The Centre has more than 80 visiting associates from universities and colleges who spend significant periods of time at IUCAA, participating in its research and developmental activities.

         IUCAA has very well-developed infrastructure, one of whose high points is its state-of-the-art computer centre. IUCAA is one of the important nodes of ERNET and is linked to other nodes by three independent 2 Mbps lines, giving it high-speed connectivity to research establishments all over India and the world. IUCAA has a data centre, developed in situ, which houses mirrors of very large astronomical databases and scientific literature. The academics at IUCAA and the computer centre staff are skilled in the use and development of databases. IUCAA is now in the process of setting up a 2 m optical telescope facility about 100 km from Pune, which will begin to generate data early in 2002.



Profile of Persistent Systems Pvt. Ltd.

         Persistent Systems Private Limited (PSPL) is an 11-year-old, 350-person software development company located in Pune. Persistent Systems specializes in developing data infrastructure software and has developed such software for companies including Microsoft, Hewlett-Packard, Agilent Technologies, Informix, i2 Technologies and Engage Technologies. More specifically, Persistent Systems has developed data management components such as bit-vector indexes, query optimizers, ETL tools and OLAP tools. This expertise in data management software would benefit the development of software for managing astronomy databases.

         Persistent Systems proposes to contribute a team of four software developers at no cost to the Virtual Observatory project. As a company, Persistent Systems has the expertise to build and manage software teams that would make a significant contribution to the project.