Introduction
Astronomers use vast quantities of
data for their research. These data are
obtained using many different kinds of telescopes situated on the ground
and on satellites in space. Because of the rapid advance in telescope and
detector technology there has been a phenomenal increase in the quantity
of data, both from observations of specific objects and from surveys
which cover large areas of the sky. The data volumes have already
reached terabytes and great growth is expected over the coming decade.
Because of the large volume of data,
and the many different forms in which
it is available, the storage and retrieval of data have become difficult
tasks. It has also become very challenging for astronomers to use the vast
storehouse of data to produce exciting new scientific discoveries.
Projects are being undertaken in the USA
and some countries in Europe to
efficiently utilize the data through the establishment of a Virtual
Observatory.
The Virtual Observatory-India (VOI) project seeks to bring together
astronomers and software developers with experience in
handling large volumes of data, to contribute substantially to the international
VO effort, and to make the resulting developments available, in the most useful
forms, to astronomers in India and in other countries.
The aim over the next few years will be to (1) Undertake
research and development for data search and retrieval;
(2) Develop
software for equal and efficient use of the data;
(3) Enable
astronomers and other interested scientists to undertake major scientific
projects using the data and
(4) Make the technology available for use by
other fields, like remote sensing, population studies, bioinformatics,
health care etc. which involve large volumes of data.
The project will be undertaken through a collaboration between
(1)
astronomers and other scientists working in research institutes and
university departments and
(2) expert computer software developers from
the industry. The project will establish a model for collaboration between
experts from academic fields and from industry in the area of information
technology. A good part of the developmental and scientific work will be
undertaken jointly with
California Institute of Technology and other
renowned institutes in the USA. The products which arise from this
collaboration will have wide applicability and will enable scientists and
technologists all over India to benefit from the rich databases which are
being developed all over the world. The project will use Indian scientific
and technical expertise to make important contributions to the world wide
effort, and establish India as one of the pioneers of virtual science.
The Ministry of Information Technology is providing a major part
of the funds required for the project. Substantial contribution to the effort,
particularly in the form of expert software engineers,
will be received from
Persistent Systems Pvt. Ltd. and other sources.
IUCAA
will provide infrastructure,
expertise, computing facilities and
other resources.
Background
Astronomers carry out their
observations using a variety of telescopes,
based on the ground or on space platforms. They also use a variety of
detectors like photographic plates, radio receivers, CCD cameras, X-ray
detectors etc. The type and location of the telescopes and the kind of
detectors used depend upon the region of the electromagnetic spectrum in
which observations are to be made. There are two basic kinds of observing
strategies which are followed :
(1) Observations of specific targets
which are of interest to specific groups of astronomers.
(2) Observations
which survey large portions of the sky and which can be later used in a
variety of scientific projects.
Over the last two decades there
has been great progress in telescope and
detector technology. It has therefore been possible to build many large
telescopes and increasingly sensitive detectors. The large installations
are extremely expensive and the demand has been to build telescopes and
detectors through collaborative efforts and to make them available to a
wide community. Astronomers all over the world can therefore use the
advanced facilities to which they may otherwise not have had access. The
data obtained using these facilities is generally archived and made
available to the entire community, regardless of who obtained it in the
first place.
Modern telescopes generate data at an
astounding rate. Data volumes can now
range from several hundred gigabytes to a few tens of terabytes.
It is expected that some of the surveys which will be initiated over the
next few years will generate several terabytes of data per day. Storing,
retrieving and scientifically using these vast databases is a formidable
task, which cannot be managed by astronomers alone. It will require the
joint effort of astronomers and computer scientists to adopt existing
hardware and software technology, and to develop new hardware and software
to meet the challenging task of making the data available to all potential
users.
Computer storage for large volumes of
data is still very expensive and
therefore the storage will have to be successfully managed using a
hierarchy of archiving hardware and advanced data compression techniques.
It will not be practical, over the foreseeable future, to maintain copies
of all the data in different locations in the world. It will therefore be
necessary to develop strategies for distribution of data in the most
efficient possible manner and to provide for high speed data transfer
between the different data centres. Data obtained in different parts of
the electromagnetic spectrum requires vastly different kinds of processing
before it is brought to a scientifically usable form. The data is also
stored using quite different hardware and software systems, and techniques
have to be developed for bringing together the different structures.
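The point about replicating data between centres can be made concrete with a small calculation. The following is a minimal sketch, not a measurement: the 1 terabyte volume and the three link rates are illustrative assumptions, and the function name `transfer_days` is hypothetical.

```python
TERABYTE = 10**12   # bytes (decimal terabyte, an assumption of this sketch)
MEGABIT = 10**6     # bits


def transfer_days(volume_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Return the number of days needed to move volume_bytes over a link.

    efficiency is the fraction of the nominal link rate that is actually
    usable (1.0 means the full nominal rate, which is optimistic).
    """
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the interval (0, 1]")
    seconds = volume_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86400  # 86400 seconds in a day


if __name__ == "__main__":
    # Illustrative link rates only; real networks rarely sustain nominal speed.
    for rate in (2 * MEGABIT, 100 * MEGABIT, 1000 * MEGABIT):
        days = transfer_days(TERABYTE, rate)
        print(f"1 TB over {rate / MEGABIT:.0f} Mbps: {days:.2f} days")
```

Under these assumptions a single terabyte needs roughly 46 days on a 2 Mbps line, which is why selective placement of data and faster links between data centres both matter.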
VO Concept
A Virtual Observatory (VO)
seeks to facilitate the storage of large
volumes of data and its use in an efficient manner. Efforts in
establishing such structures have been made, with moderate success, by
different observatories and institutes from time to time. But the huge
increase in the volumes of data now available, and the need to carry out
research simultaneously in many different parts of the electromagnetic
spectrum, have made collaborative efforts necessary, much in the
manner of the joint efforts undertaken to develop major new ground and space
based telescopes. The need is to make a comprehensive solution available
to help astronomers, regardless of their geographic location, or their own
area of expertise, to access data generated from different telescopes all
over the world and in space.
At the present time devices which
can usefully store terabytes of data are
very expensive and it is expected that this situation will prevail for
some time to come. It is therefore not practical to store all available
data in many locations in the world for it to be easily available to
astronomers everywhere. It is necessary to selectively store data in
strategic locations and to make it available using the Internet as well as
other means for data transfer. A VO will seek to make the distributed data
seamlessly available to astronomers. This will require the development of
highly sophisticated data retrieval software which can federate data
stored in many data fields. The data will be available in the form of
catalogues, spectra and images. The VO will enable astronomers to use
these different kinds of data simultaneously, irrespective of their
location and basic nature, for a full multiwavelength analysis.
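One basic operation behind such federation is the positional cross-match of catalogues obtained at different wavelengths. The sketch below is only an illustration under stated assumptions: the catalogue layout (dictionaries with `id`, `ra` and `dec` in degrees), the sample sources, the 2 arcsecond matching radius and the function names are all hypothetical, and the brute-force search would have to be replaced by an indexed search for catalogues of realistic size.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(catalogue_a, catalogue_b, radius_arcsec):
    """Pair each source in catalogue_a with its nearest neighbour in catalogue_b.

    Only neighbours within radius_arcsec are accepted. Returns a list of
    (id_a, id_b, separation_arcsec) tuples. Brute force: O(len(a) * len(b)).
    """
    radius_deg = radius_arcsec / 3600
    matches = []
    for a in catalogue_a:
        best, best_sep = None, radius_deg
        for b in catalogue_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= best_sep:
                best, best_sep = b, sep
        if best is not None:
            matches.append((a["id"], best["id"], best_sep * 3600))
    return matches


# Hypothetical sample sources, invented purely for illustration.
optical = [
    {"id": "opt-1", "ra": 150.0000, "dec": 2.0000},
    {"id": "opt-2", "ra": 150.5000, "dec": 2.3000},
]
radio = [
    {"id": "rad-1", "ra": 150.0002, "dec": 2.0001},
    {"id": "rad-2", "ra": 151.0000, "dec": -1.0000},
]

if __name__ == "__main__":
    for id_a, id_b, sep in cross_match(optical, radio, radius_arcsec=2.0):
        print(f"{id_a} matches {id_b} at {sep:.2f} arcsec")
```

With the sample data only the first optical source has a radio counterpart inside the matching radius; the second is left unmatched.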
The vast quantities of data will
enable astronomers to look for very
rare objects, patterns and relationships which remained totally
inaccessible when only very limited data were available. Searching for
these rare features will require the development of highly
sophisticated
data mining techniques for the search to be completed in
finite time. The features found will have to be subjected to analysis and
compared with the results of numerical simulations. The VO will seek
to provide hardware and software platforms on which all these operations
can be carried out.
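As a very simple illustration of searching for rare objects, the sketch below flags values that lie far from the bulk of a sample, using the median and the median absolute deviation (MAD). This is one elementary technique chosen for the example, not the method adopted by the project; the colour values, the threshold of 5 and the function name are hypothetical.

```python
import statistics


def robust_outliers(values, threshold=5.0):
    """Return indices of values whose robust z-score exceeds threshold.

    The robust z-score is |value - median| / (1.4826 * MAD), where the factor
    1.4826 makes the MAD comparable to a standard deviation for normally
    distributed data. If the MAD is zero no value is flagged.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []
    scale = 1.4826 * mad
    return [i for i, v in enumerate(values)
            if abs(v - median) / scale > threshold]


# Hypothetical colour indices for ten sources; one is deliberately unusual.
colours = [0.61, 0.58, 0.65, 0.60, 0.63, 0.59, 2.40, 0.62, 0.64, 0.57]

if __name__ == "__main__":
    for index in robust_outliers(colours):
        print(f"source {index} is unusual: colour = {colours[index]}")
```

The median and MAD are used rather than the mean and standard deviation because a few extreme objects, which are exactly what is being sought, would otherwise distort the scale against which they are measured.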
VO Tasks
The aims and general tasks to be performed are
summed up in the White Paper on the National Virtual Observatory being established in the
USA and are as follows:
* Establishment of a common
systems approach to data pipelining,
archiving and retrieval that will ensure easy access by a large and
diverse community of users, at minimum cost and completion times;
* Enabling the distributed
development of a suite of commonly usable
new software tools to make possible querying, correlation, visualization
and statistical comparisons of data;
* Coordinating the
establishment of high speed data transfer
networks that are essential to providing the connectivity among archives,
terascale computing facilities, and the widespread community of users;
* Facilitating productive collaborations
among astronomy centers and
major academic institutions, both national and international, in order to
maximize productivity and minimize infrastructure costs;
* Ensuring communication and possible
collaborations with scientists
in other disciplines facing similar problems, and with the private sector;
* Maintaining a continuing
program of public and educational
outreach that capitalizes upon the unique resources, in both data and
software, of the VO to provide a unique window into astronomy and
scientific methodology.
VO-India
India has a community of astronomers
who work on a variety of theoretical,
observational and instrumentation projects. The research produced in the
country is of world standard and significant contributions to the
development of astronomy have been made over the years. In recent times
the observational facilities available to Indian astronomers have been
augmented by installation of moderately sized optical and infrared
telescopes and the state-of-the-art Giant Metrewave Radio Telescope.
Experiments have been conducted from space, and a satellite for
multiwavelength astronomy is to be launched in a few years. While these
developments have been commendable and exciting, it has been necessary to
supplement observations made from Indian facilities by observations
made with international facilities and space borne instruments. The
facilities offered by a VO will very well complement those which have so
far been accessible, and will make a continuous stream of high quality
data available for exciting research projects.
The work related to a
VO will require the development of highly
sophisticated tools for data storage and retrieval and for
data mining.
Great expertise in these matters is available in the Indian software
community. The expertise developed by the Indian industry in software
development and project management will be of immense benefit in
development of the VO. The present project seeks to bring together
astronomers and experts in computer science for the development of the Virtual
Observatory-India.
The developments in
India will be undertaken in discussion and
collaboration with international groups working on VO. This will mean that
the developments will have world wide impact, and India will become a
founding member and partner of the international VO community. The
pioneering efforts will enable other more sophisticated projects to be
undertaken over the coming decade.
Deliverables from the project
The project will benefit all
scientists and technologists who are
interested in using large volumes of data. Since the project envisages
setting up of virtual structures for remote access, the benefits will be
available even to those who are usually deprived of access to advanced
facilities. A very important contribution of the project will be to
develop a model for collaboration between academics and the software
industry. The benefits to be derived from the project can be summarized as
follows :
1.Astronomers in institutes, universities and colleges will be provided
access to unified data collection, hardware and software for efficient use
of the data.
2.Teachers and students from the university sector will be provided
opportunities to participate in state-of-the-art projects on a continuing
basis. Their access to the VO will be through inexpensive computers and
easily available, low cost internet connections. A whole new world of
research will therefore become open to them.
3.The VO will be developed through collaborations between astronomers and
other scientists in research institutes and universities on the one hand,
and computer science and data mining experts from the industry on the other. The
project hopes to become a model for such interaction.
4.The expertise and programmes developed in setting up the VO will be of
direct and immediate relevance to a host of other fields which involve
large data volumes, including remote sensing, population studies from
census data, meteorology and weather prediction, and health care.
5.The project will be executed in active collaboration with groups in the
USA and
Europe.
Results of the project will provide direct inputs to
international efforts. This will establish India as one of the pioneers in
the field of virtual observatories, an emerging concept that
promises to be of immense application in the coming decade.
6.The VO will be used in formulating public outreach programmes. Students,
teachers and other interested persons will be able to make use of its
facilities.
Project Execution
Work plan : The tasks to be undertaken
as part of the project will be of
two kinds:
(1) those which involve software development with specific
astronomical applications and
(2) applications to other areas. The steps
to be taken in executing these tasks will be as follows :
1.Computer hardware, consisting mainly of mass storage media (including
backup devices) and servers, will be acquired and installed.
2.Large databases on which the developed programmes are to work will be
acquired.
3.Software development dedicated to the federating of databases, rapid
searches, copying and archiving data efficiently, integration of numerical
and image data etc. will be undertaken.
4.Software will be developed for data mining including cluster analysis,
supervised and unsupervised classification, visualization, statistical
analysis etc. These will lead to research projects and specific results.
5.Programmes for applications to other areas will be developed in
collaboration with scientists from different disciplines.
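As an indication of what the cluster analysis in step 4 might involve, the following is a minimal sketch of k-means clustering (Lloyd's algorithm) in plain Python. The choice of k-means, the deterministic initialisation from the first k points, the two-dimensional sample features and the function name are all assumptions made for illustration; the project plan does not prescribe a particular algorithm.

```python
import math


def k_means(points, k, iterations=20):
    """Group points into k clusters with Lloyd's algorithm.

    points is a list of equal-length numeric tuples. The initial centroids are
    simply the first k points, which keeps the example deterministic but is
    not a good general-purpose initialisation. Returns (labels, centroids).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical two-feature measurements for six sources, forming two groups.
sample = [(0.1, 0.2), (5.0, 5.1), (0.0, 0.1), (5.2, 4.9), (0.2, 0.0), (4.9, 5.0)]

if __name__ == "__main__":
    labels, centroids = k_means(sample, k=2)
    for point, label in zip(sample, labels):
        print(f"{point} -> cluster {label}")
```

This is unsupervised classification in its simplest form: no labels are supplied, and the grouping emerges from the distances between sources in the chosen feature space.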
Execution : The project will be
executed in the following manner :
1.The project will be coordinated from
IUCAA,
where the hardware acquired
for the project will be installed. IUCAA has a highly sophisticated
computer centre, infrastructure and computer engineers to install and
maintain the equipment. It has a highly competent team of astronomers,
which includes members of its faculty as well as a number of visiting
associates from universities who spend a significant amount of time in
IUCAA. Development of the astronomy related software and applications will
be undertaken by these scientists in collaboration with colleagues from
other institutes and university departments.
2.The software development related to the management of databases and data
mining will be undertaken in collaboration with the industry. The main
effort here will be provided by computer scientists and developers
from
Persistent Systems Pvt. Ltd., which is a highly successful and leading
software development house which specializes in data related projects.
Persistent Systems is located close to IUCAA and there will be close
collaboration between personnel in the two organizations. Persistent
Systems will use the hardware installed in IUCAA remotely as well as
during visits of its personnel to the IUCAA campus.
3.The project is a complex one and will require inputs from astronomers,
computer scientists and other experts located in many different
organizations. Their expertise will be used in the execution of specific
tasks and in applying the developed software to different situations.
Faculty at
IIT, Mumbai
have expressed an interest
in participating in this activity.
4.Close collaboration is being developed with
California Institute of
Technology (Caltech) in the USA on several aspects of the project. Caltech
will be making major contributions to the VO in the USA. The development
proposed to be carried out in India will be coherent with the development
being carried out in Caltech and other institutions. A vigorous exchange
programme between the VO centres located in different countries will be
developed.
Timetable : It is expected that significant progress will be made in three years
from the start of the project, i.e. by November 2004.
Participating Institutes
Profile of IUCAA
The Inter-University Centre for
Astronomy and Astrophysics
(IUCAA) is an
internationally known centre of excellence for research in astronomy,
astrophysics and related areas. It has a faculty of 14 astronomers and
many research students and post-doctoral fellows. Research in various
branches of theoretical astrophysics, observational astronomy and
instrumentation development is carried out in IUCAA. The Centre has more
than 80 visiting associates from universities and colleges who spend
significant periods of time in IUCAA, participating in the research and
developmental activities.
IUCAA has very well developed
infrastructure and one of its high points is
the state-of-the-art computer centre. IUCAA is one of the
important nodes of ERNET and is linked by three independent 2 Mbps lines to
other nodes, and thus has high speed connectivity to research
establishments spread all over India and the world.
IUCAA has
a data centre which was developed in
situ, and houses mirrors of very large astronomical databases and
scientific literature. The academics at IUCAA and the computer centre
staff are skilled in the use and development of databases. IUCAA is
now in the process of setting up a 2 m optical telescope facility, located
about 100 km from Pune, which will begin to generate data early in 2002.
Profile of Persistent Systems Pvt. Ltd.
Persistent Systems
Private Limited (PSPL) is an
11-year-old, 350-person software development company located in Pune.
Persistent Systems specializes in developing data infrastructure software
and has developed such software for companies including Microsoft,
Hewlett-Packard, Agilent Technologies, Informix, i2 Technologies, Engage
Technologies etc. More specifically, Persistent Systems has developed
data management components such as bit-vector indexes, query optimizers,
ETL Tools, OLAP Tools etc. Persistent Systems' expertise in development
of data management software would be beneficial to the development of
data management software for astronomy databases.
Persistent Systems proposes to
contribute a team of four software
developers at no cost to the Virtual Observatory project. As a company,
Persistent Systems has the expertise to build and manage
software teams that would make a significant contribution to the project.