C++ VOTable Streaming Parser

(Version 1.0)

 

Introduction

 

The C++ VOTable Streaming Parser is a C++ library for parsing VOTables with streaming support. Applications can use its APIs directly, without having to process the raw VOTable XML themselves.

 

The C++ VOTable Streaming Parser has been developed as a part of the Virtual Observatory - India initiative by Persistent Systems Private Limited and the Inter-University Centre for Astronomy and Astrophysics (IUCAA).

 

The parser is implemented on top of the SAX-based parser Xerces-C++ 2.2.0. SAX parsing in Xerces is push-based: the parser drives the application through callbacks. The C++ Streaming Parser acts as a layer on top of Xerces and offers user applications a pull-based approach instead.
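The layering described above can be illustrated with a small, self-contained sketch. It does not use Xerces and is not the library's actual implementation: PushSource, PullAdapter and all of their members are hypothetical names invented here. The sketch only shows one possible way in which a callback-driven (push) source can be wrapped so that the application asks for elements (pull).

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical push-style source: each call to step() "parses" one more
// element and reports it through a callback, as a SAX parser would.
class PushSource {
public:
    using Handler = std::function<void(const std::string&)>;

    PushSource(std::vector<std::string> elements, Handler handler)
        : elements_(std::move(elements)), handler_(std::move(handler)) {}

    // Parse one more element; returns false once the input is exhausted.
    bool step() {
        if (pos_ >= elements_.size()) return false;
        handler_(elements_[pos_++]);
        return true;
    }

    // Number of elements parsed so far.
    std::size_t consumed() const { return pos_; }

private:
    std::vector<std::string> elements_;
    Handler handler_;
    std::size_t pos_ = 0;
};

// Hypothetical pull-style wrapper: the application asks for the next
// element with a given name, and the wrapper drives the push source only
// as far as needed to find it.
class PullAdapter {
public:
    explicit PullAdapter(std::vector<std::string> elements)
        : source_(std::move(elements),
                  [this](const std::string& name) { pending_.push_back(name); }) {}

    // The callback captures `this`, so copying would leave a dangling target.
    PullAdapter(const PullAdapter&) = delete;
    PullAdapter& operator=(const PullAdapter&) = delete;

    // Advance until an element called `name` is seen. Elements passed on
    // the way are dropped and cannot be revisited (single pass).
    bool next(const std::string& name) {
        for (;;) {
            while (!pending_.empty()) {
                const std::string seen = pending_.front();
                pending_.pop_front();
                if (seen == name) return true;
            }
            if (!source_.step()) return false;  // input exhausted
        }
    }

    std::size_t consumed() const { return source_.consumed(); }

private:
    std::deque<std::string> pending_;  // declared first: the callback writes to it
    PushSource source_;
};
```

In this sketch, asking for the next "TABLE" in a stream that begins VOTABLE, RESOURCE, TABLE consumes exactly three elements and then stops; nothing further is parsed until the application asks again.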

 

The parser currently supports only reading VOTable table data in streaming mode. It does not support reading binary or FITS data, or writing VOTables.

 

What is the Streaming Parser?

 

Unlike the earlier, non-streaming version, the streaming parser does not build a tree representation of the document in memory. Instead, it reads the VOTable in chunks, so it can parse real-time streaming VOTables as well as VOTables that are too large to fit in memory. It is also a single-pass parser, i.e. the APIs do not allow you to move back and forth in the document.

 

It may help to think of an imaginary pointer that marks the parser's current position in the document. When parsing starts, the pointer points to the first element in the VOTable. As the parser progresses through the VOTable, the pointer moves on to subsequent elements. Because this is a streaming parser, the pointer moves forward only: once it has passed an element, it cannot go back to retrieve that element’s data.

 

The APIs provided by the parser are of two types: nextXXX() and getXXX(). The nextXXX() APIs only move the imaginary pointer so that it points to the next requested element. The getXXX() APIs return the data from the element that the pointer currently points to.
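The split between moving the pointer and reading from it can be sketched with an in-memory stand-in. MockRowCursor below is hypothetical and is not part of the library. The method names follow the nextXXX()/getXXX() pattern, but the bool return values, the Row alias, and the choice to start the pointer before the first row are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's Row type: one string per cell.
using Row = std::vector<std::string>;

// Hypothetical forward-only cursor over rows held in memory.
class MockRowCursor {
public:
    explicit MockRowCursor(std::vector<Row> rows) : rows_(std::move(rows)) {}

    // nextXXX(): only moves the pointer. Returns false when no rows remain.
    bool nextRow() {
        if (next_ >= rows_.size()) {
            onRow_ = false;
            return false;
        }
        current_ = next_++;
        onRow_ = true;
        return true;
    }

    // getXXX(): copies out the data of the row the pointer is on.
    // Returns false if the pointer is not on a row.
    bool getRow(Row& out) const {
        if (!onRow_) return false;
        out = rows_[current_];
        return true;
    }

    // There is deliberately no previousRow(): the pointer never moves back.

private:
    std::vector<Row> rows_;
    std::size_t next_ = 0;     // index of the next unread row
    std::size_t current_ = 0;  // index of the row the pointer is on
    bool onRow_ = false;       // assumption: the pointer starts before row 1
};

// Typical loop: move, then read, until the rows run out.
std::vector<std::string> firstCells(MockRowCursor& cursor) {
    std::vector<std::string> cells;
    Row row;
    while (cursor.nextRow()) {
        if (cursor.getRow(row) && !row.empty()) cells.push_back(row[0]);
    }
    return cells;
}
```

Calling getRow() twice without an intervening nextRow() returns the same row in this sketch; only nextRow() changes what the pointer refers to.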

 

The nextXXX() and getXXX() APIs apply only to the following elements:

· RESOURCE

· TABLE

· ROW (the <TR> element)

 

The parser provides a pull-based approach to parsing VOTables, which gives the user application greater control over parsing: the application decides when the parser starts and stops by calling the provided APIs. For example, calling a nextXXX() API causes the parser to parse the document until it encounters the requested element, and then stop. Calling a getXXX() API causes the parser to parse the requested element completely and return its data. As a consequence, with a real-time streaming VOTable it is the responsibility of the user application to keep parsing at a sufficient pace so that no data is lost.
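The following sketch illustrates this division of work under stated assumptions. MockLazyParser is a hypothetical stand-in, not the library's parser: rows are kept as raw comma-separated strings, nextRow() only advances to the next row, and getRow() is the call that actually splits a row into cells. The counters exist only to make visible how far the application has driven the parser.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the library's Row type: one string per cell.
using Row = std::vector<std::string>;

// Hypothetical parser that does work only when the application asks for it.
class MockLazyParser {
public:
    explicit MockLazyParser(std::vector<std::string> rawRows)
        : raw_(std::move(rawRows)) {}

    // Advance to the next row without interpreting its contents.
    bool nextRow() {
        if (next_ >= raw_.size()) {
            onRow_ = false;
            return false;
        }
        current_ = next_++;
        onRow_ = true;
        return true;
    }

    // Fully parse the current row into cells and return them.
    bool getRow(Row& out) {
        if (!onRow_) return false;
        out.clear();
        std::istringstream cells(raw_[current_]);
        std::string cell;
        while (std::getline(cells, cell, ',')) out.push_back(cell);
        ++parsed_;
        return true;
    }

    std::size_t rowsReached() const { return next_; }   // rows the pointer has reached
    std::size_t rowsParsed() const { return parsed_; }  // rows split into cells

private:
    std::vector<std::string> raw_;
    std::size_t next_ = 0;
    std::size_t current_ = 0;
    std::size_t parsed_ = 0;
    bool onRow_ = false;
};

// Application-side policy: read rows until one whose first cell matches
// `wanted`, then stop. Rows after the match are never touched.
bool findRow(MockLazyParser& parser, const std::string& wanted, Row& found) {
    Row row;
    while (parser.nextRow()) {
        if (parser.getRow(row) && !row.empty() && row[0] == wanted) {
            found = row;
            return true;
        }
    }
    return false;
}
```

Because the application owns the loop, it can stop as soon as it has what it needs, or call nextRow() repeatedly without getRow() to skip rows it is not interested in. With a live stream, the same control means the application must keep calling these functions promptly.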


Available APIs

 

No.  API                            Description

1    nextResource()                 Move the pointer to the next <RESOURCE> element.

2    getResource(Resource)          Get metadata from the current <RESOURCE> element.

3    nextTable()                    Move the pointer to the next <TABLE> element.

4    getTable(Table)                Get metadata from the current <TABLE> element.

5    nextRow()                      Move the pointer to the next <TR> element.

6    getRow(Row)                    Get data from the current <TR> element.

7    getNextResourceOrTable(type)   Move to the next <RESOURCE> or <TABLE> element, whichever comes first.

8    getVOTableMetaData(votable)    Get metadata from the <VOTABLE> element.

                                   

A detailed description of the APIs can be found in the API Documentation.
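As a rough illustration of how these calls might be combined in a single forward pass, the sketch below walks a small in-memory document. MockVOTableStream, Event, ElementType and summarize() are hypothetical and written only for this example. The method names mirror the APIs listed above, but the parameter types, the bool return values, and the rule that nextRow() stops at the next <RESOURCE> or <TABLE> are assumptions, not the library's documented behaviour.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class ElementType { Resource, Table, Row };

// Hypothetical flat view of a document: elements in document order.
struct Event {
    ElementType type;
    std::string name;                // used by RESOURCE and TABLE
    std::vector<std::string> cells;  // used by rows (<TR>)
};

struct Resource { std::string name; };  // hypothetical metadata holders
struct Table { std::string name; };
using Row = std::vector<std::string>;

class MockVOTableStream {
public:
    explicit MockVOTableStream(std::vector<Event> events)
        : events_(std::move(events)) {}

    // Move to the next RESOURCE or TABLE, whichever comes first,
    // skipping any unread rows on the way.
    bool getNextResourceOrTable(ElementType& type) {
        while (next_ < events_.size()) {
            current_ = next_++;
            if (events_[current_].type != ElementType::Row) {
                type = events_[current_].type;
                valid_ = true;
                return true;
            }
        }
        valid_ = false;
        return false;
    }

    // Move to the next row of the current table. Assumption: stops,
    // without consuming anything, at the next RESOURCE or TABLE.
    bool nextRow() {
        if (next_ >= events_.size() || events_[next_].type != ElementType::Row)
            return false;
        current_ = next_++;
        valid_ = true;
        return true;
    }

    bool getResource(Resource& out) const {
        if (!on(ElementType::Resource)) return false;
        out.name = events_[current_].name;
        return true;
    }
    bool getTable(Table& out) const {
        if (!on(ElementType::Table)) return false;
        out.name = events_[current_].name;
        return true;
    }
    bool getRow(Row& out) const {
        if (!on(ElementType::Row)) return false;
        out = events_[current_].cells;
        return true;
    }

private:
    bool on(ElementType t) const { return valid_ && events_[current_].type == t; }

    std::vector<Event> events_;
    std::size_t next_ = 0;
    std::size_t current_ = 0;
    bool valid_ = false;
};

// One forward pass: report "<resource>/<table>: N rows" for every table.
std::vector<std::string> summarize(MockVOTableStream& stream) {
    std::vector<std::string> lines;
    std::string resourceName;
    ElementType type = ElementType::Resource;
    while (stream.getNextResourceOrTable(type)) {
        if (type == ElementType::Resource) {
            Resource r;
            if (stream.getResource(r)) resourceName = r.name;
        } else {
            Table t;
            stream.getTable(t);
            std::size_t rows = 0;
            Row row;
            while (stream.nextRow()) {
                if (stream.getRow(row)) ++rows;
            }
            lines.push_back(resourceName + "/" + t.name + ": " +
                            std::to_string(rows) + " rows");
        }
    }
    return lines;
}
```

The loop never revisits an element: each table's rows are read immediately after its metadata, before the pointer moves on to the next <RESOURCE> or <TABLE>.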