The Proposal Details
Design and implementation of a queriable compression tool for XML data.
XML Parsing
The XML language, by virtue of its self-describing nature, has become extremely popular as a medium of data exchange, especially over the Internet. XML documents are extremely verbose compared to their intrinsic data content, since the schema information (the tag names) is repeated for every record in the document. While compressors are available for XML data, they do not facilitate easy and direct querying of the compressed document. The project aims at developing a compression, decompression and query processing tool for XML documents. These three utilities will be available as three separate executables. The compressed document is directly queriable, i.e. queries written against the original XML file can be carried out on the compressed document. Only minimal decompression, ideally of the answer alone, is needed for query evaluation.
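To make the verbosity claim concrete, the following is a minimal sketch (not part of the proposal) that estimates what fraction of a document is markup rather than data. It uses Python's standard xml.sax module; the function name markup_overhead and the class name TextCounter are assumptions made for illustration, and whitespace between elements is counted as markup.

```python
import xml.sax


class TextCounter(xml.sax.ContentHandler):
    """Counts non-whitespace characters of text content seen by the parser."""

    def __init__(self):
        super().__init__()
        self.data_chars = 0

    def characters(self, content):
        # The parser may deliver text in several chunks, so count per character.
        self.data_chars += sum(1 for ch in content if not ch.isspace())


def markup_overhead(xml_text):
    """Return the fraction of the document that is not element text content."""
    handler = TextCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return 1.0 - handler.data_chars / len(xml_text)
```

For a record-oriented document such as a list of person elements with short name and age values, the returned fraction is typically well above one half, which is the redundancy a compressor can exploit.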
The main parts of the project are the compressor, the decompressor and the query processor. The general requirements are:
1) The tool should support the query language XPath (version 2.0).
2) The types of queries supported should be range match, wildcard and exact match.
3) Decompression of the query answer as well as of the whole document should be provided for.
4) The user interface is through the command line.

1. Compressor - The input is the XML file, which is parsed, generating events. These events are handled by the event handler, which calls the appropriate compression module: the compression algorithm applied depends on the type of the data carried by the event.
1.1) Input is an XML file (version 1.0).
1.2) Output is a text file that contains the compressed data.
2. Decompressor - The decompressor takes the compressed file as input and produces an XML file.
2.1) Input is a compressed XML file.
2.2) Output is a decompressed XML file.
3. Query Processor - The input is the compressed document and a query in the XPath language. The output is the query answer. The query is parsed and the handlers for that type of query are called. These handlers search the compressed document and locate the answer in its compressed form. The compressed answer is then sent to the respective decompressor.
3.1) The query language used is XPath (version 2.0).
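The three components described above can be sketched as follows. This is an illustration under stated assumptions, not the proposed design: it uses Python's xml.sax rather than Xerces C++, it replaces the unspecified compression modules with plain dictionary coding of tag names and text values, it ignores attributes and whitespace-only text, and it supports only exact-match queries on simple child paths rather than XPath 2.0. All class and function names are hypothetical.

```python
import xml.sax
from xml.sax.saxutils import escape


class CompressingHandler(xml.sax.ContentHandler):
    """Event handler: turns SAX events into dictionary-coded tokens."""

    def __init__(self):
        super().__init__()
        self.tag_codes = {}    # element name -> integer code
        self.value_codes = {}  # text value -> integer code
        self.tokens = []       # ("S", tag), ("V", value) or ("E",)
        self._text = []

    def _code(self, table, key):
        return table.setdefault(key, len(table))

    def _flush_text(self):
        text = "".join(self._text).strip()
        self._text = []
        if text:
            self.tokens.append(("V", self._code(self.value_codes, text)))

    def startElement(self, name, attrs):  # attributes are ignored (assumption)
        self._flush_text()
        self.tokens.append(("S", self._code(self.tag_codes, name)))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.tokens.append(("E",))


def compress(xml_text):
    handler = CompressingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


def exact_match(doc, path, value):
    """Count elements at `path` whose text equals `value`.

    The query is encoded once; the scan then compares integer codes only,
    so nothing in the document is decompressed.
    """
    try:
        want_path = [doc.tag_codes[step] for step in path.strip("/").split("/")]
        want_value = doc.value_codes[value]
    except KeyError:
        return 0  # a name or value absent from the dictionaries cannot match
    stack, hits = [], 0
    for token in doc.tokens:
        if token[0] == "S":
            stack.append(token[1])
        elif token[0] == "E":
            stack.pop()
        elif stack == want_path and token[1] == want_value:
            hits += 1
    return hits


def decompress(doc):
    """Rebuild XML text from the token stream."""
    names = {code: name for name, code in doc.tag_codes.items()}
    values = {code: text for text, code in doc.value_codes.items()}
    out, stack = [], []
    for token in doc.tokens:
        if token[0] == "S":
            stack.append(names[token[1]])
            out.append("<%s>" % stack[-1])
        elif token[0] == "V":
            out.append(escape(values[token[1]]))
        else:
            out.append("</%s>" % stack.pop())
    return "".join(out)
```

A call such as exact_match(compress(text), "/people/person/age", "30") scans the coded tokens directly. Codes here are assigned in order of first occurrence, so they do not preserve value order; range-match queries would need an order-preserving value encoding, and wildcard queries would need a path matcher, neither of which this sketch provides.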
1. Extensible Markup Language (XML) - http://www.w3.org/XML
2. Xerces C++ parser - http://xml.apache.org/xerces-c
3. XML Path Language (XPath) - http://www.w3.org/TR/xpath
4. Hartmut Liefke and Dan Suciu. XMill: An Efficient Compressor for XML Data. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data.
5. Jun-Ki Min, Myung-Jae Park and Chin-Wan Chung. XPRESS: A Queriable Compression for XML Data. Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data.
6. P. M. Tolani and J. R. Haritsa. XGRIND: A Query-Friendly XML Compressor. Proceedings of the 18th International Conference on Data Engineering, February 2002.