EP2039009A1 - Methods and devices for compressing structured documents - Google Patents

Methods and devices for compressing structured documents

Info

Publication number
EP2039009A1
EP2039009A1 EP07734998A EP07734998A EP2039009A1 EP 2039009 A1 EP2039009 A1 EP 2039009A1 EP 07734998 A EP07734998 A EP 07734998A EP 07734998 A EP07734998 A EP 07734998A EP 2039009 A1 EP2039009 A1 EP 2039009A1
Authority
EP
European Patent Office
Prior art keywords
event
stream
byte
sequence
aligned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07734998A
Other languages
German (de)
French (fr)
Inventor
Grégoire Pau
Robin Berjon
Philippe De Cuetos
Cédric Thienot
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Expway SA
Original Assignee
Expway SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Expway SA
Publication of EP2039009A1

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/149 Adaptation of the text data for streaming purposes, e.g. Efficient XML Interchange [EXI] format

Definitions

  • the present invention relates in general to the field of computer systems for transmitting, storing, retrieving and displaying data. It more particularly relates to a method and system for compressing and decompressing structured documents having a structure which is not necessarily known.
  • a structured document is a set of information elements each associated with a type and attributes, and interconnected by relationships that are mainly hierarchical.
  • Such documents use a markup language such as Standard Generalized Markup Language (SGML), Hypertext Markup Language (HTML), or Extensible Markup Language (XML), serving in particular to distinguish between the various elements of information making up the document.
  • the content information of the document is mixed in with layout information and type information.
  • a structured document includes markers, also called "tags", for separating the different information elements in the document.
  • For SGML, XML, or HTML formats, these tags have the form "<XXXX>" and "</XXXX>", the first tag "<XXXX>" marking the beginning of an information element, and the second tag "</XXXX>" marking the end of said element.
  • An information element may itself be made up of a plurality of attributes and lower-level information elements, also called "subelements".
  • a structured document presents a tree or hierarchical structure, each node representing an information element and being connected to a node at a higher hierarchical level representing an information element that contains the information elements at lower level.
  • the nodes located at the ends of branches in such a tree structure represent information elements containing data of a predetermined unstructured type, which is not divided into information subelements.
  • a structured document contains separation markers or tags generally represented in textual form, said tags defining information elements or subelements that can themselves contain other information subelements separated by tags.
  • markup languages such as XML are verbose, which makes them inefficient to process and costly to transmit or store.
  • many software applications tend to produce very large structured documents. This is particularly the case of software applications creating HTML documents and digital graphical documents such as scene description, art, technical drawings, schematics and the like.
  • the documents produced by graphical applications include graphical data describing a large number of points, lines and curves.
  • graphical objects are described by graphical structured elements using a language such as SVG (Scalable Vector Graphics) describing two-dimensional vector and mixed vector/raster graphic objects.
  • ISO/IEC 15938-1 MPEG-7 - Moving Picture Expert Group
  • ISO/IEC 23001-1 proposes a method and a binary format for encoding (compressing) an XML structured document and decoding such a binary format.
  • This standard is more particularly designed to deal with highly structured data, such as multimedia metadata, having a known structure defined in one or more schemas.
  • ISO/IEC 23001-1 provides a binary stream which is not byte-aligned.
  • conventional compression algorithms such as ZLIB (Compression Library) cannot efficiently compress further the binary streams provided by these compression methods.
  • One embodiment enables a structured document to be encoded and decoded without using a schema defining the structure of the document.
  • An embodiment provides a compression method providing a byte-aligned binary stream that can be further processed by a conventional compression algorithm such as ZLIB.
  • one embodiment provides a compression method of compressing a structured document having a tree-like structure comprising structured elements nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element.
  • the method comprises steps of: converting the structured document into a stream of events comprising events corresponding to structuring elements of the structured document, and encoding the event stream by generating a binary stream comprising byte-aligned codes each encoding an event or at least a second occurrence of a sequence of consecutive events occurring in the event stream.
  • the method comprises a step of applying a compression algorithm to the binary stream to obtain a compressed binary stream.
  • the compression algorithm is ZLIB.
  • the encoding step comprises for an event of the event stream: attributing one byte-aligned code to an event sequence including at least two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
  • the encoding step comprises for an event of the event stream: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
  • a correspondence table establishes a link between each byte-aligned code of the binary stream and an event or an event sequence, said correspondence table being of limited size.
  • a new event or event sequence is inserted in the correspondence table by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
  • the byte-aligned codes are one byte long.
  • Another embodiment of the present invention provides a decompression method of decompressing a binary stream resulting from compression of an original structured document, the original structured document having a tree-like structure comprising structured elements nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element.
  • the binary stream comprises a succession of byte-aligned codes encoding events of an event stream, said events corresponding to structuring elements of the structured document, said decompression method comprising steps of: decoding the binary stream by generating said event stream which comprises an event or at least a second occurrence of an event sequence of consecutive events for each byte-aligned code of the binary stream, and converting each event of the event stream into structuring elements so as to provide the original structured document.
  • the decompression method comprises a previous step of applying a decompression algorithm to a compressed binary stream to obtain the binary stream.
  • the decompression algorithm is ZLIB.
  • the decoding step comprises for an event of the event stream: attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event.
  • the decoding step comprises for an event of the event stream: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
  • a correspondence table establishes a link between each byte-aligned code of the binary stream and an event or an event sequence, said correspondence table being of limited size.
  • a new event or event sequence is inserted in the correspondence table by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
  • the byte-aligned codes are one byte long.
  • a compression device for compressing a structured document having a tree-like structure comprising structured elements nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element.
  • the compression device comprises: a converter for converting the structured document into a stream of events comprising events corresponding to structuring elements of the structured document, and an encoder for encoding the event stream by generating a binary stream comprising byte-aligned codes each encoding an event or at least a second occurrence of a sequence of consecutive events occurring in the event stream.
  • the compression device comprises a compression module for applying a compression algorithm to the binary stream to obtain a compressed binary stream.
  • the compression algorithm is ZLIB.
  • the encoder is configured to process an event of the event stream by: attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event.
  • the encoder is configured to process an event of the event stream by: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
  • a correspondence table establishes a link between each byte-aligned code of the binary stream and an event or an event sequence, said correspondence table being of limited size.
  • the encoder is configured to insert a new event or event sequence in the correspondence table when it is full, by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
  • the byte-aligned codes are one byte long.
  • Another embodiment of the present invention provides a decompression device for decompressing a binary stream resulting from compression of an original structured document, the original structured document having a tree-like structure comprising information elements nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element.
  • the binary stream comprises a succession of byte-aligned codes encoding events of an event stream, said events corresponding to structuring elements of the structured document
  • said decompression device comprising: a decoder for decoding the binary stream by generating said event stream which comprises an event or at least a second occurrence of an event sequence of consecutive events for each byte-aligned code of the binary stream, and a converter for converting each event of the event stream into structuring elements so as to provide the original structured document.
  • the decompression device comprises a decompression module for applying a decompression algorithm to a compressed binary stream to obtain the binary stream.
  • the decompression algorithm is ZLIB.
  • the decoder is configured to process an event of the event stream by: attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event.
  • the decoder is configured to process an event of the event stream by: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
  • a correspondence table establishes a link between each byte-aligned code of the binary stream and an event or an event sequence, said correspondence table being of limited size.
  • the decoder is configured to insert a new event or event sequence in the correspondence table when it is full, by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
  • Figure 1 represents in block form a structured document
  • Figure 2 represents in block form a structured document compression device according to one embodiment of the present invention
  • Figure 3 represents in block form a structured document decompression device according to one embodiment of the present invention
  • Figures 4 to 6 are flow charts of procedures executed by the compression device of Figure 2
  • Figure 7 is a flow chart of a procedure executed by the decompression device of Figure 3.
  • Figure 1 represents a structured document 1 comprising a header HD and a main element MEL.
  • the main element MEL comprises a type identifier Type, a set of attributes Att.1, Att.2, ... Att.n and a value Val.
  • the value of the main element MEL may include one or more structured elements 4 called "subelements of the main element", each comprising a type identifier Type, a set of attributes Att.1 - Att.n and a value Val.
  • the value of each element 4 may itself also include one or more structured or unstructured subelements.
  • the unstructured elements have a known format such as string, integer number, floating-point number, ...
  • Each element or subelement is associated with a type defining the structure of the element.
  • Each type of the elements of a structured document may be defined in a schema (for example XML schema in XML language).
  • a structured element of a structured document has the following form in XML or in languages derived from XML such as HTML and SVG: <type attributes> value </type>
  • "type" is a type identifier of the structured element
  • "</type>" is an end tag delimiting the end of the element in the document
  • value is the value of the element which may comprise structured or unstructured subelements.
  • Figure 2 represents a compressing device according to an embodiment of the invention.
  • the compressing device comprises a parser XSXP receiving a structured document DOC to be compressed, a binary encoder BCD, and preferably a compression module ZIP such as ZLIB providing a compressed binary stream BDOC.
  • the parser analyzes the structured document DOC in the form of an alphanumerical document and identifies structuring elements of the document, i.e. alphanumerical strings defining the tags, attributes and values of the elements composing the document and converts these structuring elements into a stream of events EVST.
  • the generated event stream EVST comprises at least one event such as a SAX event (SAX: Simple API - Application Program Interface - for XML) for each structuring element of the document DOC.
  • SAX is defined in detail at http://www.saxproject.org/. For example, the appearance of an XML opening or closing tag in an XML document is a SAX event.
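  • As a minimal sketch of this conversion, the Python standard library module xml.sax can stand in for the parser XSXP. The class EventCollector, the function to_event_stream and the event names START_ELT, END_ELT and CHARS are illustrative assumptions rather than names defined in this document; attributes and namespaces are ignored for brevity.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Records each SAX callback as an (event_name, parameter) tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # One event per opening tag; the element name is the parameter.
        self.events.append(("START_ELT", name))

    def endElement(self, name):
        # One event per closing tag; no parameter is needed.
        self.events.append(("END_ELT", None))

    def characters(self, content):
        # Skip whitespace-only text between tags.
        if content.strip():
            self.events.append(("CHARS", content))


def to_event_stream(xml_text):
    """Parse an XML string and return the list of recorded events."""
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = to_event_stream("<a><b/><b/></a>")
```

  • For the input above, the recorded stream has one start event for "a", a start/end pair for each "b", and a final end event, which is the kind of repetitive sequence the encoder exploits.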
  • the binary encoder BCD converts the event stream EVST into a binary stream BST.
  • the binary stream comprises a byte code for each event or event sequence of two or three consecutive events. Every occurrence of a new event sequence of two or three consecutive events in the event stream is memorized and a byte code is attributed to the event sequence. When another occurrence of a memorized event sequence is detected in the event stream, the sequence is encoded using the byte code attributed thereto.
  • the binary encoder BCD uses a symbol table ST1 and a symbol code map table SCM1, these tables containing events provided by the converter XSXP. These tables are initialized with a table INEVT containing possible events or most frequent events that can be provided by the converter XSXP.
  • the table SCM1 establishes a correspondence between each event or event sequence in the table ST1 and a byte code used by the encoder BCD to encode the event or event sequence.
  • the table ST1 contains the last events or event sequences encoded by the encoder.
  • the binary stream provided by the binary encoder BCD is byte-aligned, i.e. each byte or sequence of successive bytes of the binary stream corresponds to a part of the structured document DOC which is encoded using an integer number of bytes.
  • a compression algorithm such as ZLIB can be applied with efficiency to the binary stream BST provided by the binary encoder BCD.
  • the binary stream BST is further compressed by a compression module ZIP which provides a compressed binary stream BDOC.
  • the module ZIP implements a conventional compression algorithm such as ZLIB.
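  • The benefit of a byte-aligned stream can be sketched with Python's standard zlib module. The byte values below reuse codes from the worked example in this document, but the stream itself (the same four codes repeated 50 times) is made up for illustration, and the variable names bst and bdoc merely echo the labels BST and BDOC.

```python
import zlib

# Hypothetical byte-aligned stream BST: one byte per event or event sequence.
bst = bytes([25, 27, 4, 125] * 50)

# Role of the module ZIP: compress BST into BDOC.
bdoc = zlib.compress(bst)

# Role of the module DZIP: recover BST from BDOC.
restored = zlib.decompress(bdoc)
```

  • Because repeated structures map to repeated whole bytes, the byte-oriented matching used by ZLIB can find them; in this illustrative case bdoc is much shorter than bst and restored equals bst.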
  • Figure 3 represents a decompressing device according to an embodiment of the invention.
  • the decompressing device comprises a binary decoder BDCD, and a converter SXXP providing a structured document DOC which is the same as the document applied to the compression device.
  • the decompressing device further comprises a decompression module DZIP applying to the binary stream BDOC an initial decompression processing implementing a conventional decompression algorithm such as ZLIB and providing a binary stream BST which is processed by the decoder BDCD.
  • the binary decoder BDCD converts the binary stream BST applied to the decompression device or provided by the decompression module DZIP into a stream of events EVST.
  • the converter SXXP converts the event stream EVST provided by the binary decoder BDCD into tags constituting the structured document DOC.
  • the binary decoder BDCD uses a symbol table ST2 and a symbol code map table SCM2 which are similar to the tables ST1 and SCM1 used by the encoder BCD. These tables are initialized with the same table INEVT containing possible events or most frequent events that can be provided by the converter XSXP.
  • the table SCM2 establishes a correspondence between each event or event sequence in the table ST2 and a byte code that may appear in the binary stream BST.
  • the table ST2 contains the last events or event sequences appearing in the event stream EVST provided by the decoder BDCD. In the case of XML and SAX, the SAX events are listed in the table below:
  • All SAX events are defined by a corresponding SAX event callback.
  • the three special events ADD_NS, ADD_ENAME and ADD_ANAME are used to dynamically add a namespace, an element name or an attribute name in the corresponding dictionary.
  • the UID number is a fixed numerical ID able to unambiguously define an event, but this is not the value used to encode an event, as explained below.
  • An event can carry zero, one or several parameters, which can be strings or numerical values, which point to a corresponding dynamic strings dictionary. Strings are encoded in UTF-8 format, with a terminating zero.
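  • The string parameter format (UTF-8 with a terminating zero) can be sketched as follows. The helper names encode_string and decode_string are hypothetical, and rejecting strings that contain a NUL character is an assumption, since this document does not say how such strings are handled.

```python
def encode_string(text: str) -> bytes:
    """Encode a string parameter as UTF-8 followed by a zero byte."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        # Assumption: an embedded NUL would be mistaken for the terminator.
        raise ValueError("string parameter must not contain NUL")
    return data + b"\x00"


def decode_string(buf: bytes, pos: int = 0):
    """Read one zero-terminated UTF-8 string starting at pos.

    Returns the decoded string and the position just after the terminator.
    """
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


packed = encode_string("svg") + encode_string("é")
first, next_pos = decode_string(packed)
second, end_pos = decode_string(packed, next_pos)
```

  • The terminator keeps every parameter on a byte boundary, so the stream stays byte-aligned even when strings of different lengths follow each other.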
  • Figure 4 represents a process performed by the binary encoder BCD.
  • the process of Figure 4 comprises steps S1-S17.
  • This process uses the symbol table ST1 and the symbol code map table SCM1.
  • the table ST1 is previously initialized with table INEVT so as to contain all events listed above in Tables 1 and 2.
  • Tables ST1 and SCM1 contain a limited number of events which is equal to 127 for example.
  • At step S1, the stream of events EVST is read event by event until all events in the stream EVST provided by the converter XSXP from the document are processed (step S2).
  • three events are loaded into a FIFO (First-In First-Out) buffer Bevt.
  • Table SCM1 establishes a correspondence between each event in the table ST1 and a code which is used to encode an event in the binary stream BST generated by the encoder BCD.
  • the code corresponding to each event in the tables ST1 and SCM1 is equal for example to the position of the event in the table SCM1.
  • each memory location of the buffer Bevt is compared to null. If only the third event Bevt[0] in the buffer Bevt is not null, a symbol sym containing the event Bevt[0] is generated at step S6.
  • the code corresponding to the event Bevt[0] is determined from table SCM1. This code and the parameters associated with the event are inserted into the binary stream BST generated by the encoder BCD.
  • the symbol sym is used to update the symbol table ST1 and the symbol code map table SCM1.
  • the process of updating tables ST1 and SCM1, which is explained below with reference to Figures 5 and 6, consists in inserting the symbol into the tables if it is not already in these tables and putting the symbol at the beginning of the table ST1.
  • table ST1 is arranged so that the newest symbols are listed at the beginning of the table.
  • the oldest event Bevt[0] is pushed outside the buffer Bevt at step S9 and the process continues with a new iteration at step S1 where a new event is read in the event stream EVST and loaded in the location Bevt[2] of the buffer Bevt.
  • At step S8, tables ST1 and SCM1 are updated with the events contained in the symbol sym.
  • At step S9, the buffer Bevt is shifted once more. It should be noted that the execution of steps S12 and S9 pushes two events outside the buffer Bevt since the two events Bevt[0] and Bevt[1] have been processed. Then the process continues with a new iteration at step S1 where two new events are read in the event stream EVST and loaded in the locations Bevt[1] and Bevt[2] of the buffer Bevt.
  • This code and the parameters associated with the events Bevt[0], Bevt[1] and Bevt[2] are inserted into the binary stream BST generated by the encoder BCD.
  • the process continues at step S8 where tables ST1 and SCM1 are updated with the three events contained in the symbol sym, and at step S9 where the buffer Bevt is shifted once more.
  • the buffer Bevt is thus shifted three times at steps S16, S12 and S9. Therefore the buffer Bevt is empty when executing step S1 again for a new iteration since three events were processed.
  • Figure 5 represents a process 20 executed at step S8.
  • the process 20 which comprises steps S21 to S28 updates the tables ST1 and SCM1 with one symbol sym that can contain up to three events.
  • a counter i is initialized.
  • steps S23 and S24 are executed to test whether the two previously processed events memorized by variables evt-2 and evt-1 are null or not. If the two previously processed events are null, a procedure of inserting a symbol equal to the event evt into tables ST1 and SCM1 is executed at step S25.
  • At step S26, the variable evt-2 is updated with the value of variable evt-1, the variable evt-1 is updated with the value of variable evt, and the counter i is incremented by 1. Then the process continues at step S22 for a new iteration to process a symbol sym with two or three events.
  • If the previously processed event memorized in variable evt-1 is not null at step S24, the procedure of insertion of a symbol into tables ST1 and SCM1 is executed at step S27 for inserting a symbol equal to the concatenated events evt-1 and evt. Then the process continues at step S25. Thus if the variable evt-1 is not null the symbols evt-1//evt and evt are successively inserted into both tables ST1 and SCM1.
  • If the previously processed event memorized in variable evt-2 is not null at step S23, the procedure of insertion of a symbol into tables ST1 and SCM1 is executed at step S28 for inserting a symbol equal to the concatenated events evt-2, evt-1 and evt. Then the process continues at step S27.
  • Thus if the variables evt-1 and evt-2 are not null, the symbols evt-2//evt-1//evt, evt-1//evt and evt are successively inserted into both tables ST1 and SCM1.
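  • The order in which process 20 offers symbols to the tables can be sketched as a small function. The name symbols_to_insert is hypothetical, symbols are modelled as tuples of events (an assumed representation of the "//" concatenation), and None stands for a null variable.

```python
def symbols_to_insert(evt_2, evt_1, evt):
    """Return the symbols inserted for the current event evt.

    evt_2 and evt_1 are the two previously processed events (None if null).
    The longest sequence comes first, mirroring steps S28, S27 and S25.
    """
    symbols = []
    if evt_2 is not None:
        symbols.append((evt_2, evt_1, evt))  # step S28: three-event sequence
    if evt_1 is not None:
        symbols.append((evt_1, evt))         # step S27: two-event sequence
    symbols.append((evt,))                   # step S25: the event itself
    return symbols


first = symbols_to_insert(None, None, "START_ELT_1")
second = symbols_to_insert(None, "START_ELT_1", "START_ELT_2")
third = symbols_to_insert("START_ELT_1", "START_ELT_2", "END_ELT")
```

  • Since the single event is inserted last, it always ends up at the very beginning of the table, ahead of the sequences that end with it.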
  • Figure 6 represents a process 30 of insertion of a symbol sym into the symbol table ST1 and the symbol code map table SCM1.
  • the process 30 which comprises steps S31 to S35 is executed at steps S25, S27 and S28 of procedure 20.
  • the symbol sym is searched for in table ST1. If it is found in table ST1, the symbol sym is removed from table ST1 and inserted at the beginning of this table at step S32. Otherwise, if table ST1 is not full at step S33, the symbol sym is inserted at the beginning of table ST1 and inserted into table SCM1 at a position corresponding to a code equal to the size of table ST1 (step S34).
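  • A minimal sketch of this insertion procedure follows, including the full-table case described elsewhere in this document (the oldest symbol is removed and its code is attributed to the new symbol). The function name insert_symbol and the list/dict representation of ST1 and SCM1 are assumptions; the code given to a symbol inserted into a non-full table is taken to be the table size before insertion, which is one reading of the text.

```python
def insert_symbol(order, code_of, sym, capacity):
    """Insert sym into the table and return its code.

    order    -- list of symbols, most recent first (role of ST1)
    code_of  -- dict mapping each symbol to its code (role of SCM1)
    capacity -- maximum number of symbols the table may hold
    """
    if sym in code_of:
        # Known symbol: keep its code, only move it to the front.
        order.remove(sym)
    elif len(order) < capacity:
        # Room left: the new code is the current table size (assumed 0-based).
        code_of[sym] = len(order)
    else:
        # Table full: drop the oldest symbol and recycle its code.
        oldest = order.pop()
        code_of[sym] = code_of.pop(oldest)
    order.insert(0, sym)
    return code_of[sym]


order = ["A", "B", "C"]
code_of = {"A": 0, "B": 1, "C": 2}
code_b = insert_symbol(order, code_of, "B", capacity=3)  # already present
code_d = insert_symbol(order, code_of, "D", capacity=3)  # evicts "C"
```

  • Recycling the code of the evicted symbol keeps every code within the fixed range, which is what allows each code to stay one byte long.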
  • the first bit (most significant bit) of each code encoding an event indicates whether the event is encoded with one or two bytes, and the 7 or 15 other bits are the code of the event provided by table SCM1 or INEVT.
  • the first bit of the code being equal to 1 indicates that the event is encoded with two bytes.
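  • This one-or-two-byte layout can be sketched as below. The function names pack_code and unpack_code are hypothetical; the big-endian byte order and the rule that any code below 128 uses the one-byte form are assumptions, since this document only specifies the meaning of the first bit.

```python
def pack_code(code: int) -> bytes:
    """Pack a code into one byte (flag bit 0) or two bytes (flag bit 1)."""
    if not 0 <= code < 0x8000:
        raise ValueError("code must fit in 15 bits")
    if code < 0x80:
        return bytes([code])                    # 0xxxxxxx
    return (0x8000 | code).to_bytes(2, "big")   # 1xxxxxxx xxxxxxxx


def unpack_code(buf: bytes, pos: int = 0):
    """Read one code at pos; return the code and the next read position."""
    first = buf[pos]
    if first & 0x80:
        # Flag set: the remaining 15 bits span this byte and the next one.
        return ((first & 0x7F) << 8) | buf[pos + 1], pos + 2
    return first, pos + 1


stream = pack_code(25) + pack_code(300)
code_a, pos_a = unpack_code(stream)
code_b, pos_b = unpack_code(stream, pos_a)
```

  • Either form occupies a whole number of bytes, so mixing the two lengths does not break the byte alignment of the binary stream.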
  • This XML sequence is processed by the parser XSXP which generates the following stream of events: START_ELT_1, START_ELT_2, END_ELT, START_ELT_2, END_ELT (2)
  • the symbol table ST1 after initialization has the following content:
  • the events in the table ST1 are arranged so that the most recent event is memorized at the beginning of the table.
  • the event IDLE at the beginning of the table is not an existent event but is used to separate the events inserted during and after initialization of the table.
  • the column "code" gives the code corresponding to each symbol in table ST1 as provided by table SCM1.
  • the events START_ELT_1, START_ELT_2 and END_ELT are loaded into the buffer Bevt at step S1.
  • the steps S2, S3, S14, S15, S10, S11, S6-S9 are successively executed by the encoder BCD.
  • the first event START_ELT_1 encoded with the byte 25 is inserted into the binary stream provided by the encoder BCD.
  • the event START_ELT_1 is moved up at the beginning of table ST1 as follows:
  • At step S8, only the symbol START_ELT_1 is inserted at the beginning of table ST1 since the variables evt-1 and evt-2 of process 20 are null.
  • the buffer Bevt is loaded at step S1 with a next event START_ELT_2 of the event stream, so that the buffer contains the events START_ELT_2, END_ELT and START_ELT_2.
  • the encoder BCD then executes again steps S2, S3, S14, S15, S10, S11, S6-S9.
  • At step S7, the second event START_ELT_2 encoded with the byte 27 is inserted into the binary stream provided by the encoder BCD.
  • At step S8, the symbols START_ELT_1//START_ELT_2 and START_ELT_2 are successively inserted at the beginning of table ST1 (the variable evt-2 is null), as shown in the table below:
  • Before inserting the symbol START_ELT_1//START_ELT_2, table ST1 is full. Therefore, when inserting this symbol, the symbol ATTR_52 at the end of table ST1 is removed and the code 127 is attributed to the new symbol START_ELT_1//START_ELT_2. Then the symbol START_ELT_2 is moved up at the beginning of table ST1.
  • the buffer Bevt is loaded at step S1 with a next event END_ELT of the event stream, so that the buffer contains the events END_ELT, START_ELT_2 and END_ELT.
  • the encoder BCD then executes again steps S2, S3, S14, S15, S10, S11, S6-S9.
  • the third event END_ELT encoded with the byte 4 is inserted into the binary stream BST provided by the encoder BCD.
  • At step S8, the symbols START_ELT_1//START_ELT_2//END_ELT, START_ELT_2//END_ELT and END_ELT are successively inserted at the beginning of table ST1, as shown in the table below:
  • START_ELT_2//END_ELT is inserted at the beginning of table ST1 and receives the code 125 of the last symbol which is removed from the table. Then the symbol END_ELT is moved up at the beginning of table ST1.
  • the buffer Bevt is loaded at step S1 with a next event of the event stream, so that the buffer contains the events START_ELT_2 and END_ELT in the first and second positions of the buffer.
  • the encoder BCD then executes again the steps S2, S3, S14, S15 and S10, S11. Since the symbol START_ELT_2//END_ELT belongs to table ST1, the encoder BCD further executes steps S12, S13 and S8, S9. As a result, the sequence of the two consecutive events START_ELT_2 and END_ELT is encoded with the single byte code 125 which is inserted into the binary stream provided by the encoder BCD at step S13.
  • At step S8, the symbols START_ELT_2 and END_ELT are successively moved up at the beginning of table ST1 and the corresponding sequences of two and three consecutive events inserted in tables ST1 and SCM1.
  • the symbols START_ELT_2//END_ELT//START_ELT_2 are successively moved up at the beginning of table ST1 and the corresponding sequences of two and three consecutive events inserted in tables ST1 and SCM1.
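  • The worked example above can be replayed with a simplified encoder sketch. Everything here is an assumption made for illustration: the class SymbolTable, the function encode, the tuple representation of symbols, a 128-entry initial table laid out so that position equals code, and placeholder events EVT_n standing in for the real initial events. Event parameters and the IDLE separator are ignored, and the encoder simply tries the longest sequence of upcoming events first.

```python
class SymbolTable:
    """Bounded table: order is most recent first, code_of maps symbol to code."""

    def __init__(self, initial_symbols, capacity=128):
        self.capacity = capacity
        self.order = list(initial_symbols)
        self.code_of = {sym: code for code, sym in enumerate(self.order)}

    def insert(self, sym):
        if sym in self.code_of:
            self.order.remove(sym)               # known: move to front
        elif len(self.order) < self.capacity:
            self.code_of[sym] = len(self.order)  # room left: next free code
        else:
            oldest = self.order.pop()            # full: recycle oldest code
            self.code_of[sym] = self.code_of.pop(oldest)
        self.order.insert(0, sym)


def encode(events, table):
    """Emit one byte per event, or per already-seen sequence of 2 or 3 events."""
    out = bytearray()
    prev2 = prev1 = None
    i = 0
    while i < len(events):
        for n in (3, 2, 1):                      # longest known sequence wins
            sym = tuple(events[i:i + n])
            if len(sym) == n and sym in table.code_of:
                break
        else:
            raise KeyError(f"event {events[i]!r} is not in the table")
        out.append(table.code_of[sym])
        for evt in sym:                          # table update, event by event
            if prev2 is not None:
                table.insert((prev2, prev1, evt))
            if prev1 is not None:
                table.insert((prev1, evt))
            table.insert((evt,))
            prev2, prev1 = prev1, evt
        i += len(sym)
    return bytes(out)


# Hypothetical initial table: 128 single-event symbols, position equals code.
initial = [("EVT_%d" % code,) for code in range(128)]
initial[4] = ("END_ELT",)
initial[25] = ("START_ELT_1",)
initial[27] = ("START_ELT_2",)

stream = ["START_ELT_1", "START_ELT_2", "END_ELT", "START_ELT_2", "END_ELT"]
bst = encode(stream, SymbolTable(initial))
print(list(bst))  # [25, 27, 4, 125]
```

  • Under these assumptions the sketch yields the same four bytes as the example: the first three events are coded individually, and the second occurrence of START_ELT_2 followed by END_ELT collapses into the single code 125.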
  • Figure 7 represents a process performed by the binary decoder BDCD.
  • the process of Figure 7 comprises steps S41-S45.
  • This process also uses a symbol table ST2 and a symbol code map table SCM2.
  • the table ST2 is previously initialized and contains all events listed above in Tables 1 and 2.
  • Table ST2 and SCM2 contain a limited number of events which is equal to 128 for example.
  • the decoder BDCD reads a next code in the binary stream BST to be decoded. If the end of the binary stream BST is not reached at step S42, the decoder executes step S43 where the code read is translated into a symbol, i.e. a sequence of one or more concatenated events, using the table symbol code map
  • step S44 the symbol is translated into events.
  • tables SCM2 and ST2 are updated by executing the procedure 20 with each event obtained after execution of step S44.
  • the decoding process performed by the decoder BDCD will now be described using the above example used for illustrating the encoding process. In this example, the decoder has to decode the byte code sequence (2).
  • the decoder reads the code 25 in the binary stream.
  • code 25 corresponds to the symbol START_ELT_1 which contains a single event.
  • the decoder inserts the event thus obtained into the event stream it provides.
  • the table ST2 is then updated by moving up the event START_ELT_1 at the beginning of the table.
  • table ST2 contains the symbols of table 4.
  • the decoder BDCD reads the next code 27 in the binary stream.
  • Code 27 corresponds in table SCM2 to the symbol START_ELT_2 which contains a single event. Then the decoder inserts the event thus obtained into the event stream it provides.
  • the tables ST2 and SCM2 are updated by adding the symbol START_ELT_1//START_ELT_2 associated with the code 127, this symbol being put at the beginning of table ST2.
  • Table ST2 is further updated by moving up the event START_ELT_2 at the beginning of the table.
  • table ST2 contains the symbols of table 5.
  • the decoder BDCD reads the next code 4 in the binary stream.
  • Code 4 corresponds in table SCM2 to the symbol END_ELT which contains a single event. Then the decoder inserts the event thus obtained into the event stream EVST it provides.
  • the tables ST2 and SCM2 are updated by adding the symbol START_ELT_1//START_ELT_2//END_ELT associated with the code 126, and the symbol START_ELT_2//END_ELT associated with the code 125, these symbols being put at the beginning of table ST2.
  • Table ST2 is further updated by moving up the event END_ELT at the beginning of the table.
  • table ST2 contains the symbols of table 6.
  • the decoder BDCD reads the next code 125 in the binary stream.
  • Code 125 corresponds in table SCM2 to the symbol START_ELT_2//END_ELT which contains two events.
  • the decoder inserts the events thus obtained into the event stream EVST it provides.
  • the tables ST2 and SCM2 are updated so that the events START_ELT_2 and END_ELT are successively moved up to the beginning of table ST2 and the corresponding sequences of two and three consecutive events inserted in tables ST2 and SCM2.
  • table ST2 contains the symbols of table 7.
  • the decoder BDCD provides the event stream (2) which is then translated by the converter SXXP into the XML tag sequence (1).
  • Simulations have been conducted on a collection of 228 MPEG-7 and MPEG-21 test files with three different setups.
  • An external tag name table containing the namespaces, element names and attribute names dealt with in these simulations can be used or not.
  • a ZLIB post-compression using the tag name table as a bootstrap dictionary is done on the output event streams EVST. Moreover, comments, white spaces between elements and prefix mappings can be transmitted or not.
  • a tag name table is used and comments, white spaces and prefix mappings are omitted.
  • An average compression ratio of 19.15 is observed with respect to raw input XML files and a ratio of 6.58 is observed when compared to ZLIB-compressed XML input files.
  • a tag name table is used and comments, white spaces and prefix mappings are kept.
  • An average compression ratio of 4.01 is observed with respect to raw input XML files and a ratio of 1.30 is observed when compared to ZLIB-compressed XML input files.
  • API application program interfaces
  • StAX Streaming API for XML
  • DOM Document Object Model
  • the events of the event stream correspond to nodes in the tree which are successively considered according to a predefined path in the tree.
  • the invention is not limited to the algorithm described above for generating sequences of one to three events associated with a same code. Sequences of two events or of more than three events can thus be generated, so that a single code is used to encode several events.
  • other algorithms for grouping together events into sequences can be used. For example, it can be provided that a sequence of consecutive events is inserted into tables ST and SCM only after the second occurrence of the event sequence, so as to prevent event sequences that have only one occurrence in the event stream EVST from being inserted into the tables. In this manner, the contents of tables ST and SCM do not change too quickly. Some events appear rarely in an event stream generated from a structured document. Thus it can be provided to prevent generation of event sequences including such events.
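The second-occurrence variant mentioned in the last item can be sketched as follows. This is only an illustration under stated assumptions: the function name `sequences_to_admit`, the choice of Python and the representation of events as plain strings are hypothetical and are not taken from the described embodiment. The sketch counts each run of consecutive events and reports a run as eligible for insertion into tables ST and SCM only when it is seen for the second time.

```python
from collections import Counter


def sequences_to_admit(events, length=2):
    """Return the runs of `length` consecutive events, in the order in
    which each of them is seen for the second time.

    Hypothetical helper: a run seen only once is never reported, so it
    would never be given a code of its own.
    """
    seen = Counter()
    admitted = []
    for i in range(len(events) - length + 1):
        seq = tuple(events[i:i + length])
        seen[seq] += 1
        if seen[seq] == 2:  # second occurrence: now worth a table entry
            admitted.append(seq)
    return admitted


# Event names follow the example of the description; the stream itself is made up.
events = ["START_ELT_1", "START_ELT_2", "END_ELT",
          "START_ELT_2", "END_ELT", "END_ELT"]
admitted = sequences_to_admit(events)
```

With this made-up stream, only the pair START_ELT_2, END_ELT occurs twice, so it is the only sequence that would receive a table entry.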

Abstract

The invention relates to a method of compressing a structured document (DOC) having a tree-like structure comprising structured elements nested in each other, each said structured element comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element, the method comprising converting the structured document (DOC) into a stream of events (EVST) comprising events corresponding to structuring elements of the structured document, and encoding the event stream by generating a binary stream (BST) comprising byte-aligned codes each encoding an event or at least a second occurrence of a sequence of consecutive events occurring in the event stream. The invention applies to the compression of XML documents without using an XML schema of the document.

Description

METHODS AND DEVICES FOR COMPRESSING STRUCTURED DOCUMENTS
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates in general to the field of computer systems for transmitting, storing, retrieving and displaying data. It more particularly relates to a method and system for compressing and decompressing structured documents having a structure which is not necessarily known.
It applies particularly but not exclusively to handling, transmitting, storing, and reading structured multimedia documents, digital or video images or image sequences, movies or video programs, and more generally to any transfer of said documents between processor units interconnected by data transmission networks, or between a processor unit and a storage unit, or indeed between a processor unit and a playback unit such as a television set if the document contains digital or video images.
2. Description of the Prior Art
More and more frequently, documents handled and transmitted in this way contain a plurality of different types of data integrated in a structure. A structured document is a set of information elements each associated with a type and attributes, and interconnected by relationships that are mainly hierarchical. Such documents use a markup language such as Standard Generalized Markup Language (SGML), Hypertext Markup Language (HTML), or Extensible Markup Language (XML), serving in particular to distinguish between the various elements of information making up the document. In contrast, in a "linear" document, the content information of the document is mixed in with layout information and type information.
A structured document includes markers, also called "tags", for separating different information elements in the document. For SGML, XML, or HTML formats, these tags have the form "<XXXX>" and "</XXXX>", the first tag "<XXXX>" marking the beginning of an information element, and the second tag "</XXXX>" marking the end of said element. An information element may itself be made up of a plurality of attributes and lower-level information elements also called "subelements". Thus, a structured document presents a tree or hierarchical structure, each node representing an information element and being connected to a node at a higher hierarchical level representing an information element that contains the information elements at lower level. The nodes located at the ends of branches in such a tree structure represent information elements containing data of a predetermined unstructured type, which is not divided into information subelements.
Thus, a structured document contains separation markers or tags generally represented in textual form, said tags defining information elements or subelements that can themselves contain other information subelements separated by tags.
However, markup languages such as XML are verbose, and are thus inefficient to process and costly to transmit or store. In addition, many software applications tend to produce very large structured documents. This is particularly the case of software applications creating HTML documents and digital graphical documents such as scene descriptions, art, technical drawings, schematics and the like. The documents produced by graphical applications include graphical data describing a large number of points, lines and curves. In these graphical documents, graphical objects are described by graphical structured elements using a language such as SVG (Scalable Vector Graphics) describing two-dimensional vector and mixed vector/raster graphic objects.
Since structured documents are intended to be stored or transmitted through digital networks, there is a need for reducing the size of such structured documents.
A known solution to reduce the size of a structured document is to apply a compression process to the document. In this respect, ISO/IEC 15938-1 (MPEG-7 - Moving Picture Expert Group) or more recently ISO/IEC 23001-1 proposes a method and a binary format for encoding (compressing) an XML structured document and decoding such a binary format. This standard is more particularly designed to deal with highly structured data, such as multimedia metadata, having a known structure defined in one or more schemas. However, the compression methods according to ISO/IEC 15938-1 or ISO/IEC 23001-1 provide a binary stream which is not byte-aligned. Thus conventional compression algorithms such as ZLIB (Compression Library) cannot efficiently further compress the binary streams provided by these compression methods. Moreover, the compression or decompression methods according to ISO/IEC 15938-1 or ISO/IEC 23001-1 require, in order to be efficient, a schema defining the structure of the documents to be compressed or decompressed. However, such a schema is not necessarily available to the encoder or decoder.
SUMMARY OF THE INVENTION
An embodiment is to enable a structured document to be encoded and decoded without using a schema defining the structure of the document. An embodiment provides a compression method providing a byte-aligned binary stream that can be further processed by a conventional compression algorithm such as ZLIB.
Thus one embodiment provides a compression method of compressing a structured document having a tree-like structure comprising structured elements nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element.
According to an embodiment, the method comprises steps of: converting the structured document into a stream of events comprising events corresponding to structuring elements of the structured document, and encoding the event stream by generating a binary stream comprising byte-aligned codes each encoding an event or at least a second occurrence of a sequence of consecutive events occurring in the event stream. According to an embodiment, the method comprises a step of applying a compression algorithm to the binary stream to obtain a compressed binary stream.
According to an embodiment, the compression algorithm is ZLIB.
According to an embodiment, the encoding step comprises for an event of the event stream: attributing one byte-aligned code to an event sequence including at least two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
According to an embodiment, the encoding step comprises for an event of the event stream: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
According to an embodiment, a correspondence table establishes a link between each byte-aligned code of the binary stream and an event or an event sequence, said correspondence table being of limited size.
According to an embodiment, when the correspondence table is full, a new event or event sequence is inserted in the correspondence table by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
According to an embodiment, the byte-aligned codes are one byte long.
Another embodiment of the present invention provides a decompression method of decompressing a binary stream resulting from compression of an original structured document, the original structured document having a tree-like structure comprising structured elements nested in each other, each said structured element comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element.
According to an embodiment, the binary stream comprises a succession of byte-aligned codes encoding events of an event stream, said events corresponding to structuring elements of the structured document, said decompression method comprising steps of: decoding the binary stream by generating said event stream which comprises an event or at least a second occurrence of an event sequence of consecutive events for each byte-aligned code of the binary stream, and converting each event of the event stream into structuring elements so as to provide the original structured document. According to an embodiment, the decompression method comprises a previous step of applying a decompression algorithm to a compressed binary stream to obtain the binary stream.
According to an embodiment, the decompression algorithm is ZLIB. According to an embodiment, the decoding step comprises for an event of the event stream: attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event.
According to an embodiment, the decoding step comprises for an event of the event stream: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
According to an embodiment, a correspondence table establishes a link between each byte-aligned code of the binary stream and an event or an event sequence, said correspondence table being of limited size.
According to an embodiment, when the correspondence table is full, a new event or event sequence is inserted in the correspondence table by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
According to an embodiment, the byte-aligned codes are one byte long. Another embodiment of the present invention provides a compression device for compressing a structured document having a tree-like structure comprising structured elements nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element. According to an embodiment, the compression device comprises: a converter for converting the structured document into a stream of events comprising events corresponding to structuring elements of the structured document, and an encoder for encoding the event stream by generating a binary stream comprising byte-aligned codes each encoding an event or at least a second occurrence of a sequence of consecutive events occurring in the event stream.
According to an embodiment, the compression device comprises a compression module for applying a compression algorithm to the binary stream to obtain a compressed binary stream. According to an embodiment, the compression algorithm is ZLIB.
According to an embodiment, the encoder is configured to process an event of the event stream by: attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event. According to an embodiment, the encoder is configured to process an event of the event stream by: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
According to an embodiment, a correspondence table establishes a link between each byte-aligned code of the binary stream and an event or an event sequence, said correspondence table being of limited size.
According to an embodiment, the encoder is configured to insert a new event or event sequence in the correspondence table when it is full, by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
According to an embodiment, the byte-aligned codes are one byte long. Another embodiment of the present invention provides a decompression device for decompressing a binary stream resulting from compression of an original structured document, the original structured document having a tree-like structure comprising information elements nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element which is a set of at least one structured element or unstructured element. According to an embodiment, the binary stream comprises a succession of byte-aligned codes encoding events of an event stream, said events corresponding to structuring elements of the structured document, said decompression device comprising: a decoder for decoding the binary stream by generating said event stream which comprises an event or at least a second occurrence of an event sequence of consecutive events for each byte-aligned code of the binary stream, and a converter for converting each event of the event stream into structuring elements so as to provide the original structured document.
According to an embodiment, the decompression device comprises a decompression module for applying a decompression algorithm to a compressed binary stream to obtain the binary stream.
According to an embodiment, the decompression algorithm is ZLIB. According to an embodiment, the decoder is configured to process an event of the event stream by: attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event.
According to an embodiment, the decoder is configured to process an event of the event stream by: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
According to an embodiment, a correspondence table establishes a link between each byte-aligned code of the binary stream and an event or an event sequence, said correspondence table being of limited size. According to an embodiment, the decoder is configured to insert a new event or event sequence in the correspondence table when it is full, by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other advantages and features of the present invention will be presented in greater detail in the following description of the invention in relation to, but not limited by the appended drawings in which:
Figure 1 represents in block form a structured document,
Figure 2 represents in block form a structured document compression device according to one embodiment of the present invention,
Figure 3 represents in block form a structured document decompression device according to one embodiment of the present invention,
Figures 4 to 6 are flow charts of procedures executed by the compression device of Figure 2,
Figure 7 is a flow chart of a procedure executed by the decompression device of Figure 3.
DETAILED DESCRIPTION OF THE INVENTION
Figure 1 represents a structured document 1 comprising a header HD and a main element MEL. The main element MEL comprises a type identifier Type, a set of attributes Att.1, Att.2, ... Att.n and a value Val. The value of the main element MEL may include one or more structured elements 4 called "subelements of the main element", each comprising a type identifier Type, a set of attributes Att.1-Att.n and a value Val. The value of each element 4 may itself also include one or more structured or unstructured subelements. The unstructured elements have a known format such as string, integer number, floating-point number, etc. Each element or subelement is associated with a type defining the structure of the element. Each type of the elements of a structured document may be defined in a schema (for example an XML schema in XML language).
A structured element of a structured document has the following form in XML, or in languages derived from XML such as HTML and SVG:
<type att1-name="att1-value" att2-name="att2-value" ... attn-name="attn-value">value</type>
where "<type ...>" is a beginning tag delimiting the beginning of the element in the document,
"type" is a type identifier of the structured element,
"</type>" is an end tag delimiting the end of the element in the document,
"atti-name=atti-value" are the name of the attribute "i" of the element and the value of that attribute, and
"value" is the value of the element which may comprise structured or unstructured subelements.
Figure 2 represents a compressing device according to an embodiment of the invention. The compressing device comprises a parser XSXP receiving a structured document DOC to be compressed, a binary encoder BCD, and preferably a compression module ZIP such as ZLIB providing a compressed binary stream BDOC.
The parser analyzes the structured document DOC in the form of an alphanumerical document and identifies structuring elements of the document, i.e. alphanumerical strings defining the tags, attributes and values of the elements composing the document, and converts these structuring elements into a stream of events EVST. The generated event stream EVST comprises at least one event such as a SAX event (SAX: Simple API - Application Program Interface - for XML) for each structuring element of the document DOC. SAX is defined in detail at http://www.saxproject.org/. For example, the appearance of an XML opening or closing tag in an XML document is a SAX event.
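As an illustration of the conversion of a document into an event stream, the following minimal sketch uses the `xml.sax` module of the Python standard library as a stand-in for the parser XSXP. It is not the implementation of the embodiment: the class name `EventCollector`, the function `to_event_stream`, the event label `TEXT` and the tuple representation are assumptions made for the example, and the sketch does not split start events into per-element and per-attribute events.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Collects SAX callbacks as a flat list of (event, payload) tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("START_ELT", name))

    def endElement(self, name):
        self.events.append(("END_ELT", name))

    def characters(self, content):
        # Whitespace-only text between elements is skipped in this sketch.
        if content.strip():
            self.events.append(("TEXT", content))  # "TEXT" is a made-up label


def to_event_stream(xml_text):
    """Parse an XML string and return its list of events."""
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = to_event_stream("<a><b>hi</b><b/></a>")
```

Each opening and closing tag yields one event, and an empty element such as `<b/>` yields a start event immediately followed by an end event.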
The binary encoder BCD converts the event stream EVST into a binary stream BST. The binary stream comprises a byte code for each event or event sequence of two or three consecutive events. Every occurrence of a new event sequence of two or three consecutive events in the event stream is memorized and a byte code is attributed to the event sequence. When another occurrence of a memorized event sequence is detected in the event stream, the sequence is encoded using the byte code attributed thereto.
The binary encoder BCD uses a symbol table ST1 and a symbol code map table SCM1, these tables containing events provided by the converter XSXP. These tables are initialized with a table INEVT containing possible events or most frequent events that can be provided by the converter XSXP. The table SCM1 establishes a correspondence between each event or event sequence in the table ST1 and a byte code used by the encoder BCD to encode the event or event sequence. During the encoding process performed by the binary encoder BCD, the table ST1 contains the last events or event sequences encoded by the encoder.
The binary stream provided by the binary encoder BCD is byte-aligned, i.e. each byte or sequence of successive bytes of the binary stream corresponds to a part of the structured document DOC which is encoded using an integer number of bytes. Thus a compression algorithm such as ZLIB can be applied with efficiency to the binary stream BST provided by the binary encoder BCD.
According to an embodiment of the present invention, the binary stream BST is further compressed by a compression module ZIP which provides a compressed binary stream BDOC. The module ZIP implements a conventional compression algorithm such as ZLIB.
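The post-compression stage can be illustrated with the `zlib` module of the Python standard library. This is a sketch under assumptions: the byte values 25, 27, 4 and 125 are borrowed from the worked example of the description, but their repetition into a 200-byte stream is made up for the demonstration, and the variable names `bst` and `bdoc` merely echo the reference signs BST and BDOC.

```python
import zlib

# A made-up byte-aligned stream: one byte per event or event sequence.
bst = bytes([25, 27, 4, 125]) * 50

# Module ZIP: conventional compression of the byte-aligned stream.
bdoc = zlib.compress(bst, 9)

# Module DZIP: the inverse operation on the decompression side.
restored = zlib.decompress(bdoc)
```

Because every code occupies a whole number of bytes, repeated structures in the document appear as repeated byte patterns, which is the kind of redundancy a byte-oriented compressor such as ZLIB is designed to find.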
Figure 3 represents a decompressing device according to an embodiment of the invention. The decompressing device comprises a binary decoder BDCD, and a converter SXXP providing a structured document DOC which is the same as the document applied to the compression device.
According to an embodiment of the present invention, the decompressing device further comprises a decompression module DZIP applying to the binary stream BDOC an initial decompression processing implementing a conventional decompression algorithm such as ZLIB and providing a binary stream BST which is processed by the decoder BDCD.
The binary decoder BDCD converts the binary stream BST applied to the decompression device or provided by the decompression module DZIP into a stream of events EVST. The converter SXXP converts the event stream EVST provided by the binary decoder BDCD into tags constituting the structured document DOC.
The binary decoder BDCD uses a symbol table ST2 and a symbol code map table SCM2 which are similar to the tables ST1 and SCM1 used by the encoder BCD. These tables are initialized with the same table INEVT containing possible events or most frequent events that can be provided by the converter XSXP. The table SCM2 establishes a correspondence between each event or event sequence in the table ST2 and a byte code that may appear in the binary stream BST. During the decoding process performed by the binary decoder BDCD, the table ST2 contains the last events or event sequences appearing in the event stream EVST provided by the decoder BDCD.
In the case of XML and SAX, the SAX events are listed in the table below:
Table 1
There is no generic start element event START_ELT reporting the beginning of an element. Instead, the SAX START_ELT event is split into specific attribute events ATTR_#att and a specific START_ELT_#elt event. Therefore, an unbounded number of ATTR_#att and START_ELT_#elt events virtually belong to the table of events, as listed in the table below:
Table 2
Three dynamic dictionaries are used to store the names of XML structural items: namespaces, element names and attribute names. These string dictionaries are dynamic and can grow during the encoding process, with the help of the ADD_NS, ADD_ENAME and ADD_ANAME special events. By default, these dictionaries are initialized as empty.
Currently understood events are described in Table 1. All SAX events are defined by a corresponding SAX event callback. The three special events ADD_NS, ADD_ENAME and ADD_ANAME are used to dynamically add a namespace, an element name or an attribute name to the corresponding dictionary.
The UID number is a fixed numerical ID able to unambiguously define an event, but this is not the value used to encode an event, as explained below. An event can carry zero, one or several parameters, which can be strings, or numerical values that point into the corresponding dynamic string dictionary. Strings are encoded in UTF-8 format, with a terminating zero.
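The string encoding stated above (UTF-8 with a terminating zero) can be sketched as follows. The helper names `encode_string` and `decode_string` are hypothetical, and the rejection of strings containing a zero byte is a choice made for this example, not something stated in the description.

```python
def encode_string(s):
    """Encode a string parameter as UTF-8 followed by a zero byte."""
    data = s.encode("utf-8")
    if b"\x00" in data:
        # A zero byte inside the string would be mistaken for the terminator.
        raise ValueError("string contains a zero byte")
    return data + b"\x00"


def decode_string(buf, pos=0):
    """Read one zero-terminated UTF-8 string starting at `pos`.

    Returns the decoded string and the position just after the terminator.
    """
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


buf = encode_string("título") + encode_string("svg")
first, next_pos = decode_string(buf)
second, end_pos = decode_string(buf, next_pos)
```

The terminator keeps the stream byte-aligned and lets a decoder find the end of a string without a separate length field.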
Figure 4 represents a process performed by the binary encoder BCD. The process of Figure 4 comprises steps S1-S17. This process uses the symbol table ST1 and the symbol code map table SCM1. The table ST1 is previously initialized with table INEVT so as to contain all events listed above in Tables 1 and 2. Tables ST1 and SCM1 contain a limited number of events which is equal to 127 for example.
At step S1, the stream of events EVST is read event by event until all events in the stream EVST provided by the converter XSXP from the document are processed (step S2). At this step, three events are loaded into a FIFO (First-In First-Out) buffer Bevt. Table SCM1 establishes a correspondence between each event in the table ST1 and a code which is used to encode the event in the binary stream BST generated by the encoder BCD. The code corresponding to each event in the tables ST1 and SCM1 is equal, for example, to the position of the event in the table SCM1. At step S2, the processing performed by the encoder BCD ends when the end of the event stream EVST is reached. At steps S3, S4 and S5, the content of each memory location of the buffer Bevt is compared to null. If only the event Bevt[0] in the buffer Bevt is not null, a symbol sym containing the event Bevt[0] is generated at step S6. At step S7, the code corresponding to the event Bevt[0] is determined from table SCM1. This code and the parameters associated with the event are inserted into the binary stream BST generated by the encoder BCD.
At step S8, the symbol sym is used to update the symbol table ST1 and the symbol code map table SCM1. The process of updating tables ST1 and SCM1, which will be explained below with reference to Figures 5 and 6, consists in inserting the symbol into the tables if it is not already in these tables and putting the symbol at the beginning of the table ST1. Thus table ST1 is arranged so that the newest symbols are listed at the beginning of the table. Then the oldest event Bevt[0] is pushed outside the buffer Bevt at step S9 and the process continues with a new iteration at step S1 where a new event is read in the event stream EVST and loaded in the location Bevt[2] of the buffer Bevt.
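The table update described in this paragraph (insert if absent, move to the beginning, and reuse the code of the symbol at the end when the table is full) can be sketched as below. This is a simplified illustration, not the procedure of Figures 5 and 6: the class `SymbolTable`, its method `touch`, the capacity of 4 and the initial contents are assumptions chosen to keep the example small.

```python
class SymbolTable:
    """Bounded table: `order` plays the role of ST, `code` the role of SCM."""

    def __init__(self, initial, capacity):
        self.capacity = capacity
        self.order = list(initial)  # newest symbol first
        self.code = {sym: i for i, sym in enumerate(self.order)}

    def touch(self, sym):
        """Insert `sym` if absent, move it to the front, return its code."""
        if sym in self.code:
            self.order.remove(sym)
        elif len(self.order) < self.capacity:
            self.code[sym] = len(self.order)  # next unused code
        else:
            oldest = self.order.pop()  # symbol at the end of the table
            self.code[sym] = self.code.pop(oldest)  # its code is reused
        self.order.insert(0, sym)
        return self.code[sym]


table = SymbolTable(["END_ELT", "START_ELT_1", "START_ELT_2"], capacity=4)
c1 = table.touch("START_ELT_2")               # already known: moved to front
c2 = table.touch("START_ELT_1//START_ELT_2")  # new, table not yet full
c3 = table.touch("START_ELT_2//END_ELT")      # new, table full: eviction
```

A decoder that applies the same updates in the same order keeps an identical table, which is why the codes never need to be transmitted separately in this sketch.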
If Bevt[1] is not null at step S4, a symbol sym = Bevt[0]//Bevt[1] resulting from the concatenation of Bevt[0] and Bevt[1] is generated at step S10. If the symbol sym is not already in table ST1 at step S11, steps S6-S9 are executed. Otherwise, the buffer Bevt is shifted to push one event (the content of Bevt[0]) outside the buffer Bevt at step S12. At step S13, the code corresponding to the symbol sym = Bevt[0]//Bevt[1] is determined from table SCM1. This code and the parameters associated with the events Bevt[0] and Bevt[1] are inserted into the binary stream BST generated by the encoder BCD. The process continues at step S8 where tables ST1 and SCM1 are updated with the events contained in the symbol sym, and at step S9 where the buffer Bevt is shifted once more. It should be noted that the execution of steps S12 and S9 pushes two events outside the buffer Bevt since the two events Bevt[0] and Bevt[1] have been processed. Then the process continues with a new iteration at step S1 where two new events are read in the event stream EVST and loaded in the locations Bevt[1] and Bevt[2] of the buffer Bevt.
If Bevt[2] is not null at step S3, a symbol sym = Bevt[0]//Bevt[1]//Bevt[2] resulting from the concatenation of Bevt[0], Bevt[1] and Bevt[2] is generated at step S14. If the symbol sym is not already in table ST1 at step S15, the process continues at step S10. Otherwise, the buffer Bevt is shifted to push one event outside the buffer at step S16. At step S17, the code corresponding to the symbol sym = Bevt[0]//Bevt[1]//Bevt[2] is determined from table SCM1. This code and the parameters associated with the events Bevt[0], Bevt[1] and Bevt[2] are inserted into the binary stream BST generated by the encoder BCD. The process continues at step S8 where tables ST1 and SCM1 are updated with the three events contained in the symbol sym, and at step S9 where the buffer Bevt is shifted once more. The buffer Bevt is thus shifted three times at steps S16, S12 and S9. Therefore the buffer Bevt is empty when executing step S1 again for a new iteration since three events were processed.
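By way of illustration only, the decision cascade of steps S3 to S17 can be sketched in Python as a longest-match test on the three-slot buffer. The function name `choose_symbol` and the set `known` (standing in for a membership test on table ST1) are assumptions made for this sketch; the table update of step S8 and the emission of codes are left out.

```python
from typing import List, Optional, Set, Tuple

Symbol = Tuple[str, ...]


def choose_symbol(bevt: List[Optional[str]], known: Set[Symbol]) -> Symbol:
    """Pick the symbol to encode from a 3-slot buffer (oldest event in bevt[0]).

    bevt[0] is assumed to be non-null; empty slots hold None.
    """
    if bevt[2] is not None:                      # step S3 -> S14
        sym = (bevt[0], bevt[1], bevt[2])
        if sym in known:                         # step S15
            return sym
    if bevt[1] is not None:                      # step S4 -> S10
        sym = (bevt[0], bevt[1])
        if sym in known:                         # step S11
            return sym
    return (bevt[0],)                            # step S6


known = {("B", "E")}
print(choose_symbol(["A", "B", "E"], known))     # single event A
print(choose_symbol(["B", "E", "B"], known))     # pair B//E
```

The number of events in the returned symbol is also the number of times the buffer is shifted before the next iteration.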
Figure 5 represents a process 20 executed at step S8. The process 20, which comprises steps S21 to S28, updates the tables ST1 and SCM1 with one symbol sym that can contain up to three events. At the first step S21, a counter i is initialized. At step S22, if the i-th event of symbol sym is null, the process ends. Otherwise steps S23 and S24 are executed to test whether the two previously processed events memorized by variables evt-2 and evt-1 are null or not. If the two previously processed events are null, a procedure of inserting a symbol equal to the event evt into tables ST1 and SCM1 is executed at step S25. At step S26, the variable evt-2 is updated with the value of variable evt-1, the variable evt-1 is updated with the value of variable evt, and the counter i is incremented by 1. Then the process continues at step S22 for a new iteration to process a symbol sym with two or three events.
If the previously processed event memorized in variable evt-1 is not null at step S24, the procedure of insertion of a symbol into tables ST1 and SCM1 is executed at step S27 for inserting a symbol equal to the concatenated events evt-1 and evt. Then the process continues at step S25. Thus if the variable evt-1 is not null the symbols evt-1//evt and evt are successively inserted into both tables ST1 and SCM1. If the previously processed event memorized in variable evt-2 is not null at step S23, the procedure of insertion of a symbol into tables ST1 and SCM1 is executed at step S28 for inserting a symbol equal to the concatenated events evt-2, evt-1 and evt. Then the process continues at step S27. Thus if the variables evt-1 and evt-2 are not null the symbols evt-2//evt-1//evt, evt-1//evt and evt are successively inserted into both tables ST1 and SCM1.
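As a non-limiting sketch, the order in which process 20 hands symbols to the insertion procedure can be written as a small Python generator. The name `symbols_to_insert` and the representation of a symbol as a tuple of event names are assumptions of this sketch; the insertion itself is not performed here.

```python
from typing import Iterator, Optional, Sequence, Tuple

Symbol = Tuple[str, ...]


def symbols_to_insert(
    evt_2: Optional[str], evt_1: Optional[str], events: Sequence[str]
) -> Iterator[Symbol]:
    """Yield, in insertion order, the symbols generated for each event of a symbol."""
    for evt in events:
        if evt_2 is not None and evt_1 is not None:
            yield (evt_2, evt_1, evt)   # step S28: three-event sequence
        if evt_1 is not None:
            yield (evt_1, evt)          # step S27: two-event sequence
        yield (evt,)                    # step S25: the event alone
        evt_2, evt_1 = evt_1, evt       # step S26: slide the history


for sym in symbols_to_insert("START_ELT_2", "END_ELT", ["START_ELT_2", "END_ELT"]):
    print("//".join(sym))
```

With the history START_ELT_2, END_ELT and a symbol made of the same two events, the generator yields six symbols, three per event.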
Figure 6 represents a process 30 of insertion of a symbol sym into the symbol table ST1 and the symbol code map table SCM1. The process 30, which comprises steps S31 to S35, is executed at steps S25, S27 and S28 of procedure 20. At the first step S31, the symbol sym is searched in table ST1. If it is found in table ST1, the symbol sym is removed from table ST1 and inserted at the beginning of this table at step S32. Otherwise, if table ST1 is not full at step S33, the symbol sym is inserted at the beginning of table ST1 and inserted into table SCM1 at a position corresponding to a code equal to the size of table ST1 (step S34). The symbol is thus inserted at the end of table SCM1. If table ST1 is full at step S33, a step S35 is executed. At this step, the oldest symbol is removed from table ST1. The symbol sym is inserted at the beginning of table ST1 and in table SCM1 at the location of the removed symbol so as to correspond to the code of the latter. Since tables ST1 and SCM1 contain the 128 most recent events, a new event occurring in the event stream EVST may have been pushed outside these tables. In this case, the first occurrence of such an event is encoded using two bytes using a code of the event provided by table INEVT. The first bit (most significant bit) of each code encoding an event indicates whether the event is encoded with one or two bytes, and the 7 or 15 other bits are the code of the event provided by table SCM1 or INEVT. Thus at step S7 (Figure 4), if the event does not belong to the table SCM1, the event is encoded with two bytes as specified by table INEVT, the first bit of the code being equal to 1 indicating that the event is encoded with two bytes.
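Purely as an illustration, the one-byte and two-byte code layout described above may be sketched as follows. The function names `pack_code` and `unpack_code` are hypothetical, and placing the upper bits of a 15-bit code in the first byte is an assumption of this sketch; the text only specifies that the most significant bit of the code signals the two-byte form.

```python
from typing import Tuple


def pack_code(code: int, two_bytes: bool) -> bytes:
    """Serialize a code; the most significant bit flags the two-byte form."""
    if two_bytes:
        if not 0 <= code < 1 << 15:
            raise ValueError("two-byte codes carry 15 bits")
        # Assumed layout: flag bit, then the 7 upper bits, then the 8 lower bits.
        return bytes([0x80 | (code >> 8), code & 0xFF])
    if not 0 <= code < 1 << 7:
        raise ValueError("one-byte codes carry 7 bits")
    return bytes([code])


def unpack_code(data: bytes, offset: int = 0) -> Tuple[int, bool, int]:
    """Return (code, two_bytes, next_offset) for the code starting at offset."""
    first = data[offset]
    if first & 0x80:
        return ((first & 0x7F) << 8) | data[offset + 1], True, offset + 2
    return first, False, offset + 1


stream = pack_code(300, True) + pack_code(25, False)
print(stream.hex())  # 812c19
```

Because every code occupies a whole number of bytes, a reader can walk the stream with `unpack_code` by following the returned offset.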
When a new event to be processed is START_ELEMENT_eltid or ATTRIBUTE_attid (see table 2), such an event is defined using the special events ADD_NS, ADD_ENAME and ADD_ANAME.
The process performed by the encoder BCD (Figure 4) will now be described using an example. Suppose that the following XML tag sequence is to be encoded:
<a><b><b/><b><b/>... (1)
This XML sequence is processed by the parser XSXP which generates the following stream of events: START_ELT_1, START_ELT_2, END_ELT, START_ELT_2, END_ELT (2)
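For illustration, a comparable event stream can be produced with the SAX interface of the Python standard library. This is a minimal sketch under stated assumptions: the class `EventCollector` and the function `to_events` are hypothetical names, element identifiers are simply numbered in order of first appearance (whereas the identifiers of the description come from its own tables), and attributes, text and namespaces are ignored. The input is written as well-formed XML with explicit closing tags.

```python
import xml.sax
from typing import Dict, List


class EventCollector(xml.sax.ContentHandler):
    """Collect START_ELT_<id> and END_ELT events from SAX callbacks."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: Dict[str, int] = {}   # element name -> assumed identifier
        self.events: List[str] = []

    def startElement(self, name, attrs):
        # Number element names from 1 in order of first appearance (assumption).
        elt_id = self.ids.setdefault(name, len(self.ids) + 1)
        self.events.append(f"START_ELT_{elt_id}")

    def endElement(self, name):
        self.events.append("END_ELT")


def to_events(xml_text: str) -> List[str]:
    handler = EventCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


print(to_events("<a><b></b><b></b></a>"))
```

The first five events obtained match the stream (2); the sixth is the END_ELT event closing the root element.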
Besides, the symbol table ST1 after initialization has the following content:
Table 3
The events in the table ST1 are arranged so that the most recent event is memorized at the beginning of the table. The event IDLE at the beginning of the table is not an existent event but is used to separate the events inserted during and after initialization of the table. The column "code" gives the code corresponding to each symbol in table ST1 as provided by table SCM1.
In a first iteration of the encoding process, the events START_ELT_1, START_ELT_2 and END_ELT are loaded into the buffer Bevt at step S1. The steps S2, S3, S14, S15, S10, S11, S6-S9 are successively executed by the encoder BCD. At step S7, the first event START_ELT_1 encoded with the byte 25 is inserted into the binary stream provided by the encoder BCD. At step S8, the event START_ELT_1 is moved up at the beginning of table ST1 as follows:
Table 4
During execution of the procedure 20 (step S8) only the symbol START_ELT_1 is inserted at the beginning of table ST1 since the variables evt-1 and evt-2 of process 20 are null.
In a second iteration of the encoding process, the buffer Bevt is loaded at step S1 with a next event START_ELT_2 of the event stream, so that the buffer contains the events START_ELT_2, END_ELT and START_ELT_2. The encoder BCD then executes again steps S2, S3, S14, S15, S10, S11, S6-S9. At step S7, the second event START_ELT_2 encoded with the byte 27 is inserted into the binary stream provided by the encoder BCD. During execution of the procedure 20 (step S8) the symbols START_ELT_1//START_ELT_2 and START_ELT_2 are successively inserted at the beginning of table ST1 (the variable evt-2 is null), as shown in the table below:
Table 5
Before inserting the symbol START_ELT_1//START_ELT_2, table ST1 is full. Therefore, when inserting this symbol, the symbol ATTR_52 at the end of table ST1 is removed and the code 127 is attributed to the new symbol START_ELT_1//START_ELT_2. Then the symbol START_ELT_2 is moved up at the beginning of table ST1.
In a third iteration of the encoding process, the buffer Bevt is loaded at step S1 with a next event END_ELT of the event stream, so that the buffer contains the events END_ELT, START_ELT_2 and END_ELT. The encoder BCD then executes again steps S2, S3, S14, S15, S10, S11, S6-S9. At step S7, the third event END_ELT encoded with the byte 4 is inserted into the binary stream BST provided by the encoder BCD. During execution of the procedure 20 (step S8) the symbols START_ELT_1//START_ELT_2//END_ELT, START_ELT_2//END_ELT and END_ELT are successively inserted at the beginning of table ST1, as shown in the table below:
Table 6
Before inserting the symbol START_ELT_1//START_ELT_2//END_ELT, table ST1 is full. Therefore, when inserting this symbol, the symbol START_ELT_51 at the end of table ST1 is removed and the code 126 is attributed to the new symbol START_ELT_1//START_ELT_2//END_ELT. Then the symbol START_ELT_2//END_ELT is inserted at the beginning of table ST1 and receives the code 125 of the last symbol which is removed from the table. Then the symbol END_ELT is moved up at the beginning of table ST1.
In a fourth iteration of the encoding process, the buffer Bevt is loaded at step S1 with a next event of the event stream, so that the buffer contains the events START_ELT_2 and END_ELT in the first and second positions of the buffer. The encoder BCD then executes again the steps S2, S3, S14, S15 and S10, S11. Since the symbol START_ELT_2//END_ELT belongs to table ST1, the encoder BCD further executes steps S12, S13 and S8, S9. As a result, the sequence of the two consecutive events START_ELT_2 and END_ELT is encoded with the single byte code 125, which is inserted into the binary stream provided by the encoder BCD at step S13. At step S8, the symbols START_ELT_2 and END_ELT are successively moved up at the beginning of table ST1 and the corresponding sequences of two and three consecutive events inserted in tables ST1 and SCM1. In other words, the symbols START_ELT_2//END_ELT//START_ELT_2, END_ELT//START_ELT_2, START_ELT_2, END_ELT//START_ELT_2//END_ELT, START_ELT_2//END_ELT and END_ELT are successively inserted or moved up at the beginning of table ST1 and inserted in table SCM1 if necessary, as shown in the table below:
Table 7
It results that the XML sequence <a><b><b/><b><b/> is encoded by the following sequence of byte codes:
25/27/4/125 (3)
Figure 7 represents a process performed by the binary decoder BDCD. The process of Figure 7 comprises steps S41-S45. This process also uses a symbol table ST2 and a symbol code map table SCM2. The table ST2 is previously initialized and contains all events listed above in Tables 1 and 2. Tables ST2 and SCM2 contain a limited number of events, equal to 128 for example.
At step S41, the decoder BDCD reads a next code in the binary stream BST to be decoded. If the end of the binary stream BST is not reached at step S42, the decoder executes step S43 where the code read is translated into a symbol, i.e. a sequence of one or more concatenated events, using the symbol code map table SCM2. At step S44, the symbol is translated into events. At step S45, tables SCM2 and ST2 are updated by executing the procedure 20 with each event obtained after execution of step S44.
The decoding process performed by the decoder BDCD will now be described using the above example used for illustrating the encoding process. In this example, the decoder has to decode the byte code sequence (3).
In a first iteration, the decoder reads the code 25 in the binary stream. In table SCM2 corresponding to initial table ST2 as shown in table 3, code 25 corresponds to the symbol START_ELT_1 which contains a single event. Then the decoder inserts the event thus obtained into the event stream it provides. The table ST2 is then updated by moving up the event START_ELT_1 at the beginning of the table. At the end of the first iteration, table ST2 contains the symbols of table 4.
In a second iteration, the decoder BDCD reads the next code 27 in the binary stream. Code 27 corresponds in table SCM2 to the symbol START_ELT_2 which contains a single event. Then the decoder inserts the event thus obtained into the event stream it provides. The tables ST2 and SCM2 are updated by adding the symbol START_ELT_1//START_ELT_2 associated with the code 127, this symbol being put at the beginning of table ST2. Table ST2 is further updated by moving up the event START_ELT_2 at the beginning of the table. At the end of the second iteration, table ST2 contains the symbols of table 5.
In a third iteration, the decoder BDCD reads the next code 4 in the binary stream. Code 4 corresponds in table SCM2 to the symbol END_ELT which contains a single event. Then the decoder inserts the event thus obtained into the event stream EVST it provides. The tables ST2 and SCM2 are updated by adding the symbol START_ELT_1//START_ELT_2//END_ELT associated with the code 126, and the symbol START_ELT_2//END_ELT associated with the code 125, these symbols being put at the beginning of table ST2. Table ST2 is further updated by moving up the event END_ELT at the beginning of the table. At the end of the third iteration, table ST2 contains the symbols of table 6.
In a fourth iteration, the decoder BDCD reads the next code 125 in the binary stream. Code 125 corresponds in table SCM2 to the symbol START_ELT_2//END_ELT which contains two events. Then the decoder inserts the events thus obtained into the event stream EVST it provides. The tables ST2 and SCM2 are updated so that the events START_ELT_2 and END_ELT are successively moved up at the beginning of table ST2 and the corresponding sequences of two and three consecutive events inserted in tables ST2 and SCM2. At the end of the fourth iteration, table ST2 contains the symbols of table 7.
Thus the decoder BDCD provides the event stream (2) which is then translated by the converter SXXP into the XML tag sequence (1).
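The encoding and decoding processes described above can be sketched together in Python to show that both sides keep their tables synchronized. This is a simplified sketch under several assumptions, not the implementation of the description: the class `SymbolTable` and the functions `encode` and `decode` are hypothetical names; the initial table holds only three events rather than the content of Table 3, so the codes obtained differ from the sequence (3); event parameters are omitted; and an event absent from the table is emitted as its own name, which stands in for the two-byte code of table INEVT.

```python
from typing import Dict, List, Optional, Tuple, Union

Symbol = Tuple[str, ...]
Code = Union[int, str]  # int: table code; str: escaped event (stand-in for INEVT)


class SymbolTable:
    """Bounded table pair: recency order (ST) plus symbol/code maps (SCM)."""

    def __init__(self, initial_events: List[str], capacity: int = 128) -> None:
        self.capacity = capacity
        self.order: List[Symbol] = []            # ST: newest symbol first
        self.code_of: Dict[Symbol, int] = {}     # SCM: symbol -> code
        self.symbol_of: Dict[int, Symbol] = {}   # SCM: code -> symbol
        self.prev1: Optional[str] = None         # evt-1
        self.prev2: Optional[str] = None         # evt-2
        for evt in initial_events:
            self.insert((evt,))

    def insert(self, sym: Symbol) -> None:
        """Process 30: move to front, append a new code, or reuse the oldest code."""
        if sym in self.code_of:                   # step S32
            self.order.remove(sym)
        else:
            if len(self.order) < self.capacity:   # step S34
                code = len(self.order)
            else:                                 # step S35
                code = self.code_of.pop(self.order.pop())
            self.code_of[sym] = code
            self.symbol_of[code] = sym
        self.order.insert(0, sym)

    def update(self, sym: Symbol) -> None:
        """Process 20: insert the 3-, 2- and 1-event sequences ending at each event."""
        for evt in sym:
            if self.prev2 is not None:
                self.insert((self.prev2, self.prev1, evt))
            if self.prev1 is not None:
                self.insert((self.prev1, evt))
            self.insert((evt,))
            self.prev2, self.prev1 = self.prev1, evt


def encode(events: List[str], table: SymbolTable) -> List[Code]:
    out: List[Code] = []
    pos = 0
    while pos < len(events):
        sym: Symbol = (events[pos],)
        for length in (3, 2):                     # longest known sequence first
            cand = tuple(events[pos:pos + length])
            if len(cand) == length and cand in table.code_of:
                sym = cand
                break
        out.append(table.code_of.get(sym, sym[0]))
        table.update(sym)
        pos += len(sym)
    return out


def decode(codes: List[Code], table: SymbolTable) -> List[str]:
    events: List[str] = []
    for code in codes:
        sym = table.symbol_of[code] if isinstance(code, int) else (code,)
        events.extend(sym)
        table.update(sym)
    return events


initial = ["END_ELT", "START_ELT_1", "START_ELT_2"]
stream = ["START_ELT_1", "START_ELT_2", "END_ELT", "START_ELT_2", "END_ELT"]
codes = encode(stream, SymbolTable(initial))
print(codes)                                      # [1, 2, 0, 5]
print(decode(codes, SymbolTable(initial)) == stream)
```

As in the worked example, the five events are carried by four codes, the last code standing for the pair START_ELT_2//END_ELT learned during the third iteration. The decoder never receives the tables; it rebuilds them by applying the same update after each decoded symbol.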
Simulations have been conducted on a collection of 228 MPEG-7 and MPEG-21 test files with three different setups. An external tag name table containing the namespaces, element names and attribute names dealt with in these simulations can be used or not. A ZLIB post-compression using the tag name table as a bootstrap dictionary is done on the output event streams EVST. Moreover, comments, white spaces between elements and prefix mappings can be transmitted or not.
In a first setup, a tag name table is used and comments, white spaces and prefix mappings are omitted. An average compression ratio of 19.15 is observed with respect to raw input XML files and a ratio of 6.58 is observed when compared to ZLIB-compressed XML input files. In a second setup, a tag name table is used and comments, white spaces and prefix mappings are kept. An average compression ratio of 4.01 is observed with respect to raw input XML files and a ratio of 1.30 is observed when compared to ZLIB-compressed XML input files.
In a third setup, no tag name table is used and comments, white spaces and prefix mappings are omitted. An average compression ratio of 4 is observed with respect to raw input XML files and a ratio of 1.30 is observed when compared to ZLIB-compressed XML input files.
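The ZLIB post-compression with a bootstrap dictionary mentioned for these simulations can be illustrated with the preset-dictionary support of Python's `zlib` module. This is only a sketch: the function names and the sample bytes are hypothetical and are not the data used in the simulations. The same dictionary must be supplied on both sides.

```python
import zlib


def compress_with_dictionary(data: bytes, dictionary: bytes) -> bytes:
    """Deflate data after priming the compressor with a preset dictionary."""
    comp = zlib.compressobj(zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dictionary(blob: bytes, dictionary: bytes) -> bytes:
    """Inflate data produced with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical stand-ins for a tag name table and an encoder output.
tag_names = b"Mpeg7 Description MultimediaContent Video MediaLocator"
payload = b"Mpeg7 Description MultimediaContent Video Video MediaLocator"

blob = compress_with_dictionary(payload, tag_names)
print(decompress_with_dictionary(blob, tag_names) == payload)
```

Priming the compressor lets names already present in the dictionary be referenced from the very start of the stream, which matters most for short documents.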
In the light of the examples described above, it will be clear to those skilled in the art that the method and device according to the invention are susceptible to several variations of implementations or applications. In particular, the invention is not limited to XML language or derived XML languages such as HTML or SVG. The invention more generally applies to all document structure languages.
Other application program interfaces (API) than SAX such as StAX (Streaming API for XML) and DOM (Document Object Model) can be used to generate an event stream. In the case of DOM, the structure of the document is represented as a tree comprising nodes. The events of the event stream correspond to nodes in the tree which are successively considered according to a predefined path in the tree.
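As an illustration of the DOM case, the following Python sketch walks a tree built by `xml.dom.minidom` in document order and emits one start event and one end event per element node. The function names and the event labels are assumptions of this sketch, the depth-first path is only one possible predefined path, and text, comment and attribute nodes are skipped for brevity.

```python
from typing import Iterator, List
from xml.dom import Node, minidom


def walk(node: Node) -> Iterator[str]:
    """Depth-first, document-order traversal emitting element events."""
    if node.nodeType == Node.ELEMENT_NODE:
        yield f"START {node.tagName}"
        for child in node.childNodes:
            yield from walk(child)
        yield "END"


def dom_events(xml_text: str) -> List[str]:
    document = minidom.parseString(xml_text)
    return list(walk(document.documentElement))


print(dom_events("<a><b/><b/></a>"))
```

The resulting list has the same shape as a stream obtained from a SAX parser, so the encoder can remain unaware of which interface produced the events.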
The invention is not limited to the algorithm described above for generating sequences of one to three events associated with a same code. Sequences of two events or more than three events can thus be generated, so that a single code is used to encode several events. In addition, other algorithms for grouping events together into sequences can be used. For example, it can be provided that a sequence of consecutive events is inserted into tables ST and SCM only after the second occurrence of the event sequence, so as to prevent event sequences that have only one occurrence in the event stream EVST from being inserted into the tables. In this manner, the contents of tables ST and SCM do not change too quickly. Some events appear rarely in an event stream generated from a structured document. Thus it can be provided to prevent generation of event sequences including such events.
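The second-occurrence variant mentioned above could, for example, be obtained by counting candidate sequences before they are admitted into the tables. The class `SecondOccurrenceGate` below is a hypothetical sketch of such a filter, assuming that single events are always admitted and that the counters are unbounded; a practical embodiment would likely bound or age them.

```python
from collections import Counter
from typing import Tuple

Symbol = Tuple[str, ...]


class SecondOccurrenceGate:
    """Admit a multi-event sequence only once it has been seen twice."""

    def __init__(self) -> None:
        self.seen: Counter = Counter()

    def should_insert(self, sym: Symbol) -> bool:
        if len(sym) == 1:
            return True            # single events are always admitted (assumption)
        self.seen[sym] += 1
        return self.seen[sym] >= 2


gate = SecondOccurrenceGate()
pair = ("START_ELT_2", "END_ELT")
print(gate.should_insert(pair))    # False: first occurrence
print(gate.should_insert(pair))    # True: second occurrence
```

The encoder and the decoder would have to apply the same filter, since both rebuild their tables from the decoded events alone.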

Claims

1. A compression method of compressing a structured document (DOC) having a tree-like structure comprising structured elements (4) nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element (Val) which is a set of at least one structured element or unstructured element, characterized in that it comprises steps of: converting the structured document (DOC) into a stream of events (EVST) comprising events corresponding to structuring elements of the structured document, and encoding the event stream by generating a binary stream (BST) comprising byte-aligned codes each encoding an event or at least a second occurrence of a sequence of consecutive events occurring in the event stream.
2. The compression method according to claim 1, comprising a step of applying a compression algorithm (ZIP) to the binary stream (BST) to obtain a compressed binary stream (BDOC).
3. The compression method according to claim 2, wherein the compression algorithm (ZIP) is ZLIB.
4. The compression method according to any one of claims 1 to 3, wherein the encoding step comprises for an event of the event stream (EVST): attributing one byte-aligned code to an event sequence including at least two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
5. The compression method according to any one of claims 1 to 4, wherein the encoding step comprises for an event of the event stream (EVST): attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
6. The compression method according to any one of claims 1 to 5, wherein a correspondence table (SCM1) establishes a link between each byte-aligned code of the binary stream (BST) and an event or an event sequence, said correspondence table being of limited size.
7. The compression method according to claim 6, wherein when the correspondence table (SCM1) is full, a new event or event sequence is inserted in the correspondence table by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
8. The compression method according to any one of claims 1 to 7, wherein the byte-aligned codes are one byte long.
9. A decompression method of decompressing a binary stream (BST) resulting from compression of an original structured document (DOC), the original structured document (DOC) having a tree-like structure comprising structured elements (4) nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element (Val) which is a set of at least one structured element or unstructured element, characterized in that the binary stream (BST) comprises a succession of byte-aligned codes encoding events of an event stream (EVST), said events corresponding to structuring elements of the structured document, said decompression method comprising steps of: decoding the binary stream (BST) by generating said event stream which comprises an event or at least a second occurrence of an event sequence of consecutive events for each byte-aligned code of the binary stream, and converting each event of the event stream into structuring elements so as to provide the original structured document (DOC).
10. The decompression method according to claim 9, comprising a previous step of applying a decompression algorithm (DZIP) to a compressed binary stream (BDOC) to obtain the binary stream (BST).
11. The decompression method according to claim 10, wherein the decompression algorithm (DZIP) is ZLIB.
12. The decompression method according to any one of claims 9 to 11, wherein the decoding step comprises for an event of the event stream (EVST): attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event.
13. The decompression method according to any one of claims 9 to 12, wherein the decoding step comprises for an event of the event stream (EVST): attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
14. The decompression method according to any one of claims 9 to 13, wherein a correspondence table (SCM2) establishes a link between each byte-aligned code of the binary stream (BST) and an event or an event sequence, said correspondence table being of limited size.
15. The decompression method according to claim 14, wherein when the correspondence table (SCM2) is full, a new event or event sequence is inserted in the correspondence table by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
16. The decompression method according to any one of claims 9 to 15, wherein the byte-aligned codes are one byte long.
17. A compression device for compressing a structured document (DOC) having a tree-like structure comprising structured elements (4) nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element (Val) which is a set of at least one structured element or unstructured element, characterized in that it comprises: a converter (XSXP) for converting the structured document (DOC) into a stream of events (EVST) comprising events corresponding to structuring elements of the structured document, and an encoder (BCD) for encoding the event stream by generating a binary stream (BST) comprising byte-aligned codes each encoding an event or at least a second occurrence of a sequence of consecutive events occurring in the event stream.
18. The compression device according to claim 17, comprising a compression module (ZIP) for applying a compression algorithm to the binary stream (BST) to obtain a compressed binary stream (BDOC).
19. The compression device according to claim 18, wherein the compression algorithm is ZLIB.
20. The compression device according to claim 17 or 19, wherein the encoder (BCD) is configured to process an event of the event stream (EVST) by: attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event.
21. The compression device according to claim 17 or 20, wherein the encoder (BCD) is configured to process an event of the event stream (EVST) by: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
22. The compression device according to any one of claims 17 to 21, wherein a correspondence table (SCM1) establishes a link between each byte-aligned code of the binary stream (BST) and an event or an event sequence, said correspondence table being of limited size.
23. The compression device according to claim 22, wherein the encoder (BCD) is configured to insert a new event or event sequence in the correspondence table (SCM1) when it is full, by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
24. The compression device according to any one of claims 17 to 23, wherein the byte-aligned codes are one byte long.
25. A decompression device for decompressing a binary stream (BST) resulting from compression of an original structured document (DOC), the original structured document (DOC) having a tree-like structure comprising information elements (4) nested in each other, each said structured elements comprising structuring elements defining the structure of the element and delimiting at least one value element (Val) which is a set of at least one structured element or unstructured element, characterized in that the binary stream (BST) comprises a succession of byte-aligned codes encoding events of an event stream (EVST), said events corresponding to structuring elements of the structured document, said decompression device comprising: a decoder (BDCD) for decoding the binary stream (BST) by generating said event stream which comprises an event or at least a second occurrence of an event sequence of consecutive events for each byte-aligned code of the binary stream, and a converter (SXXP) for converting each event of the event stream into structuring elements so as to provide the original structured document (DOC).
26. The decompression device according to claim 25, comprising a decompression module (DZIP) for applying a decompression algorithm (DZIP) to a compressed binary stream (BDOC) to obtain the binary stream (BST).
27. The decompression device according to claim 26, wherein the decompression algorithm (DZIP) is ZLIB.
28. The decompression device according to any one of claims 25 to 27, wherein the decoder (BDCD) is configured to process an event of the event stream (EVST) by: attributing one byte-aligned code to an event sequence including at least two consecutive events in the event stream and ending with said event, and attributing one byte-aligned code to said event.
29. The decompression device according to any one of claims 25 to 28, wherein the decoder (BDCD) is configured to process an event of the event stream (EVST) by: attributing one byte-aligned code to an event sequence including three consecutive events occurring in the event stream and ending with said event, attributing one byte-aligned code to an event sequence including two consecutive events occurring in the event stream and ending with said event, and attributing one byte-aligned code to said event.
30. The decompression device according to claim 29, wherein a correspondence table (SCM2) establishes a link between each byte-aligned code of the binary stream (BST) and an event or an event sequence, said correspondence table being of limited size.
31. The decompression device according to claim 30, wherein the decoder (BDCD) is configured to insert a new event or event sequence in the correspondence table (SCM2) when it is full, by replacing an oldest event or event sequence by the new event or event sequence, so that the byte-aligned code of the oldest event or event sequence is attributed to the new event or event sequence.
32. The decompression device according to any one of claims 25 to 31, wherein the byte-aligned codes are one byte long.
EP07734998A 2006-07-12 2007-07-06 Methods and devices for compressing structured documents Withdrawn EP2039009A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US80713106P 2006-07-12 2006-07-12
PCT/IB2007/001992 WO2008010059A1 (en) 2006-07-12 2007-07-06 Methods and devices for compressing structured documents

Publications (1)

Publication Number Publication Date
EP2039009A1 (en) 2009-03-25

Family

ID=38578679

Family Applications (1)

Application Number Title Priority Date Filing Date
EP07734998A Withdrawn EP2039009A1 (en) 2006-07-12 2007-07-06 Methods and devices for compressing structured documents

Country Status (3)

Country Link
EP (1) EP2039009A1 (en)
JP (1) JP2009543243A (en)
WO (1) WO2008010059A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186611B (en) * 2011-12-30 2016-03-30 北大方正集团有限公司 A kind of compression, decompress(ion) and inquiry document method, device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO2008010059A1 *

Also Published As

Publication number Publication date
JP2009543243A (en) 2009-12-03
WO2008010059A1 (en) 2008-01-24


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20090128

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK RS

RIN1 Information on inventor provided before grant (corrected)

Inventor name: THIENOT, CEDRIC

Inventor name: DE CUETOS, PHILIPPE

Inventor name: BERJON, ROBIN

Inventor name: PAU, GREGOIRE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20110201