WO2008010059A1 - Methods and devices for compressing structured documents - Google Patents
- Publication number
- WO2008010059A1 (international application PCT/IB2007/001992)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- event
- stream
- byte
- sequence
- aligned
Classifications
-
- H—ELECTRICITY
- H03—ELECTRONIC CIRCUITRY
- H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
- H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
- H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/14—Tree-structured documents
- G06F40/143—Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/149—Adaptation of the text data for streaming purposes, e.g. Efficient XML Interchange [EXI] format
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention relates to a method for compressing a structured document (DOC) having a tree structure made of nested elements. Each structured element comprises structuring elements that define its structure and delimit at least one value element, which is either a set of one or more structured elements or an unstructured element. The method comprises the steps of converting the structured document (DOC) into an event stream (EVST) containing events that correspond to the structuring elements of the document, and encoding the event stream by generating a binary stream (BST) made of byte-aligned codes, each of which encodes either a single event or at least the second occurrence of a sequence of consecutive events appearing in the event stream. The invention makes it possible to compress an XML document without using an XML schema for that document.
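The abstract names two steps: flatten the document into an event stream, then emit byte-aligned codes where a code stands either for one event or for a repeat of an earlier run of consecutive events. The sketch below illustrates that idea in Python under several assumptions that are not taken from the patent: the event names SE/AT/CH/EE are borrowed from W3C EXI notation, integers are written as LEB128 varints, `OP_EVENT`/`OP_COPY` and `MIN_MATCH` are hypothetical, and repeats are found with a simple greedy LZ77-style search. Note that it needs no schema; everything is derived from the document itself.

```python
import xml.etree.ElementTree as ET

# Assumed event kinds (EXI-style names, not the patent's notation):
# ("SE", tag) start element, ("AT", name, value) attribute,
# ("CH", text) character data, ("EE",) end element.
KINDS = ("SE", "AT", "CH", "EE")
ARITY = {"SE": 1, "AT": 2, "CH": 1, "EE": 0}
OP_EVENT, OP_COPY = 0x00, 0x01  # hypothetical one-byte opcodes
MIN_MATCH = 2                   # hypothetical: shortest run worth a copy code

def to_events(xml_text):
    """Flatten XML into an event stream; whitespace-only text is dropped."""
    events = []
    def walk(elem):
        events.append(("SE", elem.tag))
        for name, value in elem.attrib.items():
            events.append(("AT", name, value))
        if elem.text and elem.text.strip():
            events.append(("CH", elem.text.strip()))
        for child in elem:
            walk(child)
            if child.tail and child.tail.strip():
                events.append(("CH", child.tail.strip()))
        events.append(("EE",))
    walk(ET.fromstring(xml_text))
    return events

def put_uint(out, n):
    """Append n as an unsigned LEB128 varint, so every code is whole bytes."""
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | 0x80 if n else byte)
        if not n:
            return

def get_uint(data, pos):
    """Read an unsigned LEB128 varint; return (value, next position)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos

def encode(events):
    """Emit byte-aligned codes: one literal event, or a copy of an earlier run."""
    out, i = bytearray(), 0
    while i < len(events):
        best_len = best_dist = 0
        for start in range(i):  # greedy search over all earlier positions
            n = 0
            while i + n < len(events) and events[start + n] == events[i + n]:
                n += 1
            if n > best_len:
                best_len, best_dist = n, i - start
        if best_len >= MIN_MATCH:  # repeated sequence: (distance, length)
            out.append(OP_COPY)
            put_uint(out, best_dist)
            put_uint(out, best_len)
            i += best_len
        else:  # single event: opcode, kind byte, length-prefixed UTF-8 fields
            kind, *fields = events[i]
            out.extend((OP_EVENT, KINDS.index(kind)))
            for field in fields:
                raw = field.encode("utf-8")
                put_uint(out, len(raw))
                out.extend(raw)
            i += 1
    return bytes(out)

def decode(data):
    """Invert encode(), rebuilding the event stream."""
    events, pos = [], 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op == OP_COPY:
            dist, pos = get_uint(data, pos)
            length, pos = get_uint(data, pos)
            for _ in range(length):  # one at a time, so overlapping runs work
                events.append(events[-dist])
        else:
            kind, fields = KINDS[data[pos]], []
            pos += 1
            for _ in range(ARITY[kind]):
                size, pos = get_uint(data, pos)
                fields.append(data[pos:pos + size].decode("utf-8"))
                pos += size
            events.append((kind, *fields))
    return events

doc = "<list><item id='1'>a</item><item id='1'>a</item></list>"
packed = encode(to_events(doc))
assert decode(packed) == to_events(doc)
print(len(doc), "->", len(packed), "bytes")  # 55 -> 32 bytes
```

In the example, the second `<item>` subtree (four events) collapses into a single three-byte copy code. The quadratic search is only for readability; a real encoder would index earlier positions (for example with a hash of the next few events) instead of scanning them all.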
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07734998A EP2039009A1 (fr) | 2006-07-12 | 2007-07-06 | Methods and devices for compressing structured documents |
JP2009518997A JP2009543243A (ja) | 2006-07-12 | 2007-07-06 | Method and apparatus for compressing structured documents |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US80713106P | 2006-07-12 | 2006-07-12 | |
US60/807,131 | 2006-07-12 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008010059A1 (fr) | 2008-01-24 |
Family ID: 38578679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2007/001992 WO2008010059A1 (fr) | Methods and devices for compressing structured documents | 2006-07-12 | 2007-07-06 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP2039009A1 (fr) |
JP (1) | JP2009543243A (fr) |
WO (1) | WO2008010059A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103186611B (zh) * | 2011-12-30 | 2016-03-30 | 北大方正集团有限公司 (Peking University Founder Group Co., Ltd.) | Method and device for compressing, decompressing and querying documents |
2007
- 2007-07-06 JP: application JP2009518997A, publication JP2009543243A (ja), status: active, pending
- 2007-07-06 WO: application PCT/IB2007/001992, publication WO2008010059A1 (fr), status: active, application filing
- 2007-07-06 EP: application EP07734998A, publication EP2039009A1 (fr), status: not active, withdrawn
Non-Patent Citations (3)
Title |
---|
Cheney, J.: "Compressing XML with multiplexed hierarchical PPM models", Data Compression Conference, 2001. Proceedings. DCC 2001, 29 March 2001 (2001-03-29), XP002455711 * |
Cheney, J.: "SAX event encoding", Internet article, 24 November 2000 (2000-11-24), XP002455712, retrieved from the Internet <URL:http://xmlppm.sourceforge.net/paper/node5.html> [retrieved on 2007-10-18] * |
Özden, M.: "A Binary Encoding for Efficient XML Processing", Internet citation, 17 December 2002 (2002-12-17), XP002386926, retrieved from the Internet <URL:http://www.ti5.tu-harburg.de/publication/2002/> [retrieved on 2006-06-23] * |
Also Published As
Publication number | Publication date |
---|---|
EP2039009A1 (fr) | 2009-03-25 |
JP2009543243A (ja) | 2009-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20080294980A1 (en) | | Methods and Devices for Compressing and Decompressing Structured Documents |
KR100614677B1 (ko) | | Method for compressing/decompressing structured documents |
US7707154B2 (en) | | Method and devices for encoding/decoding structured documents, particularly XML documents |
US7565452B2 (en) | | System for storing and rendering multimedia data |
JP4373721B2 (ja) | | Method and system for encoding markup language documents |
US20070143664A1 (en) | | A compressed schema representation object and method for metadata processing |
US7275060B2 (en) | | Method for dividing structured documents into several parts |
US20040013307A1 (en) | | Method for compressing/decompressing structure documents |
CN102214170A (zh) | | XML data compression and decompression method and system |
US20040111677A1 (en) | | Efficient means for creating MPEG-4 intermedia format from MPEG-4 textual representation |
US7627586B2 (en) | | Method for encoding a structured document |
JP2006517309A (ja) | | Efficient means for creating MPEG-4 Textual Representation from MPEG-4 Intermedia Format |
US7735001B2 (en) | | Method and system for decoding encoded documents |
WO2019018030A1 (fr) | | Compression and retrieval of structured records |
US7571152B2 (en) | | Method for compressing and decompressing structured documents |
EP2039009A1 (fr) | | Methods and devices for compressing structured documents |
KR20050023411A (ko) | | Method and apparatus for encoding/decoding structured documents, in particular XML documents |
JP2004342029A (ja) | | Structured document compression method and apparatus |
EP1199893A1 (fr) | | Method for structuring a data stream of binary multimedia descriptions and associated parsing method |
EP2161667A1 (fr) | | Method and device for encoding elements |
JP2005276193A (ja) | | Schema and stylesheet for DIBR data |
Legal Events
Code | Title | Description |
---|---|---|
WWE | WIPO information: entry into national phase | Ref document number: 2007734998; country of ref document: EP |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 07734998; country of ref document: EP; kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
WWE | WIPO information: entry into national phase | Ref document number: 2009518997; country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: RU |