WO2008010059A1 - Methods and devices for compressing structured documents - Google Patents


Info

Publication number
WO2008010059A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
stream
byte
sequence
aligned
Prior art date
Application number
PCT/IB2007/001992
Other languages
English (en)
Inventor
Grégoire Pau
Robin Berjon
Philippe De Cuetos
Cédric Thienot
Original Assignee
Expway
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2006-07-12
Filing date
2007-07-06
Publication date
2008-01-24
Application filed by Expway
Priority to EP07734998A (published as EP2039009A1)
Priority to JP2009518997A (published as JP2009543243A)
Publication of WO2008010059A1

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/14 Tree-structured documents
    • G06F40/143 Markup, e.g. Standard Generalized Markup Language [SGML] or Document Type Definition [DTD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/149 Adaptation of the text data for streaming purposes, e.g. Efficient XML Interchange [EXI] format

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a method for compressing a structured document (DOC) having a tree structure of nested structured elements, each of said structured elements comprising structuring elements that define its structure and delimit at least one value element consisting of a set of at least one structured element or one unstructured element. The method comprises the steps of converting the structured document (DOC) into an event stream (EVST) comprising events corresponding to the structuring elements of the structured document, and encoding the event stream by generating a binary stream (BST) comprising byte-aligned codes, each encoding either an event or at least a second occurrence of a sequence of consecutive events appearing in the event stream. The invention makes it possible to compress an XML document without relying on an XML schema for the document.
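The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the encoding claimed in the patent: the event names (`SE`, `AT`, `CH`, `EE`), the one-byte code types `LITERAL` and `REPEAT`, and the LZ77-style back-reference layout are all assumptions chosen for illustration. The event table is returned alongside the byte stream rather than serialized, and the sketch is limited to fewer than 256 distinct events.

```python
import xml.etree.ElementTree as ET

LITERAL, REPEAT = 0x00, 0x01  # hypothetical one-byte code types


def to_events(xml_text):
    """Step 1: flatten the document tree into a list of structural events."""
    events = []

    def walk(elem):
        events.append(("SE", elem.tag))            # start of element
        for name, value in elem.attrib.items():
            events.append(("AT", name, value))     # attribute
        if elem.text and elem.text.strip():
            events.append(("CH", elem.text.strip()))  # character data
        for child in elem:
            walk(child)
            if child.tail and child.tail.strip():
                events.append(("CH", child.tail.strip()))
        events.append(("EE",))                     # end of element

    walk(ET.fromstring(xml_text))
    return events


def encode(events, min_len=2):
    """Step 2: emit byte-aligned codes, each a single event or a repeat."""
    table, ids = [], []
    for ev in events:
        if ev not in table:
            table.append(ev)       # distinct events in order of first use
        ids.append(table.index(ev))
    assert len(table) < 256, "sketch limit: one byte per event id"
    out = bytearray()
    i = 0
    while i < len(ids):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - 255), i):   # search earlier positions
            k = 0
            while i + k < len(ids) and k < 255 and ids[j + k] == ids[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            out += bytes([REPEAT, best_dist, best_len])  # repeated sequence
            i += best_len
        else:
            out += bytes([LITERAL, ids[i]])              # single event
            i += 1
    return table, bytes(out)


def decode(table, stream):
    """Inverse of encode: rebuild the event list from the byte codes."""
    ids, pos = [], 0
    while pos < len(stream):
        if stream[pos] == LITERAL:
            ids.append(stream[pos + 1])
            pos += 2
        else:
            dist, length = stream[pos + 1], stream[pos + 2]
            for _ in range(length):
                ids.append(ids[-dist])   # copy one id at a time (overlap-safe)
            pos += 3
    return [table[i] for i in ids]
```

No schema is consulted at any point: the event table is built from the document itself, which is the property the abstract highlights. Documents with many identically structured siblings benefit most, since each repeated run of events collapses into one three-byte code.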
PCT/IB2007/001992 2006-07-12 2007-07-06 Methods and devices for compressing structured documents WO2008010059A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP07734998A EP2039009A1 (fr) 2006-07-12 2007-07-06 Methods and devices for compressing structured documents
JP2009518997A JP2009543243A (ja) 2006-07-12 2007-07-06 Method and apparatus for compressing structured documents

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US80713106P 2006-07-12 2006-07-12
US60/807,131 2006-07-12

Publications (1)

Publication Number Publication Date
WO2008010059A1 (fr) 2008-01-24

Family

ID=38578679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2007/001992 WO2008010059A1 (fr) 2006-07-12 2007-07-06 Methods and devices for compressing structured documents

Country Status (3)

Country Link
EP (1) EP2039009A1 (fr)
JP (1) JP2009543243A (fr)
WO (1) WO2008010059A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186611B (zh) * 2011-12-30 2016-03-30 北大方正集团有限公司 (Peking University Founder Group Co., Ltd.) Method and device for compressing, decompressing and querying documents

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENEY, J.: "Compressing XML with multiplexed hierarchical PPM models", DATA COMPRESSION CONFERENCE, 2001. PROCEEDINGS. DCC 2001., 29 March 2001 (2001-03-29), XP002455711 *
JAMES CHENEY: "SAX event encoding", INTERNET ARTICLE, 24 November 2000 (2000-11-24), XP002455712, Retrieved from the Internet <URL:http://xmlppm.sourceforge.net/paper/node5.html> [retrieved on 20071018] *
ÖZDEN M: "A Binary Encoding for Efficient XML Processing", INTERNET CITATION, 17 December 2002 (2002-12-17), XP002386926, Retrieved from the Internet <URL:http://www.ti5.tu-harburg.de/publication/2002/> [retrieved on 20060623] *

Also Published As

Publication number Publication date
EP2039009A1 (fr) 2009-03-25
JP2009543243A (ja) 2009-12-03

Similar Documents

Publication Publication Date Title
US20080294980A1 (en) Methods and Devices for Compressing and Decompressing Structured Documents
KR100614677B1 (ko) Method for compressing/decompressing structured documents
US7707154B2 (en) Method and devices for encoding/decoding structured documents, particularly XML documents
US7565452B2 (en) System for storing and rendering multimedia data
JP4373721B2 (ja) Method and system for encoding markup language documents
US20070143664A1 (en) A compressed schema representation object and method for metadata processing
US7275060B2 (en) Method for dividing structured documents into several parts
US20040013307A1 (en) Method for compressing/decompressing structure documents
CN102214170A (zh) Method and system for compressing and decompressing XML data
US20040111677A1 (en) Efficient means for creating MPEG-4 intermedia format from MPEG-4 textual representation
US7627586B2 (en) Method for encoding a structured document
JP2006517309A (ja) Efficient means for creating MPEG-4 Textual Representation from MPEG-4 Intermedia Format
US7735001B2 (en) Method and system for decoding encoded documents
WO2019018030A1 (fr) Structured record compression and retrieval
US7571152B2 (en) Method for compressing and decompressing structured documents
EP2039009A1 (fr) Methods and devices for compressing structured documents
KR20050023411A (ko) Method and device for encoding/decoding structured documents, particularly XML documents
JP2004342029A (ja) Structured document compression method and device
EP1199893A1 (fr) Method for structuring a data stream of binary multimedia descriptions and associated parsing method
EP2161667A1 (fr) Method and device for encoding elements
JP2005276193A (ja) Schema and style sheets for DIBR data

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 2007734998

Country of ref document: EP

121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 07734998

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE WIPO information: entry into national phase

Ref document number: 2009518997

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: RU