CN103605730A - XML (extensible markup language) compressing method and device based on flexible-length identification codes - Google Patents

Publication number
CN103605730A
Authority
CN
China
Prior art keywords
data dictionary
xml document
xml
attribute
represent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310580015.5A
Other languages
Chinese (zh)
Inventor
龚如宾
张炼珠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANXI SANHENG AUTOMATION EQUIPMENT CO Ltd
University of Shanghai for Science and Technology
Original Assignee
SHANXI SANHENG AUTOMATION EQUIPMENT CO Ltd
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANXI SANHENG AUTOMATION EQUIPMENT CO Ltd, University of Shanghai for Science and Technology filed Critical SHANXI SANHENG AUTOMATION EQUIPMENT CO Ltd
Priority to CN201310580015.5A priority Critical patent/CN103605730A/en
Publication of CN103605730A publication Critical patent/CN103605730A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3084 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction using adaptive string matching, e.g. the Lempel-Ziv method

Abstract

The invention relates to the technical field of data services, and in particular to an XML (Extensible Markup Language) compression method and device. The compression method replaces the elements and attributes in an XML document with flexible-length identification codes, so that an XML document containing a large number of elements and attributes can be compressed efficiently and with low load. The method includes: defining, in a data dictionary, one flexible-length identification code corresponding to each element of the XML document; defining, in the data dictionary, one flexible-length identification code corresponding to each attribute of the XML document; at the sender, replacing the elements and attributes in the XML document one by one with the corresponding flexible-length identification codes defined in the data dictionary, thereby achieving replacement compression of the XML document; and, at the receiver, replacing the flexible-length identification codes in the received compressed XML document with the corresponding elements and attributes defined in the data dictionary, thereby decompressing the XML document.

Description

An XML compression method and device based on flexible-length identification codes
Technical field
The present invention relates to the technical field of data services, and in particular to an XML compression method and device.
Background art
Extensible Markup Language (XML), a cross-platform standard data interchange format, is widely used for representing digital book formats and for data exchange and storage, and is currently a powerful tool for processing structured document information. Because an XML document contains a large number of repeated tags and a large amount of repeated structural information, the cost of storing and transmitting it increases. This has hindered the development of XML applications to some extent, and the problem is especially pronounced on mobile devices with limited bandwidth and resources. Many applications, such as mobile reading of digital books, therefore need to compress XML files. Although an XML document can be compressed with general-purpose text compression tools (such as Gzip, Bzip2, and WinZip), doing so loses the inherent advantages of the XML file (such as its structural and semantic features). Exploiting the inherent redundancy of XML files for compression has become a focus of current research. Commonly used XML compression methods include XMill, XMLPPM, and XWRT, but these algorithms place relatively high demands on CPU computing power. This is a bottleneck for mobile phone applications, which need to compress and decompress XML files with little computation.
The invention patent application with publication number 102096704A, entitled "An XML compression method and device", discloses an XML compression method and device. Its technical scheme is: for each element in an XML document, define one corresponding byte in a data dictionary; and for each attribute in the XML document, define one corresponding byte in the data dictionary. That method compresses XML documents at high speed, but because only 5 bits are allocated to element names, at most 32 element names can be represented; and because only 6 bits are allocated to attribute names, at most 64 attribute names can be represented. This is far from enough for fixed-layout e-books, which must represent both the geometric layout structure and the logical layout structure of a document and often need hundreds of element names and attribute names.
Summary of the invention
The present invention overcomes the deficiencies of the prior art and provides a compression method that replaces the elements and attributes in an XML document with flexible-length identification codes, so that XML documents containing a large number of elements and attributes can be compressed efficiently and with low load.
To achieve the above object, the present invention provides an XML compression method based on flexible-length identification codes. The method comprises:
For each element in the XML document, defining one corresponding flexible-length identification code in a data dictionary; and for each attribute in the XML document, defining one corresponding flexible-length identification code in the data dictionary;
The sender replaces the elements and attributes in the XML document one by one with the corresponding flexible-length identification codes defined in the data dictionary, achieving replacement compression of the XML document;
The receiver replaces the flexible-length identification codes in the received replacement-compressed XML document with the corresponding elements and attributes defined in the data dictionary, achieving decompression of the XML document;
Defining one corresponding flexible-length identification code in the data dictionary for each element in the XML document, and one corresponding flexible-length identification code in the data dictionary for each attribute in the XML document, comprises:
Each element in the XML document is represented by either an 8-bit identification code or a 16-bit identification code. Of the high 4 bits, the 1st indicates whether the code is in XML format, the 2nd indicates whether it is an element, the 3rd indicates whether it is a closing element, and the 4th indicates whether two 8-bit bytes are needed to represent the same element; the remaining bits identify the element;
Each attribute in the XML document is represented by either an 8-bit identification code or a 16-bit identification code. Of the high 3 bits, the 1st indicates whether the code is in XML format, the 2nd indicates whether it is an attribute, and the 3rd indicates whether two 8-bit bytes are needed to represent the same attribute; the remaining bits identify the attribute, and the value of the attribute is represented in string format.
In the data dictionary, for each element in the XML document, a usage-frequency analysis method decides whether the element is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each element in the XML document, a consumed-byte-count analysis method decides whether the element is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each attribute in the XML document, a usage-frequency analysis method decides whether the attribute is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each attribute in the XML document, a consumed-byte-count analysis method decides whether the attribute is represented by an 8-bit identification code or a 16-bit identification code.
The present invention also provides an XML compression device. The device comprises an XML reading module, a compression data dictionary storage module, a tag replacement compression module, and a general compression module, wherein:
The XML reading module is used to read the XML byte stream data;
The compression data dictionary storage module is used to store the data dictionary;
In the data dictionary, one corresponding flexible-length identification code is defined for each element in the XML document, and one corresponding flexible-length identification code is defined for each attribute in the XML document;
The tag replacement compression module is used to replace the elements and attributes in the XML document one by one with the corresponding flexible-length identification codes defined in the data dictionary storage module, generating the replacement-compressed XML document;
The general compression module is used to further compress the data dictionary and the replacement-compressed XML document with a general-purpose compression algorithm, generating the compressed data.
The present invention further provides an XML decompression device. The device comprises a general decompression module, a tag replacement decompression module, and a decompression data dictionary storage module, wherein:
The general decompression module is used to decompress the received compressed data with a general-purpose decompression algorithm;
The decompression data dictionary storage module is used to store the data dictionary;
In the data dictionary, one corresponding flexible-length identification code is defined for each element in the XML document, and one corresponding flexible-length identification code is defined for each attribute in the XML document;
The tag replacement decompression module uses the data dictionary stored in the decompression data dictionary storage module to replace the flexible-length identification codes in the compressed XML document one by one with the corresponding elements and attributes, obtaining the original XML document.
The data dictionary used by the sender and the data dictionary used by the receiver have identical content. The sender replaces the elements and attributes in the XML document one by one with the flexible-length identification codes defined in the data dictionary, achieving replacement compression of the XML document. After receiving the compressed data, the receiver replaces the flexible-length identification codes in the received XML document one by one with the corresponding elements and attributes defined in the data dictionary, achieving replacement decompression; the original XML document is obtained after decompression. The invention thereby solves the problem that XML documents containing a large number of elements and attributes produce a large data volume during storage and transmission.
Brief description of the drawings
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the XML compression method based on flexible-length identification codes according to the present invention.
Fig. 2 is a structural diagram of the XML compression device according to the present invention.
Fig. 3 is a structural diagram of the XML decompression device according to the present invention.
Fig. 4 is a structural diagram of defining one shared general data dictionary for multiple XML documents according to the present invention.
Detailed description of the embodiments
Fig. 1 is a flowchart of the XML compression method based on flexible-length identification codes according to the present invention. As shown in Fig. 1, the method comprises the following steps:
Step 101: for each element in the XML document, define one corresponding flexible-length identification code in the data dictionary; and for each attribute in the XML document, define one corresponding flexible-length identification code in the data dictionary;
Step 102: the sender replaces the elements and attributes in the XML document one by one with the corresponding flexible-length identification codes defined in the data dictionary, achieving replacement compression and generating the replacement-compressed XML document;
Step 103: the sender compresses the replacement-compressed XML document and the data dictionary with a general-purpose compression algorithm such as Deflate or LZW;
Step 104: the receiver decompresses the compressed XML document and data dictionary with a general-purpose decompression algorithm such as Deflate or LZW, obtaining the data dictionary and the replacement-compressed XML document;
Step 105: the receiver replaces the flexible-length identification codes in the received replacement-compressed XML document with the corresponding elements and attributes defined in the data dictionary, achieving decompression of the XML document.
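Steps 103 and 104 can be sketched as follows. This is a minimal illustration and not the patented implementation: the container layout (a 4-byte length prefix followed by a JSON-encoded dictionary) and the function names `pack` and `unpack` are assumptions made for the example, and Python's `zlib` module stands in for the Deflate algorithm.

```python
import json
import struct
import zlib


def pack(dictionary: dict[str, str], replaced_body: bytes) -> bytes:
    """Step 103: bundle the data dictionary with the replaced body, then Deflate."""
    # Hypothetical layout: 4-byte big-endian dictionary length,
    # the dictionary as JSON, then the code-replaced document body.
    dict_bytes = json.dumps(dictionary, sort_keys=True).encode("utf-8")
    payload = struct.pack(">I", len(dict_bytes)) + dict_bytes + replaced_body
    return zlib.compress(payload)


def unpack(packed: bytes) -> tuple[dict[str, str], bytes]:
    """Step 104: inflate, then split the payload into dictionary and body."""
    payload = zlib.decompress(packed)
    (dict_len,) = struct.unpack(">I", payload[:4])
    dictionary = json.loads(payload[4:4 + dict_len].decode("utf-8"))
    return dictionary, payload[4 + dict_len:]
```

The receiver gets back exactly the dictionary and body the sender bundled, so step 105 can proceed with the same code-to-name mapping that was used in step 102.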
To describe the technical scheme of the present invention more clearly, the invention is described below with reference to the drawings and embodiments.
In concrete applications such as digital book compression, the geometric layout information and logical layout information of each page must be expressed, so hundreds of tags are needed to represent information such as characters, words, lines, paragraphs, line reading direction, alignment, character direction, language, font, font size, chapters, sections, and titles. To replace each tag name with an identification code that is as short as possible, the present invention defines the identification codes corresponding to each element and attribute by the following rules.
Each element in the XML document is represented by either an 8-bit identification code or a 16-bit identification code. Of the high 4 bits, the 1st indicates whether the code is in XML format: it is set to 1 if so, and to 0 otherwise. The 2nd indicates whether it is an element: it is set to 1 if so, and to 0 otherwise. The 3rd indicates whether it is a closing element: it is set to 1 if so, and to 0 otherwise. The 4th indicates whether two 8-bit bytes are needed to represent the same element: it is set to 1 if so, and to 0 otherwise. The remaining bits identify the element. The element format is shown in Table 1. If the is-two-byte bit is 1, the next byte is also used to represent the element; if the is-two-byte bit is 0, the element is represented by one byte only. The 4 remaining bits of the first byte together with the 8 bits of the second byte give 12 bits, so 4096 different elements can be represented in total, which meets the needs of most applications.
Table 1
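The element layout described above can be sketched as a small packing function. This is an illustration under stated assumptions: the function name `pack_element_code`, the single index space in which indices below 16 take the one-byte form, and the placement of the high index bits in the first byte are all hypothetical. The third flag bit is passed in as a raw 0/1 value, since only its position matters for the layout.

```python
ELEMENT_FLAGS = 0b1100_0000  # 1st bit (is-xml) = 1, 2nd bit (is-element) = 1
TWO_BYTE_FLAG = 0b0001_0000  # 4th bit (is-two-byte)


def pack_element_code(index: int, third_flag: int) -> bytes:
    """Pack an element's dictionary index into a 1-byte or 2-byte code.

    third_flag is the raw value (0 or 1) of the third flag bit, which
    distinguishes start elements from closing elements.
    """
    if third_flag not in (0, 1):
        raise ValueError("third_flag must be 0 or 1")
    if not 0 <= index < 4096:
        raise ValueError("only 12 index bits are available")
    lead = ELEMENT_FLAGS | (third_flag << 5)
    if index < 16:
        # is-two-byte = 0: the low 4 bits of the single byte hold the index.
        return bytes([lead | index])
    # is-two-byte = 1: high 4 index bits in the first byte (assumed order),
    # low 8 index bits in the second byte.
    return bytes([lead | TWO_BYTE_FLAG | (index >> 8), index & 0xFF])
```

For example, `pack_element_code(5, 0)` yields the single byte `0xC5`, while `pack_element_code(300, 1)` yields the two bytes `0xF1 0x2C`.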
As shown in Table 2, each attribute in the XML document is represented by either an 8-bit identification code or a 16-bit identification code. Of the high 3 bits, the 1st indicates whether the code is in XML format: it is set to 1 if so, and to 0 otherwise. The 2nd indicates whether it is an attribute: it is set to 0 if so, and to 1 otherwise. The 3rd indicates whether two 8-bit bytes are needed to represent the same attribute: it is set to 1 if so, and to 0 otherwise. The remaining bits identify the attribute. The value of the attribute is represented in string format, with a specific character marking the end of the string. In an embodiment of the present invention, 0x00 marks the end of the attribute value string and separates the strings from one another. The attribute format is shown in Table 2. If the is-two-byte bit is 1, the next byte is also used to represent the attribute; if the is-two-byte bit is 0, the attribute is represented by one byte only. The 5 remaining bits of the first byte together with the 8 bits of the second byte give 13 bits, so 8192 different attributes can be represented in total, which meets the needs of most applications.
Table 2
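A minimal sketch of the attribute layout follows, assuming a single index space in which indices below 32 take the one-byte form and assuming that the high index bits go into the first byte. The function name `pack_attribute` is hypothetical.

```python
ATTR_ONE_BYTE = 0b1000_0000  # is-xml = 1, 2nd bit = 0 (attribute), is-two-byte = 0
ATTR_TWO_BYTE = 0b1010_0000  # same flags with is-two-byte = 1


def pack_attribute(index: int, value: str) -> bytes:
    """Encode one attribute as its identification code plus a 0x00-terminated value."""
    if not 0 <= index < 8192:
        raise ValueError("only 13 index bits are available")
    if "\x00" in value:
        raise ValueError("0x00 is reserved as the string terminator")
    if index < 32:
        # The low 5 bits of the single byte hold the index.
        code = bytes([ATTR_ONE_BYTE | index])
    else:
        # High 5 index bits in the first byte (assumed order), low 8 in the second.
        code = bytes([ATTR_TWO_BYTE | (index >> 8), index & 0xFF])
    return code + value.encode("utf-8") + b"\x00"
```

With this layout, `pack_attribute(3, "en")` produces `0x83 'e' 'n' 0x00`, and an attribute with index 1000 starts with the two code bytes `0xA3 0xE8`.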
Text in the XML document is represented in string form, with a specific character marking the start and end of the text string. The implementation is: the text is UTF-8 encoded, and the resulting string starts with 0x00 and ends with 0x00.
The conclusions shown in Table 3 follow from Table 1 and Table 2. When an element is represented by two bytes, the high-order byte is F* or D* in hexadecimal, where F* denotes a start element and D* denotes a closing element. When an element is represented by one byte, the byte is E* or C* in hexadecimal, where E* denotes a start element and C* denotes a closing element. When an attribute is represented by two bytes, the high-order byte is A* or B* in hexadecimal. When an attribute is represented by one byte, the byte is 9* or 8* in hexadecimal.
Table 3
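The prefix summary above can be written as a lookup on the high nibble of the lead byte. This is a sketch only; the function name `classify_lead_byte` and the label strings are assumptions, and the mapping follows the hexadecimal prefixes exactly as listed in the preceding paragraph.

```python
# High nibble of the lead byte -> kind of identification code.
PREFIX_KINDS = {
    0xF: "two-byte start element",
    0xD: "two-byte closing element",
    0xE: "single-byte start element",
    0xC: "single-byte closing element",
    0xA: "two-byte attribute",
    0xB: "two-byte attribute",
    0x9: "single-byte attribute",
    0x8: "single-byte attribute",
}


def classify_lead_byte(byte: int) -> str:
    """Return the kind of code that starts with this byte (0..255)."""
    return PREFIX_KINDS.get(byte >> 4, "not an identification code")
```

Bytes below 0x80 have the first flag bit cleared and therefore never start an identification code.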
For the value of an element in the XML file, the text is converted into a character string by UTF-8 encoding and inserted into the output character stream with 0x00 marking the start and end of the text string. For the value of an attribute in the XML file, the value is converted into a character string by UTF-8 encoding and inserted into the output character stream with 0x00 marking the end of the attribute value string. In step 105, by reading the character stream, the receiver can therefore determine whether the current character belongs to a text string, an attribute string, a two-byte start element, a single-byte start element, a two-byte end element, a single-byte end element, a two-byte attribute, or a single-byte attribute. It can then look up the identification code in the data dictionary, find the corresponding element or attribute, and perform replacement decompression to restore the original XML file.
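The stream reading performed in step 105 can be sketched as a small tokenizer. This is an illustration under assumptions, not the patented decoder: the token tuples and the name `tokenize` are hypothetical, the high index bits are assumed to sit in the first byte of a two-byte code, and start versus end elements are told apart using the F*/E* (start) and D*/C* (end) prefixes given in the description. Looking up each index in the data dictionary is left out.

```python
def tokenize(stream: bytes) -> list[tuple[str, object]]:
    """Split a replacement-compressed stream into (kind, payload) tokens."""
    tokens: list[tuple[str, object]] = []
    i = 0
    while i < len(stream):
        b = stream[i]
        if b == 0x00:
            # Text string: 0x00, UTF-8 bytes, 0x00.
            end = stream.index(b"\x00", i + 1)
            tokens.append(("text", stream[i + 1:end].decode("utf-8")))
            i = end + 1
        elif (b & 0xC0) == 0xC0:
            # Element code: 4 flag bits, then 4 (or 4 + 8) index bits.
            two_byte = bool(b & 0x10)
            index = (((b & 0x0F) << 8) | stream[i + 1]) if two_byte else (b & 0x0F)
            tokens.append(("start" if b & 0x20 else "end", index))
            i += 2 if two_byte else 1
        elif (b & 0xC0) == 0x80:
            # Attribute code: 3 flag bits, then 5 (or 5 + 8) index bits,
            # followed by the value string terminated by 0x00.
            two_byte = bool(b & 0x20)
            index = (((b & 0x1F) << 8) | stream[i + 1]) if two_byte else (b & 0x1F)
            i += 2 if two_byte else 1
            end = stream.index(b"\x00", i)
            tokens.append(("attr", (index, stream[i:end].decode("utf-8"))))
            i = end + 1
        else:
            raise ValueError(f"unexpected byte 0x{b:02x} at offset {i}")
    return tokens
```

Because text is always enclosed by 0x00 markers and attribute values always follow an attribute code, UTF-8 bytes at or above 0x80 inside a string are never mistaken for identification codes.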
For each element in the XML document, the usage-frequency analysis method determines whether it is represented by an 8-bit identification code or a 16-bit identification code. The implementation uses a two-pass scan: the first pass counts the number of times each element is used; the 16 most frequently used elements are then represented by 8-bit identification codes and the other elements by 16-bit identification codes. The usage counts obtained with the usage-frequency analysis method in the embodiment are shown in Table 4.
Table 4
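The first pass of the frequency analysis can be sketched as follows, assuming the document fits in memory and is parsed with Python's standard `xml.etree.ElementTree`. The function name `assign_by_frequency` and the convention that indices below 16 receive 8-bit codes are assumptions made for the example.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def assign_by_frequency(xml_text: str) -> dict[str, int]:
    """Pass 1: count element usage and rank names from most to least used.

    Under the assumed convention, the 16 names with the lowest index get
    8-bit identification codes and all others get 16-bit codes.
    """
    counts = Counter(el.tag for el in ET.fromstring(xml_text).iter())
    ranked = [name for name, _ in counts.most_common()]
    return {name: index for index, name in enumerate(ranked)}
```

The second pass would then walk the document again and emit the code assigned to each element name.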
Besides deciding by usage frequency whether an element is represented by an 8-bit or a 16-bit identification code, the number of bytes each element consumes can also be calculated. That is, the consumed-byte-count analysis method decides whether an 8-bit or a 16-bit identification code is used, where the consumed byte count is calculated by formula (1).
Consumed byte count = number of occurrences of the element × byte length of the element (1)
The consumed byte counts of all elements in the XML document are sorted in ascending order. The 16 largest values correspond to the 16 elements that consume the most bytes in the original XML document; these elements are represented by 8-bit identification codes, and the remaining elements are represented by 16-bit identification codes. The consumed byte counts obtained with the consumed-byte-count analysis method in the embodiment are shown in Table 5.
Table 5
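Formula (1) and the resulting split can be sketched as follows. The function name `split_by_byte_cost` is hypothetical, and taking the "byte length of the element" to be the UTF-8 length of the element name is an assumption.

```python
def split_by_byte_cost(
    counts: dict[str, int], short_slots: int = 16
) -> tuple[list[str], list[str]]:
    """Rank names by formula (1) and split them into 8-bit and 16-bit groups."""
    # Consumed byte count = occurrences x byte length of the name (assumed UTF-8).
    cost = {name: n * len(name.encode("utf-8")) for name, n in counts.items()}
    ranked = sorted(cost, key=lambda name: cost[name], reverse=True)
    return ranked[:short_slots], ranked[short_slots:]
```

With counts `{"w": 100, "paragraph": 20, "title": 5}` the costs are 100, 180, and 25 bytes, so `paragraph` outranks `w` even though `w` occurs more often. This is where the method differs from ranking by usage frequency alone.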
For each attribute in the XML document, the usage-frequency analysis method determines whether it is represented by an 8-bit identification code or a 16-bit identification code. The implementation uses a two-pass scan: the first pass counts the number of times each attribute is used; the 16 most frequently used attributes are then represented by 8-bit identification codes and the other attributes by 16-bit identification codes. No further embodiment is given here.
Besides deciding by usage frequency whether an attribute is represented by an 8-bit or a 16-bit identification code, the number of bytes each attribute consumes can also be calculated. That is, the consumed-byte-count analysis method decides whether an 8-bit or a 16-bit identification code is used, where the consumed byte count is calculated by formula (2).
Consumed byte count = number of occurrences of the attribute × byte length of the attribute (2)
The consumed byte counts of all attributes in the XML document are sorted in ascending order. The 32 largest values correspond to the 32 attributes that consume the most bytes in the original XML document; these attributes are represented by 8-bit identification codes, and the remaining attributes are represented by 16-bit identification codes. No further embodiment is given here.
The present invention also provides an XML compression device. As shown in Fig. 2, the device comprises an XML reading module 201, a compression data dictionary storage module 202, a tag replacement compression module 203, and a general compression module 204, wherein:
The XML reading module 201 is used to read the XML byte stream data;
The compression data dictionary storage module 202 is used to store the data dictionary;
In the data dictionary, one corresponding flexible-length identification code is defined for each element in the XML document, and one corresponding flexible-length identification code is defined for each attribute in the XML document;
In the data dictionary, for each element in the XML document, a usage-frequency analysis method decides whether the element is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each element in the XML document, a consumed-byte-count analysis method decides whether the element is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each attribute in the XML document, a usage-frequency analysis method decides whether the attribute is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each attribute in the XML document, a consumed-byte-count analysis method decides whether the attribute is represented by an 8-bit identification code or a 16-bit identification code.
The tag replacement compression module 203 is used to replace the elements and attributes in the XML document one by one with the corresponding flexible-length identification codes defined in the data dictionary storage module, generating the replacement-compressed XML document;
The general compression module 204 is used to further compress the data dictionary and the replacement-compressed XML document with a general-purpose compression algorithm such as Deflate or LZW, generating the compressed data.
The tag replacement compression module 203 represents the text in the XML document in string form, with a specific character marking the start and end of the text string.
The present invention also provides an XML decompression device. As shown in Fig. 3, the device comprises a general decompression module 301, a decompression data dictionary storage module 302, and a tag replacement decompression module 303, wherein:
The general decompression module 301 is used to decompress the received compressed data with a general-purpose decompression algorithm such as Deflate or LZW;
The decompression data dictionary storage module 302 is used to store the data dictionary;
In the data dictionary, one corresponding flexible-length identification code is defined for each element in the XML document, and one corresponding flexible-length identification code is defined for each attribute in the XML document;
In the data dictionary, for each element in the XML document, a usage-frequency analysis method decides whether the element is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each element in the XML document, a consumed-byte-count analysis method decides whether the element is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each attribute in the XML document, a usage-frequency analysis method decides whether the attribute is represented by an 8-bit identification code or a 16-bit identification code.
In the data dictionary, for each attribute in the XML document, a consumed-byte-count analysis method decides whether the attribute is represented by an 8-bit identification code or a 16-bit identification code.
The tag replacement decompression module 303 uses the data dictionary stored in the decompression data dictionary storage module to replace the flexible-length identification codes in the replacement-compressed XML document one by one with the corresponding elements and attributes, obtaining the original XML document without any loss of information.
In summary, XML documents in the prior art consume a large amount of data traffic during transmission and contain many different elements and attributes. The technical scheme provided by the present invention first uses a data dictionary based on flexible-length identification codes to replace the elements and attributes in the XML document to be transmitted, achieving replacement compression of the XML document. The sender transmits the replacement-compressed XML document to the receiver. After receiving it, the receiver uses the data dictionary to replace the flexible-length identification codes in the received document with the corresponding elements and attributes, achieving decompression; the original XML document is obtained after decompression. This technical scheme is not restricted by the content of the XML document being compressed, its compression and decompression algorithms are very simple, and it places low demands on the CPU. It therefore solves the problem that XML documents have a large data volume and many different elements and attributes during data transmission.
The above XML compression method based on flexible-length identification codes is also applicable to compressing multiple XML documents that share the same DTD (Document Type Definition) or the same XML Schema. By counting the usage frequency or the consumed byte count of elements and attributes across the multiple XML documents, one general data dictionary is defined for them, and the multiple XML documents share this general data dictionary for replacement compression and replacement decompression of their elements and attributes. As shown in Fig. 4, the advantage of compressing and decompressing multiple XML documents with a general data dictionary is as follows. When N different XML documents are compressed and N is very large, defining a separate data dictionary for each XML document consumes a large amount of storage space; if the XML documents share one general data dictionary, the storage space needed for data dictionaries is saved. The sender only needs to store this shared general data dictionary and replaces the elements and attributes in each XML document one by one with the flexible-length identification codes defined in it, achieving replacement compression of multiple XML documents. The receiver likewise only needs to store this shared general data dictionary and uses it to replace the flexible-length identification codes in the received XML documents with the corresponding elements and attributes, achieving replacement decompression of multiple XML documents.
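Gathering the statistics for a shared general data dictionary can be sketched as follows. The function name `shared_counts` is hypothetical, and the sketch assumes each document is available as a string and is parsed with Python's standard `xml.etree.ElementTree`.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def shared_counts(documents: list[str]) -> tuple[Counter, Counter]:
    """Sum element and attribute usage over several documents of one schema."""
    elements: Counter = Counter()
    attributes: Counter = Counter()
    for text in documents:
        for el in ET.fromstring(text).iter():
            elements[el.tag] += 1
            attributes.update(el.attrib.keys())
    return elements, attributes
```

The combined counts can then be ranked once, by usage frequency or by consumed byte count, to produce the single dictionary that both sender and receiver store.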
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (8)

1. An XML compression method based on flexible-length identification codes, characterized in that the method comprises:
For each element in an XML document, defining a corresponding flexible-length identification code in a data dictionary; and for each attribute in the XML document, defining a corresponding flexible-length identification code in the data dictionary;
A sender replacing, one by one, the elements and attributes in the XML document with the corresponding flexible-length identification codes defined in the data dictionary, thereby achieving replacement compression of the XML document;
A receiver replacing the flexible-length identification codes in the received replacement-compressed XML document with the corresponding elements and attributes defined in the data dictionary, thereby achieving decompression of the XML document;
Wherein defining, in the data dictionary, a corresponding flexible-length identification code for each element in the XML document and a corresponding flexible-length identification code for each attribute in the XML document comprises:
Representing each element in the XML document with either an 8-bit identification code or a 16-bit identification code, wherein, among the high 4 bits, the 1st bit indicates whether the code is in XML format, the 2nd bit indicates whether it is an element, the 3rd bit indicates whether it is a closing element, and the 4th bit indicates whether two 8-bit bytes are needed to represent the same element, and the remaining bits identify the element;
Representing each attribute in the XML document with either an 8-bit identification code or a 16-bit identification code, wherein, among the high 3 bits, the 1st bit indicates whether the code is in XML format, the 2nd bit indicates whether it is an attribute, and the 3rd bit indicates whether two 8-bit bytes are needed to represent the same attribute, the remaining bits identify the attribute, and the value of the attribute is represented as a string.
2. The XML compression method based on flexible-length identification codes according to claim 1, characterized in that: in the data dictionary, usage-frequency analysis is used to decide whether each element in the XML document is represented by an 8-bit identification code or by a 16-bit identification code.
3. The XML compression method based on flexible-length identification codes according to claim 1, characterized in that: in the data dictionary, byte-consumption analysis is used to decide whether each element in the XML document is represented by an 8-bit identification code or by a 16-bit identification code.
4. The XML compression method based on flexible-length identification codes according to claim 1, characterized in that: in the data dictionary, usage-frequency analysis is used to decide whether each attribute in the XML document is represented by an 8-bit identification code or by a 16-bit identification code.
5. The XML compression method based on flexible-length identification codes according to claim 1, characterized in that: in the data dictionary, byte-consumption analysis is used to decide whether each attribute in the XML document is represented by an 8-bit identification code or by a 16-bit identification code.
6. The XML compression method based on flexible-length identification codes according to any one of claims 1 to 5, characterized in that: the compression method also applies to compressing multiple XML documents that have the same DTD or the same XML Schema, wherein the data dictionary is a general data dictionary shared by the multiple XML documents.
7. An XML compression device, characterized in that the device comprises: an XML read module, a compression-side data dictionary storage module, a tag replacement compression module, and a general compression module; wherein:
The XML read module is configured to read the XML data byte stream;
The compression-side data dictionary storage module is configured to store a data dictionary;
In the data dictionary, a corresponding flexible-length identification code is defined for each element in the XML document, and a corresponding flexible-length identification code is defined for each attribute in the XML document;
The tag replacement compression module is configured to replace, one by one, the elements and attributes in the XML document with the corresponding flexible-length identification codes defined in the data dictionary held by the compression-side data dictionary storage module, generating a replacement-compressed XML document;
The general compression module is configured to further compress the data dictionary and the replacement-compressed XML document using a general-purpose compression algorithm, generating compressed data.
8. An XML decompression device, characterized in that the device comprises: a general decompression module, a decompression-side data dictionary storage module, and a tag replacement decompression module; wherein:
The general decompression module is configured to decompress the received compressed data using a general-purpose decompression algorithm;
The decompression-side data dictionary storage module is configured to store a data dictionary;
In the data dictionary, a corresponding flexible-length identification code is defined for each element in the XML document, and a corresponding flexible-length identification code is defined for each attribute in the XML document;
The tag replacement decompression module is configured to use the data dictionary held by the decompression-side data dictionary storage module to reverse-replace, one by one, the flexible-length identification codes in the replacement-compressed XML document with the corresponding elements and attributes, thereby recovering the original XML document.
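The element code layout of claim 1 and the two-stage pipeline of claims 7 and 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the "1st bit" is taken to be the most significant bit, a set flag is taken to mean "yes", text content is assumed to be pure ASCII so that any byte with its high bit set starts an identification code, only attribute-free start and end tags are replaced, `zlib` stands in for the general-purpose algorithm, and the dictionary itself is not included in the compressed stream. The names `pack_element`, `compress`, and `decompress` are hypothetical.

```python
import re
import zlib

# Assumed flag positions in the first byte of an element code (MSB first).
XML_FLAG, ELEMENT_FLAG, CLOSE_FLAG, WIDE_FLAG = 0x80, 0x40, 0x20, 0x10

# Attribute-free start or end tag, e.g. b"<item>" or b"</item>".
TAG = re.compile(rb"<(/?)([A-Za-z_][\w.\-]*)>")


def pack_element(index, closing=False):
    """Pack an element index into a 1-byte or 2-byte identification code."""
    flags = XML_FLAG | ELEMENT_FLAG | (CLOSE_FLAG if closing else 0)
    if index < 16:        # fits in the 4 bits left after the flags
        return bytes([flags | index])
    if index < 4096:      # 12 bits: 4 in the first byte, 8 in the second
        return bytes([flags | WIDE_FLAG | (index >> 8), index & 0xFF])
    raise ValueError("index does not fit in a 16-bit code")


def compress(xml_bytes, dictionary):
    """Replace tags with codes, then apply general-purpose compression."""
    def to_code(match):
        index = dictionary[match.group(2).decode()]
        return pack_element(index, closing=bool(match.group(1)))
    return zlib.compress(TAG.sub(to_code, xml_bytes))


def decompress(packed, dictionary):
    """Undo general-purpose compression, then expand codes back into tags."""
    names = {index: name for name, index in dictionary.items()}
    data, out, i = zlib.decompress(packed), bytearray(), 0
    while i < len(data):
        byte = data[i]
        if not byte & XML_FLAG:       # plain ASCII text passes through
            out.append(byte)
            i += 1
            continue
        index = byte & 0x0F
        if byte & WIDE_FLAG:          # second byte carries the low 8 bits
            index = (index << 8) | data[i + 1]
            i += 1
        slash = "/" if byte & CLOSE_FLAG else ""
        out += f"<{slash}{names[index]}>".encode()
        i += 1
    return bytes(out)
```

In this sketch a tag whose name is missing from the dictionary raises `KeyError`, and tags that carry attributes are left as plain text; a fuller version would also assign codes to attributes using the 3-flag-bit layout.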

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310580015.5A CN103605730A (en) 2013-11-19 2013-11-19 XML (extensible markup language) compressing method and device based on flexible-length identification codes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310580015.5A CN103605730A (en) 2013-11-19 2013-11-19 XML (extensible markup language) compressing method and device based on flexible-length identification codes

Publications (1)

Publication Number Publication Date
CN103605730A true CN103605730A (en) 2014-02-26

Family

ID=50123952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310580015.5A Pending CN103605730A (en) 2013-11-19 2013-11-19 XML (extensible markup language) compressing method and device based on flexible-length identification codes

Country Status (1)

Country Link
CN (1) CN103605730A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101222476A (en) * 2007-01-08 2008-07-16 华为技术有限公司 Expandable markup language file editor, file transferring method and system
CN101420295A (en) * 2008-12-01 2009-04-29 刘江海 Ciphering method for bit reassigning and mutual replacing on different positions of the same byte
CN101901234A (en) * 2009-05-27 2010-12-01 国际商业机器公司 Method and system for converting XML data into resource description framework data
US20100306207A1 (en) * 2009-05-27 2010-12-02 Ibm Corporation Method and system for transforming xml data to rdf data
CN102073663A (en) * 2009-11-24 2011-05-25 北大方正集团有限公司 Method and device for rapidly processing XML (Extensible Markup Language) compressed data
CN102096704A (en) * 2010-12-29 2011-06-15 北京新媒传信科技有限公司 XML (extensible markup language) compression method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104216958A (en) * 2014-08-20 2014-12-17 深圳市邦彦信息技术有限公司 Transmission method and device based on structured data
CN104216958B (en) * 2014-08-20 2018-01-12 邦彦技术股份有限公司 Transmission method and device based on structured data
WO2017036348A1 (en) * 2015-09-06 2017-03-09 阿里巴巴集团控股有限公司 Method and device for compressing and decompressing extensible markup language document
CN109241498A (en) * 2018-06-26 2019-01-18 中国建设银行股份有限公司 XML file processing method, equipment and storage medium
CN109241498B (en) * 2018-06-26 2023-08-15 中国建设银行股份有限公司 XML file processing method, device and storage medium
CN109450450A (en) * 2018-10-17 2019-03-08 杭州费尔斯通科技有限公司 A kind of compression of JSON data real non-destructive and decompressing method
CN111797596A (en) * 2020-05-18 2020-10-20 冠群信息技术(南京)有限公司 Method and device for compressing and decompressing extensible markup language (XML) document

Similar Documents

Publication Publication Date Title
CN104199927B (en) Data processing method and data processing equipment
US8060652B2 (en) Extensible binary mark-up language for efficient XML-based data communications and related systems and methods
US8862759B2 (en) Multiplexing binary encoding to facilitate compression
CN102571966B (en) Network transmission method for large extensible markup language (XML) document
CN103605730A (en) XML (extensible markup language) compressing method and device based on flexible-length identification codes
CN101346689A (en) A compressed schema representation object and method for metadata processing
US7738717B1 (en) Systems and methods for optimizing bit utilization in data encoding
CN101223699A (en) Methods and devices for compressing and decompressing structured documents
CN105450232A (en) Encoding method, decoding method, encoding device and decoding device
US9128912B2 (en) Efficient XML interchange schema document encoding
CN103236847A (en) Multilayer Hash structure and run coding-based lossless compression method for data
CN102096704B (en) XML (extensible markup language) compression method and device
CN101729075A (en) Data compression method, data compression device, data decompression method and data decompression device
CN102880703B (en) Chinese web page data encoding, coding/decoding method and system
CN102541926B (en) Data exchange processing method, equipment and system
CN101534124B (en) Compression algorithm for short natural language
CN104734722A (en) Data compression method and data decompression device
CN104021121A (en) Method, device and server for compressing text data
CN103210590A (en) Compression method and apparatus
Mahmood et al. A feasible 6 bit text database compression scheme with character encoding (6BC)
US9235610B2 (en) Short string compression
CN103138766A (en) Method and device of compression and decompression of data
CN101465902B (en) Compression communication method of mobile phone short message
CN105183750B (en) Close-coupled XML resolution system
US20050138545A1 (en) Efficient universal plug-and-play markup language document optimization and compression

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C53 Correction of patent of invention or patent application
CB02 Change of applicant information

Address after: Business street high tech Zone 030006 in Shanxi province Taiyuan City No. 19 big innovation park room 1407

Applicant after: Shanxi Yun Tuo Technology Co., Ltd.

Applicant after: University of Shanghai for Science and Technology

Address before: Business street high tech Zone 030006 in Shanxi province Taiyuan City No. 19 big innovation park room 1407

Applicant before: Shanxi Sanheng Automation Equipment Co., Ltd.

Applicant before: University of Shanghai for Science and Technology

COR Change of bibliographic data

Free format text: CORRECT: APPLICANT; FROM: SHANXI SANHENG AUTOMATION EQUIPMENT CO., LTD. TO: SHANXI YUNTUO TECHNOLOGY CO., LTD.

RJ01 Rejection of invention patent application after publication

Application publication date: 20140226