CN103618554A - Internal storage page compression method based on dictionary - Google Patents


Info

Publication number
CN103618554A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310643898.XA
Other languages
Chinese (zh)
Other versions
CN103618554B (en)
Inventor
宋彬
裴远
宋秉玺
李慧玲
甄立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN201310643898.XA
Publication of CN103618554A
Application granted
Publication of CN103618554B
Legal status: Expired - Fee Related

Abstract

The invention discloses a dictionary-based memory page compression method in the technical field of data processing. Its main purpose is to solve the problem that existing compression methods compress memory pages slowly. The method compresses and decompresses whole memory pages using four bytes as the basic unit, and designs a new hash function and a compressed format suited to memory pages. The dictionary is a hash table accessed by key value. Four bytes are read from the input data stream; the first two bytes are XORed to obtain a new byte A, the last two bytes are XORed to obtain a new byte B, and the low-order 2 bits of A are XORed with the high-order 2 bits of B to obtain a 14-bit key value. In the new compressed format, the first 4 bits of the first byte record the length of the repeated four-byte units and the last 4 bits record the length of the new four-byte units. Starting from the second byte, the remaining length of the new four-byte units is recorded, followed by the new four-byte units themselves. The remaining length of the repeated four-byte units in the memory page and the back-reference distance are then recorded. Encoding is simple and decoding is fast.

Description

Dictionary-based memory page compression method
Technical field
The invention belongs to the technical field of data processing and relates to a method for compressing data in device memory. Based on the characteristics of in-memory data, the invention adopts a new compressed data format to increase compression speed, and can be used in embedded mobile devices with limited memory.
Background art
In recent years, with the development of the mobile Internet, mobile devices have become an indispensable means of communication. Because the memory of a mobile device is limited, compressing its in-memory data frees memory space and can improve the overall performance of the device. As the amount of information in modern society keeps growing, people also demand more from computer systems: higher speed, lower power consumption, smaller size, and access to more information. Various improvements have been proposed to meet these requirements, and data compression is one of the more economical ones.
In 1977 Lempel and Ziv proposed an efficient lossless compression technique, LZ77. Its main principle is to replace a repeated string with a shorter token that refers to an earlier occurrence; the token format is (repeat length, back-reference distance). For example, abcdekabcdeha can be encoded as abcdek(5,6)ha. Overall, shorter information replaces longer information, which achieves compression. In 1982 James Storer and Thomas Szymanski improved on LZ77 to raise the compression ratio and proposed the LZSS algorithm. The Lempel-Ziv-Oberhumer (LZO) algorithm later improved on LZSS to raise compression speed. LZO is a dictionary-based lossless data compression algorithm characterized by fast, real-time compression. It defines five compressed formats according to the number of repeated characters and the back-reference distance, and distinguishes the five formats by the value of the first byte of the compressed format. Its main steps are: (1) read the in-memory data of the mobile device and the length of that data; (2) determine whether the data read is new: if it is not recorded in the dictionary, it is new data and is entered into the dictionary, and reading continues until no new data occurs; (3) if the data read is already recorded in the dictionary, encode it according to the length of the repeated data and the back-reference distance; (4) determine whether the encoding position is the end of the in-memory data: if so, output the compressed data and its length and record an end flag; otherwise return to step (2) and continue reading new data.
The shortcomings of this method are as follows. 32-bit systems are currently the mainstream, and because of memory alignment the vast majority of data in memory is written in units of 4 bytes, whereas LZO works in units of one byte; it is therefore not fully suited to compressing in-memory data and spends more time doing so. In addition, LZO was originally designed to compress data of arbitrary length, so its compressed format is not well suited to memory pages of 4K size.
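The LZ77 example in the background can be checked with a few lines of Python. This is only an illustrative sketch: the function name `lz77_decode` and the token representation (a literal string, or a `(length, distance)` pair) are choices made here, not part of the patent.

```python
def lz77_decode(tokens):
    """Expand a list of tokens into a string.

    A token is either a literal string or a (length, distance) pair that
    copies `length` characters starting `distance` characters back.
    """
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)
        else:
            length, distance = tok
            start = len(out) - distance
            for i in range(length):
                # Copy one character at a time so overlapping copies work.
                out.append(out[start + i])
    return "".join(out)


# The example from the text: abcdek(5,6)ha
decoded = lz77_decode(["abcdek", (5, 6), "ha"])
print(decoded)
```

The pair (5, 6) copies five characters starting six positions back, which reproduces the second occurrence of "abcde".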
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art by proposing a dictionary-based memory page compression method that compresses and decompresses memory pages faster, thereby reducing the latency of memory data access.
The technical solution of the invention is as follows: according to the data characteristics of memory pages, design a new hash function and a compressed format for memory pages (the decompressed format is identical), and compress and decompress memory pages using four bytes as the basic unit. The specific steps are:
(1) Read the in-memory data of the mobile device and the length of that data;
(2) Determine whether the data read is new: if it is not recorded in the dictionary, treat it as new data, enter it into the dictionary, and continue reading in-memory data until no new data occurs;
(3) If the data read is already recorded in the dictionary, compress and decompress it using the new compressed format;
(4) Determine whether encoding has reached the end of the in-memory data: if so, output the compressed data and its length and record an end flag; otherwise return to step (2) and continue reading new data.
The dictionary in step (2) is a hash table accessed directly by key value. The key value is computed by a hash function designed as follows: read four bytes from the input data stream, XOR the first two bytes to obtain a new byte A, XOR the last two bytes to obtain a new byte B, and XOR the low-order 2 bits of A with the high-order 2 bits of B to obtain a 14-bit key value.
The new compressed format in step (3) encodes and decodes memory pages using four bytes as the basic unit. Its layout is:
1) The first 4 bits of the first byte record the length of the repeated four-byte units, and the last 4 bits record the length of the new four-byte units;
2) Starting from the second byte, the remaining length of the new four-byte units is recorded, followed by the new four-byte units themselves;
3) After the new four-byte units in step 2) have been recorded, the remaining length of the repeated four-byte units in the memory page and the back-reference distance are recorded. The back-reference distance is the distance between the position of the current repeated four-byte unit and the position of its previous occurrence recorded in the hash table.
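The hash function is described only in words, so the following Python sketch shows one possible reading, not the patent's reference implementation. It assumes that byte A is shifted left by 6 bits and XORed with byte B, so that the low 2 bits of A overlap the high 2 bits of B and the result occupies 14 bits. The name `hash_key` is hypothetical.

```python
def hash_key(unit: bytes) -> int:
    """Map a four-byte unit to a 14-bit key (one reading of the text).

    Assumption: A is placed in bits 6..13 and B in bits 0..7, so bits 6
    and 7 hold (low 2 bits of A) XOR (high 2 bits of B).
    """
    if len(unit) != 4:
        raise ValueError("expected exactly four bytes")
    a = unit[0] ^ unit[1]  # new byte A from the first two bytes
    b = unit[2] ^ unit[3]  # new byte B from the last two bytes
    return ((a << 6) ^ b) & 0x3FFF  # keep 14 bits


# A hash table addressed directly by the key needs 2**14 slots.
table = [None] * (1 << 14)
table[hash_key(b"ABCD")] = 0  # remember that "ABCD" was seen at offset 0
print(hash_key(b"ABCD"))
```

Under this reading a direct-access table needs 16384 entries, which is consistent with the 14-bit key stated in the text.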
In the invention, the compression encoding process for a memory page is as follows:
1.1) Record the length of the new four-byte units in the last 4 bits of the first byte. If this length is greater than 14, set the last 4 bits of the first byte to the marker value 15 and record the remaining length starting from the second byte: while the remaining length is greater than 255, record a byte 0 and subtract 255 from the length; once the remaining length no longer exceeds 255, record it;
1.2) After the length in step 1.1) has been recorded, record the new four-byte units;
1.3) Record the length of the repeated four-byte units in the first 4 bits of the first byte. If this length is greater than 14, set the first 4 bits of the first byte to the marker value 15 and then record the remaining length: while the remaining length is greater than 255, record a byte 0 and subtract 255 from the length; once the remaining length no longer exceeds 255, record it;
1.4) After step 1.3) is complete, record the back-reference distance of the repeated four-byte units.
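The length rule in steps 1.1) and 1.3) can be sketched as a small Python helper. This is an illustration under stated assumptions: lengths are counted in four-byte units, and the "remaining length" is taken to be the length minus 14, which is inferred from the decompression description (it adds 14 when it sees the marker) rather than stated in the encoding steps. The name `encode_length` is hypothetical.

```python
def encode_length(length: int):
    """Return (nibble, extension_bytes) for a length in four-byte units.

    Lengths 0..14 fit in the 4-bit field. Longer lengths use the marker 15
    followed by zero bytes (each worth 255) and one final non-zero byte.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length <= 14:
        return length, b""
    ext = bytearray()
    remaining = length - 14  # assumption: the decoder adds 14 back
    while remaining > 255:
        ext.append(0)        # a byte 0 stands for 255
        remaining -= 255
    ext.append(remaining)    # final byte is in 1..255
    return 15, bytes(ext)


for n in (3, 14, 15, 270, 1024):
    print(n, encode_length(n))
```

A 4K page holds 1024 four-byte units, so under these assumptions the longest possible length field is the marker plus four extension bytes.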
In the invention, the decompression process for a memory page is as follows:
2.1) Read the first byte of the compressed format and examine its last 4 bits. If the value is less than 15, it is the length of the new four-byte units; output the new four-byte units. If the value equals 15, add 14 to the length of the new four-byte units and read on from the second byte: for each byte that is 0, add 255 to the length; when a non-zero byte is read, add it to the length and output the new four-byte units;
2.2) Examine the first 4 bits of the first byte from step 2.1). If the value is less than 15, it is the length of the repeated four-byte units. If the value equals 15, add 14 to the length of the repeated four-byte units and continue reading: for each byte that is 0, add 255 to the length; when a non-zero byte is read, add it to the length;
2.3) Read the last byte of the compressed format, which is the back-reference distance of the repeated four-byte units, and output the repeated four-byte units according to their length.
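The length-reading rule in steps 2.1) and 2.2) can be sketched as follows. The function name `decode_length` and its calling convention (a 4-bit field value, the compressed buffer, and the position of the first extension byte) are assumptions made for this illustration.

```python
def decode_length(nibble: int, data: bytes, pos: int):
    """Return (length, next_pos) for a 4-bit length field.

    `pos` is the index of the first extension byte; it is only consumed
    when the field holds the marker value 15.
    """
    if nibble < 15:
        return nibble, pos
    length = 14              # the marker means "at least 15": start from 14
    while data[pos] == 0:
        length += 255        # each zero byte is worth 255
        pos += 1
    length += data[pos]      # the first non-zero byte ends the field
    return length, pos + 1


print(decode_length(7, b"", 0))
print(decode_length(15, b"\x00\x00\x00\xf5", 0))
```

With four-byte units, the second call recovers 1024, the number of units in a 4K page.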
Compared with the prior art, the invention has the following advantages:
Compared with the current LZO lossless compression algorithm, the new compressed format of the invention is simple, and memory page data is compressed and decompressed faster while the compression ratio remains roughly comparable. This can significantly improve the operating efficiency of mobile devices; test results also show that both compression time and decompression time improve by 60%.
Brief description of the drawings
Fig. 1 shows the compression and decompression format of the invention;
Fig. 2 is the compression flowchart of the invention;
Fig. 3 is the decompression flowchart of the invention.
Detailed description of the embodiments
The compression and decompression format of the invention is described in further detail below with reference to Fig. 1:
1) The first 4 bits of the first byte record the length of the repeated four-byte units, and the last 4 bits record the length of the new four-byte units;
2) If the length of the new four-byte units is greater than 14, the last 4 bits of the first byte are set to the marker value 15, and the remaining length is recorded starting from the second byte. While the remaining length is greater than 255, a byte 0 is recorded and 255 is subtracted from the length; once the remaining length no longer exceeds 255, it is recorded. After the length has been recorded, the new four-byte units are recorded;
3) If the length of the repeated four-byte units is less than or equal to 14, it is represented by the first 4 bits of the first byte from step 1). If it is greater than 14, the first 4 bits of the first byte are set to the marker value 15 and the remaining length is then recorded: while the remaining length is greater than 255, a byte 0 is recorded and 255 is subtracted from the length; once the remaining length no longer exceeds 255, it is recorded;
4) After step 3) is complete, the back-reference distance of the repeated four-byte units is recorded.
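Putting items 1) to 4) together, one compressed sequence could be laid out as in the sketch below. Several details are assumptions made here: "first 4 bits" is read as the high nibble, lengths are counted in four-byte units, the extended length stores the length minus 14, and the back-reference distance is a single byte counted in four-byte units (the decompression description reads it from the last byte of the sequence, but does not state its unit). The names `length_field` and `pack_sequence` are hypothetical.

```python
def length_field(length: int):
    """Return (nibble, extension_bytes) for a length in four-byte units."""
    if length <= 14:
        return length, b""
    ext = bytearray()
    remaining = length - 14
    while remaining > 255:
        ext.append(0)
        remaining -= 255
    ext.append(remaining)
    return 15, bytes(ext)


def pack_sequence(new_data: bytes, repeat_len: int, distance: int) -> bytes:
    """Lay out one sequence: header, new length, new data, repeat length, distance."""
    if len(new_data) % 4:
        raise ValueError("new data must be a whole number of four-byte units")
    if not 0 <= distance <= 255:
        raise ValueError("distance must fit in one byte in this sketch")
    new_nib, new_ext = length_field(len(new_data) // 4)
    rep_nib, rep_ext = length_field(repeat_len)
    first = (rep_nib << 4) | new_nib  # assumption: first 4 bits = high nibble
    return bytes([first]) + new_ext + new_data + rep_ext + bytes([distance])


packed = pack_sequence(b"ABCDEFGH", 3, 2)
print(packed.hex())
```

In the example, two new units and three repeated units both fit in the 4-bit fields, so the sequence costs one header byte, eight data bytes, and one distance byte.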
The compression encoding process of the invention is described in further detail below with reference to Fig. 2:
Step 1: Read four bytes from the input data stream, perform the first hash operation, and go to step 2;
Step 2: Determine whether the position of the four bytes is valid. If it is valid, go to step 3; if not, update the hash table and return to step 1;
Step 3: Determine whether the data at the position stored in the hash table is identical to the four bytes read. If identical, go to step 6; if different, go to step 4;
Step 4: Perform the second hash operation and determine whether the four-byte position is valid. If it is valid, go to step 5; if not, update the hash table and return to step 1;
Step 5: Determine whether the data at the position stored in the hash table is identical to the four bytes read. If identical, go to step 6; if different, update the hash table and return to step 1;
Step 6: Compute the length of the new four-byte units and determine whether it is greater than 14. If so, go to step 7; otherwise record it directly in the first byte and go to step 8;
Step 7: Determine whether the length of the new four-byte units is greater than 255. If so, record a byte 0 and subtract 255 from the length, repeating until the length no longer exceeds 255; finally record the remaining length of the new four-byte units and go to step 8;
Step 8: Record the data of the new four-byte units and go to step 9;
Step 9: Compute the number of repeated four-byte units and determine whether it is greater than 14. If so, go to step 10; otherwise record it directly in the first byte and go to step 10;
Step 10: Determine whether the length of the repeated four-byte units is greater than 255. If so, record a byte 0 and subtract 255 from the length, repeating until the length no longer exceeds 255; finally record the remaining length of the repeated four-byte units and go to step 11;
Step 11: Compute and record the back-reference distance. Determine whether encoding has reached the end: if so, record the remaining new four-byte units and output the encoded length; otherwise go to step 1.
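The dictionary lookup in steps 1 to 5 can be illustrated with a simplified scan. This is a sketch under assumptions, not the patented procedure: it uses only one hash probe (the second hash operation is not specified in enough detail to reproduce), it reuses one possible reading of the hash function, and it reports distances in four-byte units. The names `hash_key` and `scan_units` are hypothetical.

```python
def hash_key(unit: bytes) -> int:
    """One reading of the 14-bit hash: (A << 6) XOR B."""
    a = unit[0] ^ unit[1]
    b = unit[2] ^ unit[3]
    return ((a << 6) ^ b) & 0x3FFF


def scan_units(page: bytes):
    """Classify each four-byte unit as new or repeated.

    Returns a list of (kind, offset, distance_in_units) tuples. The table
    stores the last offset seen for each key; a hit only counts as a
    repeat if the bytes at that offset really match (steps 3 and 5).
    """
    table = [-1] * (1 << 14)
    events = []
    for pos in range(0, len(page) - 3, 4):
        unit = page[pos:pos + 4]
        key = hash_key(unit)
        prev = table[key]
        if prev >= 0 and page[prev:prev + 4] == unit:
            events.append(("repeat", pos, (pos - prev) // 4))
        else:
            events.append(("new", pos, 0))
        table[key] = pos  # update the hash table with the latest position
    return events


for event in scan_units(b"ABCDWXYZABCDABCD"):
    print(event)
```

A real encoder would then group consecutive new units and consecutive repeated units into runs and emit them in the compressed format; that grouping is omitted here to keep the lookup logic visible.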
The decompression process of the invention is described in further detail below with reference to Fig. 3:
Step 1: Read one byte from the input data stream and determine whether its last 4 bits equal 15. If so, go to step 2; otherwise the value of the last 4 bits is the length of the new four-byte units; go to step 5;
Step 2: Add 14 to the length of the new four-byte units;
Step 3: Determine whether the next byte is 0. If so, add 255 to the length of the new four-byte units, repeating until a non-zero byte is read; then go to step 4;
Step 4: Add the remaining length of the new four-byte units to the length, and go to step 5;
Step 5: Write the new four-byte units according to their length, and go to step 6;
Step 6: Determine whether the first 4 bits of the first byte read equal 15. If so, go to step 7; otherwise the value of the first 4 bits is the length of the repeated four-byte units; go to step 10;
Step 7: Add 14 to the length of the repeated four-byte units;
Step 8: Determine whether the next byte is 0. If so, add 255 to the length of the repeated four-byte units, repeating until a non-zero byte is read; then go to step 9;
Step 9: Add the remaining length of the repeated four-byte units to the length, and go to step 10;
Step 10: Compute the back-reference distance, write the repeated four-byte units according to their length, and go to step 11;
Step 11: Determine whether the end of the encoded data has been reached. If so, output the decoded length; otherwise go to step 1.
Step 12: If the output equals the page size, decoding was successful; otherwise output an error.
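Steps 10 and 12 can be sketched as two small helpers. The sketch assumes that both the length and the back-reference distance are counted in four-byte units and that a page is 4096 bytes; the names `copy_repeated_units`, `decoded_ok`, and `PAGE_SIZE` are hypothetical.

```python
PAGE_SIZE = 4096  # the 4K memory page discussed in the text


def copy_repeated_units(out: bytearray, length_units: int, distance_units: int) -> None:
    """Append `length_units` four-byte units copied from `distance_units` back.

    Bytes are copied one at a time so a copy may overlap its own output,
    as in other LZ77-style decoders.
    """
    if distance_units <= 0 or distance_units * 4 > len(out):
        raise ValueError("back-reference points outside the decoded data")
    src = len(out) - distance_units * 4
    for i in range(length_units * 4):
        out.append(out[src + i])


def decoded_ok(out: bytearray) -> bool:
    """Step 12: decoding is considered successful if a full page was produced."""
    return len(out) == PAGE_SIZE


page = bytearray(b"ABCD")
copy_repeated_units(page, 1023, 1)  # repeat the single unit to fill the page
print(len(page), decoded_ok(page))
```

The example fills a whole page from a single four-byte unit, which is the kind of highly repetitive content that a repeat-length field counted in four-byte units represents compactly.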
The effect of the invention is further described below with reference to the following table:
In this experiment, the proposed compression method was implemented in C. The compression results of the invention and of the conventional LZO dictionary method on memory data pages are compared to show the speed advantage of the proposed method. LZO is regarded here as the best lossless compression method currently available. The test data are 4K memory data pages from a typical mobile device, and the tests were run in the VS2010 development environment. The results are:
Table 1 (supplied as image BDA0000427671130000051 in the original publication; not reproduced here)
The test data is a package of memory pages with a total size of 256M. The times in the table are the compression and decompression times for all memory pages in the package, averaged over 100 runs. As the table shows, both compression time and decompression time improved by 60%, meeting the project target, while the loss in compression ratio is 5.12%. In concrete terms, LZO compresses the package to about 96M, whereas the invention compresses it to about 109M. For fast access to in-memory data, giving up about 10M of compressed space in exchange for a roughly one-fold improvement in compression and decompression time is therefore worthwhile.
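The size figures quoted above can be related to the reported ratio loss with a short calculation. The sketch uses only the rounded sizes given in the text (256M, about 96M, about 109M), so its result is approximate and is not expected to match the reported 5.12% exactly; the variable names are chosen here for illustration.

```python
total_mb = 256     # size of the test package
lzo_mb = 96        # approximate compressed size with LZO
proposed_mb = 109  # approximate compressed size with the proposed method

lzo_ratio = lzo_mb / total_mb
proposed_ratio = proposed_mb / total_mb

# Difference between the two ratios, in percentage points.
loss_points = (proposed_ratio - lzo_ratio) * 100
extra_mb = proposed_mb - lzo_mb

print(f"LZO ratio:      {lzo_ratio:.1%}")
print(f"Proposed ratio: {proposed_ratio:.1%}")
print(f"Ratio loss:     {loss_points:.2f} points ({extra_mb}M extra)")
```

With these rounded inputs the difference comes out close to the reported loss; the small gap is presumably due to the "about" in the quoted sizes.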

Claims (3)

1. A dictionary-based memory page compression method, in which a new hash function and a compressed format for memory pages are designed and memory pages are compressed and decompressed using four bytes as the basic unit, the specific steps being as follows:
(1) Read the in-memory data of the mobile device and the length of that data;
(2) Determine whether the data read is new: if it is not recorded in the dictionary, treat it as new data, enter it into the dictionary, and continue reading in-memory data until no new data occurs;
(3) If the data read is already recorded in the dictionary, compress and decompress it using the new compressed format;
(4) Determine whether the encoding position is the end of the in-memory data: if so, output the compressed data and its length and record an end flag; otherwise return to step (2) and continue reading new data;
The dictionary in step (2) is a hash table accessed directly by key value. The key value is computed by a hash function designed as follows: read four bytes from the input data stream, XOR the first two bytes to obtain a new byte A, XOR the last two bytes to obtain a new byte B, and XOR the low-order 2 bits of A with the high-order 2 bits of B to obtain a 14-bit key value;
The new compressed format in step (3) encodes and decodes memory pages using four bytes as the basic unit. Its layout is:
1) The first 4 bits of the first byte record the length of the repeated four-byte units, and the last 4 bits record the length of the new four-byte units;
2) Starting from the second byte, the remaining length of the new four-byte units is recorded, followed by the new four-byte units themselves;
3) After the new four-byte units in step 2) have been recorded, the remaining length of the repeated four-byte units in the memory page and the back-reference distance are recorded.
2. The dictionary-based memory page compression method according to claim 1, characterized in that the compression encoding process for a memory page is as follows:
2.1) Record the length of the new four-byte units in the last 4 bits of the first byte. If this length is greater than 14, set the last 4 bits of the first byte to the marker value 15 and record the remaining length starting from the second byte: while the remaining length is greater than 255, record a byte 0 and subtract 255 from the length; once the remaining length no longer exceeds 255, record it;
2.2) After the length in step 2.1) has been recorded, record the new four-byte units;
2.3) Record the length of the repeated four-byte units in the first 4 bits of the first byte. If this length is greater than 14, set the first 4 bits of the first byte to the marker value 15 and then record the remaining length: while the remaining length is greater than 255, record a byte 0 and subtract 255 from the length; once the remaining length no longer exceeds 255, record it;
2.4) After step 2.3) is complete, record the back-reference distance of the repeated four-byte units.
3. The dictionary-based memory page compression method according to claim 1, characterized in that the decompression process for a memory page is as follows:
3.1) Read the first byte of the compressed format and examine its last 4 bits. If the value is less than 15, it is the length of the new four-byte units; output the new four-byte units. If the value equals 15, add 14 to the length of the new four-byte units and read on from the second byte: for each byte that is 0, add 255 to the length; when a non-zero byte is read, add it to the length and output the new four-byte units;
3.2) Examine the first 4 bits of the first byte from step 3.1). If the value is less than 15, it is the length of the repeated four-byte units. If the value equals 15, add 14 to the length of the repeated four-byte units and continue reading: for each byte that is 0, add 255 to the length; when a non-zero byte is read, add it to the length;
3.3) Read the last byte of the compressed format, which is the back-reference distance of the repeated four-byte units, and output the repeated four-byte units according to their length.
CN201310643898.XA 2013-12-01 2013-12-01 Memory pages compression method based on dictionary Expired - Fee Related CN103618554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310643898.XA CN103618554B (en) 2013-12-01 2013-12-01 Memory pages compression method based on dictionary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310643898.XA CN103618554B (en) 2013-12-01 2013-12-01 Memory pages compression method based on dictionary

Publications (2)

Publication Number Publication Date
CN103618554A true CN103618554A (en) 2014-03-05
CN103618554B CN103618554B (en) 2016-07-06

Family

ID=50169258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310643898.XA Expired - Fee Related CN103618554B (en) 2013-12-01 2013-12-01 Memory pages compression method based on dictionary

Country Status (1)

Country Link
CN (1) CN103618554B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5627995A (en) * 1990-12-14 1997-05-06 Alfred P. Gnadinger Data compression and decompression using memory spaces of more than one size
CN103258030A (en) * 2013-05-09 2013-08-21 西安电子科技大学 Mobile device memory compression method based on dictionary encoding and run-length encoding

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
IWATA K,ARIMURA M,SHIMA Y: "An Improvement in Lossless Data Compression Via Substring Enumeration", 《COMPUTER AND INFORMATION SCIENCE,2011,IEEE/ACIS 10TH INTERNATIONAL CONFERENCE》, 18 May 2011 (2011-05-18), pages 219 - 223, XP032024581, DOI: doi:10.1109/ICIS.2011.41 *
KREFT S,NAVARRO G: "Self-indexing based on LZ77", 《COMBINATORIAL PATTERN MATCHING 22ND ANNUAL SYMPOSIUM CPM 2011 PALERMO,ITALY,PROCEEDINGS》, 29 June 2011 (2011-06-29), pages 41 - 54, XP047024427 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104410424A (en) * 2014-11-26 2015-03-11 西安电子科技大学 Quick lossless compression method of memory data of embedded device
CN104410424B (en) * 2014-11-26 2017-06-16 西安电子科技大学 The fast and lossless compression method of embedded device internal storage data
CN104378119A (en) * 2014-12-09 2015-02-25 西安电子科技大学 Quick lossless compression method for file system data of embedded equipment
CN104378119B (en) * 2014-12-09 2017-06-13 西安电子科技大学 The fast and lossless compression method of file system of embedded device data
CN106533450A (en) * 2016-11-14 2017-03-22 国网北京市电力公司 PMS coding compression method and device
CN106533450B (en) * 2016-11-14 2019-05-24 国网北京市电力公司 PMS code compression method and device
CN108011952A (en) * 2017-12-01 2018-05-08 北京奇艺世纪科技有限公司 A kind of method and apparatus for obtaining compression dictionary

Also Published As

Publication number Publication date
CN103618554B (en) 2016-07-06

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160706

Termination date: 20211201