CN108829872B

CN108829872B - Method, device, system and storage medium for rapidly processing lossless compressed file

Info

Publication number: CN108829872B
Application number: CN201810657224.8A
Authority: CN
Inventors: 王防修
Original assignee: Wuhan Polytechnic University
Current assignee: Wuhan Polytechnic University
Priority date: 2018-06-22
Filing date: 2018-06-22
Publication date: 2021-03-09
Anticipated expiration: 2038-06-22
Also published as: CN108829872A

Abstract

The invention discloses a method, device, system and storage medium for rapidly processing a lossless compressed file. A processing device obtains all characters to be processed in a source file to be compressed and the code corresponding to each character, and establishes a mapping relation between the characters and their codes. Using this mapping relation, it replaces each character in the source file with its corresponding code, which completes the encoding of the source file. Because a one-to-one mapping is established between characters and codes, each character is replaced by its code directly.

Description

Method, device, system and storage medium for rapidly processing lossless compressed file

Technical Field

The present invention relates to the field of file compression technologies, and in particular, to a method, an apparatus, a system, and a storage medium for fast processing a lossless compressed file.

Background

To improve the utilization efficiency of external storage, stored data files often need to be compressed. With lossy compression, the complete original information cannot be restored after decompression. For important information, lossless compression must therefore be used, so that the decompressed information is identical to the information before compression. First, only files that contain redundancy can be losslessly compressed. Second, when the same source file is compressed, different encoding methods yield different compression ratios. However, if encoding during compression is too slow, the user must wait a long time for the file to be compressed; likewise, if decompression is too slow, the user also waits too long. It is therefore important to research methods that increase the compression and decompression speed of files.

With the computer's software and hardware environment unchanged, improving the encoding speed during compression requires a fast code word lookup method, and improving the decompression speed of a compressed file requires a faster character lookup method.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The main purpose of the invention is to provide a method, device, system and storage medium for rapidly processing a lossless compressed file, so as to solve the problem of slow encoding and decoding during file compression and decompression in the prior art.

In order to achieve the above object, the present invention provides a method for rapidly processing a lossless compressed file, the method comprising the steps of:

acquiring all characters to be processed of a source file to be compressed and codes corresponding to the characters to be processed;

establishing a mapping relation between the character to be processed and the code corresponding to the character to be processed;

respectively replacing, according to the mapping relation, the characters to be processed in the source file to be compressed with their corresponding codes, thereby completing the encoding of the source file to be compressed.

Preferably, after the characters to be processed in the source file to be compressed have been replaced with their corresponding codes according to the mapping relation and the encoding of the source file has been completed, the method further includes:

acquiring a compressed file of the source file to be compressed;

establishing a binary tree based on all codes to be processed of the compressed file and a plurality of characters corresponding to the codes to be processed;

traversing the binary tree to obtain the character corresponding to each code to be processed in the compressed file, thereby completing the decoding of the compressed file.

Preferably, the acquiring all characters to be processed of the source file to be compressed and the codes corresponding to the characters to be processed specifically includes:

acquiring all characters to be processed of the source file to be compressed, codes corresponding to the characters to be processed and positions of the characters to be processed in the source file to be compressed;

correspondingly, the establishing a mapping relationship between the character to be processed and the code corresponding to the character to be processed specifically includes:

establishing, according to the position of each character to be processed in the source file to be compressed, a mapping relation between the character and its corresponding code.

Preferably, replacing the characters to be processed in the source file to be compressed with their corresponding codes according to the mapping relation, to complete the encoding of the source file to be compressed, specifically includes:

reading a current character from the source file to be compressed;

replacing the current character with the code corresponding to the current character in the mapping relation, and judging whether the current character is the last character in the source file to be compressed;

when the current character is the last character in the source file to be compressed, finishing the encoding of the source file to be compressed;

when the current character is not the last character in the source file to be compressed, reading the next character from the source file;

repeating the steps of replacing the current character with its code in the mapping relation and judging whether the current character is the last character in the source file to be compressed, until the encoding of the source file to be compressed is finished.

Preferably, after the mapping relationship is established between the character to be processed and the code corresponding to the character to be processed, the method further includes:

storing the mapping relation in a memory.

Preferably, after the obtaining of the compressed file of the source file to be compressed, the method further includes:

acquiring characters corresponding to the codes to be processed based on the codes to be processed in the compressed file;

establishing a code table according to the code to be processed and the character corresponding to the code to be processed;

correspondingly, the establishing of the binary tree based on the characters corresponding to the codes to be processed in the compressed file specifically includes:

establishing a binary tree based on the code table.

Preferably, traversing the binary tree to obtain the characters corresponding to the codes to be processed in the compressed file specifically includes:

reading a current code from the compressed file;

traversing the binary tree, searching for a character corresponding to the current code, and judging whether the current code is the last code in the compressed file;

when the current code is the last code in the compressed file, finishing decoding the compressed file;

reading a next code from the compressed file when the current code is not the last code in the compressed file;

repeatedly executing the steps of traversing the binary tree, searching for the character corresponding to the current code, and judging whether the current code is the last code in the compressed file, until the characters corresponding to all codes in the compressed file have been found.

Further, to achieve the above object, the present invention also provides a device for rapidly processing a lossless compressed file, comprising a memory, a processor, and a fast processing program for lossless compressed files that is stored in the memory and executable on the processor, the program being configured to implement the steps of the method for rapidly processing a lossless compressed file described above.

In addition, to achieve the above object, the present invention provides a system for rapidly processing a lossless compressed file, comprising an acquisition module, an establishing module and a replacing module;

the acquisition module is used for acquiring all characters to be processed of a source file to be compressed and codes corresponding to the characters to be processed;

the establishing module is used for establishing a mapping relation between the character to be processed and the code corresponding to the character to be processed;

the replacing module is used for replacing, according to the mapping relation, the characters to be processed in the source file to be compressed with their corresponding codes, thereby completing the encoding of the source file to be compressed.

In addition, to achieve the above object, the present invention further provides a storage medium storing a fast processing program for lossless compressed files which, when executed by a processor, implements the steps of the method for rapidly processing a lossless compressed file described above.

In the invention, a processing device acquires all characters to be processed in a source file to be compressed and their corresponding codes, establishes a mapping relation between the characters and their codes, and replaces each character in the source file with its code according to the mapping relation, thereby completing the encoding of the source file. Because a one-to-one mapping is established between characters and codes, each character is replaced by its code directly, and the code corresponding to a character does not have to be searched for during compression. This saves a large amount of character comparison time and effectively improves the processing speed of file processing.

Drawings

FIG. 1 is a schematic diagram of a fast processing device for lossless compression of files in a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a first embodiment of a method for fast processing a lossless compressed file according to the present invention;

FIG. 3 is a flowchart illustrating a second embodiment of a method for fast processing of lossless compressed files according to the present invention;

FIG. 4 is a functional block diagram of a first embodiment of a system for fast processing of losslessly compressed files according to the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to fig. 1, fig. 1 is a schematic structural diagram of a fast processing device for lossless compression of files in a hardware operating environment according to an embodiment of the present invention.

As shown in fig. 1, the apparatus for rapidly processing a lossless compressed file may include a processor 1001 such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 enables connection and communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory), and may optionally be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the architecture shown in FIG. 1 does not constitute a limitation of a fast processing apparatus for lossless compression of files, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.

As shown in fig. 1, the memory 1005, as a storage medium, may include an operating system, a network communication module, a user interface module and a fast processing program for lossless compressed files.

In the apparatus for rapidly processing a lossless compression file shown in fig. 1, the network interface 1004 is mainly used for data communication with an external network; the user interface 1003 is mainly used for receiving input instructions of a user; the apparatus for fast processing of lossless compression files calls a fast processing program of lossless compression files stored in the memory 1005 through the processor 1001, and performs the following operations:

Further, the processor 1001 may call the fast processing program for lossless compressed files stored in the memory 1005 to also perform the following operations:

acquiring a compressed file of the source file to be compressed;

reading a current character from the source file to be compressed;

storing the mapping relation in a memory;

establishing a binary tree based on the code table;

reading a current code from the compressed file;

With this scheme, the processing device acquires all characters to be processed in the source file to be compressed and their corresponding codes, establishes a one-to-one mapping between the characters and their codes, and replaces each character in the source file directly with its code according to the mapping relation, thereby encoding the source file. Because the code corresponding to a character no longer has to be searched for during compression, a large amount of character comparison time is saved and the processing speed is effectively improved.

Based on the above hardware structure, embodiments of the method for rapidly processing a lossless compressed file are proposed.

Referring to FIG. 2, FIG. 2 is a flow chart of a first embodiment of the method for fast processing a lossless compressed file according to the present invention.

In a first embodiment, the method for fast processing of lossless compressed files comprises the following steps:

S10: acquiring all characters to be processed of a source file to be compressed and the codes corresponding to the characters to be processed.

It can be understood that the source file can be compressed once all characters it contains and the number of occurrences of each character have been counted.

For example, suppose the source file contains $n$ distinct characters $C_i$ ($i = 1, 2, \dots, n$), and let $W_i$ be the number of occurrences of $C_i$ in the file. After the characters in the source file have been counted, the prefix code word $b_i$ corresponding to each character $C_i$ can also be obtained.
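A minimal sketch of this counting step in C, assuming (the source does not say so) that each character is a single byte, so that a 256-entry array indexed by byte value can hold every $W_i$. The name `count_chars` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not from the patent: counts W_i for every byte value.
 * counts[c] receives the number of occurrences of byte c in buf[0..len).
 * Returns n, the number of distinct characters C_i that actually occur. */
size_t count_chars(const unsigned char *buf, size_t len, unsigned long counts[256])
{
    size_t n = 0;
    for (int c = 0; c < 256; c++)
        counts[c] = 0;
    for (size_t k = 0; k < len; k++)
        counts[buf[k]]++;              /* one pass over the source file */
    for (int c = 0; c < 256; c++)
        if (counts[c] > 0)
            n++;                       /* C_i occurs at least once */
    return n;
}
```

For the string "abracadabra" this yields $n = 5$ with $W$ equal to 5, 2, 2, 1, 1 for a, b, r, c, d.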

S20: establishing a mapping relation between the characters to be processed and the codes corresponding to them.

In a specific implementation, when all characters to be processed of the source file to be compressed and their corresponding codes are obtained, the positions of these characters in the source file to be compressed can be obtained at the same time.

It is understood that, after the mapping relation is established, it may be stored in memory; although this occupies some storage space, it increases the encoding speed.
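The C sketch below illustrates one way to realize such a one-to-one mapping in memory. It assumes byte-sized characters and code words stored as strings of '0' and '1'; the names `code_pair`, `build_map` and `search_code` are hypothetical. Indexing `table` by the character itself returns its code word in constant time, whereas `search_code` shows the per-character search over the code list that the mapping avoids.

```c
#include <stddef.h>

/* Hypothetical layout, not from the patent: one (character, code word) pair
 * as produced by a Huffman, Shannon or similar prefix-code construction. */
struct code_pair {
    unsigned char ch;      /* character C_i */
    const char *bits;      /* code word b_i as a string of '0'/'1' */
};

/* Build the direct-index mapping: table[c] is the code word of byte c,
 * or NULL if c does not occur. Costs 256 bytes of pointers per table. */
void build_map(const struct code_pair *pairs, size_t n, const char *table[256])
{
    for (int c = 0; c < 256; c++)
        table[c] = NULL;
    for (size_t i = 0; i < n; i++)
        table[pairs[i].ch] = pairs[i].bits;
}

/* For contrast only: a linear search needs up to n comparisons per character. */
const char *search_code(const struct code_pair *pairs, size_t n, unsigned char ch)
{
    for (size_t i = 0; i < n; i++)
        if (pairs[i].ch == ch)
            return pairs[i].bits;
    return NULL;
}
```

The trade-off matches the text: the table occupies extra memory, but every lookup during encoding becomes a single array access.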

S30: respectively replacing, according to the mapping relation, the characters to be processed in the source file to be compressed with their corresponding codes, thereby completing the encoding of the source file to be compressed.

In a specific implementation, the first character is read from the source file to be compressed and replaced with its corresponding code in the mapping relation, and it is judged whether the current character is the last character in the source file. If it is, the encoding of the source file to be compressed, that is, the compression process, is complete. If it is not, the next character is read from the source file, and the steps of replacing the current character with its code in the mapping relation and judging whether it is the last character are repeated until the encoding of the source file to be compressed is finished.
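A minimal C sketch of this encoding loop follows. It assumes byte-sized characters, a 256-entry table whose entry for each character is its code word as a '0'/'1' string, and MSB-first bit packing; none of these details is specified in the source, and `encode` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical function, not from the patent.
 * Encodes src[0..len) by direct table lookup: the code word of each
 * character is fetched as table[c], with no search over the code list.
 * Bits are packed MSB-first into out, which the caller must zero-fill and
 * size large enough. Returns the number of bits written, or -1 if a
 * character has no code word. */
long encode(const unsigned char *src, size_t len, const char *table[256],
            unsigned char *out)
{
    long nbits = 0;
    for (size_t k = 0; k < len; k++) {          /* read the current character */
        const char *code = table[src[k]];       /* mapping-relation lookup */
        if (code == NULL)
            return -1;
        for (const char *b = code; *b; b++) {   /* append its code word */
            if (*b == '1')
                out[nbits / 8] |= (unsigned char)(0x80 >> (nbits % 8));
            nbits++;
        }
    }                                           /* stops after the last character */
    return nbits;
}
```

With code words a = 0, b = 10, c = 11, encoding "abc" produces the 5 bits 01011, stored as the byte 0x58.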

In this embodiment, the processing device obtains all characters to be processed in the source file to be compressed and their codes, establishes a one-to-one mapping between the characters and their codes, and replaces each character in the source file with its code according to the mapping relation, thereby encoding the source file. Because the code corresponding to a character does not have to be searched for during compression, a large amount of character comparison time is saved and the processing speed is effectively improved.

Further, as shown in fig. 3, a second embodiment of the method for fast processing a lossless compression file according to the present invention is proposed based on the first embodiment, and in this embodiment, after step S30, the method further includes:

S40: acquiring a compressed file of the source file to be compressed.

S50: establishing a binary tree based on all codes to be processed in the compressed file and the characters corresponding to those codes.

It is understood that after the compressed file is obtained, the codes to be processed in the compressed file and their corresponding characters can be obtained, and a code table can be established from them.

Because the codes of a lossless compressed file are all prefix codes, that is, no code in the code table is a prefix of any other code, a binary tree can be established from the codes in the code table.
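The prefix property can be checked with a simple pairwise comparison. The C sketch below is illustrative only; it assumes code words are given as '0'/'1' strings, and `is_prefix_free` is a hypothetical name. It also rejects duplicate code words, and its $O(n^2)$ cost is negligible for a table of at most 256 entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not from the patent.
 * Returns true if no code word in codes[0..n) is a prefix of (or equal to)
 * another one, i.e. the table has the prefix property that lets every
 * character sit on its own leaf of the decoding tree. */
bool is_prefix_free(const char *codes[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t li = strlen(codes[i]);
        for (size_t j = 0; j < n; j++) {
            if (i == j)
                continue;
            /* codes[i] is a prefix of codes[j] if it is no longer and matches */
            if (li <= strlen(codes[j]) && strncmp(codes[i], codes[j], li) == 0)
                return false;
        }
    }
    return true;
}
```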

In a specific implementation, the binary tree is established as follows:

(1) define the structure of the binary tree: typedef struct node1 *bintree; struct node1 { unsigned char ch; bintree lchild, rchild; };

(2) request a node t from the system and set t->lchild = t->rchild = NULL;

(3) let i = 1, pointing to the first code word in the code table;

(4) let j = 1, pointing to the first bit of code word $b_i$, whose length is $l_i$;

(5) let p = t, i.e., start the search from the root node;

(6) if $b_{ij} = 0$ and p->lchild != NULL, then p = p->lchild;

(7) if $b_{ij} = 0$ and p->lchild == NULL, request a new node q from the system and perform the following operations:

q->lchild = q->rchild = NULL; p->lchild = q; p = q;

(8) if $b_{ij} = 1$ and p->rchild != NULL, then p = p->rchild;

(9) if $b_{ij} = 1$ and p->rchild == NULL, request a new node q from the system and perform the following operations:

q->lchild = q->rchild = NULL; p->rchild = q; p = q;

(10) after exactly one of steps (6) to (9) has been applied to the current bit, if j < $l_i$, set j = j + 1 and return to step (6);

(11) set p->ch = $C_i$;

(12) if i < n, set i = i + 1 and return to step (4);

(13) the binary tree is complete.
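Steps (1) to (13) can be rendered in C roughly as follows. This is a sketch, not the author's code: it keeps the `node1`/`bintree` layout from step (1), assumes code words are supplied as '0'/'1' strings, and uses `malloc` to "request a node from the system". The helper names `new_node`, `free_tree` and `build_tree` are hypothetical. Selecting the 0 or 1 branch through a pointer to the child field makes the cases of steps (6) to (9) mutually exclusive automatically.

```c
#include <stdlib.h>

/* Node layout from step (1): ch holds a character on leaves,
 * lchild/rchild are the branches for bit 0 and bit 1. */
typedef struct node1 *bintree;
struct node1 {
    unsigned char ch;
    bintree lchild, rchild;
};

/* Hypothetical helper: allocate a node with both children NULL
 * (steps (2), (7), (9)). Returns NULL if allocation fails. */
static bintree new_node(void)
{
    bintree q = malloc(sizeof *q);
    if (q != NULL) {
        q->ch = 0;
        q->lchild = q->rchild = NULL;
    }
    return q;
}

/* Hypothetical helper: release the whole tree recursively. */
void free_tree(bintree p)
{
    if (p == NULL)
        return;
    free_tree(p->lchild);
    free_tree(p->rchild);
    free(p);
}

/* Hypothetical function: build the decoding tree from n code words, where
 * chars[i] is C_i and codes[i] is b_i. Returns the root t, or NULL on
 * allocation failure (the partial tree is freed). */
bintree build_tree(const unsigned char chars[], const char *codes[], size_t n)
{
    bintree t = new_node();                         /* step (2): root */
    if (t == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {                /* steps (3), (12) */
        bintree p = t;                              /* step (5) */
        for (const char *b = codes[i]; *b; b++) {   /* steps (4), (10) */
            /* '0' selects the left branch, anything else the right one */
            bintree *next = (*b == '0') ? &p->lchild : &p->rchild;
            if (*next == NULL) {                    /* steps (7), (9) */
                *next = new_node();
                if (*next == NULL) {
                    free_tree(t);
                    return NULL;
                }
            }
            p = *next;                              /* steps (6), (8) */
        }
        p->ch = chars[i];                           /* step (11) */
    }
    return t;                                       /* step (13) */
}
```

For the code table a = 0, b = 10, c = 11, the root's left child holds 'a' and its right child has the leaves 'b' (left) and 'c' (right).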

It will be appreciated that the binary tree can also be built recursively: given a pointer to the root node, a create function is called recursively to generate the binary tree automatically; before this, the node structure must be defined. Traversal of the binary tree likewise follows a recursive idea: if a node holds data, the root node and its child nodes are visited according to the traversal rule; if not, the function returns. The recursion ends when all data have been traversed.

Here, lchild and rchild denote the left and right branches of the binary tree, and ch is the data field in which a character is stored, so that the corresponding data are stored into the binary tree one by one; the operator -> accesses a member through a pointer. After the root node is established, p = t assigns t to p, so the search starts from the root node. If p->lchild != NULL, the node already exists, and p = p->lchild makes it the current node. If p->lchild == NULL, the node does not exist, so a new node q is requested from the system, initialized, and assigned to p. This is repeated for every bit of the code word, after which p->ch = $C_i$ stores the character in the node reached. Because of steps (6) and (8), every 0 bit of a code word leads into a left subtree and every 1 bit into a right subtree.

As can be seen from the above binary tree establishment procedure, all characters are located in the leaf nodes of the binary tree.
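One way to check this property is a recursive leaf count, in the spirit of the recursive traversal described above. In this hypothetical C sketch (`count_leaves` is not from the source), a tree built from a prefix-free table of $n$ distinct code words should have exactly $n$ leaves, because every code word ends at a node with no children and nodes are only created along code word paths.

```c
#include <stddef.h>

/* Node layout as in step (1) of the tree-building procedure. */
typedef struct node1 *bintree;
struct node1 {
    unsigned char ch;
    bintree lchild, rchild;
};

/* Hypothetical helper, not from the patent: recursively count the leaves,
 * i.e. the nodes that carry a character C_i. */
size_t count_leaves(const struct node1 *p)
{
    if (p == NULL)
        return 0;                                  /* empty branch */
    if (p->lchild == NULL && p->rchild == NULL)
        return 1;                                  /* leaf: one character */
    return count_leaves(p->lchild) + count_leaves(p->rchild);
}
```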

S60: traversing the binary tree to obtain the character corresponding to each code to be processed in the compressed file, thereby completing the decoding of the compressed file.

In a specific implementation, the current code is read from the compressed file, the binary tree is traversed to find the character corresponding to it, and it is judged whether the current code is the last code in the compressed file. If it is, decoding of the compressed file is finished; if it is not, the next code is read from the compressed file, and the steps of traversing the binary tree, finding the corresponding character and judging whether the code is the last one are repeated until the characters corresponding to all codes in the compressed file have been found.

The process of implementing the fast decoding of the compressed file by using the established binary tree is as follows:

(1) read one bit $b_t$ from the compressed file;

(2) if the end of the compressed file has been reached, jump to step (9);

(3) let p = t, i.e., start the search from the root node;

(4) if $b_t = 0$, then p = p->lchild;

(5) if $b_t = 1$, then p = p->rchild;

(6) repeat steps (1), (2), (4) and (5) until both p->lchild and p->rchild are NULL, i.e., until a leaf is reached;

(7) write the character p->ch into the decompressed file;

(8) returning to the step (1);

(9) the decompression process ends.
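A C sketch of this decoding loop follows. It is illustrative, not the author's implementation: it reuses the `node1` layout from the tree-building steps, assumes the compressed bits are packed MSB-first and that the caller knows the number of valid bits `nbits` (the source does not say how padding is handled), and `decode` is a hypothetical name. The pointer p is reset to the root only after a leaf has been reached and its character written.

```c
#include <stddef.h>

/* Node layout as in step (1) of the tree-building procedure. */
typedef struct node1 *bintree;
struct node1 {
    unsigned char ch;
    bintree lchild, rchild;
};

/* Hypothetical function, not from the patent.
 * Decodes nbits bits (packed MSB-first in in[]) by walking the tree rooted
 * at t, writing characters to out. Returns the number of characters written,
 * or (size_t)-1 if a bit leads to a missing branch (corrupt input).
 * Trailing bits that do not complete a code word are ignored. */
size_t decode(const struct node1 *t, const unsigned char *in, size_t nbits,
              unsigned char *out)
{
    size_t count = 0;
    const struct node1 *p = t;                     /* step (3): root */
    for (size_t k = 0; k < nbits; k++) {           /* steps (1), (2) */
        int bit = (in[k / 8] >> (7 - k % 8)) & 1;  /* next bit b_t */
        p = bit ? p->rchild : p->lchild;           /* steps (4), (5) */
        if (p == NULL)
            return (size_t)-1;
        if (p->lchild == NULL && p->rchild == NULL) {  /* step (6): leaf */
            out[count++] = p->ch;                  /* step (7) */
            p = t;                                 /* step (8): next code word */
        }
    }
    return count;                                  /* step (9) */
}
```

With the code table a = 0, b = 10, c = 11, the byte 0x58 with `nbits` = 5 (bits 01011) decodes to "abc"; each character costs exactly as many bit tests as its code word is long.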

As the tree-building process shows, every 0 bit of a code word leads into a left subtree and every 1 bit into a right subtree. Therefore, when searching for a character, bits are read from the compressed file one at a time: if the bit read is 0, the search proceeds into the left subtree, and if it is 1, into the right subtree.

Concretely, the root node t is assigned to p and the query starts from p. For each bit $b_t$ read from the compressed file, p is set to its left child if $b_t = 0$ and to its right child if $b_t = 1$. When both children of p are NULL, p is a leaf: its character p->ch is written into the decompressed file, and the search restarts from the root for the next code word until the end of the compressed file is reached.

As the decoding process shows, the number of comparisons needed to find a character equals the length of its code word, which reduces the search time.
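This observation can be made quantitative (a derivation added here for illustration, not given in the source). If character $C_i$ occurs $W_i$ times and its code word $b_i$ has length $l_i$, the total number of bit comparisons needed to decode the whole file is

$$T_{\text{decode}} = \sum_{i=1}^{n} W_i \, l_i ,$$

which is exactly the length in bits of the encoded data. For example, with hypothetical counts $W = (5, 2, 2)$ and code lengths $l = (1, 2, 2)$, decoding takes $5 \cdot 1 + 2 \cdot 2 + 2 \cdot 2 = 13$ comparisons.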

This embodiment provides a binary-tree-based method for decoding compressed files: a binary tree corresponding to the code table is built, and the character corresponding to each code word in the compressed file is found by traversing the nodes of the tree. This reduces the number of comparisons during the search and hence the time spent on decompression.

To verify the effect of the encoding and decoding methods provided by this patent, the compression and decompression of a specific file are used below as a test case.

The test environment for this lossless compression and decompression is: (1) software development environment: Windows 7, Microsoft Visual Studio 2008; (2) software operating environment: Windows 7; (3) hardware development environment: Dell Vostro 220 PC, Dual-Core CPU 2.70 GHz, 2 GB DDR3 SDRAM memory, 320 GB SATA (7200 RPM) hard disk; (4) hardware operating environment: Dell Vostro 220 PC; (5) programming language and version number: Microsoft Visual C++ 2008.

The source file to be compressed is a Word file of 162304 bytes.

The characters $C_i$ contained in the file and the number of occurrences $W_i$ of each character are shown in Table 1:

Table 1: Each character $C_i$ in the file and its number of occurrences $W_i$

Taking the compression of the same Word file with Huffman coding, Shannon coding and Fano coding as examples, the encoding times using the code word lookup method provided by the invention are 5296.532475 microseconds, 5317.198312 microseconds and 5344.752762 microseconds, respectively.

Similarly, when the same compressed files produced with Huffman, Shannon and Fano coding are decompressed, the decoding times using the binary-tree-based character lookup method provided by the invention are 9332.854102 microseconds, 9083.64842 microseconds and 9265.994041 microseconds, respectively. Other methods were also used for decompression, and the results show that they require more time.

Therefore, the one-to-one mapping between characters and code words effectively improves the compression speed of files, and the binary-tree character lookup likewise speeds up file decompression.

Referring to fig. 4, fig. 4 is a functional block diagram of a first embodiment of a system for rapidly processing a lossless compressed file according to the present invention, which is proposed based on a method for rapidly processing a lossless compressed file.

In this embodiment, the system for rapidly processing a lossless compressed file includes an acquisition module 10, an establishing module 20 and a replacing module 30.

The acquisition module 10 is configured to acquire all characters to be processed of a source file to be compressed and the codes corresponding to the characters to be processed.

The establishing module 20 is configured to establish a mapping relationship between the character to be processed and the code corresponding to the character to be processed.

The replacing module 30 is configured to replace the characters to be processed with the codes corresponding to the characters to be processed in the source file to be compressed in the mapping relationship, respectively, so as to complete the coding of the source file to be compressed.

In addition, an embodiment of the present invention further provides a storage medium, where the storage medium stores a fast processing program for a lossless compressed file, and when executed by a processor, the fast processing program for the lossless compressed file implements the following operations:

Further, the fast processing program for lossless compressed files, when executed by a processor, also performs the following operations:

acquiring a compressed file of the source file to be compressed;

reading a current character from the source file to be compressed;

storing the mapping relation in a memory;

establishing a binary tree based on the code table;

reading a current code from the compressed file;

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A method for fast processing of lossless compressed files, the method comprising the steps of:

replacing, according to the mapping relation, the characters to be processed in the source file to be compressed with their corresponding codes, respectively;

acquiring a compressed file of the source file to be compressed;

acquiring characters corresponding to the codes to be processed based on the codes to be processed in the compressed file; establishing an encoding table according to the code to be processed and the character corresponding to the code to be processed, and establishing a binary tree based on the encoding table;

reading a current code from the compressed file;

repeatedly executing the steps of traversing the binary tree, searching for the character corresponding to the current code, and judging whether the current code is the last code in the compressed file, until the characters corresponding to all codes in the compressed file have been found, thereby decoding the compressed file.

2. The method according to claim 1, wherein the obtaining of all characters to be processed of a source file to be compressed and codes corresponding to the characters to be processed specifically comprises:

3. The method according to claim 1, wherein the step of replacing the characters to be processed with the codes corresponding to the characters to be processed in the source file to be compressed in the mapping relationship to complete the encoding of the source file to be compressed includes:

reading a current character from the source file to be compressed;

4. The method of claim 1, wherein after the mapping relationship between the character to be processed and the code corresponding to the character to be processed is established, the method further comprises:

storing the mapping relation in a memory.

5. A fast processing apparatus for lossless compression of a file, comprising: a memory, a processor, and a fast processing program for lossless compressed files stored on the memory and executable on the processor, the fast processing program for lossless compressed files being configured to implement the steps of the method for fast processing of lossless compressed files according to any one of claims 1 to 4.

6. A system for fast processing of lossless compressed files, the system comprising an acquisition module, an establishing module and a replacing module;

the replacing module is used for replacing the characters to be processed with the codes corresponding to the characters to be processed in the source file to be compressed in the mapping relation respectively to finish the coding of the source file to be compressed;

the system for fast processing of lossless compressed files is configured to implement the steps of the method for fast processing of lossless compressed files according to any one of claims 1 to 4.

7. A storage medium, characterized in that the storage medium has stored thereon a fast processing program of lossless compression file, which when executed by a processor implements the steps of the method of fast processing of lossless compression file according to any one of claims 1 to 4.