CN113746487B - Data compression method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113746487B
Authority
CN
China
Prior art keywords
node
data
huffman tree
leaf
compressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110981416.6A
Other languages
Chinese (zh)
Other versions
CN113746487A (en
Inventor
秦臻
张永兴
徐江波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd filed Critical Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202110981416.6A priority Critical patent/CN113746487B/en
Publication of CN113746487A publication Critical patent/CN113746487A/en
Application granted granted Critical
Publication of CN113746487B publication Critical patent/CN113746487B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a data compression method, a data compression device, an electronic device and a computer-readable storage medium. The method comprises: acquiring data to be compressed, constructing a Huffman tree corresponding to the data to be compressed, and judging whether the maximum depth of the current Huffman tree is greater than a preset depth; if so, determining, in the current Huffman tree, a first leaf node whose layer number is smaller than the preset depth and closest to it, and one or more second leaf nodes with the deepest layer numbers; merging the first leaf node and the one or more second leaf nodes into a subtree, and merging the subtree into the position of the first leaf node in the current Huffman tree; and repeating the above steps until the maximum depth of the current Huffman tree is less than or equal to the preset depth, then compressing the data to be compressed using the current Huffman tree to obtain compressed data. The method ensures that the maximum depth of the generated Huffman tree does not exceed the preset depth, thereby improving data compression efficiency.

Description

Data compression method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technology, and more particularly, to a data compression method and apparatus, and an electronic device and a computer readable storage medium.
Background
In recent years, with the continuous development of science and technology, the volume of data has grown explosively. Data compression offers a way to improve computer performance from a new angle, so research on data compression is of practical significance.
Dynamic Huffman coding performs variable-length coding by counting the number of times each character occurs, constructing a Huffman binary tree from these counts, and deriving codes along the path between the leaf nodes and the root node of the tree. Characters that occur more often are replaced by shorter codes and characters that occur less often by longer codes, which reduces the overall size of the data and achieves compression.
However, the shape of the Huffman tree varies greatly with the complexity of the data and the length of the data block. For the sake of compression and coding efficiency, a maximum code length, i.e. a maximum depth of the Huffman tree, is generally set. The ordinary Huffman tree generation process cannot guarantee that the depth stays within the specified depth, which lowers data compression efficiency.
Therefore, how to ensure that the maximum depth of the generated huffman tree does not exceed the preset depth, and further improve the data compression efficiency is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a data compression method and device, electronic equipment and a computer readable storage medium, which ensure that the maximum depth of a generated Huffman tree does not exceed a preset depth, thereby improving the data compression efficiency.
In order to achieve the above object, the present application provides a data compression method, including:
acquiring data to be compressed, constructing a Huffman tree corresponding to the data to be compressed, and judging whether the maximum depth of the current Huffman tree is larger than a preset depth;
if yes, determining a first leaf node with the layer number smaller than the preset depth and closest to the preset depth and one or more second leaf nodes with the deepest layer number in the current Huffman tree; the number of the second leaf nodes is determined based on the number of layers of the first leaf nodes from the preset depth;
merging the first leaf node and one or more second leaf nodes into a subtree, and merging the subtree to the position of the first leaf node in the current Huffman tree;
and re-entering the step of judging whether the maximum depth of the current Huffman tree is larger than the preset depth or not until the maximum depth of the current Huffman tree is smaller than or equal to the preset depth, and compressing the data to be compressed by using the current Huffman tree to obtain compressed data.
Wherein, still include:
determining the number N of the second leaf nodes based on the number of layers n between the first leaf node and the preset depth, wherein $N = 2^{n} - 1$.
Wherein merging the first leaf node, one or more of the second leaf nodes into a subtree comprises:
generating N intermediate nodes based on the number N of second leaf nodes;
and merging the N intermediate nodes, the first leaf node and the N second leaf nodes into a subtree, wherein the first leaf node and the N second leaf nodes are the leaf nodes of the merged subtree.
Wherein after merging the subtrees to the position of the first leaf node in the current huffman tree, the method further comprises:
judging whether a target leaf node whose layer number is greater than the preset depth and which has no sibling node exists in the current Huffman tree;
if yes, moving the target leaf node to the position of its parent node.
The constructing the huffman tree corresponding to the data to be compressed includes:
extracting all characters in the data to be compressed, and counting the occurrence times of each character in the characters;
generating corresponding nodes for each character, and adding the nodes corresponding to each character into a node set; the numerical value of each node is the number of times that the corresponding character appears in the data to be compressed;
taking out, from the current node set, a first node with the smallest value and a second node with the second smallest value, and establishing a parent node for the first node and the second node, the value of the parent node being the sum of the values of the first node and the second node;
generating a subtree comprising the first node, the second node and the parent node, wherein the first node is the left child node and the second node is the right child node;
judging whether the current node set is empty;
if not, adding the parent node into the current node set, and re-entering the step of taking out the first node with the smallest value and the second node with the second smallest value from the current node set until the current node set is empty, thereby completing the construction of the Huffman tree corresponding to the data to be compressed.
The compressing the data to be compressed by using the current huffman tree to obtain compressed data includes:
and determining the Huffman coding of each character in the data to be compressed according to the current Huffman tree, and replacing each character in the data to be compressed with the corresponding Huffman coding to obtain compressed data.
The determining, according to the current huffman tree, the huffman code of each character in the data to be compressed includes:
and determining the Huffman code of the character corresponding to each leaf node according to the path from the root node to each leaf node in the current Huffman tree.
To achieve the above object, the present application provides a data compression apparatus comprising:
the construction module is used for acquiring data to be compressed and constructing a Huffman tree corresponding to the data to be compressed;
the judging module is used for judging whether the maximum depth of the current Huffman tree is larger than the preset depth; if yes, starting the working flow of the first determining module, and if not, starting the working flow of the compressing module;
the first determining module is configured to determine, in a current huffman tree, a first leaf node with a layer number smaller than the preset depth and closest to the preset depth, and one or more second leaf nodes with the deepest layer number, and start a workflow of the merging module; the number of the second leaf nodes is determined based on the number of layers of the first leaf nodes from the preset depth;
the merging module is configured to merge the first leaf node and one or more second leaf nodes into a subtree, merge the subtree into a position where the first leaf node is located in a current huffman tree, and restart the workflow of the judging module;
the compression module is used for compressing the data to be compressed by using the current Huffman tree to obtain compressed data.
To achieve the above object, the present application provides an electronic device including:
a memory for storing a computer program;
and a processor for implementing the steps of the data compression method as described above when executing the computer program.
To achieve the above object, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data compression method as described above.
According to the scheme, the data compression method provided by the application comprises the following steps: acquiring data to be compressed, constructing a Huffman tree corresponding to the data to be compressed, and judging whether the maximum depth of the current Huffman tree is larger than a preset depth; if yes, determining a first leaf node with the layer number smaller than the preset depth and closest to the preset depth and one or more second leaf nodes with the deepest layer number in the current Huffman tree; the number of the second leaf nodes is determined based on the number of layers of the first leaf nodes from the preset depth; merging the first leaf node and one or more second leaf nodes into a subtree, and merging the subtree to the position of the first leaf node in the current Huffman tree; and re-entering the step of judging whether the maximum depth of the current Huffman tree is larger than the preset depth or not until the maximum depth of the current Huffman tree is smaller than or equal to the preset depth, and compressing the data to be compressed by using the current Huffman tree to obtain compressed data.
According to the data compression method provided by the application, a first leaf node whose layer number is smaller than the preset depth is recombined with second leaf nodes whose layer numbers are greater than the preset depth, so that those second leaf nodes are moved to within the preset depth. This ensures that the maximum depth of the generated Huffman tree does not exceed the preset depth and thereby improves data compression efficiency. The application also discloses a data compression device, an electronic device and a computer-readable storage medium, which achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:
Fig. 1 is a flowchart of a data compression method according to an exemplary embodiment;
Fig. 2 is a flowchart of building a Huffman tree according to an exemplary embodiment;
Fig. 3 is a flowchart of Huffman tree coding according to an exemplary embodiment;
Fig. 4 is a structure diagram of a Huffman tree A according to an exemplary embodiment;
Fig. 5 is a schematic diagram of the first step of a Huffman tree adjustment process according to an exemplary embodiment;
Fig. 6 is a schematic diagram of the second step of a Huffman tree adjustment process according to an exemplary embodiment;
Fig. 7 is a schematic diagram of the third step of a Huffman tree adjustment process according to an exemplary embodiment;
Fig. 8 is a schematic diagram of the fourth step of a Huffman tree adjustment process according to an exemplary embodiment;
Fig. 9 is a schematic diagram of the fifth step of a Huffman tree adjustment process according to an exemplary embodiment;
Fig. 10 is a schematic diagram of the sixth step of a Huffman tree adjustment process according to an exemplary embodiment;
Fig. 11 is a structure diagram of a data compression device according to an exemplary embodiment;
Fig. 12 is a structure diagram of an electronic device according to an exemplary embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. In addition, in the embodiments of the present application, "first", "second", etc. are used to distinguish similar objects and are not necessarily used to describe a particular order or precedence.
The embodiment of the application discloses a data compression method, which ensures that the maximum depth of a generated Huffman tree does not exceed the preset depth, thereby improving the data compression efficiency.
Fig. 1 is a flowchart of a data compression method according to an exemplary embodiment. As shown in Fig. 1, the method includes:
S101: Acquiring data to be compressed, and constructing a Huffman tree corresponding to the data to be compressed;
In this step, a Huffman tree is constructed based on the data to be compressed. As a possible implementation, constructing the Huffman tree corresponding to the data to be compressed includes: extracting all characters in the data to be compressed, and counting the number of occurrences of each character; generating a corresponding node for each character, and adding the node corresponding to each character into a node set, the value of each node being the number of times the corresponding character appears in the data to be compressed; taking out, from the current node set, a first node with the smallest value and a second node with the second smallest value, and establishing a parent node for the first node and the second node, the value of the parent node being the sum of the values of the first node and the second node; generating a subtree comprising the first node, the second node and the parent node, wherein the first node is the left child node and the second node is the right child node; judging whether the current node set is empty; if not, adding the parent node into the current node set, and re-entering the step of taking out the first node with the smallest value and the second node with the second smallest value from the current node set until the current node set is empty, thereby completing the construction of the Huffman tree corresponding to the data to be compressed.
In a specific implementation, the data to be compressed is read and the number of occurrences of each character is counted. Each character is treated as a node (or a tree containing a single node) whose value is the number of occurrences of the character. The two nodes with the smallest and second smallest values are found and combined, with the smaller one placed on the left and the other on the right, and a parent node is added for them to form a new subtree; the value of the parent node is the sum of the values of the two nodes. The new subtree is put back among the nodes, and the two nodes with the smallest and second smallest values continue to be selected and joined under a parent node until all nodes form a single tree, which is the Huffman tree corresponding to the data to be compressed.
For example, the frequency statistics of the data character to be compressed are shown in table 1, and the flow of creating huffman tree is shown in fig. 2.
TABLE 1
Character               A    B    C    D    E
Number of occurrences   4    5    5    6    14
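The construction described in S101 can be sketched for the frequencies of Table 1 as follows. This is a minimal illustration, not the patented implementation: the function names `build_huffman` and `leaf_depths` are hypothetical, a binary heap stands in for "taking out the two smallest nodes from the node set", and breaking ties between equal values by insertion order is an assumption the text does not specify.

```python
import heapq
from itertools import count


def build_huffman(freqs):
    """Return (tree, total); a leaf is a character, a parent node is a (left, right) tuple."""
    order = count()  # assumed tie-breaker: equal values are taken in insertion order
    heap = [(value, next(order), char) for char, value in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        v1, _, first = heapq.heappop(heap)    # smallest value -> left child
        v2, _, second = heapq.heappop(heap)   # second smallest value -> right child
        # The parent node's value is the sum of the two children's values.
        heapq.heappush(heap, (v1 + v2, next(order), (first, second)))
    total, _, tree = heap[0]
    return tree, total


def leaf_depths(tree, depth=0):
    """Map each character to the depth of its leaf (root = depth 0)."""
    if isinstance(tree, str):
        return {tree: depth}
    left, right = tree
    return {**leaf_depths(left, depth + 1), **leaf_depths(right, depth + 1)}


table_1 = {"A": 4, "B": 5, "C": 5, "D": 6, "E": 14}
tree, total = build_huffman(table_1)
print(tree)               # ('E', (('A', 'B'), ('C', 'D')))
print(total)              # 34
print(leaf_depths(tree))  # E sits at depth 1, the other characters at depth 3
```

With this tie-breaking rule the most frequent character E ends up one layer below the root, while A, B, C and D share the third layer.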
S102: judging whether the maximum depth of the current Huffman tree is larger than a preset depth; if yes, go to S103; if not, entering S105;
in a specific implementation, if the maximum depth of the current huffman tree is greater than the preset depth, the process goes to S103 to sort the current huffman tree, otherwise, the process goes to S105 directly.
S103: determining a first leaf node with the layer number smaller than the preset depth and closest to the preset depth and one or more second leaf nodes with the deepest layer number in a current Huffman tree; the number of the second leaf nodes is determined based on the number of layers of the first leaf nodes from the preset depth;
In this step, a first leaf node whose layer number is smaller than the preset depth and closest to it is first determined in the current Huffman tree. Based on the number of layers n between the first leaf node and the preset depth, the number N of leaf nodes that can be mounted at the position of the first leaf node, i.e. the number N of second leaf nodes, is determined as $N = 2^{n} - 1$. Next, the N deepest leaf nodes in the current Huffman tree are determined as the second leaf nodes.
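The count follows from the shape of the merged subtree: a complete binary subtree rooted at the position of the first leaf node and reaching down n layers has 2^n leaf positions at the preset depth, one of which is kept for the first leaf node itself. A small sketch of the calculation, with the hypothetical helper name `second_leaf_count`:

```python
def second_leaf_count(first_leaf_layer: int, preset_depth: int) -> int:
    """Number N of second leaf nodes for a first leaf n layers above the preset depth."""
    n = preset_depth - first_leaf_layer
    if n < 1:
        raise ValueError("the first leaf node must lie above the preset depth")
    # 2**n leaf positions at the preset depth, minus the one kept by the first leaf.
    return 2 ** n - 1


print(second_leaf_count(5, 6))  # first leaf in layer 5, preset depth 6: prints 1
print(second_leaf_count(4, 6))  # first leaf in layer 4, preset depth 6: prints 3
```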
S104: merging the first leaf node and one or more second leaf nodes into a subtree, merging the subtree to the position of the first leaf node in the current Huffman tree, and re-entering step S102;
in this step, the first leaf node and the N second leaf nodes determined in the previous step are combined into a subtree, and the subtree is combined to the position of the first leaf node in the current huffman tree. The process of step S103 and step S104 is repeated until the maximum depth of the current huffman tree is less than or equal to the preset depth, and the process proceeds to S105.
As a possible implementation, merging the first leaf node and the one or more second leaf nodes into a subtree includes: generating N intermediate nodes based on the number N of second leaf nodes; and merging the N intermediate nodes, the first leaf node and the N second leaf nodes into a subtree, wherein the first leaf node and the N second leaf nodes are the leaf nodes of the merged subtree. In a specific implementation, N intermediate nodes are generated, and the first leaf node, the N second leaf nodes and the N intermediate nodes are combined into a subtree in which the first leaf node and the N second leaf nodes are all leaf nodes.
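A minimal sketch of this merge, assuming the subtree is built as a complete binary tree by pairing neighbours level by level. The function name `merge_into_subtree`, the nested-tuple representation and the pairing order are assumptions made for illustration; the text only fixes the numbers of leaf nodes and intermediate nodes.

```python
def merge_into_subtree(first_leaf, second_leaves):
    """Join 1 + N leaves (N = 2**n - 1) under N intermediate nodes.

    Returns (subtree, number_of_intermediate_nodes); an intermediate node
    is represented as a (left, right) tuple.
    """
    level = [first_leaf, *second_leaves]
    if len(level) & (len(level) - 1):
        raise ValueError("expected 2**n leaves in total")
    intermediate = 0
    while len(level) > 1:
        # Pair neighbours; each pair gets one new intermediate node.
        level = [(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        intermediate += len(level)
    return level[0], intermediate


subtree, created = merge_into_subtree("c", ["m", "i", "j"])
print(subtree)  # (('c', 'm'), ('i', 'j'))
print(created)  # 3
```

A full binary tree with N + 1 leaves always has N internal nodes, which is why the number of generated intermediate nodes equals the number of second leaf nodes.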
As a preferred embodiment, after merging the subtree into the position of the first leaf node in the current Huffman tree, the method further includes: judging whether a target leaf node whose layer number is greater than the preset depth and which has no sibling node exists in the current Huffman tree; if yes, moving the target leaf node to the position of its parent node. In a specific implementation, a target leaf node left without a sibling because a second leaf node was taken out is moved up to the position of its parent node.
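Steps S102 to S104, together with the move-up of sibling-less nodes, can be sketched end to end as below. This is a hedged illustration under several assumptions that go beyond the text: all names (`Node`, `limit_depth`, `detach`, `balanced`, `chain`) are hypothetical; at most N second leaf nodes are taken and only from leaves deeper than the preset depth; when a taken leaf's sibling is a subtree rather than a leaf, the whole subtree is moved up; and when fewer than N leaves are taken, the merged subtree is kept as flat as possible.

```python
from typing import List, Optional


class Node:
    """A leaf carries a symbol; an intermediate node has exactly two children."""

    def __init__(self, symbol: Optional[str] = None, *children: "Node") -> None:
        self.symbol = symbol
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = list(children)
        for child in children:
            child.parent = self

    def depth(self) -> int:
        node, d = self, 0
        while node.parent is not None:
            node, d = node.parent, d + 1
        return d


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def put_at(old: Node, new: Node) -> None:
    """Place `new` at the position `old` occupies under its parent."""
    parent = old.parent
    parent.children[parent.children.index(old)] = new
    new.parent = parent


def detach(leaf: Node) -> None:
    """Take a leaf out and move its sibling up to the parent's position."""
    parent = leaf.parent
    sibling = next(c for c in parent.children if c is not leaf)
    put_at(parent, sibling)
    leaf.parent = None


def balanced(nodes: List[Node]) -> Node:
    """Join nodes under new intermediate nodes, keeping the subtree flat."""
    if len(nodes) == 1:
        return nodes[0]
    mid = (len(nodes) + 1) // 2
    return Node(None, balanced(nodes[:mid]), balanced(nodes[mid:]))


def limit_depth(root: Node, preset_depth: int) -> None:
    while True:
        all_leaves = leaves(root)
        too_deep = sorted((l for l in all_leaves if l.depth() > preset_depth),
                          key=Node.depth, reverse=True)
        if not too_deep:                       # S102: maximum depth is within the limit
            return
        shallow = [l for l in all_leaves if l.depth() < preset_depth]
        if not shallow:
            raise ValueError("too many symbols for this preset depth")
        first = max(shallow, key=Node.depth)   # S103: closest to the preset depth
        n = preset_depth - first.depth()
        seconds = too_deep[: 2 ** n - 1]       # at most N = 2^n - 1 deepest leaves
        for leaf in seconds:
            detach(leaf)
        slot = Node()                          # placeholder keeps the first leaf's position
        put_at(first, slot)
        put_at(slot, balanced([first, *seconds]))   # S104: merge the subtree back


def chain(symbols: str) -> Node:
    """Build a maximally skewed tree, e.g. 'abc' -> (a, (b, c))."""
    node = Node(symbols[-1])
    for s in reversed(symbols[:-1]):
        node = Node(None, Node(s), node)
    return node


root = chain("abcdef")   # leaf depths 1, 2, 3, 4, 5, 5
limit_depth(root, 3)
print({leaf.symbol: leaf.depth() for leaf in leaves(root)})
```

Each pass moves at least one over-depth leaf to within the preset depth and never pushes a leaf below it, so the loop ends once no leaf is deeper than the preset depth. The sketch raises an error when the alphabet is too large to fit within the limit at all.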
S105: and compressing the data to be compressed by using the current Huffman tree to obtain compressed data.
In the step, the current Huffman tree is utilized to compress the data to be compressed to obtain the compressed data. As a possible implementation manner, the compressing the data to be compressed by using the current huffman tree to obtain compressed data includes: and determining the Huffman coding of each character in the data to be compressed according to the current Huffman tree, and replacing each character in the data to be compressed with the corresponding Huffman coding to obtain compressed data.
In specific implementation, huffman coding of the character corresponding to each leaf node is determined according to the path from the root node to each leaf node in the current Huffman tree. Where the leaf nodes in the current huffman tree represent each character in the data to be compressed, for each binary tree the left node encodes 0 and the right node encodes 1, as shown in fig. 3. Thus, the encoding from the root node to each leaf node results in the encoding of all leaf nodes, i.e., each character, as shown in Table 2:
TABLE 2
Character   A     B     C     D     E
Encoding    100   101   110   111   0
It can be seen that, because characters sit only at leaf nodes, the path to one leaf never passes through another leaf, so no Huffman code is the prefix of another code, which ensures that decoding proceeds unambiguously. The code lengths also show the expected trend: the more often a character occurs, the shorter its code, and the less often it occurs, the longer its code.
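The coding rule of S105 (left branch 0, right branch 1, read from the root to each leaf) can be sketched as follows. The tree shape is an assumption reconstructed from the codes in Table 2, and the function names `assign_codes`, `encode` and `decode` are hypothetical.

```python
def assign_codes(tree, prefix=""):
    """Left branch appends 0, right branch appends 1; a leaf is a one-character string."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}  # a lone root leaf still needs one bit
    left, right = tree
    return {**assign_codes(left, prefix + "0"), **assign_codes(right, prefix + "1")}


def encode(text, codes):
    """Replace every character with its Huffman code."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Read bits until they match a code; prefix-free codes make the match unique."""
    by_code = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_code:
            out.append(by_code[current])
            current = ""
    return "".join(out)


tree = ("E", (("A", "B"), ("C", "D")))  # assumed shape, consistent with Table 2
codes = assign_codes(tree)
bits = encode("EAEDE", codes)
print(codes)                # {'E': '0', 'A': '100', 'B': '101', 'C': '110', 'D': '111'}
print(bits)                 # 010001110
print(decode(bits, codes))  # EAEDE
```

Five characters that would need 3 bits each in a fixed-length code (15 bits) take 9 bits here, because E dominates the input.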
The calculation adopted in this embodiment is therefore not complex: no redundant or complicated computation is needed, only the positions of the leaf nodes that exceed the preset depth have to be reset. The result can be obtained with a single iterative loop, which makes the calculation fast and efficient. In addition, the calculation and storage involved are simple, which provides a basis for efficient software and also makes a hardware implementation possible; the variety of realizable forms makes the method flexible to apply.
According to the data compression method provided by the embodiment of the application, a first leaf node whose layer number is smaller than the preset depth is recombined with second leaf nodes whose layer numbers are greater than the preset depth, so that those second leaf nodes are moved to within the preset depth. This ensures that the maximum depth of the generated Huffman tree does not exceed the preset depth and thereby improves data compression efficiency.
The Huffman tree adjustment process of the application is described below taking a Huffman tree A as an example. Suppose that the Huffman tree A obtained by counting, sorting and growing the tree is as shown in Fig. 4, and that the Huffman code length of each leaf node must not exceed 6 (marked in Fig. 4 by a bold double-dot-dash line, referred to below as the standard line). As the figure shows, the maximum depth of Huffman tree A is 9, i.e. the longest code of a current leaf node has length 9, exceeding the required maximum length of 6.
The first step: the shape of Huffman tree A is analyzed to find the leaf node with the deepest layer number (node p, in the ninth layer) and the leaf node that is closest to the standard line without exceeding it (node e, in the fifth layer), as shown in Fig. 5. Node e and node p are taken out and joined under a new node No. 14 to form a subtree, and the subtree is merged into Huffman tree A by placing node No. 14 at the original position of node e in the fifth layer. Node p is thereby moved from the ninth layer to the sixth layer. Since only one leaf node, n (ninth layer), now hangs under node No. 13 (eighth layer), node No. 13 is removed and leaf node n is moved up to the original position of node No. 13 in the eighth layer.
The second step: after the Huffman tree has been adjusted by the first step, the new distribution of leaf nodes is summarized. As shown in Fig. 6, among the layers that exceed the required layer number, the eighth layer contains 2 leaf nodes and the seventh layer contains 3 leaf nodes. Among the layers that meet the requirement, the sixth layer contains 4 leaf nodes, the fifth layer contains 1 leaf node, the fourth layer contains 2 leaf nodes, and the third layer contains 2 leaf nodes. Next, the compliant leaf node closest to the required layer number is found: leaf node f, in the fifth layer. The required layer number is six, a difference of one layer (n = 1), so $2^{n} - 1 = 1$ over-depth leaf node can be mounted there. For example, the over-depth leaf node n can be mounted at leaf node f as follows.
The third step: the eighth-layer leaf node to be moved and the fifth-layer node f are taken out and joined under a new node No. 15 to form a subtree. The subtree is then merged into the main tree by placing node No. 15 at the original position of leaf node f. Since only leaf node m now hangs under node No. 12, node No. 12 is removed and the eighth-layer node m is moved up to the original position of node No. 12 in the seventh layer.
The fourth step: adjusting the Huffman tree by the third step gives Fig. 8. As shown, the seventh layer, which exceeds the required layer number, contains 4 leaf nodes. Among the compliant layers, the sixth layer contains 6 leaf nodes, the fifth layer 0, the fourth layer 2, and the third layer 2. The leaf node that meets the standard and is closest to the required layer number (the sixth layer) is then searched for in Huffman tree A. As shown in Fig. 8, this is leaf node c in the fourth layer, which is 2 layers from the required layer number (n = 2). Up to $2^{n} - 1 = 3$ over-depth leaf nodes can therefore be mounted at it.
The fifth step: based on the result of the fourth step, the leaf node c closest to the standard layer and 3 over-depth leaf nodes m, i and j are taken out. As shown in Fig. 9, the extracted leaf nodes c, m, i and j are combined with new nodes No. 16, 17 and 18 into a new subtree, i.e. the left subtree. The new subtree is merged into the main tree by placing node No. 16 at the original position of leaf node c, and the seventh-layer leaf node k is moved up to the position of node No. 11 in the sixth layer.
The sixth step: consolidating the result of the fifth step gives Fig. 10. All over-depth leaf nodes have now been handled, i.e. no leaf node remains deeper than the required layer number (6), and the Huffman tree adjustment is complete.
A data compression apparatus according to an embodiment of the present application is described below, and a data compression apparatus described below and a data compression method described above may be referred to each other.
Fig. 11 is a structure diagram of a data compression device according to an exemplary embodiment. As shown in Fig. 11, the device includes:
the construction module 100 is configured to obtain data to be compressed, and construct a huffman tree corresponding to the data to be compressed;
the judging module 200 is configured to judge whether the maximum depth of the current huffman tree is greater than a preset depth; if yes, the workflow of the first determining module 300 is started, and if not, the workflow of the compressing module 500 is started;
the first determining module 300 is configured to determine, in a current huffman tree, a first leaf node having a layer number smaller than the preset depth and closest to the preset depth, and one or more second leaf nodes having the deepest layer number, and start a workflow of the merging module 400; the number of the second leaf nodes is determined based on the number of layers of the first leaf nodes from the preset depth;
the merging module 400 is configured to merge the first leaf node and one or more second leaf nodes into a subtree, merge the subtree into a position of the first leaf node in the current huffman tree, and restart the workflow of the judging module 200;
the compression module 500 is configured to compress the data to be compressed by using a current huffman tree to obtain compressed data.
According to the data compression device provided by the embodiment of the application, a first leaf node whose layer number is smaller than the preset depth is recombined with second leaf nodes whose layer numbers are greater than the preset depth, so that those second leaf nodes are moved to within the preset depth. This ensures that the maximum depth of the generated Huffman tree does not exceed the preset depth and thereby improves data compression efficiency.
On the basis of the above embodiment, as a preferred implementation, the apparatus further includes:
a second determining module, configured to determine the number N of second leaf nodes based on the number n of layers between the first leaf node and the preset depth, where $N = 2^{n} - 1$.
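The relation between n and N can be checked with a few lines of Python. This is a minimal sketch; the function name and the choice to pass the two depths as integers are assumptions made for illustration. The reasoning is that a complete binary subtree of height n has 2^n leaves, one of which is the first leaf node, leaving 2^n - 1 positions for second leaf nodes.

```python
def second_leaf_count(first_leaf_depth: int, preset_depth: int) -> int:
    """Number N of second leaf nodes for a first leaf n layers above the preset depth."""
    n = preset_depth - first_leaf_depth
    if n <= 0:
        # The first leaf node must lie strictly above the preset depth.
        raise ValueError("first leaf must be shallower than the preset depth")
    return 2 ** n - 1
```

For example, a first leaf node one layer above the preset depth absorbs a single second leaf node, and a first leaf node two layers above it absorbs three.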
On the basis of the above embodiment, as a preferred implementation, the merging module 400 includes:
a first generating unit, configured to generate N intermediate nodes based on the number N of second leaf nodes;
a first merging unit, configured to merge the N intermediate nodes, the first leaf node, and the N second leaf nodes into a subtree, wherein the first leaf node and the N second leaf nodes are the leaf nodes of the merged subtree;
a second merging unit, configured to merge the subtree into the position of the first leaf node in the current Huffman tree.
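The work of the first generating unit and the first merging unit can be sketched as below. This is one possible reading, not the definitive procedure: the `Node` class, the function name, the left-to-right pairing order, and the choice to give each intermediate node the sum of its children's weights are all assumptions. Detaching the second leaf nodes from their old positions and attaching the result at the first leaf node's position are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; leaves carry a symbol, internal nodes do not."""
    weight: int
    symbol: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_merged_subtree(first_leaf: Node,
                         second_leaves: List[Node]) -> Tuple[Node, int]:
    """Combine 1 + N leaves (N = 2**n - 1) into a complete binary subtree.

    Returns the subtree root and the number of intermediate nodes created.
    """
    level = [first_leaf, *second_leaves]
    if len(level) < 2 or len(level) & (len(level) - 1):
        raise ValueError("expected 1 + (2**n - 1) leaves with n >= 1")
    created = 0
    while len(level) > 1:
        next_level = []
        # Pair neighbours; each pair gets one new intermediate node.
        for i in range(0, len(level), 2):
            left, right = level[i], level[i + 1]
            next_level.append(Node(left.weight + right.weight,
                                   left=left, right=right))
            created += 1
        level = next_level
    return level[0], created


subtree, intermediates = build_merged_subtree(
    Node(5, "a"), [Node(1, "x"), Node(1, "y"), Node(2, "z")])
```

In this example N is 3, so three intermediate nodes are created and all four leaves end up two layers below the subtree root, which matches the count of intermediate nodes stated above.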
On the basis of the above embodiment, as a preferred implementation, the merging module 400 further includes:
a moving unit, configured to judge whether the current Huffman tree contains a target leaf node whose layer number is greater than the preset depth and which has no sibling node; if so, the target leaf node is moved to the position of its parent node.
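A hedged sketch of the moving unit's behaviour follows. The `Node` class, the function names, the root-at-depth-0 convention, and the bottom-up recursive traversal are assumptions; the text above only states the condition and the move itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; leaves carry a symbol, internal nodes do not."""
    symbol: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_leaf(node: Node) -> bool:
    return node.left is None and node.right is None


def promote_lonely_leaves(node: Node, preset_depth: int, depth: int = 0) -> Node:
    """Return the node that should occupy this position after the move.

    A leaf deeper than preset_depth that has no sibling replaces its parent.
    Children are processed first, so a moved leaf is re-checked one level up.
    """
    if is_leaf(node):
        return node
    if node.left is not None:
        node.left = promote_lonely_leaves(node.left, preset_depth, depth + 1)
    if node.right is not None:
        node.right = promote_lonely_leaves(node.right, preset_depth, depth + 1)
    children = [c for c in (node.left, node.right) if c is not None]
    if len(children) == 1 and is_leaf(children[0]) and depth + 1 > preset_depth:
        return children[0]  # the only child takes the parent's position
    return node
```

Calling `promote_lonely_leaves(root, preset_depth)` returns the (possibly new) root; a leaf stops moving upward once it either gains a sibling or no longer lies below the preset depth.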
On the basis of the above embodiment, as a preferred implementation, the construction module 100 includes:
an acquisition unit, configured to acquire the data to be compressed;
a statistics unit, configured to extract all characters in the data to be compressed and count the number of occurrences of each character;
a second generating unit, configured to generate a corresponding node for each character and add the node corresponding to each character to a node set; the value of each node is the number of times the corresponding character appears in the data to be compressed;
a creating unit, configured to take out, from the current node set, a first node with the smallest value and a second node with the second smallest value, and create a parent node for the first node and the second node, wherein the value of the parent node is the sum of the values of the first node and the second node;
a third generating unit, configured to generate a subtree including the first node, the second node, and the parent node, wherein the first node is the left child node and the second node is the right child node;
a judging unit, configured to judge whether the current node set is empty; if not, the parent node is added to the current node set and the workflow of the creating unit is restarted, until the current node set is empty and the construction of the Huffman tree corresponding to the data to be compressed is complete.
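The construction steps listed above correspond to the classic Huffman procedure, which could be sketched in Python with a min-heap standing in for the node set. The `Node` class, the function name, the use of `heapq`, and the insertion-order tie-break between equal values are assumptions; the text does not specify how ties are resolved.

```python
import heapq
from collections import Counter
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; leaves carry a symbol, internal nodes do not."""
    weight: int
    symbol: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_huffman_tree(data: str) -> Node:
    """Build a Huffman tree from character frequencies in data."""
    if not data:
        raise ValueError("data to be compressed must not be empty")
    tie = count()  # assumed tie-break so Node objects are never compared
    heap = [(freq, next(tie), Node(freq, ch))
            for ch, freq in Counter(data).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, first = heapq.heappop(heap)    # smallest value: left child
        w2, _, second = heapq.heappop(heap)   # second smallest: right child
        parent = Node(w1 + w2, left=first, right=second)
        heapq.heappush(heap, (parent.weight, next(tie), parent))
    return heap[0][2]


root = build_huffman_tree("aaaabbc")
```

For the input `"aaaabbc"` the two rarest characters, `c` and `b`, are merged first, and the resulting parent is then merged with `a` to form the root.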
On the basis of the foregoing embodiment, as a preferred implementation, the compression module 500 is specifically configured to determine the Huffman code of each character in the data to be compressed according to the current Huffman tree, and to replace each character in the data to be compressed with its corresponding Huffman code to obtain compressed data.
On the basis of the foregoing embodiment, as a preferred implementation, the compression module 500 is specifically configured to determine the Huffman code of the character corresponding to each leaf node according to the path from the root node to that leaf node in the current Huffman tree, and to replace each character in the data to be compressed with its corresponding Huffman code to obtain compressed data.
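The path-based code assignment and character replacement can be sketched as follows. The `Node` class and function names are hypothetical, and the convention that a left edge contributes `0` and a right edge contributes `1` is an assumption, since the passage above only says that the code follows the path from the root to the leaf. The output is a string of `'0'` and `'1'` characters for readability rather than packed bits.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """Hypothetical tree node; leaves carry a symbol, internal nodes do not."""
    symbol: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def assign_codes(node: Node, prefix: str = "",
                 table: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map each leaf symbol to the bit string of its root-to-leaf path."""
    if table is None:
        table = {}
    if node.left is None and node.right is None:
        # A tree consisting of a single leaf still needs a one-bit code.
        table[node.symbol] = prefix or "0"
        return table
    if node.left is not None:
        assign_codes(node.left, prefix + "0", table)   # assumed: left edge = 0
    if node.right is not None:
        assign_codes(node.right, prefix + "1", table)  # assumed: right edge = 1
    return table


def compress(data: str, root: Node) -> str:
    """Replace every character with its Huffman code."""
    table = assign_codes(root)
    return "".join(table[ch] for ch in data)


tree = Node(left=Node(left=Node("c"), right=Node("b")), right=Node("a"))
encoded = compress("abca", tree)
```

Because the depth of a leaf equals the length of its code under this convention, limiting the maximum depth of the tree directly limits the longest code word.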
The specific manner in which each module of the apparatus in the above embodiments performs its operations has been described in detail in connection with the method embodiments, and will not be repeated here.
Based on the hardware implementation of the above program modules, and in order to implement the method of the embodiments of the present application, an embodiment of the present application further provides an electronic device. Fig. 12 is a block diagram of an electronic device according to an exemplary embodiment. As shown in Fig. 12, the electronic device includes:
a communication interface 1, capable of exchanging information with other devices such as network devices;
a processor 2, connected to the communication interface 1 to exchange information with other devices, and configured to execute the data compression method provided by one or more of the above technical solutions when running a computer program, the computer program being stored in the memory 3.
Of course, in practice, the various components in the electronic device are coupled together by a bus system 4. It will be appreciated that the bus system 4 is used to enable connected communications between these components. The bus system 4 comprises, in addition to a data bus, a power bus, a control bus and a status signal bus. But for clarity of illustration the various buses are labeled as bus system 4 in fig. 12.
The memory 3 in the embodiment of the present application is used to store various types of data to support the operation of the electronic device. Examples of such data include: any computer program for operating on an electronic device.
It will be appreciated that the memory 3 may be volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferromagnetic random access memory (FRAM), flash memory, magnetic surface memory, an optical disk, or compact disc read-only memory (CD-ROM); the magnetic surface memory may be disk memory or tape memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). The memory 3 described in the embodiments of the present application is intended to include, without being limited to, these and any other suitable types of memory.
The method disclosed in the above embodiment of the present application may be applied to the processor 2 or implemented by the processor 2. The processor 2 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 2 or by instructions in the form of software. The processor 2 described above may be a general purpose processor, DSP, or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. The processor 2 may implement or perform the methods, steps and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiment of the application can be directly embodied in the hardware of the decoding processor or can be implemented by combining hardware and software modules in the decoding processor. The software modules may be located in a storage medium in the memory 3 and the processor 2 reads the program in the memory 3 to perform the steps of the method described above in connection with its hardware.
The corresponding flow in each method of the embodiments of the present application is implemented when the processor 2 executes the program, and for brevity, will not be described in detail herein.
In an exemplary embodiment, the present application also provides a storage medium, i.e. a computer storage medium, in particular a computer readable storage medium, for example comprising a memory 3 storing a computer program executable by the processor 2 for performing the steps of the method described above. The computer readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash Memory, magnetic surface Memory, optical disk, or CD-ROM.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present application may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing an electronic device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto; any variation or substitution readily conceivable to a person skilled in the art falls within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method of data compression, comprising:
acquiring data to be compressed, constructing a Huffman tree corresponding to the data to be compressed, and judging whether the maximum depth of the current Huffman tree is larger than a preset depth;
if yes, determining a first leaf node with the layer number smaller than the preset depth and closest to the preset depth and one or more second leaf nodes with the deepest layer number in the current Huffman tree; the number of the second leaf nodes is determined based on the number of layers of the first leaf nodes from the preset depth;
merging the first leaf node and one or more second leaf nodes into a subtree, and merging the subtree to the position of the first leaf node in the current Huffman tree;
re-entering the step of judging whether the maximum depth of the current Huffman tree is larger than the preset depth or not until the maximum depth of the current Huffman tree is smaller than or equal to the preset depth, and compressing the data to be compressed by using the current Huffman tree to obtain compressed data;
wherein the method further comprises:
determining the number N of the second leaf nodes based on the number n of layers between the first leaf node and the preset depth, where $N = 2^{n} - 1$.
2. The method of data compression of claim 1, wherein merging the first leaf node and the one or more second leaf nodes into a subtree comprises:
generating N intermediate nodes based on the number N of second leaf nodes;
and merging the N intermediate nodes, the first leaf node, and the N second leaf nodes into a subtree, wherein the first leaf node and the N second leaf nodes are the leaf nodes of the merged subtree.
3. The method of data compression of claim 1, wherein merging the subtree into the current huffman tree at the location of the first leaf node further comprises:
judging whether the current Huffman tree contains a target leaf node whose layer number is greater than the preset depth and which has no sibling node;
if so, moving the target leaf node to the position of its parent node.
4. The method of data compression according to claim 1, wherein said constructing a Huffman tree corresponding to said data to be compressed comprises:
extracting all characters in the data to be compressed, and counting the number of occurrences of each character;
generating a corresponding node for each character, and adding the node corresponding to each character to a node set; the value of each node is the number of times the corresponding character appears in the data to be compressed;
taking out, from the current node set, a first node with the smallest value and a second node with the second smallest value, and creating a parent node for the first node and the second node, wherein the value of the parent node is the sum of the values of the first node and the second node;
generating a subtree comprising the first node, the second node, and the parent node, wherein the first node is the left child node and the second node is the right child node;
judging whether the current node set is empty;
if not, adding the parent node to the current node set, and re-entering the step of taking out the first node with the smallest value and the second node with the second smallest value from the current node set, until the current node set is empty and the construction of the Huffman tree corresponding to the data to be compressed is complete.
5. The method of data compression according to claim 4, wherein compressing the data to be compressed using the current Huffman tree to obtain compressed data comprises:
determining the Huffman code of each character in the data to be compressed according to the current Huffman tree, and replacing each character in the data to be compressed with the corresponding Huffman code to obtain compressed data.
6. The method of data compression according to claim 5, wherein determining the Huffman code of each character in the data to be compressed according to the current Huffman tree comprises:
determining the Huffman code of the character corresponding to each leaf node according to the path from the root node to that leaf node in the current Huffman tree.
7. A data compression apparatus, comprising:
a construction module, configured to acquire data to be compressed and construct a Huffman tree corresponding to the data to be compressed;
a judging module, configured to judge whether the maximum depth of the current Huffman tree is greater than a preset depth; if so, the workflow of the first determining module is started, and if not, the workflow of the compression module is started;
a first determining module, configured to determine, in the current Huffman tree, a first leaf node whose layer number is smaller than the preset depth and closest to the preset depth, and one or more second leaf nodes with the deepest layer number, and to start the workflow of the merging module; the number of the second leaf nodes is determined based on the number of layers between the first leaf node and the preset depth;
a merging module, configured to merge the first leaf node and the one or more second leaf nodes into a subtree, merge the subtree into the position of the first leaf node in the current Huffman tree, and restart the workflow of the judging module;
a compression module, configured to compress the data to be compressed by using the current Huffman tree to obtain compressed data;
wherein the apparatus further comprises:
a second determining module, configured to determine the number N of the second leaf nodes based on the number n of layers between the first leaf node and the preset depth, where $N = 2^{n} - 1$.
8. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data compression method according to any one of claims 1 to 6 when executing said computer program.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data compression method according to any of claims 1 to 6.
CN202110981416.6A 2021-08-25 2021-08-25 Data compression method and device, electronic equipment and storage medium Active CN113746487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110981416.6A CN113746487B (en) 2021-08-25 2021-08-25 Data compression method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113746487A CN113746487A (en) 2021-12-03
CN113746487B true CN113746487B (en) 2023-11-03

Family

ID=78732771

Country Status (1)

Country Link
CN (1) CN113746487B (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant