CN110808738B - Data compression method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN110808738B
Authority
CN
China
Prior art keywords
data
compressed
tag
prefix
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910871503.9A
Other languages
Chinese (zh)
Other versions
CN110808738A (en)
Inventor
张�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910871503.9A priority Critical patent/CN110808738B/en
Priority to PCT/CN2019/117104 priority patent/WO2021051532A1/en
Publication of CN110808738A publication Critical patent/CN110808738A/en
Application granted granted Critical
Publication of CN110808738B publication Critical patent/CN110808738B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/3059 Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to the technical field of big data, and discloses a data compression method, device, equipment and computer readable storage medium. The method comprises the following steps: acquiring data to be compressed, and classifying the data to be compressed; identifying the classified data to be compressed, and determining the tag and numerical value of the data with numerical value and the tag of the data without numerical value, wherein the tag comprises a tag prefix and a tag subscript; compressing the tag subscript, and compressing the numerical value of the data with numerical value; querying the tag prefix of the data to be compressed, and judging whether continuous data to be compressed with the same tag prefix exists; if no such continuous data exists, inserting a second preset symbol between the tag prefix and the compressed tag subscript; and obtaining compressed data based on the tag prefix, the second preset symbol and the compressed tag subscript, or based on the tag prefix, the second preset symbol, the compressed tag subscript and the compressed numerical value, and outputting the compressed data. The application thereby improves the data compression rate.

Description

Data compression method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data compression method, apparatus, device, and computer readable storage medium.
Background
Data compression is a technique for reducing the volume of original data without losing useful information, so as to reduce storage space and improve the efficiency of transmitting, storing and processing the data. In the prior art, a string compression algorithm is generally used to compress original data, and the most commonly used string compression algorithm compresses runs of identical consecutive characters. For example, the string aabccccaaa is compressed to a2b1c4a3. This compression method has a very narrow application range and a limited compression ratio, and is far from adequate for large projects that store hundreds of millions of data items.
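The prior-art approach described above is commonly known as run-length encoding. The following is a minimal sketch of it, not code from the application; the function name run_length_encode is an assumption made for illustration.

```python
from itertools import groupby


def run_length_encode(text: str) -> str:
    """Replace each run of identical characters with the character and its run length."""
    # groupby yields one (character, run) pair per maximal run of equal characters.
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


print(run_length_encode("aabccccaaa"))
print(run_length_encode("pa1001"))
```

For an input such as pa1001, which has almost no repeated characters, every character gains a count and the output is longer than the input, which illustrates the narrow application range mentioned above.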
Disclosure of Invention
The application mainly aims to provide a data compression method, a device, equipment and a computer readable storage medium, and aims to solve the technical problem that the compression rate of the existing data compression method is low.
To achieve the above object, the present application provides a data compression method comprising the steps of:
acquiring data to be compressed, and classifying the data to be compressed, wherein the data to be compressed comprises data with numerical value and data without numerical value;
identifying the classified data to be compressed, determining the label and the numerical value with the numerical value data, and determining the label without the numerical value data, wherein the label comprises a label prefix and a label subscript, the label prefix is a letter, and the label subscript and the numerical value are numbers;
compressing the tag subscript and the numerical value with the numerical value data based on a preset algorithm, and compressing the tag subscript without the numerical value data to obtain a compressed tag subscript and a compressed numerical value;
inquiring the tag prefix of the data to be compressed, and judging whether continuous data to be compressed with the same tag prefix exists or not;
if the continuous data to be compressed with the same tag prefix does not exist, inserting a second preset symbol between the tag prefix and the compressed tag subscript;
and obtaining compressed data based on the tag prefix, the second preset symbol and the compressed tag index, or the tag prefix, the second preset symbol, the compressed tag index and the compressed numerical value, and outputting the compressed data.
Optionally, the obtaining the data to be compressed and classifying the data to be compressed include:
acquiring data to be compressed, inquiring the data to be compressed, and judging whether the data to be compressed contains a first preset symbol or not;
determining data to be compressed that contains the first preset symbol as data with numerical value, and determining data to be compressed that does not contain the first preset symbol as data without numerical value;
after the inserting of a second preset symbol between the tag prefix and the compressed tag subscript if no continuous data to be compressed with the same tag prefix exists, the method includes:
and obtaining compressed data based on the tag prefix, the second preset symbol, the compressed tag subscript, the first preset symbol and the compressed numerical value, and outputting the compressed data.
Optionally, after the querying of the tag prefix of the data to be compressed and the judging of whether continuous data to be compressed with the same tag prefix exists, the method further includes:
if n continuous data to be compressed with the same tag prefix exist, reducing the number of occurrences of the same tag prefix in the compressed data to 1, wherein n is greater than or equal to 2;
inserting a second preset symbol between the same tag prefix and the compressed tag subscript;
and placing the same label prefix on the left side of a second preset symbol, and placing different label subscripts and numerical values corresponding to the same label prefix on the right side of the second preset symbol in sequence.
Optionally, after the placing of the same tag prefix on the left side of the second preset symbol and the placing of the different tag subscripts and numerical values corresponding to the same tag prefix in sequence on the right side of the second preset symbol, the method further includes:
when data to be compressed with a different tag prefix appears after the n continuous data to be compressed with the same tag prefix, inserting a third preset symbol after the tag subscript or numerical value of the n-th of these data and before the different tag prefix.
In addition, to achieve the above object, the present application also provides a data compression apparatus including:
the classifying module is used for acquiring data to be compressed and classifying the data to be compressed, wherein the data to be compressed comprises data with numerical value and data without numerical value;
the dividing module is used for identifying the classified data to be compressed, determining the label and the numerical value with the numerical value data and determining the label without the numerical value data, wherein the label comprises a label prefix and a label subscript, the label prefix is a letter, and the label subscript and the numerical value are numbers;
the compression module is used for compressing the label subscript and the numerical value with the numerical value data based on a preset algorithm, and compressing the label subscript without the numerical value data to obtain a compressed label subscript and a compressed numerical value;
the query module is used for querying the tag prefix of the data to be compressed and judging whether continuous data to be compressed with the same tag prefix exists or not;
the inserting module is used for inserting a second preset symbol between the tag prefix and the compressed tag subscript if continuous data to be compressed with the same tag prefix does not exist;
and the output module is used for obtaining compressed data based on the tag prefix, the second preset symbol and the compressed tag subscript or the tag prefix, the second preset symbol, the compressed tag subscript and the compressed numerical value and outputting the compressed data.
Optionally, the classification module includes:
the inquiring unit is used for acquiring data to be compressed, inquiring the data to be compressed and judging whether the data to be compressed contains a first preset symbol or not;
and the determining unit is used for determining the data to be compressed that contains the first preset symbol as data with numerical value, and determining the data to be compressed that does not contain the first preset symbol as data without numerical value.
Optionally, the data compression device further includes:
the simplifying module is used for simplifying the number of the same tag prefixes in the compressed data to 1 if n continuous data to be compressed with the same tag prefixes exist, wherein n is more than or equal to 2;
the simplified inserting module is used for inserting a second preset symbol between the same label prefix and the compressed label subscript;
the arrangement module is used for placing the same label prefix on the left side of a second preset symbol, and placing different label subscripts and numerical values corresponding to the same label prefix on the right side of the second preset symbol in sequence.
Optionally, the data compression device further includes:
and the connection module is used for inserting a third preset symbol after the label subscript or the numerical value of the n-th continuous data to be compressed with the same label prefix and before the different label prefixes when the data to be compressed with different label prefixes appear after the n continuous data to be compressed with the same label prefix.
In addition, to achieve the above object, the present application also provides a data compression apparatus including an input-output unit, a memory, and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the data compression method as described above.
In addition, in order to achieve the above object, the present application also provides a computer-readable storage medium having stored thereon a data compression program which, when executed by a processor, implements the steps of the data compression method as described above.
The data compression method provided by the application comprises the steps of firstly obtaining data to be compressed, classifying the data to be compressed, and dividing the data into data with numerical value and data without numerical value; identifying the classified data to be compressed, and determining a label with numerical data and a label without the numerical data, wherein the label comprises a label prefix and a label subscript; compressing the tag subscript with the numerical data and the numerical value based on a preset algorithm, and compressing the tag subscript without the numerical data to obtain a compressed tag subscript and a compressed numerical value; further inquiring the tag prefix of the data to be compressed, and if the continuous data to be compressed with the same tag prefix does not exist, inserting a second preset symbol between the tag prefix and the compressed tag subscript; and finally, obtaining compressed data based on the label prefix, the second preset symbol and the compressed label subscript or the label prefix, the second preset symbol, the compressed label subscript and the compressed numerical value, and outputting the compressed data. According to the data compression method, the tag subscript and the numerical part of the data to be compressed are respectively compressed by adopting a preset algorithm, so that the memory space close to half can be saved, and the data compression rate is greatly improved; meanwhile, a second preset symbol is inserted between the label prefix and the compressed label subscript, so that the decompression speed and the accuracy are improved.
Drawings
FIG. 1 is a schematic diagram of a data compression device in a hardware operating environment according to an embodiment of the present application;
FIG. 2 is a flow chart of an embodiment of a data compression method according to the present application;
FIG. 3 is a schematic diagram of functional modules of an embodiment of a data compression device according to the present application;
FIG. 4 is a schematic diagram showing functional units of a classification module in an embodiment of the data compression device according to the present application;
FIG. 5 is a schematic diagram of functional blocks of another embodiment of the data compression device according to the present application;
FIG. 6 is a schematic functional block diagram of another embodiment of the data compression device of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1, fig. 1 is a schematic diagram of a data compression device of a hardware running environment according to an embodiment of the present application.
The data compression device in the embodiment of the application can be a terminal device with data processing capability such as a portable computer, a server and the like.
As shown in fig. 1, the data compression device may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory. The memory 1005 may also optionally be a storage device separate from the aforementioned processor 1001.
It will be appreciated by those skilled in the art that the data compression device structure shown in fig. 1 does not constitute a limitation of the data compression device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and a data compression program may be included in the memory 1005, which is a type of computer storage medium.
In the data compression device shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server, and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to call a data compression program stored in the memory 1005 and perform the operations of the following embodiments of the data compression method.
Referring to fig. 2, fig. 2 is a flow chart of an embodiment of a data compression method according to the present application, in which the data compression method includes:
step S10, obtaining data to be compressed, and classifying the data to be compressed, wherein the data to be compressed comprises data with numerical value and data without numerical value.
In this embodiment, the data to be compressed is first obtained. Specifically, the data to be compressed may be divided into data with numerical value and data without numerical value, where data with numerical value includes two parts, a tag and a numerical value, the tag further includes a tag prefix and a tag subscript, and data without numerical value includes only the tag part. For example, the data to be compressed pa1001:12 is data with numerical value, where pa1001 is the tag, pa is the tag prefix, 1001 is the tag subscript, and 12 is the numerical value; the tag and the numerical value are separated by ":", while the tag prefix and the tag subscript within the tag are not separated by any symbol. As another example, the data to be compressed pb1004 is data without numerical value, where pb1004 is the tag, pb is the tag prefix, and 1004 is the tag subscript.
Specifically, the data to be compressed is classified as follows: it is first determined whether the data to be compressed is data with numerical value or data without numerical value. Data to be compressed that contains the first preset symbol ":" is data with numerical value; the part before the first preset symbol ":" is the tag, and the part after it is the numerical value. It will be appreciated that in this embodiment the first preset symbol may also be a symbol other than ":", such as "/"; the type of the first preset symbol is not limited here.
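The classification in step S10 amounts to checking for the first preset symbol. The following is a minimal sketch, assuming ":" as the first preset symbol as in the example; the names FIRST_SYMBOL and has_value are hypothetical and do not come from the application.

```python
FIRST_SYMBOL = ":"  # assumed first preset symbol, taken from the example pa1001:12


def has_value(record: str, symbol: str = FIRST_SYMBOL) -> bool:
    """Return True for data with numerical value, False for data without."""
    return symbol in record


records = ["pa1001:12", "pb1004"]
with_value = [r for r in records if has_value(r)]
without_value = [r for r in records if not has_value(r)]
print(with_value, without_value)
```

Passing a different symbol, for example has_value("pa1001/12", "/"), covers the case where another first preset symbol is chosen.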
And step S20, identifying the classified data to be compressed, determining a label and a numerical value with numerical value data, and determining a label without numerical value data, wherein the label comprises a label prefix and a label subscript, the label prefix is a letter, and the label subscript and the numerical value are numbers.
Further, after determining whether the data to be compressed is data with numerical value or data without numerical value, the data is divided into its tag prefix, tag subscript and, where present, numerical value. Specifically, the tag prefix consists of letters, and the tag subscript and the numerical value consist of digits.
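The division in step S20 can be sketched with a regular expression. This is an illustration under assumptions rather than the application's implementation: the Record type, the split_record function and the use of ":" as the first preset symbol are all chosen here for the example.

```python
import re
from typing import NamedTuple, Optional


class Record(NamedTuple):
    prefix: str            # letters, left uncompressed
    subscript: int         # digits following the prefix
    value: Optional[int]   # digits after ":", or None for data without numerical value


# Letters, then digits, then optionally ":" and more digits.
_PATTERN = re.compile(r"([A-Za-z]+)(\d+)(?::(\d+))?")


def split_record(record: str) -> Record:
    match = _PATTERN.fullmatch(record)
    if match is None:
        raise ValueError(f"unexpected record format: {record!r}")
    prefix, subscript, value = match.groups()
    return Record(prefix, int(subscript), None if value is None else int(value))


print(split_record("pa1001:12"))
print(split_record("pb1004"))
```

Converting the digit strings to integers discards any leading zeros, so a subscript such as 0012 would not survive a round trip; the source does not say how such inputs should be handled.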
Step S30, compressing the label subscript and the numerical value with the numerical value data based on a preset algorithm, and compressing the label subscript without the numerical value data to obtain a compressed label subscript and a compressed numerical value.
Further, after determining the tag prefix, tag subscript and/or value of the data to be compressed, the tag subscript and/or value is compressed based on a preset compression algorithm, and in the compression process, different compression modes can be classified according to different types of the data to be compressed.
For example, if the data to be compressed is data with numerical value, the tag part and the numerical value part of the data to be compressed are compressed separately; if the data to be compressed is data without numerical value, only the tag part of the data to be compressed is compressed. In this embodiment, when the tag part is compressed, only the tag subscript is compressed, and the tag prefix does not need to be compressed.
Further, the tag subscript and/or the numerical value is compressed by a 62-ary (base-62) compression method. In base 62, the 10 digits 0 to 9 represent the values 0 to 9, the 26 lowercase letters a to z represent the values 10 to 35, and the 26 uppercase letters A to Z represent the values 36 to 61. For example, the number 8 is still written as 8 in base 62, while the number 12 is written as the lowercase letter c.
Taking the data without numerical value pa1001 as an example, the 62-ary compression process is as follows: the data to be compressed pa1001 is divided into the tag prefix pa and the tag subscript 1001. The tag prefix is not compressed. The tag subscript 1001 is converted from decimal to base 62, which gives the two base-62 digits 16 and 9, since 1001 = 16 × 62 + 9. The digit 16 corresponds to the letter g and the digit 9 corresponds to the numeral 9, so the tag subscript 1001 is written as g9 in base 62.
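The decimal to base-62 conversion described above can be sketched as repeated division by 62. The digit order (0 to 9, then a to z, then A to Z) follows the description; the names ALPHABET and to_base62 are assumptions made for this example.

```python
import string

# Digit table: 0-9 for values 0-9, a-z for 10-35, A-Z for 36-61.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def to_base62(number: int) -> str:
    """Write a non-negative integer in base 62 using the digit table above."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first, so reverse them.
    return "".join(reversed(digits))


print(to_base62(8), to_base62(12), to_base62(1001))
```

Any value up to 61 fits in one character and any value up to 3843 fits in two, which is where the saving over decimal notation comes from.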
Step S40, inquiring the tag prefix of the data to be compressed, and judging whether continuous data to be compressed with the same tag prefix exists or not; if not, step S50 is performed.
In this embodiment, in order to further improve the efficiency of data compression and save storage space, after the tag subscript and/or the numerical value is compressed based on the preset algorithm, the tag prefix of the data to be compressed is queried to determine whether the data to be compressed contains continuous data with the same tag prefix. For continuous data to be compressed with the same tag prefix, the repeated tag prefix may be kept only once in the compressed data, so as to improve the data compression rate.
Step S50, inserting a second preset symbol between the label prefix and the compressed label subscript.
Further, after the label portion and/or the numerical portion of the data to be compressed are compressed based on the 62-ary compression method, in order to improve the accuracy of reading the compressed data and facilitate the decompression of the subsequent compressed data, in this embodiment, a second preset symbol is inserted between the uncompressed label prefix and the compressed label subscript, so as to distinguish the prefix and the subscript.
For example, for the data pa1001 without the numerical value, the data obtained by compression by the preset algorithm is pag9, and if the corresponding symbol is not inserted to distinguish, it is impossible to identify which of the compressed data is the tag prefix and which is the tag index. For example, there may be a case where pa is a tag prefix and g9 is a tag index, or there may be a case where p is a tag prefix and ag9 is a tag index, which is not beneficial to the accurate decompression to be performed subsequently.
Therefore, the second preset symbol is inserted into the compressed data to distinguish the tag prefix from the tag subscript. Specifically, "=" may be inserted between the uncompressed tag prefix and the compressed tag subscript, or another symbol such as "/" or ";" may be used. In this embodiment, the form of the second preset symbol is not limited, provided that the first preset symbol differs from the second preset symbol. After the second preset symbol is inserted, the compression of the data to be compressed is complete.
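The role of the second preset symbol is easiest to see from the decoding side. The source does not describe a decompression routine, so the following is only a sketch under assumptions: "=" and ":" are taken as the second and first preset symbols, and the names from_base62 and split_compressed are hypothetical.

```python
import string
from typing import Optional, Tuple

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
DIGIT_VALUE = {char: index for index, char in enumerate(ALPHABET)}


def from_base62(text: str) -> int:
    """Read a base-62 string written with the digit table 0-9, a-z, A-Z."""
    number = 0
    for char in text:
        number = number * 62 + DIGIT_VALUE[char]
    return number


def split_compressed(record: str, second: str = "=", first: str = ":") -> Tuple[str, int, Optional[int]]:
    """Split one compressed record into (prefix, subscript, value or None)."""
    # Everything left of the second preset symbol is the uncompressed prefix.
    prefix, _, body = record.partition(second)
    subscript, found, value = body.partition(first)
    return prefix, from_base62(subscript), from_base62(value) if found else None


print(split_compressed("pa=g9"))
print(split_compressed("pa=g9:c"))
```

Because base-62 digits include letters, a string without the separator gives a decoder no way to tell where the prefix ends, whereas partitioning on "=" is unambiguous.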
Step S60, based on the label prefix, the second preset symbol and the compressed label subscript, or the label prefix, the second preset symbol, the compressed label subscript and the compressed numerical value, compressed data are obtained, and the compressed data are output.
For the data pa1001 without numerical value, the label prefix pa is not compressed, the label subscript is compressed to g9, and a second preset symbol "=" is inserted between the label prefix and the compressed label subscript, so the data pa1001 without numerical value is compressed to pa=g9;
for the data with numerical value pa1001:12, the first preset symbol ":" lies between the tag subscript and the numerical value. The tag prefix pa is not compressed, the tag subscript is compressed to g9, the numerical value is compressed to c, and the second preset symbol "=" is inserted between the tag prefix and the compressed tag subscript. Therefore, the data with numerical value pa1001:12 is compressed to pa=g9:c. In subsequent decompression, the tag prefix, tag subscript and numerical value of the data to be decompressed can be accurately distinguished, which improves the accuracy of decompression.
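Steps S10 to S60 for a single item of data can be put together as follows. This is a sketch under the same assumptions as the examples (":" as the first preset symbol, "=" as the second); compress_record and the other names are hypothetical, not the application's own.

```python
import re
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
PATTERN = re.compile(r"([A-Za-z]+)(\d+)(?::(\d+))?")


def to_base62(number: int) -> str:
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def compress_record(record: str) -> str:
    """Compress one item: prefix kept as-is, subscript and value written in base 62."""
    match = PATTERN.fullmatch(record)
    if match is None:
        raise ValueError(f"unexpected record format: {record!r}")
    prefix, subscript, value = match.groups()
    compressed = prefix + "=" + to_base62(int(subscript))
    if value is not None:  # data with numerical value keeps its ":" separator
        compressed += ":" + to_base62(int(value))
    return compressed


print(compress_record("pa1001"))
print(compress_record("pa1001:12"))
```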
In this embodiment, first, data to be compressed is acquired, and classified into data with a numerical value and data without a numerical value; identifying the classified data to be compressed, and determining a label with numerical data and a label without the numerical data, wherein the label comprises a label prefix and a label subscript; compressing the tag subscript with the numerical data and the numerical value based on a preset algorithm, and compressing the tag subscript without the numerical data to obtain a compressed tag subscript and a compressed numerical value; further inquiring the tag prefix of the data to be compressed, and if the continuous data to be compressed with the same tag prefix does not exist, inserting a second preset symbol between the tag prefix and the compressed tag subscript; and finally, obtaining compressed data based on the label prefix, the second preset symbol and the compressed label subscript or the label prefix, the second preset symbol, the compressed label subscript and the compressed numerical value, and outputting the compressed data. According to the data compression method, the tag subscript and the numerical part of the data to be compressed are respectively compressed by adopting a preset algorithm, so that the memory space close to half can be saved, and the data compression rate is greatly improved; meanwhile, a second preset symbol is inserted between the label prefix and the compressed label subscript, so that the decompression speed and the accuracy are improved.
Further, after step S40, the method further includes:
step S70, if n continuous data to be compressed with the same tag prefix exist, simplifying the number of the same tag prefix in the compressed data to 1, wherein n is more than or equal to 2;
step S80, inserting a second preset symbol between the same label prefix and the compressed label subscript;
step S90, the same label prefix is placed on the left side of a second preset symbol, and different label subscripts and numerical values corresponding to the same label prefix are sequentially placed on the right side of the second preset symbol;
step S100, when data to be compressed with a different tag prefix appears after the n continuous data to be compressed with the same tag prefix, inserting a third preset symbol after the tag subscript or numerical value of the n-th of these data and before the different tag prefix.
In this embodiment, in order to further improve the data compression efficiency and save the storage space, when it is detected that continuous data to be compressed with the same tag prefix exists in the data to be compressed, the same tag prefix in the continuous data to be compressed may be placed on the left side of the second preset symbol, and different tag subscripts and/or values in the data to be compressed may be sequentially placed on the right side of the second preset symbol, so as to improve the compression rate.
Firstly, the label prefix, the label index and the numerical value of the data to be compressed are divided, and the label index and the numerical value are compressed according to a preset algorithm. Further, detecting the tag prefix of the data to be compressed, and judging whether the condition that the tag prefix is continuous and the same exists in the data to be compressed. And if the condition that the tag prefixes are continuously the same exists, adjusting the form of compressed data corresponding to the continuously the same tag prefixes. Specifically, in the compressed data, only one continuous identical tag prefix is reserved, the compressed tag subscripts and the numerical values are still arranged in the manner of the first embodiment, and a second preset symbol is inserted between the continuous identical tag prefix and the compressed tag subscripts and/or the numerical values to distinguish.
Further, when different tag prefixes appear after consecutive identical tag prefixes, a third preset symbol is inserted after the numerical value or tag subscript of the data to be compressed corresponding to the last consecutive identical tag prefix and before the different tag prefixes, so as to indicate that the tag prefixes of the data to be compressed before and after the third preset symbol are different.
For example, if the data to be compressed is pa1001:12, pa1002:35, pa1003:63, the three data to be compressed all have the same tag prefix pa, and compressing each of them separately based on the preset algorithm gives pa=g9:c, pa=ga:z, pa=gb:z. To further reduce the storage space and improve the compression rate, the repeated tag prefix pa is kept only once and placed on the left side of the second preset symbol "=", and the compressed tag subscripts and numerical values of the three data are placed in sequence on the right side of the second preset symbol "=", giving: pa=g9:c,ga:z,gb:z.
Further, when data with a different tag prefix appears after continuous data to be compressed with the same tag prefix, a third preset symbol "#" is inserted in the compressed data before the position where the different tag prefix appears. The third preset symbol may also take other forms, such as "/" or ";"; in this embodiment the form of the third preset symbol is not limited, provided that the first, second and third preset symbols are all different. For example, if the data to be compressed is pa1001:12, pa1002:35, pa1003:63 and pb1004:23, the compressed data is pa=g9:c,ga:z,gb:z#pb=gc:n.
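Steps S70 to S100 can be sketched by grouping consecutive items that share a tag prefix. The following is an illustration under assumptions, not the application's implementation: "=", ":" and "#" are taken from the examples, the comma between items of one group is inferred from the example output, and the function names are hypothetical.

```python
import re
import string
from itertools import groupby
from typing import Sequence

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
PATTERN = re.compile(r"([A-Za-z]+)(\d+)(?::(\d+))?")


def to_base62(number: int) -> str:
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def parse(record: str):
    match = PATTERN.fullmatch(record)
    if match is None:
        raise ValueError(f"unexpected record format: {record!r}")
    return match.groups()  # (prefix, subscript, value or None)


def compress(records: Sequence[str]) -> str:
    groups = []
    # groupby only merges adjacent items, matching the "continuous data" condition.
    for prefix, run in groupby(map(parse, records), key=lambda item: item[0]):
        bodies = []
        for _, subscript, value in run:
            body = to_base62(int(subscript))
            if value is not None:
                body += ":" + to_base62(int(value))
            bodies.append(body)
        groups.append(prefix + "=" + ",".join(bodies))  # prefix written once per run
    return "#".join(groups)  # third preset symbol between runs


print(compress(["pa1001:12", "pa1002:35", "pb1004:23"]))
print(compress(["pa1001", "pa1002", "pa1003", "pb1004", "pb1005", "pb1006", "pb1007"]))
```

Values above 61 need more than one base-62 character under this digit table; for instance, this sketch encodes the value 63 as the two characters 11.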
Specifically, the compression process of this embodiment is first illustrated with data that has values. Suppose the data to be compressed is pa1001:12, pa1002:35, pa1003:63, pb1004:23, pb1005:26, pb1006:27, pb1007:13, that is, 7 data items. The tag subscript and the value of each item are compressed using the base-62 method, each run of consecutive identical tag prefixes is reduced to a single prefix, and the second preset symbol "=" is inserted between the tag prefix and the tag subscripts, giving the compressed data pa=g9:c, ga:z, gb:z#pb=gc:n, gd:q, ge:r, gf:d. The data occupies 69 bytes before compression and 40 bytes after compression (not counting the spaces shown after the commas), so the compressed data is about 58% of the original size (40/69). The more consecutive data items share the same tag prefix, the greater the saving.
For data without values, suppose the data before compression is pa1001, pa1002, pa1003, pb1004, pb1005, pb1006, pb1007, again 7 data items. The data compressed by the base-62 method is pa=g9, ga, gb#pb=gc, gd, ge, gf. The data occupies 48 bytes before compression and only 26 bytes after compression (again not counting the spaces after the commas), so the compressed data is about 54% of the original size (26/48). Likewise, the more consecutive data items share the same tag prefix, the greater the saving.
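The byte counts and percentages in the two examples above can be checked with a few lines of arithmetic. The sketch below takes the original and compressed strings exactly as given in the text, written without the spaces after the commas and assuming one byte per ASCII character; the helper name `size_percent` is hypothetical.

```python
def size_percent(original: str, compressed: str) -> int:
    """Return the compressed size as a rounded percentage of the original size."""
    return round(100 * len(compressed.encode("ascii")) / len(original.encode("ascii")))


# Data with values, as given in the text.
with_values = "pa1001:12,pa1002:35,pa1003:63,pb1004:23,pb1005:26,pb1006:27,pb1007:13"
with_values_compressed = "pa=g9:c,ga:z,gb:z#pb=gc:n,gd:q,ge:r,gf:d"

# Data without values, as given in the text.
without_values = "pa1001,pa1002,pa1003,pb1004,pb1005,pb1006,pb1007"
without_values_compressed = "pa=g9,ga,gb#pb=gc,gd,ge,gf"

print(len(with_values), len(with_values_compressed))            # → 69 40
print(size_percent(with_values, with_values_compressed))         # → 58
print(len(without_values), len(without_values_compressed))      # → 48 26
print(size_percent(without_values, without_values_compressed))   # → 54
```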
In this embodiment, if the data to be compressed contains consecutive data items with the same tag prefix, the repeated tag prefix is reduced to a single occurrence after compression and placed to the left of the second preset symbol, and the differing tag subscripts and/or values corresponding to that tag prefix are placed in sequence to the right of the second preset symbol, thereby improving the compression rate of the data.
Referring to fig. 3, fig. 3 is a schematic functional block diagram of a data compression device according to an embodiment of the application.
In this embodiment, the data compression apparatus includes:
a classification module 10, configured to acquire data to be compressed and classify the data to be compressed, where the data to be compressed includes data with a value and data without a value;
a dividing module 20, configured to identify the classified data to be compressed, determine the tag and the value of the data with a value, and determine the tag of the data without a value, where the tag includes a tag prefix and a tag subscript, the tag prefix is alphabetic, and the tag subscript and the value are numeric;
a compression module 30, configured to compress, based on a preset algorithm, the tag subscript and the value of the data with a value, and the tag subscript of the data without a value, so as to obtain a compressed tag subscript and a compressed value;
a query module 40, configured to query the tag prefixes of the data to be compressed and determine whether consecutive data to be compressed with the same tag prefix exists;
an inserting module 50, configured to insert a second preset symbol between the tag prefix and the compressed tag subscript if no consecutive data to be compressed with the same tag prefix exists;
and an output module 60, configured to obtain compressed data based on the tag prefix, the second preset symbol and the compressed tag subscript, or based on the tag prefix, the second preset symbol, the compressed tag subscript and the compressed value, and to output the compressed data.
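The division step performed by the dividing module can be sketched as a small function that separates one data item into its tag prefix, tag subscript and optional value. This is an illustrative sketch only: the `Record` type and the `divide` function are hypothetical names, and ":" as the first preset symbol is an assumption taken from the examples in this description.

```python
from dataclasses import dataclass
from typing import Optional

FIRST_SYMBOL = ":"  # assumed first preset symbol, separating tag and value


@dataclass(frozen=True)
class Record:
    prefix: str           # alphabetic tag prefix, e.g. "pa"
    subscript: str        # numeric tag subscript, kept as text, e.g. "1001"
    value: Optional[str]  # numeric value as text, or None for data without a value


def divide(item: str) -> Record:
    """Split one data item into tag prefix, tag subscript and optional value."""
    tag, separator, value = item.strip().partition(FIRST_SYMBOL)
    # The prefix is whatever remains after stripping the trailing digits of the tag.
    prefix = tag.rstrip("0123456789")
    subscript = tag[len(prefix):]
    if not prefix.isalpha() or not subscript.isdigit():
        raise ValueError("malformed tag: %r" % tag)
    if separator and not value.isdigit():
        raise ValueError("malformed value: %r" % value)
    return Record(prefix, subscript, value if separator else None)


print(divide("pa1001:12"))  # → Record(prefix='pa', subscript='1001', value='12')
print(divide("pb1004"))     # → Record(prefix='pb', subscript='1004', value=None)
```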
Further, referring to fig. 4, the classification module 10 includes:
a query unit 101, configured to acquire data to be compressed, query the data to be compressed, and determine whether the data to be compressed contains a first preset symbol;
a determining unit 102, configured to determine data to be compressed that contains the first preset symbol as data with a value, and determine data to be compressed that does not contain the first preset symbol as data without a value.
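The classification performed by these two units amounts to checking each data item for the first preset symbol. A minimal sketch follows, assuming the first preset symbol is ":" as in the examples of this description; the function name `classify` is hypothetical.

```python
from typing import Iterable, List, Tuple

FIRST_SYMBOL = ":"  # assumed first preset symbol


def classify(items: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split items into (data with a value, data without a value)."""
    with_value: List[str] = []
    without_value: List[str] = []
    for item in items:
        # An item carries a value exactly when it contains the first preset symbol.
        if FIRST_SYMBOL in item:
            with_value.append(item)
        else:
            without_value.append(item)
    return with_value, without_value


print(classify(["pa1001:12", "pa1002", "pb1004:23"]))
# → (['pa1001:12', 'pb1004:23'], ['pa1002'])
```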
Further, referring to fig. 5, the data compression apparatus further includes:
a simplifying module 70, configured to reduce the number of identical tag prefixes in the compressed data to 1 if n consecutive data items to be compressed with the same tag prefix exist, where n is greater than or equal to 2;
a simplified inserting module 80, configured to insert a second preset symbol between the identical tag prefix and the compressed tag subscripts;
an arrangement module 90, configured to place the identical tag prefix to the left of the second preset symbol, and place the differing tag subscripts and values corresponding to that tag prefix in sequence to the right of the second preset symbol.
Further, referring to fig. 6, the data compression apparatus further includes:
a connection module 100, configured to, when data to be compressed with a different tag prefix appears after the n consecutive data items to be compressed with the same tag prefix, insert a third preset symbol after the tag subscript or the value of the n-th such data item and before the different tag prefix.
The specific embodiments of the data compression device of the present application are substantially the same as the embodiments of the data compression method described above and are not repeated here.
In addition, an embodiment of the present application also provides a computer-readable storage medium storing a data compression program which, when executed by a processor, implements the steps of the data compression method described above.
The specific embodiments of the computer-readable storage medium of the present application are substantially the same as the embodiments of the data compression method described above and are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium as described above (e.g. ROM/RAM, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods of the embodiments of the present application.
The foregoing describes only preferred embodiments of the present application and does not limit the scope of the application. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present application, or any direct or indirect application in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (10)

1. A data compression method, characterized in that the data compression method comprises the steps of:
acquiring data to be compressed, and classifying the data to be compressed, wherein the data to be compressed comprises data with a value and data without a value;
identifying the classified data to be compressed, determining the tag and the value of the data with a value, and determining the tag of the data without a value, wherein the tag comprises a tag prefix and a tag subscript, the tag prefix is alphabetic, and the tag subscript and the value are numeric;
compressing, based on a preset algorithm, the tag subscript and the value of the data with a value, and the tag subscript of the data without a value, to obtain a compressed tag subscript and a compressed value;
querying the tag prefixes of the data to be compressed, and determining whether consecutive data to be compressed with the same tag prefix exists;
if no consecutive data to be compressed with the same tag prefix exists, inserting a second preset symbol between the tag prefix and the compressed tag subscript;
and obtaining compressed data based on the tag prefix, the second preset symbol and the compressed tag subscript, or based on the tag prefix, the second preset symbol, the compressed tag subscript and the compressed value, and outputting the compressed data.
2. The data compression method of claim 1, wherein the acquiring data to be compressed and classifying the data to be compressed comprises:
acquiring data to be compressed, querying the data to be compressed, and determining whether the data to be compressed contains a first preset symbol;
determining data to be compressed that contains the first preset symbol as data with a value, and determining data to be compressed that does not contain the first preset symbol as data without a value;
and wherein, after the inserting of a second preset symbol between the tag prefix and the compressed tag subscript if no consecutive data to be compressed with the same tag prefix exists, the method comprises:
obtaining compressed data based on the tag prefix, the second preset symbol, the compressed tag subscript, the first preset symbol and the compressed value, and outputting the compressed data.
3. The data compression method of claim 1, wherein after the querying the tag prefixes of the data to be compressed and determining whether consecutive data to be compressed with the same tag prefix exists, the method further comprises:
if n consecutive data items to be compressed with the same tag prefix exist, reducing the number of identical tag prefixes in the compressed data to 1, wherein n is greater than or equal to 2;
inserting a second preset symbol between the identical tag prefix and the compressed tag subscripts;
and placing the identical tag prefix to the left of the second preset symbol, and placing the differing tag subscripts and values corresponding to that tag prefix in sequence to the right of the second preset symbol.
4. The data compression method of claim 3, wherein after the placing the identical tag prefix to the left of the second preset symbol and placing the differing tag subscripts and values corresponding to that tag prefix in sequence to the right of the second preset symbol, the method further comprises:
when data to be compressed with a different tag prefix appears after the n consecutive data items to be compressed with the same tag prefix, inserting a third preset symbol after the tag subscript or the value of the n-th such data item and before the different tag prefix.
5. A data compression device, the data compression device comprising:
a classification module, configured to acquire data to be compressed and classify the data to be compressed, wherein the data to be compressed comprises data with a value and data without a value;
a dividing module, configured to identify the classified data to be compressed, determine the tag and the value of the data with a value, and determine the tag of the data without a value, wherein the tag comprises a tag prefix and a tag subscript, the tag prefix is alphabetic, and the tag subscript and the value are numeric;
a compression module, configured to compress, based on a preset algorithm, the tag subscript and the value of the data with a value, and the tag subscript of the data without a value, to obtain a compressed tag subscript and a compressed value;
a query module, configured to query the tag prefixes of the data to be compressed and determine whether consecutive data to be compressed with the same tag prefix exists;
an inserting module, configured to insert a second preset symbol between the tag prefix and the compressed tag subscript if no consecutive data to be compressed with the same tag prefix exists;
and an output module, configured to obtain compressed data based on the tag prefix, the second preset symbol and the compressed tag subscript, or based on the tag prefix, the second preset symbol, the compressed tag subscript and the compressed value, and to output the compressed data.
6. The data compression device of claim 5, wherein the classification module comprises:
a query unit, configured to acquire data to be compressed, query the data to be compressed, and determine whether the data to be compressed contains a first preset symbol;
and a determining unit, configured to determine data to be compressed that contains the first preset symbol as data with a value, and determine data to be compressed that does not contain the first preset symbol as data without a value.
7. The data compression device of claim 5, wherein the data compression device further comprises:
a simplifying module, configured to reduce the number of identical tag prefixes in the compressed data to 1 if n consecutive data items to be compressed with the same tag prefix exist, wherein n is greater than or equal to 2;
a simplified inserting module, configured to insert a second preset symbol between the identical tag prefix and the compressed tag subscripts;
and an arrangement module, configured to place the identical tag prefix to the left of the second preset symbol, and place the differing tag subscripts and values corresponding to that tag prefix in sequence to the right of the second preset symbol.
8. The data compression device of claim 7, wherein the data compression device further comprises:
a connection module, configured to, when data to be compressed with a different tag prefix appears after the n consecutive data items to be compressed with the same tag prefix, insert a third preset symbol after the tag subscript or the value of the n-th such data item and before the different tag prefix.
9. A data compression device comprising an input-output unit, a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the data compression method according to any of claims 1 to 4.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a data compression program which, when executed by a processor, implements the steps of the data compression method according to any one of claims 1 to 4.
CN201910871503.9A 2019-09-16 2019-09-16 Data compression method, device, equipment and computer readable storage medium Active CN110808738B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910871503.9A CN110808738B (en) 2019-09-16 2019-09-16 Data compression method, device, equipment and computer readable storage medium
PCT/CN2019/117104 WO2021051532A1 (en) 2019-09-16 2019-11-11 Data compression method, apparatus and device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN110808738A CN110808738A (en) 2020-02-18
CN110808738B true CN110808738B (en) 2023-10-20

Family

ID=69487560

Country Status (2)

Country Link
CN (1) CN110808738B (en)
WO (1) WO2021051532A1 (en)

Also Published As

Publication number Publication date
CN110808738A (en) 2020-02-18
WO2021051532A1 (en) 2021-03-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant