CN113258934A - Data compression method, system and equipment - Google Patents

Data compression method, system and equipment

Info

Publication number
CN113258934A
Authority
CN
China
Prior art keywords
data
fitting function
compressed
polynomial fitting
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110703279.XA
Other languages
Chinese (zh)
Inventor
邹婷
王楠
段泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Highlandr Digital Technology Co ltd
Original Assignee
Beijing Highlandr Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Highlandr Digital Technology Co ltd
Priority to CN202110703279.XA
Publication of CN113258934A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The embodiment of the invention discloses a data compression method comprising the following steps: calculating a polynomial fitting function of a data set of a data file to be compressed; calculating a fitting function calculation value corresponding to each numerical value in the data set according to the polynomial fitting function; calculating the difference between each numerical value and the corresponding fitting function calculation value to obtain a difference set; and storing the data header of the compressed data file, the difference set and the polynomial fitting function to obtain the compressed data file. The invention also discloses a data compression system and equipment. The beneficial effects of the invention are that the data compression method and system do not reduce data precision, improve the compression ratio and compression efficiency of the data, reduce data storage overhead, reduce the bandwidth required for network transmission, and facilitate data transmission under poor network conditions.

Description

Data compression method, system and equipment
Technical Field
The present invention relates to the field of data compression technologies, and in particular, to a data compression method, system, and device.
Background
For large volumes of data, existing data compression methods suffer from an insufficient compression ratio, low compression efficiency, overly complex compression procedures and very large data models, all of which hinder network transmission. For example, meteorological data has many elements, a long time span and a large overall data volume, yet conventional data compression methods cannot satisfy the requirements for compression ratio, compression efficiency and accuracy at the same time.
Disclosure of Invention
In order to solve the above problems, the present invention provides a data compression method, system and device with high data precision, high compression ratio and high compression efficiency.
The invention provides a data compression method, which comprises the following steps:
calculating a polynomial fitting function of a data set of a data file to be compressed;
calculating a fitting function calculation value corresponding to each numerical value in the data set according to the polynomial fitting function;
calculating the difference value between each numerical value and the corresponding fitting function calculation value to obtain a difference value set;
and storing the data head of the compressed data file, the difference set and the polynomial fitting function to obtain the compressed data file.
As a further improvement of the invention, the data file to be compressed comprises a plurality of data sets, and a polynomial fitting function and a difference value set of each data set are respectively calculated.
As a further improvement of the invention, a polynomial fitting function of the data set of the data file to be compressed is calculated, and the polynomial fitting function of the data set is calculated by adopting a least square method.
As a further improvement of the invention, the data set of the data file to be compressed is a NetCDF data set, the NetCDF data set comprises a plurality of variables, the variables are N-dimensional arrays with time as an independent variable, and N is a positive integer.
As a further improvement of the present invention, the NetCDF data set is divided into a plurality of data subsets according to the difference of variables, each variable corresponds to one data subset, a polynomial fitting function and a difference set of each data subset are sequentially calculated, and the data headers, the polynomial fitting function and the difference set of the data subsets are sequentially stored to obtain a compressed data file.
As a further improvement of the invention, the obtained compressed data file is subjected to secondary compression, and the secondary compression adopts a zstd compression algorithm.
The present invention also provides a data compression system, the system comprising:
the data set acquisition module is used for reading the data files to be compressed to obtain M data sets of the data files to be compressed, wherein M is a positive integer;
a polynomial fitting module for calculating a polynomial fitting function of the M data sets, respectively, to obtain M polynomial fitting functions;
the difference value calculation module is used for calculating a fitting function calculation value corresponding to each numerical value in the data set according to the polynomial fitting function aiming at each data set, and calculating the difference value between each numerical value and the corresponding fitting function calculation value to obtain a difference value set of M data sets;
and the data compression module is used for storing the data heads of the M data sets of the compressed data file, the difference set and the polynomial fitting function to obtain the compressed data file.
As a further improvement of the invention, the polynomial fitting module respectively calculates the polynomial fitting functions of the M data sets by adopting a least square method.
As a further improvement of the invention, the data set of the data file to be compressed is a NetCDF data set, the NetCDF data set comprises a plurality of variables, the variables are N-dimensional arrays with time as an independent variable, and N is a positive integer.
As a further improvement of the present invention, the data set acquisition module divides the NetCDF data set into a plurality of data subsets according to the difference of variables, and each variable corresponds to one data subset; the polynomial fitting module calculates a polynomial fitting function of each data subset in sequence; the difference value calculation module calculates a fitting function calculation value corresponding to each numerical value in each data subset in sequence, and calculates the difference value between each numerical value and the corresponding fitting function calculation value to obtain a difference value set; and the data compression module sequentially stores the data head, the difference set and the polynomial fitting function of each data subset in a storage manner to obtain a compressed data file.
As a further improvement of the invention, the system also comprises a secondary compression module, wherein the secondary compression module carries out secondary compression on the obtained compressed data file, and the secondary compression adopts a zstd compression algorithm.
The invention provides an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, and wherein the one or more computer instructions are executed by the processor to implement the data compression method.
The present invention provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program is executed by a processor to implement the above-mentioned data compression method.
The invention has the beneficial effects that: by calculating a polynomial fitting function of a data set of a data file to be compressed and calculating a difference value between a fitting function calculation value corresponding to each numerical value in the data set and an original numerical value, the data length of the difference value is smaller than that of the original numerical value, and the required storage space is smaller, so that the purpose of data compression is achieved; by adopting the data compression method and the data compression system, the data precision can not be reduced, the compression ratio and the compression efficiency of the data can be improved, the data storage expense can be reduced, the bandwidth requirement during network transmission can be reduced, and the data transmission under the condition of poor network conditions can be facilitated.
Drawings
Fig. 1 is a schematic flow chart of a data compression method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a compression process of a data compression method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a decompression process of a data compression method according to an embodiment of the present invention;
Fig. 4 is a diagram illustrating a polynomial fit curve of a data compression method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a data compression system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, a data compression method according to an embodiment of the present invention includes: calculating a polynomial fitting function of a data set of a data file to be compressed; calculating a fitting function calculation value corresponding to each numerical value in the data set according to a polynomial fitting function; then calculating the difference value between each numerical value and the corresponding fitting function calculation value to obtain a difference value set; and storing the data head, the difference set and the polynomial fitting function of the compressed data file to obtain the compressed data file.
The data file to be compressed can have one or more data sets, and the same type of data can be classified into one data set, and each type of data has respective data attributes. In this embodiment, one process for implementing the method is as follows: after the data file to be compressed is obtained, a data header and data attributes of the data file to be compressed can be read into a memory, a polynomial fitting function and a difference set are sequentially solved for each data set, the polynomial fitting function and the difference set are stored in the memory, finally, the data header stored in the memory is simplified and stored in the data file, the obtained polynomial fitting function and the obtained difference set are sequentially stored in the data file, and the compressed data file is obtained.
By calculating a polynomial fitting function, calculating the difference value between the fitting function calculation value corresponding to each numerical value in the data set and the original numerical value, and enabling the difference value to occupy fewer bytes as far as possible, only the difference value set and the polynomial fitting function are finally stored, and therefore the purpose of reducing the storage space is achieved. By the method, the precision of the original data (the data in the data file to be compressed) is not influenced.
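The four steps described above (fit, evaluate, subtract, store) can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes a degree-1 polynomial fitted in closed form, and it assumes a fixed-point scale factor (the hypothetical `SCALE`) so that values with three decimal places are restored exactly. All function names are illustrative.

```python
# Minimal sketch of the four steps: fit, evaluate, subtract, store.
# Assumptions (not from the source): fixed-point scaling by 1000 and a
# degree-1 polynomial fitted in closed form.

SCALE = 1000  # hypothetical: three decimal places are kept exactly


def fit_line(ys):
    """Least-squares line y = c0 + c1 * t over t = 0, 1, ..., len(ys) - 1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    c1 = sxy / sxx if sxx else 0.0
    return [y_mean - c1 * t_mean, c1]


def predict(coeffs, t):
    """Evaluate the polynomial at t (Horner's rule) and round to an integer."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return round(acc)


def compress(values):
    """Return (coefficients, integer residuals) for a list of values."""
    ints = [round(v * SCALE) for v in values]
    coeffs = fit_line(ints)
    residuals = [y - predict(coeffs, t) for t, y in enumerate(ints)]
    return coeffs, residuals


def decompress(coeffs, residuals):
    """Restore the values as fitted value plus stored difference."""
    return [(predict(coeffs, t) + r) / SCALE for t, r in enumerate(residuals)]
```

Because decompression recomputes exactly the same rounded fitted value that compression subtracted, the residuals restore the scaled integers without loss; only the coefficients and the (smaller) residuals need to be stored.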
In an optional implementation, the data file to be compressed comprises a plurality of data sets. Each data set holds data of the same type, with correlation among its values, and data types can be distinguished by their data attributes. After the data sets are obtained, a polynomial fitting function and a difference set are calculated for each data set, and finally the data header, the polynomial fitting function and the difference set of each data set are stored in sequence to obtain the compressed data file. By exploiting the correlation within data of the same type, the polynomial fitting function and the difference set of each data set are calculated class by class; because the differences are smaller than the original values, only the difference set and the polynomial fitting function need to be stored, which reduces the storage space. In particular, for data with precision requirements, for example when a precision of two decimal places must be preserved, the method of this embodiment can reach a data compression ratio of 15 to 20 times, which is higher than that of existing compression methods (about 5 times), while still meeting the data precision requirement.
In an alternative embodiment, the polynomial fitting function of the data set of the data file to be compressed is calculated by the least square method. The least square method is a mathematical optimization technique that finds the best-matching function for the data by minimizing the sum of squared errors. Unknown values can be determined easily with it, and the sum of squared errors between the determined values and the actual data is minimized, so the differences that are finally stored are smaller. One implementation of calculating the polynomial fitting function by the least square method in this embodiment is as follows:
Set the fitting polynomial as:
$$ f(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \cdots + \theta_n x^n = \sum_{j=0}^{n} \theta_j x^j $$
For $m$ sample points $(x_i, y_i)$, the sum of squared errors is:
$$ S = \sum_{i=1}^{m} \left( f(x_i) - y_i \right)^2 $$
Assume that the coefficients $\theta_j$ ($j = 0, 1, 2, \ldots, n$) of the optimal function minimize the sum of squared errors $S$. For the optimal function, the partial derivative of $S$ with respect to each polynomial coefficient $\theta_j$ should therefore satisfy:
$$ \frac{\partial S}{\partial \theta_j} = \sum_{i=1}^{m} 2 \left( f(x_i) - y_i \right) x_i^{j} = 0 $$
Taking $j = 0, 1, 2, \ldots, n$ in turn gives:
$$ \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x_i + \cdots + \theta_n x_i^{n} \right) x_i^{j} = \sum_{i=1}^{m} y_i x_i^{j}, \qquad j = 0, 1, \ldots, n $$
The sum of squared errors $S$ is then written in matrix form. Let:
$$ X = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^n \\ 1 & x_2 & x_2^2 & \cdots & x_2^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^n \end{bmatrix}, \quad \theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix} $$
The sum of squared errors $S$ can then be written as:
$$ S = (X\theta - Y)^{T} (X\theta - Y) $$
$X$ is a Vandermonde matrix, $\theta$ is still the coefficient vector of polynomial coefficients, and $Y$ is the output vector of the sample data set. For the optimal function, it should satisfy:
$$ X^{T} X \theta = X^{T} Y $$
The polynomial coefficient vector $\theta$ of the optimal function is therefore:
$$ \theta = (X^{T} X)^{-1} X^{T} Y $$
Once the coefficient vector $\theta$ is obtained, the polynomial fitting function is obtained as well.
The essence of the matrix method in this embodiment is to construct a Vandermonde matrix from the sample set, thereby converting the univariate N-degree polynomial nonlinear regression problem into an N-variable linear regression problem (that is, multiple linear regression).
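The matrix method can be illustrated with a short sketch that builds the Vandermonde matrix and solves the normal equations directly. The function names are hypothetical, and Gaussian elimination with partial pivoting is used here only for brevity; it is not the solver the description goes on to prefer.

```python
def vandermonde(xs, degree):
    """Rows [1, x, x^2, ..., x^degree], one row per sample point."""
    return [[x ** j for j in range(degree + 1)] for x in xs]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def polyfit_normal(xs, ys, degree):
    """Least-squares coefficients from the normal equations X^T X theta = X^T Y."""
    xv = vandermonde(xs, degree)
    cols = degree + 1
    xtx = [[sum(row[i] * row[j] for row in xv) for j in range(cols)]
           for i in range(cols)]
    xty = [sum(row[i] * y for row, y in zip(xv, ys)) for i in range(cols)]
    return solve(xtx, xty)
```

For high polynomial degrees the normal equations become ill-conditioned, which is one practical reason to prefer an orthogonal factorization such as QR.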
For the solution of the linear regression problem, we use here the QR decomposition based on the Householder transform. The specific derivation process is as follows:
The general form of the least squares problem is:
$$ \min_{x} F(x) = \frac{1}{2} \left\| f(x) \right\|_2^2 $$
where $f(x)$ is the residual function, representing the difference between the predicted value and the measured value, and $F(x)$ is the loss function.
1) When $f(x) = Ax - b$ is a linear equation, the linear least squares problem is:
$$ \min_{x} F(x) = \frac{1}{2} \left\| Ax - b \right\|_2^2 $$
Expanding gives:
$$ F(x) = \frac{1}{2} (Ax - b)^{T} (Ax - b) = \frac{1}{2} \left( x^{T} A^{T} A x - 2 b^{T} A x + b^{T} b \right) $$
Differentiating with respect to $x$ gives:
$$ \frac{\partial F}{\partial x} = A^{T} A x - A^{T} b $$
The loss function reaches its minimum where the derivative is 0, so:
$$ A^{T} A x = A^{T} b $$
It follows that the solution of the linear least squares problem $Ax = b$ is
$$ x = (A^{T} A)^{-1} A^{T} b $$
Because this requires a matrix inverse, QR decomposition can be used to reduce the computational difficulty.
First, $A$ is QR-decomposed, i.e. $A = QR$, where $Q$ has orthonormal columns ($Q^{T} Q = I$) and $R$ is an upper triangular matrix:
$$ x = (R^{T} Q^{T} Q R)^{-1} R^{T} Q^{T} b = (R^{T} R)^{-1} R^{T} Q^{T} b = R^{-1} Q^{T} b $$
Since $R$ is an upper triangular matrix, it is relatively easy to invert, which avoids the high complexity of inverting $A^{T} A$ directly.
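2) When $f(x)$ is a nonlinear equation, the least squares problem is
$$ \min_{x} F(x) = \frac{1}{2} \left\| f(x) \right\|_2^2 $$
Let the state vector be $x = (x_1, x_2, \ldots, x_m)$ and
$$ f(x) = \left( f_1(x), f_2(x), \ldots, f_n(x) \right)^{T} $$
The first-order Taylor expansion is:
$$ f(x + \Delta x) \approx f(x) + J(x)\,\Delta x $$
where $J$ is the Jacobian matrix, expressed as:
$$ J = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_m} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_n}{\partial x_1} & \cdots & \dfrac{\partial f_n}{\partial x_m} \end{bmatrix} $$
Iterate
$$ x_{k+1} = x_k + \Delta x_k $$
until convergence to obtain the optimal solution $x$.
QR decomposition may also be used here to solve for $\Delta x$. Let the QR decomposition of $J$ be:
$$ J = QR $$
By analogy with the linear least squares method, minimizing $\frac{1}{2} \| f(x) + J \Delta x \|_2^2$ over $\Delta x$ gives
$$ \Delta x = -(J^{T} J)^{-1} J^{T} f(x) = -R^{-1} Q^{T} f(x) $$
The nonlinear least squares problem is therefore solved iteratively as:
$$ x_{k+1} = x_k - R_k^{-1} Q_k^{T} f(x_k), \qquad J(x_k) = Q_k R_k $$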
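Both cases of the derivation can be sketched in a few lines. The first function below solves a linear least squares problem with Householder reflections, applying each reflection to the augmented matrix $[A \mid b]$ so that $Q$ is never formed explicitly. The second reuses it for the iterative nonlinear case. The exponential model `y = a * exp(b * t)`, the starting values and all function names are assumptions chosen for illustration and do not come from the source.

```python
import math


def lstsq_householder(a, b):
    """Minimize ||a x - b||_2 for an m x n matrix a (m >= n, full column rank)."""
    m, n = len(a), len(a[0])
    # Augment A with b so each reflection acts on both: [A | b] -> [R | Q^T b].
    w = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        norm = math.sqrt(sum(w[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            raise ValueError("matrix is rank deficient")
        # Choose the sign that avoids cancellation when forming v.
        alpha = -norm if w[k][k] >= 0 else norm
        v = [w[i][k] for i in range(k, m)]
        v[0] -= alpha
        vtv = sum(x * x for x in v)
        for c in range(k, n + 1):
            dot = sum(v[i] * w[k + i][c] for i in range(m - k))
            factor = 2.0 * dot / vtv
            for i in range(m - k):
                w[k + i][c] -= factor * v[i]
    # Back substitution on the upper triangular system R x = (Q^T b)[:n].
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(w[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (w[i][n] - tail) / w[i][i]
    return x


def gauss_newton_exp(ts, ys, a, b, iterations=25):
    """Fit the hypothetical model y = a * exp(b * t) by iterated linear solves."""
    for _ in range(iterations):
        residual = [a * math.exp(b * t) - y for t, y in zip(ts, ys)]
        jacobian = [[math.exp(b * t), a * t * math.exp(b * t)] for t in ts]
        # The step is the least-squares solution of J * delta = -f(x).
        da, db = lstsq_householder(jacobian, [-r for r in residual])
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b
```

For polynomial fitting only the linear solver is needed: pass the Vandermonde rows as `a` and the sample values as `b`.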
preferably, the polynomial fitting function selects the over-fitting curve function, so that the calculated difference is smaller, and the storage space occupied by the finally obtained compressed data file is smaller.
In an optional embodiment, the data set of the data file to be compressed is a NetCDF data set, where the NetCDF data set includes a plurality of variables, the variables are N-dimensional arrays using time as an argument, and N is a positive integer. Dividing the NetCDF data set into a plurality of data subsets according to different variables, wherein each variable corresponds to one data subset, sequentially calculating a polynomial fitting function and a difference set of each data subset, and sequentially storing a data head, the polynomial fitting function and the difference set of each data subset to obtain a compressed data file.
One implementation of an embodiment of the invention is, for example, the compression of meteorological data. The NetCDF format is the most common storage format for meteorological data. Because meteorological data involves many elements and a long time span, the data volume is generally large, which causes problems for data storage and network transmission. The existing compression technology for NetCDF meteorological data mainly comprises the following:
1. rooka and the like establish a two-dimensional linear prediction statistical model of meteorological grid point data by analyzing the correlation between adjacent grid points of common meteorological elements and calculating the symbol entropy and the information redundancy of an element field, eliminate redundant information and provide a new method for lossless compression of data by combining Huffman coding.
2. The BP neural network is introduced on the basis of two-dimensional linear prediction, a secondary prediction model based on the neural network is established, redundant information of meteorological grid point data is effectively eliminated, and a novel efficient lossless compression scheme is provided by combining entropy coding.
3. In consideration of flood and the like, a new method is provided for changing the storage sequence of meteorological data through subtraction operation of adjacent time data, enabling the high 8 bits of the stored data to have 00 values or FF values as much as possible, and then compressing through a Win-RAR tool.
4. A quadratic linear prediction method reduces the redundancy among meteorological grid point data, addressing the large storage space such data occupies. Analysis of a 500 hPa height field shows that the variance of the prediction error sequence is one order of magnitude smaller than that of the original sequence, which indicates that the correlation of the error sequence is greatly reduced and that predictive compression of grid point data is feasible.
Although the prior art can compress meteorological data while ensuring a certain data precision, it suffers from a low compression ratio, complex compression methods and very large data models, which hinders network transmission. For example, in the prior art the highest compression ratio is about 10 times when a precision of one decimal place is guaranteed, and about 5 times when a precision of two decimal places is guaranteed. In addition, these methods target only data without special ocean grid point elements, and their compression ratio is uncertain because of terrain changes. The prior art therefore cannot be applied to common grid point meteorological data; its scope of application is not general enough and its limitations are significant.
The method compresses the NetCDF meteorological data, calculates to obtain a polynomial fitting function by utilizing the relevance between adjacent time grid point data, calculates a fitting function calculation value corresponding to each numerical value according to the polynomial fitting function, then calculates the difference value between the original storage numerical value and the fitting function calculation value thereof to obtain a difference value set, ensures that the difference value is not more than 1 byte as far as possible, and finally only stores the difference value set and the polynomial fitting function, thereby achieving the purpose of reducing the meteorological data storage space. One implementation process for compressing the NetCDF meteorological data implemented by the invention is as follows:
As shown in Fig. 2, the NetCDF meteorological data is read into memory. The data of the different meteorological elements (such as air temperature, air pressure, wind, humidity, cloud, precipitation and various weather phenomena) is then grouped, grid point by grid point and in time order, into a plurality of data sets. The data sets of the different elements are stored in memory, and the file header and the element attributes are also read into memory. Assume that the meteorological data is 4-dimensional with N elements and that the sizes of the time, longitude and latitude variables are T, I and J respectively. If each value is stored in 8 bytes, the total data size (pure data size) is T × I × J × N × 8 bytes. Because the time, longitude and latitude values change regularly across the data set, redundant storage of these three variables can be removed from each element: only the three values T, I and J need to be stored, so the total data volume can be directly reduced by T × I × J × 3 bytes.
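The grouping of one element into per-grid-point time series can be sketched as below. Nested Python lists indexed as `variable[t][i][j]` stand in for the array that a NetCDF reader would return; that layout and the function name are assumptions made for illustration.

```python
def split_into_series(variable):
    """Map variable[t][i][j] to {(i, j): [value at t = 0, 1, ...]}.

    `variable` is a hypothetical stand-in for one meteorological element,
    stored as T frames of an I x J grid.
    """
    n_i = len(variable[0])
    n_j = len(variable[0][0])
    return {
        (i, j): [frame[i][j] for frame in variable]
        for i in range(n_i)
        for j in range(n_j)
    }
```

Each returned list is one data set in the sense of the description: the values of a single element at a single grid point, ordered by time, ready for polynomial fitting.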
After this processing, I × J × (N − 3) data sets are obtained. Optimal polynomial fitting is then performed on each data set to calculate its polynomial fitting function. In this embodiment, polynomial fitting is performed by the least square method. For example, the data set (original data set) obtained by extracting the change over time of the atmospheric pressure element at one grid point of a NetCDF meteorological data file begins as follows: [103678.999, 103673.769, 103677.423, 103723.144, 103831.389, 103993.258, 104192.559, 104393.131, 104571.853, 104726.615, 104876.057, 105039.165, 105228.224, 105423.211, 105550.947, 105555.651, 105517.754, 105514.537, ...]. Higher-order polynomial fitting of this data set by the least square method gives the polynomial fitting function:
[formula image: fitted polynomial function]
the resulting fitted curve is shown in fig. 4.
The difference set between the original data and the values calculated by the polynomial fitting function is:
[80.0, 36.0, -32.0, -84.0, -97.0, -72.0, -21.0, 21.0, 39.0, 33.0, 24.0, 35.0, 83.0, 148.0, 160.0, 66.0, -52.0, -116.0, -122.0, -97.0, -106.0, -130.0, -132.0, -71.0, 23.0, 49.0, 70.0, 93.0, 163.0, 223.0, 151.0, 40.0, -73.0, -149.0, -159.0, -120.0, -57.0, 4.0, 80.0, 68.0, 0.0]
Comparing the original data set with the difference set shows that the values of the original data set are large, while the values of the difference set are far smaller. Compared with the pure data size T × I × J × N × 8 of the original data, the data size of the difference set is reduced to T × I × J × 1 × (N − 3). While maintaining a precision of two decimal places, the data compression ratio is improved to 15 to 20 times; compared with other approaches, the method significantly improves the data compression ratio and compression efficiency and meets the high-precision requirement of the data.
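As a worked check of the storage saving, the difference set listed above can be packed into a compact binary form. In that list a few values (for example 223.0 and -159.0) lie outside the signed one-byte range of -128 to 127, so this sketch falls back to two bytes per value whenever any residual does not fit in one. The width-selection rule and the function names are assumptions, not the encoding defined by the source.

```python
import struct

# The difference set from the description, written as integers.
RESIDUALS = [
    80, 36, -32, -84, -97, -72, -21, 21, 39, 33, 24, 35, 83, 148, 160, 66,
    -52, -116, -122, -97, -106, -130, -132, -71, 23, 49, 70, 93, 163, 223,
    151, 40, -73, -149, -159, -120, -57, 4, 80, 68, 0,
]


def pack_residuals(residuals):
    """Pack as int8 if every value fits in one signed byte, else as int16.

    struct.pack raises struct.error if a value does not fit in int16 either.
    """
    fits_one_byte = all(-128 <= r <= 127 for r in residuals)
    code = "b" if fits_one_byte else "h"
    return code, struct.pack(f"<{len(residuals)}{code}", *residuals)


def unpack_residuals(code, payload):
    """Inverse of pack_residuals."""
    width = struct.calcsize("<" + code)
    return list(struct.unpack(f"<{len(payload) // width}{code}", payload))


code, payload = pack_residuals(RESIDUALS)
raw_size = len(RESIDUALS) * 8  # the same values stored as 8-byte numbers
print(code, len(payload), raw_size)
```

A finer-grained scheme, such as an escape marker for the occasional wide value, would bring the average closer to one byte per residual; that refinement is left out of the sketch.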
An optimal fitting polynomial, i.e. a polynomial fitting function, is solved in turn for each of the I × J × (N − 3) data sets, the difference set between the fitting function calculation values and the original values is calculated, and the polynomial fitting function and the difference set are stored in memory. Finally, the processed data is encoded and stored as follows: first, the header data of the data file to be compressed is simplified and stored in the file; then the polynomial fitting functions and the difference sets are stored in the file in sequence to obtain the compressed data file.
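One possible on-disk layout for a single data set is sketched below: a small header, then the polynomial coefficients, then the differences. The magic bytes `PFC1`, the field widths and the function names are all hypothetical; the source does not specify a byte layout.

```python
import struct

MAGIC = b"PFC1"          # hypothetical format tag
HEADER = "<4sBI"         # magic, coefficient count, residual count


def encode_block(coeffs, residuals):
    """Serialize one data set: header, float64 coefficients, int16 residuals."""
    header = struct.pack(HEADER, MAGIC, len(coeffs), len(residuals))
    body = struct.pack(f"<{len(coeffs)}d", *coeffs)
    body += struct.pack(f"<{len(residuals)}h", *residuals)
    return header + body


def decode_block(blob):
    """Parse a block written by encode_block back into (coeffs, residuals)."""
    magic, n_coeffs, n_residuals = struct.unpack_from(HEADER, blob, 0)
    if magic != MAGIC:
        raise ValueError("not a polynomial-fit block")
    offset = struct.calcsize(HEADER)
    coeffs = list(struct.unpack_from(f"<{n_coeffs}d", blob, offset))
    offset += 8 * n_coeffs
    residuals = list(struct.unpack_from(f"<{n_residuals}h", blob, offset))
    return coeffs, residuals
```

Storing the counts in the header lets a reader walk a file that holds many such blocks in sequence, which matches the sequential storage the description calls for.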
In an optional implementation, after the compressed data file is obtained, it undergoes secondary compression using the zstd compression algorithm, which can further improve the data compression ratio by about 15%. The compressed data file is better suited to network transmission; in particular, under poor network conditions (such as inland-river and ship-to-shore communication), the data transmission load is lower.
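The secondary compression step might look like the sketch below. It assumes the third-party `zstandard` Python package is installed; when it is not, the sketch falls back to the standard-library `zlib` purely so that it still runs, which is a different algorithm from the zstd named in the description. The function names and the default level are illustrative.

```python
import zlib

try:
    import zstandard  # third-party package; assumed to be installed
except ImportError:   # fallback only keeps the sketch runnable
    zstandard = None


def secondary_compress(blob, level=3):
    """Compress the already-encoded file bytes a second time."""
    if zstandard is not None:
        return zstandard.ZstdCompressor(level=level).compress(blob)
    return zlib.compress(blob, level)  # stand-in, not zstd


def secondary_decompress(blob):
    """Undo secondary_compress before the polynomial data is parsed."""
    if zstandard is not None:
        return zstandard.ZstdDecompressor().decompress(blob)
    return zlib.decompress(blob)
```

On decompression the order is reversed: the secondary compression is undone first, and the header, polynomial fitting functions and difference sets are read afterwards.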
For the data compression process of this embodiment, the corresponding data decompression process is: as shown in fig. 3, firstly, the zstd compression algorithm is called to decompress the data file; then, reading the simplified head data, the polynomial fitting function and the difference value set stored in sequence, and temporarily storing the difference value set in an internal memory; then finding out a corresponding polynomial fitting function and a corresponding difference set, and restoring an original data value by calculating the sum of a calculated value and a difference value of the polynomial fitting function; and finally, saving the restored data value to a file according to the format.
As shown in fig. 5, a data compression system according to an embodiment of the present invention includes: the device comprises a data set acquisition module, a polynomial fitting module, a difference value calculation module and a data compression module; the data set acquisition module is used for reading a data file to be compressed to obtain M data sets of the data file to be compressed, wherein M is a positive integer; the polynomial fitting module is used for respectively calculating polynomial fitting functions of the M data sets to obtain M polynomial fitting functions; the difference value calculation module calculates a fitting function calculation value corresponding to each numerical value in the data set according to a polynomial fitting function aiming at each data set, and calculates the difference value between each numerical value and the corresponding fitting function calculation value to obtain difference value sets of M data sets; the data compression module is used for storing data heads, polynomial fitting functions and difference value sets of M data sets of compressed data files to obtain the compressed data files.
The data compression system of the present embodiment can also compress meteorological data in NetCDF format. The method comprises the steps of calculating to obtain a polynomial fitting function by utilizing the relevance between grid point data of adjacent time of meteorological data, calculating a fitting function calculation value corresponding to each numerical value according to the polynomial fitting function, calculating the difference value between an original storage numerical value and the fitting function calculation value to obtain a difference value set, ensuring that the difference value is not more than 1 byte as far as possible, and finally only storing the difference value set and the polynomial fitting function, thereby achieving the purpose of reducing the meteorological data storage space. Wherein, the least square algorithm can be adopted for calculating the polynomial fitting function. Preferably, the polynomial fitting function selects the over-fitting curve function, so that the calculated difference is smaller, and the storage space occupied by the finally obtained compressed data file is smaller.
In an optional implementation manner, the system implemented by the present invention further includes a secondary compression module, where the secondary compression module performs secondary compression on the obtained compressed data file, and the secondary compression employs a zstd compression algorithm. Through secondary compression, the compression rate of the data file can be improved by about 15 percent.
The invention also relates to an electronic device, such as a server or a terminal. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the storage medium, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the method of the above embodiments.
In an alternative embodiment, the memory is used as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The processor executes various functional applications of the device and data processing, i.e., implements the method, by executing nonvolatile software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the methods of any of the method embodiments described above.
The product can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to that method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.
The present invention also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as those skilled in the art will understand, all or part of the steps of the methods in the embodiments described above may be implemented by a program instructing related hardware. The program is stored in a storage medium and includes several instructions that enable a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those of ordinary skill in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present invention has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A method of data compression, the method comprising:
calculating a polynomial fitting function of a data set of a data file to be compressed;
calculating a fitting function calculation value corresponding to each numerical value in the data set according to the polynomial fitting function;
calculating the difference value between each numerical value and the corresponding fitting function calculation value to obtain a difference value set;
and storing the data header of the compressed data file, the difference set and the polynomial fitting function to obtain the compressed data file.
2. The method of claim 1, wherein the data file to be compressed comprises a plurality of data sets, and the polynomial fitting function and the difference set are calculated separately for each data set.
3. The method of claim 1, wherein the polynomial fit function of the data set is calculated using a least squares method.
4. The method according to claim 1, wherein the data set of the data file to be compressed is a NetCDF data set, the NetCDF data set comprises a plurality of variables, each variable is an N-dimensional array with time as an independent variable, and N is a positive integer.
5. The method according to claim 4, wherein the NetCDF data set is divided into a plurality of data subsets according to different variables, each variable corresponds to one data subset, a polynomial fitting function and a difference set of each data subset are sequentially calculated, and data headers, the polynomial fitting function and the difference set of the data subsets are sequentially stored to obtain a compressed data file.
6. The method according to any one of claims 1 to 5, wherein the resulting compressed data file is subjected to a secondary compression using a zstd compression algorithm.
7. A data compression system, the system comprising:
a data set acquisition module for reading the data file to be compressed to obtain M data sets of the data file to be compressed, wherein M is a positive integer;
a polynomial fitting module for calculating a polynomial fitting function for each of the M data sets, to obtain M polynomial fitting functions;
a difference calculation module for calculating, for each data set, a fitting function calculation value corresponding to each numerical value in the data set according to the polynomial fitting function, and calculating the difference between each numerical value and the corresponding fitting function calculation value, to obtain the difference sets of the M data sets;
and a data compression module for storing the data headers of the M data sets of the compressed data file, the difference sets and the polynomial fitting functions to obtain the compressed data file.
8. The system of claim 7, further comprising a secondary compression module that secondarily compresses the resulting compressed data file, wherein the secondary compression employs a zstd compression algorithm.
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor for implementing the method according to any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110703279.XA CN113258934A (en) 2021-06-24 2021-06-24 Data compression method, system and equipment

Publications (1)

Publication Number Publication Date
CN113258934A true CN113258934A (en) 2021-08-13

Family

ID=77189459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110703279.XA Pending CN113258934A (en) 2021-06-24 2021-06-24 Data compression method, system and equipment

Country Status (1)

Country Link
CN (1) CN113258934A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1866241A (en) * 2006-06-21 2006-11-22 浙江中控软件技术有限公司 Real-time data compression method based on least square linear fit
CN101894135A (en) * 2009-06-15 2010-11-24 复旦大学 Method for compressing and storing GPS data based on route clustering
CN103414476A (en) * 2013-08-09 2013-11-27 北华大学 Production energy consumption real-time data compression method
CN105808708A (en) * 2016-03-04 2016-07-27 广东轻工职业技术学院 Quick data compression method
CN105807266A (en) * 2016-05-19 2016-07-27 中国人民解放军军械工程学院 Compression method for early-warning radar track data transmission
US20190258619A1 (en) * 2016-09-14 2019-08-22 Turbo Data Laboratories, Inc. Data compression method, data compression device, computer program, and database system
CN112054805A (en) * 2020-09-14 2020-12-08 哈尔滨工业大学(深圳) Model data compression method, system and related equipment

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023070424A1 (en) * 2021-10-28 2023-05-04 华为技术有限公司 Database data compression method and storage device
CN114461594A (en) * 2021-12-31 2022-05-10 国网河北省电力有限公司营销服务中心 Data compression method, edge device and computer storage medium
CN114553888A (en) * 2022-01-24 2022-05-27 浙江数秦科技有限公司 Low-network-occupation data transmission method suitable for block chain
CN114567596A (en) * 2022-01-24 2022-05-31 浙江数秦科技有限公司 Data fast exchange method for block chain
CN114567596B (en) * 2022-01-24 2024-04-05 浙江数秦科技有限公司 Data quick exchange method for block chain
CN114553888B (en) * 2022-01-24 2024-04-05 浙江数秦科技有限公司 Low network occupation data transmission method suitable for block chain
CN115833843A (en) * 2023-02-14 2023-03-21 临沂云斗电子科技有限公司 Vehicle operation monitoring data storage optimization method and management platform

Similar Documents

Publication Publication Date Title
CN113258934A (en) Data compression method, system and equipment
CN107832837B (en) Convolutional neural network compression method and decompression method based on compressed sensing principle
CN108960333B (en) Hyperspectral image lossless compression method based on deep learning
CN116681036B (en) Industrial data storage method based on digital twinning
CN109859281B (en) Compression coding method of sparse neural network
US12080384B2 (en) Method for compressing genomic data
WO2015180203A1 (en) High-throughput dna sequencing quality score lossless compression system and compression method
CN105374054A (en) Hyperspectral image compression method based on spatial spectrum characteristics
CN110021369B (en) Gene sequencing data compression and decompression method, system and computer readable medium
CN109408765B (en) Intelligent matching tracking sparse reconstruction method based on quasi-Newton method
CN115695564B (en) Efficient transmission method of Internet of things data
US11595057B2 (en) Reducing error in data compression
CN106452452A (en) Full-pulse data lossless compression method based on K-means clustering
US20220392117A1 (en) Data compression and decompression system and method thereof
CN116361256A (en) Data synchronization method and system based on log analysis
CN115618051A (en) Internet-based smart campus monitoring video storage method
CN109543772B (en) Data set automatic matching method, device, equipment and computer readable storage medium
CN112468154A (en) Data compression method suitable for visualization of oceanographic weather
CN103227644A (en) Compression method of automobile body small-format data
CN116029340B (en) Image and semantic information transmission method based on deep learning network
CN106101732B (en) The vector quantization scheme of Fast Compression bloom spectrum signal
Martinez et al. Marlin: A high throughput variable-to-fixed codec using plurally parsable dictionaries
CN103985096A (en) Hyperspectral image regression prediction compression method based on off-line training
CN106331719A (en) K-L transformation error space dividing based image data compression method
CN114665885A (en) Self-adaptive data compression method for time sequence database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813