CN114268323A - Data compression coding method and device supporting line memory and time sequence database - Google Patents


Info

Publication number
CN114268323A
Authority
CN
China
Prior art keywords
data
value
difference
difference value
line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111597104.1A
Other languages
Chinese (zh)
Other versions
CN114268323B (en)
Inventor
吴春中
张浩阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sobey Digital Technology Co Ltd
Original Assignee
Chengdu Sobey Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sobey Digital Technology Co Ltd filed Critical Chengdu Sobey Digital Technology Co Ltd
Priority to CN202111597104.1A priority Critical patent/CN114268323B/en
Publication of CN114268323A publication Critical patent/CN114268323A/en
Application granted granted Critical
Publication of CN114268323B publication Critical patent/CN114268323B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data compression coding method and device supporting row storage, and a time series database, belonging to the technical field of data compression coding. The method comprises the following steps: S1, defining a representation for numerical data, and defining data blocks according to the defined representation; S2, after a row of data is input, determining whether the row is the first row of a data block; if so, calculating difference values using the previous column as the reference, based on the defined representation; if not, calculating difference values using the corresponding column of the previous row as the reference, based on the defined representation; then starting row encoding, entering a column-by-column encoding loop, determining the write mode according to the data state, and writing in a loop. The invention improves storage space utilization, increases storage capacity, improves cache utilization, and permits a simple decoding algorithm, thereby greatly improving the overall performance of the database.

Description

Data compression coding method and device supporting line memory and time sequence database
Technical Field
The present invention relates to the technical field of data compression coding, and more particularly to a data compression coding method and device supporting row storage ("line memory"), and to a time series database.
Background
At present there are many time series databases; representative examples include HBase, InfluxDB, IoTDB and the PI real-time database. Most of these databases use encodings similar to the scheme described in the Gorilla paper, "Gorilla: A Fast, Scalable, In-Memory Time Series Database", a timestamp and floating-point based encoding illustrated in fig. 1. In addition, the database vendors have developed many further encoding algorithms, such as Simple8b, ZigZag, run-length encoding and the swinging door (revolving door) compression algorithm. All of these databases share one characteristic: they adopt column storage, gathering a large amount of data from the same column together for encoding, which is very efficient when processing a single column. In practice, however, many scenarios require processing data row by row. When many columns must be processed, column storage performs poorly, and column-storage encoding cannot encode data immediately as it is written row by row.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a data compression coding method and device supporting row storage, and a time series database, which improve storage space utilization, increase storage capacity, improve cache utilization, and permit a simple decoding algorithm, thereby greatly improving the overall performance of the database.
The purpose of the invention is realized by the following scheme:
A data compression coding method supporting row storage comprises the following steps:
S1, defining a representation for numerical data, and defining data blocks according to the defined representation;
S2, after a row of data is input, determining whether the row is the first row of a data block; if so, calculating difference values using the previous column as the reference, based on the defined representation; if not, calculating difference values using the corresponding column of the previous row as the reference, based on the defined representation; then starting row encoding, entering a column-by-column encoding loop, determining the write mode according to the data state, and writing in a loop.
Further, the representation of the numerical data specifically includes a true number and a decimal place count. The true number is the actual value written without its decimal point, converted into binary. The decimal place count gives the position of the decimal point, counted from right to left in the true number; that is, the true number is multiplied by 10 raised to the negative of this count.
Further, a difference value is calculated only when the two values have the same decimal place count, and is obtained by an exclusive-or (XOR) operation on their true numbers.
Further, the difference value is stored in variable-length form.
Further, the write mode comprises writing an original value, writing a difference value, and performing no operation.
Further, the data blocks are encoded independently.
Further, starting row encoding and entering the column-by-column encoding loop comprises the following sub-steps. The first row within a data block uses intra-row reference coding: the first column stores only the original value; from the second column onward, if the decimal place counts are the same, a difference is calculated by XOR, and the difference is stored if its storage space is smaller than a set value; otherwise the original value is stored directly. For subsequent rows, starting from the second row, the difference between each column and the corresponding column of the previous row is calculated by XOR; if the difference can be calculated and its storage space is smaller than a set value, the difference is stored; otherwise the original value is stored directly.
Further, in the row encoding process, the data type describes the category of the data, and the field description information describes how the data is stored, as one of four states: original value, difference value, null value, or identical to the previous value.
Furthermore, in the row encoding process, every 4 fields share one field description. When the state is null or identical to the previous value, no additional space is occupied. When all four fields are identical to the previous values, the field description is omitted and a single 0 bit on the field bitmap marks this. One field bitmap byte manages 8 field descriptions, i.e. 32 fields; when none of the 32 fields has changed, only a 0 is stored on the field bitmap.
Further, the category of the data comprises an identifier for first-row data or second-row data.
Furthermore, in the row-storage encoding process only the previous row of data needs to be referenced, so the database encodes each row as it is written.
Further, a decoding step is included:
S3, when a database retrieval request occurs, locating the data block to be read according to upper-layer data, and then decoding row by row.
Further, the row-by-row decoding comprises the following sub-step: according to the field bitmap and the field description information in the row-storage organization structure, reading the subsequent data by type and performing the corresponding operation to obtain the actual value.
Further, the method comprises a data recovery step: if the state is original value, the original data is recovered directly according to the coding rule; if the state is difference value, the reference data is XORed with the difference and the sign is corrected; if the state is identical to the previous value, the reference data is used directly; and if the state is null, the data is null.
Further, the decimal place count has the following states: a positive count indicates that the value has a fractional part, zero indicates that the value is an integer, and a negative count indicates an exponent.
A data compression coding device supporting row storage comprises a numerical data coding unit, a numerical difference data coding unit and a row-storage coding unit. The numerical data coding unit defines a numerical data representation comprising a true number, a decimal place count and sign information; the true number is the value written without its decimal point, converted into binary; the decimal place count gives the position of the decimal point, counted from right to left in the true number, that is, multiplying the true number by 10 raised to the negative of this count yields the represented value. The numerical difference data coding unit encodes differences and stores them in variable-length form; a difference is calculated only when the two values have the same decimal place count, by an XOR operation on their true numbers. The row-storage coding unit stores data in the database and encodes each data block independently. The first row within a data block uses intra-row reference coding: the first column stores only the original value; from the second column onward, if the decimal place counts are the same, a difference is calculated by XOR, and the difference is stored if its storage space is smaller than a set value; otherwise the original value is stored directly. For subsequent rows, starting from the second row, the difference between each column and the corresponding column of the previous row is calculated by XOR; if the difference can be calculated and its storage space is smaller than a set value, the difference is stored; otherwise the original value is stored directly.
Further, the device comprises a data decoding unit and a data recovery unit. The data decoding unit locates the data block to be read according to upper-layer data when a database retrieval request occurs, and then decodes row by row, that is, it reads the subsequent data by type according to the field bitmap and the field description information and performs the corresponding operation to obtain the actual value. The data recovery unit executes the following process: if the state is original value, the original data is recovered directly according to the coding rule; if the state is difference value, the reference data is XORed with the difference and the sign is corrected; if the state is identical to the previous value, the reference data is used directly; and if the state is null, the data is null.
A time series database comprises a readable storage medium and a program which, when run from the readable storage medium, performs the method described above; or comprises the data compression coding device supporting row storage described in any of the above.
The invention has the beneficial effects that:
1. The embodiment of the invention can realize accurate representation of data coding.
2. The embodiment of the invention can realize effective compression of the storage space.
3. The embodiment of the invention can improve the utilization efficiency of the cache.
4. The embodiment of the invention can realize insertion and compression at the same time.
5. The embodiment of the invention can improve the data processing efficiency.
The advantageous effects of the present invention are not limited to the above description, and are specifically set forth in the embodiments in detail.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a diagram illustrating a conventional encoding scheme based on time and floating-point numbers;
FIG. 2 is a diagram illustrating a simple numerical code definition;
FIG. 3 is a diagram illustrating a complex numerical code definition;
FIG. 4 is a schematic diagram of the storage structure of a difference value;
FIG. 5 is a schematic diagram of the organization structure of a row;
FIG. 6 is a flowchart of the steps for performing row-storage encoding.
Detailed Description
All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
The technical problems, design ideas, working principles, advantageous effects and working processes mainly solved by the present invention are further described in detail below with reference to fig. 2 to 6.
As shown in fig. 2 to fig. 6, the invention provides a data compression coding method and device supporting row storage, and a time series database, which solve at least the following technical problems:
firstly, supporting row storage, to serve the many practical scenarios in which data must be processed row by row;
secondly, improving on the performance of column storage when many columns are processed;
and thirdly, encoding immediately as data is written row by row, which column-storage encoding cannot do, so that data is compressed while it is inserted.
The main design idea of the invention is as follows: the method is mainly oriented to numerical time series data and provides a data compression coding method supporting row storage. Numerical time series data comprises ordinary integers and decimal numbers. The compression coding comprises three layers: numerical data coding, numerical difference data coding and row-storage coding. Row-storage coding can be understood as the final output of the whole encoded data, while numerical data coding and numerical difference data coding are the basic representations it builds on. The entire encoding process is fully lossless.
The main technical means of space compression provided by the invention are as follows: the numerical data coding can meet the storage requirements of all numerical time series data; the numerical difference data coding reduces the space needed to express an individual value; and the row-storage coding saves space on a large scale and manages the various data representations. Together these reduce the data size, improve storage space utilization and increase storage capacity. Because the data is compressed, cache utilization also improves, and a simple decoding algorithm is possible, which greatly improves the overall performance of the database. In implementation, the method specifically comprises the following:
A. Numerical data encoding
The design idea is as follows: the numerical data representation designed by the invention consists of a true number, a decimal place count and sign information. The true number is the actual value written without its decimal point; for example, 123.4567890369 is represented as 1234567890369. In practice, for ease of understanding, every 9 decimal digits can be expressed as a 30-bit binary value. So that invalid zeros at the end are compressed as far as possible, the first data segment is 567890369 and the second data segment is 1234. At most 4 data segments are supported, so at most 36 significant decimal digits can be expressed. The decimal place count gives the position of the decimal point, counted from right to left in the true number; that is, multiplying the true number by 10 raised to the negative of this count yields the represented value. Here the decimal place count is 10, and the original data is 1234567890369 × 10^(-10) = 123.4567890369. Note that a count of 0 represents an integer, a positive count represents a decimal fraction, and a negative count represents an exponent.
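The representation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names are hypothetical, the standard decimal module is used only to split a value into sign, digits and power of ten, and the binary layouts of FIG. 2 and FIG. 3 are not modelled.

```python
from decimal import Decimal

def to_numeric(text):
    """Split a decimal string into (sign, true_number, decimal_places)."""
    sign, digits, exponent = Decimal(text).as_tuple()
    true_number = int("".join(str(d) for d in digits))
    # Decimal reports the power of ten; the decimal place count is its negation.
    return sign, true_number, -exponent

def to_segments(true_number, digits_per_segment=9):
    """Split the true number into 9-digit segments, lowest-order segment first."""
    base = 10 ** digits_per_segment
    segments = []
    while True:
        true_number, low = divmod(true_number, base)
        segments.append(low)
        if true_number == 0:
            return segments

def from_numeric(sign, true_number, decimal_places):
    """Rebuild the exact decimal value: true_number * 10 ** (-decimal_places)."""
    digits = tuple(int(ch) for ch in str(true_number))
    return Decimal((sign, digits, -decimal_places))

print(to_numeric("123.4567890369"))   # (0, 1234567890369, 10)
print(to_segments(1234567890369))     # [567890369, 1234]
print(to_numeric("-1.23"))            # (1, 123, 2)
```

Each segment is below 10^9 and therefore fits in 30 bits, which matches the 9-digits-per-30-bits grouping in the text. A value such as "1E+3" yields a decimal place count of -3, the "negative count represents an exponent" case.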
In practical implementation, the numerical data coding based on the above design idea is divided into two categories: simple numerical coding and complex numerical coding. Both are variable-length, so the true number part keeps only the significant digits and invalid zeros are compressed. For example, -1.23 can be expressed as true number: 123, decimal place count: 2, sign: negative, and the true number is stored in just one byte.
1. Simple numerical code definition: as shown in FIG. 2, the effective data length of the simple numerical code is 1 to 8 bytes. The decimal place count occupies 3 + 1 = 4 bits and can express 0 to 15. The true number has at most 7 × 8 - 1 = 55 significant bits and can express 17-digit decimal data. This meets the storage requirements of common numerical time series data.
2. Complex numerical code definition: as shown in FIG. 3, the effective data length of the complex numerical code is 1 to 17 bytes. The decimal place count is a signed 8-bit value and can express -127 to 127. The true number has at most 15 × 8 = 120 significant bits and can express 36-digit decimal data. This meets the storage requirements of all numerical time series data.
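The capacity figures quoted for the two code types can be checked with ordinary integer arithmetic. The helper names below are hypothetical and the snippet is only a sanity check of the bit counts stated in the text, not part of the encoding itself.

```python
def max_digits(bits):
    """Number of decimal digits in the largest value that fits in `bits` bits."""
    return len(str(2 ** bits - 1))

def guaranteed_digits(bits):
    """Largest d such that every d-digit decimal integer fits in `bits` bits."""
    d = 0
    while 10 ** (d + 1) - 1 <= 2 ** bits - 1:
        d += 1
    return d

for bits in (30, 55, 120):
    print(bits, max_digits(bits), guaranteed_digits(bits))
```

By this check, 30 bits hold every 9-digit segment, 55 bits reach 17-digit values (every 16-digit value fits), and 120 bits hold every 36-digit value.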
B. Numerical difference data encoding
The design idea is as follows: when the data to be stored are directly related, such as successive values in the same column, storing the difference between two values gives a higher compression ratio, because the difference is typically much smaller than the actual value. The difference designed by the invention is stored in variable-length form. It is calculated only when the two values have the same decimal place count, and is obtained by an XOR operation on their true numbers; in the specific implementation the result is smaller than the range of a 7-byte integer. To recover the value when decoding, it suffices to XOR the reference data with the difference and correct the sign. The storage structure is shown in fig. 4: zero 1s means 1 byte, one to three 1s mean 2 to 4 bytes, four 1s mean 6 bytes and five 1s mean 8 bytes.
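The XOR difference and its inverse can be sketched as follows. This is an illustration under stated assumptions: values are (true number, decimal place count) pairs with the sign handled separately, the function names are hypothetical, and byte_length counts only the payload bytes, not the length marker of fig. 4.

```python
def xor_difference(reference, current):
    """Return the XOR of the two true numbers, or None if decimal places differ."""
    ref_true, ref_places = reference
    cur_true, cur_places = current
    if ref_places != cur_places:
        return None  # no difference can be calculated; store the original value
    return ref_true ^ cur_true

def restore(reference, diff):
    """Recover the current value from the reference and the stored difference."""
    ref_true, ref_places = reference
    return (ref_true ^ diff, ref_places)

def byte_length(value):
    """Bytes needed for a non-negative integer payload (at least one byte)."""
    return max(1, (value.bit_length() + 7) // 8)

previous = (1234, 2)   # 12.34
current = (1236, 2)    # 12.36
diff = xor_difference(previous, current)
print(diff, byte_length(diff), byte_length(current[0]))  # 6 1 2
print(restore(previous, diff))                           # (1236, 2)
```

Here the difference fits in one byte while the true number itself needs two, which is the saving the paragraph describes. XOR is its own inverse, so decoding needs no subtraction or carry handling.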
C. Row-storage encoding
The design idea is as follows: the database stores data with both speed and compression efficiency in mind, so embodiments of the invention encode each data block independently. The first row within a data block uses intra-row reference coding: the first column can only store the original value; from the second column onward, if the decimal place counts are the same, a difference can be calculated by XOR, and the difference is stored if it takes little storage space; otherwise the original value is stored directly. For subsequent rows, starting from the second row, the difference between each column and the corresponding column of the previous row is calculated by XOR; if the difference can be calculated and takes little storage space, the difference is stored; otherwise the original value is stored directly. Because the design anticipates a very large number of columns, the organization structure of a row is as shown in fig. 5.
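The choice of reference and the decision between storing a difference or an original value can be sketched as below. The names, the tuple format of the output and the diff_limit_bytes threshold (standing in for the "set value") are assumptions made for illustration; the null and identical states are left out of this sketch.

```python
def byte_length(value):
    """Bytes needed for a non-negative integer payload (at least one byte)."""
    return max(1, (value.bit_length() + 7) // 8)

def encode_cell(value, reference, diff_limit_bytes):
    """value and reference are (true_number, decimal_places); reference may be None."""
    if reference is not None and reference[1] == value[1]:
        diff = reference[0] ^ value[0]
        if byte_length(diff) < diff_limit_bytes:
            return ("difference", diff)
    return ("original", value)

def encode_block(rows, diff_limit_bytes=2):
    """Encode a block independently: row 0 uses the previous column, later rows the previous row."""
    encoded = []
    for r, row in enumerate(rows):
        out = []
        for c, value in enumerate(row):
            if r == 0:
                reference = row[c - 1] if c > 0 else None
            else:
                reference = rows[r - 1][c]
            out.append(encode_cell(value, reference, diff_limit_bytes))
        encoded.append(out)
    return encoded

block = [
    [(1234, 2), (1236, 2), (5, 0)],
    [(1235, 2), (1236, 2), (70000, 0)],
]
for line in encode_block(block):
    print(line)
```

In this example the first cell and the cells whose reference has a different decimal place count, or whose difference is too large, are written as original values; the others are written as small XOR differences.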
In row encoding, the data type describes the category of the data, such as a first-row data or second-row data identifier. The field description information describes how each field is stored, as one of four states: original value, difference value, null value, or identical to the previous value; two bits are therefore enough for each field. Because data is most conveniently handled byte by byte, every 4 fields share one field description, which uses all 8 bits of a byte. Note that a field whose state is null or identical to the previous value occupies no additional space. When all four fields are identical to the previous values, the field description itself can be omitted, and only a single 0 bit on the field bitmap is needed to mark this; one field bitmap byte can thus manage 8 field descriptions, i.e. 32 fields. In the special case where none of the 32 fields has changed, storing a 0 on the field bitmap is sufficient, which is an extreme level of compression. This encoding algorithm is therefore well suited to data with a large number of columns.
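The two-level header described above (a field bitmap over field descriptions) can be sketched as follows. The concrete 2-bit codes, the bit order within a byte and the padding of a partial group with the identical state are all assumptions for illustration; the source does not specify them.

```python
# Hypothetical 2-bit state codes; the source does not give the actual values.
ORIGINAL, DIFFERENCE, NULL, SAME = 0, 1, 2, 3

def pack_description(states):
    """Pack four 2-bit field states into one byte, first field in the low bits."""
    byte = 0
    for i, state in enumerate(states):
        byte |= state << (2 * i)
    return byte

def build_header(states):
    """Emit, per 32 fields, one bitmap byte followed by the descriptions it flags."""
    padded = list(states) + [SAME] * (-len(states) % 32)
    out = bytearray()
    for start in range(0, len(padded), 32):
        bitmap = 0
        descriptions = bytearray()
        for g in range(8):
            group = padded[start + 4 * g : start + 4 * g + 4]
            if any(state != SAME for state in group):
                bitmap |= 1 << g          # 1: a description byte follows
                descriptions.append(pack_description(group))
            # 0: all four fields identical to the previous row, description omitted
        out.append(bitmap)
        out += descriptions
    return bytes(out)

print(build_header([SAME] * 32).hex())                               # 00
print(build_header([DIFFERENCE, SAME, SAME, NULL] + [SAME] * 28).hex())  # 01bd
```

With 32 unchanged fields the whole row header is a single zero byte, which is the extreme case the text mentions.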
During row-storage encoding only the previous row of data needs to be referenced, so the database can encode each row as it is written, which makes the scheme well suited to a time series database.
D. Data decoding
When a database retrieval request occurs, the data block to be read can be located according to upper-layer data. Decoding then proceeds row by row: according to the field bitmap and the field description information, the subsequent data is read by type and a simple operation yields the actual value. The recovery rules are as follows: the original-value state means the original data is recovered directly according to the coding rule; the difference-value state means the reference data is XORed with the difference and the sign is corrected; the identical state means the reference data is used directly; and the null state means the data is null. The whole decoding process is fast and exact.
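The four recovery rules can be sketched as a small decoder. This assumes a hypothetical in-memory form in which each cell is already a (state, payload) pair, with values as (true number, decimal place count); parsing the field bitmap and descriptions, and sign correction, are omitted.

```python
def decode_cell(state, payload, reference):
    """Apply the recovery rule for one field."""
    if state == "original":
        return payload                      # stored value is used as is
    if state == "difference":
        ref_true, ref_places = reference
        return (ref_true ^ payload, ref_places)
    if state == "same":
        return reference                    # identical to the reference value
    if state == "null":
        return None
    raise ValueError(f"unknown state: {state}")

def decode_block(encoded_rows):
    """Decode row by row; row 0 references the previous column, later rows the previous row."""
    rows = []
    for r, encoded in enumerate(encoded_rows):
        row = []
        for c, (state, payload) in enumerate(encoded):
            if r == 0:
                reference = row[c - 1] if c > 0 else None
            else:
                reference = rows[r - 1][c]
            row.append(decode_cell(state, payload, reference))
        rows.append(row)
    return rows

encoded = [
    [("original", (1234, 2)), ("difference", 6), ("original", (5, 0))],
    [("difference", 1), ("same", None), ("null", None)],
]
print(decode_block(encoded))
```

Because each row depends only on the row before it (or on earlier columns within the first row), decoding is a single forward pass over the block.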
As shown in fig. 6, a row of data is input and it is determined whether the row is the first row of a block. If it is, differences are calculated using the previous column as the reference; if not, differences are calculated using the corresponding column of the previous row as the reference. Row encoding then starts and enters a column-by-column encoding loop, in which the write mode (write the original value, write the difference, or no operation) is determined from the data state. When the loop ends, encoding is complete.
The beneficial effects of the invention in specific applications are verified as follows:
1. In the data storage mode of the Gorilla paper, "Gorilla: A Fast, Scalable, In-Memory Time Series Database", the stored data is the original double value, occupying 64 bits, i.e. 8 bytes. The data stored by the invention is variable in length, from 1 to 17 bytes, and expresses values more accurately.
2. The difference coding of the invention differs from that of the Gorilla paper, which operates on the primitive floating-point type; the invention instead uses its own numerical representation. The specific differences are shown in Table 1 (decimal place count and sign omitted):
TABLE 1
[Table 1 is provided as images in the original publication: Figure BDA0003430708630000091, Figure BDA0003430708630000101]
As can be seen from Table 1, the invention expresses specific values exactly and occupies relatively little storage. Because the XOR operation is applied to the true numbers, the difference between data values is reflected more stably than with float and double.
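The point about stability can be illustrated by comparing the XOR of two IEEE 754 double bit patterns with the XOR of the corresponding true numbers. The values 12.3 and 12.4 are chosen for illustration and are not taken from Table 1; the helper names are hypothetical.

```python
import struct

def double_bits(x):
    """Return the 64-bit IEEE 754 pattern of a Python float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_span(a, b):
    """Number of bits between the first and last set bit of a XOR b (0 if equal)."""
    x = a ^ b
    if x == 0:
        return 0
    trailing_zeros = (x & -x).bit_length() - 1
    return x.bit_length() - trailing_zeros

# Floating-point XOR: 12.3 and 12.4 differ in many mantissa bits.
print(xor_span(double_bits(12.3), double_bits(12.4)))
# True-number XOR: 123 ^ 124 == 7, which spans only three bits.
print(xor_span(123, 124))
```

Neither 12.3 nor 12.4 is exactly representable in binary floating point, so their bit patterns differ across much of the mantissa, whereas the decimal true numbers differ only in their low bits.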
3. Compared with the scheme proposed in the Gorilla paper, the row-storage encoding has stricter management and organization information. This information does more than give each piece of data a header: it further improves the compression ratio, so that data is not simply piled together.
Example 1: a data compression coding method supporting row storage comprises the following steps:
S1, defining a representation for numerical data, and defining data blocks according to the defined representation;
S2, after a row of data is input, determining whether the row is the first row of a data block; if so, calculating difference values using the previous column as the reference, based on the defined representation; if not, calculating difference values using the corresponding column of the previous row as the reference, based on the defined representation; then starting row encoding, entering a column-by-column encoding loop, determining the write mode according to the data state, and writing in a loop.
Example 2: on the basis of embodiment 1, the representation of the numerical data specifically includes a true number and a decimal place count. The true number is the value written without its decimal point, converted into binary. The decimal place count gives the position of the decimal point, counted from right to left in the true number; that is, the true number is multiplied by 10 raised to the negative of this count.
Example 3: on the basis of embodiment 1, a difference value is calculated only when the two values have the same decimal place count, and is obtained by an XOR operation on their true numbers.
Example 4: on the basis of embodiment 1, the difference value is stored in variable-length form.
Example 5: on the basis of embodiment 1, the write mode comprises writing an original value, writing a difference value, and performing no operation.
Example 6: on the basis of embodiment 1, the data blocks are encoded independently.
Example 7: on the basis of embodiment 1, starting row encoding and entering the column-by-column encoding loop comprises the following sub-steps. The first row within a data block uses intra-row reference coding: the first column stores only the original value; from the second column onward, if the decimal place counts are the same, a difference is calculated by XOR, and the difference is stored if its storage space is smaller than a set value; otherwise the original value is stored directly. For subsequent rows, starting from the second row, the difference between each column and the corresponding column of the previous row is calculated by XOR; if the difference can be calculated and its storage space is smaller than a set value, the difference is stored; otherwise the original value is stored directly.
Example 8: on the basis of embodiment 1, in the row encoding process, the data type describes the category of the data, and the field description information describes how the data is stored, as one of four states: original value, difference value, null value, or identical to the previous value.
Example 9: on the basis of embodiment 8, in the row encoding process, every 4 fields share one field description. When the state is null or identical to the previous value, no additional space is occupied. When all four fields are identical to the previous values, the field description is omitted and a single 0 bit on the field bitmap marks this. One field bitmap byte manages 8 field descriptions, i.e. 32 fields; when none of the 32 fields has changed, only a 0 is stored on the field bitmap.
Example 10: on the basis of embodiment 8, the category of the data comprises an identifier for first-row data or second-row data.
Example 11: on the basis of embodiment 1, in the row-storage encoding process only the previous row of data needs to be referenced, so the database encodes each row as it is written.
Example 12: on the basis of the embodiment 1, the method comprises a decoding step;
and S3, when the database retrieval requirement occurs, positioning the data block to be read according to the upper layer data, and then decoding line by line.
Example 13: on the basis of embodiment 12, said performing progressive decoding comprises the sub-steps of: and reading subsequent data according to types and performing operation to obtain an actual value according to the field bitmap and the field description information in the line memory organization structure.
Example 14: on the basis of the embodiment 12, the method includes a data recovery step, if the original value represents that the original data can be recovered directly according to the coding rule, if the difference value represents that the xor operation is required to be performed on the reference data and the difference value and the sign is corrected, if the difference value represents that the difference value is consistent with the reference data, the reference data is directly used, and if the difference value represents that the difference value is null, the data is null.
Example 15: on the basis of embodiment 2, the states of the decimal place are specifically: a positive number indicates that the value has a fractional part, zero indicates that the value is an integer, and a negative number indicates that the value is expressed with an exponent.
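A minimal sketch of the (true number, decimal place, sign) representation, using Python's standard `decimal` module. The function names `to_parts` and `from_parts` are hypothetical, and finite decimal input is assumed.

```python
from decimal import Decimal

def to_parts(text):
    """Split a decimal string into (true_number, decimal_place, negative)."""
    sign, digits, exponent = Decimal(text).as_tuple()
    true_number = int("".join(map(str, digits)))
    # Decimal's exponent is the power of ten applied to the digits, so the
    # decimal place in the sense used here is its negation:
    # positive = fractional digits, zero = integer, negative = exponent form.
    return true_number, -exponent, bool(sign)

def from_parts(true_number, decimal_place, negative):
    """Rebuild the value as true_number * 10 ** (-decimal_place), then sign it."""
    value = Decimal(true_number).scaleb(-decimal_place)
    return -value if negative else value
```

For instance, `to_parts("12.34")` gives `(1234, 2, False)`, `to_parts("-5")` gives `(5, 0, True)`, and `to_parts("3E+3")` gives `(3, -3, False)`, matching the positive, zero, and negative cases above.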
Example 16: a data compression coding device supporting line memory comprises a numerical data coding unit, a numerical difference data coding unit and a line memory coding unit. The numerical data coding unit defines a numerical data expression comprising a true number, a decimal place and sign information; the true number is the numerical value with its decimal point removed, converted into a binary representation; the decimal place gives the position of the decimal point counted from right to left in the true number, i.e. the true number multiplied by 10 raised to the negative of the decimal place equals the expressed value. The numerical difference data coding unit encodes differences and stores them in a variable-length format; a difference is calculated only when the decimal places of the two values are identical, by performing an exclusive-or operation on their true numbers. The line memory coding unit stores data in a database and encodes each data block independently. Intra-row reference coding is adopted for the first row within the data block: the first column stores only an original value; for each subsequent column, if its decimal place is the same as that of the previous column, a difference value is calculated by XOR, and the difference value is stored if its storage space is smaller than a set value; otherwise the original value is stored directly. For subsequent rows starting from the second row, the difference value between each column and the corresponding column of the previous row is calculated by XOR; if the difference value can be calculated and its storage space is smaller than a set value, the difference value is stored; otherwise the original value is stored directly.
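The block-level encoding rule can be sketched as below. This is an illustration under stated assumptions rather than the device itself: fields are modeled as `(true_number, decimal_place, negative)` triples or `None`, the "set value" is a hypothetical `max_diff_bytes` threshold, the difference size is approximated by its byte length, and the sign is carried with the difference.

```python
def encode_field(field, reference, max_diff_bytes):
    """Choose the storage state for one field given its reference field."""
    if field is None:
        return ("null", None)
    if reference is not None:
        if field == reference:
            return ("same", None)
        true_number, place, negative = field
        ref_true, ref_place, _ = reference
        if place == ref_place:                      # XOR only for equal decimal places
            diff = true_number ^ ref_true
            diff_bytes = (diff.bit_length() + 7) // 8
            if diff_bytes < max_diff_bytes:         # "smaller than a set value"
                return ("difference", (diff, negative))
    return ("original", field)

def encode_block(rows, max_diff_bytes=2):
    """Encode one data block; only the previous row is kept as state."""
    encoded = []
    previous_row = None
    for row in rows:
        out = []
        for col, field in enumerate(row):
            if previous_row is None:
                # First row: intra-row reference to the previous column.
                reference = row[col - 1] if col > 0 else None
            else:
                # Later rows: same column of the previous row.
                reference = previous_row[col]
            out.append(encode_field(field, reference, max_diff_bytes))
        encoded.append(out)
        previous_row = row
    return encoded
```

Because each block starts with `previous_row = None`, blocks can be encoded and decoded independently of one another, as the text requires.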
Example 17: on the basis of embodiment 16, the device comprises a data decoding unit and a data recovery unit. The data decoding unit locates the data block to be read according to the upper-layer data when a database retrieval request occurs, and then decodes row by row, i.e. according to the field bitmap and the field description information it reads the subsequent data by type and performs the corresponding operation to obtain the actual value. The data recovery unit executes the following process: if the state indicates an original value, the original data is recovered directly according to the coding rule; if the state indicates a difference value, an XOR operation is performed on the reference data and the difference value, and the sign is corrected; if the state indicates consistency with the reference data, the reference data is used directly; and if the state indicates null, the data is null.
Example 18: a time series database which, on the basis of any one of embodiments 1 to 15, comprises a readable storage medium and a program, the method described above being executed when the program stored on the readable storage medium runs; or which, on the basis of embodiment 16 or 17, comprises the data compression coding device supporting line memory.
If the functions of the present invention are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product that is stored in a storage medium and that causes a computer device (which may be a personal computer, a server, or a network device), together with corresponding software, to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), or an optical disk, whether used for testing or for actual data in a program implementation.

Claims (18)

1. A data compression coding method supporting line memory, characterized by comprising the following steps:
S1, defining the expression of numerical data, and defining data blocks according to the defined expression;
S2, after a row of data is input, judging whether the row is the first row of the data block; if so, calculating the difference value with reference to the previous column, based on the defined expression of numerical data; if not, calculating the difference value with reference to the corresponding column of the previous row, based on the defined expression of numerical data; then starting row encoding, entering column-by-column cyclic encoding, judging the writing mode according to the data state, and performing cyclic writing.
2. The method according to claim 1, wherein the expression of the numerical data specifically includes a true number and a decimal place; the true number is the actual value with its decimal point removed, converted into a binary representation; the decimal place gives the position of the decimal point counted from right to left in the true number, i.e. it is expressed as a negative power of 10.
3. The method according to claim 1, wherein the difference value is calculated only when the decimal places of the two values are identical, and is calculated by performing an exclusive-or operation on the true numbers of the two values.
4. The method according to claim 1, wherein the difference value is stored in a variable-length format.
5. The method according to claim 1, wherein the writing manner comprises writing an original value, writing a difference value, and performing a null operation.
6. The method according to claim 1, wherein the data blocks are independently encoded.
7. The method of claim 1, wherein the starting row encoding and entering into column-wise cyclic encoding comprises the sub-steps of:
intra-row reference coding is adopted for the first row within the data block: the first column stores only an original value; for each subsequent column, if its decimal place is the same as that of the previous column, a difference value is calculated by XOR, and the difference value is stored if its storage space is smaller than a set value; otherwise the original value is stored directly;
for subsequent rows starting from the second row, the difference value between each column and the corresponding column of the previous row is calculated by XOR; if the difference value can be calculated and its storage space is smaller than a set value, the difference value is stored; otherwise the original value is stored directly.
8. The data compression coding method supporting line memory according to claim 1, wherein in the row encoding process, the data type describes the category of the data, and the field description information describes how the data is stored, in one of four states: original value, difference value, null value, and completely consistent with the previous one.
9. The data compression coding method supporting line memory according to claim 8, wherein in the row encoding process, data is organized so that every 4 fields share one piece of field description information; a field whose state is null or completely consistent with the previous one occupies no additional space; when all four fields are completely consistent with the previous ones, the field description information is omitted and a single bit set to 0 identifies this in the field bitmap; one field bitmap byte manages 8 pieces of field description information, i.e. 32 fields, and when none of the 32 fields has changed, a 0 is stored in the field bitmap.
10. The data compression coding method supporting line memory according to claim 8, wherein the category of the data includes an identifier indicating first-row data or second-row data.
11. The data compression coding method supporting line memory according to claim 1, wherein the line memory encoding process only needs to refer to the previous row of data, and in the database write path each row is encoded as it is written.
12. The data compression coding method supporting line memory according to claim 1, comprising a decoding step:
S3, when a database retrieval request occurs, locating the data block to be read according to the upper-layer data, and then decoding row by row.
13. The data compression coding method supporting line memory according to claim 12, wherein the row-by-row decoding comprises the sub-step of: according to the field bitmap and the field description information in the line memory organization structure, reading the subsequent data by type and performing the corresponding operation to obtain the actual value.
14. The method of claim 12, comprising a data recovery step, wherein if the state indicates an original value, the original data is recovered directly according to the coding rule; if the state indicates a difference value, an XOR operation is performed on the reference data and the difference value, and the sign is corrected; if the state indicates consistency with the reference data, the reference data is used directly; and if the state indicates null, the data is null.
15. The method according to claim 2, wherein the states of the decimal place are specifically: a positive number indicates that the value has a fractional part, zero indicates that the value is an integer, and a negative number indicates that the value is expressed with an exponent.
16. A data compression coding device supporting line memory, characterized by comprising a numerical data coding unit, a numerical difference data coding unit and a line memory coding unit;
the numerical data coding unit is used for defining a numerical data expression comprising a true number, a decimal place and sign information; the true number is the numerical value with its decimal point removed, converted into a binary representation; the decimal place gives the position of the decimal point counted from right to left in the true number, i.e. the true number multiplied by 10 raised to the negative of the decimal place equals the expressed value;
the numerical difference data coding unit is used for encoding differences and storing them in a variable-length format; a difference is calculated only when the decimal places of the two values are identical, by performing an exclusive-or operation on the true numbers of the two values;
the line memory coding unit is used for storing data in a database and encodes each data block independently; intra-row reference coding is adopted for the first row within the data block: the first column stores only an original value; for each subsequent column, if its decimal place is the same as that of the previous column, a difference value is calculated by XOR, and the difference value is stored if its storage space is smaller than a set value; otherwise the original value is stored directly; for subsequent rows starting from the second row, the difference value between each column and the corresponding column of the previous row is calculated by XOR; if the difference value can be calculated and its storage space is smaller than a set value, the difference value is stored; otherwise the original value is stored directly.
17. The data compression coding device supporting line memory according to claim 16, comprising a data decoding unit and a data recovery unit;
the data decoding unit is used for locating the data block to be read according to the upper-layer data when a database retrieval request occurs, and then decoding row by row, i.e. according to the field bitmap and the field description information, reading the subsequent data by type and performing the corresponding operation to obtain the actual value;
the data recovery unit is configured to execute the following process: if the state indicates an original value, the original data is recovered directly according to the coding rule; if the state indicates a difference value, an XOR operation is performed on the reference data and the difference value, and the sign is corrected; if the state indicates consistency with the reference data, the reference data is used directly; and if the state indicates null, the data is null.
18. A time series database, comprising a readable storage medium and a program, wherein the method according to any one of claims 1 to 15 is performed when the program stored on the readable storage medium runs; or comprising the data compression coding device supporting line memory according to any one of claims 16 to 17.
CN202111597104.1A 2021-12-24 2021-12-24 Data compression coding method, device and time sequence database supporting line memory Active CN114268323B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111597104.1A CN114268323B (en) 2021-12-24 2021-12-24 Data compression coding method, device and time sequence database supporting line memory

Publications (2)

Publication Number | Publication Date
CN114268323A (en) | 2022-04-01
CN114268323B (en) | 2023-07-07

Family

ID=80829572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111597104.1A Active CN114268323B (en) 2021-12-24 2021-12-24 Data compression coding method, device and time sequence database supporting line memory

Country Status (1)

Country Link
CN (1) CN114268323B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115955250A (en) * 2023-03-14 2023-04-11 燕山大学 College scientific research data acquisition management system
CN116095184A (en) * 2023-03-07 2023-05-09 成都索贝视频云计算有限公司 Structured network data transmission coding method, device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10210273A (en) * 1997-01-24 1998-08-07 Dainippon Screen Mfg Co Ltd Comparison method of compressed image data and its device
CN1808414A (en) * 2004-12-06 2006-07-26 索尼株式会社 Method and apparatus for learning data, method and apparatus for recognizing data, method and apparatus for generating data and computer program
US20080071818A1 (en) * 2006-09-18 2008-03-20 Infobright Inc. Method and system for data compression in a relational database
CN109636865A (en) * 2017-10-06 2019-04-16 想象技术有限公司 Data compression
CN109952708A (en) * 2016-12-12 2019-06-28 德州仪器公司 Lossless data compression
CN112905125A (en) * 2021-03-04 2021-06-04 中电普信(北京)科技发展有限公司 Data storage and reading method based on high-precision calculation of computer
US20210305998A1 (en) * 2020-03-31 2021-09-30 Yokogawa Electric Corporation Data management system, data management method, and storage medium with data management program stored thereon

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
骆金维;曾德生;郭雅;黄富平;: "时序数据并行压缩速率改进技术研究", 电子设计工程, no. 20, pages 104 - 107 *
黄缙华;周伊琳;: "基于EMS时间序列数据的实时全息无损压缩方法的研究与应用", 广东电力, no. 09, pages 89 - 93 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116095184A (en) * 2023-03-07 2023-05-09 成都索贝视频云计算有限公司 Structured network data transmission coding method, device and medium
CN116095184B (en) * 2023-03-07 2023-07-07 成都索贝视频云计算有限公司 Structured network data transmission coding method, device and medium
CN115955250A (en) * 2023-03-14 2023-04-11 燕山大学 College scientific research data acquisition management system
CN115955250B (en) * 2023-03-14 2023-05-12 燕山大学 College scientific research data acquisition management system

Also Published As

Publication number Publication date
CN114268323B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN114268323B (en) Data compression coding method, device and time sequence database supporting line memory
CN103995887A (en) Bitmap index compressing method and bitmap index decompressing method
CN112953550A (en) Data compression method, electronic device and storage medium
CN116303374A (en) Multi-dimensional report data optimization compression method based on SQL database
CN116016606B (en) Sewage treatment operation and maintenance data efficient management system based on intelligent cloud
CN116594572B (en) Floating point number stream data compression method, device, computer equipment and medium
CN110032470B (en) Method for constructing heterogeneous partial repeat codes based on Huffman tree
CN115408350A (en) Log compression method, log recovery method, log compression device, log recovery device, computer equipment and storage medium
CN112434085B (en) Roaring Bitmap-based user data statistical method
CN114640354A (en) Data compression method and device, electronic equipment and computer readable storage medium
CN113078908B (en) Simple encoding and decoding method suitable for time sequence database
CN110019184B (en) Method for compressing and decompressing ordered integer array
CN112052228A (en) Binary coding method based on mutual mapping between standard Euclidean space and plane space projection
CN108062289B (en) Fast Fourier Transform (FFT) address order changing method, signal processing method and device
CN109698703B (en) Gene sequencing data decompression method, system and computer readable medium
CN112000509B (en) Erasure code encoding method, system and device based on vector instruction
CN108259515A (en) A kind of lossless source compression method suitable for transmission link under Bandwidth-Constrained
CN113705784A (en) Neural network weight coding method based on matrix sharing and hardware system
CN109255090B (en) Index data compression method of web graph
CN110362580B (en) BIM (building information modeling) construction engineering data retrieval optimization classification method and system thereof
CN111263155B (en) Compression method and system for equal-resolution CR image
CN113919289A (en) Coding method of bit coin wallet address character string and address numbering table generating method
CN110782003A (en) Neural network compression method and system based on Hash learning
CN110046159B (en) Bank account storage method and device, computer equipment and storage medium
CN113900622B (en) FPGA-based data information rapid sorting method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant