CN113078908A - Simple encoding and decoding method suitable for time sequence database - Google Patents


Info

Publication number: CN113078908A
Authority: CN (China)
Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Application number: CN202110259307.3A
Other languages: Chinese (zh)
Other versions: CN113078908B (en)
Inventor: 黄励博
Current assignee: Hangzhou Upyun Technology Co ltd
Original assignee: Hangzhou Upyun Technology Co ltd
Application filed by Hangzhou Upyun Technology Co ltd
Priority to CN202110259307.3A
Publication of CN113078908A
Application granted
Publication of CN113078908B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/40: Conversion to or from variable length codes, e.g. Shannon-Fano code, Huffman code, Morse code
    • H03M13/00: Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a simple encoding and decoding method suitable for a time series database, comprising the following steps: 1) identify each value of the time series and determine its data type; floating-point values proceed to step 2) and integer values to step 4); 2) convert the floating-point value to an integer; 3) compress floating-point values; 4) if the value is a timestamp, apply timestamp compression, and otherwise apply integer compression. The method is intended for storing and accessing time-series monitoring data in a specific scenario. Where floating-point and integer values are used together, it improves floating-point compression by converting floating-point values to integers. In its intended use scenario, it compresses better than general-purpose lossless compression algorithms. The algorithm is simple and cheap to process, and handles large volumes of input data more easily.

Description

Simple encoding and decoding method suitable for time sequence database
Technical Field
The invention relates to the field of encoding and decoding for time series databases, and in particular to a simple encoding and decoding method suitable for a time series database.
Background
A time series database is a type of database designed to store data that changes over time, such as sensor data and machine monitoring data. It suits scenarios such as the Internet of Things, the industrial internet, and operations monitoring.
Here, the time series database stores data center monitoring metrics. Each data point combines a timestamp, a metric, and a value; data are written in time order, and flexible aggregation queries are supported. Metric data are written vertically and read horizontally: many metrics are collected in time order and written to the database, while a read fetches one or several metrics over a specified time range.
Because of their large data volumes, time series databases usually compress the data they store. Compression reduces redundancy and saves storage space, but the data must still be easy to retrieve. Different data formats therefore call for different compression schemes, and both compression complexity and query speed must be taken into account.
Disclosure of Invention
The object of the invention is to provide a simple encoding and decoding method suitable for a time series database, for storing and accessing time-series monitoring data in a specific scenario.
A simple encoding and decoding method suitable for a time series database comprises the following steps:
1) Identify each value of the time series and determine its data type. If the value is a floating-point number, go to step 2); if it is an integer, go to step 4).
2) Integer conversion, which specifically comprises:
2a) Multiply the floating-point number in turn by each multiplicand in [1, 10, 100, 1000, 10000, 100000, 1000000] to obtain a first result O. If the fractional part of O is 0, the conversion succeeds: return the multiplicand and the integer part I, completing the integer conversion.
2b) If the fractional part of the first result is not 0, convert the first result to an integer according to the standard.
2c) If the previous floating-point number corresponding to the integer obtained in step 2b) (that is, the representable value immediately below O) equals the integer part I of O, the conversion succeeds: return the multiplicand and I, completing the integer conversion.
2d) If the next floating-point number corresponding to the integer obtained in step 2b) (the representable value immediately above O) equals I + 1, the conversion succeeds: return the multiplicand and I + 1.
2e) If the integer conversion fails, go to step 3).
2f) If the integer conversion succeeds, go to step 4).
3) Floating-point number compression (detailed below).
4) If the value of the time series is a timestamp, apply timestamp compression; if it is an integer value that is not a timestamp, apply integer compression.
In step 2b), the first result is converted into a 64-bit integer according to the IEEE 754 standard for binary floating-point arithmetic.
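The integer conversion of step 2) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the integer part I is taken as the floor of the first result (so negative values behave sensibly), and the "previous" and "next" floating-point numbers of steps 2c) and 2d) are taken to be the adjacent representable doubles, obtained here with `math.nextafter` (Python 3.9+) rather than by stepping the IEEE 754 bit pattern by hand. The function name `to_scaled_integer` is hypothetical.

```python
import math

# Multiplicands tried in order; the patent lists seven (1 through 1,000,000).
MULTIPLICANDS = (1, 10, 100, 1_000, 10_000, 100_000, 1_000_000)
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def to_scaled_integer(x: float):
    """Return (multiplicand, integer) approximating x as integer / multiplicand, or None."""
    if not math.isfinite(x):
        return None
    for b in MULTIPLICANDS:
        o = x * b                        # the "first result" O
        if not math.isfinite(o):
            return None
        i = math.floor(o)                # integer part I (floor is an assumption)
        if not INT64_MIN <= i < INT64_MAX:
            return None                  # larger multiplicands only make it larger
        if o == i:                       # step 2a: fractional part is 0
            return b, i
        if math.nextafter(o, -math.inf) == i:      # step 2c: previous double is I
            return b, i
        if math.nextafter(o, math.inf) == i + 1:   # step 2d: next double is I + 1
            return b, i + 1
    return None                          # step 2e: fall back to float compression
```

For example, `0.29 * 100` evaluates to 28.999999999999996 in binary64, whose next representable double is exactly 29.0, so the step 2d) check recovers (100, 29) where a plain fractional-part test would fail.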
In step 3), floating-point number compression specifically comprises:
3a) Convert the floating-point number into a 64-bit integer according to the standard.
3b) If the floating-point number is the first floating-point number in the time series, write an identifier indicating a floating-point number, then store the data in 64 bits.
3c) If the previous value in the time series is an integer and the current value is a floating-point number, store an identifier indicating a change, an identifier indicating no repetition, and an identifier indicating a floating-point number.
3d) If the previous value in the time series is a floating-point number equal to the current value, store an identifier indicating no change and an identifier indicating repetition.
3e) If the previous value in the time series is a floating-point number different from the current value, store an identifier indicating no change, then compute the exclusive-or (XOR) of the 64-bit representations of the current and previous numbers. Compare the leading and trailing zeros of this XOR value with those of the previous XOR value (0 if there is none). If the previous XOR value's number of leading zeros PLZ and number of trailing zeros PTZ are less than or equal to the current XOR value's number of leading zeros CLZ and number of trailing zeros CTZ, respectively, store the meaningful value in (64 - PLZ - PTZ) bits.
3f) Otherwise, that is, if PLZ is greater than CLZ or PTZ is greater than CTZ, store the meaningful value in (64 - CLZ - CTZ) bits.
In step 3a), the floating-point number is converted into a 64-bit integer according to the IEEE 754 standard for binary floating-point arithmetic.
In step 3f), storing the meaningful value in (64 - CLZ - CTZ) bits specifically comprises:
first store 6 bits (2^6 = 64) representing CLZ, then 6 bits representing 64 - CLZ - CTZ, and finally the 64 - CLZ - CTZ bits of meaningful data.
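A sketch of the XOR encoding in steps 3e) and 3f) follows, with bits written as a Python string of '0'/'1' characters for readability rather than efficiency. Several details are assumptions not fixed by the text: the control bit is '0' for "reuse PLZ/PTZ" and '1' for "explicit CLZ and length"; the window compared against is always the previous XOR value's own zero counts, starting from an XOR value of 0, i.e. (64, 64), which follows the wording above and may differ from Gorilla-style encoders that keep the last explicitly stored window; and a meaningful length of 64, which does not fit in 6 bits, is written as 0. Equal values are assumed to have been handled as repeats in step 3d). All names are hypothetical.

```python
import struct


def float_bits(x: float) -> int:
    """IEEE 754 binary64 bit pattern of x as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def leading_zeros(v: int) -> int:
    return 64 - v.bit_length()


def trailing_zeros(v: int) -> int:
    return (v & -v).bit_length() - 1 if v else 64


def encode_xor(prev: float, cur: float, window):
    """Encode cur against prev (steps 3e/3f).

    window is (PLZ, PTZ) of the previous XOR value; pass (64, 64) when there
    is none, i.e. a previous XOR value of 0. Returns (bits, new_window).
    """
    x = float_bits(prev) ^ float_bits(cur)
    if x == 0:
        raise ValueError("equal values are encoded as repeats (step 3d)")
    clz, ctz = leading_zeros(x), trailing_zeros(x)
    plz, ptz = window
    if plz <= clz and ptz <= ctz:
        # Step 3e: the meaningful bits fit inside the previous window.
        length = 64 - plz - ptz
        bits = "0" + format(x >> ptz, f"0{length}b")
    else:
        # Step 3f: 6 bits CLZ, 6 bits length (64 wraps to 0), then the bits.
        length = 64 - clz - ctz
        bits = ("1" + format(clz, "06b") + format(length % 64, "06b")
                + format(x >> ctz, f"0{length}b"))
    return bits, (clz, ctz)
```

Going from 1.0 to 1.5 flips a single mantissa bit, so the first call spends 13 header bits plus 1 data bit; going back to 1.0 then reuses the window and costs only 2 bits.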
In step 4), timestamp compression specifically comprises:
4a) Define the time unit (nanosecond, microsecond, millisecond, or second), which is the smallest time interval the time series can distinguish; the finer the unit, the more data must be stored.
4b) Store the first timestamp of the time series as a value converted to the fixed time unit.
4c) Starting from the second timestamp, compute the difference from the previous timestamp, then compute the difference between this difference and the previous one (taking 0 when there is no previous difference), and convert the result to the fixed time unit. If the unit is seconds or milliseconds, the converted value needs at most 32 bits; if it is nanoseconds or microseconds, at most 64 bits. Finally, according to the size of the converted value, select a suitable bit width from the queue [0, 7, 9, 12, 32] or [0, 7, 9, 12, 64], and distinguish the different widths by prefix bits.
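Step 4c) is a delta-of-delta scheme, sketched below. The text only says that prefix bits distinguish the widths, so this version assumes the prefix codes '0', '10', '110', '1110', '1111', stores each delta-of-delta in two's complement at the chosen width, writes the first timestamp in 64 bits, and converts each timestamp to the unit before differencing (equivalent when timestamps are multiples of the unit). Timestamps are taken as ordered, non-negative integers in nanoseconds; all names are hypothetical.

```python
# Bit widths from step 4c); the prefix codes are an assumption of this sketch.
BUCKETS_32 = (0, 7, 9, 12, 32)   # seconds or milliseconds
BUCKETS_64 = (0, 7, 9, 12, 64)   # microseconds or nanoseconds
PREFIXES = ("0", "10", "110", "1110", "1111")


def encode_dod(dod: int, buckets) -> str:
    """Prefix plus two's-complement payload in the narrowest width that fits."""
    for prefix, width in zip(PREFIXES, buckets):
        if width == 0:
            if dod == 0:
                return prefix
        elif -(1 << (width - 1)) <= dod < (1 << (width - 1)):
            return prefix + format(dod & ((1 << width) - 1), f"0{width}b")
    raise OverflowError(f"delta-of-delta {dod} exceeds the largest width")


def encode_timestamps(ts_ns, unit_ns: int, buckets) -> str:
    """ts_ns: ordered, non-negative timestamps in ns; unit_ns: the fixed unit in ns."""
    values = [t // unit_ns for t in ts_ns]           # step 4b: convert to the unit
    out = [format(values[0], "064b")]                # first timestamp in 64 bits
    prev_delta = 0                                   # "0 if there is none"
    for prev, cur in zip(values, values[1:]):
        delta = cur - prev
        out.append(encode_dod(delta - prev_delta, buckets))
        prev_delta = delta
    return "".join(out)
```

With a regular collection interval most delta-of-delta values are 0, so most timestamps after the first cost a single bit.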
Integer compression specifically comprises:
4l) If the current integer is the first value of the time series, write an identifier indicating an integer, then store an identifier marking the multiplicand and an identifier distinguishing positive from negative numbers.
Then determine the number of leading zeros CLZ of the absolute value of the current integer. If 64 - CLZ equals 64 - PLZ (PLZ being the number of leading zeros of the previous integer), store an identifier indicating that they match; otherwise, store the value 64 - CLZ. Finally, store the significant value in 64 - CLZ bits.
4m) If the previous value in the time series is also an integer, the multiplicand is the same, and the difference D is 0, store an identifier indicating that a change has occurred and an identifier indicating a repeated integer.
4n) If the previous value in the time series is an integer but the multiplicand differs or the difference D is not 0, the sign of D is distinguished by an identifier. If the multiplicand has not increased and the number of significant bits 64 - CLZ is the same as for the previously computed difference, store an identifier indicating no change, then an identifier for the sign, and finally the significant value in 64 - CLZ bits.
4o) If the previous value in the time series is not an integer, store an identifier indicating that a change has occurred, an identifier indicating no repetition, and an identifier for the sign. Then, for the number of leading zeros CLZ of the absolute value of the current integer: if 64 - CLZ equals 64 - PLZ (PLZ being the number of leading zeros of the previous integer), store an identifier indicating that they match; otherwise, store the value 64 - CLZ. Finally, store the significant value in 64 - CLZ bits.
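The sign, width, and significant-bits pattern used throughout integer compression can be sketched as below. The text does not fix the bit layout, so this version assumes one sign bit ('1' for negative), then '0' when 64 - CLZ equals the previous width or '1' followed by 64 - CLZ in 7 bits (it ranges from 0 to 64), then the 64 - CLZ significant bits of the absolute value. `encode_signed` is a hypothetical name.

```python
INT64_MIN, INT64_MAX = -(1 << 63), (1 << 63) - 1


def encode_signed(value: int, prev_width):
    """Encode value as: sign bit | width flag [| 7-bit width] | magnitude bits.

    prev_width is the previous 64 - CLZ (None if there is none).
    Returns (bits, width) so the caller can pass width on as the next prev_width.
    """
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("value does not fit in int64")
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)
    width = magnitude.bit_length()          # equals 64 - CLZ for a 64-bit word
    if width == prev_width:
        head = "0"                          # same width as last time: flag only
    else:
        head = "1" + format(width, "07b")   # 0..64 needs 7 bits (assumed layout)
    body = format(magnitude, f"0{width}b") if width else ""
    return sign + head + body, width
```

Applied to the difference D between successive scaled integers, a series that changes by similar small amounts keeps the same width and pays only the sign bit, one flag bit, and the significant bits per value.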
Compared with the prior art, the invention has the following advantages:
1) Where floating-point and integer values are used together, floating-point compression is improved by converting floating-point values to integers.
2) In its intended use scenario, the invention compresses better than general-purpose lossless compression algorithms. For example, when collecting monitoring metrics, it achieves 1.66 bytes per data point, compared with 2 to 5 bytes for conventional related compression algorithms.
3) The algorithm is simple and cheap to process: it uses less CPU than conventional compression algorithms and handles large volumes of input data more easily. It can also quickly read one or several metrics within a specified time range.
Drawings
FIG. 1 is a flowchart of the simple encoding and decoding method suitable for a time series database according to the invention.
Detailed Description
As shown in FIG. 1, a simple encoding and decoding method suitable for a time series database comprises the following steps:
1) Identify each value of the time series and determine its data type. If the value is a floating-point number, go to step 2); if it is an integer, go to step 4).
2) Integer conversion:
2a) Taking the floating-point number as the multiplier, multiply it in turn by each multiplicand in [1, 10, 100, 1000, 10000, 100000, 1000000] to obtain a first result O. If the fractional part of O is 0, the conversion succeeds and returns the multiplicand B and the integer part I, completing the integer conversion.
2b) If the fractional part of the first result is not 0, convert the first result to an integer according to the standard.
2c) If the previous floating-point number corresponding to the integer obtained in step 2b) (that is, the representable value immediately below O) equals the integer part I of O, the conversion succeeds and returns the multiplicand B and I, completing the integer conversion.
2d) If the next floating-point number corresponding to the integer obtained in step 2b) (the representable value immediately above O) equals I + 1, the conversion succeeds and returns the multiplicand B and I + 1.
2e) If the integer conversion fails, go to step 3).
2f) If the integer conversion succeeds, go to step 4).
In step 2), the first result is converted into a 64-bit integer according to the IEEE 754 standard for binary floating-point arithmetic.
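Converting the first result to its 64-bit IEEE 754 integer pattern is what makes the "previous" and "next" floating-point numbers easy to find: for a finite double, stepping the bit pattern by one moves to the adjacent representable value. The sketch below shows this with the standard `struct` module; the explicit handling of the sign and of zero is an assumption of this sketch, and `next_up`/`next_down` are hypothetical names that behave like `math.nextafter(x, ±inf)` for finite inputs.

```python
import struct


def to_bits(x: float) -> int:
    """64-bit IEEE 754 pattern of x, as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(n: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", n))[0]


def next_up(x: float) -> float:
    """Smallest double greater than finite x."""
    if x == 0.0:
        return from_bits(1)           # smallest positive subnormal
    n = to_bits(x)
    # For positive x, +1 on the pattern increases the value; for negative x
    # (sign bit set), -1 shrinks the magnitude, which also increases the value.
    return from_bits(n + 1) if x > 0 else from_bits(n - 1)


def next_down(x: float) -> float:
    """Largest double less than finite x."""
    return -next_up(-x)


o = 0.29 * 100                        # 28.999999999999996 in binary64
print(o, next_up(o))                  # the next double up is exactly 29.0
```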
3) Floating-point number compression:
3a) Convert the floating-point number into a 64-bit integer according to the IEEE 754 standard for binary floating-point arithmetic.
3b) If the floating-point number is the first floating-point number in the time series, write an identifier indicating a floating-point number, then store the data in 64 bits.
3c) If the previous value in the time series is an integer and the current value is a floating-point number, store an identifier indicating a change, an identifier indicating no repetition, and an identifier indicating a floating-point number.
3d) If the previous value in the time series is a floating-point number equal to the current value, store an identifier indicating no change and an identifier indicating repetition.
3e) If the previous value in the time series is a floating-point number different from the current value, store an identifier indicating no change, then compute the exclusive-or (XOR) of the 64-bit representations of the current and previous numbers. Compare the leading and trailing zeros of this XOR value with those of the previous XOR value (0 if there is none). If the previous XOR value's number of leading zeros PLZ and number of trailing zeros PTZ are less than or equal to the current XOR value's number of leading zeros CLZ and number of trailing zeros CTZ, respectively, store the meaningful value in (64 - PLZ - PTZ) bits.
3f) Otherwise, that is, if PLZ is greater than CLZ or PTZ is greater than CTZ, store the meaningful value in (64 - CLZ - CTZ) bits.
In step 3f), storing the meaningful value in (64 - CLZ - CTZ) bits specifically comprises:
store 6 bits (2^6 = 64) representing CLZ, then 6 bits representing 64 - CLZ - CTZ, and finally the 64 - CLZ - CTZ bits of meaningful data.
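Decoding reverses steps 3e) and 3f). The sketch below assumes a layout in which one control bit selects the case ('0': reuse the previous XOR value's PLZ and PTZ; '1': read a 6-bit CLZ, then a 6-bit length where 0 stands for 64), followed by the meaningful bits, all held in a Python string for clarity. Because the decoder recovers each XOR value exactly, it can recompute that value's leading and trailing zero counts itself, so it stays in step with an encoder that compares against the previous XOR value. Function names are hypothetical.

```python
import struct


def _bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def _float(n: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", n))[0]


def _lz(v: int) -> int:
    return 64 - v.bit_length()


def _tz(v: int) -> int:
    return (v & -v).bit_length() - 1 if v else 64


def decode_xor(bits: str, pos: int, prev: float, window):
    """Decode one XOR-encoded value starting at bits[pos].

    window is (PLZ, PTZ) of the previous XOR value.
    Returns (value, new_pos, new_window).
    """
    if bits[pos] == "0":                       # case 3e: reuse PLZ/PTZ
        plz, ptz = window
        length, shift = 64 - plz - ptz, ptz
        pos += 1
    else:                                      # case 3f: explicit header
        clz = int(bits[pos + 1:pos + 7], 2)
        length = int(bits[pos + 7:pos + 13], 2) or 64   # 0 stands for 64
        shift = 64 - clz - length              # equals CTZ
        pos += 13
    x = int(bits[pos:pos + length], 2) << shift
    pos += length
    # Recompute the zero counts from the decoded XOR value for the next call.
    return _float(_bits(prev) ^ x), pos, (_lz(x), _tz(x))
```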
4) If the value of the time series is a timestamp, apply timestamp compression; if it is an integer value that is not a timestamp, apply integer compression.
Timestamp compression specifically comprises the following steps:
4a) Define the time unit (nanosecond, microsecond, millisecond, or second), which is the smallest time interval the time series can distinguish; the finer the unit, the more data must be stored.
4b) Store the first timestamp of the time series as a value converted to the fixed time unit.
4c) Starting from the second timestamp, compute the difference from the previous timestamp, then compute the difference between this difference and the previous one (taking 0 when there is no previous difference), and convert the result to the fixed time unit. If the unit is seconds or milliseconds, the converted value needs at most 32 bits; if it is nanoseconds or microseconds, at most 64 bits. Finally, according to the size of the converted value, select a suitable bit width from the queue [0, 7, 9, 12, 32] or [0, 7, 9, 12, 64], and distinguish the different widths by prefix bits.
Integer compression specifically comprises:
4l) If the current integer is the first value of the time series, write an identifier indicating an integer, then store an identifier marking the multiplicand and an identifier distinguishing positive from negative numbers.
Then determine the number of leading zeros CLZ of the absolute value of the current integer. If 64 - CLZ equals 64 - PLZ (PLZ being the number of leading zeros of the previous integer), store an identifier indicating that they match; otherwise, store the value 64 - CLZ. Finally, store the significant value in 64 - CLZ bits.
4m) If the previous value in the time series is also an integer, the multiplicand is the same, and the difference D is 0, store an identifier indicating that a change has occurred and an identifier indicating a repeated integer.
4n) If the previous value in the time series is an integer but the multiplicand differs or the difference D is not 0, the sign of D is distinguished by an identifier. If the multiplicand has not increased and the number of significant bits 64 - CLZ is the same as for the previously computed difference, store an identifier indicating no change, then an identifier for the sign, and finally the significant value in 64 - CLZ bits.
4o) If the previous value in the time series is not an integer, store an identifier indicating that a change has occurred, an identifier indicating no repetition, and an identifier for the sign. Then, for the number of leading zeros CLZ of the absolute value of the current integer: if 64 - CLZ equals 64 - PLZ (PLZ being the number of leading zeros of the previous integer), store an identifier indicating that they match; otherwise, store the value 64 - CLZ. Finally, store the significant value in 64 - CLZ bits.
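A matching decoder for the timestamp stream of step 4c) is sketched below. It assumes the first timestamp occupies 64 bits and that the widths [0, 7, 9, 12, 32] are selected by the prefix codes '0', '10', '110', '1110', '1111' with two's-complement payloads; the patent only states that prefix bits distinguish the widths. It returns timestamps in the fixed time unit, and its name is hypothetical.

```python
PREFIXES = ("0", "10", "110", "1110", "1111")     # assumed prefix codes


def decode_timestamps(bits: str, widths=(0, 7, 9, 12, 32)):
    """Rebuild timestamps (in the fixed unit) from a delta-of-delta bit string."""
    values = [int(bits[:64], 2)]                  # first timestamp, 64 bits
    pos, delta = 64, 0
    while pos < len(bits):
        for prefix, width in zip(PREFIXES, widths):
            if bits.startswith(prefix, pos):
                pos += len(prefix)
                break
        else:
            raise ValueError(f"no valid prefix at bit {pos}")
        raw = int(bits[pos:pos + width], 2) if width else 0
        pos += width
        if width and raw >= 1 << (width - 1):     # sign-extend two's complement
            raw -= 1 << width
        delta += raw                              # raw is the delta-of-delta
        values.append(values[-1] + delta)
    return values
```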
Specifically, the method comprises the following steps:
1) Before compressing the data, select a compression period in units of hours, with a minimum of 1 hour and a maximum of 24 hours.
2) Timestamp compression algorithm.
a) Define the time unit (nanosecond, microsecond, millisecond, or second), which is the smallest time interval the time series can distinguish; the finer the unit, the more data must be stored.
b) Store the first timestamp as a value converted to the fixed time unit.
c) For each subsequent timestamp, compute the difference from the previous timestamp, then compute the difference between this difference and the previous one (0 if there is none), and convert the result to the fixed time unit. Note that, because of the compression period, this value needs at most 32 bits if the unit is seconds or milliseconds, and at most 64 bits if the unit is nanoseconds or microseconds. Finally, a suitable bit width is selected from the queue [0, 7, 9, 12, 32] or [0, 7, 9, 12, 64] according to the size of the value, and the value is stored in that width. The different widths must, of course, be distinguished by prefix bits.
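The 32-bit and 64-bit limits follow from the compression period in step 1). Assuming timestamps within one period are non-decreasing, every difference satisfies 0 ≤ Δt ≤ P, so every delta-of-delta (including the first, taken against 0) is bounded by the period length P ≤ 24 h. The worked bound below checks which units fit a signed 32-bit field:

```latex
\begin{aligned}
0 \le \Delta t_i \le P \;&\Rightarrow\; \lvert \Delta t_i - \Delta t_{i-1} \rvert \le P,
  \qquad P \le 24\,\mathrm{h} = 86\,400\,\mathrm{s} \\
\text{seconds: } & 8.64 \times 10^{4} < 2^{31} \\
\text{milliseconds: } & 8.64 \times 10^{7} < 2^{31} \approx 2.15 \times 10^{9} \\
\text{microseconds: } & 8.64 \times 10^{10} > 2^{31}, \qquad 8.64 \times 10^{10} < 2^{63} \\
\text{nanoseconds: } & 8.64 \times 10^{13} > 2^{31}, \qquad 8.64 \times 10^{13} < 2^{63} \approx 9.22 \times 10^{18}
\end{aligned}
```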
3) Numeric value compression algorithm.
a) Integer conversion: multiply the floating-point number in turn by each multiplicand in [1, 10, 100, 1000, 10000, 100000, 1000000]. If the fractional part of the result O is 0, the conversion succeeds and returns the multiplicand B and the integer part I. Otherwise, convert O into a 64-bit integer according to the IEEE 754 standard for binary floating-point arithmetic; if the previous floating-point number corresponding to this integer equals the integer part I of O, the conversion succeeds and returns B and I. Similarly, if the next floating-point number corresponding to this integer equals I + 1, the conversion succeeds and returns B and I + 1. Otherwise, the floating-point number is processed by the floating-point compression algorithm in 4).
b) For the first integer, write one bit 0x0 to indicate an integer, and store three bits to mark the multiplicand (since there are only 7 multiplicands). The sign is stored in one bit. The number of leading zeros CLZ of the absolute value is first stored as 64 - CLZ, compressed according to whether it equals 64 - PLZ (the previous number of leading zeros), whether it is 0, and so on. The significant value is then stored in 64 - CLZ bits.
c) If a subsequent value fails the check in 3.a, or if its difference D from the previous integer falls outside the int64 range, processing goes to 4.c.
d) If the previous value is also an integer with the same multiplicand and the difference D is 0, store one bit indicating that a change has occurred and one bit 0x1 indicating a repeated integer.
e) Otherwise, as in 3.c, the sign of the difference D is distinguished by one bit. If the previous value is also an integer, the multiplicand has not increased, and the number of significant bits 64 - CLZ equals that of the previous difference, store one bit indicating no change, one bit for the sign, and finally the significant value in 64 - CLZ bits.
f) Otherwise, store one bit indicating that a change has occurred and one bit 0x0 indicating no repetition, then store the difference as in 3.b.
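The first-integer header described in b) can be sketched as below. The bit 0x0 for "integer", the 3-bit multiplicand index, and the single sign bit come from the text; using '1' for a negative sign and writing 64 - CLZ as a plain 7-bit field are assumptions, since the text only says that this field is compressed further (for example when it matches the previous width or is 0). The function name is hypothetical.

```python
# The seven multiplicands from integer conversion, indexed 0..6 so they fit in 3 bits.
MULTIPLICANDS = (1, 10, 100, 1_000, 10_000, 100_000, 1_000_000)


def encode_first_integer(multiplicand: int, value: int) -> str:
    """Header and payload for the first integer of a series (layout partly assumed)."""
    index = MULTIPLICANDS.index(multiplicand)
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)
    width = magnitude.bit_length()                 # 64 - CLZ
    body = format(magnitude, f"0{width}b") if width else ""
    return ("0"                                    # 0x0: this value is an integer
            + format(index, "03b")                 # 3-bit multiplicand index
            + sign                                 # 1 bit for the sign
            + format(width, "07b")                 # 64 - CLZ as 7 bits (assumed)
            + body)                                # the 64 - CLZ significant bits
```

For example, 0.29 scaled by 100 gives 29, which this layout writes in 17 bits instead of the 64 bits of its raw double.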
4) Floating-point compression algorithm.
a) Convert the floating-point number to a 64-bit integer according to the IEEE 754 standard for binary floating-point arithmetic.
b) For the first floating-point number, write one bit 0x1 to indicate a floating-point number, and store the data in 64 bits.
c) If the previous value is an integer, store one bit indicating that a change has occurred, one bit 0x0 indicating no repetition, and one bit indicating a floating-point number.
d) If the floating-point number is the same as the previous one, store one bit indicating that a change has occurred and one bit 0x1 indicating repetition.
e) Finally, if the value differs from the previous floating-point number, store one bit indicating that no change has occurred, convert both numbers to 64 bits, and compute their exclusive-or value. Compare the leading and trailing zeros of this exclusive-or value with those of the previous exclusive-or value (0 if there is none). If the previous value's number of leading zeros PLZ and number of trailing zeros PTZ are less than or equal to the current number of leading zeros CLZ and number of trailing zeros CTZ, store the significant value in 64 - PLZ - PTZ bits.
f) Otherwise, the significant value must be stored in 64 - CLZ - CTZ bits. A prefix bit first distinguishes these two cases. Then, before the 64 - CLZ - CTZ bits of data, store 6 bits (2^6 = 64) representing CLZ and 6 bits representing 64 - CLZ - CTZ, and finally store the 64 - CLZ - CTZ bits of meaningful data.
5) Simple string compression algorithm.
a) Use case: string fields that rarely change, or where the same strings are used frequently.
b) Record recently used strings by caching a list of the N most recently encoded strings (N must be the same for compression and decompression). First check whether a new string is already in the cache, and add it if not; if the cache then holds more than N strings, evict the least recently used one.
c) If the string is identical to the previous one, store one bit 0x0 to indicate no change, and stop. Otherwise, store 0x1.
d) If the string is in the cache, store one bit 0x0 to indicate that its index I in the cache follows; the number of bits the index occupies depends on N. Otherwise, store one bit 0x1 to indicate that the string itself follows: store its length as a variable-length integer, then the string itself.
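The string scheme in 5) can be sketched with an `OrderedDict` as the least-recently-used cache. The bit values 0x0/0x1 follow the text; encoding the cache index as the string's position in LRU order using `(N - 1).bit_length()` bits (at least 1), the length as an unsigned LEB128-style varint, and the string as UTF-8 bytes are assumptions of this sketch, and the class and function names are hypothetical. A decoder would have to apply the same cache updates in the same order, which is why N must match on both sides.

```python
from collections import OrderedDict


def varint(n: int) -> str:
    """Unsigned LEB128-style variable-length integer as a bit string (assumed format)."""
    out = []
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(format(byte | 0x80, "08b"))   # more bytes follow
        else:
            out.append(format(byte, "08b"))
            return "".join(out)


class StringEncoder:
    """Recently-used-string encoder; N must be the same for encoder and decoder."""

    def __init__(self, n: int):
        self.n = n
        self.cache = OrderedDict()        # least recently used first
        self.last = None
        self.index_bits = max(1, (n - 1).bit_length())

    def encode(self, s: str) -> str:
        if s == self.last:
            return "0"                    # 0x0: same as the previous string
        self.last = s
        if s in self.cache:
            index = list(self.cache).index(s)
            self.cache.move_to_end(s)     # mark as most recently used
            return "1" + "0" + format(index, f"0{self.index_bits}b")
        self.cache[s] = None
        if len(self.cache) > self.n:
            self.cache.popitem(last=False)   # evict the least recently used string
        data = s.encode("utf-8")
        return ("1" + "1" + varint(len(data))
                + "".join(format(b, "08b") for b in data))
```

With N = 2, alternating between two metric names costs 3 bits per value after each name has been sent once.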

Claims (8)

1. A simple encoding and decoding method suitable for a time series database, characterized by comprising the following steps:
1) identifying the values of the time series and determining the data type: if a value is a floating-point number, proceeding to step 2), and if it is an integer, proceeding to step 4);
2) integer conversion:
2a) taking the floating-point number as a multiplier and multiplying it by a multiplicand to obtain a first result; if the fractional part of the first result is 0, the conversion succeeds, and the multiplicand and the integer part I are returned, completing the integer conversion;
2b) if the fractional part of the first result is not 0, converting the first result to an integer according to the standard;
2c) if the previous floating-point number corresponding to the integer obtained in step 2b) is the same as the integer part I of the first result, the conversion succeeds, and the multiplicand and the integer part I are returned, completing the integer conversion;
2d) if the next floating-point number corresponding to the integer obtained in step 2b) is the same as I + 1, the conversion succeeds, and the multiplicand and I + 1 are returned, completing the integer conversion;
2e) if the integer conversion fails, proceeding to step 3);
2f) if the integer conversion succeeds, proceeding to step 4);
3) floating-point number compression;
4) if the value of the time series is a timestamp, compressing the timestamp, and if the integer value of the time series is not a timestamp, performing integer compression.
2. The simple encoding and decoding method suitable for a time series database according to claim 1, wherein in step 2a), the multiplicands are, in order, [1, 10, 100, 1000, 10000, 100000, 1000000].
3. The simple encoding and decoding method suitable for a time series database according to claim 1, wherein in step 2b), the first result is converted into a 64-bit integer according to the IEEE 754 standard for binary floating-point arithmetic.
4. The simplified coding and decoding method applied to the time series database according to claim 1, wherein in step 3), the floating point number compression specifically includes:
3a) the floating point number is converted into a 64-bit integer according to the standard;
3b) if the floating point number is the first floating point number in the time sequence, writing an identifier indicating a floating point number and storing the data as 64 bits;
3c) if the previous value in the time sequence is an integer and the current value is a floating point number, storing an identifier indicating that the value has changed, storing an identifier indicating that the value is not repeated, and storing an identifier indicating a floating point number;
3d) if the previous value in the time sequence is a floating point number equal to the current floating point number, storing an identifier indicating that no change has occurred and storing an identifier indicating a repetition;
3e) if the previous value in the time sequence is a floating point number different from the current floating point number, storing an identifier indicating that no change has occurred, converting the current and the previous floating point numbers to 64-bit integers, calculating their exclusive-or (XOR) value, and comparing the leading and trailing zeros of this XOR value with those of the previous XOR value; if the leading-zero count PLZ and the trailing-zero count PTZ of the previous XOR value are less than or equal to the leading-zero count CLZ and the trailing-zero count CTZ of the current XOR value, respectively, storing a (64-PLZ-PTZ)-bit meaningful value;
3f) if the leading-zero count PLZ and the trailing-zero count PTZ of the previous XOR value are greater than the leading-zero count CLZ and the trailing-zero count CTZ of the current XOR value, a (64-CLZ-CTZ)-bit meaningful value is stored.
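The XOR comparison of steps 3e) and 3f) can be sketched as below. This minimal Python sketch assumes that any case not satisfying the 3e) condition is handled by 3f), and it only decides how many bits would be stored; it does not emit identifiers. The names `xor_window` and `XorWindow` are hypothetical.

```python
import struct
from typing import NamedTuple


def float_to_bits(x: float) -> int:
    """IEEE 754 binary64 bit pattern as an unsigned 64-bit integer (step 3a)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def leading_zeros64(v: int) -> int:
    return 64 - v.bit_length()


def trailing_zeros64(v: int) -> int:
    return (v & -v).bit_length() - 1 if v else 64


class XorWindow(NamedTuple):
    xor: int
    clz: int
    ctz: int
    reuse_previous: bool   # True: step 3e), False: step 3f)
    stored_bits: int       # number of meaningful bits to store


def xor_window(prev_value: float, cur_value: float, plz: int, ptz: int) -> XorWindow:
    """Compare the current XOR value with the previous window (PLZ, PTZ)."""
    x = float_to_bits(prev_value) ^ float_to_bits(cur_value)
    clz, ctz = leading_zeros64(x), trailing_zeros64(x)
    if plz <= clz and ptz <= ctz:
        # Meaningful bits fit inside the previous window: 64-PLZ-PTZ bits.
        return XorWindow(x, clz, ctz, True, 64 - plz - ptz)
    # Otherwise a new window is described: 64-CLZ-CTZ bits.
    return XorWindow(x, clz, ctz, False, 64 - clz - ctz)
```

For 1.0 followed by 1.5 the XOR value has a single set bit (bit 51), so CLZ = 12 and CTZ = 51; whether 64-PLZ-PTZ or 1 meaningful bit is stored depends on the previous window.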
5. The simplified encoding and decoding method for time series databases as claimed in claim 4, wherein in step 3a), the floating point number is converted into a 64-bit integer according to the IEEE 754 binary floating point arithmetic standard.
6. The simplified coding and decoding method applied to the time series database as claimed in claim 4, wherein storing the (64-CLZ-CTZ)-bit meaningful value in step 3f) includes:
storing 6 bits to represent CLZ, then storing 6 bits to represent 64-CLZ-CTZ, and finally storing the 64-CLZ-CTZ bits of meaningful data.
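The bit layout of claim 6 can be illustrated with a small writer and reader. One detail is an assumption of this sketch: a width of 64 (CLZ = CTZ = 0) does not fit in 6 bits, so it is written as 0 and read back as 64, which is unambiguous because a nonzero XOR value always has at least one meaningful bit. `BitWriter`, `BitReader`, `write_new_window` and `read_new_window` are hypothetical names, and bits are kept in a plain list for readability rather than packed.

```python
from typing import List, Tuple


class BitWriter:
    def __init__(self) -> None:
        self.bits: List[int] = []

    def write(self, value: int, nbits: int) -> None:
        """Append the low nbits of value, most significant bit first."""
        for i in range(nbits - 1, -1, -1):
            self.bits.append((value >> i) & 1)


class BitReader:
    def __init__(self, bits: List[int]) -> None:
        self.bits, self.pos = bits, 0

    def read(self, nbits: int) -> int:
        v = 0
        for _ in range(nbits):
            v = (v << 1) | self.bits[self.pos]
            self.pos += 1
        return v


def write_new_window(w: BitWriter, xor_value: int) -> None:
    """6 bits CLZ, 6 bits (64-CLZ-CTZ), then the meaningful bits (claim 6)."""
    clz = 64 - xor_value.bit_length()
    ctz = (xor_value & -xor_value).bit_length() - 1
    length = 64 - clz - ctz
    w.write(clz, 6)
    w.write(length & 0x3F, 6)          # assumption: 64 is encoded as 0
    w.write(xor_value >> ctz, length)


def read_new_window(r: BitReader) -> Tuple[int, int, int]:
    """Return (xor_value, clz, ctz) decoded from the layout above."""
    clz = r.read(6)
    length = r.read(6) or 64
    meaningful = r.read(length)
    ctz = 64 - clz - length
    return meaningful << ctz, clz, ctz
```

A single set bit costs 6 + 6 + 1 = 13 bits in this layout, while a value with no leading or trailing zeros costs 6 + 6 + 64 = 76 bits.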
7. The simplified coding and decoding method applied to the time series database according to claim 1, wherein the step 4) of compressing the time stamp specifically comprises:
4a) defining time units including nanosecond, microsecond, millisecond and second;
4b) converting the first timestamp in the time sequence to the fixed time unit and storing the resulting value;
4c) starting from the second timestamp in the time sequence, calculating the difference from the previous timestamp, then calculating the difference between consecutive differences, and converting the result to the fixed time unit; if the unit is second or millisecond, the converted value is at most 32 bits, and if the unit is nanosecond or microsecond, it is at most 64 bits; finally, according to the magnitude of the converted value, selecting a suitable bit width from the queue [0, 7, 9, 12, 32] or [0, 7, 9, 12, 64] for storage, with prefix bits distinguishing the selected width.
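The timestamp compression of claim 7 can be sketched as follows, producing symbolic (prefix, width, value) tuples instead of packed bits. Several details are assumptions not stated in the claim: input timestamps are integers in nanoseconds, the prefixes `0`, `10`, `110`, `1110`, `1111` are borrowed from the delta-of-delta scheme used by Facebook's Gorilla, each width n holds the signed range [-2^(n-1), 2^(n-1) - 1], and the first difference is compared against a previous difference of 0. The names `encode_timestamps`, `choose_bucket` and `bucket_widths` are hypothetical.

```python
from typing import List, Tuple

# Nanoseconds per fixed time unit (step 4a).
UNIT_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}
# Assumed prefix bits, one per entry in the width queue.
PREFIXES = ["0", "10", "110", "1110", "1111"]


def bucket_widths(unit: str) -> List[int]:
    """Seconds/milliseconds use a 32-bit top bucket, finer units 64 bits."""
    return [0, 7, 9, 12, 32] if unit in ("s", "ms") else [0, 7, 9, 12, 64]


def choose_bucket(dod: int, widths: List[int]) -> Tuple[int, int]:
    """Return (index, width) of the smallest bucket that holds dod."""
    for idx, w in enumerate(widths):
        if w == 0:
            if dod == 0:
                return idx, w
        elif -(1 << (w - 1)) <= dod < (1 << (w - 1)):
            return idx, w
    raise ValueError("delta-of-delta exceeds the largest bucket")


def encode_timestamps(ts_ns: List[int], unit: str) -> list:
    scale = UNIT_NS[unit]
    widths = bucket_widths(unit)
    ts = [t // scale for t in ts_ns]          # convert to the fixed unit
    out: list = [("first", ts[0])]            # step 4b)
    prev_delta = 0                            # assumption for the 2nd stamp
    for prev, cur in zip(ts, ts[1:]):         # step 4c)
        delta = cur - prev
        dod = delta - prev_delta
        prev_delta = delta
        idx, w = choose_bucket(dod, widths)
        out.append((PREFIXES[idx], w, dod))
    return out
```

With a regular 60-second interval, every delta-of-delta after the first is 0 and costs only the one-bit prefix, which is where this kind of scheme gets most of its compression.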
8. The simplified coding and decoding method applied to the time series database according to claim 1, wherein in the step 4), the integer compression specifically comprises:
4l) if the current integer is the first value of the time sequence, writing an identifier indicating an integer, then storing an identifier marking the multiplicand, and distinguishing positive from negative numbers with an identifier;
for the leading-zero count CLZ of the absolute value of the current integer, according to whether 64-CLZ is consistent with 64-PLZ: if consistent, storing an identifier indicating the consistency; if inconsistent, storing the 64-CLZ value; then storing the (64-CLZ)-bit significant value;
4m) if the previous value in the time sequence of the current integer is also an integer, the multiplicand is the same, and the difference D is 0, storing an identifier indicating that a change occurs and recording an identifier indicating a repeated integer;
4n) if the previous value in the time sequence of the current integer is an integer but the multiplicand differs or the difference D is not 0, distinguishing positive and negative numbers with an identifier; if the multiplicand does not increase and the number of significant bits 64-CLZ equals that of the previously calculated difference, storing an identifier indicating no change, then storing an identifier distinguishing positive and negative numbers, and finally storing the (64-CLZ)-bit meaningful value;
4o) if the previous value in the time sequence of the current integer is not an integer, distinguishing positive and negative numbers with an identifier, storing an identifier indicating that a change occurs and an identifier indicating that the change does not occur, and, for the leading-zero count CLZ of the absolute value of the current integer, according to whether 64-CLZ agrees with 64-PLZ: if they agree, storing an identifier indicating the agreement; if not, storing the 64-CLZ value; then storing the (64-CLZ)-bit meaningful value.
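Two pieces of claim 8 lend themselves to a short sketch: choosing among branches 4l) to 4o), and computing the sign, CLZ and 64-CLZ width that 4l), 4n) and 4o) all store. This Python sketch is one reading of the claim, not the patented encoder: the difference D is assumed to be the current integer minus the previous integer, a previous floating point value is represented as a Python `float`, and which operand is passed to `magnitude_fields` (the integer itself or D) is left to the caller. `classify` and `magnitude_fields` are hypothetical names.

```python
from typing import Optional, Tuple, Union

# (multiplicand, integer) as produced by the integer conversion of step 2).
IntValue = Tuple[int, int]


def classify(prev: Union[None, float, IntValue], cur: IntValue) -> str:
    """Pick the claim-8 branch for the current integer (hypothetical helper)."""
    if prev is None:
        return "4l"                      # first value of the time sequence
    if isinstance(prev, float):
        return "4o"                      # previous value was not an integer
    d = cur[1] - prev[1]                 # assumed meaning of the difference D
    if prev[0] == cur[0] and d == 0:
        return "4m"                      # repeated integer
    return "4n"                          # multiplicand differs or D != 0


def magnitude_fields(value: int, prev_sig_bits: Optional[int]) -> dict:
    """Sign identifier, CLZ of |value|, significant width 64-CLZ, and whether
    that width matches the previous one (64-PLZ)."""
    magnitude = abs(value)
    clz = 64 - magnitude.bit_length()
    sig_bits = 64 - clz
    return {
        "negative": value < 0,           # sign identifier
        "clz": clz,
        "sig_bits": sig_bits,            # 64-CLZ
        "same_width": sig_bits == prev_sig_bits,
        "payload": magnitude,            # stored in sig_bits bits
    }
```

Storing the sign separately and encoding only the magnitude keeps small negative values short, since their two's complement form would otherwise have no leading zeros.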
CN202110259307.3A 2021-03-10 2021-03-10 Simple encoding and decoding method suitable for time sequence database Active CN113078908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110259307.3A CN113078908B (en) 2021-03-10 2021-03-10 Simple encoding and decoding method suitable for time sequence database

Publications (2)

Publication Number Publication Date
CN113078908A 2021-07-06
CN113078908B 2022-03-25

Family

ID=76612660

Country Status (1)

Country Link
CN (1) CN113078908B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101795138A (en) * 2010-01-19 2010-08-04 北京四方继保自动化股份有限公司 Compressing method for high density time sequence data in WAMS (Wide Area Measurement System) of power system
CN104516894A (en) * 2013-09-27 2015-04-15 国际商业机器公司 Method and device for managing time series database
US20170270172A1 (en) * 2015-06-05 2017-09-21 Palantir Technologies Inc. Time-series data storage and processing database system
CN110633277A (en) * 2019-08-13 2019-12-31 平安科技(深圳)有限公司 Time sequence data storage method and device, computer equipment and storage medium
CN110995273A (en) * 2019-10-21 2020-04-10 武汉神库小匠科技有限公司 Data compression method, device, equipment and medium for power database

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Huayan (徐化岩) et al.: "Design of an industrial time series database engine based on InfluxDB", Computer Applications and Software *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114327264A (en) * 2021-12-22 2022-04-12 北京力控元通科技有限公司 Time sequence data compression method, device and equipment
CN114327264B (en) * 2021-12-22 2023-05-12 北京力控元通科技有限公司 Time sequence data compression method, device and equipment
CN114726380A (en) * 2022-06-07 2022-07-08 西南交通大学 Monitoring data lossless compression method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN113078908B (en) 2022-03-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant