CN115380267A - Data compression method and device, data compression equipment and readable storage medium - Google Patents

Data compression method and device, data compression equipment and readable storage medium

Info

Publication number
CN115380267A
Authority
CN
China
Prior art keywords
timestamp
data point
value
chunk
current data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080099580.8A
Other languages
Chinese (zh)
Inventor
郭子亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd, Shenzhen Huantai Technology Co Ltd
Publication of CN115380267A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • H03M7/50 Conversion to or from non-linear codes, e.g. companding

Abstract

A data compression method, a data compression apparatus (10), a data compression device (100) and a non-volatile computer-readable storage medium (200). The data compression method comprises the following steps: (011) obtaining a current data point, the current data point including an associated timestamp and numerical value; (012) calculating a first change value between the timestamp of the current data point and a reference timestamp, and a second change value between the numerical value of the current data point and a reference numerical value; and (013) storing the first change value and the second change value.

Description

Data compression method and device, data compression equipment and readable storage medium
Technical Field
The present application relates to the field of database technologies, and in particular, to a data compression method, a data compression apparatus, a data compression device, and a non-volatile computer-readable storage medium.
Background
Data compression is a technique for reducing the volume of data, without losing useful information, in order to save storage space and improve the efficiency of transmitting, storing and processing the data; it may also reorganize the data according to some method to reduce its redundancy and storage footprint. A time-series database contains a plurality of data points, each of which includes a timestamp and a numerical value. When data points are stored, they are generally compressed by a data compression method to reduce the storage space they occupy.
Disclosure of Invention
The embodiment of the application provides a data compression method, a data compression device and a non-volatile computer readable storage medium.
The data compression method of the embodiments of the present application includes: obtaining a current data point, the current data point including an associated timestamp and numerical value; calculating a first change value between the timestamp of the current data point and a reference timestamp, and a second change value between the numerical value of the current data point and a reference numerical value; and storing the first change value and the second change value.
The data compression apparatus of the embodiments of the present application includes a first acquisition module, a calculation module and a first storage module. The first acquisition module is used for acquiring a current data point, the current data point including an associated timestamp and numerical value; the calculation module is used for calculating a first change value between the timestamp of the current data point and a reference timestamp, and a second change value between the numerical value of the current data point and a reference numerical value; the first storage module is used for storing the first change value and the second change value.
The data compression device of the embodiments of the present application includes a memory and a processor. The processor is used for acquiring a current data point, the current data point including an associated timestamp and numerical value, and for calculating a first change value between the timestamp of the current data point and a reference timestamp and a second change value between the numerical value of the current data point and a reference numerical value; the memory is used for storing the first change value and the second change value.
One or more non-transitory computer-readable storage media embody computer-executable instructions that, when executed by one or more processors, cause the processors to perform the following data compression steps: obtaining a current data point, the current data point including an associated timestamp and numerical value; calculating a first change value between the timestamp of the current data point and a reference timestamp, and a second change value between the numerical value of the current data point and a reference numerical value; and storing the first change value and the second change value.
Additional aspects and advantages of embodiments of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a schematic flow chart diagram of a data compression method according to some embodiments of the present application.
FIG. 2 is a block diagram of a data compression apparatus according to some embodiments of the present application.
FIG. 3 is a block diagram of a data compression device according to some embodiments of the present application.
FIG. 4 is a schematic diagram of a data compression method according to some embodiments of the present application.
FIG. 5 is a schematic illustration of a data compression method according to some embodiments of the present application.
FIG. 6 is a flow chart illustrating a data compression method according to some embodiments of the present application.
FIG. 7 is a flow chart illustrating a data compression method according to some embodiments of the present application.
FIG. 8 is a schematic illustration of a data compression method according to some embodiments of the present application.
FIG. 9 is a flow chart illustrating a data compression method according to some embodiments of the present application.
FIG. 10 is a flow chart illustrating a method of data compression according to some embodiments of the present application.
FIG. 11 is a block diagram of a computer-readable storage medium and a processor according to embodiments of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application, and should not be construed as limiting the present application.
To achieve a high compression rate, current compression methods generally store adjacent data points in an associated manner. For example, during compression the current data point is represented and stored as its difference from the previous data point, or as the exclusive-OR (XOR) result of the current data point and the previous data point. The compressed current data point is therefore tied to the adjacent previous data point. When data arrives out of order, the whole data block has to be decompressed to restore every data point to its uncompressed form before the position at which the out-of-order data point should be stored can be determined from its timestamp. This greatly degrades write performance for out-of-order data.
Referring to fig. 1, to solve the above technical problem, the present application provides a data compression method, including:
011: obtaining a current data point, the current data point including an associated timestamp and a numerical value;
012: calculating a first change value of the timestamp of the current data point and the reference timestamp, and a second change value of the numerical value of the current data point and the reference numerical value; and
013: storing the first change value and the second change value.
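Steps 011 to 013 can be illustrated with a minimal sketch. It assumes integer timestamps and integer values and uses a plain difference for both change values; the names `DataPoint`, `compress_point` and `decompress_point` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    timestamp: int  # assumed: integer seconds since a preset time
    value: int      # assumed: integer reading, e.g. memory usage in percent


def compress_point(point: DataPoint, ref_timestamp: int, ref_value: int) -> tuple[int, int]:
    """Step 012: express the point relative to the reference point."""
    first_change = point.timestamp - ref_timestamp  # first change value
    second_change = point.value - ref_value         # second change value
    return first_change, second_change


def decompress_point(changes: tuple[int, int], ref_timestamp: int, ref_value: int) -> DataPoint:
    """Invert compress_point using only the reference point."""
    first_change, second_change = changes
    return DataPoint(ref_timestamp + first_change, ref_value + second_change)


# Step 011: obtain the current data point (illustrative numbers only).
reference = DataPoint(timestamp=1_584_662_400, value=5)
current = DataPoint(timestamp=1_584_662_403, value=6)

# Step 013: `stored` is what would be written instead of the raw point.
stored = compress_point(current, reference.timestamp, reference.value)
```

Only the reference point is needed to restore `current`, which is the property the later out-of-order discussion relies on.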
Referring to fig. 2 and 3, the present application further provides a data compression apparatus 10. In the present embodiment, the data compression apparatus 10 is applied to the data compression device 100, that is, the data compression device 100 includes the data compression apparatus 10. The data compression apparatus 10 includes a first obtaining module 11, a calculating module 12, and a first storage module 13, which are configured to perform step 011, step 012 and step 013, respectively. Namely, the first obtaining module 11 is configured to obtain a current data point; the calculating module 12 is configured to calculate a first change value between the timestamp of the current data point and the reference timestamp, and a second change value between the numerical value of the current data point and the reference numerical value; and the first storage module 13 is used for storing the first change value and the second change value.
Referring to fig. 3, the present application further provides a data compression device 100. The data compression device 100 includes a processor 20 and a memory 30. The processor 20 is configured to obtain a current data point, where the current data point includes an associated timestamp and value, to calculate a first change value between the timestamp of the current data point and a reference timestamp, and to calculate a second change value between the value of the current data point and a reference value; the memory 30 is used for storing the first change value and the second change value. That is, step 011, step 012, and step 013 can be implemented by the processor 20.
The data compression device 100 may be a terminal, a server, or the like, and the terminal may be a mobile phone, a tablet computer, a monitoring camera, a display, a notebook computer, a teller machine, a gate, a smart watch, a head-mounted display device, a game console, or the like. It will be appreciated that the data compression device 100 is not limited to the devices listed above, but may be any device having a memory 30 and a processor 20.
When the data compression device 100 is a terminal or a server, for example, the terminal or the server receives the data, compresses it with the processor 20, and then stores the compressed data in the memory 30. In other embodiments, the data compression device 100 may be formed by combining parts of any one or more of the above devices. For example, the data compression device 100 may include the processor 20 of a terminal and the memory 30 of a server: after the terminal acquires data, it compresses the data with its own processor 20 and sends the compressed data to the server, and the server stores the compressed data in the memory 30. Because the terminal compresses the data in real time, disorder caused by problems such as network, load balancing, API gateway, and server delays is reduced. In the present embodiment, the data compression device 100 is described taking a server as an example.
Specifically, the server is connected to the terminal to obtain data collected by the terminal. For example, if the terminal is a monitoring camera, the collected data is monitoring video data; if the terminal is a mobile phone, the collected data is the location information, battery level data, and the like of the mobile phone. Alternatively, the server may be connected to other servers to obtain their index information (for example, central processing unit (CPU) occupancy rate, memory usage, and the like), thereby monitoring the other servers in real time. The following description takes as an example a server connected to a terminal to obtain the memory usage of the terminal; the principle is basically the same when the server is connected to other servers to obtain their data, and is not repeated herein.
The terminal can acquire its own memory usage in real time. For example, the terminal collects the memory usage once every predetermined interval (e.g., 1 second (s), 2 s, 3 s) and then sends it to the server. The data sent to the server may include a tag of the terminal, an index name, a sending time, the memory usage, a collection time of the memory usage, and the like. The tag represents a feature code of the terminal (a unique identifier representing the identity of the terminal when it performs network communication). The index name identifies the index to which the sent data corresponds (e.g., a memory index for memory usage, a CPU index for CPU usage, and the like). The sending time is the time at which the data is sent. Either the sending time or the collection time of the memory usage may be used as the timestamp associated with the memory usage. Generally, the collection time is before the sending time, although the two may coincide when the terminal sends the data as it collects it. Due to problems such as network, load balancing, API gateway, and server delays, data with an earlier sending time may be received by the server after data with a later sending time, which causes an out-of-order problem.
After the server receives the data sent by the terminal, the processor 20 first obtains the tag and the index name, and it can be understood that the server can monitor one or more terminals at the same time, the server can divide one or more data storage areas for each terminal to store the data sent by the corresponding terminal, after obtaining the sent data of one of the terminals, the corresponding data storage area can be found according to the tag and the index name, and then the data is stored in the data storage area.
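The routing described above, from tag and index name to a data storage area, might be sketched as a dictionary lookup. This is only an illustration: the application does not specify how storage areas are organized, and `storage_areas` and `route` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical layout: one storage area (here a list of raw points) per
# (tag, index name) pair.
storage_areas: dict[tuple[str, str], list[tuple[int, int]]] = defaultdict(list)


def route(tag: str, index_name: str, timestamp: int, value: int) -> list[tuple[int, int]]:
    """Find the storage area for this terminal and index, then store the point."""
    area = storage_areas[(tag, index_name)]
    area.append((timestamp, value))
    return area


# Two indexes from the same terminal land in separate storage areas.
route("terminal-001", "memory", 100, 5)
route("terminal-001", "cpu", 100, 40)
route("terminal-001", "memory", 101, 5)
```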
Before storage, the processor 20 obtains a current data point, which includes a timestamp and a numerical value. The timestamp may be the sending time of the data sent by the terminal or the collection time of the memory usage. In this embodiment, the timestamp is the sending time, expressed as the time difference between the current time and a preset time (for example, 00:00 on January 1, 1970), and the numerical value is the memory usage (such as 1%, 6%, 25%, 65%, or 90%).
The timestamp and the memory usage in the data sent by the terminal are associated with each other, and they are stored together as one data point. After acquiring the current data point, the processor 20 compresses it by calculating a first change value between the timestamp of the current data point and a reference timestamp, and a second change value between the numerical value of the current data point and a reference value. The reference timestamp is the timestamp of the first data point of the data block that stores the current data point; it is a time that has not been compressed (e.g., 00:00 on March 20, 2020) and is identical to the sending time in the data sent by the terminal. The reference value is the numerical value of that first data point; it is likewise the uncompressed memory usage, identical to the memory usage in the data sent by the terminal. It can be understood that each data storage area may be divided into one or more data blocks. When the current data point is the first data point stored in a data block, its timestamp and value are stored directly in the data block as the reference timestamp and reference value of that block. The timestamp of each subsequent data point is stored as a first change value relative to the reference timestamp, and its value as a second change value relative to the reference value, thereby compressing the data point.
The first change value may be obtained from the difference between the timestamp of the current data point and the reference timestamp. For example, the first change value may equal that difference. Alternatively, a mapping relationship may exist between the difference and the first change value, in which case the first change value is obtained by applying a mapping formula to the difference. For example, the mapping formula is y = ax + b, where y is the first change value, x is the difference (i.e., the time difference) between the timestamp of the current data point and the reference timestamp, and a and b are constants that can be set as needed. In other embodiments, the mapping formula may be another function, such as a quadratic or cubic function, and is not limited to the linear function above. The timestamp of the current data point is thus represented and stored by the first change value. Compared with storing the timestamp directly, the first change value can be smaller than the timestamp and so requires fewer storage bits, which compresses the timestamp of the current data point.
When the time stamp of the current data point is decompressed to restore, the difference value between the time stamp of the current data point and the reference time stamp can be calculated according to the first change value, and the time stamp of the current data point can be calculated through the difference value and the reference time stamp, so that the lossless compression and decompression of the time stamp of the current data point are realized.
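The linear mapping y = ax + b and its inverse can be sketched as follows. This is an illustration under assumptions the text does not state: integer timestamps, integer constants, and a nonzero `a` so that the mapping can be inverted. The function names are hypothetical.

```python
def to_first_change(timestamp: int, ref_timestamp: int, a: int = 1, b: int = 0) -> int:
    """Map the time difference x to the first change value y = a*x + b."""
    if a == 0:
        # With a == 0 every timestamp maps to b and cannot be restored.
        raise ValueError("a must be nonzero for the mapping to be invertible")
    return a * (timestamp - ref_timestamp) + b


def from_first_change(first_change: int, ref_timestamp: int, a: int = 1, b: int = 0) -> int:
    """Recover the timestamp: x = (y - b) / a, then add the reference timestamp."""
    # y - b equals a*x exactly, so integer division loses nothing.
    difference = (first_change - b) // a
    return ref_timestamp + difference


ref = 1000
y_identity = to_first_change(1003, ref)          # a=1, b=0: plain difference
y_mapped = to_first_change(1003, ref, a=2, b=7)  # illustrative constants
```

With `a = 1` and `b = 0` the mapping reduces to the plain difference described first in the text.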
The timestamp of each data point is stored as a first change value relative to the reference timestamp. Taking the case where the first change value equals the difference as an example, the later the time corresponding to the timestamp of the current data point (i.e., the greater its difference from the preset time), the larger the first change value. In other words, the order of the first change values is the time order of the data points. Therefore, when data points arrive out of order (for example, the time corresponding to the timestamp of the current data point is earlier than that of the previous data point), the position at which the first change value should be stored can be found in the compressed data block from the first change value between the timestamp of the current data point and the reference timestamp.
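Because the first change values are kept in time order, the position for an out-of-order point can be found by searching them directly, without decompressing anything. The sketch below uses a binary search as one possible way to do this. The application does not prescribe a search strategy, and the parallel-list layout and the name `insert_out_of_order` are assumptions made for illustration.

```python
import bisect


def insert_out_of_order(first_changes: list[int], second_changes: list[int],
                        ref_timestamp: int, ref_value: int,
                        timestamp: int, value: int) -> int:
    """Insert a late-arriving point into a compressed block, keeping time order.

    first_changes and second_changes are parallel lists holding the change
    values of the points stored after the reference point (assumed layout).
    Returns the position at which the point was stored.
    """
    first_change = timestamp - ref_timestamp
    # The sorted first change values are compared directly; no stored point
    # has to be restored to its original timestamp.
    position = bisect.bisect_left(first_changes, first_change)
    first_changes.insert(position, first_change)
    second_changes.insert(position, value - ref_value)
    return position


# Points at +1, +2 and +4 seconds are already stored; +3 arrives late.
firsts = [1, 2, 4]
seconds = [0, 1, 1]
pos = insert_out_of_order(firsts, seconds, ref_timestamp=1000, ref_value=5,
                          timestamp=1003, value=6)
```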
For example, as shown in fig. 4, five timestamps A, B, C, D, and E exist in the data block, where A is the reference timestamp (00.
The second change value may be obtained from the difference between the numerical value of the current data point and the reference numerical value. For example, the second change value may equal that difference. Alternatively, a mapping relationship may exist between the difference and the second change value, in which case the second change value is obtained by applying a mapping formula to the difference. For example, the mapping formula is Y = cX + d, where Y is the second change value, X is the difference between the numerical value of the current data point and the reference numerical value, and c and d are constants that can be set as needed. In other embodiments, the mapping formula may be another function, such as a quadratic or cubic function, and is not limited to the linear function above. The numerical value of the current data point is thus represented and stored by the second change value. Compared with storing the numerical value directly, the second change value can be smaller than the numerical value of the current data point and so requires fewer storage bits, which compresses the numerical value of the current data point.
When the numerical value of the current data point is decompressed, the difference between the numerical value of the current data point and the reference numerical value can be calculated from the second change value, and the numerical value of the current data point can then be calculated from that difference and the reference numerical value, thereby realizing lossless compression and decompression of the numerical value of the current data point.
Alternatively, the second change value may be obtained by performing an XOR operation on the numerical value of the current data point and the reference numerical value and taking the result as the second change value. It can be understood that the terminal may pre-allocate memory according to usage requirements, with the memory of each application program allocated in advance. When the user uses only one program, the memory usage generally remains unchanged within a preset duration (e.g., 5 minutes, 10 minutes, and the like). The numerical value of a current data point within the preset duration after the reference timestamp is therefore basically the same as the reference numerical value, so the second change value obtained by the XOR operation is small, and representing and storing the numerical value of the current data point by the second change value compresses it. For example, as shown in fig. 4, if the timestamp of the current data point lies within the preset duration after the reference timestamp A and the numerical value of the current data point (i.e., the memory usage) is the same as the reference numerical value, for example both are 5%, then the second change value obtained by the XOR operation is 0. Directly storing 5% requires 3 bits, whereas storing 0 requires fewer bits. Storing the second change value obtained by the XOR operation in place of the numerical value of the current data point therefore compresses the data and reduces the storage space occupied by the numerical value of the current data point.
And when the numerical value of the current data point is decompressed to restore the numerical value of the current data point, the numerical value of the current data point can be calculated according to the second change value and the reference numerical value, so that lossless compression and decompression of the numerical value of the current data point are realized. Of course, the second variation values obtained by the xor may not all be able to achieve data compression, and for an index (such as CPU utilization) with large value fluctuation of the data points, the number of storage bits required for the second variation values may be equal to the number of storage bits required for the reference value.
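The XOR variant can be sketched as follows. It assumes the values are non-negative integers (for example, whole percentages); the application does not specify the encoding, and `xor_change`, `restore_value` and `bits_needed` are hypothetical names.

```python
def xor_change(value: int, ref_value: int) -> int:
    """Second change value as the XOR of the value and the reference value."""
    return value ^ ref_value


def restore_value(second_change: int, ref_value: int) -> int:
    """XOR is its own inverse, so applying it again restores the value."""
    return second_change ^ ref_value


def bits_needed(n: int) -> int:
    """Bits needed to write a non-negative integer in binary (at least 1)."""
    return max(1, n.bit_length())


ref_value = 5                         # e.g. 5% memory usage, binary 101
same = xor_change(5, ref_value)       # unchanged reading
changed = xor_change(6, ref_value)    # reading moved to 6%

raw_bits = bits_needed(5)             # storing 5 directly
same_bits = bits_needed(same)         # storing the XOR result instead
```

When the reading fluctuates widely, the XOR result can need as many bits as the raw value, which matches the caveat above about indexes such as CPU utilization.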
The data compression method, the data compression apparatus 10 and the data compression device 100 according to the embodiments of the present application losslessly compress and decompress the timestamp and the value of the current data point using the reference timestamp and the reference value, respectively. Because the first change value and the second change value stored after compression are associated only with the reference timestamp and the reference value, when data points arrive out of order (for example, the time corresponding to the timestamp of the current data point is earlier than that of the previous data point), the position at which the first change value should be stored can be found in the compressed data block from the first change value between the timestamp of the current data point and the reference timestamp. Compared with compressing data points by storing adjacent data points in association with each other, this approach can accurately determine the storage position and the second change value of the current data point without decompressing other data points in the current data block, which significantly improves out-of-order write performance.
Referring to fig. 6, in some embodiments, the data compression method further includes:
014: storing the data points according to the data blocks, wherein the data points in a preset time period are stored in the data blocks; and
015: the data block is divided to obtain a plurality of chunks, the reference time stamp is the time stamp of the first data point stored in the chunk, and the reference value is the value of the first data point stored in the chunk.
Referring again to fig. 2, in some embodiments, the data compression apparatus 10 further includes a second storage module 14 and a segmentation module 15. The second storage module 14 and the segmentation module 15 are used to implement step 014 and step 015, respectively. That is, the second storage module 14 is configured to store data points by data blocks; the segmentation module 15 is configured to segment the data block to obtain a plurality of chunks.
Referring again to fig. 3, in some embodiments, the memory 30 is further configured to store data points in data blocks, the data blocks storing data points within a predetermined period of time; the processor 20 is further configured to segment the data block to obtain a plurality of chunks, the reference time stamp being a time stamp of a first data point stored in the chunk, and the reference value being a value of the first data point stored in the chunk. That is, step 014 may be performed by memory 30 and step 015 may be implemented by processor 20.
Specifically, the memory 30 stores the data points of a predetermined time period in one data block. Because the predetermined time period corresponding to a data block is generally long, such as 2 hours or 3 hours, querying a data point requires searching all data points of the data block, so query efficiency is low. Likewise, when a data point arrives out of order, the position at which it should be stored can only be determined by comparing its first change value with all the first change values in the current data block, so out-of-order write efficiency is also low.
Thus, a data block may be divided into a plurality of chunks of shorter duration (e.g., 2 chunks, 3 chunks, 4 chunks, etc.). In this case, the reference timestamp is the timestamp of the first data point stored in the chunk and the reference value is the value of that first data point, so that each data point in a chunk is associated only with the first data point stored in that chunk. When a data point is queried, the chunk containing it is quickly determined from the query timestamp and the reference timestamp of each chunk, and only the data points in that chunk are then searched. During out-of-order writing, the chunk to which the out-of-order data point belongs is likewise quickly determined from the timestamp of the out-of-order data point and the reference timestamp of each chunk, and the first change value of the out-of-order data point is compared only with the first change values in that chunk to determine its storage position. Compared with comparing against all the first change values in the data block, out-of-order writing is more efficient.
In addition, the size of the chunk can be determined according to the value change rule of the data point corresponding to one index, if the change of the memory usage amount of the memory index in a preset time length (e.g., 5 minutes, 10 minutes, etc.) is small, the time length of the chunk can be 5 minutes, that is, the size of the chunk is determined according to the preset time length, and the difference between the data point value corresponding to the index and the reference value is smaller than a preset value in the preset time length. In this way, the difference between the value of the stored data point in each chunk and the reference value of the chunk is small, and the second variation value obtained by xoring the value of the current data point and the reference value of the chunk is small, so that the compression rate can be improved.
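The relationship between a chunk's reference point and the two variation values can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes values are 64-bit floats, timestamps are integer seconds, and the reference point itself is kept as the first encoded entry. The names `encode_chunk` and `decode_chunk` are hypothetical.

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def encode_chunk(points):
    """Encode (timestamp, value) pairs relative to the chunk's first point.

    Returns the reference timestamp, the reference value, and a list of
    (first_variation, second_variation) pairs:
      first variation  = timestamp - reference timestamp
      second variation = value bits XOR reference value bits
    """
    ref_ts, ref_value = points[0]
    ref_bits = float_to_bits(ref_value)
    encoded = [(ts - ref_ts, float_to_bits(v) ^ ref_bits) for ts, v in points]
    return ref_ts, ref_value, encoded

def decode_chunk(ref_ts, ref_value, encoded):
    """Recover the original (timestamp, value) pairs."""
    ref_bits = float_to_bits(ref_value)
    return [(ref_ts + d, bits_to_float(x ^ ref_bits)) for d, x in encoded]

# Illustrative data (assumed): three points, ten seconds apart.
points = [(39601, 50.0), (39611, 50.0), (39621, 50.5)]
ref_ts, ref_value, encoded = encode_chunk(points)
restored = decode_chunk(ref_ts, ref_value, encoded)
```

A value identical to the reference value XORs to zero, and a nearby value differs from it in only a few bits, which is what makes the second variation value cheap to store.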
Referring to fig. 7, in some embodiments, the data compression method further includes:
016: acquiring a query timestamp;
017: locating the data block corresponding to the query timestamp according to the query timestamp and the predetermined time period;
018: locating the chunk corresponding to the query timestamp according to the duration of the chunks and the time difference between the query timestamp and the reference timestamp of the first chunk in the data block, the chunks in the data block being stored in order of their reference timestamps; and
019: traversing the chunk corresponding to the query timestamp to obtain the data point corresponding to the query timestamp.
Referring again to fig. 2, in some embodiments, the data compression apparatus 10 further includes a second obtaining module 16, a first positioning module 17, a second positioning module 18, and a query module 19. The second obtaining module 16, the first positioning module 17, the second positioning module 18 and the query module 19 are respectively used for executing step 016, step 017, step 018 and step 019. Namely, the second obtaining module 16 is used for obtaining the query timestamp; the first positioning module 17 is configured to position a data block corresponding to the query timestamp according to the query timestamp and the predetermined time period; the second positioning module 18 is configured to position a chunk corresponding to the query timestamp according to a time difference between the query timestamp and a reference timestamp of a first chunk in the data block and a duration of the chunk; the query module 19 is configured to traverse the chunks corresponding to the query timestamps to obtain the data points corresponding to the query timestamps.
Referring again to fig. 3, the processor 20 may be further configured to obtain a query timestamp, locate a data block corresponding to the query timestamp according to the query timestamp and a predetermined time period, locate a chunk corresponding to the query timestamp according to a time difference between the query timestamp and a reference timestamp of a first chunk in the data block and a duration of the chunk, and traverse the chunk corresponding to the query timestamp to obtain a data point corresponding to the query timestamp. That is, step 016, step 017, step 018, and step 019 may be implemented by the processor 20.
Specifically, when a user wants to view the data at a certain time, the user may input a timestamp. After the processor 20 obtains this query timestamp, it can locate the data block in which the data point corresponding to the query timestamp is stored, according to the query timestamp and the predetermined time period corresponding to each data block. For example, as shown in fig. 8, the predetermined time periods of data block A, data block B, data block C, and data block D are 9:00:01 to 11:00:00, 11:00:01 to 13:00:00, 13:00:01 to 15:00:00, and 15:00:01 to 17:00:00, respectively, and the query timestamp is 12:10:00. The query timestamp falls within the predetermined time period corresponding to data block B, so the data point corresponding to the query timestamp is stored in data block B.
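Locating the data block can be sketched as a search over the block periods. The sketch below assumes consecutive two-hour blocks with inclusive bounds, using the 11:00:01 to 13:00:00 period that the description gives for data block B. The helper names `hms` and `locate_block` and the use of binary search are illustrative assumptions.

```python
from bisect import bisect_left

def hms(text: str) -> int:
    """Convert 'H:MM:SS' into seconds since midnight."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s

# (name, period start, period end), both ends inclusive; assumed layout.
BLOCKS = [
    ("A", hms("9:00:01"), hms("11:00:00")),
    ("B", hms("11:00:01"), hms("13:00:00")),
    ("C", hms("13:00:01"), hms("15:00:00")),
    ("D", hms("15:00:01"), hms("17:00:00")),
]

def locate_block(query_ts: int, blocks=BLOCKS):
    """Return the name of the block whose period contains query_ts, or None."""
    ends = [end for _, _, end in blocks]
    i = bisect_left(ends, query_ts)  # first block ending at or after query_ts
    if i < len(blocks) and blocks[i][1] <= query_ts:
        return blocks[i][0]
    return None

block_for_query = locate_block(hms("12:10:00"))
```

With the periods above, a query timestamp of 12:10:00 resolves to data block B, matching the example in the description.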
After locating the data block in which the query timestamp is located, the processor 20 calculates the time difference between the query timestamp and the reference timestamp of the first chunk of the data block, and then determines the chunk in which the query timestamp is located according to this time difference and the duration of each chunk. For example, as shown in fig. 8, data block B is divided into chunk a, chunk b, chunk c, and chunk d, whose reference timestamps are 11:00:01, 11:30:01, 12:00:01, and 12:30:01, respectively. The chunks are stored in order of their reference timestamps, and every chunk has the same duration of half an hour. The time difference between the query timestamp and the reference timestamp of chunk a is 1 hour 9 minutes 59 seconds, that is, 4199 seconds. Dividing the time difference by the chunk duration of 1800 seconds gives a quotient of 2 and a remainder of 599 seconds (both the time difference and the chunk duration are expressed in seconds). Therefore, the chunk in which the query timestamp is located is the third chunk (namely, chunk c) in data block B, and the chunk is located accurately.
It will be appreciated that, to facilitate rapid location of the query timestamp, the chunks within a data block generally all have the same duration, and the duration of a chunk may be less than or equal to half an hour, so that the chunks are not so large as to affect query efficiency and out-of-order write efficiency. The data block may also be divided into more chunks (e.g., 5 chunks, 6 chunks, etc.), in which case the duration of each chunk becomes correspondingly shorter (e.g., 24 minutes, 20 minutes, etc.), without limitation.
After the chunk in which the query timestamp is located has been determined, the chunk can be decompressed, and the data point corresponding to the query timestamp can be quickly found by traversing the timestamps of the decompressed data points, so that the data corresponding to the query timestamp is obtained. Only the chunk corresponding to the query timestamp needs to be decompressed, rather than the whole data block, so the query efficiency is high.
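The quotient calculation in this example can be checked with a short sketch. It assumes integer-second timestamps and equal chunk durations; `locate_chunk_index` is a hypothetical helper name, and the returned index is zero-based.

```python
def hms(text: str) -> int:
    """Convert 'H:MM:SS' into seconds since midnight."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s

def locate_chunk_index(query_ts: int, first_chunk_ref_ts: int, chunk_duration: int):
    """Return (zero-based chunk index, offset in seconds inside that chunk)."""
    diff = query_ts - first_chunk_ref_ts
    if diff < 0:
        raise ValueError("query timestamp precedes the first chunk of the block")
    return divmod(diff, chunk_duration)

# Figures from the example: query 12:10:00, chunk a reference 11:00:01,
# chunk duration half an hour (1800 seconds).
index, offset = locate_chunk_index(hms("12:10:00"), hms("11:00:01"), 1800)
chunk_name = "abcd"[index]
```

Here the time difference is 4199 seconds, so the index is 2 (the third chunk, chunk c) and the offset within the chunk is 599 seconds.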
Referring to fig. 9, in some embodiments, the data compression method further includes:
020: determining the predetermined sub-period in which the chunk is located according to the duration of the chunk and the reference timestamp;
021: locating the chunk corresponding to the query timestamp according to the query timestamp and the predetermined sub-period; and
022: acquiring the data point corresponding to the query timestamp in the chunk according to the time difference between the query timestamp and the reference timestamp of the chunk corresponding to the query timestamp.
Referring again to fig. 2, in some embodiments, the data compression apparatus 10 further includes a first determining module 20, a third positioning module 21, and a third obtaining module 22. The first determining module 20, the third positioning module 21 and the third acquiring module 22 are respectively used for implementing step 020, step 021 and step 022. That is, the first determining module 20 is configured to determine the predetermined sub-period in which the chunk is located according to the duration of the chunk and the reference timestamp; the third positioning module 21 is configured to position a chunk corresponding to the query timestamp according to the query timestamp and the predetermined sub-period; the third obtaining module 22 is configured to obtain a data point corresponding to the query timestamp in the chunk according to the query timestamp and a time difference of the reference timestamp of the chunk corresponding to the query timestamp.
Referring again to FIG. 3, in some embodiments, the processor 20 is further configured to determine a predetermined sub-period in which the chunk is located according to the duration of the chunk and the reference timestamp, locate the chunk corresponding to the query timestamp according to the query timestamp and the predetermined sub-period, and obtain a data point corresponding to the query timestamp in the chunk according to the query timestamp and a time difference of the reference timestamp of the chunk corresponding to the query timestamp. That is, step 020, step 021 and step 022 can be realized by the processor 20.
Specifically, after the predetermined time period of the data block is determined, the predetermined sub-period in which each chunk is located may be determined according to the duration of the chunk and the reference timestamp of each chunk, where each predetermined sub-period lies within the predetermined time period. For example, as shown in figs. 8 and 9, the data point corresponding to the query timestamp is located in data block B, the predetermined time period of data block B is 11:00:01 to 13:00:00, the duration of each chunk is half an hour, and the reference timestamps of chunk a, chunk b, chunk c, and chunk d in data block B are 11:00:01, 11:30:01, 12:00:01, and 12:30:01, respectively. Therefore, the predetermined sub-periods corresponding to chunk a, chunk b, chunk c, and chunk d are 11:00:01 to 11:30:00, 11:30:01 to 12:00:00, 12:00:01 to 12:30:00, and 12:30:01 to 13:00:00, respectively. In this manner, the predetermined sub-period corresponding to each chunk can be quickly determined.
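Deriving each sub-period from a chunk's reference timestamp and duration can be sketched as follows. The sketch assumes one-second timestamp resolution and inclusive period ends, so a half-hour chunk starting at 11:00:01 ends at 11:30:00. The helper names are hypothetical.

```python
def hms(text: str) -> int:
    """Convert 'H:MM:SS' into seconds since midnight."""
    h, m, s = (int(part) for part in text.split(":"))
    return h * 3600 + m * 60 + s

def fmt(seconds: int) -> str:
    """Format seconds since midnight as 'H:MM:SS'."""
    return f"{seconds // 3600}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

def sub_period(ref_ts: int, chunk_duration: int):
    """Return the inclusive (start, end) sub-period covered by one chunk."""
    return ref_ts, ref_ts + chunk_duration - 1

# Reference timestamps of chunks a to d in data block B, half-hour chunks.
refs = {"a": "11:00:01", "b": "11:30:01", "c": "12:00:01", "d": "12:30:01"}
periods = {
    name: tuple(fmt(t) for t in sub_period(hms(ref), 1800))
    for name, ref in refs.items()
}
```

Because the sub-periods follow directly from the reference timestamp and the duration, they do not need to be stored alongside the chunk.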
Then, the processor 20 can quickly find the predetermined sub-period in which the query timestamp is located according to the query timestamp and the predetermined sub-periods. For example, if the query timestamp is 12:10:00, it falls within the predetermined sub-period 12:00:01 to 12:30:00, so the chunk corresponding to that sub-period (i.e., chunk c) is located. After locating the chunk in which the data point corresponding to the query timestamp is stored, the processor 20 may calculate the first variation value corresponding to the query timestamp according to the time difference between the query timestamp and the reference timestamp of the chunk, and then compare this first variation value with all the first variation values in the chunk to find the data point corresponding to the query timestamp. In this way, the data point corresponding to the query timestamp can be found quickly by comparing variation values, without decompressing the current chunk, and only that data point is then decompressed to obtain the data corresponding to the query timestamp, so the query efficiency is higher.
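Looking up a point by its stored timestamp offset, and decoding only that point, can be sketched as follows. This assumes the chunk keeps its timestamp offsets and XORed value bits in two parallel lists and that values are 64-bit floats. The layout and the name `find_point` are assumptions made for illustration.

```python
import struct

def float_to_bits(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def bits_to_float(b: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def find_point(ref_ts, ref_value, deltas, xors, query_ts):
    """Find the point at query_ts by comparing timestamp offsets only.

    deltas[i] is timestamp_i - ref_ts and xors[i] is the XOR of value_i's
    bits with the reference value's bits. Only the matching entry is decoded.
    """
    target = query_ts - ref_ts
    for i, delta in enumerate(deltas):  # compare against every stored offset
        if delta == target:
            value = bits_to_float(xors[i] ^ float_to_bits(ref_value))
            return query_ts, value
    return None

# Assumed contents of chunk c (reference 12:00:01 = 43201 s, value 50.0).
ref_ts, ref_value = 43201, 50.0
deltas = [0, 299, 599]
xors = [0, 0, float_to_bits(50.5) ^ float_to_bits(50.0)]
hit = find_point(ref_ts, ref_value, deltas, xors, 43800)   # 12:10:00
miss = find_point(ref_ts, ref_value, deltas, xors, 43801)
```

If the offsets are kept sorted, the linear scan could be replaced by a binary search; the scan is used here only because the description speaks of comparing against all stored values.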
Referring to fig. 10, in some embodiments, the data compression method further includes:
023: judging whether the time difference of the time stamps of the current data point and the previous data point of the current data point is larger than a preset threshold value or not;
if yes, go to step 012;
024: if not, positioning a chunk corresponding to the current data point according to the timestamp of the current data point; and
025: determining the storage location of the current data point according to the time difference between the timestamp of the current data point and the reference timestamp of the chunk, and inserting the current data point into the storage location.
Referring again to fig. 2, in some embodiments, the data compression apparatus 10 further includes a determination module 23, a fourth positioning module 24, and a second determination module 25. The judging module 23, the fourth positioning module 24 and the second determining module 25 are respectively configured to execute step 023, step 024 and step 025. That is, the determining module 23 is configured to determine whether a time difference between time stamps of a current data point and a previous data point of the current data point is greater than a predetermined threshold; the calculating module 12 is configured to calculate a first variation value of the timestamp and the reference timestamp, and a second variation value of the numerical value and the reference numerical value when a time difference between timestamps of a current data point and a previous data point of the current data point is greater than a predetermined threshold; the fourth positioning module 24 positions the chunk corresponding to the current data point according to the timestamp of the current data point when the time difference between the timestamps of the current data point and the previous data point of the current data point is less than or equal to the predetermined threshold; the second determining module 25 determines the storage location of the current data point according to the time difference between the time stamp of the current data point and the reference time stamp of the chunk and inserts the current data point into the storage location.
Referring again to fig. 3, in some embodiments, the processor 20 is further configured to determine whether the time difference between the timestamps of the current data point and the previous data point is greater than a predetermined threshold; when the time difference is greater than the predetermined threshold, to calculate the first variation value between the timestamp and the reference timestamp and the second variation value between the numerical value and the reference numerical value; and when the time difference is less than or equal to the predetermined threshold, to locate the chunk corresponding to the current data point according to the timestamp of the current data point, determine the storage location of the current data point according to the time difference between the timestamp of the current data point and the reference timestamp of the chunk, and insert the current data point into the storage location. That is, step 023, step 024, and step 025 may be implemented by the processor 20.
Specifically, after acquiring the current data point, the processor 20 first determines whether the time difference between the timestamp of the current data point and the timestamp of the previous data point is greater than a predetermined threshold (for example, the predetermined threshold may be 0 s, 1 s, or the like; in this embodiment, the predetermined threshold is 0 s). When the time difference is greater than the predetermined threshold, the timestamp of the current data point is after the timestamps of all previously acquired data points, and the time order of the data points is not disturbed. In this case, the current data point may be compressed (i.e., step 012 is executed) and stored after compression (i.e., step 013 is executed). When the time difference is less than or equal to the predetermined threshold, the timestamp of the current data point is not after the timestamp of the previous data point, and the data points are out of order. In this case, the chunk corresponding to the current data point may be determined according to its timestamp (the locating method is the same as that described for the query timestamp and is not repeated here). A first variation value is then calculated according to the time difference between the timestamp and the reference timestamp of the chunk, and the storage location of the current data point is determined by comparing this first variation value with all the first variation values in the chunk. For example, the storage location lies between two adjacent data points whose first variation values are respectively smaller than and greater than the first variation value of the current data point. The storage location of the current data point can thus be accurately determined. The processor 20 then inserts the compressed current data point into the storage location, completing the write of the out-of-order data point without decompressing the data points in the chunk.
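The ordering check and the out-of-order insertion can be sketched as follows. The sketch assumes the chunk keeps its timestamp offsets in a sorted list, with a parallel list of already-compressed payloads, and it uses binary search to find the slot. The function names and the list layout are hypothetical.

```python
from bisect import bisect_left

def is_in_order(ts: int, prev_ts: int, threshold: int = 0) -> bool:
    """True when the new point is later than the previous one by more than
    the threshold, so it can simply be compressed and appended."""
    return ts - prev_ts > threshold

def insert_out_of_order(ref_ts, deltas, payloads, ts, payload):
    """Insert a late point into a chunk without decoding its neighbours.

    deltas is the sorted list of (timestamp - ref_ts) offsets in the chunk;
    payloads holds the corresponding compressed values. Returns the slot used.
    """
    delta = ts - ref_ts
    if delta < 0:
        raise ValueError("timestamp precedes the chunk's reference timestamp")
    slot = bisect_left(deltas, delta)  # between a smaller and a larger offset
    deltas.insert(slot, delta)
    payloads.insert(slot, payload)
    return slot

# Assumed chunk state: reference 43201 s, three stored points.
ref_ts = 43201
deltas = [0, 10, 30]
payloads = ["p0", "p10", "p30"]
late_ts, prev_ts = ref_ts + 20, ref_ts + 30
in_order = is_in_order(late_ts, prev_ts)
slot = insert_out_of_order(ref_ts, deltas, payloads, late_ts, "p20")
```

Only the timestamp offsets are compared, so the existing entries stay in their compressed form while the new entry is placed between its neighbours.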
Referring to fig. 11, the present application further provides a non-volatile computer-readable storage medium 200, where the non-volatile computer-readable storage medium 200 contains computer-executable instructions 202, and when the computer-executable instructions 202 are executed by one or more processors 20, the processor 20 is caused to execute the data compression method of any one of the above embodiments.
For example, referring to FIG. 1, computer-executable instructions 202, when executed by one or more processors 20, cause processors 20 to perform the steps of:
011: obtaining a current data point, the current data point including an associated timestamp and a numerical value;
012: calculating a first variation value between the timestamp of the current data point and a reference timestamp, and a second variation value between the numerical value of the current data point and a reference numerical value; and
013: storing the first variation value and the second variation value.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations of the above embodiments may be made by those of ordinary skill in the art within the scope of the present application, which is defined by the claims and their equivalents.

Claims (20)

  1. A method of data compression, comprising:
    obtaining a current data point, the current data point including an associated timestamp and a numerical value;
    calculating a first variation value between the timestamp of the current data point and a reference timestamp, and a second variation value between the numerical value of the current data point and a reference numerical value; and
    storing the first variation value and the second variation value.
  2. The data compression method of claim 1, wherein the calculating a first variation value of the time stamp and the reference time stamp, and a second variation value of the numerical value and the reference numerical value comprises:
    calculating the first variation value according to the difference value of the timestamp and the reference timestamp; and
    performing an exclusive-OR operation on the numerical value and the reference numerical value to calculate the second variation value.
  3. The data compression method of claim 1, further comprising:
    storing the data points by data blocks, wherein the data blocks store the data points within a predetermined time period; and
    dividing the data block to obtain a plurality of chunks, wherein the reference timestamp is the timestamp of the first data point stored in the chunk, and the reference value is the value of the first data point stored in the chunk.
  4. The data compression method of claim 3, further comprising:
    acquiring a query timestamp;
    locating the data block corresponding to the query timestamp according to the query timestamp and the predetermined time period;
    locating the chunk corresponding to the query timestamp according to the duration of the chunk and the time difference between the query timestamp and the reference timestamp of the first chunk in the data block, the chunks in the data block being stored in order of their reference timestamps; and
    traversing the chunk corresponding to the query timestamp to obtain the data point corresponding to the query timestamp.
  5. The data compression method of claim 3, further comprising:
    determining the predetermined sub-period in which the chunk is located according to the duration of the chunk and the reference timestamp;
    locating the chunk corresponding to the query timestamp according to the query timestamp and the predetermined sub-period; and
    acquiring the data point corresponding to the query timestamp in the chunk according to the time difference between the query timestamp and the reference timestamp of the chunk corresponding to the query timestamp.
  6. The data compression method of claim 3, further comprising:
    determining whether a time difference between the timestamps of the current data point and a previous data point of the current data point is greater than a predetermined threshold;
    if yes, proceeding to the step of calculating the first variation value between the timestamp and the reference timestamp and the second variation value between the numerical value and the reference numerical value;
    if not, locating the chunk corresponding to the current data point according to the timestamp of the current data point; and
    determining the storage location of the current data point according to the time difference between the timestamp of the current data point and the reference timestamp of the chunk, and inserting the current data point into the storage location.
  7. A method as claimed in claim 3, wherein the duration of each of said chunks in said data block is the same.
  8. A method of data compression as claimed in claim 3 in which the chunks are less than half an hour in duration.
  9. The data compression method of claim 1, further comprising:
    acquiring a label and an index name; and
    storing the current data point into the data storage area corresponding to the index name and the label according to the label and the index name.
  10. A data compression apparatus, comprising:
    a first obtaining module, configured to obtain a current data point, where the current data point includes an associated timestamp and a numerical value;
    a calculation module, configured to calculate a first variation value between the timestamp of the current data point and a reference timestamp, and a second variation value between the numerical value of the current data point and a reference numerical value; and
    a first storage module, configured to store the first variation value and the second variation value.
  11. A data compression device, comprising a processor and a memory, the processor being configured to obtain a current data point, the current data point including an associated timestamp and a numerical value, and to calculate a first variation value between the timestamp of the current data point and a reference timestamp, and a second variation value between the numerical value of the current data point and a reference numerical value; the memory being configured to store the first variation value and the second variation value.
  12. The data compression device of claim 11, wherein the processor is further configured to calculate the first variation value according to the difference between the timestamp and the reference timestamp, and to perform an exclusive-OR operation on the numerical value and the reference numerical value to calculate the second variation value.
  13. The data compression apparatus of claim 11, wherein the memory is further configured to store the data points in data blocks, the data blocks storing the data points within a predetermined period of time; the processor is further configured to segment the data block to obtain a plurality of chunks, where the reference timestamp is a timestamp of a first data point stored in the chunk, and the reference value is a value of the first data point stored in the chunk.
  14. The data compression apparatus of claim 13, wherein the processor is further configured to obtain a query timestamp, locate the data block corresponding to the query timestamp according to the query timestamp and the predetermined time period, locate the chunk corresponding to the query timestamp according to the duration of the chunk and the time difference between the query timestamp and the reference timestamp of the first chunk in the data block, the chunks in the data block being stored in order of their reference timestamps, and traverse the chunk corresponding to the query timestamp to obtain the data point corresponding to the query timestamp.
  15. The data compression device of claim 13, wherein the processor is further configured to determine a predetermined sub-period in which the chunk is located according to the duration of the chunk and the reference timestamp, locate the chunk corresponding to the query timestamp according to the query timestamp and the predetermined sub-period, and obtain the data point corresponding to the query timestamp in the chunk according to a time difference between the query timestamp and the reference timestamp of the chunk corresponding to the query timestamp.
  16. The data compression apparatus of claim 13, wherein the processor is further configured to determine whether a time difference between the timestamps of the current data point and a previous data point to the current data point is greater than a predetermined threshold, and if the time difference between the timestamps of the current data point and the previous data point to the current data point is greater than the predetermined threshold, then calculate a first variation value between the timestamp and a reference timestamp, and a second variation value between the value and a reference value; when the time difference between the time stamps of the current data point and the previous data point of the current data point is smaller than or equal to the preset threshold value, the current data point is positioned to the chunk corresponding to the current data point according to the time stamp of the current data point, the storage position of the current data point is determined according to the time difference between the time stamp of the current data point and the reference time stamp of the chunk, and the current data point is inserted into the storage position.
  17. The data compression apparatus of claim 13 wherein the duration of each of the chunks in the data block is the same.
  18. The data compression apparatus of claim 13 wherein the chunks are less than half an hour in duration.
  19. The data compression device of claim 13, wherein the processor is further configured to obtain a label and an index name, and to store the current data point into the data storage area corresponding to the index name and the label according to the label and the index name.
  20. A non-transitory computer-readable storage medium containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the data compression method of any one of claims 1 to 9.
CN202080099580.8A 2020-05-14 2020-05-14 Data compression method and device, data compression equipment and readable storage medium Pending CN115380267A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/090198 WO2021226922A1 (en) 2020-05-14 2020-05-14 Data compression method, apparatus and device, and readable storage medium

Publications (1)

Publication Number Publication Date
CN115380267A true CN115380267A (en) 2022-11-22

Family

ID=78526200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080099580.8A Pending CN115380267A (en) 2020-05-14 2020-05-14 Data compression method and device, data compression equipment and readable storage medium

Country Status (2)

Country Link
CN (1) CN115380267A (en)
WO (1) WO2021226922A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303409B (en) * 2023-05-24 2023-08-08 北京庚顿数据科技有限公司 Industrial production time sequence data transparent compression method with ultrahigh compression ratio
CN117009755B (en) * 2023-10-07 2023-12-19 国仪量子(合肥)技术有限公司 Waveform data processing method, computer-readable storage medium, and electronic device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI241502B (en) * 2002-12-26 2005-10-11 Ind Tech Res Inst Real time data compression apparatus for a data recorder
JP2007072752A (en) * 2005-09-07 2007-03-22 Nippon Telegr & Teleph Corp <Ntt> Similar time series data calculation method, device, and program
CN103577456B (en) * 2012-07-31 2016-12-21 国际商业机器公司 For the method and apparatus processing time series data
CN106330995A (en) * 2015-06-19 2017-01-11 陕西重型汽车有限公司 Three-level data compression device used for vehicle networking and method thereof
CN106055275A (en) * 2016-05-24 2016-10-26 深圳市敢为软件技术有限公司 Data compression recording method and apparatus
CN109597588B (en) * 2018-12-11 2020-09-04 浙江中智达科技有限公司 Data storage method, data restoration method and device

Also Published As

Publication number Publication date
WO2021226922A1 (en) 2021-11-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination