CN111930751A

CN111930751A - Time sequence data storage method and device

Info

Publication number: CN111930751A
Application number: CN202010891609.8A
Authority: CN
Inventors: 张艳清; 杨尧; 陈博; 胥莉君; 鲁燕
Original assignee: Chengdu Sefon Software Co Ltd
Current assignee: Chengdu Sefon Software Co Ltd
Priority date: 2020-08-31
Filing date: 2020-08-31
Publication date: 2020-11-13

Abstract

The invention discloses a method and a device for storing time-series data, which mainly address the problem in the prior art that a server is prone to crashing when a time-series database directly compresses a large amount of data. The storage method comprises the following steps: splitting the original data into different data sets and writing the data sets into different sub-tables; and writing the sub-tables into the time-series database. With this scheme, metric data with tens of thousands of columns is split into different sub-tables and then compressed and stored; in addition, a storage format for the sub-table metadata is provided, so that the stored sub-tables are managed effectively.

Description

Time sequence data storage method and device

Technical Field

The invention relates to the technical field of data storage and query, in particular to a method and a device for storing time sequence data.

Background

A time-series database is mainly used for processing time-stamped data, that is, data that changes in time order; such data is also called time-series data. When an existing time-series database writes data with tens of thousands of columns, it compresses the data directly. Because the raw size of a single row is huge, even the compressed size of a batch of rows remains massive, which puts the server under heavy pressure and makes it prone to crashing.

Disclosure of Invention

The invention aims to provide a time sequence data storage method and a time sequence data storage device, which are used for solving the problem that a server is easy to crash because a large amount of data is directly compressed by the conventional time sequence database.

In order to solve the above problems, the present invention provides the following technical solutions:

A method for storing time-series data comprises the following steps: splitting the original data into different data sets and writing the data sets into different sub-tables; storing the metadata information of the sub-tables in the metadata of the original data; and writing the sub-tables into the time-series database.

A conventional time-series database generally writes and reads data with tens of thousands of columns as follows:

Writing: a time period of fixed length is extracted according to the timestamp, and the data in that period is compressed and written into the KV engine.

Reading: when data is read, the column to be read is calculated from the specified timestamp, and the corresponding time-series data block is then read from the KV engine.
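The conventional write and read path described above can be sketched as follows. This is a minimal illustration only: the one-hour window, the use of `zlib` and JSON, the plain `dict` standing in for the KV engine, and the names `window_start`, `write_rows` and `read_rows` are all assumptions, not details given in this document.

```python
import json
import zlib
from collections import defaultdict

WINDOW_SECONDS = 3600  # hypothetical fixed window length


def window_start(timestamp: int, window: int = WINDOW_SECONDS) -> int:
    """Return the start of the fixed-length window that contains `timestamp`."""
    return timestamp - (timestamp % window)


def write_rows(kv: dict, series_key: str, rows: list[dict]) -> None:
    """Group whole rows by window, compress each group, and store it in `kv`."""
    groups = defaultdict(list)
    for row in rows:
        groups[window_start(row["ts"])].append(row)
    for start, group in groups.items():
        # Every column of every row in the window is compressed together.
        kv[(series_key, start)] = zlib.compress(json.dumps(group).encode("utf-8"))


def read_rows(kv: dict, series_key: str, timestamp: int) -> list[dict]:
    """Compute the window from the timestamp, then fetch and decompress its block."""
    block = kv.get((series_key, window_start(timestamp)))
    return [] if block is None else json.loads(zlib.decompress(block))
```

Because each block holds every column of every row in its window, the size of a single compression job grows with the row width, which is the pressure the background section describes.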

In the present method, the data is split by metric column and written into different sub-tables before it is stored, so that a row written to each sub-table holds about one percent of the data in an original row. This reduces the total amount of data compressed at a time, greatly reduces the write overhead, and improves the overall throughput of the time-series database, which is especially practical in offline analysis of time-series data. The metadata information of the sub-tables is stored in the metadata of the original data, which provides a persistent metadata structure through which the data table of the original data manages the sub-table metadata at read time. Read amplification is reduced, because the sub-table holding a given column name can be located directly through the sub-table metadata. A storage format for the sub-table metadata is thus provided, so that the stored sub-tables are managed effectively.
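The "one percent" figure can be checked with a short calculation. The sketch below assumes 10,000 metric columns split into subsets of 100, which are illustrative numbers and not values fixed by this document; `split_stats` is a hypothetical helper.

```python
import math


def split_stats(total_metric_columns: int, columns_per_subtable: int) -> tuple[int, float]:
    """Return (number of sub-tables, share of the metric columns held by one sub-table row)."""
    subtables = math.ceil(total_metric_columns / columns_per_subtable)
    fraction = columns_per_subtable / total_metric_columns
    return subtables, fraction


# Assumed sizes: 10,000 metric columns, 100 per sub-table.
print(split_stats(10_000, 100))  # → (100, 0.01)
```

The fraction counts metric columns only. The index column set and the time column are repeated in every sub-table, so the actual share per row is slightly higher.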

Further, the metric columns of the original data are sorted by the hash of their column names and then split into different metric column subsets according to a set number of metric columns per subset; in this way, metric data with tens of thousands of columns is split into different sub-tables for compressed storage.

Further, the original data comprises an index column set and a time column; the index column set, the time column and one metric column subset are combined into new data, which is written into a sub-table.

Further, after the data is written into a sub-table, a key is extracted according to the set tags, and the data under each extracted key is compressed by timestamp to generate a Compress_Block.

Further, the key and the data block that follows it are written into the KV engine as one key-value pair, and the last timestamp of each Compress_Block is used as the column name.

A data storage device based on time-series data comprises:

a memory, for storing executable instructions; and

a processor, for executing the executable instructions stored in the memory to implement the above method for storing time-series data.

Compared with the prior art, the invention has the following beneficial effects:

(1) In the invention, the data is split by metric column and written into different sub-tables before it is stored, so that a row written to each sub-table holds about one percent of the data in an original row. This reduces the total amount of data compressed at a time, greatly reduces the write overhead, and improves the overall throughput of the time-series database, which is especially practical in offline analysis of time-series data.

(2) The metadata information of the sub-tables is stored in the metadata of the original data, which provides a persistent metadata structure through which the data table of the original data manages the sub-table metadata at read time. Read amplification is reduced, because the sub-table holding a given column name can be located directly through the sub-table metadata. A storage format for the sub-table metadata is thus provided, so that the stored sub-tables are managed effectively.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts, wherein:

FIG. 1 is a diagram of data being split into sub-tables.

FIG. 2 is a diagram of data being written into a sub-table.

FIG. 3 is a diagram of metadata management.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to fig. 1 to 3, the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

Example 1

A method for storing time-series data comprises the following steps: splitting the original data into different data sets and writing the data sets into different sub-tables; and writing the sub-tables into the time-series database. With this scheme, metric data with tens of thousands of columns is split into different sub-tables for compression and storage, which reduces the total amount of data compressed at a time, greatly reduces the write overhead, and improves the overall throughput of the time-series database. Writing a sub-table into the time-series database uses the existing procedure for writing time-series data into a time-series database, which is mature prior art and is therefore not described again here.

Example 2

As shown in FIG. 1, this embodiment builds on embodiment 1: the metric columns of the original data are sorted by the hash of their column names and then split into different metric column subsets according to a set number of metric columns. The original data comprises an index column set and a time column; the index column set, the time column and one metric column subset are combined into new data, which is written into a sub-table. Because the data is split and recombined before being written into the sub-tables and compressed, the amount of data in each row is reduced, and so is the total amount of data compressed at a time.
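The splitting step of this embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: the document does not name a hash function, so MD5 is used here only as a stable stand-in, and the helper names `column_hash`, `split_metric_columns` and `build_subtable_rows` are hypothetical.

```python
import hashlib


def column_hash(name: str) -> str:
    """Stable hash of a column name (MD5 is an assumption; the source names no hash)."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def split_metric_columns(metric_columns: list[str], subset_size: int) -> list[list[str]]:
    """Sort metric columns by the hash of their names, then cut them into fixed-size subsets."""
    ordered = sorted(metric_columns, key=column_hash)
    return [ordered[i:i + subset_size] for i in range(0, len(ordered), subset_size)]


def build_subtable_rows(row: dict, index_columns: list[str], time_column: str,
                        subsets: list[list[str]]) -> list[dict]:
    """Combine the index columns, the time column and one metric subset per sub-table row."""
    shared = {c: row[c] for c in index_columns}
    shared[time_column] = row[time_column]
    return [{**shared, **{m: row[m] for m in subset}} for subset in subsets]
```

Each dictionary returned by `build_subtable_rows` is the row destined for one sub-table; the shared index and time columns are what later allow the subsets to be joined back into the original row.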

Example 3

As shown in FIG. 2, in this embodiment, after the data is written into a sub-table, keys are extracted according to the set tags, and the data under each extracted key is compressed by timestamp to generate a Compress_Block; the key and the data block that follows it are written into the KV engine as one key-value pair, and the last timestamp of each Compress_Block is used as the column name. Data is written into the sub-table by this method.
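One possible reading of this write path is sketched below. The document does not specify the key format, the compression codec, or how many rows go into one Compress_Block, so the `tag=value` key string, `zlib` over JSON, the `rows_per_block` parameter, and the nested `dict` standing in for the KV engine are all assumptions made for illustration.

```python
import json
import zlib
from collections import defaultdict


def extract_key(row: dict, tag_columns: list[str]) -> str:
    """Build a series key from the configured tag columns (format is hypothetical)."""
    return ",".join(f"{t}={row[t]}" for t in tag_columns)


def write_subtable(kv: dict, rows: list[dict], tag_columns: list[str],
                   time_column: str, rows_per_block: int) -> None:
    """Group rows by key, order them by timestamp, and store one Compress_Block per chunk."""
    by_key = defaultdict(list)
    for row in rows:
        by_key[extract_key(row, tag_columns)].append(row)
    for key, group in by_key.items():
        group.sort(key=lambda r: r[time_column])
        for i in range(0, len(group), rows_per_block):
            chunk = group[i:i + rows_per_block]
            compress_block = zlib.compress(json.dumps(chunk).encode("utf-8"))
            last_ts = chunk[-1][time_column]
            # Row key -> {column name (last timestamp of the block) -> Compress_Block}
            kv.setdefault(key, {})[last_ts] = compress_block
```

Since the rows in a block are sorted by time, the column name is the largest timestamp in that block, so a reader can tell from the column name alone that a block ends before a given time.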

Example 4

On the basis of embodiment 1, this embodiment further stores the metadata information of the sub-tables in the metadata of the original data, providing a persistent metadata structure through which the data table of the original data manages the sub-table metadata at read time; by managing the metadata of the sub-tables, the data table manages the sub-tables themselves. The invention thereby provides a storage format for the sub-table metadata, so that the stored sub-tables are managed effectively. For example, suppose sub-table 1 comprises the index column set, the metric columns M1-M100 and the time column; sub-table 2 comprises the index column set, the metric columns M101-M200 and the time column; sub-table 3 comprises the index column set, the metric columns M201-M300 and the time column; and so on. The metadata information of sub-tables 1, 2, 3, … is stored in the metadata of the original data, so the corresponding sub-table can be read directly without traversing the metadata information of the original data. For instance, a read of M232 is directed straight to sub-table 3, which reduces read amplification and saves memory.
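The metadata arrangement in this embodiment might look like the following sketch. The document does not give the actual storage format, so the field names, the JSON serialization, and the classes `SubTableMeta` and `TableMeta` are hypothetical; only the M1-M100 / M101-M200 / M201-M300 layout and the M232 lookup come from the example above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SubTableMeta:
    name: str
    metric_columns: list[str]


@dataclass
class TableMeta:
    name: str
    index_columns: list[str]
    time_column: str
    sub_tables: list[SubTableMeta] = field(default_factory=list)

    def locate(self, metric: str) -> str:
        """Return the name of the sub-table that holds `metric`."""
        index = {m: st.name for st in self.sub_tables for m in st.metric_columns}
        return index[metric]

    def to_json(self) -> str:
        """Serialize the table metadata, sub-table metadata included, for persistence."""
        return json.dumps(asdict(self))


meta = TableMeta(
    name="raw",
    index_columns=["host"],
    time_column="ts",
    sub_tables=[
        SubTableMeta(f"sub_table_{k}", [f"M{i}" for i in range(100 * (k - 1) + 1, 100 * k + 1)])
        for k in (1, 2, 3)
    ],
)
print(meta.locate("M232"))  # → sub_table_3
```

Keeping the sub-table metadata inside the parent table's metadata means a single lookup resolves a column name to its sub-table, so only that sub-table's blocks need to be read.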

Example 5

As shown in FIG. 3, on the basis of embodiment 1, this embodiment provides a data storage device based on time-series data, comprising: a memory, for storing executable instructions; and a processor, for executing the executable instructions stored in the memory to implement the method for storing time-series data.

Example 6

In this embodiment, on the basis of embodiment 1, the stored time-series data is read as follows: first, the relevant sub-table information is obtained from the metadata of the data table; next, the Compress_Blocks that satisfy the query condition are read from the KV engine according to the sub-table metadata; each Compress_Block is then decompressed to obtain a metric column subset; finally, all metric column subsets are combined with the index column set and the time column to restore the original data. The stored data can be read out through this process.
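The read procedure can be sketched as below. The layout is assumed for illustration: `table_meta` is a plain dictionary listing sub-table names, the KV engine is a `dict` keyed by `(sub_table, key)` whose values map a block's last timestamp to its Compress_Block, and blocks are `zlib`-compressed JSON. None of these details, nor the function name `read_original_rows`, are specified in this document.

```python
import json
import zlib


def read_original_rows(table_meta: dict, kv: dict, key: str,
                       start_ts: int, end_ts: int) -> list[dict]:
    """Read matching blocks from every sub-table and merge them back into full rows."""
    merged: dict[int, dict] = {}
    time_column = table_meta["time_column"]
    for sub_table in table_meta["sub_tables"]:
        blocks = kv.get((sub_table, key), {})
        for last_ts, compress_block in blocks.items():
            if last_ts < start_ts:
                continue  # every row in this block is older than the query range
            for row in json.loads(zlib.decompress(compress_block)):
                ts = row[time_column]
                if start_ts <= ts <= end_ts:
                    # Rows from different sub-tables share index and time columns,
                    # so merging by timestamp restores the original wide row.
                    merged.setdefault(ts, {}).update(row)
    return [merged[ts] for ts in sorted(merged)]
```

A query that needs only a few metric columns could restrict the loop to the sub-tables that hold them, which is where the reduction in read amplification would come from.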

In the prior art, time-series data with tens of thousands of columns is compressed strictly by time period and is read and written as a whole, which puts the server under heavy pressure and makes it prone to crashing. The invention splits metric data with tens of thousands of columns into different sub-tables for compressed storage and provides a storage format for the sub-table metadata, so that the stored sub-tables are managed effectively, the pressure on the server is relieved, and read amplification is reduced. The method can be used for monitoring and analysis after data acquisition; it can be deployed on a company's big data monitoring platform, or connected to a user's big data platform as an independent node for time-series data monitoring and analysis.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for storing time series data is characterized by comprising the following steps:

splitting original data into different data sets and writing the data sets into different sub-tables;

storing the metadata information of the sub-table into the metadata of the original data;

writing the sub-tables into the time-series database.

2. The method as claimed in claim 1, wherein the original data metric columns are sorted by column name hash, and then split into different metric column subsets according to a set number of metric columns.

3. The method of claim 2, wherein the original data comprises an index column set and a time column, and wherein the index column set, the time column, and the metric column subset are combined into new data to be written into the sub-table.

4. The method of claim 1, wherein after the data is written into the sub-table, a key is extracted according to a set tag, and the data under the extracted key is compressed by timestamp to generate a Compress_Block.

5. The method of claim 4, wherein the key and the data block that follows it are written into the KV engine as one key-value pair, and the last timestamp of each Compress_Block is used as a column name.

6. A data storage device based on time-series data, characterized by comprising:

a memory, for storing executable instructions; and

a processor, for executing the executable instructions stored in the memory to implement the method for storing time-series data according to any one of claims 1 to 5.