CN115374240A

CN115374240A - Method and system for optimizing reading performance of time sequence database based on multi-level index

Info

Publication number: CN115374240A
Application number: CN202211007137.0A
Authority: CN
Inventors: 张楠; 姜鑫; 董一舟; 陈立德; 皮丕文
Original assignee: Clp Digital Technology Co ltd; Cetc Digital Technology Group Co ltd
Current assignee: Clp Digital Technology Co ltd; Cetc Digital Technology Group Co ltd
Priority date: 2022-08-22
Filing date: 2022-08-22
Publication date: 2022-11-22

Abstract

The invention provides a method and a system for optimizing the read performance of a time-series database based on a multi-level index, relating to the technical field of database optimization. The method comprises: constructing a secondary index of the TSM files, in which an index of the timestamps of all TSM files is built in a layer outside the TSM files, so that a query with a timestamp condition can locate the storage files relevant to the query condition and obtain the storage addresses and offsets of those files; and constructing a dictionary tree index in the TSI, in which a dictionary tree is constructed and its ordering is used to locate the data that satisfies the condition of a range query. The invention avoids the full-table scan otherwise required when querying massive stored time-series data, quickly determines whether a storage file contains data satisfying a non-timestamp query condition, and provides a solution for efficient querying of Internet of Things time-series data.

Description

Method and system for optimizing reading performance of time sequence database based on multi-level index

Technical Field

The invention relates to the technical field of database optimization, in particular to a time sequence database reading performance optimization method and system based on multi-level indexes.

Background

With the development of the internet of things technology, the application of the internet of things sensor in city management is more and more popular. Hundreds of millions of internet of things sensors collect various data information all the time and report the information to a data management platform. This requires the data management platform to have powerful data access capability and efficient reading capability.

Data generated by Internet of Things devices has the following characteristics: 1. the data volume is large and grows over time, the same dimensions recur, and metric values change smoothly; 2. data is generated and written continuously, and is rarely modified once written; 3. when data is aggregated along different dimensions, there is a clear distinction between hot and cold data, and recent data is generally of greater interest. These characteristics show that data generated by Internet of Things sensing devices is strongly time-ordered.

Conventional relational storage systems such as MySQL and SQL Server have the following problems when processing time-series data: 1. storage cost is high, because time-series data is not compressed well and occupies a large amount of machine resources; 2. maintenance cost is high, because a single-machine system requires manual database and table sharding at the upper layer; 3. write throughput is low, making it difficult to sustain the write load of tens of millions of points that time-series data demands; 4. query performance is poor, particularly for aggregate analysis over massive data. Storing time-series data in big-data ecosystems such as Hadoop and Spark raises further problems: 1. data latency is high, since offline batch processing takes hours or even days; 2. query performance is poor, since indexes cannot be used effectively, queries depend on MapReduce tasks, and query times are generally on the order of minutes.

The emergence of time-series databases effectively addresses the storage of Internet of Things sensing data, with both writes and reads optimized accordingly. The time-series database InfluxDB is at present ranked first among time-series database management systems on DB-Engines. It manages time-series data with a columnar storage file structure: each file stores multiple data blocks, and each data block stores the data of one period in time order. To support fast writes of time-series data, InfluxDB sacrifices some read performance in its design. As a result, when storing large-scale Internet of Things sensing logs, its query efficiency does not meet the needs of the industry.

To ensure data integrity and availability, InfluxDB, like most database products, writes time-series data first to the WAL (Write-Ahead Log), then to the in-memory cache, and finally flushes the data to disk.

Reads of time-series data generally fall into two categories: 1. queries that use the timestamp as the query condition; 2. queries that use a non-timestamp field as the query condition. InfluxDB handles these two types of query as follows.

According to the InfluxDB storage file structure, the index block records the maximum and minimum timestamps of the data in the file, and the data inside the file is ordered by timestamp. For a query that uses the timestamp as the query condition, the index block can therefore be read first to determine whether the file holds data in the query range; if it does, the data is read from the file by binary search. For a query that uses a non-timestamp field as the query condition, the inverted index in the storage file is used: InfluxDB storage files hold an inverted index keyed by measurement and tag, the Hash Index structure quickly locates the position of the data, the seriesIDs in the inverted index are traversed to find the corresponding seriesKeys, and the required data is then queried.

The InfluxDB data reading process has the following problems:

1. When the number of managed time series is large, locating a series is expensive: all metadata must be read and traversed, so query efficiency is low.

2. For queries on non-timestamp fields, the Hash Index is very efficient for equality queries, but when the query condition is a range condition such as > or <, the ordered storage of Tag Values must also be used to find the data that satisfies the condition. This requires the Tag Values to be sorted when the TSI is constructed, which incurs considerable memory and computation overhead.

The invention patent with publication number CN108268546B discloses a method and device for optimizing a database, which comprises the following steps: flexibly configuring functional points to be optimized and analyzed in a database to obtain configuration information; judging whether the current acquisition state of the function point is in a first state according to the configuration information, wherein the first state is used for representing that the acquisition of the function point is not finished; when the current acquisition state is determined to be in the first state, classifying the task types of the functional points to obtain a task classification result; and when the occupation condition of the current resources of the database is detected to be idle, correspondingly selecting a priority processing strategy corresponding to the task classification result so as to execute data acquisition of the function point according to the task queue priority of the task type.

Comparison of technical points: that patent configures the function points to be optimized and analyzed, determines the current state of the database, and classifies the function points accordingly; when resources are idle, it reclassifies the related tasks and adjusts their priorities. It optimizes the data acquisition process, which is equivalent to optimizing data writes. The present application targets time-series databases and optimizes the data reading process, avoiding full-table scans by adding a secondary index.

The invention patent with publication number CN113868251A discloses a global secondary indexing method and device for a distributed database, and the specific implementation scheme is as follows: in response to a received database writing request, acquiring original data to be written, and writing the original data into a distributed database; performing global secondary index processing on the original data written into the distributed database to obtain global secondary index data; and establishing a global secondary index table corresponding to the global secondary index data and the data table main keys in the distributed database, and writing the global secondary index table into the index fragments based on an asynchronous processing mode.

Comparison of technical points: that patent targets the database write path. It builds a global secondary index that maps to the primary keys of the data tables, so that the position at which data is to be written can be found quickly and write latency is reduced. In the present application, data is first written to the log file, then to the in-memory cache, and finally to disk; the write returns to the client as soon as the log file has been written, so write latency is already very low. The index constructed here serves mainly to improve data read performance.

The invention patent with publication number CN103902693B discloses a method for reading optimized memory database T tree index structure, which constructs the data structure of T-T tree: establishing a T-tree index structure according to the existing data, performing insertion operation on the data according to the size N of a node in the T-tree structure, ensuring the data in the node to be ordered, and performing splitting operation to ensure the balance of the tree if the data in one node is full; and performing data query operation in the established T-T tree, wherein the query operation is divided into single-value query and range query.

Comparison of technical points: that method constructs a T-tree index, keeps the data within each node ordered during insertion, and keeps the tree balanced by splitting nodes. On reads, the T-tree structure provides efficient data access, and the cache hit rate is improved by reducing the use of pointers. The present application instead uses a secondary index to improve the efficiency of timestamp queries, locating where data is stored from the index so that full-table scans are avoided during queries; it also optimizes the TSI index with a dictionary tree to speed up range queries on non-timestamp fields, using the ordering of the dictionary tree to locate the storage position of the data quickly.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a method and a system for optimizing the reading performance of a time sequence database based on a multi-level index.

According to the method and the system for optimizing the reading performance of the time sequence database based on the multi-level index, the scheme is as follows:

In a first aspect, a method for optimizing the read performance of a time-series database based on a multi-level index is provided, the method comprising:

constructing a secondary index of the TSM files: building, in a layer outside the TSM files, an index of the timestamps of all TSM files, so that a query with a timestamp condition can locate the storage files relevant to the query condition and obtain the storage addresses and offsets of those files;

constructing a dictionary tree index in the TSI: constructing a dictionary tree and using its ordering to locate the data that satisfies the condition of a range query.

Preferably, the step of constructing a secondary index of the TSM file includes:

step S1.1: designing an index structure: the whole index structure is an array formed by index nodes; the index node comprises: the unique ID of the TSM, the minimum time minTime of the TSM, the maximum time maxTime and the storage position of the TSM in the disk; the nodes in the index array are sorted according to the unique ID of the TSM;

step S1.2: data writing process: the WAL, i.e. the log structure, is written first, and the data is then written to the cache structure in memory; when the data in memory meets the condition for being written to disk, it is first written to the corresponding TSM storage structure on disk according to the original logic, and the unique ID TSM_ID1, minimum timestamp min_time1, maximum timestamp max_time1 and disk storage position offset1 of the TSM are obtained during the write;

step S1.3: query process with the timestamp as the condition, comprising: 1) querying data whose timestamp equals a given value; 2) querying data whose timestamp falls within a given range.

Preferably, said step S1.2 further comprises: searching the outer index array, by binary search, for an index node whose TSMID is TSM_ID1;

if so, updating the maximum time and the minimum time of the index node; if not, a new index node is constructed according to the returned data, and the index node is added to the corresponding position in the array.

Preferably, said step S1.3 comprises:

1) The query condition is that the timestamp equals a given value: let the timestamp value be timestamp1; traverse the index array and, from the min_time and max_time of each index node, determine whether the corresponding TSM holds data satisfying the condition; the TSM storage file corresponding to an index node satisfies the query condition when:

min_time <= timestamp1 <= max_time

directly reading data in a corresponding TSM storage file according to the TSM unique ID information in the index node and the storage position information of the TSM, and returning the data to the client;

2) The query condition is timestamp range:

when the query condition is a timestamp range, let the queried timestamp range be min_time1 to max_time1; traverse the index array, and let the timestamp range of the index node being visited be min_time2 to max_time2; the TSM being visited satisfies the timestamp range query condition when any one of the following conditions holds:

max_time1 > min_time2 && max_time1 < max_time2

min_time1 > min_time2 && min_time1 < max_time2

min_time1 < min_time2 && max_time1 > max_time2

and then reading the data meeting the conditions in the searched TSM file, and returning the data to the client.

Preferably, the step of constructing the index of the dictionary tree in the TSI includes:

step S2.1: designing the dictionary tree: the layer below the root node consists of TagKey nodes, which store the TagKey values present in the database; the TagValues of each TagKey are built with the TagKey node as the parent node, and the child nodes of each TagKey node are kept in dictionary order;

step S2.2: constructing the dictionary tree: after data is written to a TSM file by the writing process, information about the write is returned, including the unique ID of the TSM, TSM_ID1, and the data storage index information seriesIDList; the TagKey and TagValue of the data are returned as well, and the dictionary tree is constructed from the TagKey and TagValue;

step S2.3: querying with a non-timestamp field as the query condition.

Preferably, the constructing the dictionary tree according to TagKey and TagValue in the step S2.2 includes:

searching the child nodes of the root node of the dictionary tree for the TagKey;

if it does not exist, adding a child node to the root node, where the node_id of the child node is the current timestamp, its node_value is the TagKey, and its key_list_address is null;

if it exists, constructing the TagValue path under the TagKey node;

the last node of the TagValue path stores the corresponding data storage index information seriesIDList, that is, key_list_address = seriesIDList.
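As a non-limiting illustration, the construction steps above can be sketched in Python. The sketch assumes one character of the TagValue per node below the TagKey node, and it stores the seriesIDList itself in key_list_address rather than an address; the names `Node`, `child` and `insert` are hypothetical and do not come from the source or from InfluxDB.

```python
import bisect
import time


class Node:
    def __init__(self, node_value):
        self.node_id = time.time_ns()   # "current timestamp", as described above
        self.node_value = node_value
        self.key_list_address = None    # assumption: holds the seriesIDList directly
        self.child_node_list = []       # kept sorted by node_value

    def child(self, value, create=False):
        """Return the child with this value, optionally creating it in sorted position."""
        values = [c.node_value for c in self.child_node_list]
        pos = bisect.bisect_left(values, value)
        if pos < len(values) and values[pos] == value:
            return self.child_node_list[pos]
        if not create:
            return None
        node = Node(value)
        self.child_node_list.insert(pos, node)
        return node


def insert(root, tag_key, tag_value, series_id_list):
    """Add the TagKey node if missing, then build the TagValue path under it."""
    node = root.child(tag_key, create=True)
    for ch in tag_value:
        node = node.child(ch, create=True)
    # The last node of the TagValue path records the seriesIDList.
    if node.key_list_address is None:
        node.key_list_address = []
    node.key_list_address.extend(series_id_list)
```

Because each child is inserted at its sorted position, the children of every node stay in dictionary order without a separate sorting pass over the TagValues.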

Preferably, the query procedure with the non-timestamp as the query condition in step S2.3 includes:

non-timestamp equivalence query:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;

3) Searching the dictionary tree: if a path corresponding to the TagKey and TagValue exists and the key_list_address attribute of the last node of the path is not null, determining the seriesIDList corresponding to the TagKey and TagValue from that key_list_address;

4) Acquiring a seriesKeyList according to the seriesIDList, thereby positioning a specific data storage position;
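As a non-limiting illustration, the equivalence query can be sketched as a walk down the TagKey and TagValue path. The sketch assumes one character per TagValue node, a key_list_address that holds the seriesIDList directly, and a plain dictionary `series_keys` standing in for the mapping from seriesID to seriesKey; all names and the sample series keys are hypothetical.

```python
class Node:
    def __init__(self, node_value=""):
        self.node_value = node_value
        self.key_list_address = None   # assumption: holds the seriesIDList directly
        self.child_node_list = []

    def child(self, value):
        for c in self.child_node_list:
            if c.node_value == value:
                return c
        return None


def add_path(root, tag_key, tag_value, series_id_list):
    """Helper that builds the TagKey node and the TagValue path for the example."""
    node = root
    for part in [tag_key, *tag_value]:
        nxt = node.child(part)
        if nxt is None:
            nxt = Node(part)
            node.child_node_list.append(nxt)
            node.child_node_list.sort(key=lambda c: c.node_value)
        node = nxt
    node.key_list_address = list(series_id_list)


def equality_query(root, tag_key, tag_value, series_keys):
    """Steps 1) to 4): follow the path, read the seriesIDList, map it to seriesKeys."""
    node = root.child(tag_key)
    for ch in tag_value:
        if node is None:
            return []
        node = node.child(ch)
    if node is None or node.key_list_address is None:
        return []   # no such path, or the path does not end a stored TagValue
    return [series_keys[sid] for sid in node.key_list_address]
```

A path that exists only as a prefix of a longer TagValue (for example "eas" when only "east" was written) returns no result, because its last node has a null key_list_address.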

non-timestamp Range queries:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;

3) Searching the dictionary tree: if a path corresponding to the TagKey and TagValue exists, let its last node be node1, then go to step 5);

4) If no path corresponding to the TagKey and TagValue exists in the dictionary tree, find the longest matching prefix of the TagKey and TagValue and let its last node be node2; if the query condition is TagKey < TagValue, search the full paths of the child nodes of node2 for the first node larger than TagValue and let it be node1; if the query condition is TagKey > TagValue, search the full paths of the child nodes of node2 for the last node smaller than TagValue and let it be node1; then go to step 5);

5) If the query condition is TagKey < TagValue, all nodes to the left of node1 in the dictionary tree, together with all nodes on the TagValue path, whose key_list_address is not null are nodes satisfying the condition;

6) If the query condition is TagKey > TagValue, all nodes to the right of node1, together with all child nodes of node1, whose key_list_address is not null are nodes satisfying the condition;

7) Obtaining all seriesKeys from the key_list_address of the nodes satisfying the condition, obtaining the specific storage position of the data, reading the data and returning it to the client.
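As a non-limiting illustration, the range query can be sketched as a single walk down the path of the queried TagValue that collects left-hand (or right-hand) subtrees on the way. This is one possible reading of steps 3) to 6), not a literal transcription: it treats the case where the path is missing by simply stopping at the longest existing prefix. It assumes one character per TagValue node and a key_list_address that holds the seriesIDList directly; all names are hypothetical.

```python
class Node:
    def __init__(self, node_value=""):
        self.node_value = node_value
        self.key_list_address = None   # assumption: holds the seriesIDList directly
        self.child_node_list = []      # kept sorted by node_value

    def child(self, value, create=False):
        for pos, c in enumerate(self.child_node_list):
            if c.node_value == value:
                return c
            if c.node_value > value:
                break
        else:
            pos = len(self.child_node_list)
        if not create:
            return None
        node = Node(value)
        self.child_node_list.insert(pos, node)
        return node


def insert(root, tag_key, tag_value, series_ids):
    node = root.child(tag_key, create=True)
    for ch in tag_value:
        node = node.child(ch, create=True)
    node.key_list_address = (node.key_list_address or []) + list(series_ids)


def collect(node, out):
    """Append the series IDs of a node and of all its descendants."""
    if node.key_list_address:
        out.extend(node.key_list_address)
    for c in node.child_node_list:
        collect(c, out)


def range_query(root, tag_key, op, bound):
    """Series IDs whose TagValue is < bound (op '<') or > bound (op '>')."""
    out = []
    node = root.child(tag_key)
    for ch in bound:
        if node is None:
            return out                              # longest existing prefix reached
        if op == "<" and node.key_list_address:
            out.extend(node.key_list_address)       # a proper prefix sorts before bound
        for c in node.child_node_list:
            if (op == "<" and c.node_value < ch) or (op == ">" and c.node_value > ch):
                collect(c, out)                     # whole subtree left / right of the path
        node = node.child(ch)
    if op == ">" and node is not None:
        for c in node.child_node_list:
            collect(c, out)                         # extensions of bound sort after it
    return out
```

No TagValue is sorted at query time; the order of the children alone decides which subtrees are collected.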

In a second aspect, a system for optimizing the read performance of a time-series database based on a multi-level index is provided, the system comprising:

a module for constructing a secondary index of the TSM files: building, in a layer outside the TSM files, an index of the timestamps of all TSM files, so that a query with a timestamp condition can locate the storage files relevant to the query condition and obtain the storage addresses and offsets of those files;

a module for constructing a dictionary tree index in the TSI: constructing a dictionary tree and using its ordering to locate the data that satisfies the condition of a range query.

Preferably, the module for constructing the secondary index of the TSM files includes:

module M1.1: designing an index structure: the whole index structure is an array formed by index nodes; the index node comprises: the unique ID of the TSM, the minimum time minTime of the TSM, the maximum time maxTime and the storage position of the TSM in the disk; the nodes in the index array are sorted according to the unique ID of the TSM;

module M1.2: data writing process: the WAL, i.e. the log structure, is written first, and the data is then written to the cache structure in memory; when the data in memory meets the condition for being written to disk, it is first written to the corresponding TSM storage structure on disk according to the original logic, and the unique ID TSM_ID1, minimum timestamp min_time1, maximum timestamp max_time1 and disk storage position offset1 of the TSM are obtained during the write;

module M1.3: query process with the timestamp as the condition, comprising: 1) querying data whose timestamp equals a given value; 2) querying data whose timestamp falls within a given range;

the module M1.2 further comprises: searching the outer index array, by binary search, for an index node whose TSMID is TSM_ID1;

if so, updating the maximum time and the minimum time of the index node; if the data does not exist, constructing a new index node according to the returned data, and adding the new index node at the corresponding position in the array;

the module M1.3 comprises:

min_time <= timestamp1 <= max_time

2) The query condition is a timestamp range:

when the query condition is a timestamp range, let the queried timestamp range be min_time1 to max_time1; traverse the index array, and let the timestamp range of the index node being visited be min_time2 to max_time2; the TSM being visited satisfies the timestamp range query condition when any one of the following conditions holds:

max_time1 > min_time2 && max_time1 < max_time2

min_time1 > min_time2 && min_time1 < max_time2

min_time1 < min_time2 && max_time1 > max_time2

Preferably, the module for constructing the dictionary tree index in the TSI includes:

module M2.1: designing the dictionary tree: the layer below the root node consists of TagKey nodes, which store the TagKey values present in the database; the TagValues of each TagKey are built with the TagKey node as the parent node, and the child nodes of each TagKey node are kept in dictionary order;

module M2.2: constructing the dictionary tree: after data is written to a TSM file by the writing process, information about the write is returned, including the unique ID of the TSM, TSM_ID1, and the data storage index information seriesIDList; the TagKey and TagValue of the data are returned as well, and the dictionary tree is constructed from the TagKey and TagValue;

module M2.3: querying with a non-timestamp field as the query condition;

the constructing of the dictionary tree from the TagKey and TagValue in module M2.2 includes:

if it exists, constructing the TagValue path under the TagKey node;

the last node of the TagValue path stores the corresponding data storage index information seriesIDList, that is, key_list_address = seriesIDList;

the query process in the module M2.3 with the non-timestamp as the query condition includes:

non-timestamp equivalence query:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;

3) Searching the dictionary tree: if a path corresponding to the TagKey and TagValue exists and the key_list_address attribute of the last node of the path is not null, determining the seriesIDList corresponding to the TagKey and TagValue from that key_list_address;

non-timestamp Range queries:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;

3) Searching the dictionary tree: if a path corresponding to the TagKey and TagValue exists, let its last node be node1, then go to step 5);

6) If the query condition is TagKey > TagValue, all nodes to the right of node1, together with all child nodes of node1, whose key_list_address is not null are nodes satisfying the condition;

7) Obtaining all seriesKeys from the key_list_address of the nodes satisfying the condition, obtaining the specific storage position of the data, reading the data and returning it to the client.

Compared with the prior art, the invention has the following beneficial effects:

1) The invention can effectively improve the query efficiency of the time sequence database aiming at the timestamp range query statement;

for queries with the timestamp as the query condition: an additional index in the outer layer of InfluxDB records the timestamp interval of each storage file, namely minTime and maxTime; in a query that uses the timestamp as the query condition, the storage files are located through this index, so that a full-table scan is avoided;

2) The invention can effectively improve the query efficiency of the time-series database for non-timestamp range query statements;

for queries with non-timestamp fields as the query condition: a dictionary tree is added to the TSI of InfluxDB to replace the original Hash Index structure, so that range queries can locate all matching data quickly through the dictionary tree without sorting, which avoids the repeated sorting during TSI construction and speeds up index construction;

3) By constructing the dictionary tree, the invention avoids sorting written data by TagValue;

4) The invention can provide a solution for efficient query of large-scale time series data.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a schematic diagram of a data writing process;

FIG. 2 is a schematic diagram illustrating a timestamp range query satisfying condition 1;

FIG. 3 is a schematic diagram illustrating the timestamp range query condition satisfying condition 2;

FIG. 4 is a schematic diagram of a timestamp range query satisfying both conditions 1 and 2;

FIG. 5 is a diagram illustrating a timestamp range query satisfying condition 3;

FIG. 6 is a diagram of a dictionary tree construction;

FIG. 7 is a schematic diagram of a data reading process-non-timestamp equivalence query;

FIG. 8 is a schematic diagram of a data reading process-non-timestamp range query.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.

The embodiment of the invention provides a method for optimizing the reading performance of a time sequence database based on a multi-level index, which aims to solve the technical problems that:

1) Avoiding the full-table scan otherwise required when querying massive stored time-series data.

2) Quickly determining whether a storage file contains data satisfying a non-timestamp query condition.

3) Providing a solution for efficient querying of Internet of Things time-series data.

To this end, a secondary index of the TSM files is constructed to optimize queries that use the timestamp as the query condition, and a dictionary tree index is constructed in the TSI to optimize range queries that use non-timestamp fields as the query condition.

1. The index block stored in each TSM file records the minimum and maximum times of all its data blocks, so a query by time range must scan the time ranges in the index blocks, which means scanning the indexes of many TSM files. Adding a layer of index outside the TSM files that records the time range of each TSM file allows the TSM storage structures that satisfy the condition to be located directly, avoiding a global scan.

2. The original Hash Index in the TSI must be combined with ordered storage of Tag Values to locate data quickly for range-condition queries. Because Tag Values cannot be predicted in advance, the Tag Values in the TSI may be sorted many times, which incurs memory and computation overhead. Constructing a dictionary tree and using its ordering allows the data satisfying a range query to be located quickly.

For the improved TSM and TSI structures, there is a corresponding change in both writing and reading of data. During writing, an index building process needs to be added, and during reading, data can be quickly located according to the built index.

Referring to fig. 1, the present invention specifically includes:

Firstly, the step of constructing the secondary index of the TSM file comprises the following steps:

step S1.1: designing an index structure: the whole index structure is an array formed by index nodes; the index node comprises: the unique ID of the TSM, the minimum time minTime of the TSM, the maximum time maxTime, and the location of the TSM stored in the disk.

{
    TSMID;    // unique ID of the index node (the TSM file it refers to)
    minTime;  // minimum time of the TSM file corresponding to the index node
    maxTime;  // maximum time of the TSM file corresponding to the index node
    offset;   // storage location offset of the TSM file
}

To minimize the impact on write efficiency, the nodes in the index array are sorted by the unique ID of the TSM. When the index is later updated, a binary search on TSMID can therefore quickly locate the index node that needs to be modified.

Step S1.2: data writing process: the WAL, i.e. the log structure, is written first, and the data is then written to the cache structure in memory; when the data in memory meets the condition for being written to disk, it is first written to the corresponding TSM storage structure on disk according to the original logic, and the unique ID TSM_ID1, the minimum timestamp min_time1, the maximum timestamp max_time1 and the disk storage position offset1 of the TSM are obtained during the write.

The outer index array is then searched, by binary search, for an index node whose TSMID is TSM_ID1. If such a node exists, its maximum time and minimum time are updated; if not, a new index node is constructed from the returned data and added at the corresponding position in the array.
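As a non-limiting illustration, the index update of step S1.2 can be sketched in Python. The field names follow the index node structure of step S1.1; `find_slot` and `upsert` are hypothetical names, and widening the stored interval with min/max is an assumption about what "updating" the minimum and maximum time means.

```python
from dataclasses import dataclass


@dataclass
class IndexNode:
    tsm_id: int     # TSMID
    min_time: int   # minTime
    max_time: int   # maxTime
    offset: int     # storage location of the TSM file on disk


def find_slot(index, tsm_id):
    """Binary search: first position whose tsm_id is >= the given tsm_id."""
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        if index[mid].tsm_id < tsm_id:
            lo = mid + 1
        else:
            hi = mid
    return lo


def upsert(index, tsm_id, min_time, max_time, offset):
    """Update the node for tsm_id if present, otherwise insert it in sorted position."""
    pos = find_slot(index, tsm_id)
    if pos < len(index) and index[pos].tsm_id == tsm_id:
        node = index[pos]
        # Assumption: the stored interval is widened to cover the new data.
        node.min_time = min(node.min_time, min_time)
        node.max_time = max(node.max_time, max_time)
    else:
        index.insert(pos, IndexNode(tsm_id, min_time, max_time, offset))
```

Inserting at the position returned by the binary search keeps the array sorted by TSMID, so later updates can use the same search.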

Step S1.3: queries conditioned on a timestamp fall into two categories: 1) queries for data whose timestamp equals a given value; 2) queries for data whose timestamp falls within a given range (a query for timestamps greater than a given value can be converted to the range from that timestamp to positive infinity; similarly, a query for timestamps less than a given value can be converted to the range from negative infinity to that timestamp).
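As a non-limiting illustration, the conversion of a comparison into a time range can be sketched as follows. The function name `to_time_range` and the operator strings are hypothetical, and the sketch does not record whether the end point itself is included.

```python
import math


def to_time_range(op, timestamp):
    """Convert a timestamp comparison into a (min_time, max_time) pair."""
    if op == "=":
        return (timestamp, timestamp)
    if op == ">":
        return (timestamp, math.inf)     # from the timestamp to positive infinity
    if op == "<":
        return (-math.inf, timestamp)    # from negative infinity to the timestamp
    raise ValueError(f"unsupported operator: {op!r}")
```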

The step S1.3 specifically includes: 1) The query condition is that the timestamp is equal to a certain value: when the query condition is that the timestamp is equal to a certain value, setting the timestamp value as timestamp1, traversing the index array, judging whether data meeting the condition exists in the corresponding TSM according to min _ time and max _ time of each index node, and when the following conditions are met, enabling the TSM storage file corresponding to the index to meet the query condition:

min_time <= timestamp1 <= max_time

The data in the corresponding TSM storage file is then read directly, according to the TSM unique ID and the TSM storage position information in the index node, and returned to the client.
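The equality case can be illustrated with a short sketch, assuming the outer index is a list of nodes carrying the fields named in the text. IndexNode and files_for_timestamp are hypothetical names, and reading the TSM file itself is left out.

```python
from typing import List, NamedTuple


class IndexNode(NamedTuple):
    tsm_id: int
    min_time: int
    max_time: int
    offset: int


def files_for_timestamp(index: List[IndexNode], timestamp1: int) -> List[IndexNode]:
    """Traverse the index and keep nodes with min_time <= timestamp1 <= max_time."""
    return [n for n in index if n.min_time <= timestamp1 <= n.max_time]
```

Each returned node carries the tsm_id and offset that a reader would need to open the matching TSM file; TSM files whose time range excludes timestamp1 are never touched.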

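2) The query condition is a timestamp range:

when the query condition is a timestamp range, the queried timestamp range is set as min_time1 and max_time1; the index array is traversed, and the timestamp range of each traversed index node is set as min_time2 and max_time2; when any one of the following conditions is met, the TSM found satisfies the timestamp-range query condition: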

max_time1>min_time2&&max_time1<max_time2

min_time1>min_time2&&min_time1<max_time2

min_time1<min_time2&&max_time1>max_time2

Detailed examples of these range relationships are shown in figs. 2, 3, 4 and 5.
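The three range conditions can be written out as a small predicate. The sketch below is illustrative only: matches_as_stated transcribes the conditions with their strict inequalities, while overlaps_closed is a hypothetical alternative (the usual closed-interval overlap test) that the text does not state. Index nodes are assumed to be (tsm_id, min_time2, max_time2, offset) tuples, and open-ended queries use infinity as described in step S1.3.

```python
import math


def matches_as_stated(min_time1, max_time1, min_time2, max_time2):
    """The three conditions from the text, with their strict inequalities."""
    end_inside = min_time2 < max_time1 < max_time2
    start_inside = min_time2 < min_time1 < max_time2
    covers_file = min_time1 < min_time2 and max_time1 > max_time2
    return end_inside or start_inside or covers_file


def overlaps_closed(min_time1, max_time1, min_time2, max_time2):
    """Hypothetical alternative: closed-interval overlap, boundaries included."""
    return min_time1 <= max_time2 and min_time2 <= max_time1


def files_for_range(index, min_time1=-math.inf, max_time1=math.inf,
                    test=matches_as_stated):
    """Keep the index nodes whose time range passes the chosen test."""
    return [node for node in index
            if test(min_time1, max_time1, node[1], node[2])]
```

With strict inequalities, a query range that coincides exactly with a file's range, or only touches its boundary, passes none of the three conditions; the closed-interval test is one possible way to include such cases if that behaviour is wanted.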

Next, referring to fig. 6, the step of constructing the index of the dictionary tree in the TSI specifically includes:

Queries in a time series database include not only timestamp queries but also a significant proportion of non-timestamp queries. With the inverted index built in the TSI and its existing Hash index, a range query on a non-timestamp condition can be carried out efficiently only when the Tag values are ordered. At write time, however, the Tag values are unpredictable, so keeping them ordered would involve repeated sorting operations. This step therefore designs a dictionary tree to replace the original Hash index and uses the ordering of the dictionary tree to resolve non-timestamp range queries quickly.

Step S2.1: designing a dictionary tree, wherein the Node structure of the nodes of the dictionary tree is as follows:

{

node_id; // unique id of the node

node_value; // value of the node

key_list_address; // if the node is the end point of a TagValue, key_list_address is the address of the corresponding seriesIDList; otherwise key_list_address is empty

child_node_list; // array structure whose elements are Nodes, representing the child nodes of this node; null when there is no child node

}

In the process of constructing the dictionary tree, the layer below the root node consists of TagKey nodes, which store the TagKey values in the database; the TagValues of each TagKey are built with that TagKey node as the parent node, and the child nodes of each TagKey node are kept in dictionary order.

Step S2.2: constructing the dictionary tree: after data is written into the TSM file according to the writing process, the relevant information about the write is returned, including the unique ID of the TSM (TSM_ID1) and the data storage index information seriesIDList; the TagKey and TagValue of the data are returned at the same time, and the dictionary tree is constructed according to the TagKey and the TagValue:

if yes, constructing a TagValue path in the child node of the TagKey;
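The node structure of step S2.1 and the construction of a TagValue path can be sketched as follows. Several details are assumptions made for illustration: each node below the TagKey layer holds one character of the TagValue, an empty list stands in for a null child_node_list, node IDs come from a simple counter, and a repeated write of the same TagValue overwrites key_list_address. The function names are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_next_id = count(1)  # hypothetical source of unique node ids


@dataclass
class Node:
    node_value: str
    node_id: int = field(default_factory=lambda: next(_next_id))
    key_list_address: Optional[list] = None  # seriesIDList at the end of a TagValue
    child_node_list: List["Node"] = field(default_factory=list)


def get_or_add_child(parent, value):
    """Find the child with this value, or insert one at its sorted position."""
    children = parent.child_node_list
    lo, hi = 0, len(children)
    while lo < hi:  # binary search keeps the children in dictionary order
        mid = (lo + hi) // 2
        if children[mid].node_value < value:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(children) and children[lo].node_value == value:
        return children[lo]
    child = Node(value)
    children.insert(lo, child)
    return child


def insert(root, tag_key, tag_value, series_id_list):
    """Build the TagKey node and the TagValue path beneath it."""
    node = get_or_add_child(root, tag_key)  # TagKey layer below the root
    for ch in tag_value:                    # one character per node (assumption)
        node = get_or_add_child(node, ch)
    node.key_list_address = series_id_list  # last node of the TagValue path
    return node
```

Keeping each child list sorted at insertion time is what lets later range queries walk the tree in dictionary order without a separate sort.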

Step S2.3: querying with a non-timestamp query condition.

The method specifically comprises the following steps:

Referring to FIG. 7, non-timestamp equivalence query:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;
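One way the equivalence lookup could proceed once the TagKey and TagValue have been acquired is sketched below. The steps beyond those listed are assumptions: the sketch simply follows the TagKey node and then the TagValue path, one character per node, and returns the stored seriesIDList. The names find_child and equal_query are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_value: str
    key_list_address: Optional[list] = None  # seriesIDList, if any
    child_node_list: List["Node"] = field(default_factory=list)


def find_child(node, value):
    """Return the child whose node_value equals value, or None."""
    for child in node.child_node_list:
        if child.node_value == value:
            return child
    return None


def equal_query(root, tag_key, tag_value):
    """Return the seriesIDList stored for tag_key == tag_value, or None."""
    node = find_child(root, tag_key)  # TagKey layer below the root
    for ch in tag_value:              # one character per node (assumption)
        if node is None:
            return None
        node = find_child(node, ch)
    return None if node is None else node.key_list_address
```

A path that exists only as a prefix of longer TagValues has an empty key_list_address, so the lookup returns None for it.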

Referring to FIG. 8, non-timestamp range query:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;

4) If the dictionary tree has no path corresponding to the TagKey and the TagValue, searching for the longest prefix of the TagKey and the TagValue and setting its last node as node2; if the query condition is TagKey < TagValue, searching the full paths of the child nodes of node2 for the first node larger than the TagValue and setting it as node1 (full path: the full path of a node is the character string composed of the node_value of every node from the child node of the TagKey down to that node); if the query condition is TagKey > TagValue, searching the full paths of the child nodes of node2 for the last node smaller than the TagValue and setting it as node1; then executing step 5);

5) If the query condition is TagKey < TagValue, all nodes to the left of node1 in the dictionary tree, together with the nodes on the TagValue path, whose key_list_address is not empty are the nodes that satisfy the condition;
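The effect of the range query can be illustrated with an ordered traversal of the dictionary tree. Note that this sketch swaps in a simpler technique than the node1/node2 procedure in the text: it walks the sorted children from left to right and prunes subtrees that cannot satisfy the condition. It assumes one character per node and plain string comparison (so "b10" sorts before "b2"); add and range_query are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    node_value: str
    key_list_address: Optional[list] = None
    child_node_list: List["Node"] = field(default_factory=list)


def add(tag_key_node, tag_value, series_ids):
    """Insert a TagValue path under a TagKey node, keeping children sorted."""
    node = tag_key_node
    for ch in tag_value:
        nxt = next((c for c in node.child_node_list if c.node_value == ch), None)
        if nxt is None:
            nxt = Node(ch)
            node.child_node_list.append(nxt)
            node.child_node_list.sort(key=lambda c: c.node_value)
        node = nxt
    node.key_list_address = series_ids


def range_query(tag_key_node, op, bound):
    """Return (TagValue, seriesIDList) pairs with TagValue < bound or > bound."""
    if op not in ("<", ">"):
        raise ValueError("op must be '<' or '>'")

    def walk(node, path):
        for child in node.child_node_list:
            full = path + child.node_value  # full path of this child
            if op == "<":
                if full >= bound:
                    break  # this child and all siblings to its right are too large
                hit = True
            elif full > bound:
                hit = True  # the whole subtree is larger than the bound
            elif bound.startswith(full):
                hit = False  # only deeper nodes can exceed the bound
            else:
                continue  # the whole subtree is smaller than the bound
            if hit and child.key_list_address is not None:
                yield full, child.key_list_address
            yield from walk(child, full)

    return list(walk(tag_key_node, ""))
```

Because the children are stored in dictionary order, results come back sorted and whole subtrees are skipped without being visited, which is the property the text relies on when it replaces the Hash index.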

The embodiment of the invention provides a method and system for optimizing the reading performance of a time sequence database based on a multi-level index. When the logs of large-scale Internet of Things sensing devices are collected and stored, improving the time sequence database with the method described herein raises data query efficiency, so that query results are returned quickly according to the query conditions. In a monitoring system for a large-scale server cluster, the database structure improved by the method allows efficient writing and reading of the performance and operating-condition data that the servers generate in real time. In scenarios requiring aggregation analysis, the functions and algorithms built into the database can call the reading method described herein, thereby improving the analysis speed.

Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.

The foregoing description has described specific embodiments of the present invention. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A time sequence database reading performance optimization method based on multi-level index is characterized by comprising the following steps:

2. The method according to claim 1, wherein the step of constructing a secondary index of the TSM file comprises:

step S1.2: data writing: the data is first written to the WAL, namely a log structure, and then to the cache structure in memory; when the data in memory meets the condition for writing to disk, the data is first written to the corresponding TSM storage structure on disk according to the original logic, and the unique ID of the TSM (TSM_ID1), the minimum timestamp min_time1, the maximum timestamp max_time1 and the disk storage position information offset1 are obtained during the write;

step S1.3: query process conditioned on a timestamp, including: 1) querying data whose timestamp equals a specific value; 2) querying data whose timestamp lies within a certain range.

3. The method for optimizing the read performance of the time-series database based on the multi-level index according to claim 2, wherein the step S1.2 further comprises: searching the outer index array, by binary search, for an index node whose TSMID is TSM_ID1;

if yes, updating the maximum time and the minimum time of the index node; if not, a new index node is constructed according to the returned data, and the index node is added to the corresponding position in the array.

4. The method according to claim 2, characterized in that said step S1.3 comprises:

1) The query condition is that the timestamp equals a certain value: the timestamp value is set as timestamp1, the index array is traversed, and the min_time and max_time of each index node are used to judge whether the corresponding TSM contains data meeting the condition; the TSM storage file corresponding to an index node satisfies the query condition when the following holds:

min_time <= timestamp1 <= max_time

2) The query condition is timestamp range:

when the query condition is a timestamp range, the queried timestamp range is set as min_time1 and max_time1; the index array is traversed, and the timestamp range of each traversed index node is set as min_time2 and max_time2; when any one of the following conditions is met, the TSM found satisfies the timestamp-range query condition:

max_time1>min_time2&&max_time1<max_time2

min_time1>min_time2&&min_time1<max_time2

min_time1<min_time2&&max_time1>max_time2

5. The method for optimizing the read performance of the time-series database based on the multi-level index according to claim 1, wherein the step of constructing the index of the dictionary tree in the TSI comprises the steps of:

step S2.2: constructing the dictionary tree: after data is written into the TSM file according to the writing process, the relevant information about the write is returned, including the unique ID of the TSM (TSM_ID1) and the data storage index information seriesIDList; the TagKey and TagValue of the data are returned at the same time, and the dictionary tree is constructed according to the TagKey and the TagValue;

step S2.3: querying with a non-timestamp query condition.

6. The method for optimizing the read performance of the time-series database based on the multi-level index according to claim 5, wherein the step S2.2 of constructing the dictionary tree according to the TagKey and the TagValue comprises the following steps:

if so, constructing a TagValue path in the child node of the TagKey;

7. The method for optimizing the reading performance of the time-series database based on the multi-level index as claimed in claim 5, wherein the query process with the non-timestamp as the query condition in the step S2.3 comprises:

non-timestamp equivalence query:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;

non-timestamp Range queries:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;

4) If the dictionary tree has no path corresponding to the TagKey and the TagValue, searching for the longest prefix of the TagKey and the TagValue and setting its last node as node2; if the query condition is TagKey < TagValue, searching the full paths of the child nodes of node2 for the first node larger than the TagValue and setting it as node1; if the query condition is TagKey > TagValue, searching the full paths of the child nodes of node2 for the last node smaller than the TagValue and setting it as node1; then executing step 5);

8. A multi-level index based time-series database read performance optimization system, comprising:

a module for constructing a secondary index of the TSM file: constructing, on the outer layer of the TSM, the timestamp indexes of all TSMs; for queries with a timestamp condition, locating the storage files related to the query condition and acquiring the storage addresses and offsets of those files;

9. The multi-level index based time-series database read performance optimization system of claim 8, wherein the module for constructing a secondary index of the TSM file comprises:

the module M1.3 comprises:

min_time <= timestamp1 <= max_time

2) The query condition is timestamp range:

when the query condition is a timestamp range, the queried timestamp range is set as min_time1 and max_time1; the index array is traversed, and the timestamp range of each traversed index node is set as min_time2 and max_time2; when any one of the following conditions is met, the TSM found satisfies the timestamp-range query condition:

max_time1>min_time2&&max_time1<max_time2

min_time1>min_time2&&min_time1<max_time2

min_time1<min_time2&&max_time1>max_time2

10. The multi-level index based time-series database read performance optimization system of claim 8, wherein the module for constructing a secondary index of the TSM file comprises:

module M2.3: querying with a non-timestamp query condition;

constructing a dictionary tree according to the TagKey and the TagValue in the module M2.2 includes:

if so, constructing a TagValue path in the child node of the TagKey;

the last node of the TagValue path needs to store the corresponding data storage index information seriesIDList, that is, key_list_address = seriesIDList;

non-timestamp equivalence query:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;

non-timestamp Range queries:

1) Acquiring a TagKey value of a non-timestamp;

2) Acquiring a corresponding TagValue value;