CN116089414B - Time-series database write performance optimization method and device for mass-data scenarios - Google Patents


Info

Publication number
CN116089414B
CN116089414B (granted publication of application CN202310372120.3A)
Authority
CN
China
Prior art keywords: task, writing, write, data, time
Prior art date
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202310372120.3A
Other languages
Chinese (zh)
Other versions
CN116089414A (en)
Inventor
罗实
陈昱宇
周宇
邵研
张君兰
王永恒
黄宝忠
Current Assignee (the listed assignee may be inaccurate; Google has not performed a legal analysis): Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310372120.3A
Publication of CN116089414A
Application granted
Publication of CN116089414B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a time-series database write performance optimization method and device for mass-data scenarios. The method comprises the following steps: first, separating the read and write thread pools corresponding to database read/write requests based on Apache Calcite, and establishing an independent write-task thread pool; then, sharding the write tasks in the write-task thread pool through a hash policy, and scheduling the sharded tasks according to their load values so that each write task's data is written to a node in the cluster; finally, obtaining a write request from the node's write task, allocating a memory array for the request, dynamically allocating the memory required by the write task under a massive number of time lines, and flushing the write task's data to disk. The invention relieves the blocking of write threads by read threads in the write-heavy, read-light workloads typical of time-series databases, achieves better cluster load balancing, enhances task concurrency, helps reduce the flush frequency when writing massive data, and improves write efficiency.

Description

Time-series database write performance optimization method and device for mass-data scenarios
Technical Field
The invention belongs to the technical field of databases, and in particular relates to a time-series database write performance optimization method and device for mass-data scenarios.
Background
A time-series database (Time Series Database, TSDB) is a specialized database for storing and managing time-series data. It is characterized by write-heavy, read-light workloads, high-concurrency writes, and continuous ingestion of massive data, and is widely applied in the Internet of Things, economics and finance, industrial production, system monitoring, and other fields.
Time-series data is defined as a series of data indexed along the time dimension. Such data is ubiquitous in daily life; stock data is the most intuitive and typical example, as stock prices change continuously over time to form a time series. A time-series database typically involves the following concepts:
a. Table (Metric): corresponds to a table in a relational database and represents a set of similar time-series data;
b. Tag: describes a characteristic of the data source and typically does not change over time;
c. Timestamp: the point in time at which the data was generated;
d. Field: a measurement of the data source, which typically changes continuously over time;
e. Data Point: a single measurement generated by a data source at a given time; databases count writes and queries in units of data points;
f. Time Series: the values of one measurement of a data source over time form a time line; the combination of table and tags uniquely identifies a time line, and computation over time-series data is typically performed along the time-line dimension.
In recent years, with the continuous development and application of 5G networks and Internet of Things technology across industries, collecting and storing large-scale time-series data has become an increasingly troublesome practical problem. Mainstream time-series databases currently improve system capacity by upgrading disk media, expanding cluster scale, optimizing data structures, and the like; continuous technical iteration yields long-term performance gains and gradually matures the systems. In massive time-series data scenarios, however, such optimizations often suffer from long lead times, poor cost-effectiveness, and repeated waste of resources, so exploiting the service characteristics of large-scale time-series data to resolve the performance bottleneck has become a new research direction.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a time-series database write performance optimization method and device for mass-data scenarios.
The aim of the invention is achieved by the following technical scheme. A first aspect of the embodiments of the invention provides a time-series database write performance optimization method for mass-data scenarios, comprising the following steps:
(1) Separating the read and write thread pools corresponding to database read/write requests based on Apache Calcite, and establishing an independent write-task thread pool;
(2) Sharding the write tasks in the write-task thread pool of step (1) through a hash policy, and scheduling the sharded tasks according to their load values, so that each write task's data is written to the corresponding node in the cluster;
(3) Obtaining a write request from the write task of the node written in step (2), allocating a memory array for the write request, dynamically allocating the memory required by the write task under a massive number of time lines, and then flushing the write task's data to disk.
Optionally, the processing flow of Apache Calcite comprises:
(a) Parsing: obtain and parse the SQL statement, converting it into an abstract syntax tree;
(b) Validation: perform syntax checking and semantic analysis; the syntax checking verifies the statement against the metadata information of the database, and the semantic analysis constructs an initial version of the logical plan from the abstract syntax tree and the metadata;
(c) Optimization: optimize the logical plan;
(d) Execution: generate a physical plan based on the optimized logical plan, and execute it.
Optionally, step (1) comprises the following sub-steps:
(1.1) On the first submission of a task, obtain and parse the task's SQL statement based on Apache Calcite and judge whether the statement contains the keyword insert; if so, the task type is a write task; otherwise, it is a read task;
(1.2) If the task type obtained in step (1.1) is a write task, submit the task to the write thread pool; if it is a read task, mark the statement corresponding to the read task to obtain its statementId, store the statementId in a hash set, bind the hash set to the TCP connection, and submit the task to the read thread pool;
(1.3) On subsequent submissions, judge whether the statementId of the current task is in the hash set obtained in step (1.2); if not, the current task is a write task and is submitted to the write thread pool; otherwise, it is a read task and is submitted to the read thread pool. The read and write thread pools of the tasks are thereby separated, and an independent write-task thread pool is established.
Optionally, step (2) comprises the following sub-steps:
(2.1) Obtain the meta information of each write task in the write-task thread pool of step (1), including the number of tags of the write task, the number of fields, and the time length;
(2.2) Compute the corresponding hash value according to the hash policy specified by the user; the hash policies are: compute the hash value from the table name, from the selected tags, or from the table name together with the tags;
(2.3) Shard the write task according to the hash value computed in step (2.2), and put it into the queue to be written, with the shard as the unit;
(2.4) Compute the load value from the meta information of the write task obtained in step (2.1);
(2.5) Schedule the write tasks in the queue of step (2.3) to the node with the smallest load value among the cluster nodes, and dynamically update the load value table;
(2.6) Write the data of the write task to the corresponding node according to the scheduling of step (2.5).
Optionally, the load value is expressed as:

Load = (∏_{i=1}^{n} vᵢ) × m × T

where i indexes a tag, n is the number of tags contained in the sharded task, vᵢ is the number of values of tag i, m is the number of fields, and T is the time length.
Optionally, dynamically updating the load value table in step (2.5) comprises: allocate the sharded write task to the node with the smallest load value among the cluster nodes, and add the shard's load value to that node's load value, thereby dynamically updating the load value table; when the sharded write task finishes writing, subtract the previously accumulated load value from the corresponding node, again dynamically updating the load value table.
Optionally, step (3) comprises the following sub-steps:
(3.1) Obtain a write request from the write task of the node written in step (2), allocate a memory array for the write request, and write the data;
(3.2) Judge whether the memory usage exceeds the memory threshold; if so, compute the effective data ratio, adjust the size of the memory array accordingly, and then flush the written data; otherwise, go to step (3.3);
(3.3) Judge whether the current memory array is full; if so, grow the memory array to twice its current size and then flush the written data; otherwise, go to step (3.4);
(3.4) Judge whether there are other write requests; if so, return to step (3.1); otherwise, the write task ends.
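The decision flow of steps (3.1)-(3.4) can be sketched as a sizing function (the numbers, names, and units below are hypothetical; the patent does not fix the actual thresholds or array layout):

```python
# Sketch of the dynamic memory-array policy: over the memory threshold,
# resize by the effective-data ratio and flush; when the array is full,
# double it; otherwise keep the current size.
def next_array_size(current_size, used, total_memory, memory_threshold):
    """Decide the array size for the next round of writes (hypothetical model)."""
    if total_memory > memory_threshold:
        ratio = used / current_size               # effective data ratio
        return max(1, int(current_size * ratio))  # shrink to fit actual usage, then flush
    if used >= current_size:                      # current array is full
        return current_size * 2                   # grow to twice the current size
    return current_size                           # keep writing into the same array

assert next_array_size(100, 50, 90, 80) == 50    # over threshold: resize by 50% ratio
assert next_array_size(100, 100, 10, 80) == 200  # full: double
assert next_array_size(100, 30, 10, 80) == 100   # neither: unchanged
```

The ratio-based shrink is what avoids flushing arrays that are mostly pre-allocated empty space.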
A second aspect of the embodiments of the invention provides a time-series database write performance optimization device for mass-data scenarios, comprising one or more processors configured to implement the above time-series database write performance optimization method for mass-data scenarios.
A third aspect of the embodiments of the invention provides a computer program comprising computer program instructions which, when executed by a processor, implement the above time-series database write performance optimization method for mass-data scenarios.
A fourth aspect of the embodiments of the invention provides a computer-readable storage medium on which a program is stored which, when executed by a processor, implements the above time-series database write performance optimization method for mass-data scenarios.
The invention has the following beneficial effects. By separating the read and write thread pools, it effectively avoids write failures caused by the thread pool being filled up by large queries, preferentially ensures that data is written successfully under heavy load, and improves database availability, making it especially suitable for write-heavy, read-light scenarios. By letting the user specify the hash policy, written data can be sharded reasonably for different business scenarios, and by defining a dynamic scheduling model based on service load value priority, the cluster fully achieves load balancing, task concurrency is enhanced, and write efficiency is improved. Through dynamic memory allocation, the invention effectively avoids over-provisioned pre-allocated space, improves memory utilization, reduces the flush frequency, and improves write efficiency.
Drawings
FIG. 1 is a read-write thread pool separation flow diagram of the present invention;
FIG. 2 is a flow chart of task scheduling based on traffic load value prioritization in accordance with the present invention;
FIG. 3 is a flow chart of dynamic memory allocation according to the present invention;
FIG. 4 is a flow chart of a method for optimizing the writing performance of a time sequence database based on a mass data scene;
fig. 5 is a schematic structural diagram of a time-series database writing performance optimizing device based on a mass data scene.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the invention. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The present invention will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.
Referring to fig. 4, the time-series database write performance optimization method for mass-data scenarios of the present invention comprises the following steps:
(1) Separating the read and write thread pools corresponding to database read/write requests based on Apache Calcite, and establishing an independent write-task thread pool.
In this embodiment, Apache Calcite is a foundational framework for optimized query processing over heterogeneous data sources. It provides standard SQL (Structured Query Language) support, query optimization, and connectivity to various data sources, while leaving data storage and data management to the engine itself. Because Apache Calcite has a well-designed pluggable architecture, an SQL engine can be built on a subset of its functionality to provide SQL support in a database kernel and complete the development of database products.
The processing flow of Apache Calcite mainly comprises the following four stages:
(a) Parser (parsing) stage: obtain and parse the SQL statement, converting it into an abstract syntax tree.
(b) Validate (verification) stage: perform syntax checking and semantic analysis; syntax checking verifies the statement against the metadata information of the database, and semantic analysis constructs an initial version of the logical plan from the abstract syntax tree and the metadata.
(c) Optimize stage: optimize the logical plan, typically according to rule sets defined by the developer.
(d) Execute stage: generate a physical plan based on the optimized logical plan, and execute it.
Separating the read and write thread pools is an optimization of the SQL engine. When reads and writes share one thread pool, if a user issues a large number of wide-ranging queries in a short time, the threads in the pool may all be occupied by the large queries; in that case both reads and writes become unavailable, data writing is severely blocked, and a large data backlog and many write failures may result.
Typically, when an API (Application Program Interface) accesses a database, read and write tasks can be classified by API name, thereby separating the read and write thread pools of the tasks. In an SQL engine based on Apache Calcite, however, all SQL statements access the database through the same API, so the read and write thread pools cannot be separated by API name.
Therefore, based on Apache Calcite, the read-write thread pool corresponding to the database read-write request is separated, and an independent write task thread pool is built, as shown in FIG. 1, and specifically comprises the following steps:
(1.1) On the first submission of a task, obtain and parse the task's SQL statement based on Apache Calcite and judge whether the statement contains the keyword insert; if so, the task type is a write task; otherwise, it is a read task.
Specifically, when a task is submitted for the first time it enters the Parser stage and carries a complete SQL statement for subsequent syntax checking and semantic analysis, so the task's SQL statement can be obtained directly in the Parser stage. Whether the statement contains the keyword insert distinguishes write threads from read threads: if it does, the task type is a write task; otherwise, it is a read task.
It should be understood that tasks in the Parser, Validate, and Optimize stages all contain complete SQL statements, so the task type can be obtained in any of them according to whether the keyword is present.
(1.2) If the task type obtained in step (1.1) is a write task, submit the task to the write thread pool; if it is a read task, mark the statement corresponding to the read task to obtain its statementId, store the statementId in a hash set, bind the hash set to the TCP connection, and submit the task to the read thread pool.
In this embodiment, the task is submitted to the corresponding thread pool according to its type and executed as required by the subsequent steps. The task types include read tasks and write tasks.
It should be noted that because subsequent submissions carry only a batch of data pairs and lack the key information needed to distinguish read from write operations, the statement corresponding to a read task must be marked and its statementId stored in the hash set, so that subsequent submissions can be recognized as read or write tasks.
(1.3) On subsequent submissions, judge whether the statementId of the current task is in the hash set obtained in step (1.2); if not, the current task is a write task and is submitted to the write thread pool; otherwise, it is a read task and is submitted to the read thread pool. The read and write thread pools of the tasks are thereby separated, and an independent write-task thread pool is established.
After the semantic verification, plan generation, and optimization of the first submission are complete, subsequent submissions only need to reuse the state of the first task and supply new data. For example, the first task submits an SQL statement of the form: insert into table (a, b, c) values (…).
In this case subsequent submissions contain only a batch of data pairs and lack the key information to distinguish read from write operations, so the state to be reused must be marked. Therefore, after distinguishing read and write threads in the Parser stage, a hash set is maintained to store the statementIds of read threads and is bound to the corresponding TCP connection, ensuring that the task type of data injected over that TCP connection can later be confirmed from its statementId.
On subsequent submissions, it suffices to judge in the Execute stage whether the current statementId is in the hash set, thereby distinguishing read and write threads and dispatching the task to the corresponding thread pool. Note that if the current statementId is not in the hash set, the current task is a write task; otherwise, it is a read task.
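The routing logic described above can be sketched in Python (class and method names are illustrative assumptions; the patent's engine is Calcite-based and not shown here):

```python
# Sketch of read/write thread-pool separation: the first submission is
# classified by the "insert" keyword, and the statementIds of read statements
# are kept in a hash set bound to the connection so later submissions can be
# routed without re-parsing the SQL.
from concurrent.futures import ThreadPoolExecutor

write_pool = ThreadPoolExecutor(max_workers=4)  # independent write-task pool
read_pool = ThreadPoolExecutor(max_workers=4)

class Connection:
    def __init__(self):
        self.read_statement_ids = set()  # hash set bound to this TCP connection

    def route(self, statement_id, sql=None):
        if sql is not None:  # first submission: the full SQL statement is available
            if "insert" in sql.lower():
                return write_pool
            self.read_statement_ids.add(statement_id)
            return read_pool
        # subsequent submission: only the statementId is available
        return read_pool if statement_id in self.read_statement_ids else write_pool

conn = Connection()
assert conn.route(1, "INSERT INTO t (a) VALUES (1)") is write_pool
assert conn.route(2, "SELECT a FROM t") is read_pool
assert conn.route(2) is read_pool    # subsequent task routed via the hash set
assert conn.route(1) is write_pool
```

Binding the set to the connection object mirrors the patent's binding of the hash set to the TCP connection.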
(2) Shard the write tasks in the write-task thread pool of step (1) through a hash policy, and schedule the sharded tasks according to their load values, so that each write task's data is written to the corresponding node in the cluster.
Specifically, as shown in fig. 2, this comprises the following steps:
(2.1) Obtain the meta information of each write task in the write-task thread pool of step (1), including the number of tags of the write task, the number of fields, and the time length.
In this embodiment, data in the time-series database is stored and managed along the time-line dimension, and time lines need to be distributed reasonably across the write nodes of the database cluster to achieve load balancing of the distributed time-series database cluster. Here a node is a single server in the cluster.
(2.2) Compute the corresponding hash value according to the hash policy specified by the user. There are three hash policies: compute the hash value from the table name, from the selected tags, or from the table name together with the tags.
In this embodiment, the hash value is computed from the table name by default; the other two policies can be configured by the user according to the characteristics of the business data.
In a typical distributed time-series database design, the system shards time lines by table name (Metric), placing time lines belonging to the same table in the same bucket (storage space) and distributing the buckets across the nodes to achieve distributed storage. This sharding strategy has a drawback: when the write rates of two tables differ greatly, one node becomes very heavily loaded while another stays nearly idle, which clearly fails the load-balancing requirement. Therefore, a policy of hashing shards by tag values (Tags) can be added on top of the original table-name-based policy.
Further, a hash sharding policy based on both the table name and the tag values (Tags) facilitates load balancing of the distributed time-series database.
In this embodiment, the user generally has a clearer understanding of the business data and can designate the most suitable hash policy before writing: sharding by table name, by one or a group of tags identifying a time line, or by a combination of table name and tags. The system then computes the corresponding hash value based on the user's selection.
It should be appreciated that the hash value may be obtained by invoking a hash algorithm.
(2.3) Shard the write task according to the hash value computed in step (2.2), and put it into the queue to be written, with the shard as the unit. The queue uses a first-in-first-out scheduling policy.
In this embodiment, sharding the write task by the hash value computed in step (2.2) makes the split more reasonable and helps balance the cluster load, and putting the write task into the queue in shard units makes it convenient to schedule the tasks first-in-first-out. For example, suppose a monitoring data table and an air humidity table are created for two machine rooms, with monitoring data collected every 5 seconds and air humidity collected every half hour. If the data is sharded by table name, the node holding the monitoring data table is clearly loaded far more heavily than the node holding the air humidity table, and resources cannot be fully utilized. If instead the data is sharded by machine-room id, the data of machine room 1 and machine room 2 can be written separately, yielding nodes with almost equal load and achieving load balancing of the cluster.
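The three user-selectable hash policies can be sketched as follows (the function, its parameters, and the use of MD5 are assumptions for illustration; the patent only requires some stable hash algorithm):

```python
# Sketch of the three hash policies: by table name, by selected tags,
# or by table name together with tags.
import hashlib

def shard_id(metric, tags, policy="metric", selected=None, num_shards=8):
    """Compute a shard id under one of the three hash policies."""
    tag_part = "|".join(f"{k}={tags[k]}" for k in sorted(selected or tags))
    if policy == "metric":
        key = metric
    elif policy == "tags":
        key = tag_part
    elif policy == "metric+tags":
        key = metric + "|" + tag_part
    else:
        raise ValueError(f"unknown policy: {policy}")
    # A stable (non-randomized) hash so every node computes the same shard.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Sharding by machine-room id keeps each room's writes together across tables:
s1 = shard_id("monitor", {"room": "1"}, policy="tags", selected=["room"])
s2 = shard_id("humidity", {"room": "1"}, policy="tags", selected=["room"])
assert s1 == s2  # same room, same shard, even for different tables
```

Python's built-in `hash()` is randomized per process, which is why a deterministic digest is used here for cluster-wide agreement.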
(2.4) Compute the load value from the meta information of the write task obtained in step (2.1).
In this embodiment, the load value reflects how heavily a node is loaded with tasks during system operation: the higher a node's load value over a period of time, the larger the volume of write tasks it must handle and the busier it is.
Further, the load value is given by:

Load = TagComplexity × m × T

where m is the number of fields and T is the time length. The tag complexity is the product of the numbers of values of the tags contained in the sharded task:

TagComplexity = ∏_{i=1}^{n} vᵢ

where i indexes a tag, n is the number of tags contained in the sharded task, and vᵢ is the number of values of tag i.
For example, suppose a shard has two tags: region, with the two values 1 and 2, and machine room, with the three values 1, 2, and 3. The tag complexity of the shard is then 2 × 3 = 6. The number of tag values is not fixed: this shard covers regions 1 and 2, while the next shard may cover regions 3, 4, and 5, so the tag complexity differs from shard to shard. Suppose further that the shard has two fields, temperature and humidity, and records 30 s of data; the load value of the shard is then 6 × 2 × 30 = 360.
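The worked example above can be reproduced with a short calculation (function names are illustrative):

```python
# Load value = tag complexity * number of fields * time length,
# where tag complexity is the product of each tag's number of values.
from math import prod

def tag_complexity(values_per_tag):
    """Product of the numbers of values of the tags in the shard."""
    return prod(values_per_tag)

def load_value(values_per_tag, num_fields, duration_seconds):
    return tag_complexity(values_per_tag) * num_fields * duration_seconds

assert tag_complexity([2, 3]) == 6        # region: 2 values, machine room: 3 values
assert load_value([2, 3], 2, 30) == 360   # temperature + humidity, 30 s of data
```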
(2.5) Schedule the write tasks in the queue of step (2.3) to the node with the smallest load value among the cluster nodes, and dynamically update the load value table.
In this embodiment, after the load value of a shard is computed, the dynamic load value table of the cluster nodes (each node's load value is initially 0) is queried to find the node with the smallest load; the shard's task is allocated to that node, the shard's load value is added to the node's load value, and the load value table is updated. When the shard's write finishes, the accumulated load value is subtracted from the corresponding node and the table is likewise updated. This scheduling method fully accounts for the business complexity of large-scale time-series tasks in a distributed setting, further balancing the cluster load, enhancing task concurrency, and improving write efficiency.
(2.6) Writing the data of the write task into the corresponding node according to the scheduling mode in step (2.5).
Specifically, according to the scheduling manner of step (2.5), the data of the write task is written to the corresponding node, that is, to a single server in the corresponding cluster.
(3) Performing the write operation of the scheduled task on a specific node: acquiring a write request according to the write task of the write node in step (2), allocating a memory array for the write request, dynamically allocating the memory required by the write task based on the massive-timeline scenario, and then flushing the data of the write task to disk.
In this embodiment, dynamic memory allocation is an optimization for scenarios with many timelines. When there are many timelines, a large amount of meta information must be stored, so the memory limit is easily exceeded and a flush to disk is triggered; the effective data persisted by each flush is therefore very limited, and the next flush again has to regenerate a large amount of meta information, greatly reducing write efficiency. Memory therefore needs to be allocated dynamically based on the massive number of timelines.
It should be understood that field values are normally stored in pre-allocated fixed-size arrays, so in a scenario with many timelines the arrays storing field values occupy a relatively large share of memory, the memory threshold is reached quickly and a flush is triggered, and arrays that have received no data often leave large amounts of blank memory reserved for field values, causing serious resource waste.
In this embodiment, dynamic memory optimization reduces the proportion of empty memory by dynamically adjusting the size of the pre-allocated field-value arrays, avoiding resource waste. It should be understood that simply raising the memory threshold until it fits the current number of timelines would also avoid the waste, but this treats the symptom rather than the cause: on the one hand, the number of timelines is likely to grow further, making repeated threshold increases time-consuming and laborious; on the other hand, a larger threshold increases the CPU and memory load and degrades performance, so that approach is not very feasible.
Preferably, the memory is dynamically optimized by dynamically adjusting the memory size of the pre-allocated field value array.
This embodiment optimizes data memory allocation as follows. Before each flush, the sizes of the valid data in the data blocks are obtained and summed; the sum is divided by the number of arrays to give the average valid data per array, and that average is rounded up to a multiple of 8 to obtain a reference value. When the reference value is smaller than the current array size, the reference value is used as the memory size pre-allocated the next time an array is generated, and the corresponding parameter is updated. When the reference value equals the current array size, every field array can be written full, so the array size can reasonably be enlarged: twice the reference value is used as the next pre-allocated size, and the parameter is updated. The pre-allocated memory size eventually stabilizes at a fixed value, reducing the flush frequency and improving write efficiency.
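The sizing rule just described can be sketched as follows; the function name, parameters, and byte-based units are assumptions for illustration:

```python
import math

def next_prealloc_size(valid_sizes, num_arrays, current_size):
    """Compute the memory size pre-allocated the next time an array is generated.

    valid_sizes: valid-data sizes of the data blocks before a flush.
    The average valid data per array is rounded up to a multiple of 8 to
    obtain the reference value; shrink to it when it is below the current
    size, and double it when the arrays were written full (value equal).
    """
    average = sum(valid_sizes) / num_arrays
    reference = math.ceil(average / 8) * 8
    if reference < current_size:
        return reference      # shrink the next pre-allocation
    return 2 * reference      # arrays were written full: enlarge

print(next_prealloc_size([40, 24], 8, 128))  # -> 8  (shrink from 128 B)
print(next_prealloc_size([64, 64], 16, 8))   # -> 16 (arrays full, double)
```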
Specifically, as shown in fig. 3, the method comprises the following steps:
(3.1) Acquiring a write request according to the write task of the write node in step (2), allocating a memory array for the write request, and writing data.
(3.2) Judging whether the memory usage exceeds the memory threshold; if so, calculating the valid-data ratio, adjusting the memory array size according to that ratio, and then flushing the written data to disk; otherwise, going to step (3.3).
It should be understood that when memory usage exceeds the preset memory threshold, the memory array size can be adjusted according to the valid-data ratio, the adjusted size is used as the memory pre-allocated the next time an array is generated, and the data is then flushed so that it is actually written to disk.
(3.3) Judging whether the current memory array is full; if so, enlarging the memory array size to twice the current size, and then flushing the written data to disk; otherwise, going to step (3.4).
It should be understood that when the memory array can be written full, the array size can reasonably be enlarged: twice the current memory array size is used as the size pre-allocated the next time an array is generated, and the data is then flushed so that it is actually written to disk.
(3.4) Judging whether other write requests exist; if so, returning to step (3.1); otherwise, the write task ends.
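The per-request decision in steps (3.2)–(3.4) can be summarized in a short sketch; the function and action names are illustrative assumptions:

```python
def handle_write_request(mem_used, mem_threshold, array_full):
    """Decide what to do after writing one request's data into its array."""
    if mem_used > mem_threshold:
        # Step (3.2): resize arrays by the valid-data ratio, then flush.
        return "adjust-by-valid-ratio-and-flush"
    if array_full:
        # Step (3.3): double the array size for the next generation, then flush.
        return "double-and-flush"
    # Step (3.4): nothing to flush yet; take the next write request.
    return "continue"

print(handle_write_request(70 * 2**20, 64 * 2**20, False))
# -> adjust-by-valid-ratio-and-flush
```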
For example, suppose there are 1,000,000 fields, the memory threshold is 64 MB, and the original array size is 128 B. Before data is written, the memory is checked for an array already allocated to the current field; if one exists, the data is written into it, otherwise a new array is generated. Under the original allocation method, because the arrays are large, the memory to be pre-allocated reaches 128 B × 1,000,000 ≈ 128 MB, far above the memory threshold, so a flush is inevitably triggered before all arrays are allocated; at that point the arrays that have been allocated often hold only their first data point, causing serious space waste. Under the optimized dynamic allocation method, the array size after the first adjustment is only 8 B and the memory to be pre-allocated is about 8 MB, which does not reach the threshold; a flush is triggered only after the arrays fill up, so the memory space is fully utilized. Since the arrays are written full, the enlargement condition is met and the size is adjusted to 16 B the second time, and so on, so that the pre-allocated memory approaches the memory threshold and the memory space is used most effectively.
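The arithmetic in this example can be checked directly, using the sizes stated above:

```python
fields = 1_000_000            # 1,000,000 fields in the example
threshold = 64 * 2**20        # 64 MB memory threshold
original = 128 * fields       # original 128 B arrays: ~128 MB pre-allocated
adjusted = 8 * fields         # 8 B arrays after the first adjustment: ~8 MB

print(original > threshold)   # -> True: flush fires before arrays can fill
print(adjusted < threshold)   # -> True: arrays can fill before any flush
```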
In this embodiment, disk-flush behavior in timeline scenarios of different scales was measured and compared under the two allocation methods before and after optimization. Experimental environment: a single TSDB node configured with a 4-core CPU, 16 GB of memory, and a 480 GB disk. The experiment writes 100 million time-series data points into the database over 100,000 / 400,000 / 700,000 timelines respectively and counts the average flush time, the number of flushes, and the total time. Each timeline has 3 tags and 3 measurement values, the 3 measurement values being of data types BIGINT, DOUBLE, and VARCHAR respectively. The experimental results are shown in Table 1.
Table 1: comparison of brushing conditions in different scale timeline scenes
As can be seen from Table 1, in scenarios with few timelines the flush times of the present method are similar to those of the original design; as the number of timelines grows, the average time between flushes under the present method becomes markedly longer than under the original design, that is, flushes occur less often, improving write efficiency and effectively relieving data write pressure in massive-timeline scenarios.
Corresponding to the embodiment of the time sequence database writing performance optimizing method based on the mass data scene, the invention also provides an embodiment of the time sequence database writing performance optimizing device based on the mass data scene.
Referring to fig. 5, the device for optimizing the writing performance of a time-series database based on a mass data scene provided by the embodiment of the invention includes one or more processors configured to implement the method for optimizing the writing performance of a time-series database based on a mass data scene in the above embodiment.
The embodiment of the device for optimizing the writing performance of a time-series database based on a mass data scene can be applied to any device with data processing capability, such as a computer. The device embodiment may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the device with data processing capability reading corresponding computer program instructions from nonvolatile memory into memory. In terms of hardware, fig. 5 shows a hardware structure diagram of a device with data processing capability on which the time-series database write performance optimizing device based on a mass data scene is located; besides the processor, memory, network interface, and nonvolatile memory shown in fig. 5, the device in the embodiment generally also includes other hardware according to its actual function, which is not described here.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
Since the device embodiments essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The embodiment of the invention also provides a computer program comprising computer program instructions which, when executed by a processor, implement the method for optimizing the writing performance of a time-series database based on a mass data scene in the above embodiment.
The embodiment of the invention also provides a computer readable storage medium, on which a program is stored, which when executed by a processor, implements the method for optimizing the writing performance of a time sequence database based on a mass data scene in the embodiment.
The computer-readable storage medium may be an internal storage unit, such as a hard disk or memory, of any of the devices with data processing capability described in the previous embodiments. It may also be an external storage device of such a device, for example a plug-in hard disk, Smart Media Card (SMC), SD card, or Flash memory card (Flash Card) provided on the device. Further, the computer-readable storage medium may include both the internal storage unit and an external storage device of the device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. The time sequence database writing performance optimization method based on the mass data scene is characterized by comprising the following steps of:
(1) Separating read-write thread pools corresponding to database read-write requests based on Apache Calcite, and establishing independent write task thread pools;
(2) Fragmenting the write task in the write task thread pool in the step (1) through a hash strategy, and scheduling the fragmented task according to a load value so as to write the data of the write task into a corresponding node in the cluster;
(3) Acquiring a write request according to the write task of the write node in step (2), allocating a memory array for the write request, dynamically allocating the memory required by the write task based on massive timelines, and then flushing the data of the write task to disk;
said step (1) comprises the sub-steps of:
(1.1) submitting a task for the first time, acquiring and analyzing an SQL statement of the task based on Apache Calcite, judging whether the SQL statement contains a keyword insert, and if so, determining the task type as a writing task; otherwise, the task type is a reading task;
(1.2) if the task type acquired in the step (1.1) is a writing task, submitting the task to a writing thread pool; if the task type obtained in the step (1.1) is a read task, marking a statement corresponding to the read task to obtain a statement id corresponding to the read task, storing the statement id into a hash set, binding the hash set to a TCP connection, and submitting the task to a read thread pool;
(1.3) submitting a subsequent task, and judging whether the statement id corresponding to the current task is in the hash set obtained in step (1.2); if not, the current task is a write task and is submitted to the write thread pool; otherwise, the current task is a read task and is submitted to the read thread pool; thereby separating the read and write thread pools corresponding to the tasks and establishing an independent write task thread pool.
2. The method for optimizing writing performance of a time series database based on a mass data scenario as claimed in claim 1, wherein said step (2) comprises the sub-steps of:
(2.1) acquiring meta information of the writing task according to the writing task in the writing task thread pool in the step (1), wherein the meta information comprises the number of labels of the writing task, the number of measured values and the time length;
(2.2) calculating a corresponding hash value according to the hash strategy specified by the user; the hash policy includes: calculating a hash value according to the table name, calculating a hash value according to the selected tag, and calculating a hash value according to the table name and the tag;
(2.3) slicing the writing task according to the hash value calculated in the step (2.2), and putting the writing task into a queue to be written in a slicing unit;
(2.4) calculating a load value according to the meta information of the write task obtained in the step (2.1);
(2.5) scheduling the writing task to be written in the queue in the step (2.3) to a node with the minimum load value in the cluster nodes, and dynamically updating a load value table;
(2.6) writing the data of the write task into the corresponding node according to the scheduling mode in the step (2.5).
3. The method for optimizing writing performance of a time-series database based on a mass data scenario as claimed in claim 2, wherein the expression of the load value is:
load value = tag complexity × number of measurement values × time length
wherein the tag complexity is the product of the numbers of values of the tags contained in the slicing task, tag complexity = v_1 × v_2 × … × v_n, where i represents a tag, n represents the number of tags included in the slicing task, and v_i is the number of values of tag i.
4. The method for optimizing writing performance of a time-series database based on a mass data scenario of claim 2, wherein said dynamically updating the load value table in step (2.5) comprises: distributing the writing task of the fragment to a node with the smallest load value in the cluster nodes, and accumulating the load value to the load value of the node so as to dynamically update a load value table; when the writing of the writing task of the slice is finished, the accumulated load value of the corresponding node is subtracted to dynamically update the load value table.
5. The method for optimizing writing performance of a time series database based on a mass data scenario as claimed in claim 1, wherein said step (3) comprises the sub-steps of:
(3.1) acquiring a write request according to the write task of the write node in the step (2), distributing a memory array for the write request, and writing data;
(3.2) judging whether the memory usage exceeds a memory threshold; if so, calculating the valid-data ratio, adjusting the size of the memory array according to the valid-data ratio, and then flushing the written data to disk; otherwise, going to step (3.3);
(3.3) judging whether the current memory array is full; if so, enlarging the size of the memory array to twice the current size, and then flushing the written data to disk; otherwise, going to step (3.4);
(3.4) judging whether other write requests exist, and if so, returning to step (3.1); otherwise, the write task ends.
6. A time-series database writing performance optimizing device based on mass data scenes, which is characterized by comprising one or more processors for realizing the time-series database writing performance optimizing method based on mass data scenes according to any one of claims 1-5.
7. A computer program comprising computer program instructions which, when executed by a processor, implement the mass data scenario-based time-series database write performance optimization method of any one of claims 1-5.
8. A computer readable storage medium, having stored thereon a program which, when executed by a processor, is adapted to carry out the method of optimizing writing performance of a time-series database based on a mass data scenario as claimed in any one of claims 1 to 5.
CN202310372120.3A 2023-04-10 2023-04-10 Time sequence database writing performance optimization method and device based on mass data scene Active CN116089414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310372120.3A CN116089414B (en) 2023-04-10 2023-04-10 Time sequence database writing performance optimization method and device based on mass data scene

Publications (2)

Publication Number Publication Date
CN116089414A CN116089414A (en) 2023-05-09
CN116089414B true CN116089414B (en) 2023-09-08

Family

ID=86199467

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310372120.3A Active CN116089414B (en) 2023-04-10 2023-04-10 Time sequence database writing performance optimization method and device based on mass data scene

Country Status (1)

Country Link
CN (1) CN116089414B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116992065B (en) * 2023-09-26 2024-01-12 之江实验室 Graph database data importing method, system, electronic equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108075989A (en) * 2016-11-11 2018-05-25 南京南瑞继保电气有限公司 A kind of load balancing network middleware implementation method based on extensible protocol
CN112134909A (en) * 2019-06-24 2020-12-25 同方威视科技江苏有限公司 Time sequence data processing method, device, system, server and readable storage medium
CN113778689A (en) * 2021-09-22 2021-12-10 重庆允成互联网科技有限公司 Distributed timed task operation method, system, equipment and storage medium
CN115129438A (en) * 2021-03-26 2022-09-30 北京沃东天骏信息技术有限公司 Method and device for task distributed scheduling
CN115525631A (en) * 2022-10-31 2022-12-27 华润数字科技有限公司 Database data migration method, device, equipment and storage medium
WO2023279801A1 (en) * 2021-07-08 2023-01-12 华为云计算技术有限公司 Shard adjustment method and apparatus for time series database, device, and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Distributed framework design of Apache IoTDB; Li Tian'an; Huang Xiangdong; Wang Jianmin; Mao Dongfang; Xu Yi; Yuan Jun; SCIENTIA SINICA Informationis (Issue 05); 5-20 *

Also Published As

Publication number Publication date
CN116089414A (en) 2023-05-09

Similar Documents

Publication Publication Date Title
US10073888B1 (en) Adjusting partitioning policies of a database system in view of storage reconfiguration
US20170083573A1 (en) Multi-query optimization
US10712967B2 (en) Transferring data between memories utilizing logical block addresses
CN110166282B (en) Resource allocation method, device, computer equipment and storage medium
US10922316B2 (en) Using computing resources to perform database queries according to a dynamically determined query size
CN103942342A (en) Memory database OLTP and OLAP concurrency query optimization method
US20220164345A1 (en) Managed query execution platform, and methods thereof
CN112287182A (en) Graph data storage and processing method and device and computer storage medium
US20070250517A1 (en) Method and Apparatus for Autonomically Maintaining Latent Auxiliary Database Structures for Use in Executing Database Queries
US11709835B2 (en) Re-ordered processing of read requests
CN116089414B (en) Time sequence database writing performance optimization method and device based on mass data scene
CN108874930A (en) File attribute information statistical method, device, system, equipment and storage medium
El Alami et al. Supply of a key value database redis in-memory by data from a relational database
US11645273B2 (en) Query execution utilizing probabilistic indexing
US20240004858A1 (en) Implementing different secondary indexing schemes for different segments stored via a database system
CN116756150B (en) Mpp database large table association acceleration method
US20220121711A1 (en) Delaying exceptions in query execution
Elmeiligy et al. An efficient parallel indexing structure for multi-dimensional big data using spark
US8660974B2 (en) Inference over semantic network with some links omitted from indexes
US20220215021A1 (en) Data Query Method and Apparatus, Computing Device, and Storage Medium
US20170031982A1 (en) Maintaining Performance in the Presence of Insertions, Deletions, and Streaming Queries
US20170031909A1 (en) Locality-sensitive hashing for algebraic expressions
CN110928839A (en) Storage method and system of international freight rate data
CN117689451B (en) Flink-based stream vector search method, device and system
JP2016184272A (en) Database management system, method of buffering the same, and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant