CN114416737B

CN114416737B - Time sequence data storage method based on dynamic weight balance time sequence database cluster

Info

Publication number: CN114416737B
Application number: CN202210002029.8A
Authority: CN
Inventors: 刘涛; 瞿洪桂; 陈文彬; 涂刚
Original assignee: Beijing Sinonet Science and Technology Co Ltd
Current assignee: Beijing Sinonet Science and Technology Co Ltd
Priority date: 2022-01-04
Filing date: 2022-01-04
Publication date: 2022-08-05
Anticipated expiration: 2042-01-04
Also published as: CN114416737A

Abstract

The invention provides a time sequence data storage method based on a dynamic weight balance time sequence database cluster, which comprises the following steps: configuring the time sequence database cluster with a cluster interface, a publish-subscribe messaging system cluster, a reader-writer and m time sequence databases; initializing and generating n virtual buckets when the cluster interface starts, and determining an allocation interval for each virtual bucket; and determining a target time sequence database by a time sequence database selection algorithm based on the dynamic weight balance time sequence database cluster, and storing the data in it. The method is intended for storing data acquired by massive numbers of devices at the cloud end of the Internet of things, and can solve the problems of wasted storage space and difficult horizontal expansion of the cluster caused by serious imbalance in the data stored by each time sequence database in current clusters. With the invention, the data acquired by Internet of things devices can be stored evenly across the time sequence databases in the cluster, thereby improving user experience.

Description

Time sequence data storage method based on dynamic weight balance time sequence database cluster

Technical Field

The invention belongs to the technical field of time sequence data storage, and particularly relates to a time sequence data storage method based on a dynamic weight balance time sequence database cluster.

Background

An Internet of things system needs to store massive amounts of device acquisition data in a time sequence database at the cloud end for query and analysis. influx-proxy, a clustering scheme for the time sequence database influxdb, solves the problem that a stand-alone time sequence database cannot store massive device acquisition data. However, influx-proxy allocates the device acquisition data to each time sequence database in the cluster based on a hash algorithm, which leads to the problem that the amount of data stored by each time sequence database in the cluster is seriously unbalanced.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a time sequence data storage method based on a dynamic weight balance time sequence database cluster, which can effectively solve the problems.

The technical scheme adopted by the invention is as follows:

the invention provides a time sequence data storage method based on a dynamic weight balance time sequence database cluster, which comprises the following steps:

step 1, configuring a cluster interface influx-gate, a publish-subscribe messaging system cluster kafka, a reader-writer influx-writer and m time sequence databases influxdb, wherein the m time sequence databases influxdb are denoted in order as: time sequence database influxdb_1, time sequence database influxdb_2, ..., time sequence database influxdb_m;

step 2, the cluster interface influx-gate stores a distribution table; the distribution table stores the mapping between the global unique ID of every table measurement in the time sequence database influxdb cluster and the address IP of the time sequence database influxdb where that table measurement is currently located;

wherein the table measurement global unique ID is generated as follows: if a table measurement is used for storing time sequence data of a specific type from a specific device, the ID of the specific device and the ID of the specific data type are combined to obtain the global unique ID of the table measurement;

step 3, the cluster interface influx-gate stores the current latest time sequence database residual space statistical table;

the time sequence database residual space statistical table stores the current remaining space S_i of each time sequence database influxdb_i, where i = 1, 2, ..., m; the current remaining spaces of all time sequence databases influxdb are summed to obtain the cluster total remaining space S_total;

step 4, the allocation intervals of the virtual buckets are determined as follows:

step 4.1, when the cluster interface influx-gate starts, it initializes and generates n virtual buckets, denoted in order as: virtual bucket 1, virtual bucket 2, ..., virtual bucket n;

step 4.2, the cluster interface influx-gate generates a virtual bucket global unique ID for each virtual bucket, then calculates the md5 value of the virtual bucket global unique ID and takes the last four bytes of the md5 value as the virtual bucket integer field; thus, the value range of the virtual bucket integer field is: [0, 2^32 - 1];
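A minimal Python sketch of the integer field computation in step 4.2 follows. It assumes the global unique ID is a UTF-8 string and that the last four bytes of the md5 digest are read as a big-endian unsigned integer; the source specifies neither, and the function name `integer_field` is hypothetical. Four bytes always yield a value between 0 and 2^32 - 1.

```python
import hashlib


def integer_field(global_id: str) -> int:
    """Map a global unique ID to an integer in [0, 2**32 - 1]."""
    digest = hashlib.md5(global_id.encode("utf-8")).digest()  # 16 bytes
    # Assumption: the last four bytes are interpreted as big-endian.
    return int.from_bytes(digest[-4:], byteorder="big")


# "bucket-0" is a made-up virtual bucket ID used only for illustration.
field = integer_field("bucket-0")
```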

thus, virtual bucket 1, virtual bucket 2, ..., virtual bucket n each have a corresponding virtual bucket integer field;

step 4.3, sorting the n virtual buckets in ascending order of their virtual bucket integer fields, which yields the sorted sequence of virtual buckets and the correspondingly sorted sequence of virtual bucket integer fields;

step 4.4, for the j-th virtual bucket in the sorted sequence, where j = 1, 2, ..., n, the allocation interval KP_j is obtained from the virtual bucket integer field of that virtual bucket, with one expression for the case j = 1 and another for the case j ≠ 1 (both expressions are formula images that are not reproduced in this text);

thus, the allocation intervals corresponding to the n sorted virtual buckets are: KP_1, KP_2, ..., KP_n;

the lengths of the allocation intervals KP_1, KP_2, ..., KP_n are, in order: F_1, F_2, ..., F_n;

then: the standard deviation of the lengths of the n allocation intervals is smaller than a set threshold, so the lengths of the allocation intervals tend to be equal; in this way, the complete integer space [0, 2^32 - 1] is divided into n allocation intervals;
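The partition of the integer space in step 4 can be sketched as follows. The interval formulas are not reproduced in this text, so the bounds used here are an assumption: the first interval starts at 0, each interval ends at its bucket's integer field, and the last interval is extended to 2^32 - 1 so that the whole space is covered. The names `build_intervals` and `integer_field` and the bucket IDs are hypothetical.

```python
import hashlib

MAX_KEY = 2**32 - 1


def integer_field(global_id: str) -> int:
    # Last four bytes of md5, read as big-endian (byte order is an assumption).
    return int.from_bytes(hashlib.md5(global_id.encode("utf-8")).digest()[-4:], "big")


def build_intervals(bucket_ids):
    """Return [(bucket_id, low, high)] with inclusive bounds covering [0, MAX_KEY]."""
    ordered = sorted(bucket_ids, key=integer_field)   # step 4.3: ascending order
    fields = [integer_field(b) for b in ordered]
    if len(set(fields)) != len(fields):
        raise ValueError("two virtual buckets share the same integer field")
    intervals, low = [], 0
    for idx, bucket_id in enumerate(ordered):
        # Assumed convention: the last interval is stretched to MAX_KEY.
        high = MAX_KEY if idx == len(ordered) - 1 else fields[idx]
        intervals.append((bucket_id, low, high))
        low = high + 1
    return intervals


intervals = build_intervals([f"bucket-{k}" for k in range(100)])
lengths = [high - low + 1 for _, low, high in intervals]  # F_1 ... F_n
```

The lengths computed this way depend on where the md5 values fall, so a deployment that needs near-equal lengths would have to check them against its threshold.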

step 5, when the cluster interface influx-gate receives a write data request, it parses the request to obtain the device ID and the data type ID corresponding to the data to be written, and then combines the device ID and the data type ID to obtain a table measurement global unique ID, denoted: table measurement global unique ID (new);

step 6, the cluster interface influx-gate searches the distribution table of step 2 and judges whether it contains a record of the table measurement global unique ID (new) obtained in step 5; if the record does not exist, step 7 is executed; if the record exists, the address IP of the time sequence database influxdb corresponding to the table measurement global unique ID (new) is obtained and taken as the address IP of the target time sequence database influxdb, and then step 8 is executed;
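The lookup in step 6 can be sketched in Python as a dictionary check with a fallback. This is an illustration only: the distribution table is modeled as a plain `dict`, `select_db` stands in for the selection algorithm of step 7, and the sketch records the new mapping immediately, whereas the method described here updates the distribution table in step 8. All names and addresses are hypothetical.

```python
from typing import Callable, Dict


def route(measurement_id: str,
          distribution_table: Dict[str, str],
          select_db: Callable[[str], str]) -> str:
    """Return the IP of the database that should receive this measurement."""
    ip = distribution_table.get(measurement_id)
    if ip is None:
        # No record yet: fall back to the selection algorithm (step 7).
        ip = select_db(measurement_id)
        # Simplification: record the mapping right away.
        distribution_table[measurement_id] = ip
    return ip


table = {"E:temperature": "10.0.0.1"}
known = route("E:temperature", table, lambda _id: "10.0.0.2")
fresh = route("E:humidity", table, lambda _id: "10.0.0.2")
```

Because existing records always win, a table measurement that has already been placed keeps going to the same database even after the remaining-space weights change.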

step 7, the cluster interface influx-gate uses a time sequence database selection algorithm based on the dynamic weight balance time sequence database cluster to select the target time sequence database influxdb from time sequence database influxdb_1, time sequence database influxdb_2, ..., time sequence database influxdb_m, obtains the address IP of the target time sequence database influxdb, and then executes step 8;

the method comprises the following specific steps:

step 7.1, the cluster interface influx-gate reads the current latest time sequence database residual space statistical table of step 3,

and calculates, for each time sequence database influxdb_i in the cluster, the ratio R_i of its current remaining space S_i to the cluster total remaining space S_total; the ratio R_i is then multiplied by the total number n of virtual buckets to obtain the number of virtual buckets assigned to time sequence database influxdb_i;
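A short Python sketch of the weight calculation in step 7.1 follows. The source multiplies R_i by n but does not say how fractional results are rounded, so the largest-remainder rounding used here is an assumption, as are the function name `bucket_counts` and the example addresses and sizes.

```python
from typing import Dict


def bucket_counts(remaining: Dict[str, float], n: int) -> Dict[str, int]:
    """Split n virtual buckets across databases in proportion to free space."""
    total = sum(remaining.values())            # S_total
    if total <= 0:
        raise ValueError("cluster has no remaining space")
    exact = {ip: n * s / total for ip, s in remaining.items()}  # R_i * n
    counts = {ip: int(value) for ip, value in exact.items()}    # round down
    leftover = n - sum(counts.values())
    # Assumption: hand the leftover buckets to the largest fractional parts.
    by_fraction = sorted(exact, key=lambda ip: exact[ip] - counts[ip], reverse=True)
    for ip in by_fraction[:leftover]:
        counts[ip] += 1
    return counts


# Remaining space in GB for three hypothetical databases, n = 1000 buckets.
counts = bucket_counts({"10.0.0.1": 500, "10.0.0.2": 300, "10.0.0.3": 200}, 1000)
# counts == {"10.0.0.1": 500, "10.0.0.2": 300, "10.0.0.3": 200}
```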

step 7.2, according to the number of virtual buckets assigned to each time sequence database influxdb_i obtained in step 7.1, distributing the n virtual buckets to the m time sequence databases influxdb, thereby obtaining a virtual bucket allocation table;

the virtual bucket allocation table records the mapping among the virtual bucket global unique ID, the allocation interval of the virtual bucket, and the address IP of the time sequence database influxdb to which the virtual bucket belongs;
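One way to build the virtual bucket allocation table of step 7.2 is sketched below. The source does not say which buckets go to which database, so this sketch assumes consecutive buckets in sorted order are handed out database by database; `BucketRow`, `build_allocation_table`, the toy intervals and the addresses are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class BucketRow(NamedTuple):
    bucket_id: str
    low: int      # inclusive lower bound of the allocation interval
    high: int     # inclusive upper bound of the allocation interval
    db_ip: str    # database that owns this virtual bucket


def build_allocation_table(intervals: List[Tuple[str, int, int]],
                           counts: Dict[str, int]) -> List[BucketRow]:
    if sum(counts.values()) != len(intervals):
        raise ValueError("bucket counts must add up to the number of buckets")
    rows, remaining = [], iter(intervals)
    for db_ip, count in counts.items():
        for _ in range(count):
            bucket_id, low, high = next(remaining)
            rows.append(BucketRow(bucket_id, low, high, db_ip))
    return rows


# Four toy buckets that split [0, 2**32 - 1] into equal quarters.
toy_intervals = [("b0", 0, 2**30 - 1), ("b1", 2**30, 2**31 - 1),
                 ("b2", 2**31, 3 * 2**30 - 1), ("b3", 3 * 2**30, 2**32 - 1)]
allocation = build_allocation_table(toy_intervals, {"10.0.0.1": 3, "10.0.0.2": 1})
```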

step 7.3, for the write data request currently being processed, the cluster interface influx-gate calculates the md5 value of the table measurement global unique ID (new) obtained in step 5 and takes the last four bytes of the md5 value as the table integer field x (new);

then, using the table integer field x (new) as the query key, the virtual bucket allocation table established in step 7.2 is searched to find the virtual bucket allocation interval KP (new) that contains the table integer field x (new); the time sequence database influxdb address IP corresponding to the allocation interval KP (new) identifies the target time sequence database influxdb, thereby giving the address IP of the target time sequence database influxdb;
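The search in step 7.3 can be sketched as a binary search over interval upper bounds. The sketch assumes the allocation table rows are sorted, contiguous, and stored as `(bucket_id, low, high, db_ip)` tuples with inclusive bounds; this layout, the function names and the sample rows are hypothetical.

```python
import bisect
import hashlib


def integer_field(global_id: str) -> int:
    # Last four bytes of md5, read as big-endian (byte order is an assumption).
    return int.from_bytes(hashlib.md5(global_id.encode("utf-8")).digest()[-4:], "big")


def find_target(rows, x: int) -> str:
    """Return the db_ip of the row whose interval [low, high] contains x."""
    highs = [row[2] for row in rows]
    idx = bisect.bisect_left(highs, x)   # first interval with high >= x
    if idx == len(rows) or x < rows[idx][1]:
        raise KeyError(f"no allocation interval contains {x}")
    return rows[idx][3]


rows = [("b0", 0, 2**30 - 1, "10.0.0.1"), ("b1", 2**30, 2**31 - 1, "10.0.0.1"),
        ("b2", 2**31, 3 * 2**30 - 1, "10.0.0.1"), ("b3", 3 * 2**30, 2**32 - 1, "10.0.0.2")]
target_ip = find_target(rows, integer_field("E:temperature"))
```

In a long-running gateway the list of upper bounds would normally be built once per allocation table rather than on every lookup.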

step 8, the cluster interface influx-gate packages the data currently to be written, the table measurement global unique ID (new) and the address IP of the target time sequence database influxdb into a data packet and writes the data packet into the publish-subscribe messaging system cluster kafka; the publish-subscribe messaging system cluster kafka and the reader-writer influx-writer then cooperate to complete the process of writing the data into the target time sequence database influxdb;

the cluster interface influx-gate recalculates the current remaining space of the target time sequence database influxdb and the cluster total remaining space S_total, and updates the time sequence database residual space statistical table of step 3 and the distribution table of step 2; it then returns to step 5 to process the next write data request.
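The data packet of step 8 can be illustrated with a small encode and decode pair. The source does not define the packet format, so JSON and the field names `measurement_id`, `target_ip` and `points` are assumptions, and the actual publish to and read from kafka are deliberately left out.

```python
import json


def build_packet(measurement_id: str, target_ip: str, points: list) -> bytes:
    """Gateway side: bundle data, measurement ID and target address."""
    packet = {"measurement_id": measurement_id,
              "target_ip": target_ip,
              "points": points}
    return json.dumps(packet).encode("utf-8")


def parse_packet(raw: bytes) -> dict:
    """Writer side: recover the three parts from the message payload."""
    return json.loads(raw.decode("utf-8"))


raw = build_packet("E:temperature", "10.0.0.1",
                   [{"time": 1641254400, "value": 21.5}])
packet = parse_packet(raw)
```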

Preferably, in step 8, the process in which the publish-subscribe messaging system cluster kafka cooperates with the reader-writer influx-writer to write data into the target time sequence database influxdb is specifically:

step 8.1, the reader-writer influx-writer reads the data packet from the publish-subscribe messaging system cluster kafka and parses it to obtain the data currently to be written, the table measurement global unique ID (new) and the address IP of the target time sequence database influxdb;

step 8.2, the reader-writer influx-writer locates the target time sequence database influxdb according to its address IP, and then sends the data currently to be written and the corresponding table measurement global unique ID (new) to the target time sequence database influxdb;

step 8.3, the target time sequence database influxdb judges whether a table whose global unique ID is the table measurement global unique ID (new) exists in the database;

if the table exists, the data currently to be written is written directly into the table whose global unique ID is the table measurement global unique ID (new);

if the table does not exist, the target time sequence database influxdb creates a new table whose global unique ID is the table measurement global unique ID (new), and then writes the data currently to be written into the newly created table.
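The create-if-missing behaviour of step 8.3 can be sketched with an in-memory stand-in for a time sequence database. `InMemoryDatabase` is purely illustrative and is not an influxdb client; its `write` method returns whether the table had to be created.

```python
from typing import Dict, List


class InMemoryDatabase:
    """Toy stand-in for one target database, keyed by measurement ID."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[float]] = {}

    def write(self, measurement_id: str, points: List[float]) -> bool:
        created = measurement_id not in self.tables
        if created:
            # Table does not exist yet: create it before writing.
            self.tables[measurement_id] = []
        self.tables[measurement_id].extend(points)
        return created


db = InMemoryDatabase()
first = db.write("E:temperature", [21.5])    # creates the table
second = db.write("E:temperature", [21.7])   # appends to the existing table
```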

The time sequence data storage method based on the dynamic weight balance time sequence database cluster has the following advantages:

the method is intended for storing data acquired by massive numbers of devices at the cloud end of the Internet of things, and can solve the problems of wasted storage space and difficult horizontal expansion of the cluster caused by serious imbalance in the data stored by each influxdb in current influxdb clusters. With the invention, the data acquired by Internet of things devices can be stored evenly across the time sequence databases influxdb in the time sequence database cluster, thereby improving user experience.

Drawings

Fig. 1 is a schematic flow chart of a time sequence data storage method based on a dynamic weight balancing time sequence database cluster according to the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides a time sequence data storage method based on a dynamic weight balance time sequence database cluster. With this method, the data collected by Internet of things devices can be stored evenly across the time sequence databases influxdb in the time sequence database cluster. The method is mainly intended for the scenario of storing massive device-collected data at the cloud end of an Internet of things system. Referring to Fig. 1, the method comprises the following steps:

the table measurement global unique ID is generated as follows: if a table measurement is used for storing time sequence data of a specific type from a specific device, the ID of the specific device and the ID of the specific data type are combined to obtain the global unique ID of the table measurement;

for example, the temperature data, humidity data and power data of device E are three types of time sequence data; then the temperature data of device E is stored in one table measurement, whose global unique ID is: the combination of the ID of device E and the temperature type ID;

the humidity data of device E is stored in another table measurement, whose global unique ID is: the combination of the ID of device E and the humidity type ID;

and the power data of device E is stored in a third table measurement, whose global unique ID is: the combination of the ID of device E and the power type ID.
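The three IDs in this example can be sketched in Python as follows. The source only says the device ID and the data type ID are combined; joining them with a `:` separator, the function name `make_measurement_id` and the literal ID strings are assumptions made for illustration.

```python
def make_measurement_id(device_id: str, data_type_id: str, sep: str = ":") -> str:
    """Combine a device ID and a data type ID into one table measurement ID."""
    # Rejecting the separator inside either part keeps the combination unambiguous.
    if sep in device_id or sep in data_type_id:
        raise ValueError("IDs must not contain the separator")
    return f"{device_id}{sep}{data_type_id}"


ids = [make_measurement_id("E", kind) for kind in ("temperature", "humidity", "power")]
# ids == ["E:temperature", "E:humidity", "E:power"]
```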

In the invention, the cluster interface influx-gate serves as the portal of the influxdb cluster and provides interfaces for writing data and querying data to external systems.

the time sequence database residual space statistical table stores the current remaining space S_i of each time sequence database influxdb_i, where i = 1, 2, ..., m; the current remaining spaces of all time sequence databases influxdb are summed to obtain the cluster total remaining space S_total;

Step 4, the allocation intervals of the virtual buckets are determined as follows:

For example, 1000 virtual buckets may be generated.

Step 4.2, the cluster interface influx-gate generates a virtual bucket global unique ID for each virtual bucket, then calculates the md5 value of the virtual bucket global unique ID and takes the last four bytes of the md5 value as the virtual bucket integer field; thus, the value range of the virtual bucket integer field is: [0, 2^32 - 1];

thus, virtual bucket 1, virtual bucket 2, ..., virtual bucket n each have a corresponding virtual bucket integer field; after sorting, the virtual buckets and their virtual bucket integer fields are arranged in ascending order of the integer field;

step 4.4, for the j-th virtual bucket in the sorted sequence, where j = 1, 2, ..., n, the allocation interval KP_j is obtained from the virtual bucket integer field of that virtual bucket, with one expression for the case j = 1 and another for the case j ≠ 1 (both expressions are formula images that are not reproduced in this text);

thus, the n sorted virtual buckets correspond to the allocation intervals KP_1, KP_2, ..., KP_n, whose lengths are, in order: F_1, F_2, ..., F_n;

the method comprises the following specific steps:

calculating, for each time sequence database influxdb_i in the cluster, the ratio R_i of its current remaining space S_i to the cluster total remaining space S_total, and then multiplying the ratio R_i by the total number n of virtual buckets to obtain the number of virtual buckets assigned to time sequence database influxdb_i;

In this step, the publish-subscribe messaging system cluster kafka cooperates with the reader-writer influx-writer to complete the process of writing data into the target time sequence database influxdb, specifically:

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims

1. A time sequence data storage method based on a dynamic weight balance time sequence database cluster is characterized by comprising the following steps:

step 4.1, when the cluster interface influx-gate starts, it initializes and generates n virtual buckets, denoted in order as: virtual bucket 1, virtual bucket 2, ..., virtual bucket n;

thus, virtual bucket 1, virtual bucket 2, ..., virtual bucket n each have a corresponding virtual bucket integer field; after sorting, the virtual buckets and their virtual bucket integer fields are arranged in ascending order of the integer field;

step 4.4, for the j-th virtual bucket in the sorted sequence, the allocation interval KP_j is obtained from the virtual bucket integer field of that virtual bucket, with one expression for the case j = 1 and another for the case j ≠ 1 (both expressions are formula images that are not reproduced in this text);

thus, the n sorted virtual buckets correspond to the allocation intervals KP_1, KP_2, ..., KP_n, whose lengths are, in order: F_1, F_2, ..., F_n;

then: the standard deviation of the lengths of the n allocation intervals is smaller than a set threshold, so the lengths of the allocation intervals tend to be equal; in this way, the complete integer space [0, 2^32 - 1] is divided into n allocation intervals;

step 7, the cluster interface influx-gate uses a time sequence database selection algorithm based on the dynamic weight balance time sequence database cluster to select the target time sequence database influxdb from time sequence database influxdb_1, time sequence database influxdb_2, ..., time sequence database influxdb_m, obtains the address IP of the target time sequence database influxdb, and then executes step 8;

the method comprises the following specific steps:

step 7.2, according to the number of virtual buckets assigned to each time sequence database influxdb_i obtained in step 7.1, distributing the n virtual buckets to the m time sequence databases influxdb, thereby obtaining a virtual bucket allocation table;

the virtual bucket allocation table records the mapping among the virtual bucket global unique ID, the allocation interval of the virtual bucket, and the address IP of the time sequence database influxdb to which the virtual bucket belongs;

2. The time sequence data storage method based on the dynamic weight balance time sequence database cluster as claimed in claim 1, wherein in step 8, the process in which the publish-subscribe messaging system cluster kafka cooperates with the reader-writer influx-writer to write data into the target time sequence database influxdb is specifically: