CN114416737B - Time sequence data storage method based on dynamic weight balance time sequence database cluster - Google Patents


Info

Publication number
CN114416737B
Authority
CN
China
Prior art keywords
time sequence
sequence database
virtual
cluster
influxdb
Prior art date
Legal status
Active
Application number
CN202210002029.8A
Other languages
Chinese (zh)
Other versions
CN114416737A (en)
Inventor
刘涛
瞿洪桂
陈文彬
涂刚
Current Assignee
Beijing Sinonet Science and Technology Co Ltd
Original Assignee
Beijing Sinonet Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinonet Science and Technology Co Ltd
Priority to CN202210002029.8A
Publication of CN114416737A
Application granted
Publication of CN114416737B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a time-series data storage method based on a dynamic-weight-balanced time-series database cluster. The method comprises the following steps. The time-series database cluster is provided with a cluster interface, a publish/subscribe messaging system cluster, a reader-writer, and m time-series databases. When the cluster interface starts, n virtual buckets are initialized and an allocation interval is determined for each virtual bucket. A time-series database selection algorithm based on the dynamic-weight-balanced cluster then determines the target time-series database, and the data is stored there. The method is intended for storing data collected by massive numbers of devices in the cloud of an Internet of Things system. It addresses two problems in current clusters: wasted storage space, and difficulty scaling the cluster horizontally. Both are caused by the severe imbalance of the data stored across the time-series databases. With the invention, data collected by Internet of Things devices can be stored evenly across the time-series databases of the cluster, which improves the user experience.

Description

Time sequence data storage method based on dynamic weight balance time sequence database cluster
Technical Field
The invention belongs to the technical field of time-series data storage, and in particular relates to a time-series data storage method based on a dynamic-weight-balanced time-series database cluster.
Background
An Internet of Things system needs a time-series database in the cloud to store massive amounts of device-collected data for querying and analysis. influx-proxy, a clustering scheme for the time-series database influxdb, solves the problem that a single-instance influxdb cannot store such massive amounts of device data. However, influx-proxy uses a hash algorithm to assign the device data to the time-series databases of the cluster. This leads to a severe imbalance in the amount of data stored by each time-series database in the cluster.
Disclosure of Invention
To address these defects of the prior art, the invention provides a time-series data storage method based on a dynamic-weight-balanced time-series database cluster, which effectively solves the problems described above.
The technical scheme adopted by the invention is as follows:
The invention provides a time-series data storage method based on a dynamic-weight-balanced time-series database cluster, comprising the following steps:
Step 1: configure a cluster interface influx-gate, a publish/subscribe messaging system cluster kafka, a reader-writer influx-writer, and m time-series databases influxdb, denoted in turn as influxdb_1, influxdb_2, ..., influxdb_m;
Step 2: the cluster interface influx-gate stores an allocation table. The table maps the globally unique ID of every table (measurement) in the influxdb cluster to the IP address of the influxdb instance on which that table currently resides;
the globally unique ID of a measurement is generated as follows: if the measurement stores time-series data of a specific type from a specific device, the ID of that device and the ID of that data type are combined to form the measurement's globally unique ID;
Step 3: the cluster interface influx-gate stores the current, latest remaining-space statistics table of the time-series databases;
the remaining-space statistics table records the current remaining space $S_i$ of each time-series database influxdb_i, where $i = 1, 2, \ldots, m$. Summing the current remaining space of all influxdb instances gives the total remaining space of the cluster, $S_{\text{total}} = \sum_{i=1}^{m} S_i$;
Step 4: the allocation intervals of the virtual buckets are determined as follows:
Step 4.1: when the cluster interface influx-gate starts, it initializes and generates n virtual buckets, denoted $VB_1, VB_2, \ldots, VB_n$;
Step 4.2: the cluster interface influx-gate generates a globally unique ID for each virtual bucket and computes the md5 value of that ID. It then takes the last four bytes of the md5 value as the virtual bucket's integer field, so the integer field takes values in $[0, 2^{32}-1]$;
thus, the integer fields of virtual buckets $VB_1, VB_2, \ldots, VB_n$ are denoted $X_1, X_2, \ldots, X_n$ in turn;
Step 4.3: sort the n virtual buckets in ascending order of their integer fields. The sorted virtual buckets are denoted $VB'_1, VB'_2, \ldots, VB'_n$, and their integer fields are ordered as $X'_1 \le X'_2 \le \cdots \le X'_n$;
Step 4.4: for any sorted virtual bucket $VB'_j$, where $j = 1, 2, \ldots, n$, with integer field $X'_j$, the allocation interval $KP_j$ is obtained as follows:
if $j = 1$, the allocation interval is $KP_1 = [0, X'_1] \cup (X'_n, 2^{32}-1]$;
if $j \neq 1$, the allocation interval is $KP_j = (X'_{j-1}, X'_j]$;
thus, the allocation intervals of virtual buckets $VB'_1, VB'_2, \ldots, VB'_n$ are $KP_1, KP_2, \ldots, KP_n$;
the lengths of the allocation intervals $KP_1, KP_2, \ldots, KP_n$ are $F_1, F_2, \ldots, F_n$ in turn;
the standard deviation of the n interval lengths is then smaller than a set threshold, so the interval lengths are approximately equal. In this way, the complete integer space $[0, 2^{32}-1]$ is divided into n allocation intervals;
Step 5: when the cluster interface influx-gate receives a write request, it parses the request to obtain the device ID and the data-type ID of the data to be written. It then combines the two into a globally unique measurement ID, denoted ID(new);
Step 6: the cluster interface influx-gate searches the allocation table of step 2 for a record of the ID(new) obtained in step 5. If no such record exists, step 7 is executed. If the record exists, the IP address of the influxdb instance mapped to ID(new) is taken as the IP address of the target influxdb, and step 8 is executed;
Step 7: the cluster interface influx-gate uses a time-series database selection algorithm based on the dynamic-weight-balanced cluster to select the target influxdb from influxdb_1, influxdb_2, ..., influxdb_m. It obtains the IP address of the target influxdb and then executes step 8;
the specific steps are as follows:
Step 7.1: the cluster interface influx-gate reads the current, latest remaining-space statistics table of step 3.
For each influxdb_i in the cluster, it computes the ratio of its current remaining space $S_i$ to the cluster's total remaining space, $R_i = S_i / S_{\text{total}}$. It then multiplies $R_i$ by the total number of virtual buckets n to obtain the number of virtual buckets allocated to influxdb_i;
Step 7.2: the n virtual buckets are distributed among the m influxdb instances according to the number allocated to each influxdb_i in step 7.1, which yields a virtual bucket allocation table;
the virtual bucket allocation table records, for each virtual bucket, the mapping among its globally unique ID, its allocation interval, and the IP address of the influxdb instance to which it belongs;
Step 7.3: for the write request currently being processed, the cluster interface influx-gate computes the md5 value of the ID(new) obtained in step 5 and takes the last four bytes of that value as the table integer field x(new);
then, using x(new) as the query key, it searches the virtual bucket allocation table built in step 7.2 for the allocation interval KP(new) that contains x(new). The influxdb instance whose IP address is mapped to KP(new) is the target influxdb, which yields the IP address of the target influxdb;
Step 8: the cluster interface influx-gate packages the data to be written, ID(new), and the IP address of the target influxdb into a data packet, and writes the packet to the publish/subscribe messaging system cluster kafka. kafka and the reader-writer influx-writer then cooperate to write the data into the target influxdb;
the cluster interface influx-gate then recalculates the current remaining space of the target influxdb and the cluster's total remaining space $S_{\text{total}}$. It updates the remaining-space statistics table of step 3 and the allocation table of step 2, and returns to step 5 to process the next write request.
Preferably, in step 8, kafka and the reader-writer influx-writer cooperate to write the data into the target influxdb as follows:
Step 8.1: the reader-writer influx-writer reads the data packet from kafka and parses it to obtain the data to be written, ID(new), and the IP address of the target influxdb;
Step 8.2: the reader-writer influx-writer locates the target influxdb by its IP address. It then sends the data to be written, together with the corresponding ID(new), to the target influxdb;
Step 8.3: the target influxdb checks whether it already contains a table whose globally unique ID is ID(new);
if it does, the data is written directly into that table;
if it does not, the target influxdb creates a new table whose globally unique ID is ID(new) and then writes the data into the newly created table.
The time-series data storage method based on a dynamic-weight-balanced time-series database cluster has the following advantages:
The method is intended for scenarios in which massive amounts of device-collected data are stored in the cloud of an Internet of Things system. In current influxdb clusters, the data stored across the influxdb instances is severely imbalanced, which wastes storage space and makes it difficult to scale the cluster horizontally. The method addresses both problems. With the invention, data collected by Internet of Things devices can be stored evenly across the influxdb instances of the time-series database cluster, which improves the user experience.
Drawings
Fig. 1 is a schematic flow chart of a time sequence data storage method based on a dynamic weight balancing time sequence database cluster according to the present invention.
Detailed Description
To make the technical problem solved by the present invention, its technical solution, and its beneficial effects clearer, the invention is described in further detail below with reference to the accompanying drawing and embodiments. The specific embodiments described here merely illustrate the invention and are not intended to limit it.
The invention provides a time-series data storage method based on a dynamic-weight-balanced time-series database cluster. With this method, data collected by Internet of Things devices can be stored evenly across the influxdb instances of the time-series database cluster. The method is mainly intended for storing massive amounts of device-collected data in the cloud of an Internet of Things system. Referring to Fig. 1, the method comprises the following steps:
Step 1: configure a cluster interface influx-gate, a publish/subscribe messaging system cluster kafka, a reader-writer influx-writer, and m time-series databases influxdb, denoted in turn as influxdb_1, influxdb_2, ..., influxdb_m;
Step 2: the cluster interface influx-gate stores an allocation table. The table maps the globally unique ID of every table (measurement) in the influxdb cluster to the IP address of the influxdb instance on which that table currently resides;
the globally unique ID of a measurement is generated as follows: if the measurement stores time-series data of a specific type from a specific device, the ID of that device and the ID of that data type are combined to form the measurement's globally unique ID;
for example, suppose the temperature, humidity, and power data of device E are three types of time-series data. The temperature data of device E is then stored in one measurement, whose globally unique ID is the combination of the ID of device E and the temperature type ID;
the humidity data of device E is stored in another measurement, whose globally unique ID is the combination of the ID of device E and the humidity type ID;
and the power data of device E is stored in a third measurement, whose globally unique ID is the combination of the ID of device E and the power type ID.
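The ID-combination rule can be sketched in Python as follows. This is an illustration under an assumption: the text only says the device ID and data-type ID are "combined", so joining them with a `:` separator is hypothetical and is not taken from the source.

```python
from typing import List

SEPARATOR = ":"  # hypothetical; the text only says the two IDs are "combined"


def measurement_id(device_id: str, data_type_id: str) -> str:
    """Build the globally unique measurement ID for one (device, data type) pair."""
    if SEPARATOR in device_id or SEPARATOR in data_type_id:
        # Reject IDs containing the separator so that distinct pairs cannot collide.
        raise ValueError("device and data-type IDs must not contain the separator")
    return f"{device_id}{SEPARATOR}{data_type_id}"


# Device E from the example above: one measurement per data type.
ids: List[str] = [measurement_id("E", t) for t in ("temperature", "humidity", "power")]
```

The separator check matters because a naive concatenation could map two different (device, type) pairs to the same measurement.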
In the invention, the cluster interface influx-gate serves as the entry point of the influxdb cluster and provides external systems with interfaces for writing and querying data.
Step 3: the cluster interface influx-gate stores the current, latest remaining-space statistics table of the time-series databases;
the remaining-space statistics table records the current remaining space $S_i$ of each time-series database influxdb_i, where $i = 1, 2, \ldots, m$. Summing the current remaining space of all influxdb instances gives the total remaining space of the cluster, $S_{\text{total}} = \sum_{i=1}^{m} S_i$;
Step 4: the allocation intervals of the virtual buckets are determined as follows:
Step 4.1: when the cluster interface influx-gate starts, it initializes and generates n virtual buckets, denoted $VB_1, VB_2, \ldots, VB_n$;
For example, 1000 virtual buckets may be generated.
Step 4.2: the cluster interface influx-gate generates a globally unique ID for each virtual bucket and computes the md5 value of that ID. It then takes the last four bytes of the md5 value as the virtual bucket's integer field, so the integer field takes values in $[0, 2^{32}-1]$;
thus, the integer fields of virtual buckets $VB_1, VB_2, \ldots, VB_n$ are denoted $X_1, X_2, \ldots, X_n$ in turn;
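Steps 4.1 and 4.2 can be sketched in Python as follows. This is illustrative only. The bucket-ID format `vbucket-<k>` is hypothetical, and the text does not state the byte order, so reading the last four md5 bytes as a big-endian unsigned integer is an assumption.

```python
import hashlib
from typing import List


def md5_last4_int(key: str) -> int:
    """Return the last four bytes of md5(key) as an unsigned 32-bit integer.

    Big-endian byte order is an assumption of this sketch.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()   # 16-byte digest
    return int.from_bytes(digest[-4:], byteorder="big")  # value in [0, 2**32 - 1]


def make_virtual_buckets(n: int) -> List[str]:
    """Generate n globally unique virtual bucket IDs (hypothetical naming scheme)."""
    return [f"vbucket-{k}" for k in range(n)]


buckets = make_virtual_buckets(1000)           # step 4.1, e.g. n = 1000
fields = [md5_last4_int(b) for b in buckets]   # step 4.2: integer fields X_1, ..., X_n
```

Four bytes give exactly $2^{32}$ possible values, which is why the integer space used in step 4.4 is $[0, 2^{32}-1]$.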
Step 4.3: sort the n virtual buckets in ascending order of their integer fields. The sorted virtual buckets are denoted $VB'_1, VB'_2, \ldots, VB'_n$, and their integer fields are ordered as $X'_1 \le X'_2 \le \cdots \le X'_n$;
Step 4.4: for any sorted virtual bucket $VB'_j$, where $j = 1, 2, \ldots, n$, with integer field $X'_j$, the allocation interval $KP_j$ is obtained as follows:
if $j = 1$, the allocation interval is $KP_1 = [0, X'_1] \cup (X'_n, 2^{32}-1]$;
if $j \neq 1$, the allocation interval is $KP_j = (X'_{j-1}, X'_j]$;
thus, the allocation intervals of virtual buckets $VB'_1, VB'_2, \ldots, VB'_n$ are $KP_1, KP_2, \ldots, KP_n$;
the lengths of the allocation intervals $KP_1, KP_2, \ldots, KP_n$ are $F_1, F_2, \ldots, F_n$ in turn;
the standard deviation of the n interval lengths is then smaller than a set threshold, so the interval lengths are approximately equal. In this way, the complete integer space $[0, 2^{32}-1]$ is divided into n allocation intervals;
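Steps 4.3 and 4.4 can be sketched as follows. This sketch rests on an assumption about the interval rule, since the original formulas are images. It assumes a consistent-hashing style: each sorted bucket owns $(X'_{j-1}, X'_j]$, and the first bucket additionally owns the wrap-around tail above $X'_n$, so the n intervals cover $[0, 2^{32}-1]$ exactly. Here $X'_j$ is the j-th integer field after sorting. The bucket names are hypothetical.

```python
import hashlib
import statistics
from typing import List, Tuple

SPACE = 2 ** 32  # number of integers in [0, 2**32 - 1]


def md5_last4_int(key: str) -> int:
    """Last four bytes of md5(key) as a big-endian unsigned int (assumed byte order)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[-4:], "big")


def build_intervals(bucket_ids: List[str]) -> List[Tuple[str, int, int]]:
    """Return (bucket_id, X'_j, F_j) for each bucket, sorted by integer field.

    Assumed interval rule:
      j = 1: [0, X'_1] plus the wrap-around tail (X'_n, 2**32 - 1]
      j > 1: (X'_{j-1}, X'_j]
    so each interval is identified by its upper bound X'_j.
    """
    ordered = sorted(bucket_ids, key=md5_last4_int)   # step 4.3
    xs = [md5_last4_int(b) for b in ordered]
    rows = []
    for j, (bucket_id, x) in enumerate(zip(ordered, xs)):
        if j == 0:
            length = (x + 1) + (SPACE - 1 - xs[-1])   # head [0, X'_1] plus tail
        else:
            length = x - xs[j - 1]                    # size of (X'_{j-1}, X'_j]
        rows.append((bucket_id, x, length))
    return rows


def lengths_balanced(rows: List[Tuple[str, int, int]], threshold: float) -> bool:
    """Check that the standard deviation of F_1..F_n is below a chosen threshold."""
    return statistics.pstdev([length for _, _, length in rows]) < threshold


rows = build_intervals([f"vbucket-{k}" for k in range(1000)])
```

The text does not say what happens when the standard-deviation check fails. Regenerating the bucket IDs would be one possible response, but that is not stated in the source.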
Step 5: when the cluster interface influx-gate receives a write request, it parses the request to obtain the device ID and the data-type ID of the data to be written. It then combines the two into a globally unique measurement ID, denoted ID(new);
Step 6: the cluster interface influx-gate searches the allocation table of step 2 for a record of the ID(new) obtained in step 5. If no such record exists, step 7 is executed. If the record exists, the IP address of the influxdb instance mapped to ID(new) is taken as the IP address of the target influxdb, and step 8 is executed;
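The step 6 lookup can be sketched as below. This is a minimal sketch in which the allocation table of step 2 is a plain dictionary from measurement ID to database IP address. The `select_new` callback is a hypothetical stand-in for the step 7 selection algorithm.

```python
from typing import Callable, Dict


def resolve_target(mid: str,
                   allocation: Dict[str, str],
                   select_new: Callable[[str], str]) -> str:
    """Step 6: reuse the recorded database for a known measurement, else run step 7."""
    ip = allocation.get(mid)
    if ip is not None:
        return ip             # measurement already placed: keep writing to the same database
    return select_new(mid)    # new measurement: dynamic-weight selection (step 7)
```

Checking the allocation table first keeps all data of one measurement on one database. Only new measurements go through the weighted selection.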
Step 7: the cluster interface influx-gate uses a time-series database selection algorithm based on the dynamic-weight-balanced cluster to select the target influxdb from influxdb_1, influxdb_2, ..., influxdb_m. It obtains the IP address of the target influxdb and then executes step 8;
the specific steps are as follows:
Step 7.1: the cluster interface influx-gate reads the current, latest remaining-space statistics table of step 3.
For each influxdb_i in the cluster, it computes the ratio of its current remaining space $S_i$ to the cluster's total remaining space, $R_i = S_i / S_{\text{total}}$. It then multiplies $R_i$ by the total number of virtual buckets n to obtain the number of virtual buckets allocated to influxdb_i;
Step 7.2: the n virtual buckets are distributed among the m influxdb instances according to the number allocated to each influxdb_i in step 7.1, which yields a virtual bucket allocation table;
the virtual bucket allocation table records, for each virtual bucket, the mapping among its globally unique ID, its allocation interval, and the IP address of the influxdb instance to which it belongs;
Step 7.3: for the write request currently being processed, the cluster interface influx-gate computes the md5 value of the ID(new) obtained in step 5 and takes the last four bytes of that value as the table integer field x(new);
then, using x(new) as the query key, it searches the virtual bucket allocation table built in step 7.2 for the allocation interval KP(new) that contains x(new). The influxdb instance whose IP address is mapped to KP(new) is the target influxdb, which yields the IP address of the target influxdb;
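Step 7.1 can be sketched as follows. The text multiplies $R_i$ by n but does not say how to round, and $R_i \cdot n$ is rarely an integer. This sketch therefore assumes largest-remainder rounding so that the counts always sum to n. The remaining-space units and the IP keys are placeholders.

```python
from typing import Dict


def bucket_counts(remaining: Dict[str, int], n: int) -> Dict[str, int]:
    """Split n virtual buckets across databases in proportion to remaining space.

    remaining maps a database IP to its current remaining space S_i (step 3).
    Largest-remainder rounding is an assumption of this sketch.
    """
    total = sum(remaining.values())   # S_total
    if total <= 0:
        raise ValueError("cluster has no remaining space")
    # Exact integer arithmetic: floor(R_i * n) and the fractional part's numerator.
    counts = {ip: (n * s) // total for ip, s in remaining.items()}
    remainders = {ip: (n * s) % total for ip, s in remaining.items()}
    leftover = n - sum(counts.values())
    # Give the leftover buckets to the databases with the largest fractional parts.
    for ip in sorted(remaining, key=remainders.__getitem__, reverse=True)[:leftover]:
        counts[ip] += 1
    return counts
```

For example, databases with 600, 300 and 100 units of remaining space and n = 1000 receive 600, 300 and 100 buckets. A database with more free space therefore attracts proportionally more new measurements.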
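Step 7.2 can be sketched as below. The text does not say which buckets go to which database. This sketch assumes each database receives a contiguous run of the sorted buckets, in the order the counts are listed. Each bucket is represented by its ID and the upper bound $X'_j$ of its allocation interval, where $X'_j$ is the j-th integer field after sorting.

```python
from typing import Dict, List, Tuple


def build_bucket_table(sorted_buckets: List[Tuple[str, int]],
                       counts: Dict[str, int]) -> List[Tuple[str, int, str]]:
    """Assign sorted virtual buckets to databases.

    sorted_buckets holds (bucket_id, X'_j) pairs in ascending X'_j order.
    Handing each database a contiguous run of buckets is an assumption.
    """
    if sum(counts.values()) != len(sorted_buckets):
        raise ValueError("bucket counts must add up to n")
    table = []
    position = 0
    for db_ip, k in counts.items():
        for bucket_id, upper in sorted_buckets[position:position + k]:
            table.append((bucket_id, upper, db_ip))   # one row per virtual bucket
        position += k
    return table
```

Interleaving the buckets round-robin across databases would be an equally valid reading of the text. It would spread each database's share over the whole integer space instead of one arc.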
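The step 7.3 lookup can be sketched with a binary search over the interval upper bounds. This sketch makes the same assumptions as above: interval $j$ is $(X'_{j-1}, X'_j]$ with wrap-around into the first interval, and the four md5 bytes are read big-endian. Table rows are `(bucket_id, X'_j, db_ip)`, sorted by $X'_j$.

```python
import bisect
import hashlib
from typing import List, Tuple

Row = Tuple[str, int, str]  # (bucket_id, upper bound X'_j, database IP)


def md5_last4_int(key: str) -> int:
    """Last four bytes of md5(key) as a big-endian unsigned int (assumed byte order)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[-4:], "big")


def locate(x: int, table: List[Row]) -> str:
    """Return the database IP whose allocation interval contains x."""
    uppers = [upper for _, upper, _ in table]
    j = bisect.bisect_left(uppers, x)   # first X'_j >= x, i.e. x in (X'_{j-1}, X'_j]
    if j == len(table):
        j = 0                           # x > X'_n falls in the wrap-around part of KP_1
    return table[j][2]


def find_target(mid: str, table: List[Row]) -> str:
    """Hash the measurement ID to x(new) and look up the interval containing it."""
    return locate(md5_last4_int(mid), table)
```

A production version would build `uppers` once per table rather than on every call. The lookup itself is O(log n) per write request.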
Step 8: the cluster interface influx-gate packages the data to be written, ID(new), and the IP address of the target influxdb into a data packet, and writes the packet to the publish/subscribe messaging system cluster kafka. kafka and the reader-writer influx-writer then cooperate to write the data into the target influxdb;
the cluster interface influx-gate then recalculates the current remaining space of the target influxdb and the cluster's total remaining space $S_{\text{total}}$. It updates the remaining-space statistics table of step 3 and the allocation table of step 2, and returns to step 5 to process the next write request.
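Step 8 on the gateway side can be sketched as follows. Several parts are assumptions: JSON as the packet format, the field names, and estimating the target's new remaining space by subtracting the packet size. The text specifies none of these. Publishing the bytes to kafka is left out, because the topic and client setup are deployment-specific (with the kafka-python client this would typically be a `KafkaProducer.send(topic, value=packet)` call).

```python
import json
from typing import Dict, List


def make_packet(points: List[dict], mid: str, target_ip: str) -> bytes:
    """Bundle the data, the measurement ID, and the target IP into one message.

    JSON and these field names are assumptions of this sketch.
    """
    body = {"measurement_id": mid, "target_ip": target_ip, "points": points}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def update_after_write(remaining: Dict[str, int], allocation: Dict[str, str],
                       mid: str, target_ip: str, written: int) -> int:
    """Refresh the step 3 statistics and the step 2 allocation table; return S_total.

    Subtracting the written size is a stand-in for re-measuring the target
    database's remaining space, which the text leaves unspecified.
    """
    remaining[target_ip] = max(0, remaining[target_ip] - written)
    allocation.setdefault(mid, target_ip)   # pin a new measurement to its database
    return sum(remaining.values())
```

Recording the measurement in the allocation table is what lets step 6 route later writes for it without repeating the weighted selection.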
In this step, kafka and the reader-writer influx-writer cooperate to write the data into the target influxdb as follows:
Step 8.1: the reader-writer influx-writer reads the data packet from kafka and parses it to obtain the data to be written, ID(new), and the IP address of the target influxdb;
Step 8.2: the reader-writer influx-writer locates the target influxdb by its IP address. It then sends the data to be written, together with the corresponding ID(new), to the target influxdb;
Step 8.3: the target influxdb checks whether it already contains a table whose globally unique ID is ID(new);
if it does, the data is written directly into that table;
if it does not, the target influxdb creates a new table whose globally unique ID is ID(new) and then writes the data into the newly created table.
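Steps 8.1 to 8.3 on the reader-writer side can be sketched as follows. The sketch uses a hypothetical in-memory stand-in for an influxdb instance. It assumes the same JSON packet layout as a gateway that serializes `measurement_id`, `target_ip` and `points`, which is an assumption rather than a format given in the text. In InfluxDB itself, writing a point to a measurement that does not exist yet creates it implicitly, so an explicit create step may be unnecessary in practice.

```python
import json
from typing import Dict, List


class InMemoryTSDB:
    """Hypothetical in-memory stand-in for one influxdb instance."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {}

    def write(self, mid: str, points: List[dict]) -> None:
        # Step 8.3: create the table on first use, otherwise append to it.
        if mid not in self.tables:
            self.tables[mid] = []
        self.tables[mid].extend(points)


def consume(packet: bytes, databases: Dict[str, InMemoryTSDB]) -> None:
    """Steps 8.1-8.2: parse one packet and forward its data to the target database."""
    msg = json.loads(packet)                  # step 8.1: unpack data, ID(new), target IP
    target = databases[msg["target_ip"]]      # step 8.2: locate the target by IP address
    target.write(msg["measurement_id"], msg["points"])
```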
The invention provides a time-series data storage method based on a dynamic-weight-balanced time-series database cluster. It is intended for scenarios in which massive amounts of device-collected data are stored in the cloud of an Internet of Things system. In current influxdb clusters, the data stored across the influxdb instances is severely imbalanced, which wastes storage space and makes it difficult to scale the cluster horizontally. The method addresses both problems. With the invention, data collected by Internet of Things devices can be stored evenly across the influxdb instances of the time-series database cluster, which improves the user experience.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and such modifications and improvements should also be considered within the scope of the present invention.

Claims (2)

1. A time-series data storage method based on a dynamic-weight-balanced time-series database cluster, characterized by comprising the following steps:
Step 1: configure a cluster interface influx-gate, a publish/subscribe messaging system cluster kafka, a reader-writer influx-writer, and m time-series databases influxdb, denoted in turn as influxdb_1, influxdb_2, ..., influxdb_m;
Step 2: the cluster interface influx-gate stores an allocation table. The table maps the globally unique ID of every table (measurement) in the influxdb cluster to the IP address of the influxdb instance on which that table currently resides;
the globally unique ID of a measurement is generated as follows: if the measurement stores time-series data of a specific type from a specific device, the ID of that device and the ID of that data type are combined to form the measurement's globally unique ID;
Step 3: the cluster interface influx-gate stores the current, latest remaining-space statistics table of the time-series databases;
the remaining-space statistics table records the current remaining space $S_i$ of each time-series database influxdb_i, where $i = 1, 2, \ldots, m$. Summing the current remaining space of all influxdb instances gives the total remaining space of the cluster, $S_{\text{total}} = \sum_{i=1}^{m} S_i$;
Step 4: the allocation intervals of the virtual buckets are determined as follows:
Step 4.1: when the cluster interface influx-gate starts, it initializes and generates n virtual buckets, denoted $VB_1, VB_2, \ldots, VB_n$;
Step 4.2: the cluster interface influx-gate generates a globally unique ID for each virtual bucket and computes the md5 value of that ID. It then takes the last four bytes of the md5 value as the virtual bucket's integer field, so the integer field takes values in $[0, 2^{32}-1]$;
thus, the virtual buckets $VB_1, VB_2, \ldots, VB_n$ have the integer fields $h_1, h_2, \ldots, h_n$ respectively;
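Step 4.2 can be sketched in Python with the standard `hashlib` module. The claim does not state the byte order used to read the last four md5 bytes, so big-endian is an assumption here, and the bucket IDs `vb-0` to `vb-7` are hypothetical:

```python
import hashlib


def last4_int(key: str) -> int:
    """Return the last four bytes of md5(key) as an unsigned 32-bit integer.

    Big-endian byte order is an assumption; the claim does not specify it.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()  # 16-byte digest
    return int.from_bytes(digest[-4:], "big")           # in [0, 2**32 - 1]


# Hypothetical virtual bucket global IDs "vb-0" ... "vb-7".
fields = {f"vb-{k}": last4_int(f"vb-{k}") for k in range(8)}
for bucket_id, field in fields.items():
    print(bucket_id, field)

print(hex(last4_int("abc")))  # 0x28e17f72, since md5("abc") ends in ...28e17f72
```

Four bytes give 32 bits, which is why the integer field ranges over $[0, 2^{32}-1]$.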
Step 4.3, sort the n virtual buckets in ascending order of their integer fields; the sorted virtual buckets are denoted $VB'_1, VB'_2, \ldots, VB'_n$, and their integer fields $h'_1 \le h'_2 \le \cdots \le h'_n$;
Step 4.4, for any sorted virtual bucket $VB'_j$, where $j = 1, 2, \ldots, n$, whose integer field is $h'_j$, the allocated section $KP_j$ is obtained as follows:
if $j = 1$, the allocated section is $KP_1 = [0, h'_1] \cup (h'_n, 2^{32}-1]$;
if $j \ne 1$, the allocated section is $KP_j = (h'_{j-1}, h'_j]$;
thus, the virtual buckets $VB'_1, VB'_2, \ldots, VB'_n$ correspond to the allocated sections $KP_1, KP_2, \ldots, KP_n$, whose lengths are $F_1, F_2, \ldots, F_n$ respectively;
the standard deviation of the n section lengths is smaller than a set threshold, so the allocated sections are approximately equal in length; in this way, the whole integer space $[0, 2^{32}-1]$ is divided into n allocated sections;
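Steps 4.2 to 4.4 together can be sketched as follows. Writing h_j for the j-th smallest virtual bucket integer field, the sketch assumes consistent-hashing style successor assignment: section j is (h_{j-1}, h_j] for j other than 1, and the first section also wraps around to cover (h_n, 2^32 - 1]. The big-endian byte order, the bucket IDs and the threshold value are all hypothetical:

```python
import hashlib
import statistics

SPACE = 2 ** 32  # number of integers in [0, 2**32 - 1]


def last4_int(key: str) -> int:
    """Last four bytes of md5(key) as an unsigned integer (big-endian assumed)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[-4:], "big")


def allocate_sections(bucket_ids):
    """Steps 4.2-4.4: return [(bucket_id, h_j, F_j)] in ascending h_j order.

    Assumes successor assignment: bucket j owns (h_{j-1}, h_j], and the
    first bucket also owns the wrap-around range (h_n, 2**32 - 1].
    """
    ranked = sorted((last4_int(b), b) for b in bucket_ids)  # step 4.3
    hs = [h for h, _ in ranked]
    if len(set(hs)) != len(hs):
        raise ValueError("duplicate integer fields; regenerate the bucket IDs")
    sections = []
    for j, (h, b) in enumerate(ranked):
        if j == 0:
            length = (h + 1) + (SPACE - 1 - hs[-1])  # [0, h_1] plus (h_n, 2**32 - 1]
        else:
            length = h - hs[j - 1]                   # (h_{j-1}, h_j]
        sections.append((b, h, length))
    return sections


def length_stdev(sections) -> float:
    """Population standard deviation of the section lengths F_1..F_n."""
    return statistics.pstdev([length for _, _, length in sections])


sections = allocate_sections([f"vb-{k}" for k in range(64)])
print(sum(length for _, _, length in sections) == SPACE)  # True
# Hypothetical threshold: half the mean section length.
print(length_stdev(sections) < 0.5 * SPACE / len(sections))
```

Because every integer lands in exactly one section, the lengths always sum to $2^{32}$; the standard-deviation check is where the claim's "set threshold" would be applied.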
Step 5, when the cluster interface influx-gate receives a write request, it parses the request to obtain the device ID and data type ID of the data to be written, and combines them into the globally unique table (measurement) ID, denoted ID(new);
Step 6, the cluster interface influx-gate searches the distribution table of step 2 for a record of the globally unique table ID(new) obtained in step 5; if no such record exists, step 7 is executed; if it exists, the IP address of the influxdb node recorded for ID(new) is taken as the IP address of the target influxdb, and step 8 is executed;
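The step 6 lookup amounts to a dictionary hit-or-fallback. A minimal sketch, assuming the distribution table is an in-memory map from table ID to node IP; `route`, `pick_new` and the IP addresses are hypothetical, and `pick_new` merely stands in for the step 7 selector:

```python
from typing import Callable, Dict


def route(table_id: str,
          distribution: Dict[str, str],
          select_target: Callable[[str], str]) -> str:
    """Step 6: return the recorded node for a known table, else defer to step 7.

    The distribution table itself is not modified here; per step 8 it is
    updated only after the write has been handed off.
    """
    ip = distribution.get(table_id)
    if ip is None:
        ip = select_target(table_id)  # step 7: dynamic-weight selection
    return ip


def pick_new(table_id: str) -> str:
    """Hypothetical stub for the step 7 selector."""
    return "10.0.0.99"


# Hypothetical distribution table.
distribution = {"dev0042_temperature": "10.0.0.11"}
print(route("dev0042_temperature", distribution, pick_new))  # 10.0.0.11
print(route("dev0077_pressure", distribution, pick_new))     # 10.0.0.99
```

Checking the distribution table first keeps every existing table pinned to the node that already holds its data, so only new tables are affected by changes in the dynamic weights.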
Step 7, the cluster interface influx-gate applies a time series database selection algorithm based on dynamic weight balancing across the cluster to select the target influxdb from $\text{influxdb}_1, \text{influxdb}_2, \ldots, \text{influxdb}_m$, obtains its IP address, and then executes step 8;
the specific steps are as follows:
Step 7.1, the cluster interface influx-gate reads the latest remaining-space statistics table from step 3,
and computes, for each database $\text{influxdb}_i$ in the cluster, the ratio $R_i = S_i / S_{\text{total}}$ of its current remaining space to the cluster's total remaining space; multiplying $R_i$ by the total number of virtual buckets n gives $R_i \cdot n$, the number of virtual buckets assigned to $\text{influxdb}_i$;
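Since $R_i \cdot n$ is generally not an integer, some rounding rule is needed. The claim does not give one; the sketch below assumes largest-remainder rounding so that the quotas add up to exactly n. The function name and the example figures are hypothetical:

```python
from typing import Dict


def bucket_quota(remaining: Dict[str, int], n: int) -> Dict[str, int]:
    """Step 7.1: give each node about R_i * n buckets, with R_i = S_i / S_total.

    Largest-remainder rounding is an assumption (the claim does not say how
    fractions are rounded); it makes the quotas add up to exactly n.
    Integer arithmetic avoids floating-point error.
    """
    total = sum(remaining.values())                              # S_total
    if total <= 0:
        raise ValueError("the cluster has no remaining space")
    quota = {ip: s * n // total for ip, s in remaining.items()}  # floor(R_i * n)
    frac = {ip: s * n % total for ip, s in remaining.items()}    # scaled remainders
    leftover = n - sum(quota.values())
    for ip in sorted(remaining, key=lambda k: frac[k], reverse=True)[:leftover]:
        quota[ip] += 1
    return quota


# Hypothetical remaining space per node (same unit for all, e.g. GiB).
print(bucket_quota({"10.0.0.11": 500, "10.0.0.12": 300, "10.0.0.13": 200}, 16))
# {'10.0.0.11': 8, '10.0.0.12': 5, '10.0.0.13': 3}
```

Here the exact shares are 8, 4.8 and 3.2 buckets; the one bucket lost to flooring goes to the node with the largest fractional part.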
Step 7.2, according to the number of virtual buckets for each $\text{influxdb}_i$ obtained in step 7.1, the n virtual buckets are distributed among the m influxdb nodes to obtain a virtual bucket allocation table;
the virtual bucket allocation table records, for each virtual bucket, the mapping between its globally unique ID, its allocated section, and the IP address of the influxdb node to which it belongs;
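The claim fixes how many buckets each node receives but not which ones. The sketch below assumes each node takes one contiguous run of buckets in ascending integer-field order, and represents each section by its upper end h_j; `build_allocation_table`, the bucket IDs and the IPs are hypothetical:

```python
from typing import Dict, List, Tuple

# (virtual bucket ID, upper end h_j of its section, owning node IP)
Row = Tuple[str, int, str]


def build_allocation_table(sorted_buckets: List[Tuple[str, int]],
                           quota: Dict[str, int]) -> List[Row]:
    """Step 7.2: hand each node as many virtual buckets as its quota.

    sorted_buckets is [(bucket_id, h_j)] in ascending h_j order. Giving each
    node one contiguous run of buckets is an assumption; the claim fixes only
    how many buckets each node receives, not which ones.
    """
    if sum(quota.values()) != len(sorted_buckets):
        raise ValueError("quotas must add up to the number of virtual buckets")
    owners = [ip for ip, count in quota.items() for _ in range(count)]
    return [(b, h, ip) for (b, h), ip in zip(sorted_buckets, owners)]


# Hypothetical buckets (already sorted by integer field) and quotas.
buckets = [("vb-a", 100), ("vb-b", 2000), ("vb-c", 70000), ("vb-d", 3000000)]
table = build_allocation_table(buckets, {"10.0.0.11": 3, "10.0.0.12": 1})
for row in table:
    print(row)
```

Interleaving the owners instead (for example, weighted round-robin by quota) would spread each node's share across the integer space; either choice respects the bucket counts from step 7.1.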
Step 7.3, for the write request currently being processed, the cluster interface influx-gate computes the md5 value of the globally unique table ID(new) obtained in step 5 and takes the last four bytes of the md5 value as the table integer field x(new);
then, using x(new) as the query key, it searches the virtual bucket allocation table built in step 7.2 for the allocated section KP(new) that contains x(new); the influxdb node whose IP address is recorded for KP(new) is the target influxdb, which yields the IP address of the target influxdb;
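Finding the section that contains x(new) is a binary search over the sorted section upper ends. This sketch assumes big-endian reading of the md5 bytes, sections of the form (h_{j-1}, h_j], and a first section that wraps around to cover (h_n, 2^32 - 1]; the function names and the table contents are hypothetical:

```python
import bisect
import hashlib
from typing import List, Tuple


def last4_int(key: str) -> int:
    """Last four bytes of md5(key) as an unsigned integer (big-endian assumed)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[-4:], "big")


def find_target(table_id: str, table: List[Tuple[str, int, str]]) -> str:
    """Step 7.3: return the node IP of the bucket whose section contains x(new).

    Rows are (bucket_id, h_j, ip) sorted by h_j; section j is (h_{j-1}, h_j],
    and the first section also covers (h_n, 2**32 - 1] (assumed wrap-around).
    """
    x = last4_int(table_id)                 # table integer field x(new)
    uppers = [h for _, h, _ in table]
    j = bisect.bisect_left(uppers, x)       # index of the first h_j >= x
    if j == len(table):                     # x > h_n: wrap around to bucket 1
        j = 0
    return table[j][2]


# Hypothetical allocation table; md5("abc") ends in 0x28e17f72.
table = [("vb-a", 0x10000000, "10.0.0.11"),
         ("vb-b", 0x28e17f72, "10.0.0.12"),
         ("vb-c", 0xf0000000, "10.0.0.13")]
print(find_target("abc", table))  # 10.0.0.12
```

Precomputing `uppers` once per table rebuild would reduce each lookup to a single O(log n) binary search.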
Step 8, the cluster interface influx-gate packages the data to be written, the globally unique table ID(new), and the IP address of the target influxdb into a data packet and writes the packet to the publish-subscribe message system cluster kafka; kafka and the reader-writer influx-writer then cooperate to write the data into the target influxdb;
the cluster interface influx-gate recalculates the current remaining space of the target influxdb and the total remaining space $S_{\text{total}}$ of the cluster, updates the remaining-space statistics table of step 3 and the distribution table of step 2, and then returns to step 5 to process the next write request.
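The gateway side of step 8 can be sketched as building a packet and refreshing the two tables. The JSON layout and field names are hypothetical, and so is charging the target node with a fixed `used` amount, since the claim does not say how remaining space is recalculated. Publishing the packet bytes to a Kafka topic is left to whatever Kafka client the deployment uses and is omitted here:

```python
import json
from typing import Dict


def make_packet(data: str, table_id: str, target_ip: str) -> bytes:
    """Step 8: bundle the payload, ID(new) and target IP into one message.

    The JSON layout and field names are hypothetical; the claim only lists
    the three items the packet carries.
    """
    return json.dumps({"data": data, "table_id": table_id,
                       "target_ip": target_ip}).encode("utf-8")


def refresh_after_write(remaining: Dict[str, int], distribution: Dict[str, str],
                        table_id: str, target_ip: str, used: int) -> int:
    """End of step 8: update the step-3 statistics and step-2 table; return S_total.

    Charging the target with `used` units is an assumption about how the
    remaining space is recalculated; querying the node would also work.
    """
    remaining[target_ip] -= used                  # new S_i of the target node
    distribution.setdefault(table_id, target_ip)  # record a newly placed table
    return sum(remaining.values())                # new S_total


remaining = {"10.0.0.11": 500, "10.0.0.12": 300}
distribution: Dict[str, str] = {}
packet = make_packet("temperature=21.5", "dev0042_temperature", "10.0.0.12")
print(refresh_after_write(remaining, distribution,
                          "dev0042_temperature", "10.0.0.12", used=1))  # 799
print(distribution)  # {'dev0042_temperature': '10.0.0.12'}
```

Because the refreshed statistics feed the next step 7.1, nodes that fill up receive progressively fewer virtual buckets, which is the "dynamic weight" in the method's name.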
2. The time series data storage method based on a dynamic-weight-balanced time series database cluster according to claim 1, wherein in step 8 the publish-subscribe message system cluster kafka and the reader-writer influx-writer cooperate to write the data into the target influxdb as follows:
Step 8.1, the reader-writer influx-writer reads the data packet from the publish-subscribe message system cluster kafka and parses it to obtain the data to be written, the globally unique table ID(new), and the IP address of the target influxdb;
Step 8.2, the reader-writer influx-writer locates the target influxdb by its IP address and sends it the data to be written together with the corresponding globally unique table ID(new);
Step 8.3, the target influxdb checks whether it already contains a table whose globally unique ID is ID(new);
if such a table exists, the data to be written is written directly into that table;
if no such table exists, the target influxdb creates a new table whose globally unique ID is ID(new) and then writes the data into the newly created table.
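The writer side of claim 2 can be sketched with an in-memory stand-in for each influxdb node. `FakeNode`, `handle_packet` and the JSON packet layout (fields `data`, `table_id`, `target_ip`) are hypothetical; a real deployment would consume the packets from Kafka and call the InfluxDB write API instead:

```python
import json
from typing import Dict, List


class FakeNode:
    """In-memory stand-in for one influxdb node (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[str]] = {}

    def write(self, table_id: str, data: str) -> None:
        if table_id not in self.tables:     # step 8.3: create the table if missing
            self.tables[table_id] = []
        self.tables[table_id].append(data)  # then write the data into it


def handle_packet(packet: bytes, nodes: Dict[str, FakeNode]) -> None:
    msg = json.loads(packet.decode("utf-8"))  # step 8.1: parse the packet
    node = nodes[msg["target_ip"]]            # step 8.2: locate node by IP
    node.write(msg["table_id"], msg["data"])  # step 8.3: create-if-absent, write


nodes = {"10.0.0.11": FakeNode(), "10.0.0.12": FakeNode()}
pkt = json.dumps({"data": "temperature=21.5", "table_id": "dev0042_temperature",
                  "target_ip": "10.0.0.12"}).encode("utf-8")
handle_packet(pkt, nodes)
handle_packet(pkt, nodes)
print(nodes["10.0.0.12"].tables)
# {'dev0042_temperature': ['temperature=21.5', 'temperature=21.5']}
```

With a real InfluxDB node, writing a point to a measurement that does not yet exist creates the measurement implicitly, so the existence check in step 8.3 may reduce to a plain write.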
CN202210002029.8A 2022-01-04 2022-01-04 Time sequence data storage method based on dynamic weight balance time sequence database cluster Active CN114416737B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210002029.8A CN114416737B (en) 2022-01-04 2022-01-04 Time sequence data storage method based on dynamic weight balance time sequence database cluster

Publications (2)

Publication Number Publication Date
CN114416737A CN114416737A (en) 2022-04-29
CN114416737B true CN114416737B (en) 2022-08-05

Family

ID=81272102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210002029.8A Active CN114416737B (en) 2022-01-04 2022-01-04 Time sequence data storage method based on dynamic weight balance time sequence database cluster

Country Status (1)

Country Link
CN (1) CN114416737B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117725258A (en) * 2023-12-19 2024-03-19 北京中电兴发科技有限公司 Video storage planning and positioning read-write method based on space and time balance security protection

Citations (6)

Publication number Priority date Publication date Assignee Title
CN108769271A (en) * 2018-08-20 2018-11-06 北京百度网讯科技有限公司 Method, apparatus, storage medium and the terminal device of load balancing
CN111522665A (en) * 2020-04-24 2020-08-11 北京思特奇信息技术股份有限公司 Zookeeper-based method for realizing high availability and load balancing of Influxdb-proxy
CN112199419A (en) * 2020-10-09 2021-01-08 深圳市欢太科技有限公司 Distributed time sequence database, storage method, equipment and storage medium
CN112422611A (en) * 2020-09-11 2021-02-26 深圳市证通电子股份有限公司 Virtual bucket storage processing method and system based on distributed object storage
CN113282604A (en) * 2021-07-14 2021-08-20 北京远舢智能科技有限公司 High-availability time sequence database cluster system realized based on message queue
CN113655969A (en) * 2021-08-25 2021-11-16 北京中电兴发科技有限公司 Data balanced storage method based on streaming distributed storage system

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US7020713B1 (en) * 2000-10-10 2006-03-28 Novell, Inc. System and method for balancing TCP/IP/workload of multi-processor system based on hash buckets
US10120921B2 (en) * 2015-10-20 2018-11-06 Mastercard International Incorporated Parallel transfer of SQL data to software framework

Non-Patent Citations (1)

Title
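Wang Cheng (王诚) et al., "Consistent hashing load balancing optimization based on a greedy algorithm", Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition), 2018-06-30, Vol. 38, No. 3, pp. 89-97 *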

Similar Documents

Publication Publication Date Title
Zhou et al. Adaptive processing for distributed skyline queries over uncertain data
US5943677A (en) Sparsity management system for multi-dimensional databases
US7213025B2 (en) Partitioned database system
US6493728B1 (en) Data compression for records of multidimensional database
KR101792168B1 (en) Managing storage of individually accessible data units
US9576073B2 (en) Distance queries on massive networks
KR101266358B1 (en) A distributed index system based on multi-length signature files and method thereof
JP2004518226A (en) Database system and query optimizer
US6430565B1 (en) Path compression for records of multidimensional database
CN106095863B (en) A kind of multidimensional data query and storage system and method
CN110825733B (en) Multi-sampling-stream-oriented time series data management method and system
CN105975587A (en) Method for organizing and accessing memory database index with high performance
CN111475105B (en) Monitoring data storage method, monitoring data storage device, monitoring data server and storage medium
CN103294785A (en) Packet-based metadata server cluster management method
CN114416737B (en) Time sequence data storage method based on dynamic weight balance time sequence database cluster
CN113568906A (en) Distributed index structure and load balancing method for high-throughput data stream
Gou et al. Graph stream sketch: Summarizing graph streams with high speed and accuracy
CN111400301B (en) Data query method, device and equipment
US20120254245A1 (en) Relational database joins for inexact matching
CN107273443B (en) Mixed indexing method based on metadata of big data model
US9747363B1 (en) Efficient storage and retrieval of sparse arrays of identifier-value pairs
CN111666302A (en) User ranking query method, device, equipment and storage medium
CN117171161A (en) Data query method and device
CN113360551B (en) Method and system for storing and rapidly counting time sequence data in shooting range
US8645402B1 (en) Matching trip data to transportation network data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant