CN114116277A - InfluxDB high-availability cluster implementation system and method - Google Patents
- Publication number
- CN114116277A CN114116277A CN202111287679.3A CN202111287679A CN114116277A CN 114116277 A CN114116277 A CN 114116277A CN 202111287679 A CN202111287679 A CN 202111287679A CN 114116277 A CN114116277 A CN 114116277A
- Authority
- CN
- China
- Prior art keywords
- influxdb
- unit
- buffer
- node
- availability cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0727—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1461—Backup scheduling policy
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1464—Management of the backup or restore process for networked environments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1458—Management of the backup or restore process
- G06F11/1469—Backup restoration techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3034—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
- G06F11/3433—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
Abstract
The invention discloses a system and a method for implementing an InfluxDB high-availability cluster, and belongs to the technical field of big-data storage. The InfluxDB high-availability cluster implementation system comprises a Bufferson unit, a health-state check unit and a reverse proxy unit; the Bufferson unit serves as an internally buffered asynchronous HTTP proxy; the health-state check unit monitors the status of each node and automatically traverses and removes InfluxDB nodes while their data is being recovered; the reverse proxy unit is supported by Nginx to limit the number of HTTP requests a client can make per unit time. The system writes metrics to any number of InfluxDB nodes and distributes queries among all the nodes to provide a high-availability service, and has good popularization and application value.
Description
Technical Field
The invention relates to the technical field of big data storage, and particularly provides a system and a method for realizing an InfluxDB high-availability cluster.
Background
Since version 0.9, InfluxDB no longer allows users to build a high-availability cluster from the open-source free edition; clustering is available only in the commercial InfluxDB Enterprise product. This causes considerable inconvenience for InfluxDB users, especially in professional settings, and many of them feel that InfluxData, the company behind InfluxDB, is leveraging the open-source (OSS) edition to drive commercial revenue.
This is understandable from InfluxData's point of view, but the cost of the commercial edition of InfluxDB is not a small burden for many users, and it represents a significant expense for businesses and organizations that rely heavily on InfluxDB.
Although InfluxData later released the Influx Relay solution, it was not widely adopted because many problems remained unsolved. A solution is therefore needed that truly realizes a high-availability architecture and addresses the gaps in the market.
Disclosure of Invention
In view of the problems described above, the technical task of the present invention is to provide a system and a method for implementing an InfluxDB high-availability cluster that write metrics to any number of InfluxDB nodes and distribute queries among all the nodes to provide a high-availability service.
In order to achieve the purpose, the invention provides the following technical scheme:
an InfluxDB high-availability cluster implementation system comprises a Bufferson unit, a health-state check unit and a reverse proxy unit;
the Bufferson unit serves as an internally buffered asynchronous HTTP proxy;
the health-state check unit monitors the status of each node and automatically traverses and removes InfluxDB nodes while their data is being recovered;
the reverse proxy unit is supported by Nginx to limit the number of HTTP requests that a client can make per unit time.
Preferably, the Bufferson unit uses queues to provide temporary high-availability storage and offers a simple proxy function for asynchronously buffered HTTP processing.
Preferably, the Bufferson unit comprises a Replay component and a Recover component: the Replay component forwards each HTTP request directly to every upstream node and places failed requests into a buffer, while the Recover component continuously processes the queue and attempts to deliver the buffered requests.
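The Replay/Recover split described above can be sketched as follows — a minimal illustration only, not the patented implementation; the class name, the injected `send` callable, and all other identifiers are hypothetical:

```python
import queue

class BufferingRelay:
    """Forward each write to every upstream node; buffer failures for retry."""

    def __init__(self, nodes, send):
        self.nodes = list(nodes)      # upstream InfluxDB base URLs
        self.send = send              # callable(node, payload) -> bool
        self.buffer = queue.Queue()   # failed (node, payload) pairs

    def replay(self, payload):
        """Replay component: forward directly, queue whatever fails."""
        for node in self.nodes:
            if not self.send(node, payload):
                self.buffer.put((node, payload))

    def recover_once(self):
        """Recover component: one pass over the queue, re-queue on failure."""
        for _ in range(self.buffer.qsize()):
            node, payload = self.buffer.get()
            if not self.send(node, payload):
                self.buffer.put((node, payload))
```

In a real deployment the Recover loop would run continuously in the background; injecting `send` keeps the sketch testable without a live cluster.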
Preferably, when a request is sent to the Bufferson unit, the health-state check unit forwards it to an InfluxDB instance through a load-balancing mechanism.
InfluxDB exposes a /ping endpoint that conveniently verifies whether the service is running, but it must also be ensured that a node processes no queries while it is recovering from a temporary failure and its buffered data is still being flushed. Calling the /ping interface alone therefore cannot verify that a node is healthy. Instead, the health check is run locally, and the load balancer is used to switch the node on or off for queries.
Preferably, a local daemon runs on each InfluxDB instance and performs two checks: it calls the instance's /ping endpoint, and it asks Bufferson whether the node still has unrecovered data. The daemon reports success only when both checks succeed.
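The two-check daemon logic can be sketched as follows; the probes are stubbed as injected callables, and all names are illustrative assumptions rather than the patent's implementation:

```python
def node_is_healthy(ping_ok, bufferson_drained):
    """Healthy only when BOTH checks succeed:
    1. the node's /ping endpoint answers, and
    2. Bufferson reports no unrecovered (still-buffered) data for the node."""
    return bool(ping_ok()) and bool(bufferson_drained())

def health_states(nodes, ping, drained):
    """Compute the on/off state the load balancer should apply per node."""
    return {
        n: ("on" if node_is_healthy(lambda: ping(n), lambda: drained(n)) else "off")
        for n in nodes
    }
```

A node that answers /ping but still has buffered data pending recovery is kept "off" for queries, which is exactly why /ping alone is insufficient.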
Preferably, in the reverse proxy unit, reasonable load distribution is achieved by routing all traffic through Nginx and its ngx_http_limit_req_module.
Some clients have extreme access patterns, and a reasonable load distribution is needed to avoid cluster problems. Routing all traffic through Nginx and its ngx_http_limit_req_module largely ensures the stability of the cluster under such extreme access patterns.
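A configuration of this kind might look as follows — a hypothetical Nginx fragment using the standard ngx_http_limit_req_module; the zone names, rates, ports, and upstream addresses are illustrative assumptions, not values from the patent:

```nginx
# Global per-client limit, plus a per-database limit keyed on the
# "db" query argument that InfluxDB 1.x requests carry (?db=...).
limit_req_zone $binary_remote_addr zone=per_client:10m rate=100r/s;
limit_req_zone $arg_db             zone=per_db:10m     rate=500r/s;

upstream influxdb_cluster {
    server 10.0.0.1:8086;
    server 10.0.0.2:8086;
}

server {
    listen 8086;

    location /write {
        limit_req zone=per_client burst=200 nodelay;
        limit_req zone=per_db     burst=500;
        proxy_pass http://influxdb_cluster;
    }

    location /query {
        limit_req zone=per_client burst=50;
        proxy_pass http://influxdb_cluster;
    }
}
```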
Metrics are written to any number of InfluxDB nodes, and queries are distributed among all nodes to provide a high-availability service. If a tool can be built to run reliable health checks on a single node, a standard load balancer is sufficient to solve the latter problem (query distribution); for the former (writes), a mechanism must be established to forward or replicate the writes.
Techniques for adding high availability and failover to InfluxDB include:
1. solving the write-availability problem by writing each metric redundantly to several independent nodes;
2. resolving temporary faults with buffered payloads;
3. solving permanent faults with backup restoration plus the buffered payloads;
4. addressing traffic peaks with global and per-database rate limiting.
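Technique 1 above (redundant writes to independent nodes) can be sketched as follows; the function and the injected `post` transport are hypothetical illustrations, not the patented mechanism:

```python
def fan_out_write(nodes, payload, post):
    """Write the same line-protocol payload to every independent node.

    Returns the subset of nodes that acknowledged the write; the write is
    considered available as long as at least one node accepted it."""
    acked = [n for n in nodes if post(n, payload)]
    if not acked:
        raise RuntimeError("no InfluxDB node accepted the write")
    return acked
```

Combined with the buffering of techniques 2 and 3, nodes that miss a write here would later be caught up from the buffer or from a restored backup.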
A Bufferson unit, a health-check unit and a reverse proxy unit are added in front of the InfluxDB-backed storage tier of the monitoring stack.
The invention further relates to an InfluxDB high-availability cluster implementation method, carried out by the above implementation system: metrics are written to any number of InfluxDB nodes and queries are distributed among all the nodes; the Bufferson unit serves as an internally buffered asynchronous HTTP proxy; the health-state check unit monitors the status of each node and automatically traverses and removes InfluxDB nodes while their data is being recovered; and the reverse proxy unit is supported by Nginx to limit the number of HTTP requests a client can send per unit time.
Preferably, a scheduled task backs up the data periodically with Rsync. When a temporary fault occurs, Bufferson-Recover continuously extracts data from the buffer and delivers it once the node is available again. To recover a failed node, an instance is started and added to Bufferson, the backup is restored, InfluxDB is started, and Bufferson begins delivering the requests that accumulated in the buffer since the backup.
Compared with the prior art, the InfluxDB high-availability cluster implementation method has the following outstanding beneficial effects: it realizes an InfluxDB high-availability cluster, increases stability and safety, and has good popularization and application value.
Drawings
Fig. 1 is a topology diagram of the InfluxDB high-availability cluster implementation system according to the present invention.
Detailed Description
The system and method for implementing an InfluxDB high-availability cluster according to the present invention will be described in detail below with reference to the accompanying drawing and an embodiment.
Examples
As shown in Fig. 1, the system for implementing an InfluxDB high-availability cluster of the present invention comprises a Bufferson unit, a health-state check unit and a reverse proxy unit.
The Bufferson unit serves as an internally buffered asynchronous HTTP proxy.
The Bufferson unit comprises a Replay component, which forwards HTTP requests directly to each upstream node and places failed requests into a buffer, and a Recover component, which continuously processes the queue and attempts to deliver the buffered requests. The Bufferson unit uses queues to provide temporary high-availability storage, offering a simple proxy function for asynchronously buffered HTTP processing.
The health-state check unit monitors the status of each node and automatically traverses and removes InfluxDB nodes while their data is being recovered.
When a request is sent to the Bufferson unit, the health-state check unit forwards it to an InfluxDB instance through a load-balancing mechanism.
InfluxDB exposes a /ping endpoint that conveniently verifies whether the service is running, but it must also be ensured that a node processes no queries while it is recovering from a temporary failure and its buffered data is still being flushed. Calling the /ping interface alone therefore cannot verify that a node is healthy. Instead, the health check is run locally, and the load balancer is used to switch the node on or off for queries. A local daemon runs on each InfluxDB instance and performs two checks: it calls the instance's /ping endpoint, and it asks Bufferson whether the node still has unrecovered data; the daemon reports success only when both checks succeed.
The reverse proxy unit is supported by Nginx to limit the number of HTTP requests that a client can make per unit time.
In the reverse proxy unit, reasonable load distribution is achieved by routing all traffic through Nginx and its ngx_http_limit_req_module.
Some clients have extreme access patterns, and a reasonable load distribution is needed to avoid cluster problems. Routing all traffic through Nginx and its ngx_http_limit_req_module largely ensures the stability of the cluster under such extreme access patterns.
Metrics are written to any number of InfluxDB nodes, and queries are distributed among all nodes to provide a high-availability service. If a tool can be built to run reliable health checks on a single node, a standard load balancer is sufficient to solve the latter problem (query distribution); for the former (writes), a mechanism must be established to forward or replicate the writes.
Wherein the techniques for adding high availability and failover to InfluxDB include:
1. solving the write-availability problem by writing each metric redundantly to several independent nodes;
2. resolving temporary faults with buffered payloads;
3. solving permanent faults with backup restoration plus the buffered payloads;
4. addressing traffic peaks with global and per-database rate limiting.
A Bufferson unit, a health-check unit and a reverse proxy unit are added in front of the InfluxDB-backed storage tier of the monitoring stack.
The InfluxDB high-availability cluster implementation method of the invention is carried out by the implementation system described above. Metrics are written to any number of InfluxDB nodes and queries are distributed among all the nodes; the Bufferson unit serves as an internally buffered asynchronous HTTP proxy; the health-state check unit monitors the status of each node and automatically traverses and removes InfluxDB nodes while their data is being recovered; and the reverse proxy unit is supported by Nginx to limit the number of HTTP requests a client can send per unit time.
A scheduled task backs up the data periodically with Rsync. When a temporary fault occurs, Bufferson-Recover continuously extracts data from the buffer and delivers it once the node is available again. To recover a failed node, an instance is started and added to Bufferson, the backup is restored, InfluxDB is started, and Bufferson begins delivering the requests that accumulated in the buffer since the backup.
The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.
Claims (8)
1. An InfluxDB high-availability cluster implementation system, characterized in that: the system comprises a Bufferson unit, a health-state check unit and a reverse proxy unit;
the Bufferson unit serves as an internally buffered asynchronous HTTP proxy;
the health-state check unit monitors the status of each node and automatically traverses and removes InfluxDB nodes while their data is being recovered;
the reverse proxy unit is supported by Nginx to limit the number of HTTP requests that a client can make per unit time.
2. The InfluxDB high availability cluster implementation system of claim 1, wherein: the Bufferson unit uses queues to provide temporary high-availability storage and offers a simple proxy function for asynchronously buffered HTTP processing.
3. The InfluxDB high availability cluster implementation system of claim 2, wherein: the Bufferson unit comprises a Replay component and a Recover component; the Replay component forwards each HTTP request directly to every upstream node and places failed requests into a buffer, and the Recover component continuously processes the queue and attempts to deliver the buffered requests.
4. The InfluxDB high availability cluster implementation system of claim 3, wherein: when a request is sent to the Bufferson unit, the health-state check unit forwards it to an InfluxDB instance through a load-balancing mechanism.
5. The InfluxDB high availability cluster implementation system of claim 4, wherein: a local daemon runs on each InfluxDB instance and performs two checks: it calls the instance's /ping endpoint, and it asks Bufferson whether the node still has unrecovered data; success is returned only when both checks succeed.
6. The InfluxDB high availability cluster implementation system of claim 5, wherein: in the reverse proxy unit, reasonable load distribution is achieved by routing all traffic through Nginx and its ngx_http_limit_req_module.
7. A method for implementing an InfluxDB high-availability cluster, characterized in that: the method is carried out by the InfluxDB high-availability cluster implementation system according to any one of claims 1 to 6; metrics are written to any number of InfluxDB nodes and queries are distributed among all the nodes; the Bufferson unit serves as an internally buffered asynchronous HTTP proxy; the health-state check unit monitors the status of each node and automatically traverses and removes InfluxDB nodes while their data is being recovered; and the reverse proxy unit is supported by Nginx to limit the number of HTTP requests a client can send per unit time.
8. The InfluxDB high availability cluster implementation method of claim 7, wherein: a scheduled task backs up the data periodically with Rsync; when a temporary fault occurs, Bufferson-Recover continuously extracts data from the buffer and delivers it once the node is available again; to recover a failed node, an instance is started and added to Bufferson, the backup is restored, InfluxDB is started, and Bufferson begins delivering the requests that accumulated in the buffer since the backup.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111287679.3A CN114116277A (en) | 2021-11-02 | 2021-11-02 | InfluxDB high-availability cluster implementation system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111287679.3A CN114116277A (en) | 2021-11-02 | 2021-11-02 | InfluxDB high-availability cluster implementation system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114116277A true CN114116277A (en) | 2022-03-01 |
Family
ID=80380305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111287679.3A Pending CN114116277A (en) | 2021-11-02 | 2021-11-02 | InfluxDB high-availability cluster implementation system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114116277A (en) |
- 2021-11-02: application CN202111287679.3A filed in China (published as CN114116277A); status: Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7702947B2 (en) | System and method for enabling site failover in an application server environment | |
US8886796B2 (en) | Load balancing when replicating account data | |
US11075795B2 (en) | Arbitration method, apparatus, and system used in active-active data centers | |
US9110837B2 (en) | System and method for creating and maintaining secondary server sites | |
US10482104B2 (en) | Zero-data loss recovery for active-active sites configurations | |
US9785691B2 (en) | Method and apparatus for sequencing transactions globally in a distributed database cluster | |
CN103226502B (en) | A kind of data calamity is for control system and data reconstruction method | |
US10831741B2 (en) | Log-shipping data replication with early log record fetching | |
KR101315330B1 (en) | System and method to maintain coherence of cache contents in a multi-tier software system aimed at interfacing large databases | |
US8856091B2 (en) | Method and apparatus for sequencing transactions globally in distributed database cluster | |
Adya et al. | Thialfi: a client notification service for internet-scale applications | |
US20140108532A1 (en) | System and method for supporting guaranteed multi-point delivery in a distributed data grid | |
CN105493474B (en) | System and method for supporting partition level logging for synchronizing data in a distributed data grid | |
US9058304B2 (en) | Continuous workload availability between sites at unlimited distances | |
US9319267B1 (en) | Replication in assured messaging system | |
US20120278429A1 (en) | Cluster system, synchronization controlling method, server, and synchronization controlling program | |
US20120259968A1 (en) | Continuous availability between sites at unlimited distances | |
CN105069152A (en) | Data processing method and apparatus | |
CN109167690A (en) | A kind of restoration methods, device and the relevant device of the service of distributed system interior joint | |
CN117061535A (en) | Multi-activity framework data synchronization method, device, computer equipment and storage medium | |
CN108390919A (en) | A kind of message synchronization system and method for highly reliable two-node cluster hot backup | |
CN114116277A (en) | InfluxDB high-availability cluster implementation system and method | |
CN111414411A (en) | High availability database system | |
WO2008054388A1 (en) | System and method for network disaster recovery | |
Davis et al. | Pro SQL Server 2008 Mirroring |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |