CN114116277A - InfluxDB high-availability cluster implementation system and method

Info

Publication number
CN114116277A
Authority
CN
China
Prior art keywords
influxdb
unit
buffer
node
availability cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111287679.3A
Other languages
Chinese (zh)
Inventor
高翔宇
曹博
吴楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-11-02
Filing date
2021-11-02
Publication date
2022-03-01
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202111287679.3A
Publication of CN114116277A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793Remedial or corrective actions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1461Backup scheduling policy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1464Management of the backup or restore process for networked environments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1458Management of the backup or restore process
    • G06F11/1469Backup restoration techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3055Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The invention discloses a system and a method for implementing an InfluxDB high-availability cluster, and belongs to the technical field of big data storage. The system comprises a Bufferson unit, a health status check unit and a reverse proxy unit. The Bufferson unit is an asynchronous HTTP proxy with internal buffering; the health status check unit monitors the health status of each node and automatically takes an InfluxDB node out of rotation while it is recovering data; the reverse proxy unit is backed by Nginx and limits the number of HTTP requests that a client can make per unit time. The system writes metrics to any number of InfluxDB nodes and distributes queries among all nodes to provide a highly available service, and has good popularization and application value.

Description

InfluxDB high-availability cluster implementation system and method
Technical Field
The invention relates to the technical field of big data storage, and particularly provides a system and a method for realizing an InfluxDB high-availability cluster.
Background
Since InfluxDB v0.9, users can no longer build an InfluxDB high-availability cluster from the free open-source edition; clustering is available only in the commercial InfluxDB Enterprise edition. This causes considerable inconvenience for InfluxDB users, especially in professional settings, and some of them feel that InfluxData, the company behind InfluxDB, is trying to profit from an open-source solution.
This position is understandable from InfluxData's point of view, but the cost of the commercial edition is no small burden for many users, and it is a significant cost for businesses or organizations that rely heavily on InfluxDB.
Although InfluxData later released InfluxDB Relay, it was not widely adopted because it left many problems unsolved. A solution is therefore needed that delivers a truly highly available architecture and addresses these shortcomings.
Disclosure of Invention
In view of the problems above, the technical task of the present invention is to provide a system and a method for implementing an InfluxDB high-availability cluster that write metrics to any number of InfluxDB nodes and distribute queries among all nodes to provide a highly available service.
To achieve this purpose, the invention provides the following technical solution:
an InfluxDB high-availability cluster implementation system comprises a Bufferson unit, a health status check unit and a reverse proxy unit;
the Bufferson unit is an asynchronous HTTP proxy with internal buffering;
the health status check unit is used for monitoring the health status of each node and automatically taking an InfluxDB node out of rotation while it is recovering data;
the reverse proxy unit is backed by Nginx to limit the number of HTTP requests that a client can make per unit time.
Preferably, the Bufferson unit uses queues to provide temporary, highly available storage, and provides a simple proxy function for asynchronously buffered HTTP processing.
Preferably, the Bufferson unit comprises a Replay-component and a Recover-component. The Replay-component forwards each HTTP request directly to every upstream node and puts failed requests into a buffer; the Recover-component continuously processes the queue and attempts to deliver the buffered requests.
Preferably, when a request is sent to the Bufferson unit, the health status check unit forwards it to an InfluxDB instance by means of a load-balancing mechanism.
InfluxDB supports a /ping endpoint, which makes it easy to verify whether the service is running. However, while a node is recovering from a temporary failure and the buffered data is still being flushed to it, the node must not process any queries. The /ping endpoint alone therefore cannot be relied on to verify that a node is healthy. Instead, the health check runs locally, and the load balancer uses its result to switch the node on or off for queries.
Preferably, a local daemon runs on each InfluxDB instance and performs two checks: it calls the InfluxDB /ping endpoint, and it asks Bufferson whether the node still has unrecovered data. The daemon reports success only if both checks succeed.
Preferably, in the reverse proxy unit, reasonable load distribution is achieved by passing all traffic through Nginx and its ngx_http_limit_req_module.
Some clients have extreme access patterns, and reasonable load distribution must be ensured to avoid problems in the cluster. Passing all traffic through Nginx and its ngx_http_limit_req_module keeps the cluster considerably more stable under such access patterns.
Metrics are written to any number of InfluxDB nodes, and queries are distributed among all nodes, to provide a highly available service. Given a tool that runs reliable health checks on each individual node, a standard load balancer is sufficient for distributing queries. For writes, a mechanism must be established to forward or replicate the written data.
Techniques for adding high availability and failover to InfluxDB include:
1. achieving high write availability by writing the same metrics to a plurality of independent nodes;
2. handling temporary failures with buffered payloads;
3. handling permanent failures with backup restoration combined with buffered payloads;
4. handling traffic peaks with global and per-database rate limiting.
The Bufferson unit, the health status check unit and the reverse proxy unit are added to the InfluxDB-backed storage tier of the monitoring stack.
The invention also relates to an InfluxDB high-availability cluster implementation method, which is implemented by the above InfluxDB high-availability cluster implementation system: metrics are written to any number of InfluxDB nodes and queries are distributed among all nodes; the Bufferson unit serves as an asynchronous HTTP proxy with internal buffering; the health status check unit monitors the status of each node and automatically takes an InfluxDB node out of rotation while it is recovering data; and the reverse proxy unit is backed by Nginx to limit the number of HTTP requests that a client can send per unit time.
Preferably, a scheduled task uses Rsync to back up the data at regular intervals. When a temporary failure occurs, the Bufferson Recover-component keeps pulling data from the buffer and delivers it once the node is available again. To replace a failed node, a new instance is started and added to Bufferson, the backup is restored, InfluxDB is started, and Bufferson then begins delivering the buffered requests to bring the restored node up to date.
Compared with the prior art, the InfluxDB high-availability cluster implementation method of the invention has the following outstanding beneficial effects: it realizes an InfluxDB high-availability cluster, improves stability and security, and has good popularization and application value.
Drawings
Fig. 1 is a topology diagram of the InfluxDB high-availability cluster implementation system according to the present invention.
Detailed Description
The system and method for implementing an InfluxDB high-availability cluster according to the present invention are described in detail below with reference to the accompanying drawing and embodiments.
Examples
As shown in Fig. 1, the InfluxDB high-availability cluster implementation system of the present invention includes a Bufferson unit, a health status check unit, and a reverse proxy unit.
The Bufferson unit is an asynchronous HTTP proxy with internal buffering.
The Bufferson unit includes a Replay-component, which forwards HTTP requests directly to each upstream node and places failed requests into a buffer, and a Recover-component, which continuously processes the queue and attempts to deliver the buffered requests. The Bufferson unit uses queues to provide temporary, highly available storage, and provides a simple proxy function for asynchronously buffered HTTP processing.
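The relay-and-recover behaviour described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `BufferingProxy`, the methods `relay`, `recover` and `is_recovering`, and the `send` callback (which stands in for the real HTTP call) are all hypothetical. Queueing new writes behind an existing backlog, so that a node receives them in arrival order, is an assumption the source does not state.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

# Hypothetical signature: deliver one payload to one node, return True on success.
# A real proxy would issue an HTTP request to the InfluxDB write endpoint here.
Sender = Callable[[str, bytes], bool]


class BufferingProxy:
    """Relay writes to every upstream node and buffer the ones that fail."""

    def __init__(self, nodes: List[str], send: Sender) -> None:
        self.nodes = nodes
        self.send = send
        # One FIFO queue per node keeps failed requests in arrival order.
        self.buffers: Dict[str, Deque[bytes]] = {n: deque() for n in nodes}

    def relay(self, payload: bytes) -> None:
        """Forward a write to each node; queue it where delivery fails."""
        for node in self.nodes:
            # Assumption: if a node already has a backlog, new writes are
            # appended to it instead of being sent out of order.
            if self.buffers[node] or not self.send(node, payload):
                self.buffers[node].append(payload)

    def recover(self, node: str) -> int:
        """Try to drain one node's queue; stop at the first failed delivery."""
        delivered = 0
        queue = self.buffers[node]
        while queue:
            if not self.send(node, queue[0]):
                break
            queue.popleft()
            delivered += 1
        return delivered

    def is_recovering(self, node: str) -> bool:
        """True while the node still has buffered, undelivered requests."""
        return bool(self.buffers[node])
```

In this sketch a background loop would call `recover` for each node repeatedly, and `is_recovering` is the signal a health check could use to keep a node out of query rotation until its backlog is empty.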
The health status check unit is used for monitoring the health status of each node and automatically taking an InfluxDB node out of rotation while it is recovering data.
When a request is sent to the Bufferson unit, the health status check unit forwards it to an InfluxDB instance through a load-balancing mechanism.
InfluxDB supports a /ping endpoint, which makes it easy to verify whether the service is running. However, while a node is recovering from a temporary failure and the buffered data is still being flushed to it, the node must not process any queries. The /ping endpoint alone therefore cannot be relied on to verify that a node is healthy. Instead, the health check runs locally, and the load balancer uses its result to switch the node on or off for queries. A local daemon runs on each InfluxDB instance and performs two checks: it calls the InfluxDB /ping endpoint, and it asks Bufferson whether the node still has unrecovered data. The daemon reports success only if both checks succeed.
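A minimal Python sketch of this two-part health check follows. The function names `influxdb_ping_ok` and `node_is_healthy` are hypothetical. The source does not say how Bufferson reports a pending backlog, so that check is passed in as a callable rather than implemented.

```python
import urllib.error
import urllib.request
from typing import Callable


def influxdb_ping_ok(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if the node's /ping endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(f"{base_url}/ping", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat the node as down.
        return False


def node_is_healthy(ping_ok: Callable[[], bool],
                    recovering: Callable[[], bool]) -> bool:
    """A node may serve queries only if it is up AND has no replay backlog."""
    return ping_ok() and not recovering()
```

A local daemon could expose the result of `node_is_healthy` on its own HTTP endpoint, which the load balancer polls to decide whether to route queries to the node.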
The reverse proxy unit is backed by Nginx to limit the number of HTTP requests that a client can make per unit time.
In the reverse proxy unit, reasonable load distribution is achieved by passing all traffic through Nginx and its ngx_http_limit_req_module.
Some clients have extreme access patterns, and reasonable load distribution must be ensured to avoid problems in the cluster. Passing all traffic through Nginx and its ngx_http_limit_req_module keeps the cluster considerably more stable under such access patterns.
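The idea behind per-client request limiting can be shown with a small leaky-bucket sketch in Python. This only illustrates the general technique and is not Nginx's code or configuration; the class `LeakyBucketLimiter`, its `rate` and `burst` parameters, and the explicit `now` argument (used to keep the example deterministic) are assumptions made for the example.

```python
from typing import Dict, Tuple


class LeakyBucketLimiter:
    """Per-client limiter: `rate` requests per second plus `burst` extra requests."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        # client key -> (current bucket level, time of last update)
        self.state: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str, now: float) -> bool:
        """Return True if the request is admitted, False if it is rejected."""
        level, last = self.state.get(client, (0.0, now))
        # The bucket drains at `rate` requests per second.
        level = max(0.0, level - (now - last) * self.rate)
        if level > self.burst:
            # Bucket is full: reject and leave the level unchanged.
            self.state[client] = (level, now)
            return False
        self.state[client] = (level + 1.0, now)
        return True
```

With `rate=1.0` and `burst=2`, a client can send three requests at once, after which further requests are rejected until the bucket has drained. Other clients are unaffected because each key has its own bucket.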
Metrics are written to any number of InfluxDB nodes, and queries are distributed among all nodes, to provide a highly available service. Given a tool that runs reliable health checks on each individual node, a standard load balancer is sufficient for distributing queries. For writes, a mechanism must be established to forward or replicate the written data.
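As an illustration of query distribution, the following Python sketch rotates queries over the nodes that currently pass their health check. The source only says that a standard load balancer is sufficient; round-robin is one common policy chosen here as an assumption, and `QueryBalancer`, `pick` and the `is_healthy` callback are hypothetical names.

```python
from itertools import count
from typing import Callable, List, Optional


class QueryBalancer:
    """Round-robin over the nodes that currently pass their health check."""

    def __init__(self, nodes: List[str], is_healthy: Callable[[str], bool]) -> None:
        self.nodes = nodes
        self.is_healthy = is_healthy
        self._counter = count()

    def pick(self) -> Optional[str]:
        """Return the next healthy node, or None if no node is available."""
        healthy = [n for n in self.nodes if self.is_healthy(n)]
        if not healthy:
            return None
        return healthy[next(self._counter) % len(healthy)]
```

A node that is still replaying buffered writes would report unhealthy and therefore receive no queries until it has caught up.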
The techniques for adding high availability and failover to InfluxDB include:
1. achieving high write availability by writing the same metrics to a plurality of independent nodes;
2. handling temporary failures with buffered payloads;
3. handling permanent failures with backup restoration combined with buffered payloads;
4. handling traffic peaks with global and per-database rate limiting.
The Bufferson unit, the health status check unit and the reverse proxy unit are added to the InfluxDB-backed storage tier of the monitoring stack.
The InfluxDB high-availability cluster implementation method is implemented by the InfluxDB high-availability cluster implementation system. Metrics are written to any number of InfluxDB nodes and queries are distributed among all nodes; the Bufferson unit serves as an asynchronous HTTP proxy with internal buffering; the health status check unit monitors the status of each node and automatically takes an InfluxDB node out of rotation while it is recovering data; and the reverse proxy unit is backed by Nginx to limit the number of HTTP requests that a client can send per unit time.
A scheduled task uses Rsync to back up the data at regular intervals. When a temporary failure occurs, the Bufferson Recover-component keeps pulling data from the buffer and delivers it once the node is available again. To replace a failed node, a new instance is started and added to Bufferson, the backup is restored, InfluxDB is started, and Bufferson then begins delivering the buffered requests to bring the restored node up to date.
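The backup and node-replacement sequence can be sketched in Python as follows. The paths, the host name, the step names in `RECOVERY_ORDER` and the helper functions are hypothetical; only the use of Rsync and the order of the steps come from the text. The rsync options `-a` (archive mode) and `--delete` (remove files absent from the source) are standard, but choosing them here is an assumption.

```python
import subprocess
from typing import Callable, Dict, List


def rsync_backup_command(data_dir: str, backup_target: str) -> List[str]:
    """Build an rsync invocation that mirrors the data directory to a target."""
    # The trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    return ["rsync", "-a", "--delete", data_dir.rstrip("/") + "/", backup_target]


def run_backup(data_dir: str, backup_target: str) -> None:
    """Run the backup; intended to be called from a scheduled task such as cron."""
    subprocess.run(rsync_backup_command(data_dir, backup_target), check=True)


# Step order taken from the description: the new instance is registered with
# the buffering proxy before the backup is restored, and the buffered
# requests are delivered last.
RECOVERY_ORDER: List[str] = [
    "start_instance",
    "register_with_bufferson",
    "restore_backup",
    "start_influxdb",
    "replay_buffer",
]


def replace_failed_node(actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run the recovery steps in order and return the names of completed steps."""
    done: List[str] = []
    for step in RECOVERY_ORDER:
        actions[step]()
        done.append(step)
    return done
```

Because an exception in any action stops the loop, a failed restore in this sketch never proceeds to replaying the buffer against an incomplete node.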
The above-described embodiments are merely preferred embodiments of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (8)

1. An InfluxDB high-availability cluster implementation system, characterized in that: the system comprises a Bufferson unit, a health status check unit and a reverse proxy unit;
the Bufferson unit is an asynchronous HTTP proxy with internal buffering;
the health status check unit is used for monitoring the health status of each node and automatically taking an InfluxDB node out of rotation while it is recovering data;
the reverse proxy unit is backed by Nginx to limit the number of HTTP requests that a client can make per unit time.
2. The InfluxDB high-availability cluster implementation system of claim 1, wherein: the Bufferson unit uses queues to provide temporary, highly available storage, and provides a simple proxy function for asynchronously buffered HTTP processing.
3. The InfluxDB high-availability cluster implementation system of claim 2, wherein: the Bufferson unit comprises a Replay-component and a Recover-component, wherein the Replay-component forwards each HTTP request directly to every upstream node and places failed requests into a buffer, and the Recover-component continuously processes the queue and attempts to deliver the buffered requests.
4. The InfluxDB high-availability cluster implementation system of claim 3, wherein: when a request is sent to the Bufferson unit, the health status check unit forwards it to an InfluxDB instance through a load-balancing mechanism.
5. The InfluxDB high-availability cluster implementation system of claim 4, wherein: a local daemon runs on each InfluxDB instance and performs two checks, namely calling the InfluxDB /ping endpoint and asking Bufferson whether the node still has unrecovered data, and success is returned only when both checks succeed.
6. The InfluxDB high-availability cluster implementation system of claim 5, wherein: in the reverse proxy unit, reasonable load distribution is achieved by passing all traffic through Nginx and its ngx_http_limit_req_module.
7. An InfluxDB high-availability cluster implementation method, characterized in that: the method is implemented by the InfluxDB high-availability cluster implementation system according to any one of claims 1 to 6; metrics are written to any number of InfluxDB nodes and queries are distributed among all nodes; the Bufferson unit serves as an asynchronous HTTP proxy with internal buffering; the health status check unit monitors the health status of each node and automatically takes an InfluxDB node out of rotation while it is recovering data; and the reverse proxy unit is backed by Nginx to limit the number of HTTP requests that a client can send per unit time.
8. The InfluxDB high-availability cluster implementation method of claim 7, wherein: a scheduled task uses Rsync to back up the data at regular intervals; when a temporary failure occurs, the Bufferson Recover-component keeps pulling data from the buffer and delivers it once the node is available again; to replace a failed node, a new instance is started and added to Bufferson, the backup is restored, InfluxDB is started, and Bufferson then begins delivering the buffered requests to bring the restored node up to date.
CN202111287679.3A 2021-11-02 2021-11-02 InfluxDB high-availability cluster implementation system and method Pending CN114116277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111287679.3A CN114116277A (en) 2021-11-02 2021-11-02 InfluxDB high-availability cluster implementation system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111287679.3A CN114116277A (en) 2021-11-02 2021-11-02 InfluxDB high-availability cluster implementation system and method

Publications (1)

Publication Number Publication Date
CN114116277A true CN114116277A (en) 2022-03-01

Family

ID=80380305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111287679.3A Pending CN114116277A (en) 2021-11-02 2021-11-02 InfluxDB high-availability cluster implementation system and method

Country Status (1)

Country Link
CN (1) CN114116277A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination