CN115883547A

CN115883547A - High-availability NiFi deployment method and system based on DRBD

Info

Publication number: CN115883547A
Application number: CN202211423470.XA
Authority: CN
Inventors: 孙亮亮; 张栋; 李国涛; 胡清
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2022-11-15
Filing date: 2022-11-15
Publication date: 2023-03-31

Abstract

The invention relates to the technical field of high-availability deployment, and in particular to a DRBD-based high-availability deployment method for NiFi, comprising the following steps: deploying a Pacemaker-managed cluster consisting of three compute instances, installing NiFi and DRBD on each of a primary instance and a standby instance, and installing an arbitration device on the third instance; installing and configuring DRBD and NiFi; and configuring Pacemaker to manage the cluster services. The beneficial effects are as follows: the DRBD-based NiFi high-availability deployment method and system improve the safety and reliability of NiFi configuration data, protect the data against loss, and keep NiFi tasks running continuously and reliably; operation and maintenance management is simplified, the stability of the platform product is enhanced, and high availability of the service product is ensured; the architecture offers efficient management, automatic failover and stable service operation, together with automatic primary-standby synchronization, safe and reliable data, and low storage cost; and it prevents split brain, avoiding data corruption and inconsistent system state.

Description

High-availability NiFi deployment method and system based on DRBD

Technical Field

The invention relates to the technical field of high-availability deployment, and in particular to a DRBD-based high-availability deployment method and system for NiFi.

Background

With the development of big data technology, the number of distributed data storage systems keeps growing. Big data applications generally need to integrate several different data storage systems to build data warehouses for different applications, and ETL describes the process of extracting (extract), transforming (transform) and loading (load) data from a source data warehouse into a target data warehouse. In general, an ETL tool is responsible for scheduling and controlling the programs the system runs and for allocating resources. Apache NiFi is an easy-to-use, powerful and reliable system for processing and distributing data.

In the prior art, during development of a big data project, if a NiFi node goes down, loses its connection or otherwise fails, the NiFi task is terminated, which disrupts service processing. In NiFi cluster mode, if a node is disconnected or goes down before the flow files it holds have been fully processed, data may be lost.

Disclosure of Invention

The present invention aims to provide a DRBD-based high-availability NiFi deployment method and system, so as to solve the problems described in the background.

To achieve this purpose, the invention provides the following technical solution: a DRBD-based high-availability NiFi deployment method, comprising the following steps:

deploying a Pacemaker-managed cluster consisting of three compute instances, installing NiFi and DRBD on each of a primary instance and a standby instance, and installing an arbitration device on the third instance;

installing and configuring DRBD and NiFi;

and configuring Pacemaker to manage the cluster services.

Preferably, in a two-node cluster a single vote is enough to determine the active node; if the two nodes lose their connection to each other, there is a risk that more than one node regards itself as the active node; the arbitration device acts as an arbiter and elects a single running node by voting; when the primary and standby instances cannot communicate with each other, the arbitration node can still communicate with each of them, so that a majority-vote mechanism is achieved.

Preferably, Pacemaker is installed and configured to take charge of full-life-cycle management of the software services in the cluster: Pacemaker is installed on the primary and standby instance nodes, Pacemaker's arbitration device is installed on the arbitration device node, the Pacemaker-managed cluster is configured, and the three compute instance nodes are brought under Pacemaker's node resource management with their node state online.

Preferably, DRBD is initialized and configured, and DRBD replication from the primary instance node to the standby instance node is set up; DRBD is installed and configured in diskless arbitration mode on the arbitration node, since the arbitration setup requires at least three DRBD nodes; after the DRBD setup is complete, the DRBD-synchronized disk is mounted at NiFi's data storage file system directory, so that NiFi's data is synchronized to the standby instance node through DRBD.

Preferably, the cluster services DRBD, NiFi and VIP are added to Pacemaker's resource management configuration in accordance with Pacemaker's resource access specification, and Pacemaker manages them uniformly and schedules which instance node runs DRBD, NiFi and the VIP.

A DRBD-based high-availability NiFi deployment system, consisting of a deployment module, a building module and a management module;

the deployment module is used for deploying a Pacemaker-managed cluster consisting of three compute instances, installing NiFi and DRBD on each of the primary instance and the standby instance, and installing an arbitration device on the third instance;

the building module is used for installing and configuring DRBD and NiFi;

and the management module is used for configuring Pacemaker to manage the cluster services.

Preferably, in the deployment module, in a two-node cluster a single vote is enough to determine the active node; if the two nodes lose their connection to each other, there is a risk that more than one node regards itself as the active node; the arbitration device acts as an arbiter and elects a single running node by voting; when the primary and standby instances cannot communicate with each other, the arbitration node can still communicate with each of them, so that a majority-vote mechanism is achieved.

Preferably, in the deployment module, Pacemaker is installed and configured to take charge of full-life-cycle management of the software services in the cluster: Pacemaker is installed on the primary and standby instance nodes, Pacemaker's arbitration device is installed on the arbitration device node, the Pacemaker-managed cluster is configured, and the three compute instance nodes are brought under Pacemaker's node resource management with their node state online.

Preferably, in the building module, DRBD is initialized and configured, and DRBD replication from the primary instance node to the standby instance node is set up; DRBD is installed and configured in diskless arbitration mode on the arbitration node, since the arbitration setup requires at least three DRBD nodes; after the DRBD setup is complete, the DRBD-synchronized disk is mounted at NiFi's data storage file system directory, so that NiFi's data is synchronized to the standby instance node through DRBD.

Preferably, in the management module, the cluster services DRBD, NiFi and VIP are added to Pacemaker's resource management configuration in accordance with Pacemaker's resource access specification, and Pacemaker manages them uniformly and schedules which instance node runs DRBD, NiFi and the VIP.

Compared with the prior art, the invention has the following beneficial effects:

the DRBD-based NiFi high-availability deployment method and system improve the safety and reliability of NiFi configuration data, protect the data against loss, and keep NiFi tasks running continuously and reliably; operation and maintenance management is simplified, the stability of the platform product is enhanced, and high availability of the service product is ensured; the architecture offers efficient management, automatic failover and stable service operation, together with automatic primary-standby synchronization, safe and reliable data, and low storage cost; and it prevents split brain, avoiding data corruption and inconsistent system state.

Drawings

FIG. 1 is a schematic diagram of the method of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein merely illustrate some embodiments of the invention and do not limit it, and that all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of the invention.

Example one

Referring to FIG. 1, the present invention provides a technical solution: a DRBD-based high-availability NiFi deployment method, comprising the following steps:

installing and configuring DRBD and NiFi;

and configuring Pacemaker to manage the cluster services.

The specific operations are as follows. 1) Cluster environment preparation:

a. A Pacemaker-managed cluster consisting of three compute instances is deployed. NiFi and DRBD are installed on each of the primary instance and the standby instance. An arbitration device is installed on the third instance.

b. In the cluster, each node votes for the active node it considers ideal, i.e. the node that runs NiFi. In a two-node cluster, a single vote is enough to determine the active node. In this case, cluster behavior may lead to a split-brain problem or an outage. Split brain occurs when both nodes take control, because only one vote is required in the two-node scenario. If the two nodes lose their connection to each other, there is a risk that more than one node regards itself as the active node.

This can be avoided by configuring an arbitration device. The arbitration device acts as an arbiter and elects a single running node by voting. When the primary and standby instances cannot communicate with each other, the arbitration node can still communicate with each of them, so that a majority-vote mechanism is achieved and split brain is avoided.
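The voting behavior described above can be illustrated with a small model. This is a minimal sketch and not the cluster software's actual election algorithm: the node names, the `active_partitions` and `majority` helpers, and the representation of a network partition as sets of mutually reachable voters are all assumptions made for illustration.

```python
from typing import List, Set


def majority(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def active_partitions(partitions: List[Set[str]], votes_needed: int) -> List[Set[str]]:
    """Return every partition that holds enough votes to consider itself active."""
    return [p for p in partitions if len(p) >= votes_needed]


# Two-node cluster where one vote is enough: after the link between the
# nodes is lost, each side still reaches the threshold (split brain).
two_node = active_partitions([{"primary"}, {"standby"}], votes_needed=1)

# Three voters (primary, standby, arbiter) with a majority rule: the
# arbiter sides with one node, so only that partition stays active.
three_vote = active_partitions(
    [{"primary", "arbiter"}, {"standby"}], votes_needed=majority(3)
)

print(len(two_node))    # number of partitions that believe they are active
print(len(three_vote))
```

With three voters a strict majority is two, so at most one side of any partition can reach it, which is the property the arbitration device is introduced to provide.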

2) Installing and configuring Pacemaker:

Pacemaker provides fault detection and resource recovery at both the node level and the resource level, thereby maximizing the availability of cluster services. Logically, Pacemaker is responsible for full-life-cycle management of the software services in the cluster, driven by resource rules defined by the cluster administrator; this management can even cover entire software systems and the interactions among them. In practice Pacemaker can manage clusters of any size, and because it has a powerful resource dependency model, the cluster administrator can accurately describe the relationships among cluster resources (including relationships such as resource ordering and location).

Pacemaker is installed on the primary and standby instance nodes, and Pacemaker's arbitration device is installed on the arbitration device node. The Pacemaker-managed cluster is then configured. The three compute instance nodes are brought under Pacemaker's node resource management, with node state online.

3) Installing and configuring DRBD and NiFi:

DRBD is initialized and configured, and DRBD replication from the primary instance node to the standby instance node is set up. DRBD is installed and configured in diskless arbitration mode on the arbitration node. The arbitration setup requires at least three DRBD nodes, whereas DRBD replication requires only two, so a storage-less (diskless) arbitration device can be set up on the third node.

After DRBD is set up, the DRBD-synchronized disk is mounted at NiFi's data storage file system directory, so that NiFi's data is synchronized to the standby instance node through DRBD.
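The DRBD layout described above (two nodes with storage plus a diskless third node) might be expressed as a resource definition like the one rendered below. This is a sketch under stated assumptions: it assumes a DRBD 9 style configuration with the quorum options enabled, and the resource name `nifi`, the host names, addresses, port and backing device are hypothetical values not given in the source. The `DrbdHost` class and `render_resource` function are illustrative helpers, not part of DRBD.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DrbdHost:
    name: str            # host name (hypothetical)
    node_id: int         # unique DRBD node id
    address: str         # "ip:port" used for replication (hypothetical)
    disk: Optional[str]  # backing device, or None for the diskless arbiter


def render_resource(name: str, device: str, hosts: List[DrbdHost]) -> str:
    """Render a DRBD resource definition as text for the given hosts."""
    lines = [
        f"resource {name} {{",
        "  options {",
        "    quorum majority;        # a partition needs a majority of nodes",
        "    on-no-quorum io-error;  # fail I/O instead of diverging",
        "  }",
        f"  device {device};",
        "  meta-disk internal;",
    ]
    for h in hosts:
        disk = h.disk if h.disk is not None else "none"  # "none" = diskless
        lines += [
            f"  on {h.name} {{",
            f"    node-id {h.node_id};",
            f"    address {h.address};",
            f"    disk {disk};",
            "  }",
        ]
    lines += [
        "  connection-mesh {",
        "    hosts " + " ".join(h.name for h in hosts) + ";",
        "  }",
        "}",
    ]
    return "\n".join(lines)


config = render_resource("nifi", "/dev/drbd0", [
    DrbdHost("nifi-primary", 0, "10.0.0.1:7789", "/dev/sdb1"),
    DrbdHost("nifi-standby", 1, "10.0.0.2:7789", "/dev/sdb1"),
    DrbdHost("nifi-arbiter", 2, "10.0.0.3:7789", None),
])
print(config)
```

In such a setup, the replicated device (here `/dev/drbd0`) would then be formatted and mounted at NiFi's data directory on whichever node is currently active; the exact directory depends on the NiFi installation.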

4) Configuring Pacemaker to manage the cluster services:

The cluster services DRBD, NiFi and VIP are added to Pacemaker's resource management configuration in accordance with Pacemaker's resource access specification, and Pacemaker uniformly manages and schedules which instance node runs DRBD, NiFi and the VIP, ensuring that these services are always scheduled onto the same node. The VIP (virtual IP) is mainly used for network address translation, network fault tolerance and mobility. To improve the availability of the system's external services, high availability is configured in primary-standby mode. A VIP is configured across the primary and standby instance nodes; when the primary node goes down, the VIP floats to the standby node and continues to provide service, so that the NiFi service is exposed externally through a single entry point, single points of failure are avoided and service availability is improved.
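The scheduling rule in this step (DRBD, NiFi and the VIP always run together on one node and move together on failure) can be sketched as a small planning function. This is an illustrative model, not Pacemaker's scheduler: the start order in `START_ORDER`, the separate `filesystem` mount step, the node names and both helper functions are assumptions introduced here.

```python
from typing import List, Optional, Set, Tuple

# Assumed start order: promote DRBD, mount its file system, bring up the
# VIP, then start NiFi. Stopping happens in the reverse order.
START_ORDER = ["drbd", "filesystem", "vip", "nifi"]

Step = Tuple[str, str, str]  # (action, resource, node)


def choose_node(online: Set[str], preferred: List[str]) -> Optional[str]:
    """Pick the first preferred node that is currently online."""
    for node in preferred:
        if node in online:
            return node
    return None


def failover_plan(current: Optional[str], online: Set[str],
                  preferred: List[str]) -> List[Step]:
    """Plan the steps that move the whole resource group to one node."""
    target = choose_node(online, preferred)
    if target is None or target == current:
        return []  # nothing eligible, or already running in the right place
    plan: List[Step] = []
    if current in online:
        # The old node is still reachable: stop cleanly in reverse order.
        plan += [("stop", r, current) for r in reversed(START_ORDER)]
    plan += [("start", r, target) for r in START_ORDER]
    return plan


# The primary has gone down; only the standby can host the services.
plan = failover_plan("nifi-primary", {"nifi-standby"},
                     preferred=["nifi-primary", "nifi-standby"])
for step in plan:
    print(step)
```

Because every resource in the plan targets the same node, clients that connect through the VIP reach the NiFi instance that has the DRBD-backed data mounted. The arbitration node is left out of `preferred` because it only votes and never runs NiFi.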

Thus a high-availability deployment scheme for the NiFi service is formed, with Pacemaker as the cluster management tool, combined with DRBD and a VIP, and with a third arbitration device node introduced to prevent split brain. DRBD guarantees the accuracy and safety of NiFi service data; Pacemaker's automatic failover of services maximizes resource availability and simplifies operation and maintenance; and the VIP gives the NiFi service a single access entry point, further improving the availability of the service.

Example two

the deployment module is used for deploying a Pacemaker-managed cluster consisting of three compute instances, installing NiFi and DRBD on each of the primary instance and the standby instance, and installing an arbitration device on the third instance; in a two-node cluster, a single vote is enough to determine the active node; if the two nodes lose their connection to each other, there is a risk that more than one node regards itself as the active node; the arbitration device acts as an arbiter and elects a single running node by voting; when the primary and standby instances cannot communicate with each other, the arbitration node can still communicate with each of them, so that a majority-vote mechanism is achieved; Pacemaker is installed and configured to take charge of full-life-cycle management of the software services in the cluster: Pacemaker is installed on the primary and standby instance nodes, Pacemaker's arbitration device is installed on the arbitration device node, the Pacemaker-managed cluster is configured, and the three compute instance nodes are brought under Pacemaker's node resource management with their node state online;

the building module is used for installing and configuring DRBD and NiFi; after DRBD is set up, the DRBD-synchronized disk is mounted at NiFi's data storage file system directory, so that NiFi's data is synchronized to the standby instance node through DRBD;

the management module is used for configuring Pacemaker to manage the cluster services; the cluster services DRBD, NiFi and VIP are added to Pacemaker's resource management configuration in accordance with Pacemaker's resource access specification, and Pacemaker manages them uniformly and schedules which instance node runs DRBD, NiFi and the VIP.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A DRBD-based high-availability deployment method for NiFi, characterized by comprising the following steps:

installing and configuring DRBD and NiFi;

and configuring Pacemaker to manage the cluster services.

2. The method of claim 1, wherein: in a two-node cluster, a single vote is enough to determine the active node; if the two nodes lose their connection to each other, there is a risk that more than one node regards itself as the active node; the arbitration device acts as an arbiter and elects a single running node by voting; when the primary and standby instances cannot communicate with each other, the arbitration node can still communicate with each of them, so that a majority-vote mechanism is achieved.

3. The method of claim 2, wherein: Pacemaker is installed and configured to take charge of full-life-cycle management of the software services in the cluster; Pacemaker is installed on the primary and standby instance nodes, Pacemaker's arbitration device is installed on the arbitration device node, the Pacemaker-managed cluster is configured, and the three compute instance nodes are brought under Pacemaker's node resource management with their node state online.

4. The method of claim 1, wherein: DRBD is initialized and configured, and DRBD replication from the primary instance node to the standby instance node is set up; DRBD is installed and configured in diskless arbitration mode on the arbitration node, the arbitration setup requiring at least three DRBD nodes; and after the DRBD setup is complete, the DRBD-synchronized disk is mounted at NiFi's data storage file system directory, so that NiFi's data is synchronized to the standby instance node through DRBD.

5. The method of claim 1, wherein: the cluster services DRBD, NiFi and VIP are added to Pacemaker's resource management configuration in accordance with Pacemaker's resource access specification, and Pacemaker manages them uniformly and schedules which instance node runs DRBD, NiFi and the VIP.

6. A DRBD-based NiFi high availability deployment system as claimed in any of the above claims 1-5, characterized by: the system is composed of a deployment module, a building module and a management module;

7. The DRBD-based NiFi high availability deployment system of claim 6, wherein: in the deployment module, in a two-node cluster, a single vote is enough to determine the active node; if the two nodes lose their connection to each other, there is a risk that more than one node regards itself as the active node; the arbitration device acts as an arbiter and elects a single running node by voting; when the primary and standby instances cannot communicate with each other, the arbitration node can still communicate with each of them, so that a majority-vote mechanism is achieved.

8. The DRBD-based NiFi high availability deployment system of claim 6, wherein: in the deployment module, Pacemaker is installed and configured to take charge of full-life-cycle management of the software services in the cluster; Pacemaker is installed on the primary and standby instance nodes, Pacemaker's arbitration device is installed on the arbitration device node, the Pacemaker-managed cluster is configured, and the three compute instance nodes are brought under Pacemaker's node resource management with their node state online.

9. The DRBD-based NiFi high availability deployment system of claim 6, wherein: in the building module, DRBD is initialized and configured, and DRBD replication from the primary instance node to the standby instance node is set up; DRBD is installed and configured in diskless arbitration mode on the arbitration node, the arbitration setup requiring at least three DRBD nodes; and after the DRBD setup is complete, the DRBD-synchronized disk is mounted at NiFi's data storage file system directory, so that NiFi's data is synchronized to the standby instance node through DRBD.

10. The DRBD-based NiFi high availability deployment system of claim 6, wherein: in the management module, the cluster services DRBD, NiFi and VIP are added to Pacemaker's resource management configuration in accordance with Pacemaker's resource access specification, and Pacemaker manages them uniformly and schedules which instance node runs DRBD, NiFi and the VIP.