CN112905308B - High-availability deployment method for double computer rooms of es cluster - Google Patents

Info

Publication number
CN112905308B
Authority
CN
China
Prior art keywords
node
nodes
machine room
cluster
extracted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110495513.4A
Other languages
Chinese (zh)
Other versions
CN112905308A (en)
Inventor
秦威伟
曾令华
龚建
胡沛勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zhongbang Bank Co Ltd
Original Assignee
Wuhan Zhongbang Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zhongbang Bank Co Ltd filed Critical Wuhan Zhongbang Bank Co Ltd
Priority to CN202110495513.4A priority Critical patent/CN112905308B/en
Publication of CN112905308A publication Critical patent/CN112905308A/en
Application granted granted Critical
Publication of CN112905308B publication Critical patent/CN112905308B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16Error detection or correction of the data by redundancy in hardware
    • G06F11/20Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45575Starting, stopping, suspending or resuming virtual machine instances
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45587Isolation or security of virtual machine instances
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45595Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a high-availability deployment method for es cluster dual machine rooms, belongs to the field of computer business application research and development, and solves the problem that cluster deployments in the prior art cannot guarantee cluster data security, application availability and uninterrupted application service. The method installs and deploys Es clusters in a main machine room A and a standby machine room B; after installation and deployment, parameters are set so that copies of the same shard are distributed across the main machine room A and the standby machine room B and a shard and its copies are located in different zones; Nginx clusters are installed and deployed in the main machine room A and the standby machine room B and routing is configured; high-availability logic calls are then realized based on the configured main machine room A and standby machine room B. The invention is used to realize es cluster dual-machine-room deployment.

Description

High-availability deployment method for double computer rooms of es cluster
Technical Field
A high-availability deployment method for es cluster double machine rooms is used for achieving es cluster double machine room deployment and belongs to the field of computer business application research and development.
Background
In daily payment business, driven by factors such as regulation, risk and cost, the business data and log data of a bank system must be stored for a long time; business rules change frequently, which makes statistical analysis of the data harder, yet the business needs to analyse historical data to develop innovative services, and system operation and maintenance personnel need to analyse system log data to judge the operating condition of the system. Such analysis is mostly implemented with an es cluster.
However, most existing es clusters are deployed in a single machine room, which satisfies ordinary requirements for high availability and data security, but this traditional deployment mode cannot cope well with the strict security requirements of the financial industry and the need for cross-machine-room, off-site disaster recovery. Owing to the characteristics of the es cluster, deploying it across machine rooms reduces the efficiency of writing and querying data and cannot meet retrieval requirements of high concurrency and low latency. In other words, if the data is kept in a single machine room and that machine room loses its data (for example, in a natural disaster), the data cannot be recovered and the service is interrupted; but if the cluster is simply deployed across two sites, write and query efficiency drops.
CN202010099024.2 discloses a disaster recovery method and device for dual machine rooms, but the following technical problems exist:
data backup integrity is not addressed: if all data in machine room A is lost, it is unclear whether the data can be recovered; that is, cluster data security, application availability and uninterrupted application service are not guaranteed when one machine room is completely destroyed;
when a failure of a node server is detected, if the corresponding master node cannot rejoin the cluster normally, high availability is in doubt, and high availability of the application cannot be guaranteed.
Disclosure of Invention
Aiming at the above problems, the invention aims to provide a high-availability deployment method for dual machine rooms of an es cluster, solving the problem that cluster deployments in the prior art cannot guarantee cluster data security, application availability and uninterrupted application service.
In order to achieve the purpose, the invention adopts the following technical scheme:
a high-availability deployment method for double computer rooms of an es cluster comprises the following steps:
step 1: install and deploy Es clusters in a main machine room A and a standby machine room B; after installation and deployment are complete, set parameters so that copies of the same shard are distributed across the main machine room A and the standby machine room B, and the shard and its copies are located in different zones;
step 2: install and deploy Nginx clusters in the main machine room A and the standby machine room B, and perform routing configuration;
step 3: realize high-availability logic calls based on the configured main machine room A and the standby machine room B.
Further, the specific steps of step 1 are:
step 1.1: install and deploy 6 es nodes in the main machine room A, named ANode1-ANode6;
step 1.2: install and deploy 3 es nodes in the standby machine room B, named BNode1-BNode3;
step 1.3: set master nodes and data nodes: modify the configuration of nodes ANode1, ANode4 and BNode1, namely set the parameter node.master in ANode1, ANode4 and BNode1 to true, so that after modification ANode1, ANode4 and BNode1 are master nodes; modify the configuration of nodes ANode2, ANode3, ANode5, ANode6, BNode2 and BNode3, namely set the parameter node.data in these nodes to true, so that after modification ANode2, ANode3, ANode5, ANode6, BNode2 and BNode3 are data nodes;
step 1.4: set the Es cluster zone-aware allocation parameters, namely set cluster.routing.allocation.awareness.attributes, the parameter that determines whether the cluster allocates shards by zone, to zone, and set cluster.routing.allocation.awareness.force.zone.values, the parameter that forces zone-aware allocation, to z1,z2,z3, so that only one copy of the same shard can be stored in any one zone, the shard and its copies are located in different zones, and copies are guaranteed not to be allocated across zones; here a shard is the unit used to store data in the Es cluster, each shard has two copies, and allocation refers to assigning and migrating shards between nodes;
step 1.5: after the zone-aware allocation parameters are set, assign a zone number to every node: namely set the node.attr.zone parameter in each node, setting node.attr.zone in nodes ANode1-ANode3 to z1, in nodes ANode4-ANode6 to z2, and in nodes BNode1-BNode3 to z3, where node.attr.zone is the zone-number parameter; nodes given the same zone number are assigned to the same zone;
step 1.6: after the zone numbers are assigned, configure the Es cluster so that it provides service to the outside only when at least two master nodes are available, and adjust the timeout the Es cluster uses when discovering other nodes;
step 1.7: if there are firewalls between the main machine room A and the standby machine room B, or a network policy interrupts idle tcp connections after a certain time, set the parameter network.tcp.keep_alive to true and adjust transport.ping_schedule so that connections between nodes are kept alive.
Further, the specific steps of step 2 are:
step 2.1: arranging an Nginx cluster at each of a main machine room A and a standby machine room B;
step 2.2: after deployment, configure the routing weights of the Nginx clusters in the main machine room A and the standby machine room B as 2:1.
Further, the step 3 specifically comprises:
based on the configured main machine room A and the standby machine room B:
when 1 data node in the main machine room A is down: pick one of the data nodes in the main machine room A and simulate the downtime, namely shut down the selected node or the virtual machine hosting it; the application query service still runs normally;
when one node in each of two zones of the main machine room A is down: pick two nodes located in two different zones of the main machine room A and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when both data nodes in one zone of the main machine room A are down: pick any two machines in one zone of the main machine room A and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when one data node in each of the three zones is down: pick one data node in each of the three zones and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when all the data nodes in the zone corresponding to the standby machine room B are down: pick the two data nodes of the standby machine room B and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when the master node in zone z1 is down: simulate downtime of node ANode1 in the main machine room A, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when the master node in zone z2 is down: simulate downtime of node ANode4 in the main machine room A, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when the master node in zone z3 is down: shut down node BNode1 in the standby machine room B, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when all the nodes in the standby machine room B are down: shut down nodes BNode1-BNode3 in the standby machine room B, namely shut down the nodes or the virtual machines hosting them; the application query service still runs normally;
when all the nodes in the main machine room A are down: if service must still be provided, temporarily turn one data node in the standby machine room into a master node, namely set the node.master parameter of that data node to true; otherwise the cluster cannot provide service and errors are reported.
Compared with the prior art, the invention has the beneficial effects that:
1. in the invention, copies of the same shard are distributed across the main machine room A and the standby machine room B, and a shard and its copies are located in different zones, which guarantees the integrity and safety of the data; the method keeps the data recoverable and complete even when all nodes of the main machine room A or of the standby machine room B are down, or when data is lost because of an emergency such as a natural disaster in machine room A or B; that is, query efficiency is preserved while the data is also stored off-site, which improves data safety;
2. the weight with which Nginx forwards requests to the service is set to main machine room A : standby machine room B = 2:1; with this weight roughly 2/3 of the query requests are processed inside the main machine room A, which preserves the query efficiency of the cluster under normal conditions and matches the 2:1 ratio in which the cluster is deployed across the main and standby machine rooms;
3. based on the deployment and parameter settings of the invention, the index parameter _routing.required of each index in the Es cluster is set to true, so queries can follow a routing mechanism: a query is directed to specific shards according to the stored document id instead of traversing all shards; concretely, each query is routed through the doc_id of the record, so not every shard has to be searched, and with this simple configuration every query can be routed to one shard or one of its copy nodes, which avoids every query necessarily crossing machine rooms;
4. based on the deployment and settings of the invention, a filter can be used during queries, which avoids scoring and saves time (the desired data is screened out without computing relevance scores), improving the response time of every query request;
5. the invention configures the forced zone-aware allocation parameter cluster.routing.allocation.awareness.force.zone.values = z1,z2,z3; this parameter ensures that, across the three zones of the main machine room A and the standby machine room B, a shard (each shard has two copies) and its copies never land on virtual machine nodes in the same zone, so performance is not affected by allocation (shard migration) when some of the nodes are down; at the same time, to keep a briefly downed node from immediately triggering migration of its shards to other nodes in the same zone, a migration waiting time of "8h" is configured, namely after a node of the Es cluster goes down, the unassigned shards are reallocated only after an 8h delay, so taking nodes offline for maintenance does not affect node recovery time (a configuration sketch for this delay follows this list);
6. the invention introduces Es cluster zone partitioning, query routing and a score-free filter mechanism, and sets routing weights in cooperation with Nginx, so that the advantages of the cluster are exploited to the greatest extent to meet the requirements of high availability, high concurrency and low latency; data retrieval efficiency and data security are greatly improved, and database usage cost is reduced.
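As a hedged illustration of the 8h migration delay in point 5 above, the following minimal sketch applies the standard Elasticsearch delayed-allocation setting index.unassigned.node_left.delayed_timeout to an index; the index name pay_log and the host ANode1:9200 are assumptions, not values taken from the disclosure:

    # Delay reallocation of shards whose node has left the cluster by 8 hours,
    # so routine node maintenance does not trigger cross-node shard migration.
    curl -X PUT "http://ANode1:9200/pay_log/_settings" \
      -H 'Content-Type: application/json' \
      -d '{ "settings": { "index.unassigned.node_left.delayed_timeout": "8h" } }'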
Drawings
Fig. 1 is a schematic diagram of Es cluster cross-room deployment in the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
The es cluster double-computer room high-availability deployment method comprises the following steps:
step 1: install and deploy Es clusters in a main machine room A and a standby machine room B; after installation and deployment are complete, set parameters so that copies of the same shard are distributed across the main machine room A and the standby machine room B, and the shard and its copies are located in different zones;
the method comprises the following specific steps:
step 1.1: install and deploy 6 es nodes in the main machine room A, named ANode1-ANode6;
step 1.2: install and deploy 3 es nodes in the standby machine room B, named BNode1-BNode3;
step 1.3: set master nodes and data nodes: modify the configuration of nodes ANode1, ANode4 and BNode1, namely set the parameter node.master in ANode1, ANode4 and BNode1 to true, so that after modification ANode1, ANode4 and BNode1 are master nodes; modify the configuration of nodes ANode2, ANode3, ANode5, ANode6, BNode2 and BNode3, namely set the parameter node.data in these nodes to true, so that after modification ANode2, ANode3, ANode5, ANode6, BNode2 and BNode3 are data nodes;
step 1.4: set the Es cluster zone-aware allocation parameters, namely set cluster.routing.allocation.awareness.attributes, the parameter that determines whether the cluster allocates shards by zone, to zone, and set cluster.routing.allocation.awareness.force.zone.values, the parameter that forces zone-aware allocation, to z1,z2,z3, so that only one copy of the same shard can be stored in any one zone, the shard and its copies are located in different zones, and copies are guaranteed not to be allocated across zones; here a shard is the unit used to store data in the Es cluster, each shard has two copies, and allocation refers to assigning and migrating shards between nodes;
step 1.5: after the zone-aware allocation parameters are set, assign a zone number to every node: namely set the node.attr.zone parameter in each node, setting node.attr.zone in nodes ANode1-ANode3 to z1, in nodes ANode4-ANode6 to z2, and in nodes BNode1-BNode3 to z3, where node.attr.zone is the zone-number parameter; nodes given the same zone number are assigned to the same zone;
step 1.6: after the zone numbers are assigned, configure the Es cluster so that it provides service to the outside only when at least two master nodes are available, and adjust the timeout the Es cluster uses when discovering other nodes;
step 1.7: if there are firewalls between the main machine room A and the standby machine room B, or a network policy interrupts idle tcp connections after a certain time, set the parameter network.tcp.keep_alive to true and adjust transport.ping_schedule so that connections between nodes are kept alive.
Step 2: installing and deploying the Nginx cluster in a main machine room A and a standby machine room B, and performing routing configuration;
the method comprises the following specific steps:
step 2.1: arranging an Nginx cluster at each of a main machine room A and a standby machine room B;
step 2.2: after deployment, configure the routing weights of the Nginx clusters in the main machine room A and the standby machine room B as 2:1.
Step 3: realize high-availability logic calls based on the configured main machine room A and the standby machine room B.
The method specifically comprises the following steps:
based on the configured main machine room A and the standby machine room B:
when 1 data node in the main machine room A is down: pick one of the data nodes in the main machine room A and simulate the downtime, namely shut down the selected node or the virtual machine hosting it; the application query service still runs normally;
when one node in each of two zones of the main machine room A is down: pick two nodes located in two different zones of the main machine room A and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when both data nodes in one zone of the main machine room A are down: pick any two machines in one zone of the main machine room A and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when one data node in each of the three zones is down: pick one data node in each of the three zones and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when all the data nodes in the zone corresponding to the standby machine room B are down: pick the two data nodes of the standby machine room B and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when the master node in zone z1 is down: simulate downtime of node ANode1 in the main machine room A, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when the master node in zone z2 is down: simulate downtime of node ANode4 in the main machine room A, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when the master node in zone z3 is down: shut down node BNode1 in the standby machine room B, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when all the nodes in the standby machine room B are down: shut down nodes BNode1-BNode3 in the standby machine room B, namely shut down the nodes or the virtual machines hosting them; the application query service still runs normally;
when all the nodes in the main machine room A are down: if service must still be provided, temporarily turn one data node in the standby machine room into a master node, namely set the node.master parameter of that data node to true; otherwise the cluster cannot provide service and errors are reported (a sketch of how these downtime drills can be verified follows this list).
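As a hedged illustration only (not part of the original disclosure): after stopping a node in one of the drills above, the cluster state can be checked with the standard Elasticsearch health and cat APIs; the host names below are assumptions:

    # Overall cluster status (typically yellow while shards recover, green afterwards)
    curl -s "http://ANode2:9200/_cluster/health?pretty"
    # Shard placement: confirm that no two copies of the same shard share a zone
    curl -s "http://ANode2:9200/_cat/shards?v"
    # A sample query through the Nginx entry point to confirm the application query service
    curl -s "http://nginx-a.example.com/pay_log/_search?q=*:*&size=1"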
Examples
Step 1: installing and deploying Es clusters in a main machine room A and a standby machine room B, and setting parameters after the Es clusters are installed and deployed;
Step 1.1: select six 16C+32G servers in machine room A, install es on each of them (the es nodes), modify the es configuration file elasticsearch.yml, and set node.name to ANode1-ANode6 in turn;
Step 1.2: select three 16C+32G servers in machine room B, install es on each of them (the es nodes), modify the es configuration file elasticsearch.yml, and set node.name to BNode1-BNode3 in turn;
Step 1.3: set node.master to true in the configuration of nodes ANode1, ANode4 and BNode1; in the configuration of nodes ANode2, ANode3, ANode5, ANode6, BNode2 and BNode3 set node.master to false and node.data to true;
Step 1.4: set cluster.routing.allocation.awareness.attributes to zone in the configuration of nodes ANode1-ANode6 and BNode1-BNode3, and set cluster.routing.allocation.awareness.force.zone.values to z1,z2,z3;
Step 1.5: set node.attr.zone to z1 in nodes ANode1-ANode3, to z2 in nodes ANode4-ANode6, and to z3 in nodes BNode1-BNode3;
Step 1.6: set discovery.zen.minimum_master_nodes to 2 in nodes ANode1-ANode6 and BNode1-BNode3, meaning the Es cluster needs at least two master nodes to provide service normally;
Step 1.7: set network.tcp.keep_alive to true and transport.ping_schedule to 300s in nodes ANode1-ANode6 and BNode1-BNode3 (an elasticsearch.yml sketch covering these settings follows).
Step 2: and installing and deploying Nginx clusters in the main machine room A and the standby machine room B.
Step 2.1: select one 2C+8G server in the main machine room A and one in the standby machine room B, and install Nginx on each.
Step 2.2: modify the Nginx configuration, configure the application request routing addresses of the main machine room A and the standby machine room B in the upstream block, and set the routing-address weight of the main machine room A to weight=2 and that of the standby machine room B to weight=1 (an Nginx configuration sketch follows).
And step 3: and realizing high-availability logic calling based on the configured main machine room A and the standby machine room B.
Select one of the four data nodes ANode2, ANode3, ANode5 and ANode6 in the main machine room A, stop the es service on that data node, and verify that the application query service is normal.
Stop the es services of data nodes ANode2 and ANode5 in the main machine room A and verify that the application query service is normal.
Stop the es services of data nodes ANode5 and ANode6 in the main machine room A and verify that the application query service is normal.
Stop the es services of data nodes ANode2 and ANode6 in the main machine room A and of data node BNode2 in the standby machine room B, and verify that the query service is normal.
Stop the es services of data nodes BNode2 and BNode3 in the standby machine room B and verify that the application query service is normal.
Stop the es service of master node ANode1 in the main machine room A and verify that the application query service is normal.
Stop the es service of master node ANode4 in the main machine room A and verify that the application query service is normal.
Stop the es service of master node BNode1 in the standby machine room B and verify that the application query service is normal.
Stop the es services of BNode1, BNode2 and BNode3 in the standby machine room B and verify that the application query service is normal.
Stop the es services of all nodes ANode1-ANode6 in the main machine room A, set node.master to true on one data node in the standby machine room B, and verify that the application query service is normal (a sketch of this promotion follows).
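A hedged sketch of this emergency promotion (which data node is chosen and the service name elasticsearch are assumptions):

    # On one data node in standby machine room B, edit elasticsearch.yml:
    #   node.master: true
    # then restart the es service so the node becomes master-eligible:
    sudo systemctl restart elasticsearch
    # Verify that a master has been elected and the cluster serves queries again:
    curl -s "http://BNode2:9200/_cat/master?v"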
Based on the above implementation logic, verify how the cluster works under network jitter: because network jitter between different machine rooms has to be considered in a cross-machine-room deployment, the working condition of the cluster under network jitter needs to be verified.
On data node ANode2 in the main machine room A, set: tc qdisc add dev eth0 root netem delay 50ms 20ms 50% (meaning the transmission delay of the eth0 network card is set to 50ms, with 50% of the packets randomly delayed between 30ms (50-20) and 70ms (50+20)); on data node ANode4, set: tc qdisc add dev eth0 root netem loss 1% (meaning the eth0 network card randomly drops 1% of the packets); on data node BNode2 in the standby machine room B, set: tc qdisc add dev eth0 root netem delay 50ms 20ms 50%; then verify that the application query service is normal.
The application address weight of the Nginx route to the main machine room A and the standby machine room B is set to be 2: 1;
Set the index mapping parameter _routing.required of each index in the Es cluster to true; comparing the tps of the application query service before and after this setting, the tps of the application query service is significantly improved;
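A hedged sketch of this routing setup (the index name pay_log, field names, shard/replica counts and the host are assumptions; the _routing mapping option and the routing request parameter are standard Elasticsearch features, shown here in 6.x single-type form):

    # Create an index that requires a routing value on every write;
    # 3 shards with 2 replicas each matches the layout described above.
    curl -X PUT "http://ANode1:9200/pay_log" -H 'Content-Type: application/json' -d '
    {
      "settings": { "number_of_shards": 3, "number_of_replicas": 2 },
      "mappings": {
        "_doc": {
          "_routing": { "required": true },
          "properties": { "order_id": { "type": "keyword" }, "msg": { "type": "text" } }
        }
      }
    }'
    # Index a document with an explicit routing value (here the document id itself)
    curl -X PUT "http://ANode1:9200/pay_log/_doc/1001?routing=1001" \
      -H 'Content-Type: application/json' -d '{ "order_id": "1001", "msg": "pay ok" }'
    # Query with the same routing value: only one shard (or one of its copies) is searched
    curl -X GET "http://ANode1:9200/pay_log/_search?routing=1001" \
      -H 'Content-Type: application/json' -d '{ "query": { "term": { "order_id": "1001" } } }'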
Use a filter when querying indices in the Es cluster; comparing the tps of the application query service before and after this change, the tps of the application query service is clearly improved after the filter is used, and under the same conditions the tps with the filter is higher than the tps without it;
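A hedged sketch of such a filter-context query (field names, values and the host are assumptions; a bool/filter clause returns matching documents without computing relevance scores and can be cached):

    # Filter context: no scoring, results can be cached by the node
    curl -X GET "http://ANode1:9200/pay_log/_search?routing=1001" \
      -H 'Content-Type: application/json' -d '
    {
      "query": {
        "bool": {
          "filter": [
            { "term":  { "order_id": "1001" } },
            { "range": { "amount": { "gte": 100 } } }
          ]
        }
      }
    }'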
the above are merely representative examples of the many specific applications of the present invention, and do not limit the scope of the invention in any way. All the technical solutions formed by the transformation or the equivalent substitution fall within the protection scope of the present invention.

Claims (2)

1. A high-availability deployment method for double computer rooms of an es cluster is characterized by comprising the following steps:
step 1: install and deploy Es clusters in a main machine room A and a standby machine room B; after installation and deployment are complete, set parameters so that copies of the same shard are distributed across the main machine room A and the standby machine room B, and the shard and its copies are located in different zones;
step 2: install and deploy Nginx clusters in the main machine room A and the standby machine room B, and perform routing configuration;
step 3: realize high-availability logic calls based on the configured main machine room A and the standby machine room B;
the specific steps of the step 2 are as follows:
step 2.1: arranging an Nginx cluster at each of a main machine room A and a standby machine room B;
step 2.2: after deployment, configure the routing weights of the Nginx clusters in the main machine room A and the standby machine room B as 2:1;
the specific steps of the step 1 are as follows:
step 1.1: install and deploy 6 es nodes in the main machine room A, named ANode1-ANode6;
step 1.2: install and deploy 3 es nodes in the standby machine room B, named BNode1-BNode3;
step 1.3: set master nodes and data nodes: modify the configuration of nodes ANode1, ANode4 and BNode1, namely set the parameter node.master in ANode1, ANode4 and BNode1 to true, so that after modification ANode1, ANode4 and BNode1 are master nodes; modify the configuration of nodes ANode2, ANode3, ANode5, ANode6, BNode2 and BNode3, namely set the parameter node.data in these nodes to true, so that after modification ANode2, ANode3, ANode5, ANode6, BNode2 and BNode3 are data nodes;
step 1.4: set the Es cluster zone-aware allocation parameters, namely set cluster.routing.allocation.awareness.attributes, the parameter that determines whether the cluster allocates shards by zone, to zone, and set cluster.routing.allocation.awareness.force.zone.values, the parameter that forces zone-aware allocation, to z1,z2,z3, so that only one copy of the same shard can be stored in any one zone, the shard and its copies are located in different zones, and copies are guaranteed not to be allocated across zones; here a shard is the unit used to store data in the Es cluster, each shard has two copies, and allocation refers to assigning and migrating shards between nodes;
step 1.5: after the zone-aware allocation parameters are set, assign a zone number to every node: namely set the node.attr.zone parameter in each node, setting node.attr.zone in nodes ANode1-ANode3 to z1, in nodes ANode4-ANode6 to z2, and in nodes BNode1-BNode3 to z3, where node.attr.zone is the zone-number parameter; nodes given the same zone number are assigned to the same zone;
step 1.6: after the zone numbers are assigned, configure the Es cluster so that it provides service to the outside only when at least two master nodes are available, and adjust the timeout the Es cluster uses when discovering other nodes;
step 1.7: if there are firewalls between the main machine room A and the standby machine room B, or a network policy interrupts idle tcp connections after a certain time, set the parameter network.tcp.keep_alive to true and adjust transport.ping_schedule so that connections between nodes are kept alive.
2. The es cluster dual-room high availability deployment method according to claim 1, characterized in that: the step 3 is specifically as follows:
based on the configured main machine room A and the standby machine room B:
when 1 data node in the main machine room A is down: pick one of the data nodes in the main machine room A and simulate the downtime, namely shut down the selected node or the virtual machine hosting it; the application query service still runs normally;
when one node in each of two zones of the main machine room A is down: pick two nodes located in two different zones of the main machine room A and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when both data nodes in one zone of the main machine room A are down: pick any two machines in one zone of the main machine room A and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when one data node in each of the three zones is down: pick one data node in each of the three zones and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when all the data nodes in the zone corresponding to the standby machine room B are down: pick the two data nodes of the standby machine room B and simulate the downtime, namely shut down the selected nodes or the virtual machines hosting them; the application query service still runs normally;
when the master node in zone z1 is down: simulate downtime of node ANode1 in the main machine room A, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when the master node in zone z2 is down: simulate downtime of node ANode4 in the main machine room A, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when the master node in zone z3 is down: shut down node BNode1 in the standby machine room B, namely shut down the node or the virtual machine hosting it; the application query service still runs normally;
when all the nodes in the standby machine room B are down: shut down nodes BNode1-BNode3 in the standby machine room B, namely shut down the nodes or the virtual machines hosting them; the application query service still runs normally;
when all the nodes in the main machine room A are down: if service must still be provided, temporarily turn one data node in the standby machine room into a master node, namely set the node.master parameter of that data node to true; otherwise the cluster cannot provide service and errors are reported.
CN202110495513.4A 2021-05-07 2021-05-07 High-availability deployment method for double computer rooms of es cluster Active CN112905308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110495513.4A CN112905308B (en) 2021-05-07 2021-05-07 High-availability deployment method for double computer rooms of es cluster

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110495513.4A CN112905308B (en) 2021-05-07 2021-05-07 High-availability deployment method for double computer rooms of es cluster

Publications (2)

Publication Number Publication Date
CN112905308A CN112905308A (en) 2021-06-04
CN112905308B true CN112905308B (en) 2021-07-30

Family

ID=76108986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110495513.4A Active CN112905308B (en) 2021-05-07 2021-05-07 High-availability deployment method for double computer rooms of es cluster

Country Status (1)

Country Link
CN (1) CN112905308B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102857363B (en) * 2012-05-04 2016-04-20 运软网络科技(上海)有限公司 A kind of autonomous management system and method for virtual network
CN106285079A (en) * 2016-07-29 2017-01-04 安徽华斯源新能源科技有限公司 A kind of integrated tank-type self-contained Design of Machine Room method
US10313295B2 (en) * 2016-12-16 2019-06-04 Dreamworks Animation L.L.C. Scalable messaging system
CN109558270B (en) * 2017-09-25 2021-02-05 北京国双科技有限公司 Data backup method and device and data restoration method and device
CN109471755A (en) * 2018-11-14 2019-03-15 江苏鸿信系统集成有限公司 A kind of method and its system that the same city strange land calamity based on cloud computing is standby
CN109726046B (en) * 2018-11-23 2021-01-08 网联清算有限公司 Machine room switching method and device

Also Published As

Publication number Publication date
CN112905308A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN105138615B (en) A kind of method and system constructing big data distributed information log
CN112313916B (en) Method and system for pseudo-storage of anti-tampering logs by fusing block chain technology
US8640218B2 (en) System, method and program for managing firewalls
US9773015B2 (en) Dynamically varying the number of database replicas
CN103677967B (en) A kind of remote date transmission system of data base and method for scheduling task
CN102932444B (en) Load balancing module in finance real-time transaction system
US7441024B2 (en) Method and apparatus for applying policies
CN102053982B (en) A kind of database information management method and equipment
US9450700B1 (en) Efficient network fleet monitoring
WO2023142054A1 (en) Container microservice-oriented performance monitoring and alarm method and alarm system
CN113515499A (en) Database service method and system
WO2012145963A1 (en) Data management system and method
CN102902615A (en) Failure alarm method and system for Lustre parallel file system
CN102938705A (en) Method for managing and switching high availability multi-machine backup routing table
CN102968457B (en) Database method for switching between and system
CN108092936A (en) A kind of Host Supervision System based on plug-in architecture
CN112131305A (en) Account processing system
CN103327116A (en) Dynamic copy storage method for network file
CN113127199A (en) Load balancing configuration method, device, equipment and storage medium
CN112905308B (en) High-availability deployment method for double computer rooms of es cluster
CN101668028B (en) Extendable system structure-based distributed system and application method thereof
CN112714022A (en) Control processing method and device for multiple clusters and computer equipment
CN106657390A (en) Cluster file system directory isolation method, cluster file system directory isolation device and cluster file system directory isolation system
CN104503871A (en) Implementation method based on full-redundancy model of small computer system
CN110515766B (en) Cloud-based electric power disaster recovery data security protection system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant