CN114546427A - MySQL high-availability implementation method based on DNS and MGR - Google Patents
- Publication number
- CN114546427A (application CN202210154698.7A)
- Authority
- CN
- China
- Prior art keywords
- database
- dns
- node
- cluster
- mgr
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F8/61 — Software deployment; Installation
- G06F11/0793 — Remedial or corrective actions (error or fault processing not based on redundancy)
- G06F16/21 — Design, administration or maintenance of databases
- G06F16/27 — Replication, distribution or synchronisation of data between databases; distributed database system architectures
- G06F9/4451 — User profiles; Roaming (configuring for program initiating)
- G06F9/4482 — Procedural execution paradigms
- H04L41/0631 — Management of faults using root cause analysis
- H04L41/0663 — Performing the actions predefined by failover planning, e.g. switching to standby network elements
Abstract
The invention discloses a MySQL high-availability implementation method based on DNS and MGR. The method comprises: installing and bringing online a database cluster; initializing and configuring a probe; having the probe monitor the running cluster; judging from the detected cluster architecture, transaction backlog and domain-name IP whether the cluster is operating normally; and, once an anomaly occurs, completing the database switch through the probe and MGR together, so that applications automatically connect to the switched database cluster. Because the design combines the overall architecture of the application system with the characteristics of DNS, MGR and the probe, the structure is simple. Moreover, the probe follows an interface-oriented design that is not tied to any particular language and can be implemented in a language the user is familiar with, which eases maintenance.
Description
Technical Field
The invention relates to the technical field of databases, in particular to a MySQL high-availability implementation method based on DNS and MGR.
Background
As the data-storage component of an information system, a database mainly provides data storage and access, and is often the core of the system. With the rapid iteration and development of database technology, vendors provide ever more functions and features, including database high-availability technology.
Database high availability is the ability, provided by features of the database software itself, to continue providing service after some nodes fail because of server, operating-system or database-software faults. MySQL, widely used across industries, has successively provided asynchronous master-slave replication, semi-synchronous master-slave replication and, most recently, the distributed MGR technology, and these schemes have gradually matured.
MGR (MySQL Group Replication) is a synchronization technology, provided by MySQL since version 5.7, that is built on the Paxos distributed protocol. It performs conflict checking and analysis on the transaction logs synchronized between the primary and standby databases, ensuring that database service and data remain consistent even in extreme scenarios.
The Domain Name Service (DNS) provides the mapping between domain names and IP addresses. Its main purpose is to let an application system reach the real service by configuring a domain-name server and accessing a domain name with business meaning, without caring about the IP address, thereby unifying application configuration and reducing maintenance effort. DNS is becoming a standard part of information-system construction across industries, and the market already offers mature solutions, including the commercial ZDNS and the open-source bind and bind-dlz.
Considering the above technologies together, databases and DNS belong to different technical fields and have not been effectively integrated. Although the database provides complete synchronization schemes such as master-slave replication and MGR, applications cannot detect that a switch has completed and must implement that capability themselves. The common industry schemes and their trade-offs are analyzed below:
1. The VIP-based switching scheme. An independent control server probes the database nodes; when probing of the master node fails, failover is started and a secondary node is promoted to master. However, because the detection is performed by nodes outside the database, failovers triggered by network isolation, a hung server and similar anomalies can split the cluster (split-brain) and lose data. Because of the special role of the control node, it is usually deployed single-node or active-standby, and in complex fault scenarios the control node itself may fail, leaving the switch unexecuted. In addition, such schemes are mostly third-party implementations whose core switching logic is strongly tied to the database version, so an incompatibility with the version actually in use — especially the current MySQL 8 — can defeat the whole high-availability scheme.
2. The intermediate-proxy scheme. A proxy layer is established between the database and the application; the proxy identifies the database master node and exposes a protocol close to the database's own, to which the application connects. After the master node fails, the proxy identifies the new node and re-establishes the application connections to it. However, because an additional proxy layer is introduced, every database request must pass through it, which inevitably makes the overall application architecture more complex and each request path longer and slower. Proxy compatibility is usually achieved by third parties emulating common MySQL operations, so operations the proxy cannot recognize inevitably exist and require extra adaptation work; furthermore, the proxy layer itself brings its own high-availability and performance-bottleneck problems.
3. The DNS health-check scheme. The DNS server checks the database periodically and, on finding that the original master node has failed, updates the corresponding domain name's IP to the new master's IP. However, this scheme only addresses how applications reconnect after a switch; it is not complete — it does not cover how to fail the database over without losing data and bring applications onto the new master node via the updated domain-name information. It also depends on a specific DNS product implementing active database probing, which limits its general applicability.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a MySQL high-availability implementation method based on DNS and MGR.
In order to achieve the above object, the present invention provides a MySQL high-availability implementation method based on DNS and MGR, including:
step 1, installing and online configuring a database cluster;
step 2, initializing the probe, and then detecting a database cluster in operation;
step 3, judging whether the database cluster normally operates according to the database cluster architecture, the transaction accumulation condition and the domain name IP acquired by detection, and specifically comprising the following steps:
if the database cluster main node IP is equal to the domain name IP and the database cluster node is equal to the configuration node, indicating that the cluster runs normally;
if the IP of the database cluster master node is equal to the domain name IP, but the database cluster node is not equal to the configuration node, indicating that the slave node is abnormal in operation, and triggering an alarm;
if the database cluster main node IP is not equal to the domain name IP and the local node IP is equal to the database cluster main node IP, triggering a fault switching process;
and if the database cluster main node IP is not equal to the domain name IP and the local node IP is not equal to the database cluster main node IP, triggering an alarm.
Further, the failover process includes:
selecting a new main node through a distributed consistency protocol, terminating the original main node database instance or entering an offline mode, and adjusting the detection interval time;
analyzing the running state of the database, and determining whether the new main node has transactions that are not yet fully applied;
after the application of the accumulated transaction is finished, if the node is a new main node, the probe initiates a DNS switching process;
after the DNS is successfully switched, if the DNS has a cache, executing the refreshing operation of the DNS cache;
checking via PING that the domain name IP matches the new main node IP, the fault switching being completed if the check succeeds;
and recovering the detection interval as the configuration interval time and continuously executing the detection.
Further, the DNS switching process comprises a consistency check of the domain name and IP before updating, updating the domain name's IP, and a consistency check of the domain name and IP after updating. A commercial DNS device is checked and updated through the API interface it provides, while an open-source DNS device is updated through a remote command or an SQL request. If the final check succeeds, the DNS switch has succeeded; otherwise the switch is aborted.
Further, the step 1 specifically includes:
step 1.1, resource application and configuration are carried out;
step 1.2, deploying database software;
step 1.3, optimizing parameter configuration;
step 1.4, initializing and starting a database;
step 1.5, configuring the user and the authority;
step 1.6, MGR plug-in and parameter configuration is carried out;
step 1.7, starting MGR operation;
step 1.8, configuring and starting each database node probe, so as to complete the configuration of the database node and gradually complete the configuration of all the database nodes;
and step 1.9, the application system connects the database cluster according to the configured information such as the domain name, the user name, the password and the like of the database.
Further, the step 2 specifically includes:
step 2.1, reading the fixed-path configuration file, and checking that the configuration file format is legal and the parameter values are reasonable;
step 2.2, parsing out the database login information, the variables to query after login, the cluster structure and the transaction backlog;
step 2.3, checking whether the cluster structure is matched with the information in the configuration file, if the node set of the configuration information is equal to or contains the real-time cluster architecture of the database, the checking returns success, otherwise, the checking returns failure;
step 2.4, executing ping operation according to the domain name to obtain an IP address corresponding to the domain name;
and 2.5, checking the DNS update interface, printing initialization information, and reasonably checking initialization configuration.
Further, after the fault switching process is completed, repairing the fault specifically includes:
analyzing the root cause of the database cluster for fault switching, and determining whether the components such as an operating system, a database, a probe and the like need to be optimized according to the analysis result;
checking data such as an operating system running state and a log, confirming whether repair operation on the operating system level needs to be executed, and restarting the operating system if fault switching is caused by operating system defects;
checking a database error log, confirming whether the repair operation of the database instance needs to be executed or not, and restarting the database if the database instance or the process abnormally triggers the fault switching;
confirming whether key parameters of a database of a fault node are normal or not, wherein the key parameters comprise related parameters such as an offline mode, read-only configuration and the like;
restarting the MGR, automatically recovering data after starting operation, and adding the fault node into the database cluster again;
and executing probe starting operation, and if the log is normal and the initialization is successful, indicating that the fault repair is successful.
Has the advantages that: the invention is suitable for the application using MySQL database, and has the following advantages:
1. Completely transparent to the application, with a broad application base. The scheme preserves the native MySQL standard end to end — application coding, driver, SQL parsing and execution — and is therefore entirely transparent to the application system. When an application migrates from another high-availability scheme to this one, only the necessary standardization changes are required before it can go into production.
2. Built entirely on a distributed consistency protocol, hence safe and reliable. The scheme is designed around the distributed consistency protocol provided by MGR: every operation that affects the database and cluster structure, such as transaction certification and view switching, goes through consistency negotiation and is carried out only after more than half of the members vote in favor, and the probe in turn acts on the consistent switch result (such as the updated view). The whole scheme is therefore safe and reliable under any condition.
3. Based on common infrastructure, with controllable operation and cost. The scheme does not change the overall system architecture: the common application-service-plus-database structure is retained, and no intermediate proxy service is introduced. DNS, as a basic data-center resource, meets the scheme's requirements whether commercial DNS software (such as ZDNS) or an open-source DNS (such as bind or bind-dlz) is used. During failover, the probe actively switches the DNS according to the database's consistent switch result; the switch call can be customized for the target DNS (an API interface, a remote command, or an SQL request), so no custom modification of the DNS itself is needed, keeping cost and operation controllable.
4. Simple structure, convenient to configure and maintain. In summary, the high-availability scheme is designed around the overall architecture of the application system itself and the characteristics of DNS, MGR and the probe, so the architecture is simple. Meanwhile, the probe follows an interface-oriented design that is not limited to any language and can be implemented in a language the user is familiar with, easing maintenance.
Drawings
FIG. 1 is a schematic flow diagram of an installation and on-line configuration of a database cluster;
FIG. 2 is a schematic flow chart of initializing a probe and probing a database cluster;
FIG. 3 is a schematic diagram of a failover process;
FIG. 4 is a schematic flow chart for repairing a fault;
FIG. 5 is an architecture diagram of an application system accessing DNS directly;
fig. 6 is an architecture diagram of an application system accessing a DNS cache.
Detailed Description
The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments, which are carried out on the basis of the technical solution of the invention. It should be understood that these embodiments only illustrate the invention and are not intended to limit its scope.
As shown in fig. 1 to 4, an embodiment of the present invention provides a MySQL high-availability implementation method based on DNS and MGR, including:
step 1, installing and online configuring a database cluster. Specifically, referring to fig. 1, step 1 includes:
and 1.1, resource application and configuration are carried out.
During system onboarding, the application-system owner applies for resources such as servers and domain names through an existing standard process (e.g. ITSM). After the application is submitted and approved, the responsible teams allocate and configure the resources. For example, the system-management department allocates the database servers and configures the IP addresses, security policies and monitoring policies; the network-management department allocates domain names for the assigned IPs and configures the related views and permissions.
And step 1.2, deploying database software.
In this step the database-management department installs and deploys the database software. It covers installing the operating-system dependency packages, configuring the operating-system user, tuning kernel parameters, creating the database directories, installing the database software, and adjusting directory permissions.
And 1.3, optimizing parameter configuration.
Specifically, the relevant database parameter values are calculated from the server configuration (such as the number of CPU cores and total memory size). For example, innodb_thread_concurrency is set to the server's CPU count, innodb_buffer_pool_size to 70% of total server memory, slave_parallel_workers to the number of cores, and a unique server-id is computed. Other key parameter settings are detailed in Table 1:
TABLE 1
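The calculation described above can be sketched in code. The 70%-of-memory and cores-based values follow the text; the helper name `derive_mysql_params` and the server-id formula (combining the last two octets of the host IP) are illustrative assumptions, not part of the original scheme.

```python
# Sketch of the parameter derivation described above. The server-id formula
# (last two IP octets) is an assumed uniqueness scheme for illustration only.

def derive_mysql_params(cpu_cores: int, mem_bytes: int, host_ip: str) -> dict:
    """Compute key MySQL tuning parameters from the server's resources."""
    octets = host_ip.split(".")
    # Assumed scheme: combine the last two octets of the node IP into a unique id.
    server_id = int(octets[-2]) * 256 + int(octets[-1])
    return {
        "innodb_thread_concurrency": cpu_cores,
        "innodb_buffer_pool_size": int(mem_bytes * 0.70),  # 70% of total memory
        "slave_parallel_workers": cpu_cores,
        "server_id": server_id,
    }

params = derive_mysql_params(cpu_cores=16, mem_bytes=64 * 1024**3, host_ip="10.0.12.34")
```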
And step 1.4, initializing and starting the database.
Using the deployed database software together with the optimized parameter file and directory configuration, perform the database initialization. This mainly covers generating the system tablespace, the redo log files and the doublewrite buffer files, creating the undo tablespaces and temporary sort files, and generating a temporary password. After initialization completes, execute the command to start the database.
And step 1.5, configuring the user and the authority.
After database initialization completes, the randomly generated password is inconvenient for unified management and, being recorded in the database log, poses a security risk. Users required by the application and by operations must also be created; the user list is shown in Table 2:
TABLE 2
Note that the users above follow the principle of least privilege: table- or database-level permissions should be granted according to actual need. In particular, the CONNECTION_ADMIN and SUPER privileges must not be granted.
And step 1.6, MGR plug-in and parameter configuration is carried out.
To give the database full MGR (MySQL Group Replication) support, two plug-ins must be installed: group_replication, which implements the core MGR function, and mysql_clone, which performs full data synchronization during fault repair. MGR-related parameters must also be set; the recommended settings are shown in Table 3:
TABLE 3
Step 1.7, start MGR operation.
After the MGR plug-in and parameter configuration is finished, restart the database instance so the parameters take effect. After the restart, set up the group_replication_recovery channel and then start group replication. On the first node of the database cluster, set bootstrap mode before starting group replication and turn it off once startup completes; then start group replication on the other nodes.
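The bootstrap sequence of step 1.7 can be sketched as the SQL statements below. The statement text follows the common MGR bootstrap pattern rather than being quoted from this method, and the replication user and password are placeholders.

```python
# Illustrative SQL sequence for starting group replication (step 1.7).
# The recovery-channel statement and the bootstrap toggling follow the usual
# MGR pattern; 'repl'/'***' are placeholder credentials.

def mgr_start_statements(first_node: bool) -> list:
    stmts = [
        "CHANGE REPLICATION SOURCE TO SOURCE_USER='repl', SOURCE_PASSWORD='***' "
        "FOR CHANNEL 'group_replication_recovery'",
    ]
    if first_node:
        # The first node boots the group, then turns bootstrap mode off again.
        stmts += [
            "SET GLOBAL group_replication_bootstrap_group = ON",
            "START GROUP_REPLICATION",
            "SET GLOBAL group_replication_bootstrap_group = OFF",
        ]
    else:
        stmts.append("START GROUP_REPLICATION")
    return stmts
```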
And step 1.8, configuring and starting each database node probe, so as to complete the configuration of the database node and gradually complete the configuration of all the database nodes.
When configuring and starting each database node probe, according to the applied domain name address and database cluster configuration, perfecting a probe configuration file, executing probe starting operation, and observing whether the log is normal and the initialization is successful.
And step 1.9, the application system connects the database cluster according to the configured information such as the domain name, the user name, the password and the like of the database.
Two points deserve emphasis. First, the MGR configuration is adjusted according to the specification above to ensure that, after a fault occurs, the switching operation completes within the expected time frame. Second, the application's database users are granted permissions on a least-privilege, as-needed basis, so that once the original master node is expelled from the cluster its existing database connections can be interrupted.
And 2, initializing the probe, and then detecting the running database cluster. Specifically, referring to fig. 2, step 2 specifically includes:
and 2.1, reading the fixed path configuration file, and determining whether the configuration file format is illegal and the parameter value is reasonable.
Step 2.2, parsing out the database login information, the variables to query after login, the cluster structure and the transaction backlog.
And 2.3, checking whether the cluster structure is matched with the information in the configuration file, if the node set of the configuration information is equal to or contains the real-time cluster architecture of the database, the checking returns success, and if not, the returning fails.
And 2.4, executing ping operation according to the domain name to acquire the IP address corresponding to the domain name.
And 2.5, checking the DNS update interface, printing initialization information, and reasonably checking initialization configuration.
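A minimal sketch of the initialization checks in steps 2.1-2.3 follows. All configuration field names are assumptions for illustration; the set-containment test implements the step 2.3 rule that the configured node set must equal or contain the live cluster.

```python
# Sketch of probe initialization (steps 2.1-2.3). Config field names are
# illustrative assumptions, not a documented format.

def validate_probe_config(cfg: dict) -> list:
    """Step 2.1: return a list of problems; an empty list means the config is legal."""
    errors = []
    for key in ("domain", "login_user", "login_password", "cluster_nodes"):
        if key not in cfg:
            errors.append("missing field: " + key)
    interval = cfg.get("probe_interval_seconds", 0)
    if not 1 <= interval <= 3600:
        errors.append("probe_interval_seconds out of range")
    return errors

def cluster_matches(config_nodes: set, live_nodes: set) -> bool:
    """Step 2.3: succeed if the configured node set equals or contains the live cluster."""
    return live_nodes <= config_nodes

cfg = {
    "domain": "orderdb.example.internal",
    "login_user": "probe",
    "login_password": "***",
    "cluster_nodes": {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
    "probe_interval_seconds": 5,
}
```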
Step 3, judging whether the database cluster normally operates according to the database cluster architecture, the transaction accumulation condition and the domain name IP acquired by detection, and specifically comprising the following steps:
if the database cluster main node IP is equal to the domain name IP and the database cluster node is equal to the configuration node, the cluster is normally operated.
If the IP of the database cluster master node is equal to the IP of the domain name, but the database cluster node is not equal to the configuration node, the fact that the slave node runs abnormally is indicated, and an alarm is triggered.
And if the database cluster main node IP is not equal to the domain name IP and the local node IP is equal to the database cluster main node IP, triggering a fault switching process.
And if the database cluster main node IP is not equal to the domain name IP and the local node IP is not equal to the database cluster main node IP, triggering an alarm.
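The four-way judgment above can be sketched as a pure function over the detected values; the string labels it returns are illustrative names, not part of the method.

```python
# Sketch of the step 3 decision. Inputs are the probed values; the labels
# are illustrative.

def classify(master_ip, domain_ip, local_ip, live_nodes, config_nodes):
    if master_ip == domain_ip:
        # Domain still points at the master: healthy, unless a slave dropped out.
        return "NORMAL" if live_nodes == config_nodes else "ALARM_SLAVE"
    # Domain no longer matches the master: only the master's own probe fails over.
    return "FAILOVER" if local_ip == master_ip else "ALARM"
```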
Referring to fig. 3, the above-mentioned failover process specifically includes:
and selecting a new main node through a distributed consistency protocol, terminating the original main node database instance or entering an offline mode, and adjusting the detection interval time. After the original master node database instance terminates or enters an offline mode, all sessions or connections running at the original master node are interrupted. The probe interval time is preferably adjusted to 1 second.
Analyze the running state of the database and determine whether the new master node still has transactions that have not been fully applied.
After the application of the accumulated transaction is completed, if the node is a new main node, the probe initiates a DNS switching process.
And after the DNS is successfully switched, if the DNS has a cache, executing the refreshing operation of the DNS cache.
Check via PING that the domain name now resolves to the new master node's IP; when the check succeeds, the failover is complete.
And restoring the probe interval to be the configured interval time and continuing to execute the detection.
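The sequence above can be sketched as an ordered action list; the step names are illustrative placeholders for the probe's operations, not the patent's actual interface:

```python
# A hypothetical sketch of the failover sequence driven by the probe. Each
# tuple names a step from the text above; a real probe would execute the
# corresponding operation instead of recording it.
def failover_steps(is_new_master, has_dns_cache, configured_interval=5):
    """Return the ordered probe actions for one failover."""
    steps = [
        ("set_probe_interval", 1),                  # tighten probing during failover
        ("wait_transaction_backlog_applied", None), # let the new master catch up
    ]
    if is_new_master:                               # only the new master switches DNS
        steps.append(("switch_dns", None))
        if has_dns_cache:
            steps.append(("refresh_dns_cache", None))
        steps.append(("check_domain_ip_and_ping", None))
    steps.append(("set_probe_interval", configured_interval))  # restore interval
    return steps
```

The `configured_interval` default of 5 seconds is an assumption for illustration; the patent only states that the interval is shortened (preferably to 1 second) and later restored to its configured value.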
The DNS switching process comprises a consistency check of the domain name and IP before the update, the update of the domain name's IP, and a consistency check of the domain name and IP after the update. A commercial DNS device is checked and updated through the API it provides, while an open-source DNS device is updated through a remote command or an SQL request. If the final verification succeeds, the DNS switch has succeeded; otherwise the DNS switching process exits abnormally.
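The check-update-check pattern can be sketched as follows. The `resolve` and `update` callables stand in for the DNS device's API (commercial) or a remote command / SQL request (open source); both hooks and the example domain are assumptions for illustration:

```python
# A sketch of the DNS switching process: pre-check, update, post-check.
def switch_dns(domain, old_ip, new_ip, resolve, update):
    """Return True when the post-update consistency check passes."""
    if resolve(domain) != old_ip:       # pre-update consistency check
        raise RuntimeError("pre-check failed: domain does not point at old master")
    update(domain, new_ip)              # update the domain's IP record
    if resolve(domain) != new_ip:       # post-update consistency check
        raise RuntimeError("post-check failed: DNS switch did not take effect")
    return True

# Usage with an in-memory stand-in for the DNS device:
records = {"db.example.internal": "10.0.0.1"}
ok = switch_dns("db.example.internal", "10.0.0.1", "10.0.0.2",
                resolve=records.get,
                update=records.__setitem__)
```

The pre-check guards against a stale or concurrent switch (the record must still point at the old master), and the post-check is what the text calls the final verification.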
After the failover process is completed, the fault may further be repaired as follows, as shown in fig. 4:
Analyze the root cause of the database cluster failover, and determine from the analysis result whether components such as the operating system, the database, and the probe need to be optimized.
Check data such as the operating system running state and logs to determine whether an operating-system-level repair is required; if the failover was caused by an operating system defect, restart the operating system.
Check the database error log to confirm whether a repair of the database instance is required; if a database instance or process abnormality triggered the failover, restart the database.
Confirm whether the key database parameters of the failed node are normal, including the offline mode and read-only configuration parameters. See table 4 for details:
| Parameter name | Parameter value |
| --- | --- |
| offline_mode | off |
| super_read_only | on |
| read_only | on |

TABLE 4
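The table-4 check can be sketched as a simple comparison. On a real node the values would come from a query such as `SHOW GLOBAL VARIABLES`; here they are passed in as a plain dict, and the function name is an illustrative assumption:

```python
# Expected values from table 4 for a repaired node rejoining the cluster:
# offline mode disabled, but read-only so it cannot accept writes as a slave.
EXPECTED = {"offline_mode": "off", "super_read_only": "on", "read_only": "on"}

def check_key_parameters(variables):
    """Return the names of parameters that deviate from table 4."""
    return [name for name, want in EXPECTED.items()
            if variables.get(name, "").lower() != want]
```

An empty result means all key parameters are normal; otherwise the returned names indicate which parameters must be corrected before restarting the MGR.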
Restart the MGR; after startup, data is recovered automatically and the failed node rejoins the database cluster.
Start the probe; if the log is normal and initialization succeeds, the fault repair has succeeded.
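On the failed node, the parameter reset and MGR restart reduce to a few SQL statements. The sketch below only assembles them (actually executing them requires a MySQL connection, which is outside this illustration); the statements themselves are standard MySQL / Group Replication syntax:

```python
# A sketch of the repair commands run on the failed node before it rejoins.
# SET GLOBAL super_read_only = ON also implies read_only = ON, but both are
# listed to mirror table 4.
def mgr_rejoin_statements():
    return [
        "SET GLOBAL offline_mode = OFF",
        "SET GLOBAL super_read_only = ON",  # table-4 values for a rejoining node
        "SET GLOBAL read_only = ON",
        "START GROUP_REPLICATION",          # node recovers data and rejoins
    ]
```

After `START GROUP_REPLICATION`, MGR's distributed recovery brings the node up to date automatically, matching the "automatically recovering data" step above.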
Referring to fig. 5 and 6, the present invention can be implemented with the application system accessing the DNS directly, or accessing a DNS cache. With direct DNS access, when a failure occurs, the MGR identifies the failure and performs the switch; the probe, based on the MGR switching result, accesses the API provided by the DNS to initiate the update and check of the domain name; because the original master node of the database cluster has failed, the application system obtains the updated IP of the domain name from the DNS when reconnecting, and thus connects to the new master node of the database cluster. With DNS cache access, when a failure occurs, the MGR likewise identifies the failure and performs the switch; the probe, based on the MGR switching result, accesses the API provided by the DNS to initiate the update and verification of the domain name, and refreshes the updated information to the cache in real time; the application system then obtains the updated IP of the domain name from the DNS cache when reconnecting, and thus connects to the new master node of the database cluster.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that other parts not specifically described belong to the prior art or the common general knowledge of those of ordinary skill in the art. Without departing from the principle of the invention, several improvements and modifications can be made, and these improvements and modifications should also be construed as falling within the scope of the invention.
Claims (6)
1. A MySQL high-availability implementation method based on DNS and MGR is characterized by comprising the following steps:
step 1, installing and online configuring a database cluster;
step 2, initializing the probe, and then detecting a database cluster in operation;
step 3, judging whether the database cluster normally operates according to the database cluster architecture, the transaction accumulation condition and the domain name IP acquired by detection, and specifically comprising the following steps:
if the database cluster main node IP is equal to the domain name IP and the database cluster node is equal to the configuration node, indicating that the cluster runs normally;
if the IP of the database cluster master node is equal to the domain name IP, but the database cluster node is not equal to the configuration node, indicating that the slave node is abnormal in operation, and triggering an alarm;
if the database cluster main node IP is not equal to the domain name IP and the local node IP is equal to the database cluster main node IP, triggering a fault switching process;
and if the database cluster main node IP is not equal to the domain name IP and the local node IP is not equal to the database cluster main node IP, triggering an alarm.
2. The MySQL high-availability implementation method based on DNS and MGR of claim 1, wherein the failover process comprises:
selecting a new main node through a distributed consistency protocol, terminating the original main node database instance or entering an offline mode, and adjusting the probe interval time;
analyzing the running state of the database, and determining whether the new main node has transactions that have not been fully applied;
after the application of the accumulated transaction is finished, if the node is a new main node, the probe initiates a DNS switching process;
after the DNS is successfully switched, if the DNS has a cache, executing the refreshing operation of the DNS cache;
checking the domain name IP, the new main node IP and the PING result, wherein the failure switching is completed if the checking is successful;
and restoring the probe interval to be the configured interval time and continuing to execute the detection.
3. The MySQL high-availability implementation method based on DNS and MGR as claimed in claim 1, wherein the DNS switching process comprises a consistency check of the domain name and IP before the update, the update of the domain name's IP, and a consistency check of the domain name and IP after the update; a commercial DNS device is checked and updated through the API it provides, while an open-source DNS device is updated through a remote command or an SQL request; finally, if the verification succeeds, the DNS switch has succeeded, otherwise the DNS switching process exits abnormally.
4. The MySQL high availability implementation method based on DNS and MGR as claimed in claim 1, wherein the step 1 specifically comprises:
step 1.1, resource application and configuration are carried out;
step 1.2, deploying database software;
step 1.3, optimizing parameter configuration;
step 1.4, initializing and starting a database;
step 1.5, configuring the user and the authority;
step 1.6, MGR plug-in and parameter configuration is carried out;
step 1.7, starting MGR operation;
step 1.8, configuring and starting the probe on each database node, thereby completing the configuration of that database node; this is repeated until all database nodes are configured;
and step 1.9, the application system connects to the database cluster according to the configured domain name, user name, password and other information of the database.
5. The MySQL high availability implementation method based on DNS and MGR as claimed in claim 1, wherein the step 2 specifically comprises:
step 2.1, reading the fixed-path configuration file, and checking whether the configuration file format is legal and the parameter values are reasonable;
step 2.2, analyzing to obtain database login information, a login database query variable, a cluster structure and a transaction accumulation condition;
step 2.3, checking whether the cluster structure matches the information in the configuration file: if the node set in the configuration information is equal to or contains the real-time cluster architecture of the database, the check returns success, otherwise it returns failure;
step 2.4, executing ping operation according to the domain name to obtain an IP address corresponding to the domain name;
and 2.5, checking the DNS update interface, printing initialization information, and validating that the initialization configuration is reasonable.
6. The MySQL high-availability implementation method based on the DNS and the MGR as recited in claim 1, wherein after the fault switching process is completed, the fault is repaired, and the method specifically comprises the following steps:
analyzing the root cause of the database cluster failover, and determining according to the analysis result whether components such as the operating system, the database, and the probe need to be optimized;
checking data such as an operating system running state and a log, confirming whether repair operation on the operating system level needs to be executed, and restarting the operating system if fault switching is caused by operating system defects;
checking a database error log, confirming whether the repair operation of the database instance needs to be executed or not, and restarting the database if the database instance or the process abnormally triggers the fault switching;
confirming whether key parameters of a database of a fault node are normal or not, wherein the key parameters comprise an offline mode and read-only configuration related parameters;
restarting the MGR, automatically recovering data after starting operation, and adding the fault node into the database cluster again;
and executing probe starting operation, and if the log is normal and the initialization is successful, indicating that the fault repair is successful.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210154698.7A CN114546427A (en) | 2022-02-21 | 2022-02-21 | MySQL high-availability implementation method based on DNS and MGR |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114546427A true CN114546427A (en) | 2022-05-27 |
Family
ID=81674839
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210154698.7A Pending CN114546427A (en) | 2022-02-21 | 2022-02-21 | MySQL high-availability implementation method based on DNS and MGR |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114546427A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115934428A (en) * | 2023-01-10 | 2023-04-07 | 湖南三湘银行股份有限公司 | Main disaster recovery backup switching method and device of MYSQL database and electronic equipment |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Country or region after: China; Address after: No.4 building, Hexi Financial City, Jianye District, Nanjing City, Jiangsu Province, 210000; Applicant after: Jiangsu Sushang Bank Co.,Ltd.; Address before: No.4 building, Hexi Financial City, Jianye District, Nanjing City, Jiangsu Province, 210000; Applicant before: JIANGSU SUNING BANK Co.,Ltd.; Country or region before: China |