CN117421158A - Database fault processing method, system and storage medium - Google Patents

Database fault processing method, system and storage medium

Info

Publication number
CN117421158A
Authority
CN
China
Prior art keywords
server
main
standby
fault
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311422213.9A
Other languages
Chinese (zh)
Inventor
程玉文
孙科
赵轶新
玄勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202311422213.9A priority Critical patent/CN117421158A/en
Publication of CN117421158A publication Critical patent/CN117421158A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1479 Generic software techniques for error detection or fault masking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1446 Point-in-time backing up or restoration of persistent data
    • G06F 11/1458 Management of the backup or restore process
    • G06F 11/1464 Management of the backup or restore process for networked environments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F 2201/80 Database-specific techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention discloses a database fault processing method, system and storage medium, applied in the technical field of network communication security, which can mitigate the single-point-of-failure problem of a database and improve its availability and resource utilization. The method comprises the following steps: performing node deployment according to service parameters to obtain a preset main-standby cluster, where the preset main-standby cluster comprises a standby server and a main server group, and the main server group comprises a main read server and a main write server; creating a virtual internet protocol address and assigning it to the preset main-standby cluster to construct a virtual internet protocol link; when a preset heartbeat detection mechanism detects that the working state of the main server group is a fault state, switching the virtual internet protocol link to the standby server and triggering a fault early warning mechanism; and when the fault server is determined to be repaired, synchronizing the data in the standby server to the main server group and switching the virtual internet protocol link back to the main server group.

Description

Database fault processing method, system and storage medium
Technical Field
The present invention relates to the field of network communication security technologies, and in particular, to a database fault processing method, system, and storage medium.
Background
With the continued rapid development of the information technology (IT) and Internet industries, every sector has brought Internet technology to a broader stage of development, and data of all kinds has become a key asset for enterprises. Accordingly, how to store and query data efficiently and securely, how to select and deploy storage middleware, and how to achieve higher resource utilization are common concerns in the information technology industry. In the related art, database middleware options have proliferated, and master-slave configurations have become a mainstream deployment strategy. However, when pursuing high performance and high concurrency, such deployments often suffer from single points of failure, or require additional resources, resulting in low overall resource utilization and considerable security risk. The above technical problems therefore need to be solved.
Disclosure of Invention
In order to solve at least one of the above technical problems, the present invention provides a database fault processing method, system and storage medium, which can mitigate the single-point-of-failure problem of a database and effectively improve its availability and resource utilization.
In one aspect, an embodiment of the present invention provides a method for processing database faults, including the following steps:
Performing node deployment according to service parameters to obtain a preset main-standby cluster; the preset main-standby cluster comprises a standby server and at least one main server group, wherein each main server group comprises a main read server and a main write server, and the standby server is provided with standby database instances corresponding to the main read server and the main write server;
creating a plurality of virtual internet protocol addresses, and assigning them to each server in the preset main-standby cluster to construct corresponding virtual internet protocol links; the virtual internet protocol links are connected to the main write server and the main read server respectively;
when the working state of the main server group is monitored to be a fault state through a preset heartbeat detection mechanism, switching the virtual internet protocol link to be connected with the standby database instance in the standby server, and triggering a fault early warning mechanism to repair the fault server through the fault early warning mechanism; the fault server comprises a server with a working state of a fault state in the main server group;
And when the fault server repair is determined to be completed, synchronizing the data in the standby server to the main server group, and switching the virtual internet protocol link to be connected with the main server group.
According to some embodiments of the present invention, performing node deployment according to the service parameters to obtain the preset main-standby cluster includes:
carrying out service division according to the service parameters to obtain a plurality of service types;
deploying a corresponding main server group for each service type, and constructing corresponding standby database instances in the standby server; the standby database instances comprise a standby read database instance corresponding to the main read server and a standby write database instance corresponding to the main write server.
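The per-service deployment described above can be sketched in a few lines of code. This is an illustrative model only (all function and key names here are invented, and the base port is an arbitrary example): each service type gets its own main server group, and the single standby server hosts matching standby instances on the same, globally unique ports.

```python
# Hypothetical sketch of the node-deployment step: one main server group
# (a write server and a read server) per service type, mirrored by
# standby database instances on the standby server with identical ports.

def plan_cluster(service_types, base_port=3306):
    """Return a layout mapping each service type to its main server group
    and to the standby instances mirrored on the standby server."""
    layout = {"main_groups": {}, "standby_server": {}}
    port = base_port
    for svc in service_types:
        group = {"main_write": port, "main_read": port + 1}
        port += 2
        layout["main_groups"][svc] = group
        # The standby server mirrors both instances on the same ports,
        # keeping every port unique across the whole cluster.
        layout["standby_server"][svc] = {
            "standby_write": group["main_write"],
            "standby_read": group["main_read"],
        }
    return layout

layout = plan_cluster(["transaction", "interaction", "management"])
```

A layout like this makes the "unique, unrepeated ports" requirement easy to check mechanically before deployment.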
According to some embodiments of the present invention, after the step of monitoring, through the preset heartbeat detection mechanism, that the working state of the main server group is a fault state, the method further includes:
when the service parameter corresponding to the fault server is a first type of service, shutting down, in the standby server, the standby database instances corresponding to the servers in the main server group whose working state is a non-fault state;
or, when the service parameter corresponding to the fault server is a second type of service, dynamically adjusting the instance resources of the standby server; the dynamic adjustment comprises reducing the resource configuration of the standby database instances corresponding to the servers in the main server group whose working state is a non-fault state, and increasing the resource configuration of the standby database instance corresponding to the fault server.
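The "second type of service" branch above can be illustrated with a small resource-shifting sketch. This is not the patent's implementation; the function name, the resource units, and the shrink factor are all invented here to show the idea: capacity is taken from the standby instances of still-healthy servers and given to the instance now serving traffic for the failed server.

```python
# Hedged sketch of dynamic instance-resource adjustment on the standby
# server: shrink the idle standby instances and boost the hot one.

def rebalance(resources, failed, shrink_factor=0.5):
    """resources maps instance name -> allocated units; `failed` names
    the standby instance taking over for the faulty main server."""
    adjusted = {}
    freed = 0.0
    for name, units in resources.items():
        if name == failed:
            adjusted[name] = units          # boosted below
        else:
            adjusted[name] = units * shrink_factor
            freed += units - adjusted[name]  # capacity reclaimed
    adjusted[failed] += freed                # give it to the hot instance
    return adjusted

after = rebalance({"svc_a": 4.0, "svc_b": 4.0, "svc_c": 4.0}, failed="svc_a")
```

Note that total allocated capacity is conserved, which matches the patent's goal of using the single standby server's fixed resources more efficiently.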
According to some embodiments of the invention, synchronizing the data in the standby server to the main server group includes:
synchronizing the data to be synchronized in the standby write database instance to the corresponding main write server; the data to be synchronized comprises the incremental data accumulated from the moment the fault server was detected to be in a fault state until its repair is determined to be complete;
and performing data synchronization from the main write server to the main read server.
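The two-stage resynchronisation just described can be sketched as follows. The dict-based "servers" are stand-ins for real database instances (names invented here): incremental data accumulated on the standby write instance during the outage is first pushed to the main write server, which then relays it to the main read server.

```python
# Minimal sketch of the post-repair sync order: standby write instance
# -> main write server (stage 1), then main write -> main read (stage 2).

def resync(standby_write, main_write, main_read):
    # Stage 1: push the standby write instance's data to the main write server.
    for key, value in standby_write.items():
        main_write[key] = value
    # Stage 2: relay from the main write server to the main read server.
    for key, value in main_write.items():
        main_read[key] = value

main_w = {"row1": "old"}
main_r = {"row1": "old"}
standby_w = {"row1": "old", "row2": "written-during-outage"}
resync(standby_w, main_w, main_r)
```

Routing the catch-up data through the main write server, rather than writing to both mains directly, preserves the normal write-then-replicate direction of the cluster.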
According to some embodiments of the invention, after the step of creating a plurality of virtual internet protocol addresses and assigning them to each server in the preset main-standby cluster, the method further comprises:
Configuring the preset heartbeat detection mechanism to be in a non-automatic restarting state;
configuring priority parameters of the preset heartbeat detection mechanism to determine a service host through the priority parameters;
and configuring a process shutdown script to guide the virtual internet protocol link switching through the process shutdown script.
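One plausible reading of the priority and non-automatic-restart configuration above is non-preemptive, priority-based host election, as found in heartbeat tools such as Keepalived. The sketch below is an in-memory model of that behaviour (function and field names invented here), not the patent's configuration: the live node with the highest priority becomes the service host, and in non-preemptive mode a recovered node does not immediately take the role back.

```python
# Hedged sketch of priority-based service-host election with an optional
# non-preemptive ("no automatic restart of the main role") policy.

def elect_host(nodes, current=None, preempt=False):
    """nodes: {name: {"priority": int, "alive": bool}}. Returns the name
    of the node that should hold the virtual internet protocol address."""
    live = {n: c for n, c in nodes.items() if c["alive"]}
    if not live:
        return None
    if not preempt and current in live:
        return current  # non-preemptive: keep the current host while it lives
    return max(live, key=lambda n: live[n]["priority"])

nodes = {
    "main":    {"priority": 100, "alive": False},  # failed
    "standby": {"priority": 90,  "alive": True},
}
host = elect_host(nodes, current="main")   # failover to the standby
nodes["main"]["alive"] = True              # main repaired
kept = elect_host(nodes, current=host)     # non-preemptive: standby keeps the role
```

In this model the switch back to the main server group is an explicit, controlled step (as in S140), rather than an automatic preemption the moment the main recovers.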
According to some embodiments of the present invention, when the working state of the active server group is monitored to be a fault state by a preset heartbeat detection mechanism, the switching the virtual internet protocol link to connect with the standby database instance in the standby server includes:
when the preset heartbeat detection mechanism detects that the working state of the main write server is a fault state, shutting down the main read server corresponding to that main write server through the preset process shutdown script;
and switching the virtual internet protocol link to connect with the corresponding standby write database instance and standby read database instance.
According to some embodiments of the invention, the method further comprises:
the virtual internet protocol link is connected to the corresponding main write server, and a platform data write operation is performed to write the platform input data into the main write server;
synchronizing the platform input data to the corresponding main read server and the standby write database instance through the binary log;
and synchronizing the platform input data to the corresponding standby read database instance through the standby write database instance.
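The normal-operation data flow above can be sketched as a small replication fan-out. This is an in-memory illustration (the `Node` class and all names are invented here); in practice the fan-out would be driven by a database binary log: a write lands on the main write server, the event is replayed on the main read server and the standby write instance, and the standby write instance forwards it to the standby read instance.

```python
# Sketch of the replication topology: main write -> {main read, standby
# write instance}; standby write instance -> standby read instance.

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.downstream = []

    def apply(self, key, value):
        self.data[key] = value
        for node in self.downstream:   # replay the event downstream
            node.apply(key, value)

main_write = Node("main_write")
main_read = Node("main_read")
standby_write = Node("standby_write")
standby_read = Node("standby_read")
main_write.downstream = [main_read, standby_write]
standby_write.downstream = [standby_read]

main_write.apply("order:1", "paid")    # a platform write operation
```

Because the standby read instance is fed through the standby write instance, the standby pair mirrors the read-write separation of the main server group and can take over either role.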
On the other hand, the embodiment of the invention also provides a database fault processing system, which comprises:
a first module, configured to perform node deployment according to service parameters to obtain a preset main-standby cluster; the preset main-standby cluster comprises a standby server and at least one main server group, wherein each main server group comprises a main read server and a main write server, and the standby server is provided with standby database instances corresponding to the main read server and the main write server;
a second module, configured to create a plurality of virtual internet protocol addresses and assign them to each server in the preset main-standby cluster to construct corresponding virtual internet protocol links; the virtual internet protocol links are connected to the main write server and the main read server respectively;
a third module, configured to, when a preset heartbeat detection mechanism monitors that the working state of the main server group is a fault state, switch the virtual internet protocol link to connect with the standby database instance in the standby server and trigger a fault early warning mechanism to repair the fault server; the fault server comprises a server in the main server group whose working state is a fault state;
and a fourth module, configured to, when the repair of the fault server is determined to be complete, synchronize the data in the standby server to the main server group and switch the virtual internet protocol link back to the main server group.
On the other hand, the embodiment of the invention also provides a database fault processing system, which comprises:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the database fault handling method as described in the above embodiments.
In another aspect, an embodiment of the present invention further provides a computer storage medium, in which a program executable by a processor is stored, where the program executable by the processor is used to implement the database fault handling method according to the above embodiment.
The database fault processing method provided by the embodiment of the invention has at least the following beneficial effects. The embodiment of the invention first performs node deployment according to the service parameters to obtain the preset main-standby cluster. The preset main-standby cluster comprises a standby server and at least one main server group, each main server group comprises a main read server and a main write server, and the standby server is provided with standby database instances corresponding to the main read server and the main write server. The embodiment of the invention then creates a plurality of virtual internet protocol addresses and assigns them to each server in the preset main-standby cluster to construct corresponding virtual internet protocol links, so that the virtual internet protocol links are connected to the main write server and the main read server respectively. It is easy to understand that, by addressing the standby server and the main server groups through virtual internet protocol addresses, the embodiment of the invention forms a multi-main, single-standby node deployment, which can effectively improve the availability of the database. Further, the embodiment of the invention monitors the working state of the main server group through the preset heartbeat detection mechanism; when the working state of the main server group is monitored to be a fault state, the embodiment of the invention switches the virtual internet protocol link to connect with the standby database instance in the standby server and triggers a fault early warning mechanism to repair the fault server. The fault server in the embodiment of the invention comprises a server in the main server group whose working state is a fault state.
By detecting the survival of each server through the preset heartbeat detection mechanism, the embodiment of the invention can switch between the main server and the standby server, keep services running normally, and mitigate the single-point-of-failure problem of the database. Finally, when the repair of the fault server is determined to be complete, the embodiment of the invention synchronizes the data in the standby server to the main server group and switches the virtual internet protocol link back to the main server group, thereby completing the handling of the database's single point of failure. In addition, by deploying, according to the service parameters, a preset main-standby cluster comprising a standby server and at least one main server group in a multi-main, single-standby topology, the embodiment of the invention makes full use of the standby server's resources, reduces resource cost, and effectively improves resource utilization.
Drawings
FIG. 1 is a schematic diagram of a database single point of failure provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a database fault handling method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-primary-backup port allocation according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a configuration of a primary virtual internet protocol address according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of data synchronization after repairing a failed node according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a primary-standby node switching for fault repair according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a primary node failure and a primary-standby node switching provided by an embodiment of the present invention;
FIG. 8 is a schematic diagram of data synchronization during normal operation provided by an embodiment of the present invention;
FIG. 9 is a schematic diagram of a database fault handling system according to an embodiment of the present invention;
FIG. 10 is a schematic block diagram of a database fault handling system provided by an embodiment of the present invention.
Detailed Description
The embodiments described in the present application should not be construed as limiting the present application; rather, all other embodiments that a person skilled in the art can obtain without inventive effort are intended to fall within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the present application.
Before describing embodiments of the present application, related terms referred to in the present application will be first described.
Database (Database): a "repository" that organizes, stores and manages data according to a data structure; it is a large collection of data stored in a computer over the long term that is organized, sharable and uniformly managed.
Single point of failure (Single Point Of Failure, abbreviated SPOF): a component whose failure, such as the failure of a single node, brings the entire system down; that is, one point of failure leads to overall failure.
Master-Slave (Master-Slave): two identical databases are established, one as the master database and the other as the slave database, with reads and writes separated: the master database serves write operations, and the slave database serves read operations.
Master-Standby (Master-Standby): two identical databases are established, one as the main database and the other as the standby database. The main database serves both read and write operations, and while it operates normally the standby database synchronizes its data. Accordingly, when the main database fails, the standby database takes over its service in place of the failed main database.
Heartbeat detection: the state of each service node in a cluster system is monitored; when a node is detected to be down it is removed from the service cluster, and when the node returns to normal it can be added to the cluster again.
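As a concrete illustration of this definition, the sketch below (invented names, not part of the patent) removes a node from the cluster after it misses a few consecutive heartbeats and re-adds it once it reports again.

```python
# Minimal heartbeat-detection model: nodes missing `max_misses`
# consecutive heartbeats are removed; recovered nodes rejoin.

def update_cluster(cluster, heartbeats, misses, max_misses=3):
    """cluster: set of active nodes; heartbeats: nodes heard this round;
    misses: per-node consecutive miss counter (mutated in place)."""
    for node in list(misses):
        if node in heartbeats:
            misses[node] = 0
            cluster.add(node)          # recovered node rejoins
        else:
            misses[node] += 1
            if misses[node] >= max_misses:
                cluster.discard(node)  # considered down: remove it
    return cluster

cluster = {"a", "b"}
misses = {"a": 0, "b": 0}
for _ in range(3):                     # "b" misses three rounds in a row
    update_cluster(cluster, {"a"}, misses)
down = set(cluster)                    # "b" has been removed
update_cluster(cluster, {"a", "b"}, misses)  # "b" reports again
```

Requiring several consecutive misses, rather than one, avoids evicting a node over a single dropped packet.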
Disk IO (Disk IO): IO, also written I/O, is a computing term referring to Input/Output. Disk IO refers to the rate at which bytes are read from and written to the disk, i.e., the disk's read-write capability.
With the continued rapid development of the information technology (IT) and Internet industries, every sector has brought Internet technology to a broader stage of development, and data of all kinds has become a key asset for enterprises. Accordingly, how to store and query data efficiently and securely, how to select and deploy storage middleware, and how to achieve higher resource utilization are common concerns in the information technology industry. In the related art, database middleware options have proliferated, and master-slave configurations have become a mainstream deployment strategy. However, when pursuing high performance and high concurrency, such deployments often suffer from single points of failure, as shown in fig. 1, or require additional resources, resulting in low overall resource utilization and considerable security risk. The above technical problems therefore need to be solved.
Based on the above, an embodiment of the invention provides a database fault processing method, system and storage medium, which can mitigate the single-point-of-failure problem of a database and effectively improve its availability and resource utilization. Referring to fig. 2, the method of the embodiment of the present invention includes, but is not limited to, step S110, step S120, step S130, and step S140.
Specifically, the method application process of the embodiment of the invention includes, but is not limited to, the following steps:
s110: and performing node deployment according to the service parameters to obtain a preset main equipment cluster. The preset main equipment cluster comprises a standby server and at least one main server group, wherein the main server group comprises a main read server and a main write server, and standby database examples corresponding to the main read server and the main write server are arranged in the standby server.
S120: creating a plurality of virtual internet protocol addresses, and distributing the plurality of virtual internet protocol addresses to each server in a preset main equipment cluster to construct corresponding virtual internet protocol links. The virtual internet protocol links are respectively connected with the main writing server and the main reading server.
S130: when the working state of the main server group is monitored to be a fault state through the preset heartbeat detection mechanism, switching the virtual internet protocol link to connect with the standby database instance in the standby server, and triggering a fault early warning mechanism to repair the fault server through it. The fault server comprises the servers in the main server group whose working state is a fault state.
S140: and when the fault server repair is determined to be completed, synchronizing the data in the standby server to the main server group, and switching the virtual Internet protocol link to be connected with the main server group.
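The steps S110 to S140 above can be sketched as a small state machine. This is an illustrative, in-memory model only (the class and method names are invented here); a real deployment would rely on a heartbeat tool and database replication rather than this sketch.

```python
# Hedged sketch of the failover flow in steps S110-S140.

class Cluster:
    def __init__(self):
        self.vip_target = "main_group"   # S120: VIP link starts on the mains
        self.alerts = []
        self.synced = False

    def heartbeat(self, main_group_ok):
        # S130: on a detected fault, switch the VIP link to the standby
        # instances and trigger the fault early warning mechanism.
        if not main_group_ok and self.vip_target == "main_group":
            self.vip_target = "standby_server"
            self.alerts.append("main server group fault detected")

    def repair_done(self):
        # S140: sync standby data back, then switch the VIP link back.
        self.sync_standby_to_main()
        self.vip_target = "main_group"

    def sync_standby_to_main(self):
        self.synced = True             # stands in for real data sync

c = Cluster()
c.heartbeat(main_group_ok=True)    # normal operation: nothing changes
c.heartbeat(main_group_ok=False)   # fault: fail over to the standby
failover_target = c.vip_target
c.repair_done()                    # repaired: sync, then switch back
```

The ordering matters: in `repair_done`, synchronization completes before the link switches back, which is exactly the S140 requirement that the main server group be consistent before it resumes serving traffic.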
In this embodiment, the embodiment of the present invention first performs node deployment according to the service parameters to obtain the preset main-standby cluster. Specifically, in the embodiment of the present invention, a database instance refers to the running state of a database in memory within the database management system, including all of the database's data and related operating parameters, which can be accessed and operated upon. The service parameters in the embodiment of the present invention refer to parameters of different service classes, such as service A, service B, and the like. The embodiment of the invention performs node deployment according to the corresponding service parameters, thereby constructing the preset main-standby cluster. The preset main-standby cluster in the embodiment of the invention comprises a standby server and at least one main server group. Correspondingly, each main server group in the embodiment of the invention comprises a main read server and a main write server, and the standby server is provided with standby database instances corresponding to the main read server and the main write server. The preset main-standby cluster is constructed in a multi-main, single-standby node deployment mode, so that when a main server group fails it can be switched to the standby server, keeping services running normally while improving resource utilization. Further, the embodiment of the invention creates a plurality of virtual internet protocol addresses and assigns them to each server in the preset main-standby cluster, thereby constructing the corresponding virtual internet protocol links.
Specifically, a virtual internet protocol address (Virtual Internet Protocol address, VIP) in the embodiments of the present invention is an IP (internet protocol) address that distributes network traffic evenly to multiple servers or devices. The virtual internet protocol address in the embodiment of the invention is not a real physical address, but a virtual address configured on the network device. The embodiment of the invention performs failover through the virtual internet protocol address, switching traffic to a healthy server so as to ensure service continuity. Correspondingly, after the constructed virtual internet protocol addresses are assigned to each server, the embodiment of the invention first constructs the virtual internet protocol link from the virtual internet protocol addresses corresponding to the main write server and the main read server. It is easy to understand that, at initial deployment, the main write server and the main read server act as the main node servers, so the embodiment of the invention constructs a virtual internet protocol link from their corresponding virtual internet protocol addresses in order to perform the corresponding data read and write operations.
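On Linux hosts, moving a virtual internet protocol address between servers is commonly done with `ip addr del`/`ip addr add` plus a gratuitous ARP announcement. The helper below only builds the command strings (a dry run), since actually executing them requires root privileges and real interfaces; the address and interface names are illustrative, and this is one common mechanism rather than the patent's specified one.

```python
# Hedged sketch: generate (but do not run) the shell commands that would
# move a VIP from one host's interface to another's.

def vip_move_commands(vip, old_iface, new_iface):
    return [
        f"ip addr del {vip}/32 dev {old_iface}",
        f"ip addr add {vip}/32 dev {new_iface}",
        f"arping -c 3 -U -I {new_iface} {vip}",  # announce the new owner
    ]

cmds = vip_move_commands("192.168.10.100", "eth0", "eth1")
```

The gratuitous ARP step matters in practice: without it, peers may keep a stale ARP entry pointing at the failed host until their caches expire.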
Further, the embodiment of the invention monitors the working state of the main server group through the preset heartbeat detection mechanism. The heartbeat mechanism in the embodiment of the invention refers to a mechanism that confirms the running state of a device or server by periodically sending heartbeat signals, and takes corresponding measures when an abnormality is detected. For example, the preset heartbeat detection mechanism in the embodiment of the present invention may be implemented with a cluster manager (Pacemaker), a distributed coordination service (ZooKeeper), or a keep-alive mechanism (Keepalived). In addition, the fault state in the embodiment of the invention refers to an abnormal working state of a server, such as downtime, a crash, or a disk failure. Specifically, when the working state of the main server group is detected to be a fault state through the preset heartbeat detection mechanism, the embodiment of the invention switches the virtual internet protocol link to connect with the standby database instance in the standby server. The embodiment of the invention first disconnects the link from the main server group and then constructs a new virtual internet protocol link from the virtual internet protocol address of the standby database instance corresponding to that main server group, thereby switching the main node from the main server group to the standby server; this enables real-time switchover to the standby server when a main server is abnormal and keeps services running normally. Meanwhile, the embodiment of the invention triggers a fault early warning mechanism so as to repair the fault server through it.
It is easy to understand that in the embodiment of the present invention, the fault server comprises a server in the main server group whose working state is a fault state; for example, if the main write server in a main server group goes down, that main write server is the fault server. Accordingly, in the embodiment of the invention, the fault early warning mechanism can diagnose and repair the fault server through a preset repair program, or, when the fault server cannot be repaired automatically, push a fault alert to the platform maintainers to notify them that the database main node has failed, so that they can repair it in time.
Finally, when the repair of the fault server is determined to be complete, the embodiment of the present invention synchronizes the data in the standby server to the main server group and switches the virtual internet protocol link back to the main server group. Specifically, after the fault early warning mechanism is triggered, the embodiment of the invention analyzes and locates the fault server and repairs it so that it returns to normal. Accordingly, when the repair of the fault server is complete, the embodiment of the invention first synchronizes the data in the standby server to the main server group. It is easy to understand that during the repair of the fault server the standby server acts as the main node, that is, the standby server executes the related services, such as data write and read operations, and new data cannot be propagated to the fault server in time. Therefore, after the repair of the fault server is complete, the corresponding data in the standby server must first be synchronized to the main server group, so that the data in the main server group is consistent with the data in the standby server, mitigating the stale-data problem caused by the fault in the main server group. Then, after the data synchronization is complete, the embodiment of the invention switches the virtual internet protocol link from the standby server back to the main server group, that is, it cuts the connection to the corresponding standby database instance in the standby server and constructs a new virtual internet protocol link from the virtual internet protocol addresses corresponding to the main write server and the main read server of the main server group, thereby completing the main-standby switchover.
It is easy to understand that after the repair of the failed server is completed, that is, after the database fault is resolved, the embodiment of the present invention adjusts and restores the resources of the standby node, which then continues to synchronize data from the active server group, and the master role is switched from the standby server back to the active server group. In addition, the embodiment of the present invention can be applied to the cloud environment of an information security operation scheduling system, deploying standby nodes for multiple databases such as a processing scheduling database, a data security management database, and a statistics operation database, so that high availability of the databases is achieved and single-point database faults are resolved by adding only a small amount of resources.
In some embodiments of the present invention, node deployment is performed according to the service parameters to obtain a preset master-standby cluster, including but not limited to the following steps:
and carrying out service division according to the service parameters to obtain a plurality of service types.
And deploying a corresponding active server group for each service type, and constructing corresponding standby database instances in the standby server. The standby database instances comprise a standby read database instance corresponding to the active read server and a standby write database instance corresponding to the active write server.
In this embodiment, the embodiment of the present invention first performs service division according to the service parameters to obtain a plurality of service types. Specifically, the service data is divided so that the different database application scenarios fall into several corresponding service types, such as transaction, interaction, and management. Next, the embodiment of the present invention deploys a corresponding active server group for each service type and constructs the corresponding standby database instances in the standby server. Specifically, each active server group includes an active read server and an active write server, and the standby database instances accordingly include a standby read database instance corresponding to the active read server and a standby write database instance corresponding to the active write server. Referring to fig. 3 and fig. 4, the embodiment of the present invention uniformly plans and individually allocates the instance ports of the platform's master-node databases, ensuring that the cluster nodes are globally unique and non-repeating. Meanwhile, the embodiment of the present invention selects a server with a high disk read/write speed as the standby server. Correspondingly, MySQL database instances are installed in the standby server, and the number and ports of these instances are kept consistent with those of the database instances of the master nodes. By deploying per-service master-slave clusters with read/write separation — the master writes, the slave reads — and a single instance per machine on the active side, the embodiment of the present invention effectively improves server performance. For example, as shown in fig. 3, service division is performed according to the service parameters to obtain the corresponding nodes, such as service one-M, service one-S, service two-M, and service two-S. Here, service one-M denotes the master node of service one, which handles the write operations of service one, and service one-S denotes the slave node of service one, which handles its read operations; service two is deployed analogously. It is easy to understand that each subsequently added service requires an additional active server group comprising a master node and a slave node, that is, an active write server and an active read server. Meanwhile, to improve resource utilization and reduce resource cost, the standby node, that is, the standby server, adopts a single-machine deployment: multiple MySQL application instances (database instances) are installed in it, the resources are evenly distributed across the instances, and each instance's port is kept consistent with that of the corresponding master node so that it can serve as its backup. For example, referring to fig. 4, the correspondence in the embodiment of the present invention is: the write master M of service one (the active write server, IP 10.0.1.1:3306) corresponds to its standby write database instance M' (IP 10.0.1.5:3306), and the slave node S of service one (the active read server, IP 10.0.1.2:3307) corresponds to its standby read database instance S' (IP 10.0.1.5:3307).
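One way to realize the single-machine, multi-instance standby deployment described above is MySQL's multi-instance configuration. The fragment below is only a sketch under assumed data-directory paths, mirroring the port plan of fig. 4 (ports 3306 and 3307 on the standby host 10.0.1.5); none of the paths come from the patent itself:

```
# Hypothetical my.cnf fragment for the standby server (10.0.1.5):
# one mysqld instance per backed-up master node, same ports as the masters.
[mysqld_multi]
mysqld     = /usr/bin/mysqld_safe
mysqladmin = /usr/bin/mysqladmin

[mysqld1]                        # backup M' of service one's write master M
port    = 3306
datadir = /data/mysql3306
socket  = /data/mysql3306/mysql.sock

[mysqld2]                        # backup S' of service one's read node S
port    = 3307
datadir = /data/mysql3307
socket  = /data/mysql3307/mysql.sock
```

Keeping the standby ports identical to the master ports is what lets the virtual internet protocol address switch over without any client-side reconfiguration.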
In some embodiments of the present invention, after performing the step of monitoring that the working state of the active server group is a fault state through a preset heartbeat detection mechanism, the database fault processing method provided in the embodiment of the present invention further includes, but is not limited to, the following steps:
and when the service parameters corresponding to the fault server are the first type of service, closing the database instance corresponding to the server in the standby server, wherein the working state of the database instance is a non-fault state, and the server in the main server group is corresponding to the database instance.
Or, when the service parameter corresponding to the failed server is a second-type service, dynamically adjusting the instance resources of the standby server. The dynamic adjustment of the instance resources comprises reducing the resource configuration of the standby database instances that correspond to the servers of the active server group whose working state is the non-fault state, and increasing the resource configuration of the standby database instance corresponding to the failed server.
In this embodiment, to improve the carrying capacity of the standby node, when the active server group fails and service is switched to the standby server, the embodiment of the present invention dynamically adjusts the resources of the standby server. The embodiment of the present invention first determines the service type from the service parameter corresponding to the failed server, and then adjusts the resources of the standby server according to that service type. Specifically, when the service parameter corresponding to the failed server indicates a first-type service, the embodiment of the present invention closes, in the standby server, the standby database instances corresponding to the non-failed servers of the active server groups. The first-type service in the embodiment of the present invention comprises core services or high-concurrency services. It is easy to understand that when the master node, that is, the active server group, corresponding to a core or high-concurrency service fails, the embodiment of the present invention directly shuts down the application instances in the standby server that belong to the other services, that is, it closes the database instances corresponding to the non-failed servers, so as to free the maximum amount of resources for the standby instance of the failed service, thereby improving the carrying capacity of the standby node. In addition, when the service parameter corresponding to the failed server indicates a second-type service, the embodiment of the present invention dynamically adjusts the instance resources of the standby server.
Correspondingly, the dynamic adjustment of the instance resources in the embodiment of the present invention comprises reducing the resource configuration of the standby database instances corresponding to the non-failed servers of the active server group, and increasing the resource configuration of the standby database instance corresponding to the failed server. The second-type service in the embodiment of the present invention comprises non-core services or non-high-concurrency services. For example, when the failed master node is determined to serve a non-core or non-high-concurrency service, the embodiment of the present invention reduces the resource configuration of the corresponding database instances of the other services in the standby server, for example by lowering their resource level according to a preset configuration level, while increasing the resource configuration of the standby database instance corresponding to the failed server, for example by raising its resource level according to the preset configuration level. Illustratively, during the dynamic adjustment of the standby server, the embodiment of the present invention first sets the InnoDB storage engine buffer pool size in the MySQL database. Here, the innodb_buffer_pool_size parameter denotes the size of the InnoDB buffer pool, which is recommended to be set to 80% of system memory, and the innodb_buffer_pool_instances parameter denotes the number of InnoDB buffer pool instances, which is recommended to be set to 8 or more, with at least 1 GB per buffer pool instance.
Next, when scaling down the resources of a non-failed MySQL application instance, the embodiment of the present invention uses the SET GLOBAL command to set its innodb_buffer_pool_size parameter to 1024000K, i.e. roughly 1 GB. Meanwhile, when scaling up the resources of the MySQL application instance of the failed service, the embodiment of the present invention uses the SET GLOBAL command to set its innodb_buffer_pool_size parameter to 8192000K, i.e. roughly 8 GB.
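The resize step described above can be scripted. The sketch below only prints the client commands it would issue; the instance ports, the mysql client invocation, and the exact byte sizes are illustrative assumptions rather than values fixed by the patent:

```shell
#!/bin/sh
# Sketch: emit the buffer-pool resize statements used during failover.
# innodb_buffer_pool_size is dynamic in MySQL, so SET GLOBAL takes effect
# without restarting the instance.
resize_stmt() {  # $1 = port of the standby instance, $2 = new size in bytes
  echo "mysql -P $1 -e \"SET GLOBAL innodb_buffer_pool_size = $2;\""
}

resize_stmt 3307 $((1024 * 1024 * 1024))      # demote the non-failed instance (~1 GB)
resize_stmt 3306 $((8 * 1024 * 1024 * 1024))  # promote the failed service's instance (~8 GB)
```

In practice the printed commands would be executed against the standby host, with the demotion run before the promotion so memory is freed first.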
In some embodiments of the present invention, the data in the standby server is synchronized to the active server group, including but not limited to the following steps:
and synchronizing the data to be synchronized in the standby write database instance to the corresponding main write server. The data to be synchronized comprises incremental data from the detection of the fault server as a fault state to the determination of the completion of the repair of the fault server.
And performing data synchronization to the active read server through the active write server.
In this embodiment, after the repair of the failed server is completed, the embodiment of the present invention synchronizes the data to be synchronized in the standby write database instance to the repaired active write server, and then performs data synchronization to the active read server through the active write server. Specifically, the data to be synchronized comprises the incremental data generated between the moment the failed server was detected as faulty and the moment its repair was determined to be complete. Illustratively, referring to fig. 5, when the active write server M fails, the platform maintainer receives the fault alarm, analyzes and locates the problem in time, and performs emergency repair on the failed server, that is, the active write server M. When the repair is completed, the standby node, that is, the standby server, sets a binary log (Binlog) offset and synchronizes, through the binary log, the incremental data accumulated between the fault and the repair into the active server group. Accordingly, during this data synchronization, the standby write database instance M' synchronizes the newly added data to the repaired active write server M through the binary log. Then, the embodiment of the present invention starts the original master node of the read service, that is, the active read server S, and continuously synchronizes data to it through the active write server M, thereby propagating the data generated during the fault repair. It should be noted that during this data synchronization the read and write services of the corresponding service are still provided by the standby node, that is, by the corresponding standby database instances in the standby server. Correspondingly, referring to fig. 6, after the master node has been repaired and restarted and the standby-node data has been synchronized, the embodiment of the present invention adjusts the keepalived priority of the master node and switches the virtual internet protocol address (VIP) back to the master node, which then provides the read and write services for the platform; the single-point database fault handling is thus completed, and the standby-node resources are adjusted and restored so that the standby node continues to synchronize the master-node data.
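The failback sequence of fig. 5 and fig. 6 — incremental sync from a recorded binlog offset, then the VIP switch-back — can be sketched as a dry-run script. It only prints the steps an operator or automation would run; the standby host/port and the binlog file and position are placeholder assumptions:

```shell
#!/bin/sh
# Sketch of the failback once the failed write master M is repaired:
# 1) point M at the standby instance M' from the recorded binlog offset,
# 2) let replication pull the incremental data accumulated during the fault,
# 3) restore keepalived priority so the VIP moves back to the master group.
failback_plan() {  # $1 = binlog file recorded on M', $2 = binlog position
  echo "CHANGE MASTER TO MASTER_HOST='10.0.1.5', MASTER_PORT=3306, MASTER_LOG_FILE='$1', MASTER_LOG_POS=$2;"
  echo "START SLAVE;                -- M pulls the delta accumulated on M'"
  echo "systemctl start keepalived  # priority restored, VIP returns to M"
}

failback_plan mysql-bin.000042 120
```

The VIP switch-back is deliberately the last step: clients keep writing to M' until M has fully caught up.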
In some embodiments of the present invention, after performing the steps of creating a plurality of virtual internet protocol addresses and assigning them to the respective servers in the preset master-standby cluster, the database fault processing method provided in the embodiment of the present invention further includes, but is not limited to, the following steps:
the preset heartbeat detection mechanism is configured to be in a non-automatic restart state.
The priority parameter of the preset heartbeat detection mechanism is configured to determine the service host through the priority parameter.
The process shutdown script is configured, so that the virtual internet protocol link switching is directed through it.
In this embodiment, the heartbeat detection is performed by keepalived. Correspondingly, after the virtual internet protocol addresses (VIPs) are created by keepalived and assigned to the corresponding server nodes, the embodiment of the present invention configures keepalived to confirm the working state of each node through heartbeat detection, so that when a master node fails, the virtual internet protocol address can be switched between active and standby in time. Specifically, the embodiment of the present invention first configures the preset heartbeat detection mechanism, that is, keepalived, into the non-automatic-restart state. It is easy to understand that if keepalived were allowed to fail back automatically, then as soon as the repair of the failed server completed, keepalived would switch the virtual internet protocol link back to the active server group, that is, move the read and write services straight from the standby server to the active servers. However, after the repair is completed, a certain amount of time is usually needed to synchronize the corresponding incremental data from the standby server to the active write server, so keepalived must be set to the non-automatic-restart state to give the repaired server sufficient time for data synchronization. Next, the embodiment of the present invention configures the priority parameters of the preset heartbeat detection mechanism so as to determine the service host through the priorities. For example, the embodiment of the present invention configures the priority parameters of the standby server, the active read server, and the active write server, so that the service hosts are directed to the corresponding virtual internet protocol addresses, such as the single-machine deployment instances 10.0.1.1:3306, 10.0.1.2:3307, and so on. Further, the embodiment of the present invention configures the process shutdown script so as to direct the virtual internet protocol link switching through it. In the embodiment of the present invention, the process shutdown script refers to the script executed when the keepalived process is shut down or stopped. It is easy to understand that keepalived automatically switches the internet protocol (IP) address from the active server group to the standby server when the active server group fails, so as to ensure service continuity.
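The three configuration points above — no automatic fail-back, explicit priorities, and a shutdown-driven script — map onto keepalived's configuration file roughly as follows. This is only a sketch: the interface name, virtual_router_id, priority values, and script path are assumptions, not values from the patent:

```
# Sketch of /etc/keepalived/keepalived.conf on the write master M (10.0.1.1).
vrrp_instance VI_WRITE {
    state BACKUP          # all nodes start as BACKUP ...
    nopreempt             # ... and never preempt: no automatic fail-back
    interface eth0
    virtual_router_id 51
    priority 100          # the standby server is configured lower, e.g. 90
    notify_stop "/etc/keepalived/stop_read_peer.sh"  # process-shutdown script
    virtual_ipaddress {
        10.0.0.1/24       # VIP of the write service
    }
}
```

In keepalived, `nopreempt` (which requires `state BACKUP`) is what implements the "non-automatic restart" behavior: a repaired, higher-priority node does not reclaim the VIP until an operator restarts keepalived after data synchronization.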
In some embodiments of the present invention, when the working state of the active server group is monitored to be a fault state by a preset heartbeat detection mechanism, the virtual internet protocol link is switched to connect with a standby database instance in the standby server, including but not limited to the following steps:
And when the working state of the active write server is detected to be the fault state through the preset heartbeat detection mechanism, shutting down the active read server corresponding to the active write server through a preset stop script.
And switching the virtual internet protocol link to connect with the corresponding standby write database instance and standby read database instance.
In this embodiment, when a failure of the active write server is detected, the embodiment of the present invention first shuts down the corresponding active read server, and then switches the virtual internet protocol link to the standby server, thereby completing the active-standby switchover. Specifically, when keepalived detects that the working state of the active write server is the fault state, the embodiment of the present invention first shuts down the active read server corresponding to that write server through a preset stop script. In the embodiment of the present invention, the preset stop script refers to a script through which a master-node service actively exits or stops, for example the active read server actively stopping its read service. It is easy to understand that the data of the active read server must be updated synchronously from the corresponding active write server; therefore, when that write server fails, the active read server can no longer be updated, and data read from it would no longer be the latest. Accordingly, the embodiment of the present invention shuts down the active read server corresponding to the failed write server through the preset stop script, and switches the virtual internet protocol link to connect with the corresponding standby write database instance and standby read database instance, so that the read and write services are executed by the corresponding database instances in the standby server, which effectively improves the stability and availability of the system. For example, referring to fig. 7, when keepalived detects that the active write server M at 10.0.1.1:3306 has failed, the embodiment of the present invention automatically switches, through the virtual internet protocol address (VIP), to the corresponding standby write database instance M' at IP 10.0.1.5:3306; and because the master node of the write service, that is, the active write server M, has failed and can no longer synchronize data to the master node of the read service, that is, the active read server S, the active read server S is shut down through the preset stop script. Meanwhile, the embodiment of the present invention switches the read service to the standby read database instance S' through the virtual internet protocol address (VIP) policy. At this point, the platform write and read operations have both been switched, through the virtual internet protocol addresses (VIPs), to the standby node for execution.
It should be noted that, in the embodiment of the present invention, when the working state of the active read server is detected to be the fault state through the preset heartbeat detection mechanism, the read service may likewise be switched to the standby read database instance S' through the virtual internet protocol address (VIP) policy. In addition, the embodiment of the present invention may also shut down the active write server corresponding to the active read server through the preset stop script, and switch the virtual internet protocol link to connect with the corresponding standby write database instance and standby read database instance.
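The stop script described above can be implemented as a keepalived notify hook. The sketch below only decides and prints the action; the ssh target, the service name, and the exact messages are hypothetical, and keepalived's actual invocation passes the VRRP type, instance name, and new state as arguments:

```shell
#!/bin/sh
# Hypothetical keepalived notify hook. keepalived invokes notify scripts as:
#   <script> <TYPE> <NAME> <STATE>   where STATE is MASTER, BACKUP, or FAULT.
# When the write-master VRRP instance enters FAULT, the paired read node must
# be stopped, since it can no longer receive binlog updates from the writer.
on_state_change() {  # $1 = VRRP state reported by keepalived
  case "$1" in
    FAULT)  echo "stop read peer: ssh 10.0.1.2 systemctl stop mysqld" ;;
    MASTER) echo "VIP acquired: serving writes" ;;
    *)      echo "no action for state $1" ;;
  esac
}

on_state_change FAULT
```

A real hook would execute the shutdown command instead of printing it, and would typically log the transition for the fault early-warning mechanism.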
In some embodiments of the present invention, the database fault handling method provided in the embodiments of the present invention further includes, but is not limited to, the following steps:
and connecting with the corresponding main writing server through a virtual internet protocol address link, and executing a platform data writing operation to write the platform input data into the main writing server.
And synchronizing the platform input data to the corresponding active read server and standby write database instance through the binary log.
And synchronizing the platform input data to the corresponding standby read database instance through the standby write database instance.
In this embodiment, while executing the data write service, the embodiment of the present invention first connects to the corresponding active write server through the virtual internet protocol address link and executes the platform data write operation, so as to write the platform input data into the active write server. Then, the embodiment of the present invention synchronizes the newly written platform input data in the active write server to the corresponding active read server and standby write database instance through the binary log. Finally, the embodiment of the present invention synchronizes the platform input data to the corresponding standby read database instance through the standby write database instance, thereby completing the entire master-slave and active-standby data synchronization chain. For example, referring to fig. 8, the preset platform connects, through the VIP 10.0.0.1:3306, to the master node of the write service of service one, that is, the active write server M (10.0.1.1:3306), and performs the platform data write operation, while it connects, through the VIP 10.0.0.2:3307, to the master node of the read service, that is, the active read server S (10.0.1.2:3307), to perform the platform data read operation. After the corresponding platform input data is written into the active write server M, the active write server M uses binary log (Binlog) synchronization to replicate the new data to the active read server S (10.0.1.2:3307) and to its standby node M' (10.0.1.5:3306). Further, the standby write database instance M' of service one (10.0.1.5:3306) then synchronizes the data to the standby node of the read service, that is, the standby read database instance S' (10.0.1.5:3307).
Taking a single-point database fault processing application scenario as an example, the embodiment of the present invention first performs node deployment according to the service parameters to obtain a preset master-standby cluster. Correspondingly, the preset master-standby cluster comprises a standby server and at least one active server group. The active server group in the embodiment of the present invention comprises an active read server and an active write server, and the standby server is provided with a standby read database instance corresponding to the active read server and a standby write database instance corresponding to the active write server. For example, when data of two service types needs to be processed, the embodiment of the present invention deploys two active server groups and one standby server, each active server group comprising an active read server and an active write server; four database instances are set up in the standby server, each corresponding to one of the servers in the two active server groups. Then, the embodiment of the present invention creates a plurality of virtual internet protocol addresses and assigns them to the preset master-standby cluster to construct the corresponding virtual internet protocol links. Meanwhile, after the virtual internet protocol addresses are created, the embodiment of the present invention configures the preset heartbeat detection mechanism, such as keepalived, into the non-automatic-restart state to ensure sufficient time for data synchronization after a failed server is restarted, and also configures the priority parameters of the preset heartbeat detection mechanism so that the service hosts are determined through the priorities and directed to the corresponding virtual internet protocol addresses.
In addition, the embodiment of the present invention also configures the process shutdown script so as to direct the virtual internet protocol link switching through it. Correspondingly, when the system works normally, the virtual internet protocol links are connected to the corresponding active write server and active read server respectively: the embodiment of the present invention connects to the corresponding active write server through the virtual internet protocol address link and executes the platform data write operation to write the platform input data into the active write server, while synchronizing the platform input data to the corresponding active read server and standby write database instance through the binary log, and further synchronizing it to the corresponding standby read database instance through the standby write database instance, thereby realizing active-standby data synchronization.
Further, in the embodiment of the present invention, each server is monitored through the preset heartbeat detection mechanism, such as keepalived, to determine its working state. When the working state of an active server group is detected to be the fault state through the preset heartbeat detection mechanism, the embodiment of the present invention switches the virtual internet protocol link to connect with the standby database instances in the standby server, and simultaneously triggers the fault early-warning mechanism so as to repair the failed server, that is, the server of the active server group whose working state is the fault state, through the fault early-warning mechanism. Specifically, when the working state of the active write server is detected to be the fault state through the preset heartbeat detection mechanism, the active read server corresponding to that write server is shut down through the preset stop script, and the virtual internet protocol link is switched to connect with the corresponding standby write database instance and standby read database instance. In addition, after the virtual internet protocol address (VIP) has automatically switched to the corresponding standby node, the embodiment of the present invention adjusts the resources in the standby server. Specifically, when the service parameter corresponding to the failed server indicates a first-type service, such as a core service or a high-concurrency service, the embodiment of the present invention closes, in the standby server, the standby database instances corresponding to the non-failed servers of the active server groups. Or, when the service parameter corresponding to the failed server indicates a second-type service, such as a non-core or non-high-concurrency service, the embodiment of the present invention dynamically adjusts the instance resources of the standby server.
For example, the embodiment of the present invention reduces the resource configuration of the standby database instances corresponding to the non-failed servers of the active server group, and increases the resource configuration of the standby database instance corresponding to the failed server. By dynamically adjusting the resources in the standby server in this way, the embodiment of the present invention effectively improves the carrying capacity of the standby node.
Further, when it is determined that the repair of the failed server is completed, the embodiment of the present invention synchronizes the data in the standby server to the active server group and switches the virtual internet protocol link back to the active server group. Specifically, the embodiment of the present invention synchronizes the data to be synchronized in the standby write database instance to the corresponding active write server, and then performs data synchronization to the active read server through the active write server, thereby completing the switch from the standby node back to the master node. The data to be synchronized comprises the incremental data generated between the moment the failed server was detected as faulty and the moment its repair was determined to be complete. It is easy to understand that, by using virtual internet protocol addresses, the embodiment of the present invention combines a plurality of database master nodes with one standby node into a "multi-master, single-standby" structure, which ensures dynamic switching to the standby node when a master node fails, alleviates the single-point fault problem of the database, and effectively improves database availability and resource utilization; providing the standby node with high-IO disks also effectively improves its write capability. In addition, the embodiment of the present invention effectively improves the load capacity by dynamically adjusting the resources of the standby server.
Referring to FIG. 9, one embodiment of the present invention also provides a database fault handling system, comprising:
the first module 210 is configured to perform node deployment according to the service parameter, so as to obtain a preset master device cluster. The preset main equipment cluster comprises a standby server and at least one main server group, wherein the main server group comprises a main read server and a main write server, and standby database examples corresponding to the main read server and the main write server are arranged in the standby server.
The second module 220 is configured to create a plurality of virtual internet protocol addresses and assign them to the respective servers in the preset master-standby cluster, so as to construct the corresponding virtual internet protocol links. The virtual internet protocol links are connected to the active write server and the active read server respectively.
And a third module 230, configured to switch the virtual internet protocol link to connect with the standby database instances in the standby server when the working state of the active server group is monitored to be the fault state through the preset heartbeat detection mechanism, and to trigger the fault early-warning mechanism so as to repair the failed server through it. The failed server comprises a server of the active server group whose working state is the fault state.
And a fourth module 240, configured to, when it is determined that the repair of the failed server is completed, synchronize the data in the standby server to the active server group and switch the virtual internet protocol link to connect with the active server group.
The content of the method embodiments of the present invention applies to this system embodiment; the specific functions implemented by the system embodiment are the same as those of the method embodiments, and the beneficial effects achieved are likewise the same.
Referring to FIG. 10, one embodiment of the present invention further provides a database fault handling system, including:
at least one processor 310.
At least one memory 320 for storing at least one program.
The at least one program, when executed by the at least one processor 310, causes the at least one processor 310 to implement the database fault handling method as described in the above embodiments.
The content of the method embodiments of the present invention applies to this system embodiment; the specific functions implemented by the system embodiment are the same as those of the method embodiments, and the beneficial effects achieved are likewise the same.
An embodiment of the present invention also provides a computer-readable storage medium storing computer-executable instructions which, when executed by one or more control processors, perform the database fault handling method steps described in the above embodiments.
The content of the method embodiments of the present invention applies to this storage medium embodiment; the specific functions implemented are the same as those of the method embodiments, and the beneficial effects achieved are likewise the same.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used to distinguish between similar objects and are not necessarily used to describe a particular sequence or chronological order. It is to be understood that data so used may be interchanged where appropriate, so that the embodiments of the present application described herein can, for example, be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: only A is present, only B is present, or both A and B are present, where A and B may be singular or plural. The character "/" generally indicates that the associated objects are in an "or" relationship. "At least one of" and similar expressions mean any combination of the listed items, including any combination of single or plural items. For example, at least one (one) of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling, direct coupling, or communication connection shown or discussed between components may be an indirect coupling or communication connection via some interfaces, devices, or units, and may be in electrical, mechanical, or other form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those skilled in the art, the term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media.
The step numbers in the above method embodiments are set for convenience of description only and do not limit the order of the steps in any way; the execution order of the steps in the embodiments may be adaptively adjusted as understood by those skilled in the art.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present invention, and these equivalent modifications and substitutions are intended to be included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. A database fault handling method, comprising the steps of:
performing node deployment according to service parameters to obtain a preset main/standby cluster; the preset main/standby cluster comprises a standby server and at least one main server group, wherein the main server group comprises a main read server and a main write server, and standby database instances corresponding to the main read server and the main write server are arranged in the standby server;
creating a plurality of virtual internet protocol addresses, and allocating the plurality of virtual internet protocol addresses to each server in the preset main/standby cluster to construct corresponding virtual internet protocol links; the virtual internet protocol links are respectively connected with the main write server and the main read server;
when the working state of the main server group is monitored to be a fault state through a preset heartbeat detection mechanism, switching the virtual internet protocol link to connect with the standby database instance in the standby server, and triggering a fault early warning mechanism so as to repair the failed server through the fault early warning mechanism; the failed server comprises a server in the main server group whose working state is the fault state;
and when it is determined that the repair of the failed server is completed, synchronizing the data in the standby server to the main server group, and switching the virtual internet protocol link to connect with the main server group.
2. The database fault handling method according to claim 1, wherein the performing node deployment according to the service parameters to obtain a preset main/standby cluster comprises:
performing service division according to the service parameters to obtain a plurality of service types;
deploying a corresponding main server group for each service type, and constructing corresponding standby database instances in the standby server; the standby database instances comprise a standby read database instance corresponding to the main read server and a standby write database instance corresponding to the main write server.
3. The database fault handling method according to claim 1, wherein after the step of monitoring, through the preset heartbeat detection mechanism, that the working state of the main server group is a fault state, the method further comprises:
when the service parameter corresponding to the failed server is a first type of service, closing, in the standby server, the standby database instances corresponding to the servers in the main server group whose working state is a non-fault state;
or, when the service parameter corresponding to the failed server is a second type of service, dynamically adjusting the instance resources of the standby server; the dynamic adjustment of the instance resources comprises reducing the resource configuration of the standby database instances corresponding to the servers in the main server group whose working state is a non-fault state, and increasing the resource configuration of the standby database instance corresponding to the failed server.
4. The database fault handling method according to claim 2, wherein the synchronizing the data in the standby server to the main server group comprises:
synchronizing the data to be synchronized in the standby write database instance to the corresponding main write server; the data to be synchronized comprises the incremental data accumulated from the moment the failed server was detected to be in the fault state until the repair of the failed server was determined to be complete;
and performing data synchronization to the main read server through the main write server.
5. The database fault handling method according to claim 1, wherein after the step of creating a plurality of virtual internet protocol addresses and allocating the plurality of virtual internet protocol addresses to each server in the preset main/standby cluster, the method further comprises:
configuring the preset heartbeat detection mechanism to a non-automatic-restart state;
configuring priority parameters of the preset heartbeat detection mechanism to determine a service host through the priority parameters;
and configuring a process shutdown script to guide the virtual internet protocol link switching through the process shutdown script.
6. The database fault handling method according to claim 2, wherein when the working state of the main server group is monitored to be the fault state by the preset heartbeat detection mechanism, switching the virtual internet protocol link to connect with the standby database instance in the standby server comprises:
when the working state of the main write server is detected to be the fault state by the preset heartbeat detection mechanism, shutting down the main read server corresponding to the main write server through the process shutdown script;
and switching the virtual internet protocol link to connect with the corresponding standby write database instance and standby read database instance.
7. The database fault handling method of claim 6, wherein the method further comprises:
connecting the virtual internet protocol link with the corresponding main write server, and performing a platform data write operation to write platform input data into the main write server;
synchronizing the platform input data to the corresponding main read server and the standby write database instance through a binary log;
and synchronizing the platform input data to the corresponding standby read database instance through the standby write database instance.
8. A database fault handling system, comprising:
the first module, configured to perform node deployment according to service parameters to obtain a preset main/standby cluster; the preset main/standby cluster comprises a standby server and at least one main server group, wherein the main server group comprises a main read server and a main write server, and standby database instances corresponding to the main read server and the main write server are arranged in the standby server;
the second module, configured to create a plurality of virtual internet protocol addresses and allocate the plurality of virtual internet protocol addresses to each server in the preset main/standby cluster to construct corresponding virtual internet protocol links; the virtual internet protocol links are respectively connected with the main write server and the main read server;
the third module, configured to switch the virtual internet protocol link to connect with the standby database instance in the standby server when the working state of the main server group is monitored to be a fault state through a preset heartbeat detection mechanism, and to trigger a fault early warning mechanism so as to repair the failed server through the fault early warning mechanism; the failed server comprises a server in the main server group whose working state is the fault state;
and the fourth module, configured to, when it is determined that the repair of the failed server is completed, synchronize the data in the standby server to the main server group and switch the virtual internet protocol link to connect with the main server group.
9. A database fault handling system, comprising:
At least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the database fault handling method of any of claims 1 to 7.
10. A computer storage medium storing a processor-executable program, characterized in that the processor-executable program, when executed by the processor, implements the database fault handling method according to any one of claims 1 to 7.
CN202311422213.9A 2023-10-30 2023-10-30 Database fault processing method, system and storage medium Pending CN117421158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311422213.9A CN117421158A (en) 2023-10-30 2023-10-30 Database fault processing method, system and storage medium


Publications (1)

Publication Number Publication Date
CN117421158A true CN117421158A (en) 2024-01-19

Family

ID=89530931




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination