CN116089222A - Database cluster monitoring method, device, equipment and medium - Google Patents


Info

Publication number
CN116089222A
Authority
CN
China
Prior art keywords
database
cluster
database node
node
shared storage
Legal status
Pending
Application number
CN202310232325.1A
Other languages
Chinese (zh)
Inventor
赵武清
沈卫强
李承钊
柏姗姗
耿新
李科德
李英
Current Assignee
China Southern Power Grid Digital Power Grid Group Information Communication Technology Co ltd
Original Assignee
China Southern Power Grid Digital Power Grid Group Information Communication Technology Co ltd
Application filed by China Southern Power Grid Digital Power Grid Group Information Communication Technology Co ltd
Priority to CN202310232325.1A
Publication of CN116089222A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/203 Failover techniques using migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3089 Monitoring arrangements determined by the means or processing involved in sensing the monitored data, e.g. interfaces, connectors, sensors, probes, agents
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database cluster monitoring method, device, equipment and medium. The method comprises: detecting, through a cluster monitoring component, the running states of at least two master database nodes of a shared storage database cluster in a local machine room, and determining from the running states of all the master database nodes whether the shared storage database cluster is in a fault state; and, when the cluster monitoring component determines that the shared storage database cluster is in a fault state, switching each master database node to a backup database node, switching a first backup database node in a remote machine room from a backup database node to the master database node, and setting a second backup database node in the remote machine room as the backup database node corresponding to the first backup database node. In the embodiments of the invention, the cluster monitoring component monitors the shared storage database cluster automatically, which reduces labor and time costs and improves monitoring efficiency.

Description

Database cluster monitoring method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a medium for monitoring a database cluster.
Background
A shared storage database cluster is built on shared storage and typically comprises a shared storage device and a plurality of database nodes. The shared storage device stores the data, and every database node of the cluster can access and modify the data in it. When an application program needs to read or write data in the shared storage device, it sends a data read-write request to any one database node of the cluster, and that node performs the corresponding read or write operation. To keep the shared storage database cluster running stably, the cluster needs to be monitored.
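To make the shared-storage model concrete, here is a minimal in-memory sketch. It assumes a Python dict stands in for the shared storage device and that round-robin dispatch stands in for the application picking "any one" node; `SharedStorage` and `DatabaseNode` are hypothetical names, not part of the patent. The point it shows is that a write through one node is immediately visible through another, because all nodes hold a reference to the same storage.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class SharedStorage:
    """Hypothetical stand-in for the shared storage device: a single key-value store."""
    data: dict = field(default_factory=dict)


@dataclass
class DatabaseNode:
    """A database node that reads and writes the same shared storage object."""
    name: str
    storage: SharedStorage

    def write(self, key, value):
        self.storage.data[key] = value

    def read(self, key):
        return self.storage.data.get(key)


storage = SharedStorage()
nodes = [DatabaseNode(f"node{i}", storage) for i in range(1, 4)]
dispatcher = cycle(nodes)  # the application may pick any node; round-robin is one choice

next(dispatcher).write("balance", 100)   # served by node1
print(next(dispatcher).read("balance"))  # served by node2; prints 100
```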
In the related art, a shared storage database cluster is commonly monitored as follows: a technician checks the running state of each database node of the cluster to determine whether the cluster is in a fault state and whether it can still provide data read-write service. Because this scheme relies on manual work, its labor and time costs are high, its monitoring efficiency is low, and it cannot ensure that the shared storage database cluster provides data read-write service over a long period.
Disclosure of Invention
The invention provides a method, a device, equipment and a medium for monitoring a database cluster, in order to solve the problems of the related-art monitoring scheme: high labor and time costs, low monitoring efficiency, and no guarantee that the shared storage database cluster can provide data read-write service over a long period.
According to an aspect of the present invention, there is provided a database cluster monitoring method, including:
detecting, through a cluster monitoring component, the running states of at least two master database nodes of a shared storage database cluster in a local machine room, and determining, according to the running states of the master database nodes, whether the shared storage database cluster is in a fault state; and
when the cluster monitoring component determines that the shared storage database cluster is in a fault state, switching each master database node to a backup database node, switching a first backup database node of the shared storage database cluster in a remote machine room from a backup database node to the master database node, and setting a second backup database node of the shared storage database cluster in the remote machine room as the backup database node corresponding to the first backup database node.
According to another aspect of the present invention, there is provided a database cluster monitoring apparatus, including:
a state detection module, configured to detect, through a cluster monitoring component, the running states of at least two master database nodes of a shared storage database cluster in a local machine room, and to determine, according to the running states of the master database nodes, whether the shared storage database cluster is in a fault state; and
a node switching module, configured to, when the cluster monitoring component determines that the shared storage database cluster is in a fault state, switch each master database node to a backup database node, switch a first backup database node of the shared storage database cluster in a remote machine room from a backup database node to the master database node, and set a second backup database node of the shared storage database cluster in the remote machine room as the backup database node corresponding to the first backup database node.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor;
a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the database cluster monitoring method according to any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a database cluster monitoring method according to any embodiment of the present invention.
According to the technical solution of the embodiments of the invention, the cluster monitoring component detects the running states of at least two master database nodes of the shared storage database cluster in the local machine room and determines from them whether the cluster is in a fault state. When the cluster monitoring component determines that the cluster is in a fault state, each master database node is switched to a backup database node, the first backup database node in the remote machine room is switched from a backup database node to the master database node, and the second backup database node in the remote machine room is set as the backup database node corresponding to the first backup database node. This solves the problems of the related-art monitoring scheme, namely high labor and time costs, low monitoring efficiency, and no guarantee of long-term data read-write service. The cluster monitoring component monitors the shared storage database cluster automatically and determines whether it can provide data read-write service; when the cluster is in a fault state, the first backup database node in the remote machine room is promptly made the new master database node and the second backup database node is made its corresponding backup. This reduces the labor and time costs of monitoring, improves monitoring efficiency, and ensures that the shared storage database cluster can provide data read-write service stably over a long period.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a database cluster monitoring method according to a first embodiment of the present invention.
Fig. 2 is a flowchart of a database cluster monitoring method according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a database cluster monitoring device according to a third embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device implementing a database cluster monitoring method according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
It should be noted that the terms "target," "first," "second," and the like in the description and the claims of the present invention and the above drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
Example 1
Fig. 1 is a flowchart of a database cluster monitoring method according to a first embodiment of the present invention. This embodiment is applicable to monitoring a shared storage database cluster while it runs, in order to determine whether the cluster is in a fault state and whether it can provide data read-write service. The method may be performed by a database cluster monitoring device, which may be implemented in hardware and/or software and configured in an electronic device. As shown in Fig. 1, the method includes the following steps.
Step 101: detect, through the cluster monitoring component, the running states of at least two master database nodes of the shared storage database cluster in the local machine room, and determine, according to the running states of all the master database nodes, whether the shared storage database cluster is in a fault state.
Optionally, the shared storage database cluster is the cluster to be monitored. Its shared storage device and its at least two master database nodes are located in the local machine room. The shared storage device stores data and may be a disk device. A database node may be a server on which database software is installed.
Optionally, a master database node is a database node of the shared storage database cluster that receives data read-write requests sent by application programs and accesses or modifies data according to those requests, thereby providing data read-write service.
Optionally, a backup database node is a database node that, when all master database nodes of the shared storage database cluster are running abnormally, is switched to a new master database node and takes over from the old master database nodes: it receives the data read-write requests sent by application programs and accesses or modifies data according to those requests, thereby providing data read-write service. The backup database nodes of the cluster include a first backup database node and a second backup database node, both located in the remote machine room.
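The roles and locations described so far can be captured in a small data model. The sketch below assumes exactly two masters and the two named backups; `Role`, `Room`, `Node`, `build_cluster` and `check_initial_layout` are hypothetical names chosen for illustration, and the layout check simply encodes the stated arrangement (at least two masters in the local machine room, the first and second backup nodes in the remote machine room).

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MASTER = "master"
    BACKUP = "backup"


class Room(Enum):
    LOCAL = "local"
    REMOTE = "remote"


@dataclass
class Node:
    name: str
    role: Role
    room: Room


def build_cluster():
    """Initial layout: two masters in the local room, two backups in the remote room."""
    return [
        Node("master-1", Role.MASTER, Room.LOCAL),
        Node("master-2", Role.MASTER, Room.LOCAL),
        Node("backup-1", Role.BACKUP, Room.REMOTE),  # the "first backup database node"
        Node("backup-2", Role.BACKUP, Room.REMOTE),  # the "second backup database node"
    ]


def check_initial_layout(nodes):
    """Return True if the layout matches the arrangement described in this embodiment."""
    masters = [n for n in nodes if n.role is Role.MASTER]
    backups = [n for n in nodes if n.role is Role.BACKUP]
    return (
        len(masters) >= 2
        and all(n.room is Room.LOCAL for n in masters)
        and len(backups) == 2
        and all(n.room is Room.REMOTE for n in backups)
    )


print(check_initial_layout(build_cluster()))  # True
```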
Optionally, the first backup database node and the second backup database node are in a master-slave replication relationship with the shared storage device of the shared storage database cluster: data in the shared storage device is synchronized in real time to the storage devices of the first and second backup database nodes, so the data in all three remains consistent. A master database node in the local machine room receives data read-write requests from application programs and accesses or modifies data in the shared storage device accordingly. After the first or second backup database node in the remote machine room becomes the new master database node, it receives data read-write requests from application programs and accesses or modifies data in its own storage device accordingly.
Optionally, the storage devices of the first and second backup database nodes may be disk devices in those nodes.
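The master-slave replication relationship can be illustrated with a toy model in which every write to the shared storage is immediately applied to each backup node's storage. This is a sketch under a simplifying assumption: replication here is synchronous and in-process, whereas a real deployment would ship changes over the network (for example via a replication log), which the patent text does not specify. `Store` and `ReplicatedPrimary` are hypothetical names.

```python
class Store:
    """Hypothetical key-value stand-in for a storage device (shared disk or a backup node's disk)."""

    def __init__(self):
        self.data = {}


class ReplicatedPrimary:
    """Master-slave replication sketch: each write to the primary is applied to every replica.

    Replication is synchronous and in-process purely for illustration.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def write(self, key, value):
        self.primary.data[key] = value
        for replica in self.replicas:  # push the change to every backup storage device
            replica.data[key] = value

    def in_sync(self):
        """True when every replica holds exactly the same data as the primary."""
        return all(r.data == self.primary.data for r in self.replicas)


shared = Store()
backup1_disk, backup2_disk = Store(), Store()
repl = ReplicatedPrimary(shared, [backup1_disk, backup2_disk])
repl.write("order:42", "paid")
print(repl.in_sync(), backup2_disk.data)  # True {'order:42': 'paid'}
```

Because the replicas already hold the same data, a backup node promoted to master can serve requests from its own disk, which is what the paragraph above relies on.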
Optionally, the cluster monitoring component is a component, provided in the electronic device, that monitors the shared storage database cluster. It can switch a master database node to a backup database node, so that the node stops receiving data read-write requests from application programs, stops accessing or modifying data according to them, and no longer provides data read-write service. It can also switch a backup database node to a master database node, so that the node becomes a new master database node that receives data read-write requests from application programs, accesses or modifies data accordingly, and provides data read-write service.
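The effect of a role switch can be sketched as a gate in front of the request handler: a node serves read-write requests only while it holds the master role. `GatedNode`, `switch_role` and `RoleError` are hypothetical names; the sketch assumes roles are plain strings and that a backup node rejects requests by raising an exception, which is one possible design rather than the patent's stated mechanism.

```python
class RoleError(RuntimeError):
    """Raised when a request reaches a node that is not currently a master."""


class GatedNode:
    """Hypothetical node whose role decides whether it serves read-write requests."""

    def __init__(self, name, role="backup"):
        self.name = name
        self.role = role  # "master" or "backup"
        self.data = {}

    def handle(self, op, key, value=None):
        if self.role != "master":
            raise RoleError(f"{self.name} is a backup node and rejects requests")
        if op == "write":
            self.data[key] = value
            return value
        return self.data.get(key)


def switch_role(node, new_role):
    """What the cluster monitoring component does when it promotes or demotes a node."""
    if new_role not in ("master", "backup"):
        raise ValueError(f"unknown role: {new_role}")
    node.role = new_role


n = GatedNode("backup-1")
switch_role(n, "master")   # promotion: the node now accepts requests
n.handle("write", "k", 1)
print(n.handle("read", "k"))  # 1
```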
Optionally, detecting the running states of the at least two master database nodes of the shared storage database cluster in the local machine room includes: periodically reading the running state of each master database node at a preset time interval.
Optionally, the preset time interval may be set according to service requirements, for example, 5 seconds. The running state of a master database node is information, set in that node, indicating whether the node is running normally; it is either a normal state or an abnormal state. A normal state indicates that the node is running normally. An abnormal state indicates that the node has failed or gone down and can no longer receive data read-write requests from application programs or access or modify data according to them, that is, it can no longer provide data read-write service.
Optionally, determining whether the shared storage database cluster is in a fault state according to the running states of the master database nodes includes: if the running state of every master database node is abnormal, determining that the cluster is in a fault state; and if at least one master database node (a target master database node) is in a normal state, determining that the cluster is not in a fault state.
Optionally, if every master database node is in an abnormal state, every master database node in the cluster has failed or gone down and none can continue to receive and execute data read-write requests from application programs. The shared storage database cluster therefore cannot run normally or continue to provide data read-write service, and it is determined to be in a fault state.
Optionally, a target master database node is a master database node whose running state is normal. If a target master database node exists, the cluster still has a master database node that can receive and execute data read-write requests from application programs, so the cluster can run normally and continue to provide data read-write service, and it is determined not to be in a fault state.
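The fault rule above reduces to "fault only if no master is normal". A minimal sketch follows; `State`, `target_masters` and `cluster_in_fault` are hypothetical names, and raising on an empty input is an added assumption, since with no master states at all the rule is not well defined.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    ABNORMAL = "abnormal"


def target_masters(states):
    """Names of master nodes whose running state is NORMAL (the 'target master database nodes')."""
    return [name for name, state in states.items() if state is State.NORMAL]


def cluster_in_fault(states):
    """The cluster is in a fault state only when every master node is ABNORMAL."""
    if not states:
        raise ValueError("at least one master database node state is required")
    return not target_masters(states)


print(cluster_in_fault({"master-1": State.ABNORMAL, "master-2": State.NORMAL}))    # False
print(cluster_in_fault({"master-1": State.ABNORMAL, "master-2": State.ABNORMAL}))  # True
```

One healthy master is enough to keep the cluster out of the fault state, which is why the check is an "all abnormal" test rather than an "any abnormal" test.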
Optionally, when it is determined that the shared storage database cluster is not in a fault state, the running state of each master database node continues to be read periodically at the preset time interval.
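Putting the periodic read and the fault rule together gives a simple polling loop. This is a sketch, not the patent's implementation: `monitor_until_fault` is a hypothetical name, `probe(name)` is an assumed health check returning `True` for a normal state, and `sleep` is injectable so the loop can be exercised without real waiting. The 5-second default mirrors the example interval in the text.

```python
import time


def monitor_until_fault(masters, probe, interval=5.0, sleep=time.sleep, max_rounds=None):
    """Read every master's running state once per round, waiting `interval` seconds between rounds.

    probe(name) -> bool is a hypothetical health check (True means normal state).
    Returns the 1-based round in which all masters were abnormal, or None if
    max_rounds rounds pass without a fault.
    """
    if not masters:
        raise ValueError("at least one master database node is required")
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        rounds += 1
        states = {name: probe(name) for name in masters}
        if not any(states.values()):  # every master abnormal: cluster is in a fault state
            return rounds
        sleep(interval)  # not in a fault state: keep reading at the preset interval
    return None


# Usage with a scripted probe: master-1 fails in round 2, master-2 in round 3.
scripted = {"master-1": iter([True, False, False]), "master-2": iter([True, True, False])}
slept = []
result = monitor_until_fault(["master-1", "master-2"], lambda n: next(scripted[n]), sleep=slept.append)
print(result, slept)  # 3 [5.0, 5.0]
```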
Optionally, after the running states of the at least two master database nodes of the shared storage database cluster in the local machine room are detected, the method further includes: generating a visual chart of the running state of each master database node and providing it to a user. The visual chart shows the running state of each master database node; visual charts include, but are not limited to, tables.
Optionally, providing the visual chart to the user includes: displaying it in a cluster display page. The cluster display page interacts with the user and presents information about the shared storage database cluster; the user may be a technician responsible for managing the cluster. The visual chart is displayed in the node state display area of the cluster display page, that is, the page area that shows the visual chart of the running state of each master database node.
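Since the visual chart may simply be a table, a plain-text rendering is enough to show the idea. The sketch below assumes states arrive as a `{node_name: state_string}` mapping; `render_state_table` is a hypothetical helper, and a real cluster display page would more likely render an HTML table or a dashboard widget.

```python
def render_state_table(states):
    """Render {node: state} as a fixed-width text table for a node state display area."""
    header = ("Master database node", "Running state")
    rows = [header] + sorted(states.items())  # sort by node name for a stable layout
    name_width = max(len(r[0]) for r in rows)
    state_width = max(len(r[1]) for r in rows)
    lines = [f"{name:<{name_width}} | {state}" for name, state in rows]
    lines.insert(1, "-" * name_width + "-+-" + "-" * state_width)  # header separator
    return "\n".join(lines)


print(render_state_table({"master-2": "abnormal", "master-1": "normal"}))
```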
Step 102: when the cluster monitoring component determines that the shared storage database cluster is in a fault state, switch each master database node to a backup database node, switch the first backup database node of the cluster in the remote machine room from a backup database node to the master database node, and set the second backup database node of the cluster in the remote machine room as the backup database node corresponding to the first backup database node.
Optionally, the shared storage database cluster is in a fault state when it cannot run normally or continue to provide data read-write service; in that case, the cluster monitoring component performs the switching operations of step 102.
Optionally, the first backup database node becomes the new master database node and takes over from the old master database nodes: it receives data read-write requests from application programs and accesses or modifies data according to them, thereby providing data read-write service. The second backup database node is the backup database node corresponding to the first backup database node: if the first backup database node runs abnormally, the second is switched to the new master database node and takes over providing data read-write service.
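The three switching operations of step 102 can be sketched as a pure function over a role map. The `{name: "master" | "backup"}` representation, the `standby_of` mapping used to record which node backs up the new master, and the name `fail_over` are all assumptions for illustration; returning a new dict keeps the original state available for logging or rollback.

```python
def fail_over(roles, masters, first_backup, second_backup):
    """Apply the step-102 switch to a {node_name: role} map.

    Returns (new_roles, standby_of); the input dict is not modified.
    standby_of records which node is the backup for the new master.
    """
    if first_backup == second_backup:
        raise ValueError("first and second backup nodes must differ")
    new_roles = dict(roles)
    for name in masters:
        new_roles[name] = "backup"       # demote every failed local master
    new_roles[first_backup] = "master"   # promote the first remote backup
    new_roles[second_backup] = "backup"  # the second remote node stays as its standby
    standby_of = {first_backup: second_backup}
    return new_roles, standby_of


roles = {"master-1": "master", "master-2": "master", "backup-1": "backup", "backup-2": "backup"}
new_roles, standby_of = fail_over(roles, ["master-1", "master-2"], "backup-1", "backup-2")
print(new_roles["backup-1"], standby_of)  # master {'backup-1': 'backup-2'}
```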
Optionally, the method further includes: detecting the running state of the cluster monitoring component, and providing preset alarm information to a user when the running state of the cluster monitoring component is detected to be abnormal. The preset alarm information notifies the user that the cluster monitoring component is in an abnormal state.
Optionally, detecting the running state of the cluster monitoring component includes: periodically reading its running state at a preset time interval, which may be set according to service requirements, for example, 5 seconds.
Optionally, the running state of the cluster monitoring component is information indicating whether the component is running normally; it is either a normal state or an abnormal state. A normal state indicates that the component is running normally and can monitor the running state of the shared storage database cluster, which enhances the reliability and high availability of the cluster. An abnormal state indicates that the component has failed and can no longer monitor the running state of the shared storage database cluster.
In this way, the running state of the cluster monitoring component is detected in real time, and when the component runs abnormally the user is prompted by the preset alarm information and can take measures promptly. This keeps the cluster monitoring component running stably and enhances the reliability and high availability of the shared storage database cluster.
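Monitoring the monitor can be sketched as a single check that raises the preset alarm when the component is abnormal. The text does not say how the component's running state is obtained; the sketch assumes a heartbeat timestamp with a 15-second timeout, and `ALARM_MESSAGE`, `check_monitor_component`, `heartbeat_probe` and the `notify` callback are all hypothetical.

```python
ALARM_MESSAGE = "Cluster monitoring component is in an abnormal state"  # hypothetical preset text


def heartbeat_probe(last_beat, now, timeout=15.0):
    """Hypothetical probe: the component is normal if its last heartbeat is at most `timeout` seconds old."""
    return (now - last_beat) <= timeout


def check_monitor_component(is_running, notify):
    """One periodic check of the cluster monitoring component.

    is_running() -> bool reports the component's running state; notify(msg) delivers
    the alarm to the user (e-mail, SMS or a dashboard are outside this sketch).
    Returns True if the component is in a normal state.
    """
    if is_running():
        return True
    notify(ALARM_MESSAGE)
    return False


alerts = []
check_monitor_component(lambda: heartbeat_probe(last_beat=100.0, now=120.0), alerts.append)
print(alerts)
```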
Optionally, after the second backup database node of the shared storage database cluster in the remote machine room is set as the backup database node corresponding to the first backup database node, the method further includes: detecting the running state of the first backup database node through the cluster monitoring component; and, when the cluster monitoring component detects that the running state of the first backup database node is abnormal, switching the first backup database node from the master database node to a backup database node and switching the second backup database node from a backup database node to the master database node.
Optionally, detecting the running state of the first backup database node includes: periodically reading its running state at a preset time interval, which may be set according to service requirements, for example, 5 seconds. The running state of the first backup database node is information, set in that node, indicating whether it is running normally; it is either a normal state or an abnormal state. A normal state indicates that the first backup database node is running normally. An abnormal state indicates that it has failed or gone down and can no longer receive and execute data read-write requests from application programs, that is, it can no longer provide data read-write service.
Optionally, after the first backup database node in the remote machine room has been switched from a backup database node to the master database node, if its running state is detected to be abnormal, that is, it can no longer provide data read-write service, the cluster monitoring component switches the first backup database node from the master database node to a backup database node and switches the second backup database node from a backup database node to the master database node. The second backup database node then becomes the new master database node and takes over from the old one, receiving data read-write requests from application programs, accessing or modifying data accordingly, and providing data read-write service.
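The second-level switchover can be sketched in the same role-map style: if the first backup node, now acting as master, is abnormal, it is demoted and the second backup node is promoted. `secondary_failover` is a hypothetical name, and the boolean `first_is_normal` stands in for whatever periodic state read the monitoring component actually performs.

```python
def secondary_failover(roles, first_backup, second_backup, first_is_normal):
    """After the remote switchover: if the first backup (now master) fails, swap it with the second.

    roles: {name: "master" | "backup"}; returns a new dict and leaves the input unchanged.
    """
    if roles.get(first_backup) != "master":
        raise ValueError(f"{first_backup} is not the current master")
    if first_is_normal:
        return dict(roles)  # nothing to do; keep reading its state at the preset interval
    new_roles = dict(roles)
    new_roles[first_backup] = "backup"   # demote the failed first backup node
    new_roles[second_backup] = "master"  # promote the second backup node
    return new_roles


after_switch = {"master-1": "backup", "master-2": "backup", "backup-1": "master", "backup-2": "backup"}
print(secondary_failover(after_switch, "backup-1", "backup-2", first_is_normal=False))
```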
According to the technical solution of this embodiment, the cluster monitoring component automatically detects the running states of the at least two master database nodes in the local machine room and determines whether the shared storage database cluster is in a fault state. When it is, the component switches each master database node to a backup database node, switches the first backup database node in the remote machine room to the master database node, and sets the second backup database node in the remote machine room as the backup database node corresponding to the first. This reduces the labor and time costs of monitoring, improves monitoring efficiency, and ensures that the shared storage database cluster can provide data read-write service stably over a long period.
Example Two
Fig. 2 is a flowchart of a database cluster monitoring method according to a second embodiment of the present invention. This embodiment may be combined with any of the optional solutions in one or more of the embodiments described above. As shown in Fig. 2, the method includes:
Step 201, detecting the running states of at least two master database nodes of the shared storage database cluster in the local machine room through the cluster monitoring component, and determining whether the shared storage database cluster is in a fault state according to the running states of all the master database nodes.
Step 202, when it is determined that the shared storage database cluster is in a fault state, through the cluster monitoring component, switching each main database node from the main database node to a backup database node, switching a first backup database node of the shared storage database cluster in the remote machine room from the backup database node to the main database node, and setting a second backup database node of the shared storage database cluster in the remote machine room to be a backup database node corresponding to the first backup database node.
Optionally, if the shared storage database cluster is in a fault state, that is, the shared storage database cluster cannot normally operate and cannot continue to provide the data read-write service, the cluster monitoring component switches each main database node from the main database node to the backup database node, switches a first backup database node of the shared storage database cluster in the remote machine room from the backup database node to the main database node, and sets a second backup database node of the shared storage database cluster in the remote machine room to the backup database node corresponding to the first backup database node.
Optionally, the first backup database node becomes the new master database node and, in place of the old master database node, receives the data read-write requests sent by the application program and accesses or modifies data according to those requests. The second backup database node is the backup database node corresponding to the first backup database node: if the first backup database node runs abnormally, the second backup database node is switched to the new master database node and takes over receiving the data read-write requests sent by the application program and accessing or modifying data according to them.
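Step 202 amounts to a role reassignment across the two machine rooms. The following is a minimal sketch, assuming node roles are kept in a plain dictionary keyed by hypothetical node names; the function name `switch_to_remote_room` and the `"master"`/`"backup"` labels are illustrative and are not defined by the patent. A real deployment would also have to push the new roles to the database nodes themselves, which this sketch does not attempt.

```python
from typing import Dict, Iterable, Tuple

# Illustrative role labels; the patent names the roles only in prose.
MASTER, BACKUP = "master", "backup"


def switch_to_remote_room(
    roles: Dict[str, str],
    local_masters: Iterable[str],
    first_backup: str,
    second_backup: str,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (new_roles, standby_of) after the step-202 switch.

    The caller's `roles` mapping is left unchanged.
    """
    new_roles = dict(roles)
    for name in local_masters:
        new_roles[name] = BACKUP        # demote every local master
    new_roles[first_backup] = MASTER    # promote the first remote backup
    new_roles[second_backup] = BACKUP   # the second remote node stays a backup...
    standby_of = {first_backup: second_backup}  # ...bound to the new master
    return new_roles, standby_of


roles = {"local-1": MASTER, "local-2": MASTER,
         "remote-1": BACKUP, "remote-2": BACKUP}
new_roles, standby_of = switch_to_remote_room(
    roles, ["local-1", "local-2"], "remote-1", "remote-2")
print(new_roles)   # local nodes are now backups, remote-1 is the master
print(standby_of)  # {'remote-1': 'remote-2'}
```

Returning a new mapping rather than mutating the input makes it easy to compare the before and after states, for example when logging the switch.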
Step 203, detecting an operation state of the first backup database node through the cluster monitoring component.
Optionally, detecting the running state of the first backup database node includes: periodically reading the running state of the first backup database node at a preset time interval.
Optionally, the preset time interval may be set according to service requirements, for example, 5 seconds. The running state of the first backup database node is information maintained in the first backup database node that indicates whether the node is running normally; its value is either a normal state or an abnormal state. A normal state indicates that the first backup database node is running normally. An abnormal state indicates that the first backup database node has failed or gone down and can no longer receive the data read-write requests sent by the application program and access or modify data according to them, that is, it can no longer provide the data read-write service.
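The periodic reading in step 203 could be sketched as a simple polling loop. This is an illustration under assumptions: `wait_for_abnormal` and its parameters are hypothetical, the node's running state is obtained through a caller-supplied `read_state` function (the patent does not say how the state is read), and `sleep` is injectable so the loop can be exercised without real delays. The 5-second default mirrors the example interval given above.

```python
import time
from typing import Callable, Optional

ABNORMAL = "abnormal"  # illustrative state label


def wait_for_abnormal(
    read_state: Callable[[], str],
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    max_checks: Optional[int] = None,
) -> Optional[int]:
    """Read the node state every `interval_s` seconds until it is ABNORMAL.

    Always performs at least one read. Returns the number of reads made when
    the abnormal state was seen, or None if `max_checks` reads were all
    non-abnormal.
    """
    checks = 0
    while True:
        checks += 1
        if read_state() == ABNORMAL:
            return checks
        if max_checks is not None and checks >= max_checks:
            return None
        sleep(interval_s)  # wait one preset interval before the next read


states = iter(["normal", "normal", "abnormal"])
delays = []
print(wait_for_abnormal(lambda: next(states), sleep=delays.append))  # 3
print(delays)  # [5.0, 5.0]
```

When the function returns a read count, the caller would proceed to the switch in step 204.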
Step 204, when detecting that the operation state of the first backup database node is an abnormal state, the cluster monitoring component switches the first backup database node from the main database node to the backup database node, and switches the second backup database node from the backup database node to the main database node.
Optionally, after the first backup database node of the shared storage database cluster in the remote machine room has been switched from a backup database node to the master database node, if its running state is detected to be abnormal, that is, the first backup database node can no longer provide the data read-write service, the cluster monitoring component switches the first backup database node from the master database node back to a backup database node and switches the second backup database node from a backup database node to the master database node. The second backup database node thus becomes the new master database node and, in place of the old master database node, receives the data read-write requests sent by the application program and accesses or modifies data according to those requests, thereby providing the data read-write service.
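The switch in step 204 can be sketched as follows, assuming the same kind of role dictionary as before plus a snapshot of running states; `fail_over_to_second_backup` and the state labels are hypothetical names for illustration. The patent does not say whether the second backup database node's own health is checked before promotion, so this sketch does not check it.

```python
from typing import Dict

# Illustrative role and state labels.
MASTER, BACKUP = "master", "backup"
ABNORMAL = "abnormal"


def fail_over_to_second_backup(
    roles: Dict[str, str],
    states: Dict[str, str],
    first_backup: str,
    second_backup: str,
) -> bool:
    """Apply step 204 in place; return True if a switch was made.

    Assumes step 202 has already promoted `first_backup` to master.
    """
    if roles.get(first_backup) == MASTER and states.get(first_backup) == ABNORMAL:
        roles[first_backup] = BACKUP   # demote the failed remote master
        roles[second_backup] = MASTER  # promote its designated standby
        return True
    return False


roles = {"remote-1": MASTER, "remote-2": BACKUP}
print(fail_over_to_second_backup(roles, {"remote-1": "normal"}, "remote-1", "remote-2"))    # False
print(fail_over_to_second_backup(roles, {"remote-1": "abnormal"}, "remote-1", "remote-2"))  # True
print(roles)  # {'remote-1': 'backup', 'remote-2': 'master'}
```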
According to the technical solution of this embodiment, the cluster monitoring component automatically monitors the shared storage database cluster and determines whether it is in a fault state, that is, whether it can still provide the data read-write service. When the cluster is in a fault state, the first backup database node in the remote machine room is promptly switched to the new master database node, and the second backup database node in the remote machine room is set as the backup database node corresponding to that new master database node. Afterwards, if the cluster monitoring component determines that the first backup database node can no longer provide the data read-write service, it promptly switches the second backup database node to the new master database node. This reduces the labor and time costs of the database cluster monitoring process, improves monitoring efficiency, ensures that the shared storage database cluster can stably provide the data read-write service, and enhances the reliability and high availability of the shared storage database cluster.
Example Three
Fig. 3 is a schematic structural diagram of a database cluster monitoring device according to a third embodiment of the present invention. The apparatus may be configured in an electronic device. As shown in fig. 3, the apparatus includes: a state detection module 301 and a node switching module 302.
The state detection module 301 is configured to detect, through a cluster monitoring component, the running states of at least two master database nodes of a shared storage database cluster in a local machine room, and to determine, according to the running state of each master database node, whether the shared storage database cluster is in a fault state. The node switching module 302 is configured to, when the cluster monitoring component determines that the shared storage database cluster is in a fault state, switch each master database node from a master database node to a backup database node, switch a first backup database node of the shared storage database cluster in a remote machine room from a backup database node to the master database node, and set a second backup database node of the shared storage database cluster in the remote machine room as the backup database node corresponding to the first backup database node.
According to the technical solution of this embodiment, the cluster monitoring component detects the running states of at least two master database nodes of the shared storage database cluster in the local machine room and determines, from the running states of all the master database nodes, whether the shared storage database cluster is in a fault state. When the cluster monitoring component determines that the cluster is in a fault state, it switches each master database node to a backup database node, switches the first backup database node of the shared storage database cluster in the remote machine room from a backup database node to the master database node, and sets the second backup database node of the shared storage database cluster in the remote machine room as the backup database node corresponding to the first backup database node. This addresses the problems in the related art that database cluster monitoring schemes incur high labor and time costs, have low monitoring efficiency, and cannot guarantee that the shared storage database cluster provides the data read-write service over the long term. The cluster monitoring component monitors the shared storage database cluster automatically and determines whether it can still provide the data read-write service; when the cluster is in a fault state, the first backup database node in the remote machine room is promptly switched to the new master database node, and the second backup database node in the remote machine room is set as the backup database node corresponding to that new master database node. This reduces the labor and time costs of the monitoring process, improves monitoring efficiency, ensures that the shared storage database cluster can stably provide the data read-write service, and enhances its reliability and high availability.
In an optional implementation of the embodiment of the present invention, when detecting the running states of at least two master database nodes of the shared storage database cluster in the local machine room, the state detection module 301 is specifically configured to periodically read the running state of each master database node at a preset time interval.
In an optional implementation of the embodiment of the present invention, when determining, according to the running state of each master database node, whether the shared storage database cluster is in a fault state, the state detection module 301 is specifically configured to: determine that the shared storage database cluster is in a fault state if the running state of every master database node is abnormal; and determine that the shared storage database cluster is not in a fault state if at least one master database node (a target master database node) has a normal running state.
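The fault rule above is an "all masters abnormal" test. A minimal sketch, assuming the running states have already been read into a mapping keyed by hypothetical node names; `cluster_in_fault` is an illustrative name. Treating any state other than `"abnormal"` as healthy is a design choice of this sketch; a stricter variant could count unreadable states as abnormal.

```python
from typing import Mapping

ABNORMAL = "abnormal"  # illustrative state label


def cluster_in_fault(master_states: Mapping[str, str]) -> bool:
    """Fault only if every master node reports ABNORMAL.

    A single master in a normal state keeps the cluster out of the fault state.
    """
    if len(master_states) < 2:
        # The method is described for at least two master nodes; this also
        # guards against all() returning True for an empty mapping.
        raise ValueError("expected the states of at least two master nodes")
    return all(state == ABNORMAL for state in master_states.values())


print(cluster_in_fault({"db-1": "abnormal", "db-2": "normal"}))    # False
print(cluster_in_fault({"db-1": "abnormal", "db-2": "abnormal"}))  # True
```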
In an optional implementation of the embodiment of the present invention, the state detection module 301 is further configured to generate a visual chart corresponding to the running state of each master database node and to provide the visual chart to a user.
In an optional implementation of the embodiment of the present invention, when providing the visual chart to the user, the state detection module 301 is specifically configured to display the visual chart in a cluster display page.
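The patent does not describe what the visual chart looks like. As one hedged illustration, the sampled running states of each master node could be rendered as a compact text timeline; `render_state_chart` and the glyphs are hypothetical, and a real cluster display page would more likely feed the same per-node history into a charting library.

```python
from typing import Dict, List

# Hypothetical glyphs for each sampled running state ("?" for anything else).
SYMBOL = {"normal": "#", "abnormal": "x"}


def render_state_chart(history: Dict[str, List[str]]) -> str:
    """Render one row per master node, one glyph per sampled running state."""
    if not history:
        return ""
    width = max(len(name) for name in history)  # align the node-name column
    rows = []
    for name, states in history.items():
        cells = "".join(SYMBOL.get(state, "?") for state in states)
        rows.append(f"{name.ljust(width)} | {cells}")
    return "\n".join(rows)


print(render_state_chart({
    "db-1": ["normal", "normal", "abnormal"],
    "db-22": ["normal", "normal", "normal"],
}))
# db-1  | ##x
# db-22 | ###
```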
In an optional implementation of the embodiment of the present invention, the database cluster monitoring device further includes an alarm module, configured to detect the running state of the cluster monitoring component and to provide preset alarm information to a user when the running state of the cluster monitoring component is detected to be abnormal.
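The patent does not specify how the alarm module decides that the cluster monitoring component itself is abnormal. One common approach is a heartbeat timeout, sketched below; the heartbeat mechanism, the `check_monitor_heartbeat` name, the timeout value, and the alarm text are all assumptions, and `alert` stands in for whatever channel delivers the preset alarm information.

```python
import time
from typing import Callable, Optional


def check_monitor_heartbeat(
    last_heartbeat: float,
    timeout_s: float,
    alert: Callable[[str], None],
    now: Optional[float] = None,
) -> bool:
    """Raise an alarm if the monitoring component has been silent too long.

    `last_heartbeat` and `now` are Unix timestamps in seconds; returns True
    when the alarm was raised.
    """
    current = time.time() if now is None else now
    silence = current - last_heartbeat
    if silence > timeout_s:
        alert(f"cluster monitoring component silent for {silence:.0f} s")
        return True
    return False


alerts = []
check_monitor_heartbeat(last_heartbeat=100.0, timeout_s=15.0, alert=alerts.append, now=110.0)
check_monitor_heartbeat(last_heartbeat=100.0, timeout_s=15.0, alert=alerts.append, now=130.0)
print(alerts)  # ['cluster monitoring component silent for 30 s']
```

Passing `now` explicitly keeps the check deterministic in tests; in production it would be left as `None` so the current wall-clock time is used.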
In an optional implementation of the embodiment of the present invention, the database cluster monitoring device further includes: a remote node detection module, configured to detect the running state of the first backup database node through the cluster monitoring component; and a remote node switching module, configured to, when the cluster monitoring component detects that the running state of the first backup database node is abnormal, switch the first backup database node from the master database node to a backup database node and switch the second backup database node from a backup database node to the master database node.
The specific manner in which each module of the apparatus in the above embodiment performs its operations has been described in detail in the method embodiments and is not repeated here.
The database cluster monitoring device can execute the database cluster monitoring method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the database cluster monitoring method.
Example Four
Fig. 4 shows a schematic diagram of an electronic device 10 that may be used to implement the database cluster monitoring method of an embodiment of the present invention. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in Fig. 4, the electronic device 10 includes at least one processor 11 and memories communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 can perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the database cluster monitoring method.
In some embodiments, the database cluster monitoring method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the database cluster monitoring method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the database cluster monitoring method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
The computer program used to implement the database cluster monitoring method of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the difficult management and weak service scalability of traditional physical hosts and virtual private server (VPS) services.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of the present invention can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for monitoring a database cluster, comprising:
detecting the running states of at least two main database nodes of a shared storage database cluster in a local machine room through a cluster monitoring component, and determining whether the shared storage database cluster is in a fault state according to the running states of the main database nodes;
and when the cluster monitoring component determines that the shared storage database cluster is in a fault state, switching each main database node from the main database node to a backup database node, switching a first backup database node of the shared storage database cluster in a remote machine room from the backup database node to the main database node, and setting a second backup database node of the shared storage database cluster in the remote machine room to be the backup database node corresponding to the first backup database node.
2. The method of claim 1, wherein detecting the operational status of at least two master database nodes of a shared storage database cluster in a local machine room comprises:
periodically reading the running state of each main database node at a preset time interval.
3. The method of claim 1, wherein determining whether the shared storage database cluster is in a failure state based on the operational status of each of the primary database nodes comprises:
if the running state of each main database node is an abnormal state, determining that the shared storage database cluster is in a fault state;
and if a target main database node whose running state is normal exists among the main database nodes, determining that the shared storage database cluster is not in a fault state.
4. The method of claim 1, further comprising, after detecting the operational status of at least two primary database nodes of the shared storage database cluster in the local machine room:
generating a visual chart corresponding to the running state of each main database node, and providing the visual chart to a user.
5. The method of claim 4, wherein providing the visual chart to the user comprises:
displaying the visual chart in a cluster display page.
6. The method as recited in claim 1, further comprising:
detecting the running state of the cluster monitoring component, and providing preset alarm information for a user when detecting that the running state of the cluster monitoring component is abnormal.
7. The method of claim 1, further comprising, after setting a second backup database node of the shared storage database cluster in the remote machine room as a backup database node corresponding to the first backup database node:
detecting the running state of the first backup database node through the cluster monitoring component;
and when the cluster monitoring component detects that the running state of the first backup database node is an abnormal state, switching the first backup database node from a main database node to a backup database node, and switching the second backup database node from the backup database node to the main database node.
8. A database cluster monitoring device, comprising:
the state detection module is used for detecting the running states of at least two main database nodes of the shared storage database cluster in the local machine room through the cluster monitoring component, and determining whether the shared storage database cluster is in a fault state according to the running states of the main database nodes;
and the node switching module, configured to, when the cluster monitoring component determines that the shared storage database cluster is in a fault state, switch each main database node from the main database node to a backup database node, switch a first backup database node of the shared storage database cluster in a remote machine room from the backup database node to the main database node, and set a second backup database node of the shared storage database cluster in the remote machine room to be the backup database node corresponding to the first backup database node.
9. An electronic device, the electronic device comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the database cluster monitoring method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to implement the database cluster monitoring method of any one of claims 1-7 when executed.
CN202310232325.1A 2023-03-09 2023-03-09 Database cluster monitoring method, device, equipment and medium Pending CN116089222A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310232325.1A CN116089222A (en) 2023-03-09 2023-03-09 Database cluster monitoring method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116089222A true CN116089222A (en) 2023-05-09

Family

ID=86210351

Country Status (1)

Country Link
CN (1) CN116089222A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination