CN115686951A - Fault processing method and device for database server - Google Patents

Publication number
CN115686951A
Authority
CN
China
Prior art keywords
database
state
database server
server
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110869472.0A
Other languages
Chinese (zh)
Inventor
韦鹏程
董俊峰
强群力
刘超千
赵彤
陈瑛绮
周欢
刘海龙
余星
王鹏
孟令银
朱绍辉
陈飞
姚文龙
高超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NetsUnion Clearing Corp
Original Assignee
NetsUnion Clearing Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NetsUnion Clearing Corp
Priority to CN202110869472.0A
Publication of CN115686951A
Legal status: Pending

Landscapes

  • Hardware Redundancy (AREA)

Abstract

The application discloses a fault processing method and a fault processing device for a database server. The method is executed by the database server and comprises the following steps: detecting the read-write state of a database deployed on the database server; detecting the state of the operating system on the database server when the read-write state of the database is a non-writable state; and performing fault processing on the database server using a preset fault processing strategy when the state of the operating system is a survival state. By automatically detecting the writable state of the database and the survival state of the operating system, the method and the device ascertain the running condition of the database server and thereby determine whether the database server is in a system hang, that is, a condition in which the hardware has failed while the operating system is still running. When a system hang occurs, the fault processing strategy handles the fault quickly and isolates the failed database server in time, which improves fault processing efficiency, requires no manual intervention, and reduces the risk of manual misoperation.

Description

Fault processing method and device for database server
Technical Field
The present application relates to the field of database technologies, and in particular, to a method and an apparatus for processing a failure of a database server.
Background
Under some extreme abnormal conditions, a hardware fault such as a RAID (Redundant Array of Independent Disks) fault or a network card fault may occur in the database server while the software system continues to operate normally, so that the database server hangs (a system hang). In this case, common high-availability components cannot detect the anomaly and therefore cannot automatically isolate the failed database server.
To prevent the failed database server from affecting service continuity, manual intervention is generally required to perform fault isolation when this situation occurs. However, manual operation is risky and slow, and can easily expose the actual services to uncontrollable risk.
Disclosure of Invention
The embodiment of the application provides a fault processing method and device for a database server, so that operation risks are reduced, and fault isolation efficiency is improved.
The embodiment of the application adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for processing a failure of a database server, where the method is performed by the database server, and the method includes:
detecting the read-write state of a database deployed on a database server;
detecting the state of an operating system on the database server under the condition that the read-write state of the database is a non-writable state;
and under the condition that the state of the operating system is a survival state, adopting a preset fault processing strategy to carry out fault processing on the database server.
Optionally, the detecting the read-write state of the database deployed on the database server includes:
writing timestamp data into a database deployed on the database server using a high availability component;
and determining the read-write state of the database according to the timestamp data in the database.
Optionally, the determining the read-write state of the database according to the timestamp data in the database includes:
determining whether timestamp data in the database is updated;
and if so, determining that the read-write state of the database is a writable state.
Optionally, the determining the read-write state of the database according to the timestamp data in the database further includes:
continuing to perform the step of writing timestamp data into the database deployed on the database server using the high availability component if the timestamp data in the database is not updated;
and determining that the read-write state of the database is a non-writable state when the number of times the timestamp data in the database has not been updated reaches a preset count threshold.
Optionally, the detecting the state of the operating system on the database server includes:
sending a communication request to the operating system;
and determining the state of the operating system according to the response result of the operating system to the communication request.
Optionally, after detecting the read-write status of the database deployed on the database server, the method further includes:
and sending a fault isolation request to a service platform under the condition that the read-write state of the database is a non-writable state, so that the service platform performs fault isolation on the database server.
Optionally, the performing fault processing on the database server by using a preset fault processing policy includes:
calling an out-of-band management platform interface to shut down the database server, so that the operating system on the database server enters a non-survival state;
and triggering, on the basis of the non-survival state, fault processing of the database deployed on the database server according to the role of the database.
Optionally, the fault handling according to the role of the database includes at least one of:
when the role of the database is a main database, the fault processing according to the role of the database is to switch the local standby database corresponding to the main database into a new main database, so that the new main database receives service data;
when the role of the database is a local standby database, the fault processing according to the role of the database is to perform degradation processing on the main database, so that the degraded main database synchronizes data directly to the same-city standby database corresponding to the local standby database;
when the role of the database is a same-city standby database, the fault processing according to the role of the database is to perform degradation processing on the local standby database corresponding to the same-city standby database, so that the degraded local standby database synchronizes data directly to the remote standby database corresponding to the same-city standby database;
and when the role of the database is a remote standby database, the fault processing according to the role of the database is to directly shut down the database server where the remote standby database is located.
In a second aspect, an embodiment of the present application further provides a failure processing apparatus for a database server, which is applied to the database server, where the apparatus is configured to implement any one of the foregoing methods.
In a third aspect, an embodiment of the present application further provides an electronic device, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform any of the methods described above.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform any of the methods described above.
The embodiments of the application adopt at least one technical scheme that can achieve the following beneficial effects. The fault processing method of the embodiments can be executed by each database server. When processing a fault, the database server first detects the read-write state of the database deployed on it; if the read-write state of the database is a non-writable state, it further detects the state of the operating system on the database server; and if the operating system is in a survival state, the database server is in a system hang, and a preset fault processing strategy is adopted to process the fault. By automatically detecting the writable state of the database and the survival state of the operating system, the method ascertains the running condition of the database server and thereby determines whether a system hang has occurred.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic structural diagram of a distributed data center according to an embodiment of the present application;
fig. 2 is a schematic flowchart illustrating a method for processing a failure of a database server according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating a failure processing flow of a database server according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a failure processing apparatus of a database server according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
In an actual service scenario, when a database server is in a system hang, a general high-availability component cannot detect the abnormality and therefore cannot automatically trigger isolation of the failed database server. If the failed database server is not isolated in time, a large number of current and subsequent service requests cannot be processed effectively, which leads to interrupted or failed service processing and severely affects the stability and service continuity of the whole database server cluster.
Based on this, an embodiment of the present application provides a fault processing method for a database server, which is executed by the database server. The database server may be any database server under a distributed data center architecture; the application scenario is described here with reference to fig. 1 as an example. Fig. 1 is a schematic architecture diagram of a distributed data center in the embodiment of the present application. The architecture comprises a main data center, a same-city data center and a remote data center. The main data center hosts a main database and a local standby database. The same-city data center is located in the same city as the main data center and hosts the same-city standby database. The remote data center is located in a different city from the main data center and hosts the remote standby database.
When providing services externally, the database servers in the main data center, the same-city data center and the remote data center can all receive service requests from upper-layer applications. A database server in the main data center processes a service request directly after receiving it: it accesses the main database to read and write data, and the data is synchronized in real time to the local standby database located in the same data center. The database servers of the same-city data center and the remote data center cannot process a service request directly; they forward it to the main data center, which performs the service processing. The local standby database of the main data center then synchronizes the data of the main database asynchronously to the same-city standby database in the same-city data center, and the same-city standby database in turn synchronizes the data asynchronously to the remote standby database in the remote data center, thereby ensuring data integrity across the whole data center architecture.
As shown in fig. 2, a schematic flow chart of a method for processing a failure of a database server according to an embodiment of the present application is provided, where the method at least includes the following steps S210 to S230:
step S210, detecting a read-write state of the database deployed on the database server.
The database server in the embodiment of the present application may be understood as a machine in which a database is deployed, and in an actual service scenario, a plurality of database servers are usually involved, and each database server may be configured to execute the fault handling method for the database server provided in the embodiment of the present application. Of course, the method may be executed by a database server where the database is located, or may be executed by separately deploying an independent server.
To determine whether the database server is in a system hang, the hardware operating condition of the database server may be checked first, for example, whether a RAID fault or a network card fault has occurred. This check can be carried out by detecting the read-write state of the database deployed on the database server. Here the read-write state refers to the writable state of the database, that is, whether the database server can write data to its hard disk; whether the hardware of the database server is operating normally can then be inferred from the writable state.
The database state may be detected at intervals, for example every 3 seconds, or in real time. Those skilled in the art can set the detection frequency flexibly according to actual requirements, and it is not limited herein.
Step S220, detecting a state of an operating system on the database server when the read-write state of the database is a non-writable state.
When the read-write state of the database is the non-writable state, the database server cannot write data to the hard disk at this time. This may be caused at the software level or at the hardware level; for example, a RAID fault or a network card fault may have occurred in the database server. To further determine whether the database server is in a system hang, the software running condition of the database server must also be checked, which can be done by detecting the state of the operating system running on the database server.
Step S230, performing fault processing on the database server by using a preset fault processing policy when the state of the operating system is a survival state.
When the state of the operating system is the survival state, the operating system is running normally, that is, the database server is operating normally at the software level. It can therefore be determined that the database server is in a system hang, and a preset fault processing strategy needs to be adopted to process the fault so as to avoid affecting service continuity.
With this fault processing method, the running condition of the database server is ascertained by automatically detecting the writable state of the database and the survival state of the operating system, which determines whether a system hang has occurred. When a system hang occurs, the preset fault processing strategy handles the fault quickly and isolates the failed database server in time. This improves fault processing efficiency, requires no manual intervention, and reduces the risk of manual operation errors.
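The decision logic of steps S210 to S230 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the names `Diagnosis`, `diagnose` and `run_detection_round` are hypothetical, and the two probes and the fault processing strategy are passed in as plain callables.

```python
from enum import Enum
from typing import Callable


class Diagnosis(Enum):
    HEALTHY = "healthy"            # database is writable
    SYSTEM_HANG = "system_hang"    # database not writable, OS in survival state
    FULL_FAILURE = "full_failure"  # database not writable, OS in non-survival state


def diagnose(db_writable: Callable[[], bool],
             os_alive: Callable[[], bool]) -> Diagnosis:
    # S210: check the read-write state of the database first.
    if db_writable():
        return Diagnosis.HEALTHY
    # S220: only when the database is non-writable, check the operating system.
    if os_alive():
        return Diagnosis.SYSTEM_HANG
    return Diagnosis.FULL_FAILURE


def run_detection_round(db_writable: Callable[[], bool],
                        os_alive: Callable[[], bool],
                        fault_policy: Callable[[], None]) -> Diagnosis:
    result = diagnose(db_writable, os_alive)
    # S230: the preset fault processing strategy runs only for a system hang.
    if result is Diagnosis.SYSTEM_HANG:
        fault_policy()
    return result
```

Keeping the operating system probe behind the database probe mirrors the order of the steps: the second check is only made once the first has reported a non-writable state.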
In an embodiment of the present application, the detecting a read-write state of a database deployed on a database server includes: writing timestamp data into a database deployed on the database server using a high availability component; and determining the read-write state of the database according to the timestamp data in the database.
The read-write state of the database deployed on the database server can be determined using a High Availability (HA) component deployed on the database server. The high availability component is generally deployed on database servers in main-standby mode, and is mainly used to ensure that, when the main database fails, failover completes automatically and the virtual IP drifts to the standby database. When the standby database fails, the high availability component is responsible for changing the main database from a semi-synchronous configuration to an asynchronous configuration so that the main database remains writable.
In this embodiment of the present application, the high availability component is used to detect whether the database is writable. Specifically, it writes timestamp data, which may be regarded as a time field, into the database, and determines from the write result whether the read-write state of the database is a writable state or a non-writable state.
It should be noted that the present application needs to detect whether a hardware fault, such as a RAID fault, has occurred in the database server. Therefore, in the embodiment of the present application, detecting whether the database is writable means detecting whether the timestamp data can be written to the hard disk of the database server.
In an embodiment of the present application, the determining the read-write state of the database according to the timestamp data in the database includes: determining whether timestamp data in the database is updated; and if so, determining that the read-write state of the database is a writable state.
The read-write state of the database can be determined according to whether the timestamp data in the database has been updated. If it has been updated, the timestamp data was written successfully, the read-write state of the database is a writable state, and the hardware of the database server is operating normally, so the current detection round ends and the next round is awaited.
For example, assume that the timestamp data currently stored in the database is "11:01:01 on January 1, 2021". If, after an operation of writing timestamp data is performed, the stored timestamp data has changed to "11:01:04 on January 1, 2021", the timestamp data has been updated, which indicates that the write operation succeeded and that the read-write state of the database is a writable state.
In an embodiment of the application, the determining the read-write state of the database according to the timestamp data in the database further includes: continuing to perform the step of writing timestamp data into the database deployed on the database server using the high availability component if the timestamp data in the database is not updated; and determining that the read-write state of the database is a non-writable state when the number of times the timestamp data in the database has not been updated reaches a preset count threshold.
After timestamp data is written into the database, if the timestamp data in the database has not been updated, the write has failed. However, in an actual service scenario, when the volume of concurrent service requests is high, the database server is under heavy pressure and may briefly respond slowly or fail to respond, even though it is not actually in a fault state.
Therefore, in consideration of this special situation, the embodiment of the present application performs the write operation one or more further times after a write failure; the specific number of attempts can be set flexibly by a person skilled in the art according to the actual situation. If the number of consecutive write failures within a certain time reaches the preset count threshold, the database server cannot write data to the hard disk at this time. This may be caused at the software level or at the hardware level, that is, the system may be hung, and it can be determined subsequently in combination with the survival state of the operating system. Judging the writable state of the database server by executing the write operation multiple times ensures the accuracy of the detection result.
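The timestamp write with a failure threshold described above can be sketched as follows. This is an illustrative sketch only: `is_database_writable`, `write_timestamp`, `read_timestamp`, `max_failures` and `retry_interval_s` are hypothetical names, the default threshold of 3 is an assumed value, and the actual database access is left to the two injected callables.

```python
import time
from typing import Callable


def is_database_writable(write_timestamp: Callable[[int], None],
                         read_timestamp: Callable[[], int],
                         max_failures: int = 3,
                         retry_interval_s: float = 1.0) -> bool:
    """Return False once max_failures consecutive timestamp writes have failed."""
    failures = 0
    while failures < max_failures:
        stamp = time.time_ns()  # integer timestamp, so equality is exact
        try:
            write_timestamp(stamp)
            # The write counts as successful only if the stored value changed.
            if read_timestamp() == stamp:
                return True
        except Exception:
            # An error from the database is treated like a missing update.
            pass
        failures += 1
        if failures < max_failures:
            time.sleep(retry_interval_s)
    return False
```

A single slow or failed write under heavy load therefore does not mark the database as non-writable; only reaching the threshold does.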
In one embodiment of the application, the detecting a state of an operating system on the database server includes: sending a communication request to the operating system; and determining the state of the operating system according to the response result of the operating system to the communication request.
The whole process of the embodiment of the present application may be executed by a separate program or component built into the database server, so the state of the operating system running on the database server can be determined by attempting to establish a communication connection with the operating system. For example, a communication request may be sent to the operating system using the Ping command, which is mainly used to test the reachability of a host on an Internet Protocol network. Ping sends an Internet Control Message Protocol (ICMP) echo request packet to the target host and waits for an ICMP echo reply; it reports errors, packet loss, and a statistical summary of the results, typically including the minimum, maximum, average and standard deviation of the round-trip times. Besides the Ping command, an SSH (Secure Shell) command may also be used; which type of communication command is used can be set flexibly by those skilled in the art according to actual needs.
Taking the Ping command as an example, after receiving the communication request, the operating system returns a response packet as the response result. If the parameter values in the response packet are not 0, a communication connection can be established with the operating system at this time, and the operating system is in a survival state. If the parameter values in the response packet are all 0, a communication connection cannot be established with the operating system, and the operating system is in a non-survival state, that is, a fault has occurred at the software level.
Of course, those skilled in the art may flexibly set how the parameter values in the response packet are configured and judged according to actual requirements. For example, the operating system may be considered to be in a survival state when a specific parameter value exceeds a preset threshold, and in a non-survival state when it does not.
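The survival check can be sketched with a plain TCP connection attempt. This is a stand-in chosen for illustration, not the Ping-based check of the embodiment: it assumes that a completed TCP handshake on a known port (22, the usual SSH port, is an assumed default) is acceptable evidence that the operating system is responding. The function name `os_is_alive` and its parameters are hypothetical.

```python
import socket


def os_is_alive(host: str, port: int = 22, timeout_s: float = 1.0) -> bool:
    """Report the survival state as True if a TCP connection can be opened."""
    try:
        # The kernel of the target host completes the handshake, so success
        # shows that its network stack is still responding.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable and timed-out attempts all count as non-survival.
        return False
```

As with the threshold on Ping parameters mentioned above, a deployment could require several consecutive failed attempts before treating the operating system as being in the non-survival state.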
It should be noted that, when the read-write state of the database is the non-writable state and the operating system is in the non-survival state, the database server has failed at both the hardware and software levels. In this case the high availability component can detect the abnormality directly, and the subsequent main-standby switching operation is triggered automatically even without the detection process described here. This case must therefore be clearly distinguished from the system hang case.
In an embodiment of the application, after detecting the read-write state of the database deployed on the database server, the method further includes: and sending a fault isolation request to a service platform under the condition that the read-write state of the database is a non-writable state, so that the service platform performs fault isolation on the database server.
After detecting that the database deployed on the database server is in the unwritable state, it indicates that the database server has a fault, and in order to avoid excessive influence on service continuity, the faulty database server can be isolated, so as to ensure the stability of service provided by the whole database cluster.
Specifically, after detecting that the database deployed on the database server is in the unwritable state, the service isolation interface of the service platform may be called, and a fault isolation request is sent to the service platform, so that the service platform can isolate the failed database server in time.
In an embodiment of the present application, the performing fault processing on the database server by using a preset fault processing policy includes: calling an out-of-band management platform interface to shut down the database server, so that the operating system on the database server enters a non-survival state; and triggering, on the basis of the non-survival state, fault processing of the database deployed on the database server according to the role of the database.
In an actual service scenario, the databases on database servers are generally deployed in main-standby mode. The embodiment of the application specifically includes a main database, a local standby database, a same-city standby database and a remote standby database, connected in sequence, and databases with different roles perform different functions. Therefore, when the database on a database server fails, the database can be triggered to perform processing corresponding to its role, which also makes subsequent fault repair by database administrators more convenient.
However, in the existing service logic, the fault processing operation of the database is triggered automatically only when the high availability component detects a failure of the database server at the software level, that is, when the operating system is in a non-survival state. In a system hang, only the hardware level has failed while the software level operates normally, so the high availability component cannot detect the abnormality and cannot automatically trigger the fault processing operation of the database.
In view of this, the embodiment of the present application shuts down the failed database server by calling the out-of-band management platform interface. Shutting down the database server causes the operating system to enter the non-survival state, and the survival state of the operating system is generally checked every few seconds in the embodiment of the present application. The next check therefore finds the operating system in the non-survival state, which automatically triggers the database deployed on the database server to perform processing corresponding to its role.
The out-of-band management platform is used for uniformly managing and maintaining a plurality of hardware devices, and when an interface of the out-of-band management platform is called, parameter information of a database server needing to be closed, such as a user name, a password, a serial number and the like, can be transmitted to the out-of-band management platform, so that the out-of-band management platform can determine which database server is closed according to the parameter information.
The above embodiment may be understood as follows: in the case of a system hang, the out-of-band management platform interface is called to shut down the database server, forcing its operating system into the non-survival state so that the fault processing operation is triggered automatically.
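The shutdown-then-recheck sequence can be sketched as follows. The interface of the out-of-band management platform is not specified in the text, so it is represented here by a hypothetical `power_off` callable; `shut_down_and_confirm`, `poll_interval_s` and `max_polls` are likewise assumed names, and the 3-second default simply echoes the "several seconds" detection frequency mentioned above.

```python
import time
from typing import Callable


def shut_down_and_confirm(power_off: Callable[[], None],
                          os_alive: Callable[[], bool],
                          poll_interval_s: float = 3.0,
                          max_polls: int = 10) -> bool:
    """Request a shutdown, then poll until the OS reports the non-survival state."""
    # Hypothetical call into the out-of-band management platform.
    power_off()
    for attempt in range(max_polls):
        if not os_alive():
            # Non-survival state observed: role-based handling can now be triggered.
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_interval_s)
    # The server still answers after all polls; the shutdown did not take effect.
    return False
```

Returning `False` instead of raising leaves it to the caller to decide whether to retry the shutdown or to alert an operator.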
In an embodiment of the application, the fault handling according to the role of the database includes at least one of: when the role of the database is a main database, the fault processing according to the role of the database is to switch the local standby database corresponding to the main database into a new main database, so that the new main database receives service data; when the role of the database is a local standby database, the fault processing according to the role of the database is to perform degradation processing on the main database, so that the degraded main database synchronizes data directly to the same-city standby database corresponding to the local standby database; when the role of the database is a same-city standby database, the fault processing according to the role of the database is to perform degradation processing on the local standby database corresponding to the same-city standby database, so that the degraded local standby database synchronizes data directly to the remote standby database corresponding to the same-city standby database; and when the role of the database is a remote standby database, the fault processing according to the role of the database is to directly shut down the database server where the remote standby database is located.
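The four role-specific cases listed above amount to a lookup from the failed role to an action. The sketch below only encodes that mapping; `Role`, `Plan`, `PLANS` and `plan_fault_handling` are hypothetical names, and carrying out the promotion or degradation is outside its scope.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    MAIN = "main"
    LOCAL_STANDBY = "local_standby"
    SAME_CITY_STANDBY = "same_city_standby"
    REMOTE_STANDBY = "remote_standby"


@dataclass(frozen=True)
class Plan:
    action: str                      # "promote", "degrade" or "shutdown_only"
    target: Optional[Role] = None    # database that is promoted or degraded
    syncs_to: Optional[Role] = None  # where a degraded database now sends data


# One entry per role of the failed database, following the four cases above.
PLANS = {
    Role.MAIN: Plan("promote", target=Role.LOCAL_STANDBY),
    Role.LOCAL_STANDBY: Plan("degrade", target=Role.MAIN,
                             syncs_to=Role.SAME_CITY_STANDBY),
    Role.SAME_CITY_STANDBY: Plan("degrade", target=Role.LOCAL_STANDBY,
                                 syncs_to=Role.REMOTE_STANDBY),
    Role.REMOTE_STANDBY: Plan("shutdown_only"),
}


def plan_fault_handling(failed_role: Role) -> Plan:
    """Look up the handling plan for the role of the failed database."""
    return PLANS[failed_role]
```

In the two degradation cases the node upstream of the failed one skips over it in the synchronization chain, which is why `syncs_to` names the node that originally followed the failed database.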
As described above, the roles of the databases in the embodiments of the present application may include a main database, a local standby database, a same-city standby database, and a remote standby database. The main database is responsible for reading and writing data and for synchronizing data to the local standby database; the local standby database is responsible for reading data and for synchronizing data to the same-city standby database; the same-city standby database is responsible for reading data and for synchronizing data to the remote standby database; and the remote standby database is mainly responsible for reading data.
Based on this, databases with different roles use different fault handling logic in the embodiments of the present application. For the main database, when the system of the server where the main database is located hangs, the local standby database corresponding to the main database may be switched into a new main database, so that external services continue to be provided through the new main database.
For the local standby database, when the system of the server where the local standby database is located hangs, the local standby database cannot send data to the same-city standby database connected to it, and the main database cannot synchronize data to the local standby database. The main database connected to the local standby database can therefore be degraded, so that the degraded main database takes the place of the failed local standby database and synchronizes data directly to the same-city standby database connected to the local standby database, thereby ensuring the correctness and timeliness of data synchronization.
Similarly, for the same-city standby database, when the system of the server where the same-city standby database is located hangs, the same-city standby database cannot send data to the remote standby database connected to it, and the local standby database cannot synchronize data to the same-city standby database. The local standby database connected to the same-city standby database can therefore be degraded, so that the degraded local standby database takes over from the failed same-city standby database and synchronizes data directly to the remote standby database connected to the same-city standby database, thereby ensuring the correctness and timeliness of data synchronization.
The remote standby database is only responsible for receiving the data synchronized from the same-city standby database, so when the remote standby database fails, the server where it is located is directly shut down.
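The role-based logic described above can be illustrated with a small dispatch function. This is a minimal sketch, assuming the four roles form a single replication chain as described; the `Role` enum, the `CHAIN` list, and the returned action strings are hypothetical and only describe what would be done rather than performing it.

```python
from enum import Enum


class Role(Enum):
    MAIN = "main"
    LOCAL_STANDBY = "local_standby"
    SAME_CITY_STANDBY = "same_city_standby"
    REMOTE_STANDBY = "remote_standby"


# Replication chain from the description: each role synchronizes data to the next.
CHAIN = [Role.MAIN, Role.LOCAL_STANDBY, Role.SAME_CITY_STANDBY, Role.REMOTE_STANDBY]


def plan_fault_handling(failed: Role) -> str:
    """Return a description of the action for the failed role."""
    if failed is Role.MAIN:
        # The local standby takes over and receives service data.
        return "switch local_standby to new main"
    if failed is Role.REMOTE_STANDBY:
        # End of the chain: nothing downstream needs to be re-linked.
        return "shut down remote_standby server"
    index = CHAIN.index(failed)
    upstream, downstream = CHAIN[index - 1], CHAIN[index + 1]
    # The upstream node is degraded and bypasses the failed node.
    return f"degrade {upstream.value}; sync {upstream.value} -> {downstream.value}"
```

For the two middle roles the action is the same in shape: the upstream database is degraded and synchronizes directly to the database downstream of the failed one.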
It should be noted that the isolation process and the fault handling process performed according to the role of the database in the above embodiments do not depend on each other. In an actual service scenario, the isolation operation usually takes little time and can generally be completed within 3 seconds, whereas the fault handling logic performed according to the role of the database is complex and time-consuming and can generally be completed within ten minutes. If the isolation operation were performed only after that process had finished, the failed database server could not be isolated within a short time, and the actual service would be seriously affected.
Therefore, the isolation operation in the embodiments of the application is independent of the subsequent fault handling operation performed according to the role of the database: the isolation operation can be triggered directly as soon as the database is detected to be in the non-writable state, which reduces the impact on the actual service.
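One way to picture this independence is to start the isolation request in the background and let the slower handling proceed without waiting for it. The sketch below is an assumption about how this could be arranged, not the method of the application; `isolate` and `continue_fault_handling` are hypothetical caller-supplied callables.

```python
import threading


def on_unwritable(isolate, continue_fault_handling):
    """React to a non-writable database.

    The isolation request is started immediately in its own thread, so the
    slower steps (operating system check, shutdown, role-based handling) do
    not delay it. Returns whatever `continue_fault_handling` returns.
    """
    isolation = threading.Thread(target=isolate, name="isolation")
    isolation.start()                   # send the isolation request right away
    result = continue_fault_handling()  # remaining, slower fault handling
    isolation.join()                    # isolation is short, so this returns quickly
    return result
```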
Fig. 3 is a schematic diagram of a fault processing flow of a database server according to an embodiment of the present application. First, it is detected whether the database deployed on the database server is writable. If it is, the detection ends. If it is not, one write failure is recorded and the detection step is executed again. This is repeated until the database has been detected as unwritable 3 consecutive times, at which point the read-write state of the database is determined to be the non-writable state, indicating that the database server has a fault. The service isolation interface of the service platform can then be called directly to send a service isolation request, so that the service platform can isolate the database server in time.
In addition, to determine whether the fault of the database server is a system hang, the state of the operating system running on the database server is checked further. If the operating system is in the survival state, the database server is confirmed to have hung. Because the high availability component cannot detect an abnormality during a system hang, it cannot trigger fault handling operations such as main-standby switching. The out-of-band management platform interface can therefore be called to shut down the database server, forcing the operating system into the non-survival state so that subsequent fault handling operations are triggered automatically, thereby achieving fault handling of the database server.
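The flow of Fig. 3 as described above can be summarized in a short sketch. The four callables are hypothetical stand-ins for the writability check, the service isolation request, the operating system check, and the out-of-band shutdown; the returned list of action names exists only to make the control flow visible.

```python
def process_fault(check_writable, isolate, os_alive, shut_down, max_failures=3):
    """Walk through the described flow and return the actions taken.

    check_writable(): True if the database accepted a write.
    isolate():        sends the service isolation request.
    os_alive():       True if the operating system still responds.
    shut_down():      shuts the server down through out-of-band management.
    """
    actions = []
    failures = 0
    while failures < max_failures:
        if check_writable():
            return actions  # writable: detection ends, nothing to handle
        failures += 1       # record one more consecutive write failure
    # Non-writable state confirmed: isolate the server first.
    isolate()
    actions.append("isolate")
    # A live operating system with an unwritable database indicates a hang.
    if os_alive():
        shut_down()
        actions.append("shutdown")
    return actions
```

With `max_failures=3`, a single successful write at any point ends the detection, matching the requirement of 3 consecutive failures.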
An embodiment of the present application further provides a fault processing device 400 for a database server, which is applied to the database server. Fig. 4 is a schematic structural diagram of the fault processing device for a database server according to an embodiment of the present application. The device 400 includes: a first detection unit 410, a second detection unit 420 and a fault handling unit 430, wherein:
a first detecting unit 410, configured to detect a read-write state of a database deployed on a database server;
a second detecting unit 420, configured to detect a state of an operating system on the database server when a read-write state of the database is a non-writable state;
and a fault processing unit 430, configured to perform fault processing on the database server by using a preset fault processing policy when the state of the operating system is a survival state.
In an embodiment of the present application, the first detecting unit 410 is specifically configured to: writing timestamp data into a database deployed on the database server using a high availability component; and determining the read-write state of the database according to the timestamp data in the database.
In an embodiment of the present application, the first detecting unit 410 is specifically configured to: determining whether timestamp data in the database is updated; and if so, determining that the read-write state of the database is a writable state.
In an embodiment of the present application, the first detecting unit 410 is specifically configured to: continue to perform the step of writing timestamp data into the database deployed on the database server using the high availability component in the case that the timestamp data in the database is not updated; and determine that the read-write state of the database is the non-writable state in the case that the number of times the timestamp data in the database has not been updated reaches a preset count threshold.
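The timestamp-based detection performed by the first detecting unit 410 might look like the following sketch. It uses Python's built-in `sqlite3` module purely as a stand-in database; the `heartbeat` table, its columns, and the function names are hypothetical, and the source does not say how the high availability component stores the timestamp.

```python
import sqlite3
import time


def write_and_verify(conn, now):
    """Write `now` into the heartbeat row and report whether it was stored."""
    try:
        conn.execute("UPDATE heartbeat SET ts = ? WHERE id = 1", (now,))
        conn.commit()
        row = conn.execute("SELECT ts FROM heartbeat WHERE id = 1").fetchone()
    except sqlite3.Error:
        return False  # any database error counts as a failed write
    return row is not None and row[0] == now


def detect_rw_state(conn, threshold=3, clock=time.time_ns):
    """Return "writable" or "non-writable" after at most `threshold` attempts."""
    for _ in range(threshold):
        if write_and_verify(conn, clock()):
            return "writable"  # timestamp was updated
    return "non-writable"      # failure count reached the threshold
```

An integer nanosecond timestamp is used here so that the value read back can be compared exactly with the value written.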
In an embodiment of the present application, the second detecting unit 420 is specifically configured to: sending a communication request to the operating system; and determining the state of the operating system according to the response result of the operating system to the communication request.
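As an illustration of such a communication request, the sketch below treats a successful TCP connection as a response from the operating system. This is an assumption: the source does not name the protocol, and a different probe could equally be used. The host, port, and timeout are hypothetical parameters.

```python
import socket


def os_is_alive(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True   # the operating system answered the request
    except OSError:
        return False      # refused, unreachable, or timed out
```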
In one embodiment of the present application, the apparatus further comprises: a sending unit, configured to send a fault isolation request to a service platform in the case that the read-write state of the database is the non-writable state, so that the service platform performs fault isolation on the database server.
In an embodiment of the present application, the fault handling unit 430 is specifically configured to: call an out-of-band management platform interface to shut down the database server, so that the state of the operating system on the database server enters the non-survival state; and trigger, according to the non-survival state, fault handling of the database deployed on the database server according to the role of the database.
In an embodiment of the application, the fault handling according to the role of the database includes at least one of: in the case that the role of the database is a main database, the fault handling according to the role of the database is to switch the local standby database corresponding to the main database into a new main database, so that the new main database receives service data; in the case that the role of the database is a local standby database, the fault handling according to the role of the database is to degrade the main database, so that the degraded main database synchronizes data directly to the same-city standby database corresponding to the local standby database; in the case that the role of the database is a same-city standby database, the fault handling according to the role of the database is to degrade the local standby database corresponding to the same-city standby database, so that the degraded local standby database synchronizes data directly to the remote standby database corresponding to the same-city standby database; and in the case that the role of the database is a remote standby database, the fault handling according to the role of the database is to directly shut down the database server where the remote standby database is located.
It can be understood that, the above-mentioned fault processing apparatus for a database server can implement the steps of the fault processing method for a database server executed by a database server provided in the foregoing embodiment, and the explanations regarding the fault processing method for a database server are applicable to the fault processing apparatus for a database server, and are not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 5, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a Random-Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The memory is used to store programs. In particular, a program may include program code comprising computer operating instructions. The memory may include both internal memory and non-volatile memory, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the internal memory and then runs it, forming the fault processing device of the database server at the logical level. The processor executes the program stored in the memory and is specifically configured to perform the following operations:
detecting the read-write state of a database deployed on a database server;
detecting the state of an operating system on the database server under the condition that the read-write state of the database is a non-writable state;
and under the condition that the state of the operating system is a survival state, adopting a preset fault processing strategy to carry out fault processing on the database server.
The method executed by the fault processing device of the database server according to the embodiment shown in fig. 4 of the present application may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
The electronic device may further execute the method executed by the failure processing apparatus of the database server in fig. 4, and implement the functions of the failure processing apparatus of the database server in the embodiment shown in fig. 4, which are not described herein again in this embodiment of the present application.
An embodiment of the present application further provides a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which, when executed by an electronic device that includes multiple application programs, enable the electronic device to perform the method performed by the failure processing apparatus of the database server in the embodiment shown in fig. 4, and are specifically configured to perform:
detecting the read-write state of a database deployed on a database server;
detecting the state of an operating system on the database server under the condition that the read-write state of the database is a non-writable state;
and under the condition that the state of the operating system is a survival state, performing fault processing on the database server by adopting a preset fault processing strategy.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises that element.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art to which the present application pertains. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (11)

1. A method of fault handling for a database server, performed by a database server, wherein the method comprises:
detecting the read-write state of a database deployed on a database server;
detecting the state of an operating system on the database server under the condition that the read-write state of the database is a non-writable state;
and under the condition that the state of the operating system is a survival state, adopting a preset fault processing strategy to carry out fault processing on the database server.
2. The method of claim 1, wherein the detecting the read-write status of the database deployed on the database server comprises:
writing timestamp data into a database deployed on the database server using a high availability component;
and determining the read-write state of the database according to the timestamp data in the database.
3. The method of claim 2, wherein the determining the read-write status of the database according to the timestamp data in the database comprises:
determining whether timestamp data in the database is updated;
and if so, determining that the read-write state of the database is a writable state.
4. The method of claim 3, wherein said determining the read-write status of the database according to the timestamp data in the database further comprises:
continuing to perform the step of writing timestamp data into the database deployed on the database server using the high availability component if the timestamp data in the database is not updated;
and under the condition that the number of times the timestamp data in the database has not been updated reaches a preset count threshold, determining that the read-write state of the database is a non-writable state.
5. The method of claim 1, wherein said detecting a state of an operating system on said database server comprises:
sending a communication request to the operating system;
and determining the state of the operating system according to the response result of the operating system to the communication request.
6. The method of claim 1, wherein after detecting the read-write status of the database deployed on the database server, the method further comprises:
and sending a fault isolation request to a service platform under the condition that the read-write state of the database is a non-writable state, so that the service platform performs fault isolation on the database server.
7. The method of claim 1, wherein the performing fault handling on the database server by using the preset fault handling policy comprises:
calling an out-of-band management platform interface to shut down the database server, so that the state of the operating system on the database server enters a non-survival state;
and triggering, according to the non-survival state, fault processing of the database deployed on the database server according to the role of the database.
8. The method of claim 7, wherein the fault handling in accordance with the role of the database comprises at least one of:
under the condition that the role of the database is a main database, the fault processing according to the role of the database is to switch a local standby database corresponding to the main database into a new main database so that the new main database receives service data;
under the condition that the role of the database is a local standby database, the fault processing according to the role of the database is to perform degradation processing on the main database so that the degraded main database directly synchronizes data to the same-city standby database corresponding to the local standby database;
under the condition that the role of the database is the same-city standby database, the fault processing according to the role of the database is to perform degradation processing on the local standby database corresponding to the same-city standby database, so that the local standby database after the degradation processing directly synchronizes data to the remote standby database corresponding to the same-city standby database;
and under the condition that the role of the database is the remote standby database, the fault processing according to the role of the database is to directly shut down the database server where the remote standby database is located.
9. A failure processing device for a database server, applied to the database server, wherein the device is configured to implement the method according to any one of claims 1 to 8.
10. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions which, when executed, cause the processor to perform the method of any one of claims 1 to 8.
11. A computer readable storage medium storing one or more programs which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of any of claims 1-8.
CN202110869472.0A 2021-07-30 2021-07-30 Fault processing method and device for database server Pending CN115686951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110869472.0A CN115686951A (en) 2021-07-30 2021-07-30 Fault processing method and device for database server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110869472.0A CN115686951A (en) 2021-07-30 2021-07-30 Fault processing method and device for database server

Publications (1)

Publication Number Publication Date
CN115686951A true CN115686951A (en) 2023-02-03

Family

ID=85058741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110869472.0A Pending CN115686951A (en) 2021-07-30 2021-07-30 Fault processing method and device for database server

Country Status (1)

Country Link
CN (1) CN115686951A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination