CN111209140B - Method and device for recovering crash of main and standby dual-node databases - Google Patents

Info

Publication number
CN111209140B
CN111209140B CN201911391020.5A CN201911391020A CN111209140B CN 111209140 B CN111209140 B CN 111209140B CN 201911391020 A CN201911391020 A CN 201911391020A CN 111209140 B CN111209140 B CN 111209140B
Authority
CN
China
Prior art keywords
database
service
backup
data
crash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911391020.5A
Other languages
Chinese (zh)
Other versions
CN111209140A (en)
Inventor
潘景基
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN201911391020.5A
Publication of CN111209140A
Application granted
Publication of CN111209140B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases

Abstract

The embodiment of the invention discloses a method and a device for recovering from a crash of a main and standby dual-node database. The method comprises: preprocessing the backup path configuration parameters of the database; separating the ics-manager service from the main/standby switching; and performing a database recovery operation according to the state of the database service mariadb and the condition of database integrity backup. Recovery is thus handled separately for each combination of database service state and backup condition. Data recovery is achieved while the system and the data disk are normal, so that the virtualization system remains continuously available to users. The integrity of the data of the HCI virtualization system is ensured, and the maintainability of system operation is enhanced.

Description

Method and device for recovering crash of main and standby dual-node databases
Technical Field
The invention relates to the technical field of virtualization, in particular to a method and a device for recovering crash of a main and standby dual-node database.
Background
Cloud computing is a major innovation of the information age, following the computer and the internet. It is highly scalable and available on demand, and can provide a brand new experience for users.
Virtualization is currently one of the fastest-developing technologies in cloud computing. In response to this opportunity, the hyper-converged all-in-one appliance launched by Inspur deploys the InCloud Rail virtualization system, namely the HCI system. It is an enterprise-level server virtualization solution that converts a static and complex IT environment into a more dynamic and easily managed virtual data center through the fusion, distribution and management of bottom-layer physical resources. It improves the agility and flexibility of resource delivery and the efficiency of resource use, helps enterprises to create a high-performance, extensible, manageable and flexible server virtualization infrastructure, and provides high-quality virtual data center services.
For the Inspur InCloud Rail hyper-converged architecture system, namely the HCI system, a system exception may be triggered when users do not operate the system according to the instruction manual, or when a sudden abnormal condition occurs, so that the environment crashes. In particular, the data recovery considered here is limited to the iCenter in the HCI active/standby dual-node environment while the system and the data disk are normal, that is, to the case where the database or a database file has been damaged for some special reason and needs to be recovered.
Disclosure of Invention
The embodiment of the invention provides a method and a device for recovering from a crash of a main and standby dual-node database, which are used for solving the prior-art problem of data recovery when an HCI system in a main and standby dual-node environment crashes abnormally.
In order to solve the technical problem, the embodiment of the invention discloses the following technical scheme:
the first aspect of the present invention provides a method for recovering crash of a primary/standby dual-node database, where the method includes the following steps:
preprocessing the configuration parameters of the backup path of the database;
separating the ics-manager service from the main/standby switching;
and according to the state of the database service mariadb and the condition of database integrity backup, performing recovery operation on the database.
Further, the recovering operation of the database according to the state of the database service mariadb and the condition of the database integrity backup specifically comprises:
judging whether the state of the database service mariadb is normal or not;
if yes, judging whether the database has an integrity backup; if it does, directly starting the data recovery operation, and if it does not, performing the data recovery operation according to the current environment;
if not, the service has crashed, and the crashed service is checked and analyzed.
Further, the directly enabling data recovery operation specifically includes:
deleting the database;
entering the database backup directory, and decompressing the backup database file;
importing the backup data into the database, and restarting the ics-manager service after the database is recovered to be normal;
adding the ics-manager service to the heartbeat cluster.
Further, deleting the database includes deleting the database by name, deleting the database neutron, and deleting mysql.
Further, the specific process of performing troubleshooting analysis on the crash service is as follows:
backing up a database data directory and a database log;
acquiring configuration information through a database configuration file, and checking a service log;
and calling a problem solution library according to the service log, and performing problem pairing and recovery.
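The "problem pairing" step can be pictured as matching lines of the service log against a table of known failure signatures. The following is a minimal sketch under stated assumptions, not the patented implementation: the KnownProblem type, the pair_problems function and the two library entries are hypothetical illustrations, and the patent does not disclose the contents of its problem solution library.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class KnownProblem:
    """One entry of a hypothetical problem solution library."""
    name: str
    pattern: str  # regular expression searched for in each log line
    remedy: str   # human-readable recovery hint


# Illustrative entries only; a real library would be curated from field experience.
SOLUTION_LIBRARY: List[KnownProblem] = [
    KnownProblem(
        name="disk full",
        pattern=r"No space left on device",
        remedy="Free space on the data disk, then start the service again.",
    ),
    KnownProblem(
        name="port in use",
        pattern=r"Can't start server: Bind on TCP/IP port",
        remedy="Stop the process holding the port, then start the service again.",
    ),
]


def pair_problems(
    log_lines: Sequence[str],
    library: Sequence[KnownProblem] = SOLUTION_LIBRARY,
) -> List[KnownProblem]:
    """Return every library entry whose pattern occurs in at least one log line."""
    matched = []
    for problem in library:
        if any(re.search(problem.pattern, line) for line in log_lines):
            matched.append(problem)
    return matched
```

A caller would read the service log into a list of lines, pass it to pair_problems, and present the remedy of each match to the operator. Keeping the library as data rather than code means new signatures can be added without touching the matching logic.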
A second aspect of the present invention provides a device for recovering a crash of a primary/standby dual-node database, where the device includes:
the data preprocessing module is used for preprocessing the configuration parameters of the backup path of the database;
the service separation module is used for separating the ics-manager service from the main/standby switching;
and the data recovery module is used for performing recovery operation on the database according to the state of the database service mariadb and the condition of database integrity backup.
Further, the data recovery module comprises:
the state judgment unit is used for judging whether the state of the database service mariadb is normal or not;
the backup integrity judging unit is used for judging whether the database has an integrity backup;
the first data recovery unit is used for performing data recovery operation when the service state of the database is normal and the backup is complete;
the second data recovery unit is used for performing data recovery operation according to the current environment when the service state of the database is normal and the backup is incomplete;
and the analysis and investigation unit is used for carrying out investigation and analysis on the crash service when the service state of the database is abnormal.
Further, the analysis and investigation unit includes:
the data backup subunit is used for backing up the database data directory and the database log;
the information acquisition subunit acquires the configuration information through the database configuration file and checks the service log;
and the data recovery subunit calls the problem solution library according to the service log, and performs problem pairing and recovery.
Further, the first data recovery unit includes:
the first data processing subunit is used for deleting the database;
the second data processing subunit enters the database backup directory and decompresses the backup database file;
the service recovery subunit is used for importing the backup data into the database and restarting the ics-manager service after the database is recovered to be normal;
and the service configuration subunit adds the ics-manager service to the heartbeat cluster.
The effects described in this summary are only those of the embodiments, not all effects of the invention. One of the above technical solutions has the following advantages or beneficial effects:
Recovery operations are performed on the database separately according to the state of the database service and the condition of the database integrity backup. Data recovery is achieved while the system and the data disk are normal, so that the virtualization system remains continuously available to users. The integrity of the data of the HCI virtualization system is ensured, and the maintainability of system operation is enhanced.
Drawings
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present invention, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic flow diagram of an embodiment of the method of the present invention;
FIG. 3 is a schematic diagram of the structure of the device of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
As shown in fig. 1, the method for recovering crash of the active/standby dual-node database of the present invention includes the following steps:
S1, preprocessing configuration parameters of a database backup path;
S2, separating the ics-manager service from the main/standby switching;
and S3, performing recovery operation on the database according to the state of the database service mariadb and the condition of database integrity backup.
In step S1, the backup path configuration parameters of the iCenter system database are reserved, and the path is /var/backup/up.
In step S2, the command heartbeat-disable ics-manager is executed to separate the ics-manager service from the main/standby switching.
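One way to picture step S2 is as a bracket around the whole recovery: the service is taken out of failover before the database is touched and put back only after recovery succeeds. The sketch below is an illustration under assumptions, not the patented implementation. The function names are hypothetical, and the re-attach command heartbeat-enable ics-manager is assumed by analogy with heartbeat-disable ics-manager.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_command(argv: Sequence[str]) -> None:
    """Default runner: execute the command and raise if it exits non-zero."""
    subprocess.run(list(argv), check=True)


@contextmanager
def detached_from_failover(
    run: Runner = run_command, service: str = "ics-manager"
) -> Iterator[None]:
    """Keep `service` out of main/standby switching while the block runs."""
    run(["heartbeat-disable", service])
    # If the block raises, the exception surfaces at this yield and the
    # re-attach below is skipped, so a half-recovered node stays detached.
    yield
    # Assumed re-attach command, mirroring heartbeat-disable.
    run(["heartbeat-enable", service])
```

Used as "with detached_from_failover(): ..." around the recovery steps, this keeps the heartbeat cluster from reacting to the database being dropped and re-imported. Leaving the service detached on failure is a deliberate choice in this sketch, so that an operator decides when a partially recovered node rejoins.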
As shown in fig. 2, step S3 is implemented as follows: systemctl status mariadb is executed, and whether the state of the database service mariadb is normal is judged from the result; active (running) indicates that the mariadb service is normal, otherwise the mariadb service is abnormal. If the service state is normal, the files under /var/backup are checked to judge whether the database has an integrity backup; if it does, the data recovery operation is started directly, and if it does not, the data recovery operation is performed according to the current environment. If the service state is abnormal, the service has crashed, and the crashed service is checked and analyzed.
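The three-way decision of step S3 can be sketched as a small pure function. This is a minimal illustration under assumptions: the status text is taken to be the output of systemctl status mariadb, "integrity backup" is interpreted as at least one non-empty *.sql.gz file in the backup directory, and all names (RecoveryPath, service_is_running, has_complete_backup, choose_recovery_path) are hypothetical.

```python
import re
from enum import Enum
from pathlib import Path


class RecoveryPath(Enum):
    RESTORE_FROM_BACKUP = "restore directly from the integrity backup"
    RECOVER_FROM_ENVIRONMENT = "recover according to the current environment"
    TROUBLESHOOT_CRASH = "check and analyze the crashed service"


# systemd prints a line such as "Active: active (running) since ..." for a healthy unit.
_ACTIVE_RUNNING = re.compile(r"^\s*Active:\s+active \(running\)", re.MULTILINE)


def service_is_running(status_output: str) -> bool:
    """True if the status text reports the unit as active (running)."""
    return _ACTIVE_RUNNING.search(status_output) is not None


def has_complete_backup(backup_dir: Path) -> bool:
    """Assumption: a backup counts as complete if a non-empty *.sql.gz dump exists."""
    return backup_dir.is_dir() and any(
        dump.stat().st_size > 0 for dump in backup_dir.glob("*.sql.gz")
    )


def choose_recovery_path(status_output: str, backup_complete: bool) -> RecoveryPath:
    """Map (service state, backup condition) to one of the three branches of S3."""
    if not service_is_running(status_output):
        return RecoveryPath.TROUBLESHOOT_CRASH
    if backup_complete:
        return RecoveryPath.RESTORE_FROM_BACKUP
    return RecoveryPath.RECOVER_FROM_ENVIRONMENT
```

Separating the decision from the file-system check keeps the branching logic testable without a running database; a caller would pass has_complete_backup(Path("/var/backup")) as the second argument.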
The specific method for directly starting the data recovery operation is as follows: the database is deleted by respectively executing drop database (database name), drop database neutron, and the deletion of mysql. The /var/backup database backup directory is entered, and the backup database file is decompressed with the gunzip command gunzip xxx.sql.gz; the backup data is imported into the database by executing mysql -uroot -ppassword mysql < xxx.sql, and after the database is recovered to be normal, the ics-manager service is restarted by executing systemctl restart ics-manager; the ics-manager service is then added to the heartbeat cluster by heartbeat-enable ics-manager.
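The ordering of the direct recovery (drop, decompress, import, restart, re-attach) can be sketched as a dry-run plan that only builds the command lines and executes nothing. This is an illustration under assumptions: the function name build_restore_plan is hypothetical, the garbled commands in the text are read as mysql -uroot, systemctl restart ics-manager and heartbeat-enable ics-manager, and the password is left to an interactive -p prompt instead of being embedded in the command.

```python
import re
import shlex
from pathlib import PurePosixPath
from typing import List, Sequence


def build_restore_plan(
    dump_gz: PurePosixPath,
    databases_to_drop: Sequence[str],
    user: str = "root",
    target_db: str = "mysql",
) -> List[str]:
    """Return the shell commands of the direct recovery, in execution order."""
    if dump_gz.suffix != ".gz":
        raise ValueError(f"expected a gzip-compressed dump, got {dump_gz}")
    for name in list(databases_to_drop) + [user, target_db]:
        # Refuse anything that is not a plain identifier, to avoid injection.
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unsafe identifier: {name!r}")

    dump_sql = dump_gz.with_suffix("")  # xxx.sql.gz -> xxx.sql
    plan = []
    for name in databases_to_drop:
        statement = f"DROP DATABASE IF EXISTS `{name}`;"
        plan.append(f"mysql -u{user} -p -e {shlex.quote(statement)}")
    plan.append(f"gunzip {shlex.quote(str(dump_gz))}")
    plan.append(f"mysql -u{user} -p {target_db} < {shlex.quote(str(dump_sql))}")
    plan.append("systemctl restart ics-manager")
    # Assumed command name, by analogy with heartbeat-disable.
    plan.append("heartbeat-enable ics-manager")
    return plan
```

Printing the plan before running it gives the operator a chance to review a destructive sequence; the drop statements are irreversible, so the plan is only meaningful once the backup has been confirmed to exist.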
The specific process of performing troubleshooting analysis on the crashed service comprises the following steps: backing up the database data directory (datadir = /var/mysql), the binary log (log-bin = /var/mysql/xxx.log) and the database logs; acquiring configuration information through the database configuration file, and viewing the service log /var/log/mariadb/mariadb.log; and calling a problem solution library according to the service log, and performing problem pairing and recovery.
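Reading datadir and log-bin from the database configuration file, then copying the data directory aside before any repair is attempted, might look like the sketch below. It is an illustration under assumptions: the function names are hypothetical, the settings are assumed to live in a [mysqld] section, and the configuration text is assumed to begin with a section header (a leading !includedir line would need to be stripped first).

```python
import configparser
import shutil
from pathlib import Path
from typing import Dict


def read_mysqld_paths(cnf_text: str) -> Dict[str, str]:
    """Extract datadir and log-bin from the [mysqld] section of a my.cnf-style text."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags such as skip-name-resolve
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat % literally
        delimiters=("=",),
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return {}
    section = parser["mysqld"]
    found = {}
    for key in ("datadir", "log-bin"):
        # The server accepts both dash and underscore spellings of option names.
        value = section.get(key) or section.get(key.replace("-", "_"))
        if value:
            found[key] = value
    return found


def backup_data_directory(datadir: Path, dest_root: Path) -> Path:
    """Copy the data directory aside so troubleshooting cannot make things worse."""
    dest = dest_root / f"{datadir.name}-pre-troubleshoot"
    shutil.copytree(datadir, dest)  # raises if dest already exists
    return dest
```

For the configuration quoted in the text, read_mysqld_paths yields /var/mysql for datadir and /var/mysql/xxx.log for log-bin. Copying rather than moving the directory leaves the crashed instance untouched for later inspection.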
As shown in fig. 3, the device for recovering from a crash of a main and standby dual-node database of the present invention includes a data preprocessing module 1, a service separation module 2, and a data recovery module 3. The data preprocessing module 1 preprocesses the configuration parameters of the database backup path; the service separation module 2 separates the ics-manager service from the main/standby switching; and the data recovery module 3 performs the database recovery operation according to the state of the database service mariadb and the condition of database integrity backup.
The data recovery module 3 includes a state judgment unit 31, a backup integrity judging unit 32, a first data recovery unit 33, a second data recovery unit 34, and an analysis and investigation unit 35. The state judgment unit 31 is used for judging whether the state of the database service mariadb is normal; the backup integrity judging unit 32 is used for judging whether the database has an integrity backup; the first data recovery unit 33 is configured to perform a data recovery operation when the database service state is normal and the backup is complete; the second data recovery unit 34 is configured to perform a data recovery operation according to the current environment when the database service state is normal and the backup is incomplete; the analysis and investigation unit 35 is configured to perform troubleshooting analysis on the crashed service when the database service state is abnormal.
The analysis and investigation unit 35 includes a data backup subunit 351, an information acquisition subunit 352, and a data recovery subunit 353. The data backup subunit 351 backs up the database data directory and the database log; the information acquisition subunit 352 obtains the configuration information through the database configuration file, and checks the service log; the data recovery subunit 353 calls the problem solution library according to the service log, and performs problem pairing and recovery.
The first data recovery unit 33 includes a first data processing subunit 331, a second data processing subunit 332, a service recovery subunit 333, and a service configuration subunit 334. The first data processing subunit 331 is configured to delete the database; the second data processing subunit 332 enters the database backup directory and decompresses the backup database file; the service recovery subunit 333 imports the backup data into the database, and restarts the ics-manager service after the database is recovered to be normal; the service configuration subunit 334 adds the ics-manager service to the heartbeat cluster.
The foregoing is only a preferred embodiment of the present invention and it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principle of the present invention and are intended to be included within the scope of the present invention.

Claims (7)

1. A recovery method for crash of a main and standby dual-node database is characterized by comprising the following steps:
preprocessing the configuration parameters of the backup path of the database;
separating the ics-manager service from the main/standby switching;
according to the state of the database service mariadb and the condition of database integrity backup, performing recovery operation on the database;
the specific operation of restoring the database according to the state of the database service mariadb and the condition of database integrity backup is as follows:
judging whether the state of the database service mariadb is normal or not;
if yes, judging whether the database has an integrity backup; if it does, directly starting the data recovery operation, and if it does not, performing the data recovery operation according to the current environment;
if not, the service has crashed, and the crashed service is checked and analyzed.
2. The method according to claim 1, wherein the specific process of directly starting the data recovery operation is as follows:
deleting the database;
entering the database backup directory, and decompressing the backup database file;
importing the backup data into the database, and restarting the ics-manager service after the database is recovered to be normal;
adding the ics-manager service to the heartbeat cluster.
3. The method for recovering from crash of active/standby dual-node database according to claim 2, wherein said deleting the database includes deleting the database by name, deleting the database neutron, and deleting mysql.
4. The method for recovering crash of active/standby dual-node database according to claim 1, wherein the specific process of performing troubleshooting analysis on crash service is as follows:
backing up a database data directory and a database log;
acquiring configuration information through a database configuration file, and checking a service log;
and calling a problem solution library according to the service log, and performing problem pairing and recovery.
5. A device for recovering crash of a main and standby dual-node database is characterized by comprising:
the data preprocessing module is used for preprocessing the configuration parameters of the backup path of the database;
the service separation module is used for separating the ics-manager service from the main/standby switching;
the data recovery module is used for performing recovery operation on the database according to the state of the database service mariadb and the condition of database integrity backup;
the data recovery module comprises:
the state judgment unit is used for judging whether the state of the database service mariadb is normal or not;
the backup integrity judging unit is used for judging whether the database has an integrity backup;
the first data recovery unit is used for performing data recovery operation when the service state of the database is normal and the backup is complete;
the second data recovery unit is used for performing data recovery operation according to the current environment when the service state of the database is normal and the backup is incomplete;
and the analysis and investigation unit is used for carrying out investigation and analysis on the crash service when the service state of the database is abnormal.
6. The apparatus according to claim 5, wherein the analysis and investigation unit comprises:
the data backup subunit is used for backing up the database data directory and the database log;
the information acquisition subunit acquires the configuration information through the database configuration file and checks the service log;
and the data recovery subunit calls the problem solution library according to the service log, and performs problem pairing and recovery.
7. The apparatus for recovering from a crash of an active/standby dual-node database according to claim 5, wherein said first data recovery unit comprises:
the first data processing subunit is used for deleting the database;
the second data processing subunit enters the database backup directory and decompresses the backup database file;
the service recovery subunit is used for importing the backup data into the database and restarting the ics-manager service after the database is recovered to be normal;
and the service configuration subunit adds the ics-manager service to the heartbeat cluster.
CN201911391020.5A 2019-12-30 2019-12-30 Method and device for recovering crash of main and standby dual-node databases Active CN111209140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911391020.5A CN111209140B (en) 2019-12-30 2019-12-30 Method and device for recovering crash of main and standby dual-node databases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911391020.5A CN111209140B (en) 2019-12-30 2019-12-30 Method and device for recovering crash of main and standby dual-node databases

Publications (2)

Publication Number Publication Date
CN111209140A (en) 2020-05-29
CN111209140B (en) 2023-01-06

Family

ID=70787744

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911391020.5A Active CN111209140B (en) 2019-12-30 2019-12-30 Method and device for recovering crash of main and standby dual-node databases

Country Status (1)

Country Link
CN (1) CN111209140B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10282256B1 (en) * 2013-06-15 2019-05-07 Veritas Technologies Llc System and method to enable deduplication engine to sustain operational continuity
CN107291787A (en) * 2016-04-13 2017-10-24 中兴通讯股份有限公司 Master/slave database switching method and apparatus
CN106407045A (en) * 2016-09-29 2017-02-15 郑州云海信息技术有限公司 Data disaster recovery method and system, and server virtualization system

Also Published As

Publication number Publication date
CN111209140A (en) 2020-05-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant