CN113986618B - Cluster split-brain automatic repair method, system, device and storage medium - Google Patents

Info

Publication number
CN113986618B
Authority
CN
China
Prior art keywords
model
repair
target
data set
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111315155.0A
Other languages
Chinese (zh)
Other versions
CN113986618A (en)
Inventor
刘晓健
苏宝珠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111315155.0A
Publication of CN113986618A
Application granted
Publication of CN113986618B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1415 Saving, restoring, recovering or retrying at system level
    • G06F11/1435 Saving, restoring, recovering or retrying at system level using file system or storage system metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1471 Saving, restoring, recovering or retrying involving logging of persistent data for recovery
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a method, system, device and storage medium for automatically repairing split-brain in a MySQL Galera cluster. The method comprises the following steps: when it is detected that a split-brain problem of the MySQL Galera cluster has not recovered within a preset time period, acquiring a container log and fault information corresponding to the stage at which the split-brain problem occurred; selecting a target repair model from a preset model library, based on a LogME evaluation model, according to the container log and the fault information; and, if the target repair model meets a preset requirement, repairing the split-brain problem of the MySQL Galera cluster with the target repair model. With this scheme, when the model library contains no repair model that corresponds to the container log and fault information of the split-brain problem, a target repair model is still selected from the model library based on the LogME evaluation, so that a fault repair model is chosen automatically for the split-brain problem.

Description

Cluster split-brain automatic repair method, system, device and storage medium
Technical Field
The application relates to a method, system, device and storage medium for automatically repairing cluster split-brain, and belongs to the technical field of automatic split-brain repair for MySQL Galera clusters.
Background
As the data traffic of AI platforms and deep learning tasks grows, so does the pressure on data storage. In particular, as platform clusters are increasingly deployed for high availability, the reliability requirements on MySQL rise as well. The MySQL Galera cluster addresses this need: when the AI platform is deployed in high-availability mode, MySQL is deployed on three master nodes that are peers of one another. This multi-master architecture allows data to be written to multiple nodes at the same time, and when one node fails, another node can take its place to provide service, so that the consistency of the cluster data is ensured.
However, in the event of a platform server fault, unexpected power failure, network jitter, abnormal service and the like, the MySQL Galera cluster may enter a split-brain state, leaving the cluster data inconsistent, and in this case the cluster cannot automatically return to normal. When this happens, the underlying services of the AI platform cannot connect to the database properly and business data cannot be read from or written to the database, so the overall impact is large.
In the prior art, a MySQL Galera cluster is repaired by recording two or three representative database faults and their repair methods in a recovery script. When one of the specified problems occurs in the database, a repair scheme is selected according to the correspondence between database faults and recovery methods recorded in the recovery script, and the fault is repaired accordingly. However, when a problem indicated by the fault log information is not covered by the recovery script, the system cannot recover automatically.
Disclosure of Invention
The application provides a method, system, device and storage medium for automatically repairing cluster split-brain, which solve the prior-art technical problem that the system cannot recover automatically when a problem indicated by the fault log information is not covered by the recovery script.
In a first aspect, an embodiment of the present application provides a method for automatically repairing cluster split-brain, including:
when it is detected that a split-brain problem of the MySQL Galera cluster has not recovered within a preset time period, acquiring a container log and fault information corresponding to the stage at which the split-brain problem occurred;
selecting a target repair model from a preset model library, based on a pre-trained LogME evaluation model, according to the container log and the fault information;
and, if the target repair model meets a preset requirement, repairing the split-brain problem of the MySQL Galera cluster with the target repair model.
Preferably, the method further comprises:
generating a target data set (targetData) according to the container log and the fault information;
the selecting of a target repair model from a preset fault repair model library, based on the pre-trained LogME evaluation model, according to the container log and the fault information comprises the following steps:
feeding the target data set into each stored fault repair model to obtain a repair result file corresponding to each fault repair model;
and evaluating the repair result files based on the pre-trained LogME evaluation model, and selecting the target repair model corresponding to the best repair result file.
Preferably, the repairing of the split-brain problem of the MySQL Galera cluster with the target repair model, if the target repair model meets the preset requirement, includes:
selecting a preset number of first feature points from the target data set to form a first feature point set, and determining a second feature point set corresponding to the first feature point set based on the fault repair model corresponding to the best repair result;
determining the variance of the first feature point set and the second feature point set using a variance calculation model;
when the variance of the first feature point set and the second feature point set meets a preset variance threshold, repairing the split-brain problem with the target repair model.
Preferably, the variance calculation model is:
where $\mu_i$ is the i-th feature point selected from the target data set, $X_i$ is the feature point that the target repair model produces for the i-th feature point in the target data set, $n$ is the number of feature points selected from the target data set, and $S^2$ is the feature point variance.
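The formula itself does not survive in this text. One form that is consistent with the symbol definitions above, assuming normalization by $n$ (the text does not show whether $n$ or $n-1$ is used), would be:

```latex
S^2 = \frac{1}{n} \sum_{i=1}^{n} \left( X_i - \mu_i \right)^2
```

Under this reading, $S^2$ measures how far the feature points produced by the repair model deviate from the corresponding feature points of the target data set.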
Preferably, the method further comprises:
if the target repair model does not meet the preset requirement, reporting the target data set and receiving a repair scheme for the target data set;
training repair scheme model information according to the received repair scheme for the target data set, and storing the target data set and the trained repair scheme model information in the model library.
In a second aspect, an embodiment of the present application provides an automatic repair system for cluster split-brain, the system including:
an acquisition module, used for acquiring, when it is detected that a split-brain problem of the MySQL Galera cluster has not recovered within a preset time period, a container log and fault information corresponding to the stage at which the split-brain problem occurred;
a target repair scheme selection module, used for selecting a target repair model from a preset model library, based on a pre-trained LogME evaluation model, according to the container log and the fault information;
and an automatic repair module, used for repairing the split-brain problem of the MySQL Galera cluster with the target repair model if the target repair model meets a preset requirement.
Preferably, the system further comprises:
a target data set generation module, used for generating a target data set according to the container log and the fault information;
the target repair scheme selection module includes:
a repair result generating unit, used for feeding the target data set into each stored fault repair model to obtain a repair result file corresponding to each fault repair model;
and a target repair model determining unit, used for evaluating the repair result files based on the pre-trained LogME evaluation model and selecting the target repair model corresponding to the best repair result file.
Preferably, the system further comprises:
a reporting unit, used for reporting the target data set and receiving a repair scheme for the target data set if the target repair model does not meet the preset requirement;
a repair scheme model training unit, used for training repair scheme model information according to the received repair scheme for the target data set, and storing the target data set and the trained repair scheme model information in the model library.
In a third aspect, an embodiment of the present application provides an automatic repair device for MySQL Galera cluster split-brain, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program is loaded and executed by the processor to implement the automatic repair method for cluster split-brain according to any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the automatic repair method for cluster split-brain according to any one of the above embodiments.
The application has the following beneficial effects:
according to the method, system, device and storage medium for automatically repairing MySQL Galera cluster split-brain provided by the embodiments of the application, when it is detected that a split-brain problem of the MySQL Galera cluster has not recovered within a preset time period, the container log and fault information corresponding to the stage at which the split-brain problem occurred are acquired; a target repair model is selected from a preset fault repair model library, based on a pre-trained LogME evaluation model, according to the container log and the fault information; and the split-brain problem of the MySQL Galera cluster is repaired with the target repair model. With this scheme, when the model library contains no repair model that corresponds to the container log and fault information of the split-brain problem, a target repair model is still selected from the model library based on the pre-trained LogME evaluation model, so that a repair model is chosen automatically for the split-brain problem.
In addition, in the embodiments of the application, when the target repair model selected from the preset fault repair model library based on the pre-trained LogME evaluation model does not meet the preset requirement, the target data set corresponding to the current split-brain problem is reported, and the relevant personnel can devise a fault repair scheme according to the target data set. After the fault repair scheme devised by the relevant personnel is received, repair scheme model information is trained for the target data set, and the target data set and the trained repair scheme model information are stored in the model library. When a split-brain problem of the same type occurs again, it can be repaired automatically based on the repair scheme model information trained for that target data set.
The foregoing is only an overview of the technical solution of the present application. To make the technical means of the application easier to understand and to implement according to the contents of the specification, preferred embodiments of the application are described in detail below with reference to the accompanying drawings.
Drawings
FIGS. 1-4 are flowcharts of an automatic repair method for cluster split-brain according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an automatic repair system for cluster split-brain according to another embodiment;
FIG. 6 is a block diagram of an automatic repair device for cluster split-brain according to an embodiment of the present application.
Detailed Description
The following examples are illustrative of the application and are not intended to limit the scope of the application.
The automatic split-brain repair method is suited to the situation in which a split-brain problem occurs in a system and cannot be repaired automatically. In the traditional scheme, a script file is provided that records the database faults corresponding to split-brain problems together with the recovery method for each database fault, and when a split-brain problem occurs, it is repaired using the recovery method stored in the script file for that database fault. However, if a new fault occurs that is not among the database faults stored in the script file, the existing technical solution cannot repair the split-brain problem automatically.
In the present scheme, a script file or storage module is likewise provided, and it likewise stores the correspondence between database faults and repair models. When a split-brain problem occurs, the corresponding fault repair model is obtained according to the database fault behind the split-brain problem, and the selected fault repair model is used to repair the database fault. When a new split-brain problem occurs, it is fed into each model stored in the script file or storage module, and a target model is then selected using a pre-trained LogME model, so that automatic repair can still be achieved. When the fault repair model selected using the LogME model does not meet the requirement, the fault information is reported, and the relevant personnel (for example, operations staff) can devise a fault repair scheme according to the fault information. After the fault repair scheme is received, the system trains a fault repair model corresponding to the reported target data set and stores it in the script file or storage module. When the same target data set is detected again, the fault can be repaired with the stored, trained fault repair model without further involvement of the relevant personnel, which achieves intelligent fault repair. The automatic repair scheme for MySQL Galera cluster split-brain disclosed by the application is explained in detail below.
Fig. 1 shows a flowchart of an automatic repair method for MySQL Galera cluster split-brain in an embodiment of the present application. As shown in Fig. 1, the method includes:
Step S12: when it is detected that a split-brain problem of the MySQL Galera cluster has not recovered within a preset time period, acquiring a container log and fault information corresponding to the stage at which the split-brain problem occurred;
In the embodiment of the application, a target time period can be set, with a length of, for example, 30 s, 1 min or 90 s. When it is detected that the split-brain problem of the MySQL Galera cluster has not been repaired automatically within the target time period, the split-brain problem is classified as one for which no corresponding fault repair model can be determined from the correspondence between fault information and fault repair models stored in the script file or storage module; that is, the fault cannot be repaired with the prior-art scheme. Here, the correspondence between fault information and fault repair models is the mapping, stored in the script file or storage module, from each item of fault information to the fault repair model for that fault information.
In the embodiment of the application, when the split-brain problem of the MySQL Galera cluster has not recovered within the preset time period, the container log and the fault information corresponding to the period of the unrecovered split-brain problem are acquired.
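The detection in step S12 can be sketched as a polling loop with a deadline. This is only an illustrative sketch: the function name, its parameters and the injected `is_split_brain` probe are hypothetical and do not come from the application. In a real deployment the probe might, for instance, check whether Galera's `wsrep_cluster_status` status variable reports a non-Primary component, but that choice is an assumption here.

```python
import time
from typing import Callable


def split_brain_unrecovered(
    is_split_brain: Callable[[], bool],
    timeout_s: float = 30.0,
    poll_interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the split-brain state persists for the whole window.

    `is_split_brain` is a hypothetical probe supplied by the caller.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if not is_split_brain():
            # The cluster recovered on its own; no logs need to be collected.
            return False
        sleep(poll_interval_s)
    # Window elapsed: report whether the cluster is still split.
    return is_split_brain()
```

When the function returns True, the caller would go on to collect the container log and fault information for the incident.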
Step S14: selecting a target repair model from a preset model library, based on a pre-trained LogME evaluation model, according to the container log and the fault information;
In the embodiment of the application, after the container log and the fault information corresponding to the unrecovered split-brain problem are obtained, they are first fed into each fault repair model stored in the script file or storage module, the split-brain problem is repaired with each stored fault repair model, and a fault repair result is obtained for each fault repair model. The trained LogME evaluation model is then used to evaluate the fault repair results, and the fault repair model with the highest score is taken as the target repair model.
Step S16: if the target repair model meets the preset requirement, repairing the split-brain problem of the MySQL Galera cluster with the target repair model.
In the embodiment of the application, after the target repair model is determined, it is further judged whether the target repair model meets the preset requirement, and when it does, the target repair model is used to repair the split-brain problem of the MySQL Galera cluster. As one implementation, the preset requirement is that the variance obtained with the target repair model meets a set variance threshold; if so, the target repair model is determined to meet the preset requirement.
In an embodiment of the present application, referring to Fig. 2, the method further includes:
Step S13: generating a target data set (targetData) according to the container log and the fault information;
In this method, after the container log and fault information corresponding to the unrecovered split-brain problem are obtained, a target data set (targetData) is first generated from them. The target data set can then be fed into the fault repair models stored in the script file or storage module to obtain the result of each stored preset fault repair model repairing the target data set.
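The application does not describe the layout of targetData. As a purely hypothetical sketch, step S13 could bundle the fault information with the log lines that look relevant; the `TargetData` class, the `build_target_data` function and the keyword filter are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class TargetData:
    """Hypothetical container for one split-brain incident."""
    fault_info: Dict[str, str]
    log_lines: List[str] = field(default_factory=list)


def build_target_data(
    container_log: str,
    fault_info: Dict[str, str],
    keywords: Sequence[str] = ("wsrep", "ERROR", "WARN"),
) -> TargetData:
    # Keep only non-empty log lines that mention one of the keywords.
    kept = [
        line.strip()
        for line in container_log.splitlines()
        if line.strip() and any(k in line for k in keywords)
    ]
    # Copy the fault information so later edits do not affect the data set.
    return TargetData(fault_info=dict(fault_info), log_lines=kept)
```

A real implementation would likely turn these lines into numeric features before passing them to the stored models.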
Accordingly, in step S14, selecting a target repair model from a preset fault repair model library, based on the pre-trained LogME evaluation model, according to the container log and the fault information includes:
Step S141: feeding the target data set into each stored fault repair model to obtain a repair result file corresponding to each fault repair model;
Step S142: evaluating the repair result files based on the pre-trained LogME evaluation model, and selecting the target repair model corresponding to the best repair result file.
In the embodiment of the application, the process of selecting the target repair model from the preset fault repair model library based on the pre-trained LogME evaluation model is roughly divided into two steps. The first step is to substitute the target data set into each fault repair model in the preset fault repair model library in turn to obtain a repair result file corresponding to each fault repair model, where the repair result file may be a numerical value. The substitution can be carried out as follows: the target data set (targetData) corresponding to the split-brain problem is fed into each pre-stored fault repair model and a model test is run (for example, by executing a test command such as python -u /logme/trace_sample_cnn_lm_v1.01/descriptions/lm.py -model x lvgg16 -feature 20-1000 -batch_size 256 -data_dir=/targetData) to obtain model_check_res; that is, a repair result model_check_res is obtained for each stored fault repair model.
The second step follows once a repair result file has been obtained for each fault repair model: the repair result of each preset fault repair model is evaluated based on the pre-trained LogME model, and the fault repair model with the best repair result is determined to be the target repair model. After the target repair model is determined, it must still be judged against the preset requirement, and only when it meets the preset requirement is it used to repair the split-brain problem.
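The two steps above amount to running every stored model on the target data set and keeping the one whose result scores highest. The sketch below illustrates only that selection logic. The `score` callable is a stand-in for the LogME evaluation and does not implement LogME, and representing the model library as a dictionary of callables is an assumption.

```python
from typing import Any, Callable, Dict, Tuple


def select_target_model(
    target_data: Any,
    model_library: Dict[str, Callable[[Any], Any]],
    score: Callable[[Any], float],
) -> Tuple[str, float]:
    """Return the name and score of the best-scoring stored model.

    `score` is a placeholder for the LogME-based evaluation of a
    repair result; higher is assumed to be better.
    """
    if not model_library:
        raise ValueError("model library is empty")
    # Step S141: run every stored model on the target data set.
    results = {name: model(target_data) for name, model in model_library.items()}
    # Step S142: score each repair result and keep the highest.
    scores = {name: score(result) for name, result in results.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

The returned model is only a candidate; it is still subject to the preset-requirement check before being applied.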
In the embodiment of the present application, referring to Fig. 3, selecting the target repair model in step S142 includes:
Step S1421: comparing the repair results to determine the best repair result file;
As a specific embodiment, after the target data set is fed into the preset fault repair models and a repair result is obtained for each of them, the repair result values are compared, and the repair result with the largest value is determined to be the best repair result.
Step S1422: selecting a preset number of first feature points from the target data set to form a first feature point set, and determining a second feature point set corresponding to the first feature point set based on the fault repair model corresponding to the best repair result;
In the embodiment of the application, after the best repair model is determined, a preset number of first feature points are selected from the target data set (targetData), which was generated from the container log and fault information of the stage at which the unrecoverable split-brain problem occurred, to form the first feature point set. Each first feature point is then fed, one by one, into the fault repair model corresponding to the best repair result to obtain the second feature point corresponding to that first feature point, and all the second feature points together form the second feature point set.
In the embodiment of the present application, in step S16, repairing the split-brain problem of the MySQL Galera cluster with the target repair model if the target repair model meets the preset requirement includes:
Step S161: selecting a preset number of first feature points from the target data set to form a first feature point set, and determining a second feature point set corresponding to the first feature point set based on the fault repair model corresponding to the best repair result;
Step S162: determining the variance of the first feature point set and the second feature point set using a variance calculation model;
After the second feature point set corresponding to the first feature point set is determined, the variance of the first feature point set and the second feature point set is calculated using the variance calculation model.
Step S163: when the variance of the first feature point set and the second feature point set meets a preset variance threshold, repairing the split-brain problem with the target repair model.
In the embodiment of the application, after the variance of the first feature point set and the second feature point set is calculated, it is judged whether the calculated variance meets the preset variance threshold. If so, the fault repair model corresponding to the best repair result is taken as the target repair model; otherwise, that fault repair model cannot be used as the target repair model even though it was ranked best according to the scores.
Further, the calculation model for the variance of the first feature point set and the second feature point set is:
where $\mu_i$ is the i-th first feature point obtained from the target data set, $X_i$ is the i-th second feature point that the best repair model produces for the i-th first feature point in the target data set, $n$ is the number of feature points selected from the target data set, and $S^2$ is the variance of the first feature point set and the second feature point set.
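The variance check of steps S162 and S163 can be sketched as follows. The formula is not reproduced in this text, so the sketch assumes the mean squared deviation between paired feature points, normalized by n. It also assumes that feature points are plain floats and that meeting the threshold means the variance does not exceed it. The function names are hypothetical.

```python
from typing import Sequence


def feature_variance(first: Sequence[float], second: Sequence[float]) -> float:
    """Assumed form: S^2 = (1/n) * sum((X_i - mu_i)^2).

    `first` holds the mu_i taken from the target data set and `second`
    holds the X_i produced by the candidate repair model.
    """
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("feature point sets must be non-empty and equal in length")
    n = len(first)
    return sum((x - mu) ** 2 for mu, x in zip(first, second)) / n


def meets_requirement(
    first: Sequence[float], second: Sequence[float], threshold: float
) -> bool:
    # Assumption: a smaller variance means the model output stays closer
    # to the target data, so the model passes when S^2 <= threshold.
    return feature_variance(first, second) <= threshold
```

For example, with first points [1.0, 2.0, 3.0] and second points [1.0, 2.0, 5.0], the only deviation is 2.0 on the third pair, so the assumed variance is 4/3.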
In an embodiment of the present application, referring to Fig. 4, the method further includes:
Step S181: if the target repair model does not meet the preset requirement, reporting the target data set and receiving a repair scheme for the target data set;
Step S182: training repair scheme model information according to the received repair scheme for the target data set, and storing the target data set and the trained repair scheme model information in the model library.
In the embodiment of the application, the LogME model is used to evaluate the test result files. If none of the stored preset fault repair models provides a fault repair scheme that meets the requirement, the target data set is reported, typically to a management end or a server. After receiving the reported target data set, a staff member at the management end or server analyzes it and manually selects or inputs a repair scheme for the target data set according to the analysis result.
After the repair scheme selected or manually input by the staff member at the management end or server is received, a model corresponding to the repair scheme is trained, yielding a fault repair model for the target data set, which is stored in the model library.
When the same split-brain problem occurs again, the fault repair model can be selected from the pre-stored fault repair models according to the target data set, so that automatic repair is achieved.
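The fallback path of steps S181 and S182 can be sketched as a report, train and store sequence. Everything named here is hypothetical: `report` stands for sending the target data set to the management end and waiting for the operator's repair scheme, `train` stands for whatever training procedure turns that scheme into a model, and a plain dictionary plays the role of the model library.

```python
from typing import Any, Callable, Dict, Hashable, Tuple

# Hypothetical library layout: incident key -> (target data set, trained model).
Library = Dict[Hashable, Tuple[Any, Any]]


def learn_from_operator(
    target_key: Hashable,
    target_data: Any,
    library: Library,
    report: Callable[[Any], Any],
    train: Callable[[Any, Any], Any],
) -> Any:
    # Step S181: report the target data set and receive a repair scheme.
    scheme = report(target_data)
    # Step S182: train a model for this scheme and store it with the data set.
    model = train(target_data, scheme)
    library[target_key] = (target_data, model)
    return model
```

On a later incident with the same key, the stored model can be looked up directly, so the operator is consulted only once per kind of problem.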
In summary,
according to the method, system, device and storage medium for automatically repairing MySQL Galera cluster split-brain provided by the embodiments of the application, when it is detected that a split-brain problem of the MySQL Galera cluster has not recovered within a preset time period, the container log and fault information corresponding to the stage at which the split-brain problem occurred are acquired; a target repair model is selected from a preset fault repair model library, based on a pre-trained LogME evaluation model, according to the container log and the fault information; and the split-brain problem of the MySQL Galera cluster is repaired with the target repair model. With this scheme, when the model library contains no repair model that corresponds to the container log and fault information of the split-brain problem, a target repair model is still selected from the model library based on the pre-trained LogME evaluation model, so that a repair model is chosen automatically for the split-brain problem.
In addition, in the embodiments of the application, when the target repair model selected from the preset fault repair model library based on the pre-trained LogME evaluation model does not meet the preset requirement, the target data set corresponding to the current split-brain problem is reported, and the relevant personnel can devise a fault repair scheme according to the target data set. After the fault repair scheme devised by the relevant personnel is received, repair scheme model information is trained for the target data set, and the target data set and the trained repair scheme model information are stored in the model library. When a split-brain problem of the same type occurs again, it can be repaired automatically based on the repair scheme model information trained for that target data set.
Therefore, with this scheme, for a split-brain problem that cannot be repaired automatically, a fault repair scheme needs to be received from the management end or server only once; when the same split-brain problem occurs again, it can be repaired with the fault repair model trained from the received fault repair scheme. Over time, the number of fault repair models trained from received fault repair schemes grows, that is, the number of pre-stored fault repair models grows. The scheme thus has a self-learning capability and can repair more and more kinds of split-brain problems.
Fig. 5 is a block diagram of a MySQL Galera cluster split-brain automatic repair system according to an embodiment of the present application; the system uses the MySQL Galera cluster split-brain automatic repair method shown in Fig. 1. The system includes at least the following modules:
the obtaining module 51 is used for obtaining the container log and fault information corresponding to the stage in which the split-brain problem of the MySQL Galera cluster occurred, when it is detected that the split-brain problem has not recovered within a preset time period;
the target repair scheme selection module 52 is configured to select a target repair model from a preset model library based on a pre-trained LogME evaluation model according to the container log and the fault information;
and the automatic repair module 53 is configured to repair the split-brain problem of the MySQL Galera cluster by using the target repair model if the target repair model meets a preset requirement.
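The cooperation of the three modules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: every function and parameter name (`wait_for_recovery`, `run_repair_pipeline`, `is_split_brain`, `collect_diagnostics`, and so on) is hypothetical, and the split-brain probe is left as a caller-supplied callable. In a real Galera deployment such a probe might, for instance, inspect the `wsrep_cluster_status` status variable, but the text does not specify the detection mechanism.

```python
import time
from typing import Any, Callable, Tuple


def wait_for_recovery(
    is_split_brain: Callable[[], bool],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the split-brain state clears within timeout_s seconds."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        if not is_split_brain():
            return True
        sleep(poll_s)
    # One final probe once the preset time period has elapsed.
    return not is_split_brain()


def run_repair_pipeline(
    is_split_brain: Callable[[], bool],
    collect_diagnostics: Callable[[], Tuple[Any, Any]],
    select_model: Callable[[Any, Any], Any],
    meets_requirement: Callable[[Any], bool],
    apply_model: Callable[[Any], None],
    timeout_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Hypothetical end-to-end flow mirroring modules 51, 52 and 53."""
    if wait_for_recovery(is_split_brain, timeout_s, poll_s, clock, sleep):
        return "self-recovered"
    # Module 51: container log and fault information for the failure stage.
    container_log, fault_info = collect_diagnostics()
    # Module 52: choose a target repair model from the model library.
    model = select_model(container_log, fault_info)
    # Module 53: repair only if the model meets the preset requirement.
    if meets_requirement(model):
        apply_model(model)
        return "repaired"
    return "needs-manual-scheme"
```

Injecting `clock` and `sleep` keeps the timeout logic testable without real waiting; a deployment would simply rely on the defaults.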
In an embodiment of the present application, the system further includes:
the target data set generation module is used for generating a target data set according to the container log and the fault information;
the target repair scheme selection module includes:
the recovery result generating unit is used for feeding the target data set into each stored fault recovery model to obtain a recovery result file corresponding to each fault recovery model;
and the target repair model determining unit is used for evaluating the recovery result file based on a pre-trained LogME evaluation model and selecting a target repair model corresponding to the best recovery result file.
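The selection step performed by these two units can be sketched as below. The sketch assumes that each stored fault recovery model is a callable that maps the target data set to a recovery result, and that a higher evaluation score means a better result. The `score_result` callable is only a stand-in for the pre-trained LogME evaluation model, whose actual interface is not described in the text; all names here are hypothetical.

```python
from typing import Any, Callable, Dict, Tuple


def select_target_model(
    target_dataset: Any,
    model_library: Dict[str, Callable[[Any], Any]],
    score_result: Callable[[Any, Any], float],
) -> Tuple[str, Any, float]:
    """Run every stored model on the data set and keep the best-scoring one.

    Returns (model name, recovery result, score). `score_result` stands in
    for the LogME evaluation model and is assumed to return higher values
    for better recovery results.
    """
    if not model_library:
        raise ValueError("model library is empty")
    best_name, best_result, best_score = None, None, float("-inf")
    for name, model in model_library.items():
        result = model(target_dataset)                # recovery result for this model
        score = score_result(target_dataset, result)  # evaluation of that result
        if score > best_score:
            best_name, best_result, best_score = name, result, score
    return best_name, best_result, best_score


# Usage with toy models and a toy scorer (both purely illustrative).
library = {
    "shift": lambda data: [x + 1 for x in data],
    "identity": lambda data: list(data),
}
toy_score = lambda data, result: -sum(abs(a - b) for a, b in zip(data, result))
name, result, score = select_target_model([1, 2, 3], library, toy_score)
```

Keeping the scorer as a parameter makes it possible to swap in a real evaluation model later without changing the selection loop.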
Further, the automatic repair module includes:
the feature point extraction subunit is used for selecting a preset number of first feature points from the target data set to form a first feature point set, and determining a second feature point set corresponding to the first feature point set based on the fault recovery model corresponding to the best recovery result;
the variance calculating subunit is used for determining the variance of the first feature point set and the second feature point set by adopting a variance calculation model;
and the target repair scheme determining subunit is used for repairing the split-brain problem by adopting the target repair model when the variance of the first feature point set and the second feature point set meets a preset variance threshold.
Further, the variance calculation subunit adopts a variance calculation model, which is:
wherein μ_i is the i-th feature point selected from the target data set, X_i is the feature point obtained after the i-th feature point in the target data set is repaired by the target repair model, n is the number of feature points selected from the target data set, and S^2 is the feature point variance.
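The variance formula itself is not reproduced in this text. One form that is consistent with the symbol definitions above is the mean squared deviation between each repaired feature point and its original counterpart, S^2 = (1/n) Σ (X_i − μ_i)^2. The sketch below uses that assumed form; both the formula and the reading of "meets the preset variance threshold" as "is less than or equal to the threshold" are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def feature_point_variance(mu: Sequence[float], x: Sequence[float]) -> float:
    """Assumed form: S^2 = (1/n) * sum((X_i - mu_i)^2).

    mu holds the feature points selected from the target data set;
    x holds the corresponding feature points after repair.
    """
    if len(mu) != len(x) or len(mu) == 0:
        raise ValueError("mu and x must be non-empty and of equal length")
    n = len(mu)
    return sum((xi - mi) ** 2 for mi, xi in zip(mu, x)) / n


def meets_variance_threshold(
    mu: Sequence[float], x: Sequence[float], threshold: float
) -> bool:
    """Assumption: the requirement is met when S^2 does not exceed the threshold."""
    return feature_point_variance(mu, x) <= threshold


# Illustrative values only.
original_points = [1.0, 2.0, 3.0]
repaired_points = [1.0, 2.0, 5.0]
s2 = feature_point_variance(original_points, repaired_points)
```

With these illustrative values only the third point deviates (by 2), so the assumed formula gives S^2 = 4/3; a threshold of 2.0 would accept the target repair model and a threshold of 1.0 would reject it.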
Still further, the system further comprises:
the reporting unit is used for reporting the target data set and receiving a repairing scheme aiming at the target data set if the target repairing model does not meet the preset requirement;
the repair scheme model training unit is used for training repair scheme model information according to the received repair scheme aiming at the target data set, and storing the target data set and the trained repair scheme model information into the model library.
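The reporting and training units describe a fallback that lets the model library grow over time. A minimal sketch of that loop follows, assuming the library is an in-memory mapping and that the reporting channel and the training procedure are caller-supplied callables. `ModelLibrary`, `escalate_and_learn`, `request_scheme` and `train_model` are hypothetical names; the text does not specify how a repair scheme is turned into a trained model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class ModelLibrary:
    """Hypothetical store pairing each target data set with its trained model."""
    entries: Dict[str, Tuple[Any, Callable[[Any], Any]]] = field(default_factory=dict)

    def add(self, key: str, dataset: Any, model: Callable[[Any], Any]) -> None:
        self.entries[key] = (dataset, model)


def escalate_and_learn(
    library: ModelLibrary,
    key: str,
    target_dataset: Any,
    request_scheme: Callable[[Any], Any],
    train_model: Callable[[Any, Any], Callable[[Any], Any]],
) -> Callable[[Any], Any]:
    """Report the data set, receive a repair scheme, train and store a model."""
    scheme = request_scheme(target_dataset)      # reporting unit
    model = train_model(target_dataset, scheme)  # repair scheme model training unit
    library.add(key, target_dataset, model)      # stored for the next occurrence
    return model


# Usage with stand-in callables (purely illustrative).
lib = ModelLibrary()
learned = escalate_and_learn(
    lib,
    "case-1",
    [1, 2],
    request_scheme=lambda dataset: "restart-node",
    train_model=lambda dataset, scheme: (lambda data: scheme),
)
```

Once an entry is stored, a later occurrence of the same fault type can be handled by the normal selection step without a second escalation, which is the self-learning behaviour the description refers to.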
It should be noted that the MySQL Galera cluster split-brain automatic repair system provided in the above embodiment and the MySQL Galera cluster split-brain automatic repair method embodiment belong to the same concept; for the detailed implementation process, refer to the method embodiment, which is not repeated here.
Fig. 6 is a block diagram of a MySQL Galera cluster split-brain automatic repair device according to an embodiment of the present application. The device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server, and may include, but is not limited to, a processor and a memory. The device in this embodiment includes at least a processor and a memory, where the memory stores a computer program that can run on the processor. When the processor executes the computer program, the steps of the above MySQL Galera cluster split-brain automatic repair method embodiment are implemented, for example the steps of the method shown in either of Figs. 1-2. Alternatively, when the processor executes the computer program, the functions of the modules in the above MySQL Galera cluster split-brain automatic repair device embodiment are implemented.
By way of example, the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to carry out the present application. The one or more modules may be a series of computer program instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer program in the MySQL Galera cluster split-brain automatic repair device. For example, the computer program may be divided into an acquisition module, a target repair scheme selection module, and an automatic repair module, whose specific functions are as follows:
the acquisition module is used for acquiring the container log and fault information corresponding to the stage in which the split-brain problem of the MySQL Galera cluster occurred, when it is detected that the split-brain problem has not recovered within a preset time period;
the target repair scheme selection module is used for selecting a target repair scheme from a preset fault repair model library based on a pre-trained LogME evaluation model according to the container log and the fault information;
and the automatic repair module is used for repairing the split-brain problem of the MySQL Galera cluster by adopting the target repair scheme.
The processor may include one or more processing cores, such as a 4-core processor or a 6-core processor. The processor may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor may also include an AI (Artificial Intelligence) processor for computing operations related to machine learning. The processor is the control center of the MySQL Galera cluster split-brain automatic repair device and is connected to each part of the device through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the MySQL Galera cluster split-brain automatic repair device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created during use of the device (such as audio data, a phonebook, etc.). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a memory device, or another volatile solid-state storage device.
It will be understood by those skilled in the art that the device described in this embodiment is merely an example of the MySQL Galera cluster split-brain automatic repair device and does not constitute a limitation on it. In other embodiments, the device may include more or fewer components, combine some components, or use different components; for example, it may further include an input/output device, a network access device, a bus, and so on. The processor, memory, and peripheral interface may be connected by buses or signal lines. Individual peripheral devices may be connected to the peripheral interface via buses, signal lines, or circuit boards. Illustratively, peripheral devices include, but are not limited to, radio frequency circuitry, touch display screens, audio circuitry, and power supplies.
Of course, the MySQL Galera cluster split-brain automatic repair device may include fewer or more components, which is not limited in this embodiment.
Optionally, the application further provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the MySQL Galera cluster split-brain automatic repair method.
Optionally, the application further provides a computer product including a computer readable storage medium in which a program is stored; the program is loaded and executed by a processor to implement the steps of the MySQL Galera cluster split-brain automatic repair method embodiment.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the application. They are described specifically and in detail, but are not therefore to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (7)

1. A cluster split-brain automatic repair method, characterized by comprising the following steps:
when it is detected that a split-brain problem of a MySQL Galera cluster has not recovered within a preset time period, acquiring a container log and fault information corresponding to the stage in which the split-brain problem of the MySQL Galera cluster occurred;
selecting a target repair model from a preset model library based on a pre-trained LogME evaluation model according to the container log and the fault information;
if the target repair model meets a preset requirement, repairing the split-brain problem of the MySQL Galera cluster by adopting the target repair model;
wherein the method further comprises:
generating a target data set according to the container log and fault information;
the selecting a target repair model from a preset model library based on a pre-trained LogME evaluation model according to the container log and the fault information comprises the following steps:
feeding the target data set into each stored fault recovery model to obtain a recovery result file corresponding to each fault recovery model;
evaluating the recovery result files based on the pre-trained LogME evaluation model, and selecting the target repair model corresponding to the best recovery result file;
and the repairing the split-brain problem of the MySQL Galera cluster by adopting the target repair model if the target repair model meets the preset requirement comprises the following steps:
selecting a preset number of first feature points from the target data set to form a first feature point set, and determining a second feature point set corresponding to the first feature point set based on the fault recovery model corresponding to the best recovery result;
determining the variance of the first feature point set and the second feature point set by adopting a variance calculation model;
when the variance of the first feature point set and the second feature point set meets a preset variance threshold, repairing the split-brain problem by adopting the target repair model.
2. The method of claim 1, wherein the variance calculation model is:
wherein μ_i is the i-th feature point selected from the target data set, X_i is the feature point obtained after the i-th feature point in the target data set is repaired by the target repair model, n is the number of feature points selected from the target data set, and S^2 is the feature point variance.
3. The method according to claim 1, characterized in that the method further comprises:
if the target repair model does not meet the preset requirement, reporting a target data set and receiving a repair scheme aiming at the target data set;
training out repair scheme model information according to the received repair scheme aiming at the target data set, and storing the target data set and the trained repair scheme model information into a model library.
4. A cluster split-brain automatic repair system, the system comprising:
the acquisition module is used for acquiring a container log and fault information corresponding to the stage in which a split-brain problem of a MySQL Galera cluster occurred, when it is detected that the split-brain problem has not recovered within a preset time period;
the target repair scheme selection module is used for selecting a target repair model from a preset model library based on a pre-trained LogME evaluation model according to the container log and the fault information;
the automatic repair module is used for repairing the split-brain problem of the MySQL Galera cluster by adopting the target repair model if the target repair model meets the preset requirement;
the system further comprises:
the target data set generation module is used for generating a target data set according to the container log and the fault information;
the target repair scheme selection module includes:
the recovery result generating unit is used for feeding the target data set into each stored fault recovery model to obtain a recovery result file corresponding to each fault recovery model;
the target repair model determining unit is used for evaluating the recovery result file based on a pre-trained LogME evaluation model and selecting a target repair model corresponding to the best recovery result file;
the automatic repair module includes:
the feature point extraction subunit is used for selecting a preset number of first feature points from the target data set to form a first feature point set, and determining a second feature point set corresponding to the first feature point set based on the fault recovery model corresponding to the best recovery result;
the variance calculating subunit is used for determining the variance of the first feature point set and the second feature point set by adopting a variance calculation model;
and the target repair scheme determining subunit is used for repairing the split-brain problem by adopting the target repair model when the variance of the first feature point set and the second feature point set meets a preset variance threshold.
5. The system of claim 4, further comprising:
the reporting unit is used for reporting the target data set and receiving a repairing scheme aiming at the target data set if the target repairing model does not meet the preset requirement;
the repair scheme model training unit is used for training repair scheme model information according to the received repair scheme aiming at the target data set, and storing the target data set and the trained repair scheme model information into the model library.
6. A cluster split-brain automatic repair device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program is loaded and executed by the processor to implement the cluster split-brain automatic repair method of any one of claims 1-3.
7. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the cluster split-brain automatic repair method according to any one of claims 1-3.
CN202111315155.0A 2021-11-08 2021-11-08 Cluster brain fracture automatic repair method, system, device and storage medium Active CN113986618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111315155.0A CN113986618B (en) 2021-11-08 2021-11-08 Cluster brain fracture automatic repair method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111315155.0A CN113986618B (en) 2021-11-08 2021-11-08 Cluster brain fracture automatic repair method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN113986618A CN113986618A (en) 2022-01-28
CN113986618B true CN113986618B (en) 2023-11-10

Family

ID=79747155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111315155.0A Active CN113986618B (en) 2021-11-08 2021-11-08 Cluster brain fracture automatic repair method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN113986618B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114528350B (en) * 2022-02-18 2024-01-16 苏州浪潮智能科技有限公司 Cluster brain fracture processing method, device, equipment and readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844132A (en) * 2015-12-03 2017-06-13 北京国双科技有限公司 The fault repairing method and device of cluster server
CN107608826A (en) * 2017-09-19 2018-01-19 郑州云海信息技术有限公司 A kind of fault recovery method, device and the medium of the node of storage cluster
CN108429629A (en) * 2017-02-14 2018-08-21 腾讯科技(深圳)有限公司 Equipment fault restoration methods and device
CN111597079A (en) * 2020-05-21 2020-08-28 山东汇贸电子口岸有限公司 Method and system for detecting and recovering MySQL Galera cluster fault
CN112131033A (en) * 2020-09-18 2020-12-25 苏州浪潮智能科技有限公司 Server fault repairing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113986618A (en) 2022-01-28

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant