CN114706856A

CN114706856A - Fault processing method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN114706856A
Application number: CN202210377076.0A
Authority: CN
Inventors: 张静; 张宪波
Original assignee: Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong Technology Information Technology Co Ltd
Priority date: 2022-04-11
Filing date: 2022-04-11
Publication date: 2022-07-05

Abstract

The disclosure provides a fault processing method and device, electronic equipment and a computer readable storage medium, which can be applied to the technical field of big data. The fault processing method comprises the following steps: receiving fault log data to be tested from a database to be tested; inputting the fault log data to be tested into a log template tree, wherein the log template tree comprises a plurality of keyword sets and fault category labels respectively corresponding to the keyword sets, and the keyword sets are obtained by processing target fault log data of preselected fault levels within a preset historical time period; outputting the fault category of the database to be tested by using the log template tree; and performing fault processing on the database to be tested according to its fault category.

Description

Fault processing method and device, electronic equipment and computer readable storage medium

Technical Field

The present disclosure relates to the field of big data technologies, and in particular, to a method, an apparatus, a device, a medium, and a program product for processing a fault.

Background

In database operation and maintenance work, an operation and maintenance engineer needs to determine the fault category of the database. Usually, the engineer checks the database fault log printed during the fault period and troubleshoots according to historical operation and maintenance experience. Because this approach depends on historical experience, fault diagnosis is limited by human judgment, prediction accuracy for unknown fault types is low, and the manual troubleshooting process is cumbersome and time-consuming.

Disclosure of Invention

In view of the above, the present disclosure provides a fault handling method, apparatus, device, medium, and program product.

In one aspect of the present disclosure, a fault handling method is provided, including:

receiving fault log data to be tested from a database to be tested;

inputting the fault log data to be tested into a log template tree, wherein the log template tree comprises a plurality of keyword sets and fault category labels respectively corresponding to the keyword sets, and the keyword sets are obtained by processing target fault log data of preselected fault levels within a preset historical time period;

outputting the fault category of the database to be tested by using the log template tree; and

performing fault processing on the database to be tested according to the fault category of the database to be tested.

According to an embodiment of the disclosure, processing the target fault log data of the preselected fault levels within the preset historical time period to obtain the plurality of keyword sets comprises:

clustering the target fault log data by using a preset clustering algorithm to obtain a preset number of log data sets; and

constructing a plurality of keyword sets associated with the preset number of log data sets.

According to an embodiment of the present disclosure, clustering the target fault log data by using the preset clustering algorithm to obtain the preset number of log data sets includes:

determining a plurality of feature words from the target fault log data;

calculating the weights of the plurality of feature words;

and clustering the target fault log data by using the preset clustering algorithm based on the weights of the feature words to obtain the preset number of log data sets.
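The disclosure leaves both the weighting scheme and the clustering algorithm open. The Python sketch below shows one possible reading. It assumes TF-IDF as the feature-word weight and k-means as the preset clustering algorithm. The tokenizer, the deterministic farthest-point initialization and every function name are illustrative assumptions, not the patented implementation.

```python
import math
from collections import Counter, defaultdict

_PUNCT = ".,:;'\"()[]"


def tokenize(line):
    """Lower-case whitespace tokenizer; real log text would need a log-aware tokenizer."""
    tokens = (t.strip(_PUNCT) for t in line.lower().split())
    return [t for t in tokens if t]


def tfidf_vectors(lines):
    """Weight each feature word by TF x smoothed IDF and L2-normalize every line vector."""
    docs = [Counter(tokenize(line)) for line in lines]
    n = len(docs)
    df = Counter(word for doc in docs for word in doc)  # number of lines containing each word
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        vec = {w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in doc.items()}
        norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
        vectors.append({w: x / norm for w, x in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors stored as dicts."""
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in set(a) | set(b))


def average(vectors):
    """Component-wise mean of sparse vectors (the updated k-means centroid)."""
    acc = defaultdict(float)
    for vec in vectors:
        for w, x in vec.items():
            acc[w] += x
    return {w: x / len(vectors) for w, x in acc.items()}


def cluster_logs(lines, k, iterations=20):
    """Group log lines into k clusters with k-means over TF-IDF vectors; returns one label per line."""
    vectors = tfidf_vectors(lines)
    # Deterministic farthest-point initialization (a simplified stand-in for k-means++).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each line joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        # Update step: move each non-empty centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = average(members)
    return labels
```

In the embodiment described later, `k` would be the preset number of log data sets (20 per fault level).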

According to an embodiment of the present disclosure, constructing the plurality of keyword sets associated with the preset number of log data sets comprises:

determining a plurality of initial word sets from the preset number of log data sets;

removing stop words from the plurality of initial word sets to obtain a plurality of reselected word sets;

calculating the word frequency of the reselected words in each of the plurality of reselected word sets;

and determining a plurality of keyword sets according to the word frequency of the reselected words in the plurality of reselected word sets.
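The sketch below illustrates the stop-word removal and word-frequency steps by building one keyword set per cluster. Several details are assumptions made for the example: the stop-word list, the `min_share` threshold and the `top_n` cut-off. Counting how many lines contain a word, rather than raw occurrences, is also a choice of this sketch.

```python
from collections import Counter

# Hypothetical stop-word list; the disclosure does not publish the one it uses.
STOP_WORDS = {"the", "a", "an", "is", "are", "as", "and", "be", "to", "of", "on", "in", "for"}
_PUNCT = ".,:;'\"()[]"


def build_keyword_sets(clusters, top_n=5, min_share=0.8):
    """Derive one keyword set per cluster of log lines.

    clusters: dict mapping a cluster id (e.g. "ERROR_0") to its list of log lines.
    A reselected word is kept if it appears in at least `min_share` of the cluster's
    lines; the `top_n` most frequent such words form the keyword set.
    """
    keyword_sets = {}
    for cid, lines in clusters.items():
        # Initial word sets: the punctuation-stripped tokens of each line.
        initial = [{t.strip(_PUNCT) for t in line.lower().split()} - {""} for line in lines]
        # Reselected word sets: initial word sets with stop words removed.
        reselected = [words - STOP_WORDS for words in initial]
        # Word frequency: number of lines in the cluster that contain the word.
        freq = Counter(w for words in reselected for w in words)
        # Sort by descending frequency, then alphabetically for a stable result.
        ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
        keyword_sets[cid] = [w for w, c in ranked if c / len(lines) >= min_share][:top_n]
    return keyword_sets
```

Words that vary from line to line, such as table names, fall below `min_share` and drop out. Words shared by the whole cluster survive as its keyword set.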

According to an embodiment of the disclosure, the fault category labels respectively corresponding to the plurality of keyword sets are obtained by processing the preset number of log data sets.

According to an embodiment of the present disclosure, processing a preset number of log data sets to obtain fault category labels respectively corresponding to a plurality of keyword sets includes:

determining a plurality of preselected fault log data sets according to the preset number of log data sets, wherein each preselected fault log data set is associated with one keyword set;

determining preselected fault category labels respectively corresponding to the plurality of preselected fault log data sets;

and taking each preselected fault category label as the fault category label of the keyword set associated with the corresponding preselected fault log data set.

According to an embodiment of the present disclosure, determining the preselected fault category labels respectively corresponding to the plurality of preselected fault log data sets comprises:

determining identification information of the target databases respectively corresponding to the plurality of preselected fault log data sets, and determining reference keyword sets respectively corresponding to the plurality of preselected fault log data sets;

acquiring target historical operation and maintenance data of a target database;

determining the fault category of the target database according to the target historical operation and maintenance data and the reference keyword set;

and determining preselected fault category labels respectively corresponding to the preselected fault log data sets according to the fault categories of the target database.
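The disclosure does not specify how the historical operation and maintenance data is combined with the reference keyword set. One plausible sketch assumes the data is available as per-database tickets, each a (description, recorded fault category) pair. It keeps the tickets that mention at least one reference keyword and takes a majority vote over their categories. The ticket structure and the voting rule are hypothetical.

```python
from collections import Counter


def label_cluster(db_ids, reference_keywords, ops_records):
    """Return the most common historical fault category for one preselected data set, or None.

    db_ids: identifiers of the target databases whose logs form the data set.
    reference_keywords: set of reference keywords for the data set.
    ops_records: hypothetical mapping db_id -> list of (ticket_text, fault_category).
    """
    votes = Counter()
    for db_id in db_ids:
        for text, category in ops_records.get(db_id, []):
            # Count a ticket only if it mentions at least one reference keyword.
            if set(text.lower().split()) & reference_keywords:
                votes[category] += 1
    return votes.most_common(1)[0][0] if votes else None
```

A data set with no matching tickets returns `None`, which in practice would be left for manual labeling.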

According to the embodiment of the disclosure, the log template tree is obtained by correcting an initial log template tree, wherein the initial log template tree comprises a plurality of initial keyword sets and fault category labels respectively corresponding to the initial keyword sets;

correcting the initial log template tree to obtain the log template tree comprises:

determining invalid keywords according to the initial log template tree and the target fault log data;

and removing the invalid keywords from the initial keyword sets to obtain the log template tree.
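This section does not define what makes a keyword invalid. The minimal sketch below assumes one possible rule. A keyword is treated as invalid when it never occurs in the target fault log data. It is also invalid when it occurs in so large a share of that data (above a hypothetical `max_share`) that it cannot distinguish fault categories. The dict-of-sets tree representation is likewise an assumption.

```python
def find_invalid_keywords(template_tree, target_lines, max_share=0.8):
    """Flag keywords that never occur in the target logs or occur in almost every line.

    template_tree: dict mapping a fault category label to its initial keyword set.
    """
    token_sets = [set(line.lower().split()) for line in target_lines]
    n = len(token_sets) or 1
    invalid = set()
    for keywords in template_tree.values():
        for kw in keywords:
            share = sum(kw in tokens for tokens in token_sets) / n
            if share == 0 or share > max_share:
                invalid.add(kw)
    return invalid


def correct_template_tree(template_tree, target_lines, max_share=0.8):
    """Return a corrected tree with the invalid keywords removed from every keyword set."""
    invalid = find_invalid_keywords(template_tree, target_lines, max_share)
    return {label: set(keywords) - invalid for label, keywords in template_tree.items()}
```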

According to an embodiment of the disclosure, the target fault log data is obtained by screening original fault log data, and the original fault log data is divided into error log data, deadlock log data, alarm log data and message log data;

screening the original fault log data to obtain the target fault log data comprises:

selecting the error log data and the deadlock log data from the original fault log data as the target fault log data.

Another aspect of the present disclosure provides a fault handling apparatus including a receiving module, an input module, an output module, and a first processing module.

The receiving module is used for receiving the fault log data to be tested from the database to be tested;

the input module is used for inputting the fault log data to be tested into a log template tree, wherein the log template tree comprises a plurality of keyword sets and fault category labels respectively corresponding to the keyword sets, and the keyword sets are obtained by processing target fault log data of a preselected fault level within a preset historical time period;

the output module is used for outputting the fault category of the database to be tested by using the log template tree; and

the first processing module is used for performing fault processing on the database to be tested according to the fault category of the database to be tested.

According to an embodiment of the present disclosure, the apparatus further includes a second processing module, configured to process target fault log data of a preselected fault level in a preset historical time period to obtain a plurality of keyword sets, where the second processing module includes:

the clustering unit is used for clustering target fault log data by using a preset clustering algorithm to obtain a preset number of log data sets;

the building unit is used for constructing a plurality of keyword sets associated with the preset number of log data sets.

According to an embodiment of the present disclosure, wherein the clustering unit includes:

the first determining subunit is used for determining a plurality of feature words from the target fault log data;

the first calculating subunit is used for calculating the weights of the plurality of feature words;

and the clustering subunit is used for clustering the target fault log data by using the preset clustering algorithm based on the weights of the plurality of feature words to obtain the preset number of log data sets.

According to an embodiment of the present disclosure, wherein the building unit includes:

the second determining subunit is used for determining a plurality of initial word sets from the preset number of log data sets;

the removing subunit is used for removing stop words from the plurality of initial word sets to obtain a plurality of reselected word sets;

the second calculating subunit is used for calculating the word frequency of the reselected words in each of the plurality of reselected word sets;

and the third determining subunit is used for determining the plurality of keyword sets according to the word frequency of the reselected words in the plurality of reselected word sets.

According to an embodiment of the present disclosure, the apparatus further includes a third processing module, configured to process a preset number of log data sets, so as to obtain fault category labels respectively corresponding to the multiple keyword sets.

According to an embodiment of the present disclosure, wherein the third processing module includes:

a first determining unit, used for determining a plurality of preselected fault log data sets according to the preset number of log data sets, wherein each preselected fault log data set is associated with one keyword set;

a second determining unit, used for determining preselected fault category labels respectively corresponding to the plurality of preselected fault log data sets;

and a third determining unit, used for taking each preselected fault category label as the fault category label of the keyword set associated with the corresponding preselected fault log data set.

According to an embodiment of the present disclosure, wherein the second determining unit includes:

a fourth determining subunit, used for determining identification information of the target databases respectively corresponding to the plurality of preselected fault log data sets, and determining reference keyword sets respectively corresponding to the plurality of preselected fault log data sets;

the acquisition subunit is used for acquiring target historical operation and maintenance data of the target database;

the fifth determining subunit is used for determining the fault category of the target database according to the target historical operation and maintenance data and the reference keyword set;

and the sixth determining subunit is used for determining the preselected fault category labels respectively corresponding to the preselected fault log data sets according to the fault categories of the target database.

According to an embodiment of the present disclosure, the apparatus further includes a fourth processing module, configured to modify the initial log template tree to obtain a log template tree, where the initial log template tree includes a plurality of initial keyword sets and fault category labels respectively corresponding to the plurality of initial keyword sets;

wherein, the fourth processing module includes:

the fourth determining unit is used for determining invalid keywords according to the initial log template tree and the target fault log data;

and the removing unit is used for removing the invalid keywords in the initial keyword set to obtain the log template tree.

According to an embodiment of the present disclosure, the apparatus further includes a fifth processing module, configured to perform screening processing on the original fault log data to obtain target fault log data, where the original fault log data is divided into: error log data, deadlock log data, alarm log data and message log data;

the fifth processing module comprises a screening unit, which is used for selecting the error log data and the deadlock log data from the original fault log data as the target fault log data.

Another aspect of the present disclosure provides an electronic device including: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the fault handling method described above.

Another aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described fault handling method.

Another aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described fault handling method.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario diagram of a fault handling method, apparatus, device, medium and program product according to embodiments of the disclosure;

FIG. 2 schematically illustrates a flow diagram of a fault handling method according to an embodiment of the disclosure;

FIG. 3 schematically illustrates an example diagram of building multiple keyword sets in accordance with an embodiment of the disclosure;

FIG. 4 schematically illustrates a flow diagram of a clustering operation on target fault log data according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates an operational flow diagram for determining fault category labels respectively corresponding to a plurality of keyword sets according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates an operational flow diagram for determining invalid keywords associated with an initial log template tree, according to an embodiment of the present disclosure;

FIG. 7 schematically shows a block diagram of a fault handling apparatus according to an embodiment of the present disclosure; and

FIG. 8 schematically shows a block diagram of an electronic device adapted to implement a fault handling method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.).

MySQL log files include the fault log (Error Log), the general log, the update log, the binary log, the slow query log and other logs; the fault log is one of the most commonly used logs in MySQL.

In the database operation and maintenance work, an operation and maintenance engineer needs to determine the fault category of the database so as to carry out fault operation and maintenance in a targeted manner. Usually, an operation and maintenance engineer checks a database fault log printed in a fault time period, and performs fault troubleshooting according to historical operation and maintenance experience.

However, this approach depends on historical experience and cannot learn well from how faults have historically manifested in the logs. Troubleshooting when a fault occurs depends on the experience of the DBA engineer, and fault diagnosis is limited by human judgment. Prediction accuracy for unknown fault types is low, the manual troubleshooting process is cumbersome and time-consuming, and the full volume of MySQL fault logs goes unanalyzed.

The database fault log can be divided into four levels according to fault severity: ERROR, DeadLock, Warning and Note. In the process of implementing the present disclosure, it was found that logs at the ERROR and DeadLock levels require particular attention. One existing fault handling approach relies on DBA operation and maintenance experts. The experts summarize a log keyword library from collected MySQL fault logs and classify MySQL faults by regular-expression matching. When MySQL fails, the operation and maintenance engineer first checks the ERROR-level and DeadLock-level logs printed during the fault period. If the problem cannot be determined, the engineer goes on to check the Warning-level logs for further diagnosis. This approach, however, still suffers from a complicated troubleshooting process and long time consumption.

In the process of implementing the present disclosure, it was also found that, in practice, faults are often preceded by precursor phenomena. If such phenomena can be detected in advance from patterns in historical fault log data, faults can be anticipated in some scenarios and their occurrence reduced.

In view of this, the embodiments of the present disclosure use natural language processing (NLP) to categorize the full set of historical MySQL logs in advance, thereby modeling operation and maintenance experience. A log template tree is built by training on a large volume of historical logs. It can quickly classify logs generated in real time and identify anomalies from the log perspective. This enables online, real-time prediction from database logs, so that the fault category of the database is obtained promptly and DBA engineers can quickly locate the source of a MySQL fault when one occurs.

An embodiment of the present disclosure provides a fault processing method, including:

receiving fault log data to be tested from a database to be tested;

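inputting the fault log data to be tested into a log template tree, wherein the log template tree comprises a plurality of keyword sets and fault category labels respectively corresponding to the keyword sets, and the keyword sets are obtained by processing target fault log data of a preselected fault level within a preset historical time period;

outputting the fault category of the database to be tested by using the log template tree; and

performing fault processing on the database to be tested according to the fault category of the database to be tested.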

Fig. 1 schematically shows an application scenario diagram of a fault handling method, apparatus, device, medium, and program product according to embodiments of the present disclosure.

As shown in fig. 1, an application scenario 100 according to this embodiment may include a terminal device 101, a server 102 and a database 103. The terminal device 101, the server 102 and the database 103 may communicate over a network, which may include various connection types, such as wired or wireless communication links or fiber optic cables.

A user may use terminal device 101 to interact with server 102 over a network to receive or send messages and the like. Various messaging client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on terminal device 101.

The terminal device 101 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 102 may be a server that provides various services, such as a background management server (for example only) that provides support for websites browsed by users using the terminal devices 101. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.

Database 103 may be any type of database, including but not limited to relational and non-relational databases such as MySQL, MariaDB, Oracle, MongoDB and CouchDB. During use, database 103 may generate various log files, such as fault logs, general logs, update logs, binary logs and slow query logs.

In an application scenario of the present disclosure, a user may use the terminal device 101 to interact with the server 102 through a network and send the server 102 a request for the fault identification result of the database 103. In response, the server 102 may receive fault log data to be tested sent by the database 103 in real time. It then executes the fault processing method according to the embodiments of the present disclosure on that data to output the fault category of the database to be tested, and returns the result to the user through the terminal device 101. The server 102 may further perform fault processing on the database 103 based on its fault category, for example by stopping the receipt of service data sent from the database 103 or by locking the database 103.

It is noted that the fault handling method provided by the embodiments of the present disclosure may be generally performed by the server 102. Accordingly, the fault handling apparatus provided by the embodiments of the present disclosure may be generally disposed in the server 102. The fault handling method provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 102 and is capable of communicating with the terminal device 101, the server 102, and the database 103. Accordingly, the fault handling apparatus provided in the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 102 and capable of communicating with the terminal device 101, the server 102, and the database 103.

It should be understood that the number of terminal devices, servers, databases in fig. 1 is merely illustrative. There may be any number of terminal devices, servers, databases, as desired for implementation.

The following describes the fault handling method according to the embodiment of the present disclosure in detail based on the scenario described in fig. 1.

Fig. 2 schematically shows a flow chart of a fault handling method according to an embodiment of the present disclosure.

As shown in fig. 2, the fault handling method of this embodiment includes operations S201 to S204.

In operation S201, fault log data to be tested is received from the database to be tested;

in operation S202, the fault log data to be tested is input into a log template tree, where the log template tree includes a plurality of keyword sets and fault category labels respectively corresponding to the keyword sets, and the keyword sets are obtained by processing target fault log data at preselected fault levels within a preset historical time period;

in operation S203, the fault category of the database to be tested is output by using the log template tree; and

in operation S204, fault processing is performed on the database to be tested according to its fault category.
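Operations S201 to S204 can be sketched as a single loop. In this illustrative Python version, a flat list of (keyword set, label) pairs stands in for the log template tree, and a keyword set matches when all of its words appear in a log line. `handlers` is a hypothetical mapping from fault category to a fault processing action, such as locking the database.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

Template = Tuple[FrozenSet[str], str]  # (keyword set, fault category label)


def classify(line: str, templates: List[Template]) -> Optional[str]:
    """S202/S203: return the label of the first keyword set fully contained in the line."""
    tokens = set(line.lower().split())
    for keywords, label in templates:
        if keywords <= tokens:
            return label
    return None


def handle_fault_logs(
    lines: Iterable[str],
    templates: List[Template],
    handlers: Dict[str, Callable[[str], None]],
) -> List[Tuple[str, Optional[str]]]:
    """Run S201 to S204 over one batch of fault log lines received from the database."""
    results = []
    for line in lines:                       # S201: lines received from the database under test
        label = classify(line, templates)    # S202/S203: query the templates for a category
        results.append((line, label))
        if label in handlers:                # S204: category-specific fault processing
            handlers[label](line)
    return results
```

Lines that match no keyword set come back with a `None` category and trigger no action.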

According to an embodiment of the present disclosure, the database to be tested may be any type of database, such as MySQL, Oracle or MongoDB. The database to be tested generates various log data in real time, including fault log data. For example, MySQL fault log data may record MySQL startup and shutdown times. It may also record diagnostic messages such as errors, warnings and notes that occur during server startup and shutdown and while the server is running (e.g., if MySQL notices that a table needs to be automatically checked or repaired, it writes a message to the fault log).

According to the embodiment of the disclosure, the log template tree is a model tree obtained by pre-training according to historical fault log data, and can be used for predicting the fault category of the to-be-tested database according to the input to-be-tested fault log data.

According to an embodiment of the disclosure, each of the plurality of pre-established keyword sets in the log template tree may be a set of keywords representing key features of the data. For example, one keyword set may be "Table", "is marked as crashed", "should be repaired". The fault category labels respectively corresponding to the keyword sets describe fault categories; for example, the fault category label corresponding to this keyword set may be "Class = insufficient disk space or system disk damage".

According to an embodiment of the disclosure, the log template tree performs fault prediction mainly through the plurality of pre-established keyword sets and their corresponding fault category labels. For example, after MySQL fault log data to be tested is received, it is matched against the keyword sets by a tag tree matching algorithm. Once a target keyword set is matched, the fault category represented by the current fault log data is given by the fault category label corresponding to that target keyword set.
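The tag tree matching algorithm itself is not detailed here. One way to picture a log template tree, offered only as an assumption, is a prefix tree in which each root-to-node path spells one ordered keyword sequence. A log line matches a template if those keywords occur in the line in that order, and the deepest (most specific) match wins. The class names and the labels used in the test are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}  # keyword -> child Node
        self.label = None   # fault category label if a template ends at this node


class LogTemplateTree:
    """Hypothetical prefix tree: each root-to-node path is one ordered keyword sequence."""

    def __init__(self):
        self.root = Node()

    def add(self, keywords, label):
        """Insert an ordered keyword sequence and attach its fault category label."""
        node = self.root
        for kw in keywords:
            node = node.children.setdefault(kw, Node())
        node.label = label

    def match(self, line):
        """Return the label of the longest template whose keywords occur in order in the line."""
        tokens = line.lower().split()
        best = (0, None)  # (template length, label)

        def walk(node, pos, depth):
            nonlocal best
            if node.label is not None and depth > best[0]:
                best = (depth, node.label)
            for kw, child in node.children.items():
                try:
                    nxt = tokens.index(kw, pos)  # earliest occurrence at or after pos
                except ValueError:
                    continue
                walk(child, nxt + 1, depth + 1)

        walk(self.root, 0, 0)
        return best[1]
```

Taking the earliest occurrence of each keyword is sufficient for an in-order match: if any in-order placement exists, the greedy one also succeeds.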

According to an embodiment of the disclosure, the database fault log can be divided into a plurality of levels according to fault severity, and the preselected fault level corresponding to the target fault log data may cover some or all of these levels.

According to the embodiment of the disclosure, the plurality of keyword sets in the log template tree are obtained by processing target fault log data of a preselected fault level within a preset historical time period (for example, the last month, year and the like). For example, the target fault log data may be classified first, and then keywords may be extracted from each category of data to form a plurality of keyword sets.

According to an embodiment of the disclosure, after the fault category of the database to be tested is obtained, fault processing can be performed on the database according to that category, for example by stopping the receipt of service data sent by the database or by locking the database.

According to an embodiment of the disclosure, executing the fault processing method enables real-time fault prediction simply by inputting online real-time data into the log template tree. This fully automates fault identification and fault processing. Compared with identification from manual experience, it also shortens fault processing time, frees up manpower and improves timeliness. Moreover, because the log template tree is built from a large volume of historical log data, it models operation and maintenance experience and covers many fault types. Compared with experience-based identification, it removes the limits that human judgment places on fault diagnosis and predicts faults of unknown types more accurately.

According to an embodiment of the disclosure, the database fault log may be divided into a plurality of levels according to fault severity, and the target fault log data is obtained by screening the original fault log data. For example, the original fault log data is divided by fault level into error log data (ERROR), deadlock log data (DeadLock), alarm log data (Warning) and message log data (Note).

The preselected fault level corresponding to the target fault log data may cover some or all of these four levels; for example, the target data may include only ERROR-level data, or data at the ERROR, DeadLock and Warning levels.

Further, screening the original fault log data to obtain the target fault log data includes: selecting the error log data and the deadlock log data from the original fault log data as the target fault log data.

According to an embodiment of the disclosure, log data at the ERROR and DeadLock levels is more closely related to database faults. Warning-level and Note-level data may merely warn of or hint at a possible fault and is most likely unrelated to actual database faults. Therefore, to improve data processing efficiency and avoid wasted work on irrelevant data, attention is focused on the higher-severity ERROR and DeadLock data, which is then processed in a targeted manner.
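Screening by fault level reduces to reading the level tag of each line. The sketch below assumes a simplified line layout of `<timestamp> <thread id> [<level>] <message>`, loosely modeled on the MySQL error log. "DeadLock" is the disclosure's own category, and the sketch assumes it appears as a bracketed tag; stock MySQL does not emit such a tag. The regular expression and the function name are illustrative.

```python
import re

# Assumed layout: "<timestamp> <thread id> [<level>] <message>".
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<thread>\d+)\s+\[(?P<level>\w+)\]\s*(?P<msg>.*)$")
TARGET_LEVELS = {"ERROR", "DeadLock"}


def screen_target_logs(raw_lines):
    """Keep only ERROR and DeadLock lines; return (level, message) pairs in input order."""
    kept = []
    for line in raw_lines:
        m = LINE_RE.match(line)
        # Lines that do not fit the layout, and Warning/Note lines, are skipped.
        if m and m.group("level") in TARGET_LEVELS:
            kept.append((m.group("level"), m.group("msg")))
    return kept
```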

According to an embodiment of the present disclosure, the plurality of keyword sets in the log template tree are obtained by processing target fault log data of a preselected fault level within a preset historical time period (for example, may be a last month, a year, and the like), and the processing operation may include:

First, the target fault log data is clustered by using a predetermined clustering algorithm to obtain a preset number of log data sets; that is, the target fault log data is classified into data sets of several categories. The preset number of log data sets (i.e., the number of categories into which the log data is divided) may be an optimal category count determined in advance, either algorithmically or empirically; in the embodiment of the present disclosure, the optimal number of log data sets obtained by the clustering operation is 20.

Then, a plurality of keyword sets associated with the preset number of log data sets are constructed; for example, keywords may be extracted from the data set of each category to form one keyword set per category.

FIG. 3 schematically shows an example diagram of building multiple keyword sets according to an embodiment of the disclosure.

Fig. 3 shows an example of clustering MySQL target fault log data at the ERROR and DeadLock levels and constructing multiple keyword sets (multiple error-log key fragments).

Specifically, text clustering may be performed separately on the ERROR-level and DeadLock-level fault log data, yielding 20 categories of log data sets per level; for example, the 20 categories of the [ERROR] data are [ERROR_0] to [ERROR_19].

Error-log key fragments are then constructed for the log data set of each category to obtain that category's keyword set. For example, in the [ERROR] data, the keyword set of the [ERROR_0] data set is "Could not find target log ……", the keyword set of the [ERROR_1] data set is "Couldn't load plugin ……", and so on. The keyword set of each category captures the key features of its log data, such as "data cannot be loaded" or "data needs to be repaired".

According to the embodiment of the disclosure, the constructed keyword sets contain representative keywords from the specialized domain of database faults. Each keyword set corresponds to one fault category, although two, three, or more keyword sets may correspond to the same fault category. Keyword sets constructed in this way can be used not only to classify and identify database faults, so that the causes of problems are understood quickly and accurately, but also in other database fault analysis scenarios, which gives them good generality.

According to an embodiment of the present disclosure, in the above operation, clustering the target fault log data by using a predetermined clustering algorithm to obtain a preset number of log data sets may include:

determining a plurality of feature words from the target fault log data;

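calculating the weights of the plurality of feature words; and

clustering the target fault log data by using the predetermined clustering algorithm based on the weights of the plurality of feature words to obtain the preset number of log data sets.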

Fig. 4 schematically shows a flowchart of a clustering operation on target fault log data according to an embodiment of the present disclosure, which is described below in conjunction with fig. 4.

As shown in fig. 4, taking MySQL fault log data as an example, the operation of performing data processing includes:

First, a plurality of feature words are determined from the target fault log data: the MySQL fault log text is segmented into words automatically (for example, with the jieba word segmentation tool), the frequency of each word is counted, stop words are filtered out, and finally a set of feature words is selected.
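A rough sketch of the feature-word step follows. jieba targets Chinese text, while MySQL error messages are mostly English, so this sketch swaps in a plain regular-expression tokenizer; that substitution, the stop-word list, and the `top_n` cutoff are assumptions rather than details from the disclosure. `tokenize` and `select_feature_words` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import List

# Hypothetical stop-word list; a real deployment would curate its own.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "as", "for", "in", "on", "with", "be"}

# Words of two or more letters; digits and single letters (timestamps, ids) are skipped.
TOKEN = re.compile(r"[a-z][a-z_]+")


def tokenize(line: str) -> List[str]:
    """Lower-case a log line and split it into word tokens (stand-in for jieba)."""
    return TOKEN.findall(line.lower())


def select_feature_words(lines: List[str], top_n: int = 50) -> List[str]:
    """Count word frequencies, drop stop words, and keep the top_n most frequent words."""
    counts = Counter(
        token for line in lines for token in tokenize(line) if token not in STOP_WORDS
    )
    return [word for word, _ in counts.most_common(top_n)]
```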

Then, the weights of the feature words are calculated from the frequency of each word, for example by using the TF-IDF algorithm.
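The weighting step can be illustrated with a small, dependency-free TF-IDF computation. TF here is the word count divided by the line length, and IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, the same IDF form scikit-learn's `TfidfVectorizer` uses with its default `smooth_idf=True`; the disclosure does not say which TF-IDF variant it applies, so both choices are assumptions. `tfidf_vectors` is a hypothetical helper that takes already tokenized log lines and the selected feature words.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]], vocabulary: List[str]) -> List[List[float]]:
    """Return one TF-IDF weight vector (ordered like `vocabulary`) per tokenized log line."""
    n_docs = len(docs)
    # Document frequency: in how many log lines each word appears at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    idf: Dict[str, float] = {
        word: math.log((1 + n_docs) / (1 + doc_freq[word])) + 1 for word in vocabulary
    }
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty lines
        vectors.append([counts[word] / length * idf[word] for word in vocabulary])
    return vectors
```

Words that occur in almost every log line end up with an IDF close to 1, so rarer, more distinctive words dominate the distance computations in the clustering step.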

Finally, the target fault log data is clustered based on the weights of the feature words by using a predetermined clustering algorithm, such as K-Means, to obtain the preset number of log data sets. When K-Means is used, the number of clusters K is set to 20, an optimal value determined in advance algorithmically, so that log data sets of 20 categories are finally obtained.
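For the clustering step, a plain Lloyd-style K-Means can be written without third-party libraries, as in the sketch below. The random initialisation, the `seed`, and the iteration cap are assumptions; a production system would more likely call a library implementation such as scikit-learn's `KMeans`. The default `k=20` follows the category count the text reports as optimal for its data, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> Vector:
    # Component-wise mean of the member vectors.
    return [sum(dim) / len(points) for dim in zip(*points)]


def kmeans(vectors: List[Vector], k: int = 20, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Cluster TF-IDF vectors; return a cluster index per vector plus the centroids."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of log lines")
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]  # random initial centres
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every log line to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels, centroids
```

Each cluster index then plays the role of one category such as [ERROR_0] to [ERROR_19].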

It should be noted that the target fault log data may span multiple fault levels, for example one or more of the four levels ERROR, DeadLock, Warning, and Note. The clustering method can be applied to the data of each fault level separately, so that after clustering, the data of every preselected fault level is classified.

According to the embodiment of the disclosure, weighting the feature words during clustering assigns larger weights to key feature words, so the subsequent clustering is driven by high-weight words. This avoids invalid classification results caused by clustering on uninformative words and improves clustering accuracy. Because database fault logs are fairly regular short texts available in large volumes, text clustering combined with the TF-IDF algorithm classifies them well, reduces the influence of noisy data, and yields good clustering results.

According to an embodiment of the present disclosure, in the above operation, constructing a plurality of keyword sets associated with a preset number of log data sets includes:

First, a plurality of initial word sets are determined from the preset number of log data sets: initial words are extracted from each log data set to obtain one initial word set, so that processing every log data set yields the plurality of initial word sets.

Then, stop words are removed from the initial word sets to obtain a plurality of reselected word sets; the purpose of this operation is to eliminate interference from invalid words.

Next, the word frequency of the reselected words in each reselected word set is calculated, and the keyword sets are determined from these word frequencies. For example, the frequency of each reselected word is calculated, and the reselected words whose frequency exceeds a preset value are extracted to form a keyword set. Processing each reselected word set in this way yields one keyword set per set, and hence the plurality of keyword sets.
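The keyword-set construction can be sketched per cluster as below. Counting each word at most once per log line and expressing the "preset value" as a fraction of the cluster size (`min_ratio`) are assumptions; the disclosure only requires the word frequency to exceed a preset value. The stop-word list and `build_keyword_sets` are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical stop-word list used to strip invalid words from the initial word sets.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "as", "for", "in", "on", "be"}
WORD = re.compile(r"[a-z][a-z_]+")


def build_keyword_sets(clusters: Dict[int, List[str]],
                       min_ratio: float = 0.5) -> Dict[int, Set[str]]:
    """Build one keyword set per clustered log data set (cluster id -> log lines)."""
    keyword_sets = {}
    for cluster_id, lines in clusters.items():
        freq: Counter = Counter()
        for line in lines:
            initial_words = set(WORD.findall(line.lower()))  # initial word set
            freq.update(initial_words - STOP_WORDS)          # reselected words
        threshold = min_ratio * len(lines)
        # Keep reselected words whose frequency is greater than the preset value.
        keyword_sets[cluster_id] = {w for w, n in freq.items() if n > threshold}
    return keyword_sets
```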

According to an embodiment of the present disclosure, a log template tree includes a plurality of keyword sets and failure category labels respectively corresponding to the plurality of keyword sets.
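The disclosure does not spell out how the log template tree is laid out or how an incoming line is matched against it. One plausible reading, sketched below purely as an assumption, is a two-level tree (fault level, then keyword-set leaves) in which a line receives the label of the leaf whose keywords it covers best, provided the coverage reaches a hypothetical `min_score` threshold. All class and method names are illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

WORD = re.compile(r"[a-z][a-z_]+")


@dataclass
class TemplateNode:
    keywords: Set[str]
    label: str  # fault category label


@dataclass
class LogTemplateTree:
    """Hypothetical layout: fault level -> list of keyword-set leaves."""
    branches: Dict[str, List[TemplateNode]] = field(default_factory=dict)

    def add(self, level: str, keywords: Set[str], label: str) -> None:
        self.branches.setdefault(level.upper(), []).append(TemplateNode(keywords, label))

    def classify(self, level: str, line: str, min_score: float = 0.6) -> Optional[str]:
        """Return the label whose keyword set best covers the line, or None."""
        words = set(WORD.findall(line.lower()))
        best_label, best_score = None, 0.0
        for node in self.branches.get(level.upper(), []):
            if not node.keywords:
                continue
            # Fraction of the leaf's keywords that occur in the incoming line.
            score = len(node.keywords & words) / len(node.keywords)
            if score > best_score:
                best_label, best_score = node.label, score
        return best_label if best_score >= min_score else None
```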

The fault category labels corresponding to the keyword sets may be obtained by processing the target fault log data; for example, after the target fault log data is clustered with the predetermined clustering algorithm into the preset number of log data sets, those log data sets are processed to obtain the fault category labels corresponding to the keyword sets.

The fault category labels may also be obtained by processing the keyword sets themselves. For example, for each keyword set, the fault category that the set likely characterizes is determined from the features represented by its keywords, which gives the fault category label.

Further, in the above operations, after clustering the target fault log data by using a predetermined clustering algorithm to obtain a certain preset number of log data sets, processing the preset number of log data sets to obtain fault category labels respectively corresponding to the multiple keyword sets may include the following operations:

First, a plurality of preselected fault log data sets are determined from the preset number of log data sets, each preselected fault log data set being associated with a keyword set. For example, after the preselected target fault log data is clustered into 20 log data sets of different categories, a portion of the log data that best represents each log data set is selected as the preselected fault log data set associated with that log data set; because each log data set corresponds to one keyword set, each preselected fault log data set is likewise associated with a keyword set. For example, suppose a log data set contains 10,000 historical log records and the keyword set extracted from them is "Table", "is marked as crashed", "should be repaired". The 10 most representative records are extracted from the 10,000 to form a preselected fault log data set, which is associated with the corresponding keyword set "Table", "is marked as crashed", "should be repaired".

Then, determining preselected fault category labels respectively corresponding to the preselected fault log data sets; that is, for each pre-selected fault log data set, its corresponding pre-selected fault category label is determined.

Finally, each preselected fault category label is used as the fault category label of the keyword set associated with the corresponding preselected fault log data set. For example, a log data set contains 10,000 historical log records; the 10 most representative records are extracted to form a preselected fault log data set, whose associated keyword set is "Table", "is marked as crashed", "should be repaired". These 10 records are then analyzed as a preselected fault log data set, and if the fault category they are likely associated with is insufficient disk space or system disk damage, that category is also taken as the fault category of the keyword set "Table", "is marked as crashed", "should be repaired".

According to the embodiment of the disclosure, each log data set may contain a huge amount of log data, so analyzing the full data to obtain the fault category is both time-consuming and unnecessary. Instead, a small portion of log data that best represents the set is selected and analyzed to obtain the corresponding fault category, and the result is extended to the entire log data set as the fault category of its associated keyword set. This reduces the data processing workload and improves efficiency while preserving the accuracy of the fault category.
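Selecting a representative sample and reusing its label might look like the following sketch. The disclosure does not define "most representative"; ranking lines by how many cluster keywords they contain is an assumed stand-in (distance to the K-Means centroid would be another reasonable choice), and the label passed to `propagate_label` is assumed to come from a separate analysis of the sample. Function names are hypothetical.

```python
import re
from typing import Dict, FrozenSet, List

WORD = re.compile(r"[a-z][a-z_]+")


def pick_representatives(lines: List[str], keywords: FrozenSet[str],
                         n: int = 10) -> List[str]:
    """Return the n lines covering the most keywords (ties keep input order)."""
    def coverage(line: str) -> int:
        return len(keywords & set(WORD.findall(line.lower())))
    return sorted(lines, key=coverage, reverse=True)[:n]


def propagate_label(keywords: FrozenSet[str], sample_label: str,
                    template: Dict[FrozenSet[str], str]) -> None:
    """Record the sample's fault category as the label of the whole keyword set."""
    template[keywords] = sample_label
```

Only the `n` selected lines need manual or rule-based analysis; the resulting label is then applied to every log record in the cluster through its keyword set.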

According to an embodiment of the present disclosure, further, in the above operation, determining preselected fault category labels respectively corresponding to the plurality of preselected fault log data sets may include:

Fig. 5 schematically illustrates an operation flow diagram for determining failure category labels respectively corresponding to a plurality of keyword sets according to an embodiment of the present disclosure. The above operation will be described below with reference to fig. 5.

As shown in Fig. 5, taking the MySQL database as an example, identification information of the target database corresponding to a MySQL preselected fault log data set (MySQL error log) is determined first. For example, the MySQL error log may be parsed to extract the corresponding database IP address, locating the database in which the fault occurred. Meanwhile, key fragments are extracted from the MySQL error log to form a reference keyword set, which represents the data characteristics of the preselected fault log data set and can be used to characterize the fault.

The preselected fault log data set is then labeled, i.e., the fault category associated with it is determined. Specifically, the target historical operation and maintenance data of the target database can be obtained using the determined IP address; the fault category of the target database is determined by analyzing this historical data together with the fault characteristics represented by the reference keyword set; and finally, the preselected fault category labels of the preselected fault log data sets are determined from the fault category of the target database.
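The labeling flow of Fig. 5 can be approximated as below. The IPv4 pattern, the shape of `ops_history` (a mapping from database IP to past incident records with hypothetical `symptoms` and `category` fields), and the overlap-based scoring are all assumptions made for illustration; the disclosure leaves the analysis of the historical operation and maintenance data unspecified.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Set

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ip(log_text: str) -> Optional[str]:
    """Return the first IPv4 address in the log text, used to locate the database."""
    match = IPV4.search(log_text)
    return match.group(0) if match else None


def label_sample(log_text: str, reference_keywords: Set[str],
                 ops_history: Dict[str, List[dict]]) -> Optional[str]:
    """Pick the historical fault category whose symptoms best match the reference keywords."""
    ip = extract_ip(log_text)
    if ip is None:
        return None
    scores: Counter = Counter()
    for record in ops_history.get(ip, []):
        # Weight each past incident by how many reference keywords it shares.
        overlap = len(record["symptoms"] & reference_keywords)
        if overlap:
            scores[record["category"]] += overlap
    return scores.most_common(1)[0][0] if scores else None
```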

For example, the labeling results may be as follows: for preselected fault log data set 1, whose reference keyword set (extract_msg) may be "is marked as crashed ……", the determined preselected fault category label (problemlabel) is [ERROR_0]: insufficient disk space or system disk damage; for preselected fault log data set 2, whose reference keyword set (extract_msg) may be "Couldn't load plugin ……", the determined preselected fault category label (problemlabel) is [ERROR_1]: a specific plug-in cannot be found; and so on.

According to the embodiment of the disclosure, the fault category of the target database is determined by combining the historical operation and maintenance data of the faulty database with the reference keyword set. Building on historical experience while also using the characteristics reflected in the log data means that the judgment does not rest on historical faults alone, which makes the fault category more accurate.

According to the embodiment of the present disclosure, the log template tree that outputs the fault category from the fault log data to be tested is obtained by correcting an initial log template tree, which includes a plurality of initial keyword sets and fault category labels respectively corresponding to them.

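Specifically, correcting the initial log template tree to obtain the log template tree includes: determining invalid keywords according to the initial log template tree and the target fault log data; and removing the invalid keywords from the initial keyword sets to obtain the log template tree.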

FIG. 6 schematically illustrates an operational flow diagram for determining invalid keywords associated with an initial log template tree, according to an embodiment of the present disclosure.

As shown in Fig. 6, the invalid keywords may be determined from the initial log template tree and the target fault log data as follows: using the constructed initial log template tree (built from target fault log data of a preselected fault level within a preset historical time period), the initial keyword sets of the tree are subtracted from the target fault log data to obtain initial domain invalid words, and the valid words among these are then filtered out to obtain the final invalid keywords.
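The correction step can be sketched as set arithmetic. Subtracting each cluster's keyword set from the words of that cluster's log lines, pooling the leftovers, and filtering them against a hand-maintained `known_valid` whitelist is one reading of the Fig. 6 flow and an assumption, as are the function names. Pooling across clusters is what lets the correction take effect: a word such as a schema name may be a leftover in one cluster yet have slipped into another cluster's keyword set.

```python
import re
from typing import Dict, List, Set

WORD = re.compile(r"[a-z][a-z_]+")


def find_invalid_keywords(clusters: Dict[int, List[str]],
                          keyword_sets: Dict[int, Set[str]],
                          known_valid: Set[str]) -> Set[str]:
    """Pool the words left over after subtracting each cluster's keyword set."""
    leftovers: Set[str] = set()
    for cluster_id, lines in clusters.items():
        for line in lines:
            leftovers |= set(WORD.findall(line.lower())) - keyword_sets[cluster_id]
    # Leftovers are the initial domain invalid words; filter the valid ones back out.
    return leftovers - known_valid


def correct_keyword_sets(keyword_sets: Dict[int, Set[str]],
                         invalid: Set[str]) -> Dict[int, Set[str]]:
    """Return new keyword sets with every invalid keyword removed."""
    return {cid: kws - invalid for cid, kws in keyword_sets.items()}
```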

According to the embodiment of the disclosure, constructing domain-specific invalid keywords and using them to correct the constructed log template tree lets the resulting tree produce more accurate results in text clustering computations, which further improves the accuracy of text classification.

Based on the fault processing method, the disclosure also provides a fault processing device. The apparatus will be described in detail below with reference to fig. 7.

Fig. 7 schematically shows a block diagram of a fault handling device according to an embodiment of the present disclosure.

As shown in fig. 7, the fault handling apparatus includes a receiving module 701, an input module 702, an output module 703, and a first processing module 704.

The receiving module 701 is configured to receive fault log data to be detected from a database to be detected;

an input module 702, configured to input the fault log data to be tested into a log template tree, where the log template tree includes a plurality of keyword sets and fault category labels respectively corresponding to the keyword sets, and the keyword sets are obtained by processing target fault log data of a preselected fault level within a preset historical time period;

an output module 703, configured to output the fault category of the database to be tested by using the log template tree; and

the first processing module 704 is configured to perform fault processing on the database to be tested according to the fault category of the database to be tested.

According to the embodiment of the disclosure, with the receiving module 701, the input module 702, and the output module 703, real-time fault prediction is achieved simply by feeding online real-time data into the log template tree. This automates the whole flow from fault identification to fault processing, shortens fault processing time, frees up manual effort, and makes processing more timely than identification based on manual experience. Moreover, because the log template tree is constructed from a large amount of historical log data, it models operation and maintenance experience and covers a wide range of fault types; compared with experience-based identification, it removes the limits that human judgment places on fault cognition and predicts faults of unknown types more accurately.

According to the embodiment of the disclosure, the apparatus further includes a second processing module configured to process the target fault log data of the preselected fault level within the preset historical time period to obtain the plurality of keyword sets; the second processing module includes a clustering unit and a construction unit.

The clustering unit is configured to cluster the target fault log data by using a predetermined clustering algorithm to obtain a preset number of log data sets; the construction unit is configured to construct a plurality of keyword sets associated with the preset number of log data sets.

According to the embodiment of the disclosure, the clustering unit comprises a first determining subunit, a first calculating subunit and a clustering subunit.

The first determining subunit is used for determining a plurality of feature words from the target fault log data; the first calculating subunit is used for calculating the weights of the plurality of characteristic words; and the clustering subunit is used for clustering the target fault log data by using a preset clustering algorithm based on the weights of the plurality of characteristic words to obtain a preset number of log data sets.

According to an embodiment of the present disclosure, the construction unit includes a second determining subunit, a removing subunit, a second calculating subunit, and a third determining subunit.

The second determining subunit is configured to determine a plurality of initial word sets from the preset number of log data sets; the removing subunit is configured to remove stop words from the initial word sets to obtain a plurality of reselected word sets; the second calculating subunit is configured to calculate the word frequency of the reselected words in each reselected word set; and the third determining subunit is configured to determine the plurality of keyword sets according to those word frequencies.

According to an embodiment of the present disclosure, the third processing module includes a first determining unit, a second determining unit, and a third determining unit.

The first determining unit is configured to determine a plurality of preselected fault log data sets from the preset number of log data sets, each preselected fault log data set being associated with a keyword set; the second determining unit is configured to determine the preselected fault category labels respectively corresponding to the preselected fault log data sets; and the third determining unit is configured to use each preselected fault category label as the fault category label of the keyword set associated with the corresponding preselected fault log data set.

According to an embodiment of the present disclosure, the second determining unit includes a fourth determining subunit, an acquisition subunit, a fifth determining subunit, and a sixth determining subunit.

The fourth determining subunit is configured to determine the identification information of the target databases corresponding to the preselected fault log data sets and the reference keyword sets corresponding to the preselected fault log data sets; the acquisition subunit is configured to acquire target historical operation and maintenance data of the target database; the fifth determining subunit is configured to determine the fault category of the target database according to the target historical operation and maintenance data and the reference keyword set; and the sixth determining subunit is configured to determine the preselected fault category labels respectively corresponding to the preselected fault log data sets according to the fault category of the target database.

According to an embodiment of the present disclosure, the fourth processing module includes a fourth determining unit and a removing unit.

The fourth determining unit is used for determining invalid keywords according to the initial log template tree and the target fault log data; and the removing unit is used for removing the invalid keywords in the initial keyword set to obtain the log template tree.

According to an embodiment of the present disclosure, the fifth processing module includes a screening unit configured to select the error log data and deadlock log data in the original fault log data as the target fault log data.

According to the embodiment of the present disclosure, any plurality of the receiving module 701, the input module 702, the output module 703 and the first processing module 704 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the receiving module 701, the input module 702, the output module 703 and the first processing module 704 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented in any one of three implementations of software, hardware and firmware, or in a suitable combination of any several of them. Alternatively, at least one of the receiving module 701, the input module 702, the output module 703 and the first processing module 704 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.

As shown in Fig. 8, an electronic device 800 according to an embodiment of the present disclosure includes a processor 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The processor 801 may include, for example, a general purpose microprocessor (e.g., CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., Application Specific Integrated Circuit (ASIC)), among others. The processor 801 may also include onboard memory for caching purposes. The processor 801 may include a single processing unit or multiple processing units for performing different actions of the method flows according to embodiments of the present disclosure.

In the RAM 803, various programs and data necessary for the operation of the electronic apparatus 800 are stored. The processor 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. The processor 801 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 802 and/or RAM 803. Note that the programs may also be stored in one or more memories other than the ROM 802 and RAM 803. The processor 801 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the present disclosure, the electronic device 800 may also include an input/output (I/O) interface 805, which is also connected to the bus 804. The electronic device 800 may also include one or more of the following components connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 802 and/or RAM 803 described above and/or one or more memories other than the ROM 802 and RAM 803.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the fault handling method provided by the embodiment of the disclosure.

The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 801. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed through the communication section 809, and/or installed from the removable medium 811. The program code contained in the computer program may be transmitted using any suitable network medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.

In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. The computer program, when executed by the processor 801, performs the above-described functions defined in the system of the embodiments of the present disclosure. The above-described systems, devices, apparatuses, modules, units, etc. may be implemented by computer program modules according to embodiments of the present disclosure.

In accordance with embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or associated in various ways, even if such combinations or associations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or associated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and their equivalents. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to fall within the scope of the present disclosure.

Claims

1. A fault handling method, comprising:

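receiving fault log data to be tested from a database to be tested;

inputting the fault log data to be tested into a log template tree, wherein the log template tree comprises a plurality of keyword sets and fault category labels respectively corresponding to the plurality of keyword sets, and the plurality of keyword sets are obtained by processing target fault log data of a preselected fault level in a preset historical time period;

outputting the fault category of the database to be tested by using the log template tree;

and carrying out fault processing on the database to be tested according to the fault category of the database to be tested.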
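The abstract states that the fault category is output by feeding the fault log data under test into a log template tree of keyword sets and fault category labels. The claims do not say how a log is matched against the tree, so the following Python sketch assumes a flattened tree (a plain list of keyword-set/label pairs) and picks the label whose keyword set is best covered by the words in the logs. The `TEMPLATES` contents, the label strings and the `min_coverage` cut-off are all hypothetical.

```python
import re

# Hypothetical flattened stand-in for the log template tree:
# each entry pairs a keyword set with its fault category label.
TEMPLATES: list[tuple[frozenset[str], str]] = [
    (frozenset({"deadlock", "found", "transaction"}), "lock contention"),
    (frozenset({"disk", "full"}), "storage exhausted"),
]


def tokenize(line: str) -> set[str]:
    """Lower-case a log line and return its alphanumeric/underscore words."""
    return set(re.findall(r"[a-z0-9_]+", line.lower()))


def classify(log_lines: list[str],
             templates: list[tuple[frozenset[str], str]],
             min_coverage: float = 0.5) -> str:
    """Return the label of the keyword set best covered by the logs under test.

    Coverage is the fraction of a keyword set's words found anywhere in the
    logs; min_coverage is an assumed cut-off below which "unknown" is returned.
    """
    words: set[str] = set()
    for line in log_lines:
        words |= tokenize(line)
    best_label, best_cov = "unknown", 0.0
    for keywords, label in templates:
        if not keywords:
            continue  # an empty keyword set cannot match anything
        cov = len(keywords & words) / len(keywords)
        if cov > best_cov:
            best_label, best_cov = label, cov
    return best_label if best_cov >= min_coverage else "unknown"


print(classify(["InnoDB: Deadlock found when trying to get lock; "
                "try restarting transaction"], TEMPLATES))  # → lock contention
```

A real tree would presumably branch on keywords to avoid scanning every entry; the flat list only keeps the matching rule easy to read.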

2. The method of claim 1, wherein processing target fault log data of a preselected fault level in a preset historical time period to obtain the plurality of keyword sets comprises:

clustering the target fault log data by using a preset clustering algorithm to obtain a preset number of log data sets;

and constructing a plurality of keyword sets associated with the preset number of log data sets.

3. The method of claim 2, wherein the clustering the target fault log data by using a preset clustering algorithm to obtain a preset number of log data sets comprises:

determining a plurality of feature words from the target fault log data;

calculating weights of the plurality of feature words;

and clustering the target fault log data by utilizing the preset clustering algorithm based on the weights of the plurality of feature words to obtain a preset number of log data sets.
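Claim 3 names neither the weighting scheme for the feature words nor the clustering algorithm. The sketch below assumes smoothed TF-IDF weights over the words of each log line and spherical k-means (cosine similarity) with a deterministic farthest-point initialization; both are swapped-in choices for illustration, and the log lines are made-up examples.

```python
import math
import re
from collections import Counter


def normalize(vec: dict[str, float]) -> dict[str, float]:
    """Scale a sparse vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
    return {w: x / norm for w, x in vec.items()}


def tfidf_vectors(lines: list[str]) -> list[dict[str, float]]:
    """Weight each feature word of each log line by smoothed TF-IDF (assumed)."""
    docs = [Counter(re.findall(r"[a-z0-9_]+", line.lower())) for line in lines]
    n = len(docs)
    df = Counter(w for doc in docs for w in doc)  # lines containing each word
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        vectors.append(normalize({
            w: (c / total) * (math.log((1 + n) / (1 + df[w])) + 1)
            for w, c in doc.items()
        }))
    return vectors


def dot(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two unit-length sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(x * b.get(w, 0.0) for w, x in a.items())


def spherical_kmeans(vectors: list[dict[str, float]], k: int,
                     iters: int = 20) -> list[int]:
    """Group log lines into k log data sets; returns one cluster index per line."""
    if not vectors:
        return []
    k = min(k, len(vectors))
    # Farthest-point start: first line, then the line least similar to all picks.
    chosen = [0]
    while len(chosen) < k:
        chosen.append(min(range(len(vectors)),
                          key=lambda i: max(dot(vectors[i], vectors[c]) for c in chosen)))
    centroids = [dict(vectors[i]) for i in chosen]
    assign: list[int] = []
    for _ in range(iters):
        new_assign = [max(range(k), key=lambda j: dot(v, centroids[j])) for v in vectors]
        if new_assign == assign:
            break  # assignments stable: converged
        assign = new_assign
        for j in range(k):
            total: Counter = Counter()
            for v, a in zip(vectors, assign):
                if a == j:
                    total.update(v)  # sum the member vectors
            if total:
                centroids[j] = normalize(dict(total))
    return assign


LINES = [
    "Deadlock found when trying to get lock",
    "Too many connections",
    "Deadlock found when trying to get lock on table orders",
    "Too many connections from host app",
]
print(spherical_kmeans(tfidf_vectors(LINES), k=2))  # → [0, 1, 0, 1]
```

Cosine similarity is used because log lines differ mostly in which words they contain rather than in length, so normalized vectors keep long and short messages of the same kind together.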

4. The method of claim 2, wherein the constructing the plurality of keyword sets associated with the preset number of log data sets comprises:

determining a plurality of initial word sets from the preset number of log data sets;

and determining the plurality of keyword sets according to the word frequency of the reselected words in the plurality of reselected word sets.
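This section does not define how the reselected word sets of claim 4 are derived from the initial word sets. The sketch below therefore assumes, hypothetically, that reselection means dropping stop words, and that a reselected word becomes a keyword when it occurs in at least `min_doc_ratio` of a cluster's log lines; the stop-word list and the 0.8 ratio are illustrative only.

```python
import re
from collections import Counter

# Hypothetical stop words standing in for the unspecified reselection step.
STOP_WORDS = frozenset({"a", "an", "the", "to", "on", "of", "for", "from", "when", "trying"})


def keyword_set(cluster_lines: list[str], min_doc_ratio: float = 0.8) -> frozenset[str]:
    """Build one keyword set for one clustered log data set."""
    # Initial word sets: the distinct words of each log line in the cluster.
    initial = [set(re.findall(r"[a-z0-9_]+", line.lower())) for line in cluster_lines]
    # Reselected word sets: initial sets minus stop words (assumed criterion).
    reselected = [words - STOP_WORDS for words in initial]
    # Word frequency: number of log lines whose reselected set contains the word.
    freq = Counter(w for words in reselected for w in words)
    threshold = min_doc_ratio * len(cluster_lines)
    return frozenset(w for w, c in freq.items() if c >= threshold)


CLUSTER = [
    "Deadlock found when trying to get lock",
    "Deadlock found when trying to get lock on table orders",
]
print(sorted(keyword_set(CLUSTER)))  # → ['deadlock', 'found', 'get', 'lock']
```

Counting each word once per log line (document frequency) keeps one very repetitive line from dominating the keyword set of its cluster.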

5. The method according to claim 2, wherein the fault category labels respectively corresponding to the plurality of keyword sets are obtained by processing the preset number of log data sets.

6. The method of claim 5, wherein processing the preset number of log data sets to obtain the fault category labels respectively corresponding to the plurality of keyword sets comprises:

determining a plurality of preselected fault log data sets according to the preset number of log data sets, wherein each preselected fault log data set is associated with one keyword set;

determining preselected fault category labels corresponding to the preselected fault log data sets respectively;

and using each of the plurality of preselected fault category labels as the fault category label corresponding to the keyword set associated with the respective preselected fault log data set.

7. The method of claim 6, wherein said determining preselected fault category labels corresponding to said plurality of preselected fault log data sets, respectively, comprises:

acquiring target historical operation and maintenance data of the target database;

and determining, according to the fault category of the target database, the preselected fault category labels respectively corresponding to the plurality of preselected fault log data sets.
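Claim 7 labels each preselected fault log data set from historical operation and maintenance (O&M) data. One plausible reading, sketched below, is a majority vote: each log entry votes for the fault category of every O&M record whose fault window contains its timestamp. The `OpsRecord` shape, the time-window rule, the `default` label and the example records are assumptions, not details from the disclosure.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OpsRecord:
    """Hypothetical historical O&M record: a fault window and its diagnosed category."""
    start: datetime
    end: datetime
    category: str


def label_log_set(log_times: list[datetime], records: list[OpsRecord],
                  default: str = "unlabeled") -> str:
    """Majority vote: each log entry votes for every O&M window containing it."""
    votes: Counter = Counter()
    for t in log_times:
        for r in records:
            if r.start <= t <= r.end:
                votes[r.category] += 1
    return votes.most_common(1)[0][0] if votes else default


RECORDS = [
    OpsRecord(datetime(2022, 3, 1, 10), datetime(2022, 3, 1, 11), "lock contention"),
    OpsRecord(datetime(2022, 3, 2, 9), datetime(2022, 3, 2, 9, 30), "connection exhaustion"),
]
LOG_TIMES = [datetime(2022, 3, 1, 10, 5), datetime(2022, 3, 1, 10, 40),
             datetime(2022, 3, 2, 9, 10)]
print(label_log_set(LOG_TIMES, RECORDS))  # → lock contention
```

Under claim 6, the label returned for a log data set would then be attached to the keyword set built from that same log data set.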

8. The method according to claim 1, wherein the log template tree is obtained by correcting an initial log template tree, wherein the initial log template tree comprises a plurality of initial keyword sets and fault category labels respectively corresponding to the plurality of initial keyword sets;

the step of correcting the initial log template tree to obtain the log template tree comprises the following steps:

and removing the invalid keywords in the initial keyword set to obtain the log template tree.
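Claim 8 does not say which keywords count as invalid. The sketch below assumes two hypothetical rules: tokens that are variable parts of a log line (decimal numbers, hex addresses, IPv4 addresses) are invalid, and so are words shared by every keyword set, since they cannot separate one fault category from another. The example keyword sets and labels are made up.

```python
import re

# Hypothetical "invalid" tokens: variable parts of a log line with no fault meaning.
INVALID = re.compile(r"^(\d+|0x[0-9a-f]+|\d{1,3}(\.\d{1,3}){3})$", re.IGNORECASE)


def correct_tree(initial_tree: list[tuple[frozenset[str], str]]
                 ) -> list[tuple[frozenset[str], str]]:
    """Remove invalid keywords from every initial keyword set, keeping the labels."""
    sets = [keywords for keywords, _ in initial_tree]
    # Assumed rule: a word found in every keyword set cannot discriminate
    # between fault categories, so it is treated as invalid as well.
    common = frozenset.intersection(*sets) if len(sets) > 1 else frozenset()
    return [
        (frozenset(w for w in keywords if not INVALID.match(w) and w not in common), label)
        for keywords, label in initial_tree
    ]


INITIAL_TREE = [
    (frozenset({"deadlock", "lock", "1213", "0x7f3a", "mysql"}), "lock contention"),
    (frozenset({"too", "many", "connections", "1040", "mysql"}), "connection exhaustion"),
]
for keywords, label in correct_tree(INITIAL_TREE):
    print(label, sorted(keywords))
# → lock contention ['deadlock', 'lock']
# → connection exhaustion ['connections', 'many', 'too']
```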

9. The method of claim 1, wherein the target fault log data is obtained by screening original fault log data, and the original fault log data is classified by fault level into error log data, deadlock log data, alarm log data and message log data;

the step of screening the original fault log data to obtain the target fault log data comprises:

and screening out the error log data and the deadlock log data from the original fault log data as the target fault log data.
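The screening of claim 9 reduces to a level filter over the raw logs. The sketch below assumes each raw log line carries its level as a bracketed tag such as `[ERROR]`; the four tag names mapped to the claim's error, deadlock, alarm and message classes, and the sample lines, are hypothetical.

```python
import re

# Hypothetical level tags mapped to the four fault levels named in claim 9.
LEVELS = {"ERROR": "error", "DEADLOCK": "deadlock", "WARNING": "alarm", "NOTE": "message"}
TARGET_LEVELS = {"error", "deadlock"}
LEVEL_TAG = re.compile(r"\[([A-Za-z]+)\]")


def screen(raw_lines: list[str]) -> list[str]:
    """Keep only error and deadlock lines as target fault log data."""
    kept = []
    for line in raw_lines:
        m = LEVEL_TAG.search(line)
        # Lines without a recognised tag map to None and are dropped.
        if m and LEVELS.get(m.group(1).upper()) in TARGET_LEVELS:
            kept.append(line)
    return kept


RAW = [
    "2022-03-01T10:05:00 [ERROR] Too many connections",
    "2022-03-01T10:06:00 [Warning] Aborted connection 42",
    "2022-03-01T10:07:00 [DEADLOCK] Deadlock found when trying to get lock",
    "2022-03-01T10:08:00 [Note] Server started",
]
print(len(screen(RAW)))  # → 2
```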

10. A fault handling device comprising:

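a receiving module, used for receiving fault log data to be tested from a database to be tested;

an input module, used for inputting the fault log data to be tested into a log template tree, wherein the log template tree comprises a plurality of keyword sets and fault category labels respectively corresponding to the plurality of keyword sets, and the plurality of keyword sets are obtained by processing target fault log data of a preselected fault level in a preset historical time period;

an output module, used for outputting the fault category of the database to be tested by using the log template tree; and

a processing module, used for carrying out fault processing on the database to be tested according to the fault category of the database to be tested.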

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 9.

12. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 9.

13. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 9.