CN117389827A - Fault locating method, device, electronic equipment and computer readable medium - Google Patents
Info
- Publication number
- CN117389827A CN117389827A CN202311318096.1A CN202311318096A CN117389827A CN 117389827 A CN117389827 A CN 117389827A CN 202311318096 A CN202311318096 A CN 202311318096A CN 117389827 A CN117389827 A CN 117389827A
- Authority
- CN
- China
- Prior art keywords
- text
- keywords
- keyword
- cluster
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G06F11/3072—Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a fault locating method, a fault locating device, an electronic device and a computer readable medium, relating to the technical field of big data processing and mining. One embodiment of the method comprises the following steps: calculating a first probability value of each word segment, and sorting the word segments based on their first probability values to obtain the first keywords corresponding to each text; classifying the texts based on their first keywords to obtain first text clusters and the first keywords corresponding to each first text cluster; clustering the texts to obtain second text clusters; calculating a second probability value of each word segment in each second text cluster, and sorting the word segments based on their second probability values to obtain the second keywords corresponding to each second text cluster; and constructing a fault model based on the first keywords and the second keywords. This embodiment addresses the technical problems of high manual maintenance cost and inaccurate fault locating.
Description
Technical Field
The present invention relates to the field of big data processing and mining technologies, and in particular, to a fault locating method, a fault locating device, an electronic device, and a computer readable medium.
Background
Currently, with the wide application of new technologies such as virtualization and cloud computing, the scale of IT infrastructure in enterprise data centers has multiplied. As computer hardware and software continue to grow in scale, computer faults occur ever more frequently, and front-line operation and maintenance personnel need more specialized and powerful operation and maintenance tools. In daily data center operation and maintenance, faults in software and hardware are usually discovered through a basic monitoring system and an application monitoring system: when software or hardware behaves abnormally, a monitored index exceeds a preset threshold, an alarm is triggered, and an operation and maintenance expert is notified to resolve the fault.
In the overall process of "fault discovery - fault identification - fault handling", a fault model can first be set to judge faults automatically, and faults can then be handled automatically, so that faults are recovered safely.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
the setting of the fault model depends entirely on the accumulated experience of operation and maintenance experts, so the manual maintenance cost is high; moreover, a manually built fault model can hardly cover the ever-growing range of software and hardware faults comprehensively, so fault locating is inaccurate.
Disclosure of Invention
In view of the above, embodiments of the present invention provide a fault locating method, device, electronic apparatus, and computer readable medium, so as to solve the technical problems of high manual maintenance cost and inaccurate fault locating.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a fault locating method including:
calculating a first probability value of each word segment, and sorting the word segments based on their first probability values to obtain a first keyword corresponding to each text; classifying each text based on the first keywords corresponding to each text, so as to obtain each first text cluster and the first keywords corresponding to each first text cluster;
clustering the texts to obtain second text clusters; calculating a second probability value of each word in each second text cluster, and sorting the words based on the second probability values of the words to obtain second keywords corresponding to the second text clusters;
constructing a fault model based on the first keyword and the second keyword;
and receiving alarm information in real time, responding to the alarm information hitting the fault model, and generating a fault instance according to the alarm information.
Optionally, calculating a first probability value of each word segment, and sorting the words based on the first probability value of each word segment to obtain first keywords corresponding to each text, including:
for each word segment, respectively calculating a first occurrence probability of the word segment in a text where the word segment is located and a second occurrence probability of the word segment in all texts, and dividing the first occurrence probability by the second occurrence probability to obtain a first probability value of the word segment;
and arranging the segmented words in a descending order based on the first probability value of each segmented word, so that n segmented words with the top order are screened out to be used as first keywords corresponding to the text.
Optionally, classifying each text based on the first keyword corresponding to each text, so as to obtain each first text cluster and the first keyword corresponding to each first text cluster, including:
comparing the first keywords corresponding to a first text with the first keywords corresponding to a second text, and if the number of keywords shared by the first text and the second text is greater than or equal to m, dividing the first text and the second text into the same text cluster, thereby obtaining a first text cluster and the first keywords corresponding to the first text cluster; wherein n is greater than or equal to m.
Optionally, clustering the texts to obtain second text clusters, including:
for each text, calculating a text vector of the text according to word vectors of the words in the text;
calculating the cosine similarity between the text vector and the cluster center of each second text cluster; judging whether the largest similarity exceeds a similarity threshold; if yes, dividing the text into the second text cluster closest to it; if not, creating a new second text cluster and dividing the text into the newly created second text cluster.
Optionally, calculating a second probability value of each word segment in each second text cluster, and sorting each word segment based on the second probability value of each word segment to obtain a second keyword corresponding to each second text cluster, including:
for each word segment, calculating a third occurrence probability of the word segment in a second text cluster where the word segment is located and a second occurrence probability of the word segment in all texts respectively, and dividing the third occurrence probability by the second occurrence probability to obtain a second probability value of the word segment;
and arranging the segmented words in a descending order based on the second probability value of the segmented words, so that t segmented words with the top ranking are screened out and used as second keywords corresponding to the second text cluster.
Optionally, constructing a fault model based on the first keyword and the second keyword includes:
combining and de-duplicating the first keyword and the second keyword, thereby obtaining a model keyword;
constructing a keyword strategy based on the model keywords;
and constructing a fault model according to the keyword strategy, the time strategy, the space strategy and the disposal strategy.
Optionally, merging and deduplicating the first keyword and the second keyword, thereby obtaining a model keyword, including:
comparing the first keywords with the second keywords; if the number of keywords shared by the first keywords and the second keywords is greater than or equal to a number threshold, merging the first keywords and the second keywords, and de-duplicating the merged keywords to obtain the model keywords.
In addition, according to another aspect of the embodiment of the present invention, there is provided a fault locating device including:
the first extraction module is used for calculating a first probability value of each word segment, and sorting the word segments based on their first probability values to obtain a first keyword corresponding to each text; classifying each text based on the first keywords corresponding to each text, so as to obtain each first text cluster and the first keywords corresponding to each first text cluster;
The second extraction module is used for clustering the texts so as to obtain second text clusters; calculating a second probability value of each word in each second text cluster, and sorting the words based on the second probability values of the words to obtain second keywords corresponding to the second text clusters;
the construction module is used for constructing a fault model based on the first keyword and the second keyword;
and the positioning module is used for receiving the alarm information in real time, responding to the alarm information hitting the fault model, and generating a fault instance according to the alarm information.
Optionally, the first extraction module is further configured to:
for each word segment, respectively calculating a first occurrence probability of the word segment in a text where the word segment is located and a second occurrence probability of the word segment in all texts, and dividing the first occurrence probability by the second occurrence probability to obtain a first probability value of the word segment;
and arranging the segmented words in a descending order based on the first probability value of each segmented word, so that n segmented words with the top order are screened out to be used as first keywords corresponding to the text.
Optionally, the first extraction module is further configured to:
comparing the first keywords corresponding to a first text with the first keywords corresponding to a second text, and if the number of keywords shared by the first text and the second text is greater than or equal to m, dividing the first text and the second text into the same text cluster, thereby obtaining a first text cluster and the first keywords corresponding to the first text cluster; wherein n is greater than or equal to m.
Optionally, the second extraction module is further configured to:
for each text, calculating a text vector of the text according to word vectors of the words in the text;
calculating the cosine similarity between the text vector and the cluster center of each second text cluster; judging whether the largest similarity exceeds a similarity threshold; if yes, dividing the text into the second text cluster closest to it; if not, creating a new second text cluster and dividing the text into the newly created second text cluster.
Optionally, the second extraction module is further configured to:
for each word segment, calculating a third occurrence probability of the word segment in a second text cluster where the word segment is located and a second occurrence probability of the word segment in all texts respectively, and dividing the third occurrence probability by the second occurrence probability to obtain a second probability value of the word segment;
and arranging the word segments in descending order based on their second probability values, so that the top-t word segments are screened out as the second keywords corresponding to the second text cluster.
Optionally, the building module is further configured to:
combining and de-duplicating the first keyword and the second keyword, thereby obtaining a model keyword;
constructing a keyword strategy based on the model keywords;
and constructing a fault model according to the keyword strategy, the time strategy, the space strategy and the disposal strategy.
Optionally, the building module is further configured to:
comparing the first keywords with the second keywords; if the number of keywords shared by the first keywords and the second keywords is greater than or equal to a number threshold, merging the first keywords and the second keywords, and de-duplicating the merged keywords to obtain the model keywords.
According to another aspect of an embodiment of the present invention, there is also provided an electronic device including:
one or more processors;
storage means for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any of the embodiments described above.
According to another aspect of an embodiment of the present invention, there is also provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of the embodiments described above.
According to another aspect of embodiments of the present invention, there is also provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to any of the embodiments described above.
One embodiment of the above invention has the following advantages or benefits: a first probability value of each word segment is calculated, and the word segments are sorted based on their first probability values to obtain the first keywords corresponding to each text; the texts are classified based on their first keywords to obtain the first text clusters and their corresponding first keywords; the texts are also clustered to obtain the second text clusters, a second probability value of each word segment in each second text cluster is calculated, and the word segments are sorted based on their second probability values to obtain the second keywords corresponding to each second text cluster. This overcomes the technical problems of high manual maintenance cost and inaccurate fault locating in the prior art. In the embodiment of the invention, keywords are extracted by combining probability statistics with vector-based clustering, and the fault model is constructed from the extracted keywords; this not only saves manpower and maintenance cost, but also extracts keywords accurately and quickly, so that the fault model can locate faults accurately.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a flow chart of a fault localization method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a fault localization method according to one referenceable embodiment of the invention;
FIG. 3 is a flow chart of a fault localization method according to another referenceable embodiment of the invention;
FIG. 4 is a flow chart of a fault localization method according to yet another referenceable embodiment of the invention;
FIG. 5 is a schematic diagram of a fault locating device according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
fig. 7 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the invention, the aspects of acquisition, analysis, use, transmission, storage and the like of the related user personal information all meet the requirements of related laws and regulations, are used for legal and reasonable purposes, are not shared, leaked or sold outside the aspects of legal use and the like, and are subjected to supervision and management of a supervision department. Necessary measures should be taken for the personal information of the user to prevent illegal access to such personal information data, ensure that personnel having access to the personal information data comply with the regulations of the relevant laws and regulations, and ensure the personal information of the user. Once these user personal information data are no longer needed, the risk should be minimized by limiting or even prohibiting the data collection and/or deletion.
User privacy is protected, when applicable, by de-identifying the data, including in some related applications, such as by removing specific identifiers (e.g., name, account number, age, date of birth, etc.), controlling the amount or specificity of stored data, controlling how the data is stored, and/or other methods.
Fig. 1 is a flow chart of a fault localization method according to an embodiment of the present invention. As an embodiment of the present invention, as shown in fig. 1, the fault locating method may include:
step 101, calculating a first probability value of each word segment, and sorting the words segments based on the first probability value of each word segment to obtain first keywords corresponding to each text; and classifying each text based on the first keywords corresponding to each text, thereby obtaining each first text cluster and the first keywords corresponding to each first text cluster.
In the embodiment of the invention, keywords are extracted from text in two ways, namely step 101 and step 102: the first keywords are extracted from the text in step 101, and the second keywords are extracted from the text in step 102. Each text may include at least one of: event ticket data, alarm data, and knowledge base data. Optionally, open-source big data components (e.g., HDFS, Kudu, Hive, Elasticsearch) may be used to store the massive operation and maintenance data, which is processed as text in steps 101 and 102. Because the data volume is large, hot/cold data separation and indexing can be considered; if data is missing or disordered, data cleaning may be required, and the operation and maintenance big data platform provides methods for cleaning the data.
The calculation in the invention requires a large amount of text data, mainly: event ticket data (detailed descriptions of faults, chiefly when the fault occurred, the abnormal phenomenon, how the fault was handled, the result, and when it was recovered), alarm information (including summary fields, chiefly the time of the alarm, the system in which it occurred, the deployment unit, the specific machine, and the reason for the alarm, such as CPU, memory or disk), and knowledge base data (operation and maintenance knowledge articles).
In step 101, a first probability value of each word segment is calculated; the word segments are then sorted in descending (or ascending) order to screen out the first keywords of each text; finally, the texts are classified according to their first keywords, so as to obtain the first text clusters and the first keywords corresponding to each first text cluster.
Optionally, calculating a first probability value of each word segment, and sorting the words based on the first probability value of each word segment to obtain first keywords corresponding to each text, including: for each word segment, respectively calculating a first occurrence probability of the word segment in a text where the word segment is located and a second occurrence probability of the word segment in all texts, and dividing the first occurrence probability by the second occurrence probability to obtain a first probability value of the word segment; and arranging the segmented words in a descending order based on the first probability value of each segmented word, so that n segmented words with the top order are screened out to be used as first keywords corresponding to the text.
Performing word segmentation on massive text data acquired from an operation and maintenance big data platform, counting the first occurrence probability of each word segment in the text where the word segment is located and the second occurrence probability of the word segment in all texts, and calculating the first probability value of the word segment by adopting a formula (I):
T(w) = C(w) × A(w) / B(w)    formula (I)

wherein w denotes a word segment, A(w) denotes the first occurrence probability of the word segment in the text in which it is located, B(w) denotes the second occurrence probability of the word segment in all texts, C(w) denotes the weighting coefficient of the word segment, and T(w) denotes the first probability value of the word segment.
Alternatively, each text may be input into the probability statistics tool one by one to obtain the first occurrence probability of each word segment in that text. Optionally, the text may be segmented using the open-source jieba or HanLP segmenters; the total number of word segments in the text and the count of each word segment are counted, and dividing the count of a word segment by the total number of word segments in the text gives the first occurrence probability A(w) of that word segment in the text.
Similarly, the second occurrence probability B(w) of a word segment in all texts is obtained by dividing its count by the total number of word segments across all texts. B(w) is usually treated as a fixed quantity and may be recalculated, for example, every 3 months or half a year.
T(w) represents the probability value of the word segment w: the larger the value, the more important w is and the more likely it is to be a keyword. Intuitively, when a word segment w occurs with low probability across the massive corpus (smaller B(w)) but with high probability in a particular text (larger A(w)), w is more distinctive of that text (larger T(w)), and is therefore more likely to be a keyword.
In addition, a weighting coefficient C(w) can be set for each word segment; the default weighting coefficient is 1.0. Operation and maintenance personnel can adjust the weighting coefficients of individual word segments as needed; for example, important words such as "disk", "CPU" and "memory" may be weighted 1.2 or 1.5. This makes it convenient to adjust the results and adapt to different service requirements.
For each text, formula (I) is used to calculate the first probability value of each word segment in the text, giving a series of values T(w1), T(w2), ..., T(wn), where the text Ctx consists of the word segments w1, w2, ..., wn. T(w1), T(w2), ..., T(wn) are sorted in descending order, and the top n word segments are taken as the first keywords of the text Ctx.
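The keyword-extraction step above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the corpus, the weighting table, and the function name are assumptions, and the texts are assumed to be pre-segmented into word lists (e.g., by jieba).

```python
from collections import Counter

def first_keywords(texts, n=5, weights=None):
    """Top-n keywords per text via T(w) = C(w) * A(w) / B(w) (illustrative sketch)."""
    weights = weights or {}
    # B(w): occurrence probability of each word segment over all texts
    all_words = [w for t in texts for w in t]
    total = len(all_words)
    b = {w: c / total for w, c in Counter(all_words).items()}
    result = []
    for t in texts:
        scores = {}
        for w, c in Counter(t).items():
            a = c / len(t)                    # A(w): probability within this text
            cw = weights.get(w, 1.0)          # C(w): weighting coefficient, default 1.0
            scores[w] = cw * a / b[w]         # T(w) = C(w) * A(w) / B(w)
        # descending order by T(w); keep the top-n word segments
        result.append([w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:n]])
    return result

texts = [["disk", "full", "alarm"], ["cpu", "high", "alarm"], ["disk", "io", "slow"]]
print(first_keywords(texts, n=2))
```

Raising a word's weight (e.g., `weights={"disk": 1.5}`) boosts its T(w) and can promote it into the top-n, matching the adjustment mechanism described above.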
Optionally, classifying each text based on the first keyword corresponding to each text, so as to obtain each first text cluster and the first keyword corresponding to each first text cluster, includes: comparing the first keywords corresponding to a first text with the first keywords corresponding to a second text, and if the number of keywords shared by the first text and the second text is greater than or equal to m, dividing the first text and the second text into the same text cluster, thereby obtaining a first text cluster and the first keywords corresponding to the first text cluster; wherein n is greater than or equal to m.
For example, suppose the keywords corresponding to text A are word A1, word A2, word A3, word A4 and word A5, and the keywords corresponding to text B are word B1, word B2, word B3, word B4 and word B5. If the number of words shared between the two keyword lists is greater than or equal to 4 (for instance, when all 5 words are the same), text A and text B are divided into the same text cluster.
It should be noted that the sizes of n and m may be set according to actual needs, which is not limited in the embodiment of the present invention.
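The pairwise comparison rule above can be sketched as a greedy grouping pass over many texts; a minimal illustration in which the function name and cluster representation are hypothetical:

```python
def cluster_by_keywords(text_keywords, m=4):
    """Greedy grouping: a text joins the first cluster whose representative
    keyword set shares at least m words with its own; otherwise it starts
    a new cluster. Returns (keyword set, member indices) pairs."""
    clusters = []
    for idx, keys in enumerate(text_keywords):
        ks = set(keys)
        for rep, members in clusters:
            if len(ks & rep) >= m:
                members.append(idx)
                break
        else:
            clusters.append((ks, [idx]))
    return clusters
```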
In step 101, texts whose keywords are identical or mostly identical are divided into the same text cluster, so as to obtain a plurality of first text clusters, each with its corresponding first keywords. It should be noted that the first keywords corresponding to a first text cluster may be the keywords shared by all texts in the cluster, or the deduplicated union of the keywords of all texts in the cluster.
The calculation method of step 101 has the following advantages: it is simple, intuitive, and expressive, and its results are easy to understand and interpret; it requires very little computing resource, achieving the goal of rapid computation; and the probability value of a word segment can be adjusted as needed through the weighting coefficient, changing the ranking of the keywords.
Step 102, clustering the texts to obtain second text clusters; and calculating a second probability value of each word in each second text cluster, and sorting the words based on the second probability values of the words to obtain second keywords corresponding to the second text clusters.
In the step 102, the text is clustered, then the second probability value of each word in the text cluster is calculated, and finally the keywords corresponding to each text cluster are screened out based on the second probability value of each word.
Optionally, clustering the texts to obtain second text clusters includes: for each text, calculating a text vector of the text according to the word vectors of the words in the text; calculating the distance between the text vector and the cluster centre of each second text cluster using the included-angle cosine formula; judging whether the distance is greater than a distance threshold; if yes, dividing the text into the second text cluster with the minimum distance from the text; if not, creating a new second text cluster and dividing the text into the newly created cluster. Specifically, jieba or HanLP word segmentation may be performed on each text, and the segmented text is then input into a word2vec tool to output a word vector (for example, a 128-dimensional vector) for each word.
Vector: a mathematical representation of an object in a dimensional space; in the present invention, vectors are mainly used to process and express words and texts.
Assuming that the word vector of word segment w is V(w): for each text, the text is segmented and unimportant words such as stop words and auxiliary words are removed, obtaining a series of word segments so that the text is expressed as Ctx = w1 + w2 + … + wn. The vector of the text can then be calculated using formula (2):
V(Ctx) = C1×V(w1) + C2×V(w2) + … + Cn×V(wn)    formula (2)
where C1, C2, …, Cn represent weighting coefficients (the default coefficient is 1.0; the weighting coefficients of keywords specific to the more common failure scene types can be suitably increased, for example, "disk" for IO faults or "CPU" and "memory" for performance faults may be raised to 1.2 or 1.5, and so on). The addition in formula (2) is a linear addition of the weighted vectors.
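Formula (2) can be sketched as a weighted linear sum of word vectors; a minimal illustration assuming NumPy arrays as word vectors, with the `weights` map for up-weighted scene keywords being a hypothetical name:

```python
import numpy as np

def text_vector(words, word_vecs, weights=None):
    """V(Ctx) = C1*V(w1) + C2*V(w2) + ... + Cn*V(wn), with a default
    coefficient of 1.0 and optional per-word up-weighting (e.g. 'disk' -> 1.2)."""
    weights = weights or {}
    return sum(weights.get(w, 1.0) * word_vecs[w] for w in words)
```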
Batch texts input on the operation and maintenance big data platform are processed according to the above steps, obtaining a series of text vectors V(Ctx1), V(Ctx2), …, V(Ctxn).
Then, the text vectors are projected into a high-dimensional space one by one. To judge whether a new text vector can be merged into an existing text cluster, the distance between the cluster-centre vector of that text cluster and the new text vector is calculated using the included-angle cosine formula; if the distance is greater than a distance threshold (such as 0.8, 0.85, or 0.9), the text is divided into that text cluster. When the new text vector cannot be merged into any existing text cluster, it forms a new text cluster of its own (the first few projected text vectors almost all become new text clusters). The included-angle cosine formula is formula (3):
cos(A, B) = (A · B) / (|A| × |B|)    formula (3)
Wherein A represents a vector of a new text, B represents a cluster center of a text cluster, and both A and B are multidimensional vectors.
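The incremental assignment can be sketched as follows; a minimal illustration in which `assign` joins the closest cluster when the included-angle cosine value exceeds the threshold and otherwise starts a new cluster (function names are hypothetical; note the text's "distance" is cosine similarity, so a larger value means closer):

```python
import numpy as np

def cosine(a, b):
    """Included-angle cosine of A (the new text vector) and B (a cluster centre)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign(vec, centres, threshold=0.85):
    """Join the closest existing cluster if its cosine value exceeds the
    threshold; otherwise append vec as the centre of a new cluster.
    Returns the index of the chosen cluster."""
    if centres:
        sims = [cosine(vec, c) for c in centres]
        best = int(np.argmax(sims))
        if sims[best] > threshold:
            return best
    centres.append(vec)
    return len(centres) - 1
```

A fuller version would also recompute the joined cluster's centroid with formula (4) after each assignment.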
The cluster centre (centroid) of a text cluster is calculated using formula (4):
V(center) = [C1×V(Ctx1) + C2×V(Ctx2) + … + Cm×V(Ctxm)] / m    formula (4)
where V(Ctx) represents one text vector in the text cluster, the cluster containing m texts (vectors) in total, and C represents the weighting coefficient of a text (the weight of important texts is increased); the cluster centre of the text cluster is thus the linear sum of the weighted vectors divided by m.
The default weighting coefficient Cm is 1.0; it can be suitably increased when the event order is high, for example, set to 1.2 for level-6 faults and 1.5 for level-5 faults, and so on.
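Formula (4) can be sketched directly; a minimal illustration with NumPy arrays and an optional per-text weight list (default coefficient 1.0, as described above; the function name is hypothetical):

```python
import numpy as np

def cluster_centre(vectors, weights=None):
    """V(center) = [C1*V(Ctx1) + ... + Cm*V(Ctxm)] / m, default C = 1.0."""
    m = len(vectors)
    if weights is None:
        weights = [1.0] * m
    return sum(c * v for c, v in zip(weights, vectors)) / m
```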
Through the steps, the stored operation and maintenance texts on the operation and maintenance big data platform can be classified, and when new texts are generated, the new texts can also be added into corresponding text clusters.
Optionally, calculating a second probability value of each word segment in each second text cluster, and sorting each word segment based on the second probability value of each word segment to obtain a second keyword corresponding to each second text cluster, including: for each word segment, calculating a third occurrence probability of the word segment in a second text cluster where the word segment is located and a second occurrence probability of the word segment in all texts respectively, and dividing the third occurrence probability by the second occurrence probability to obtain a second probability value of the word segment; and arranging the segmented words in a descending order based on the second probability value of the segmented words, so that t segmented words with the top ranking are screened out and used as second keywords corresponding to the second text cluster.
After clustering the texts, extracting the keywords of each text cluster, and adopting a calculation mode similar to that of the step 101, calculating a second probability value of each word in each text cluster by adopting a formula (I), and then carrying out descending order arrangement on each word based on the second probability value of each word so as to screen t words with the top ranking as the keywords corresponding to the text cluster.
Step 101 has the advantages of simple calculation, per-text processing, and quick response; step 102 is more rigorous in its calculation owing to the clustering algorithm. The calculation mode of the invention avoids the heavy computation of machine-learning neural networks: the run-time calculations are simple and clear, and the larger part of the workload that can be computed in advance (such as the second occurrence probability of each word segment in all texts in step 101) is pre-computed on the operation and maintenance big data platform.
And step 103, constructing a fault model based on the first keyword and the second keyword.
After the keywords are extracted in the steps 101 and 102, the keywords are used as the keywords of the alarm rules in the modeling of the fault model.
Failure: the invention refers to software and hardware errors on (data center) computer equipment.
Fault model: aiming at faults occurring in operation and maintenance, a pertinently established model comprises the setting of various rules such as alarm keyword rule policies, time policy setting, space policies, disposal policies and the like.
The invention automatically builds the fault model, and the keywords are required to be extracted from the event list and the alarms and used as basic elements for building the alarm matching rule of the fault model. The invention provides two modes for extracting the keywords, and the keywords are used as the keywords of the fault model rule after being synthesized, so that the accuracy and the effect are improved.
Optionally, constructing a fault model based on the first keyword and the second keyword includes: combining and de-duplicating the first keyword and the second keyword, thereby obtaining a model keyword; constructing a keyword strategy based on the model keywords; and constructing a fault model according to the keyword strategy, the time strategy, the space strategy and the disposal strategy. The invention adopts two different technical means of probability statistics and vector cluster analysis to process the operation and maintenance text, and then carries out de-duplication processing, thereby ensuring the accuracy and reliability of the extracted keywords.
For example, a text cluster a and the keywords A1, A2 and A3 corresponding to the text cluster a are obtained in step 101, a text cluster B and the keywords B1, B2 and B3 corresponding to the text cluster B are obtained in step 102, the keywords A1, A2, A3, B1, B2 and B3 are combined and de-duplicated to obtain model keywords, then a keyword strategy is constructed based on the model keywords, and finally a fault model is constructed according to the keyword strategy, the time strategy, the space strategy and the disposal strategy.
It should be noted that, if the keywords of text cluster A and text cluster B cannot be merged, a keyword policy is built based on the keywords of text cluster A to construct one fault model, and a separate keyword policy is built based on the keywords of text cluster B to construct another fault model.
Optionally, merging and deduplicating the first keyword and the second keyword, thereby obtaining a model keyword, including: comparing the first keywords with the second keywords, and if the number of the first keywords which is the same as that of the second keywords is greater than or equal to a number threshold, merging the first keywords with the second keywords, and de-duplicating the merged keywords to obtain model keywords. In order to improve the fault location accuracy of the fault model, if most of the keywords of the text cluster A and the text cluster B are the same, for example, the number of the same keywords is greater than or equal to a number threshold, the keywords of the text cluster A and the keywords of the text cluster B are combined and de-duplicated.
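The merge-and-deduplicate rule can be sketched as follows; a minimal illustration in which the function name and the `threshold` default are hypothetical:

```python
def merge_keywords(first_keys, second_keys, threshold=2):
    """Merge the two keyword lists only when they share at least `threshold`
    words; deduplicate the union (first-seen order) to get the model keywords."""
    if len(set(first_keys) & set(second_keys)) >= threshold:
        return list(dict.fromkeys(first_keys + second_keys))
    return None  # too few shared keywords: keep the models separate
```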
Specifically, the definition of the fault model mainly contains 4 rules:
1) Keyword matching rules for alarms, for example:
Rule a: the alarm text contains "disk"
Rule b: the alarm text contains "full"
Rule c: the alarm text contains "centralized backup"
a AND b AND c (OR operators and bracketed logic expressions are also possible)
The rules are mainly used to filter alarms.
2) Time relationship: all alarms within a set number of minutes (e.g., 5) are aggregated into one fault instance.
3) Spatial relationship: alarms on the same system (or deployment unit, or server) are aggregated into one failure instance.
4) Disposal policy: the fault instance can recover from the fault by executing a series of scripts.
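The four rule groups can be sketched as one data structure; a minimal illustration whose field names and defaults are assumptions, not terms from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class FaultModel:
    keywords: list                               # keyword policy: a AND b AND c
    window_minutes: int = 5                      # time policy: aggregation window
    scope: str = "server"                        # space policy: same system/unit/server
    scripts: list = field(default_factory=list)  # disposal policy: recovery scripts

    def matches(self, alarm_text):
        # AND of all keyword rules; OR/bracket logic would need a richer expression
        return all(k in alarm_text for k in self.keywords)
```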
And 104, receiving alarm information in real time, responding to the alarm information hitting the fault model, and generating a fault instance according to the alarm information.
The fault model is a static definition, equivalent to a set of rules; a fault instance is its dynamic form, representing that the fault model has been triggered and activated after an alarm is received, forming a fault analysis and disposal description, i.e., the fault instance.
For example, suppose a fault model requires the alarm information to contain keywords such as "disk" and "full". When a received alarm contains both "disk" and "full", a fault instance is formed. The same machine may trigger the model at 10 o'clock and again at 11 o'clock; the fault instances at 10 and 11 are different (because the times differ), but the triggered fault model is the same (because the constraint keywords "disk" and "full" are the same).
As shown in fig. 2, when receiving an alarm message, judging whether the alarm message hits a certain fault model, if yes, generating a fault instance according to the fault model and the alarm message; if no hit occurs, step 101 (probability statistics) and step 102 (vector classification) are performed on the alert information as a new text, and then it is determined whether the alert information is divided into existing text clusters and whether the keywords corresponding to the text clusters need to be updated, if yes, the new keywords are extracted as model keywords, so that the existing fault model is updated or a new fault model is constructed.
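The Fig. 2 dispatch can be sketched as follows; a minimal illustration in which models are (name, keyword-list) pairs and `learn_new_text` is a hypothetical hook standing in for re-running steps 101 and 102 on the unmatched alarm:

```python
def on_alarm(alarm_text, models, learn_new_text):
    """On a hit, return a fault instance record; on a miss, hand the alarm
    text back to the keyword-extraction pipeline and return None."""
    for name, keywords in models:
        if all(k in alarm_text for k in keywords):       # model hit
            return {"model": name, "alarm": alarm_text}  # new fault instance
    learn_new_text(alarm_text)  # steps 101/102 re-run; models may then be updated
    return None
```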
According to the various embodiments described above, it can be seen that the first keyword corresponding to each text is obtained by calculating the first probability value of each word segment and sorting each word segment based on the first probability value of each word segment; classifying each text based on the first keywords corresponding to each text to obtain each first text cluster and the first keywords corresponding to each first text cluster, and clustering each text to obtain each second text cluster; and calculating a second probability value of each word in each second text cluster, and sequencing each word based on the second probability value of each word to obtain a second keyword corresponding to each second text cluster, thereby solving the technical problems of high manual maintenance cost and inaccurate fault location in the prior art. According to the embodiment of the invention, the keywords are extracted in a mode of combining probability statistics and vector classification, and the fault model is constructed based on the extracted keywords, so that not only are the manpower and the labor maintenance cost saved, but also the keywords can be extracted accurately and rapidly, and the fault model can be used for accurately positioning the fault.
Fig. 3 is a flow chart of a fault localization method according to another referenceable embodiment of the invention. As another embodiment of the present invention, as shown in fig. 3, the fault locating method may include:
step 301, for each word segment, calculating a first occurrence probability of the word segment in a text where the word segment is located and a second occurrence probability of the word segment in all texts, and dividing the first occurrence probability by the second occurrence probability to obtain a first probability value of the word segment.
And 302, arranging the segmented words in a descending order based on the first probability value of each segmented word, so as to screen n segmented words with the top order as first keywords corresponding to the text.
Step 303, comparing the first keywords corresponding to a first text with the first keywords corresponding to a second text, and if the number of identical keywords between the two is greater than or equal to m, dividing the first text and the second text into the same text cluster, thereby obtaining a first text cluster and the first keywords corresponding to the first text cluster; wherein n is greater than or equal to m.
Step 304, for each text, calculating the text vector of the text according to the word vector of each word in the text, and calculating the distance between the text vector and the cluster center of each second text cluster by adopting an included angle cosine formula.
Step 305, judging whether the distance is greater than a distance threshold; if yes, go to step 306; if not, step 307 is performed.
And step 306, dividing the text into a second text cluster with the smallest distance from the text.
Step 307, a second text cluster is created, and the text is divided into the created second text cluster.
Step 308, for each word segment, calculating a third occurrence probability of the word segment in a second text cluster where the word segment is located and a second occurrence probability of the word segment in all texts, and dividing the third occurrence probability by the second occurrence probability to obtain a second probability value of the word segment.
And 309, arranging the segmented words in a descending order based on the second probability value of each segmented word, so as to screen t segmented words with the top ranking as second keywords corresponding to the second text cluster.
Step 310, constructing a fault model based on the first keyword and the second keyword.
Step 311, receiving alarm information in real time, responding to the alarm information hitting the fault model, and generating a fault instance according to the alarm information.
In addition, the specific implementation of the fault locating method according to the present invention is described in detail in the above description of the fault locating method, and thus the description thereof will not be repeated here.
Fig. 4 is a flow chart of a fault localization method according to yet another alternative embodiment of the present invention. As still another embodiment of the present invention, as shown in fig. 4, the fault locating method may include:
step 401, calculating a first probability value of each word segment, and sorting the words segments based on the first probability value of each word segment to obtain first keywords corresponding to each text; and classifying each text based on the first keywords corresponding to each text, thereby obtaining each first text cluster and the first keywords corresponding to each first text cluster.
Step 402, clustering the texts to obtain second text clusters; and calculating a second probability value of each word in each second text cluster, and sorting the words based on the second probability values of the words to obtain second keywords corresponding to the second text clusters.
And step 403, comparing the first keyword with the second keyword, and if the number of the first keyword and the second keyword which are the same is greater than or equal to a number threshold, merging the first keyword and the second keyword, and performing de-duplication on the merged keyword, thereby obtaining a model keyword.
Step 404, constructing a keyword strategy based on the model keywords.
And step 405, constructing a fault model according to the keyword strategy, the time strategy, the space strategy and the disposal strategy.
And step 406, receiving alarm information in real time, responding to the alarm information hitting the fault model, and generating a fault instance according to the alarm information.
In addition, in still another embodiment of the present invention, the specific implementation of the fault locating method has been described in detail in the above description, so that the description is not repeated here.
Fig. 5 is a schematic diagram of a fault locating device according to an embodiment of the present invention. As shown in fig. 5, the fault location device 500 includes a first extraction module 501, a second extraction module 502, a construction module 503, and a location module 504; the first extraction module 501 is configured to calculate a first probability value of each word segment, and rank each word segment based on the first probability value of each word segment to obtain a first keyword corresponding to each text; classifying each text based on the first keywords corresponding to each text, so as to obtain each first text cluster and the first keywords corresponding to each first text cluster; the second extraction module 502 is configured to cluster the texts, so as to obtain second text clusters; calculating a second probability value of each word in each second text cluster, and sorting the words based on the second probability values of the words to obtain second keywords corresponding to the second text clusters; the construction module 503 is configured to construct a fault model based on the first keyword and the second keyword; the positioning module 504 is configured to receive alarm information in real time, and generate a fault instance according to the alarm information in response to the alarm information hitting the fault model.
Optionally, the first extraction module 501 is further configured to:
for each word segment, respectively calculating a first occurrence probability of the word segment in a text where the word segment is located and a second occurrence probability of the word segment in all texts, and dividing the first occurrence probability by the second occurrence probability to obtain a first probability value of the word segment;
and arranging the segmented words in a descending order based on the first probability value of each segmented word, so that n segmented words with the top order are screened out to be used as first keywords corresponding to the text.
Optionally, the first extraction module 501 is further configured to:
comparing the first keywords corresponding to a first text with the first keywords corresponding to a second text, and if the number of identical keywords between the two is greater than or equal to m, dividing the first text and the second text into the same text cluster, thereby obtaining a first text cluster and the first keywords corresponding to the first text cluster; wherein n is greater than or equal to m.
Optionally, the second extraction module 502 is further configured to:
for each text, calculating a text vector of the text according to word vectors of the words in the text;
Calculating the distance between the text vector and the cluster center of each second text cluster by adopting an included angle cosine formula; judging whether the distance is larger than a distance threshold value or not; if yes, dividing the text into a second text cluster with the minimum distance from the text; if not, a second text cluster is newly built, and the text is divided into the newly built second text clusters.
Optionally, the second extraction module 502 is further configured to:
for each word segment, calculating a third occurrence probability of the word segment in a second text cluster where the word segment is located and a second occurrence probability of the word segment in all texts respectively, and dividing the third occurrence probability by the second occurrence probability to obtain a second probability value of the word segment;
and arranging the segmented words in a descending order based on the second probability value of the segmented words, so that t segmented words with the top ranking are screened out and used as second keywords corresponding to the second text cluster.
Optionally, the building module 503 is further configured to:
combining and de-duplicating the first keyword and the second keyword, thereby obtaining a model keyword;
constructing a keyword strategy based on the model keywords;
and constructing a fault model according to the keyword strategy, the time strategy, the space strategy and the disposal strategy.
Optionally, the building module 503 is further configured to:
comparing the first keywords with the second keywords, and if the number of the first keywords which is the same as that of the second keywords is greater than or equal to a number threshold, merging the first keywords with the second keywords, and de-duplicating the merged keywords to obtain model keywords.
In addition, the specific implementation of the fault locating device according to the present invention is described in detail in the fault locating method described above, and thus the description thereof will not be repeated here.
Fig. 6 illustrates an exemplary system architecture 600 in which the fault locating method or fault locating device of an embodiment of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 is used as a medium to provide communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 605 via the network 604 using the terminal devices 601, 602, 603 to receive or send messages, etc. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 601, 602, 603.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users using the terminal devices 601, 602, 603. The background management server can analyze and otherwise process data such as a received article information inquiry request and feed the processing result back to the terminal device.
It should be noted that, the fault locating method provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the fault locating device is generally disposed in the server 605.
It should be understood that the number of terminal devices, networks and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 700 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output portion 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. The drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is mounted into the storage section 708 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer programs according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, as: a processor comprises a first extraction module, a second extraction module, a construction module and a positioning module, wherein the names of these modules do not constitute a limitation of the module itself in some cases.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by a device, implement the method of: calculating a first probability value of each word segment, and sequencing each word segment based on the first probability value of each word segment to obtain a first keyword corresponding to each text; classifying each text based on the first keywords corresponding to each text, so as to obtain each first text cluster and the first keywords corresponding to each first text cluster; clustering the texts to obtain second text clusters; calculating a second probability value of each word in each second text cluster, and sorting the words based on the second probability values of the words to obtain second keywords corresponding to the second text clusters; constructing a fault model based on the first keyword and the second keyword; and receiving alarm information in real time, responding to the alarm information hitting the fault model, and generating a fault instance according to the alarm information.
As a further aspect, embodiments of the present invention also provide a computer program product comprising a computer program which, when executed by a processor, implements the method according to any of the above embodiments.
According to the technical scheme of the embodiment of the invention, because the first probability value of each word segment is calculated, the words are ordered based on the first probability value of each word segment, and the first keywords corresponding to each text are obtained; classifying each text based on the first keywords corresponding to each text to obtain each first text cluster and the first keywords corresponding to each first text cluster, and clustering each text to obtain each second text cluster; and calculating a second probability value of each word in each second text cluster, and sequencing each word based on the second probability value of each word to obtain a second keyword corresponding to each second text cluster, so that the technical problems of high manual maintenance cost and inaccurate fault location in the prior art are overcome. According to the embodiment of the invention, the keywords are extracted in a mode of combining probability statistics and vector classification, and the fault model is constructed based on the extracted keywords, so that not only are the manpower and the labor maintenance cost saved, but also the keywords can be extracted accurately and rapidly, and the fault model can be used for accurately positioning the fault.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (17)
1. A fault locating method, comprising:
calculating a first probability value of each word segment, and sequencing each word segment based on the first probability value of each word segment to obtain a first keyword corresponding to each text; classifying each text based on the first keywords corresponding to each text, so as to obtain each first text cluster and the first keywords corresponding to each first text cluster;
clustering the texts to obtain second text clusters; calculating a second probability value of each word in each second text cluster, and sorting the words based on the second probability values of the words to obtain second keywords corresponding to the second text clusters;
constructing a fault model based on the first keyword and the second keyword;
And receiving alarm information in real time, responding to the alarm information hitting the fault model, and generating a fault instance according to the alarm information.
2. The method of claim 1, wherein calculating a first probability value for each term, and ranking each term based on the first probability value for each term to obtain a first keyword for each text comprises:
for each word segment, respectively calculating a first occurrence probability of the word segment in a text where the word segment is located and a second occurrence probability of the word segment in all texts, and dividing the first occurrence probability by the second occurrence probability to obtain a first probability value of the word segment;
and arranging the segmented words in a descending order based on the first probability value of each segmented word, so that n segmented words with the top order are screened out to be used as first keywords corresponding to the text.
3. The method of claim 2, wherein classifying each text based on the first keyword corresponding to each text to obtain each first text cluster and the first keyword corresponding thereto, comprises:
comparing the first text keywords corresponding to the first text with the first text keywords corresponding to the second text, and if the number of the first text keywords which are the same as the number of the second text keywords is more than or equal to m, dividing the first text and the second text into the same text cluster, thereby obtaining a first text cluster and the first keywords corresponding to the first text cluster; wherein n is greater than or equal to m.
4. The method of claim 1, wherein clustering the respective texts to obtain respective second text clusters comprises:
for each text, calculating a text vector of the text according to word vectors of the words in the text;
calculating the distance between the text vector and the cluster center of each second text cluster by adopting an included angle cosine formula; judging whether the distance is larger than a distance threshold value or not; if yes, dividing the text into a second text cluster with the minimum distance from the text; if not, a second text cluster is newly built, and the text is divided into the newly built second text clusters.
5. The method of claim 1, wherein calculating a second probability value for each word segment in each second text cluster, and ranking each word segment based on the second probability value for each word segment, to obtain a second keyword corresponding to each second text cluster, comprises:
for each word segment, calculating a third occurrence probability of the word segment in a second text cluster where the word segment is located and a second occurrence probability of the word segment in all texts respectively, and dividing the third occurrence probability by the second occurrence probability to obtain a second probability value of the word segment;
And arranging the segmented words in a descending order based on the second probability value of the segmented words, so that t segmented words with the top ranking are screened out and used as second keywords corresponding to the second text cluster.
6. The method of claim 1, wherein constructing a fault model based on the first keyword and the second keyword comprises:
combining and de-duplicating the first keyword and the second keyword, thereby obtaining a model keyword;
constructing a keyword strategy based on the model keywords;
and constructing a fault model according to the keyword strategy, the time strategy, the space strategy and the disposal strategy.
7. The method of claim 6, wherein merging and deduplicating the first keyword and the second keyword to obtain a model keyword comprises:
comparing the first keywords with the second keywords, and if the number of the first keywords which is the same as that of the second keywords is greater than or equal to a number threshold, merging the first keywords with the second keywords, and de-duplicating the merged keywords to obtain model keywords.
8. A fault locating device, comprising:
The first extraction module is used for calculating a first probability value of each word segment, and sequencing each word segment based on the first probability value of each word segment to obtain a first keyword corresponding to each text; classifying each text based on the first keywords corresponding to each text, so as to obtain each first text cluster and the first keywords corresponding to each first text cluster;
the second extraction module is used for clustering the texts so as to obtain second text clusters; calculating a second probability value of each word in each second text cluster, and sorting the words based on the second probability values of the words to obtain second keywords corresponding to the second text clusters;
the construction module is used for constructing a fault model based on the first keyword and the second keyword;
and the positioning module is used for receiving the alarm information in real time, responding to the alarm information hitting the fault model, and generating a fault instance according to the alarm information.
9. The apparatus of claim 8, wherein the first extraction module is further to:
for each word segment, respectively calculating a first occurrence probability of the word segment in a text where the word segment is located and a second occurrence probability of the word segment in all texts, and dividing the first occurrence probability by the second occurrence probability to obtain a first probability value of the word segment;
And arranging the segmented words in a descending order based on the first probability value of each segmented word, so that n segmented words with the top order are screened out to be used as first keywords corresponding to the text.
10. The apparatus of claim 9, wherein the first extraction module is further configured to:
comparing the first text keywords corresponding to the first text with the first text keywords corresponding to the second text, and if the number of the first text keywords which are the same as the number of the second text keywords is more than or equal to m, dividing the first text and the second text into the same text cluster, thereby obtaining a first text cluster and the first keywords corresponding to the first text cluster; wherein n is greater than or equal to m.
11. The apparatus of claim 8, wherein the second extraction module is further to:
for each text, calculating a text vector of the text according to word vectors of the words in the text;
calculating the distance between the text vector and the cluster center of each second text cluster by adopting an included angle cosine formula; judging whether the distance is larger than a distance threshold value or not; if yes, dividing the text into a second text cluster with the minimum distance from the text; if not, a second text cluster is newly built, and the text is divided into the newly built second text clusters.
12. The apparatus of claim 8, wherein the second extraction module is further to:
for each word segment, calculating a third occurrence probability of the word segment in a second text cluster where the word segment is located and a second occurrence probability of the word segment in all texts respectively, and dividing the third occurrence probability by the second occurrence probability to obtain a second probability value of the word segment;
and arranging the segmented words in a descending order based on the second probability value of the segmented words, so that t segmented words with the top ranking are screened out and used as second keywords corresponding to the second text cluster.
13. The apparatus of claim 8, wherein the build module is further to:
combining and de-duplicating the first keyword and the second keyword, thereby obtaining a model keyword;
constructing a keyword strategy based on the model keywords;
and constructing a fault model according to the keyword strategy, the time strategy, the space strategy and the disposal strategy.
14. The apparatus of claim 13, wherein the build module is further to:
comparing the first keywords with the second keywords, and if the number of the first keywords which is the same as that of the second keywords is greater than or equal to a number threshold, merging the first keywords with the second keywords, and de-duplicating the merged keywords to obtain model keywords.
15. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
the one or more processors implement the method of any of claims 1-7 when the one or more programs are executed by the one or more processors.
16. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311318096.1A CN117389827A (en) | 2023-10-12 | 2023-10-12 | Fault locating method, device, electronic equipment and computer readable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311318096.1A CN117389827A (en) | 2023-10-12 | 2023-10-12 | Fault locating method, device, electronic equipment and computer readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117389827A true CN117389827A (en) | 2024-01-12 |
Family
ID=89438379
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311318096.1A Pending CN117389827A (en) | 2023-10-12 | 2023-10-12 | Fault locating method, device, electronic equipment and computer readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117389827A (en) |
-
2023
- 2023-10-12 CN CN202311318096.1A patent/CN117389827A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110708204B (en) | Abnormity processing method, system, terminal and medium based on operation and maintenance knowledge base | |
EP3267348B1 (en) | Method and apparatus for recognizing risk behavior | |
WO2021217855A1 (en) | Abnormal root cause positioning method and apparatus, and electronic device and storage medium | |
CN111581355B (en) | Threat information topic detection method, device and computer storage medium | |
CN110163457B (en) | Abnormal positioning method and device for business index | |
CN109918554A (en) | Web data crawling method, device, system and computer readable storage medium | |
CN110929799B (en) | Method, electronic device, and computer-readable medium for detecting abnormal user | |
US11074652B2 (en) | System and method for model-based prediction using a distributed computational graph workflow | |
CN113836131A (en) | Big data cleaning method and device, computer equipment and storage medium | |
CN107590195A (en) | Textual classification model training method, file classification method and its device | |
CN111444060A (en) | Anomaly detection model training method, anomaly detection method and related device | |
CN110401567B (en) | Alarm data processing method and device, computing equipment and medium | |
WO2019085332A1 (en) | Financial data analysis method, application server, and computer readable storage medium | |
WO2021109724A1 (en) | Log anomaly detection method and apparatus | |
CN110677271B (en) | Big data alarm method, device, equipment and storage medium based on ELK | |
CN112328805A (en) | Entity mapping method of vulnerability description information and database table based on NLP | |
CN115659411A (en) | Method and device for data analysis | |
CN110751354B (en) | Abnormal user detection method and device | |
CN113986657A (en) | Alarm event processing method and processing device | |
CN112532625B (en) | Network situation awareness evaluation data updating method and device and readable storage medium | |
US20200110815A1 (en) | Multi contextual clustering | |
CN107729206A (en) | Real-time analysis method, system and the computer-processing equipment of alarm log | |
CN115981970B (en) | Fortune dimension analysis method, device, equipment and medium | |
US11244007B2 (en) | Automatic adaption of a search configuration | |
Lee et al. | Detecting anomaly teletraffic using stochastic self-similarity based on Hadoop |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |