CN111241145A - Self-healing rule mining method and device based on big data - Google Patents
- Publication number
- CN111241145A (CN201811437497.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- rule
- mining
- sample data
- strong association
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/40—Business processes related to the transportation industry
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention provides a self-healing rule mining method and device based on big data. The method comprises the following steps: collecting service data, performance data, and log data as sample data; preprocessing the sample data to convert it into a form suitable for data mining; mining association rules from the preprocessed sample data to obtain a number of strong association rules, wherein each strong association rule includes at least service data and performance data; and verifying each strong association rule, and taking it as a self-healing rule if it necessarily corresponds to the occurrence of an anomaly. The embodiment of the invention greatly reduces dependence on human experience, reduces the effort required from specialists, and saves a large amount of manpower.
Description
Technical Field
The invention relates to the technical field of data mining, in particular to a self-healing rule mining method and device based on big data.
Background
To keep pace with the continuous growth in users and service types in the telecommunications industry, service acceptance systems have evolved as well, and are now characterized by complex architectural relationships, a large number of nodes, and frequent system updates. Faced with such a complex, large, and variable "super" system, operation and maintenance personnel must still meet the goal of fast recovery from anomalies, which is a great challenge. The prevailing way to quickly locate and resolve even slight anomalies in such a system is operation and maintenance automation, and the automation technique mainly adopted for this purpose is self-healing. The self-healing process is driven by an automated workflow, and the rules used to decide when to self-heal are the core of the whole process. The common current approaches to formulating these rules are as follows:
Scheme one: introduce general industry rules, generating self-healing rules from industry-recognized operating indicators, such as the CPU utilization and memory utilization of hardware. However, only the rule indicators for general-purpose equipment such as hosts and networks are currently widely accepted in the industry; anomalies outside this scope are blind spots.
Scheme two: derive rules by human judgment. This is the most traditional method: rules are summarized from human experience, relying mainly on the historical experience, technical ability, and summarizing and inductive ability of the personnel involved. However, this scheme depends heavily on individual ability, and manually derived rules are relatively simple. As the system grows more complex and larger, the output efficiency of manually derived rules is low and cannot keep up with the pace of system development.
Scheme three: design a rule for each individual fault. After a fault has been recovered, the occurrence and handling of the fault are reviewed, the root cause is identified, the characteristics of the fault are extracted, and these characteristics are solidified into a self-healing rule so that the same fault does not recur. This scheme cannot cover problems beyond the faults already encountered. Because each rule is specific to one fault, the rules cannot adapt quickly to system changes and easily become invalid; in terms of effectiveness, the accuracy of a rule cannot be guaranteed as the system changes.
In summary, the existing technical solutions all have serious shortcomings and can no longer meet the self-healing requirements of current system maintenance in terms of rule application range, rule output efficiency, and rule effectiveness.
Disclosure of Invention
The present invention provides a big data based self-healing rule mining method and apparatus that overcomes or at least partially solves the above-mentioned problems.
In a first aspect, an embodiment of the present invention provides a self-healing rule mining method, including:
collecting service data, performance data and log data as sample data;
preprocessing the sample data, and converting the sample data into a form suitable for data mining;
mining association rules from the preprocessed sample data to obtain a number of strong association rules, wherein each strong association rule includes at least service data and performance data;
verifying the strong association rule, and if the strong association rule necessarily corresponds to the occurrence of an anomaly, taking the strong association rule as a self-healing rule.
In a second aspect, an embodiment of the present invention provides a mining device for self-healing rules, including:
a sample data acquisition module, configured to collect service data, performance data, and log data as sample data;
a preprocessing module, configured to preprocess the sample data and convert it into a form suitable for data mining;
an association rule mining module, configured to mine association rules from the preprocessed sample data to obtain a number of strong association rules, each strong association rule including at least service data and performance data;
and a verification module, configured to verify the strong association rule and, if the strong association rule necessarily corresponds to the occurrence of an anomaly, take the strong association rule as a self-healing rule.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the method provided in the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
The big data based self-healing rule mining method and device provided by the embodiments of the present invention collect performance data, log data, and service data, so that the collected information is diverse and no longer limited to the few kinds of basic data common across industries. Because service data is included, the association rule algorithm has more data to mine and can turn that data into more kinds of rules for handling anomalies; rule construction thus breaks through its original limitations and the rules apply more widely. In addition, the sample data is mined by an association rule mining algorithm, so the whole process is highly automated: rule output efficiency improves markedly, dependence on human experience is greatly reduced, and the effort required from specialists falls, saving a large amount of manpower. Furthermore, verifying the obtained strong association rules ensures a close match with actual requirements and more accurate identification of system anomalies. As the system keeps changing, the rules remain in a continuous state of training and optimization, which overcomes the drawback of having to maintain the rules manually.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a self-healing rule mining method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a mining device for self-healing rules according to an embodiment of the present invention;
Fig. 3 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to overcome the above problems in the prior art, an embodiment of the present invention provides a self-healing rule mining method. The inventive concept is as follows: service data, performance data, and log data are all brought within the mining scope, rather than only the few kinds of basic data common across industries; the associations among the data are mined as association rules; and each mined strong association rule is verified to determine whether it is necessarily associated with an anomaly, which yields the self-healing rules. The embodiment of the present invention allows the construction of self-healing rules to break through its original limitations and apply more widely, achieves a close match with actual requirements, and identifies system anomalies more accurately. At the same time, as the system keeps changing, the self-healing rules are continuously re-mined, which avoids the drawback of having to maintain the rules manually.
Fig. 1 is a schematic flow chart of a self-healing rule mining method provided in an embodiment of the present invention. As shown in the figure, the method includes:
S101, collecting service data, performance data and log data as sample data.
It should be noted that, in the embodiment of the present invention, service data is collected from a service monitoring system, performance data is collected from application hosts/virtual machines, and log data is collected from a log management center. Table 1 lists several key indicators in the sample data collected in the embodiment of the present invention. As shown in Table 1, there are three collection types: log data, performance data, and service data. Each collection type has several index items, each index item consists of several index sub-items, and each index sub-item is represented by a specific index value. For example, the log data includes the index item "TOP5 error", which represents the five most frequent errors. Its sub-items are the keywords of these five types of error message, and each type is characterized by the number of times its keyword occurs per minute. It can be understood that the index items, index sub-items, and index values shown in Table 1 are only part of the sample data collected by the mining method of the embodiment of the present invention. When collecting sample data, the embodiment of the present invention takes full account of growth in user scale and data scale and prepares for the accumulation of data assets: multiple data sources are supported through methods such as clients and program code insertion, data is collected in full by multiple methods, and sufficient, comprehensive sample data is collected across the whole life cycle of a user's use of the product.
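As an illustration of how a log index value of this kind might be computed, the following minimal sketch counts, per minute, the occurrences of each error keyword. The log line layout, the keyword list, and the function name are assumptions made for this example; the patent does not specify them.

```python
from collections import Counter

# Hypothetical log lines in the assumed layout "<YYYY-MM-DD HH:MM:SS> <message>".
LOG_LINES = [
    "2018-11-28 10:00:05 ERROR TimeoutException calling billing",
    "2018-11-28 10:00:41 ERROR TimeoutException calling billing",
    "2018-11-28 10:00:59 ERROR NullPointerException in order form",
    "2018-11-28 10:01:12 ERROR TimeoutException calling billing",
]
# Assumed error keywords (index sub-items of a "TOP5 error" index item).
KEYWORDS = ["TimeoutException", "NullPointerException"]


def keyword_counts_per_minute(lines, keywords):
    """Return {(minute, keyword): occurrences} for the given log lines."""
    counts = Counter()
    for line in lines:
        minute = line[:16]  # "YYYY-MM-DD HH:MM"
        for keyword in keywords:
            if keyword in line:
                counts[(minute, keyword)] += 1
    return counts


counts = keyword_counts_per_minute(LOG_LINES, KEYWORDS)
```

Each `(minute, keyword)` count plays the role of one index value for the corresponding index sub-item.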
TABLE 1 Table of key indicators in sample data
S102, preprocessing the sample data, and converting the sample data into a form suitable for data mining.
Data preprocessing is an important part of the mining process of the embodiment of the present invention: clean, accurate, and concise data must be provided in order to mine rich rules. The preprocessing in the embodiment of the present invention improves the accuracy, integrity, and consistency of the data by performing operations such as data cleaning, data reduction, and data transformation on the collected sample data, and is an important early step of the method.
S103, mining association rules from the preprocessed sample data to obtain a number of strong association rules, wherein each strong association rule includes at least service data and performance data.
It should be understood that the self-healing process comprises a self-healing rule and a self-healing operation. The self-healing rule specifies the precondition for performing the self-healing operation. For example, "index value A in the service data is abnormal, error keyword B appears in the log, and performance data C is observed" is a self-healing rule; when this rule is satisfied, the base station performs the task of switching out the cell, which is the self-healing operation. A strong association rule is an implication of the form X → Y, where X and Y are called the antecedent, or left-hand side (LHS), and the consequent, or right-hand side (RHS), of the association rule, respectively. Each association rule X → Y has a support and a confidence. The embodiment of the present invention recasts self-healing rules as strong association rules, so that self-healing rules can be mined with existing association rule mining methods. Association rule mining has two main stages: in the first stage, all frequent itemsets are found in the data set; in the second stage, association rules are generated from these frequent itemsets. In the embodiment of the present invention, the data set is the set composed of the sample data, and a frequent itemset is an itemset that includes at least service data and performance data. It can be understood that the service data and performance data in a frequent itemset are each represented by an index sub-item and its corresponding index value. It should be understood that the number of service data items and performance data items in a self-healing rule is not specifically limited.
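The requirement that a frequent itemset include at least service data and performance data can be sketched as a simple filter. The item encoding `(collection_type, index_sub_item, index_value)` and all names below are assumptions chosen for illustration, not a representation taken from the patent.

```python
def has_service_and_performance(itemset):
    """True if the itemset holds at least one service item and one performance item."""
    kinds = {kind for kind, _sub_item, _value in itemset}
    return {"service", "performance"} <= kinds


# Hypothetical candidate itemsets; each item is (type, index sub-item, index value).
candidates = [
    frozenset({("service", "order_success_rate", "low"),
               ("performance", "cpu_usage", "high")}),
    frozenset({("log", "TimeoutException", "high"),
               ("performance", "cpu_usage", "high")}),
]

# Only the first candidate mixes service data with performance data.
kept = [c for c in candidates if has_service_and_performance(c)]
```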
S104, verifying the strong association rule, and if the strong association rule necessarily corresponds to the occurrence of an anomaly, taking the strong association rule as a self-healing rule.
Specifically, after a strong association rule is obtained, it needs to be verified. Verification is done by simulation testing: when an anomaly occurs, for example the front end reports a 503 error, the simulation collects whether each index sub-item in the strong association rule has reached its corresponding index value. If the 503 error is necessarily reported whenever the index values are reached, the strong association rule is taken as a self-healing rule.
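One way to read this check is sketched below: a rule passes only if every simulated run in which all of its conditions hold also shows the anomaly. The run representation, the condition names, and the function name are hypothetical and serve only to make the idea concrete.

```python
def necessarily_triggers(rule_items, runs, anomaly):
    """True if at least one run satisfies the rule and every such run shows the anomaly.

    rule_items: set of conditions (index sub-item reaching its index value).
    runs: list of (observed_conditions, observed_anomalies) pairs from simulation.
    """
    matching = [anomalies for observed, anomalies in runs if rule_items <= observed]
    return bool(matching) and all(anomaly in anomalies for anomalies in matching)


# Hypothetical simulation results.
runs = [
    ({"order_success_rate_low", "cpu_high"}, {"http_503"}),
    ({"order_success_rate_low", "cpu_high", "timeout_keyword"}, {"http_503"}),
    ({"cpu_high"}, set()),
    ({"order_success_rate_low", "mem_high"}, {"http_503"}),
    ({"order_success_rate_low", "mem_high"}, set()),
]

rule_a = {"order_success_rate_low", "cpu_high"}  # always accompanied by a 503
rule_b = {"order_success_rate_low", "mem_high"}  # only sometimes accompanied by a 503

accepted_a = necessarily_triggers(rule_a, runs, "http_503")
accepted_b = necessarily_triggers(rule_b, runs, "http_503")
```

Under this reading, `rule_a` would be kept as a self-healing rule and `rule_b` would be discarded.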
By collecting performance data, log data, and service data, the embodiment of the present invention ensures that the collected information is diverse and no longer limited to the few kinds of basic data common across industries. Because service data is included, the association rule algorithm has more data to mine and can turn that data into more kinds of rules for handling anomalies; rule construction thus breaks through its original limitations and the rules apply more widely. In addition, the sample data is mined by an association rule mining algorithm, so the whole process is highly automated: rule output efficiency improves markedly, dependence on human experience is greatly reduced, and the effort required from specialists falls, saving a large amount of manpower. Furthermore, verifying the obtained strong association rules ensures a close match with actual requirements and more accurate identification of system anomalies. As the system keeps changing, the rules remain in a continuous state of training and optimization, which overcomes the drawback of having to maintain the rules manually.
On the basis of the foregoing embodiments, as an optional embodiment, preprocessing the sample data further includes: labeling the collected index values through stream processing, and storing the sample data in the corresponding database table according to the type of label.
Correspondingly, mining association rules from the preprocessed sample data specifically comprises:
extracting, according to the label selected by the user, sample data from the database table corresponding to that label, and performing association rule mining on it.
Specifically, the labels in the embodiment of the present invention may be system changes, time periods, importance levels, and the like. With labeling, data mining can target sample data carrying specific labels, so that the mined self-healing rules are more targeted.
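A minimal sketch of labeling records as they stream in and later extracting the records for one label follows. In-memory lists stand in for the database tables, and the time-period labeler with its 09:00 to 18:00 boundary is an assumption made for this example.

```python
from collections import defaultdict


def period_label(record):
    """Hypothetical labeler: tag a record by time period."""
    return "peak" if 9 <= record["hour"] < 18 else "off_peak"


def label_and_store(stream, labeler):
    """Route each record of the stream to the table that matches its label."""
    tables = defaultdict(list)  # label -> list of records (stand-in for a database table)
    for record in stream:
        tables[labeler(record)].append(record)
    return tables


records = [
    {"hour": 10, "cpu_usage": 72},
    {"hour": 3, "cpu_usage": 15},
    {"hour": 11, "cpu_usage": 80},
]
tables = label_and_store(iter(records), period_label)

# Mining would then run only on the table for the label the user selected.
selected = tables.get("peak", [])
```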
On the basis of the above embodiments, as an alternative embodiment, the association rule mining algorithm is the Apriori algorithm.
Specifically, the Apriori algorithm is a frequent-itemset algorithm for mining association rules. Its core idea is to mine frequent itemsets in two stages: candidate set generation and downward-closure checking. The Apriori algorithm searches the candidate itemsets level by level (level-wise) and finds the frequent itemsets by limiting candidate generation. The strength of an association rule x → y is measured by its support s and confidence c, calculated as follows:
$$s(x \rightarrow y) = \frac{\sigma(x \cup y)}{N}, \qquad c(x \rightarrow y) = \frac{\sigma(x \cup y)}{\sigma(x)}$$
where x and y are disjoint itemsets, N is the total number of transactions in the historical time period, σ(x ∪ y) is the support count, i.e., the number of the N transactions in which x and y occur together, and σ(x) is the number of transactions that contain x.
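As a worked example of these two measures, the sketch below uses the standard definitions s = σ(x ∪ y) / N and c = σ(x ∪ y) / σ(x). The five transactions and the item names are invented for illustration.

```python
def support_count(transactions, items):
    """sigma(items): number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def support(transactions, x, y):
    """s(x -> y) = sigma(x U y) / N."""
    return support_count(transactions, x | y) / len(transactions)


def confidence(transactions, x, y):
    """c(x -> y) = sigma(x U y) / sigma(x)."""
    return support_count(transactions, x | y) / support_count(transactions, x)


# Hypothetical transactions: the conditions observed together in one time window.
transactions = [
    {"cpu_high", "http_503", "order_fail"},
    {"cpu_high", "http_503"},
    {"cpu_high"},
    {"order_fail"},
    {"cpu_high", "http_503", "order_fail"},
]
x, y = {"cpu_high"}, {"http_503"}

s = support(transactions, x, y)     # sigma(x U y) = 3, N = 5
c = confidence(transactions, x, y)  # sigma(x U y) = 3, sigma(x) = 4
```

Here x and y occur together in 3 of the 5 transactions, and x occurs in 4, so the rule has support 3/5 and confidence 3/4.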
The flow of the algorithm is generally as follows:
Firstly, all frequent itemsets are found, i.e., the itemsets whose frequency of occurrence is at least the predefined minimum support.
Secondly, strong association rules are generated from the frequent itemsets; these rules must satisfy both the minimum support and the minimum confidence.
Thirdly, the desired rules are generated from the frequent itemsets found in the first step: all rules containing only items of the set are generated, with exactly one item on the right-hand side of each rule (this is the rule definition adopted here). Once these rules are generated, only those whose confidence exceeds the given minimum confidence are retained. To generate all frequent itemsets, a recursive approach is used.
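The flow above can be sketched in a few dozen lines. This is a simplified, unoptimized illustration rather than the patent's implementation: it takes the minimum support as an absolute count (`min_count`, an assumption made to keep the arithmetic exact), builds the levels iteratively instead of recursively, and keeps rules whose confidence is at least `min_confidence` with a single item on the right-hand side.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frequent itemset: support count}, built level by level."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if count(s) >= min_count}
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = count(s)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        level = {c for c in candidates if count(c) >= min_count}
    return frequent


def single_consequent_rules(frequent, min_confidence):
    """Generate rules lhs -> rhs with exactly one item on the right-hand side."""
    rules = []
    for itemset, n_both in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            conf = n_both / frequent[lhs]  # lhs is frequent by downward closure
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return rules


# Hypothetical transactions, invented for illustration.
transactions = [
    {"cpu_high", "http_503", "order_fail"},
    {"cpu_high", "http_503"},
    {"cpu_high"},
    {"order_fail"},
    {"cpu_high", "http_503", "order_fail"},
]
frequent = apriori(transactions, min_count=3)
rules = single_consequent_rules(frequent, min_confidence=0.7)
```

With these inputs the only frequent pair is {cpu_high, http_503}, which yields one rule in each direction.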
On the basis of the foregoing embodiments, as an optional embodiment, verifying the strong association rule specifically comprises verifying the strong association rule by using a random forest model.
A random forest model is a classifier that uses multiple trees to train on and predict samples. In general, the processing flow of a random forest model is as follows:
1. generating n samples from the sample set by resampling;
2. assuming the samples have a features, selecting k of the a features for the n samples and obtaining the optimal split point by building a decision tree;
3. repeating the above steps m times to generate m decision trees.
In the embodiment of the present invention, the sample set is the frequent itemset, each sample is an item in the frequent itemset, each feature is an index sub-item, and the optimal split point is a specific value. For example, in "CPU utilization greater than 60%", CPU utilization is an index sub-item and therefore a feature, and "greater than 60%" is the optimal split point. The decision tree is the process of running a simulation test on the frequent itemset to obtain a result, and that result can be understood as a generated strong association rule.
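The three steps can be illustrated with a deliberately small forest of one-split trees (decision stumps). This is a sketch under assumptions, not the patent's model: the labelled observations, the feature names, the Gini criterion, and the use of stumps instead of full trees are all choices made for this example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, features):
    """Pick the (feature, threshold) with the lowest weighted Gini impurity."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t, majority(left), majority(right)), score
    if best is None:  # no useful split, e.g. the resample holds a single class
        return (None, None, majority(labels), majority(labels))
    return best


def train_forest(rows, labels, m, k, seed=0):
    rng = random.Random(seed)
    all_features = sorted(rows[0])
    forest = []
    for _ in range(m):                                   # step 3: repeat m times
        idx = [rng.randrange(len(rows)) for _ in rows]   # step 1: resample with replacement
        feats = rng.sample(all_features, k)              # step 2: choose k of the a features
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [left if f is None or row[f] <= t else right
             for f, t, left, right in forest]
    return majority(votes)


# Hypothetical labelled observations: 1 = anomaly occurred, 0 = no anomaly.
cpu = [20, 35, 50, 55, 70, 80, 90, 95]
mem = [30, 40, 50, 60, 75, 80, 85, 90]
rows = [{"cpu_usage": c, "mem_usage": m} for c, m in zip(cpu, mem)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(rows, labels, m=25, k=1)
high = predict(forest, {"cpu_usage": 92, "mem_usage": 88})
low = predict(forest, {"cpu_usage": 15, "mem_usage": 20})
```

Each stump's threshold plays the role of the "optimal split point" in the text, for instance a CPU utilization boundary separating anomalous from normal observations.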
On the basis of the foregoing embodiments, as an optional embodiment, preprocessing the sample data specifically comprises:
performing data cleaning on the sample data, including: deleting duplicate data and irrelevant data, smoothing noisy data, and interpolating abnormal data and missing data.
Specifically, in the embodiment of the present invention, noisy data is smoothed by binning, which smooths a stored data value by consulting its "neighbors" (the surrounding values). The "bin depth" means that each bin holds the same number of data values, and the "bin width" is the value range covered by each bin. Binning can remove noise, discretize continuous data, and increase granularity. The binning method in the embodiment of the present invention may be equal-depth binning, equal-width binning, the minimum entropy method, or user-defined intervals.
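Equal-depth and equal-width binning, followed by smoothing each bin by its mean, can be sketched as follows. The sample values and function names are illustrative assumptions.

```python
def equal_depth_bins(values, depth):
    """Sort the values and cut them into bins holding `depth` values each."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def equal_width_bins(values, width):
    """Group the values into bins that each cover a value range of `width`."""
    lowest = min(values)
    bins = {}
    for v in sorted(values):
        bins.setdefault(int((v - lowest) // width), []).append(v)
    return [bins[k] for k in sorted(bins)]


def smooth_by_mean(bins):
    """Replace every value in a bin by the mean of that bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


values = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical noisy index values

depth_bins = equal_depth_bins(values, depth=3)
width_bins = equal_width_bins(values, width=10)
smoothed = smooth_by_mean(depth_bins)
```

Equal-depth binning gives three bins of three values each, while equal-width binning gives bins of varying size that each span a range of 10.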
performing dimension reduction on the sample data after data cleaning, and discarding the unimportant part of the sample data.
It should be noted that the purpose of dimension reduction is to reduce the cleaned data: while preserving the original character of the data as far as possible, the smaller data volume shortens the time needed for data exchange and subsequent data mining. Dimension reduction reduces the number of features in the data, discarding unimportant features and describing the data with as few key features as possible.
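The text does not say how unimportant features are identified. As one possible illustration only, the sketch below drops features whose variance across samples does not exceed a threshold, on the assumption that a nearly constant feature carries little information; the criterion, the threshold, and the feature names are all assumptions.

```python
from statistics import pvariance


def drop_low_variance(rows, threshold):
    """Keep only the features whose population variance exceeds `threshold`."""
    keep = [f for f in rows[0] if pvariance([r[f] for r in rows]) > threshold]
    return [{f: r[f] for f in keep} for r in rows]


# Hypothetical cleaned samples; "disk_count" never changes.
rows = [
    {"cpu_usage": 20, "mem_usage": 40, "disk_count": 4},
    {"cpu_usage": 70, "mem_usage": 41, "disk_count": 4},
    {"cpu_usage": 95, "mem_usage": 39, "disk_count": 4},
]
reduced = drop_low_variance(rows, threshold=1.0)
```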
converting the sample data remaining after dimension reduction into a form suitable for data mining by means of smoothing, aggregation, and data generalization.
Specifically, the smoothing may be smoothing by mean, smoothing by boundary values, or smoothing by median. Real-valued data is transformed through concept hierarchies and discretization.
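Smoothing by median and by boundary values, together with a simple generalization of a real-valued index into concept levels, might look like the sketch below. The level names and the 30% boundary are assumptions; only the 60% figure echoes the CPU utilization example given earlier in the description.

```python
from statistics import median


def smooth_by_median(bin_values):
    """Replace every value in the bin by the bin median."""
    m = median(bin_values)
    return [m] * len(bin_values)


def smooth_by_boundaries(bin_values):
    """Replace every value by the nearer of the bin's minimum and maximum."""
    lo, hi = min(bin_values), max(bin_values)
    return [lo if v - lo <= hi - v else hi for v in bin_values]


def generalize_cpu(percent):
    """Map a CPU utilization percentage onto a coarse concept level (assumed levels)."""
    if percent > 60:
        return "high"
    if percent > 30:
        return "medium"
    return "low"


by_median = smooth_by_median([21, 21, 24])
by_boundary = smooth_by_boundaries([4, 8, 15])
levels = [generalize_cpu(p) for p in (12.5, 47.0, 83.2)]
```

After generalization, an item such as "cpu_usage = high" can be used directly as a discrete item in association rule mining.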
Fig. 2 is a schematic structural diagram of an excavating device according to a self-healing rule provided in an embodiment of the present invention, and as shown in fig. 2, the excavating device includes: a sample data acquisition module 201, a preprocessing module 202, an association rule mining module 203 and a verification module 204, wherein:
the sample data acquiring module 201 is configured to acquire service data, performance data, and log data as sample data.
It should be noted that the sample data acquisition module in the embodiment of the present invention collects service data from the service monitoring system, collects performance data from the application host/virtual machine, and collects log data from the log management center. The method fully considers the increase of the user scale and the data scale, prepares for data asset accumulation, realizes multiple data sources through methods such as a client and program code insertion, collects the data in a full amount by multiple methods, and collects enough and comprehensive sample data throughout the whole life cycle of a product used by a user.
The preprocessing module 202 is configured to preprocess the sample data and convert the sample data into a form suitable for data mining.
Data preprocessing is an important part of the mining process of the embodiment of the invention, and clean, accurate and concise data must be provided in order to mine rich rules. The preprocessing module in the embodiment of the invention can improve the accuracy, integrity and consistency of data by performing operations such as data cleaning, data reduction, data transformation and the like on the acquired sample data, and is an important module in the embodiment of the invention.
The association rule mining module 203 is configured to perform association rule mining on the preprocessed sample data to obtain a certain number of strong association rules, where each strong association rule at least includes service data and performance data.
It should be understood that the self-healing process includes a self-healing rule and a self-healing operation, the self-healing rule indicates a precondition for performing the self-healing operation, for example, an index value a in the service data is abnormal, an error keyword B appears in the log, and a performance data C appears in the log, which is a self-healing rule, and after the self-healing rule appears, the base station performs a task of cell switching out, which is a self-healing operation. Strong association rules are implications in the form of X → Y, where X and Y are referred to as the antecedent or left-hand-side (LHS) and successor (RHS) of the association rule, respectively. Wherein, the rule XY is related, and the support degree and the trust degree exist. The embodiment of the invention converts the self-healing rule into the strong association rule for mining, and the mining of the self-healing rule can be realized by adopting the existing association rule mining method. The mining process of the association rules mainly comprises two stages: in the first stage, all high frequency item sets (frequency items) are found from the data set, and in the second stage, Association Rules (Association Rules) are generated from the high frequency item sets. In the embodiment of the present invention, the material set is a set composed of sample data, and the high-frequency project group is a project group including at least service data and performance data, and it can be understood that the service data and the performance data in the high-frequency project group are both represented by the index sub-item and the corresponding index value. It should be understood that the number of the service data and the performance data in the self-healing rule is not particularly limited.
The verification module 204 is configured to verify the strong association rule, and if the strong association rule inevitably corresponds to an abnormal phenomenon, the strong association rule is used as a self-healing rule.
Specifically, after a strong association rule is obtained, it needs to be verified by the verification module. Verification is carried out through simulation testing: when an abnormal phenomenon occurs, for example the front end reports a 503 error, the test simulates and collects whether the index sub-items in the strong association rule reach their corresponding index values. If the front-end 503 error inevitably occurs whenever the index values are reached, the strong association rule is used as a self-healing rule.
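One way to read this check is: across the simulated or collected observations, every observation in which the rule's index values are reached must also show the abnormal phenomenon. The sketch below illustrates only that reading. The observation layout (`metrics`, `anomalies`), the anomaly label `front_end_503` and the function name are hypothetical, and the sketch does not implement the random forest model mentioned in claim 4.

```python
def rule_always_implies_anomaly(rule, observations, anomaly="front_end_503"):
    """Return True if every observation matching the rule also shows the anomaly."""
    matched = [
        obs for obs in observations
        if all(obs["metrics"].get(name) == value for name, value in rule.items())
    ]
    # A rule that never fires in the observations cannot be confirmed.
    if not matched:
        return False
    return all(anomaly in obs["anomalies"] for obs in matched)

# Hypothetical rule: index sub-item -> index value that must be reached.
rule = {"svc:A": "abnormal", "perf:C": "high"}

# Hypothetical simulated observations.
observations = [
    {"metrics": {"svc:A": "abnormal", "perf:C": "high"}, "anomalies": {"front_end_503"}},
    {"metrics": {"svc:A": "normal", "perf:C": "high"}, "anomalies": set()},
    {"metrics": {"svc:A": "abnormal", "perf:C": "high"}, "anomalies": {"front_end_503"}},
]
is_self_healing_rule = rule_always_implies_anomaly(rule, observations)

# One counter-example (rule satisfied, no anomaly) is enough to reject the rule.
counter_example = {"metrics": {"svc:A": "abnormal", "perf:C": "high"}, "anomalies": set()}
rejected = not rule_always_implies_anomaly(rule, observations + [counter_example])
```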
The mining device provided in the embodiment of the present invention specifically executes the flows of the mining method embodiments above; for details, please refer to those embodiments, which are not repeated here. By collecting performance data, log data and service data, the embodiment of the invention gives the collected data diversified characteristics. The data is no longer limited to the basic data common to several industries: service data is incorporated, more data is mined through the association rule algorithm for loading, and this data is converted into more rule forms for handling anomalies, so that rule construction breaks through the original limitations and the rules apply more widely. In addition, because an association rule mining algorithm is used to mine the sample data, the whole process is highly automated, rule output efficiency is significantly improved, dependence on human experience is greatly reduced, and the effort required from professionals is reduced, which saves a great deal of manpower. Furthermore, verifying the obtained strong association rules achieves a close match with actual requirements and makes the judgment of system anomalies more accurate. Meanwhile, as the system keeps changing, the rules are always in a state of training and optimization, which overcomes the drawback that rules need to be maintained manually.
On the basis of the above embodiments, the preprocessing module of the embodiment of the present invention includes a data cleaning unit, a data reduction unit and a change unit, wherein:
the data cleaning unit is used for performing data cleaning on the sample data, and comprises: deleting repeated data and irrelevant data, smoothing noise data, and interpolating abnormal data and missing data.
Specifically, the data cleaning unit smooths the noise data by a binning method. Binning smooths the values of sorted data by consulting their "neighbors" (the surrounding values). The depth of a bin refers to the number of data it holds (equal depth means that every bin holds the same number of data), and the width of a bin refers to the value range it covers. Binning can remove noise, discretize continuous data and increase granularity. The binning method in the embodiment of the invention may be an equal-depth binning method, an equal-width binning method, a minimum entropy method or a user-defined interval method.
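The difference between equal-depth and equal-width binning can be illustrated with a short sketch. The function names and the sample values are assumptions made for the example; the minimum entropy and user-defined interval methods are not shown.

```python
def equal_depth_bins(values, k):
    """Split sorted values into k bins holding (nearly) the same number of values."""
    data = sorted(values)
    n = len(data)
    return [data[i * n // k:(i + 1) * n // k] for i in range(k)]

def equal_width_bins(values, k):
    """Split sorted values into k bins whose value ranges have the same width."""
    data = sorted(values)
    lo, hi = data[0], data[-1]
    width = (hi - lo) / k or 1  # fall back to 1 when all values are equal
    bins = [[] for _ in range(k)]
    for v in data:
        idx = min(int((v - lo) / width), k - 1)  # the maximum goes in the last bin
        bins[idx].append(v)
    return bins

# Hypothetical index values collected for one index sub-item.
values = [4, 8, 15, 21, 21, 24, 25, 28, 34]
by_depth = equal_depth_bins(values, 3)  # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
by_width = equal_width_bins(values, 3)  # [[4, 8], [15, 21, 21], [24, 25, 28, 34]]
```

Equal-depth bins each hold three values here, while equal-width bins each cover a range of 10 and therefore hold different numbers of values.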
The data reduction unit is used for carrying out dimension reduction processing on the cleaned sample data and discarding part of the sample data.
The data reduction unit reduces the cleaned data: on the premise of preserving the original character of the data as far as possible, it reduces the data volume so as to shorten the time needed for data exchange and subsequent data mining. Dimension reduction means reducing the number of features of the data, discarding unimportant features, and describing the data with as few key features as possible.
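The source does not state how the importance of a feature is judged. As one possible illustration, the sketch below treats a feature whose values barely vary as unimportant and drops it. The variance criterion, the threshold, the feature names and the function name are all assumptions, not part of the embodiment.

```python
from statistics import pvariance

def drop_low_variance_features(rows, threshold):
    """Keep only the features whose population variance exceeds the threshold."""
    names = list(rows[0])
    kept = [n for n in names if pvariance([r[n] for r in rows]) > threshold]
    return [{n: r[n] for n in kept} for r in rows]

# Hypothetical cleaned samples: "const" never changes and "mem" barely changes.
rows = [
    {"cpu": 0.2, "mem": 0.5, "const": 1.0},
    {"cpu": 0.9, "mem": 0.5, "const": 1.0},
    {"cpu": 0.4, "mem": 0.6, "const": 1.0},
]
reduced = drop_low_variance_features(rows, threshold=0.01)
```

With these values only the `cpu` feature is kept, so each sample is described by one feature instead of three.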
And the change unit is used for converting the residual sample data after the dimension reduction processing into a form suitable for data mining in a smooth aggregation and data generalization mode.
Specifically, when smoothing, the change unit may smooth by the mean, by the boundary values, or by the median. Real-valued data is transformed through concept hierarchies and discretization.
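The three smoothing options can be sketched as follows, assuming the data has already been split into non-empty bins. The function names and the sample bins are assumptions made for the example.

```python
from statistics import mean, median

def smooth_by_mean(bins):
    """Replace every value in a bin with the bin's mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_median(bins):
    """Replace every value in a bin with the bin's median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundary(bins):
    """Replace every value with the closer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

# Hypothetical bins of index values, each assumed to be non-empty.
bins = [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
by_mean = smooth_by_mean(bins)          # bin means are 9, 22 and 29
by_median = smooth_by_median(bins)      # bin medians are 8, 21 and 28
by_boundary = smooth_by_boundary(bins)  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```

Smoothing by the mean or median collapses each bin to a single representative value, while smoothing by boundary values keeps two distinct values per bin and so preserves more of the original spread.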
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may invoke a computer program stored in the memory 330 and executable on the processor 310 to perform the mining methods provided by the above embodiments, including, for example: collecting service data, performance data and log data as sample data; preprocessing the sample data and converting it into a form suitable for data mining; performing association rule mining on the preprocessed sample data to obtain a certain number of strong association rules, wherein each strong association rule includes at least service data and performance data; and verifying the strong association rule, and if the strong association rule inevitably corresponds to the occurrence of an abnormal phenomenon, taking the strong association rule as a self-healing rule.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the mining method provided in the foregoing embodiments, including, for example: collecting service data, performance data and log data as sample data; preprocessing the sample data and converting it into a form suitable for data mining; performing association rule mining on the preprocessed sample data to obtain a certain number of strong association rules, wherein each strong association rule includes at least service data and performance data; and verifying the strong association rule, and if the strong association rule inevitably corresponds to the occurrence of an abnormal phenomenon, taking the strong association rule as a self-healing rule.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (9)
1. A self-healing rule mining method is characterized by comprising the following steps:
collecting service data, performance data and log data as sample data;
preprocessing the sample data, and converting the sample data into a form suitable for data mining;
mining association rules of the preprocessed sample data to obtain a certain number of strong association rules, wherein each strong association rule at least comprises business data and performance data;
and verifying the strong association rule, and if the strong association rule is bound to correspond to the occurrence of an abnormal phenomenon, taking the strong association rule as a self-healing rule.
2. The mining method of claim 1, wherein the pre-processing the sample data further comprises:
labeling the acquired index value through stream processing, and storing sample data in a corresponding database table according to the type of labeling;
correspondingly, the mining of the association rule is performed on the preprocessed sample data, and specifically comprises the following steps:
according to the label determined by the user, extracting sample data from the database table corresponding to the determined label to carry out association rule mining.
3. The mining method according to claim 1, wherein the association rule mining algorithm is Apriori algorithm.
4. A mining method as claimed in claim 1, wherein the validation of the strong association rules is carried out by using a random forest model.
5. The mining method according to claim 1, wherein the preprocessing of the sample data is specifically:
performing data cleaning on the sample data, including: deleting repeated data and irrelevant data, smoothing noise data and interpolating abnormal data and missing data;
carrying out dimension reduction processing on the sample data after data cleaning, and discarding the sample data;
and converting the sample data left after the dimension reduction processing into a form suitable for data mining in a smooth aggregation and data generalization mode.
6. A self-healing rule mining device, comprising:
the sample data acquisition module is used for acquiring the service data, the performance data and the log data as sample data;
the preprocessing module is used for preprocessing the sample data and converting the sample data into a form suitable for data mining;
the association rule mining module is used for mining association rules of the preprocessed sample data to obtain a certain number of strong association rules, and each strong association rule at least comprises business data and performance data;
and the verification module is used for verifying the strong association rule, and if the strong association rule is bound to correspond to the occurrence of an abnormal phenomenon, the strong association rule is used as a self-healing rule.
7. The mining device according to claim 6, wherein the preprocessing module specifically comprises:
the data cleaning unit is used for performing data cleaning on the sample data, and comprises: deleting repeated data and irrelevant data, smoothing noise data and interpolating abnormal data and missing data;
the data reduction unit is used for carrying out dimension reduction processing on the sample data after the data cleaning and discarding the sample data;
and the change unit is used for converting the residual sample data after the dimension reduction processing into a form suitable for data mining in a smooth aggregation and data generalization mode.
8. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the processor, wherein:
the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the mining method of any one of claims 1 to 5.
9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the mining method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811437497.8A CN111241145A (en) | 2018-11-28 | 2018-11-28 | Self-healing rule mining method and device based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811437497.8A CN111241145A (en) | 2018-11-28 | 2018-11-28 | Self-healing rule mining method and device based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111241145A (en) | 2020-06-05 |
Family
ID=70863736
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811437497.8A Pending CN111241145A (en) | 2018-11-28 | 2018-11-28 | Self-healing rule mining method and device based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111241145A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113420069A (en) * | 2021-06-24 | 2021-09-21 | 平安科技(深圳)有限公司 | Association rule mining method, system, terminal and storage medium based on abnormal samples |
CN113434404A (en) * | 2021-06-24 | 2021-09-24 | 北京同创永益科技发展有限公司 | Automatic service verification method and device for verifying reliability of disaster recovery backup system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104616210A (en) * | 2015-02-05 | 2015-05-13 | 河海大学常州校区 | Method for fusion reconstruction and interaction of intelligent power distribution network big data |
WO2016029570A1 (en) * | 2014-08-28 | 2016-03-03 | 北京科东电力控制系统有限责任公司 | Intelligent alert analysis method for power grid scheduling |
CN107301296A (en) * | 2017-06-27 | 2017-10-27 | 西安电子科技大学 | Circuit breaker failure influence factor method for qualitative analysis based on data |
CN108415789A (en) * | 2018-01-24 | 2018-08-17 | 西安交通大学 | Node failure forecasting system and method towards extensive mixing heterogeneous storage system |
CN108446184A (en) * | 2018-02-23 | 2018-08-24 | 北京天元创新科技有限公司 | Analyze the method and system of failure root primordium |
CN108650684A (en) * | 2018-02-12 | 2018-10-12 | 中国联合网络通信集团有限公司 | A kind of correlation rule determines method and device |
CN108768695A (en) * | 2018-04-27 | 2018-11-06 | 华为技术有限公司 | The problem of KQI localization method and device |
- 2018-11-28: CN application CN201811437497.8A (publication CN111241145A), status Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016029570A1 (en) * | 2014-08-28 | 2016-03-03 | 北京科东电力控制系统有限责任公司 | Intelligent alert analysis method for power grid scheduling |
CN104616210A (en) * | 2015-02-05 | 2015-05-13 | 河海大学常州校区 | Method for fusion reconstruction and interaction of intelligent power distribution network big data |
CN107301296A (en) * | 2017-06-27 | 2017-10-27 | 西安电子科技大学 | Circuit breaker failure influence factor method for qualitative analysis based on data |
CN108415789A (en) * | 2018-01-24 | 2018-08-17 | 西安交通大学 | Node failure forecasting system and method towards extensive mixing heterogeneous storage system |
CN108650684A (en) * | 2018-02-12 | 2018-10-12 | 中国联合网络通信集团有限公司 | A kind of correlation rule determines method and device |
CN108446184A (en) * | 2018-02-23 | 2018-08-24 | 北京天元创新科技有限公司 | Analyze the method and system of failure root primordium |
CN108768695A (en) * | 2018-04-27 | 2018-11-06 | 华为技术有限公司 | The problem of KQI localization method and device |
Non-Patent Citations (1)
Title |
---|
Deng Xiaoheng; Zeng Detian: "Traffic accident cause analysis model based on AHP and a hybrid Apriori-Genetic algorithm" * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113420069A (en) * | 2021-06-24 | 2021-09-21 | 平安科技(深圳)有限公司 | Association rule mining method, system, terminal and storage medium based on abnormal samples |
CN113434404A (en) * | 2021-06-24 | 2021-09-24 | 北京同创永益科技发展有限公司 | Automatic service verification method and device for verifying reliability of disaster recovery backup system |
CN113420069B (en) * | 2021-06-24 | 2023-08-11 | 平安科技(深圳)有限公司 | Association rule mining method, system, terminal and storage medium based on abnormal samples |
CN113434404B (en) * | 2021-06-24 | 2024-03-19 | 北京同创永益科技发展有限公司 | Automatic service verification method and device for verifying reliability of disaster recovery system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112148772A (en) | Alarm root cause identification method, device, equipment and storage medium | |
CN112181758B (en) | Fault root cause positioning method based on network topology and real-time alarm | |
CN110471913A (en) | A kind of data cleaning method and device | |
CN110932899B (en) | Intelligent fault compression research method and system applying AI | |
CN111726248A (en) | Alarm root cause positioning method and device | |
CN115809183A (en) | Method for discovering and disposing information-creating terminal fault based on knowledge graph | |
CN114465874B (en) | Fault prediction method, device, electronic equipment and storage medium | |
CN112217674B (en) | Alarm root cause identification method based on causal network mining and graph attention network | |
CN109547251B (en) | Service system fault and performance prediction method based on monitoring data | |
Chu et al. | Prefix-graph: A versatile log parsing approach merging prefix tree with probabilistic graph | |
CN109656898A (en) | Distributed large-scale complex community detection method and device based on node degree | |
CN111241145A (en) | Self-healing rule mining method and device based on big data | |
CN113268370A (en) | Root cause alarm analysis method, system, equipment and storage medium | |
CN105630797A (en) | Data processing method and system | |
CN109993391A (en) | Distributing method, device, equipment and the medium of network O&M task work order | |
CN109993390A (en) | Alarm association and worksheet processing optimization method, device, equipment and medium | |
CN114647558A (en) | Method and device for detecting log abnormity | |
CN106096117B (en) | Uncertain graph key side appraisal procedure based on flow and reliability | |
CN110399278B (en) | Alarm fusion system and method based on data center anomaly monitoring | |
CN117034149A (en) | Fault processing strategy determining method and device, electronic equipment and storage medium | |
CN108243058A (en) | A kind of method and apparatus based on alarm positioning failure | |
CN115495587A (en) | Alarm analysis method and device based on knowledge graph | |
CN112750047B (en) | Behavior relation information extraction method and device, storage medium and electronic equipment | |
EP3855316A1 (en) | Optimizing breakeven points for enhancing system performance | |
CN114968933A (en) | Method and device for classifying logs of data center |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||