CN106708016A - fault monitoring method and device - Google Patents
- Publication number
- CN106708016A (application CN201611199335.6A)
- Authority
- CN
- China
- Prior art keywords
- monitored
- destination
- data
- status data
- probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B23/00—Testing or monitoring of control systems or parts thereof
- G05B23/02—Electric testing or monitoring
- G05B23/0205—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults
- G05B23/0218—Electric testing or monitoring by means of a monitoring system capable of detecting and responding to faults characterised by the fault detection method dealing with either existing or incipient faults
- G05B23/0224—Process history based detection method, e.g. whereby history implies the availability of large amounts of data
- G05B23/024—Quantitative history assessment, e.g. mathematical relationships between available data; Functions therefor; Principal component analysis [PCA]; Partial least square [PLS]; Statistical classifiers, e.g. Bayesian networks, linear regression or correlation analysis; Neural networks
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Debugging And Monitoring (AREA)
Abstract
The embodiment of the application provides a fault monitoring method and a fault monitoring device, wherein the method comprises the following steps: acquiring state data of one or more target objects in the system; determining the probability of each target object failing according to the state data of the target objects; determining a target object with the probability of failure greater than a preset threshold value in the target objects as an object to be monitored; and determining the reason of the fault of the object to be monitored, and monitoring the object to be monitored according to the reason of the fault of the object to be monitored. According to the scheme, a distributed storage technology is utilized, and various algorithms are comprehensively applied to analyze the state data under a MapReduce framework, so that the fault probability of each target object in the system is predicted, and the target object to be monitored is monitored. The method solves the technical problems that the prior fault monitoring method can not early warn the potential fault in the system, has poor monitoring effect and low efficiency, and achieves the technical effect of effectively maintaining the system safety.
Description
Technical field
The present application relates to the technical field of oil exploration data processing, and in particular to a fault monitoring method and device.
Background art
In oil exploration data processing, the volume of data to be studied and processed is enormous. High-performance computer clusters, workstations, and large-capacity, high-performance storage devices are therefore often used as the platform or system for seismic data processing and interpretation, in order to process oil exploration data.
When such a platform or system is used to process oil exploration data, growing data volumes, growing cluster sizes, and the interleaved use of various application software make clusters, workstations, storage devices, and the like prone to faults of all kinds. These faults disrupt production tasks and in turn cause losses. How to monitor faults in the data processing platform or system and ensure its stability is therefore receiving increasing attention.
To keep the platform or system operating safely and stably and to discover faults in time, existing fault monitoring methods generally collect the state data of each device and compare it with a preset threshold to judge whether the device has failed. In practice, however, such a method can only find devices that have already failed and raise alarms for them; it cannot effectively predict, raise alarms for, or carry out maintenance against faults that are about to occur.
Existing fault monitoring methods therefore suffer from the technical problems that potential faults in the system cannot be predicted and that fault monitoring has poor accuracy and low efficiency.
No effective solution to the above problems has yet been proposed.
Summary of the invention
The embodiments of the present application provide a fault monitoring method and device to solve the technical problems of existing fault monitoring methods, namely that potential faults cannot be predicted and that the accuracy of system fault monitoring is low.
An embodiment of the present application provides a fault monitoring method, comprising:
acquiring state data of one or more target objects in a system;
determining, according to the state data of the one or more target objects, the probability that each of the one or more target objects will fail;
determining, as objects to be monitored, those target objects whose probability of failure is greater than a preset threshold;
determining the cause of failure of each object to be monitored, and monitoring the object to be monitored according to that cause.
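The third step above (selecting the objects to be monitored by comparing failure probabilities with a preset threshold) can be illustrated with a minimal Python sketch. The object names, probability values, and the threshold of 0.5 are hypothetical and are not taken from the application.

```python
from typing import Dict, List


def select_objects_to_monitor(failure_prob: Dict[str, float], threshold: float) -> List[str]:
    """Return the target objects whose failure probability exceeds the preset threshold."""
    return sorted(name for name, p in failure_prob.items() if p > threshold)


# Hypothetical failure probabilities for four target objects.
probs = {"cpu-0": 0.82, "gpu-1": 0.15, "fan-3": 0.64, "disk-2": 0.50}
to_monitor = select_objects_to_monitor(probs, threshold=0.5)
print(to_monitor)
```

Note that the comparison is strict, matching the wording "greater than a preset threshold", so an object whose probability equals the threshold is not selected.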
In one embodiment, acquiring the state data of one or more target objects in the system comprises:
dividing the plurality of target objects into a plurality of clusters according to interface type, wherein target objects in the same cluster use the same type of interface;
acquiring the state data of the target objects located in the same cluster using the same data acquisition method.
In one embodiment, after the state data of the target objects located in the same cluster is acquired using the same data acquisition method, the method further comprises:
converting the state data of target objects located in different clusters into state data of a single, uniform format.
In one embodiment, determining, according to the state data of the one or more target objects, the probability that each of the one or more target objects will fail comprises:
determining, according to the state data of the one or more target objects, one or more preset models that match the state data of the one or more target objects;
determining, according to the one or more preset models, the probability that each of the one or more target objects will fail.
In one embodiment, determining the cause of failure of the object to be monitored comprises:
determining the cause of failure of the object to be monitored according to the state data of the object to be monitored and the preset model that matches that state data.
In one embodiment, monitoring the object to be monitored according to its cause of failure comprises:
performing, according to the cause of failure of the object to be monitored and the probability that the object to be monitored will fail, at least one of the following business processing operations: repairing, deleting, or replacing an object to be monitored in the system that has already failed; repairing, deleting, or replacing an object to be monitored in the system that has not yet failed; and issuing an alarm for an object to be monitored in the system.
In one embodiment, after the business processing is performed according to the cause of failure of the object to be monitored and the probability that the object to be monitored will fail, the method further comprises:
storing the result of the business processing in a knowledge database as monitoring result data;
correcting the preset model according to the monitoring result data.
In one embodiment, acquiring the state data of one or more target objects in the system comprises:
receiving a system problem uploaded by a user through a preset channel;
using the system problem as the state data.
In one embodiment, the plurality of preset models are obtained using preset algorithms under the MapReduce framework, wherein the preset algorithms include a clustering algorithm and/or a Bayesian algorithm.
In one embodiment, obtaining the plurality of preset models using preset algorithms under the MapReduce framework comprises: obtaining the plurality of preset models on a distributed storage platform, using the preset algorithms under the MapReduce framework.
In one embodiment, after the state data of one or more target objects in the system is acquired, the state data is stored in the knowledge database in the form of a distributed database.
Based on the same inventive concept, an embodiment of the present application further provides a fault monitoring device, comprising:
a state data acquisition module, configured to acquire state data of one or more target objects in a system;
a failure probability determination module, configured to determine, according to the state data of the one or more target objects, the probability that each of the one or more target objects will fail;
a to-be-monitored object determination module, configured to determine, as objects to be monitored, those target objects whose probability of failure is greater than a preset threshold;
a to-be-monitored object processing module, configured to determine the cause of failure of each object to be monitored and to monitor the object to be monitored according to that cause.
In one embodiment, the state data acquisition module comprises:
a cluster division unit, configured to divide the plurality of target objects into a plurality of clusters according to interface type, wherein target objects in the same cluster use the same type of interface;
a data acquisition unit, configured to acquire the state data of the target objects located in the same cluster using the same data acquisition method.
In one embodiment, the failure probability determination module comprises:
a preset model determination unit, configured to determine, according to the state data of the one or more target objects, one or more preset models that match the state data of the one or more target objects;
a failure probability determination unit, configured to determine, according to the one or more preset models, the probability that each of the one or more target objects will fail.
In one embodiment, the to-be-monitored object processing module comprises:
a failure cause determination unit, configured to determine the cause of failure of the object to be monitored according to the state data of the object to be monitored and the preset model that matches that state data;
a business processing unit, configured to perform, according to the cause of failure of the object to be monitored and the probability that the object to be monitored will fail, at least one of the following business processing operations: repairing, deleting, or replacing an object to be monitored in the system that has already failed; repairing, deleting, or replacing an object to be monitored in the system that has not yet failed; and issuing an alarm for an object to be monitored in the system.
In the embodiments of the present application, a clustering algorithm and a Bayesian algorithm are applied in combination, within the MapReduce framework on a distributed computing platform (the Hadoop platform), to analyze in depth the collected state data of each target object in the system and obtain the probability that each target object will fail. Target objects with a high probability of failure can then be monitored and handled so that faults are prevented. This solves the technical problem that existing fault monitoring methods cannot predict potential faults in the system, and achieves the technical effect of giving early warning for both faults that have already occurred and faults that have not yet occurred, thereby improving the accuracy of fault monitoring.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some of the embodiments described in the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the fault monitoring method according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the implementation flow of the naive Bayes algorithm under the MapReduce framework in the fault monitoring method/device provided by the embodiments of the present application;
Fig. 3 is a schematic diagram of state data acquired using the fault monitoring method/device provided by the embodiments of the present application;
Fig. 4 is a structural diagram of the fault monitoring device according to an embodiment of the present application;
Fig. 5 is a schematic diagram of maintaining the data system of an exploration center using the fault monitoring method/device provided by the embodiments of the present application;
Fig. 6 is a schematic diagram of the general state data collection model in the fault monitoring method/device provided by the embodiments of the present application;
Fig. 7 is a schematic diagram of the combined multi-algorithm analysis under the MapReduce framework in the fault monitoring method/device provided by the embodiments of the present application;
Fig. 8 is a schematic diagram of the implementation flow of the K-means clustering algorithm under the MapReduce framework in the fault monitoring method/device provided by the embodiments of the present application.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
Existing fault monitoring methods merely collect state data and compare it with a preset threshold; they neither make full use of the state data nor analyze it in depth. As a result, in practice they can only find faults that have already occurred, cannot give early warning of potential faults in the system, and monitor faults with poor effect and low efficiency. Addressing the root cause of these technical problems, the present application considers combining a distributed storage method with the MapReduce framework and applying multiple algorithms in combination so as to make full use of the state data of each target object. Intelligent analysis determines the failure probability of each target object and the cause of failure, and preventive maintenance is then carried out on the target objects to be monitored. This solves the technical problems that existing fault monitoring methods cannot give early warning of potential faults and have low monitoring accuracy, and achieves the technical effect of giving early warning for both faults that have already occurred and faults that have not yet occurred in the system, thereby improving the accuracy of fault monitoring.
Based on the above line of thinking, the present application provides a fault monitoring method. Referring to Fig. 1, the fault monitoring method provided by the present application may include the following steps.
Step 101: Acquire the state data of one or more target objects in the system.
In one embodiment, the target objects may specifically include the CPUs, GPUs, storage devices, network connection devices, and supporting infrastructure (such as cooling fans) in the system. It should of course be noted that the target objects listed above are given only to better illustrate the embodiments of the present invention; in a specific implementation, other related equipment or devices may be selected as target objects according to the requirements at hand. The present application places no limitation on this.
In one embodiment, a system or platform may include many target objects of different types, and the interfaces used to collect state data from different types of target object differ. For example, the system of an oil exploration data processing center includes multiple CPUs, multiple storage devices, and so on, and the interface for obtaining CPU state data is not the same as the interface for obtaining storage device state data. To improve the efficiency and accuracy of state data acquisition, the state data of each target object can be obtained cluster by cluster, through the interface shared by that cluster. Specifically, the following steps may be performed:
S1: Divide the plurality of target objects into a plurality of clusters according to interface type, wherein target objects in the same cluster use the same type of interface.
S2: Acquire the state data of the target objects located in the same cluster using the same data acquisition method.
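Steps S1 and S2 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the interface names, the collector functions `read_cpu_state` and `read_storage_state`, and the values they return are hypothetical placeholders, not interfaces described in the application.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

StateRecord = Dict[str, object]


def read_cpu_state(name: str) -> StateRecord:
    """Hypothetical collector for objects exposing a CPU-type interface."""
    return {"object": name, "source": "cpu"}


def read_storage_state(name: str) -> StateRecord:
    """Hypothetical collector for objects exposing a storage-type interface."""
    return {"object": name, "source": "storage"}


def group_by_interface(targets: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """S1: put target objects that share an interface type into the same cluster."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for name, interface in targets:
        clusters[interface].append(name)
    return dict(clusters)


def acquire(clusters: Dict[str, List[str]],
            collectors: Dict[str, Callable[[str], StateRecord]]) -> List[StateRecord]:
    """S2: use one acquisition method per cluster to read every member's state."""
    records: List[StateRecord] = []
    for interface, names in clusters.items():
        collect = collectors[interface]
        records.extend(collect(name) for name in names)
    return records


targets = [("cpu-0", "cpu"), ("disk-0", "storage"), ("cpu-1", "cpu")]
clusters = group_by_interface(targets)
state_data = acquire(clusters, {"cpu": read_cpu_state, "storage": read_storage_state})
print(clusters)
print(state_data)
```

Keeping one collector per interface type means that adding a new kind of target object only requires registering one more collector.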
In one embodiment, the state data of target objects in different clusters has different formats, and the format of the state data as directly obtained does not necessarily meet the requirements of subsequent use. For example, the format of the state data of target objects in the CPU cluster differs from that of target objects in the storage device cluster. The state data collected from target objects in different clusters therefore needs a unified data format that meets the requirements of subsequent processing. Specifically, after the state data of the target objects located in the same cluster is acquired using the same data acquisition method, the method may further include: converting the state data of target objects located in different clusters into state data of a single, uniform format.
For example, the different CPUs in a data processing system are grouped into one CPU cluster, and the state data of each CPU in the cluster is obtained through the CPU state data acquisition interface. The state data of each CPU in the CPU cluster is then converted to the unified format, so that its format is the same as that of the state data of the other target objects and meets the requirements of subsequent use. The state data of the target objects in the GPU cluster, the storage device cluster, the network connection device cluster, and so on can be obtained in the same manner, which is not repeated here.
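The format conversion described above can be sketched as follows. The uniform schema used here (one row per object, cluster, metric, and value) and the raw field names are assumptions made for illustration; the application does not specify the unified format.

```python
from typing import Any, Dict, List


def to_uniform(cluster: str, raw: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one raw record into rows of (object, cluster, metric, value).

    The row layout is a hypothetical unified format, not one from the source.
    """
    name = raw["name"]
    return [
        {"object": name, "cluster": cluster, "metric": key, "value": float(value)}
        for key, value in sorted(raw.items())
        if key != "name"
    ]


# Hypothetical raw records: the CPU cluster reports integers, storage reports strings.
cpu_raw = {"name": "cpu-0", "swap_used": 10, "buffer_used": 5}
disk_raw = {"name": "disk-2", "free_ratio": "0.2"}

rows = to_uniform("cpu", cpu_raw) + to_uniform("storage", disk_raw)
for row in rows:
    print(row)
```

After conversion every row has the same keys and a numeric value, so later analysis steps do not need to know which cluster a reading came from.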
In one embodiment, to improve the efficiency of reading the state data of each target object in subsequent processing and to improve the stability of the state data, the state data can be stored and presented in HBase (Hadoop Database, a distributed database) form. Specifically, the state data can be stored in the knowledge database in the form of a distributed database. It should be noted that storage in HBase form differs from the storage form of an ordinary database: the state data is stored and presented by column, which improves reading efficiency and data stability. Other suitable databases may of course be used for storage as circumstances require. The present application places no limitation on this.
Step 102: Determine, according to the state data of the one or more target objects, the probability that each of the one or more target objects will fail.
In one embodiment, to make predictions about target objects in the system that have not yet failed, the probability that a target object will fail can be determined by analyzing its state data, and that probability can be used to predict whether the target object will fail in the future. A specific implementation may include:
S1: Determine, according to the state data of the one or more target objects, one or more preset models that match the state data of the one or more target objects.
In this embodiment, the preset model that matches the state data of a target object can be determined by selecting, from the plurality of preset models in the knowledge database, the preset model whose difference from the state data of the target object is smallest, and taking it as the corresponding matching model.
It should be noted that, in this embodiment, to determine accurately from the state data which preset model a target object matches, the naive Bayes algorithm can be used in a specific implementation to identify the preset model corresponding to the state data. For example, based on the state data of a given target object, a Reduce task can compute the probability that the target object belongs to each preset model class; the preset model class with the largest probability is then taken as the preset model that matches the target object.
S2: Determine, according to the one or more preset models, the probability that each of the one or more target objects will fail.
In one embodiment, the plurality of preset models are obtained using preset algorithms under the MapReduce framework. That is, the plurality of preset models are obtained by applying multiple algorithms in combination under the MapReduce framework, wherein the preset algorithms, i.e. the multiple algorithms, include a clustering algorithm and a Bayesian algorithm. It should be noted that the clustering algorithm and the Bayesian algorithm generally need to run under the MapReduce framework in order to run efficiently and accurately, and that the MapReduce framework as a whole generally needs to run on a distributed storage platform (the Hadoop platform), where the clustering algorithm and the Bayesian algorithm are applied in combination to solve the corresponding problem.
In one embodiment, obtaining the plurality of preset models by applying multiple algorithms in combination under the MapReduce framework includes: obtaining the plurality of preset models on a distributed storage platform by applying multiple algorithms in combination under the MapReduce framework. The MapReduce framework is a programming model used for parallel computation over large data sets (larger than 1 TB). It should be noted that the two concepts in MapReduce, "Map" (mapping: mapping one set of key-value pairs to a new set of key-value pairs) and "Reduce" (reduction: ensuring that all mapped key-value pairs that share the same key are grouped together), are both borrowed from the features of functional programming languages and vector programming languages. In a specific implementation, the MapReduce framework lets programmers who are not versed in distributed parallel programming run their programs on a distributed system, achieving parallel computation and improving the efficiency and accuracy of the computation. In this embodiment, to prepare the plurality of preset models in advance and store them in the knowledge database, the sample data can be mined thoroughly on the distributed storage platform (the Hadoop platform) using a combined multi-algorithm approach based on the MapReduce framework (including clustering, to obtain multiple sample types, and training, to obtain multiple preset models), thereby obtaining accurate preset models. A specific implementation may include:
S1: Cluster the multiple samples using the K-means clustering algorithm to obtain multiple sample types. This may specifically include:
1) Select k sample data items (k being the intended number of classes) from the device state data set of the exploration data center as the initial centers.
2) Measure the distance from every data item to each center, find the minimum distance, and assign the data item to the corresponding class, thereby obtaining the initial sample types.
3) Recompute the center of each class. Repeat steps 2) and 3) until the set threshold is met. In the main function, a suitable threshold must be set, and an iterative program repeatedly calls the Map function and the Reduce function until the set threshold is met, at which point the multiple sample types are obtained.
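The iteration in steps 1) to 3) can be sketched in plain Python with an explicit map phase and reduce phase. This is an in-memory illustration only, not a Hadoop job: the function names, the use of Euclidean distance, and the stopping rule (largest center movement below `tol`) are assumptions, since the application does not define the threshold precisely.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def map_assign(point: Point, centers: Sequence[Point]) -> Tuple[int, Point]:
    """Map: emit (index of the nearest center, point) as a key-value pair."""
    nearest = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return nearest, point


def reduce_center(points: List[Point]) -> Point:
    """Reduce: average all points that were emitted under the same key."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points: List[Point], centers: List[Point],
           tol: float = 1e-6, max_iter: int = 100) -> List[Point]:
    """Driver loop: repeat map and reduce until the centers stop moving."""
    for _ in range(max_iter):
        groups: Dict[int, List[Point]] = defaultdict(list)
        for p in points:
            key, value = map_assign(p, centers)
            groups[key].append(value)
        # Keep the old center if no point was assigned to it.
        new_centers = [reduce_center(groups[i]) if groups[i] else centers[i]
                       for i in range(len(centers))]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


# Hypothetical two-dimensional state samples; the first and third are the initial centers.
samples = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
final_centers = kmeans(samples, centers=[samples[0], samples[2]])
print(final_centers)
```

In a real MapReduce deployment the map calls would run in parallel across data splits and the framework would group values by key before the reduce calls; the loop in `kmeans` plays the role of the main function that re-submits the job each iteration.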
S2: Train on the multiple sample types using the naive Bayes algorithm to obtain the preset models. Referring to Fig. 2, this may specifically include:
S2-1: Let x = {a_1, a_2, ..., a_m} be an item to be classified, where each a_j is a feature attribute of x.
S2-2: Let C = {y_1, y_2, ..., y_n} be the set of classes.
S2-3: Compute P(y_1 | x), P(y_2 | x), ..., P(y_n | x).
S2-4: If P(y_k | x) = max{P(y_1 | x), P(y_2 | x), ..., P(y_n | x)}, then x ∈ y_k.
S2-5: Through repeated testing, and according to the actual recognition results, make targeted corrections to each feature attribute in the sample types, repeating as necessary, to obtain the preset models.
Here, x is the state data to be analyzed; a_1, a_2, ..., a_m are the feature attribute data in the data to be analyzed; C is the set of preset models; y_1, y_2, ..., y_n are the individual preset models; and P(y_1 | x), P(y_2 | x), ..., P(y_n | x) are the probabilities that x belongs to each of the preset models y_1, y_2, ..., y_n respectively.
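For reference, the posterior probabilities in S2-3 are usually computed in the standard naive Bayes form shown below. The application does not write this formula out, so the conditional independence of the feature attributes a_1, ..., a_m given the class is an assumption here (it is the defining assumption of the naive Bayes algorithm).

```latex
P(y_i \mid x) = \frac{P(y_i)\,\prod_{j=1}^{m} P(a_j \mid y_i)}{P(x)},
\qquad
k = \arg\max_{1 \le i \le n} \; P(y_i)\prod_{j=1}^{m} P(a_j \mid y_i)
```

Because P(x) is the same for every class, it can be dropped when searching for the maximum in S2-4, which is why the second expression omits it.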
In a specific implementation, reference may also be made to Fig. 3. The Mac1 data record may be the set of status data of the target objects obtained at a certain point in time in the system, and thus corresponds to the item to be classified denoted by $x$ above. Each cell of the Mac1 record holds one kind of status data of a target object, so the data in the cells correspond to the feature attribute values $a_1, a_2, \ldots, a_m$ above. Note that, depending on the situation, several different kinds of status data may be obtained for the same object at the same point in time. For example, in Fig. 3, the value 20% in the 5th cell, the value 10 in the 6th cell, and the value 5 in the 7th cell may all be status data of one CPU in the system at one point in time: 20% may describe the remaining space of the CPU, 10 its Swap (swap partition) usage, and 5 its Buffer usage.
Correspondingly, $C$ in the formulas above is the set of preset models, and $y_1, y_2, \ldots, y_n$ are the individual preset models in that set, for example a CPU fault model, a fan fault model, a GPU fault model, and so on. Each preset model may contain the values of the various status data of the corresponding target objects. Calculating $P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)$ amounts to calculating, from the similarity between each status data value in Mac1 and the corresponding value in each preset model, the probability that Mac1 belongs to each preset model. These probabilities indicate which preset model's state the system state represented by Mac1 belongs to, for example the state in which a CPU fails, the state in which a GPU fails, or some other state. Calculating $P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)\}$ amounts to selecting, from those probabilities, the preset model with the largest value as the model closest to Mac1; the state represented by Mac1 can then be regarded as the state corresponding to that preset model. For example, if the probability that Mac1 belongs to preset model $y_2$ is the largest, and $y_2$ corresponds to CPU overheating, it can be concluded that a CPU in the system was running too hot during the period in which the Mac1 data was collected.
It should be noted that, in the above embodiment, each conditional probability in step S2-3 may be calculated as follows:
S2-3-1: Find a set of items to be classified whose classification is already known; this set is called the training sample set.
S2-3-2: Obtain, by counting, the conditional probability estimate of each feature attribute under each category, namely
$$P(a_1 \mid y_1), P(a_2 \mid y_1), \ldots, P(a_m \mid y_1);\ \ldots;\ P(a_1 \mid y_n), P(a_2 \mid y_n), \ldots, P(a_m \mid y_n)$$
Here, $x$ is the status data to be analyzed, $a_1, a_2, \ldots, a_m$ are the feature attribute values in the data to be analyzed, and each $P(a_j \mid y_i)$ is the probability of feature attribute $a_j$ under preset model $y_i$.
S2-3-3: If the feature attributes are conditionally independent, Bayes' theorem gives the following derivation:
$$P(y_i \mid x) = \frac{P(x \mid y_i)\,P(y_i)}{P(x)}$$
Because the denominator is constant for all categories, it suffices to maximize the numerator. And because the feature attributes are conditionally independent:
$$P(x \mid y_i)\,P(y_i) = P(a_1 \mid y_i)\,P(a_2 \mid y_i) \cdots P(a_m \mid y_i)\,P(y_i) = P(y_i) \prod_{j=1}^{m} P(a_j \mid y_i)$$
Here, $P(a_1 \mid y_i), P(a_2 \mid y_i), \ldots, P(a_m \mid y_i)$ are the probabilities of the individual feature attributes under preset model $y_i$, $P(y_i)$ is the probability that preset model $y_i$ occurs, $P(x)$ is the total probability, and $P(y_i \mid x)$ is the probability that the status data $x$ belongs to preset model $y_i$.
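As a rough illustration of the calculation above, the following Python sketch estimates the prior and conditional probabilities by counting and then picks the preset model with the largest posterior. It is not the implementation described in this application: the function names, the discretized attribute values, the fault labels, and the use of Laplace smoothing (added here to avoid zero probabilities) are all assumptions.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """Count what is needed to estimate P(y_i) and P(a_j | y_i).

    samples: list of (features, label) pairs, where features is a tuple of
    discrete attribute values (a_1, ..., a_m).
    """
    label_counts = Counter(label for _, label in samples)
    # value_counts[(label, j)][value]: samples of this label whose j-th
    # attribute equals value
    value_counts = defaultdict(Counter)
    # distinct values seen for attribute j (used by the smoothing assumption)
    values_seen = defaultdict(set)
    for features, label in samples:
        for j, value in enumerate(features):
            value_counts[(label, j)][value] += 1
            values_seen[j].add(value)
    return label_counts, value_counts, values_seen

def posterior(features, model):
    """Return the normalized P(y_i | x) for every label y_i."""
    label_counts, value_counts, values_seen = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        score = count / total  # prior P(y_i)
        for j, value in enumerate(features):
            # Laplace-smoothed estimate of P(a_j | y_i) (an assumption)
            numerator = value_counts[(label, j)][value] + 1
            denominator = count + len(values_seen[j])
            score *= numerator / denominator
        scores[label] = score  # P(x | y_i) P(y_i), the numerator
    evidence = sum(scores.values())  # plays the role of P(x)
    return {label: s / evidence for label, s in scores.items()}

def classify(features, model):
    """Return the label y_k with the largest posterior probability."""
    probabilities = posterior(features, model)
    return max(probabilities, key=probabilities.get)

# Hypothetical training set: (temperature level, fan speed level) -> label.
training = [
    (("high", "high"), "cpu_overheat"),
    (("high", "low"), "cpu_overheat"),
    (("low", "low"), "normal"),
    (("low", "high"), "normal"),
    (("low", "low"), "normal"),
]
model = train_naive_bayes(training)
print(classify(("high", "high"), model))
```

Dividing by the sum of the numerators is optional for classification, since the largest numerator already identifies the class; it is kept here only so that the returned values can be compared with a probability threshold.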
Step 103: Determine, as objects to be monitored, the target objects whose probability of failure is greater than a preset threshold.
In one embodiment, the preset threshold may be set as the case may be. When the probability that a target object fails is greater than the preset threshold, the target object has not yet failed but can be judged to carry a high risk of failure; that is, it is likely to fail within some future period and needs close attention so that the failure can be prevented in time. Such target objects can therefore be taken as objects to be monitored and subjected to close monitoring and other related processing.
Step 104: Determine the cause of failure of the object to be monitored, and monitor the object to be monitored according to that cause.
In one embodiment, in order to prevent failures and to handle or give early warning of potential failures in time, the cause of failure of the object to be monitored may further be determined. Specifically, the cause may be determined according to the status data of the object to be monitored and the preset model matching that status data. Note that the preset model matching the object to be monitored is obtained by processing a large amount of sample data and is stored in a knowledge database. The preset model contains a large amount of information related to the monitored object, and from this information the cause of failure of the monitored object can be determined.
In one embodiment, in order to prevent potential failures, the object to be monitored may be monitored according to its cause of failure. The monitoring may include performing, according to the cause of failure and the probability of failure of the object to be monitored, at least one of the following business processes: repairing, deleting, or replacing an object to be monitored that has already failed in the system; repairing, deleting, or replacing an object to be monitored that has not yet failed in the system; and sending an alarm about the object to be monitored in the system. In a specific implementation, one or several of these kinds of monitoring may be performed on the object to be monitored. Other suitable ways of monitoring the object may also be used as the case may be; the present application is not limited in this regard.
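The choice among these business processes could be expressed as a small decision function. The Python sketch below is only one conceivable policy, invented for illustration: the class and function names, the 0.9 replacement threshold, and the rule itself are assumptions, and the cause of failure is carried along for reporting rather than used in the decision.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    ALERT = "alert"
    REPAIR = "repair"
    REPLACE = "replace"

@dataclass
class MonitoredObject:
    name: str
    fault_probability: float
    cause: str
    has_failed: bool

def choose_actions(obj, replace_threshold=0.9):
    """Pick business-processing actions for one object (invented policy)."""
    actions = [Action.ALERT]  # always notify the operators
    if obj.has_failed:
        actions.append(Action.REPAIR)  # the fault has already occurred
    elif obj.fault_probability >= replace_threshold:
        actions.append(Action.REPLACE)  # act before the fault occurs
    return actions

fan = MonitoredObject("fan-07", 0.95, "bearing wear", has_failed=False)
print(fan.name, fan.cause, [a.value for a in choose_actions(fan)])
```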
In one embodiment, the monitoring of the object to be monitored may specifically follow the ITIL (Information Technology Infrastructure Library) process: the corresponding specific monitoring is performed on the object to be monitored, in the manner of an IT service, according to its cause of failure.
In one embodiment, in order to further improve the accuracy of fault monitoring, the original preset model may be corrected in a targeted manner according to the fed-back monitoring results. That is, after the business process is performed according to the cause of failure and the probability of failure of the object to be monitored, the method may further include:
S1: Store the result of the business process in the knowledge database as monitoring result data.
S2: Correct the preset model according to the monitoring result data.
The correction may modify a specific parameter value of the preset model according to the fed-back monitoring result data, or modify the weight of an existing feature parameter of the preset model. The present application is not limited in this regard.
In one embodiment, in order to obtain more comprehensive and detailed status data, the channels for collecting the status data of the target objects may be extended. Accordingly, acquiring the status data of one or more target objects in the system may specifically include:
S1: Receive a system problem uploaded by a user through a preset channel.
S2: Use the system problem as the status data.
In the embodiments of the present application, compared with existing fault monitoring methods, the method uses distributed storage technology and, under the MapReduce framework, thoroughly analyzes the collected status data of each target object by combining several algorithms to obtain the probability of failure and the cause of failure of each target object, so that preventive maintenance can be performed on target objects that have not yet failed. This solves the technical problem that existing fault monitoring methods cannot give early warning of or monitor faults that have not yet occurred and therefore monitor faults with low accuracy, and achieves the technical effect of monitoring both faults that have occurred and faults that have not yet occurred in the system.
Based on the same inventive concept, an embodiment of the present invention further provides a fault monitoring device, as described in the following embodiments. Because the principle by which the device solves the problem is similar to that of the fault monitoring method, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again. As used below, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Referring to Fig. 4, which is a structural diagram of a fault monitoring device according to an embodiment of the present invention, the device may include a status data acquisition module 401, a fault probability determination module 402, a to-be-monitored object determination module 403, and a to-be-monitored object processing module 404. The structure is described in detail below.
The status data acquisition module 401 may be used to acquire the status data of one or more target objects in the system.
The fault probability determination module 402 may be used to determine, according to the status data of the one or more target objects, the probability that each of the one or more target objects fails.
The to-be-monitored object determination module 403 may be used to determine, as objects to be monitored, the target objects whose probability of failure is greater than the preset threshold.
The to-be-monitored object processing module 404 may be used to determine the cause of failure of the object to be monitored and to monitor the object to be monitored according to that cause.
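How the four modules hand data to one another can be sketched as a simple pipeline. The Python code below is a structural sketch only: the function and parameter names are hypothetical, each module is reduced to a plain callable, and the demo stand-ins (including the temperature-based probability rule) are invented for illustration.

```python
from typing import Callable, Dict, List

Status = Dict[str, float]  # one object's status values, e.g. {"temp": 91.0}

def run_fault_monitoring(
    acquire: Callable[[], Dict[str, Status]],
    estimate_probability: Callable[[Status], float],
    diagnose: Callable[[Status], str],
    threshold: float,
) -> List[dict]:
    """Chain the four modules; every callable is a hypothetical stand-in."""
    status_by_object = acquire()  # module 401: acquire status data
    probabilities = {  # module 402: probability of failure per object
        name: estimate_probability(status)
        for name, status in status_by_object.items()
    }
    to_monitor = [  # module 403: keep objects above the preset threshold
        name for name, p in probabilities.items() if p > threshold
    ]
    return [  # module 404: attach the cause so it can drive the handling
        {
            "object": name,
            "probability": probabilities[name],
            "cause": diagnose(status_by_object[name]),
        }
        for name in to_monitor
    ]

# Toy stand-ins, for illustration only.
def demo_acquire():
    return {"cpu-01": {"temp": 92.0}, "cpu-02": {"temp": 55.0}}

def demo_probability(status):
    # Map 50..100 degrees linearly onto 0..1 (an invented rule).
    return min(max((status["temp"] - 50.0) / 50.0, 0.0), 1.0)

def demo_diagnose(status):
    return "overheating" if status["temp"] > 80.0 else "unknown"

report = run_fault_monitoring(demo_acquire, demo_probability, demo_diagnose, 0.7)
print(report)
```

Passing the modules in as callables keeps the sketch independent of how each one is realized, which matches the statement above that a module may be software, hardware, or a combination of both.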
In one embodiment, in order to improve the efficiency and accuracy of acquiring status data, the status data acquisition module 401 may include:
a cluster division unit, configured to divide the multiple target objects into multiple clusters according to interface type, where target objects in the same cluster use the same type of interface; and
a data acquisition unit, configured to acquire the status data of the target objects in the same cluster using the same data acquisition mode.
In one embodiment, in order to unify the format of status data that have different formats in different clusters, the status data acquisition module may further include a format conversion unit, configured to convert the status data of target objects located in different clusters into status data of the same format.
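The division into clusters and the format conversion can be sketched as follows. This Python example is illustrative only: the interface names follow the SSH and SMI-S examples given elsewhere in this description, but the raw record layouts, field names, and function names are assumptions.

```python
from collections import defaultdict

def group_by_interface(targets):
    """Divide target objects into clusters keyed by interface type."""
    clusters = defaultdict(list)
    for target in targets:
        clusters[target["interface"]].append(target["name"])
    return dict(clusters)

def to_common_format(interface, raw):
    """Convert one raw status record into a unified {field: float} record.

    Both raw layouts handled here are hypothetical.
    """
    if interface == "ssh":
        # assumed raw form: "cpu_free=20%;swap=10;buffer=5"
        fields = dict(item.split("=") for item in raw.split(";"))
        return {key: float(value.rstrip("%")) for key, value in fields.items()}
    if interface == "smi-s":
        # assumed raw form: a dict with vendor-specific key capitalization
        return {key.lower(): float(value) for key, value in raw.items()}
    raise ValueError(f"unsupported interface type: {interface}")

targets = [
    {"name": "cpu-01", "interface": "ssh"},
    {"name": "gpu-01", "interface": "ssh"},
    {"name": "disk-array-01", "interface": "smi-s"},
]
clusters = group_by_interface(targets)
record = to_common_format("ssh", "cpu_free=20%;swap=10;buffer=5")
print(clusters)
print(record)
```

Because every object in a cluster shares one interface type, a single conversion routine per cluster is enough to bring all records into the same format.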
In one embodiment, in order to determine the probability that each of the one or more target objects fails, the fault probability determination module 402 may include:
a first determination unit, configured to determine, according to the status data of the one or more target objects, one or more preset models matching that status data. Note that the first determination unit may use the Naive Bayes algorithm to determine the preset model that best matches the status data; and
a second determination unit, configured to determine, according to the one or more preset models, the probability that each of the one or more target objects fails.
In one embodiment, in order to obtain multiple preset models, the fault probability determination module 402 may further include a preset model establishing unit, configured to: acquire multiple sample data; obtain multiple sample types from the sample data by the K-means clustering algorithm; and train the sample types by the Naive Bayes algorithm to obtain the multiple preset models.
In one embodiment, in order to determine the cause of failure of the object to be monitored, the to-be-monitored object processing module 404 may include a fault cause determination unit, configured to determine the cause of failure of the object to be monitored according to the status data of the object to be monitored and the preset model matching that status data.
In one embodiment, in order to monitor the object to be monitored so as to prevent failures or handle failures in time, the to-be-monitored object processing module 404 may include a processing unit, configured to perform, according to the cause of failure and the probability of failure of the object to be monitored, at least one of the following business processes: repairing, deleting, or replacing an object to be monitored that has already failed in the system; repairing, deleting, or replacing an object to be monitored that has not yet failed in the system; and sending an alarm about the object to be monitored in the system.
In one embodiment, in order to improve the accuracy of the preset model and thereby the precision of fault monitoring, the fault probability determination module 402 may further include a preset model correction unit, configured to store the result of the business process in the knowledge database as monitoring result data, and to correct the preset model in a targeted manner according to the monitoring result data.
In one embodiment, in order to collect more comprehensive and accurate status data, the status data acquisition module 401 may include a feedback unit, configured to receive a system problem uploaded by a user through a preset channel and to use the system problem as the status data.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively briefly because it is substantially similar to the method embodiment; for related parts, refer to the description of the method embodiment.
It should be noted that the systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. For convenience of description, the above device is described by dividing it into units by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
In addition, in this specification, adjectives such as "first" and "second" are used only to distinguish one element or action from another, without requiring or implying any actual relationship or order between them. Where the context permits, a reference to an element, component, or step should not be interpreted as limited to only one of them, but may be one or more of the element, component, or step.
As can be seen from the above description, the fault monitoring method and device provided by the embodiments of the present application use a distributed storage platform and, under the MapReduce framework, intelligently analyze the collected status data of each target object by combining several algorithms to obtain the probability of failure and the cause of failure of each target object, so that target objects in the system that have not yet failed can be monitored and protected. This solves the technical problem that existing fault monitoring methods cannot give early warning of or monitor faults that have not yet occurred and monitor faults with low accuracy, achieves the technical effect of monitoring both faults that have occurred and faults that have not yet occurred in the system, and improves the accuracy of fault monitoring. In addition, dividing the target objects into corresponding clusters, acquiring the status data of the target objects in the same cluster, and processing those status data uniformly improves the efficiency of status data acquisition and reduces errors in the status data. Applying a distributed (Hadoop) storage platform and combining several algorithms under the MapReduce framework to mine the status data in depth yields the probability of failure and the cause of failure of each target object, further improving the accuracy of fault monitoring. Furthermore, carrying out targeted prevention or maintenance for faults that have occurred and faults that have not yet occurred, according to the probability of failure and the cause of failure, achieves the technical effect of effectively maintaining system stability. Finally, the preset model is corrected in a targeted manner according to the monitoring results, which improves the precision of the preset model and further improves the accuracy of fault monitoring.
In one specific implementation scenario, the fault monitoring method/device provided by the present application is used to monitor faults in the data system of a survey data center.
Referring to Fig. 5, which is a schematic diagram of maintaining the data system of the survey data center using the fault monitoring method/device proposed in the present application, the system may specifically include:
1) Data monitoring and acquisition module
Through integration, the discrete modules of the various data center systems (CPU cluster, GPU cluster, storage, network, infrastructure) are monitored in an integrated way.
2) ITIL process module
Faults are discovered by the monitoring system and automatically submitted to the ITIL (Information Technology Infrastructure Library) process, enabling efficient IT services.
Users submit the problems they encounter in research and production uniformly through the ITIL service desk. The handling of each problem is logged in detail, and users and administrators can track the handling process and its result.
3) Fault processing module based on the Hadoop platform
Through alarm data collection, fault filtering, and fault correlation analysis, all kinds of faults are quickly located and resolved.
Through comprehensive multi-algorithm analysis under the MapReduce framework, potential faults in the system are found and reported, achieving proactive defense in advance.
4) Knowledge base and performance analysis
A performance analysis system integrating data integration, information query, online analysis, multidimensional analysis, and dynamic reports is established, which helps decision makers analyze information from multiple angles. This includes statistics on various resources, on duty status, and on routine work; indicators are then established for the statistical items, and decisions are made according to an indicator or a combination of indicators.
The database is linked with the ITIL process and with the comprehensive multi-algorithm analysis under the MapReduce framework, so that new knowledge can be continually added to the knowledge base, strengthening its fault handling capability.
It should be noted that, in a specific implementation, the above data monitoring and acquisition module may be as shown in Fig. 6, namely the generic status data acquisition proposed in the embodiments of the present application, which completes data collection for the various devices of the survey data center. It includes:
1) The various devices provide different protocol interfaces; for example, CPU/GPU clusters provide device information over SSH, while storage devices generally provide the SMI-S protocol. The status information of each kind of device is obtained as the case may be.
2) The collected data are passed through a generic status data conversion module to achieve unified storage (HBase) and unified display of all data.
When the above fault processing module based on the Hadoop platform performs fault analysis, reference may be made to Fig. 7, namely the comprehensive multi-algorithm analysis model under the MapReduce framework proposed in the embodiments of the present application, which is also the core of the entire processing module. It includes:
1) The status acquisition module collects the running status data of each device in the survey data center and, through a unified model, acquires the status data of the CPU cluster, GPU cluster, network devices, and storage devices.
2) The status data storage module uses HBase to efficiently store the large volume of dynamic time-series status data and historical data.
3) The analysis and processing module for running status data is the core of this scheme and comprises two algorithms implemented under the MapReduce framework. The K-means clustering algorithm clusters the running status data, and the resulting cluster centers of each running state are used as samples to form a sample knowledge base. The Bayes algorithm is trained on each knowledge base and discriminates the data under test, ultimately achieving early fault warning.
When the fault processing module is used for fault analysis, reference may be made to Fig. 8, which shows the flow of the K-means clustering algorithm under the MapReduce framework proposed in the embodiments of the present application.
The K-means clustering algorithm is an iterative process. Specifically, the iteration may proceed as follows:
S1: Select k data items (k being the planned number of classes) from the survey data center device status data set as centers.
S2: Measure the distance from every data item to each center, find the minimum distance, and assign the data item to the class of that center.
S3: Recalculate the center of each class.
Steps S2 and S3 are repeated until the set threshold is met. In the main function, a suitable threshold must be designed, and an iterative program repeatedly calls the Map function and the Reduce function until the set threshold is met.
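The iteration above can be sketched in Python as follows. The sketch runs in a single process and only mimics the Map and Reduce roles with ordinary functions; it does not use Hadoop. It also assumes that "meeting the set threshold" means that no center moves farther than `threshold` between two iterations, which the description does not specify. The function names, the iteration cap, and the sample points are hypothetical.

```python
import math

def kmeans_map(point, centers):
    """Map role (S2): emit (index of the nearest center, point)."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances)), point

def kmeans_reduce(points):
    """Reduce role (S3): the new center is the mean of the assigned points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(data, centers, threshold=1e-6, max_iterations=100):
    """Main function: alternate Map and Reduce until the centers settle."""
    for _ in range(max_iterations):
        groups = {}
        for point in data:
            key, value = kmeans_map(point, centers)
            groups.setdefault(key, []).append(value)
        # A center that attracted no points keeps its previous position.
        new_centers = [
            kmeans_reduce(groups[i]) if i in groups else centers[i]
            for i in range(len(centers))
        ]
        shift = max(math.dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < threshold:  # assumed stopping rule
            break
    return centers

# Hypothetical two-attribute status records forming two obvious groups.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (9.0, 9.0), (9.2, 8.8), (8.8, 9.2)]
final_centers = kmeans(data, [(0.0, 0.0), (5.0, 5.0)])
print(final_centers)
```

In a real MapReduce job the grouping by key would be done by the framework's shuffle phase rather than by the `groups` dictionary used here.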
Fig. 2 shows the implementation of the Naive Bayes algorithm under the MapReduce framework proposed in the embodiments of the present application.
The Naive Bayesian Classifier is a statistics-based classification technique comprising two parts, training and discrimination. A specific implementation may include:
S1: Let $x = \{a_1, a_2, \ldots, a_m\}$ be an item to be classified, where each $a_j$ is a feature attribute of $x$.
S2: Let the category set be $C = \{y_1, y_2, \ldots, y_n\}$.
S3: Calculate $P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)$.
S4: If $P(y_k \mid x) = \max\{P(y_1 \mid x), P(y_2 \mid x), \ldots, P(y_n \mid x)\}$, then $x \in y_k$.
The key is therefore how to calculate each conditional probability in step S3. In a specific implementation:
S3-1: Find a set of items to be classified whose classification is already known; this set is called the training sample set.
S3-2: Obtain, by counting, the conditional probability estimate of each feature attribute under each category, namely
$$P(a_1 \mid y_1), P(a_2 \mid y_1), \ldots, P(a_m \mid y_1);\ \ldots;\ P(a_1 \mid y_n), P(a_2 \mid y_n), \ldots, P(a_m \mid y_n)$$
S3-3: If the feature attributes are conditionally independent, Bayes' theorem gives the following derivation:
$$P(y_i \mid x) = \frac{P(x \mid y_i)\,P(y_i)}{P(x)}$$
Because the denominator is constant for all categories, it suffices to maximize the numerator. And because the feature attributes are conditionally independent:
$$P(x \mid y_i)\,P(y_i) = P(a_1 \mid y_i)\,P(a_2 \mid y_i) \cdots P(a_m \mid y_i)\,P(y_i) = P(y_i) \prod_{j=1}^{m} P(a_j \mid y_i)$$
It should be noted that running the algorithm under the MapReduce framework may specifically include the following three steps:
S1: Data preparation stage: the data are split.
S2: Data classification training stage: the Map tasks calculate $P(y_i)$ for each class.
S3: Data classification stage: the Reduce tasks calculate $P(x \mid y_i)P(y_i)$ for each class and find the maximum $P(x \mid y_i)P(y_i)$; the corresponding class is the class to which the sample under test belongs.
Applying the fault monitoring method/device provided by the embodiments of the present application in a specific implementation scenario verifies that it does solve the technical problem that existing fault monitoring methods cannot discover potential faults in a system and monitor faults with low accuracy, and that it achieves the technical effect of monitoring and handling both faults that have occurred and faults that have not yet occurred.
Although different fault monitoring methods or devices are mentioned in this application, the application is not limited to the situations described by industry standards or by the embodiments. Implementations that slightly modify some industry standard, or that modify the described embodiments in a customized manner, can also achieve effects that are the same as, equivalent to, or similar to those of the above embodiments, or that are predictable after variation. Embodiments that apply such modified or varied ways of data acquisition, processing, output, and judgment may still fall within the scope of the optional embodiments of this application.
Although this application provides method operation steps as described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual device or client product executes the method, the steps may be executed sequentially or in parallel according to the embodiments or the methods shown in the drawings (for example, in a parallel-processor or multithreaded environment, or even a distributed data processing environment). The terms "include", "comprise", or any other variant are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, the presence of additional identical or equivalent elements in the process, method, product, or device that includes the element is not excluded.
The devices or modules described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. For convenience of description, the above device is described by dividing it into modules by function. Of course, when implementing the present application, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module implementing one function may be implemented by a combination of several submodules. The device embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function; other divisions are possible in an actual implementation, for example, several modules or components may be combined or integrated into another system, or some features may be omitted or not executed.
Those skilled in the art also know that, in addition to implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for realizing various functions may also be regarded as structures within the hardware component. Or, the means for realizing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The present application may be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, classes, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
From the description of the above embodiments, those skilled in the art can clearly understand that the application can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the essence of the technical solution of the application, or the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, mobile terminal, server, network device, or the like) to execute the methods described in the embodiments of the application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
Although the application has been described through embodiments, those of ordinary skill in the art will appreciate that the application has many variations and changes that do not depart from its spirit, and it is intended that the appended claims cover these variations and changes without departing from the spirit of the application.
Claims (15)
1. A fault monitoring method, characterized by comprising:
acquiring status data of one or more target objects in a system;
determining, according to the status data of the one or more target objects, the probability that each of the one or more target objects fails;
taking each target object whose probability of failure is greater than a preset threshold as an object to be monitored; and
determining the cause of failure of the object to be monitored, and monitoring the object to be monitored according to the cause of failure of the object to be monitored.
2. The method according to claim 1, characterized in that acquiring the status data of one or more target objects in the system comprises:
dividing the plurality of target objects into a plurality of clusters according to interface type, wherein the target objects in the same cluster use the same type of interface;
acquiring the status data of the target objects located in the same cluster using the same data acquisition mode.
3. The method according to claim 2, characterized in that, after the status data of the target objects located in the same cluster is acquired using the same data acquisition mode, the method further comprises:
converting the status data of the target objects located in different clusters into status data of the same format.
4. The method according to claim 1, characterized in that determining, according to the status data of the one or more target objects, the probability that each of the one or more target objects fails comprises:
determining, according to the status data of the one or more target objects, one or more preset models that match the status data of the one or more target objects;
determining, according to the one or more preset models, the probability that each of the one or more target objects fails.
5. The method according to claim 4, characterized in that determining the cause of failure of the object to be monitored comprises:
determining the cause of failure of the object to be monitored according to the status data of the object to be monitored and the preset model that matches the status data of the object to be monitored.
6. The method according to claim 4, characterized in that monitoring the object to be monitored according to the cause of its failure comprises:
performing, according to the cause of failure of the object to be monitored and the probability that the object to be monitored fails, at least one of the following business processes: repairing, deleting, or replacing an object to be monitored in the system that has already failed; repairing, deleting, or replacing an object to be monitored in the system that has not yet failed; issuing an alarm for an object to be monitored in the system.
7. The method according to claim 6, characterized in that, after the business processing is performed according to the cause of failure of the object to be monitored and the probability that the object to be monitored fails, the method further comprises:
storing the result of the business processing in a knowledge database as monitoring result data;
correcting the preset model according to the monitoring result data.
8. The method according to claim 1, characterized in that acquiring the status data of one or more target objects in the system comprises:
receiving a system problem uploaded by a user through a preset channel;
using the system problem as the status data.
9. The method according to claim 4, characterized in that the plurality of preset models are obtained with a preset algorithm under the MapReduce framework, wherein the preset algorithm comprises a clustering algorithm and/or a Bayesian algorithm.
10. The method according to claim 9, characterized in that obtaining the plurality of preset models with the preset algorithm under the MapReduce framework comprises: on a distributed storage platform, obtaining the plurality of preset models with the preset algorithm under the MapReduce framework.
11. The method according to claim 1, characterized in that, after the status data of one or more target objects in the system is acquired, the status data is stored in the knowledge database in the form of a distributed database.
12. A fault monitoring device, characterized by comprising:
a status data acquisition module, configured to acquire status data of one or more target objects in a system;
a failure probability determining module, configured to determine, according to the status data of the one or more target objects, the probability that each of the one or more target objects fails;
a to-be-monitored object determining module, configured to take each target object whose probability of failure is greater than a preset threshold as an object to be monitored;
a to-be-monitored object processing module, configured to determine the cause of failure of the object to be monitored, and to monitor the object to be monitored according to the cause of its failure.
13. The device according to claim 12, characterized in that the status data acquisition module comprises:
a cluster division unit, configured to divide the plurality of target objects into a plurality of clusters according to interface type, wherein the target objects in the same cluster use the same type of interface;
a data acquisition unit, configured to acquire the status data of the target objects located in the same cluster using the same data acquisition mode.
14. The device according to claim 12, characterized in that the failure probability determining module comprises:
a preset model determining unit, configured to determine, according to the status data of the one or more target objects, one or more preset models that match the status data of the one or more target objects;
a failure probability determining unit, configured to determine, according to the one or more preset models, the probability that each of the one or more target objects fails.
15. The device according to claim 12, characterized in that the to-be-monitored object processing module comprises:
a failure cause determining unit, configured to determine the cause of failure of the object to be monitored according to the status data of the object to be monitored and the preset model that matches the status data of the object to be monitored;
a business processing unit, configured to perform, according to the cause of failure of the object to be monitored and the probability that the object to be monitored fails, at least one of the following business processes: repairing, deleting, or replacing an object to be monitored in the system that has already failed; repairing, deleting, or replacing an object to be monitored in the system that has not yet failed; issuing an alarm for an object to be monitored in the system.
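For illustration only, the flow recited in claims 1 to 3 and 6 can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation of the application: the interface types (`snmp`, `http`), the raw field names, the weighted score standing in for a preset model, the cause rule, and the rule that maps a probability to a business process are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    interface: str   # hypothetical interface type, e.g. "snmp" or "http"
    raw_state: dict  # raw reading in the interface's own format


# Hypothetical per-interface collectors. Each converts a raw reading
# into one common format (claims 2 and 3).
def _from_snmp(raw):
    return {"cpu": raw["cpuLoad"] / 100.0, "errors": raw["ifErrors"]}


def _from_http(raw):
    return {"cpu": raw["cpu"], "errors": raw["error_count"]}


COLLECTORS = {"snmp": _from_snmp, "http": _from_http}


def collect(targets):
    """Group targets by interface type, then read each group the same way."""
    clusters = defaultdict(list)
    for target in targets:
        clusters[target.interface].append(target)
    states = {}
    for interface, members in clusters.items():
        normalise = COLLECTORS[interface]
        for target in members:
            states[target.name] = normalise(target.raw_state)
    return states


def failure_probability(state):
    """Placeholder for a preset model: a weighted score clipped to [0, 1]."""
    score = 0.6 * state["cpu"] + 0.04 * state["errors"]
    return max(0.0, min(1.0, score))


def diagnose(state):
    """Placeholder cause rule: report the term that contributes most."""
    if 0.6 * state["cpu"] >= 0.04 * state["errors"]:
        return "high CPU load"
    return "interface errors"


def monitor(targets, threshold=0.5):
    """Keep targets above the threshold, then pick a cause and an action."""
    report = {}
    for name, state in collect(targets).items():
        probability = failure_probability(state)
        if probability > threshold:
            cause = diagnose(state)
            # Assumed policy: repair when failure is very likely, else alarm.
            action = "repair" if probability > 0.9 else "alarm"
            report[name] = (probability, cause, action)
    return report


if __name__ == "__main__":
    fleet = [
        Target("a", "snmp", {"cpuLoad": 90, "ifErrors": 10}),
        Target("b", "http", {"cpu": 0.1, "error_count": 0}),
    ]
    print(monitor(fleet))
```

In this sketch only target `a` exceeds the threshold and is treated as an object to be monitored. In practice, `failure_probability` and `diagnose` would be replaced by the preset models that the claims describe.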
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611199335.6A CN106708016B (en) | 2016-12-22 | 2016-12-22 | fault monitoring method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611199335.6A CN106708016B (en) | 2016-12-22 | 2016-12-22 | fault monitoring method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106708016A (en) | 2017-05-24 |
CN106708016B (en) | 2019-12-10 |
Family
ID=58901981
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611199335.6A Active CN106708016B (en) | 2016-12-22 | 2016-12-22 | fault monitoring method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106708016B (en) |
Patent Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090094076A1 (en) * | 2007-10-05 | 2009-04-09 | Reddy Sudhakar Y | Method and system using linear programming for estimating test costs for bayesian diagnostic models |
CN102439568A (en) * | 2009-11-19 | 2012-05-02 | 索尼公司 | System health and performance care of computing devices |
CN101763589A (en) * | 2009-12-24 | 2010-06-30 | 宁波市中控信息技术有限公司 | Safety management method and system based on dynamic quantitative accident risk prediction |
CN101872165A (en) * | 2010-06-13 | 2010-10-27 | 西安交通大学 | Method for fault diagnosis of wind turbines on basis of genetic neural network |
US8230262B2 (en) * | 2010-07-02 | 2012-07-24 | Oracle International Corporation | Method and apparatus for dealing with accumulative behavior of some system observations in a time series for Bayesian inference with a static Bayesian network model |
CN101950327A (en) * | 2010-09-09 | 2011-01-19 | 西北工业大学 | Equipment state prediction method based on fault tree information |
CN103064340A (en) * | 2011-10-21 | 2013-04-24 | 沈阳高精数控技术有限公司 | Failure prediction method facing to numerically-controlled machine tool |
CN102664961A (en) * | 2012-05-04 | 2012-09-12 | 北京邮电大学 | Method for anomaly detection in MapReduce environment |
CN103199919A (en) * | 2013-04-19 | 2013-07-10 | 重庆邮电大学 | Multi-parameter-sensed high-accuracy network fault screening and positioning system and method |
CN103338261A (en) * | 2013-07-04 | 2013-10-02 | 北京泰乐德信息技术有限公司 | Storage and processing method and system of rail transit monitoring data |
CN103617110A (en) * | 2013-11-11 | 2014-03-05 | 国家电网公司 | Server device condition maintenance system |
CN104834579A (en) * | 2014-02-10 | 2015-08-12 | 富士施乐株式会社 | Failure predictive system and failure predictive apparatus |
CN104184819A (en) * | 2014-08-29 | 2014-12-03 | 城云科技(杭州)有限公司 | Multi-hierarchy load balancing cloud resource monitoring method |
CN104391211A (en) * | 2014-12-12 | 2015-03-04 | 国网山西省电力公司电力科学研究院 | On-line detection system for condition-based maintenance of series compensation device |
WO2016138067A1 (en) * | 2015-02-24 | 2016-09-01 | Cloudlock, Inc. | System and method for securing an enterprise computing environment |
CN105095963A (en) * | 2015-08-17 | 2015-11-25 | 中国空气动力研究与发展中心高速空气动力研究所 | Method for accurately diagnosing and predicting fault of wind tunnel equipment |
CN106067088A (en) * | 2016-05-30 | 2016-11-02 | 中国邮政储蓄银行股份有限公司 | E-bank accesses detection method and the device of behavior |
Non-Patent Citations (1)
Title |
---|
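薄翠梅: "Research on TE process monitoring based on kernel functions and probabilistic neural networks", Proceedings of the 26th Chinese Control Conference * |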
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491021A (en) * | 2017-08-21 | 2017-12-19 | 无锡小天鹅股份有限公司 | Household electrical appliance and its fault diagnosis system, method and server |
CN107483252A (en) * | 2017-08-22 | 2017-12-15 | 深圳企管加企业服务有限公司 | Calculator room equipment fault early warning system based on Internet of Things |
CN107608866A (en) * | 2017-08-22 | 2018-01-19 | 深圳企管加企业服务有限公司 | Calculator room equipment fault early warning method, device and storage medium based on Internet of Things |
WO2019036924A1 (en) * | 2017-08-22 | 2019-02-28 | 深圳企管加企业服务有限公司 | Machine room device fault early-warning system based on internet of things |
CN108470242A (en) * | 2018-03-08 | 2018-08-31 | 阿里巴巴集团控股有限公司 | Risk management and control method, device and server |
CN108470242B (en) * | 2018-03-08 | 2022-03-22 | 创新先进技术有限公司 | Risk management and control method, device and server |
CN110322583A (en) * | 2018-03-30 | 2019-10-11 | 欧姆龙株式会社 | Abnormality detection system supports device and method for detecting abnormal |
CN108846429A (en) * | 2018-05-31 | 2018-11-20 | 清华大学 | Cyberspace resource automatic classification method and device based on unsupervised learning |
CN109656793A (en) * | 2018-11-22 | 2019-04-19 | 安徽继远软件有限公司 | A kind of information system performance stereoscopic monitoring method based on multi-source heterogeneous data fusion |
CN109842521A (en) * | 2019-01-28 | 2019-06-04 | 西安科技大学 | A kind of mobile terminal delay machine forecasting system and method |
CN111489539A (en) * | 2019-01-29 | 2020-08-04 | 珠海格力电器股份有限公司 | Household appliance system fault early warning method, system and device |
CN110164101A (en) * | 2019-04-09 | 2019-08-23 | 烽台科技(北京)有限公司 | A kind of method and apparatus handling warning message |
CN111931323A (en) * | 2019-04-28 | 2020-11-13 | 中国石油化工股份有限公司 | Memory, hydrocracking equipment fault prediction method, device and equipment |
CN111931323B (en) * | 2019-04-28 | 2022-08-12 | 中国石油化工股份有限公司 | Memory, hydrocracking equipment fault prediction method, device and equipment |
CN110351150A (en) * | 2019-07-26 | 2019-10-18 | 中国工商银行股份有限公司 | Fault rootstock determines method and device, electronic equipment and readable storage medium storing program for executing |
CN110351150B (en) * | 2019-07-26 | 2022-08-16 | 中国工商银行股份有限公司 | Fault source determination method and device, electronic equipment and readable storage medium |
CN112350840A (en) * | 2019-08-08 | 2021-02-09 | 中移物联网有限公司 | Fault monitoring and repairing method and related equipment |
CN110968061B (en) * | 2019-12-06 | 2021-02-26 | 珠海格力电器股份有限公司 | Equipment fault early warning method and device, storage medium and computer equipment |
CN110968061A (en) * | 2019-12-06 | 2020-04-07 | 珠海格力电器股份有限公司 | Equipment fault early warning method and device, storage medium and computer equipment |
CN113127984A (en) * | 2019-12-31 | 2021-07-16 | 中移(上海)信息通信科技有限公司 | Method, device, equipment and storage medium for equipment maintenance |
CN113297045A (en) * | 2020-07-27 | 2021-08-24 | 阿里巴巴集团控股有限公司 | Monitoring method and device for distributed system |
CN113297045B (en) * | 2020-07-27 | 2024-03-08 | 阿里巴巴集团控股有限公司 | Monitoring method and device for distributed system |
CN112711912A (en) * | 2020-12-30 | 2021-04-27 | 许昌学院 | Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm |
CN112711912B (en) * | 2020-12-30 | 2024-03-19 | 许昌学院 | Air quality monitoring and alarming method, system, device and medium based on cloud computing and machine learning algorithm |
CN113689042A (en) * | 2021-08-25 | 2021-11-23 | 华自科技股份有限公司 | Fault source prediction method for monitoring node |
Also Published As
Publication number | Publication date |
---|---|
CN106708016B (en) | 2019-12-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||