CN106845526B - An associated-parameter fault classification method based on big-data fusion clustering analysis - Google Patents

Info

Publication number
CN106845526B
Authority
CN
China
Prior art keywords
data
parameter
fault
classification
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611247433.2A
Other languages
Chinese (zh)
Other versions
CN106845526A (en)
Inventor
董云帆
房红征
樊焕贞
高健
熊毅
李蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aerospace Measurement and Control Technology Co Ltd
Original Assignee
Beijing Aerospace Measurement and Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24143Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention provides an associated-parameter fault classification method based on big-data fusion clustering analysis. The method screens fault data out of the mass of equipment operation data according to diagnostic rules, then performs supervised autonomous machine clustering on that data to produce an automatic classification of associated-parameter faults. It addresses two problems: current equipment fault diagnosis relies too heavily on expert knowledge bases and ignores the associations between deeply, nonlinearly coupled parameters across subsystems; and the large volume of valid data generated while equipment models are in service is not mined or used effectively. Because the method does not depend on precise physical modeling of the target equipment, it also avoids the difficulty of modeling traditional complex systems. It achieves intelligent fault classification and associated-parameter analysis through mining of mass data, with a fault classification capability whose accuracy is controllable.

Description

An associated-parameter fault classification method based on big-data fusion clustering analysis
Technical field
The present invention relates to the field of equipment prognostics and health management (PHM), and in particular to an associated-parameter fault classification method based on big-data fusion clustering analysis.
Background art
Prognostics and health management has become an important supporting technology and foundation for system logistics support, maintenance, and autonomous health management in the aerospace field. The "National Medium- and Long-Term Program for Science and Technology Development (2006-2020)" lists "life prediction technology for major products and major facilities" as a frontier technology, and recent discipline development reports for aerospace and aeronautics classify PHM as a key supporting technology.
PHM has become a popular interdisciplinary research direction spanning basic materials, mechanical structures, energy, electronics, automatic testing, reliability, and information, and it has significant application value and practical importance. In most industrial PHM applications, building a mathematical or physical model of a complex component or system is very difficult or even impossible, or identifying the model's parameters is overly complex. Test data and sensor history data from each stage of a component's or system's life, such as design, simulation, operation, and maintenance, have therefore become the main means of understanding how system performance degrades.
As a result, PHM methods based on mining test data or sensor history data have gradually gained attention and developed rapidly, becoming an important research focus in the PHM field. For complex systems such as aerospace systems in particular, it is difficult to obtain or construct physical models that directly characterize the degradation and remaining life of components and systems, while large amounts of condition-monitoring and test data are available for those same systems and components. Data-driven PHM methods have therefore received wide attention from the U.S. military, NASA, and many research institutions and industrial enterprises.
Data-driven PHM methods use advanced sensor technology to collect characteristic parameters related to system properties, associate those parameters with useful information, and perform detection, analysis, and prediction with intelligent algorithms and models. They output the remaining-life distribution, degree of performance degradation, or probability of mission failure of the target system, thereby providing decision information for system maintenance and support.
Within the data-driven PHM methodology, issues such as method workflow, fusion of different methods, model selection, and model suitability have become the current research focus of the field, and data-driven PHM methods have been widely applied and promoted thanks to their flexibility, adaptability, and ease of use.
Summary of the invention
The object of the present invention is to solve the technical problem that fault data is difficult to obtain in existing data-driven PHM methods. To that end, the invention provides an associated-parameter fault classification method based on big-data fusion clustering analysis, which improves on the current situation in which the information-rich operation data of complex equipment is neither mined nor used effectively.
To achieve this object, the invention provides a complete algorithm flow whose computation and analysis yield the final fault classification and a parameter association probability model. The associated-parameter fault classification method comprises:
Step 1) Obtain the various operation data of the target equipment.
Step 2) Based on the design data of the target equipment, build a parameter diagnostic rule base covering all parameters of the target equipment. The rule base contains not only threshold rules for the parameters but also trend rules and jump rules.
Step 3) Using the rules in the parameter diagnostic rule base as the criterion, screen all operation data from step 1) to obtain the fault data; all of the fault data together form the unclassified fault data set.
Step 4) Perform supervised autonomous data clustering on the unclassified fault data set with a clustering algorithm to obtain a suitable number of clusters and each cluster center. The number of clusters is increased step by step from 2, and the smallest value at which the weighted average distance to the cluster centers no longer decreases is chosen as the total number of clusters. The unclassified fault data set is then classified using the cluster centers thus determined, giving the classified fault data set.
Step 5) Apply a map-reduce algorithm to the unclassified fault data set from step 3) to generate a parameter association probability model. The model contains, for each parameter of the target equipment, the probability distribution of the other parameters also failing when that parameter fails, arranged in a probability table from highest to lowest.
Step 6) Using the classified fault data set from step 4) as the fault discrimination criterion, apply a nearest-neighbor algorithm to the operation data obtained in step 1) to identify fault categories and obtain the fault classification result.
Step 7) Combine the fault classification result with the parameter association probability model from step 5) to obtain the comprehensive fault diagnosis and classification result, which comprises the fault classification result and the probability distribution data of all parameters for that result.
As a further improvement of the above technical solution, the operation data obtained in step 1) has the following format: each complete data entry contains the time at which the entry occurred and the values of all parameters of the target equipment at that time; each single value in a data entry represents the measured value of one parameter of the target equipment at a given time; and the data entries are arranged one after another in chronological order.
As a further improvement of the above technical solution, the fault data screened in step 3) has the following format: each data entry contains the time at which the entry occurred and all parameters that failed at that time; each failed parameter in a data entry is labeled with the rule in the parameter diagnostic rule base that the fault triggered.
As a further improvement of the above technical solution, the parameter diagnostic rule base includes upper and lower bounds for each parameter, rules for detecting abnormal parameter jumps, and rules for detecting abnormal gradual parameter trends.
As a further improvement of the above technical solution, step 4) specifically comprises:
Step 101) Set the initial number of clusters K to 2, and perform the clustering operation on the unclassified fault data set with the current K to obtain K cluster centers and their K corresponding clusters;
Step 102) Compute the average silhouette coefficient of the K clusters and compare it with the average silhouette coefficient of the K-1 clusters. If the two are unchanged, choose the current K as the total number of clusters; otherwise set K=K+1 and repeat step 101). Here the silhouette coefficient of a cluster is the average geometric distance from the vector points of all data entries in the cluster to the cluster center;
Step 103) Perform the clustering operation on the unclassified fault data set with the total number of clusters determined in step 102), and classify all fault data in the unclassified fault data set by the resulting cluster centers to obtain the classified fault data set.
As a further improvement of the above technical solution, the cluster centers in step 101) are obtained as follows:
Step 101-1) Randomly select the vector point of one data entry from all operation data of the target equipment as the first cluster center, and find the vector point with the smallest geometric distance to the first cluster center as the second cluster center;
Step 101-2) Compute the geometric distance Distance(x) from each vector point to its nearest cluster center, and add all the distances Distance(x) to obtain the total distance Sum(Distance(x));
Step 101-3) Randomly select the vector point Random of a data entry that falls within the total distance Sum(Distance(x)) as the newly added cluster center, and repeat step 101-2) until K cluster centers have been selected.
As a further improvement of the above technical solution, step 5) specifically comprises:
Step 201) For each parameter in turn, map all fault data entries that contain that parameter together, forming the mapping class for that parameter; each mapping class contains all fault data entries for one parameter and the frequency with which they occur;
Step 202) Compute the total number of fault data entries in each mapping class, to be used as the denominator of the probability calculation;
Step 203) In each mapping class, count the number of occurrences of each parameter other than the class's own parameter, to be used as the numerator of the probability calculation;
Step 204) Divide the numerator from step 203) by the denominator from step 202) to obtain, for each parameter, the probability distribution of the other parameters also failing when that parameter fails.
As a further improvement of the above technical solution, step 6) specifically comprises: compute the geometric distance between every operation data entry from step 1) and each of the determined cluster centers, and compare the smallest distance with the average silhouette coefficient of the corresponding cluster; if the distance is less than that coefficient, the operation data is judged to belong to the fault type corresponding to that cluster.
The advantages of the associated-parameter fault classification method based on big-data fusion clustering analysis are as follows:
The invention provides a clearly defined, practical, and effective associated-parameter fault classification method based on fusion clustering analysis of mass data, which mitigates the following technical problems of existing fault diagnosis methods:
1. Current equipment fault diagnosis relies too heavily on expert knowledge bases. When applied to complex systems, expert knowledge bases run into combinatorial explosion, struggle to cover all fault conditions and their associated parameters, and ignore the nonlinear correlations between deeply coupled parameters across subsystems. The fault classification method of the invention uses data mining to uncover the parameter associations between different subsystems and their fault modes, and so can effectively mitigate this problem.
2. Existing data-driven PHM methods are limited to component-level fault diagnosis. In system-level fault diagnosis of complex systems, where accurate modeling of the whole system is difficult, the different kinds of fault data mixed into normal data are handled mainly with unsupervised machine-learning clustering; the resulting clusters contain both normal and fault data, and the fault data is classified poorly. Current data-driven fault diagnosis methods therefore perform well at the component level but rarely outperform model-driven methods at the complex-system level. The fault classification method of the invention combines the advantages of data-driven and model-driven methods: it uses the existing model-based expert knowledge base to perform supervised classification of the equipment operation data (supervised by the rule-interpretation results), which greatly improves the classification and convergence of the data and can remedy the poor classification performance of current data-driven PHM methods.
Brief description of the drawings
Fig. 1 is the overall flowchart of the associated-parameter fault classification method based on big-data fusion clustering analysis in an embodiment of the invention.
Figs. 2a-2d show four repeated tests performed to choose the total number of clusters in the embodiment of the invention.
Fig. 3 is the operational flowchart of the clustering algorithm in the embodiment of the invention.
Fig. 4 is a diagram of the parameter association probability algorithm based on the map-reduce algorithm in the embodiment of the invention.
Detailed description of the embodiments
The associated-parameter fault classification method based on big-data fusion clustering analysis of the invention is described in detail below with reference to the drawings and embodiments.
Current equipment fault diagnosis relies too heavily on expert knowledge bases, which struggle to cover the nonlinear correlations between deeply coupled parameters across subsystems; existing data-driven methods perform poorly in fault diagnosis of complex systems; and mass data is not mined effectively. To address these problems, the invention provides a clearly defined, practical, and effective associated-parameter fault classification method based on fusion clustering analysis of mass data.
In this embodiment, the method is verified using the power supply system of a certain piece of equipment as an example. A comprehensive fault classification result is formed through data preprocessing, rule establishment, fault data screening, clustering, mapping, and reduction.
First, an equipment operation data set is built from data sources such as the equipment's real-time operation data and fault injection data; it is used to train and validate the data-driven model. Second, a parameter diagnostic rule base is built for the target equipment and used to interpret and detect parameter faults in real time while the equipment is running. Third, the mass data from the equipment's operation is interpreted against the diagnostic rule base, and the data entries containing faulty parameters are separated out. Once the fault data has been separated, fault modes are clustered with a supervised autonomous machine-learning clustering method. The resulting clusters are used for fault determination; at the same time an error parameter matrix is generated, and associated-parameter analysis is performed with the map-reduce (Map-Reduce) method to produce the analysis result. It follows that the fault classification method of the invention screens fault data out of the mass of equipment operation data according to diagnostic rules and performs supervised autonomous machine clustering to produce an automatic classification of associated-parameter faults. It thereby addresses the over-reliance of current equipment fault diagnosis on expert knowledge bases, the neglect of associations between deeply, nonlinearly coupled parameters across subsystems, and the poor mining and use of the mass of valid data generated while equipment models are in service. In addition, because the method does not depend on precise physical modeling of the target equipment, it avoids the difficulty of modeling traditional complex systems.
Refering to what is shown in Fig. 1, the relevant parameter Fault Classification specifically includes:
Step 1) Obtain the various operation data of the target equipment. The operation data includes fault injection simulation data, analog simulation data, bus monitoring data, BIT and IETM data, maintenance and inspection records, existing sensor data, and so on.
Step 2) Analyze the target equipment on the basis of its related data and build the parameter diagnostic rule base for the target equipment. The rule base should contain the diagnostic rules for all parameters of the target equipment, including but not limited to: upper and lower bounds (the limit values of a parameter, beyond which a fault is declared); jump anomaly rules (which specify what counts as a large jump in a parameter's value within a short time, and fix the jump magnitude and fault criterion); and gradual-trend anomaly rules (fault criteria for abnormal trends, such as a gradual rise abruptly changing to a gradual fall).
Note that, to ensure the completeness of the final parameter association probability model, the minimum requirement for the parameter diagnostic rule base is that it contain an individual decision rule for each parameter. It is therefore not necessary to build an accurate physical model of the target equipment to obtain the expressions relating the parameters.
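The three kinds of rule named in step 2) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `ParamRules` fields, the function name, and the numeric limits do not come from the patent, and the trend check is simplified to a monotonic drift over a fixed window rather than the rise-then-fall pattern mentioned above.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ParamRules:
    """Hypothetical rule set for one parameter (all fields are illustrative)."""
    lower: float        # lower bound of the normal range
    upper: float        # upper bound of the normal range
    max_jump: float     # largest allowed change between consecutive samples
    trend_window: int   # number of most recent samples inspected for a trend
    max_drift: float    # largest allowed monotonic drift across the window


def triggered_rules(values: Sequence[float], rules: ParamRules) -> List[str]:
    """Return the names of the rules violated by the latest sample in `values`."""
    hits = []
    latest = values[-1]
    # Threshold rule: the latest value lies outside [lower, upper].
    if not (rules.lower <= latest <= rules.upper):
        hits.append("threshold")
    # Jump rule: the change since the previous sample is too large.
    if len(values) >= 2 and abs(latest - values[-2]) > rules.max_jump:
        hits.append("jump")
    # Trend rule (simplified): strictly monotonic drift larger than max_drift.
    window = values[-rules.trend_window:]
    if len(window) == rules.trend_window:
        diffs = [b - a for a, b in zip(window, window[1:])]
        monotonic = all(d > 0 for d in diffs) or all(d < 0 for d in diffs)
        if monotonic and abs(window[-1] - window[0]) > rules.max_drift:
            hits.append("trend")
    return hits
```

A rule base in this sketch would simply be a dictionary from parameter name to `ParamRules`, which meets the minimum requirement stated above of one decision rule per parameter.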
Step 3) Provided the parameter diagnostic rule base is complete, use it as the criterion to screen out the abnormal data entries from the mass operation data obtained in step 1). The diagnostic rules in the rule base can be entered into a computer at this point so that the screening is carried out automatically. The operation data should satisfy the following format:
1. Each complete data entry should contain the exact time at which the entry occurred and the values of all parameters of the target equipment at that time;
2. Each single value in a data entry should represent the measured value of one parameter of the target equipment at a given time;
3. The data entries are arranged one after another in chronological order.
The screened fault data should have the following format:
1. Each entry contains the exact time at which the data entry occurred;
2. Each entry contains all parameters that failed at that time, to facilitate the subsequent mapping and reduction;
3. Each failed parameter in a data entry is labeled with the rule in the parameter diagnostic rule base that the fault triggered (threshold rule, jump rule, etc.).
The data obtained at this point is the complete set of fault data, not yet classified. Once the fault data has been obtained, the clustering operation is performed on it.
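The screening in step 3) and the two data formats can be sketched as follows. This is an illustrative sketch only: the layout of `entries` and `rule_base`, the rule signature (current value, previous value), and the parameter names in the usage note are assumptions, not details given in the patent.

```python
from typing import Callable, Dict, List, Tuple

# A rule receives the current and the previous value of one parameter and
# returns True when it is violated (hypothetical signature).
Rule = Callable[[float, float], bool]
Entry = Tuple[float, Dict[str, float]]        # (timestamp, {parameter: value})
FaultEntry = Tuple[float, Dict[str, List[str]]]  # (timestamp, {parameter: rule names})


def screen_faults(entries: List[Entry],
                  rule_base: Dict[str, Dict[str, Rule]]) -> List[FaultEntry]:
    """Keep only the time-ordered entries with at least one faulty parameter."""
    faults: List[FaultEntry] = []
    previous = None
    for timestamp, values in entries:
        triggered: Dict[str, List[str]] = {}
        for param, value in values.items():
            # The first entry has no predecessor, so compare it with itself.
            prev = previous[param] if previous is not None else value
            names = [name for name, rule in rule_base.get(param, {}).items()
                     if rule(value, prev)]
            if names:
                triggered[param] = names  # label the fault with its rules
        if triggered:
            faults.append((timestamp, triggered))
        previous = values
    return faults
```

With a rule base such as `{"bus_voltage": {"threshold": lambda v, p: not 27.0 <= v <= 29.0}}`, each returned entry carries the timestamp, the failed parameters, and the triggered rule names, which is the fault data format listed above.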
Step 4) Perform supervised autonomous data clustering on the unclassified fault data set with a clustering algorithm. After a suitable number of clusters and each cluster center have been obtained, classify all fault data in the unclassified fault data set by the determined cluster centers to obtain the classified fault data set.
The clustering operation uses the K-Means method: the fault data separated in the previous step is clustered autonomously by machine. The first and most important step is determining K (the number of cluster centers). The K cluster centers in effect represent K kinds of fault condition.
The invention uses a silhouette coefficient optimization method to choose K. The silhouette coefficient of a cluster is the average geometric distance from the vector points of all data entries in the cluster to the cluster center. After clustering is complete, the lower the silhouette coefficient, the better the classification quality of the cluster.
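The coefficient as defined in this text can be computed as in the sketch below. Note that this definition (mean distance to the cluster center) differs from the silhouette coefficient commonly used in the clustering literature; the sketch follows the definition given here. The function names, the use of Euclidean distance, and the unweighted mean over clusters are assumptions.

```python
import math


def cluster_coefficient(points, center):
    """Average Euclidean distance from a cluster's points to its center."""
    return sum(math.dist(p, center) for p in points) / len(points)


def mean_coefficient(clusters):
    """Unweighted mean of the per-cluster coefficients.

    `clusters` is a list of (center, points) pairs.
    """
    return sum(cluster_coefficient(pts, c) for c, pts in clusters) / len(clusters)
```

For example, a cluster with points `(0, 0)` and `(2, 0)` around the center `(1, 0)` has a coefficient of 1.0.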
Referring to Fig. 3, step 4) specifically comprises:
Step 101) Set the initial number of clusters K to 2. Starting from K=2, perform the clustering operation on the unclassified fault data set with the current K to obtain K cluster centers and their K corresponding clusters.
Step 102) After the clustering operation for the current K is complete, compute the average silhouette coefficient of the K clusters and compare it with that of the K-1 clusters. When the silhouette coefficient gradually converges and no longer decreases as K increases, choose the current K as the total number of clusters; otherwise set K=K+1 and repeat step 101). As shown in Figs. 2a, 2b, 2c, and 2d, four tests were run for the choice of K. In all four, the change in the silhouette coefficient gradually diminishes as K increases, and the coefficient converges when K reaches 11.
Step 103) Perform the clustering operation on the unclassified fault data set with the total number of clusters determined in step 102), and classify all fault data in the unclassified fault data set by the resulting cluster centers to obtain the classified fault data set.
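The stopping rule of step 102) can be sketched as follows, assuming the average coefficient has already been computed for consecutive values of K. The tolerance `tol` is an assumption: the text only says the coefficient "no longer decreases", so some numeric threshold has to be chosen in practice.

```python
def choose_k(mean_coeff_by_k, tol=0.01):
    """Pick K from {K: average coefficient} for consecutive K starting at 2.

    Returns the first K whose coefficient improves on that of K-1 by no
    more than `tol`; falls back to the largest K tried if none qualifies.
    """
    ks = sorted(mean_coeff_by_k)
    for prev_k, k in zip(ks, ks[1:]):
        if mean_coeff_by_k[prev_k] - mean_coeff_by_k[k] <= tol:
            return k
    return ks[-1]
```

In a full pipeline the dictionary would be filled incrementally, clustering once per K and stopping as soon as the rule fires, rather than precomputing all values.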
In step 101) above, while K is being determined, the cluster centers must be chosen for each current value of K. The first task is choosing the initial cluster centers (seed points): for the current K, K seed points are needed. The cluster centers are chosen as follows:
Step 101-1) First, randomly select the vector point of one data entry from the full operation database of the target equipment as the first cluster center, and find the vector point with the smallest geometric distance to the first cluster center as the second cluster center.
Step 101-2) For each vector point, compute the geometric distance Distance(x) to its nearest cluster center and store it in an array; then add these distances to obtain the total distance Sum(Distance(x)).
Step 101-3) Draw another random value and use it, in a weighted manner, to obtain the next cluster center. Concretely, randomly choose a value Random that falls within the total distance Sum(Distance(x)), then go through the vector points computing Random=Random-Distance(x); the point at which Random≤0 is the next cluster center. Repeat steps 101-2) and 101-3) until K cluster centers have been selected.
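Steps 101-2) and 101-3) amount to a distance-weighted random draw, which can be sketched as below. The sketch covers only the selection of one additional center given the centers already chosen; it does not reproduce the special rule for the second center in step 101-1). The function name, the Euclidean distance, and the guard that skips points coinciding with an existing center are assumptions.

```python
import math
import random


def pick_next_center(points, centers, rng: random.Random):
    """Draw the next seed point with probability proportional to Distance(x)."""
    # Step 101-2: distance from every point to its nearest existing center.
    distances = [min(math.dist(p, c) for c in centers) for p in points]
    total = sum(distances)
    # Step 101-3: draw a value inside the total distance, then subtract the
    # distances one by one until the value is used up.
    remaining = rng.uniform(0.0, total)
    for point, d in zip(points, distances):
        remaining -= d
        if remaining <= 0 and d > 0:  # never re-pick an existing center
            return point
    return points[-1]  # guard against floating-point round-off
```

Calling the function repeatedly and appending each result to `centers` until K centers exist gives the seed points; points far from every existing center are more likely to be drawn.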
Once the cluster centers have been chosen, the next step is training the clusters. For each fault sample, compute the geometric distance from its vector point to each cluster center and assign it to the nearest cluster center; then compute the geometric center of each updated cluster and replace the cluster's former center with the new geometric center. Check whether the cluster centers have changed; if they have (not converged), repeat the process. When the cluster centers converge (no longer change), the clustering operation is complete.
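The training loop described above is the standard K-Means iteration and can be sketched in plain Python. The iteration cap `max_iter` and the handling of empty clusters (the old center is kept) are assumptions that the text does not address.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Iterate assignment and center update from given initial centers.

    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Recompute each center as the geometric mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members)
                                         for axis in zip(*members)))
            else:
                new_centers.append(old)  # empty cluster: keep its old center
        if new_centers == centers:  # converged: centers no longer change
            break
        centers = new_centers
    return centers, labels
```

For the four points `(0, 0)`, `(0, 2)`, `(10, 0)`, `(10, 2)` with initial centers `(0, 0)` and `(10, 0)`, the loop converges after two passes to the centers `(0, 1)` and `(10, 1)`.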
After the above operations, with an optimized K chosen and the clustering operation done, the valid data in hand comprises: the unclassified fault data, the number of clusters K, the vector parameters of each cluster center, and the detailed fault data entries belonging to each cluster.
Next comes the map-reduce operation, whose purpose is to discover the nonlinearly coupled fault associations between parameters in the mass of fault data.
Step 5) will use mapping-reduction algorithm without classification fault data collection in step 3), generate parameter association probability Model, while the parameter association probabilistic model includes that each parameter breaks down in object equipment, other parameters are also sent out The probability distribution data of raw failure.
Refering to what is shown in Fig. 4, the step 5) specifically includes:
Step 201) carries out mapping operations first, i.e., based on without classification fault data collection, carries out from discrete failure Mapping of the data to each parameter.According to the order of parameter, will successively distinguish comprising whole fault data entries of each parameter It is mapped to together, forms the corresponding mapping class of each parameter.Mapping operations the result is that whole number of faults comprising each parameter According to entry and its frequency of appearance.
By mapping operations, we have grasped the fault entries and its frequency for separately including each parameter.For example, All fault entries to break down comprising parameter 1, we have been mapped in first mapping ensemblen (in Fig. 4 on the left of the second layer First mapping ensemblen).All fault entries to break down comprising parameter 2, we have been mapped in second mapping ensemblen (figure Second mapping ensemblen on the left of the second layer in 4), and so on, obtain the mapping ensemblen of all parameters.
Based on the mapping class that above-mentioned steps obtain, specification operation is carried out.The purpose of specification operation is calculated when certain The synchronization that one parameter breaks down, the probability that in addition some parameter also breaks down simultaneously.Come between characterization parameter with this Fault correlation relationship.
Step 202) For each mapping class formed above, calculate the total number of fault data entries in the class (the sum of their frequencies) as the denominator of the probability calculation.
Step 203) In each mapping class, for each parameter other than the one the class corresponds to, count the entries that contain that other parameter and add up their frequencies as the numerator of the probability calculation.
Step 204) Divide the numerator from step 203) by the denominator from step 202) to obtain, for each parameter, the probability distribution data describing how likely the other parameters are to fail at the same time. Take the first mapping class (all data combinations in which parameter 1 fails) as an example: retrieve the combinations in the class that contain parameter 2, add up their frequencies as the numerator, and divide by the total number of fault entries in the class. This gives the probability that parameter 2 also fails when parameter 1 fails. After parameter 2, the same calculation is made for parameter 3 through parameter s, so that all parameters are traversed. This forms the fault association parameter table of parameter 1.
In the same way, the identical reduce operation is carried out on the 2nd through s-th mapping classes, forming the fault association parameter tables of all s parameters.
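The reduce operation of steps 202) to 204) for a single mapping class can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `association_table`, the dictionary layout of a mapping class (combination of failed parameters mapped to its frequency), and the sample numbers are hypothetical and not taken from the invention.

```python
def association_table(mapping_class, owner, parameters):
    """Reduce step for the mapping class of parameter `owner`.

    mapping_class: {frozenset of failed parameters: frequency}; every
        combination is assumed to contain `owner`.
    Returns {other parameter: P(other fails | owner fails)}.
    """
    # Step 202: the denominator is the total frequency of the class.
    total = sum(mapping_class.values())
    table = {}
    for other in parameters:
        if other == owner:
            continue
        # Step 203: the numerator sums the frequencies of the
        # combinations that also contain the other parameter.
        hits = sum(freq for combo, freq in mapping_class.items()
                   if other in combo)
        # Step 204: ratio of numerator to denominator.
        table[other] = hits / total if total else 0.0
    return table

# Hypothetical first mapping class: parameter p1 failed three times,
# twice together with p2 and once alone.
class_p1 = {frozenset({"p1", "p2"}): 2, frozenset({"p1"}): 1}
print(association_table(class_p1, "p1", ["p1", "p2", "p3"]))
```

Running the same function over the mapping class of every parameter yields the s fault association parameter tables described above.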
This completes the training part of the data processing: the K fault clusters generated by K-Means and the parameter association probability model generated by map-reduce are now available. The equipment operation data set can then be used for actual fault diagnosis and verification.
Step 6) Using the classified fault data set from step 4) as the fault discrimination standard, perform fault category identification on all operation data from step 1) with the nearest neighbor algorithm to obtain the fault classification result. During actual operation, for a new operation data entry, the nearest neighbor algorithm computes its geometric distance to the cluster centre of each of the K fault clusters and takes the smallest distance value (the nearest neighbor). If this minimum is less than the silhouette coefficient of that cluster, the operation data can be judged to belong to the fault type corresponding to the cluster, and fault diagnosis is carried out on this basis.
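The nearest-centre decision of step 6) can be sketched as follows. This is a minimal sketch, assuming that data entries are numeric vectors, that the geometric distance is Euclidean, and that each cluster's threshold is the quantity this document calls the silhouette coefficient (defined in claim 5 as the average distance from a cluster's members to its centre, which differs from the silhouette score commonly used in clustering literature). The function name `classify_entry` and the sample values are hypothetical.

```python
import math

def classify_entry(entry, centers, thresholds):
    """Return the index of the matching fault cluster, or None.

    entry: numeric vector of one operation data entry.
    centers: list of K cluster-centre vectors.
    thresholds: list of K per-cluster distance thresholds (assumed to be
        the average member-to-centre distance of each cluster).
    """
    distances = [math.dist(entry, c) for c in centers]
    # Index of the nearest cluster centre.
    nearest = min(range(len(centers)), key=distances.__getitem__)
    # Accept the fault type only if the entry lies within the threshold.
    if distances[nearest] < thresholds[nearest]:
        return nearest
    return None

centers = [(0.0, 0.0), (10.0, 10.0)]
thresholds = [2.0, 3.0]
print(classify_entry((1.0, 1.0), centers, thresholds))
print(classify_entry((5.0, 5.0), centers, thresholds))
```

Returning `None` for entries outside every threshold is a design choice of this sketch; it keeps entries that match no known fault cluster from being forced into one.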
Step 7) Combine the fault classification result with the parameter association probability model from step 5) to obtain the comprehensive diagnosis result. The comprehensive diagnosis result includes: the fault classification result, the main fault parameter, and the parameters whose association probability with the main fault parameter is relatively large (the probability threshold can be adjusted according to the actual situation).
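One way to assemble the comprehensive diagnosis result of step 7) is sketched below. The text does not specify how the main fault parameter is determined, so this sketch takes it as an input. The function name `comprehensive_diagnosis`, the result layout, and the default threshold of 0.5 are assumptions for illustration only.

```python
def comprehensive_diagnosis(fault_class, main_parameter,
                            association_model, threshold=0.5):
    """Combine a fault classification result with the association model.

    association_model: {parameter: {other parameter: probability}},
        i.e. the fault association parameter tables.
    threshold: adjustable probability threshold (hypothetical default).
    """
    table = association_model.get(main_parameter, {})
    # Keep only parameters strongly associated with the main fault parameter.
    related = {p: prob for p, prob in table.items() if prob >= threshold}
    ranked = dict(sorted(related.items(), key=lambda kv: kv[1], reverse=True))
    return {
        "fault_class": fault_class,
        "main_parameter": main_parameter,
        "associated_parameters": ranked,
    }

# Hypothetical model: when p1 fails, p2 fails with probability 0.67.
model = {"p1": {"p2": 0.67, "p3": 0.10}}
print(comprehensive_diagnosis(0, "p1", model, threshold=0.5))
```

Lowering `threshold` widens the list of associated parameters reported to maintenance staff; raising it narrows the list to the strongest associations.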
In conclusion according to the relevant parameter failure modes side provided by the invention based on the analysis of big data Fusion of Clustering Method realizes the intelligent fault classification and relevant parameter analysis excavated based on mass data.With the controllable failure of accuracy rate point Class ability.And for the failure sorted out, according to parameter association probabilistic model, the association that can provide dependent failure parameter is general Rate, to improve the formulation of the intelligent diagnostics and maintenance decision of failure.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements of the technical solution of the invention that do not depart from its spirit and scope shall all fall within the scope of the claims of the invention.

Claims (7)

1. An associated-parameter fault classification method based on big data fusion clustering analysis, characterized by comprising:
Step 1) obtaining the various operation data of the target equipment;
Step 2) establishing, according to the relevant data of the target equipment, a parameter diagnostic rule library covering all parameters of the target equipment;
Step 3) screening all operation data from step 1) against the rules in the parameter diagnostic rule library to obtain fault data, and gathering all fault data into an unclassified fault data set;
Step 4) performing unsupervised autonomous data clustering on the unclassified fault data set by means of a clustering algorithm and, after a satisfactory number of clusters and each cluster centre have been obtained, classifying all fault data in the unclassified fault data set by the determined cluster centres to obtain a classified fault data set;
Step 5) applying a map-reduce algorithm to the unclassified fault data set from step 3) to generate a parameter association probability model, the parameter association probability model containing, for each parameter of the target equipment, the probability distribution data of the other parameters also failing at the same time as that parameter; step 5) specifically including:
Step 201) mapping all fault data entries that contain each parameter together in turn to form the mapping class corresponding to each parameter, each mapping class containing all fault data entries of one parameter and their frequencies of occurrence;
Step 202) calculating the total number of fault data entries in each mapping class as the denominator of the probability calculation;
Step 203) counting, in each mapping class, the occurrences of entries that contain another parameter besides the parameter the mapping class corresponds to, as the numerator of the probability calculation;
Step 204) dividing the numerator from step 203) by the denominator from step 202) to obtain, for each parameter, the probability distribution data of the other parameters also failing at the same time as that parameter;
Step 6) using the classified fault data set from step 4) as the fault discrimination standard, performing fault category identification on all operation data from step 1) with the nearest neighbor algorithm to obtain a fault classification result;
Step 7) combining the fault classification result with the parameter association probability model from step 5) to obtain the probability distribution data of all parameters associated with the occurrence of that fault classification result.
2. The associated-parameter fault classification method based on big data fusion clustering analysis according to claim 1, characterized in that the format of the operation data obtained in step 1) satisfies the following: each complete data entry contains the moment at which the data entry occurred and the values of all parameters of the target equipment at that moment; a single data value in each data entry represents the measured value of one parameter of the target equipment at a given moment; and the data entries are arranged one by one in the order of their moments of occurrence.
3. The associated-parameter fault classification method based on big data fusion clustering analysis according to claim 1, characterized in that the format of the fault data screened in step 3) satisfies the following: each data entry contains the moment at which the data entry occurred and all fault parameters that failed at that moment; and for each failed parameter in a data entry, the rule triggered when the fault occurred is marked according to the parameter diagnostic rule library.
4. The associated-parameter fault classification method based on big data fusion clustering analysis according to claim 1, characterized in that the parameter diagnostic rule library includes the upper and lower limits of the parameters, decision rules for abnormal parameter jumps, and decision rules for abnormal gradual parameter trends.
5. The associated-parameter fault classification method based on big data fusion clustering analysis according to claim 1, characterized in that step 4) specifically includes:
Step 101) setting the initial value of the number of clusters K to 2, and performing the clustering operation on the unclassified fault data set according to the current value of K to obtain K cluster centres and their corresponding K clusters;
Step 102) calculating the average silhouette coefficient of the K clusters and comparing it with the average silhouette coefficient of K-1 clusters; if the two average silhouette coefficients are unchanged, choosing the current value of K as the total number of clusters, otherwise setting K = K+1 and re-executing step 101); the silhouette coefficient denoting the average geometric distance from the vector points of all data entries contained in a cluster to the cluster centre;
Step 103) performing the clustering operation on the unclassified fault data set with the total number of clusters determined in step 102), and classifying all fault data in the unclassified fault data set by the cluster centres obtained, to obtain the classified fault data set.
6. The associated-parameter fault classification method based on big data fusion clustering analysis according to claim 5, characterized in that the operation of obtaining the cluster centres in step 101) includes:
Step 101-1) randomly selecting, from all operation data of the target equipment, the vector point corresponding to one data entry as the first cluster centre, and finding the vector point with the nearest geometric distance to the first cluster centre as the second cluster centre;
Step 101-2) calculating the geometric distance Distance(x) between each cluster centre and the cluster centre nearest to it, and adding all geometric distances Distance(x) to obtain the total distance Sum(Distance(x));
Step 101-3) randomly selecting a vector point Random, corresponding to a data entry, that falls within the total distance Sum(Distance(x)) as the newly added cluster centre, and re-executing step 101-2) until K cluster centres have been selected.
7. The associated-parameter fault classification method based on big data fusion clustering analysis according to claim 1, characterized in that step 6) specifically includes: calculating the geometric distance between each item of operation data from step 1) and each determined cluster centre, and comparing the smallest distance value with the average silhouette coefficient of the corresponding cluster; if the distance value is less than the average silhouette coefficient of the corresponding cluster, determining that the operation data belongs to the fault type corresponding to that cluster.
CN201611247433.2A 2016-12-29 2016-12-29 A kind of relevant parameter Fault Classification based on the analysis of big data Fusion of Clustering Active CN106845526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611247433.2A CN106845526B (en) 2016-12-29 2016-12-29 A kind of relevant parameter Fault Classification based on the analysis of big data Fusion of Clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611247433.2A CN106845526B (en) 2016-12-29 2016-12-29 A kind of relevant parameter Fault Classification based on the analysis of big data Fusion of Clustering

Publications (2)

Publication Number Publication Date
CN106845526A CN106845526A (en) 2017-06-13
CN106845526B true CN106845526B (en) 2019-12-03

Family

ID=59114134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611247433.2A Active CN106845526B (en) 2016-12-29 2016-12-29 A kind of relevant parameter Fault Classification based on the analysis of big data Fusion of Clustering

Country Status (1)

Country Link
CN (1) CN106845526B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110018980B (en) * 2017-12-25 2021-07-27 北京金风科创风电设备有限公司 Method and device for searching fault data from simulation data of fan controller
WO2019167180A1 (en) * 2018-02-28 2019-09-06 日産自動車株式会社 Abnormality type determining device and abnormality type determining method
CN108763289B (en) * 2018-04-13 2021-11-23 西安电子科技大学 Massive heterogeneous sensor format data analysis method
CN109445306B (en) * 2018-10-26 2022-01-25 湖南磁浮技术研究中心有限公司 Automatic associated parameter interpretation method and system based on rule configuration analysis
CN109991951B (en) * 2019-04-28 2020-10-02 齐鲁工业大学 Multi-source fault detection and diagnosis method and device
CN110263944A (en) * 2019-05-21 2019-09-20 中国石油大学(华东) A kind of multivariable failure prediction method and device
CN113392208A (en) * 2020-03-12 2021-09-14 中国移动通信集团云南有限公司 Method, device and storage medium for IT operation and maintenance fault processing experience accumulation
CN113282433B (en) * 2021-06-10 2023-04-28 天翼云科技有限公司 Cluster anomaly detection method, device and related equipment
CN113421176B (en) * 2021-07-16 2022-11-01 昆明学院 Intelligent screening method for abnormal data in student score scores
CN113656389B (en) * 2021-08-12 2022-05-27 北京可视化智能科技股份有限公司 Intelligent factory abnormal data processing method, device and system and storage medium
CN116483705B (en) * 2023-04-17 2024-10-11 哈尔滨工业大学 Knowledge and model driven airborne software intelligent failure mode analysis method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701157A (en) * 2015-12-30 2016-06-22 芜湖乐锐思信息咨询有限公司 Monitoring system for integrating social network site information
CN105718935A (en) * 2016-01-25 2016-06-29 南京信息工程大学 Word frequency histogram calculation method suitable for visual big data
CN105891629B (en) * 2016-03-31 2017-12-29 广西电网有限责任公司电力科学研究院 A kind of discrimination method of transformer equipment failure
CN106021062B (en) * 2016-05-06 2018-08-07 广东电网有限责任公司珠海供电局 The prediction technique and system of relevant fault
CN106251034A (en) * 2016-07-08 2016-12-21 大连大学 Wisdom energy saving electric meter monitoring system based on cloud computing technology

Also Published As

Publication number Publication date
CN106845526A (en) 2017-06-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant