CN116627761B - PHM modeling and model-application auxiliary system and method based on a big data framework
- Publication number
- CN116627761B (application CN202310582187.XA)
- Authority
- CN
- China
- Prior art keywords
- phm
- model
- modeling
- data stream
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3447—Performance evaluation by modeling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Abstract
The invention relates to the technical field of industrial big data and artificial intelligence, and in particular to a PHM modeling and model-application auxiliary system and method based on a big data framework. The system comprises a Spark modeling module and a Flink model-application module. The Spark modeling module is used for: performing PHM modeling on historical monitoring data of a facility based on Spark, generating a PHM model with a preset function, and serializing the model into a byte array. The Flink model-application module is used for: deserializing the byte array and loading the PHM model; pulling current monitoring data of the facility to form an input data stream; preprocessing the input data stream; and transforming the preprocessed data stream with the loaded PHM model to obtain a prediction result data stream. The invention helps users carry out PHM modeling efficiently on the Spark big data framework and then apply the generated PHM model with the preset function in real industrial settings.
Description
Technical Field
The invention relates to the technical field of industrial big data and artificial intelligence, and in particular to a PHM (prognostics and health management) modeling and model-application auxiliary system and method based on a big data framework.
Background
With the accelerating digital and intelligent transformation of key facility systems in fields such as aerospace, aviation, nuclear power, manufacturing and transportation, predictive maintenance based on condition-monitoring big data is receiving increasing attention from academia and industry. Training PHM models and using them for real-time predictive analysis is one of its core technologies. PHM modeling must learn intelligent models for anomaly detection, fault diagnosis and fault prediction from historical data, and data-driven PHM modeling methods and algorithms have been studied extensively. In practical applications, however, massive volumes of equipment condition-monitoring data must be processed in real time by a distributed system with high data throughput, reliability, fault tolerance and stability. This requires big data stream processing technology, an area in which advanced stream processing frameworks and engines such as Flink have clear advantages. Conventional data-driven PHM modeling is usually carried out in Python or Matlab, so the trained models are difficult to deploy directly in a Flink stream processing system, and Python and Matlab are themselves poorly suited to big data stream processing tasks. In addition, Flink's native machine learning library, Flink ML, provides only a limited set of algorithms and cannot meet the practical requirements of PHM modeling.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a PHM modeling and model-application auxiliary system and method based on a big data framework.
The technical solution of the PHM modeling and model-application auxiliary system based on a big data framework is as follows:
the system comprises a Spark modeling module and a Flink model-application module;
the Spark modeling module is used for:
performing PHM modeling on historical monitoring data of a facility based on Spark to generate a PHM model with a preset function;
serializing the PHM model with the preset function to generate a byte array corresponding to the PHM model;
the Flink model-application module is used for:
deserializing the byte array corresponding to the PHM model and loading the PHM model;
pulling current monitoring data of the facility to form an input data stream;
preprocessing the input data stream to obtain a preprocessed data stream;
and transforming the preprocessed data stream with the loaded PHM model to obtain a prediction result data stream.
The PHM modeling and model-application auxiliary method based on a big data framework comprises the following steps:
based on Spark, PHM modeling is conducted on historical monitoring data of the facility, and a PHM model with a preset function is generated;
serializing the PHM model with the preset function to generate a byte array corresponding to the PHM model;
performing deserialization operation on a byte array corresponding to the PHM model, and loading the PHM model;
pulling current monitoring data of the facility to form an input data stream;
preprocessing the input data stream to obtain a preprocessed data stream;
and converting the preprocessed data stream by using the loaded PHM model to obtain a predicted result data stream.
The beneficial effects of the invention are as follows:
the invention helps users carry out PHM modeling efficiently on the Spark big data framework and then apply the generated PHM model with the preset function in real industrial settings.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings in which:
FIG. 1 is a schematic diagram of a PHM modeling and model-application auxiliary system based on a big data framework according to an embodiment of the present invention;
FIG. 2 is an architecture diagram of a PHM modeling and model-application auxiliary system based on a big data framework according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of the Spark-based PHM modeling flow;
FIG. 4 is a schematic illustration of the Flink-based PHM model-application flow;
FIG. 5 shows the decision tree classification models stored in MySQL;
FIG. 6 shows the decision tree classification model loaded by Flink;
FIG. 7 is a schematic flow chart of a PHM modeling and model-application auxiliary method based on a big data framework according to an embodiment of the invention.
Detailed Description
As shown in FIG. 1, the PHM modeling and model-application auxiliary system based on a big data framework comprises a Spark modeling module and a Flink model-application module;
the Spark modeling module is used for:
performing PHM modeling on historical monitoring data of a facility based on Spark to generate a PHM model with a preset function, specifically:
intelligent PHM modeling, including training and testing of the model, is carried out on the historical monitoring data of the facility through Spark MLlib, thereby generating a PHM model with a preset function.
The facility may be a space station payload, a satellite, a nuclear power facility or the like, and is determined according to the actual situation.
The historical monitoring data of the facility include operating data such as temperature, pressure, flow and vibration data; different historical monitoring data may be used depending on the preset function.
The PHM model with the preset function is serialized to generate a byte array corresponding to the PHM model;
the Flink model-application module is used for:
deserializing the byte array corresponding to the PHM model and loading the PHM model;
pulling current monitoring data of the facility to form an input data stream;
and preprocessing the input data stream to obtain a preprocessed data stream.
The preprocessing of the input data stream includes feature variable selection, normalization, scaling, dimensionality reduction and other preprocessing operations.
The preprocessed data stream is then transformed with the loaded PHM model to obtain a prediction result data stream.
Comprehensive intelligent diagnosis and prediction stream processing may be performed on the preprocessed data stream based on multiple PHM models, yielding a data stream that carries fault diagnosis and fault prediction results.
The Flink model-application module may also publish the prediction result data stream through Kafka or write it to a time-series database.
Optionally, in the above technical solution, the Spark modeling module is further configured to write the byte array corresponding to the PHM model into a database;
the Flink model-application module is further configured to read the byte array corresponding to the PHM model from the database.
Optionally, in the above technical solution, when the PHM modeling process includes a preprocessing step, the Spark modeling module writes the preprocessing model corresponding to that step into the database in the form of a table.
For example, the preprocessing step may be min-max scaling, in which case the corresponding preprocessing model is a min-max scaling preprocessing model.
Optionally, in the above technical solution, the Flink model-application module is specifically configured to pull the current monitoring data of the facility through Kafka.
Optionally, in the above technical solution, the PHM model with a preset function is any of the following: a PHM model for data preprocessing, a PHM model for identifying the operating mode and health state of the facility, a PHM model for predicting the health state type of the facility, a PHM model for predicting the fault mode type of the facility, a PHM model for monitoring abnormal data, or a PHM model for predicting the remaining useful life of the facility. Specifically:
1) A PHM model for data preprocessing is obtained by applying preprocessing such as min-max scaling (MinMaxScaler), standardized scaling (StandardScaler) or principal component analysis (PCA) to the historical monitoring data of the facility;
2) Based on the historical monitoring data of the facility, a baseline model is trained with a clustering algorithm. The baseline model determines which cluster the current monitoring data of the facility belong to, that is, it identifies the operating mode and health state of the monitored object (the facility). This yields a PHM model for identifying the operating mode and health state of the facility;
3) Based on the historical monitoring data of the facility, a diagnosis model is trained with a classification algorithm. The diagnosis model predicts which class label the current monitoring data of the facility belong to, where a label is either a health state type or a fault mode type. This yields a PHM model for predicting the health state type of the facility and a PHM model for predicting the fault mode type of the facility;
4) Based on the historical monitoring data of the facility, a correlation model is trained with a regression algorithm. The correlation model detects, in real time, the deviation between the target data in the current monitoring data of the facility and the model output, so as to monitor abnormal conditions. This yields a PHM model for monitoring abnormal data;
5) Based on the historical monitoring data of the facility, a degradation model is trained with a regression algorithm. The degradation model extrapolates from the current monitoring data to predict data for a future period, so as to predict performance indicators or the remaining useful life. This yields a PHM model for predicting the remaining useful life of the facility.
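As a rough illustration of item 5), the sketch below fits a straight line to a degradation indicator and extrapolates it to a failure threshold to estimate the remaining useful life. The linear trend, the least-squares fit, the threshold value and all function names are assumptions made for illustration; the patent does not specify the regression algorithm.

```python
from statistics import mean


def fit_line(t, y):
    """Ordinary least-squares fit of y = slope * t + intercept."""
    t_bar, y_bar = mean(t), mean(y)
    slope = sum((a - t_bar) * (b - y_bar) for a, b in zip(t, y)) / sum(
        (a - t_bar) ** 2 for a in t
    )
    return slope, y_bar - slope * t_bar


def remaining_useful_life(t, y, threshold):
    """Extrapolate the fitted trend to the (hypothetical) failure threshold."""
    slope, intercept = fit_line(t, y)
    if slope <= 0:
        return None  # no upward degradation trend, so no crossing is predicted
    t_fail = (threshold - intercept) / slope
    return max(0.0, t_fail - t[-1])


# Toy degradation indicator that rises by 0.5 per cycle.
cycles = [0, 1, 2, 3, 4]
indicator = [1.0, 1.5, 2.0, 2.5, 3.0]
rul = remaining_useful_life(cycles, indicator, threshold=5.0)
print(rul)
```

A real degradation model would usually be nonlinear and trained on many run-to-failure histories; the straight line only shows the extrapolation idea.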
It should be noted that PHM models with different functions can be trained according to actual needs. In the PHM modeling and model-application auxiliary system based on a big data framework, the big data frameworks are Hadoop, Spark and Flink. The architecture is shown in FIG. 2: the system comprises three parts, namely data source, data processing and data sink, and the services it supports are divided into two major functional blocks, offline modeling and online model application. PHM modeling and model application are thus connected end to end through currently popular big data frameworks.
As shown in FIG. 2 and FIG. 3, PHM modeling based on Hadoop and Spark is implemented as follows:
1) Data set preparation:
the historical monitoring data of the facility are extracted, cleaned and organized into a data set for PHM model training; the data set can be used for PHM modeling directly or after preprocessing. The Spark-based PHM modeling module assumes that the data set is ready. Depending on the data volume, the data set may be stored as a data file on a single machine, as a distributed file, or as a table in a database, in each case as structured data. Data sets are of two types, unlabeled and labeled: unlabeled data sets are used to train clustering or regression models, and labeled data sets are used to train classification models.
2) Reading the data set as an RDD or DataFrame:
Spark reads the data set as input data in RDD or DataFrame format; the Spark machine learning library provides APIs for both data formats. The initial input data must be converted after reading: String-type feature data are converted to Double-type feature data, and multiple feature columns are merged into a single column of feature row vectors.
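The two conversions in step 2) can be sketched in plain Python as follows. This is not Spark code: rows are modeled as dictionaries of strings, and the column names and the helper `to_feature_vector` are hypothetical.

```python
def to_feature_vector(row, feature_columns):
    """Cast string-typed feature columns to floats and merge them into one vector."""
    return [float(row[name]) for name in feature_columns]


# Hypothetical rows as they might arrive from a CSV file (all values are strings).
raw_rows = [
    {"temperature": "21.5", "pressure": "101.3", "flow": "3.2"},
    {"temperature": "22.0", "pressure": "100.9", "flow": "3.4"},
]
feature_columns = ["temperature", "pressure", "flow"]

vectors = [to_feature_vector(row, feature_columns) for row in raw_rows]
print(vectors)
```

Fixing the column order in `feature_columns` matters: the same order has to be used later when the model is applied to streaming data.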
3) Data preprocessing:
after the data are read, the feature data are preprocessed through Spark MLlib according to the characteristics of the data and the PHM modeling requirements. The preprocessing operations mainly include normalization (Normalizer), min-max scaling (MinMaxScaler), standardized scaling (StandardScaler) and principal component analysis (PCA).
4) Dividing data:
training and testing PHM classification and regression models requires the data to be split, which is done through Spark MLlib. Two splitting modes are available: a proportional split into a training set and a validation set (Train-Validation Split), and K-fold cross-validation (Cross-Validation). The former is computationally cheaper, but the latter is more likely to find a better model.
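The difference between the two splitting modes can be sketched with index lists in plain Python. The helper names are hypothetical and, unlike a production splitter, no shuffling is done here.

```python
def train_validation_split(n, train_ratio):
    """Single proportional split: one training set and one validation set."""
    cut = int(n * train_ratio)
    return list(range(cut)), list(range(cut, n))


def k_fold_indices(n, k):
    """K-fold cross-validation: each fold serves as the validation set once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield training, validation


train_idx, val_idx = train_validation_split(10, 0.8)
splits = list(k_fold_indices(10, 5))
print(len(train_idx), len(val_idx), len(splits))
```

The single split trains each candidate model once, while K-fold trains it K times, which is why the former costs less and the latter gives a more reliable comparison between parameter settings.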
5) PHM model parameter setting:
PHM models fall into three types: clustering, classification and regression. Model quality depends on the parameter settings, so different parameters or parameter combinations are set through Spark MLlib, each is trained and validated, and a relatively optimal model is selected. The parameters are set through Spark MLlib using a parameter grid builder.
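A parameter grid is simply the Cartesian product of the candidate values. The sketch below builds one in plain Python and picks the combination with the lowest validation error; the helper names and the toy error function are hypothetical, and only the parameter names `maxDepth` and `maxBins` are taken from the embodiment described later.

```python
from itertools import product


def build_param_grid(options):
    """Return every combination of the candidate parameter values."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*(options[n] for n in names))]


def select_best(grid, evaluate):
    """Pick the parameter combination with the lowest validation error."""
    return min(grid, key=evaluate)


grid = build_param_grid({"maxDepth": [3, 5], "maxBins": [16, 32]})


def toy_validation_error(params):
    # Hypothetical stand-in for "train the model, then score it on the validation set".
    return abs(params["maxDepth"] - 5) + abs(params["maxBins"] - 32) / 100


best = select_best(grid, toy_validation_error)
print(len(grid), best)
```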
6) PHM model training, verification and test:
for PHM classification and regression models, a model is trained under each parameter setting using the pre-split training and validation sets, the best PHM model is selected based on the validation set, and that model is then evaluated on a held-out test set. For PHM clustering models, the clustering quality under each parameter setting is checked on the same data set, and the model with the best parameter setting is selected.
7) PHM model storage:
the trained PHM model is stored in a MySQL database for convenient management.
A PHM clustering model is stored in the form of a table, as shown in Table 1 below. ClusterID is the ID of a cluster; F1, F2, …, FN are the values of the feature variables at the cluster center; Radius is the radius of the cluster; DistMean is the mean distance from the cluster members to the cluster center; and DistStd is the standard deviation of those distances. A separate table is created in MySQL for each PHM clustering model to hold the information about its clusters.
Table 1:
ClusterID | F1 | F2 | …… | FN | Radius | DistMean | DistStd |
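One row of Table 1 could be derived from the members of a cluster roughly as follows. This sketch assumes that the center is the per-feature mean of the members, that Radius is the largest member-to-center distance, and that DistStd is the population standard deviation; the patent does not define these precisely, and `cluster_row` is a hypothetical helper.

```python
import math
from statistics import mean, pstdev


def cluster_row(cluster_id, members):
    """Summarize one cluster as a row with the Table 1 columns."""
    dims = len(members[0])
    center = [mean(m[d] for m in members) for d in range(dims)]
    dists = [math.dist(m, center) for m in members]
    return {
        "ClusterID": cluster_id,
        "center": center,          # F1 ... FN
        "Radius": max(dists),      # assumption: farthest member defines the radius
        "DistMean": mean(dists),
        "DistStd": pstdev(dists),  # assumption: population standard deviation
    }


row = cluster_row(1, [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
print(row)
```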
PHM classification and regression models are converted into byte arrays (Array[Byte]), and each byte array is written into MySQL together with the name of the corresponding model.
In addition, if a preprocessing algorithm such as min-max scaling (MinMaxScaler), standardized scaling (StandardScaler) or principal component analysis (PCA) is used during modeling, the corresponding preprocessing model must also be written into MySQL in the form of a table. For min-max scaling, the minimum and maximum of each feature variable in the training data are stored; for standardized scaling, the standard deviation and mean of each feature variable in the training data are stored; for principal component analysis, the eigenvector matrix corresponding to the feature matrix of the training samples is stored.
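Storing only these statistics is enough to reproduce the scaling at prediction time. A minimal sketch, assuming min-max scaling to the range [0, 1] and hypothetical helper names:

```python
def min_max_scale(x, mins, maxs):
    """Scale each feature to [0, 1] using the stored training minimum and maximum."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0  # constant feature: avoid dividing by zero
        for v, lo, hi in zip(x, mins, maxs)
    ]


def standard_scale(x, means, stds):
    """Center and scale each feature using the stored training mean and standard deviation."""
    return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(x, means, stds)]


scaled = min_max_scale([5.0, 10.0], mins=[0.0, 0.0], maxs=[10.0, 20.0])
standardized = standard_scale([3.0], means=[1.0], stds=[2.0])
print(scaled, standardized)
```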
In the architecture shown in FIG. 2, Hadoop plays two supporting roles: first, it provides YARN resource management and scheduling, so that the Spark modeling module and the Flink model-application module can run in cluster mode; second, it provides the HDFS distributed file system, which serves as the storage backend for the computation state of the Flink model-application module.
As shown in FIG. 4, the PHM model is applied as follows:
1) Loading a PHM model:
the corresponding PHM model is loaded from the MySQL database. When the PHM model is a PHM clustering model, the table is read directly and stored in a variable in DataFrame format; when the PHM model is a PHM classification or regression model, the byte array must be deserialized back into a directly usable model.
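The store-then-load round trip can be illustrated with Python's standard library. In this sketch `pickle` stands in for JVM serialization, an in-memory SQLite database stands in for MySQL, the model is reduced to a dictionary of parameters, and the table name `phm_models` is hypothetical; only the model name `DTCModelVib01` comes from the embodiment.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
conn.execute("CREATE TABLE phm_models (name TEXT PRIMARY KEY, model BLOB)")


def save_model(conn, name, model):
    """Serialize the model to bytes and store it under its model name."""
    conn.execute(
        "INSERT OR REPLACE INTO phm_models (name, model) VALUES (?, ?)",
        (name, pickle.dumps(model)),
    )


def load_model(conn, name):
    """Read the byte array by model name and deserialize it into a usable object."""
    row = conn.execute(
        "SELECT model FROM phm_models WHERE name = ?", (name,)
    ).fetchone()
    return pickle.loads(row[0])


# Toy "model": a plain dictionary of parameters.
trained = {"type": "threshold", "feature": "VibRMS", "threshold": 0.1}
save_model(conn, "DTCModelVib01", trained)
loaded = load_model(conn, "DTCModelVib01")
print(loaded == trained)
```

The round trip only works if the writer and the reader agree on the serialization format; in the patent's setting both sides run on the JVM, which is what allows a Spark-trained model to be restored inside a Flink job.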
2) Reading the data stream:
the current monitoring data of the facility are pulled through Kafka to form an input data stream;
3) Preprocessing the data stream:
if the historical monitoring data were preprocessed when the PHM model was trained with Spark, each data point (data element) in the input data stream must be preprocessed in the same way, yielding a preprocessed data stream.
4) Depending on the type of PHM model and the business requirements, processing falls into one of the following three cases:
a. Processing the preprocessed data stream based on the PHM clustering model:
(1) calculating the distance between the current data point and the center of each cluster;
(2) taking the cluster with the smallest distance to the data point as the target cluster;
(3) checking whether the distance between the data point and the center of the target cluster is less than or equal to the radius of the target cluster plus 3 times the standard deviation; if so, going to step (4), otherwise going to step (5);
(4) outputting a data stream containing the target cluster ID and the distance:
the target cluster is taken as the state identification result for the data point, and the target cluster ID and the distance are wrapped in a case class and output as the state identification result data stream;
(5) outputting a data stream containing a suspected-anomaly warning and the distance:
in this case the data point does not belong to any cluster in the current model, so it is flagged as a suspected anomaly; the suspected-anomaly warning string and the distance are wrapped in a case class and output as the state identification result data stream.
b. Processing the preprocessed data stream based on the PHM classification model:
(1) constructing an input feature vector:
an input feature row vector is constructed from the current data point of the preprocessed data stream;
(2) predicting the class of the current data point with the PHM classification model:
the predict method of the PHM classification model is called on the input feature row vector to obtain a class label;
(3) outputting a health state classification diagnosis result data stream:
the class label is taken as the health state or fault mode classification diagnosis result and output as a data stream.
c. Processing the preprocessed data stream based on the PHM regression model:
(1) constructing an input feature vector:
an input feature row vector is constructed from the current data point of the preprocessed data stream;
(2) predicting a target value with the PHM regression model:
the predict method of the PHM regression model is called on the input feature row vector to obtain a target value;
(3) outputting a predicted target value data stream:
the target value is taken as the prediction of a performance indicator or of the remaining useful life (RUL) and output as a data stream.
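Cases b and c share the same three steps and differ only in the model that is called. A minimal sketch, in which a Python generator stands in for the stream and the two stub classes are hypothetical stand-ins for loaded PHM models exposing a `predict` method:

```python
class StubClassifier:
    """Hypothetical stand-in for a loaded PHM classification model."""

    def predict(self, features):
        return 1 if features[1] < 0.1 else 4


class StubRegressor:
    """Hypothetical stand-in for a loaded PHM regression model."""

    def predict(self, features):
        return 100.0 - 10.0 * features[0]


def predict_stream(stream, model, feature_fields):
    """Build a feature vector per record, call predict, and emit the result."""
    for record in stream:
        features = [record[name] for name in feature_fields]  # step (1)
        prediction = model.predict(features)                  # step (2)
        yield {"timestamp": record["timestamp"], "prediction": prediction}  # step (3)


records = [
    {"timestamp": 1, "VibEnergy": 2.0, "VibRMS": 0.05},
    {"timestamp": 2, "VibEnergy": 9.0, "VibRMS": 0.40},
]
fields = ["VibEnergy", "VibRMS"]
labels = list(predict_stream(records, StubClassifier(), fields))
targets = list(predict_stream(records, StubRegressor(), fields))
print(labels, targets)
```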
Depending on actual needs, comprehensive intelligent diagnosis and prediction stream processing based on several PHM models can be performed at the same time to obtain fault diagnosis and prediction result data streams. The technical effects of the PHM modeling and model-application auxiliary system based on a big data framework are illustrated below through a further embodiment:
1) Example background:
taking a bearing as the facility, the PHM model is a PHM model for identifying the health state of the bearing. The Intelligent Maintenance Systems (IMS) center carried out bearing life tests and collected vibration data throughout the run-to-failure tests, including data from acceleration sensors at 4 different positions on the test rig; these form the historical monitoring data of the bearings. The first 980 data files of the 2nd data set are selected. Each data file holds 1 second of high-frequency (20 kHz) vibration data, and the raw vibration data of each file are converted in advance into three-dimensional time-domain vibration feature data. The three feature variables are the vibration energy value (VibEnergy), the vibration effective (RMS) value (VibRMS) and the vibration peak-to-peak value (VibP2P), so each sensor yields a 980 × 3 matrix. Each sample in the matrix is marked with a state label. There are four health state labels, representing the health state of the bearing from good to bad: healthy, sub-healthy, degraded and failed, numbered 1, 2, 3 and 4 respectively. After the PHM classification models are built with the Spark modeling module, the Flink model-application module loads them, that is, the PHM models for identifying the bearing health state, performs real-time predictive analysis on the data stream, and outputs the diagnosis results.
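The three time-domain features could be computed from one file of raw samples along the following lines. The patent does not give the formulas, so this sketch assumes that the energy is the sum of squared samples, the effective value is the root mean square, and the peak-to-peak value is the maximum minus the minimum; `vibration_features` is a hypothetical helper.

```python
import math


def vibration_features(samples):
    """Return (VibEnergy, VibRMS, VibP2P) for one window of raw vibration samples."""
    energy = sum(s * s for s in samples)     # assumption: energy = sum of squares
    rms = math.sqrt(energy / len(samples))   # root mean square
    p2p = max(samples) - min(samples)        # peak-to-peak amplitude
    return energy, rms, p2p


# Toy window; a real file in the embodiment would hold 1 second of 20 kHz data.
energy, rms, p2p = vibration_features([1.0, -1.0, 1.0, -1.0])
print(energy, rms, p2p)
```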
2) Test content and method:
the following two tests were carried out:
(1) first test: 4 decision tree state classification models are trained with a Spark MLlib classification algorithm and stored in a MySQL database;
(2) second test: the data acquisition device writes the vibration feature data of the 4 sensors into 4 corresponding Kafka partitions once per second, simulating on-site vibration monitoring. This tests the ability of the Flink model-application module to load the decision tree state classification models built by the Spark modeling module and to perform intelligent real-time state prediction. The specific test contents are shown in Table 2.
Table 2:
3) Test procedure and results:
(1) The data of the 4 sensors are modeled separately, with 90% of the data used as training samples and 10% as test samples. The main algorithm parameters are set as follows: because the algorithm numbers classes from 0 by default and the labels run from 1 to 4, the number-of-classes parameter numClasses is set to 5; the impurity measure is set to "gini"; the maximum tree depth maxDepth is set to 5; and the maximum number of bins used when splitting features, maxBins, is set to 32. The results of the 4 decision tree models are shown in Table 3.
Table 3:
Sensor number | Tree depth | Number of tree nodes | Prediction error on test samples |
1 | 3 | 9 | 0.0103 |
2 | 5 | 33 | 0.0111 |
3 | 5 | 31 | 0.0814 |
4 | 5 | 27 | 0.0297 |
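The "gini" impurity setting used for these trees refers to the Gini impurity, which for class proportions p_k is 1 − Σ p_k². A decision tree chooses splits that reduce this value. A small sketch of the measure itself, with a hypothetical helper name and the health state numbers 1 to 4 as labels:

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 minus the sum of squared class shares."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


pure = gini_impurity([1, 1, 1])       # a node containing only healthy samples
mixed = gini_impurity([1, 1, 2, 2])   # an even mix of two health states
print(pure, mixed)
```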
(2) Writing the PHM classification models to MySQL:
after the 4 decision tree models are obtained, they are converted into byte arrays and written into the MySQL database under the names DTCModelVib01, DTCModelVib02, DTCModelVib03 and DTCModelVib04. The models as stored in MySQL are shown in FIG. 5.
(3) Loading the PHM classification models with the Flink model-application module:
when the Flink model-application module runs, it reads the corresponding model byte array from the MySQL database and deserializes it into a model that the Flink program can use directly. As shown in FIG. 6, the loaded model is identical to the model obtained by Spark training.
(4) Real-time health state prediction with the Flink model-application module:
the input stream consists of vibration feature data whose fields are the event timestamp (timestamp), sensor number (sensor), sequence number (cycle), vibration energy value (VibEnergy), vibration effective value (VibRMS) and vibration peak-to-peak value (VibP2P). The health state output stream is obtained by prediction with the decision tree model, and its fields are the event timestamp (timestamp), sensor number (sensor) and discrete health state value (vibstatus). The Flink model-application module converts the vibration feature data into health states record by record:
(1617845113001,1,17.0,126.05303600000082,0.07845338360094828,0.8200000000000001)=>VibStatus(1617845113001,1,1)
The timestamp and sensor number fields are unchanged by the processing, while stream processing converts the 3-dimensional feature vector into a single discrete health state value. The discrete state value is the health state predicted by the decision tree model; it is one of 1, 2, 3 and 4, which represent the four health state levels of healthy, sub-healthy, degraded and faulty, respectively.
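The record-by-record conversion can be sketched in plain Python as below. This is only an illustration: `toy_predict` and its thresholds are hypothetical and do not reproduce the trained decision tree, and it is an assumption that the 3-dimensional feature vector consists of the energy, RMS and peak-to-peak values. The input record is the sample shown above.

```python
from typing import NamedTuple


class VibFeature(NamedTuple):
    timestamp: int
    sensor: int
    cycle: float
    vib_energy: float
    vib_rms: float
    vib_p2p: float


class VibStatus(NamedTuple):
    timestamp: int
    sensor: int
    status: int  # 1 healthy, 2 sub-healthy, 3 degraded, 4 faulty


def toy_predict(features):
    """Hypothetical rule on the peak-to-peak value; NOT the trained tree."""
    _energy, _rms, p2p = features
    if p2p < 1.0:
        return 1
    if p2p < 2.0:
        return 2
    if p2p < 3.0:
        return 3
    return 4


def to_status(record, predict=toy_predict):
    """Keep timestamp and sensor; map the feature vector to one state value."""
    features = (record.vib_energy, record.vib_rms, record.vib_p2p)
    return VibStatus(record.timestamp, record.sensor, predict(features))


stream = [
    VibFeature(1617845113001, 1, 17.0, 126.05303600000082,
               0.07845338360094828, 0.8200000000000001),
]
statuses = [to_status(record) for record in stream]
```

Passing the predictor as a parameter mirrors the idea that the streaming job stays the same while the loaded model is swapped.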
As shown in Fig. 7, the PHM modeling and modeling auxiliary method based on a big data frame in an embodiment of the present invention includes the following steps:
S1, performing PHM modeling on historical monitoring data of a facility based on Spark to generate a PHM model with a preset function;
S2, serializing the PHM model with the preset function to generate a byte array corresponding to the PHM model;
S3, performing a deserialization operation on the byte array corresponding to the PHM model, and loading the PHM model;
S4, pulling current monitoring data of the facility to form an input data stream;
S5, preprocessing the input data stream to obtain a preprocessed data stream;
S6, converting the preprocessed data stream by using the loaded PHM model to obtain a prediction result data stream.
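Steps S4 to S6 form a chain of stream transformations, which can be sketched with Python generators. This is a simplified illustration rather than a Flink job: the record layout, the rule of dropping records with missing values, and the threshold model `loaded_model` are all hypothetical.

```python
def pull(records):
    """S4: yield monitoring records one by one (stand-in for a message source)."""
    yield from records


def preprocess(stream):
    """S5: drop records with missing values (hypothetical preprocessing rule)."""
    for record in stream:
        if all(value is not None for value in record.values()):
            yield record


def transform(stream, model):
    """S6: apply the loaded model to each record to form the result stream."""
    for record in stream:
        yield {"timestamp": record["timestamp"], "prediction": model(record)}


def loaded_model(record):
    """Hypothetical model: a value above 1.0 maps to state 2, otherwise 1."""
    return 2 if record["value"] > 1.0 else 1


raw = [
    {"timestamp": 1, "value": 0.5},
    {"timestamp": 2, "value": None},
    {"timestamp": 3, "value": 1.5},
]
results = list(transform(preprocess(pull(raw)), loaded_model))
```

Each stage consumes records lazily, so a record flows through preprocessing and prediction before the next one is pulled.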
Optionally, in the above technical solution, before the deserialization operation is performed on the byte array corresponding to the PHM model, the method further includes:
writing the byte array corresponding to the PHM model into a database; and
reading the byte array corresponding to the PHM model from the database.
Optionally, in the above technical solution, the method further includes:
during PHM modeling, calling the corresponding preprocessing algorithms from the database as needed, wherein a plurality of preprocessing algorithms are written into the database in the form of tables in advance.
Optionally, in the above technical solution, the PHM model with a preset function is: a model for data preprocessing, a PHM model for identifying the operation mode and health state of the facility, a PHM model for predicting the health state type of the facility, a PHM model for predicting the fault mode type of the facility, a PHM model for monitoring abnormal data, and a PHM model for predicting the remaining useful life of the facility.
Optionally, in the above technical solution, pulling the current monitoring data of the facility includes:
S40, pulling the current monitoring data of the facility through Kafka to form the input data stream.
In the above embodiments, although the steps are numbered S1, S2, etc., these are only specific embodiments given in the present application; those skilled in the art may adjust the execution order of S1, S2, etc. according to the actual situation, which also falls within the scope of protection of the present invention. It is understood that some embodiments may include some or all of the above embodiments.
For the implementation of each step of the PHM modeling and modeling auxiliary method based on a big data frame of the present invention, reference may be made to the embodiments of the PHM modeling and modeling auxiliary system based on a big data frame; details are not repeated here.
An electronic device of the present invention comprises a memory, a processor, and a program stored in the memory and runnable on the processor, wherein the processor implements the steps of the PHM modeling and modeling auxiliary method based on a big data frame when executing the program.
The electronic device may be a computer, a mobile phone, or the like, and the program is correspondingly computer software or a mobile phone app. For the parameters and steps of the electronic device of the present invention, reference may be made to the embodiments of the PHM modeling and modeling auxiliary method based on a big data frame; details are not repeated here.
Those skilled in the art will appreciate that the present invention may be implemented as a system, method, or computer program product.
Accordingly, the present disclosure may be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software, referred to herein generally as a "circuit," "module," or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media containing computer-readable program code.
Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Claims (10)
1. A PHM modeling and modeling auxiliary system based on a big data frame, characterized by comprising a Spark modeling module and a Flink model-use module;
the Spark modeling module is used for:
performing PHM modeling on historical monitoring data of a facility based on Spark to generate a PHM model with a preset function;
serializing the PHM model with the preset function to generate a byte array corresponding to the PHM model;
the Flink model-use module is used for:
performing a deserialization operation on the byte array corresponding to the PHM model, and loading the PHM model, wherein the PHM model is a PHM clustering model, a PHM classification model or a PHM regression model;
the process of processing the preprocessed data stream based on the PHM clustering model comprises the following steps:
(1) calculating the distance value between the current data point and the center of each class cluster;
(2) taking the class cluster with the smallest distance value to the data point as the target class cluster;
(3) detecting whether the distance value between the data point and the target class cluster is less than or equal to the radius of the target class cluster plus 3 times the standard deviation; if so, jumping to step (4), otherwise jumping to step (5);
(4) outputting a data stream containing the target class cluster ID and the distance value:
taking the target class cluster as the state identification result of the data point, encapsulating the ID of the target class cluster and the distance value into a case class, and outputting it as a state identification result data stream;
(5) outputting a data stream containing a suspected anomaly flag and the distance value;
the process of processing the preprocessed data stream based on the PHM classification model comprises the following steps:
(1) constructing an input feature vector:
constructing an input feature row vector from the current data point of the preprocessed data stream;
(2) predicting the class to which the current data point belongs by using the PHM classification model:
invoking the 'prediction' method of the PHM classification model, and performing predictive analysis on the input feature row vector to obtain a class label;
(3) outputting a health state classification diagnosis result data stream:
taking the class label as the classification diagnosis result for the health state or the fault mode, and outputting it as a data stream;
the process of processing the preprocessed data stream based on the PHM regression model comprises the following steps:
(1) constructing an input feature vector:
constructing an input feature row vector from the current data point of the preprocessed data stream;
(2) predicting a target value by using the PHM regression model:
invoking the 'prediction' method of the PHM regression model, and performing predictive analysis on the input feature row vector to obtain a target value;
(3) outputting a predicted target value data stream:
taking the target value as the prediction result for a performance index or the remaining useful life, and outputting it as a data stream;
pulling current monitoring data of the facility to form an input data stream;
preprocessing the input data stream to obtain a preprocessed data stream;
and converting the preprocessed data stream by using the loaded PHM model to obtain a predicted result data stream.
2. The PHM modeling and modeling auxiliary system based on a big data frame according to claim 1, wherein the Spark modeling module is further used for: writing the byte array corresponding to the PHM model into a database;
the Flink model-use module is further used for: reading the byte array corresponding to the PHM model from the database.
3. The PHM modeling and modeling auxiliary system based on a big data frame according to claim 2, wherein, when the PHM modeling process includes a preprocessing step, the Spark modeling module writes the preprocessing model corresponding to the preprocessing step into the database in the form of a table.
4. The PHM modeling and modeling auxiliary system based on a big data frame according to any one of claims 1 to 3, wherein the PHM model with a preset function is: a model for data preprocessing, a PHM model for identifying the operation mode and health state of the facility, a PHM model for predicting the health state type of the facility, a PHM model for predicting the fault mode type of the facility, a PHM model for monitoring abnormal data, and a PHM model for predicting the remaining useful life of the facility.
5. The PHM modeling and modeling auxiliary system based on a big data frame according to any one of claims 1 to 3, wherein the Flink model-use module is specifically used for: pulling the current monitoring data of the facility through Kafka to form the input data stream.
6. A PHM modeling and modeling auxiliary method based on a big data frame, characterized by comprising the following steps:
performing PHM modeling on historical monitoring data of a facility based on Spark to generate a PHM model with a preset function;
serializing the PHM model with the preset function to generate a byte array corresponding to the PHM model;
performing a deserialization operation on the byte array corresponding to the PHM model, and loading the PHM model, wherein the PHM model is a PHM clustering model, a PHM classification model or a PHM regression model;
the process of processing the preprocessed data stream based on the PHM clustering model comprises the following steps:
(1) calculating the distance value between the current data point and the center of each class cluster;
(2) taking the class cluster with the smallest distance value to the data point as the target class cluster;
(3) detecting whether the distance value between the data point and the target class cluster is less than or equal to the radius of the target class cluster plus 3 times the standard deviation; if so, jumping to step (4), otherwise jumping to step (5);
(4) outputting a data stream containing the target class cluster ID and the distance value:
taking the target class cluster as the state identification result of the data point, encapsulating the ID of the target class cluster and the distance value into a case class, and outputting it as a state identification result data stream;
(5) outputting a data stream containing a suspected anomaly flag and the distance value;
the process of processing the preprocessed data stream based on the PHM classification model comprises the following steps:
(1) constructing an input feature vector:
constructing an input feature row vector from the current data point of the preprocessed data stream;
(2) predicting the class to which the current data point belongs by using the PHM classification model:
invoking the 'prediction' method of the PHM classification model, and performing predictive analysis on the input feature row vector to obtain a class label;
(3) outputting a health state classification diagnosis result data stream:
taking the class label as the classification diagnosis result for the health state or the fault mode, and outputting it as a data stream;
the process of processing the preprocessed data stream based on the PHM regression model comprises the following steps:
(1) constructing an input feature vector:
constructing an input feature row vector from the current data point of the preprocessed data stream;
(2) predicting a target value by using the PHM regression model:
invoking the 'prediction' method of the PHM regression model, and performing predictive analysis on the input feature row vector to obtain a target value;
(3) outputting a predicted target value data stream:
taking the target value as the prediction result for a performance index or the remaining useful life, and outputting it as a data stream;
pulling current monitoring data of the facility to form an input data stream;
preprocessing the input data stream to obtain a preprocessed data stream;
and converting the preprocessed data stream by using the loaded PHM model to obtain a predicted result data stream.
7. The PHM modeling and modeling assist method based on big data frame as defined in claim 6, further comprising, before performing a deserialization operation on the byte array corresponding to the PHM model:
writing a byte array corresponding to the PHM model into a database;
and reading a byte array corresponding to the PHM model from the database.
8. The big data frame based PHM modeling and modeling assistance method of claim 7, further comprising:
and calling corresponding preprocessing algorithms from the database according to actual needs in the PHM modeling development process, wherein a plurality of preprocessing algorithms are written into the database in a form of a table in advance.
9. The PHM modeling and modeling auxiliary method based on a big data frame according to any one of claims 6 to 8, wherein the PHM model with the preset function is: a model for data preprocessing, a PHM model for identifying the operation mode and health state of the facility, a PHM model for predicting the health state type of the facility, a PHM model for predicting the fault mode type of the facility, a PHM model for monitoring abnormal data, and a PHM model for predicting the remaining useful life of the facility.
10. The PHM modeling and modeling auxiliary method based on a big data frame according to any one of claims 6 to 8, characterized in that pulling the current monitoring data of the facility comprises:
the current monitoring data of the facility is pulled through Kafka to form an input data stream.
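The clustering-based processing recited in claims 1 and 6, steps (1) to (5), can be sketched as follows. This is a minimal illustration under assumptions: the cluster dictionary keys (`id`, `center`, `radius`, `std`), the Euclidean distance, the interpretation of the standard deviation as a per-cluster value, and the `"suspected_anomaly"` marker are all hypothetical choices, not details fixed by the claims.

```python
import math


def classify_point(point, clusters):
    """Assign a point to its nearest cluster, or flag it as a suspected anomaly.

    Each cluster is a dict with hypothetical keys: id, center, radius, std.
    """
    # (1) distance from the point to every cluster center
    distances = [(math.dist(point, c["center"]), c) for c in clusters]
    # (2) the nearest cluster is the target cluster
    distance, target = min(distances, key=lambda pair: pair[0])
    # (3) accept if within radius + 3 * standard deviation
    if distance <= target["radius"] + 3 * target["std"]:
        # (4) state identification result: cluster ID and distance
        return {"cluster_id": target["id"], "distance": distance}
    # (5) suspected anomaly together with the distance
    return {"cluster_id": "suspected_anomaly", "distance": distance}


# Illustrative clusters only; the values are made up.
clusters = [
    {"id": 0, "center": (0.0, 0.0), "radius": 1.0, "std": 0.2},
    {"id": 1, "center": (10.0, 0.0), "radius": 1.0, "std": 0.2},
]
near = classify_point((0.3, 0.4), clusters)
far = classify_point((4.0, 0.0), clusters)
```

Here the first point lies within the acceptance threshold of cluster 0, while the second is closest to cluster 0 but outside its threshold, so it is reported as a suspected anomaly.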
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310582187.XA CN116627761B (en) | 2023-05-22 | 2023-05-22 | PHM modeling and modeling auxiliary system and method based on big data frame |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116627761A (en) | 2023-08-22 |
CN116627761B (en) | 2024-04-05 |
Family
ID=87641171
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310582187.XA Active CN116627761B (en) | 2023-05-22 | 2023-05-22 | PHM modeling and modeling auxiliary system and method based on big data frame |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116627761B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||