CN111597175B - Filling method of sensor missing value fusing time-space information - Google Patents
- Publication number
- CN111597175B (application number CN202010374180.5A)
- Authority
- CN
- China
- Prior art keywords
- missing
- lstm
- data
- filling
- steps
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Remote Sensing (AREA)
- Quality & Reliability (AREA)
- Investigating Or Analyzing Materials By The Use Of Ultrasonic Waves (AREA)
Abstract
The invention provides a method for filling missing sensor values that fuses spatio-temporal information. The method comprises the following steps: inputting N pieces of historical data X and M pieces of missing data X_missing, where M and N are both greater than the input time-series length T; setting a filling threshold η, where, after the historical data are input into the trained LSTM-AE_S, η = std(X - X'); obtaining a trained model LSTM-AE_S and repaired data X_repaired; dividing the original data into time-series data sets; initializing LSTM-AE_S, with network initialization performed using TensorFlow; updating the weights W of LSTM-AE_S using the back-propagation algorithm commonly used for neural networks; and filling the missing values. The method considers spatial and temporal information simultaneously, remains relatively robust when many sensors are missing at the same time, can handle different missing types with a single trained model, and meets the real-time requirement of sensor missing-value filling.
Description
Technical Field
The invention belongs to the field of equipment health management, and in particular relates to a filling method of sensor missing values fused with space-time information.
Background
Previous missing-value filling methods exploit only the correlations in the data space and make no use of temporal information. They perform poorly when data are missing in several dimensions at once, and cannot be used at all when a whole block of data is missing. Furthermore, previous work assumes that the locations of the missing values are known and builds a model for a single missing type. In a real-time system, however, the locations of the missing values are not known in advance. To cope with all missing types in real time, the number of models that must be trained therefore grows exponentially with the number of sensors, which hinders practical application of these methods.
Disclosure of Invention
In order to solve the above problems, the invention provides a method for filling missing sensor values that fuses spatio-temporal information, combining a deep autoencoder with a long short-term memory (LSTM) neural network.
The invention provides a filling method of sensor missing values fusing space-time information, which comprises the following steps:
inputting N pieces of historical data X and M pieces of missing data X_missing, wherein M and N are both greater than the input time-series length T;
setting a filling threshold η: after the historical data are input into the trained LSTM-AE_S, η = std(X - X'), where X is the test data, X' is the model output data, and std is the standard deviation; obtaining a trained model LSTM-AE_S and repaired data X_repaired;
dividing the original data into time-series data sets;
initializing LSTM-AE_S: constructing a multi-layer autoencoder neural network using the TensorFlow deep learning framework, wherein the neurons are of the LSTM type, the number of neurons in the first layer equals the number of sensors, and the number of neurons in the intermediate coding layer is the smallest dimension at which the information retention rate of the historical data X, after dimensionality reduction by principal component analysis, exceeds 99%; then performing network initialization using TensorFlow;
updating the weights W of LSTM-AE_S using the back-propagation algorithm commonly used for neural networks;
and filling the missing values.
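The rule for sizing the intermediate coding layer in the initialization step can be illustrated with a short NumPy sketch. This is only one plausible reading: it assumes that "information retention rate" means the cumulative explained-variance ratio of the principal components, and the function name `coding_layer_size` is hypothetical rather than taken from the patent.

```python
import numpy as np

def coding_layer_size(X, retention=0.99):
    """Smallest number of principal components whose cumulative
    explained-variance ratio exceeds `retention` (assumed meaning of
    "information retention rate")."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each sensor column
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to PCA variances
    ratio = np.cumsum(var) / var.sum()
    # First index where the cumulative ratio is strictly above the target.
    k = int(np.searchsorted(ratio, retention, side="right")) + 1
    return min(k, len(ratio))
```

For a history matrix with one row per record and one column per sensor, the returned value would be used as the width of the intermediate coding layer.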
In the above method, after initializing LSTM-AE_S and before updating the weights W, the method further comprises calculating a reconstruction error (formula not reproduced in this text), wherein the two quantities in the formula are the sensor data at time T in X_j and X_j', respectively.
In the above method, after initializing LSTM-AE_S and before updating the weights W, the method further comprises calculating a regularization term error (formula not reproduced in this text), wherein the three quantities in the formula denote the data at time T in X_j', X_{j+1}', and X_{j-1}', respectively.
In the above method, after initializing LSTM-AE_S and before updating the weights W, the method further comprises calculating a penalty term (formula not reproduced in this text), wherein ∂ denotes taking the partial derivative, the whole neural network is regarded as a function of the weights W, the bias term b, and the input X^T, denoted h_{W,b}(X^T), and θ is the regularization parameter.
The method has the advantage that the spatial and temporal information in the data can be mined simultaneously for missing-value filling, which greatly improves filling accuracy when values are missing in multiple dimensions. Moreover, since the autoencoder recovers all of the data at once, a single model can be trained for different missing types, which greatly reduces model training complexity and data requirements. Smoothness regularization is also introduced, further improving the prediction accuracy and robustness of the model. The weight-sharing strategy allows the model to converge more quickly and reduces the training complexity of the algorithm.
Missing values are a serious hidden danger to the safe and stable operation of many large-scale installations, particularly power plants, and the accuracy and robustness of existing missing-value filling methods are insufficient for practical use. On data from a gas turbine in actual operation, the algorithm provided by the invention improves accuracy by more than 60% over the classical algorithm and is more robust when values are missing in multiple dimensions.
The invention can be widely applied in health management systems for large-scale power equipment to support their stable operation.
Drawings
FIG. 1 shows a training flow diagram for LSTM-AE and LSTM-AEs.
Detailed Description
The following examples will enable those skilled in the art to more fully understand the present invention and are not intended to limit the same in any way.
To exploit both the temporal and the spatial information in the sensor data, the neurons in the autoencoder are replaced with the cell structure of a long short-term memory (LSTM) network, so that the autoencoder can mine temporal and spatial information simultaneously.
In addition, because the main objective during training is to recover the sensor data at the current time, the intermediate feature layer does not need to be an LSTM layer and can use ordinary neurons, which reduces model complexity. The input to the model during training is a matrix, with the sensors along the horizontal axis and time along the vertical axis.
Because time-series data are smooth, that is, the values of adjacent records do not change much, smoothness regularization is introduced into the model, which also reduces the risk of overfitting. However, the model becomes more complex and harder to solve once the regularization term is added, so a weight-sharing strategy is introduced to avoid solving for the regularization loss as part of one unified problem: the data before and after the current time are input into the same model together with the current data to obtain reconstructed values, and the regularization loss is computed directly from the three reconstructed values.
Missing values are identified by thresholding. In general, the difference between a missing value and its reconstructed value is large. If this difference exceeds the threshold, the existing value is replaced with the model's reconstructed value, which achieves automatic filling of missing values.
The invention will be better understood by reference to the following examples.
Input: N pieces of historical data X; M pieces of missing data X_missing, where M and N must both be greater than the input time-series length T (for sensor data, T is generally greater than 200); filling threshold η: after the historical data are input into the trained LSTM-AE_S, η = std(X - X'), where X is the test data, X' is the model output data, and std is the standard deviation.
Output: the trained model LSTM-AE_S; the repaired data X_repaired.
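The threshold definition η = std(X - X') can be written out directly. The sketch below assumes one threshold per sensor (column-wise standard deviation); the text does not say whether η is computed per sensor or as a single global value, so this choice, like the name `fill_threshold`, is an assumption.

```python
import numpy as np

def fill_threshold(X, X_rec):
    """eta = std(X - X'), computed separately for each sensor column.

    X     : array (n_records, n_sensors), data fed to the trained model
    X_rec : array of the same shape, the model's reconstruction X'
    """
    residual = np.asarray(X, dtype=float) - np.asarray(X_rec, dtype=float)
    return residual.std(axis=0)  # one threshold per sensor (assumed)
```

Passing `axis=None` instead of `axis=0` would give a single global threshold, which is the other reading the text allows.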
Training:
1) Dividing the raw data into time series data sets:
for i = 0 to N-T do
    X_i = X(X_{i+1}, ..., X_{i+T})
end
X_train = {X_1, X_2, ..., X_{N-T}}
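The windowing in step 1) can be sketched in NumPy as follows. The sketch uses 0-based row indices, so window i covers rows i to i+T-1, and it keeps every window, including the first; the helper name `make_windows` is hypothetical.

```python
import numpy as np

def make_windows(X, T):
    """Split a (N, n_sensors) record matrix into overlapping windows.

    Returns an array of shape (N - T + 1, T, n_sensors); window i holds
    rows i .. i+T-1, so its last row is the "time T" record of that window.
    """
    X = np.asarray(X)
    N = X.shape[0]
    if N < T:
        raise ValueError("need at least T records to form one window")
    return np.stack([X[i:i + T] for i in range(N - T + 1)])
```

Each window is a matrix with sensors along the columns and time along the rows, matching the input layout described earlier in the text.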
2) Training a model:
Initializing LSTM-AE_S: a multi-layer autoencoder neural network is constructed using the TensorFlow deep learning framework, wherein the neurons are of the LSTM type, the number of neurons in the first layer equals the number of sensors, and the number of neurons in the intermediate coding layer is the smallest dimension at which the information retention rate of the historical data X, after dimensionality reduction by principal component analysis, exceeds 99%; network initialization is then performed using TensorFlow.
for j = 1 to N-T do
    input X_{j-1}, X_j, X_{j+1} into LSTM-AE_S and obtain the outputs X_{j-1}', X_j', X_{j+1}';
    calculate the reconstruction error (formula not reproduced in this text), wherein the two quantities in the formula are the sensor data at time T in X_j and X_j', respectively;
    calculate the regularization term error (formula not reproduced in this text), wherein the three quantities in the formula denote the data at time T in X_j', X_{j+1}', and X_{j-1}', respectively;
    calculate the loss term (formula not reproduced in this text), wherein ∂ denotes taking the partial derivative; the whole neural network can be regarded as a function of the weights W, the bias term b, and the input X^T, denoted here h_{W,b}(X^T); θ is the regularization parameter, whose value is generally less than 0.1, because larger values make the predictions of the trained algorithm too smooth and degrade the results;
    update the weights W of LSTM-AE_S using the back-propagation algorithm commonly used for neural networks.
end
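Since the loss formulas themselves are not reproduced in this text, the following NumPy sketch only illustrates the structure of the training objective under stated assumptions: a squared-error reconstruction term at time T, a smoothness term built from squared differences between the three adjacent reconstructions, and a total loss of reconstruction plus θ times smoothness. These specific forms, and the names `training_loss` and `model`, are assumptions and may differ from the patent's formulas.

```python
import numpy as np

def training_loss(model, X_prev, X_cur, X_next, theta=0.05):
    """Assumed loss for one training step j.

    model  : callable mapping a (T, n_sensors) window to its reconstruction
    X_prev, X_cur, X_next : windows X_{j-1}, X_j, X_{j+1}
    theta  : regularization parameter (the text suggests values below 0.1)
    """
    # Weight sharing: the same model reconstructs all three windows.
    R_prev, R_cur, R_next = model(X_prev), model(X_cur), model(X_next)
    # Only the last row (time T) of each window enters the loss.
    x_true = X_cur[-1]
    r_prev, r_cur, r_next = R_prev[-1], R_cur[-1], R_next[-1]
    rec = np.sum((x_true - r_cur) ** 2)            # assumed squared error
    smooth = (np.sum((r_next - r_cur) ** 2)        # assumed finite-difference
              + np.sum((r_cur - r_prev) ** 2))     # smoothness penalty
    return rec + theta * smooth, rec, smooth
```

Because the three reconstructions come from one shared network, the smoothness term is obtained from three forward passes and needs no separate optimization problem, which is the point of the weight-sharing strategy described above.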
Missing value filling:
for k = 1 to M-T do
    X_k = X_missing(X_{k+1}, ..., X_{k+T});
    input X_k into the trained LSTM-AE_S and obtain the output X_k';
    for each sensor p, replace the existing value at time T with the predicted value if the difference between them exceeds η (formula not reproduced in this text), wherein the first quantity in the formula is the predicted value of the output X_k' at time T and the second is its predicted value on the p-th sensor; η is the standard deviation of the difference between the predicted values and the true values during training;
end
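The filling loop above can be sketched in NumPy as follows. The sketch assumes that missing readings appear in the data as numerically anomalous values rather than as NaN, that `model` is any callable returning a reconstruction of a window, and that the comparison is an absolute difference against η; `fill_missing` and `model` are hypothetical names.

```python
import numpy as np

def fill_missing(X_missing, model, T, eta):
    """Replace suspicious readings with the model's reconstruction.

    X_missing : array (M, n_sensors) containing missing/corrupted values
    model     : callable mapping a (T, n_sensors) window to its reconstruction
    eta       : scalar or per-sensor array of thresholds
    """
    X = np.array(X_missing, dtype=float)
    repaired = X.copy()
    for k in range(X.shape[0] - T + 1):
        window = X[k:k + T]              # windows are cut from X_missing
        pred = model(window)[-1]         # predicted values at time T
        row = k + T - 1                  # record that corresponds to time T
        bad = np.abs(X[row] - pred) > eta
        repaired[row, bad] = pred[bad]   # fill only the flagged sensors
    return repaired
```

Only the last record of each window is checked, so the first T-1 records of the input are passed through unchanged in this sketch.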
FIG. 1 shows a model training flow diagram of the present invention. The algorithm flow disclosed by the invention may be better understood with reference to fig. 1.
The sensor missing-value filling algorithm that fuses spatio-temporal information combines a deep autoencoder with a long short-term memory (LSTM) neural network. It integrates the feature-extraction capability of the former with the temporal feature-mining capability of the latter and optimizes both within a single deep network, so that missing values can be filled more accurately by combining spatial and temporal information. The method also exploits the smoothness of sensor time-series data, namely that adjacent records should change smoothly, by adding smoothness regularization to the model, which further improves filling accuracy. In addition, a weight-sharing mechanism simplifies the calculation of the regularization term and speeds up model training to some extent. The method considers spatial and temporal information simultaneously, remains relatively robust when many sensors are missing at once, achieves higher filling accuracy than the prior art, can handle different missing types with a single trained model, and meets the real-time requirement of sensor missing-value filling.
In summary, the method fuses the deep autoencoder and the LSTM network and optimizes them within a single deep network. Adding smoothness regularization to the model further improves the accuracy with which the algorithm fills missing values. Through the weight-sharing mechanism, data from different times are input into a shared network and the regularization loss is then computed directly, which simplifies the calculation of the regularization term and accelerates training of the algorithm.
It should be understood by those skilled in the art that the above embodiments are exemplary embodiments only and that various changes, substitutions, and alterations can be made hereto without departing from the spirit and scope of the present application.
Claims (4)
1. A method for filling missing sensor values that fuses spatio-temporal information, comprising:
inputting N pieces of historical data X and M pieces of missing data X_missing, wherein M and N are both greater than the input time-series length T;
setting a filling threshold η: after the historical data are input into the trained LSTM-AE_S, η = std(X - X'), where X is the test data, X' is the model output data, and std is the standard deviation; obtaining a trained model LSTM-AE_S and repaired data X_repaired;
dividing the N pieces of historical data X into time-series data sets;
initializing LSTM-AE_S: constructing a multi-layer autoencoder neural network using the TensorFlow deep learning framework, wherein the neurons are of the LSTM type, the number of neurons in the first layer equals the number of sensors, and the number of neurons in the intermediate coding layer is the smallest dimension at which the information retention rate of the historical data X, after dimensionality reduction by principal component analysis, exceeds 99%; then performing network initialization using TensorFlow;
updating the weights W of LSTM-AE_S using the back-propagation algorithm commonly used for neural networks;
inputting the M pieces of missing data X_missing into the trained LSTM-AE_S and filling the missing values.
4. The method of claim 3, wherein, after initializing LSTM-AE_S and before updating the weights W, the method further comprises calculating a penalty term (formula not reproduced in this text), wherein ∂ denotes taking the partial derivative, the whole neural network is regarded as a function of the weights W, the bias term b, and the input X^T, denoted h_{W,b}(X^T), and θ is the regularization parameter.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010374180.5A CN111597175B (en) | 2020-05-06 | 2020-05-06 | Filling method of sensor missing value fusing time-space information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010374180.5A CN111597175B (en) | 2020-05-06 | 2020-05-06 | Filling method of sensor missing value fusing time-space information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111597175A (en) | 2020-08-28 |
CN111597175B (en) | 2023-06-02 |
Family
ID=72182558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010374180.5A (Active; CN111597175B) | Filling method of sensor missing value fusing time-space information | 2020-05-06 | 2020-05-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111597175B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112948743B (en) * | 2021-03-26 | 2022-05-03 | 重庆邮电大学 | Coal mine gas concentration deficiency value filling method based on space-time fusion |
CN113297191B (en) * | 2021-05-28 | 2022-04-05 | 湖南大学 | Stream processing method and system for network missing data online filling |
CN113554105B (en) * | 2021-07-28 | 2023-04-18 | 桂林电子科技大学 | Missing data completion method for Internet of things based on space-time fusion |
CN116611717B (en) * | 2023-04-11 | 2024-03-19 | 南京邮电大学 | Filling method of fusion auxiliary information based on explicit and implicit expression |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107273429A (en) * | 2017-05-19 | 2017-10-20 | 哈工大大数据产业有限公司 | A kind of Missing Data Filling method and system based on deep learning |
CN107392307A (en) * | 2017-08-04 | 2017-11-24 | 电子科技大学 | The Forecasting Methodology of parallelization time series data |
CN108090558A (en) * | 2018-01-03 | 2018-05-29 | 华南理工大学 | A kind of automatic complementing method of time series missing values based on shot and long term memory network |
CN108805193A (en) * | 2018-06-01 | 2018-11-13 | 广东电网有限责任公司 | A kind of power loss data filling method based on mixed strategy |
Non-Patent Citations (1)
Title |
---|
DeepTAL: Deep Learning for TDOA-Based Asynchronous Localization Security With Measurement Error and Missing Data; Yuan Xue et al.; IEEE Access, Volume 7; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN111597175A (en) | 2020-08-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||