CN111597175B - Filling method of sensor missing value fusing time-space information - Google Patents


Info

Publication number
CN111597175B
CN111597175B
Authority
CN
China
Legal status: Active
Application number
CN202010374180.5A
Other languages
Chinese (zh)
Other versions
CN111597175A (en)
Inventor
胡清华
李东
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010374180.5A priority Critical patent/CN111597175B/en
Publication of CN111597175A publication Critical patent/CN111597175A/en
Application granted granted Critical
Publication of CN111597175B publication Critical patent/CN111597175B/en

Classifications

    • G06F 16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F 16/29 Geographical information databases
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The invention provides a filling method for sensor missing values that fuses spatio-temporal information, comprising the following steps: inputting N pieces of historical data X and M pieces of missing data X_missing, where M and N are both greater than the input sequence length T; computing the filling threshold η by inputting the historical data into the trained LSTM-AEs and setting η = std(X − X'); obtaining the trained model LSTM-AEs and the repaired data X_repaired; dividing the original data into time-series datasets; initializing LSTM-AEs and then initializing the network with TensorFlow; updating the weights W of LSTM-AEs with the back-propagation algorithm commonly used for neural networks; and filling the missing values. Because the method considers temporal and spatial information simultaneously, it remains relatively robust when many sensors are missing at once, a single trained model can handle different types of missingness, and the method meets the real-time requirements of sensor missing-value filling.

Description

Filling method of sensor missing value fusing time-space information
Technical Field
The invention belongs to the field of equipment health management and in particular relates to a filling method for sensor missing values that fuses spatio-temporal information.
Background
Previous missing-value filling methods exploit only the associations in the spatial dimension of the data and make no use of temporal information. They perform poorly when the data have multidimensional missingness and cannot be used at all in the presence of block missingness. Furthermore, previous work assumes that the locations of the missing values are known in advance and builds a model for a single missing pattern, but in a real-time system the locations of missing values are not known. In that setting, to cope with every possible missing pattern in real time, the number of models that must be trained grows exponentially with the number of sensors, which makes such methods impractical to apply.
Disclosure of Invention
To solve these problems, the invention provides a filling method for sensor missing values that fuses spatio-temporal information by combining a deep autoencoder with a long short-term memory (LSTM) network.
The filling method of sensor missing values fusing spatio-temporal information provided by the invention comprises the following steps:
inputting N pieces of historical data X and M pieces of missing data X_missing, where M and N are both greater than the input sequence length T;
computing the filling threshold η: after the historical data are input into the trained LSTM-AEs, η = std(X − X'), where X is the test data, X' is the model output data and std is the standard deviation; obtaining the trained model LSTM-AEs and the repaired data X_repaired;
dividing the original data into time-series datasets;
initializing LSTM-AEs: constructing a multi-layer self-encoding neural network with the TensorFlow deep-learning framework, where the neurons are LSTM cells, the number of neurons in the first layer equals the number of sensors, and the number of neurons in the intermediate encoding layer is the smallest dimension for which principal component analysis retains more than 99% of the information in the historical data X after dimension reduction; then initializing the network with TensorFlow;
updating the weights W of LSTM-AEs with the back-propagation algorithm commonly used for neural networks;
and filling the missing values.
In the above method, between initializing LSTM-AEs and updating the weights W, the method further comprises computing a reconstruction error:
L_rec = ||x_j^T - x_j'^T||^2,
where x_j^T and x_j'^T are respectively the sensor data at time T in X_j and X_j'.
In the above method, between initializing LSTM-AEs and updating the weights W, the method further comprises computing a regularization-term error:
L_reg = ||x_j'^T - x_{j-1}'^T||^2 + ||x_{j+1}'^T - x_j'^T||^2,
where x_j'^T, x_{j+1}'^T and x_{j-1}'^T respectively denote the data at time T in X_j', X_{j+1}' and X_{j-1}'.
In the above method, between initializing LSTM-AEs and updating the weights W, the method further comprises computing a loss term:
L = L_rec + θ·L_reg,
where the whole neural network is regarded as a function h_{W,b}(X^T) of the weights W, the bias term b and the input X^T, the gradient is obtained by taking the partial derivative ∂L/∂W, and θ is the regularization parameter.
The method has the advantage that it mines the spatial and temporal information in the data simultaneously for missing-value filling, which greatly improves filling accuracy under multidimensional missingness. Moreover, because the autoencoder recovers all sensors at once, one model can be trained for different missing patterns, greatly reducing model-training complexity and data requirements. Smoothness regularization further improves the prediction accuracy and robustness of the model, and the shared-weight strategy lets the model converge faster, reducing the training complexity of the algorithm.
The existence of missing values is a serious hidden danger to the safe and stable operation of much large-scale equipment, power plants in particular, and the accuracy and robustness of existing methods fall short of what practical missing-value filling requires. On data from a gas turbine in actual operation, the algorithm provided by the invention improves the accuracy of classical algorithms by more than 60% and is markedly more robust under multidimensional missingness.
The invention can be widely applied to the health management system of the large-scale power device so as to realize the stable operation of the health management system.
Drawings
FIG. 1 shows a training flow diagram for LSTM-AE and LSTM-AEs.
Detailed Description
The following examples will enable those skilled in the art to more fully understand the present invention and are not intended to limit the same in any way.
In order to use both the temporal information and the spatial information in the sensor data, the neurons of the autoencoder are replaced with the cell structure of a long short-term memory network, so that the autoencoder can mine temporal and spatial information simultaneously.
In addition, because the main training target is to recover the sensor data at the current time, the intermediate feature layer does not need an LSTM layer and can use ordinary neurons, which reduces model complexity. The input to the model during training is data in matrix form, with the sensors along the horizontal axis and the time axis along the vertical axis.
Because time-series data are smooth, that is, the change between adjacent records is not too large, smoothness regularization is introduced into the model, which also reduces the risk of overfitting. Since adding the regularization term makes the model more complex and harder to solve, a weight-sharing strategy is introduced to avoid solving the regularization loss jointly: the data before and after the current time are input into the same model to obtain reconstructions, and the regularization loss is computed directly from the three reconstructed values.
Missing values are detected by a threshold test. In general, the residual between a record and its reconstruction is large at missing positions, so if the residual exceeds the threshold, the model's reconstructed value replaces the recorded value, achieving automatic filling of missing values.
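This threshold rule can be sketched in a few lines of NumPy (an illustrative sketch, not the patent's implementation; the function name and variable names are hypothetical): η is the standard deviation of the reconstruction residual on clean historical data, and any entry whose residual exceeds η is treated as missing and overwritten with the model's reconstruction.

```python
import numpy as np

def fill_by_threshold(x, x_recon, eta):
    """Replace entries whose reconstruction residual exceeds eta
    with the model's reconstructed value (suspected missing)."""
    x = np.asarray(x, dtype=float)
    x_recon = np.asarray(x_recon, dtype=float)
    mask = np.abs(x - x_recon) > eta     # suspected missing/corrupted entries
    filled = np.where(mask, x_recon, x)  # keep trusted entries, fill the rest
    return filled, mask

# eta is estimated on clean history as std(X - X'), per the description
clean = np.array([1.0, 2.0, 3.0])
recon = np.array([1.1, 2.1, 2.9])
eta = float(np.std(clean - recon))
```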
The invention will be better understood by reference to the following examples.
Input: n pieces of history data X; m pieces of missing data X missing The method comprises the steps of carrying out a first treatment on the surface of the Wherein M, N must be greater than the input timing length T; for sensor data, the T value is generally more than 200; filling threshold: η, input the history data into trained LSTM-AE S After that, η=std(X-X '), wherein X is test data, X' is model output data, and std is standard deviation.
Output: the trained model LSTM-AEs; the repaired data X_repaired.
Training:
1) Divide the raw data into time-series datasets:
for i = 0 to N - T do
    X_i = X(X_{i+1}, ..., X_{i+T})
end
X_train = {X_1, X_2, ..., X_{N-T}}
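The dataset division above amounts to a sliding window over the record matrix; a small NumPy sketch (function name hypothetical):

```python
import numpy as np

def make_windows(X, T):
    """Split an (N, n_sensors) record matrix into N - T overlapping
    length-T windows: windows[i] = X[i : i + T]."""
    X = np.asarray(X)
    N = X.shape[0]
    if N <= T:
        raise ValueError("need more records than the window length T")
    return np.stack([X[i:i + T] for i in range(N - T)])
```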
2) Train the model:
Initialize LSTM-AEs: construct a multi-layer self-encoding neural network with the TensorFlow deep-learning framework, where the neurons are LSTM cells, the number of neurons in the first layer equals the number of sensors, and the number of neurons in the intermediate encoding layer is the smallest dimension for which principal component analysis retains more than 99% of the information in the historical data X after dimension reduction; then initialize the network with TensorFlow.
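Choosing the encoding-layer width via PCA, as described, can be sketched with NumPy alone: take the smallest number of principal components whose cumulative explained variance reaches 99%. This is an illustrative sketch under that reading, not the patent's code.

```python
import numpy as np

def encoding_dim(X, retain=0.99):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `retain` (99% in the description)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each sensor channel
    # singular values of the centered data give the component variances
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    ratio = np.cumsum(var) / var.sum()
    return int(np.searchsorted(ratio, retain) + 1)
```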
for j = 1 to N - T do
    input X_{j-1}, X_j, X_{j+1} into LSTM-AEs and obtain the outputs X_{j-1}', X_j', X_{j+1}';
    compute the reconstruction error:
        L_rec = ||x_j^T - x_j'^T||^2,
    where x_j^T and x_j'^T are respectively the sensor data at time T in X_j and X_j';
    compute the regularization-term error:
        L_reg = ||x_j'^T - x_{j-1}'^T||^2 + ||x_{j+1}'^T - x_j'^T||^2,
    where x_j'^T, x_{j+1}'^T and x_{j-1}'^T respectively denote the data at time T in X_j', X_{j+1}' and X_{j-1}';
    compute the loss term:
        L = L_rec + θ·L_reg,
    where the whole neural network is regarded as a function h_{W,b}(X^T) of the weights W, the bias term b and the input X^T, the gradient is obtained as the partial derivative ∂L/∂W, and θ is the regularization parameter, generally smaller than 0.1 (larger values make the trained model's predictions too smooth and degrade the results);
    update the weights W of LSTM-AEs with the back-propagation algorithm commonly used for neural networks;
end
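The training loss above can be illustrated with plain NumPy. The exact equations are rendered as images in the original document, so this sketch assumes the standard squared-error forms suggested by the surrounding text: L_rec on the time-T record, L_reg on squared differences of adjacent reconstructions, and total loss L = L_rec + θ·L_reg. Function names are hypothetical.

```python
import numpy as np

def reconstruction_loss(x_t, x_t_recon):
    """Squared error between the original and reconstructed sensor
    vectors at the target time step T."""
    return float(np.sum((np.asarray(x_t) - np.asarray(x_t_recon)) ** 2))

def smoothness_loss(x_prev_recon, x_t_recon, x_next_recon):
    """Penalize reconstructions that jump between adjacent windows:
    adjacent records in a sensor stream should change smoothly."""
    p, t, n = map(np.asarray, (x_prev_recon, x_t_recon, x_next_recon))
    return float(np.sum((t - p) ** 2) + np.sum((n - t) ** 2))

def total_loss(x_t, x_prev_recon, x_t_recon, x_next_recon, theta=0.05):
    # theta is generally smaller than 0.1 per the description;
    # larger values over-smooth the predictions
    return reconstruction_loss(x_t, x_t_recon) + theta * smoothness_loss(
        x_prev_recon, x_t_recon, x_next_recon)
```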
Missing-value filling:
for k = 1 to M - T do
    X_k = X_missing(X_{k+1}, ..., X_{k+T});
    input X_k into the trained LSTM-AEs and obtain the output X_k';
    for each sensor p: if |x_k'^{T,p} - x_k^{T,p}| > η, replace the recorded value with the predicted value x_k'^{T,p};
    here x_k'^T denotes the prediction of the output X_k' at time T, x_k'^{T,p} is its predicted value for the p-th sensor, and η is the standard deviation of the difference between predicted and true values during training;
end
FIG. 1 shows a model training flow diagram of the present invention. The algorithm flow disclosed by the invention may be better understood with reference to fig. 1.
The sensor missing-value filling algorithm fusing spatio-temporal information combines a deep autoencoder with a long short-term memory network, integrating the feature-extraction ability of the former with the temporal-feature mining ability of the latter in a single deep network that is optimized jointly; combining the spatial and temporal information makes missing-value filling more accurate. The method also exploits the smoothness of sensor time series, namely that adjacent records should change smoothly, and adds smoothness regularization to the model, further improving the filling accuracy. In addition, the weight-sharing mechanism simplifies the computation of the regularization term and also speeds up model training to a certain extent. Because the method considers temporal and spatial information simultaneously, it remains relatively robust when many sensors are missing at once, achieves higher filling accuracy than prior methods, handles different types of missingness with a single trained model, and meets the real-time requirements of sensor missing-value filling.
In summary, the method of the invention fuses the deep autoencoder and the long short-term memory network into one deep network for joint optimization. Adding smoothness regularization to the model further improves the accuracy with which the algorithm fills missing values. Finally, by feeding the data at different times through a shared network and computing the regularization loss directly from the resulting reconstructions, the weight-sharing mechanism simplifies the computation of the regularization term and accelerates training.
It should be understood by those skilled in the art that the above embodiments are exemplary embodiments only and that various changes, substitutions, and alterations can be made hereto without departing from the spirit and scope of the present application.

Claims (4)

1. A method of filling sensor missing values that fuse spatio-temporal information, comprising:
inputting N pieces of historical data X and M pieces of missing data X_missing, where M and N are both greater than the input sequence length T;
computing the filling threshold η: after the historical data are input into the trained LSTM-AEs, η = std(X − X'), where X is the test data, X' is the model output data and std is the standard deviation; obtaining the trained model LSTM-AEs and the repaired data X_repaired;
dividing the N pieces of historical data X into time-series datasets;
initializing LSTM-AEs: constructing a multi-layer self-encoding neural network with the TensorFlow deep-learning framework, where the neurons are LSTM cells, the number of neurons in the first layer equals the number of sensors, and the number of neurons in the intermediate encoding layer is the smallest dimension for which principal component analysis retains more than 99% of the information in the historical data X after dimension reduction; then initializing the network with TensorFlow;
updating the weights W of LSTM-AEs with the back-propagation algorithm commonly used for neural networks;
inputting the M pieces of missing data X_missing into the trained LSTM-AEs and filling the missing values.
2. The method of claim 1, further comprising, between initializing LSTM-AEs and updating the weights W, computing a reconstruction error:
L_rec = ||x_j^T - x_j'^T||^2,
where x_j^T and x_j'^T are respectively the sensor data at time T in X_j and X_j', and j ranges from 1 to N - T.
3. The method of claim 2, further comprising, between initializing LSTM-AEs and updating the weights W, computing a regularization-term error:
L_reg = ||x_j'^T - x_{j-1}'^T||^2 + ||x_{j+1}'^T - x_j'^T||^2,
where x_j'^T, x_{j+1}'^T and x_{j-1}'^T respectively denote the data at time T in X_j', X_{j+1}' and X_{j-1}', and j ranges from 1 to N - T.
4. The method of claim 3, further comprising, between initializing LSTM-AEs and updating the weights W, computing a loss term:
L = L_rec + θ·L_reg,
where the whole neural network is regarded as a function h_{W,b}(X^T) of the weights W, the bias term b and the input X^T, the gradient is obtained by taking the partial derivative ∂L/∂W, and θ is the regularization parameter.
CN202010374180.5A 2020-05-06 2020-05-06 Filling method of sensor missing value fusing time-space information Active CN111597175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010374180.5A CN111597175B (en) 2020-05-06 2020-05-06 Filling method of sensor missing value fusing time-space information


Publications (2)

Publication Number Publication Date
CN111597175A CN111597175A (en) 2020-08-28
CN111597175B true CN111597175B (en) 2023-06-02

Family

ID=72182558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010374180.5A Active CN111597175B (en) 2020-05-06 2020-05-06 Filling method of sensor missing value fusing time-space information

Country Status (1)

Country Link
CN (1) CN111597175B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112948743B (en) * 2021-03-26 2022-05-03 重庆邮电大学 Coal mine gas concentration deficiency value filling method based on space-time fusion
CN113297191B (en) * 2021-05-28 2022-04-05 湖南大学 Stream processing method and system for network missing data online filling
CN113554105B (en) * 2021-07-28 2023-04-18 桂林电子科技大学 Missing data completion method for Internet of things based on space-time fusion
CN116611717B (en) * 2023-04-11 2024-03-19 南京邮电大学 Filling method of fusion auxiliary information based on explicit and implicit expression

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107273429A (en) * 2017-05-19 2017-10-20 哈工大大数据产业有限公司 A kind of Missing Data Filling method and system based on deep learning
CN107392307A (en) * 2017-08-04 2017-11-24 电子科技大学 The Forecasting Methodology of parallelization time series data
CN108090558A (en) * 2018-01-03 2018-05-29 华南理工大学 A kind of automatic complementing method of time series missing values based on shot and long term memory network
CN108805193A (en) * 2018-06-01 2018-11-13 广东电网有限责任公司 A kind of power loss data filling method based on mixed strategy


Non-Patent Citations (1)

Title
DeepTAL: Deep Learning for TDOA-Based Asynchronous Localization Security With Measurement Error and Missing Data; Yuan Xue et al.; IEEE Access, Volume 7; full text *

Also Published As

Publication number Publication date
CN111597175A (en) 2020-08-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant