CN114065919A - Deficiency value completion method and medium based on generation countermeasure network - Google Patents
Deficiency value completion method and medium based on generation countermeasure network
- Publication number
- CN114065919A
- Authority
- CN
- China
- Prior art keywords
- matrix
- data
- generator
- training
- historical data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a missing value completion method and medium based on a generative adversarial network (GAN). The method comprises the following steps: collecting incomplete historical data to obtain a training set; iteratively training the generator and the discriminator offline on the training set; determining the adversarial network model by a cross-validation method and obtaining the optimal generator with the minimum error; and performing missing value inference on newly acquired data with the optimal generator to reconstruct complete data. The invention relaxes the requirement of data-driven methods for a large amount of complete historical data and avoids the assumptions that traditional missing value inference methods place on the data structure. It mainly addresses the degradation of data quality caused by data loss due to collection equipment faults, insufficient coverage of collection equipment, operator errors and the like.
Description
Technical Field
The invention belongs to the field of data cleaning, and particularly relates to a missing value completion method and medium based on a generative adversarial network.
Background
As an important production factor, data is the basis of scientific analysis and of numerous applications, and its quality directly affects model performance and final results. Only complete and accurate data yield reliable results; missing or abnormal data can lead to wrong conclusions. Data loss is nevertheless inevitable for many reasons, such as collection equipment failure, insufficient coverage of collection equipment and operator error. Data cleaning is therefore an indispensable step in the whole data analysis process, and missing value completion is an important component of data cleaning.
The missing value completion methods in common use today are mainly mean filling, maximum/minimum filling and the like, but these methods are too simple and their estimates usually differ greatly from the true values. In many applications data appear in matrix form, so interpolation methods such as compressive sensing, matrix completion, Gaussian process regression and neural networks are also used to infer missing values. These methods, however, usually assume that the data have particular properties: compressive sensing assumes sparsity, matrix completion assumes low rank, Gaussian process regression assumes a Gaussian distribution, and data-driven methods such as neural networks usually require a large amount of complete training data. Such assumptions are often hard to satisfy in real data sets, where usually only a large amount of incomplete historical data is available.
Disclosure of Invention
The invention aims to provide a missing value completion method and medium based on a generative adversarial network, which address the degradation of data quality caused by data loss due to collection equipment faults, insufficient collection equipment coverage, operator errors and the like, require no complete training data, and complete missing values more accurately.
The technical solution for realizing the purpose of the invention is as follows: a missing value completion method based on a generative adversarial network, comprising the steps of:
collecting incomplete historical data to obtain a training set;
training the generator and the discriminator offline based on the training set;
determining and iteratively training the adversarial network model by a cross-validation method, and obtaining the optimal generator with the minimum error;
performing missing value inference on newly acquired data based on the optimal generator, and reconstructing complete data.
Further, training the generator offline based on the training set comprises the steps of:
flattening the data in the training set at each time instant into a one-dimensional vector to form a historical data matrix whose rows are consecutive time instants and whose columns are the corresponding data, and generating a corresponding indication matrix that identifies whether each entry is missing;
normalizing the historical data matrix, and replacing the missing values in it with random variables produced by a Gaussian distribution generator;
training the generator with the completed historical data matrix as its input, the generator outputting a complete historical data matrix.
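The preprocessing steps above can be sketched in NumPy. This is a minimal illustration, not the patent's implementation: the function name `preprocess`, the use of NaN to mark missing entries, global min-max normalization over the known entries, and the noise standard deviation `noise_std` are all assumptions, since the source does not reproduce its normalization formula or noise parameters.

```python
import numpy as np

def preprocess(history, rng, noise_std=0.01):
    """Flatten, build the 0-1 indication matrix, normalize, fill gaps with noise.

    history: array of shape (T, ...) with NaN marking missing entries (assumed).
    Returns the noise-filled matrix, the indication matrix and the scaling constants.
    """
    n = history.shape[0]
    # One row per time instant, one column per flattened data position.
    m_his = np.asarray(history, dtype=float).reshape(n, -1)
    # Indication matrix: 1 = known value, 0 = missing value.
    omega_his = (~np.isnan(m_his)).astype(float)
    known = omega_his == 1
    # Assumed min-max normalization using known entries only.
    lo, hi = m_his[known].min(), m_his[known].max()
    scale = hi - lo if hi > lo else 1.0
    # Gaussian random variables stand in for the missing entries.
    noise = rng.normal(0.0, noise_std, size=m_his.shape)
    m_tilde = np.where(known, (m_his - lo) / scale, noise)
    return m_tilde, omega_his, (lo, scale)

rng = np.random.default_rng(0)
maps = rng.uniform(-90.0, -30.0, size=(5, 24, 20))  # placeholder values
maps[0, 0, 0] = np.nan
maps[3, 10, 7] = np.nan
m_tilde, omega_his, (lo, scale) = preprocess(maps, rng)
```

The scaling constants are returned so that the same transformation can be inverted after completion.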
Further, training the discriminator offline based on the training set specifically comprises: randomly generating a clue matrix from the indication matrix and a preset clue rate, training the discriminator with the indication matrix and the clue matrix as its inputs, and having the discriminator generate a corresponding predicted history indication matrix.
Further, the clue matrix is:
$H = B \odot \Omega_{his} + 0.5\,(1 - B)$
where $B$ is a random 0-1 matrix generated according to the clue rate $h$, $\Omega_{his}$ is the indication matrix, and $\odot$ denotes the element-wise product.
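A short NumPy sketch of the clue matrix may help. It assumes that each entry of B is drawn independently and equals 1 with probability equal to the clue rate; the function name `make_clue_matrix` is hypothetical.

```python
import numpy as np

def make_clue_matrix(omega_his, clue_rate, rng):
    """Compute H = B * Omega_his + 0.5 * (1 - B), element-wise."""
    # B[i, j] = 1 with probability clue_rate, otherwise 0 (assumed sampling).
    b = (rng.random(omega_his.shape) < clue_rate).astype(float)
    # Where B = 1 the true indicator is revealed; where B = 0 the entry is 0.5.
    return b * omega_his + 0.5 * (1.0 - b)

rng = np.random.default_rng(0)
omega_his = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0]])
clue = make_clue_matrix(omega_his, clue_rate=0.9, rng=rng)
```

An entry of 0.5 tells the discriminator nothing about whether that position was observed, so the clue rate controls how much of the indication matrix is revealed.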
Further, determining the adversarial network model by the cross-validation method specifically comprises: using an 80-20 cross-validation method, selecting as the model structure of the adversarial network the number of layers and the activation function of the generator and the discriminator for which the total error on the test data set is minimal.
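The selection procedure can be sketched as follows, assuming that "80-20" means a single random split with 80% of the rows used for training and 20% for testing. Both function names and the `evaluate` callback (which would train a candidate structure and return its test error) are hypothetical.

```python
import numpy as np

def split_80_20(n_rows, rng):
    """Return (train_idx, test_idx) for a random 80/20 split of the rows."""
    idx = rng.permutation(n_rows)
    cut = (n_rows * 4) // 5
    return idx[:cut], idx[cut:]

def select_structure(candidates, evaluate):
    """Pick the candidate whose test error, as reported by evaluate, is smallest."""
    errors = [evaluate(c) for c in candidates]
    return candidates[int(np.argmin(errors))]

rng = np.random.default_rng(0)
train_idx, test_idx = split_80_20(16000, rng)
```

In use, `candidates` would enumerate combinations such as layer counts and activation functions, and `evaluate` would train on `train_idx` and score on `test_idx`.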
Further, the adversarial network model is given by a formula [not reproduced in this text],
whose terms are the inverse indication matrix, the inverse historical data matrix, the clue matrix H, the discriminator, the generator, and a joint probability distribution.
Further, the adversarial network model is divided into two sub-models, V1 and V2 [formulas not reproduced in this text],
which are built from two cross-entropy loss functions and a mean square error function weighted by a loss hyperparameter α; the remaining terms are the history indication matrix predicted by the discriminator, the complete historical data matrix generated by the generator, and the historical data matrix completed by Gaussian variable interpolation;
during training, the generator is first fixed and the discriminator is trained according to V1; the discriminator is then fixed and the generator is trained according to V2, until the iteration converges;
the optimal generator with the minimum error is obtained as follows: training is carried out on several training sets, and the generator with the minimum error is selected as the optimal generator.
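Because the source formulas did not survive, the following is only a plausible reconstruction consistent with the verbal description (two cross-entropy terms, one mean square error term weighted by α), written in the form commonly used for GAN-based imputation. All notation is assumed: $\Omega_{his}$ is the indication matrix (1 = known), $\hat{\Omega}_{his}$ the discriminator's prediction, $\tilde{M}_{his}$ the noise-filled matrix, $\hat{M}_{his}$ the generator output, and $\odot$ the element-wise product. The patent's own formulas may differ, for example by using the inverse indication matrix.

```latex
\begin{align}
V_1 &= -\,\mathbb{E}\!\left[\Omega_{his}\odot\log\hat{\Omega}_{his}
      + (1-\Omega_{his})\odot\log\bigl(1-\hat{\Omega}_{his}\bigr)\right] \\
V_2 &= -\,\mathbb{E}\!\left[(1-\Omega_{his})\odot\log\hat{\Omega}_{his}\right]
      + \alpha\,\mathbb{E}\!\left[\Omega_{his}\odot\bigl(\tilde{M}_{his}-\hat{M}_{his}\bigr)^{2}\right]
\end{align}
```

Under this reading, minimizing $V_1$ teaches the discriminator to tell known entries from completed ones, while minimizing $V_2$ pushes the generator both to fool the discriminator on missing entries and to reproduce the known entries.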
Further, the indication matrix is a 0-1 matrix in which 0 represents a missing value and 1 represents a known value; the generator and the discriminator are both two-layer fully connected neural networks.
Further, performing missing value inference on the acquired data based on the optimal generator and reconstructing complete data specifically comprises the following steps:
normalizing the incomplete data matrix acquired this time, and replacing the missing values in the matrix with random variables drawn from a Gaussian distribution to generate a corresponding temporary interpolation matrix;
feeding the temporary interpolation matrix to the optimal generator model to generate a temporary data matrix;
replacing the values at the originally missing positions of the temporary interpolation matrix with the data at the corresponding positions of the temporary data matrix, generating a final interpolation matrix;
applying inverse normalization to the final interpolation matrix to generate the final complete data matrix.
A computer storage medium storing an executable program which, when executed by a processor, implements the steps of the missing value completion method.
Compared with the prior art, the invention has the following beneficial effects: the method makes full use of the large amount of incomplete historical data collected by a wireless sensor network and uses a generative adversarial network to extract the temporal characteristics of the historical data, which then guide missing value inference during data collection so that complete data are reconstructed effectively. Compared with traditional missing value completion algorithms, the method achieves better reconstruction accuracy and requires no assumptions about the data; compared with traditional data-driven algorithms, it does not need a large amount of complete historical data and completes missing values more accurately.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a flow chart of the method of the present invention.
FIG. 3 is a structure diagram of the generative adversarial network of the present invention.
FIG. 4 is a diagram of simulation results of the method of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and the embodiments.
The invention provides a missing value completion method based on a generative adversarial network, which mainly addresses the degradation of data quality caused by data loss due to collection equipment faults, insufficient collection equipment coverage, operator errors and the like.
As shown in Fig. 1, the method comprises two stages: an offline training stage and an online reconstruction stage. This embodiment is illustrated with crowdsourced signal data, using simulated data. The simulation space measures 24 × 20 × 4 m³ and is divided into grid cells of 1 × 1 m², forming a 24 × 20 grid, with the transmitter at position [15, 12, 1]. The simulated signals are assigned to grid cells according to their coordinates, generating 16000 signal maps of size 24 × 20 in total. To simulate the incompleteness of historical signal maps, 50% of the grid cells in each signal map are randomly selected and set to None (the default value for a signal that was not collected), finally yielding a historical signal map of dimensions 16000 × 24 × 20.
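The masking step of this simulation can be sketched as below. The signal values are random placeholders, because the source does not give the propagation model, and NaN is used where the text uses None; the function name `mask_random_cells` is hypothetical.

```python
import numpy as np

def mask_random_cells(signal_maps, rng, missing_fraction=0.5):
    """Set a fixed fraction of randomly chosen cells in every map to NaN."""
    n, rows, cols = signal_maps.shape
    cells = rows * cols
    n_missing = int(round(missing_fraction * cells))
    # np.array copies, so the caller's data is left untouched.
    flat = np.array(signal_maps, dtype=float).reshape(n, cells)
    # Sorting random keys gives an independent random permutation per map.
    drop = rng.random((n, cells)).argsort(axis=1)[:, :n_missing]
    np.put_along_axis(flat, drop, np.nan, axis=1)
    return flat.reshape(n, rows, cols)

rng = np.random.default_rng(0)
# 100 maps here for brevity; the embodiment uses 16000.
maps = rng.uniform(-90.0, -30.0, size=(100, 24, 20))  # placeholder signal values
masked = mask_random_cells(maps, rng)
```

Each 24 × 20 map has 480 cells, so a 50% missing rate removes exactly 240 cells per map.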
With reference to Fig. 2, in the offline training stage a large amount of incomplete historical data collected by sensors is used to train the generative adversarial network model. First, the data collected by the sensing nodes at each time instant are flattened into a one-dimensional vector, forming a historical data matrix whose rows are time instants and whose columns are sensing node indices, and a corresponding 0-1 indication matrix is generated, in which 0 indicates a missing value and 1 indicates a known value. Second, the historical data matrix is normalized, the missing values in it are replaced with random variables produced by a Gaussian distribution generator, and the resulting matrix is used as the input for training the generator. Next, a clue matrix is randomly generated from the indication matrix and the clue rate, and the discriminator is trained with the indication matrix and the clue matrix as its inputs. The generator and the discriminator are trained iteratively until the model converges. At the end of offline training, a cross-validation method selects the hyperparameters of the generative adversarial network (the network structure, such as the number of hidden layers and the number of neurons in each layer, the clue rate and so on) that yield the model with the minimum test error. The method specifically comprises the following steps:
first, the 16000 signal maps of size 24 × 20 are flattened into a 16000 × 480 historical data matrix $M_{his}$, and a corresponding indication matrix $\Omega_{his}$ is generated; it is a 0-1 matrix in which 0 denotes a missing value and 1 denotes a known value;
the historical data matrix $M_{his}$ is normalized according to a normalization formula [not reproduced in this text], and the missing values in it are replaced with a random variable Z produced by a Gaussian distribution generator; the resulting matrix is then used as the input for training the generator, which produces fake data resembling the real data, and the final complete historical data matrix is computed so that the generator can be trained according to V2;
using the indication matrix $\Omega_{his}$ and the predetermined hyperparameter clue rate $h$, a clue matrix $H = B \odot \Omega_{his} + 0.5\,(1 - B)$ is randomly generated, where $B$ is a random 0-1 matrix generated according to the clue rate $h$ and $\odot$ denotes the element-wise product; the discriminator is trained with the indication matrix and the clue matrix as its inputs, judges whether each datum is real or fake data produced by the generator, and generates a predicted history indication matrix, so that the discriminator and the generator can be trained according to V1 and V2 [model formula not reproduced in this text];
where the inverse history indication matrix is a 0-1 matrix in which 1 denotes a missing value and 0 denotes a known value, and the remaining terms are the inverse historical data matrix, the clue matrix H, the discriminator, the generator, and a joint probability distribution. To simplify computation, the model is in practice converted into two sub-formulas, V1 and V2. During training, the generator is first fixed and the discriminator is trained according to V1; the discriminator is then fixed and the generator is trained according to V2, until the iterations converge. The sub-formulas are built from two cross-entropy loss functions and a mean square error function; α is a loss hyperparameter (any positive number) that can be obtained by training and is set to 700 in the experiment. The other terms are the history indication matrix predicted by the discriminator, the final complete historical data matrix generated by the generator, and the historical temporary interpolation matrix obtained by Gaussian variable interpolation.
As shown in Fig. 3, an 80-20 cross-validation method is used to select the generative adversarial network structure, namely the number of layers and the activation function of the generator and the discriminator, that minimizes the total test error. Both the generator and the discriminator are two-layer fully connected neural networks; the selected activation function outputs its input unchanged, and the initial weights are randomly initialized.
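To make the shapes and the two losses concrete, here is a forward-pass sketch in NumPy. It is illustrative only and makes several assumptions that the source does not confirm: the hidden size of 64, the ReLU and sigmoid activations (chosen here so that outputs lie in (0, 1)), feeding the discriminator the completed matrix concatenated with the clue matrix, and the exact form of the losses. No gradient step is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_two_layer(n_in, n_hidden, n_out, rng):
    """Randomly initialised weights for a two-layer fully connected network."""
    return {
        "w1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def forward(params, x):
    hidden = np.maximum(0.0, x @ params["w1"] + params["b1"])  # ReLU (assumed)
    return sigmoid(hidden @ params["w2"] + params["b2"])       # values in (0, 1)

def loss_v1(omega, omega_pred, eps=1e-8):
    """Cross-entropy between the true and predicted indication matrices."""
    ce = omega * np.log(omega_pred + eps) + (1 - omega) * np.log(1 - omega_pred + eps)
    return -np.mean(ce)

def loss_v2(omega, omega_pred, m_tilde, m_gen, alpha, eps=1e-8):
    """Adversarial term on missing entries plus alpha times MSE on known entries."""
    adversarial = -np.mean((1 - omega) * np.log(omega_pred + eps))
    mse_known = np.sum(omega * (m_tilde - m_gen) ** 2) / max(np.sum(omega), 1.0)
    return adversarial + alpha * mse_known

rng = np.random.default_rng(0)
n, d = 8, 480                                    # 8 rows of a 24 x 20 map
omega = (rng.random((n, d)) < 0.5).astype(float)
m_tilde = np.where(omega == 1, rng.random((n, d)), rng.normal(0.0, 0.01, (n, d)))
b = (rng.random((n, d)) < 0.9).astype(float)     # clue rate 0.9 (assumed)
clue = b * omega + 0.5 * (1.0 - b)

gen = init_two_layer(d, 64, d, rng)
dis = init_two_layer(2 * d, 64, d, rng)
m_gen = forward(gen, m_tilde)
m_hat = omega * m_tilde + (1 - omega) * m_gen    # keep known entries
omega_pred = forward(dis, np.concatenate([m_hat, clue], axis=1))
v1 = loss_v1(omega, omega_pred)
v2 = loss_v2(omega, omega_pred, m_tilde, m_gen, alpha=700.0)
```

A training loop would alternate between updating `dis` to reduce `v1` with `gen` fixed and updating `gen` to reduce `v2` with `dis` fixed; an automatic differentiation library would normally supply the gradients.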
In the online reconstruction stage, the generator model obtained by offline training is used to infer the missing values of the sparse, incomplete data collected, so as to reconstruct complete data. First, the incomplete data matrix collected this time is normalized, and the missing values in it are replaced with random variables drawn from a Gaussian distribution to generate a corresponding temporary interpolation matrix. The temporary interpolation matrix is then fed to the trained generator model, which generates a temporary data matrix. Next, the missing positions of the collected incomplete data matrix are filled with the data at the corresponding positions of the temporary data matrix, generating a final interpolation matrix. Finally, inverse normalization is applied to the final interpolation matrix to generate the final complete data matrix. The specific steps are as follows:
the incomplete data matrix M collected this time is normalized, and the missing values in the matrix are replaced with random variables drawn from a Gaussian distribution to generate the corresponding temporary interpolation matrix;
the temporary interpolation matrix is fed to the trained generator model, which generates a temporary data matrix;
the missing positions of the collected incomplete data matrix are filled with the data at the corresponding positions of the temporary data matrix, generating the final interpolation matrix;
inverse normalization is applied to the final interpolation matrix to generate the final complete data matrix.
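The four reconstruction steps can be sketched as a single function. The `generator` argument is any callable mapping a matrix to a matrix of the same shape (a stand-in for the trained model); min-max normalization computed from the known entries and the noise scale are assumptions, and in practice the constants saved during training would likely be reused.

```python
import numpy as np

def reconstruct(m_obs, generator, rng, noise_std=0.01):
    """Fill the NaN entries of m_obs using a trained generator (any callable)."""
    known = ~np.isnan(m_obs)
    # Step 1: normalize (assumed min-max) and fill gaps with Gaussian noise.
    lo, hi = m_obs[known].min(), m_obs[known].max()
    scale = hi - lo if hi > lo else 1.0
    noise = rng.normal(0.0, noise_std, size=m_obs.shape)
    m_tilde = np.where(known, (m_obs - lo) / scale, noise)
    # Step 2: the generator produces a temporary data matrix.
    m_gen = generator(m_tilde)
    # Step 3: keep observed values, take generator output at missing positions.
    m_final = np.where(known, m_tilde, m_gen)
    # Step 4: inverse normalization.
    return m_final * scale + lo

rng = np.random.default_rng(0)
m_obs = np.array([[1.0, np.nan, 3.0],
                  [np.nan, 5.0, 2.0]])
# Dummy generator that predicts the mid-range value everywhere.
completed = reconstruct(m_obs, lambda x: np.full_like(x, 0.5), rng)
```

Only the missing positions are taken from the generator, so observed values pass through unchanged up to floating-point rounding.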
In this embodiment the method is compared with Bayesian compressed sensing (BCS), the matrix completion algorithm LMaFit, an autoencoder (AE), Gaussian process regression (GPR) and other algorithms; the curve labelled GAN corresponds to the present method, and the evaluation metric is the relative error. The experimental results are shown in Fig. 4: the present method has the lowest relative error, which verifies its accuracy in completing missing values.
Unlike existing missing value completion methods, the present method uses a generative adversarial network to learn the nonlinear characteristics and distribution present in a large amount of incomplete historical data, and infers the missing values of newly acquired incomplete data from them so as to obtain complete data. The invention therefore requires no complete training data and completes missing values more accurately.
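The source does not reproduce its definition of relative error. One common choice, assumed here, is the Frobenius norm of the error on the missing entries divided by the Frobenius norm of the true values on those entries; the function name is hypothetical.

```python
import numpy as np

def relative_error(truth, estimate, omega):
    """Norm of the error over missing entries, relative to the true values there."""
    missing = omega == 0
    diff = (estimate - truth)[missing]
    return np.linalg.norm(diff) / np.linalg.norm(truth[missing])

truth = np.array([[1.0, 2.0], [3.0, 4.0]])
estimate = np.array([[1.0, 2.5], [3.0, 4.0]])
omega = np.array([[1.0, 0.0], [1.0, 0.0]])  # second column was missing
err = relative_error(truth, estimate, omega)
```

Restricting the metric to missing entries avoids rewarding a method for values it was simply given.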
Claims (10)
1. A missing value completion method based on a generative adversarial network, comprising the steps of:
collecting incomplete historical data to obtain a training set;
training the generator and the discriminator offline based on the training set;
determining and iteratively training the adversarial network model by a cross-validation method, and obtaining the optimal generator with the minimum error;
performing missing value inference on newly acquired data based on the optimal generator, and reconstructing complete data.
2. The missing value completion method according to claim 1, wherein training the generator offline based on the training set comprises the steps of:
flattening the data in the training set at each time instant into a one-dimensional vector to form a historical data matrix whose rows are consecutive time instants and whose columns are the corresponding data, and generating a corresponding indication matrix that identifies whether each entry is missing;
normalizing the historical data matrix, and replacing the missing values in it with random variables produced by a Gaussian distribution generator.
3. The missing value completion method according to claim 2, wherein training the discriminator offline based on the training set specifically comprises: randomly generating a clue matrix from the indication matrix and a preset clue rate, training the discriminator with the indication matrix and the clue matrix as its inputs, and having the discriminator generate a corresponding predicted history indication matrix.
4. The missing value completion method according to claim 2, wherein the cue matrix is:
$H = B \odot \Omega_{his} + 0.5\,(1 - B)$
where $B$ is a random 0-1 matrix generated according to the clue rate $h$, $\Omega_{his}$ is the indication matrix, and $\odot$ denotes the element-wise product.
5. The missing value completion method according to claim 4, wherein determining the adversarial network model by the cross-validation method specifically comprises: using an 80-20 cross-validation method, selecting as the model structure of the adversarial network the number of layers and the activation function of the generator and the discriminator for which the total error on the test data set is minimal.
6. The missing value completion method according to claim 5, wherein the adversarial network model is given by a formula [not reproduced in this text],
whose terms are the inverse indication matrix, the inverse historical data matrix, the clue matrix H, the discriminator, the generator, and a joint probability distribution.
7. The missing value completion method according to claim 6, wherein the adversarial network model is divided into two sub-models, V1 and V2 [formulas not reproduced in this text],
which are built from two cross-entropy loss functions and a mean square error function weighted by a loss hyperparameter α; the remaining terms are the history indication matrix predicted by the discriminator, the complete historical data matrix generated by the generator, and the historical data matrix completed by Gaussian variable interpolation;
during training, the generator is first fixed and the discriminator is trained according to V1; the discriminator is then fixed and the generator is trained according to V2, until the iteration converges;
the optimal generator with the minimum error is obtained as follows: training is carried out on several training sets, and the generator with the minimum error is selected as the optimal generator.
8. The missing value completion method according to claim 2, wherein the indication matrix is a 0-1 matrix in which 0 represents a missing value and 1 represents a known value; the generator and the discriminator are both two-layer fully connected neural networks.
9. The missing value completion method according to claim 2, wherein performing missing value inference on the acquired data based on the optimal generator and reconstructing complete data specifically comprises the following steps:
normalizing the incomplete data matrix acquired this time, and replacing the missing values in the matrix with random variables drawn from a Gaussian distribution to generate a corresponding temporary interpolation matrix;
feeding the temporary interpolation matrix to the optimal generator model to generate a temporary data matrix;
replacing the values at the originally missing positions of the temporary interpolation matrix with the data at the corresponding positions of the temporary data matrix, generating a final interpolation matrix;
applying inverse normalization to the final interpolation matrix to generate the final complete data matrix.
10. A computer storage medium storing an executable program which, when executed by a processor, implements the steps of the missing value completion method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111360740.2A CN114065919A (en) | 2021-11-17 | 2021-11-17 | Deficiency value completion method and medium based on generation countermeasure network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111360740.2A CN114065919A (en) | 2021-11-17 | 2021-11-17 | Deficiency value completion method and medium based on generation countermeasure network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114065919A true CN114065919A (en) | 2022-02-18 |
Family
ID=80273390
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111360740.2A (Pending) CN114065919A (en) | Deficiency value completion method and medium based on generation countermeasure network | 2021-11-17 | 2021-11-17 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114065919A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115019510A (en) * | 2022-06-29 | 2022-09-06 | 华南理工大学 | Traffic data restoration method based on dynamic self-adaptive generation countermeasure network |
CN115905853A (en) * | 2022-09-05 | 2023-04-04 | 同济大学 | Aero-engine rotor system fault diagnosis method and device based on deep learning |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115019510A (en) * | 2022-06-29 | 2022-09-06 | 华南理工大学 | Traffic data restoration method based on dynamic self-adaptive generation countermeasure network |
CN115019510B (en) * | 2022-06-29 | 2024-01-30 | 华南理工大学 | Traffic data restoration method based on dynamic self-adaptive generation countermeasure network |
CN115905853A (en) * | 2022-09-05 | 2023-04-04 | 同济大学 | Aero-engine rotor system fault diagnosis method and device based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yang et al. | Fault diagnosis for energy internet using correlation processing-based convolutional neural networks | |
CN111860982A (en) | Wind power plant short-term wind power prediction method based on VMD-FCM-GRU | |
CN114065919A (en) | Deficiency value completion method and medium based on generation countermeasure network | |
CN112033463B (en) | Nuclear power equipment state evaluation and prediction integrated method and system | |
CN109828552B (en) | Intermittent process fault monitoring and diagnosing method based on width learning system | |
CN107704962B (en) | Steam flow interval prediction method based on incomplete training data set | |
CN110443724B (en) | Electric power system rapid state estimation method based on deep learning | |
CN101017373A (en) | Industrial process multiresolution softsensoring instrument and method thereof | |
CN112461543B (en) | Rotary machine fault diagnosis method based on multi-classification support vector data description | |
CN112414715B (en) | Bearing fault diagnosis method based on mixed feature and improved gray level symbiosis algorithm | |
EP4050518A1 (en) | Generation of realistic data for training of artificial neural networks | |
CN103678886B (en) | A kind of satellite Bayesian network health based on ground test data determines method | |
CN114707712A (en) | Method for predicting requirement of generator set spare parts | |
CN112560966A (en) | Polarimetric SAR image classification method, medium and equipment based on scattergram convolution network | |
CN113485261A (en) | CAEs-ACNN-based soft measurement modeling method | |
CN113255546A (en) | Diagnosis method for aircraft system sensor fault | |
CN110096730B (en) | Method and system for rapidly evaluating voltage of power grid | |
Wang | Research on the fault diagnosis of mechanical equipment vibration system based on expert system | |
WO2024087129A1 (en) | Generative adversarial multi-head attention neural network self-learning method for aero-engine data reconstruction | |
CN116757062A (en) | Power load frequency characteristic analysis method, device, equipment and storage medium | |
CN111192257A (en) | Method, system and equipment for determining equipment state | |
CN115830462A (en) | SAR image reconstruction method and device based on cycle consistency countermeasure network | |
CN115238736A (en) | Method for identifying early fault of rotary machine | |
CN115359197A (en) | Geological curved surface reconstruction method based on spatial autocorrelation neural network | |
CN114638421A (en) | Method for predicting requirement of generator set spare parts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||