CN109165664B - Attribute-missing data set completion and prediction method based on generation of countermeasure network - Google Patents

Info

Publication number
CN109165664B
CN109165664B CN201810722774.3A CN201810722774A CN109165664B CN 109165664 B CN109165664 B CN 109165664B CN 201810722774 A CN201810722774 A CN 201810722774A CN 109165664 B CN109165664 B CN 109165664B
Authority
CN
China
Prior art keywords
data
network
missing
filling
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810722774.3A
Other languages
Chinese (zh)
Other versions
CN109165664A (en)
Inventor
赵跃龙
王禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201810722774.3A
Publication of CN109165664A
Application granted
Publication of CN109165664B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a method for completing and predicting attribute-missing data sets based on a generative adversarial network (GAN), comprising the steps of: 1) applying minmax normalization to the data, one-hot encoding the discrete attributes, and marking missing values as 0; 2) creating a missing position encoding vector for each sample in the data set; 3) constructing a generative adversarial network and an auxiliary prediction network for data filling and label prediction; 4) restoring the filled attributes to their scale before minmax normalization using the recorded maximum and minimum values; 5) selecting suitable hyper-parameters through testing. The invention makes full use of the data distribution information and the label information in the data set and can effectively fill high-dimensional missing data sets. After training is finished, the auxiliary prediction network contained in the method can directly give the label prediction for input data with missing attributes. The process is simple and the prediction accuracy is relatively high.

Description

Attribute-missing data set completion and prediction method based on generation of countermeasure network
Technical Field
The invention relates to the technical field of data preprocessing, and in particular to a method for completing and predicting attribute-missing data sets based on a generative adversarial network (GAN).
Background
Missing attributes are common across data sets of all kinds and are generally caused by information loss during data acquisition or transmission. When a sample is missing one or more attributes, the accuracy of models subsequently used for prediction and classification degrades. How to complete the missing data, and how to use the information contained in samples with missing attributes to build a high-accuracy prediction model, is a key problem in data preprocessing.
Most statistical tools handle missing attributes either by deleting the rows and columns that contain them or by filling the missing positions with the column median or mean. Although such methods are efficient and convenient, they cannot make full use of the distribution of the sample data, which leads to inaccurate results. In multidimensional data, different attributes are often correlated, and these correlations provide additional information for data filling. A filling method that takes them into account estimates missing values with smaller deviation and can therefore mine the information contained in incomplete samples more deeply.
Building on this, a further class of data filling methods fills in missing values through modeling. For example, the regression filling method builds a regression equation with the missing attribute as the dependent variable and predicts its value; the EM algorithm first initializes the missing values and then obtains the final filling result by iterating the E step and the M step; the k-nearest neighbor (KNN) algorithm uses the non-missing attributes to find the k most similar samples in the sample set by Euclidean distance and obtains the filling result as their weighted average. Given enough data, these algorithms produce more accurate filling results than the mean or median. However, the regression filling method generally requires a clear linear relationship among the attributes; the EM-based filling method has high computational complexity and easily falls into local optima; and the KNN-based filling method, although simple to implement, requires so much computation on large data volumes that it becomes difficult to carry out.
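For reference, the KNN-style filling described above can be sketched as follows. This is a minimal illustration of the prior-art baseline, not part of the invention; the function name `knn_impute`, the use of NaN to mark missing values, the restriction of neighbors to fully observed rows, and the inverse-distance weighting are all assumptions made for the example.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaN entries using the k nearest complete rows (inverse-distance weights)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    # Assumption: only rows with no missing attribute may serve as neighbors.
    complete_rows = np.where(~missing.any(axis=1))[0]
    for i in np.where(missing.any(axis=1))[0]:
        obs = ~missing[i]  # attributes observed in the incomplete row
        # Euclidean distance measured on the observed attributes only.
        dists = np.linalg.norm(X[complete_rows][:, obs] - X[i, obs], axis=1)
        order = np.argsort(dists)[:k]
        neighbors = complete_rows[order]
        weights = 1.0 / (dists[order] + 1e-8)
        weights /= weights.sum()
        # Weighted average of the neighbors' values at the missing positions.
        filled[i, ~obs] = weights @ X[neighbors][:, ~obs]
    return filled
```

Every incomplete row is compared against every complete row, which illustrates why the cost grows quickly with the number of samples and attributes.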
Furthermore, the main purpose of data filling is to provide more complete data for subsequent predictive modeling. The methods above do not involve the modeling process, yet the filled data and the labels to be predicted are usually related, and combining the prediction model with the filling method allows the filled data to yield better predictions. The invention addresses two problems of traditional data filling methods: high computational complexity on high-dimensional data, and failure to fully mine label information to correct the filling results. The invention fills data on the basis of the data distribution learned by a generative adversarial network (GAN), and at the same time builds an auxiliary prediction network to fully mine the association between the data and the labels, so that the mutual information between the two is maximized.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides a method for completing and predicting attribute-missing data sets based on a generative adversarial network (GAN). The method makes full use of the data distribution information and the label information in the data set and can effectively fill high-dimensional missing data sets.
To achieve this purpose, the technical scheme provided by the invention is as follows. First, the attribute-missing data set is preprocessed, mainly by minmax normalization and by one-hot encoding of discrete numerical variables. Then, for the samples with missing attributes, a missing position encoding vector is constructed to express where the values are missing. Next, a missing data filling network and an auxiliary prediction network are constructed so that missing data filling and label prediction are completed together. After network training is finished, the output of the generator network in the filling network is taken as the filling result and is restored to the original scale using the column maximum and minimum values recorded during minmax normalization. Finally, the hyper-parameters are adjusted repeatedly while observing the prediction loss on the validation set, which completes the hyper-parameter setting. The method comprises the following steps:
1) data preprocessing;
2) constructing a missing position encoding vector;
3) constructing a missing data filling network and an auxiliary prediction network;
4) restoring the scale of the filled data;
5) testing and hyper-parameter setting.
In step 1), different data types are preprocessed differently. The main data types involved are continuous numerical values and discrete numerical values. Continuous values are normalized directly with minmax; discrete values are first converted to one-hot codes and then minmax-normalized. Missing positions are uniformly filled with 0. In addition, the data set is divided into two parts according to whether attributes are missing: data with missing attributes and data without missing attributes.
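The preprocessing in step 1) can be sketched as below. This is an illustrative sketch under stated assumptions rather than the patented implementation: missing continuous values are assumed to be stored as NaN, missing discrete values as the code -1, and the helper names `minmax_fit`, `minmax_transform`, and `one_hot` are hypothetical.

```python
import numpy as np

def minmax_fit(X):
    """Record the per-column minimum and maximum, ignoring missing (NaN) entries."""
    return np.nanmin(X, axis=0), np.nanmax(X, axis=0)

def minmax_transform(X, col_min, col_max):
    """Scale each column to [0, 1], then fill missing entries with 0."""
    # Guard against constant columns to avoid division by zero.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    scaled = (X - col_min) / span
    return np.nan_to_num(scaled, nan=0.0)

def one_hot(codes, n_categories):
    """One-hot encode integer category codes.

    Assumption: a code of -1 marks a missing value and yields an all-zero row,
    which matches the rule that missing positions are filled with 0.
    """
    codes = np.asarray(codes)
    out = np.zeros((len(codes), n_categories))
    known = codes >= 0
    out[np.where(known)[0], codes[known]] = 1.0
    return out
```

A one-hot column already lies in [0, 1], so applying minmax to it leaves it unchanged. The recorded `col_min` and `col_max` need to be kept, because the method later uses them to restore the filled values to their original scale.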
In step 2), a missing position encoding vector is constructed as follows. The positions of the missing attributes in a sample are themselves important information for data filling, and when a neural network is used for filling, only the missing positions need to be filled. To construct the missing position encoding vector, every column of every sample is traversed; an attribute is marked '1' if it is missing and '0' otherwise. After this procedure, each sample has a corresponding missing position encoding vector.
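A minimal sketch of this encoding, assuming missing values are represented as NaN in the raw data (the patent does not specify the representation) and using a hypothetical function name:

```python
import numpy as np

def missing_position_encoding(X):
    """Return a matrix of the same shape as X: 1.0 where missing, 0.0 where observed."""
    return np.isnan(np.asarray(X, dtype=float)).astype(float)

X = np.array([[0.2, np.nan, 0.7],
              [np.nan, 0.5, 0.1]])
E = missing_position_encoding(X)
# E is [[0., 1., 0.],
#       [1., 0., 0.]]
```

The encoding has to be computed before missing entries are replaced with 0, since afterwards a filled 0 cannot be told apart from an observed value that normalizes to 0.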
In step 3), a missing data filling network and an auxiliary prediction network are constructed as follows. The network modifies the original generative adversarial network (GAN) in two ways: first, the randomly sampled noise is removed from the input of the generator network; second, the filled data is composed from the generated data and the missing position encoding vector. In addition, introducing the auxiliary prediction network takes the relation between attributes and labels into account more fully. When attribute-missing data is used for prediction, the loss between the labels predicted by the auxiliary prediction network and the real labels is back-propagated with the BP algorithm to update the generator network, so that the generated filling data performs better when a prediction model is built on it. The loss function of the generative adversarial network and the loss function of the auxiliary prediction network are combined, and a hyper-parameter controls their relative weight, which determines whether the distribution of the generated filling data is brought closer to the distribution of the complete data or the prediction model is made more accurate. The data filling network and the auxiliary prediction network together comprise a generator network, a discriminator network, and an auxiliary prediction network; the structure of these three networks is described in detail below:
Generator network: the input is formed by concatenating the attribute-missing data with its corresponding missing position encoding vector. Depending on the structure of the data, the hidden layers can be built from fully connected layers or deconvolution layers; in particular, when the input data is image data, the generated filling data is obtained with deconvolution operations. As an example, assume the input data, denoted I, is a 100-dimensional vector; the corresponding missing position encoding vector, denoted E, is then also 100-dimensional, and the concatenated input vector has dimension 200. The hidden layers are fully connected layers with relu as the activation function. The final output layer has 100 output units, denoted O, with sigmoid as the activation function. The filled data is finally composed as $I \odot (1 - E) + O \odot E$, where $\odot$ denotes element-wise multiplication, so that observed positions keep their input values and missing positions take the generated values.
Discriminator network: the input data consists of two parts. The first part is the filled data obtained from the output of the generator network, and the second part is sample data with no missing attributes. The output is a decimal between 0 and 1 representing the probability, as judged by the network, that the input it received comes from the data without missing attributes. The network structure depends on the type of input data; when the input data is image data, a convolutional neural network is used. As an example, assume the input data is a 100-dimensional vector; the hidden layers can then be fully connected layers with relu as the activation function. The output layer contains a single unit with sigmoid as the activation function, representing the probability.
Auxiliary prediction network: the input is identical to that of the discriminator network, and the output is the predicted label for the input sample. When the prediction problem is a classification problem, cross entropy is used as the loss function; when it is a regression problem, the L2 norm or the L1 norm is used as the loss function. The network structure is set in the same way as that of the discriminator network. As an example, assume the input data is a 100-dimensional vector; the hidden layers can then be fully connected layers with relu as the activation function. The output layer contains a single unit, and its activation function is set in the manner described above.
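The forward pass of the three networks for the 100-dimensional example can be sketched in NumPy as follows. This is a shape-level illustration only, with no training loop. The hidden width `H = 64`, the single hidden layer, the random initialization, the sigmoid output on the prediction network (suitable for binary classification), and all function names are assumptions not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 100, 64  # D = 100 follows the text; the hidden width H is an assumption

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(n_in, n_hidden, n_out):
    """One fully connected hidden layer (relu) and a sigmoid output layer."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def mlp(params, x):
    h = relu(x @ params["W1"] + params["b1"])
    return sigmoid(h @ params["W2"] + params["b2"])

G = init_mlp(2 * D, H, D)   # generator: 200 inputs (I concatenated with E), 100 outputs
Dnet = init_mlp(D, H, 1)    # discriminator: 100 inputs, one probability
P = init_mlp(D, H, 1)       # auxiliary predictor: same input as the discriminator

def fill(G, I, E):
    """Compose filled data: keep observed values, use generated values where missing."""
    O = mlp(G, np.concatenate([I, E], axis=1))  # no random noise in the input
    return I * (1.0 - E) + O * E

# Toy batch of 4 samples with roughly 30% of attributes missing and zero-filled.
E = (rng.random((4, D)) < 0.3).astype(float)
I = rng.random((4, D)) * (1.0 - E)
filled = fill(G, I, E)
d_score = mlp(Dnet, filled)  # probability that the input has no missing attributes
y_hat = mlp(P, filled)       # predicted label for each sample
```

Because the generator's sigmoid output lies in (0, 1), the generated values are on the same scale as the minmax-normalized inputs, and the composition step guarantees that observed attributes pass through unchanged.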
In step 4), the generated filling data is restored to its original scale. Because minmax normalization was applied in the preprocessing stage, the final filling result is obtained by inverting it with the recorded maximum and minimum values of each attribute.
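The restoration is the inverse of the minmax formula. A short sketch, with a hypothetical function name and made-up example values:

```python
import numpy as np

def minmax_inverse(X_scaled, col_min, col_max):
    """Map values in [0, 1] back to the original scale of each column."""
    return X_scaled * (col_max - col_min) + col_min

col_min = np.array([1.0, 10.0])   # minima recorded during preprocessing
col_max = np.array([3.0, 20.0])   # maxima recorded during preprocessing
filled_scaled = np.array([[0.5, 1.0]])
restored = minmax_inverse(filled_scaled, col_min, col_max)
# restored is [[2.0, 20.0]]
```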
In step 5), testing and hyper-parameter setting proceed as follows. During network training, the loss comes from two parts: the loss of the generative adversarial network and the prediction loss of the auxiliary prediction network. The two losses are combined in a proportion λ to obtain a combined loss, and different values of λ affect the training of the model. In operation, the data set is split into a training set and a test set; the model is trained on the training set with λ set to 0.1, 0.3, 0.5, 0.7 and 0.9 in turn and evaluated on the test set, and the value that gives the minimum loss of the auxiliary prediction network on the test set is selected as the hyper-parameter.
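The selection procedure can be sketched as a simple grid search. The patent states only that the two losses are combined in a proportion λ; the convex form `lam * adv + (1 - lam) * pred` used below is an assumption, and `train_and_eval` is a hypothetical callback that would train the networks with a given λ on the training set and return the auxiliary prediction loss on the test set.

```python
def combined_loss(adv_loss, pred_loss, lam):
    """Assumed weighting: lam on the adversarial loss, (1 - lam) on the prediction loss."""
    return lam * adv_loss + (1.0 - lam) * pred_loss

def select_lambda(train_and_eval, candidates=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Train once per candidate and keep the lambda with the lowest test prediction loss."""
    scores = {lam: train_and_eval(lam) for lam in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Stand-in for real training, used only to show the call pattern (not a real result).
def fake_train_and_eval(lam):
    return (lam - 0.3) ** 2

best_lam, all_scores = select_lambda(fake_train_and_eval)
```

Each candidate requires a full training run, so the cost of the search scales with the number of λ values tried.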
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Traditional filling methods such as median filling and mean filling are simple but fill poorly, while KNN-based and EM-based methods often have high time complexity, which becomes prohibitive on high-dimensional data sets and can even make them impossible to process. A generative adversarial network learns the distribution of high-dimensional data well and can therefore overcome the difficulties posed by high-dimensional data sets. In addition, samples without missing attributes and samples with missing attributes generally follow the same distribution. Because the filled data is made to approximate the distribution of the data without missing attributes, the filling result does not deviate from the data distribution and so does not negatively affect the prediction model.
2. Traditional filling methods do not consider how the filled data affects the predictions of the model built afterwards. They first fill the missing data to obtain completed data and then use the filled data to build a prediction model, so the data filling cannot be guided by prediction performance. The invention introduces an auxiliary prediction network that computes the loss between the predicted value for each filled sample and the real label, and back-propagates it to guide the data filling of the generator network, so that the filling is steered by how the filled data performs on the prediction model. At the same time, the loss of the discriminator network limits the difference between the distribution of the filled data and that of the real data. A good filling effect and a good prediction result are thus both achieved. In addition, training yields an end-to-end network: once data is input, the prediction of the auxiliary prediction network is obtained directly.
Drawings
FIG. 1 is a flow chart of missing data filling and prediction.
FIG. 2 is a data flow diagram of the generative adversarial network and the prediction network used for data filling.
Detailed Description
The present invention will be further described with reference to the following specific examples.
As shown in FIG. 1, the method for completing and predicting attribute-missing data sets based on a generative adversarial network (GAN) provided in this example proceeds as follows:
1) Data preprocessing: different attributes have different data types and are processed accordingly. The main data types involved are continuous numerical values and discrete numerical values. Continuous values are normalized directly with minmax; discrete values are first converted to one-hot codes and then minmax-normalized, and missing positions are uniformly filled with 0. The data set is further divided into two parts: data with missing attributes and data without missing attributes.
2) Constructing a missing position encoding vector: the positions of the missing attributes in a sample are themselves important information for data filling, and only the missing positions need to be filled when a neural network is used for filling. To construct the missing position encoding vector, every column of every sample is traversed; an attribute is marked "1" if it is missing and "0" otherwise. After this procedure, each sample has a corresponding missing position encoding vector.
3) Constructing a missing data filling network and an auxiliary prediction network: the invention provides an integrated network, based on a generative adversarial network (GAN) and combined with an auxiliary prediction network, that both fills data and makes predictions. The network modifies the original generative adversarial network in two ways: first, the sampled noise is removed from the input of the generator network; second, the filled data is composed from the generated data and the missing position encoding vector. In addition, introducing the auxiliary prediction network takes the relation between attributes and labels into account more fully. When attribute-missing data is used for prediction, the loss between the labels predicted by the auxiliary prediction network and the real labels is back-propagated with the BP algorithm to update the generator network, so that the generated filling data performs better when a prediction model is built on it. The loss function of the generative adversarial network and the loss function of the auxiliary prediction network are combined, and a hyper-parameter controls their relative weight, which determines whether the distribution of the generated filling data is brought closer to the distribution of the complete data or the prediction model is made more accurate. FIG. 2 shows the structure of the data filling network and the auxiliary prediction network, the core of the invention, comprising the generator network, the discriminator network, and the auxiliary prediction network; the structure of these three networks is described in detail below:
Generator network: the input is formed by concatenating the attribute-missing data with its corresponding missing position encoding vector. Depending on the structure of the data, the hidden layers can be built from fully connected layers or deconvolution layers; in particular, when the input data is image data, the generated filling data is obtained with deconvolution layers. As an example, assume the input data (denoted I) is a 100-dimensional vector; the corresponding missing position encoding vector (denoted E) is then also 100-dimensional, and the concatenated input vector has dimension 200. The hidden layers are fully connected layers with relu as the activation function, and the final output layer has 100 output units (denoted O) with sigmoid as the activation function. The filled data is finally composed as $I \odot (1 - E) + O \odot E$, where $\odot$ denotes element-wise multiplication, so that observed positions keep their input values and missing positions take the generated values.
Discriminator network: the input data consists of two parts. The first part is the filled data obtained from the output of the generator network, and the second part is sample data with no missing attributes. The output is a decimal between 0 and 1 representing the probability, as judged by the network, that the input it received comes from the data without missing attributes. The network structure depends on the type of input data; when the input data is image data, a convolutional neural network can be used. As an example, assume the input data is a 100-dimensional vector; the hidden layers can then be fully connected layers with relu as the activation function. The output layer contains a single unit with sigmoid as the activation function, representing the probability.
Auxiliary prediction network: the input is identical to that of the discriminator network, and the output is the predicted label for the input sample. When the prediction problem is a classification problem, cross entropy is used as the loss function; when it is a regression problem, the L2 norm or the L1 norm is used as the loss function. The network structure is set in the same way as that of the discriminator network. As an example, assume the input data is a 100-dimensional vector; the hidden layers can then be fully connected layers with relu as the activation function. The output layer contains a single unit, and its activation function is set in the manner described above.
4) Restoring the scale of the filled data: because minmax normalization was applied in the preprocessing stage, the final filling result is obtained by inverting it with the recorded maximum and minimum values of each attribute.
5) Testing and hyper-parameter setting: during network training, the loss comes from two parts, the loss of the generative adversarial network and the prediction loss of the auxiliary prediction network. The two losses are combined in a proportion λ to obtain a combined loss, and different values of λ affect the training of the model. In operation, the data set is split into a training set and a test set; the model is trained on the training set with λ set to 0.1, 0.3, 0.5, 0.7 and 0.9 in turn and evaluated on the test set, and the value that gives the minimum loss of the auxiliary prediction network on the test set is selected as the hyper-parameter.
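The loss terms named in steps 3) and 5) can be sketched as follows. The patent does not give the adversarial loss in closed form, so the standard binary cross-entropy GAN losses (with the non-saturating generator variant) are assumed here; the prediction loss follows the text, with cross entropy for classification and a mean squared (L2) or mean absolute (L1) error for regression. All function names are hypothetical.

```python
import numpy as np

EPS = 1e-12  # clipping constant to keep log() finite

def bce(p, y):
    """Binary cross entropy between probabilities p and 0/1 targets y."""
    p = np.clip(p, EPS, 1.0 - EPS)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def discriminator_loss(d_real, d_filled):
    """Complete samples should score 1, filled samples should score 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_filled, np.zeros_like(d_filled))

def generator_adv_loss(d_filled):
    """Assumed non-saturating form: the generator wants filled samples to score 1."""
    return bce(d_filled, np.ones_like(d_filled))

def prediction_loss(pred, label, task="classification", norm="l2"):
    """Cross entropy for (binary) classification; L2 or L1 error for regression."""
    if task == "classification":
        return bce(pred, label)
    diff = pred - label
    if norm == "l2":
        return float(np.mean(diff ** 2))
    return float(np.mean(np.abs(diff)))
```

In a full implementation, the generator would be updated by back-propagating a weighted sum of `generator_adv_loss` and `prediction_loss`, with the weight set by λ.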
The embodiments described above are merely preferred embodiments of the invention and do not limit its scope; any change made according to the form and principle of the invention shall fall within the protection scope of the invention.

Claims (4)

1. A method for completing and predicting attribute-missing data sets based on a generative adversarial network (GAN), characterized in that: first, data preprocessing is performed on the attribute-missing data set, comprising minmax normalization and one-hot encoding of discrete numerical variables; then, for the samples with missing attributes, a missing position encoding vector is constructed to express where the values are missing; next, a missing data filling network and an auxiliary prediction network are constructed so that missing data filling and label prediction are completed together; after network training is finished, the output of the generator network in the filling network is taken as the filling result and is restored to the original scale using the column maximum and minimum values recorded during minmax normalization; finally, the hyper-parameters are adjusted repeatedly while observing the prediction loss on the validation set, which completes the hyper-parameter setting; the method comprises the following steps:
1) data preprocessing;
2) constructing a missing position encoding vector, as follows: the positions of the missing attributes in a sample are important information for data filling, and when a neural network is used for filling, only the missing positions need to be filled; to construct the missing position encoding vector, every column of every sample is traversed, and an attribute is marked '1' if it is missing and '0' otherwise; after this procedure, each sample has a corresponding missing position encoding vector;
3) constructing a missing data filling network and an auxiliary prediction network, as follows: the network modifies the original generative adversarial network in two ways: the noise is removed from the input of the generator network, and the filled data is composed from the generated data and the missing position encoding vector; in addition, introducing the auxiliary prediction network fully considers the relation between attributes and labels, and when attribute-missing data is used for prediction, the loss between the labels predicted by the auxiliary prediction network and the real labels is back-propagated with the BP algorithm to update the generator network; the loss function of the generative adversarial network and the loss function of the auxiliary prediction network are combined, and a hyper-parameter controls their relative weight so that the distribution of the generated filling data is brought close to the distribution of the complete data; the data filling network and the auxiliary prediction network together comprise a generator network, a discriminator network, and an auxiliary prediction network; the structure of these three networks is described in detail below:
generator network: the input is formed by concatenating the attribute-missing data with its corresponding missing position encoding vector; the input data is image data, and the generated filling data is obtained with deconvolution operations;
discriminator network: the input data consists of two parts, the first part being the filled data obtained from the output of the generator network and the second part being sample data with no missing attributes; the output is a decimal between 0 and 1 representing the probability, as judged by the network, that the input it received comes from the data without missing attributes; the input data is image data, and the discriminator network is built as a convolutional neural network;
auxiliary prediction network: the input is identical to that of the discriminator network, and the output is the predicted label for the input sample; when the prediction problem is a classification problem, cross entropy is used as the loss function, and when it is a regression problem, the L2 norm or the L1 norm is used as the loss function; the network structure is set in the same way as that of the discriminator network;
4) restoring the scale of the filled data;
5) testing and hyper-parameter setting.
2. The method of claim 1, characterized in that: in step 1), different data types are preprocessed differently; the data types involved are continuous numerical values and discrete numerical values; continuous values are normalized directly with minmax; discrete values are first converted to one-hot codes and then minmax-normalized, and missing positions are uniformly filled with 0; in addition, the data set is divided into two parts according to whether attributes are missing: data with missing attributes and data without missing attributes.
3. The method of claim 1, characterized in that: in step 4), the generated filling data is restored to its original scale; because minmax normalization was applied in the preprocessing stage, the final filling result is obtained by inverting it with the recorded maximum and minimum values of each attribute.
4. The method of claim 1, characterized in that: in step 5), testing and hyper-parameter setting proceed as follows: during network training, the loss comes from two parts, the loss of the generative adversarial network and the prediction loss of the auxiliary prediction network; the two losses are combined in a proportion λ to obtain a combined loss; different values of λ affect the training of the model; in operation, the data set is split into a training set and a test set, the model is trained on the training set with λ set to 0.1, 0.3, 0.5, 0.7 and 0.9 in turn and evaluated on the test set, and the value that gives the minimum loss of the auxiliary prediction network on the test set is selected as the hyper-parameter.
CN201810722774.3A 2018-07-04 2018-07-04 Attribute-missing data set completion and prediction method based on generation of countermeasure network Active CN109165664B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810722774.3A CN109165664B (en) 2018-07-04 2018-07-04 Attribute-missing data set completion and prediction method based on generation of countermeasure network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810722774.3A CN109165664B (en) 2018-07-04 2018-07-04 Attribute-missing data set completion and prediction method based on generation of countermeasure network

Publications (2)

Publication Number Publication Date
CN109165664A CN109165664A (en) 2019-01-08
CN109165664B CN109165664B (en) 2020-09-22

Family

ID=64897277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810722774.3A Active CN109165664B (en) 2018-07-04 2018-07-04 Attribute-missing data set completion and prediction method based on generation of countermeasure network

Country Status (1)

Country Link
CN (1) CN109165664B (en)

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522973A (en) * 2019-01-17 2019-03-26 云南大学 Medical big data classification method and system based on production confrontation network and semi-supervised learning
CN109978257A (en) * 2019-03-25 2019-07-05 上海赢科信息技术有限公司 The continuation of insurance prediction technique and system of vehicle insurance
CN110046706B (en) * 2019-04-18 2022-12-20 腾讯科技(深圳)有限公司 Model generation method and device and server
CN110175168B (en) * 2019-05-28 2021-06-01 山东大学 Time sequence data filling method and system based on generation of countermeasure network
CN110647519B (en) * 2019-08-30 2023-10-03 中国平安人寿保险股份有限公司 Method and device for predicting missing attribute value in test sample
CN110728297B (en) * 2019-09-04 2021-08-06 电子科技大学 Low-cost antagonistic network attack sample generation method based on GAN
CN111046027B (en) * 2019-11-25 2023-07-25 北京百度网讯科技有限公司 Missing value filling method and device for time series data
CN113010500A (en) * 2019-12-18 2021-06-22 中国电信股份有限公司 Processing method and processing system for DPI data
CN111037365B (en) * 2019-12-26 2021-08-20 大连理工大学 Cutter state monitoring data set enhancing method based on generative countermeasure network
CN111177135B (en) * 2019-12-27 2020-11-10 清华大学 Landmark-based data filling method and device
CN111259953B (en) * 2020-01-15 2023-10-20 云南电网有限责任公司电力科学研究院 Equipment defect time prediction method based on capacitive equipment defect data
CN111259916A (en) * 2020-02-12 2020-06-09 东华大学 Low-rank projection feature extraction method under condition of label missing
CN111429605B (en) * 2020-04-10 2022-06-21 郑州大学 Missing value filling method based on generation type countermeasure network
CN111737463B (en) * 2020-06-04 2024-02-09 江苏名通信息科技有限公司 Big data missing value filling method, device and computer readable memory
CN111738007B (en) * 2020-07-03 2021-04-13 北京邮电大学 Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN112036955B (en) * 2020-09-07 2021-09-24 贝壳找房(北京)科技有限公司 User identification method and device, computer readable storage medium and electronic equipment
CN112183723B (en) * 2020-09-17 2022-07-05 西北工业大学 Data processing method for clinical detection data missing problem
CN112465150A (en) * 2020-12-02 2021-03-09 南开大学 Real data enhancement-based multi-element time sequence data filling method
CN112712855B (en) * 2020-12-28 2022-09-20 华南理工大学 Joint training-based clustering method for gene microarray containing deletion value
CN114826988A (en) * 2021-01-29 2022-07-29 中国电信股份有限公司 Method and device for anomaly detection and parameter filling of time sequence data
CN113239022B (en) * 2021-04-19 2023-04-07 浙江大学 Method and device for complementing missing data in medical diagnosis, electronic device and medium
CN113515896B (en) * 2021-08-06 2022-08-09 红云红河烟草(集团)有限责任公司 Data missing value filling method for real-time cigarette acquisition
CN114936530A (en) * 2022-06-22 2022-08-23 郑州大学 Multi-element air quality data missing value filling model based on TAM and construction method thereof
CN115145906B (en) 2022-09-02 2023-01-03 之江实验室 Preprocessing and completion method for structured data
CN115883016B (en) * 2022-10-28 2024-02-02 南京航空航天大学 Flow data enhancement method and device based on federal generation countermeasure network
CN115829162B (en) * 2023-01-29 2023-05-26 北京市农林科学院信息技术研究中心 Crop yield prediction method, device, electronic equipment and medium
CN117034142B (en) * 2023-10-07 2024-02-09 之江实验室 Unbalanced medical data missing value filling method and system
CN117150231B (en) * 2023-10-27 2024-01-26 国网江苏省电力有限公司苏州供电分公司 Measurement data filling method and system based on correlation and generation countermeasure network
CN117421548B (en) * 2023-12-18 2024-03-12 四川互慧软件有限公司 Method and system for treating loss of physiological index data based on convolutional neural network
CN117524318B (en) * 2024-01-05 2024-03-22 深圳新合睿恩生物医疗科技有限公司 New antigen heterogeneous data integration method and device, equipment and storage medium
CN117556267B (en) * 2024-01-12 2024-04-02 闪捷信息科技有限公司 Missing sample data filling method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952239A (en) * 2017-03-28 2017-07-14 厦门幻世网络科技有限公司 image generating method and device
CN107133934A (en) * 2017-05-18 2017-09-05 北京小米移动软件有限公司 Image completion method and device
AU2017101166A4 (en) * 2017-08-25 2017-11-02 Lai, Haodong MR A Method For Real-Time Image Style Transfer Based On Conditional Generative Adversarial Networks
KR20170137350A (en) * 2016-06-03 2017-12-13 (주)싸이언테크 Apparatus and method for studying pattern of moving objects using adversarial deep generative model
CN107945118A (en) * 2017-10-30 2018-04-20 南京邮电大学 A kind of facial image restorative procedure based on production confrontation network
CN107945140A (en) * 2017-12-20 2018-04-20 中国科学院深圳先进技术研究院 A kind of image repair method, device and equipment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170137350A (en) * 2016-06-03 2017-12-13 (주)싸이언테크 Apparatus and method for studying pattern of moving objects using adversarial deep generative model
CN106952239A (en) * 2017-03-28 2017-07-14 厦门幻世网络科技有限公司 image generating method and device
CN107133934A (en) * 2017-05-18 2017-09-05 北京小米移动软件有限公司 Image completion method and device
AU2017101166A4 (en) * 2017-08-25 2017-11-02 Lai, Haodong MR A Method For Real-Time Image Style Transfer Based On Conditional Generative Adversarial Networks
CN107945118A (en) * 2017-10-30 2018-04-20 南京邮电大学 A kind of facial image restorative procedure based on production confrontation network
CN107945140A (en) * 2017-12-20 2018-04-20 中国科学院深圳先进技术研究院 A kind of image repair method, device and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GAIN: Missing Data Imputation using Generative Adversarial Nets;Jinsung Yoon et al.;《Proceedings of the 35th International Conference on Machine》;20180607;full text *

Also Published As

Publication number Publication date
CN109165664A (en) 2019-01-08

Similar Documents

Publication Publication Date Title
CN109165664B (en) Attribute-missing data set completion and prediction method based on generation of countermeasure network
KR20210040248A (en) Generative structure-property inverse computational co-design of materials
CN112560967B (en) Multi-source remote sensing image classification method, storage medium and computing device
JP7020547B2 (en) Information processing equipment, control methods, and programs
CN113516133B (en) Multi-modal image classification method and system
CN110059625B (en) Face training and recognition method based on mixup
CN111339818A (en) Face multi-attribute recognition system
CN114519469A (en) Construction method of multivariate long sequence time sequence prediction model based on Transformer framework
CN111310918B (en) Data processing method, device, computer equipment and storage medium
CN110738363B (en) Photovoltaic power generation power prediction method
CN116844041A (en) Cultivated land extraction method based on bidirectional convolution time self-attention mechanism
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN109787821B (en) Intelligent prediction method for large-scale mobile client traffic consumption
AU2021106200A4 (en) Wind power probability prediction method based on quantile regression
CN113505477A (en) Process industry soft measurement data supplementing method based on SVAE-WGAN
Zhang et al. RSVRs based on feature extraction: a novel method for prediction of construction projects’ costs
US20230281363A1 (en) Optimal materials and devices design using artificial intelligence
CN115063374A (en) Model training method, face image quality scoring method, electronic device and storage medium
CN114595884A (en) Genetic intelligent optimization neural network wind power generation equipment temperature prediction method
CN113887471A (en) Video time sequence positioning method based on feature decoupling and cross comparison
CN112307288A (en) User clustering method for multiple channels
CN115796054B (en) Scene discovery and vulnerability analysis method, system, terminal and storage medium
CN116821745B (en) Control method and system of intelligent linear cutting slow wire-moving equipment
Deshpande et al. Long Range Probabilistic Forecasting in Time-Series using High Order Statistics
CN116643957A (en) Stream processing application delay performance prediction system, method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant