CN107273429A - Missing data filling method and system based on deep learning - Google Patents
- Publication number
- CN107273429A
- Application number
- CN201710358297.2A
- Authority
- CN
- China
- Prior art keywords
- data
- missing
- convolutional neural
- neural networks
- test sample
- Legal status (as listed by Google Patents; an assumption, not a legal conclusion)
- Granted
Classifications
- G06F16/21: Design, administration or maintenance of databases (under G06F16/20, information retrieval and database structures for structured data, e.g. relational data; class G06F, electric digital data processing; section G, physics)
- G06N3/045: Combinations of networks (under G06N3/04, neural network architecture, e.g. interconnection topology; G06N3/02, neural networks)
- G06N3/08: Learning methods (under G06N3/02, neural networks; G06N3/00, computing arrangements based on biological models)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a missing data filling method and system based on deep learning. The method comprises the following steps: preprocessing a data set; training and saving a preliminarily constructed convolutional neural network using a training sample set; using the trained convolutional neural network to fill in the missing data of a missing test sample set and comparing the filling result with the test sample set; when the required accuracy is not met, adjusting the network structure of the convolutional neural network and repeating the training and verification steps until the required accuracy is met; inputting the complete data subset into the convolutional neural network to obtain a refined convolutional neural network; and inputting the missing data subset into the refined convolutional neural network to fill in the missing values. The invention addresses the problem of filling missing data in databases, achieving higher accuracy and efficiency and restoring missing data more faithfully and quickly.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a missing data filling method and system based on deep learning.
Background
Since information technology came to be applied widely across industries, driving the rapid development of both new and traditional fields, data, the resource on which this technology depends, has been continuously collected and mined, and data volumes are growing at an astonishing rate. Such enormous volumes of data undoubtedly make data management harder. In the real world, missing data can arise from many factors, such as omissions during data entry, incorrect measurement methods, limited collection conditions, and deletion of values that violate constraints. A missing value is not merely a gap in information; more importantly, it can hinder subsequent work such as data mining and statistical analysis. Common ways of handling missing values include deleting tuples that contain missing items, treating missing values as a special value, and filling in the missing data. Because the missing rate in real databases is generally high and the missingness pattern is usually random, the third approach is the more reasonable one.
Several missing data filling methods have been proposed for different kinds of data, mainly based on statistics, such as the expectation-maximization (EM) algorithm and importance sampling. The EM algorithm has two steps: the first computes the expectation (E), i.e., fills in the missing values according to the current parameters; the second performs maximization (M), i.e., finds the maximum-likelihood estimate of the parameters given the available data. The two steps alternate until convergence. The complexity of this algorithm depends on the number of missing variables and on the probability density function. Another commonly used class of filling algorithms is regression, including linear regression, multiple regression and logistic regression; these fit the response variable to explanatory variables according to the correlations between data. Yet another approach, an approximate Bayesian method based on sampling, draws m values with replacement from the observed data to fill the m missing values.
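The alternating E and M steps described above can be sketched for the simplest case: two variables assumed to be jointly normal, with x fully observed and some y missing. This is a minimal NumPy illustration under that assumption, not the method of any specific prior-art reference; the function name `em_bivariate` is hypothetical. The E-step replaces each missing y with its conditional expectation given x and adds the conditional variance to the expected y² term; the M-step recomputes the maximum-likelihood mean and covariance from those expected statistics.

```python
import numpy as np

def em_bivariate(x, y, n_iter=500, tol=1e-9):
    """EM estimates of the mean and covariance of (x, y) when some y are NaN.

    Assumes (x, y) is bivariate normal and x has no missing values.
    Returns (mu, sigma, y_filled); y_filled replaces each missing y with
    E[y | x] under the final estimates.
    """
    miss = np.isnan(y)
    # Initialise from the complete cases.
    mu = np.array([x.mean(), y[~miss].mean()])
    sigma = np.cov(x[~miss], y[~miss])
    for _ in range(n_iter):
        # E-step: conditional mean and variance of y given x for missing rows.
        beta = sigma[0, 1] / sigma[0, 0]
        cond_var = sigma[1, 1] - beta * sigma[0, 1]
        y_hat = np.where(miss, mu[1] + beta * (x - mu[0]), y)
        e_yy = y_hat ** 2 + np.where(miss, cond_var, 0.0)  # E[y^2 | x]
        # M-step: maximum-likelihood mean and covariance from expected stats.
        new_mu = np.array([x.mean(), y_hat.mean()])
        sxx = np.mean(x * x) - new_mu[0] ** 2
        sxy = np.mean(x * y_hat) - new_mu[0] * new_mu[1]
        syy = np.mean(e_yy) - new_mu[1] ** 2
        new_sigma = np.array([[sxx, sxy], [sxy, syy]])
        delta = max(np.abs(new_mu - mu).max(), np.abs(new_sigma - sigma).max())
        mu, sigma = new_mu, new_sigma
        if delta < tol:
            break
    beta = sigma[0, 1] / sigma[0, 0]
    y_filled = np.where(miss, mu[1] + beta * (x - mu[0]), y)
    return mu, sigma, y_filled
```

Under these assumptions, when y is missing whenever x is large, the complete-case mean of y is biased, whereas the EM estimate of the mean stays close to that of the full data.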
The EM algorithm described above predicts the values of missing variables with a model fitted on the complete data. The quality of the fit depends on the choice of independent variables and on how complete the training set is, so the filling result is strongly influenced by the available data. The Bayesian method draws from the observed data to fill in missing values; although it is simple and largely preserves the original distribution of the data, it ignores the dependencies between variables. Moreover, statistical methods require features to be extracted explicitly in advance as the basis for probabilistic prediction, and they are poor at capturing the internal relationships among data.
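The contrast drawn above between a fitted model and random draws from the observed data can be checked numerically. The sketch below fills the same missing y values two ways: with a least-squares line fitted on the complete (x, y) pairs, and with values drawn with replacement from the observed y, in the spirit of the sampling-based method. The function names are hypothetical. Random draws keep the marginal distribution of y but are independent of x, so they weaken the correlation between x and y.

```python
import numpy as np

def regression_fill(x, y):
    """Fill NaNs in y with a least-squares line fitted on the complete pairs."""
    obs = ~np.isnan(y)
    slope, intercept = np.polyfit(x[obs], y[obs], deg=1)
    return np.where(obs, y, intercept + slope * x)

def sampling_fill(y, rng):
    """Fill NaNs in y with values drawn with replacement from the observed y."""
    obs = ~np.isnan(y)
    filled = y.copy()
    filled[~obs] = rng.choice(y[obs], size=int((~obs).sum()), replace=True)
    return filled
```

With y strongly dependent on x and 40% of y missing completely at random, the regression-filled column stays highly correlated with x, while the correlation of the sampled column drops noticeably.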
Summary of the invention
The technical problem to be solved by the present invention is that prior-art missing data filling methods depend heavily on the completeness of the existing data and cannot find deep relationships between data. To overcome this defect, the invention provides a missing data filling method and system based on deep learning that exploits the ability of deep neural networks to mine the deep internal correlations of data, improving both filling accuracy and filling efficiency.
A first aspect of the present invention provides a missing data filling method based on deep learning, comprising the following steps:
(1) preprocessing a data set: dividing the data set into a complete data subset and a missing data subset, dividing the data in the complete data subset into a training sample set and a test sample set, and randomly deleting part of the data in the test sample set to form a missing test sample set;
(2) training and saving a preliminarily constructed convolutional neural network using the training sample set, filling in the missing data of the missing test sample set using the trained convolutional neural network, and comparing the filling result with the test sample set; when the required accuracy is not met, adjusting the network structure of the convolutional neural network and repeating the training and verification steps above until the required accuracy is met;
(3) inputting the complete data subset into the convolutional neural network obtained in step (2) to obtain a refined convolutional neural network;
(4) inputting the missing data subset into the refined convolutional neural network obtained in step (3) to fill in the missing values.
In the missing data filling method based on deep learning according to the present invention, step (1) comprises:
(1-1) collecting data to build the data set to be processed;
(1-2) classifying the data set: separating the records with no missing values as the complete data subset, and the records with missing values as the missing data subset;
(1-3) randomly selecting 60% to 80% of the data in the complete data subset as the training sample set, the remainder serving as the test sample set;
(1-4) randomly deleting part of the data in the test sample set to obtain the missing test sample set.
Preferably, in step (1-3), 70% of the data in the complete data subset is randomly selected as the training sample set, and the remaining 30% serves as the test sample set.
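Steps (1-2) and (1-3) can be sketched with NumPy, assuming the data set is a 2-D array with one record per row and NaN marking a missing value (the text does not fix a representation). `split_dataset` and its `seed` parameter are hypothetical; the 70% default follows the preferred ratio above.

```python
import numpy as np

def split_dataset(data, train_frac=0.7, seed=0):
    """Split records into training set a1, test set a2 and missing subset B.

    The complete data subset A is a1 and a2 together.
    """
    rng = np.random.default_rng(seed)
    has_missing = np.isnan(data).any(axis=1)
    A = data[~has_missing]            # complete data subset (step 1-2)
    B = data[has_missing]             # missing data subset (step 1-2)
    order = rng.permutation(len(A))   # random selection (step 1-3)
    n_train = int(round(train_frac * len(A)))
    a1 = A[order[:n_train]]           # training sample set
    a2 = A[order[n_train:]]           # test sample set
    return a1, a2, B
```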
In the missing data filling method based on deep learning according to the present invention, step (2) specifically comprises:
(2-1) building a convolutional neural network composed of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer, and initializing its parameters;
(2-2) inputting the training sample set into the convolutional neural network, which performs semi-supervised learning on the data in the training sample set and updates its weights automatically; the network structure and internal parameters are saved when training is complete;
(2-3) inputting the missing test sample set into the convolutional neural network to predict and fill in the missing values, and comparing the filling result of the missing test sample set with the test sample set: if the accuracy meets the requirement, step (3) is executed; otherwise, the method returns to step (2-1) to adjust the network structure of the convolutional neural network.
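The build, train and verify cycle of steps (2-1) to (2-3) is essentially a search loop. In this sketch the three stages are passed in as hypothetical callables and the structures to try are a finite list of depths; the text simply repeats until the accuracy requirement is met, so the finite list is an added assumption that guarantees the loop terminates.

```python
from typing import Any, Callable, Sequence, Tuple

def tune_until_accurate(
    depths: Sequence[int],
    build: Callable[[int], Any],
    train: Callable[[Any], Any],
    evaluate: Callable[[Any], float],
    required_accuracy: float,
) -> Tuple[Any, int, float]:
    """Try candidate structures in order until one meets the requirement."""
    if not depths:
        raise ValueError("no candidate structures given")
    best_acc = float("-inf")
    for depth in depths:
        model = train(build(depth))    # (2-1) build and initialise, (2-2) train
        acc = evaluate(model)          # (2-3) fill a3 and compare with a2
        if acc >= required_accuracy:
            return model, depth, acc
        best_acc = max(best_acc, acc)  # otherwise adjust the structure and retry
    raise RuntimeError(
        f"no structure reached {required_accuracy}; best was {best_acc:.3f}"
    )
```

Keeping the stages as parameters separates the search policy from the network itself, so the same loop works whatever model the callables construct.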
In the missing data filling method based on deep learning according to the present invention, adjusting the network structure of the convolutional neural network in step (2) means increasing or decreasing the number of convolutions (convolutional layers) of the convolutional neural network.
A second aspect of the present invention provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the steps of the missing data filling method based on deep learning described above.
A third aspect of the present invention provides a missing data filling system based on deep learning, comprising:
a data preprocessing module, configured to preprocess a data set by dividing the data set into a complete data subset and a missing data subset, dividing the data in the complete data subset into a training sample set and a test sample set, and randomly deleting part of the data in the test sample set to form a missing test sample set;
a first network processing module, configured to train and save a preliminarily constructed convolutional neural network using the training sample set, fill in the missing data of the missing test sample set using the trained convolutional neural network, and compare the filling result with the test sample set, and, when the required accuracy is not met, to adjust the network structure of the convolutional neural network and repeat the training and verification steps above until the required accuracy is met;
a second network processing module, configured to input the complete data subset into the convolutional neural network obtained by the first network processing module to obtain a refined convolutional neural network;
a missing data filling module, configured to input the missing data subset into the refined convolutional neural network to fill in the missing values.
In the missing data filling system based on deep learning according to the present invention, the data preprocessing module comprises:
a data collection unit, configured to collect data to build the data set to be processed;
a first classification unit, configured to classify the data set, separating the records with no missing values as the complete data subset and the records with missing values as the missing data subset;
a second classification unit, configured to randomly select 60% to 80% of the data in the complete data subset as the training sample set, the remainder serving as the test sample set;
a data deletion unit, configured to randomly delete part of the data in the test sample set to obtain the missing test sample set.
Preferably, the second classification unit randomly selects 70% of the data in the complete data subset as the training sample set, and the remaining 30% serves as the test sample set.
In the missing data filling system based on deep learning according to the present invention, the first network processing module specifically comprises:
a network construction and adjustment unit, configured to build a convolutional neural network composed of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer, and to initialize its parameters;
a network training unit, configured to input the training sample set into the convolutional neural network, which performs semi-supervised learning on the data in the training sample set and updates its weights automatically, the network structure and internal parameters being saved when training is complete;
a comparison and iteration unit, configured to input the missing test sample set into the convolutional neural network to predict and fill in the missing values, and to compare the filling result of the missing test sample set with the test sample set: if the accuracy meets the requirement, the second network processing module is started; otherwise, the network construction and adjustment unit is started to adjust the network structure of the convolutional neural network.
The missing data filling method and system based on deep learning of the present invention have the following beneficial effects. The invention selects the convolutional neural network from among deep neural networks. After the data set is preprocessed, a number of network layers matching the size of the database is created according to the size of the data set, initial parameters are set for each layer, and the training set is input into the network; the neural network then computes the relationships between data by itself and updates its parameters. As a result, the invention is not limited by the completeness of the data set: it mines the relationships between data in depth, derives the corresponding learning rate and weights, and forms a trained network with which missing values are predicted and filled. In addition, because convolutional neural networks share weights, the number of weights is greatly reduced during training, which lowers the demands and load on computer hardware and reduces overfitting.
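The weight-sharing claim can be made concrete with a parameter count. The helpers below are hypothetical, and the example assumes a record of 64 values treated as a one-channel sequence, a convolutional layer with 8 filters of width 3 and no padding, and a fully connected layer producing the same number of outputs (8 filters × 62 positions).

```python
def conv1d_params(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """One kernel per (output, input) channel pair, reused at every position, plus biases."""
    return out_channels * (in_channels * kernel_size + 1)

def dense_params(in_features: int, out_features: int) -> int:
    """A separate weight for every input/output pair, plus biases."""
    return out_features * (in_features + 1)

length, kernel, filters = 64, 3, 8
out_positions = length - kernel + 1                    # 62 positions without padding
conv = conv1d_params(1, filters, kernel)               # 8 * (3 + 1) = 32
dense = dense_params(length, filters * out_positions)  # 496 * 65 = 32240
print(conv, dense)                                     # prints: 32 32240
```

Under these assumptions the convolutional layer needs 32 parameters against 32,240 for a fully connected layer of the same output size, and its count does not grow with the record length.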
Brief description of the drawings
Fig. 1 is a flowchart of the missing data filling method based on deep learning according to a preferred embodiment of the present invention;
Fig. 2 is a flowchart of an embodiment of the data preprocessing step in the method according to the preferred embodiment of the present invention;
Fig. 3 is a module block diagram of the missing data filling system based on deep learning according to the preferred embodiment of the present invention;
Fig. 4 is a schematic diagram of an embodiment of the data preprocessing module in the system according to the preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of an embodiment of the first network processing module in the system according to the preferred embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which is a flowchart of the missing data filling method based on deep learning according to the preferred embodiment of the present invention: as shown in Fig. 1, the method provided by this embodiment comprises the following steps.
In step S101, the process starts.
In step S102, the data preprocessing step is performed to preprocess the data set: the data set is divided into complete data subset A and missing data subset B; the data in complete data subset A is divided into training sample set a1 and test sample set a2; and part of the data in test sample set a2 is randomly deleted to form missing test sample set a3.
Next, the first network processing step is performed in steps S103 to S105: the preliminarily constructed convolutional neural network is trained with training sample set a1 and saved; the trained network is used to fill in the missing data of missing test sample set a3; and the filling result is compared with test sample set a2. When the required accuracy is not met, the network structure of the convolutional neural network is adjusted and the training and verification steps are repeated until the required accuracy is met. This step specifically comprises the following.
In step S103, the network structure of the convolutional neural network is built or adjusted. First, the network structure is preliminarily designed with seven layers in total: an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer. This yields the preliminarily constructed convolutional neural network, whose parameters are then initialized. The initialized parameters include the number of neurons in each layer, the size of each feature map, and the size of the convolution kernels in the convolutional layers, all of which depend on the dimensions of the input and output.
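A forward pass through this seven-layer structure can be sketched in NumPy to show how the feature-map sizes follow from the input size and kernel size. Everything beyond the layer order is an assumption: each record of d values is treated as a one-channel sequence, convolutions use no padding and a ReLU, pooling is non-overlapping max pooling of width 2, and the output layer returns one value per input feature. All names and default sizes are hypothetical. With kernel width k, a map of length L becomes L - k + 1 after convolution and floor((L - k + 1) / 2) after pooling, so the defaults require d >= 10.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution + ReLU. x: (C_in, L), w: (C_out, C_in, K), b: (C_out,)."""
    k = w.shape[2]
    length = x.shape[1] - k + 1
    out = np.empty((w.shape[0], length))
    for i in range(length):
        # The same kernel w is applied at every position i (weight sharing).
        out[:, i] = np.tensordot(w, x[:, i:i + k], axes=([1, 2], [0, 1])) + b
    return np.maximum(out, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    length = x.shape[1] // size
    return x[:, :length * size].reshape(x.shape[0], length, size).max(axis=2)

def init_params(d, rng, c1=8, c2=16, k=3, hidden=32):
    """Random weights sized from the input length d (d >= 10 for k = 3)."""
    l1 = (d - k + 1) // 2            # length after conv 1 + pool 1
    l2 = (l1 - k + 1) // 2           # length after conv 2 + pool 2
    def rand(*shape):
        return rng.normal(0.0, 0.1, shape)
    return {
        "w1": rand(c1, 1, k), "b1": np.zeros(c1),
        "w2": rand(c2, c1, k), "b2": np.zeros(c2),
        "w3": rand(hidden, c2 * l2), "b3": np.zeros(hidden),
        "w4": rand(d, hidden), "b4": np.zeros(d),
    }

def forward(record, p):
    x = record[np.newaxis, :]                           # input layer: (1, d)
    x = max_pool1d(conv1d(x, p["w1"], p["b1"]))         # conv 1 + pool 1
    x = max_pool1d(conv1d(x, p["w2"], p["b2"]))         # conv 2 + pool 2
    h = np.maximum(p["w3"] @ x.ravel() + p["b3"], 0.0)  # fully connected + ReLU
    return p["w4"] @ h + p["b4"]                        # output: one value per feature
```

For d = 20 the map lengths run 20, 18, 9, 7, 3, so the fully connected layer sees 16 × 3 = 48 inputs.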
In step S104, the convolutional neural network is trained with training sample set a1 and then saved. In this step, training sample set a1 is input into the convolutional neural network obtained in step S103; the network performs semi-supervised learning on the data in a1 and updates its weights automatically, and the network structure and internal parameters are saved when training is complete. The internal parameters of the convolutional neural network include at least the weights and the learning rate. The weights are generally initialized randomly to prevent them from becoming symmetric. The learning rate is chosen randomly between 0 and 1; because the parameters are updated automatically as learning proceeds, the initial value has little influence.
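Why random initialization matters can be shown on a tiny one-hidden-layer network, a hypothetical stand-in for the convolutional layers used only for illustration. With every weight set to the same constant, all hidden units compute the same value and receive the same gradient, so a gradient step keeps them identical; random initial weights break that symmetry. The learning rate is drawn uniformly from [0, 1) to mirror the text, and only the hidden-layer weights are updated in this sketch.

```python
import numpy as np

def hidden_grad(w1, w2, x, target):
    """Gradient of 0.5 * ||w2 @ tanh(w1 @ x) - target||^2 with respect to w1."""
    h = np.tanh(w1 @ x)
    err = w2 @ h - target
    d_pre = (w2.T @ err) * (1.0 - h ** 2)   # back-propagate through tanh
    return np.outer(d_pre, x)               # one row per hidden unit

rng = np.random.default_rng(0)
x, target = np.array([0.5, -1.0, 2.0]), np.array([1.0])
lr = rng.uniform(0.0, 1.0)                  # learning rate drawn from [0, 1)

w1_const, w2_const = np.full((4, 3), 0.1), np.full((1, 4), 0.1)
w1_const = w1_const - lr * hidden_grad(w1_const, w2_const, x, target)
print(np.allclose(w1_const, w1_const[0]))   # prints: True (rows still identical)

w1_rand, w2_rand = rng.normal(0, 0.1, (4, 3)), rng.normal(0, 0.1, (1, 4))
w1_rand = w1_rand - lr * hidden_grad(w1_rand, w2_rand, x, target)
print(np.allclose(w1_rand, w1_rand[0]))     # prints: False (rows differ)
```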
In step S105, missing test sample set a3 is input into the convolutional neural network obtained in step S104 to predict and fill in the missing values. After filling, the filling result of a3 is compared with test sample set a2, i.e., the sample set before the data was deleted, to determine whether the required accuracy is met:
(1) if the accuracy meets the requirement, the process proceeds to step S106;
(2) if the accuracy does not meet the requirement, the process returns to step S103 to adjust the network structure of the convolutional neural network, then re-executes training step S104 and verification step S105, iterating until the required accuracy is met.
When adjusting the network structure of the convolutional neural network, the first choice is to change the number of convolutions, i.e., to increase or decrease the number of convolutional layers. The purpose of convolution is to extract features in depth, and a pooling layer immediately follows each convolution, so layers are added and optimized two at a time (a convolutional layer together with its pooling layer). If, after three such increases, the final accuracy has improved only slightly, the network is reduced back to the number of layers at which that accuracy first appeared.
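One reading of this adjustment rule is sketched below with a hypothetical `accuracy_at(pairs)` callable that trains and validates a network with the given number of convolution-plus-pooling pairs. The sketch adds one pair (two layers) at a time, treats a gain below `min_gain` as "little", stops after three such additions in a row, and falls back to the depth of the last worthwhile gain. The threshold and the upper bound `max_pairs` are assumptions; the text does not quantify them.

```python
from typing import Callable

def choose_depth(
    accuracy_at: Callable[[int], float],
    start_pairs: int = 2,
    max_pairs: int = 10,
    patience: int = 3,
    min_gain: float = 0.01,
) -> int:
    """Grow the network one conv+pool pair at a time; keep the useful depth."""
    best_pairs, best_acc = start_pairs, accuracy_at(start_pairs)
    pairs, stalls = start_pairs, 0
    while pairs < max_pairs and stalls < patience:
        pairs += 1                       # add one convolutional + one pooling layer
        acc = accuracy_at(pairs)
        if acc > best_acc + min_gain:    # a worthwhile improvement
            best_pairs, best_acc, stalls = pairs, acc, 0
        else:                            # little or no improvement
            stalls += 1
    return best_pairs  # no deeper network tried beat it by more than min_gain
```

For example, with accuracies of 0.70, 0.80, 0.802, 0.804 and 0.803 for 2 to 6 pairs, the three small gains after 3 pairs stop the search and 3 pairs are kept.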
In step S106, the second network processing step is performed: complete data subset A is input into the convolutional neural network obtained in step S105 to obtain the refined convolutional neural network. Inputting all of the complete data gives the network all available information, so it computes more complete features from which to derive its weights, finally yielding the most effective network structure and internal weights, which are saved as the refined convolutional neural network. In other words, the weights updated in S106 are fixed and saved as the parameters of a fixed network structure for use in S107. The present invention thus first optimizes the external network structure of the convolutional neural network through the first network processing step, and then optimizes the internal weights through the second network processing step.
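Fixing and saving the refined network, and later reloading it for filling without retraining, can be sketched with NumPy's `.npz` archives. The file layout (a `depth` entry plus one array per named parameter) and the function names are hypothetical; marking the reloaded arrays read-only is one simple way to keep the weights fixed.

```python
import os
import tempfile

import numpy as np

def save_network(path, depth, params):
    """Write the chosen depth and trained weights to an .npz archive."""
    np.savez(path, depth=np.array(depth), **params)

def load_network(path):
    """Read them back as fixed (read-only) arrays; no retraining needed."""
    with np.load(path) as archive:
        depth = int(archive["depth"])
        params = {k: archive[k] for k in archive.files if k != "depth"}
    for arr in params.values():
        arr.flags.writeable = False      # freeze the weights
    return depth, params

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "refined_cnn.npz")
    save_network(path, 2, {"w1": np.ones((8, 1, 3)), "b1": np.zeros(8)})
    depth, params = load_network(path)
```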
In step S107, the missing data filling step is performed: missing data subset B is input into the refined convolutional neural network obtained in step S106 to fill in the missing values. In this step, missing data subset B is input into the final network structure.
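Step S107 only needs the network's predictions at the missing positions; the observed values in B are kept as they are. A sketch, assuming a hypothetical `predict` function that maps one record (with NaNs replaced by 0, an input convention the text does not specify) to a full-length vector of predictions.

```python
import numpy as np

def fill_missing(B, predict):
    """Return a copy of B with each NaN replaced by the matching prediction."""
    filled = B.astype(float)            # astype returns a new array; B is untouched
    for i, row in enumerate(B):
        miss = np.isnan(row)
        if miss.any():
            pred = np.asarray(predict(np.where(miss, 0.0, row)))
            filled[i, miss] = pred[miss]  # observed cells keep their values
    return filled
```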
In step S108, the process ends.
The present invention does not require retraining before each use: the trained network is used directly, and a data group containing missing values is input to predict and fill in those values. The invention selects the convolutional neural network from among deep neural networks. After the data set is preprocessed, a number of network layers matching the size of the database is created according to the size of the data set, initial parameters are set for each layer, and the training set is input into the network; the convolutional neural network then computes the relationships between data by itself and updates its parameters. Consequently, the invention is not limited by the completeness of the data set: it mines the relationships between data in depth, derives the corresponding learning rate and weights, and forms a trained network with which missing values are predicted and filled. In addition, because convolutional neural networks share weights, the number of weights is greatly reduced during training, which lowers the demands and load on computer hardware and reduces overfitting.
Referring also to Fig. 2, which is a flowchart of an embodiment of the data preprocessing step in the method according to the preferred embodiment of the present invention: as shown in Fig. 2, the data preprocessing step, i.e., the aforementioned step S102, specifically comprises the following.
In step S201, the process starts.
In step S202, data is collected to build the data set to be processed. In this step, real and accurate data is collected for processing.
In step S203, the existing data set is classified: the records with no missing values are separated as complete data subset A, and the records with missing values are separated as missing data subset B.
In step S204, 60% to 80% of the data in complete data subset A is randomly selected as training sample set a1, and the remainder serves as test sample set a2. In a preferred embodiment of the invention, step S204 randomly selects 70% of the data in complete data subset A as training sample set a1, and the remaining 30% serves as test sample set a2.
In step S205, part of the data in test sample set a2 is randomly deleted to obtain missing test sample set a3. Preferably, 20% to 40% of the data in test sample set a2 is randomly deleted to form a3. More preferably, 30% of the data is randomly deleted, and test sample set a2 with that data deleted serves as missing test sample set a3.
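Step S205 and the later comparison of the filled a3 with a2 can be sketched together: erase a fixed share of cells at random, remember which ones, and score the filled values only on those cells. The function names and the accuracy definition (the share of erased cells recovered within 5% of the true value) are assumptions; the text does not state how accuracy is measured.

```python
import numpy as np

def erase_random(a2, frac=0.3, seed=0):
    """Blank out round(frac * a2.size) randomly chosen cells; return (a3, mask)."""
    rng = np.random.default_rng(seed)
    n_erase = int(round(frac * a2.size))
    chosen = rng.choice(a2.size, size=n_erase, replace=False)
    mask = np.zeros(a2.size, dtype=bool)
    mask[chosen] = True
    mask = mask.reshape(a2.shape)
    a3 = a2.astype(float)        # a new array; a2 itself is left intact
    a3[mask] = np.nan
    return a3, mask

def masked_accuracy(filled, a2, mask, rel_tol=0.05):
    """Share of erased cells whose filled value is within rel_tol of the truth."""
    truth, guess = a2[mask], filled[mask]
    return float(np.mean(np.abs(guess - truth) <= rel_tol * np.abs(truth)))
```

Scoring only the erased cells keeps the untouched cells, which a filler copies through unchanged, from inflating the accuracy.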
In step S206, the process ends.
The present invention also provides a storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the steps of the missing data filling method based on deep learning described above, for example steps S101 to S108.
Referring also to Fig. 3, which is a module block diagram of the missing data filling system based on deep learning according to the preferred embodiment of the present invention: as shown in Fig. 3, the missing data filling system 10 based on deep learning provided by this embodiment comprises at least a data preprocessing module 100, a first network processing module 200, a second network processing module 300 and a missing data filling module 400.
The data preprocessing module 100 is configured to preprocess the data set, including: dividing the data set into complete data subset A and missing data subset B; dividing the data in complete data subset A into training sample set a1 and test sample set a2; and randomly deleting part of the data in test sample set a2 to form missing test sample set a3.
The first network processing module 200 is connected to the data preprocessing module 100 and is configured to train and save the preliminarily constructed convolutional neural network using training sample set a1, fill in the missing data of missing test sample set a3 using the trained network, and compare the filling result with test sample set a2; when the required accuracy is not met, it adjusts the network structure of the convolutional neural network and repeats the training and verification steps until the required accuracy is met.
The second network processing module 300 is connected to both the data preprocessing module 100 and the first network processing module 200, and is configured to input complete data subset A into the convolutional neural network obtained by the first network processing module 200 to obtain the refined convolutional neural network.
The missing data filling module 400 is connected to both the data preprocessing module 100 and the second network processing module 300, and is configured to input missing data subset B into the refined convolutional neural network obtained by the second network processing module 300 to fill in the missing values.
Referring also to Fig. 4, which is a schematic diagram of an embodiment of the data preprocessing module in the system according to the preferred embodiment of the present invention: as shown in Fig. 4, the data preprocessing module 100 specifically comprises a data collection unit 110, a first classification unit 120, a second classification unit 130 and a data deletion unit 140.
The data collection unit 110 is configured to collect data to build the data set to be processed.
The first classification unit 120 is connected to the data collection unit 110 and is configured to classify the data set, separating the records with no missing values as complete data subset A and the records with missing values as missing data subset B.
The second classification unit 130 is connected to the first classification unit 120 and is configured to randomly select 60% to 80% of the data in complete data subset A as training sample set a1, the remaining data in complete data subset A serving as test sample set a2. In a preferred embodiment of the present invention, the second classification unit 130 randomly selects 70% of the data in complete data subset A as training sample set a1, and the remaining 30% serves as test sample set a2.
The data deletion unit 140 is connected to the second classification unit 130 and is configured to randomly delete part of the data in test sample set a2 to form missing test sample set a3.
Referring also to Fig. 5, which is a schematic diagram of an embodiment of the first network processing module in the system according to the preferred embodiment of the present invention: as shown in Fig. 5, the first network processing module 200 specifically comprises a network construction and adjustment unit 210, a network training unit 220 and a comparison and iteration unit 230.
The network construction and adjustment unit 210 is configured to build a convolutional neural network composed of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer, and to initialize its parameters. The network construction and adjustment unit 210 can also be started by the comparison and iteration unit 230 to adjust the network structure; when adjusting the network structure of the convolutional neural network, the first choice is to change the number of convolutions, i.e., to increase or decrease the number of convolutional layers.
The network training unit 220 is connected to the network construction and adjustment unit 210 and is configured to input training sample set a1 into the convolutional neural network obtained by the network construction and adjustment unit 210; the network performs semi-supervised learning on the data in training sample set a1 and updates its weights automatically, and the network structure and internal parameters are saved when training is complete.
The comparison and iteration unit 230 is connected to the network training unit 220 and is configured to input missing test sample set a3 into the convolutional neural network obtained by the network training unit 220 to predict and fill in the missing values, and to compare the filling result of a3 with test sample set a2, i.e., the data before deletion. If the accuracy meets the requirement, the second network processing module 300 is started; otherwise, the network construction and adjustment unit 210 is started to adjust the network structure of the convolutional neural network, and the network training unit 220 and the comparison and iteration unit 230 are started again to iterate until the accuracy meets the requirement.
In summary, the present invention uses a convolutional neural network. Compared with traditional statistical methods, it applies semi-supervised learning to learn the relationships between data items and extracts features from the data implicitly, so it is not limited by the completeness of the existing data: whatever the properties of the training set, deeper features can be found, learned, and tested. Compared with a shallow artificial neural network, a convolutional neural network shares neuron weights within each feature layer, which reduces the number of parameters and the complexity of the network and also avoids complex, repeated backward residual computations. The present invention addresses the problem of filling missing data in databases with higher accuracy and efficiency, restoring missing data more faithfully and more quickly.
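The parameter saving from weight sharing can be illustrated with a back-of-the-envelope count. The sizes below (64 input features, 8 feature maps, kernel width 3) are hypothetical and not taken from the patent; the dense layer is sized to produce the same number of activations as the convolution would with 'same' padding.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer: every output sees every input."""
    return n_in * n_out + n_out

def conv1d_params(in_channels, out_channels, kernel):
    """Weights plus biases of a 1-D convolutional layer; each kernel is reused at every position."""
    return in_channels * out_channels * kernel + out_channels

n_features, maps, kernel = 64, 8, 3
print(dense_params(n_features, maps * n_features))  # 33280
print(conv1d_params(1, maps, kernel))                # 32
```

The convolutional count does not depend on the record length at all, which is why sharing weights across positions keeps the network small as the number of features grows.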
Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A missing data filling method based on deep learning, characterized in that it comprises the following steps:
(1) preprocessing a data set: dividing the data set into a complete data subset and a missing data subset, dividing the data in the complete data subset into a training sample set and a test sample set, and randomly deleting part of the data in the test sample set to form a missing test sample set;
(2) training an initially constructed convolutional neural network with the training sample set and saving it; using the trained convolutional neural network to fill the missing data in the missing test sample set, and comparing the filled result with the test sample set; when the precision requirement is not met, adjusting the network structure of the convolutional neural network and repeating the above training and verification steps until the precision requirement is met;
(3) inputting the complete data subset into the convolutional neural network obtained in step (2) to obtain a refined convolutional neural network;
(4) inputting the missing data subset into the refined convolutional neural network obtained in step (3) to fill in the missing values.
2. The missing data filling method based on deep learning according to claim 1, characterized in that step (1) comprises:
(1-1) collecting data to build the data set to be processed;
(1-2) classifying the data set, separating the complete, error-free data as the complete data subset and separating the data with missing values as the missing data subset;
(1-3) randomly selecting 60% to 80% of the data in the complete data subset as the training sample set, the remainder serving as the test sample set;
(1-4) randomly deleting part of the data in the test sample set, the result serving as the missing test sample set.
3. The missing data filling method based on deep learning according to claim 2, characterized in that, in step (1-3), 70% of the data in the complete data subset are randomly selected as the training sample set and the remaining 30% serve as the test sample set.
4. The missing data filling method based on deep learning according to any one of claims 1 to 3, characterized in that step (2) specifically comprises:
(2-1) building a convolutional neural network composed of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer, and initializing its parameters;
(2-2) inputting the training sample set into the convolutional neural network, which performs semi-supervised learning on the data in the training sample set and automatically updates its weights; the network structure and internal parameters are saved once training is complete;
(2-3) inputting the missing test sample set into the convolutional neural network to predict and fill in the missing values, and comparing the filled result of the missing test sample set with the test sample set; if the accuracy meets the precision requirement, step (3) is performed; otherwise, the method returns to step (2-1) to adjust the network structure of the convolutional neural network.
5. The missing data filling method based on deep learning according to any one of claims 1 to 3, characterized in that adjusting the network structure of the convolutional neural network in step (3) consists of increasing or decreasing the number of convolutions in the convolutional neural network.
6. A storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor to execute the steps of the missing data filling method based on deep learning according to any one of claims 1 to 5.
7. A missing data filling system based on deep learning, characterized in that it comprises:
a data preprocessing module, configured to preprocess a data set by dividing the data set into a complete data subset and a missing data subset, dividing the data in the complete data subset into a training sample set and a test sample set, and randomly deleting part of the data in the test sample set to form a missing test sample set;
a first network processing module, configured to train an initially constructed convolutional neural network with the training sample set and save it, to fill the missing data in the missing test sample set using the trained convolutional neural network, and to compare the filled result with the test sample set, adjusting the network structure of the convolutional neural network and repeating the above training and verification steps when the precision requirement is not met, until it is met;
a second network processing module, configured to input the complete data subset into the convolutional neural network obtained by the first network processing module to obtain a refined convolutional neural network;
a missing data filling module, configured to input the missing data subset into the refined convolutional neural network to fill in the missing values.
8. The missing data filling system based on deep learning according to claim 7, characterized in that the data preprocessing module comprises:
a data collection module, configured to collect data to build the data set to be processed;
a first classification unit, configured to classify the data set, separating the complete, error-free data as the complete data subset and the data with missing values as the missing data subset;
a second classification unit, configured to randomly select 60% to 80% of the data in the complete data subset as the training sample set, the remainder serving as the test sample set;
a data deletion unit, configured to randomly delete part of the data in the test sample set, the result serving as the missing test sample set.
9. The missing data filling system based on deep learning according to claim 8, characterized in that the second classification unit randomly selects 70% of the data in the complete data subset as the training sample set, the remaining 30% serving as the test sample set.
10. The missing data filling system based on deep learning according to any one of claims 7 to 9, characterized in that the first network processing module specifically comprises:
a network construction and adjustment unit, configured to build a convolutional neural network composed of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer, and to initialize its parameters;
a network training unit, configured to input the training sample set into the convolutional neural network, which performs semi-supervised learning on the data in the training sample set and automatically updates its weights; the network structure and internal parameters are saved once training is complete;
a comparison iteration unit, configured to input the missing test sample set into the convolutional neural network to predict and fill in the missing values, and to compare the filled result of the missing test sample set with the test sample set; if the accuracy meets the precision requirement, the second network processing module is started; otherwise, the network construction and adjustment unit is started to adjust the network structure of the convolutional neural network.
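The preprocessing recited in claims 2, 3, 8 and 9 can be sketched with NumPy, assuming the data set is a 2-D float array in which `NaN` marks a missing cell. The 70% default follows claims 3 and 9. The function name `preprocess` and the per-cell erasure probability `erase_frac` are assumptions, since the claims only state that part of the test data is deleted at random. The `erased` mask is returned so that the filled values can later be scored against a2.

```python
import numpy as np

def preprocess(data, train_frac=0.7, erase_frac=0.1, seed=0):
    """Steps (1-2) to (1-4): split rows, split the complete subset, erase cells of the test set."""
    rng = np.random.default_rng(seed)
    has_missing = np.isnan(data).any(axis=1)
    complete, missing = data[~has_missing], data[has_missing]  # (1-2) complete / missing subsets
    order = rng.permutation(len(complete))                     # random selection for (1-3)
    n_train = int(round(train_frac * len(complete)))
    a1 = complete[order[:n_train]]                             # training sample set
    a2 = complete[order[n_train:]]                             # test sample set (ground truth)
    erased = rng.random(a2.shape) < erase_frac                 # (1-4) cells chosen for deletion
    a3 = np.where(erased, np.nan, a2)                          # missing test sample set
    return complete, missing, a1, a2, a3, erased

data = np.arange(36, dtype=float).reshape(12, 3)
data[10, 1] = np.nan
data[11, 0] = np.nan
complete, missing, a1, a2, a3, erased = preprocess(data)
print(len(a1), len(a2), len(missing))  # 7 3 2
```

Erasing cells independently is only one reading of "randomly deleting part of the data"; erasing whole columns or a fixed count per row would fit the claim wording equally well.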
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710358297.2A CN107273429B (en) | 2017-05-19 | 2017-05-19 | A kind of Missing Data Filling method and system based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107273429A | 2017-10-20 |
CN107273429B | 2018-04-13 |
Family ID: 60065112
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710358297.2A Active CN107273429B (en) | 2017-05-19 | 2017-05-19 | A kind of Missing Data Filling method and system based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107273429B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103177088A (en) * | 2013-03-08 | 2013-06-26 | 北京理工大学 | Biomedicine missing data compensation method |
CN103246702A (en) * | 2013-04-02 | 2013-08-14 | 大连理工大学 | Industrial sequential data missing filling method based on sectional state displaying |
CN103544218A (en) * | 2013-09-29 | 2014-01-29 | 广西师范大学 | Nearest neighbor filling method of non-fixed k values |
CN104751229A (en) * | 2015-04-13 | 2015-07-01 | 辽宁大学 | Bearing fault diagnosis method capable of recovering missing data of back propagation neural network estimation values |
Non-Patent Citations (4)
Title |
---|
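WEI WEI et al.: "A generic neural network approach for filling missing data in data mining", 2003 IEEE International Conference on Systems, Man and Cybernetics * |
卜范玉 et al.: "Incomplete big data filling algorithm based on deep learning", 微电子学与计算机 (Microelectronics & Computer) * |
胡玄子 et al.: "Research on methods of filling missing data in data processing", 湖北工业大学学报 (Journal of Hubei University of Technology) * |
郑斌: "Incomplete big data filling and mining algorithm based on an improved genetic algorithm", 微电子学与计算机 (Microelectronics & Computer) * |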
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197706A (en) * | 2017-11-27 | 2018-06-22 | 华南师范大学 | Incomplete data deep learning neural network method, device, computer equipment and storage medium |
CN108615096A (en) * | 2018-05-10 | 2018-10-02 | 平安科技(深圳)有限公司 | Server, the processing method of Financial Time Series and storage medium |
CN111488551A (en) * | 2019-01-28 | 2020-08-04 | 斯特拉德视觉公司 | Method and device for verifying integrity of convolution operation |
CN111488551B (en) * | 2019-01-28 | 2023-12-05 | 斯特拉德视觉公司 | Method and device for verifying integrity of convolution operation |
CN110472817A (en) * | 2019-07-03 | 2019-11-19 | 西北大学 | A kind of XGBoost of combination deep neural network integrates credit evaluation system and its method |
WO2021016995A1 (en) * | 2019-08-01 | 2021-02-04 | 深圳大学 | Data processing method and apparatus, computer device, and storage medium |
CN110232564A (en) * | 2019-08-02 | 2019-09-13 | 南京擎盾信息科技有限公司 | A kind of traffic accident law automatic decision method based on multi-modal data |
WO2021169116A1 (en) * | 2020-02-29 | 2021-09-02 | 平安科技(深圳)有限公司 | Intelligent missing data filling method, apparatus and device, and storage medium |
CN111553463A (en) * | 2020-04-17 | 2020-08-18 | 东南大学 | Method for estimating throughput of wireless access point based on deep learning and network parameters |
CN111553463B (en) * | 2020-04-17 | 2022-11-18 | 东南大学 | Method for estimating throughput of wireless access point based on deep learning and network parameters |
CN111597175A (en) * | 2020-05-06 | 2020-08-28 | 天津大学 | Filling method for missing value of sensor fusing spatio-temporal information |
CN111597175B (en) * | 2020-05-06 | 2023-06-02 | 天津大学 | Filling method of sensor missing value fusing time-space information |
CN111966740A (en) * | 2020-08-24 | 2020-11-20 | 安徽思环科技有限公司 | Water quality fluorescence data feature extraction method based on deep learning |
CN112164468A (en) * | 2020-10-09 | 2021-01-01 | 北京航空航天大学 | Method for processing missing data of pregnancy examination data |
CN112750530A (en) * | 2021-01-05 | 2021-05-04 | 上海梅斯医药科技有限公司 | Model training method, terminal device and storage medium |
CN113657717A (en) * | 2021-07-15 | 2021-11-16 | 福州大学至诚学院 | Method for evaluating robustness of electric commerce enterprise credit early warning model based on missing value sample |
CN113515896A (en) * | 2021-08-06 | 2021-10-19 | 红云红河烟草(集团)有限责任公司 | Data missing value filling method for real-time cigarette acquisition |
CN113515896B (en) * | 2021-08-06 | 2022-08-09 | 红云红河烟草(集团)有限责任公司 | Data missing value filling method for real-time cigarette acquisition |
CN113780666A (en) * | 2021-09-15 | 2021-12-10 | 湖北天天数链技术有限公司 | Missing value prediction method and device and readable storage medium |
CN113780666B (en) * | 2021-09-15 | 2024-03-22 | 湖北天天数链技术有限公司 | Missing value prediction method and device and readable storage medium |
CN115034039A (en) * | 2022-05-13 | 2022-09-09 | 西北工业大学 | PIV flow field data filling method based on convolutional neural network |
CN115034039B (en) * | 2022-05-13 | 2024-09-06 | 西北工业大学 | PIV flow field data deficiency supplementing method based on convolutional neural network |
CN115223709A (en) * | 2022-07-26 | 2022-10-21 | 内蒙古卫数数据科技有限公司 | Missing value filling migration learning method based on disease distribution diagnosis neural network model |
CN115223709B (en) * | 2022-07-26 | 2024-01-23 | 内蒙古卫数数据科技有限公司 | Deficiency value filling migration learning method based on cloth disease diagnosis neural network model |
CN115238807A (en) * | 2022-07-29 | 2022-10-25 | 中用科技有限公司 | AMC detection method based on artificial intelligence |
CN115238807B (en) * | 2022-07-29 | 2024-02-27 | 中用科技有限公司 | AMC detection method based on artificial intelligence |
CN116831620A (en) * | 2023-07-13 | 2023-10-03 | 逸超医疗科技(北京)有限公司 | Doppler missing data filling method based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN107273429B (en) | 2018-04-13 |
Similar Documents
Publication | Title |
---|---|
CN107273429B (en) | A kind of Missing Data Filling method and system based on deep learning |
Mienye et al. | Prediction performance of improved decision tree-based algorithms: a review |
CN110363344A (en) | Probability integral parameter prediction method based on MIV-GP algorithm optimization BP neural network |
CN110473592B (en) | Multi-view human synthetic lethal gene prediction method |
CN108009674A (en) | Air PM2.5 concentration prediction methods based on CNN and LSTM fused neural networks |
CN112700434B (en) | Medical image classification method and classification device thereof |
Salama et al. | Utilizing multiple pheromones in an ant-based algorithm for continuous-attribute classification rule discovery |
CN107222333A (en) | A kind of network node safety situation evaluation method based on BP neural network |
CN106250461A (en) | A kind of algorithm utilizing gradient lifting decision tree to carry out data mining based on Spark framework |
CN107679549A (en) | Generate the method and system of the assemblage characteristic of machine learning sample |
GB2608540A (en) | Personalized automated machine learning |
CN112765415A (en) | Link prediction method based on relational content joint embedding convolution neural network |
CN114117787A (en) | Short-term wind power prediction method based on SSA (simple sequence analysis) optimization BP (back propagation) neural network |
Tiruneh et al. | Feature selection for construction organizational competencies impacting performance |
Tandekar et al. | A Review on Various Plant Disease Detection Using Image Processing |
CN114170446A (en) | Temperature and brightness characteristic extraction method based on deep fusion neural network |
CN117152528A (en) | Insulator state recognition method, insulator state recognition device, insulator state recognition apparatus, insulator state recognition program, and insulator state recognition program |
CN107798331A (en) | From zoom image sequence characteristic extracting method and device |
CN111222529A (en) | GoogLeNet-SVM-based sewage aeration tank foam identification method |
CN110334869A (en) | A kind of mangrove forest ecological health forecast training method based on dynamic colony optimization algorithm |
CN109934352A (en) | The automatic evolvement method of model of mind |
CN115660221A (en) | Oil and gas reservoir economic recoverable reserve assessment method and system based on hybrid neural network |
Mohanty et al. | Modeling the axial capacity of bored piles using multi-objective feature selection, functional network and multivariate adaptive regression spline |
CN104463205B (en) | Data classification method based on chaos depth wavelet network |
Balamurugan et al. | Artificial Intelligence Based Smart Farming and Data Collection Using Deep Learning |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |