CN107273429A - Missing data filling method and system based on deep learning - Google Patents


Info

Publication number
CN107273429A
Authority
CN
China
Prior art keywords
data
missing
convolutional neural
neural networks
test sample
Prior art date
Legal status
Granted
Application number
CN201710358297.2A
Other languages
Chinese (zh)
Other versions
CN107273429B (en)
Inventor
王宏志
王艺蒙
赵志强
孙旭冉
Current Assignee
Da Da Data Industry Co Ltd
Original Assignee
Da Da Data Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Da Da Data Industry Co Ltd
Priority to CN201710358297.2A
Publication of CN107273429A
Application granted
Publication of CN107273429B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a missing data filling method and system based on deep learning. The method comprises the following steps. First, a data set is preprocessed. Next, an initially constructed convolutional neural network is trained with a training sample set and saved. The trained network then fills the missing data in a missing test sample set, and the filled result is compared with the test sample set. When the accuracy requirement is not met, the network structure of the convolutional neural network is adjusted and the training and verification steps are iterated until the requirement is met. The complete data subset is then input into the convolutional neural network to obtain a refined convolutional neural network. Finally, the missing data subset is input into the refined network to fill in the missing values. The invention addresses the problem of filling missing data in databases. It achieves higher accuracy and efficiency and restores missing data more faithfully and quickly.

Description

Missing data filling method and system based on deep learning
Technical field
The present invention relates to the field of computer technology, and more particularly to a missing data filling method and system based on deep learning.
Background art
Information technology has come into wide use across all industries and has driven the rapid development of both new and traditional fields. Data is the resource this technology depends on, and it is continuously collected and mined, so data volumes are growing at an astonishing rate. Such huge volumes make data management harder. In the real world, data may go missing for many reasons: omissions during data entry, incorrect measurement methods, limited collection conditions, or deletion for violating constraints. A missing value is not merely a gap in the information. More importantly, it affects downstream work such as data mining and statistical analysis. Common ways to handle missing values include the following:
- deleting the tuples that contain missing items;
- treating missing values as a special value;
- filling in the missing data.
The missing rate of real databases is generally high, and the missingness pattern is usually random. The third approach is therefore the more reasonable one.
Several missing data filling methods for different kinds of data have already been proposed. They are mainly based on statistical methods, such as the expectation-maximization (EM) algorithm and importance sampling. The EM algorithm has two steps. The first computes the expectation (E), that is, it fills the missing values according to the current parameters. The second performs maximization (M), that is, it finds the maximum-likelihood parameters given the current data set. The two steps alternate until convergence. The complexity of this algorithm depends on the number of missing variables and on the probability density function. Another common family of filling algorithms is regression, including linear regression, multiple regression and logistic regression. These methods fit a response variable to a number of explanatory variables according to the correlations between the data. A further approximate Bayesian method based on sampling fills m missing values by drawing m values with replacement from the observed data.
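For reference, the two simplest baselines named above can be sketched in plain Python. This is a minimal illustration that assumes a single numeric explanatory variable and uses `None` to mark a missing value. The function names are hypothetical, and EM and importance sampling are not shown.

```python
import random


def regression_impute(xs, ys):
    """Fill None entries of ys using a least-squares line y = a + b*x fitted on complete pairs."""
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    return [a + b * x if y is None else y for x, y in zip(xs, ys)]


def sample_impute(values, rng):
    """Fill each None by drawing, with replacement, from the observed values."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


filled = regression_impute([1, 2, 3, 4], [2, 4, None, 8])
sampled = sample_impute([1.0, None, 3.0, None], random.Random(0))
```

The regression fill uses the linear relationship between the variables. The sampling fill preserves the observed distribution but, as the text notes next, ignores correlations between variables.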
The EM algorithm described above predicts the value of a missing variable using a model fitted on the complete data. The quality of the fit depends on the choice of independent variables and on how complete the training set is, so the filling result is strongly affected by the available data. The Bayesian method draws observed values to fill missing ones. It is simple and largely preserves the original distribution of the data, but it ignores the correlations between variables. Statistical methods also require features to be extracted explicitly in advance as the basis for probabilistic prediction, and they are poor at capturing the internal relationships between data.
Summary of the invention
The technical problem addressed by the present invention concerns two defects of prior-art missing data filling methods: they depend heavily on the completeness of the existing data, and they cannot discover the deep relationships between data. To address these defects, the invention provides a missing data filling method and system based on deep learning. The method exploits the ability of deep neural networks to mine the internal features and correlations of data in depth, which improves both filling accuracy and filling efficiency.
A first aspect of the present invention provides a missing data filling method based on deep learning, comprising the following steps:
(1) preprocessing a data set: dividing the data set into a complete data subset and a missing data subset, dividing the data in the complete data subset into a training sample set and a test sample set, and randomly deleting part of the data in the test sample set to obtain a missing test sample set;
(2) training the initially constructed convolutional neural network with the training sample set and saving it; using the trained network to fill the missing data in the missing test sample set and comparing the filled result with the test sample set; and, when the accuracy requirement is not met, adjusting the network structure of the convolutional neural network and iterating the above training and verification steps until the requirement is met;
(3) inputting the complete data subset into the convolutional neural network obtained in step (2) to obtain a refined convolutional neural network;
(4) inputting the missing data subset into the refined convolutional neural network obtained in step (3) to fill in the missing values.
In the deep-learning-based missing data filling method according to the present invention, step (1) comprises:
(1-1) collecting data to build the data set to be processed;
(1-2) classifying the data set: separating the data with no missing values as the complete data subset and the data with missing values as the missing data subset;
(1-3) randomly selecting 60% to 80% of the data in the complete data subset as the training sample set, the remainder serving as the test sample set;
(1-4) randomly deleting part of the data in the test sample set to obtain the missing test sample set.
In the method according to the present invention, step (1-3) randomly selects 70% of the data in the complete data subset as the training sample set, and the remaining 30% serves as the test sample set.
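Steps (1-2) and (1-3) can be sketched as a minimal example. It assumes each record is a list of values with `None` marking a missing cell. `split_dataset` and its `seed` parameter are illustrative names, not part of the patent.

```python
import random


def split_dataset(records, train_ratio=0.7, seed=0):
    """Steps (1-2)/(1-3): separate complete and incomplete records, then split the complete ones."""
    complete = [r for r in records if all(v is not None for v in r)]
    missing = [r for r in records if any(v is None for v in r)]
    shuffled = complete[:]
    random.Random(seed).shuffle(shuffled)          # random selection of training rows
    n_train = round(len(shuffled) * train_ratio)   # 70% by default, per the preferred embodiment
    return shuffled[:n_train], shuffled[n_train:], missing


rows = [[i, i * 2.0] for i in range(10)] + [[10, None], [None, 5.0]]
train, test, missing = split_dataset(rows)
```

In this sketch the records with missing cells (subset B) are set aside untouched. Only complete records (subset A) are shuffled into the training and test sets.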
In the method according to the present invention, step (2) specifically comprises:
(2-1) building a convolutional neural network and initializing its parameters; the network consists of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer;
(2-2) inputting the training sample set into the convolutional neural network, which performs semi-supervised learning on the data in the training sample set and automatically updates its weights; the network structure and internal parameters are saved once training is complete;
(2-3) inputting the missing test sample set into the convolutional neural network to predict and fill the missing values, and comparing the filled result with the test sample set; if the accuracy meets the requirement, proceeding to step (3); otherwise, returning to step (2-1) to adjust the network structure of the convolutional neural network.
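The patent does not define how "accuracy" is computed in step (2-3). One plausible reading, sketched below as an assumption, scores only the cells that were deleted in step (1-4). A filled value counts as correct when it lies within a tolerance of the true value. The names, the tolerance and the 0.9 threshold are hypothetical.

```python
def filling_accuracy(truth, masked, filled, tol=1e-6):
    """Share of erased cells (None in masked) whose filled value is within tol of the truth."""
    hits = total = 0
    for t_row, m_row, f_row in zip(truth, masked, filled):
        for t, m, f in zip(t_row, m_row, f_row):
            if m is None:                 # only score cells that were deleted in step (1-4)
                total += 1
                hits += abs(f - t) <= tol
    return hits / total if total else 1.0


def meets_requirement(truth, masked, filled, required=0.9, tol=1e-6):
    """Decision in step (2-3): proceed to step (3) if True, otherwise adjust and retrain."""
    return filling_accuracy(truth, masked, filled, tol) >= required


truth = [[1.0, 2.0], [3.0, 4.0]]
masked = [[1.0, None], [None, 4.0]]
filled = [[1.0, 2.0], [3.5, 4.0]]
```

For continuous attributes, a relative error or RMSE would be an equally reasonable choice of metric.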
In the method according to the present invention, adjusting the network structure of the convolutional neural network in step (2) means increasing or decreasing the number of convolutions of the convolutional neural network.
A second aspect of the present invention provides a storage medium storing a plurality of instructions. The instructions are adapted to be loaded by a processor to perform the steps of the deep-learning-based missing data filling method described above.
A third aspect of the present invention provides a missing data filling system based on deep learning, comprising:
a data preprocessing module for preprocessing a data set: dividing the data set into a complete data subset and a missing data subset, dividing the data in the complete data subset into a training sample set and a test sample set, and randomly deleting part of the data in the test sample set to obtain a missing test sample set;
a first network processing module for training the initially constructed convolutional neural network with the training sample set and saving it; using the trained network to fill the missing data in the missing test sample set; comparing the filled result with the test sample set; and, when the accuracy requirement is not met, adjusting the network structure of the convolutional neural network and iterating the above training and verification steps until the requirement is met;
a second network processing module for inputting the complete data subset into the convolutional neural network obtained by the first network processing module to obtain a refined convolutional neural network;
a missing data filling module for inputting the missing data subset into the refined convolutional neural network to fill in the missing values.
In the system according to the present invention, the data preprocessing module comprises:
a data collection module for collecting data to build the data set to be processed;
a first classification unit for classifying the data set: separating the data with no missing values as the complete data subset and the data with missing values as the missing data subset;
a second classification unit for randomly selecting 60% to 80% of the data in the complete data subset as the training sample set, the remainder serving as the test sample set;
a data deletion unit for randomly deleting part of the data in the test sample set to obtain the missing test sample set.
In the system according to the present invention, the second classification unit randomly selects 70% of the data in the complete data subset as the training sample set, and the remaining 30% serves as the test sample set.
In the system according to the present invention, the first network processing module specifically comprises:
a network construction and adjustment unit for building a convolutional neural network and initializing its parameters; the network consists of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer;
a network training unit for inputting the training sample set into the convolutional neural network, which performs semi-supervised learning on the data in the training sample set and automatically updates its weights; the network structure and internal parameters are saved once training is complete;
a comparison and iteration unit for inputting the missing test sample set into the convolutional neural network to predict and fill the missing values, and comparing the filled result with the test sample set; if the accuracy meets the requirement, the unit starts the second network processing module; otherwise, it starts the network construction and adjustment unit to adjust the network structure of the convolutional neural network.
The deep-learning-based missing data filling method and system of the present invention have the following beneficial effects. The invention uses a convolutional neural network, a type of deep neural network. After the data set is preprocessed, a network is created whose number of layers suits the size of the database, and the initial parameters of each layer are set. When the training set is input into the network, the neural network computes the relationships between the data by itself and updates its parameters. As a result, the invention is not limited by the completeness of the data set. It can mine the relationships between data in depth and arrive at the corresponding learning rate and weights, forming a trained network that predicts and fills the missing values. In addition, convolutional neural networks share weights, so the number of weights in training is greatly reduced. This lowers the demands and load on computer hardware and reduces overfitting.
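The weight-sharing claim can be checked with simple arithmetic. The sketch below uses arbitrary layer sizes chosen only for illustration. It counts the trainable parameters of a fully connected layer and of a 1-D convolutional layer that produce the same number of outputs.

```python
def dense_params(n_in, n_out):
    """Fully connected: one weight per input-output pair plus one bias per output."""
    return n_in * n_out + n_out


def conv1d_params(kernel_size, in_channels, out_channels):
    """Convolution: each kernel is reused at every position, so the count ignores input length."""
    return kernel_size * in_channels * out_channels + out_channels


# Hypothetical sizes: a 100-value input mapped to 8 feature maps of length 96 (kernel 5, no padding).
dense = dense_params(100, 8 * 96)
conv = conv1d_params(5, 1, 8)
```

With these assumed sizes, the dense layer needs 77,568 parameters while the convolutional layer needs 48, because each kernel's weights are shared across all input positions.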
Brief description of the drawings
Fig. 1 is a flowchart of the deep-learning-based missing data filling method according to a preferred embodiment of the present invention;
Fig. 2 is a flowchart of one implementation of the data preprocessing step of the method according to a preferred embodiment of the present invention;
Fig. 3 is a module block diagram of the deep-learning-based missing data filling system according to a preferred embodiment of the present invention;
Fig. 4 is a schematic diagram of one implementation of the data preprocessing module of the system according to a preferred embodiment of the present invention;
Fig. 5 is a schematic diagram of one implementation of the first network processing module of the system according to a preferred embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a flowchart of the deep-learning-based missing data filling method according to a preferred embodiment of the present invention. As shown in Fig. 1, the method provided by this embodiment comprises the following steps:
In step S101, the flow starts;
In step S102, the data preprocessing step is performed. The data set is divided into a complete data subset A and a missing data subset B. The data in A is divided into a training sample set a1 and a test sample set a2. Part of the data in a2 is randomly deleted to obtain a missing test sample set a3.
Next, the first network processing step is performed in steps S103 to S105. The initially constructed convolutional neural network is trained with the training sample set a1 and saved. The trained network then fills the missing data in the missing test sample set a3, and the filled result is compared with the test sample set a2. When the accuracy requirement is not met, the network structure is adjusted and the training and verification steps are iterated until the requirement is met. This step specifically comprises:
In step S103, the network structure of the convolutional neural network is built or adjusted. First, a preliminary network structure is designed with seven layers in total: an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer. This yields the initially constructed convolutional neural network, whose parameters are then initialized. The initialized parameters include the number of neurons in each layer, the size of each feature map and the size of the convolution kernels in the convolutional layers. These parameters depend on the dimensions of the input and output.
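The seven-layer structure of step S103 can be illustrated with a forward pass in plain Python. The patent fixes neither the input layout nor the layer sizes, so this sketch makes several assumptions. Each record is treated as a one-channel 1-D sequence of attributes, activations are ReLU, pooling is 2-wide max pooling, and the defaults `c1`, `c2`, `k` and `hidden` are hypothetical. Training is not shown.

```python
import random


def rand_tensor(rng, shape):
    """Nested lists of small random weights with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-0.1, 0.1) for _ in range(shape[0])]
    return [rand_tensor(rng, shape[1:]) for _ in range(shape[0])]


def conv1d_relu(x, kernels, biases):
    """Valid 1-D convolution plus ReLU; x is [in_ch][length], kernels is [out_ch][in_ch][k]."""
    k = len(kernels[0][0])
    out_len = len(x[0]) - k + 1
    return [[max(0.0, b + sum(kern[c][j] * x[c][t + j]
                               for c in range(len(x)) for j in range(k)))
             for t in range(out_len)]
            for kern, b in zip(kernels, biases)]


def max_pool(x, size=2):
    """Non-overlapping max pooling along each channel."""
    return [[max(ch[t:t + size]) for t in range(0, len(ch) - size + 1, size)] for ch in x]


def dense(v, weights, biases):
    """Affine layer: one output per weight row."""
    return [b + sum(w * xi for w, xi in zip(row, v)) for row, b in zip(weights, biases)]


def make_params(n_features, c1=4, c2=8, k=3, hidden=16, seed=0):
    rng = random.Random(seed)
    p1 = (n_features - k + 1) // 2          # length after conv 1 + pool 1
    p2 = (p1 - k + 1) // 2                  # length after conv 2 + pool 2
    return {
        "k1": rand_tensor(rng, (c1, 1, k)), "b1": [0.0] * c1,
        "k2": rand_tensor(rng, (c2, c1, k)), "b2": [0.0] * c2,
        "w_fc": rand_tensor(rng, (hidden, c2 * p2)), "b_fc": [0.0] * hidden,
        "w_out": rand_tensor(rng, (n_features, hidden)), "b_out": [0.0] * n_features,
    }


def forward(record, p):
    x = [record]                                             # input layer: one channel
    x = max_pool(conv1d_relu(x, p["k1"], p["b1"]))           # conv 1 -> pool 1
    x = max_pool(conv1d_relu(x, p["k2"], p["b2"]))           # conv 2 -> pool 2
    flat = [v for ch in x for v in ch]
    h = [max(0.0, v) for v in dense(flat, p["w_fc"], p["b_fc"])]  # fully connected layer
    return dense(h, p["w_out"], p["b_out"])                  # output layer: one value per attribute


params = make_params(n_features=20)
prediction = forward([0.5] * 20, params)
```

`make_params` derives the fully connected layer's input width from the input length, kernel size and pooling. This is one concrete sense in which layer parameters "depend on the dimensions of the input and output".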
In step S104, the convolutional neural network is trained with the training sample set a1 and saved. The training sample set a1 is input into the network obtained in step S103. The network performs semi-supervised learning on the data in a1 and automatically updates its weights, and the network structure and internal parameters are saved once training is complete. The internal parameters of the network include at least the weights and the learning rate. The weights are generally initialized randomly to prevent them from becoming symmetric. The learning rate is chosen at random between 0 and 1. Because the parameters are updated automatically through learning in later stages, the initial value has little influence.
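A toy network (not the patent's CNN) shows why random initialization matters. If two hidden units start with identical weights, they receive identical gradients and stay identical, so the second unit adds nothing. The sketch below assumes a linear two-unit model with loss 0.5 * error**2 and a fixed learning rate of 0.1. All names are illustrative.

```python
import random


def train_step(w_hidden, w_out, x, target, lr):
    """One gradient-descent step for y = sum_i w_out[i] * (w_hidden[i] * x), loss 0.5 * err**2."""
    h = [w * x for w in w_hidden]
    err = sum(wo * hi for wo, hi in zip(w_out, h)) - target
    grad_out = [err * hi for hi in h]              # d loss / d w_out[i]
    grad_hidden = [err * wo * x for wo in w_out]   # d loss / d w_hidden[i]
    return ([w - lr * g for w, g in zip(w_hidden, grad_hidden)],
            [w - lr * g for w, g in zip(w_out, grad_out)])


def random_init(rng, n_hidden=2):
    """Small random weights so that hidden units start out different."""
    w_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return w_hidden, w_out


sym_h, sym_o = [0.5, 0.5], [0.5, 0.5]          # symmetric start
rnd_h, rnd_o = random_init(random.Random(1))   # random start
for _ in range(20):
    sym_h, sym_o = train_step(sym_h, sym_o, 1.0, 2.0, 0.1)
    rnd_h, rnd_o = train_step(rnd_h, rnd_o, 1.0, 2.0, 0.1)
```

After training, the two symmetric units are still exactly equal, while the randomly initialized ones remain distinct.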
In step S105, the missing test sample set a3 is input into the network obtained in step S104 to predict and fill the missing values. The filled result for a3 is then compared with the test sample set a2, which is the sample set before data deletion, to judge whether the accuracy requirement is met:
(1) if the accuracy meets the requirement, go to step S106;
(2) if the accuracy does not meet the requirement, go to step S103 to adjust the network structure of the convolutional neural network, then re-execute the training step S104 and the verification step S105, and iterate until the requirement is met. When adjusting the network structure, the first choice is to change the number of convolutions, that is, to increase or decrease the number of convolutional layers. The purpose of convolution is to mine features in depth, and each convolutional layer is immediately followed by a pooling layer, so layers are added two at a time during optimization. If after three such additions the final accuracy has improved only slightly, the network is reduced back to the number of layers at which that accuracy was first reached.
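The depth-adjustment rule above is stated informally. The following is one possible reading, sketched under several assumptions. Depth is counted in convolution+pooling pairs, starting from the two pairs of step S103. `evaluate` is a hypothetical callback that builds, trains and verifies a network of that depth. "Improved only slightly" is modeled by a `min_gain` threshold. Only increases are tried here, although the patent also allows decreasing the number of layers.

```python
def search_conv_pairs(evaluate, target, start=2, max_increases=3, min_gain=0.01):
    """
    evaluate(n) -> validation accuracy of a network with n convolution+pooling pairs
    (a hypothetical callback that would build, train and verify the network).
    """
    n = start
    scores = {n: evaluate(n)}
    while scores[n] < target and n - start < max_increases:
        n += 1                                   # add one convolution + pooling pair
        scores[n] = evaluate(n)
    if scores[n] >= target:
        return n, scores[n]                      # requirement met at this depth
    best = max(scores.values())
    # Little gain after max_increases: fall back to the shallowest depth within min_gain of the best.
    depth = min(d for d, acc in scores.items() if best - acc < min_gain)
    return depth, scores[depth]


fake = {2: 0.80, 3: 0.85, 4: 0.855, 5: 0.858}   # made-up accuracies for illustration only
print(search_conv_pairs(fake.get, target=0.90))  # (3, 0.85)
```

With these made-up accuracies, the target is never reached. Depths 4 and 5 add less than one point over depth 3, so the search falls back to three pairs.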
In step S106, the second network processing step is performed: the complete data subset A is input into the convolutional neural network obtained in step S105 to obtain a refined convolutional neural network. Inputting all of the complete data gives the network all available information, so it can compute more complete features and derive the weights. This finally yields the network structure and internal weights with the best effect, and the resulting network is saved as the refined convolutional neural network. In other words, the weights updated in S106 are fixed and saved as the parameters of a fixed network structure for use in S107. The invention thus optimizes the convolutional neural network in two stages: the first network processing step optimizes the external network structure, and the second network processing step then optimizes the internal weights.
In step S107, the missing data filling step is performed: the missing data subset B is input into the refined convolutional neural network obtained in step S106 to fill in the missing values. In this step, B is input into the final network structure.
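Step S107 only needs to replace the missing cells, not the observed ones. The minimal sketch below assumes the trained network is any callable that maps a full-length record to a full-length prediction. It also assumes missing cells are fed in as a placeholder value, since the patent does not say how they are encoded. `fill_record` and `mean_model` are hypothetical names.

```python
def fill_record(record, predict, placeholder=0.0):
    """Fill only the None cells of record with the network's output; keep observed values."""
    model_input = [placeholder if v is None else v for v in record]  # missing cells need some input value
    prediction = predict(model_input)
    return [p if v is None else v for v, p in zip(record, prediction)]


def mean_model(xs):
    """Toy stand-in for the trained network: predicts the input mean at every position."""
    return [sum(xs) / len(xs)] * len(xs)


filled = fill_record([2.0, None, 4.0], mean_model)
```

Keeping observed cells unchanged ensures that the filling step never overwrites real data with model output.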
In step S108, flow terminates.
Once trained, the network does not need to be retrained before each use. A data record containing missing values can be input directly into the trained network to predict and fill the missing values. The invention uses a convolutional neural network, a type of deep neural network. After the data set is preprocessed, a network is created whose number of layers suits the size of the database, and the initial parameters of each layer are set. When the training set is input into the network, the convolutional neural network computes the relationships between the data by itself and updates its parameters. Therefore, the invention is not limited by the completeness of the data set. It can mine the relationships between data in depth and arrive at the corresponding learning rate and weights, forming a trained network that predicts and fills the missing values. In addition, convolutional neural networks share weights, so the number of weights during training is greatly reduced. This lowers the demands and load on computer hardware and reduces overfitting.
Fig. 2 is a flowchart of one implementation of the data preprocessing step of the method according to a preferred embodiment of the present invention. As shown in Fig. 2, the data preprocessing step (step S102 above) specifically comprises:
In step S201, the flow starts;
In step S202, data is collected to build the data set to be processed. In this step, real and accurate data is collected for processing.
In step S203, the existing data set is classified. The data with no missing values is separated as the complete data subset A, and the data with missing values is separated as the missing data subset B.
In step S204, 60% to 80% of the data in the complete data subset A is randomly selected as the training sample set a1, and the remainder serves as the test sample set a2. In a preferred embodiment of the present invention, step S204 randomly selects 70% of the data in A as the training sample set a1, and the remaining 30% serves as the test sample set a2.
In step S205, part of the data in the test sample set a2 is randomly deleted to obtain the missing test sample set a3. Preferably, 20% to 40% of the data in a2 is randomly deleted. More preferably, 30% of the data is randomly deleted, and the test sample set a2 after deletion serves as the missing test sample set a3.
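The preferred 30% deletion in step S205 can be sketched as follows. This minimal example assumes deletion is done per cell, since the patent does not say whether cells or whole records are removed, and it uses `None` to mark a deleted value. `mask_cells` is a hypothetical name.

```python
import random


def mask_cells(rows, ratio=0.3, seed=0):
    """Step S205 sketch: copy rows and set round(ratio * n_cells) randomly chosen cells to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    masked = [list(row) for row in rows]          # a2 itself is kept intact for the later comparison
    for i, j in rng.sample(cells, round(ratio * len(cells))):
        masked[i][j] = None
    return masked


a2 = [[float(10 * i + j) for j in range(10)] for i in range(10)]
a3 = mask_cells(a2)
```

The function returns a copy rather than editing in place. The original a2 must survive as the ground truth against which the filled a3 is scored in step S105.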
In step S206, the flow terminates.
The present invention also provides a storage medium storing a plurality of instructions. The instructions are adapted to be loaded by a processor to perform the steps of the deep-learning-based missing data filling method described above, for example steps S101 to S108.
Fig. 3 is a module block diagram of the deep-learning-based missing data filling system according to a preferred embodiment of the present invention. As shown in Fig. 3, the system 10 provided by this embodiment comprises at least a data preprocessing module 100, a first network processing module 200, a second network processing module 300 and a missing data filling module 400.
The data preprocessing module 100 preprocesses the data set. It divides the data set into a complete data subset A and a missing data subset B, divides the data in A into a training sample set a1 and a test sample set a2, and randomly deletes part of the data in a2 to obtain a missing test sample set a3.
The first network processing module 200 is connected to the data preprocessing module 100. It trains the initially constructed convolutional neural network with the training sample set a1 and saves it. It then uses the trained network to fill the missing data in the missing test sample set a3 and compares the filled result with the test sample set a2. When the accuracy requirement is not met, it adjusts the network structure and iterates the training and verification steps until the requirement is met.
The second network processing module 300 is connected to both the data preprocessing module 100 and the first network processing module 200. It inputs the complete data subset A into the convolutional neural network obtained by the first network processing module 200 to obtain a refined convolutional neural network.
The missing data filling module 400 is connected to both the data preprocessing module 100 and the second network processing module 300. It inputs the missing data subset B into the refined convolutional neural network obtained by the second network processing module 300 to fill in the missing values.
Fig. 4 is a schematic diagram of one implementation of the data preprocessing module of the system according to a preferred embodiment of the present invention. As shown in Fig. 4, the data preprocessing module 100 specifically comprises a data collection module 110, a first classification unit 120, a second classification unit 130 and a data deletion unit 140.
The data collection module 110 collects data to build the data set to be processed.
The first classification unit 120 is connected to the data collection module 110. It classifies the data set, separating the data with no missing values as the complete data subset A and the data with missing values as the missing data subset B.
The second classification unit 130 is connected to the first classification unit 120. It randomly selects 60% to 80% of the data in the complete data subset A as the training sample set a1, and the remaining data in A serves as the test sample set a2. In a preferred embodiment of the present invention, the second classification unit 130 randomly selects 70% of the data in A as a1, and the remaining 30% serves as a2.
The data deletion unit 140 is connected to the second classification unit 130. It randomly deletes part of the data in the test sample set a2 to obtain the missing test sample set a3.
Fig. 5 is a schematic diagram of one implementation of the first network processing module of the system according to a preferred embodiment of the present invention. As shown in Fig. 5, the first network processing module 200 specifically comprises a network construction and adjustment unit 210, a network training unit 220 and a comparison and iteration unit 230.
The network construction and adjustment unit 210 builds a convolutional neural network and initializes its parameters. The network consists of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer. The unit 210 can also be started by the comparison and iteration unit 230 to adjust the network structure. When doing so, the first choice is to change the number of convolutions, that is, to increase or decrease the number of convolutional layers.
The network training unit 220 is connected to the network construction and adjustment unit 210. It inputs the training sample set a1 into the convolutional neural network obtained by the unit 210. The network performs semi-supervised learning on the data in a1 and automatically updates its weights, and the network structure and internal parameters are saved once training is complete.
The comparison and iteration unit 230 is connected to the network training unit 220. It inputs the missing test sample set a3 into the convolutional neural network obtained by the unit 220 to predict and fill the missing values. It then compares the filled result of a3 with the test sample set a2, taken before data deletion. If the accuracy meets the requirement, the unit starts the second network processing module 300. Otherwise, it starts the network construction and adjustment unit 210 to adjust the network structure and then restarts the network training unit 220 and the comparison and iteration unit 230, iterating until the accuracy meets the requirement.
In summary, the present invention uses a convolutional neural network. Compared with traditional statistical methods, it learns the relationships between data items through semi-supervised learning and extracts features implicitly, so it does not depend on the existing data being complete; whatever the properties of the training set, deeper features can be found, learned and tested. Compared with a shallow artificial neural network, a convolutional neural network shares neuron weights within each feature layer, which reduces the number of parameters and the complexity of the network and simplifies the otherwise cumbersome, repeated back-propagation of residuals. The present invention thus addresses the problem of filling missing data in databases with higher accuracy and efficiency, restoring missing data more faithfully and more quickly.
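The parameter saving from weight sharing can be checked with simple arithmetic. Assuming, purely for illustration, a record of 64 features, 8 feature maps and a kernel width of 3, a convolutional layer needs far fewer parameters than a fully connected layer that produces an output of the same size:

```python
def conv1d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights + biases of a 1-D convolutional layer; each kernel is shared across positions."""
    return out_ch * (in_ch * kernel + 1)


def dense_params(n_in: int, n_out: int) -> int:
    """Weights + biases of a fully connected layer."""
    return n_out * (n_in + 1)


# Hypothetical sizes: 64 features, 8 feature maps, kernel width 3 ('valid' -> length 62).
shared = conv1d_params(in_ch=1, out_ch=8, kernel=3)   # 8 * (1 * 3 + 1) = 32
unshared = dense_params(n_in=64, n_out=8 * 62)        # 496 * (64 + 1) = 32240
print(shared, unshared)
```

The exact ratio depends on the chosen sizes; the point is that the convolutional count does not grow with the record length.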
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A missing data filling method based on deep learning, characterized by comprising the following steps:
(1) preprocessing a data set: dividing the data set into a complete-data subset and a missing-data subset, dividing the data in the complete-data subset into a training sample set and a test sample set, and randomly deleting part of the data in the test sample set to form a missing test sample set;
(2) training and saving an initially constructed convolutional neural network using the training sample set; using the trained convolutional neural network to fill the missing data in the missing test sample set and comparing the filling results with the test sample set; and, when the precision requirement is not met, adjusting the network structure of the convolutional neural network and iterating the foregoing training and verification steps until the precision requirement is met;
(3) inputting the complete-data subset into the convolutional neural network obtained in step (2) to obtain a refined convolutional neural network;
(4) inputting the missing-data subset into the refined convolutional neural network obtained in step (3) to complete the filling of the missing values.
2. The missing data filling method based on deep learning according to claim 1, characterized in that step (1) comprises:
(1-1) collecting data to build a data set to be processed;
(1-2) classifying the data set, separating the data with no missing values as the complete-data subset and the data with missing values as the missing-data subset;
(1-3) randomly selecting 60% to 80% of the data in the complete-data subset as the training sample set, with the remainder as the test sample set;
(1-4) randomly deleting part of the data in the test sample set to form the missing test sample set.
3. The missing data filling method based on deep learning according to claim 2, characterized in that in step (1-3), 70% of the data in the complete-data subset are randomly selected as the training sample set and the remaining 30% are used as the test sample set.
4. The missing data filling method based on deep learning according to any one of claims 1 to 3, characterized in that step (2) specifically comprises:
(2-1) building a convolutional neural network consisting of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer, and initializing its parameters;
(2-2) inputting the training sample set into the convolutional neural network, which performs semi-supervised learning on the data in the training sample set and automatically updates its weights, and saving the network structure and internal parameters after training is complete;
(2-3) inputting the missing test sample set into the convolutional neural network to predict and fill the missing values, and comparing the filling results for the missing test sample set with the test sample set; if the accuracy meets the precision requirement, performing step (3); if not, returning to step (2-1) to adjust the network structure of the convolutional neural network.
5. The missing data filling method based on deep learning according to any one of claims 1 to 3, characterized in that adjusting the network structure of the convolutional neural network in step (3) consists of increasing or decreasing the number of convolutions of the convolutional neural network.
6. A storage medium, characterized in that a plurality of instructions are stored therein, the instructions being adapted to be loaded and executed by a processor to perform the steps of the missing data filling method based on deep learning according to any one of claims 1 to 5.
7. A missing data filling system based on deep learning, characterized by comprising:
a data preprocessing module, configured to preprocess a data set by dividing the data set into a complete-data subset and a missing-data subset, dividing the data in the complete-data subset into a training sample set and a test sample set, and randomly deleting part of the data in the test sample set to form a missing test sample set;
a first network processing module, configured to train and save an initially constructed convolutional neural network using the training sample set, to use the trained convolutional neural network to fill the missing data in the missing test sample set, to compare the filling results with the test sample set, and, when the precision requirement is not met, to adjust the network structure of the convolutional neural network and iterate the foregoing training and verification steps until the precision requirement is met;
a second network processing module, configured to input the complete-data subset into the convolutional neural network obtained by the first network processing module to obtain a refined convolutional neural network;
a missing data filling module, configured to input the missing-data subset into the refined convolutional neural network to complete the filling of the missing values.
8. The missing data filling system based on deep learning according to claim 7, characterized in that the data preprocessing module comprises:
a data collection module, configured to collect data and build a data set to be processed;
a first classification unit, configured to classify the data set, separating the data with no missing values as the complete-data subset and the data with missing values as the missing-data subset;
a second classification unit, configured to randomly select 60% to 80% of the data in the complete-data subset as the training sample set, with the remainder as the test sample set;
a data deletion unit, configured to randomly delete part of the data in the test sample set to form the missing test sample set.
9. The missing data filling system based on deep learning according to claim 8, characterized in that the second classification unit randomly selects 70% of the data in the complete-data subset as the training sample set and uses the remaining 30% as the test sample set.
10. The missing data filling system based on deep learning according to any one of claims 7 to 9, characterized in that the first network processing module specifically comprises:
a network construction and adjustment unit, configured to build a convolutional neural network consisting of an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer, and to initialize its parameters;
a network training unit, configured to input the training sample set into the convolutional neural network, which performs semi-supervised learning on the data in the training sample set and automatically updates its weights, and to save the network structure and internal parameters after training is complete;
a comparison and iteration unit, configured to input the missing test sample set into the convolutional neural network to predict and fill the missing values and to compare the filling results for the missing test sample set with the test sample set; if the accuracy meets the precision requirement, the second network processing module is started; if not, the network construction and adjustment unit is started to adjust the network structure of the convolutional neural network.
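Steps (1) and (4) of claim 1 amount to plain data handling and can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: records are lists of floats with `None` marking a missing cell, `delete_ratio` is a hypothetical parameter (the claims only say "part of the data"), and missing inputs are passed to the network as a placeholder value, which the claims do not specify. `predict` stands for the refined network obtained in step (3); the names a1, a2 and a3 follow the description.

```python
import random
from typing import Callable, List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing cell (assumed encoding)


def preprocess(rows: Sequence[Row], train_ratio: float = 0.7,
               delete_ratio: float = 0.2, seed: int = 0):
    """Step (1): complete/missing subsets, a1/a2 split, and a3 built by random deletion."""
    if not 0.6 <= train_ratio <= 0.8:
        raise ValueError("claim 2 uses 60% to 80% of the complete subset for training")
    rng = random.Random(seed)
    complete = [list(r) for r in rows if all(v is not None for v in r)]
    missing = [list(r) for r in rows if any(v is None for v in r)]
    rng.shuffle(complete)
    n_train = round(train_ratio * len(complete))
    a1, a2 = complete[:n_train], complete[n_train:]
    a3, mask = [], []
    for row in a2:
        deleted = [rng.random() < delete_ratio for _ in row]  # hypothetical deletion rate
        a3.append([None if d else v for v, d in zip(row, deleted)])
        mask.append(deleted)
    return a1, a2, a3, mask, missing


def fill_missing_subset(rows: Sequence[Row],
                        predict: Callable[[List[float]], List[float]],
                        placeholder: float = 0.0) -> List[List[float]]:
    """Step (4): keep observed cells and take the network's predictions for missing ones."""
    filled = []
    for row in rows:
        x = [placeholder if v is None else v for v in row]  # assumed input encoding
        pred = predict(x)
        filled.append([p if v is None else v for v, p in zip(row, pred)])
    return filled
```

With `train_ratio=0.7` this reproduces the 70%/30% split of claims 3 and 9; the returned `mask` records which cells of a3 were deleted, so that the comparison with a2 can be restricted to those cells.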
CN201710358297.2A 2017-05-19 2017-05-19 A kind of Missing Data Filling method and system based on deep learning Active CN107273429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710358297.2A CN107273429B (en) 2017-05-19 2017-05-19 A kind of Missing Data Filling method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710358297.2A CN107273429B (en) 2017-05-19 2017-05-19 A kind of Missing Data Filling method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN107273429A true CN107273429A (en) 2017-10-20
CN107273429B CN107273429B (en) 2018-04-13

Family

ID=60065112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710358297.2A Active CN107273429B (en) 2017-05-19 2017-05-19 A kind of Missing Data Filling method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN107273429B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177088A (en) * 2013-03-08 2013-06-26 北京理工大学 Biomedicine missing data compensation method
CN103246702A (en) * 2013-04-02 2013-08-14 大连理工大学 Industrial sequential data missing filling method based on sectional state displaying
CN103544218A (en) * 2013-09-29 2014-01-29 广西师范大学 Nearest neighbor filling method of non-fixed k values
CN104751229A (en) * 2015-04-13 2015-07-01 辽宁大学 Bearing fault diagnosis method capable of recovering missing data of back propagation neural network estimation values

Non-Patent Citations (4)

Title
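Wei Wei et al., "A generic neural network approach for filling missing data in data mining," 2003 IEEE International Conference on Systems, Man and Cybernetics *
卜范玉 et al., "An incomplete big data filling algorithm based on deep learning" (基于深度学习的不完整大数据填充算法), 微电子学与计算机 (Microelectronics & Computer) *
胡玄子 et al., "Research on missing data filling methods in data processing" (数据处理中缺失数据填充方法的研究), 湖北工业大学学报 (Journal of Hubei University of Technology) *
郑斌, "An incomplete big data filling and mining algorithm based on an improved genetic algorithm" (基于改进遗传算法的不完整大数据填充挖掘算法), 微电子学与计算机 (Microelectronics & Computer) *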

Cited By (27)

Publication number Priority date Publication date Assignee Title
CN108197706A (en) * 2017-11-27 2018-06-22 华南师范大学 Incomplete data deep learning neural network method, device, computer equipment and storage medium
CN108615096A (en) * 2018-05-10 2018-10-02 平安科技(深圳)有限公司 Server, the processing method of Financial Time Series and storage medium
CN111488551A (en) * 2019-01-28 2020-08-04 斯特拉德视觉公司 Method and device for verifying integrity of convolution operation
CN111488551B (en) * 2019-01-28 2023-12-05 斯特拉德视觉公司 Method and device for verifying integrity of convolution operation
CN110472817A (en) * 2019-07-03 2019-11-19 西北大学 A kind of XGBoost of combination deep neural network integrates credit evaluation system and its method
WO2021016995A1 (en) * 2019-08-01 2021-02-04 深圳大学 Data processing method and apparatus, computer device, and storage medium
CN110232564A (en) * 2019-08-02 2019-09-13 南京擎盾信息科技有限公司 A kind of traffic accident law automatic decision method based on multi-modal data
WO2021169116A1 (en) * 2020-02-29 2021-09-02 平安科技(深圳)有限公司 Intelligent missing data filling method, apparatus and device, and storage medium
CN111553463A (en) * 2020-04-17 2020-08-18 东南大学 Method for estimating throughput of wireless access point based on deep learning and network parameters
CN111553463B (en) * 2020-04-17 2022-11-18 东南大学 Method for estimating throughput of wireless access point based on deep learning and network parameters
CN111597175A (en) * 2020-05-06 2020-08-28 天津大学 Filling method for missing value of sensor fusing spatio-temporal information
CN111597175B (en) * 2020-05-06 2023-06-02 天津大学 Filling method of sensor missing value fusing time-space information
CN111966740A (en) * 2020-08-24 2020-11-20 安徽思环科技有限公司 Water quality fluorescence data feature extraction method based on deep learning
CN112164468A (en) * 2020-10-09 2021-01-01 北京航空航天大学 Method for processing missing data of pregnancy examination data
CN112750530A (en) * 2021-01-05 2021-05-04 上海梅斯医药科技有限公司 Model training method, terminal device and storage medium
CN113657717A (en) * 2021-07-15 2021-11-16 福州大学至诚学院 Method for evaluating robustness of electric commerce enterprise credit early warning model based on missing value sample
CN113515896A (en) * 2021-08-06 2021-10-19 红云红河烟草(集团)有限责任公司 Data missing value filling method for real-time cigarette acquisition
CN113515896B (en) * 2021-08-06 2022-08-09 红云红河烟草(集团)有限责任公司 Data missing value filling method for real-time cigarette acquisition
CN113780666A (en) * 2021-09-15 2021-12-10 湖北天天数链技术有限公司 Missing value prediction method and device and readable storage medium
CN113780666B (en) * 2021-09-15 2024-03-22 湖北天天数链技术有限公司 Missing value prediction method and device and readable storage medium
CN115034039A (en) * 2022-05-13 2022-09-09 西北工业大学 PIV flow field data filling method based on convolutional neural network
CN115034039B (en) * 2022-05-13 2024-09-06 西北工业大学 PIV flow field data deficiency supplementing method based on convolutional neural network
CN115223709A (en) * 2022-07-26 2022-10-21 内蒙古卫数数据科技有限公司 Missing value filling migration learning method based on disease distribution diagnosis neural network model
CN115223709B (en) * 2022-07-26 2024-01-23 内蒙古卫数数据科技有限公司 Deficiency value filling migration learning method based on cloth disease diagnosis neural network model
CN115238807A (en) * 2022-07-29 2022-10-25 中用科技有限公司 AMC detection method based on artificial intelligence
CN115238807B (en) * 2022-07-29 2024-02-27 中用科技有限公司 AMC detection method based on artificial intelligence
CN116831620A (en) * 2023-07-13 2023-10-03 逸超医疗科技(北京)有限公司 Doppler missing data filling method based on deep learning

Also Published As

Publication number Publication date
CN107273429B (en) 2018-04-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant