CN111126576B - Deep learning training method - Google Patents

Deep learning training method

Info

Publication number
CN111126576B
CN111126576B
Authority
CN
China
Prior art keywords
data
training
accuracy
set data
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010221098.9A
Other languages
Chinese (zh)
Other versions
CN111126576A (en)
Inventor
代笃伟
赵威
申建虎
王博
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Zhizhen Intelligent Technology Co.,Ltd.
Original Assignee
Beijing Precision Diagnosis Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Precision Diagnosis Medical Technology Co ltd filed Critical Beijing Precision Diagnosis Medical Technology Co ltd
Priority to CN202010221098.9A priority Critical patent/CN111126576B/en
Publication of CN111126576A publication Critical patent/CN111126576A/en
Application granted granted Critical
Publication of CN111126576B publication Critical patent/CN111126576B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a training method for deep learning. Cross-domain test data are collected to train the model, and supervised learning and pseudo-label learning are effectively combined according to the accuracy of the model on the prediction test set, which greatly improves the generalization ability of the network. The trained model is then used to predict unlabeled cross-domain data and generate pseudo labels, assisting manual labeling and improving its efficiency.

Description

Deep learning training method
Technical Field
The invention relates to the technical field of deep learning, in particular to a deep learning training method.
Background
In the era of big data, a huge amount of data is generated every day, but the data are unordered; summarizing and analyzing them manually consumes time and labor and yields poor results. It is therefore desirable to train artificial intelligence so that machines acquire the ability to analyze, summarize and even infer, completing some tasks in place of humans.
Training data is one of the most important links in deep learning. Labeling training data requires a great deal of time and labor, so deep learning has gradually progressed from supervised learning to pseudo-label, semi-supervised and unsupervised learning. Some deep learning models perform well on a specific data domain, but once the data domain changes, their accuracy drops sharply; many strategies have therefore appeared for preventing model overfitting and improving generalization ability, for example learning strategies combining supervised and unsupervised learning, dropout strategies, and weight regularization strategies.
1. Supervised learning: in popular terms, the data need to be labeled manually. A training set for supervised learning comprises inputs and outputs, also referred to as features and targets, where the targets are labeled by humans. Supervised learning trains on existing samples (known data and their corresponding outputs) to obtain an optimal model (the model belongs to some function set, and "optimal" means optimal under a certain evaluation criterion), then uses the model to map every input to a corresponding output and judges the output, thereby achieving classification. The goal of supervised learning is often to let a computer learn a classification system (model) that has already been created. Supervised learning is a common technique for training neural networks and decision trees, both of which depend heavily on the information given by a predetermined classification system: a neural network uses this information to determine the network error and then continually adjust the network parameters, while a decision tree uses it to determine which attributes provide the most information.
2. Unsupervised learning: the data carry no labels, or all carry the same label. The meaning and function of the data are unknown, and the network learns how to classify them by itself. Since the input data are unlabeled and there is no definitive result, the sample set must be classified (clustered) according to the similarity between samples so as to minimize intra-class differences and maximize inter-class differences. In many practical applications the labels of samples cannot be known in advance, i.e., there are no classes corresponding to the training samples, so the classifier can only be learned from the original unlabeled sample set. The unsupervised learning objective does not tell the computer what to do, but lets the computer learn how to do it by itself. Unsupervised learning methods fall into two broad categories:
1) Direct methods based on probability density function estimation: find the distribution parameters of each category in the feature space, then classify.
2) Simple clustering methods based on a similarity measure between samples: the principle is to try to find the cores of the different classes and then group the samples into classes according to a similarity measure between each sample and the cores (a minimal sketch follows the list). With the clustering result, hidden information in the data set can be extracted, and unknown data can be classified and predicted.
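As a minimal illustration of the second category, the following Python sketch clusters unlabeled samples with a plain k-means loop. The data, the value of k and the random initialization are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def kmeans(samples, k, iterations=100):
    """Group unlabeled samples into k classes by similarity to class cores."""
    rng = np.random.default_rng(0)
    # Initialize the class cores (centroids) from randomly chosen samples.
    cores = samples[rng.choice(len(samples), size=k, replace=False)]
    for _ in range(iterations):
        # Similarity measure: Euclidean distance between each sample and each core.
        distances = np.linalg.norm(samples[:, None, :] - cores[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Re-estimate each core as the mean of the samples assigned to it.
        new_cores = np.array([
            samples[labels == c].mean(axis=0) if np.any(labels == c) else cores[c]
            for c in range(k)
        ])
        if np.allclose(new_cores, cores):   # converged: cores stopped moving
            break
        cores = new_cores
    return labels, cores

# Example: 200 unlabeled 2-D points grouped into 3 classes.
points = np.random.default_rng(1).normal(size=(200, 2))
labels, cores = kmeans(points, k=3)
```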
3. Pseudo labels: a model is first trained on the labeled data, and the trained model is then used to predict labels for the unlabeled data, thereby creating pseudo labels. The labeled data and the newly generated pseudo-labeled data are then combined as new training data.
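A minimal sketch of this pseudo-labeling step, assuming the trained model is exposed as a callable returning class probabilities (the function names are illustrative, not from the patent):

```python
import numpy as np

def make_pseudo_label_training_set(model_predict, labeled_x, labeled_y, unlabeled_x):
    """Combine labeled data with model-predicted pseudo labels into new training data."""
    probs = model_predict(unlabeled_x)      # predict the unlabeled samples
    pseudo_y = probs.argmax(axis=1)         # most probable class becomes the pseudo label
    new_x = np.concatenate([labeled_x, unlabeled_x])
    new_y = np.concatenate([labeled_y, pseudo_y])
    return new_x, new_y
```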
4. Dropout: dropout is a kind of trim for training deep neural network, and when the network propagates forward, the activation values of some neurons stop working with a certain probability P.
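A common way to realize this is inverted dropout, sketched below; the 1/(1-p) rescaling convention is a standard implementation choice, not specified in the patent:

```python
import numpy as np

def dropout_forward(activations, p=0.5, training=True):
    """Zero each activation with probability p during training (inverted dropout).

    Scaling the surviving activations by 1/(1-p) keeps their expected value
    unchanged, so no extra rescaling is needed at inference time.
    """
    if not training or p == 0.0:
        return activations
    mask = np.random.random(activations.shape) >= p   # neurons that keep working
    return activations * mask / (1.0 - p)
```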
5. Regularization: in principle, a neural network can model any non-linear function, i.e., approach the target values arbitrarily closely by increasing the number of hidden layers; however, in such cases it inevitably fits the noise as well. To avoid overfitting, the weights need to be regularized, meaning the weight coefficients are kept small enough in absolute value that the noise is not fitted well. L2 regularization adds a regularization term to the cost function, calculated as follows:
C = C_0 + \lambda \sum_{\theta} \theta^{2}
where C_0 is the original cost function, θ ranges over the parameters to be learned by the network layers, and λ is the regularization coefficient that controls the size of the regularization term; a larger λ constrains the complexity of the model more strongly.
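A minimal sketch of adding this penalty to a cost value (the function and argument names are illustrative):

```python
import numpy as np

def l2_regularized_cost(base_cost, params, lam):
    """Return C = C_0 + lambda * sum of squared parameters.

    base_cost is the original cost C_0, params is an iterable of weight
    arrays theta, and lam is the regularization coefficient lambda.
    """
    penalty = sum(np.sum(theta ** 2) for theta in params)
    return base_cost + lam * penalty
```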
From the above it can be seen that a model trained with a conventional training strategy performs well on its training data domain, but its performance drops sharply on cross-domain data, and labeling data requires a large amount of labor cost.
Disclosure of Invention
The invention provides a deep learning training method, which solves the problems that a model trained in the prior art performs well on the training data domain but sharply worse on cross-domain data, and that labeling data requires a large amount of labor cost.
The technical scheme of the invention is realized as follows:
a deep learning training method specifically comprises the following steps:
step 1, randomly extracting multiple categories from an image data set, and acquiring pictures of those categories from outside the image data set to form a data set;
step 2, dividing the data set into a training set data_0 and a test set data_1 in proportion, and inputting the training set into a deep neural network to obtain a model_0;
step 3, predicting the samples in the test set data_1 by using the model_0, comparing the prediction results with the real results to obtain the prediction accuracy acc_0, and recording the predicted pseudo labels of the samples in the test set data_1;
step 4, merging the training set data_0 and the pseudo-labeled sample data into a training set data_2, inputting the training set data_2 into the deep neural network, and testing the accuracy of the model_0 on the test set data_1 after each training;
step 5, if the accuracy is lower than the accuracy acc_0 for n consecutive times, updating acc_0 to the average of those n accuracies, taking 10% of the data together with their real labels out of the test set data_1, merging them into the training set data_0, and repeating step 4;
step 6, if the accuracy is higher than the accuracy acc_0 for n consecutive times, updating acc_0 to the average of those n accuracies, recording the model corresponding to the highest of the n accuracies as model_1, taking 10% of the data out of the training set data_0 and putting them into the test set data_1, predicting pseudo labels for the test set data_1 with the model_1 and updating the pseudo labels of the test set data_1, and repeating step 4;
step 7, if all the data in the training set data_0 have been moved to the test set data_1, stopping after 5 further rounds of training; or stopping training once the training set data_0 and the test set data_1 reach a balanced state.
As a preferred embodiment of the present invention, step 1 specifically comprises randomly sampling 100 categories from the image data set ImageNet2012 and acquiring, from outside the image data set, 10 pictures of each of the 100 categories to form the data set.
As a preferred embodiment of the present invention, step 2 specifically comprises dividing the data set into a training set data_0 and a test set data_1 in a ratio of 4:6, and inputting the training set into the deep neural network to obtain a model_0.
As a preferred embodiment of the present invention, step 4 specifically comprises merging the training set data_0 and the pseudo-labeled sample data into the training set data_2, inputting the training set data_2 into the deep residual network resnet50, and testing the accuracy of the model_0 on the test set data_1 after each training.
As a preferred embodiment of the present invention, step 5 specifically comprises, if the accuracy is lower than the accuracy acc_0 for 5 consecutive times, updating acc_0 to the average of those 5 accuracies, taking 10% of the data together with their real labels out of the test set data_1, merging them into the training set data_0, and repeating step 4.
As a preferred embodiment of the present invention, step 6 specifically comprises, if the accuracy is higher than the accuracy acc_0 for 5 consecutive times, updating acc_0 to the average of those 5 accuracies, recording the model corresponding to the highest of the 5 accuracies as model_1, taking 10% of the data out of the training set data_0 and putting them into the test set data_1, predicting pseudo labels for the test set data_1 with the model_1 and updating the pseudo labels of the test set data_1, and repeating step 4.
The invention has the beneficial effects that:
1. overfitting of the model on a training set can be effectively prevented, and the generalization capability of the model is improved;
2. pseudo labels can be generated for the data and reasonably used for training, saving manual labeling cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of an embodiment of a deep learning training method according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the present invention provides a deep learning training method, which specifically includes the following steps:
Step 1, randomly extracting multiple categories from an image data set, and acquiring pictures of those categories from outside the image data set to form a data set;
the step 1 specifically includes randomly sampling 100 categories from the image data set imageNet2012, and acquiring pictures outside the image data set corresponding to the 100 categories by using a crawler technology or other means, wherein 10 pictures in each category form the data set. The iamgene image dataset started in 2009, and currently there were 14197122 total images, which were divided into 21841 categories, which is the one most referenced in deep learning. The invention chooses to use imageNet2012, which contains a total of 1000 classes.
Step 2, dividing the data set into a training set data_0 and a test set data_1 in proportion, and inputting the training set into a deep neural network to obtain a model_0. Specifically, the data set is divided into the training set data_0 and the test set data_1 in a ratio of 4:6. Deep learning usually distributes training and test data 7:3 or 8:2, but such a distribution is not reasonable here: deep learning essentially learns the probability distribution of the data, and its final purpose is application to actual scenes, where the collected data set is very small compared with the actual application scenario. Therefore, in order to simulate the actual scene while guaranteeing sufficient training, the data set is distributed 4:6, as sketched below.
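A minimal sketch of this 4:6 split; the shuffling and the fixed seed are implementation assumptions:

```python
import numpy as np

def split_dataset(samples, labels, train_fraction=0.4, seed=0):
    """Shuffle numpy arrays and split into training set data_0 and test set data_1 (4:6)."""
    order = np.random.default_rng(seed).permutation(len(samples))
    cut = int(len(samples) * train_fraction)
    data_0 = (samples[order[:cut]], labels[order[:cut]])
    data_1 = (samples[order[cut:]], labels[order[cut:]])
    return data_0, data_1
```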
Step 3, predicting the samples in the test set data_1 by using the model_0, comparing the prediction results with the real results to obtain the prediction accuracy acc_0, and recording the predicted pseudo labels of the samples in the test set data_1;
Step 4, merging the training set data_0 and the pseudo-labeled sample data into a training set data_2, inputting the training set data_2 into the deep neural network, and testing the accuracy of the model_0 on the test set data_1 after each training. Specifically, the training set data_2 is input into the deep residual network resnet50, and softmax is selected as the loss function. The softmax loss is the most commonly used classification loss function and is calculated as follows:
L = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{e^{z_{y_i}}}{\sum_{j} e^{z_j}}
where z denotes the output of the fully connected layer, z_{y_i} is its component for the true class y_i of the i-th sample, and N is the number of samples.
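In code, the same loss can be sketched as follows (a numerically stabilized version; the max-subtraction shift is an implementation detail, not part of the patent text):

```python
import numpy as np

def softmax_loss(logits, targets):
    """Softmax cross-entropy averaged over the batch.

    logits: (N, C) array z, the outputs of the fully connected layer.
    targets: (N,) integer array of true class labels y_i.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)   # subtract max for stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()
```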
Step 5, if the accuracy is lower than the accuracy acc_0 for n consecutive times, updating acc_0 to the average of those n accuracies, taking 10% of the data together with their real labels out of the test set data_1, merging them into the training set data_0, and repeating step 4. In this step n may take a value from 3 to 6; in this embodiment 5 is selected.
Step 6, if the accuracy is higher than the accuracy acc_0 for n consecutive times, updating acc_0 to the average of those n accuracies, recording the model corresponding to the highest of the n accuracies as model_1, taking 10% of the data out of the training set data_0 and putting them into the test set data_1, predicting pseudo labels for the test set data_1 with the model_1 and updating the pseudo labels of the test set data_1, and repeating step 4. In this step n may take a value from 3 to 6; in this embodiment 5 is selected.
Step 7, if all the data in the training set data_0 have been moved to the test set data_1, stopping after 5 further rounds of training; or stopping training once the training set data_0 and the test set data_1 reach a balanced state. A sketch of the overall loop of steps 2 to 7 follows.
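The data movement of steps 2 to 7 can be sketched as the following Python skeleton. The train, evaluate and predict callables are hypothetical stand-ins for ordinary resnet50 training, testing and inference code, and the stopping rule is simplified to exhaustion of either set rather than the exact "5 further rounds / balanced state" condition of step 7:

```python
from typing import Callable, List, Tuple

Sample = Tuple[object, int]  # (input, true label)

def adaptive_training(data_0: List[Sample], data_1: List[Sample],
                      train: Callable[[List[Sample]], object],
                      evaluate: Callable[[object, List[Sample]], float],
                      predict: Callable[[object, List[Sample]], List[int]],
                      n: int = 5, move_fraction: float = 0.10) -> object:
    """Alternate supervised and pseudo-label training while moving 10% of the
    data between training set data_0 and test set data_1 (sketch of steps 2-7)."""
    model_1 = model_0 = train(data_0)                  # step 2: initial model_0
    acc_0 = evaluate(model_0, data_1)                  # step 3: baseline accuracy acc_0
    pseudo = predict(model_0, data_1)                  # step 3: initial pseudo labels
    history: List[Tuple[float, object]] = []
    while data_0 and data_1:                           # step 7 (simplified stop rule)
        # Step 4: merge data_0 with pseudo-labeled data_1 into data_2 and retrain.
        data_2 = data_0 + [(x, y_hat) for (x, _), y_hat in zip(data_1, pseudo)]
        model = train(data_2)
        history.append((evaluate(model, data_1), model))
        if len(history) < n:
            continue
        last_n = history[-n:]
        if all(acc < acc_0 for acc, _ in last_n):      # step 5: n drops in a row
            acc_0 = sum(acc for acc, _ in last_n) / n
            k = max(1, int(move_fraction * len(data_1)))
            data_0 += data_1[:k]                       # move 10% with their real labels
            del data_1[:k]
            pseudo = pseudo[k:]                        # keep pseudo labels aligned
            history.clear()
        elif all(acc > acc_0 for acc, _ in last_n):    # step 6: n rises in a row
            acc_0 = sum(acc for acc, _ in last_n) / n
            model_1 = max(last_n, key=lambda t: t[0])[1]   # best of the last n rounds
            k = max(1, int(move_fraction * len(data_0)))
            data_1 += data_0[:k]                       # move 10% into the test set
            del data_0[:k]
            pseudo = predict(model_1, data_1)          # refresh all pseudo labels
            history.clear()
        # Mixed results simply continue training; the patent does not specify this case.
    return model_1
```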
The comparison of the accuracy (%) of the present invention with the existing training strategy is shown in the following table:
[Table: accuracy (%) of the existing training strategy versus the training strategy of the invention, on the ImageNet test set and on the cross-domain test set.]
It can be seen that after the training strategy of the present invention is applied, the accuracy of the model on the ImageNet test set hardly changes, while the accuracy on the cross-domain test set improves by 3%, which shows that the training strategy of the present invention helps to improve the generalization ability of the model.
The model obtained by training with the strategy of the present invention can predict unlabeled data and generate pseudo labels, assisting manual labeling and improving the efficiency of manual labeling.
The invention has the beneficial effects that:
1. overfitting of the model on a training set can be effectively prevented, and the generalization capability of the model is improved;
2. pseudo labels can be generated for the data and reasonably used for training, saving manual labeling cost.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A deep learning training method is characterized by comprising the following steps:
step 1, randomly extracting multiple categories from an image data set, and acquiring pictures of those categories from outside the image data set to form a data set;
step 2, dividing the data set into a training set data_0 and a test set data_1 in proportion, and inputting the training set into a deep neural network to obtain a model_0;
step 3, predicting the samples in the test set data_1 by using the model_0, comparing the prediction results with the real results to obtain the prediction accuracy acc_0, and recording the predicted pseudo labels of the samples in the test set data_1;
step 4, merging the training set data_0 and the pseudo-labeled sample data into a training set data_2, inputting the training set data_2 into the deep neural network, and testing the accuracy of the model_0 on the test set data_1 after each training;
step 5, if the accuracy is lower than the accuracy acc_0 for n consecutive times, updating acc_0 to the average of those n accuracies, taking 10% of the data together with their real labels out of the test set data_1, merging them into the training set data_0, and repeating step 4;
step 6, if the accuracy is higher than the accuracy acc_0 for n consecutive times, updating acc_0 to the average of those n accuracies, recording the model corresponding to the highest of the n accuracies as model_1, taking 10% of the data out of the training set data_0 and putting them into the test set data_1, predicting pseudo labels for the test set data_1 with the model_1 and updating the pseudo labels of the test set data_1, and repeating step 4;
step 7, if all the data in the training set data_0 have been moved to the test set data_1, stopping after 5 further rounds of training; or stopping training once the training set data_0 and the test set data_1 reach a balanced state.
2. The deep learning training method of claim 1, wherein step 1 specifically comprises randomly sampling 100 categories from the image data set ImageNet2012 and acquiring, from outside the image data set, 10 pictures of each of the 100 categories to form the data set.
3. The deep learning training method of claim 1, wherein step 2 specifically comprises dividing the data set into a training set data_0 and a test set data_1 in a ratio of 4:6, and inputting the training set into the deep neural network to obtain a model_0.
4. The deep learning training method of claim 1, wherein step 4 specifically comprises merging the training set data_0 and the pseudo-labeled sample data into the training set data_2, inputting the training set data_2 into the deep residual network resnet50, and testing the accuracy of the model_0 on the test set data_1 after each training.
5. The deep learning training method of claim 1, wherein step 5 specifically comprises, if the accuracy is lower than the accuracy acc_0 for 5 consecutive times, updating acc_0 to the average of those 5 accuracies, taking 10% of the data together with their real labels out of the test set data_1, merging them into the training set data_0, and repeating step 4.
6. The deep learning training method of claim 1, wherein step 6 specifically comprises, if the accuracy is higher than the accuracy acc_0 for 5 consecutive times, updating acc_0 to the average of those 5 accuracies, recording the model corresponding to the highest of the 5 accuracies as model_1, taking 10% of the data out of the training set data_0 and putting them into the test set data_1, predicting pseudo labels for the test set data_1 with the model_1 and updating the pseudo labels of the test set data_1, and repeating step 4.
CN202010221098.9A 2020-03-26 2020-03-26 Deep learning training method Active CN111126576B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010221098.9A CN111126576B (en) 2020-03-26 2020-03-26 Deep learning training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010221098.9A CN111126576B (en) 2020-03-26 2020-03-26 Deep learning training method

Publications (2)

Publication Number Publication Date
CN111126576A CN111126576A (en) 2020-05-08
CN111126576B true CN111126576B (en) 2020-09-01

Family

ID=70493959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010221098.9A Active CN111126576B (en) 2020-03-26 2020-03-26 Deep learning training method

Country Status (1)

Country Link
CN (1) CN111126576B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111740991B (en) * 2020-06-19 2022-08-09 上海仪电(集团)有限公司中央研究院 Anomaly detection method and system
CN111915020B (en) * 2020-08-12 2024-02-23 杭州海康威视数字技术股份有限公司 Updating method and device of detection model and storage medium
CN112149733B (en) * 2020-09-23 2024-04-05 北京金山云网络技术有限公司 Model training method, model quality determining method, model training device, model quality determining device, electronic equipment and storage medium
CN114581696A (en) * 2020-11-17 2022-06-03 广州柏视医疗科技有限公司 Method and system for classifying benign and malignant conditions of digital pathological image block
CN112541555A (en) * 2020-12-22 2021-03-23 中国医学科学院北京协和医院 Deep learning-based classifier model training method
CN112906902B (en) * 2020-12-22 2024-10-18 上海有个机器人有限公司 Robot data collection iterative training method, system and storage medium based on active learning technology
CN113378895B (en) * 2021-05-24 2024-03-01 成都欧珀通信科技有限公司 Classification model generation method and device, storage medium and electronic equipment
CN117237814B (en) * 2023-11-14 2024-02-20 四川农业大学 Large-scale orchard insect condition monitoring method based on attention mechanism optimization

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726742A (en) * 2018-12-11 2019-05-07 中科恒运股份有限公司 Rapid training method for a classification model, and terminal device
CN110379420A (en) * 2018-04-10 2019-10-25 英特尔Ip公司 Dynamic adaptation of a language understanding system to the acoustic environment
CN110516718A (en) * 2019-08-12 2019-11-29 西北工业大学 Zero-shot learning method based on a deep embedding space
CN110717390A (en) * 2019-09-05 2020-01-21 杭州电子科技大学 Electroencephalogram signal classification method based on graph-based semi-supervised broad learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11669724B2 (en) * 2018-05-17 2023-06-06 Raytheon Company Machine learning using informed pseudolabels

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110379420A (en) * 2018-04-10 2019-10-25 英特尔Ip公司 Dynamic adaptation of a language understanding system to the acoustic environment
CN109726742A (en) * 2018-12-11 2019-05-07 中科恒运股份有限公司 Rapid training method for a classification model, and terminal device
CN110516718A (en) * 2019-08-12 2019-11-29 西北工业大学 Zero-shot learning method based on a deep embedding space
CN110717390A (en) * 2019-09-05 2020-01-21 杭州电子科技大学 Electroencephalogram signal classification method based on graph-based semi-supervised broad learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Pseudo Labels for Imbalanced Multi-Label Learning; Wenrong Zeng et al.; 2014 International Conference on Data Science and Advanced Analytics (DSAA); 2014-11-01; 1-7 *
Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks; Dong-Hyun Lee; ResearchGate; 2013-07-31; 1-6 *
Pseudo-label semi-supervised learning; leadAI Academy; CSDN; 2018-08-08; 1-8 *
Research on semi-supervised learning algorithms based on the Boosting idea; Wu Jimin; China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology; 2014-10-15; I140-88 *
Research on machine-learning-based prediction methods for multi-site protein subcellular localization; Cao Junzhe; China Doctoral Dissertations Full-text Database (Electronic Journal), Basic Sciences; 2014-05-15; A006-44 *

Also Published As

Publication number Publication date
CN111126576A (en) 2020-05-08

Similar Documents

Publication Publication Date Title
CN111126576B (en) Deep learning training method
EP3767536A1 (en) Latent code for unsupervised domain adaptation
Yu et al. Meta-ADD: A meta-learning based pre-trained model for concept drift active detection
CN102314614B (en) Image semantics classification method based on class-shared multiple kernel learning (MKL)
CN109117793B (en) Direct-push type radar high-resolution range profile identification method based on deep migration learning
KR102690208B1 (en) Method for training a deep learning network based on AI and learning device using the same
Tang et al. Multi-label patent categorization with non-local attention-based graph convolutional network
CN110309868A (en) In conjunction with the hyperspectral image classification method of unsupervised learning
KR20050007306A (en) Processing mixed numeric and/or non-numeric data
CN113553906B (en) Discrimination non-supervision cross-domain pedestrian re-identification method based on class center domain alignment
CN109063743B (en) Construction method of medical data classification model based on semi-supervised multitask learning
CN111597340A (en) Text classification method and device and readable storage medium
CN107392311A (en) The method and apparatus of sequence cutting
Thakur et al. Machine Learning Technology
CN116681128A (en) Neural network model training method and device with noisy multi-label data
CN111813939A (en) Text classification method based on representation enhancement and fusion
Feng et al. Introspective robot perception using smoothed predictions from bayesian neural networks
CN114511023A (en) Classification model training method and classification method
CN113609294B (en) Fresh cold chain supervision method and system based on emotion analysis
CN116204789A (en) Network anomaly detection method based on semi-supervised self-adaptive multi-classification balance
CN113610229A (en) Active learning method based on norm uncertainty index
Chen Research on Popular Machine Learning Algorithms
Reddy et al. Analysis of DenseNet-MobileNet-CNN Models on Image Classification using Bird Species Data
Leshem Improvement of adaboost algorithm by using random forests as weak learner and using this algorithm as statistics machine learning for traffic flow prediction. Research proposal for a Ph. D
CN113112005B (en) Domain self-adaption method based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210319

Address after: Room 102, block B2, phase II, software new town, tianguba Road, Yuhua Street office, high tech Zone, Xi'an, Shaanxi 710000

Patentee after: Xi'an Zhizhen Intelligent Technology Co.,Ltd.

Address before: 102629 Room 401, building 1, 38 Yongda Road, Daxing biomedical industrial base, Zhongguancun Science and Technology Park, Daxing District, Beijing

Patentee before: Beijing precision diagnosis Medical Technology Co.,Ltd.
