CN112381161B - Neural network training method - Google Patents
Neural network training method

- Publication number
- CN112381161B (application CN202011296897.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- training
- neural network
- class
- batch
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention relates to a neural network training method comprising the following steps: S1, preliminary training — performing deep learning neural network training on training sample data with unbalanced class data to obtain a preliminary optimal training model; S2, processing the training sample data according to the preliminary optimal training model; S3, secondary training — continuing iterative training on the basis of the preliminary optimal training model using the data processed in S2, until the neural network training model converges. The method uses the DBSCAN clustering result together with the existing labels to guide the data sampling of each batch during neural network training; through balance of data between classes and feature diversity of data within each single class, the convergence speed of the algorithm model is increased and its generalization performance is improved.
Description
Technical Field
The invention relates to deep learning algorithms, and in particular to a neural network training method that improves on class imbalance in training sample data.
Background
An important step in deep learning neural network training is gradient descent, i.e., the updating of the weight parameters in the network. Three update schemes are common: 1. traverse the entire training data set to compute the loss function once, compute the gradient of the loss with respect to each parameter, and update — this is called batch gradient descent; 2. compute the loss function once for every training sample and then compute the gradient and update the parameters — this is called stochastic gradient descent; 3. divide the training data set into many small batches, compute the loss function per batch, and update the parameters — this is called mini-batch gradient descent. Scheme 1 trains on all samples at once, so its computational cost is high and it is slow; scheme 2 updates the parameters for every single sample, which is fast but converges poorly; therefore deep learning training today generally adopts mini-batch gradient descent.
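As a rough illustration of scheme 3 above, the following sketch runs mini-batch gradient descent on a toy one-parameter least-squares problem. All names and figures here are illustrative, not from the patent:

```python
import random

# Illustrative sketch of mini-batch gradient descent on a toy
# 1-parameter least-squares fit y = w * x.
def minibatch_gd(xs, ys, batch_size=4, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(idx)                       # re-shuffle each epoch
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            # gradient of the mean squared error over this mini-batch only
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad                     # one parameter update per batch
    return w

xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]                     # true weight is 3
w = minibatch_gd(xs, ys)
```

The key point is that one parameter update is performed per small batch — cheaper than a full-dataset pass (scheme 1) and less noisy than a per-sample update (scheme 2).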
Data class imbalance means that the amount of sample data in each class of the data set is highly unbalanced. The problem is frequently encountered in deep learning training, especially for classification models: a model trained on class-imbalanced data generalizes poorly and is severely biased at inference time, and the common metrics used to measure model performance during training often fail to reveal this. For example, for a binary classification model with an extreme positive-to-negative sample ratio of 99:1, even a model that predicts every sample as positive still achieves 99% prediction accuracy and 100% recall.
To mitigate data class imbalance, the traditional approach uses sample-level resampling — mainly random under-sampling (RUS) and random over-sampling (ROS) — to balance the data between classes, and then forms mini-batches by random sampling during model training. This traditional approach has two drawbacks: 1. random resampling easily distorts the sample data distribution and causes the model to overfit; 2. there is no way to guarantee class balance within each individual batch, so convergence is slow and the model's generalization is poor.
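For reference, the random over-sampling (ROS) baseline mentioned above can be sketched as follows — an illustrative helper, not the patent's code:

```python
import random
from collections import Counter

# Illustrative random over-sampling (ROS): duplicate minority-class samples
# (drawn with replacement) until every class matches the majority count.
# `samples` is a list of (features, label) pairs.
def random_oversample(samples, seed=0):
    rng = random.Random(seed)
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append((x, y))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        balanced.extend(group)
        # draw the shortfall with replacement from this class
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

# 99:1 imbalance, as in the binary-classification example above
data = [([i], "pos") for i in range(99)] + [([0], "neg")]
balanced = random_oversample(data)
counts = Counter(y for _, y in balanced)
```

Note the drawback the text identifies: the single "neg" sample is duplicated 99 times, which changes nothing about the feature distribution and invites overfitting.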
Disclosure of Invention
The present invention is directed to a neural network training method that improves on class imbalance of training sample data, so as to solve the problems described above. To this end, the invention adopts the following specific technical scheme:
a neural network training method, comprising the steps of:
S1, preliminary training: performing deep learning neural network training on training sample data with unbalanced class data to obtain a preliminary optimal training model;
S2, processing the training sample data according to the preliminary optimal training model, the specific process being as follows:
S21, extracting the feature vectors of all pictures of each category according to the preliminary optimal training model to obtain the feature vector set V = {V_{A-id-1}, …, V_{M-id-n}}, wherein A…M represent the marked category labels and id-n represents the picture id number;
S22, performing intra-category feature clustering on the feature vectors V with the clustering algorithm DBSCAN, separately for each label category, to obtain the data clustering result of each category C = {C_{A-i-id-n}}, wherein A represents a marked class label, called the first-level classification label, id-n represents the picture id number, and i represents the class label assigned by DBSCAN clustering, called the second-level classification label;
S23, obtaining the internal clustering condition of the pictures of each category from the data clustering result C;
S24, setting the sampling strategy for each batch of the deep learning neural network training process: extracting batch samples from all categories of pictures in C, the pictures in each batch satisfying two-level classification balance: the data amounts of the classes of the first-level classification are balanced against each other; and, within the same first-level class, the data conform to the DBSCAN clustering distribution, with the data amounts of the second-level classes balanced against each other;
S3, secondary training: continuing iterative training on the basis of the preliminary optimal training model using the data processed in S2, until the neural network training model converges.
Further, the data amounts of the different classes of the training sample data differ by a factor of more than 4.
Further, the sample data size of each batch is 0.01% to 1% of the training sample data size.
Further, the sample data size per batch is 256 or 512.
Further, the epsilon parameter of DBSCAN is 0.6 and the minPts parameter is 2.
Further, within each batch, the data amounts of the classes of the primary classification differ by at most 10%, as do the data amounts of the classes of the secondary classification.
By adopting the above technical scheme, the invention has the following beneficial effects: the clustering algorithm DBSCAN is used to cluster all data samples separately by category, obtaining the distribution of the data features within each category; the number of clusters does not need to be specified in advance, which avoids introducing artificial bias and yields a better clustering result. Meanwhile, balancing the data between classes and the diversity of the data within each single class in every batch improves the convergence speed of the algorithm model and ensures that the model generalizes well.
Drawings
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. With these references, one of ordinary skill in the art will appreciate other possible embodiments and advantages of the present invention. The components in the drawings are not necessarily to scale, and similar reference numerals are generally used to identify similar components.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a sample diagram of a batch.
Detailed Description
The invention will now be further described with reference to the drawings and the detailed description.
As shown in fig. 1, a neural network training method includes the following steps:
S1, preliminary training: perform deep learning neural network training on the class-imbalanced training sample data to obtain a preliminary optimal training model. Here, class data imbalance means that the data amounts of the different classes of the training sample data differ greatly (for example, by a factor of 4 or more), i.e., the class with the most data has at least 4 times as much data as the class with the least.
S2, process the training sample data according to the preliminary optimal training model. Specifically, the clustering result is used to guide the sampling of each batch during neural network training: the data in each batch must not only satisfy balance between the categories but also exhibit feature diversity within each single category. The specific process of S2 is as follows:
S21, extract the feature vectors of all pictures in each category according to the preliminary optimal training model, obtaining the feature vector set V = {V_{A-id-1}, …, V_{M-id-n}}, where A…M denote the labeled category tags and id-n denotes the picture id number.
S22, perform intra-category feature clustering on the feature vectors V with the clustering algorithm DBSCAN, separately for each label category, to obtain the data clustering result of each category C = {C_{A-i-id-n}}, where A denotes a labeled class label (the first-level classification label), id-n denotes the picture id number, and i denotes the class label assigned by DBSCAN clustering (the second-level classification label). Here, the epsilon parameter of DBSCAN is 0.6 and the minPts parameter is 2.
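The per-category clustering of S22 can be sketched as follows. This is an illustrative reimplementation, not the patent's code: a production version would typically call `sklearn.cluster.DBSCAN(eps=0.6, min_samples=2)` on real network embeddings, whereas here a minimal DBSCAN is inlined on toy 2-D "feature vectors" so the sketch stays self-contained.

```python
import math

# Minimal Euclidean DBSCAN, enough to mirror the per-category clustering of
# S22.  eps=0.6 and min_pts=2 follow the parameters stated in the patent.
def dbscan(points, eps=0.6, min_pts=2):
    n = len(points)
    labels = [None] * n                         # None = unvisited, -1 = noise
    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                      # provisionally noise
            continue
        cluster += 1                            # i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster             # border point reclaimed from noise
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbors(j)
            if len(more) >= min_pts:            # j is also core: keep expanding
                queue.extend(k for k in more if labels[k] is None)
    return labels

# Toy stand-ins for per-category feature vectors (second-level labels per class)
features_by_category = {
    "A": [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0), (5.1, 5.1)],
    "B": [(0.0, 0.0), (0.05, 0.0), (9.0, 9.0)],
}
clusters = {cat: dbscan(pts) for cat, pts in features_by_category.items()}
```

Because DBSCAN is density-based, the number of second-level clusters per first-level class falls out of the data (category "A" yields two clusters here, "B" one cluster plus a noise point) rather than being fixed in advance — the property the Disclosure section credits with avoiding artificial bias.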
S23, obtain the internal clustering condition of the pictures of each category from the data clustering result C.
S24, set the sampling strategy for each batch of the deep learning neural network training process: extract batch samples from all categories of pictures in C, where the pictures in each batch satisfy two-level classification balance. The data amounts of the first-level classes must be balanced against each other — generally required to differ by no more than 10%, and ideally identical. Within the same first-level class, the data conform to the DBSCAN clustering distribution, with the data amounts of the second-level classes balanced against each other — again within 10%, and ideally identical — which guarantees data diversity within the single category. For example, suppose the training sample data has 4 first-level classes which, after DBSCAN clustering, contain 2, 4, 4 and 3 second-level classes respectively, and each batch needs 256 samples. Then each first-level class contributes 256/4 = 64 samples per batch, and the second-level classes within each first-level class need 64/2 = 32, 64/4 = 16, 64/4 = 16, and 64/3 ≈ 21.3 samples respectively (when the quota is not an integer, the counts are rounded up or down so they differ by at most one), as shown in fig. 2. Preferably, the amount of data per batch is 0.01% to 1% of the amount of training sample data.
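The quota arithmetic of S24 can be sketched as follows; the helper names are ours, and the 4-class/256-sample figures are the example from the description:

```python
# Sketch of the S24 batch-quota computation.  `split` divides a quota as
# evenly as possible, rounding up/down so counts differ by at most one.
def split(total, parts):
    base, rem = divmod(total, parts)
    return [base + 1 if p < rem else base for p in range(parts)]

def batch_quotas(batch_size, subclusters_per_class):
    # first level: equal share per labeled class
    per_class = split(batch_size, len(subclusters_per_class))
    # second level: equal share per DBSCAN subcluster within each class
    return [split(q, k) for q, k in zip(per_class, subclusters_per_class)]

# 4 first-level classes with 2, 4, 4, 3 DBSCAN subclusters; batch of 256
quotas = batch_quotas(256, [2, 4, 4, 3])
```

The 64/3 case lands as 22 + 21 + 21 = 64, matching the "rounded up and down" rule in the text.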
S3, secondary training: continue iterative training on the basis of the preliminary optimal training model using the data processed in S2, until the neural network training model converges.
Experimental testing
1) Algorithm model: a 10-class neural network consisting of a GoogLeNet backbone and a fully-connected layer; the 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck;
2) Training sample data: the data set is CIFAR-10, 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck), with a simulated class imbalance ratio greater than 4:1 (5000:750), i.e. 5000 pictures for each of 7 classes (airplane, automobile, bird, cat, deer, dog, frog) and 750 pictures for each of 3 classes (horse, ship, truck);
3) Test data: 10 categories, 1000 pictures each;
4) Experimental hardware: 4 GTX 1080Ti GPU graphics cards;
5) Experimental procedure: the batch size is 512; 600 batches are run, and the accuracy (acc) and loss value on the test set are computed;
6) Experiment groups:
Experiment 1: training with the existing neural network training method — the training data is randomly split into batches of size 512, and 600 batches are run;
Experiment 2: training with the invention's neural network training method for improving data class imbalance; the specific training process is as follows:
First stage (preliminary training):
For the first 300 batches, sample the data of each category such that the categories' counts within a batch are equal, i.e. 512/10 = 51 samples per category (the remaining 2 samples are randomly assigned to 2 of the categories), and save the model with the highest accuracy on the test set;
Second stage (secondary training):
Data processing: take the model with the highest test accuracy from the first stage and remove its fully-connected layer, keeping only the backbone network for extracting picture features; extract the picture features of all training samples and cluster the data of each of the 10 classes separately with DBSCAN. Each batch then keeps the per-class counts equal, as in the first stage, and — after each class's data has been clustered by DBSCAN within that single class — also keeps the subclass (second-level classification) data within the class balanced;
Secondary training: using the processed data, continue training for 300 batches from the best model of the first stage.
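Putting the two stages' sampling together, a per-batch sampler might look like the sketch below. This is illustrative only — function and variable names are not from the patent. Given each sample's first-level label and second-level DBSCAN cluster, it draws one batch meeting both balance levels:

```python
import random

# Illustrative two-level balanced batch sampler (not the patent's code).
# `index` maps (class_label, cluster_label) -> list of sample ids.
def sample_batch(index, batch_size, seed=0):
    rng = random.Random(seed)
    classes = sorted({c for c, _ in index})
    per_class = batch_size // len(classes)      # first-level balance
    batch = []
    for c in classes:
        subclusters = sorted(k for k in index if k[0] == c)
        base, rem = divmod(per_class, len(subclusters))
        for j, key in enumerate(subclusters):   # second-level balance
            quota = base + (1 if j < rem else 0)
            # sample with replacement so small subclusters can fill their quota
            batch.extend(rng.choices(index[key], k=quota))
    return batch

index = {
    ("cat", 0): list(range(100)),
    ("cat", 1): list(range(100, 130)),
    ("dog", 0): list(range(200, 260)),
}
batch = sample_batch(index, batch_size=8)
```

During the first stage (before any clustering exists) the same sampler applies with a single subcluster per class, which reduces to plain per-class balance.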
7) The results of the experiment are shown in table 1:
TABLE 1
As can be seen from Table 1, the method of the present invention can improve the convergence rate and accuracy of model training.
In conclusion, for deep learning neural network training with class-imbalanced data, the method of the invention uses the DBSCAN clustering result together with the existing labels to guide the data sampling of every batch during training; the balance of data between classes and the diversity of data features within each single class (in particular, guaranteeing the number and distribution of hard samples) improve both the convergence speed and the generalization performance of the algorithm model. The training method is widely applicable to class-imbalanced AI training scenarios and promotes the practical application of artificial intelligence in complex real-world scenes.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. A neural network training method, characterized by comprising the following steps:
S1, preliminary training: performing deep learning neural network training on training sample data with unbalanced class data to obtain a preliminary optimal training model;
S2, processing the training sample data according to the preliminary optimal training model, the specific process being as follows:
S21, extracting the feature vectors of all pictures in each category according to the preliminary optimal training model to obtain the feature vector set V = {V_{A-id-1}, …, V_{M-id-n}}, where A…M represent the labeled category tags and id-n represents the picture id number;
S22, performing intra-category feature clustering on the feature vectors V with the clustering algorithm DBSCAN, separately for each label category, to obtain the data clustering result of each category C = {C_{A-i-id-n}}, where A represents a labeled class label, called the first-level classification label, id-n represents the picture id number, and i represents the class label assigned by DBSCAN clustering, called the second-level classification label;
S23, obtaining the internal clustering condition of the pictures of each category from the data clustering result C;
S24, setting the sampling strategy for each batch of the deep learning neural network training process: extracting batch samples from all categories of pictures in C, the pictures in each batch satisfying two-level classification balance: the data amounts of the classes of the first-level classification are balanced against each other; and, within the same first-level class, the data conform to the DBSCAN clustering distribution, with the data amounts of the second-level classes balanced against each other;
S3, secondary training: continuing iterative training on the basis of the preliminary optimal training model using the data processed in S2, until the neural network training model converges.
2. The neural network training method of claim 1, wherein the amount of data between different classes of training sample data differs by more than a factor of 4.
3. The neural network training method of claim 1, wherein the epsilon parameter of DBSCAN is 0.6 and the minPts parameter is 2.
4. The neural network training method of claim 1, wherein the sample data size of each batch is 0.01% to 1% of the training sample data size.
5. The neural network training method of claim 4, wherein the sample data size for each batch is 256 or 512.
6. The neural network training method of claim 1, wherein, within each batch, the data amounts of the classes of the primary classification differ by at most 10%, as do the data amounts of the classes of the secondary classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011296897.9A CN112381161B (en) | 2020-11-18 | 2020-11-18 | Neural network training method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112381161A CN112381161A (en) | 2021-02-19 |
CN112381161B true CN112381161B (en) | 2022-08-30 |
Family
ID=74585149
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011296897.9A Active CN112381161B (en) | 2020-11-18 | 2020-11-18 | Neural network training method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112381161B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114387457A (en) * | 2021-12-27 | 2022-04-22 | 腾晖科技建筑智能(深圳)有限公司 | Face intra-class interval optimization method based on parameter adjustment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921208A (en) * | 2018-06-20 | 2018-11-30 | 天津大学 | The aligned sample and modeling method of unbalanced data based on deep learning |
CN109816092A (en) * | 2018-12-13 | 2019-05-28 | 北京三快在线科技有限公司 | Deep neural network training method, device, electronic equipment and storage medium |
CN110298451A (en) * | 2019-06-10 | 2019-10-01 | 上海冰鉴信息科技有限公司 | A kind of equalization method and device of the lack of balance data set based on Density Clustering |
CN110443281A (en) * | 2019-07-05 | 2019-11-12 | 重庆信科设计有限公司 | Adaptive oversampler method based on HDBSCAN cluster |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190385045A1 (en) * | 2018-06-14 | 2019-12-19 | Dell Products L.P. | Systems And Methods For Generalized Adaptive Storage Endpoint Prediction |
- 2020-11-18: CN application CN202011296897.9A filed (patent CN112381161B, status Active)
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |