CN112381161B - Neural network training method - Google Patents

Neural network training method

Info

Publication number
CN112381161B
CN112381161B (application CN202011296897.9A)
Authority
CN
China
Prior art keywords
data
training
neural network
class
batch
Prior art date
Legal status
Active
Application number
CN202011296897.9A
Other languages
Chinese (zh)
Other versions
CN112381161A (en)
Inventor
林淑强
尚占锋
张永光
林修明
欧阳天
Current Assignee
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN202011296897.9A
Publication of CN112381161A
Application granted
Publication of CN112381161B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a neural network training method, which comprises the following steps: S1, preliminary training: performing deep learning neural network training on training sample data with unbalanced class data to obtain a preliminary optimal training model; S2, processing the training sample data according to the preliminary optimal training model; S3, secondary training: continuing iterative training on the basis of the preliminary optimal training model with the data processed in S2, until the neural network training model converges. The method uses the DBSCAN clustering result together with the existing labels to guide the sampling of each batch during neural network training; by balancing the data across classes and preserving the feature diversity of the data within each single class, it speeds up the convergence of the algorithm model and improves its generalization performance.

Description

Neural network training method
Technical Field
The invention relates to deep learning algorithms, and in particular to a neural network training method that mitigates class imbalance in training sample data.
Background
An important step in training a deep learning neural network is gradient descent, i.e. updating the weight parameters of the network. Three update schemes are common: 1. traverse the entire training data set to compute the loss function once, compute the gradient of the loss with respect to each parameter, and update — called batch gradient descent; 2. compute the loss function after every single training sample and update the parameters from its gradient — called stochastic gradient descent; 3. divide the training data set into many small batches, and compute the loss and update the parameters batch by batch — called mini-batch gradient descent. Scheme 1 trains on all samples for every update, so it is computationally expensive and slow; scheme 2 updates on every sample, so it is fast but converges poorly. Deep learning training therefore generally uses mini-batch gradient descent.
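For concreteness, the following is a minimal sketch of mini-batch gradient descent on a linear least-squares model — illustrative only, not taken from the patent; `X`, `y`, `lr`, `batch_size`, and `epochs` are assumed names:

```python
import numpy as np

# Hedged sketch: mini-batch gradient descent on a linear model with
# mean-squared-error loss. One parameter update per mini-batch.
def minibatch_gd(X, y, lr=0.1, batch_size=64, epochs=10):
    w = np.zeros(X.shape[1])
    n = len(X)
    for _ in range(epochs):
        idx = np.random.permutation(n)                   # reshuffle each epoch
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]            # indices of one mini-batch
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient of (1/2)*MSE on the batch
            w -= lr * grad                               # parameter update per batch
    return w
```

Setting `batch_size = n` recovers batch gradient descent, and `batch_size = 1` recovers stochastic gradient descent.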
Data class imbalance means that the amounts of sample data in the different classes of a data set are highly unequal. The problem is frequently encountered when training deep learning algorithms, classification models in particular: a model trained on class-imbalanced data generalizes poorly and is severely biased at inference time, and the usual metrics used to monitor model performance during training often fail to reveal this. For example, for a binary classification task with an extreme 99:1 positive-to-negative sample ratio, a model that predicts every input as positive still achieves 99% prediction accuracy and 100% recall.
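The 99:1 example can be checked directly — a toy sketch, with `y_true` and `y_pred` as assumed example data:

```python
# All-positive predictor on a 99:1 data set: the headline metrics look
# excellent even though the model has learned nothing about the negative class.
y_true = [1] * 99 + [0]       # 99 positives, 1 negative
y_pred = [1] * 100            # degenerate model: always predicts positive

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)
print(accuracy, recall)       # 0.99 1.0
```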
To address data class imbalance, conventional practice relies on sample resampling, chiefly random under-sampling (RUS) and random over-sampling (ROS), to balance the data across classes, with mini-batches then drawn by random sampling during model training. This conventional approach has two drawbacks: 1. random resampling easily distorts the sample data distribution and causes the model to overfit; 2. there is no way to guarantee class balance within each batch, so convergence is slow and the trained model generalizes poorly.
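For reference, a minimal sketch of the two conventional baselines; `samples`, a mapping from class label to a list of examples, is an assumed structure:

```python
import random

# Random under-sampling (RUS): shrink every class to the size of the smallest.
def random_undersample(samples):
    n = min(len(v) for v in samples.values())
    return {c: random.sample(v, n) for c, v in samples.items()}

# Random over-sampling (ROS): grow every class to the size of the largest
# by duplicating examples drawn with replacement.
def random_oversample(samples):
    n = max(len(v) for v in samples.values())
    return {c: v + random.choices(v, k=n - len(v)) for c, v in samples.items()}
```

Both operate on whole-dataset counts only, which is exactly the second drawback above: neither controls what ends up in any individual batch.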
Disclosure of Invention
The present invention is directed to a neural network training method that mitigates class imbalance in training sample data, so as to solve the above problems. To this end, the invention adopts the following specific technical scheme:
a neural network training method, comprising the steps of:
S1, preliminary training: performing deep learning neural network training on training sample data with unbalanced class data to obtain a preliminary optimal training model;
S2, processing the training sample data according to the preliminary optimal training model, the specific process being as follows:
S21, extracting, with the preliminary optimal training model, the feature vectors F^M_{id-n} of all pictures of each category, where M denotes the annotated category label and id-n denotes the picture id number;
S22, using the clustering algorithm DBSCAN to cluster the feature vectors F^M_{id-n} within each label category, obtaining the per-category clustering result F^{A,i}_{id-n}, where A denotes the annotated class label, called the first-level classification label, id-n denotes the picture id number, and i denotes the DBSCAN cluster label, called the second-level classification label;
S23, obtaining from the clustering result the internal cluster structure of each category's pictures, i.e. the grouping of the F^{A,i}_{id-n} by first-level label A and second-level label i;
S24, setting a sampling strategy of the deep learning neural network training process batch: from
Figure BDA0002785662710000031
Extracting batch samples from all the types of pictures, wherein the pictures in each batch meet the data balance of two-level classification: the data volume of each class between different class classes of the first class needs to meet the balance; data in the same first-level classification category accords with DBSCAN clustering distribution, and data quantity balance among second-level classification categories is met;
S3, secondary training: continuing iterative training on the basis of the preliminary optimal training model with the data processed in S2, until the neural network training model converges.
Further, the data amounts of different classes of the training sample data differ by more than a factor of 4.
Further, the sample data size of each batch is 0.01% to 1% of the training sample data size.
Further, the sample data size per batch is 256 or 512.
Further, the epsilon parameter of DBSCAN is 0.6 and the minPts parameter is 2.
Further, within each batch, the data amounts of the first-level classification categories, and of the second-level classification categories, differ by no more than 10%.
With the above technical scheme, the invention has the following beneficial effects: the clustering algorithm DBSCAN is used to cluster all data samples class by class, yielding the distribution of the data features within each class; the number of clusters need not be specified in advance, which avoids introducing human bias and gives a better clustering result. At the same time, balancing the classes within each batch while preserving the feature diversity of the data within each single class speeds up the convergence of the algorithm model and ensures that the model generalizes well.
Drawings
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. With these references, one of ordinary skill in the art will appreciate other possible embodiments and advantages of the present invention. The components in the drawings are not necessarily to scale, and similar reference numerals are generally used to identify similar components.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a sample diagram of a batch.
Detailed Description
The invention will now be further described with reference to the drawings and the detailed description.
As shown in fig. 1, a neural network training method includes the following steps:
S1, preliminary training: performing deep learning neural network training on training sample data with unbalanced class data to obtain a preliminary optimal training model. Here, class imbalance means that the data amounts of the different classes of the training sample data differ greatly (for example, by a factor of 4 or more), i.e. the largest class has at least 4 times as much data as the smallest class.
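As a small numeric illustration of this 4x criterion (the helper and the counts are hypothetical, not part of the patent):

```python
# Class imbalance in the sense used here: the largest class has at least
# `factor` times as many samples as the smallest. `counts` is an assumed
# list of per-class sample counts.
def is_class_imbalanced(counts, factor=4):
    return max(counts) >= factor * min(counts)

print(is_class_imbalanced([5000, 5000, 750]))  # True: 5000 >= 4 * 750
```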
S2, processing the training sample data according to the preliminary optimal training model. Specifically, the clustering result is used to guide batch sampling during neural network training: the data in each batch must not only be balanced across categories but also preserve the feature diversity of the data within each single category. The specific process of S2 is as follows:
S21, extracting, with the preliminary optimal training model, the feature vectors F^M_{id-n} of all pictures in each category, where M denotes the annotated category label and id-n denotes the picture id number.
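A sketch of S21 under stated assumptions: `model` is the preliminary optimal training model with its classification head removed (cf. the data-processing step in the experiment section), and `loader_for_class(label)` — a hypothetical helper — yields batches of (images, picture ids) for one annotated category:

```python
import torch

@torch.no_grad()
def extract_class_features(model, labels, loader_for_class, device="cuda"):
    model.eval()
    model.to(device)
    feats = {}  # label M -> {picture id "id-n" -> feature vector F^M_{id-n}}
    for m in labels:
        feats[m] = {}
        for images, ids in loader_for_class(m):
            vecs = model(images.to(device)).cpu()   # backbone embeddings
            for pic_id, v in zip(ids, vecs):
                feats[m][pic_id] = v.numpy()
    return feats
```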
S22, using the clustering algorithm DBSCAN to cluster the feature vectors F^M_{id-n} within each label category, obtaining the per-category clustering result F^{A,i}_{id-n}, where A denotes the annotated class label, called the first-level classification label, id-n denotes the picture id number, and i denotes the DBSCAN cluster label, called the second-level classification label. Here, the epsilon parameter of DBSCAN is 0.6 and the minPts parameter is 2.
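A sketch of S22 with scikit-learn's DBSCAN (an assumed implementation choice — the patent names only the algorithm and its two parameters); `feats` is the {label -> {picture id -> vector}} mapping from the S21 sketch:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_within_classes(feats, eps=0.6, min_pts=2):
    result = {}  # (A, i) -> picture ids, i.e. the groups F^{A,i}_{id-n}
    for a, id_to_vec in feats.items():
        ids = list(id_to_vec)
        X = np.stack([id_to_vec[p] for p in ids])
        # Cluster each annotated class separately; no cluster count is fixed in advance.
        second = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
        for pic_id, i in zip(ids, second):  # i == -1 marks DBSCAN noise points
            result.setdefault((a, int(i)), []).append(pic_id)
    return result
```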
S23, obtaining from the clustering result the internal cluster structure of each category's pictures, i.e. the grouping of the F^{A,i}_{id-n} by first-level label A and second-level label i.
S24, setting the batch sampling strategy for the deep learning neural network training process: batch samples are drawn from the pictures of all categories in F^{A,i}_{id-n}, and the pictures in each batch satisfy two-level data balance. The data amounts of the different first-level classification categories must be balanced — generally required to differ by no more than 10%, and ideally identical. Within each first-level category the data must follow the DBSCAN cluster distribution, with the data amounts of the second-level classification categories balanced as well — again generally within 10% of one another and ideally identical — which guarantees data diversity within each single category. For example, suppose the training sample data has 4 first-level categories which, after DBSCAN clustering, contain 2, 4, 4, and 3 second-level categories respectively, and each batch needs 256 samples. Then each first-level category contributes 256/4 = 64 samples per batch, and the second-level categories within each first-level category contribute 64/2 = 32, 64/4 = 16, 64/4 = 16, and 64/3 ≈ 21.3 samples respectively (when the quota is not an integer, some second-level categories are rounded up and others down so the per-category total is preserved), as shown in FIG. 2 and in the sketch below. Preferably, the data amount of each batch is 0.01% to 1% of the training sample data amount.
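A sketch of this sampling strategy under the assumptions of the previous sketches; `clusters` is the {(A, i) -> picture ids} mapping from the S22 sketch, and sampling with replacement and treating DBSCAN noise as its own sub-cluster are simplifying assumptions:

```python
import random

def sample_batch(clusters, batch_size=256):
    by_class = {}
    for (a, i), ids in clusters.items():
        by_class.setdefault(a, []).append(ids)
    per_class = batch_size // len(by_class)              # e.g. 256 / 4 = 64
    batch = []
    for sub_clusters in by_class.values():
        k, extra = divmod(per_class, len(sub_clusters))  # e.g. divmod(64, 3) = (21, 1)
        for j, ids in enumerate(sub_clusters):
            quota = k + (1 if j < extra else 0)          # first `extra` sub-clusters rounded up
            batch += random.choices(ids, k=quota)
    random.shuffle(batch)
    return batch
```

With 4 first-level classes holding 2, 4, 4, and 3 sub-clusters, a 256-sample batch gets 64 samples per class, split 32+32, 16+16+16+16, 16+16+16+16, and 22+21+21 — matching the worked example above.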
S3, secondary training: continuing iterative training on the basis of the preliminary optimal training model with the data processed in S2, until the neural network training model converges.
Experimental testing
1) Algorithm model: backbone: GoogLeNet plus a fully-connected layer forming a 10-class neural network, the 10 classes being airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck;
2) Training sample data: the data set is from cifar-10, 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck), with a simulated class imbalance of 5000:750 (more than 4:1), i.e. 5000 pictures for each of 7 classes (airplane, automobile, bird, cat, deer, dog, frog) and 750 for each of 3 classes (horse, ship, truck);
3) Test data: 10 categories, 1000 pictures each;
4) Experimental hardware: 4 GTX 1080Ti GPU cards;
5) Experimental procedure: the batch size is 512; 600 batches are run, and the accuracy acc and the loss value on the test set are computed;
6) Experiment groups:
Experiment 1: training with the existing neural network training method, randomly splitting the training data into batches of size 512 and running 600 batches;
Experiment 2: training with the neural network training method of the invention for mitigating data class imbalance; the training process is as follows:
First stage (preliminary training):
For the first 300 batches, sample the data so that each category contributes the same number of pictures per batch, i.e. 512/10 ≈ 51 per category, with the remaining 2 samples assigned at random to 2 of the categories; save the model with the highest accuracy on the test set;
Second stage (secondary training):
Data processing: take the model with the highest test-set accuracy from the first stage, remove its fully-connected layer, and keep only the backbone network for extracting picture features; extract the picture features of all training samples, and cluster the data of the 10 classes separately with DBSCAN. Each batch then keeps the per-class data amounts of the first stage equal while also balancing, within each class, the data across the sub-classes (second-level classification) produced by the per-class DBSCAN clustering;
Secondary training: using the processed data, training continues for 300 batches on the basis of the best model from the first stage.
7) The experimental results are shown in Table 1:
TABLE 1
[Table image: test-set accuracy (acc) and loss for Experiment 1 and Experiment 2 — values not recoverable from this extraction.]
As can be seen from Table 1, the method of the present invention can improve the convergence rate and accuracy of model training.
In conclusion, for deep learning neural network training with class-imbalanced data, the method of the invention uses the DBSCAN clustering result together with the existing labels to guide the data sampling of each batch during training. Through the balance of data across classes and the diversity of data features within each single class (in particular, preserving the number and distribution of hard samples), it improves both the convergence speed and the generalization performance of the algorithm model. The training method can be widely applied to AI training scenarios with class imbalance, promoting the practical deployment of artificial intelligence in a variety of complex scenarios.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A neural network training method is characterized by comprising the following steps:
S1, preliminary training: performing deep learning neural network training on training sample data with unbalanced class data to obtain a preliminary optimal training model;
S2, processing the training sample data according to the preliminary optimal training model, the specific process being as follows:
S21, extracting, with the preliminary optimal training model, the feature vectors F^M_{id-n} of all pictures in each category, where M denotes the annotated category label and id-n denotes the picture id number;
S22, using the clustering algorithm DBSCAN to cluster the feature vectors F^M_{id-n} within each label category, obtaining the per-category clustering result F^{A,i}_{id-n}, where A denotes the annotated class label, called the first-level classification label, id-n denotes the picture id number, and i denotes the DBSCAN cluster label, called the second-level classification label;
s23, obtaining the internal clustering condition of each category picture according to the data clustering result
Figure FDA0002785662700000016
S24, setting the batch sampling strategy for the deep learning neural network training process: batch samples are drawn from the pictures of all categories in F^{A,i}_{id-n}, and the pictures in each batch satisfy two-level data balance: the data amounts of the different first-level classification categories are balanced, and within each first-level category the data follow the DBSCAN cluster distribution, with the data amounts of the second-level classification categories balanced as well;
S3, secondary training: continuing iterative training on the basis of the preliminary optimal training model with the data processed in S2, until the neural network training model converges.
2. The neural network training method of claim 1, wherein the amount of data between different classes of training sample data differs by more than a factor of 4.
3. The neural network training method of claim 1, wherein the epsilon parameter of DBSCAN is 0.6 and the minPts parameter is 2.
4. The neural network training method of claim 1, wherein the sample data size of each batch is 0.01% to 1% of the training sample data size.
5. The neural network training method of claim 4, wherein the sample data size for each batch is 256 or 512.
6. The neural network training method of claim 1, wherein within each batch the data amounts of the first-level classification categories, and of the second-level classification categories, differ by no more than 10%.
CN202011296897.9A · Priority 2020-11-18 · Filed 2020-11-18 · Neural network training method · CN112381161B (Active)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011296897.9A | 2020-11-18 | 2020-11-18 | Neural network training method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011296897.9A | 2020-11-18 | 2020-11-18 | Neural network training method

Publications (2)

Publication Number | Publication Date
CN112381161A | 2021-02-19
CN112381161B | 2022-08-30

Family

ID=74585149

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202011296897.9A (Active) | Neural network training method | 2020-11-18 | 2020-11-18

Country Status (1)

Country Link
CN (1) CN112381161B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114387457A (en) * 2021-12-27 2022-04-22 腾晖科技建筑智能(深圳)有限公司 Face intra-class interval optimization method based on parameter adjustment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921208A (en) * 2018-06-20 2018-11-30 天津大学 The aligned sample and modeling method of unbalanced data based on deep learning
CN109816092A (en) * 2018-12-13 2019-05-28 北京三快在线科技有限公司 Deep neural network training method, device, electronic equipment and storage medium
CN110298451A (en) * 2019-06-10 2019-10-01 上海冰鉴信息科技有限公司 A kind of equalization method and device of the lack of balance data set based on Density Clustering
CN110443281A (en) * 2019-07-05 2019-11-12 重庆信科设计有限公司 Adaptive oversampler method based on HDBSCAN cluster

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190385045A1 (en) * 2018-06-14 2019-12-19 Dell Products L.P. Systems And Methods For Generalized Adaptive Storage Endpoint Prediction


Also Published As

Publication number Publication date
CN112381161A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
US20190279088A1 (en) Training method, apparatus, chip, and system for neural network model
WO2020073951A1 (en) Method and apparatus for training image recognition model, network device, and storage medium
CN107729999A (en) Consider the deep neural network compression method of matrix correlation
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN110889487A (en) Neural network architecture search apparatus and method, and computer-readable recording medium
CN112101544A (en) Training method and device of neural network suitable for long-tail distributed data set
CN109460793A (en) A kind of method of node-classification, the method and device of model training
CN113643230A (en) Continuous learning method and system for identifying biomacromolecule particles of cryoelectron microscope
CN113887480B (en) Burma language image text recognition method and device based on multi-decoder joint learning
CN112381161B (en) Neural network training method
CN111105241A (en) Identification method for anti-fraud of credit card transaction
Pietron et al. Retrain or not retrain?-efficient pruning methods of deep cnn networks
CN116503676A (en) Picture classification method and system based on knowledge distillation small sample increment learning
CN114821237A (en) Unsupervised ship re-identification method and system based on multi-stage comparison learning
CN114841209A (en) Multi-target domain electrocardiosignal classification method based on depth field self-adaption
CN111222534A (en) Single-shot multi-frame detector optimization method based on bidirectional feature fusion and more balanced L1 loss
CN109978058A (en) Determine the method, apparatus, terminal and storage medium of image classification
CN109543571B (en) Intelligent identification and retrieval method for special-shaped processing characteristics of complex products
CN112488188B (en) Feature selection method based on deep reinforcement learning
CN114444654A (en) NAS-oriented training-free neural network performance evaluation method, device and equipment
CN113296947A (en) Resource demand prediction method based on improved XGboost model
Dong et al. Fast CNN pruning via redundancy-aware training
CN114758167B (en) Dish identification method based on self-adaptive contrast learning
CN111950615A (en) Network fault feature selection method based on tree species optimization algorithm
CN112686277A (en) Method and device for model training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant