CN112418422B - Deep neural network training data sampling method based on human brain memory mechanism - Google Patents

Deep neural network training data sampling method based on human brain memory mechanism

Info

Publication number
CN112418422B
CN112418422B
Authority
CN
China
Prior art keywords
training
samples
sample
waiting
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011307776.XA
Other languages
Chinese (zh)
Other versions
CN112418422A (en)
Inventor
何水兵
胡双
孙贤和
银燕龙
陈刚
任祖杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202011307776.XA priority Critical patent/CN112418422B/en
Publication of CN112418422A publication Critical patent/CN112418422A/en
Application granted granted Critical
Publication of CN112418422B publication Critical patent/CN112418422B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep neural network training data sampling method based on a human brain memory mechanism, which comprises the following steps: S1, in the initial training period, setting the next round's sequence to be trained to the whole training set; S2, packing the data contained in the training sequence into a number of batches according to the batch size, putting the batches into the neural network for training, and obtaining the training loss value of each sample; S3, dividing the sample sequence into three categories, hard, middle and easy, according to the loss value; S4, adding one basic clock to every sample of the training sequence, where the middle and easy samples additionally compute their extra clock counts according to a countdown waiting function; S5, decrementing the clock count of every sample in the whole training set and putting the samples whose clock count reaches 0 into the next round's sequence to be trained; S6, repeating steps S2 to S5 until the neural network converges or the number of training periods ends.

Description

Deep neural network training data sampling method based on human brain memory mechanism
Technical Field
The invention relates to the technical field of neural networks, and in particular to a method and framework for importance sampling of deep neural network training data.
Background
With the development of deep learning in recent years, deep neural networks have achieved significant success in the fields of computer vision, speech recognition and natural language processing.
Training a high-precision deep neural network typically consumes a large amount of time and computing resources. The standard neural network training process treats all samples indiscriminately, but this ignores the variability between samples. In fact, not all samples contribute equally to gradient descent, and even the same sample contributes differently at different stages of the overall training. Treating all samples equally during training therefore wastes CPU, memory and IO resources, and misses the opportunity to shorten training time and accelerate training.
The whole training process can therefore be accelerated by skipping the training of unimportant samples. Two problems must be solved when sampling by importance: 1) how to evaluate the importance of a sample; 2) how many important samples should be selected in different training phases. An optimal sampling distribution can be obtained from the gradients of individual samples, but current deep learning frameworks (such as PyTorch or TensorFlow) cannot quickly obtain per-sample gradients, so this approach is impractical. Alternatively, the loss or a custom upper bound on the gradient can be used to replace or approximate the sample gradient, or an auxiliary neural network can be trained to predict sample importance. However, training an auxiliary network introduces additional computational overhead, and computing a gradient upper bound is more complex and time-consuming than computing the loss. Meanwhile, methods that evaluate sample importance using the loss have so far only been tested on small data sets and image classification tasks, so their range of application is limited.
Disclosure of Invention
In order to overcome the defects of the prior art, and to reduce computational complexity, widen the range of application and improve acceleration efficiency when sampling important samples during deep neural network training, the invention adopts the following technical scheme:
a deep neural network training data sampling method based on a human brain memory mechanism adopts a memory sampling mode to apply two characteristics of memory:
1. The emphasis of memory. Throughout the training process, the neural network should focus on samples that are frequently misjudged, rather than on samples that are judged correctly or are easy to judge.
2. The memory interval. To improve the effectiveness of memorized data, the interval between training periods of a sample is adjusted according to the sample's difficulty.
As shown in fig. 1, in the sampling stage, all samples may be sampled by the MSampler (Memorized Sampler) method proposed by the present invention alone; MSampler may also be used in series with other samplers (non-MSampler), i.e. the samples filtered by another sampler are used as the input of MSampler, or in parallel with other samplers, where the intersection of the samples filtered by the two samplers is used as the input data of each epoch (training period), as sketched below.
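The two composition modes can be expressed as simple operations on sets of sample indices. The following is a minimal Python sketch; other_select and msampler_select are hypothetical callables, not names from the patent, that map a list of candidate sample indices to the indices selected for the next epoch.

# Minimal sketch of composing MSampler with another sampler (hypothetical helpers).

def compose_serial(all_indices, other_select, msampler_select):
    # Serial use: the other sampler filters first; its output becomes MSampler's input.
    return msampler_select(other_select(all_indices))

def compose_parallel(all_indices, other_select, msampler_select):
    # Parallel use: the intersection of both samplers' selections is the epoch's input data.
    return sorted(set(other_select(all_indices)) & set(msampler_select(all_indices)))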
As shown in fig. 2 and 3, the steps of the sampling method are as follows:
1. In the initial training period, the next round's sequence to be trained (running_list) is set to the whole training set (total_list).
2. The data contained in the training sequence (running_list) are packed into a number of batches according to the batch size, put into the neural network for training, and the training loss value of each sample is obtained.
3. The sample sequence is divided into three categories, hard (Hard), middle (Middle) and easy (Easy), according to the loss value loss. The partition follows the rule below:
(Formula image: partition rule that assigns the Hard, Middle and Easy labels from the sorted loss values, the relaxation factor γ and the minimum ε.)
where N represents the total number of samples in the training set and γ represents the relaxation factor, whose main effect is to enlarge the number of Hard samples so that more samples are selected in the next epoch (training cycle). ε represents a minimum value (usually 0); samples whose loss is below this minimum are judged to be Easy samples. A hypothetical sketch of this partition is given below.
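Since the partition formula itself is only reproduced as an image, the following Python sketch is a hypothetical reading rather than the patent's formula: it assumes the samples are ranked by loss, that losses below ε are Easy, and that a base fraction hard_frac (an invented parameter) scaled by the relaxation factor γ determines how many of the remaining samples are labelled Hard.

import math

def partition_by_loss(losses, gamma=1.0, eps=0.0, hard_frac=1.0 / 3):
    """Return three lists of sample indices: (hard, middle, easy)."""
    n = len(losses)
    # Samples whose loss falls below the minimum eps are judged Easy.
    easy = [i for i in range(n) if losses[i] < eps]
    # Remaining samples, sorted by loss in descending order (hardest first).
    rest = sorted((i for i in range(n) if losses[i] >= eps),
                  key=lambda i: losses[i], reverse=True)
    # The relaxation factor gamma enlarges the Hard portion so that more samples
    # are selected in the next epoch; hard_frac is an invented base fraction.
    n_hard = min(len(rest), math.ceil(gamma * hard_frac * n))
    return rest[:n_hard], rest[n_hard:], easy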
4. One basic clock is added to every sample of the training sequence (running_list). Middle and Easy samples additionally compute an extra number of clocks according to a countdown waiting function (i.e. they must wait longer than Hard samples). This expresses that samples the neural network misjudges should immediately be trained further, while samples the network judges correctly can wait for some time before being trained again. In this way the total number of samples that need to be trained in the following training periods is reduced, which reduces training time.
5. Here, three countdown waiting functions are proposed, specifically as follows:
(1) Step-back waiting: the training waiting time of a sample is increased linearly every fixed number of cycles, as follows:
counts=bcount+1*(epoch/interval)
where counts represents the computed waiting time of each category of sample and bcount represents the waiting base of each level; for example, the base of the Middle category can be set to 2 and the base of the Easy category to 3, or the Middle base to 1 and the Easy base to 2, chosen as a hyperparameter according to the actual situation. epoch represents the training cycle (round) number and interval represents the number of rounds between count updates. A sketch of this function follows.
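The step-back waiting formula above translates directly into code; the only assumption is that epoch is divided by interval using integer division, since rounds are counted discretely.

def step_back_wait(bcount, epoch, interval):
    # Linear (step-back) countdown wait: counts = bcount + 1 * (epoch / interval),
    # where bcount is the per-level waiting base (e.g. Middle = 1, Easy = 2).
    return bcount + 1 * (epoch // interval)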
(2) Exponential backoff waiting: the training waiting time of a sample is increased exponentially every fixed number of cycles, as follows:
(Formula image: counts is taken as the minimum of an exponentially growing wait, with base increment_rate, and the upper limit largest_count.)
where min() takes the smaller of the two values in the parentheses; increment_rate is the base of the exponential growth rate of the exponential backoff waiting mode and must be set in advance as a hyperparameter; largest_count, also a hyperparameter set in advance, is the upper limit on the waiting time, which can never exceed largest_count. One plausible form of this function is sketched below.
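Because the exponential backoff formula is also only shown as an image, the sketch below is one plausible form consistent with the description (a wait that grows exponentially with base increment_rate and is capped at largest_count); it is an assumption, not the patent's exact expression.

def exponential_backoff_wait(bcount, epoch, interval, increment_rate, largest_count):
    # Plausible exponential backoff wait: grows with base increment_rate every
    # `interval` rounds and never exceeds largest_count (the upper wait limit).
    return min(bcount + increment_rate ** (epoch // interval), largest_count)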
(3) History sliding window: a history category value is maintained for each sample, and the number of waiting periods of the sample is determined by the sample's historical category and the number of consecutive occurrences.
If a sample is of the Middle category for 3 consecutive cycles, 2 waiting cycles are added; if it is Middle for 4 consecutive cycles, 3 waiting cycles are added, and so on. If a sample is Easy for 2 consecutive cycles, 2 waiting cycles are added; if it is Easy for 3 consecutive cycles, 3 waiting cycles are added or the sample is discarded (the Easy sample is removed from the training set), and so on. A sketch of this rule follows.
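The history sliding window rule above (and in claim 5) can be sketched as follows; the function name and the way the category history is passed in are illustrative.

def sliding_window_wait(history, window_len=3, m1=2, m2=2):
    """Return (extra_wait, discard) from a sample's recent category history.

    `history` holds the sample's most recent categories ('Hard'/'Middle'/'Easy'),
    newest last, at most `window_len` entries long.
    """
    if not history:
        return 0, False
    last, run = history[-1], 0
    for cat in reversed(history):            # length of the trailing run of `last`
        if cat != last:
            break
        run += 1
    if last == 'Middle' and run >= 3:
        return m1 + (run - 3), False         # 3 in a row -> m1, 4 in a row -> m1 + 1, ...
    if last == 'Easy' and run >= 2:
        # 2 in a row -> m2, 3 in a row -> m2 + 1, ...; once the run fills the whole
        # window, the Easy sample may instead be dropped from the training set.
        return m2 + (run - 2), run >= window_len
    return 0, False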
6. The clock count of every sample in the whole training set (total_list) is decremented by one, and the samples whose clock count reaches 0 are placed into the next round's sequence to be trained (the clock bookkeeping is sketched after these steps).
7. Steps 2-6 are repeated until the neural network converges or the number of training cycles is reached.
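Putting steps 4 to 6 together, the per-epoch clock bookkeeping might look like the following sketch; the function and argument names are illustrative, and wait_fn stands for whichever countdown waiting function is chosen.

def update_clocks_and_select(running_list, total_list, categories, clocks, wait_fn, epoch):
    """Clocks for one epoch: add waits after training, tick down, pick clock == 0."""
    for i in running_list:                       # step 4: every trained sample gets one basic clock
        clocks[i] = 1
        if categories[i] in ('Middle', 'Easy'):  # Middle/Easy wait extra, per the countdown function
            clocks[i] += wait_fn(categories[i], epoch)
    for i in total_list:                         # step 6: decrement every sample's clock
        clocks[i] = max(0, clocks[i] - 1)
    # Samples whose clock reaches 0 form the next round's sequence to be trained.
    return [i for i in total_list if clocks[i] == 0]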
The invention has the advantages and beneficial effects that:
During training, attention is focused on the samples that carry richer information and are harder, rather than on easy samples that are already predicted accurately. This reduces the total size of the neural network training set and the number of iterations in each training period, shortens the training time of the whole neural network, and thereby achieves the goal of accelerating neural network training.
The importance of a sample is judged in advance from its loss, which determines whether the sample is read at all, reducing the overhead on computation and IO bandwidth resources.
The method is orthogonal to other acceleration strategies that are independent of the training samples, such as loop perforation in approximate computing or accelerating training by low-rank decomposition in tensor computation.
By encapsulating the implementation details, the memory-based importance sampling of the invention only needs to expose the MSampler interface, which reduces the code modifications to the original training procedure and makes the method highly practical.
Drawings
Fig. 1 is an overall framework diagram of the present invention.
FIG. 2 is a flow chart of the method of the present invention.
FIG. 3 is a diagram illustrating historical window rollback in the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples are given by way of illustration and explanation only and are not intended to limit the invention.
A deep neural network training data sampling method based on a human brain memory mechanism comprises the following steps:
1. When the epoch is 1, preprocessing operations (such as shuffling and data augmentation) are performed on all samples in the training set; the training set samples are divided into a number of batches according to the batch_size, and the batches are put into the neural network for training in turn.
2. As the neural network propagates forward, the loss of each sample is recorded in a loss_history_list.
3. The loss_history_list and the current epoch are passed as parameters into the custom sampler of the PyTorch deep learning framework.
4. The Sampler needs to predefine the sampling hyperparameters, including the relaxation factor (γ), interval, increment_rate, largest_count, ε, the waiting clock bases bcount, the length of the sliding window, and so on. By default one waiting clock period equals 1 epoch, the base of the Middle sample waiting clock period is 1 clock period and the base of the Easy sample waiting clock period is 2 clock periods; the length of the sliding window is 3 by default. A hypothetical default configuration is sketched below.
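A hypothetical default configuration consistent with the values listed above; the dictionary name is illustrative, and the entries marked as assumed have no stated default in the text.

MSAMPLER_DEFAULTS = {
    "gamma": 1.0,                         # relaxation factor (assumed value)
    "eps": 0.0,                           # minimum loss below which a sample is Easy (usually 0)
    "interval": 1,                        # clock-number update interval, in epochs (assumed value)
    "increment_rate": 2,                  # exponential growth base (assumed value)
    "largest_count": 8,                   # upper limit on the waiting time (assumed value)
    "bcount": {"Middle": 1, "Easy": 2},   # default waiting-clock bases from the text
    "window_len": 3,                      # default sliding-window length from the text
}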
5. The Sampler sorts the samples by the loss values in loss_history_list and then divides them into three levels, hard (Hard), middle (Middle) and easy (Easy), according to the following rule:
(Formula image: the same Hard/Middle/Easy partition rule as above, based on the sorted loss values, the relaxation factor γ and the minimum ε.)
where N represents the total number of training set samples and γ represents the relaxation factor, whose main effect is to enlarge the number of Hard samples so that more samples are selected in the next epoch (training cycle); ε represents a minimum value (usually 0), and samples whose loss is below this minimum are judged to be Easy samples.
6. If the history sliding window is used as the countdown function to determine the clocks that Middle and Easy samples must wait, the loss of each sample needs to be recorded in that sample's sliding window list (sliding_window) to facilitate the later calculation of the countdown waiting clock.
7. The number of clocks to add to each sample is calculated from the level to which the sample is assigned and the countdown waiting function.
8. The clock count of every sample in the whole training set is decremented by 1, and the samples whose clock count reaches 0 are placed into the sample sequence to be trained in the next period.
9. The resulting sample sequence is shuffled and returned; the custom sampler is passed as a parameter into the PyTorch DataLoader, which generates batch data that is put into the neural network for forward propagation and backward propagation to update the parameters.
10. Steps 2-9 are repeated until the training of the neural network is finished. A PyTorch sketch of this embodiment is given below.
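The embodiment above maps naturally onto a custom PyTorch Sampler. The following is a minimal sketch, not the patent's reference implementation: the class name MSampler and the update method are illustrative, and it reuses the hypothetical helpers partition_by_loss, step_back_wait and update_clocks_and_select sketched earlier.

import random
from torch.utils.data import Sampler, DataLoader

class MSampler(Sampler):
    """Memory-based importance sampler: yields the indices of the current running_list."""

    def __init__(self, num_samples):
        self.num_samples = num_samples
        self.clocks = [0] * num_samples                  # per-sample countdown clocks
        self.running_list = list(range(num_samples))     # epoch 1: the whole training set

    def update(self, loss_history_list, epoch):
        """Call once per epoch with the recorded per-sample losses (steps 3-9)."""
        hard, middle, easy = partition_by_loss(loss_history_list)
        categories = {i: 'Hard' for i in hard}
        categories.update({i: 'Middle' for i in middle})
        categories.update({i: 'Easy' for i in easy})
        # Step-back waiting as the countdown function; bases Middle = 1, Easy = 2.
        wait_fn = lambda level, ep: step_back_wait({'Middle': 1, 'Easy': 2}[level], ep, interval=1)
        self.running_list = update_clocks_and_select(
            self.running_list, list(range(self.num_samples)),
            categories, self.clocks, wait_fn, epoch)
        random.shuffle(self.running_list)                # shuffle before returning

    def __iter__(self):
        return iter(self.running_list)

    def __len__(self):
        return len(self.running_list)

# Usage (illustrative): the DataLoader draws batches only from samples whose clock is 0.
# sampler = MSampler(len(train_set))
# loader = DataLoader(train_set, batch_size=64, sampler=sampler)
# ... after each epoch: sampler.update(loss_history_list, epoch)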
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A deep neural network training data sampling method based on a human brain memory mechanism is characterized by comprising the following steps:
s1, setting the next round of sequences to be trained as the whole training set in the initial training period;
s2, packing the data contained in the training sequence into a plurality of batches according to the batch size batch _ size, putting the batches into a neural network for training, and obtaining the training loss value loss of the sample;
s3, dividing the sample sequence into three types of Hard Hard, Middle and simple Easy according to the loss value loss, wherein the division adopts the following formula:
(Formula image: partition rule that assigns the Hard, Middle and Easy labels from the sorted loss values, the relaxation factor γ and the minimum ε.)
wherein N represents the total number of training set samples, gamma represents a relaxation factor used for adjusting the number of Hard samples, and epsilon represents a minimum value;
s4, adding a basic clock to the samples of the whole training sequence, wherein the intermediate and simple samples need to respectively calculate the additionally added clock number of the samples according to a countdown waiting function;
s5, reducing the number of clocks for the samples of the whole training set, and putting the samples with the number of clocks being 0 into the sequence to be trained in the next round;
s6, repeating the steps S2-S5 until the neural network converges or the training period number ends.
2. The method as claimed in claim 1, wherein the countdown waiting function in step S4 uses step back waiting to linearly increase the training waiting time of the samples every fixed number of cycles, and the formula is as follows:
counts=bcount+1*(epoch/interval)
wherein counts represents the calculated waiting time of each category of sample, bcount represents the waiting base of samples of different grades, epoch represents the training cycle number, i.e. the number of rounds, and interval represents the number of rounds after which counts is increased again, i.e. the clock-number update interval.
3. The method as claimed in claim 1, wherein the countdown waiting function in step S4 employs exponential backoff waiting, and exponentially increases the training waiting time of the sample every fixed number of cycles, and the formula is as follows:
(Formula image: counts is taken as the minimum of an exponentially growing wait, with base increment_rate, and the upper limit largest_count.)
wherein min() represents taking the minimum, i.e. the smaller of the two values in the parentheses, increment_rate represents the base of the exponential growth rate, largest_count represents the upper limit of the waiting time, epoch represents the training cycle number, i.e. the number of rounds, and interval represents the number of rounds after which counts is increased again, i.e. the clock-number update interval.
4. The method as claimed in claim 1, wherein the countdown waiting function in step S4 uses a history sliding window to maintain a history category value for each sample, and determines the number of waiting periods of the sample according to the history type and the number of consecutive times of the sample.
5. The method for sampling deep neural network training data based on a human brain memory mechanism as claimed in claim 4, wherein the method for determining the number of sample waiting periods using said history sliding window is as follows: if a sample is of the Middle category for 3 consecutive periods, m1 waiting periods are added on the basis of bcount; if it is of the Middle category for 4 consecutive periods, m1+1 waiting periods are added, and so on, until the number of consecutive periods equals the sliding window length; if a sample is of the Easy category for 2 consecutive periods, m2 waiting periods are added on the basis of bcount; if it is of the Easy category for 3 consecutive periods, m2+1 waiting periods are added, and so on, until the number of consecutive periods equals the sliding window length, whereupon the Easy sample is removed from the training set; by default m1 = m2 = 2.
6. The method as claimed in claim 1, wherein in step S1 all samples in the training set are preprocessed, the preprocessing including shuffling and data augmentation.
7. The method as claimed in claim 1, wherein in step S2, during forward propagation of the neural network, the loss value loss of each sample is recorded in a history loss list, the history loss list and the current training cycle number are introduced as parameters into a Sampler customized within the deep learning framework, and after the Sampler has predefined the sampling hyper-parameters, the samples are sorted according to the loss values in the history loss list and then graded.
8. The method as claimed in claim 7, wherein the hyper-parameters include a relaxation factor γ, a clock update interval, an exponential growth rate, a waiting time upper limit, a minimum value, a waiting clock period, a counting base bcount, increment bases m1 and m2, or the length of the sliding window.
9. The method according to claim 1, wherein after step S5, the obtained sample sequence is subjected to shuffle operation and returned, the obtained custom sampler is introduced into a DataLoader of the deep learning framework as a parameter, and then batch data is generated and put into the neural network for forward propagation and backward propagation to update the parameter.
10. The sampling method for deep neural network training data based on a human brain memory mechanism as claimed in one of claims 1-9, wherein said sampling method is used in series or in parallel with other sampling methods: in serial use, the output of the other sampling method is used as the input of said sampling method; in parallel use, the samples output by the other sampling method are intersected with the samples output by said sampling method, and the intersection is used as the input data of each epoch (training period).
CN202011307776.XA 2020-11-20 2020-11-20 Deep neural network training data sampling method based on human brain memory mechanism Active CN112418422B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011307776.XA CN112418422B (en) 2020-11-20 2020-11-20 Deep neural network training data sampling method based on human brain memory mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011307776.XA CN112418422B (en) 2020-11-20 2020-11-20 Deep neural network training data sampling method based on human brain memory mechanism

Publications (2)

Publication Number Publication Date
CN112418422A CN112418422A (en) 2021-02-26
CN112418422B true CN112418422B (en) 2022-05-27

Family

ID=74773265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011307776.XA Active CN112418422B (en) 2020-11-20 2020-11-20 Deep neural network training data sampling method based on human brain memory mechanism

Country Status (1)

Country Link
CN (1) CN112418422B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345219A (en) * 2018-03-01 2018-07-31 东华大学 Fypro production technology based on class brain memory GRU
CN111461294A (en) * 2020-03-16 2020-07-28 中国人民解放军空军工程大学 Intelligent aircraft brain cognitive learning method facing dynamic game
CN111626335A (en) * 2020-04-29 2020-09-04 杭州火烧云科技有限公司 Improved hard case mining training method and system of pixel-enhanced neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11348002B2 (en) * 2017-10-24 2022-05-31 International Business Machines Corporation Training of artificial neural networks

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345219A (en) * 2018-03-01 2018-07-31 东华大学 Fypro production technology based on class brain memory GRU
CN111461294A (en) * 2020-03-16 2020-07-28 中国人民解放军空军工程大学 Intelligent aircraft brain cognitive learning method facing dynamic game
CN111626335A (en) * 2020-04-29 2020-09-04 杭州火烧云科技有限公司 Improved hard case mining training method and system of pixel-enhanced neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
REMIND Your Neural Network to Prevent Catastrophic Forgetting; Tyler L. Hayes et al.; arXiv; 2020-07-13; full text *
人工智能机理解释与数学方法探讨 (Discussion of mechanism interpretation and mathematical methods for artificial intelligence); 郭田德 et al.; 中国科学:数学 (Science China: Mathematics); 2020-05-28; full text *

Also Published As

Publication number Publication date
CN112418422A (en) 2021-02-26

Similar Documents

Publication Publication Date Title
Dutta et al. On the discrepancy between the theoretical analysis and practical implementations of compressed communication for distributed deep learning
Lei et al. Less than a single pass: Stochastically controlled stochastic gradient
EP3540652A1 (en) Method, device, chip and system for training neural network model
CN113064879A (en) Database parameter adjusting method and device and computer readable storage medium
Rotman et al. Shuffling recurrent neural networks
CN104765589B (en) Grid parallel computation preprocess method based on MPI
CN112463189B (en) Distributed deep learning multi-step delay updating method based on communication operation sparsification
CN111882060A (en) Single-step delay stochastic gradient descent training method for machine learning
US11334646B2 (en) Information processing apparatus and method for controlling sampling apparatus
JP2019153098A (en) Vector generation device, sentence pair leaning device, vector generation method, sentence pair learning method, and program
CN108229714A (en) Prediction model construction method, Number of Outpatients Forecasting Methodology and device
CN116865251A (en) Short-term load probability prediction method and system
CN110110860B (en) Self-adaptive data sampling method for accelerating machine learning training
CN112099931A (en) Task scheduling method and device
US20220207374A1 (en) Mixed-granularity-based joint sparse method for neural network
CN112418422B (en) Deep neural network training data sampling method based on human brain memory mechanism
WO2020039790A1 (en) Information processing device, information processing method, and program
CN107277118A (en) The method and apparatus for generating the conventional access path of node
CN112598078B (en) Hybrid precision training method and device, electronic equipment and storage medium
Hidaka et al. Quantifying the impact of active choice in word learning
CN109614999A (en) A kind of data processing method, device, equipment and computer readable storage medium
CN112085179A (en) Method for increasing deep learning training data volume
CN116636815B (en) Electroencephalogram signal-based sleeping quality assessment method and system for underwater operators
CN117236900B (en) Individual tax data processing method and system based on flow automation
CN118035645A (en) Electromagnetic method data prediction method and device based on panning optimization LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant