CN111860830A - Method, device, terminal and storage medium for dynamically optimizing sample number in model training - Google Patents

Method, device, terminal and storage medium for dynamically optimizing sample number in model training Download PDF

Info

Publication number
CN111860830A
Authority
CN
China
Prior art keywords
samples
gradient
iteration
model training
cosine value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010566690.2A
Other languages
Chinese (zh)
Inventor
辛永欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010566690.2A
Publication of CN111860830A
Status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a device, a terminal and a storage medium for dynamically optimizing the number of samples in model training. The method comprises: recording the gradient g_{k-h} of the (k-h)-th iteration; recording the gradient g_k of the k-th iteration; calculating the cosine of the angle between g_{k-h} and g_k; judging whether the calculated cosine value is smaller than a preset value; and, if it is smaller than the preset value, increasing the number of samples at the (k+1)-th iteration. The method decides whether to adjust the number of samples by computing the cosine similarity of the gradients of two iterations. Dynamically adjusting the number of samples by monitoring the gradient update direction is simple and efficient, and tuning the number of samples according to the gradient improves model training performance, accelerates model convergence, shortens training time and saves resources.

Description

Method, device, terminal and storage medium for dynamically optimizing sample number in model training
Technical Field
The invention relates to the field of sample number optimization in model training, in particular to a method, a device, a terminal and a storage medium for dynamically optimizing sample number in model training.
Background
Model optimization is one of the most difficult challenges in implementing neural network learning algorithms. Hyper-parameter optimization aims to find the hyper-parameters that maximize the performance of a deep learning algorithm on the validation dataset. Unlike ordinary model parameters, hyper-parameters are not learned during training; they are set manually before training begins. A neural network has many hyper-parameters to set, such as the learning rate, the batch_size (the number of samples used in one training step), the number of network layers and the number of neuron nodes.
Hyper-parameter settings directly affect model performance, so knowing how to optimize them is essential for getting the most out of a model. Commonly used hyper-parameter optimization methods include manual tuning, grid search and random search; at present, manual tuning is the most common.
In deep neural networks, tuning the hyper-parameters is an essential skill: by observing monitoring indicators such as the loss function and the accuracy during training, the current training state of the model can be judged and the hyper-parameters adjusted in time, so that the model is trained more scientifically and resource utilization improves. Different hyper-parameters affect training performance in different ways. Take the learning rate: if it is too high, the model may fail to converge and the loss function keeps oscillating; if it is too low, the model converges slowly and needs a longer training time. Increasing the batch size generally lets the network converge faster, but because of memory constraints an oversized batch may exhaust memory or crash the program.
Existing research focuses on how the learning rate accelerates model convergence, while the influence of the batch_size on training performance has received comparatively little attention. Yet increasing the batch_size within a reasonable range also benefits training and performance: 1) memory utilization improves, and large matrix multiplications parallelize more efficiently; 2) fewer iterations are needed to run one epoch (a full pass over the dataset), so the same amount of data is processed faster; 3) within a certain range, a larger batch_size generally yields a more accurate descent direction and causes less training oscillation.
Thus the batch_size affects model training performance, yet in the prior art a fixed batch_size, preset from experience, is used throughout training and cannot be adjusted dynamically as needed, which is unfavorable to model training.
Disclosure of Invention
To solve the above problems, the invention provides a method, a device, a terminal and a storage medium for dynamically optimizing the number of samples in model training: the number of samples is optimized dynamically during training, and the optimization is simple and efficient.
The technical solution of the invention is as follows. A method for dynamically optimizing the number of samples in model training, based on mini-batch gradient descent, comprises the following steps:
recording the gradient g_{k-h} of the (k-h)-th iteration;
recording the gradient g_k of the k-th iteration;
calculating the cosine value of the angle between the gradient g_{k-h} and the gradient g_k;
judging whether the calculated cosine value is smaller than a preset value;
if it is smaller than the preset value, increasing the number of samples at the (k+1)-th iteration.
Further, increasing the number of samples means doubling the original number of samples.
Further, h is 1.
Further, if the calculated cosine value is greater than the preset value, the number of samples is kept unchanged.
The technical solution of the invention also includes a device for dynamically optimizing the number of samples in model training, based on mini-batch gradient descent, comprising:
a gradient recording module: recording the gradient g of each iteration;
a cosine value calculation module: calculating the cosine value of the angle between the gradient g_{k-h} of the (k-h)-th iteration and the gradient g_k of the k-th iteration;
a cosine value judging module: judging whether the calculated cosine value is smaller than a preset value;
a sample number optimization module: increasing the number of samples at the (k+1)-th iteration if the calculated cosine value is smaller than the preset value.
Further, the sample number optimization module increases the number of samples to twice the original number.
Further, h is 1.
Further, if the calculated cosine similarity is greater than a preset value, the sample number optimization module keeps the sample number unchanged.
The technical solution of the invention also includes a terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method described above.
The technical solution of the present invention also includes a computer readable storage medium storing a computer program, which when executed by a processor implements the method as described above.
The method, device, terminal and storage medium for dynamically optimizing the number of samples in model training provided by the invention record the gradient of each iteration and decide whether to adjust the number of samples by computing the cosine similarity of the gradients of two iterations. Dynamically adjusting the number of samples by monitoring the gradient update direction is simple and efficient; tuning the number of samples according to the gradient improves model training performance, accelerates model convergence, shortens training time and saves resources.
Drawings
Fig. 1 shows the positional relationship between the a-vector and the b-vector.
FIG. 2 is a schematic flow chart of a method according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a second embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings by way of specific examples; these examples are illustrative only, and the invention is not limited to the following embodiments.
Example one
In deep learning, the whole neural network can be regarded as a complex nonlinear function that serves as a fitting model for the training samples. The value of the loss function (also called the objective function) evaluates the quality of the hypothesis model: the smaller the loss, the better the model fits the training data. Gradient descent is essentially the process of minimizing this loss function, with the gradient indicating the direction of descent.
Gradient descent is a common optimization method in machine learning. According to how many samples are used in each update, it comes in three forms: batch gradient descent (Batch Gradient Descent), stochastic gradient descent (Stochastic Gradient Descent) and mini-batch gradient descent (Mini-Batch Gradient Descent).
Among them, mini-batch gradient descent is the most commonly used: each parameter update uses batch_size randomly selected samples. The batch_size is the number of samples used in one training step, hereinafter referred to as the number of samples.
Assume the batch_size is m and each sample in a mini-batch is (x_i, y_i). For one mini-batch:
the loss function is
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} L\big(f(x_i;\theta),\, y_i\big)
and the gradient is
g = \nabla_\theta J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta L\big(f(x_i;\theta),\, y_i\big)
It should be noted that the gradient g is a vector.
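As a concrete, non-patent illustration, the following minimal NumPy sketch computes such a mini-batch loss and gradient for a simple linear least-squares model; the model choice, the function name minibatch_loss_and_grad and the variable names are illustrative assumptions, not part of the patent.

```python
import numpy as np

def minibatch_loss_and_grad(theta, X, y):
    """Mean squared-error loss and its gradient over one mini-batch.

    X has shape (m, d): m = batch_size samples with d features;
    y has shape (m,); theta has shape (d,).
    """
    m = X.shape[0]
    residual = X @ theta - y                 # per-sample prediction error
    loss = 0.5 / m * np.sum(residual ** 2)   # J(theta): average of per-sample losses
    grad = X.T @ residual / m                # g: average of per-sample gradients
    return loss, grad
```

The returned grad is the vector g for the current mini-batch, which the method below compares across iterations.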
Cosine similarity uses the cosine of the angle between two vectors in a vector space as a measure of how much the two vectors differ. The closer the cosine value is to 1, the closer the angle is to 0 degrees and the more similar the two vectors are; when the angle equals 0, the two vectors point in the same direction.
As shown in Fig. 1, the angle between the two vectors a and b is θ, and the cosine of the angle is
\cos\theta = \frac{a \cdot b}{\|a\|\,\|b\|}
The closer the cosine value is to 1, the closer the angle θ is to 0 degrees, i.e., the more similar the vectors a and b are.
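For reference, a minimal NumPy sketch of this cosine computation applied to two gradient vectors (the function name and the small epsilon guard are illustrative choices, not taken from the patent):

```python
import numpy as np

def cosine_similarity(g_prev, g_curr, eps=1e-12):
    """Cosine of the angle between two gradient vectors, e.g. g_{k-h} and g_k."""
    dot = np.dot(g_prev, g_curr)
    norms = np.linalg.norm(g_prev) * np.linalg.norm(g_curr)
    return dot / (norms + eps)  # eps avoids division by zero for all-zero gradients
```

With h = 1, g_prev corresponds to g_{k-1} and g_curr to g_k.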
The method for dynamically optimizing the number of samples in model training provided by this embodiment is based on mini-batch gradient descent.
Within a certain range, a larger batch_size generally yields a more accurate descent direction and causes less training oscillation. However, once the batch_size grows beyond a certain point, the descent direction it determines hardly changes any more, so an excessively large batch_size contributes little to training precision and only increases the amount of computation.
To quantify the change of the gradient direction, the cosine similarity of two gradient vectors is used. If the cosine of the angle between the two gradients is large, the gradient angle changes little, the gradient direction fluctuates little, and the batch_size does not need to be updated. If the cosine of the angle is small, the gradient direction fluctuates strongly, and the batch_size is updated.
As shown in Fig. 2, the method specifically includes the following steps:
S1, recording the gradient g_{k-h} of the (k-h)-th iteration;
S2, recording the gradient g_k of the k-th iteration;
S3, calculating the cosine value of the angle between the gradient g_{k-h} and the gradient g_k;
S4, judging whether the calculated cosine value is smaller than a preset value;
S5, if it is smaller than the preset value, increasing the number of samples at the (k+1)-th iteration.
Here h is an integer satisfying 1 ≤ h < k. Preferably h = 1, i.e. the cosine of the angle between the gradients of two adjacent iterations is calculated, which improves the optimization precision.
In this embodiment, increasing the number of samples means doubling it: when the cosine of the angle between the gradient of the k-th iteration and the gradient of the (k-h)-th iteration is smaller than the preset value, the number of samples for the (k+1)-th iteration is set to twice the number of samples of the k-th iteration.
It should be noted that if the calculated cosine value is greater than the preset value, the number of samples is kept unchanged, i.e. the (k+1)-th iteration uses the same number of samples as the k-th iteration.
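To make the procedure concrete, here is a minimal NumPy sketch of this embodiment's rule with h = 1 and a doubling update, applied to the linear-regression example above; the threshold value, learning rate, stopping criterion and function name are illustrative assumptions, not values prescribed by the patent.

```python
import numpy as np

def train_with_dynamic_batch_size(X, y, batch_size=32, lr=0.01,
                                  cos_threshold=0.9, max_iters=1000):
    """Mini-batch gradient descent that doubles batch_size whenever the cosine
    of the angle between consecutive gradients falls below cos_threshold."""
    rng = np.random.default_rng(0)
    theta = np.zeros(X.shape[1])
    prev_grad = None                                    # gradient g_{k-1}
    for k in range(max_iters):
        idx = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ theta - yb) / len(idx)      # gradient g_k of this mini-batch
        if prev_grad is not None:
            cos = np.dot(prev_grad, grad) / (
                np.linalg.norm(prev_grad) * np.linalg.norm(grad) + 1e-12)
            if cos < cos_threshold:                     # direction changed a lot: double batch_size
                batch_size = min(batch_size * 2, len(X))
        theta -= lr * grad                              # parameter update
        prev_grad = grad
    return theta, batch_size
```

Replacing the linear-model gradient with the gradient of an actual network's loss gives the general form of the embodiment; cos_threshold plays the role of the "preset value".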
Example two
Based on the first embodiment, this embodiment provides a device for dynamically optimizing the number of samples in model training. The device is likewise based on mini-batch gradient descent and includes the following functional modules.
Gradient recording module 101: recording the gradient g of each iteration;
Cosine value calculation module 102: calculating the cosine value of the angle between the gradient g_{k-h} of the (k-h)-th iteration and the gradient g_k of the k-th iteration;
Cosine value determination module 103: judging whether the calculated cosine value is smaller than a preset value;
Sample number optimization module 104: increasing the number of samples at the (k+1)-th iteration if the calculated cosine value is smaller than the preset value.
Here h is an integer satisfying 1 ≤ h < k. Preferably h = 1, i.e. the cosine of the angle between the gradients of two adjacent iterations is calculated, which improves the optimization precision.
In this embodiment, increasing the number of samples means doubling it: when the cosine of the angle between the gradient of the k-th iteration and the gradient of the (k-h)-th iteration is smaller than the preset value, the number of samples for the (k+1)-th iteration is set to twice the number of samples of the k-th iteration.
It should be noted that if the calculated cosine value is greater than the preset value, the number of samples is kept unchanged, i.e. the (k+1)-th iteration uses the same number of samples as the k-th iteration.
Example three
This embodiment provides a terminal that includes a processor and a memory.
The memory is used for storing the execution instructions of the processor. The memory may be implemented by any type or combination of volatile or non-volatile memory devices, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks. When the executable instructions in the memory are executed by the processor, the terminal can perform some or all of the steps in the above method embodiments.
The processor is the control center of the terminal; it connects the various parts of the whole electronic terminal through various interfaces and lines, and performs the functions of the electronic terminal and/or processes data by running or executing the software programs and/or modules stored in the memory and calling the data stored in the memory. The processor may consist of an integrated circuit (IC), for example a single packaged IC, or of several packaged ICs with the same or different functions connected together.
Example four
This embodiment provides a computer storage medium that may store a program; when executed, the program may perform some or all of the steps of the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
The above disclosure describes only preferred embodiments of the present invention, but the invention is not limited thereto; any non-inventive changes that a person skilled in the art can conceive, and any modifications and improvements made without departing from the principle of the invention, shall fall within the protection scope of the present invention.

Claims (10)

1. A method for dynamically optimizing the number of samples in model training, based on mini-batch gradient descent, characterized by comprising the following steps:
recording the gradient g_{k-h} of the (k-h)-th iteration;
recording the gradient g_k of the k-th iteration;
calculating the cosine value of the angle between the gradient g_{k-h} and the gradient g_k;
judging whether the calculated cosine value is smaller than a preset value;
if it is smaller than the preset value, increasing the number of samples at the (k+1)-th iteration.
2. The method for dynamically optimizing the number of samples in model training according to claim 1, wherein increasing the number of samples means increasing the number of samples to twice the number of original samples.
3. The method for dynamically optimizing the number of samples in model training according to claim 1 or 2, wherein h is 1.
4. The method for dynamically optimizing the number of samples in model training according to claim 1 or 2, wherein the number of samples is kept unchanged if the calculated cosine value is greater than a preset value.
5. A device for dynamically optimizing the number of samples in model training, based on mini-batch gradient descent, characterized by comprising:
a gradient recording module: recording the gradient g of each iteration;
a cosine value calculation module: calculating the cosine value of the angle between the gradient g_{k-h} of the (k-h)-th iteration and the gradient g_k of the k-th iteration;
a cosine value judging module: judging whether the calculated cosine value is smaller than a preset value;
a sample number optimization module: increasing the number of samples at the (k+1)-th iteration if the calculated cosine value is smaller than the preset value.
6. The apparatus for dynamically optimizing the number of samples in model training according to claim 5, wherein the sample number optimizing module increases the number of samples to twice the number of original samples.
7. The device for dynamically optimizing the sample number in model training according to claim 5 or 6, wherein h is 1.
8. The apparatus for dynamically optimizing the number of samples in model training according to claim 5 or 6, wherein the sample number optimizing module keeps the number of samples unchanged if the calculated cosine similarity is greater than a preset value.
9. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN202010566690.2A 2020-06-19 2020-06-19 Method, device, terminal and storage medium for dynamically optimizing sample number in model training Withdrawn CN111860830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010566690.2A CN111860830A (en) 2020-06-19 2020-06-19 Method, device, terminal and storage medium for dynamically optimizing sample number in model training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010566690.2A CN111860830A (en) 2020-06-19 2020-06-19 Method, device, terminal and storage medium for dynamically optimizing sample number in model training

Publications (1)

Publication Number Publication Date
CN111860830A true CN111860830A (en) 2020-10-30

Family

ID=72986950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010566690.2A Withdrawn CN111860830A (en) 2020-06-19 2020-06-19 Method, device, terminal and storage medium for dynamically optimizing sample number in model training

Country Status (1)

Country Link
CN (1) CN111860830A (en)

Similar Documents

Publication Publication Date Title
Bottou et al. Large scale online learning
CN110832509B (en) Black box optimization using neural networks
Bertsekas et al. Improved temporal difference methods with linear function approximation
WO2018039011A1 (en) Asychronous training of machine learning model
CN112154464B (en) Parameter searching method, parameter searching device, and parameter searching program
CN111079780A (en) Training method of space map convolution network, electronic device and storage medium
WO2022095432A1 (en) Neural network model training method and apparatus, computer device, and storage medium
CN115563610B (en) Training method, recognition method and device for intrusion detection model
CN112686383B (en) Method, system and device for reducing distributed random gradient of communication parallelism
US10482351B2 (en) Feature transformation device, recognition device, feature transformation method and computer readable recording medium
US11550274B2 (en) Information processing apparatus and information processing method
CN113541985B (en) Internet of things fault diagnosis method, model training method and related devices
CN111832693B (en) Neural network layer operation and model training method, device and equipment
EP4009239A1 (en) Method and apparatus with neural architecture search based on hardware performance
CN109522939A (en) Image classification method, terminal device and computer readable storage medium
CN110717601B (en) Anti-fraud method based on supervised learning and unsupervised learning
CN112685841A (en) Finite element modeling and correcting method and system for structure with connection relation
CN117059169A (en) Biological multi-sequence comparison method and system based on parameter self-adaptive growth optimizer
CN116993548A (en) Incremental learning-based education training institution credit assessment method and system for LightGBM-SVM
CN111860830A (en) Method, device, terminal and storage medium for dynamically optimizing sample number in model training
CN111930484A (en) Method and system for optimizing performance of thread pool of power grid information communication server
CN112561047B (en) Apparatus, method and computer readable storage medium for processing data
Paternain et al. Learning policies for markov decision processes in continuous spaces
US20210365838A1 (en) Apparatus and method for machine learning based on monotonically increasing quantization resolution
US20220405599A1 (en) Automated design of architectures of artificial neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20201030)