CN113780526A - Network training method, electronic device and storage medium - Google Patents

Network training method, electronic device and storage medium

Info

Publication number
CN113780526A
Authority
CN
China
Prior art keywords
network
initial
learning rate
deep learning
training
Prior art date
Legal status
Granted
Application number
CN202111007686.3A
Other languages
Chinese (zh)
Other versions
CN113780526B (en
Inventor
刘冲冲
付贤强
何武
朱海涛
户磊
Current Assignee
Hefei Dilusense Technology Co Ltd
Original Assignee
Beijing Dilusense Technology Co Ltd
Hefei Dilusense Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dilusense Technology Co Ltd, Hefei Dilusense Technology Co Ltd
Priority to CN202111007686.3A priority Critical patent/CN113780526B/en
Publication of CN113780526A publication Critical patent/CN113780526A/en
Application granted
Publication of CN113780526B publication Critical patent/CN113780526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention relates to the field of deep learning and discloses a network training method, an electronic device and a storage medium. The network training method includes the following steps: searching for network parameters corresponding to at least two convergence points of a pre-trained initial deep learning network; fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters; and changing the initial network parameters of the initial deep learning network into the optimized network parameters to obtain a new deep learning network. By adopting this embodiment, the performance of the deep learning network in solving actual problems can be improved.

Description

Network training method, electronic device and storage medium
Technical Field
The embodiment of the invention relates to the field of deep learning, in particular to a network training method, electronic equipment and a storage medium.
Background
Accurate deep learning networks rely on good training methods. Conventional training of a deep learning network generally requires acquiring or collecting data for a particular problem, and the data used to train the network is usually only a subset of the data involved in that problem. After the data set is obtained, a performance index is designed and the network parameters are iteratively updated by gradient descent, finally yielding the trained deep learning network.
A deep learning network is usually trained by gradient iteration, which essentially performs a series of mathematical operations and converges when the loss function value reaches a minimum. However, during iterative optimization the network can encounter saddle points, local minima and similar problems, so the deep learning network cannot be trained to an optimal state. In addition, since the data set cannot be completely equivalent to the data distribution of the actual problem, even if the performance of the deep learning network is optimal on the training data, its performance in solving the actual problem cannot be guaranteed.
Disclosure of Invention
The embodiment of the invention aims to provide a network training method, an electronic device and a storage medium, which can improve the performance of a deep learning network in solving practical problems.
In order to solve the foregoing technical problem, in a first aspect, an embodiment of the present application provides a network training method, including: searching for network parameters corresponding to at least two convergence points of a pre-trained initial deep learning network; fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters; and changing the initial network parameters of the initial deep learning network into the optimized network parameters to obtain a new deep learning network.
In a second aspect, an embodiment of the present application further provides a face recognition method, which is applied to an electronic device, where the electronic device is deployed with a face recognition network obtained by the network training method, and the face recognition method includes: acquiring a face image to be recognized; and inputting the face image into a face recognition network to obtain a recognition result of the face image.
In a third aspect, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the network training method or the face recognition method.
In a fourth aspect, the present application further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the above-mentioned network training method or performs the above-mentioned face recognition method.
In the embodiments of the present application, network parameters corresponding to at least two convergence points of the pre-trained initial deep learning network are searched for. Because deep learning is used, the training process of a deep learning network has multiple convergence points, and the network parameters corresponding to different convergence points may differ. By fusing the network parameters corresponding to at least two convergence points, the difference between the resulting optimized network parameters and the network parameters corresponding to each individual convergence point is reduced, so the new deep learning network obtained from the optimized network parameters is more stable. At the same time, because the optimized network parameters fuse the network parameters corresponding to multiple convergence points, the gap between the actual working performance of the new deep learning network and its performance on the training set is reduced, and the problem-solving performance of the deep learning network is improved; for example, the recognition rate of a deep learning network used for face recognition is improved. In addition, because the optimization is carried out on the basis of the initial deep learning network, the network structure of the initial deep learning network is retained, and neither the data preprocessing method for the network nor the method for processing the network output data needs to be changed, so no extra workload is added to the deployment of the network and the time consumed by network inference is not increased.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements; the figures are not drawn to scale unless otherwise specified.
FIG. 1 is a flow chart of a method of network training in an embodiment of the present application;
FIG. 2 is a diagram illustrating network parameters corresponding to a plurality of convergence points searched for in an embodiment of the present application;
FIG. 3 is a diagram illustrating an example of constructing an iterative learning expression according to an embodiment of the present application;
FIG. 4 is a diagram illustrating obtaining a fine learning rate according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an embodiment of the present invention for determining whether an initial deep learning network converges;
FIG. 6 is a diagram illustrating acquisition of an exploration learning rate according to an embodiment of the present application;
FIG. 7 is a schematic diagram of obtaining an optimization step size in an embodiment of the present application;
FIG. 8 is a schematic diagram of a plurality of network parameters merged in an embodiment of the present application;
FIG. 9 is a schematic diagram of another implementation of fusing multiple network parameters in an embodiment of the present application;
FIG. 10 is a flow chart of a method of network training in an embodiment of the present application;
FIG. 11 is a flow chart of a method of face recognition in an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It will be appreciated by those of ordinary skill in the art that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.
The following embodiments are divided for convenience of description, and should not constitute any limitation to the specific implementation manner of the present invention, and the embodiments may be mutually incorporated and referred to without contradiction.
The network training method provided in the embodiment of the present application may be implemented by an electronic device, where the electronic device may be a server or a stand-alone device. The flow of the network training method is shown in fig. 1:
step 101: and searching network parameters corresponding to the pre-trained initial deep learning network at least two convergence points.
In particular, deep learning networks are applied in various fields; for example, in the field of image recognition, the recognition of a human face can be realized by a face recognition network. The embodiment of the present application takes face recognition as an example to specifically describe the training process of a deep learning network (that is, the deep learning network is a face recognition network). In other embodiments the method can also be applied to other fields, that is, the initial deep learning network may be a voice recognition network, an image migration network, a position locating network, and the like.
The conventional training method of a face recognition network needs to acquire or collect various face data. Typically, the data used to train a face recognition network involves only a subset of the face image data. After the face data set is obtained, a performance index is designed, and the network parameters of the face recognition network are iteratively updated using a gradient descent method, finally obtaining the trained face recognition network.
In the iterative optimization process of the existing face recognition network, problems such as saddle points, local minimum values and the like can be encountered, so that the face recognition network cannot be trained to an optimal state; in addition, because the sample training set cannot be completely equal to the data distribution in practical application, even if the performance of the face recognition network is optimized on the sample training set, the optimal performance of the face recognition network for solving practical problems cannot be ensured. The network training method can enable the performance of the trained face recognition network for solving the practical problem to be optimal.
The electronic device obtains an initial deep learning network that has been trained in advance. The initial deep learning network in this example is an initial face recognition network, that is, the initial face recognition network mentioned later is the initial deep learning network and the face recognition network is the deep learning network; this will not be repeated in this embodiment. The following specifically addresses the training process of the face recognition network.
The initial face recognition network to be trained is denoted m_w, where w represents the network parameters to be optimized in the face recognition network. The loss function used to train m_w is denoted loss, the input face image sample is denoted x, and the training process has the optimization target: argmin_w(loss(m_w(x))). In this example, a Stochastic Gradient Descent (SGD) optimizer may be selected for the training process.
The data set is divided into a training set and a test set, and the intersection of the training set and the test set is the empty set. The initial face recognition network m_w can be trained in advance in a conventional manner. An evaluation index p of the initial face recognition network is tested on the test set, where the evaluation index refers to the performance of the initial face recognition network, such as the accuracy or recognition rate of face recognition; when the performance of the network no longer improves, the initial face recognition network is obtained. For example, when the test performance of m_w on the test set has not improved over the last several tests (for example, 20 tests) and has stabilized at a fixed upper-limit value, the training of the initial face recognition network is complete. The evaluation index p of m_w obtained on the test set is denoted p0, and the network parameters of m_w are denoted w0. Once the initial face recognition network m_w has reached excellent performance, continuing conventional training can no longer improve the performance of m_w.
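For reference, the conventional pre-training stage described above can be sketched as follows. This is a minimal PyTorch-style sketch rather than the patented procedure itself: the model object standing in for m_w, the data loaders train_loader and test_loader, the evaluation routine evaluate (computing the index p on the test set), and the choice of cross-entropy as the loss are all assumed placeholders.

import torch

# Assumed placeholders: `model` stands in for m_w, `train_loader` yields batches
# (x: face image samples, y: identity labels) from the sample training set, and
# `evaluate` computes the evaluation index p (e.g. recognition rate) on the test set.
def pretrain(model, train_loader, test_loader, evaluate, lr=0.1, patience=20):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # SGD optimizer, as in the text
    criterion = torch.nn.CrossEntropyLoss()                  # stands in for "loss"
    best_p, stale = 0.0, 0
    while stale < patience:                                  # stop once p stops improving
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)                    # loss(m_w(x))
            loss.backward()
            optimizer.step()
        p = evaluate(model, test_loader)
        if p > best_p:
            best_p, stale = p, 0
        else:
            stale += 1
    return best_p          # p0; the parameters at this point play the role of w0

The stopping rule mirrors the description: training ends once the test performance has not improved for a fixed number of evaluations (for example 20).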
In this embodiment, the learning rate in the initial face recognition network may be adjusted, so that other convergence points may be searched based on the training of the pre-trained initial face recognition network to obtain network parameters corresponding to the other convergence points, where the other convergence points are convergence points other than the convergence point corresponding to the pre-trained initial face recognition network.
Step 102: and fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters.
Specifically, if network parameters corresponding to at least two convergence points have been found, these network parameters may be fused; the fusion manner may be taking the average of the network parameters corresponding to each convergence point.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In the embodiments of the present application, network parameters corresponding to at least two convergence points of the pre-trained initial deep learning network are searched for. Because deep learning is used, the training process of a deep learning network has multiple convergence points, and the network parameters corresponding to different convergence points may differ. By fusing the network parameters corresponding to at least two convergence points, the difference between the resulting optimized network parameters and the network parameters corresponding to each individual convergence point is reduced, so the new deep learning network obtained from the optimized network parameters is more stable. At the same time, because the optimized network parameters fuse the network parameters corresponding to multiple convergence points, the gap between the actual working performance of the new deep learning network and its performance on the training set is reduced, and the problem-solving performance of the deep learning network is improved; for example, the recognition rate of a deep learning network used for face recognition is improved. In addition, because the optimization is carried out on the basis of the initial deep learning network, the network structure of the initial deep learning network is retained, and neither the data preprocessing method for the network nor the method for processing the network output data needs to be changed, so no extra workload is added to the deployment of the network and the time consumed by network inference is not increased.
In one embodiment, a specific way to search for network parameters corresponding to a plurality of convergence points is provided, and the process of each search is shown in fig. 2:
step 1011: and determining an iterative learning rate corresponding to the (n +1) th iterative training according to a preset iterative learning rate expression, wherein the iterative learning rate expression is a decreasing function, n is used for representing the iterative times, and n is an integer greater than or equal to 0.
Specifically, the electronic device may initialize the fusion count n_m of the at least two network parameters to 0, set the network parameters of m_w to w0 (that is, let w = w0), and initialize a shadow network m_w', where w' denotes the network parameters of the shadow network. m_w and m_w' have the same network structure and differ only in network parameters; the shadow network can be used to store the fused network parameters so that network parameters corresponding to other convergence points can be further fused later.
The preset iterative learning rate expression can be a linearly decreasing function, and the iterative learning rate expression involves: a fine-tuning learning rate, an exploration learning rate and an optimization step length, where the fine-tuning learning rate is the learning rate used to train to convergence and the exploration learning rate is the learning rate used to search for the next convergence point. The iterative learning rate expression can be expressed as formula (1):

lr(n) = lr_e - (lr_e - lr_t) × floor(n × q / s) / q    formula (1)

where lr_t denotes the fine-tuning learning rate, lr_e denotes the exploration learning rate, n represents the number of iterations, q is a fixed parameter that can be set according to the actual situation (for example, q may be 10), s denotes the optimization step length, and floor(*) is the rounding-down function.
From the fine-tuning learning rate lr_t, the exploration learning rate lr_e and the iteration number n, the electronic device can determine the iterative learning rate for the (n+1)-th iterative training; for example, if the iteration number is initialized to n = 0, the iterative learning rate corresponding to the 1st iterative training is determined to be lr_e.
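As a small illustration of how such a stepwise schedule can be evaluated in code, the helper below follows the form given above as formula (1); the function name iter_lr and the default q = 10 are illustrative assumptions.

import math

def iter_lr(n, lr_t, lr_e, s, q=10):
    """Iterative learning rate for the (n+1)-th iterative training: decreases
    stepwise from the exploration rate lr_e (at n = 0) towards the fine-tuning
    rate lr_t (reached at n = s), in q discrete levels."""
    return lr_e - (lr_e - lr_t) * math.floor(n * q / s) / q

For example, iter_lr(0, lr_t, lr_e, s) returns lr_e, matching the statement that the 1st iterative training uses the exploration learning rate.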
Step 1012: and training the initial deep learning network after the nth iterative training according to the iterative learning rate and a sample training set, wherein the sample training set comprises a plurality of sample images.
For example, initially let n = 0, so the iterative learning rate for the 1st iterative training is lr_e, and the sample training set is used to perform the 1st iterative training on the initial deep learning network, i.e. the initial face recognition network. The sample training set includes several sample images, for example 100 or 1000 images. In the embodiments of the present application the sample images are face images. Subsequent iterative trainings are similar to the first iterative training and will not be described again here.
Step 1013: and updating the iteration number n to n + 1.
Step 1014: and judging whether the iteration number n is smaller than the optimization step length, if so, returning to the step 1011, otherwise, ending the searching process, and executing the step 1015.
Step 1015: and acquiring the network parameters in the initial deep learning network after the nth iterative training as the network parameters corresponding to the searched convergence points, and executing the step 102.
Specifically, the updated iteration number is compared with the optimization step length to judge whether it is smaller than the optimization step length. If the iteration number has not reached the optimization step length, the iterative learning rate for the next iterative training is determined according to the preset iterative learning rate expression.
Because the iterative learning rate expression can be a decreasing function, that is, the iterative learning rate gradually changes from the exploration learning rate to the fine-tuning learning rate, the iterative learning rate is updated once for each pass of iteration over the training set. The process of exploring and converging to a new convergence point is completed between the moment the iterative learning rate equals the exploration learning rate and the moment it equals the fine-tuning learning rate.
Step 102: and fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In this embodiment, by adjusting the learning rate and the optimization step length, unnecessary computation in the search for convergence points is reduced, the speed at which the electronic device trains the deep learning network is increased, and the consumption of computing resources is reduced. The method in the present application can thus improve the speed at which the electronic device trains the face recognition network.
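Taken together, steps 1011 to 1015 amount to one pass of the loop sketched below. This is an illustrative sketch only: iter_lr is the schedule helper shown earlier, and train_one_pass (one pass of iterative training of the model over the sample training set at a given learning rate) is an assumed placeholder.

import copy

def search_convergence_point(model, train_one_pass, lr_t, lr_e, s, q=10):
    """Run s iterative trainings while the learning rate moves from lr_e to
    lr_t according to formula (1); the parameters reached at the end are taken
    as a newly searched convergence point (step 1015)."""
    n = 0
    while n < s:                                # step 1014: continue while n < s
        lr = iter_lr(n, lr_t, lr_e, s, q)       # step 1011: schedule the learning rate
        train_one_pass(model, lr)               # step 1012: one pass over the training set
        n += 1                                  # step 1013: update the iteration count
    return copy.deepcopy(model.state_dict())    # network parameters of the new convergence point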
In one embodiment, before performing step 1011, an iterative learning expression is constructed, the flow of which is shown in FIG. 3:
step 1011-1: and acquiring the fine tuning learning rate according to the initial deep learning network.
Specifically, other convergence points of the network can be searched based on the trained initial deep learning network. The fine tuning learning rate can ensure that the convergence is achieved in the network training process, and the fine tuning learning rate is smaller than the exploration learning rate.
Step 1011-2: and acquiring the exploration learning rate according to the initial deep learning network and the fine tuning learning rate.
Specifically, the exploration learning rate can assist the network training process to jump out of the current convergence point, so that other convergence points can be searched, and the condition that the same convergence point is searched is avoided.
Step 1011-3: and obtaining an optimization step length, updating the fine-tuning learning rate and updating the exploration learning rate according to the initial deep learning network, the fine-tuning learning rate and the exploration learning rate.
Step 1011: and determining an iterative learning rate corresponding to the (n +1) th iterative training according to a preset iterative learning rate expression, wherein the iterative learning rate expression is a decreasing function, n is used for representing the iterative times, and n is an integer greater than 0.
The iterative learning expression comprises: the learning rate is a learning rate from training to convergence, and the exploration learning rate is used for searching a learning rate of a next convergence point.
Step 1012: and training the initial deep learning network after the nth iterative training according to the iterative learning rate and a sample training set, wherein the sample training set comprises a plurality of sample images.
Step 1013: and updating the iteration number n to n + 1.
Step 1014: and (4) judging whether the iteration number n is smaller than the optimization step length, if so, executing step 1011, otherwise, ending the searching process, and executing step 1015.
Step 1015: and acquiring the network parameters in the initial deep learning network after the nth iterative training as the network parameters corresponding to the searched convergence points, and executing the step 102.
Step 102: and fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In this embodiment, the fine-tuning learning rate, the exploration learning rate and the optimization step length are each obtained from the initial deep learning network, which reduces the time needed to obtain the fine-tuning learning rate, the exploration learning rate and the optimization step length.
In one embodiment, a schematic diagram of obtaining a fine learning rate is provided, as shown in fig. 4:
step 1011-11: setting the initial fine tuning learning rate as a preset learning rate.
Specifically, the network parameters of the initial deep learning network are set to w0 (that is, w = w0), and the fine-tuning learning rate lr_t is initialized. Initialization means assigning lr_t an initial value, which can be determined according to the specific situation of the current training task; for example, the preset learning rate may be the learning rate of the pre-trained initial deep learning network, used as the initial value of lr_t.
Step 1011-12: and performing Nt times of iterative training on the initial deep learning network according to the initial fine tuning learning rate and the sample training set, wherein Nt is an integer larger than 1.
Training is carried out according to the set initial fine-tuning learning rate, and the initial deep learning network m_w is iteratively trained Nt times using the sample training set.
Step 1011-13: and judging whether the initial deep learning network is converged after Nt times of iterative training, if so, executing the step 1011-14, otherwise, executing the step 1011-15.
Step 1011-14: the initial fine-tuning learning rate is used as the fine-tuning learning rate.
Step 1011-15: and reducing the initial fine tuning learning rate, and returning to the step of executing Nt times of iterative training on the initial deep learning network according to the initial fine tuning learning rate.
The initial fine-tuning learning rate may be reduced by a predetermined ratio, for example by letting lr_t = r_t × lr_t, where 0 < r_t < 1 is the predetermined ratio; r_t may be 0.9.
In this embodiment, the fine tuning learning rate can be determined quickly by determining whether the convergence is achieved, and the speed of obtaining the fine tuning learning rate is increased.
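A sketch of this search for the fine-tuning learning rate is given below. It assumes a training routine train_epochs(model, lr, N) that performs N iterative trainings and returns the recorded gradient/parameter angles, a convergence test has_converged implementing the angle criterion of the next embodiment, and the pre-trained parameters w0 as a state dict; all of these names and defaults are placeholders rather than parts of the patent.

def find_fine_tune_lr(model, w0, train_epochs, has_converged,
                      lr0=0.1, r_t=0.9, Nt=100):
    """Shrink the candidate rate by the ratio r_t (0 < r_t < 1) until Nt
    iterative trainings starting from w0 converge (steps 1011-11 to 1011-15)."""
    lr_t = lr0                                  # preset learning rate
    while True:
        model.load_state_dict(w0)               # reset the parameters to w0
        angles = train_epochs(model, lr_t, Nt)  # Nt iterative trainings at lr_t
        if has_converged(angles, Nt):
            return lr_t
        lr_t = r_t * lr_t                       # reduce, e.g. r_t = 0.9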
In an embodiment, a schematic diagram for determining whether the initial deep learning network converges after Nt times of iterative training is provided, which is specifically shown in fig. 5:
step 1011-: and acquiring an included angle between the parameter gradient of each iterative training and the corresponding network parameter.
In this example, the description is continued by taking a face recognition network as an example. Setting learning rate of network optimizer to lrtTraining face recognition network Nt times using training set, each training recording
Figure BDA0003237602180000081
The included angle between the angle w and the angle w,
Figure BDA0003237602180000082
to obtain the gradient operator of the network parameter w, Nt is a positive integer, for example, Nt is 100.
The
Figure BDA0003237602180000083
The angle to w can be calculated using conventional methods, such as using equation (2):
Figure BDA0003237602180000084
Figure BDA0003237602180000085
to solve the gradient operator of the network parameter w, angle represents an angle, x is the input face image, and w is the network parameter.
Step 1011-: and counting the proportion of the included angle to an obtuse angle or an acute angle.
The number of obtuse angles in the included angle can be counted and recorded as at, and the proportion of the included angle which is an obtuse angle or an acute angle is at/Nt。
Step 1011-: and if the proportion is in the preset range, determining that the initial deep learning network converges after Nt times of iterative training.
The preset range is expressed as thd_tl ~ thd_th, where thd_tl < thd_th; preferably thd_tl = 0.4 and thd_th = 0.6. If a_t/Nt is not within the preset range, the electronic device determines that the initial deep learning network (i.e., the initial face recognition network) has not converged after the Nt iterations; if a_t/Nt is within the preset range, it determines that the initial deep learning network (i.e., the initial face recognition network) has converged after the Nt iterative trainings.
In this embodiment, whether the initial deep learning network after iteration converges can be quickly determined by calculating an included angle between the parameter gradient and the network parameter.
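A minimal PyTorch-style sketch of this angle-based convergence test is shown below, assuming the loss has just been back-propagated so that p.grad holds the parameter gradients; the function names and the flattening of all parameters into single vectors are implementation assumptions.

import math
import torch

def grad_weight_angle(model):
    """Included angle (in degrees) between the parameter gradient and the
    current network parameters, computed on the flattened vectors."""
    g = torch.cat([p.grad.flatten() for p in model.parameters() if p.grad is not None])
    w = torch.cat([p.detach().flatten() for p in model.parameters() if p.grad is not None])
    cos = torch.dot(g, w) / (g.norm() * w.norm() + 1e-12)
    return math.degrees(torch.acos(cos.clamp(-1.0, 1.0)).item())

def has_converged(angles, total, thd_tl=0.4, thd_th=0.6):
    """Converged if the proportion of obtuse angles lies within thd_tl ~ thd_th."""
    obtuse = sum(1 for a in angles if a > 90.0)
    return thd_tl <= obtuse / total <= thd_th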
In one embodiment, a schematic diagram of obtaining an exploratory learning rate is provided, as shown in fig. 6:
steps 1011 to 21: setting the initial exploration learning rate as a preset learning rate.
The electronic device sets the network parameters of the initial deep learning network to w0 (that is, w = w0) and initializes the exploration learning rate lr_e. The initial value assigned to the exploration learning rate is the initial exploration learning rate, and it can be determined according to the specific situation of the current training task; for example, the learning rate of the pre-trained initial deep learning network may be used as the initial value of lr_e, i.e., as the initial exploration learning rate.
Steps 1011 to 22: ne1 iterative training is carried out on the initial deep learning network according to the initial exploration learning rate and the sample training set, and Ne1 is an integer larger than 1.
The electronic device carries out training according to the set initial exploration learning rate lr_e, and performs Ne1 iterative trainings on the initial deep learning network m_w using the sample training set; Ne1 may be 100.
Steps 1011 to 23: ne2 iterative trainings are carried out on the initial deep learning network after Ne1 iterative trainings according to the fine-tuning learning rate, and Ne2 is an integer larger than 1.
The electronic device resets the training learning rate to the fine-tuning learning rate, that is, training is carried out according to the fine-tuning learning rate, and Ne2 iterative trainings are performed on the initial deep learning network m_w using the sample training set; Ne2 may be the same as Ne1 or different from Ne1, for example Ne2 is 100 or 50.
Steps 1011 to 24: and after Ne2 iterative training, judging whether the initial deep learning network is converged, if not, executing steps 1011-25, otherwise, executing steps 1011-26.
And the electronic equipment acquires the included angle between the parameter gradient of each iterative training and the corresponding network parameter.
The electronic device sets the learning rate of the network optimizer to lr_t and trains the network m_w using the training set; each of these iterative trainings records the included angle between ∇_w loss(m_w(x)) and w, where ∇_w denotes the gradient operator with respect to the network parameters w.
The included angle between ∇_w loss(m_w(x)) and w can be calculated using conventional methods, as shown in equation (2), and the proportion of included angles that are obtuse or acute is counted. The electronic device can count the number of acute angles among the included angles, denote it a_t0, and compute its proportion with respect to Ne2.
The preset range is expressed as thd_tl ~ thd_th, where thd_tl < thd_th; thd_tl = 0.4 and thd_th = 0.6 may be taken. If a_t0/Ne2 is not within the preset range, the electronic device determines that the initial deep learning network has not converged after the Ne2 iterations; if a_t0/Ne2 is within the preset range, it determines that the initial deep learning network has converged after the Ne2 iterative trainings.
Steps 1011 to 25: the initial search learning rate is used as the search learning rate.
Steps 1011 to 26: the initial exploration learning rate is increased, and the step of Ne1 times of iterative training of the initial deep learning network according to the initial exploration learning rate is returned to.
The electronic device may increase the initial exploration learning rate by a predetermined ratio, for example by letting lr_e = r_e × lr_e, where r_e > 1; r_e may be 1.1.
In this embodiment, the electronic device may determine the exploration learning rate quickly by determining whether to converge, so as to increase the speed of acquiring the exploration learning rate.
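The corresponding search for the exploration learning rate can be sketched as below, reusing the placeholder routines train_epochs and has_converged from the earlier sketches (the acute-angle check described above is treated here as interchangeable with the obtuse-angle check for the symmetric 0.4~0.6 range); the names and defaults are assumptions.

def find_explore_lr(model, w0, train_epochs, has_converged, lr_t,
                    lr0=0.1, r_e=1.1, Ne1=100, Ne2=100):
    """Grow the candidate exploration rate by r_e (r_e > 1) until, after Ne1
    trainings at that rate followed by Ne2 trainings at the fine-tuning rate,
    the network no longer converges back, i.e. it has left the original
    convergence point (steps 1011-21 to 1011-26)."""
    lr_e = lr0
    while True:
        model.load_state_dict(w0)               # reset the parameters to w0
        train_epochs(model, lr_e, Ne1)          # explore at lr_e
        angles = train_epochs(model, lr_t, Ne2) # settle at the fine-tuning rate
        if not has_converged(angles, Ne2):      # escaped: lr_e is large enough
            return lr_e
        lr_e = r_e * lr_e                       # increase, e.g. r_e = 1.1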
In one embodiment, a schematic diagram of obtaining the optimization step size is provided, as shown in fig. 7:
steps 1011 to 31: and acquiring an initial step length, an initial minimum step length and an evaluation index of the initial deep learning network as a first evaluation index.
The electronic device sets an initial step length s and an initial minimum step length s_m, where s and s_m are both positive integers; for example, s = (total number of samples in the sample training set / number of samples used per training) × 2 and s_m = floor(s × 0.05) + 1, where floor(*) denotes rounding down. The performance p of m_w is tested on the test set. The iterative learning rate expression can be as shown in formula (3):

lr(x) = lr_e - (lr_e - lr_t) × floor(x × q / s) / q    formula (3)

where lr_t denotes the fine-tuning learning rate, lr_e denotes the exploration learning rate, x represents the number of iterations, q is a fixed parameter that can be set according to the actual situation (for example, q may be 10), and s represents the optimization step length.
The initial iteration number is x = 0, that is, the iterative learning rate corresponding to the 1st iterative training is lr_e. The evaluation index of the initial deep learning network tested on the test set is p0, and this evaluation index is the first evaluation index.
Steps 1011 to 32: and performing x times of iterative training on the initial deep learning network according to the fine tuning learning rate, the exploration learning rate and the initial step length, wherein x is an integer larger than 0.
For example, with x = 0, the sample training set is used to perform the 1st iterative training on the initial deep learning network. The sample training set includes several face sample images, for example 100 or 1000 images. The iteration number is then updated as x = x + 1.
Steps 1011 to 33: and when the iteration times x are larger than or equal to the initial step length, acquiring the initial deep learning network evaluation index after the current iteration and taking the initial deep learning network evaluation index as a second evaluation index.
Specifically, the network parameters of the initial deep learning network (i.e., the initial face recognition network in this example) m_w are recorded as w_c; the network parameters of m_w are set to w = (w0 + w_c)/2, m_w is tested on the test set, and the resulting evaluation index p is used as the second evaluation index.
Steps 1011 to 34: and judging whether the second evaluation index is smaller than the first evaluation index with a preset proportion, if so, executing steps 1011-35, otherwise, executing steps 1011-36.
The preset ratio value may be set according to practical applications, for example, the preset ratio may be 0.95.
Steps 1011 to 35: and if the second evaluation index is smaller than the first evaluation index of the preset proportion, reducing the initial step length, and if the reduced initial step length is larger than the initial minimum step length, reducing the fine tuning learning rate and the exploration learning rate, and updating the initial minimum step length.
Specifically, the manner of decreasing the initial step size may be the manner of equation (4):
s ═ floor (s × 0.9) formula (4);
when the initial step length is reduced, s is judged>smIf the initial step length after reduction is larger than the initial minimum step length, the fine tuning learning rate and the exploration learning rate are reduced, and the initial minimum step length is updated, for example, lr may be sete=0.95*lre,lrt=0.95*lrt
Steps 1011 to 36: and if the second evaluation index is larger than or equal to the first evaluation index of the preset proportion, taking the initial step length as an optimization step length, and acquiring the fine tuning learning rate and the exploration learning rate of the current iteration.
Specifically, if the second evaluation index is greater than or equal to the first evaluation index, the electronic device takes the initial step length as an optimized step length, and obtains a fine tuning learning rate and an exploration learning rate adopted by the current iterative training.
In this embodiment, the electronic device determines a suitable optimization step length by fine-tuning the learning rate and exploring the learning rate, so as to avoid unnecessary search of convergence points due to an excessively large optimization step length.
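The selection of the optimization step length can be sketched as follows. This is only an interpretation of steps 1011-31 to 1011-36: iter_lr and train_one_pass are the placeholders used earlier, evaluate returns the evaluation index on the test set, the evaluation of the parameters combined with w0 is folded into evaluate for brevity, and the exact rule for updating the minimum step length s_m is an assumption.

import math

def find_opt_step(model, w0, p0, train_one_pass, evaluate,
                  lr_t, lr_e, s, q=10, ratio=0.95):
    """Shrink the candidate step length s until the evaluation index measured
    after one exploration cycle stays at or above ratio * p0."""
    s_m = math.floor(s * 0.05) + 1              # initial minimum step length
    while s > 1:
        model.load_state_dict(w0)
        for x in range(s):                      # x iterative trainings (steps 1011-32/33)
            train_one_pass(model, iter_lr(x, lr_t, lr_e, s, q))
        p = evaluate(model)                     # second evaluation index
        if p >= ratio * p0:
            return s, lr_t, lr_e                # s becomes the optimization step length
        s = math.floor(s * 0.9)                 # formula (4)
        if s > s_m:
            lr_t, lr_e = 0.95 * lr_t, 0.95 * lr_e
            s_m = math.floor(s * 0.05) + 1      # assumed way of updating s_m
    return max(s, s_m), lr_t, lr_e              # fallback once s reaches the minimum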
In one embodiment, a schematic diagram for implementing fusion of multiple network parameters is provided, as shown in fig. 8:
step 101: and searching network parameters corresponding to the pre-trained initial deep learning network at least two convergence points.
Specifically, the electronic device may initialize the fusion count n_m of the converged network parameters to 0, set the network parameters of m_w to w0 (that is, let w = w0), and initialize the shadow network m_w', where w' denotes the network parameters of the shadow network. m_w and m_w' have the same network structure and differ only in parameters; the shadow network can be used to store the fused network parameters to facilitate further fusion later.
Step 1021: and acquiring the network parameters in the ith superposition result and the average value of the network parameters corresponding to the (i +1) th convergence point, and taking the average value as the (i +1) th superposition result to perform the next fusion, wherein i is an integer greater than 0.
Specifically, assume the network parameters of m_w are w_c. The shadow network m_w' is updated as w' = w' × n_m/(n_m + 1) + w_c/(n_m + 1); each time a fusion is completed, the fusion count n_m is updated, i.e., n_m = n_m + 1. The initial fusion count is n_m = 0.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In this embodiment, the electronic device fuses the network parameters corresponding to each convergence point in an average superposition manner, so that the fusion manner is simple, and meanwhile, the difference between the fused network parameters corresponding to other convergence points can be reduced, thereby improving the stability of the face recognition network.
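A sketch of this running-average fusion, operating on PyTorch state dicts (the helper name and the use of state dicts are assumptions; shown for floating-point parameter tensors):

def fuse_into_shadow(shadow_state, w_c, n_m):
    """Fold the newly found convergence-point parameters w_c into the shadow
    parameters w', which already hold n_m fused convergence points:
    w' = w' * n_m / (n_m + 1) + w_c / (n_m + 1)."""
    for name in shadow_state:
        shadow_state[name] = (shadow_state[name] * n_m / (n_m + 1)
                              + w_c[name] / (n_m + 1))
    return shadow_state, n_m + 1

For n_m = 0 this simply copies w_c into the shadow network; for n_m = 1 it yields the plain average of two convergence points, which matches the averaging described in step 1021.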
In one embodiment, the present application provides another schematic diagram for implementing fusion of multiple network parameters, as shown in fig. 9:
step 101: and searching network parameters corresponding to the pre-trained initial deep learning network at least two convergence points.
Step 1021-1: and judging whether the value of i +1 is less than the preset fusion times, if so, executing the step 1021, and otherwise, ending the fusion of the network parameters.
The electronic device judges whether n_m > n_thd; if so, the fusion of the network parameters ends. For example, n_thd = 50.
Step 1021: and acquiring the network parameters in the ith superposition result and the average value of the network parameters corresponding to the (i +1) th convergence point, and taking the average value as the (i +1) th superposition result to perform the next fusion, wherein i is an integer greater than 0.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In this embodiment, the electronic device reduces unnecessary fusion calculation and reduces the waste of calculation resources by stacking for a limited number of times while ensuring the fusion accuracy.
The above embodiments can be mutually combined and cited, for example, the following embodiments are examples after being combined, but not limited thereto; the embodiments can be arbitrarily combined into a new embodiment without contradiction.
In one embodiment, a flow chart of a method of network training is provided, as shown in FIG. 10.
Step 1011-1: and acquiring the fine tuning learning rate according to the initial deep learning network.
Step 1011-2: and acquiring the exploration learning rate according to the initial deep learning network and the fine tuning learning rate.
Step 1011-3: and obtaining an optimization step length and updating the fine-tuning learning rate and the exploration learning rate according to the initial deep learning network, the fine-tuning learning rate and the exploration learning rate.
Step 1011: and determining an iterative learning rate corresponding to the (n +1) th iterative training according to a preset iterative learning rate expression, wherein the iterative learning rate expression is a decreasing function, n is used for representing the iterative times, and n is an integer greater than 0.
The iterative learning expression comprises: the learning rate is a learning rate from training to convergence, and the exploration learning rate is used for searching a learning rate of a next convergence point.
Step 1012: and training the initial deep learning network after the nth iterative training according to the iterative learning rate and a sample training set, wherein the sample training set comprises a plurality of sample images.
Step 1013: and updating the iteration number n to n + 1.
Step 1014: and (4) judging whether the iteration number n is smaller than the optimization step length, if so, executing step 1011, otherwise, ending the searching process, and executing step 1015.
Step 1015: and obtaining the network parameters in the initial deep learning network after the nth iterative training as the network parameters corresponding to the searched convergence points, and executing the step 1021-1.
Step 1021-1: and judging whether the value of i +1 is less than the preset fusion times, if so, executing the step 1021, and otherwise, ending the fusion of the network parameters.
Step 1021: and acquiring the network parameters in the ith superposition result and the average value of the network parameters corresponding to the (i +1) th convergence point, and taking the average value as the (i +1) th superposition result to perform the next fusion, wherein i is an integer greater than 0.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
The steps of the above methods are divided only for clarity of description. In implementation they may be combined into one step, or a step may be split into several steps; as long as the same logical relationship is included, such variations are within the protection scope of this patent. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without changing its core design is also within the scope of this patent.
The embodiment of the application also provides a face recognition method, which is applied to electronic equipment, wherein the electronic equipment is provided with the face recognition network obtained by the network training method. The flow of the face recognition method is shown in fig. 11:
step 201: and acquiring a face image to be recognized.
Specifically, the face recognition network deployed on the electronic device may be a face recognition network obtained by the electronic device in advance according to the network training method provided in the first embodiment. The specific process of training the face recognition network may be as follows: searching network parameters corresponding to at least two convergence points of a pre-trained initial face recognition network; fusing network parameters corresponding to at least two convergence points of the initial face recognition network to generate optimized network parameters; and changing the initial network parameters of the initial face recognition network into optimized network parameters to obtain a new face recognition network.
Step 202: and inputting the face image into a face recognition network to obtain a recognition result of the face image.
Specifically, the electronic device inputs the face image into the trained face recognition network, and then the recognition result of the face image can be obtained.
In the embodiment of the application, the trained face recognition network is obtained by training by adopting the network training method, so that the accuracy of face recognition network recognition is improved.
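As a closing illustration, running the deployed network on one image reduces to a standard forward pass; the preprocessing pipeline and the exact output format are whatever the original initial network already used (the training method leaves them unchanged), and the names below are placeholders.

import torch

def recognize(face_net, face_image):
    """Steps 201 and 202: feed a face image to the trained face recognition
    network and return its recognition result."""
    face_net.eval()                              # inference mode
    with torch.no_grad():
        return face_net(face_image.unsqueeze(0)) # add a batch dimension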
The embodiment of the present application relates to an electronic device, as shown in fig. 12, including: at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of network training described above.
The memory and the processor are connected by a bus, which may include any number of interconnected buses and bridges, linking together one or more of the various circuits of the processor and the memory. The bus may also link various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.
Embodiments of the present application relate to a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the network training method described above.
Those skilled in the art can understand that all or part of the steps in the method of the foregoing embodiments may be implemented by a program to instruct related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, etc.) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (12)

1. A method of network training, comprising:
searching network parameters corresponding to at least two convergence points of a pre-trained initial deep learning network;
fusing network parameters corresponding to at least two convergence points of the initial deep learning network to generate optimized network parameters;
and changing the initial network parameters of the initial deep learning network into the optimized network parameters to obtain a new deep learning network.
2. The method according to claim 1, wherein the searching for the network parameters corresponding to the pre-trained initial deep learning network at least two convergence points comprises:
for each search the following processing is performed:
acquiring the network parameters in the initial deep learning network after the (n +1) th iterative training as the network parameters corresponding to the (n +1) th convergence point,
determining an iterative learning rate corresponding to the (n +1) th iterative training according to a preset iterative learning rate expression, wherein the iterative learning rate expression is a decreasing function, n is used for representing the iterative times, and n is an integer greater than 0;
training the initial deep learning network after the nth iterative training according to the iterative learning rate and a sample training set, wherein the sample training set comprises a plurality of sample images;
updating the iteration number n to n + 1;
judging whether the iteration times n are smaller than the optimization step length, if so, executing a step of determining the iteration learning rate corresponding to the (n +1) th iteration training according to a preset iteration learning rate expression; otherwise, acquiring the network parameters in the initial deep learning network after the nth iterative training as the network parameters corresponding to the searched convergence points.
3. The method of network training of claim 2, wherein the iterative learning rate expression comprises: a fine-tuning learning rate, an exploration learning rate and an optimization step length, the fine-tuning learning rate being a learning rate used for training to convergence, and the exploration learning rate being used for searching a learning rate of a next convergence point;
before determining the iterative learning rate corresponding to the (n +1) th iterative training according to the preset iterative learning rate expression, the method comprises the following steps:
acquiring a fine tuning learning rate according to the initial deep learning network;
acquiring an exploration learning rate according to the initial deep learning network and the fine tuning learning rate;
and according to the initial deep learning network, the fine tuning learning rate and the exploration learning rate, obtaining the optimization step length, updating the fine tuning learning rate and updating the exploration learning rate.
4. The method of network training according to claim 3, wherein the obtaining a fine learning rate according to the initial deep learning network comprises:
setting the initial fine tuning learning rate as a preset learning rate;
performing Nt times of iterative training on the initial deep learning network according to the initial fine tuning learning rate and the sample training set, wherein Nt is an integer greater than 1;
judging whether the initial deep learning network converges after Nt times of iterative training;
if the initial deep learning network is converged, taking the initial fine tuning learning rate as the fine tuning learning rate;
and if the initial deep learning network is not converged, reducing the initial fine tuning learning rate, and returning to the step of performing Nt times of iterative training on the initial deep learning network according to the initial fine tuning learning rate.
5. The method of network training according to claim 4, wherein the determining whether the initial deep learning network converges after Nt iterative training includes:
acquiring an included angle between the parameter gradient of each iterative training and each corresponding network parameter;
counting the proportion of the included angle as an obtuse angle or an acute angle;
and if the proportion is in a preset range, determining that the initial deep learning network converges after the Nt times of iterative training.
6. The method according to claim 3 or 4, wherein the obtaining an exploration learning rate according to the initial deep learning network and the fine tuning learning rate comprises:
setting the initial exploration learning rate as a preset learning rate;
performing Ne1 iterative training on the initial deep learning network according to the initial exploration learning rate and the sample training set, wherein Ne1 is an integer greater than 1;
carrying out Ne2 times of iterative training on the initial deep learning network subjected to Ne1 times of iterative training according to the fine tuning learning rate, wherein Ne2 is an integer greater than 1;
after Ne2 iterative training, judging whether the initial deep learning network converges;
if the initial deep learning network is not converged, taking the initial exploration learning rate as the exploration learning rate;
and if the initial deep learning network converges, increasing the initial exploration learning rate, and returning to the step of performing Ne1 times of iterative training on the initial deep learning network according to the initial exploration learning rate.
7. The method of network training according to claim 3 or 4, wherein the obtaining the optimization step length, updating the fine-tuning learning rate and updating the exploration learning rate according to the initial deep learning network, the fine-tuning learning rate and the exploration learning rate comprises:
acquiring an initial step length, an initial minimum step length and an evaluation index of the initial deep learning network as a first evaluation index;
performing x times of iterative training on the initial deep learning network according to the fine tuning learning rate, the exploration learning rate and the initial step length, wherein x is an integer larger than 0;
when the iteration count x is larger than or equal to the initial step length, acquiring the evaluation index of the initial deep learning network after the current iteration as a second evaluation index;
if the second evaluation index is smaller than a preset proportion of the first evaluation index, reducing the initial step length; and if the reduced initial step length is larger than the initial minimum step length, reducing the fine tuning learning rate and the exploration learning rate and updating the initial minimum step length;
and if the second evaluation index is larger than or equal to the preset proportion of the first evaluation index, taking the initial step length as the optimization step length, and obtaining the fine tuning learning rate and the exploration learning rate of the current iteration.
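Claim 7 couples the step length and the two learning rates through an evaluation index. The claim leaves several details open (how the two rates interact within one iteration, and what happens once the reduced step no longer exceeds the minimum), so the sketch below is only a rough reading: `train_cycle`, `evaluate`, and every numeric default are placeholders, and the fallback return is an assumption not stated in the claim.

```python
def find_optimization_step(net, loader, fine_lr, explore_lr, train_cycle, evaluate,
                           step=1000, min_step=100, shrink=0.5, proportion=0.95):
    """train_cycle(net, loader, fine_lr, explore_lr) runs one training iteration
    using both rates; evaluate(net) returns the evaluation index."""
    first_index = evaluate(net)                      # first evaluation index
    while True:
        for x in range(1, step + 1):                 # iterate until x reaches the step length
            train_cycle(net, loader, fine_lr, explore_lr)
        second_index = evaluate(net)                 # second evaluation index
        if second_index >= proportion * first_index:
            return step, fine_lr, explore_lr         # accept step as the optimization step length
        step = int(step * shrink)                    # index dropped too far: shorten the step
        if step > min_step:
            fine_lr *= shrink                        # also damp both learning rates
            explore_lr *= shrink
            min_step = int(min_step * shrink)        # and update the minimum step length
        else:
            return step, fine_lr, explore_lr         # fallback; this branch is not specified by the claim
```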
8. The method for network training according to any one of claims 1 to 4, wherein the fusing the network parameters corresponding to at least two convergence points of the initial deep learning network to generate optimized network parameters comprises:
acquiring the average value of the network parameters in the i-th superposition result and the network parameters corresponding to the (i+1)-th convergence point, and taking the average value as the (i+1)-th superposition result for the next fusion, wherein i is an integer greater than 0.
9. The method of network training of claim 8, wherein prior to performing the next fusion, the method further comprises:
judging that the value of i+1 is less than a preset number of fusions.
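Claims 8 and 9 together describe the fusion itself: a running pairwise average of the parameters found at successive convergence points, capped by a preset fusion count. A minimal sketch over state_dict-style parameter dictionaries follows; the dictionary format and the exact indexing of the stopping check are assumptions.

```python
def fuse_convergence_points(param_dicts, max_fusions):
    """param_dicts: one state_dict (name -> tensor) per convergence point, in search order."""
    fused = {k: v.clone() for k, v in param_dicts[0].items()}     # 1st superposition result
    for i, point in enumerate(param_dicts[1:], start=1):
        fused = {k: (fused[k] + point[k]) / 2.0 for k in fused}   # (i+1)-th superposition result
        if not (i + 1 < max_fusions):    # claim 9's check before performing the next fusion
            break
    return fused
```

Note that this running pairwise mean weights later convergence points more heavily than a uniform average over all points would; whether that is intended is not stated in the claims.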
10. A face recognition method applied to an electronic device deployed with a face recognition network obtained by the network training method according to any one of claims 1 to 9, the face recognition method comprising:
acquiring a face image to be recognized;
and inputting the face image into the face recognition network to obtain a recognition result of the face image.
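As a usage illustration of claim 10, the snippet below loads a trained face recognition network and pushes one preprocessed image through it. The file paths, input size, and preprocessing are placeholders, and the matching of the output against enrolled identities is omitted.

```python
import torch
from PIL import Image
from torchvision import transforms

def recognize(model_path, image_path, device="cpu"):
    net = torch.load(model_path, map_location=device)   # network trained as in claims 1-9
    net.eval()
    preprocess = transforms.Compose([
        transforms.Resize((112, 112)),                   # assumed input size
        transforms.ToTensor(),
    ])
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return net(image.to(device))                     # recognition result (e.g. an embedding)
```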
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of network training of any one of claims 1-9 or to perform the method of face recognition of claim 10.
12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of network training of any one of claims 1 to 9, or the method of face recognition of claim 10.
CN202111007686.3A 2021-08-30 2021-08-30 Face recognition network training method, electronic equipment and storage medium Active CN113780526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111007686.3A CN113780526B (en) 2021-08-30 2021-08-30 Face recognition network training method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111007686.3A CN113780526B (en) 2021-08-30 2021-08-30 Face recognition network training method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113780526A true CN113780526A (en) 2021-12-10
CN113780526B CN113780526B (en) 2022-08-05

Family

ID=78840066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111007686.3A Active CN113780526B (en) 2021-08-30 2021-08-30 Face recognition network training method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113780526B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020114119A1 (en) * 2018-12-07 2020-06-11 深圳光启空间技术有限公司 Cross-domain network training method and cross-domain image recognition method
US20210133571A1 (en) * 2019-11-05 2021-05-06 California Institute Of Technology Systems and Methods for Training Neural Networks
CN111767989A (en) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 Neural network training method and device
CN113033525A (en) * 2021-05-26 2021-06-25 北京的卢深视科技有限公司 Training method of image recognition network, electronic device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
L. Xiangmei and Q. Zhi, "The application of Hybrid Neural Network Algorithms in Intrusion Detection System," 2011 International Conference on E-Business and E-Government (ICEE), 2011, pp. 1-4, doi: 10.1109/ICEBEG.2011.5882041. *
Li Chenzheng, et al., "Research on dangerous behavior recognition method based on transfer learning," Science Technology and Engineering (《科学技术与工程》). *

Also Published As

Publication number Publication date
CN113780526B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
US11341424B2 (en) Method, apparatus and system for estimating causality among observed variables
KR20190086134A (en) Method and apparatus for selecting optiaml training model from various tarining models included in neural network
CN110832509B (en) Black box optimization using neural networks
JP7287397B2 (en) Information processing method, information processing apparatus, and information processing program
CN108446770B (en) Distributed machine learning slow node processing system and method based on sampling
CN111079780A (en) Training method of space map convolution network, electronic device and storage medium
CN112948608B (en) Picture searching method and device, electronic equipment and computer readable storage medium
CN109934330A (en) The method of prediction model is constructed based on the drosophila optimization algorithm of diversified population
CN115860081B (en) Core algorithm scheduling method, system, electronic equipment and storage medium
CN111260056B (en) Network model distillation method and device
CN114637881B (en) Image retrieval method based on multi-agent metric learning
CN110929218A (en) Difference minimization random grouping method and system
CN115659807A (en) Method for predicting talent performance based on Bayesian optimization model fusion algorithm
CN114882307A (en) Classification model training and image feature extraction method and device
CN114298326A (en) Model training method and device and model training system
CN113780526B (en) Face recognition network training method, electronic equipment and storage medium
US11307867B2 (en) Optimizing the startup speed of a modular system using machine learning
CN111310857A (en) Feature extraction method, electronic device and medical case similarity model construction method
CN116151384A (en) Quantum circuit processing method and device and electronic equipment
CN115329611A (en) Inertial navigation component simulation method and device, electronic equipment and storage medium
CN117557870B (en) Classification model training method and system based on federal learning client selection
CN112669893B (en) Method, system, device and equipment for determining read voltage to be used
CN117253209B (en) Automatic driving point cloud detection method, device, communication equipment and storage medium
CN113673591B (en) Self-adjusting sampling optimization image classification method, device and medium
CN117828382B (en) Network interface clustering method and device based on URL

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220512
Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province
Applicant after: Hefei lushenshi Technology Co.,Ltd.
Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing
Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.
Applicant before: Hefei lushenshi Technology Co.,Ltd.
GR01 Patent grant