CN113780526A - Network training method, electronic device and storage medium - Google Patents

Network training method, electronic device and storage medium

Info

Publication number
CN113780526A
Authority
CN
China
Prior art keywords
network
initial
learning rate
deep learning
training
Prior art date
Legal status
Granted
Application number
CN202111007686.3A
Other languages
Chinese (zh)
Other versions
CN113780526B (en
Inventor
刘冲冲
付贤强
何武
朱海涛
户磊
Current Assignee
Hefei Dilusense Technology Co Ltd
Original Assignee
Beijing Dilusense Technology Co Ltd
Hefei Dilusense Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dilusense Technology Co Ltd, Hefei Dilusense Technology Co Ltd
Priority to CN202111007686.3A priority Critical patent/CN113780526B/en
Publication of CN113780526A publication Critical patent/CN113780526A/en
Application granted
Publication of CN113780526B publication Critical patent/CN113780526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention relates to the field of deep learning and discloses a network training method, an electronic device and a storage medium. The network training method includes the following steps: searching for network parameters corresponding to at least two convergence points of a pre-trained initial deep learning network; fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters; and changing the initial network parameters of the initial deep learning network into the optimized network parameters to obtain a new deep learning network. By adopting this embodiment, the performance of the deep learning network in solving actual problems can be improved.

Description

Network training method, electronic device and storage medium
Technical Field
The embodiment of the invention relates to the field of deep learning, in particular to a network training method, electronic equipment and a storage medium.
Background
Accurate deep learning networks rely on good training methods. Conventional training of a deep learning network generally requires acquiring or collecting data for a particular problem, and the data used to train the network is usually only a subset of the data involved in that problem. After the data set is obtained, a performance index is designed and the network parameters are iteratively updated by gradient descent, finally yielding the trained deep learning network.
A deep learning network is usually trained by gradient iteration, which essentially performs a series of mathematical operations and converges when the loss function value reaches a minimum. However, during iterative optimization the network can encounter saddle points, local minima and similar problems, so the deep learning network cannot be trained to an optimal state. In addition, since the data set cannot be completely equivalent to the data distribution of the actual problem, even if the performance of the deep learning network is optimal on the training data, its performance in solving the actual problem cannot be guaranteed.
Disclosure of Invention
The embodiment of the invention aims to provide a network training method, an electronic device and a storage medium, which can improve the performance of a deep learning network in solving practical problems.
In order to solve the foregoing technical problem, in a first aspect, an embodiment of the present application provides a network training method, including: searching for network parameters corresponding to at least two convergence points of a pre-trained initial deep learning network; fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters; and changing the initial network parameters of the initial deep learning network into the optimized network parameters to obtain a new deep learning network.
In a second aspect, an embodiment of the present application further provides a face recognition method, which is applied to an electronic device, where the electronic device is deployed with a face recognition network obtained by the network training method, and the face recognition method includes: acquiring a face image to be recognized; and inputting the face image into a face recognition network to obtain a recognition result of the face image.
In a third aspect, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the network training method or the face recognition method.
In a fourth aspect, the present application further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the above-mentioned network training method or performs the above-mentioned face recognition method.
In the embodiments of the present application, network parameters corresponding to at least two convergence points of the pre-trained initial deep learning network are searched for. Because deep learning is used, the training process of a deep learning network has multiple convergence points, and the network parameters corresponding to different convergence points may differ. By fusing the network parameters corresponding to at least two convergence points, the difference between the resulting optimized network parameters and the network parameters corresponding to each individual convergence point is reduced, so the new deep learning network obtained from the optimized network parameters is more stable. At the same time, because the optimized network parameters fuse the network parameters corresponding to multiple convergence points, the gap between the actual working performance of the new deep learning network and its performance on the training set is reduced, and the problem-solving performance of the deep learning network is improved; for example, the recognition rate of a deep learning network used for face recognition is improved. In addition, because the optimization is carried out on the basis of the initial deep learning network, the network structure of the initial deep learning network is retained, and neither the data preprocessing method for the network nor the method for processing the network output data needs to be changed, so no extra workload is added to the deployment of the network and the time consumed by network inference is not increased.
Drawings
One or more embodiments are illustrated by way of example in the accompanying drawings, in which like reference numerals refer to similar elements; the figures are not drawn to scale unless otherwise specified.
FIG. 1 is a flow chart of a method of network training in an embodiment of the present application;
FIG. 2 is a diagram illustrating network parameters corresponding to a plurality of convergence points searched for in an embodiment of the present application;
FIG. 3 is a diagram illustrating an example of constructing an iterative learning expression according to an embodiment of the present application;
FIG. 4 is a diagram illustrating obtaining a fine learning rate according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an embodiment of the present invention for determining whether an initial deep learning network converges;
FIG. 6 is a diagram illustrating acquisition of an exploration learning rate according to an embodiment of the present application;
FIG. 7 is a schematic diagram of obtaining an optimization step size in an embodiment of the present application;
FIG. 8 is a schematic diagram of a plurality of network parameters merged in an embodiment of the present application;
FIG. 9 is a schematic diagram of another implementation of fusing multiple network parameters in an embodiment of the present application;
FIG. 10 is a flow chart of a method of network training in an embodiment of the present application;
FIG. 11 is a flow chart of a method of face recognition in an embodiment of the present application;
fig. 12 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It will be appreciated by those of ordinary skill in the art that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application; however, the technical solutions claimed in the present application can be implemented without these technical details and with various changes and modifications based on the following embodiments.
The following embodiments are divided for convenience of description, and should not constitute any limitation to the specific implementation manner of the present invention, and the embodiments may be mutually incorporated and referred to without contradiction.
The network training method provided in the embodiment of the present application may be implemented by an electronic device, where the electronic device may be a server or a stand-alone device. The flow of the network training method is shown in fig. 1:
step 101: and searching network parameters corresponding to the pre-trained initial deep learning network at least two convergence points.
In particular, deep learning networks are applied in various fields; for example, in the field of image recognition, the recognition of a human face can be realized by a face recognition network. The embodiment of the present application takes face recognition as an example to specifically describe the training process of a deep learning network (that is, the deep learning network is a face recognition network). In other embodiments the method can also be applied to other fields, that is, the initial deep learning network may be a voice recognition network, an image migration network, a position locating network, and the like.
The conventional training method of a face recognition network needs to acquire or collect various face data. Typically, the data used to train a face recognition network involves only a subset of the face image data. After the face data set is obtained, a performance index is designed, and the network parameters of the face recognition network are iteratively updated using a gradient descent method, finally obtaining the trained face recognition network.
In the iterative optimization process of the existing face recognition network, problems such as saddle points, local minimum values and the like can be encountered, so that the face recognition network cannot be trained to an optimal state; in addition, because the sample training set cannot be completely equal to the data distribution in practical application, even if the performance of the face recognition network is optimized on the sample training set, the optimal performance of the face recognition network for solving practical problems cannot be ensured. The network training method can enable the performance of the trained face recognition network for solving the practical problem to be optimal.
The electronic device obtains an initial deep learning network that has been trained in advance. The initial deep learning network in this example is an initial face recognition network, that is, the initial face recognition network mentioned later is the initial deep learning network and the face recognition network is the deep learning network; this will not be repeated in this embodiment. The following specifically addresses the training process of the face recognition network.
The initial face recognition network to be trained is denoted m_w, where w represents the network parameters to be optimized in the face recognition network. The loss function used to train m_w is denoted loss, the input face image sample is denoted x, and the training process has the optimization target: argmin_w(loss(m_w(x))). In this example, a Stochastic Gradient Descent (SGD) optimizer may be selected for the training process.
The data set is divided into a training set and a test set, and the intersection of the training set and the test set is the empty set. The initial face recognition network m_w can be trained in advance in a conventional manner. An evaluation index p of the initial face recognition network is tested on the test set, where the evaluation index refers to the performance of the initial face recognition network, such as the accuracy or recognition rate of face recognition; when the performance of the network no longer improves, the initial face recognition network is obtained. For example, when the test performance of m_w on the test set has not improved over the last several tests (for example, 20 tests) and has stabilized at a fixed upper-limit value, the training of the initial face recognition network is complete. The evaluation index p of m_w obtained on the test set is denoted p0, and the network parameters of m_w are denoted w0. Once the initial face recognition network m_w has reached excellent performance, continuing conventional training can no longer improve the performance of m_w.
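For reference, the conventional pre-training stage described above can be sketched as follows. This is a minimal PyTorch-style sketch rather than the patented procedure itself: the model object standing in for m_w, the data loaders train_loader and test_loader, the evaluation routine evaluate (computing the index p on the test set), and the choice of cross-entropy as the loss are all assumed placeholders.

import torch

# Assumed placeholders: `model` stands in for m_w, `train_loader` yields batches
# (x: face image samples, y: identity labels) from the sample training set, and
# `evaluate` computes the evaluation index p (e.g. recognition rate) on the test set.
def pretrain(model, train_loader, test_loader, evaluate, lr=0.1, patience=20):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # SGD optimizer, as in the text
    criterion = torch.nn.CrossEntropyLoss()                  # stands in for "loss"
    best_p, stale = 0.0, 0
    while stale < patience:                                  # stop once p stops improving
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)                    # loss(m_w(x))
            loss.backward()
            optimizer.step()
        p = evaluate(model, test_loader)
        if p > best_p:
            best_p, stale = p, 0
        else:
            stale += 1
    return best_p          # p0; the parameters at this point play the role of w0

The stopping rule mirrors the description: training ends once the test performance has not improved for a fixed number of evaluations (for example 20).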
In this embodiment, the learning rate in the initial face recognition network may be adjusted, so that other convergence points may be searched based on the training of the pre-trained initial face recognition network to obtain network parameters corresponding to the other convergence points, where the other convergence points are convergence points other than the convergence point corresponding to the pre-trained initial face recognition network.
Step 102: and fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters.
Specifically, if network parameters corresponding to at least two convergence points have been found, these network parameters may be fused; the fusion manner may be taking the average of the network parameters corresponding to each convergence point.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In the embodiments of the present application, network parameters corresponding to at least two convergence points of the pre-trained initial deep learning network are searched for. Because deep learning is used, the training process of a deep learning network has multiple convergence points, and the network parameters corresponding to different convergence points may differ. By fusing the network parameters corresponding to at least two convergence points, the difference between the resulting optimized network parameters and the network parameters corresponding to each individual convergence point is reduced, so the new deep learning network obtained from the optimized network parameters is more stable. At the same time, because the optimized network parameters fuse the network parameters corresponding to multiple convergence points, the gap between the actual working performance of the new deep learning network and its performance on the training set is reduced, and the problem-solving performance of the deep learning network is improved; for example, the recognition rate of a deep learning network used for face recognition is improved. In addition, because the optimization is carried out on the basis of the initial deep learning network, the network structure of the initial deep learning network is retained, and neither the data preprocessing method for the network nor the method for processing the network output data needs to be changed, so no extra workload is added to the deployment of the network and the time consumed by network inference is not increased.
In one embodiment, a specific way to search for network parameters corresponding to a plurality of convergence points is provided, and the process of each search is shown in fig. 2:
step 1011: and determining an iterative learning rate corresponding to the (n +1) th iterative training according to a preset iterative learning rate expression, wherein the iterative learning rate expression is a decreasing function, n is used for representing the iterative times, and n is an integer greater than or equal to 0.
Specifically, the electronic device may initialize the fusion count n_m of the at least two network parameters to 0, set the network parameters of m_w to w0 (that is, let w = w0), and initialize a shadow network m_w', where w' denotes the network parameters of the shadow network. m_w and m_w' have the same network structure and differ only in network parameters; the shadow network can be used to store the fused network parameters so that network parameters corresponding to other convergence points can be further fused later.
The preset iterative learning rate expression can be a linearly decreasing function, and the iterative learning rate expression involves: a fine-tuning learning rate, an exploration learning rate and an optimization step length, where the fine-tuning learning rate is the learning rate used to train to convergence and the exploration learning rate is the learning rate used to search for the next convergence point. The iterative learning rate expression can be expressed as formula (1):

lr(n) = lr_e - (lr_e - lr_t) × floor(n × q / s) / q    formula (1)

where lr_t denotes the fine-tuning learning rate, lr_e denotes the exploration learning rate, n represents the number of iterations, q is a fixed parameter that can be set according to the actual situation (for example, q may be 10), s denotes the optimization step length, and floor(*) is the rounding-down function.
From the fine-tuning learning rate lr_t, the exploration learning rate lr_e and the iteration number n, the electronic device can determine the iterative learning rate for the (n+1)-th iterative training; for example, if the iteration number is initialized to n = 0, the iterative learning rate corresponding to the 1st iterative training is determined to be lr_e.
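As a small illustration of how such a stepwise schedule can be evaluated in code, the helper below follows the form given above as formula (1); the function name iter_lr and the default q = 10 are illustrative assumptions.

import math

def iter_lr(n, lr_t, lr_e, s, q=10):
    """Iterative learning rate for the (n+1)-th iterative training: decreases
    stepwise from the exploration rate lr_e (at n = 0) towards the fine-tuning
    rate lr_t (reached at n = s), in q discrete levels."""
    return lr_e - (lr_e - lr_t) * math.floor(n * q / s) / q

For example, iter_lr(0, lr_t, lr_e, s) returns lr_e, matching the statement that the 1st iterative training uses the exploration learning rate.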
Step 1012: and training the initial deep learning network after the nth iterative training according to the iterative learning rate and a sample training set, wherein the sample training set comprises a plurality of sample images.
For example, initially let n = 0, so the iterative learning rate for the 1st iterative training is lr_e, and the sample training set is used to perform the 1st iterative training on the initial deep learning network, i.e. the initial face recognition network. The sample training set includes several sample images, for example 100 or 1000 images. In the embodiments of the present application the sample images are face images. Subsequent iterative trainings are similar to the first iterative training and will not be described again here.
Step 1013: and updating the iteration number n to n + 1.
Step 1014: and judging whether the iteration number n is smaller than the optimization step length, if so, returning to the step 1011, otherwise, ending the searching process, and executing the step 1015.
Step 1015: and acquiring the network parameters in the initial deep learning network after the nth iterative training as the network parameters corresponding to the searched convergence points, and executing the step 102.
Specifically, the updated iteration number is compared with the optimization step length to judge whether it is smaller than the optimization step length. If the iteration number has not reached the optimization step length, the iterative learning rate for the next iterative training is determined according to the preset iterative learning rate expression.
Because the iterative learning rate expression can be a decreasing function, that is, the iterative learning rate gradually changes from the exploration learning rate to the fine-tuning learning rate, the iterative learning rate is updated once for each pass of iteration over the training set. The process of exploring and converging to a new convergence point is completed between the moment the iterative learning rate equals the exploration learning rate and the moment it equals the fine-tuning learning rate.
Step 102: and fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In this embodiment, by adjusting the learning rate and the optimization step length, unnecessary computation in the search for convergence points is reduced, the speed at which the electronic device trains the deep learning network is increased, and the consumption of computing resources is reduced. The method in the present application can thus improve the speed at which the electronic device trains the face recognition network.
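Taken together, steps 1011 to 1015 amount to one pass of the loop sketched below. This is an illustrative sketch only: iter_lr is the schedule helper shown earlier, and train_one_pass (one pass of iterative training of the model over the sample training set at a given learning rate) is an assumed placeholder.

import copy

def search_convergence_point(model, train_one_pass, lr_t, lr_e, s, q=10):
    """Run s iterative trainings while the learning rate moves from lr_e to
    lr_t according to formula (1); the parameters reached at the end are taken
    as a newly searched convergence point (step 1015)."""
    n = 0
    while n < s:                                # step 1014: continue while n < s
        lr = iter_lr(n, lr_t, lr_e, s, q)       # step 1011: schedule the learning rate
        train_one_pass(model, lr)               # step 1012: one pass over the training set
        n += 1                                  # step 1013: update the iteration count
    return copy.deepcopy(model.state_dict())    # network parameters of the new convergence point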
In one embodiment, before performing step 1011, an iterative learning expression is constructed, the flow of which is shown in FIG. 3:
step 1011-1: and acquiring the fine tuning learning rate according to the initial deep learning network.
Specifically, other convergence points of the network can be searched based on the trained initial deep learning network. The fine tuning learning rate can ensure that the convergence is achieved in the network training process, and the fine tuning learning rate is smaller than the exploration learning rate.
Step 1011-2: and acquiring the exploration learning rate according to the initial deep learning network and the fine tuning learning rate.
Specifically, the exploration learning rate can assist the network training process to jump out of the current convergence point, so that other convergence points can be searched, and the condition that the same convergence point is searched is avoided.
Step 1011-3: and obtaining an optimization step length, updating the fine-tuning learning rate and updating the exploration learning rate according to the initial deep learning network, the fine-tuning learning rate and the exploration learning rate.
Step 1011: and determining an iterative learning rate corresponding to the (n +1) th iterative training according to a preset iterative learning rate expression, wherein the iterative learning rate expression is a decreasing function, n is used for representing the iterative times, and n is an integer greater than 0.
The iterative learning expression comprises: the learning rate is a learning rate from training to convergence, and the exploration learning rate is used for searching a learning rate of a next convergence point.
Step 1012: and training the initial deep learning network after the nth iterative training according to the iterative learning rate and a sample training set, wherein the sample training set comprises a plurality of sample images.
Step 1013: and updating the iteration number n to n + 1.
Step 1014: and (4) judging whether the iteration number n is smaller than the optimization step length, if so, executing step 1011, otherwise, ending the searching process, and executing step 1015.
Step 1015: and acquiring the network parameters in the initial deep learning network after the nth iterative training as the network parameters corresponding to the searched convergence points, and executing the step 102.
Step 102: and fusing the network parameters corresponding to the at least two convergence points of the initial deep learning network to generate optimized network parameters.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In this embodiment, the fine-tuning learning rate, the exploration learning rate and the optimization step length are each obtained from the initial deep learning network, which reduces the time needed to obtain the fine-tuning learning rate, the exploration learning rate and the optimization step length.
In one embodiment, a schematic diagram of obtaining a fine learning rate is provided, as shown in fig. 4:
step 1011-11: setting the initial fine tuning learning rate as a preset learning rate.
Specifically, the network parameters of the initial deep learning network are set to w0 (that is, w = w0), and the fine-tuning learning rate lr_t is initialized. Initialization means assigning lr_t an initial value, which can be determined according to the specific situation of the current training task; for example, the preset learning rate may be the learning rate of the pre-trained initial deep learning network, used as the initial value of lr_t.
Step 1011-12: and performing Nt times of iterative training on the initial deep learning network according to the initial fine tuning learning rate and the sample training set, wherein Nt is an integer larger than 1.
Training is carried out according to the set initial fine-tuning learning rate, and the initial deep learning network m_w is iteratively trained Nt times using the sample training set.
Step 1011-13: and judging whether the initial deep learning network is converged after Nt times of iterative training, if so, executing the step 1011-14, otherwise, executing the step 1011-15.
Step 1011-14: the initial fine-tuning learning rate is used as the fine-tuning learning rate.
Step 1011-15: and reducing the initial fine tuning learning rate, and returning to the step of executing Nt times of iterative training on the initial deep learning network according to the initial fine tuning learning rate.
The initial fine-tuning learning rate may be reduced by a predetermined ratio, for example by letting lr_t = r_t × lr_t, where 0 < r_t < 1 is the predetermined ratio; r_t may be 0.9.
In this embodiment, the fine tuning learning rate can be determined quickly by determining whether the convergence is achieved, and the speed of obtaining the fine tuning learning rate is increased.
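A sketch of this search for the fine-tuning learning rate is given below. It assumes a training routine train_epochs(model, lr, N) that performs N iterative trainings and returns the recorded gradient/parameter angles, a convergence test has_converged implementing the angle criterion of the next embodiment, and the pre-trained parameters w0 as a state dict; all of these names and defaults are placeholders rather than parts of the patent.

def find_fine_tune_lr(model, w0, train_epochs, has_converged,
                      lr0=0.1, r_t=0.9, Nt=100):
    """Shrink the candidate rate by the ratio r_t (0 < r_t < 1) until Nt
    iterative trainings starting from w0 converge (steps 1011-11 to 1011-15)."""
    lr_t = lr0                                  # preset learning rate
    while True:
        model.load_state_dict(w0)               # reset the parameters to w0
        angles = train_epochs(model, lr_t, Nt)  # Nt iterative trainings at lr_t
        if has_converged(angles, Nt):
            return lr_t
        lr_t = r_t * lr_t                       # reduce, e.g. r_t = 0.9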
In an embodiment, a schematic diagram for determining whether the initial deep learning network converges after Nt times of iterative training is provided, which is specifically shown in fig. 5:
step 1011-: and acquiring an included angle between the parameter gradient of each iterative training and the corresponding network parameter.
In this example, the description is continued by taking a face recognition network as an example. Setting learning rate of network optimizer to lrtTraining face recognition network Nt times using training set, each training recording
Figure BDA0003237602180000081
The included angle between the angle w and the angle w,
Figure BDA0003237602180000082
to obtain the gradient operator of the network parameter w, Nt is a positive integer, for example, Nt is 100.
The
Figure BDA0003237602180000083
The angle to w can be calculated using conventional methods, such as using equation (2):
Figure BDA0003237602180000084
Figure BDA0003237602180000085
to solve the gradient operator of the network parameter w, angle represents an angle, x is the input face image, and w is the network parameter.
Step 1011-: and counting the proportion of the included angle to an obtuse angle or an acute angle.
The number of obtuse angles in the included angle can be counted and recorded as at, and the proportion of the included angle which is an obtuse angle or an acute angle is at/Nt。
Step 1011-: and if the proportion is in the preset range, determining that the initial deep learning network converges after Nt times of iterative training.
The preset range is expressed as thd_tl ~ thd_th, where thd_tl < thd_th; preferably thd_tl = 0.4 and thd_th = 0.6. If a_t/Nt is not within the preset range, the electronic device determines that the initial deep learning network (i.e., the initial face recognition network) has not converged after the Nt iterations; if a_t/Nt is within the preset range, it determines that the initial deep learning network (i.e., the initial face recognition network) has converged after the Nt iterative trainings.
In this embodiment, whether the initial deep learning network after iteration converges can be quickly determined by calculating an included angle between the parameter gradient and the network parameter.
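A minimal PyTorch-style sketch of this angle-based convergence test is shown below, assuming the loss has just been back-propagated so that p.grad holds the parameter gradients; the function names and the flattening of all parameters into single vectors are implementation assumptions.

import math
import torch

def grad_weight_angle(model):
    """Included angle (in degrees) between the parameter gradient and the
    current network parameters, computed on the flattened vectors."""
    g = torch.cat([p.grad.flatten() for p in model.parameters() if p.grad is not None])
    w = torch.cat([p.detach().flatten() for p in model.parameters() if p.grad is not None])
    cos = torch.dot(g, w) / (g.norm() * w.norm() + 1e-12)
    return math.degrees(torch.acos(cos.clamp(-1.0, 1.0)).item())

def has_converged(angles, total, thd_tl=0.4, thd_th=0.6):
    """Converged if the proportion of obtuse angles lies within thd_tl ~ thd_th."""
    obtuse = sum(1 for a in angles if a > 90.0)
    return thd_tl <= obtuse / total <= thd_th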
In one embodiment, a schematic diagram of obtaining an exploratory learning rate is provided, as shown in fig. 6:
steps 1011 to 21: setting the initial exploration learning rate as a preset learning rate.
The electronic device sets the network parameters of the initial deep learning network to w0 (that is, w = w0) and initializes the exploration learning rate lr_e. The initial value assigned to the exploration learning rate is the initial exploration learning rate, and it can be determined according to the specific situation of the current training task; for example, the learning rate of the pre-trained initial deep learning network may be used as the initial value of lr_e, i.e., as the initial exploration learning rate.
Steps 1011 to 22: ne1 iterative training is carried out on the initial deep learning network according to the initial exploration learning rate and the sample training set, and Ne1 is an integer larger than 1.
The electronic device carries out training according to the set initial exploration learning rate lr_e, and performs Ne1 iterative trainings on the initial deep learning network m_w using the sample training set; Ne1 may be 100.
Steps 1011 to 23: ne2 iterative trainings are carried out on the initial deep learning network after Ne1 iterative trainings according to the fine-tuning learning rate, and Ne2 is an integer larger than 1.
The electronic device resets the training learning rate to the fine-tuning learning rate, that is, training is carried out according to the fine-tuning learning rate, and Ne2 iterative trainings are performed on the initial deep learning network m_w using the sample training set; Ne2 may be the same as Ne1 or different from Ne1, for example Ne2 is 100 or 50.
Steps 1011 to 24: and after Ne2 iterative training, judging whether the initial deep learning network is converged, if not, executing steps 1011-25, otherwise, executing steps 1011-26.
And the electronic equipment acquires the included angle between the parameter gradient of each iterative training and the corresponding network parameter.
The electronic device sets the learning rate of the network optimizer to lr_t and trains the network m_w using the training set; each of these iterative trainings records the included angle between ∇_w loss(m_w(x)) and w, where ∇_w denotes the gradient operator with respect to the network parameters w.
The included angle between ∇_w loss(m_w(x)) and w can be calculated using conventional methods, as shown in equation (2), and the proportion of included angles that are obtuse or acute is counted. The electronic device can count the number of acute angles among the included angles, denote it a_t0, and compute its proportion with respect to Ne2.
The preset range is expressed as thd_tl ~ thd_th, where thd_tl < thd_th; thd_tl = 0.4 and thd_th = 0.6 may be taken. If a_t0/Ne2 is not within the preset range, the electronic device determines that the initial deep learning network has not converged after the Ne2 iterations; if a_t0/Ne2 is within the preset range, it determines that the initial deep learning network has converged after the Ne2 iterative trainings.
Steps 1011 to 25: the initial search learning rate is used as the search learning rate.
Steps 1011 to 26: the initial exploration learning rate is increased, and the step of Ne1 times of iterative training of the initial deep learning network according to the initial exploration learning rate is returned to.
The electronic device may increase the initial exploration learning rate by a predetermined ratio, for example by letting lr_e = r_e × lr_e, where r_e > 1; r_e may be 1.1.
In this embodiment, the electronic device may determine the exploration learning rate quickly by determining whether to converge, so as to increase the speed of acquiring the exploration learning rate.
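The corresponding search for the exploration learning rate can be sketched as below, reusing the placeholder routines train_epochs and has_converged from the earlier sketches (the acute-angle check described above is treated here as interchangeable with the obtuse-angle check for the symmetric 0.4~0.6 range); the names and defaults are assumptions.

def find_explore_lr(model, w0, train_epochs, has_converged, lr_t,
                    lr0=0.1, r_e=1.1, Ne1=100, Ne2=100):
    """Grow the candidate exploration rate by r_e (r_e > 1) until, after Ne1
    trainings at that rate followed by Ne2 trainings at the fine-tuning rate,
    the network no longer converges back, i.e. it has left the original
    convergence point (steps 1011-21 to 1011-26)."""
    lr_e = lr0
    while True:
        model.load_state_dict(w0)               # reset the parameters to w0
        train_epochs(model, lr_e, Ne1)          # explore at lr_e
        angles = train_epochs(model, lr_t, Ne2) # settle at the fine-tuning rate
        if not has_converged(angles, Ne2):      # escaped: lr_e is large enough
            return lr_e
        lr_e = r_e * lr_e                       # increase, e.g. r_e = 1.1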
In one embodiment, a schematic diagram of obtaining the optimization step size is provided, as shown in fig. 7:
steps 1011 to 31: and acquiring an initial step length, an initial minimum step length and an evaluation index of the initial deep learning network as a first evaluation index.
The electronic device sets an initial step length s and an initial minimum step length s_m, where s and s_m are both positive integers; for example, s = (total number of samples in the sample training set / number of samples used per training) × 2 and s_m = floor(s × 0.05) + 1, where floor(*) denotes rounding down. The performance p of m_w is tested on the test set. The iterative learning rate expression can be as shown in formula (3):

lr(x) = lr_e - (lr_e - lr_t) × floor(x × q / s) / q    formula (3)

where lr_t denotes the fine-tuning learning rate, lr_e denotes the exploration learning rate, x represents the number of iterations, q is a fixed parameter that can be set according to the actual situation (for example, q may be 10), and s represents the optimization step length.
The initial iteration number is x = 0, that is, the iterative learning rate corresponding to the 1st iterative training is lr_e. The evaluation index of the initial deep learning network tested on the test set is p0, and this evaluation index is the first evaluation index.
Steps 1011 to 32: and performing x times of iterative training on the initial deep learning network according to the fine tuning learning rate, the exploration learning rate and the initial step length, wherein x is an integer larger than 0.
For example, with x = 0, the sample training set is used to perform the 1st iterative training on the initial deep learning network. The sample training set includes several face sample images, for example 100 or 1000 images. The iteration number is then updated as x = x + 1.
Steps 1011 to 33: and when the iteration times x are larger than or equal to the initial step length, acquiring the initial deep learning network evaluation index after the current iteration and taking the initial deep learning network evaluation index as a second evaluation index.
Specifically, the network parameters of the initial deep learning network (i.e., the initial face recognition network in this example) m_w are recorded as w_c; the network parameters of m_w are set to w = (w0 + w_c)/2, m_w is tested on the test set, and the resulting evaluation index p is used as the second evaluation index.
Steps 1011 to 34: and judging whether the second evaluation index is smaller than the first evaluation index with a preset proportion, if so, executing steps 1011-35, otherwise, executing steps 1011-36.
The preset ratio value may be set according to practical applications, for example, the preset ratio may be 0.95.
Steps 1011 to 35: and if the second evaluation index is smaller than the first evaluation index of the preset proportion, reducing the initial step length, and if the reduced initial step length is larger than the initial minimum step length, reducing the fine tuning learning rate and the exploration learning rate, and updating the initial minimum step length.
Specifically, the manner of decreasing the initial step size may be the manner of equation (4):
s ═ floor (s × 0.9) formula (4);
when the initial step length is reduced, s is judged>smIf the initial step length after reduction is larger than the initial minimum step length, the fine tuning learning rate and the exploration learning rate are reduced, and the initial minimum step length is updated, for example, lr may be sete=0.95*lre,lrt=0.95*lrt
Steps 1011 to 36: and if the second evaluation index is larger than or equal to the first evaluation index of the preset proportion, taking the initial step length as an optimization step length, and acquiring the fine tuning learning rate and the exploration learning rate of the current iteration.
Specifically, if the second evaluation index is greater than or equal to the first evaluation index, the electronic device takes the initial step length as an optimized step length, and obtains a fine tuning learning rate and an exploration learning rate adopted by the current iterative training.
In this embodiment, the electronic device determines a suitable optimization step length by fine-tuning the learning rate and exploring the learning rate, so as to avoid unnecessary search of convergence points due to an excessively large optimization step length.
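The selection of the optimization step length can be sketched as follows. This is only an interpretation of steps 1011-31 to 1011-36: iter_lr and train_one_pass are the placeholders used earlier, evaluate returns the evaluation index on the test set, the evaluation of the parameters combined with w0 is folded into evaluate for brevity, and the exact rule for updating the minimum step length s_m is an assumption.

import math

def find_opt_step(model, w0, p0, train_one_pass, evaluate,
                  lr_t, lr_e, s, q=10, ratio=0.95):
    """Shrink the candidate step length s until the evaluation index measured
    after one exploration cycle stays at or above ratio * p0."""
    s_m = math.floor(s * 0.05) + 1              # initial minimum step length
    while s > 1:
        model.load_state_dict(w0)
        for x in range(s):                      # x iterative trainings (steps 1011-32/33)
            train_one_pass(model, iter_lr(x, lr_t, lr_e, s, q))
        p = evaluate(model)                     # second evaluation index
        if p >= ratio * p0:
            return s, lr_t, lr_e                # s becomes the optimization step length
        s = math.floor(s * 0.9)                 # formula (4)
        if s > s_m:
            lr_t, lr_e = 0.95 * lr_t, 0.95 * lr_e
            s_m = math.floor(s * 0.05) + 1      # assumed way of updating s_m
    return max(s, s_m), lr_t, lr_e              # fallback once s reaches the minimum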
In one embodiment, a schematic diagram for implementing fusion of multiple network parameters is provided, as shown in fig. 8:
step 101: and searching network parameters corresponding to the pre-trained initial deep learning network at least two convergence points.
Specifically, the electronic device may initialize the fusion count n_m of the converged network parameters to 0, set the network parameters of m_w to w0 (that is, let w = w0), and initialize the shadow network m_w', where w' denotes the network parameters of the shadow network. m_w and m_w' have the same network structure and differ only in parameters; the shadow network can be used to store the fused network parameters to facilitate further fusion later.
Step 1021: and acquiring the network parameters in the ith superposition result and the average value of the network parameters corresponding to the (i +1) th convergence point, and taking the average value as the (i +1) th superposition result to perform the next fusion, wherein i is an integer greater than 0.
Specifically, assume the network parameters of m_w are w_c. The shadow network m_w' is updated as w' = w' × n_m/(n_m + 1) + w_c/(n_m + 1); each time a fusion is completed, the fusion count n_m is updated, i.e., n_m = n_m + 1. The initial fusion count is n_m = 0.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In this embodiment, the electronic device fuses the network parameters corresponding to each convergence point in an average superposition manner, so that the fusion manner is simple, and meanwhile, the difference between the fused network parameters corresponding to other convergence points can be reduced, thereby improving the stability of the face recognition network.
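A sketch of this running-average fusion, operating on PyTorch state dicts (the helper name and the use of state dicts are assumptions; shown for floating-point parameter tensors):

def fuse_into_shadow(shadow_state, w_c, n_m):
    """Fold the newly found convergence-point parameters w_c into the shadow
    parameters w', which already hold n_m fused convergence points:
    w' = w' * n_m / (n_m + 1) + w_c / (n_m + 1)."""
    for name in shadow_state:
        shadow_state[name] = (shadow_state[name] * n_m / (n_m + 1)
                              + w_c[name] / (n_m + 1))
    return shadow_state, n_m + 1

For n_m = 0 this simply copies w_c into the shadow network; for n_m = 1 it yields the plain average of two convergence points, which matches the averaging described in step 1021.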
In one embodiment, the present application provides another schematic diagram for implementing fusion of multiple network parameters, as shown in fig. 9:
step 101: and searching network parameters corresponding to the pre-trained initial deep learning network at least two convergence points.
Step 1021-1: and judging whether the value of i +1 is less than the preset fusion times, if so, executing the step 1021, and otherwise, ending the fusion of the network parameters.
The electronic device judges whether n_m > n_thd; if so, the fusion of the network parameters ends. For example, n_thd = 50.
Step 1021: and acquiring the network parameters in the ith superposition result and the average value of the network parameters corresponding to the (i +1) th convergence point, and taking the average value as the (i +1) th superposition result to perform the next fusion, wherein i is an integer greater than 0.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
In this embodiment, the electronic device reduces unnecessary fusion calculation and reduces the waste of calculation resources by stacking for a limited number of times while ensuring the fusion accuracy.
The above embodiments can be mutually combined and cited, for example, the following embodiments are examples after being combined, but not limited thereto; the embodiments can be arbitrarily combined into a new embodiment without contradiction.
In one embodiment, a flow chart of a method of network training is provided, as shown in FIG. 10.
Step 1011-1: and acquiring the fine tuning learning rate according to the initial deep learning network.
Step 1011-2: and acquiring the exploration learning rate according to the initial deep learning network and the fine tuning learning rate.
Step 1011-3: and obtaining an optimization step length and updating the fine-tuning learning rate and the exploration learning rate according to the initial deep learning network, the fine-tuning learning rate and the exploration learning rate.
Step 1011: and determining an iterative learning rate corresponding to the (n +1) th iterative training according to a preset iterative learning rate expression, wherein the iterative learning rate expression is a decreasing function, n is used for representing the iterative times, and n is an integer greater than 0.
The iterative learning expression comprises: the learning rate is a learning rate from training to convergence, and the exploration learning rate is used for searching a learning rate of a next convergence point.
Step 1012: and training the initial deep learning network after the nth iterative training according to the iterative learning rate and a sample training set, wherein the sample training set comprises a plurality of sample images.
Step 1013: and updating the iteration number n to n + 1.
Step 1014: and (4) judging whether the iteration number n is smaller than the optimization step length, if so, executing step 1011, otherwise, ending the searching process, and executing step 1015.
Step 1015: and obtaining the network parameters in the initial deep learning network after the nth iterative training as the network parameters corresponding to the searched convergence points, and executing the step 1021-1.
Step 1021-1: and judging whether the value of i +1 is less than the preset fusion times, if so, executing the step 1021, and otherwise, ending the fusion of the network parameters.
Step 1021: and acquiring the network parameters in the ith superposition result and the average value of the network parameters corresponding to the (i +1) th convergence point, and taking the average value as the (i +1) th superposition result to perform the next fusion, wherein i is an integer greater than 0.
Step 103: and changing the initial network parameters of the initial deep learning network into optimized network parameters to obtain a new deep learning network.
The steps of the above methods are divided only for clarity of description. In implementation they may be combined into one step, or a step may be split into several steps; as long as the same logical relationship is included, such variations are within the protection scope of this patent. Adding insignificant modifications to an algorithm or process, or introducing insignificant design changes, without changing its core design is also within the scope of this patent.
The embodiment of the application also provides a face recognition method, which is applied to electronic equipment, wherein the electronic equipment is provided with the face recognition network obtained by the network training method. The flow of the face recognition method is shown in fig. 11:
step 201: and acquiring a face image to be recognized.
Specifically, the face recognition network deployed on the electronic device may be a face recognition network obtained by the electronic device in advance according to the network training method provided in the first embodiment. The specific process of training the face recognition network may be as follows: searching network parameters corresponding to at least two convergence points of a pre-trained initial face recognition network; fusing network parameters corresponding to at least two convergence points of the initial face recognition network to generate optimized network parameters; and changing the initial network parameters of the initial face recognition network into optimized network parameters to obtain a new face recognition network.
Step 202: and inputting the face image into a face recognition network to obtain a recognition result of the face image.
Specifically, the electronic device inputs the face image into the trained face recognition network, and then the recognition result of the face image can be obtained.
In the embodiment of the application, the trained face recognition network is obtained by training by adopting the network training method, so that the accuracy of face recognition network recognition is improved.
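As a closing illustration, running the deployed network on one image reduces to a standard forward pass; the preprocessing pipeline and the exact output format are whatever the original initial network already used (the training method leaves them unchanged), and the names below are placeholders.

import torch

def recognize(face_net, face_image):
    """Steps 201 and 202: feed a face image to the trained face recognition
    network and return its recognition result."""
    face_net.eval()                              # inference mode
    with torch.no_grad():
        return face_net(face_image.unsqueeze(0)) # add a batch dimension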
The embodiment of the present application relates to an electronic device, as shown in fig. 12, including: at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of network training described above.
The memory and the processor are connected by a bus, which may include any number of interconnected buses and bridges, linking together one or more of the various circuits of the processor and the memory. The bus may also link various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further herein. A bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over a wireless medium via an antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And the memory may be used to store data used by the processor in performing operations.
Embodiments of the present application relate to a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the network training method described above.
Those skilled in the art can understand that all or part of the steps in the method of the foregoing embodiments may be implemented by a program to instruct related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, etc.) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific examples for carrying out the invention, and that various changes in form and details may be made therein without departing from the spirit and scope of the invention in practice.

Claims (12)

1. A method of network training, comprising:
searching network parameters corresponding to at least two convergence points of a pre-trained initial deep learning network;
fusing network parameters corresponding to at least two convergence points of the initial deep learning network to generate optimized network parameters;
and changing the initial network parameters of the initial deep learning network into the optimized network parameters to obtain a new deep learning network.
2. The method according to claim 1, wherein the searching for the network parameters corresponding to the pre-trained initial deep learning network at least two convergence points comprises:
for each search the following processing is performed:
acquiring the network parameters in the initial deep learning network after the (n +1) th iterative training as the network parameters corresponding to the (n +1) th convergence point,
determining an iterative learning rate corresponding to the (n +1) th iterative training according to a preset iterative learning rate expression, wherein the iterative learning rate expression is a decreasing function, n is used for representing the iterative times, and n is an integer greater than 0;
training the initial deep learning network after the nth iterative training according to the iterative learning rate and a sample training set, wherein the sample training set comprises a plurality of sample images;
updating the iteration number n to n + 1;
judging whether the iteration times n are smaller than the optimization step length, if so, executing a step of determining the iteration learning rate corresponding to the (n +1) th iteration training according to a preset iteration learning rate expression; otherwise, acquiring the network parameters in the initial deep learning network after the nth iterative training as the network parameters corresponding to the searched convergence points.
3. The method of network training of claim 2, wherein the iterative learning rate expression comprises: a fine-tuning learning rate, an exploration learning rate and an optimization step length, the fine-tuning learning rate being a learning rate used for training to convergence, and the exploration learning rate being used for searching a learning rate of a next convergence point;
before determining the iterative learning rate corresponding to the (n +1) th iterative training according to the preset iterative learning rate expression, the method comprises the following steps:
acquiring a fine tuning learning rate according to the initial deep learning network;
acquiring an exploration learning rate according to the initial deep learning network and the fine tuning learning rate;
and according to the initial deep learning network, the fine tuning learning rate and the exploration learning rate, obtaining the optimization step length, updating the fine tuning learning rate and updating the exploration learning rate.
4. The method of network training according to claim 3, wherein the obtaining a fine learning rate according to the initial deep learning network comprises:
setting the initial fine tuning learning rate as a preset learning rate;
performing Nt times of iterative training on the initial deep learning network according to the initial fine tuning learning rate and the sample training set, wherein Nt is an integer greater than 1;
judging whether the initial deep learning network converges after Nt times of iterative training;
if the initial deep learning network is converged, taking the initial fine tuning learning rate as the fine tuning learning rate;
and if the initial deep learning network is not converged, reducing the initial fine tuning learning rate, and returning to the step of performing Nt times of iterative training on the initial deep learning network according to the initial fine tuning learning rate.
5. The method of network training according to claim 4, wherein the determining whether the initial deep learning network converges after Nt iterative training includes:
acquiring an included angle between the parameter gradient of each iterative training and each corresponding network parameter;
counting the proportion of the included angle as an obtuse angle or an acute angle;
and if the proportion is in a preset range, determining that the initial deep learning network converges after the Nt times of iterative training.
6. The method according to claim 3 or 4, wherein the obtaining an exploration learning rate according to the initial deep learning network and the fine tuning learning rate comprises:
setting the initial exploration learning rate as a preset learning rate;
performing Ne1 iterative training on the initial deep learning network according to the initial exploration learning rate and the sample training set, wherein Ne1 is an integer greater than 1;
carrying out Ne2 times of iterative training on the initial deep learning network subjected to Ne1 times of iterative training according to the fine tuning learning rate, wherein Ne2 is an integer greater than 1;
after Ne2 iterative training, judging whether the initial deep learning network converges;
if the initial deep learning network is not converged, taking the initial exploration learning rate as the exploration learning rate;
and if the initial deep learning network converges, increasing the initial exploration learning rate, and returning to the step of performing Ne1 times of iterative training on the initial deep learning network according to the initial exploration learning rate.
7. The method of network training according to claim 3 or 4, wherein the obtaining the optimization step length, updating the fine-tuning learning rate and updating the exploration learning rate according to the initial deep learning network, the fine-tuning learning rate and the exploration learning rate comprises:
acquiring an initial step length, an initial minimum step length and an evaluation index of the initial deep learning network as a first evaluation index;
performing x times of iterative training on the initial deep learning network according to the fine tuning learning rate, the exploration learning rate and the initial step length, wherein x is an integer larger than 0;
when the iteration count x is larger than or equal to the initial step length, acquiring the evaluation index of the initial deep learning network after the current iteration as a second evaluation index;
if the second evaluation index is smaller than a preset proportion of the first evaluation index, reducing the initial step length; and if the reduced initial step length is larger than the initial minimum step length, reducing the fine tuning learning rate and the exploration learning rate and updating the initial minimum step length;
and if the second evaluation index is larger than or equal to the preset proportion of the first evaluation index, taking the initial step length as the optimization step length, and obtaining the fine tuning learning rate and the exploration learning rate of the current iteration.
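Claim 7 couples the step length and the two learning rates through an evaluation index. The claim leaves several details open (how the two rates interact within one iteration, and what happens once the reduced step no longer exceeds the minimum), so the sketch below is only a rough reading: `train_cycle`, `evaluate`, and every numeric default are placeholders, and the fallback return is an assumption not stated in the claim.

```python
def find_optimization_step(net, loader, fine_lr, explore_lr, train_cycle, evaluate,
                           step=1000, min_step=100, shrink=0.5, proportion=0.95):
    """train_cycle(net, loader, fine_lr, explore_lr) runs one training iteration
    using both rates; evaluate(net) returns the evaluation index."""
    first_index = evaluate(net)                      # first evaluation index
    while True:
        for x in range(1, step + 1):                 # iterate until x reaches the step length
            train_cycle(net, loader, fine_lr, explore_lr)
        second_index = evaluate(net)                 # second evaluation index
        if second_index >= proportion * first_index:
            return step, fine_lr, explore_lr         # accept step as the optimization step length
        step = int(step * shrink)                    # index dropped too far: shorten the step
        if step > min_step:
            fine_lr *= shrink                        # also damp both learning rates
            explore_lr *= shrink
            min_step = int(min_step * shrink)        # and update the minimum step length
        else:
            return step, fine_lr, explore_lr         # fallback; this branch is not specified by the claim
```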
8. The method for network training according to any one of claims 1 to 4, wherein the fusing the network parameters corresponding to at least two convergence points of the initial deep learning network to generate optimized network parameters comprises:
acquiring the average value of the network parameters in the i-th superposition result and the network parameters corresponding to the (i+1)-th convergence point, and taking the average value as the (i+1)-th superposition result for the next fusion, wherein i is an integer greater than 0.
9. The method of network training of claim 8, wherein prior to performing the next fusion, the method further comprises:
judging that the value of i+1 is less than a preset number of fusions.
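Claims 8 and 9 together describe the fusion itself: a running pairwise average of the parameters found at successive convergence points, capped by a preset fusion count. A minimal sketch over state_dict-style parameter dictionaries follows; the dictionary format and the exact indexing of the stopping check are assumptions.

```python
def fuse_convergence_points(param_dicts, max_fusions):
    """param_dicts: one state_dict (name -> tensor) per convergence point, in search order."""
    fused = {k: v.clone() for k, v in param_dicts[0].items()}     # 1st superposition result
    for i, point in enumerate(param_dicts[1:], start=1):
        fused = {k: (fused[k] + point[k]) / 2.0 for k in fused}   # (i+1)-th superposition result
        if not (i + 1 < max_fusions):    # claim 9's check before performing the next fusion
            break
    return fused
```

Note that this running pairwise mean weights later convergence points more heavily than a uniform average over all points would; whether that is intended is not stated in the claims.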
10. A face recognition method applied to an electronic device deployed with a face recognition network obtained by the network training method according to any one of claims 1 to 9, the face recognition method comprising:
acquiring a face image to be recognized;
and inputting the face image into the face recognition network to obtain a recognition result of the face image.
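As a usage illustration of claim 10, the snippet below loads a trained face recognition network and pushes one preprocessed image through it. The file paths, input size, and preprocessing are placeholders, and the matching of the output against enrolled identities is omitted.

```python
import torch
from PIL import Image
from torchvision import transforms

def recognize(model_path, image_path, device="cpu"):
    net = torch.load(model_path, map_location=device)   # network trained as in claims 1-9
    net.eval()
    preprocess = transforms.Compose([
        transforms.Resize((112, 112)),                   # assumed input size
        transforms.ToTensor(),
    ])
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return net(image.to(device))                     # recognition result (e.g. an embedding)
```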
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of network training of any one of claims 1-9 or to perform the method of face recognition of claim 10.
12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of network training of any one of claims 1 to 9, or the method of face recognition of claim 10.
CN202111007686.3A 2021-08-30 2021-08-30 Face recognition network training method, electronic equipment and storage medium Active CN113780526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111007686.3A CN113780526B (en) 2021-08-30 2021-08-30 Face recognition network training method, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111007686.3A CN113780526B (en) 2021-08-30 2021-08-30 Face recognition network training method, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113780526A true CN113780526A (en) 2021-12-10
CN113780526B CN113780526B (en) 2022-08-05

Family

ID=78840066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111007686.3A Active CN113780526B (en) 2021-08-30 2021-08-30 Face recognition network training method, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113780526B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020114119A1 (en) * 2018-12-07 2020-06-11 深圳光启空间技术有限公司 Cross-domain network training method and cross-domain image recognition method
US20210133571A1 (en) * 2019-11-05 2021-05-06 California Institute Of Technology Systems and Methods for Training Neural Networks
CN111767989A (en) * 2020-06-29 2020-10-13 北京百度网讯科技有限公司 Neural network training method and device
CN113033525A (en) * 2021-05-26 2021-06-25 北京的卢深视科技有限公司 Training method of image recognition network, electronic device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
L. Xiangmei and Q. Zhi, "The application of Hybrid Neural Network Algorithms in Intrusion Detection System," 2011 International Conference on E-Business and E-Government (ICEE), 2011, pp. 1-4, doi: 10.1109/ICEBEG.2011.5882041. *
Li Chenzheng, et al., "Research on dangerous behavior recognition method based on transfer learning," Science Technology and Engineering (《科学技术与工程》). *

Also Published As

Publication number Publication date
CN113780526B (en) 2022-08-05

Similar Documents

Publication Publication Date Title
US11341424B2 (en) Method, apparatus and system for estimating causality among observed variables
KR20190086134A (en) Method and apparatus for selecting optiaml training model from various tarining models included in neural network
CN110832509B (en) Black box optimization using neural networks
JP7287397B2 (en) Information processing method, information processing apparatus, and information processing program
CN108446770B (en) Distributed machine learning slow node processing system and method based on sampling
CN111079780A (en) Training method of space map convolution network, electronic device and storage medium
CN112948608B (en) Picture searching method and device, electronic equipment and computer readable storage medium
CN109934330A (en) The method of prediction model is constructed based on the drosophila optimization algorithm of diversified population
CN115860081B (en) Core algorithm scheduling method, system, electronic equipment and storage medium
CN111260056B (en) Network model distillation method and device
CN114637881B (en) Image retrieval method based on multi-agent metric learning
CN110929218A (en) Difference minimization random grouping method and system
CN115659807A (en) Method for predicting talent performance based on Bayesian optimization model fusion algorithm
CN114882307A (en) Classification model training and image feature extraction method and device
CN114298326A (en) Model training method and device and model training system
CN113780526B (en) Face recognition network training method, electronic equipment and storage medium
US11307867B2 (en) Optimizing the startup speed of a modular system using machine learning
CN111310857A (en) Feature extraction method, electronic device and medical case similarity model construction method
CN116151384A (en) Quantum circuit processing method and device and electronic equipment
CN115329611A (en) Inertial navigation component simulation method and device, electronic equipment and storage medium
CN117557870B (en) Classification model training method and system based on federal learning client selection
CN112669893B (en) Method, system, device and equipment for determining read voltage to be used
CN117253209B (en) Automatic driving point cloud detection method, device, communication equipment and storage medium
CN113673591B (en) Self-adjusting sampling optimization image classification method, device and medium
CN117828382B (en) Network interface clustering method and device based on URL

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20220512
Address after: 230091 room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei, Anhui Province
Applicant after: Hefei lushenshi Technology Co.,Ltd.
Address before: 100083 room 3032, North B, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing
Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.
Applicant before: Hefei lushenshi Technology Co.,Ltd.
GR01 Patent grant