CN112906909A - Deep learning model training method and device, electronic equipment and storage medium

Deep learning model training method and device, electronic equipment and storage medium

Info

Publication number
CN112906909A
CN112906909A
Authority
CN
China
Prior art keywords
deep learning
learning model
parameters
parameter
variation
Prior art date
Legal status
Pending
Application number
CN202110492705.XA
Other languages
Chinese (zh)
Inventor
刘山和
Current Assignee
Hubei Ecarx Technology Co Ltd
Original Assignee
Hubei Ecarx Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hubei Ecarx Technology Co Ltd filed Critical Hubei Ecarx Technology Co Ltd
Priority to CN202110492705.XA
Publication of CN112906909A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides a deep learning model training method, a deep learning model training apparatus, an electronic device, and a storage medium. The method includes: iteratively training a deep learning model on sample data, and judging after each training iteration whether the parameters of the deep learning model are locally optimal parameters; if so, mutating the parameters of the deep learning model according to a parameter mutation rate; judging whether the number of mutations has reached a set number, and if so, determining the locally optimal parameters as the final parameters of the deep learning model; if not, continuing to iteratively train the deep learning model with the mutated parameters and judging whether the loss value of the deep learning model is better than the loss value calculated from the locally optimal parameters; if it is better, resetting the parameter mutation rate to its initial value and returning to the step of iteratively training the deep learning model on sample data; if it is not, returning to the step of mutating the parameters of the deep learning model according to the parameter mutation rate. Globally optimal parameters can thereby be determined.

Description

Deep learning model training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a deep learning model training method and apparatus, an electronic device, and a storage medium.
Background
In deep learning algorithms, a loss function is needed to describe the difference between the model's output and the true value, and this difference guides the direction of optimization; for example, a gradient descent method optimizes in the direction that makes the difference smaller. Optimizing a deep learning model can therefore also be understood as adjusting the parameters of the deep learning model.
However, complex data scenarios such as speech recognition require complex loss functions, which may have multiple valleys (local minima).
With existing deep learning model training methods, if the loss function is complex, only locally optimal model parameters can be obtained, not globally optimal ones, so the final training effect of the deep learning model is poor.
Disclosure of Invention
An object of the embodiments of the present application is to provide a deep learning model training method and apparatus, an electronic device, and a storage medium, so as to determine the globally optimal parameters of a deep learning model and improve the training effect. The specific technical solutions are as follows:
In order to achieve the above object, an embodiment of the present application provides a deep learning model training method, where the method includes:
iteratively training a deep learning model on sample data, and judging, after each training iteration, whether the parameters of the deep learning model are locally optimal parameters;
if so, mutating the parameters of the deep learning model according to a parameter mutation rate, and recording the number of times the parameters of the deep learning model have been mutated;
judging whether the number of mutations has reached a set number, and if so, determining the locally optimal parameters as the final parameters of the deep learning model to obtain a target deep learning model;
if not, continuing to iteratively train the deep learning model with the mutated parameters, and judging, for the deep learning model iteratively trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables;
if it is better, resetting the parameter mutation rate to its initial value and returning to the step of iteratively training the deep learning model on sample data;
if it is not, returning to the step of mutating the parameters of the deep learning model according to the parameter mutation rate.
Optionally, mutating the parameters of the deep learning model according to the parameter mutation rate includes:
randomly replacing the parameters of the deep learning model at the parameter mutation rate to obtain the mutated parameters.
Optionally, when the parameters of the deep learning model are mutated according to the parameter mutation rate, the parameter mutation rate is gradually increased.
Optionally, the initial value of the parameter mutation rate is zero.
Optionally, the deep learning model is a speech recognition deep learning model, and the sample data is speech sample data.
In order to achieve the above object, an embodiment of the present application further provides a deep learning model training apparatus, where the apparatus includes:
a training module, configured to iteratively train a deep learning model on sample data and judge, after each training iteration, whether the parameters of the deep learning model are locally optimal parameters;
a first mutation module, configured to, if locally optimal parameters are obtained, mutate the parameters of the deep learning model according to a parameter mutation rate and record the number of times the parameters have been mutated;
a mutation count judging module, configured to judge whether the number of mutations has reached a set number;
a target model determining module, configured to determine the locally optimal parameters as the final parameters of the deep learning model when the number of mutations reaches the set number, thereby obtaining a target deep learning model;
an iteration module, configured to continue to iteratively train the deep learning model with the mutated parameters when the number of mutations has not reached the set number, and to judge, for the deep learning model iteratively trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables;
a resetting module, configured to reset the parameter mutation rate to its initial value when the loss value of the deep learning model is better than the loss value calculated from the locally optimal parameters, and to return to the step of iteratively training the deep learning model on sample data;
and a second mutation module, configured to return to the step of mutating the parameters of the deep learning model according to the parameter mutation rate when the loss value of the deep learning model is not better than the loss value calculated from the locally optimal parameters.
In order to achieve the above object, an embodiment of the present application further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with each other via the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement any of the above method steps when executing the program stored in the memory.
In order to achieve the above object, an embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements any of the above method steps.
The embodiments of the present application have the following beneficial effects:
With the deep learning model training method and apparatus, electronic device, and storage medium provided by the embodiments of the present application, a deep learning model is iteratively trained on sample data, and after each training iteration it is judged whether the parameters of the deep learning model are locally optimal; if so, the parameters are mutated according to a parameter mutation rate and the number of mutations is recorded; it is judged whether the number of mutations has reached a set number, and if so, the locally optimal parameters are determined as the final parameters of the deep learning model to obtain a target deep learning model; if not, the deep learning model continues to be iteratively trained with the mutated parameters, and it is judged, for the model trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables; if it is better, the parameter mutation rate is reset to its initial value and the process returns to the step of iteratively training the deep learning model on sample data; if not, the process returns to the step of mutating the parameters according to the parameter mutation rate. Thus, when the deep learning model falls into locally optimal parameters, it can jump out of them through parameter mutation and search for the true globally optimal parameters, so that the globally optimal parameters of the deep learning model are determined and the model training effect is improved.
Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a deep learning model training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a deep learning model training apparatus according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
To solve the technical problem that existing deep learning model training methods, when the loss function is complex, can obtain only locally optimal parameters rather than globally optimal ones, leading to a poor final model training effect, embodiments of the present application provide a deep learning model training method and apparatus, an electronic device, and a storage medium.
The deep learning model training method provided by the embodiment of the application relates to the technical field of deep learning, and can be particularly applied to the fields of voice recognition, image recognition, user behavior analysis and the like.
As one example, if the deep learning model is a speech recognition deep learning model, the sample data may be speech sample data.
Referring to fig. 1, fig. 1 is a schematic flowchart of a deep learning model training method provided in an embodiment of the present application, where the method may include the following steps:
s101: and iteratively training the deep learning model based on the sample data, and judging whether the parameters of the deep learning model are local optimal parameters or not after each iteration training. If yes, go to S102.
Those skilled in the art will appreciate that deep learning algorithms require a loss function to describe the difference between the model output and the actual value, and that the loss function takes the parameters of the deep learning model as its variables. Training a deep learning model is the process of continuously adjusting the model parameters so that the loss function value keeps decreasing.
In the embodiment of the present application, the deep learning model can be iteratively trained on sample data, and after each training iteration it is judged whether the parameters of the deep learning model are locally optimal. The parameters of the deep learning model are continuously adjusted during iterative training.
The parameters of the deep learning model can be adjusted using, for example, a gradient descent method.
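As a minimal sketch of this adjustment step (an illustration, not the patented procedure itself), a single gradient descent update on a NumPy parameter vector could look as follows; the function name and the default learning rate are assumptions made for the example.

```python
import numpy as np

def gradient_descent_step(params: np.ndarray,
                          grad: np.ndarray,
                          learning_rate: float = 0.01) -> np.ndarray:
    """One gradient descent update: step against the gradient of the
    loss so that the loss value decreases. The learning rate of 0.01
    is an illustrative default, not a value from this application."""
    return params - learning_rate * grad
```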
In the embodiment of the present application, the criterion for judging whether locally optimal parameters have been obtained can take many forms and can be set according to actual requirements.
As one example, during iterative training, if the model parameters change only slightly, the parameters may be considered locally optimal. For instance, if over several consecutive training rounds the differences between the model parameters of adjacent rounds are all smaller than a preset threshold, the parameters can be regarded as locally optimal.
As another example, during iterative training, if the value of the loss function barely decreases, a locally optimal solution may be considered to have been reached. For instance, if over several consecutive training rounds the differences between the loss values calculated from the model parameters of adjacent rounds are all smaller than a preset threshold, the parameters can be regarded as locally optimal. A minimal sketch of this second criterion follows.
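Assuming the loss value is recorded after every training round, the check could look as follows; the window size and threshold are illustrative values, not values fixed by this application.

```python
def is_local_optimum(loss_history: list[float],
                     window: int = 5,
                     threshold: float = 1e-4) -> bool:
    """Treat the current parameters as locally optimal when, over
    `window` consecutive training rounds, every loss difference
    between adjacent rounds stays below `threshold`."""
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    return all(abs(recent[i + 1] - recent[i]) < threshold
               for i in range(window))
```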
In the embodiment of the present application, if after a training iteration the parameters are judged not to be locally optimal, the process returns to the step of iteratively training the deep learning model on sample data; locally optimal parameters can be obtained after multiple training iterations.
S102: mutate the parameters of the deep learning model according to the parameter mutation rate, and record the number of times the parameters of the deep learning model have been mutated.
S103: judge whether the number of mutations has reached the set number; if so, determine the locally optimal parameters as the final parameters of the deep learning model to obtain the target deep learning model.
in the embodiment of the present application, after the local optimal parameter is obtained, a loss value calculated based on the local optimal parameter may be recorded. The loss value represents a value of a loss function having a parameter of the deep learning model as a variable.
In the embodiment of the application, after the local optimal parameters are obtained, the parameters of the deep learning model are mutated based on the parameter mutation rate, and after each parameter mutation, the mutation times of the parameters of the deep learning model based on the parameter mutation rate are recorded.
And if the variation times reach the set value, determining the local optimal parameters as the final parameters of the deep learning model to obtain the target deep learning model.
Mutating the parameters of the deep learning model according to the parameter mutation rate means randomly replacing the parameters of the deep learning model at the parameter mutation rate to obtain the mutated parameters.
The random replacement may replace some of the parameters of the deep learning model, or all of them.
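One way to read "random replacement at the parameter mutation rate" is sketched below: each parameter is independently replaced, with probability equal to the mutation rate, by a random draw. The replacement distribution (standard normal) is an assumption; the application does not specify it.

```python
import numpy as np

def mutate_parameters(params: np.ndarray,
                      mutation_rate: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Replace each parameter with probability `mutation_rate` by a
    randomly drawn value; all other parameters are left unchanged."""
    mask = rng.random(params.shape) < mutation_rate
    replacements = rng.standard_normal(params.shape)
    return np.where(mask, replacements, params)
```

With a mutation rate of 0 this returns the parameters unchanged, which matches an initial mutation rate of zero; with a rate of 1, every parameter is replaced.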
S104: if not, continue to iteratively train the deep learning model with the mutated parameters, and judge, for the deep learning model iteratively trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters; the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables.
If it is better, perform step S105; if not, return to step S102, that is, to the step of mutating the parameters of the deep learning model according to the parameter mutation rate.
S105: reset the parameter mutation rate to its initial value, and return to the step of iteratively training the deep learning model on sample data.
In the embodiment of the present application, after a parameter mutation, the deep learning model continues to be iteratively trained with the mutated parameters, and during this training it is judged whether the loss value of the deep learning model is better than the loss value calculated from the locally optimal parameters.
If the loss value of the model iteratively trained with the mutated parameters is better than the loss value calculated from the locally optimal parameters, a model parameter better than the locally optimal parameters has been found. The current parameter mutation rate can then be reset to its initial value, for example a mutation rate of zero, and, starting from the mutated parameters, the process returns to the step of iteratively training the deep learning model on sample data. New locally optimal parameters can be obtained in the subsequent iterative training.
If the loss value of the model iteratively trained with the mutated parameters is not better than the loss value calculated from the locally optimal parameters, no model parameter better than the locally optimal solution has been found; the model parameters can then be mutated again, and the deep learning model continues to be iteratively trained with the newly mutated parameters.
The parameter mutation rate represents a probability, namely the probability that a model parameter undergoes mutation. For faster model convergence, the initial parameter mutation rate may be set to zero or a small value, and this initial rate is used before the locally optimal parameters are obtained.
In an embodiment of the present application, when the parameters of the deep learning model are mutated according to the parameter mutation rate, the parameter mutation rate may be increased gradually.
That is, after the locally optimal parameters are obtained, the parameter mutation rate can keep growing. For example, the parameter mutation rate may be set to be positively correlated with the number of iteration rounds after the locally optimal parameters are obtained.
As an example, after the locally optimal parameters are obtained, the parameter mutation rate is set to 0.1 in the first iteration, 0.2 in the second iteration, and so on.
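A sketch of such a schedule, matching the 0.1/0.2 example above; capping the rate at 1.0 is an added assumption, since the mutation rate is a probability.

```python
def mutation_rate_schedule(rounds_after_local_optimum: int,
                           step: float = 0.1,
                           max_rate: float = 1.0) -> float:
    """Mutation rate positively correlated with the number of iteration
    rounds after the locally optimal parameters were obtained: 0.1 on
    the first round, 0.2 on the second, and so on, capped at 1.0."""
    return min(step * rounds_after_local_optimum, max_rate)
```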
In the embodiment of the present application, the more mutations are performed, the higher the probability of finding a model parameter better than the locally optimal parameters. If the set number of mutations is reached and no better model parameter has been found, the current locally optimal parameters can be regarded as the globally optimal parameters and determined as the final parameters of the deep learning model, yielding the target deep learning model, i.e., the trained deep learning model.
As an example, suppose the set number of mutations is 50. After the locally optimal parameters are obtained, iterative training continues with parameter mutation. If, across the 50 mutations, the loss value of the deep learning model under each mutated parameter set is never better than the loss value calculated from the locally optimal parameters, no better parameters can be found, and the locally optimal parameters may be regarded as globally optimal.
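Putting the pieces together, the following sketch of the overall S101-S105 flow reuses the helpers sketched earlier (is_local_optimum, mutate_parameters, mutation_rate_schedule). Here train_step and the clearing of the loss history after an improvement are assumptions: train_step stands in for one training iteration that returns the updated parameters and their loss, and clearing the history simply avoids immediately re-detecting the old optimum.

```python
import numpy as np

def train_with_mutation(initial_params: np.ndarray,
                        train_step,             # params -> (params, loss)
                        max_mutations: int = 50,
                        seed: int = 0) -> np.ndarray:
    """Sketch of S101-S105: train until a local optimum is detected,
    then mutate at a growing rate; finalize the parameters after
    `max_mutations` failed mutations, otherwise resume training.
    A zero initial mutation rate is implicit, since no mutation is
    applied during ordinary training."""
    rng = np.random.default_rng(seed)
    params = initial_params
    loss_history: list[float] = []

    while True:
        # S101: ordinary iterative training until a local optimum.
        params, loss = train_step(params)
        loss_history.append(loss)
        if not is_local_optimum(loss_history):
            continue

        local_optimum, best_loss = params, loss
        mutations = 0
        while mutations < max_mutations:
            # S102: mutate at a gradually increasing rate and count it
            # (S103, the set-number check, is the loop condition).
            mutations += 1
            rate = mutation_rate_schedule(mutations)
            candidate = mutate_parameters(local_optimum, rate, rng)

            # S104: continue training with the mutated parameters and
            # compare against the loss at the local optimum.
            candidate, cand_loss = train_step(candidate)
            if cand_loss < best_loss:
                # S105: a better loss was found, so resume ordinary
                # training from the mutated, retrained parameters
                # (the mutation rate resets via the mutation counter).
                params = candidate
                loss_history = [cand_loss]
                break
        else:
            # No improvement within the set number of mutations: the
            # local optimum is treated as the global optimum.
            return local_optimum
```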
In the embodiment of the present application, the globally optimal parameters of the deep learning model can be determined, so the training effect of the deep learning model can be improved.
For example, when the deep learning model is a speech recognition deep learning model and the sample data is speech sample data, the globally optimal parameters of the speech recognition model can be determined using the deep learning model training method provided by the embodiment of the present application, thereby improving the accuracy of speech recognition.
With the deep learning model training method provided by the embodiment of the present application, a deep learning model is iteratively trained on sample data, and after each training iteration it is judged whether the parameters of the deep learning model are locally optimal; if so, the parameters are mutated according to the parameter mutation rate and the number of mutations is recorded, and if the number of mutations reaches the set number, the locally optimal parameters are determined as the final parameters of the deep learning model to obtain the target deep learning model; otherwise the deep learning model continues to be iteratively trained with the mutated parameters, and it is judged, for the model trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables; if it is better, the parameter mutation rate is reset to its initial value and the process returns to the step of iteratively training the model on sample data; if not, the parameter mutation rate is increased, the parameters are mutated again, and the process returns to the step of iteratively training the deep learning model with the mutated parameters.
Thus, when the deep learning model falls into locally optimal parameters, it can jump out of them through parameter mutation and search for the true globally optimal parameters, so that the globally optimal parameters of the deep learning model are determined and the model training effect is improved.
An embodiment of the present application further provides a deep learning model training apparatus. Referring to fig. 2, fig. 2 is a schematic structural diagram of the deep learning model training apparatus provided in the embodiment of the present application. As shown in fig. 2, the apparatus includes:
a training module 201, configured to iteratively train a deep learning model on sample data and judge, after each training iteration, whether the parameters of the deep learning model are locally optimal parameters;
a first mutation module 202, configured to, if locally optimal parameters are obtained, mutate the parameters of the deep learning model according to a parameter mutation rate and record the number of times the parameters have been mutated;
a mutation count judging module 203, configured to judge whether the number of mutations has reached a set number;
a target model determining module 204, configured to determine the locally optimal parameters as the final parameters of the deep learning model when the number of mutations reaches the set number, thereby obtaining a target deep learning model;
an iteration module 205, configured to continue to iteratively train the deep learning model with the mutated parameters when the number of mutations has not reached the set number, and to judge, for the deep learning model iteratively trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables;
a resetting module 206, configured to reset the parameter mutation rate to its initial value when the loss value of the deep learning model is better than the loss value calculated from the locally optimal parameters, and to return to the step of iteratively training the deep learning model on sample data;
and a second mutation module 207, configured to return to the step of mutating the parameters of the deep learning model according to the parameter mutation rate when the loss value of the deep learning model is not better than the loss value calculated from the locally optimal parameters.
With the deep learning model training apparatus provided by the embodiment of the present application, a deep learning model is iteratively trained on sample data, and after each training iteration it is judged whether the parameters of the deep learning model are locally optimal; if so, the parameters are mutated according to the parameter mutation rate and the number of mutations is recorded, and if the number of mutations reaches the set number, the locally optimal parameters are determined as the final parameters of the deep learning model to obtain the target deep learning model; otherwise the deep learning model continues to be iteratively trained with the mutated parameters, and it is judged, for the model trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables; if it is better, the parameter mutation rate is reset to its initial value and the process returns to the step of iteratively training the model on sample data; if not, the process returns to the step of mutating the parameters according to the parameter mutation rate.
Thus, when the deep learning model falls into locally optimal parameters, it can jump out of them through parameter mutation and search for the true globally optimal parameters, so that the globally optimal parameters of the deep learning model are determined and the model training effect is improved.
The apparatus and the method are based on the same inventive concept; since their principles for solving the problem are similar, the implementations of the apparatus and the method may refer to each other, and repeated details are not described again.
The embodiment of the present application further provides an electronic device. As shown in fig. 3, the electronic device includes a processor 301, a communication interface 302, a memory 303, and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 communicate with each other via the communication bus 304.
The memory 303 is configured to store a computer program.
The processor 301, when executing the program stored in the memory 303, implements the following steps:
iteratively training a deep learning model on sample data, and judging, after each training iteration, whether the parameters of the deep learning model are locally optimal parameters;
if so, mutating the parameters of the deep learning model according to a parameter mutation rate, and recording the number of times the parameters of the deep learning model have been mutated;
judging whether the number of mutations has reached a set number, and if so, determining the locally optimal parameters as the final parameters of the deep learning model to obtain a target deep learning model;
if not, continuing to iteratively train the deep learning model with the mutated parameters, and judging, for the deep learning model iteratively trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables;
if it is better, resetting the parameter mutation rate to its initial value and returning to the step of iteratively training the deep learning model on sample data;
if it is not, returning to the step of mutating the parameters of the deep learning model according to the parameter mutation rate.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
With the electronic device provided by the embodiment of the present application, when the deep learning model falls into locally optimal parameters, it can jump out of them through parameter mutation and search for the true globally optimal parameters, so that the globally optimal parameters of the deep learning model are determined and the model training effect is improved.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the deep learning model training methods described above.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of any of the deep learning model training methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the embodiments of the deep learning model training apparatus, the electronic device, the computer-readable storage medium and the computer program product, since they are substantially similar to the embodiments of the deep learning model training method, the description is relatively simple, and relevant points can be found in the partial description of the embodiments of the deep learning model training method.
The above description is only for the preferred embodiment of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (8)

1. A deep learning model training method, the method comprising:
iteratively training a deep learning model on sample data, and judging, after each training iteration, whether the parameters of the deep learning model are locally optimal parameters;
if so, mutating the parameters of the deep learning model according to a parameter mutation rate, and recording the number of times the parameters of the deep learning model have been mutated;
judging whether the number of mutations has reached a set number, and if so, determining the locally optimal parameters as the final parameters of the deep learning model to obtain a target deep learning model;
if not, continuing to iteratively train the deep learning model with the mutated parameters, and judging, for the deep learning model iteratively trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables;
if it is better, resetting the parameter mutation rate to its initial value and returning to the step of iteratively training the deep learning model on sample data;
if it is not, returning to the step of mutating the parameters of the deep learning model according to the parameter mutation rate.
2. The method of claim 1, wherein mutating the parameters of the deep learning model according to a parameter mutation rate comprises:
randomly replacing the parameters of the deep learning model at the parameter mutation rate to obtain the mutated parameters.
3. The method of claim 1, wherein, when the parameters of the deep learning model are mutated according to the parameter mutation rate, the parameter mutation rate is gradually increased.
4. The method of claim 1, wherein the initial value of the parameter mutation rate is zero.
5. The method according to any one of claims 1-4, wherein the deep learning model is a speech recognition deep learning model and the sample data is speech sample data.
6. A deep learning model training apparatus, the apparatus comprising:
a training module, configured to iteratively train a deep learning model on sample data and judge, after each training iteration, whether the parameters of the deep learning model are locally optimal parameters;
a first mutation module, configured to, if locally optimal parameters are obtained, mutate the parameters of the deep learning model according to a parameter mutation rate and record the number of times the parameters have been mutated;
a mutation count judging module, configured to judge whether the number of mutations has reached a set number;
a target model determining module, configured to determine the locally optimal parameters as the final parameters of the deep learning model when the number of mutations reaches the set number, thereby obtaining a target deep learning model;
an iteration module, configured to continue to iteratively train the deep learning model with the mutated parameters when the number of mutations has not reached the set number, and to judge, for the deep learning model iteratively trained with the mutated parameters, whether its loss value is better than the loss value calculated from the locally optimal parameters, where the loss value is the value of a loss function that takes the parameters of the deep learning model as its variables;
a resetting module, configured to reset the parameter mutation rate to its initial value when the loss value of the deep learning model is better than the loss value calculated from the locally optimal parameters, and to return to the step of iteratively training the deep learning model on sample data;
and a second mutation module, configured to return to the step of mutating the parameters of the deep learning model according to the parameter mutation rate when the loss value of the deep learning model is not better than the loss value calculated from the locally optimal parameters.
7. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other via the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1-5 when executing the program stored in the memory.
8. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1-5.
CN202110492705.XA 2021-05-07 2021-05-07 Deep learning model training method and device, electronic equipment and storage medium Pending CN112906909A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110492705.XA CN112906909A (en) 2021-05-07 2021-05-07 Deep learning model training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110492705.XA CN112906909A (en) 2021-05-07 2021-05-07 Deep learning model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112906909A 2021-06-04

Family

ID=76109013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110492705.XA Pending CN112906909A (en) 2021-05-07 2021-05-07 Deep learning model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112906909A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024140407A1 (en) * 2022-12-30 2024-07-04 中国电信股份有限公司 Fine tuning method and apparatus using pre-trained model, device, medium, and program


Similar Documents

Publication Publication Date Title
CN110311902B (en) Abnormal behavior identification method and device and electronic equipment
CN110390408B (en) Transaction object prediction method and device
CN109165691B (en) Training method and device for model for identifying cheating users and electronic equipment
CN112151035A (en) Voice control method and device, electronic equipment and readable storage medium
CN111047429A (en) Probability prediction method and device
CN110351299B (en) Network connection detection method and device
CN110309922A (en) A kind of network model training method and device
CN111027428A (en) Training method and device of multi-task model and electronic equipment
CN113342474A (en) Method, device and storage medium for forecasting customer flow and training model
CN112199500A (en) Emotional tendency identification method and device for comments and electronic equipment
CN112906909A (en) Deep learning model training method and device, electronic equipment and storage medium
CN112434717A (en) Model training method and device
CN113918438A (en) Method and device for detecting server abnormality, server and storage medium
CN115794898B (en) Financial information recommendation method and device, electronic equipment and storage medium
CN113269259B (en) Target information prediction method and device
CN113066486B (en) Data identification method, device, electronic equipment and computer readable storage medium
CN111582456B (en) Method, apparatus, device and medium for generating network model information
CN115203556A (en) Score prediction model training method and device, electronic equipment and storage medium
CN113448860A (en) Test case analysis method and device
CN111737554A (en) Scoring model training method, electronic book scoring method and device
CN111191827A (en) Method and device for predicting data fluctuation trend
CN114298146A (en) Sample expansion method and device, electronic equipment and storage medium
CN111400677A (en) User detection method and device
CN114548244A (en) Object classification method and device, electronic equipment and storage medium
CN116776139A (en) Training method, training device, training equipment and training storage medium for feature extractor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210604