CN110751288A - Model training method and device, computer equipment and storage medium - Google Patents

Model training method and device, computer equipment and storage medium

Info

Publication number
CN110751288A
CN110751288A (application CN201910875745.5A)
Authority
CN
China
Prior art keywords
model
gradient
updating
output data
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910875745.5A
Other languages
Chinese (zh)
Other versions
CN110751288B
Inventor
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910875745.5A
Publication of CN110751288A
Application granted
Publication of CN110751288B
Current legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The application relates to a model training method and apparatus, a computer device, and a storage medium, and involves deep learning. The method includes: obtaining model output data, wherein the model output data is obtained by inputting training sample data into a model to be trained; acquiring preset reference result data corresponding to the training sample data; when the model to be trained does not meet a training completion condition, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data; performing a gradient update on the current gradient according to a preset Gaussian random rule to obtain an update gradient of the model output data; and determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data. The method can improve the effect of model training.

Description

Model training method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model training method and apparatus, a computer device, and a storage medium.
Background
With the development of computer technology, the theory and techniques of Artificial Intelligence (AI) have matured and its fields of application keep expanding. Machine Learning (ML) is currently the mainstream way of implementing artificial intelligence, and Deep Learning (DL) is a branch of machine learning that lets machines imitate human activities such as seeing, hearing, and thinking, solving complex pattern-recognition problems and greatly advancing AI-related technology. The concept of deep learning originates from research on Artificial Neural Networks (ANN), which abstract the neural network of the human brain from an information-processing perspective, establish a simple model, and form different networks according to different connection modes. For this reason, deep learning models are also called Deep Neural Networks (DNNs).
The performance functions, i.e., objective functions, of deep learning networks are mostly non-convex, i.e., not strictly monotonic. At present, network models are trained directly with the gradient descent method, which cannot effectively handle the non-convexity of the objective function; for an objective function that is not strictly monotonic, gradient descent easily gets stuck in a local optimum, which limits the training effect of the network model.
Disclosure of Invention
In view of the above, it is necessary to provide a model training method, apparatus, computer device and storage medium capable of improving the effect of model training.
A method of model training, the method comprising:
obtaining model output data, wherein the model output data is obtained by inputting training sample data into a model to be trained;
acquiring preset reference result data corresponding to the training sample data;
when the model to be trained does not meet the training completion condition, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data;
according to a preset Gaussian random rule, performing gradient updating on the current gradient to obtain an updated gradient of the model output data;
and determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data.
In one embodiment, after obtaining the preset reference result data corresponding to the training sample data, the method further includes:
and when it is determined, according to the model output data and the reference result data, that the model to be trained meets the training completion condition, or when the number of training iterations reaches a preset threshold, ending the training and taking the model to be trained as the trained target model.
In one embodiment, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data comprises:
acquiring a preset loss function;
determining a gradient function of the model output data according to the loss function;
and obtaining the current gradient of the model output data according to the gradient function, the model output data and the reference result data.
In one embodiment, performing a gradient update on the current gradient according to a preset Gaussian random rule to obtain the update gradient of the model output data includes:
acquiring a Gaussian random number;
and when the Gaussian random number meets a preset gradient updating condition, performing gradient updating on the current gradient through a preset gradient updating parameter to obtain an updating gradient of the model output data.
In one embodiment, the Gaussian random number is generated under the conditions of a preset mean and a preset standard deviation; the gradient updating condition is that the distance between the Gaussian random number and the preset mean exceeds three times the preset standard deviation; and the gradient update parameter takes a value in the range of 0 to 1.
In one embodiment, determining the update weight of the model to be trained according to the update gradient includes:
and determining the updating weight of the model to be trained according to the updating gradient based on a gradient descent method.
In one embodiment, based on the gradient descent method, determining the update weight of the model to be trained according to the update gradient includes:
acquiring the current weight and the learning rate of the model to be trained;
and calculating the update weight of the model to be trained according to the current weight, the learning rate, the update gradient and the training sample data.
A model training apparatus, the apparatus comprising:
the model output acquisition module is used for acquiring model output data, and the model output data is obtained by inputting training sample data into a model to be trained;
the reference result acquisition module is used for acquiring the preset reference result data corresponding to the training sample data;
the current gradient determining module is used for determining the current gradient of the model output data according to the output difference between the model output data and the reference result data when the model to be trained does not meet the training completion condition;
the gradient updating processing module is used for performing gradient updating on the current gradient according to a preset Gaussian random rule to obtain an updating gradient of the model output data;
and the weight updating processing module is used for determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
obtaining model output data, wherein the model output data is obtained by inputting training sample data into a model to be trained;
acquiring preset reference result data corresponding to the training sample data;
when the model to be trained does not meet the training completion condition, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data;
according to a preset Gaussian random rule, performing gradient updating on the current gradient to obtain an updated gradient of the model output data;
and determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
obtaining model output data, wherein the model output data is obtained by inputting training sample data into a model to be trained;
acquiring preset reference result data corresponding to the training sample data;
when the model to be trained does not meet the training completion condition, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data;
according to a preset Gaussian random rule, performing gradient updating on the current gradient to obtain an updated gradient of the model output data;
and determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data.
According to the model training method and apparatus, computer device, and storage medium described above, when the model to be trained does not meet the training completion condition, the current gradient of the model output data is determined, the current gradient is updated according to a preset Gaussian random rule, the update weight of the model to be trained is determined based on the resulting update gradient, the model to be trained is updated according to the update weight, and training continues. During training, updating the gradient through the Gaussian random rule allows a gradient ascent step with small probability, preventing the gradient from reaching 0 too fast and getting trapped in a local optimum, thereby improving the effect of model training.
Drawings
FIG. 1 is a diagram illustrating an exemplary implementation of a model training method;
FIG. 2 is a schematic flow chart diagram of a model training method in one embodiment;
FIG. 3 is a schematic flow chart illustrating the determination of a current gradient in one embodiment;
FIG. 4 is a block diagram showing the structure of a model training apparatus according to an embodiment;
FIG. 5 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The model training method provided by the application can be applied to the application environment shown in FIG. 1, in which a terminal 102 communicates with a server 104 via a network. The terminal 102 sends training sample data to the server 104; the server 104 inputs the training sample data into a model to be trained stored on the server 104 to obtain model output data; when the model to be trained does not meet the training completion condition, the server determines the current gradient of the model output data, updates the current gradient according to a preset Gaussian random rule, determines an update weight of the model to be trained based on the resulting update gradient, updates the model to be trained according to the update weight, and continues training. Alternatively, the server 104 may obtain the training sample data directly from local storage, and the model to be trained may be constructed on the server 104 or obtained from another device, with the server 104 performing only the training process. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or a server cluster formed by multiple servers.
In one embodiment, as shown in FIG. 2, a model training method is provided. The method is described here as applied to the server in FIG. 1 and includes the following steps:
Step S201: obtaining model output data, wherein the model output data is obtained by inputting training sample data into the model to be trained.
The training sample data is model training input data constructed from the training sample set and used to train the network model. The model to be trained is the model undergoing training, such as a neural network model. The model output data is what the model to be trained outputs after the training sample data is fed into it.
Step S203: acquiring preset reference result data corresponding to the training sample data.
The reference result data is preset in correspondence with the training sample data and serves as the true result for that training sample data, i.e., the reference data that the network model output needs to fit.
Step S205: when the model to be trained does not meet the training completion condition, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data.
The training completion condition is the condition that decides whether training is finished. For example, if the number of training iterations of the model to be trained is insufficient, or the output accuracy of the model to be trained does not meet the requirement, the model is considered not to meet the training completion condition, i.e., training needs to continue, and the current gradient of the model output data is determined according to the output difference between the model output data and the reference result data. The output difference between the model output data and the reference result data reflects the gap between the current model and the target model. The gradient is a vector indicating the direction along which the directional derivative of a function at a point takes its maximum value, i.e., the direction in which the function changes fastest at that point; the current gradient is the gradient value at the data point given by the model output data. Specifically, when the output difference between the model output data and the reference result data is represented by a loss function, a gradient function can be derived from the loss function, from which the current gradient is obtained.
Step S207: performing a gradient update on the current gradient according to a preset Gaussian random rule to obtain the update gradient of the model output data.
After the current gradient is determined, it is updated based on the preset Gaussian random rule to obtain the update gradient of the model output data. The Gaussian random rule may consist of generating a Gaussian random number and deciding, according to that number, whether to update the current gradient; for example, whether the magnitude of the Gaussian random number satisfies a preset threshold determines whether to update the current gradient, and if so, the current gradient is updated with a preset gradient update parameter to obtain the update gradient of the model output data. The Gaussian random number introduces random interference into the current gradient, so that when the model to be trained approaches a local optimum, a small-probability gradient ascent step can still prevent the gradient from reaching 0 too fast and falling into a local optimal solution, thereby improving the effect of model training.
Step S209: determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data.
After the update gradient of the model output data is obtained, the update weight of the model to be trained is determined according to the update gradient. A weight is a parameter of a hidden layer in the model to be trained; by adjusting the weights, the model can be made to process its input differently and produce different outputs, which is how the model is adjusted. The update weight is the target value to be adjusted during training. After the update weight of the model to be trained is obtained, the model is updated according to it: for example, the current weight in the model is directly replaced by the update weight. The method then returns to the step of obtaining model output data for the next round of training, until the training of the model to be trained is completed.
According to the model training method above, when the model to be trained does not meet the training completion condition, the current gradient of the model output data is determined, the current gradient is updated according to a preset Gaussian random rule, the update weight of the model to be trained is determined based on the resulting update gradient, the model to be trained is updated according to the update weight, and training continues. During training, updating the gradient through the Gaussian random rule allows a gradient ascent step with small probability, preventing the gradient from reaching 0 too fast and getting trapped in a local optimum, thereby improving the effect of model training.
In one embodiment, after the preset reference result data corresponding to the training sample data is obtained, the method further includes: when it is determined, according to the model output data and the reference result data, that the model to be trained meets the training completion condition, or when the number of training iterations reaches a preset threshold, ending the training and taking the model to be trained as the trained target model.
In this embodiment, when the model to be trained is determined to be trained, training ends and a trained target model, such as a deep learning neural network model, is obtained. Specifically, after the preset reference result data corresponding to the training sample data is obtained, whether the model to be trained meets the training completion condition is determined according to the model output data and the reference result data; for example, the output accuracy is computed from the model output data and the reference result data, and if it meets a preset model accuracy threshold, the training completion condition is considered met, training ends, and the model to be trained serves as the trained target model. In addition, training is also terminated when the number of training iterations of the model to be trained reaches a preset threshold; for example, with a threshold of 100, training ends and the trained target model is obtained once 100 iterations are detected.
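For illustration only, a minimal sketch of these two stop conditions in Python (the accuracy measure and the 0.95 threshold are our assumptions; the disclosure only requires some accuracy criterion together with an iteration threshold such as 100):

```python
import numpy as np

def training_finished(model_output, reference, iteration,
                      accuracy_threshold=0.95, max_iterations=100):
    """Check the two stop conditions described above.

    The accuracy measure and its threshold are illustrative assumptions;
    the iteration cap mirrors the preset threshold of 100 in the text.
    """
    # Illustrative accuracy: 1 minus the error between the model output
    # data and the reference result data, normalized by the reference.
    error = np.linalg.norm(model_output - reference)
    scale = np.linalg.norm(reference) + 1e-12
    accuracy = 1.0 - error / scale
    return accuracy >= accuracy_threshold or iteration >= max_iterations
```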
In one embodiment, as shown in FIG. 3, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data includes:
step S301: and acquiring a preset loss function.
The loss function characterizes the output difference between the model output data and the reference result data and can be set flexibly according to actual requirements. For example, for real-valued vectors y1 and y2, y1 is the dependent variable corresponding to n training samples with input x, i.e., the reference result data corresponding to the training sample data, and y2 is the output of the model to be trained when x is its input, i.e., the model output data. The output difference between y1 and y2 can then be characterized by a loss function E(y1, y2), which may in particular be a Euclidean distance function.
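For concreteness, a minimal sketch of such a preset loss function, using the squared-error form E(y1, y2) = (y1 − y2)² that the worked example below adopts (the function name is ours):

```python
import numpy as np

def loss(y1, y2):
    """Preset loss function: squared-error (Euclidean-style) output difference."""
    return np.sum((y1 - y2) ** 2)
```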
Step S303: from the loss function, a gradient function of the model output data is determined.
After obtaining the loss function, the gradient function of the model output data is determined from the loss function. In the example above, the gradient function of the model output data y2 may be ∂E(y1, y2)/∂y2.
Step S305: obtaining the current gradient of the model output data according to the gradient function, the model output data, and the reference result data.
After the gradient function of the model output data is obtained, the current gradient of the model output data is obtained from the gradient function, the model output data, and the reference result data; specifically, the model output data and the reference result data can be substituted into the gradient function. As in the example above, substituting the specific values of y1 and y2 into the gradient function yields the current gradient of the model output data.
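As a sketch of this step under the same squared-error loss (names are ours), the gradient function and the substitution of specific values might look like:

```python
import numpy as np

def current_gradient(y2, y1):
    """Gradient of E(y1, y2) = (y1 - y2)^2 with respect to the model output y2."""
    return -2.0 * (y1 - y2)  # equivalently 2 * (y2 - y1)

# Substituting the specific values of the worked example below:
y1 = np.array([0.6, 0.71, 1.2])   # reference result data
y2 = np.array([0.25, 0.35, 0.5])  # model output data
g = current_gradient(y2, y1)      # current gradient of the model output data
```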
In one embodiment, performing a gradient update on the current gradient according to a preset Gaussian random rule to obtain the update gradient of the model output data includes: acquiring a Gaussian random number; and when the Gaussian random number meets a preset gradient updating condition, performing a gradient update on the current gradient with a preset gradient update parameter to obtain the update gradient of the model output data.
In this embodiment, the current gradient is updated by means of a generated Gaussian random number. Specifically, when the current gradient is to be updated, a Gaussian random number is obtained; it can be generated randomly under the conditions of a preset mean and a preset standard deviation. When the obtained Gaussian random number meets the gradient updating condition, the current gradient is updated with a preset gradient update parameter to obtain the update gradient of the model output data. In a specific application, the Gaussian random number k has mean μ and standard deviation σ, and the gradient updating condition is |k − μ| > 3σ; that is, when the obtained Gaussian random number k satisfies |k − μ| > 3σ, the condition is met, and the current gradient can be updated with a preset gradient update parameter, for example a parameter θ taking a value between 0 and 1, to obtain the update gradient of the model output data.
In one embodiment, the Gaussian random number is generated under the conditions of a preset mean and a preset standard deviation; the gradient updating condition is that the distance between the Gaussian random number and the preset mean exceeds three times the preset standard deviation; and the gradient update parameter takes a value in the range of 0 to 1.
In this embodiment, the Gaussian random number is generated under the conditions of a preset mean and a preset standard deviation; for example, the Gaussian random number k has mean μ and standard deviation σ. The gradient updating condition is that the distance between the Gaussian random number and the preset mean exceeds three times the preset standard deviation, i.e., |k − μ| > 3σ. The gradient update parameter θ takes a value between 0 and 1. Performing the gradient update on the current gradient g then gives the update gradient of the model output data, which may be written as g′ = −θ · g when |k − μ| > 3σ, and g′ = g otherwise. Introducing a Gaussian random number into the gradient update randomly perturbs the gradient, preventing it from reaching 0 too fast and falling into a local optimal solution, thereby improving the effect of model training.
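A minimal sketch of this Gaussian random rule follows; the sign reversal applied on the 3σ event is our reading of the worked example below (it is what realizes the small-probability gradient ascent), not an explicit formula in the text:

```python
import numpy as np

def gaussian_gradient_update(gradient, mu=0.0, sigma=1.0, theta=0.2, rng=None):
    """Update the current gradient according to the Gaussian random rule.

    A Gaussian random number k is drawn with preset mean mu and standard
    deviation sigma; only when |k - mu| > 3 * sigma (probability about
    0.27%) is the gradient rescaled by theta in (0, 1) and reversed.
    """
    if rng is None:
        rng = np.random.default_rng()
    k = rng.normal(mu, sigma)            # Gaussian random number
    if abs(k - mu) > 3.0 * sigma:        # gradient updating condition
        return -theta * gradient         # assumed small-probability ascent step
    return gradient                      # otherwise keep the current gradient
```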
In one embodiment, determining the update weight of the model to be trained according to the update gradient includes: and determining the updating weight of the model to be trained according to the updating gradient based on a gradient descent method.
In this embodiment, the update weight of the model to be trained is determined from the update gradient based on the gradient descent method. The goal of gradient descent is to find the minimum of an objective function. The gradient is a vector, and at a given point the objective function decreases fastest in the direction opposite to the gradient; a vivid metaphor is descending a mountain, where at each step one moves in the direction that descends fastest, i.e., opposite to the gradient, and each step corresponds to one iteration of gradient descent. Precisely because the performance functions, i.e., objective functions, of deep learning network models are almost all non-convex, directly seeking the optimal solution of the objective function by gradient descent rarely succeeds, and mostly yields a local optimum. In the present application, the Gaussian random number introduced to perturb gradient descent prevents the gradient from reaching 0 too fast and falling into a local optimal solution, yielding a better update weight and improving the effect of model training.
In one embodiment, based on the gradient descent method, determining the update weight of the model to be trained according to the update gradient includes: acquiring the current weight and the learning rate of the model to be trained; and calculating the update weight of the model to be trained according to the current weight, the learning rate, the update gradient and the training sample data.
Specifically, when the update weight of the model to be trained is determined by the gradient descent method, the current weight and the learning rate of the model to be trained are obtained. The current weight can be read from a hidden layer of the model to be trained, and the learning rate is preset before training; for example, the current weight w may have an initial value of 0.5, and the learning rate lr may be 0.01. The update weight is then calculated from the current weight, the learning rate, the update gradient, and the training sample data; for the model output y = x · w used in the example below, the update formula is w′ = w − lr · g′ · x, where g′ is the update gradient.
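A sketch of this update for the linear model y = x · w of the worked example below; since x and the update gradient are vectors there while w is a scalar, the product g′ · x is rendered as a dot product (our rendering, following the chain rule ∂E/∂w = (∂E/∂y2) · (∂y2/∂w)):

```python
import numpy as np

def update_weight(w, lr, update_gradient, x):
    """w' = w - lr * <update_gradient, x> for the linear model y2 = x * w.

    Since dy2/dw = x, the descent step projects the update gradient onto
    the training sample data x.
    """
    return w - lr * np.dot(update_gradient, x)
```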
In one embodiment, the model to be trained is an image recognition model for recognizing the class of an object in an image. The training sample data is x = [0.5, 0.7, 1.0], the reference result data is y1 = [0.6, 0.71, 1.2], the initialized current weight is w = 0.5, the loss function is defined as the Euclidean-distance (squared-error) form E(y1, y2) = (y1 − y2)², the learning rate is lr = 0.01, the number of iteration steps, i.e., the training-count threshold, is 100, and the gradient update parameter is θ = 0.2. After the training sample data x is input into the model to be trained, the model output data output by the model to be trained is y2 = [0.25, 0.35, 0.5]. With this loss function, the gradient corresponding to y2 is g = −2(y1 − y2). A Gaussian random number k with mean 0 and standard deviation 1 is then acquired. If the Gaussian random number satisfies |k| > 3.0, the gradient corresponding to y2 is updated according to the gradient update parameter θ as g′ = −θ · g; otherwise, the current gradient is kept unchanged. Since the model output of the model to be trained is y = x · w, we have ∂y/∂w = x, so the update formula for the update weight is w′ = w − lr · g′ · x. The update weight of the model to be trained is thus obtained, the model to be trained is updated according to the update weight, and the method returns to the step of obtaining model output data for the next update, until model training finishes, i.e., the number of training iterations reaches the 100 iteration steps.
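Putting the steps of this embodiment together, a minimal end-to-end sketch with the example's values (the sign reversal on the 3σ event remains our assumed reading; the fixed seed is only for reproducibility):

```python
import numpy as np

x = np.array([0.5, 0.7, 1.0])      # training sample data
y1 = np.array([0.6, 0.71, 1.2])    # reference result data
w, lr, theta = 0.5, 0.01, 0.2      # initial weight, learning rate, update parameter
rng = np.random.default_rng(0)

for step in range(100):            # iteration-step threshold of 100
    y2 = x * w                     # model output data of the linear model
    g = -2.0 * (y1 - y2)           # current gradient of E = (y1 - y2)^2 w.r.t. y2
    k = rng.normal(0.0, 1.0)       # Gaussian random number, mean 0, std 1
    if abs(k) > 3.0:               # gradient updating condition |k - mu| > 3*sigma
        g = -theta * g             # assumed gradient ascent with small probability
    w -= lr * np.dot(g, x)         # update weight from weight, lr, gradient, sample
print("trained weight:", w)
```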
It should be understood that although the steps in the flowcharts of FIGS. 2-3 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly ordered and may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or in alternation with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 4, there is provided a model training apparatus including: a model output obtaining module 401, a reference result obtaining module 403, a current gradient determining module 405, a gradient updating processing module 407, and a weight updating processing module 409, wherein:
a model output obtaining module 401, configured to obtain model output data, where the model output data is obtained by inputting training sample data into a model to be trained;
a reference result obtaining module 403, configured to obtain preset reference result data corresponding to the training sample data;
a current gradient determining module 405, configured to determine a current gradient of the model output data according to an output difference between the model output data and the reference result data when the model to be trained does not meet the training completion condition;
the gradient updating processing module 407 is configured to perform gradient updating on the current gradient according to a preset gaussian random rule to obtain an updated gradient of the model output data;
and the weight updating processing module 409 is used for determining an updating weight of the model to be trained according to the updating gradient, updating the model to be trained according to the updating weight, and returning to the step of obtaining the model output data.
In one embodiment, the apparatus further includes a training stopping module configured to, when it is determined according to the model output data and the reference result data that the model to be trained satisfies the training completion condition, or when the number of training iterations reaches a preset threshold, end training and take the model to be trained as the trained target model.
In one embodiment, the current gradient determination module 405 includes a loss function unit, a gradient function unit, and a gradient determination unit; wherein: the loss function unit is used for acquiring a preset loss function; the gradient function unit is used for determining a gradient function of the model output data according to the loss function; and the gradient determining unit is used for obtaining the current gradient of the model output data according to the gradient function, the model output data and the reference result data.
In one embodiment, the gradient update processing module 407 includes a random number acquisition unit and a gradient update unit; wherein: a random number acquisition unit for acquiring a gaussian random number; and the gradient updating unit is used for performing gradient updating on the current gradient through a preset gradient updating parameter to obtain an updating gradient of the model output data when the Gaussian random number meets a preset gradient updating condition.
In one embodiment, the gaussian random number is obtained under the conditions of a preset mean value and a preset standard deviation; the gradient updating condition comprises the distance between the Gaussian random number and a preset mean value, and the distance exceeds 3 times of a preset standard deviation; the value range of the gradient updating parameter is 0-1.
In one embodiment, the weight update processing module 409 includes a gradient descent processing unit, configured to determine an update weight of the model to be trained according to the update gradient based on a gradient descent method.
In one embodiment, the gradient descent processing unit comprises a parameter acquisition subunit and a weight value updating subunit; wherein: the parameter obtaining subunit is used for obtaining the current weight and the learning rate of the model to be trained; and the weight updating subunit is used for calculating the updating weight of the model to be trained according to the current weight, the learning rate, the updating gradient and the training sample data.
For specific limitations of the model training apparatus, reference may be made to the limitations of the model training method above, which are not repeated here. Each module in the model training apparatus can be realized wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in hardware in, or be independent of, the processor of the computer device, or be stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 5. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a model training method.
Those skilled in the art will appreciate that the architecture shown in FIG. 5 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program:
obtaining model output data, wherein the model output data is obtained by inputting training sample data into a model to be trained;
acquiring preset reference result data corresponding to the training sample data;
when the model to be trained does not meet the training completion condition, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data;
according to a preset Gaussian random rule, performing gradient updating on the current gradient to obtain an updated gradient of the model output data;
and determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: when it is determined, according to the model output data and the reference result data, that the model to be trained meets the training completion condition, or when the number of training iterations reaches a preset threshold, ending the training and taking the model to be trained as the trained target model.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a preset loss function; determining a gradient function of the model output data according to the loss function; and obtaining the current gradient of the model output data according to the gradient function, the model output data and the reference result data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a Gaussian random number; and when the Gaussian random number meets a preset gradient updating condition, performing gradient updating on the current gradient through a preset gradient updating parameter to obtain an updating gradient of the model output data.
In one embodiment, the processor, when executing the computer program, further performs the steps of: the Gaussian random number is generated under the conditions of a preset mean and a preset standard deviation; the gradient updating condition is that the distance between the Gaussian random number and the preset mean exceeds three times the preset standard deviation; and the gradient update parameter takes a value in the range of 0 to 1.
In one embodiment, the processor, when executing the computer program, further performs the steps of: and determining the updating weight of the model to be trained according to the updating gradient based on a gradient descent method.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring the current weight and the learning rate of the model to be trained; and calculating the update weight of the model to be trained according to the current weight, the learning rate, the update gradient and the training sample data.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
obtaining model output data, wherein the model output data is obtained by inputting training sample data into a model to be trained;
acquiring preset reference result data corresponding to the training sample data;
when the model to be trained does not meet the training completion condition, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data;
according to a preset Gaussian random rule, performing gradient updating on the current gradient to obtain an updated gradient of the model output data;
and determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data.
In one embodiment, the computer program when executed by the processor further performs the steps of: when it is determined, according to the model output data and the reference result data, that the model to be trained meets the training completion condition, or when the number of training iterations reaches a preset threshold, ending the training and taking the model to be trained as the trained target model.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a preset loss function; determining a gradient function of the model output data according to the loss function; and obtaining the current gradient of the model output data according to the gradient function, the model output data and the reference result data.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a Gaussian random number; and when the Gaussian random number meets a preset gradient updating condition, performing gradient updating on the current gradient through a preset gradient updating parameter to obtain an updating gradient of the model output data.
In one embodiment, the computer program when executed by the processor further performs the steps of: the Gaussian random number is generated under the conditions of a preset mean and a preset standard deviation; the gradient updating condition is that the distance between the Gaussian random number and the preset mean exceeds three times the preset standard deviation; and the gradient update parameter takes a value in the range of 0 to 1.
In one embodiment, the computer program when executed by the processor further performs the steps of: and determining the updating weight of the model to be trained according to the updating gradient based on a gradient descent method.
In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring the current weight and the learning rate of the model to be trained; and calculating the update weight of the model to be trained according to the current weight, the learning rate, the update gradient and the training sample data.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described, but as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of model training, the method comprising:
obtaining model output data, wherein the model output data is obtained by inputting training sample data into a model to be trained;
acquiring preset reference result data corresponding to the training sample data;
when the model to be trained does not meet the training completion condition, determining the current gradient of the model output data according to the output difference between the model output data and the reference result data;
according to a preset Gaussian random rule, performing gradient updating on the current gradient to obtain an updated gradient of the model output data;
and determining an update weight of the model to be trained according to the update gradient, updating the model to be trained according to the update weight, and returning to the step of obtaining model output data.
2. The method according to claim 1, further comprising, after the acquiring of the preset reference result data corresponding to the training sample data:
and when it is determined, according to the model output data and the reference result data, that the model to be trained meets a training completion condition, or when the number of training iterations reaches a preset threshold, ending training and taking the model to be trained as the trained target model.
3. The method of claim 1, wherein determining the current gradient of the model output data according to the output difference between the model output data and the reference result data comprises:
acquiring a preset loss function;
determining a gradient function of the model output data according to the loss function;
and obtaining the current gradient of the model output data according to the gradient function, the model output data and the reference result data.
4. The method of claim 1, wherein the performing a gradient update on the current gradient according to a preset gaussian random rule to obtain an updated gradient of the model output data comprises:
acquiring a Gaussian random number;
and when the Gaussian random number meets a preset gradient updating condition, performing gradient updating on the current gradient through a preset gradient updating parameter to obtain an updating gradient of the model output data.
5. The method according to claim 4, wherein the Gaussian random number is generated under the conditions of a preset mean and a preset standard deviation; the gradient updating condition is that the distance between the Gaussian random number and the preset mean exceeds three times the preset standard deviation; and the gradient update parameter takes a value in the range of 0 to 1.
6. The method according to any one of claims 1 to 5, wherein the determining the update weight of the model to be trained according to the update gradient comprises:
and determining the updating weight of the model to be trained according to the updating gradient based on a gradient descent method.
7. The method according to claim 6, wherein the determining the update weight of the model to be trained according to the update gradient based on the gradient descent method comprises:
acquiring the current weight and the learning rate of the model to be trained;
and calculating the updating weight of the model to be trained according to the current weight, the learning rate, the updating gradient and the training sample data.
8. A model training apparatus, the apparatus comprising:
the model output acquisition module is used for acquiring model output data, and the model output data is obtained by inputting training sample data into a model to be trained;
the reference result acquisition module is used for acquiring preset reference result data corresponding to the training sample data;
a current gradient determining module, configured to determine a current gradient of the model output data according to an output difference between the model output data and the reference result data when the model to be trained does not meet a training completion condition;
the gradient updating processing module is used for performing gradient updating on the current gradient according to a preset Gaussian random rule to obtain an updating gradient of the model output data;
and the weight updating processing module is used for determining an updating weight of the model to be trained according to the updating gradient, updating the model to be trained according to the updating weight and returning to the step of acquiring model output data.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201910875745.5A 2019-09-17 2019-09-17 Model training method, device, computer equipment and storage medium Active CN110751288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910875745.5A CN110751288B (en) 2019-09-17 2019-09-17 Model training method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910875745.5A CN110751288B (en) 2019-09-17 2019-09-17 Model training method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110751288A 2020-02-04
CN110751288B 2024-05-07

Family

ID=69276523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910875745.5A Active CN110751288B (en) 2019-09-17 2019-09-17 Model training method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110751288B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022011237A1 (en) * 2020-07-09 2022-01-13 Truera, Inc. System and method for evaluating machine learning model behavior over data segments

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163285A (en) * 2011-03-09 2011-08-24 北京航空航天大学 Cross-domain video semantic concept detection method based on active learning
CN105279554A (en) * 2015-09-29 2016-01-27 东方网力科技股份有限公司 Depth neural network training method and device based on Hash coding layer
CN106250931A (en) * 2016-08-03 2016-12-21 武汉大学 A kind of high-definition picture scene classification method based on random convolutional neural networks
CN108520202A (en) * 2018-03-15 2018-09-11 华南理工大学 Confrontation robustness image characteristic extracting method based on variation spherical projection
CN109063706A (en) * 2018-06-04 2018-12-21 平安科技(深圳)有限公司 Verbal model training method, character recognition method, device, equipment and medium
US20190050727A1 (en) * 2018-01-12 2019-02-14 Timothy Anderson Neural network training
CN109581339A (en) * 2018-11-16 2019-04-05 西安理工大学 A kind of sonar recognition methods based on brainstorming adjust automatically autoencoder network


Also Published As

Publication number Publication date
CN110751288B 2024-05-07

Similar Documents

Publication Publication Date Title
CN109472213B (en) Palm print recognition method and device, computer equipment and storage medium
CN109086711B (en) Face feature analysis method and device, computer equipment and storage medium
CN111666763A (en) Network structure construction method and device for multitask scene
CN109523014B (en) News comment automatic generation method and system based on generative confrontation network model
CN111562977B (en) Neural network model splitting method, device, storage medium and computer system
CN111783997B (en) Data processing method, device and equipment
JP6870508B2 (en) Learning programs, learning methods and learning devices
CN108764143B (en) Image processing method, image processing device, computer equipment and storage medium
CN113192175A (en) Model training method and device, computer equipment and readable storage medium
CN110046577B (en) Pedestrian attribute prediction method, device, computer equipment and storage medium
CN113762350A (en) Abnormal data detection method and device, computer equipment and storage medium
CN111832581B (en) Lung feature recognition method and device, computer equipment and storage medium
CN111860582B (en) Image classification model construction method and device, computer equipment and storage medium
CN111062324A (en) Face detection method and device, computer equipment and storage medium
CN112613435A (en) Face image generation method, device, equipment and medium
JP2018032340A (en) Attribute estimation device, attribute estimation method and attribute estimation program
CN110413994B (en) Hot topic generation method and device, computer equipment and storage medium
CN110751288A (en) Model training method and device, computer equipment and storage medium
Rao et al. A multi-view human gait recognition using hybrid whale and gray wolf optimization algorithm with a random forest classifier
CN112446462A (en) Generation method and device of target neural network model
CN112906554B (en) Model training optimization method and device based on visual image and related equipment
CN112417985A (en) Face feature point tracking method, system, electronic equipment and storage medium
CN110222752A (en) Image processing method, system, computer equipment, storage medium and chip
CN110263707B (en) Image recognition method and device, computer equipment and storage medium
CN112509154A (en) Training method of image generation model, image generation method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant