WO2022110640A1

WO2022110640A1 - Model optimization method and apparatus, computer device and storage medium

Info

Publication number: WO2022110640A1
Application number: PCT/CN2021/090501
Authority: WO
Inventors: 莫琪
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-11-27
Filing date: 2021-04-28
Publication date: 2022-06-02
Also published as: CN112488183A; CN112488183B

Abstract

Provided are a model optimization method and apparatus applied to gradient descent with momentum, and a computer device and a storage medium. The method comprises: receiving a model optimization request sent by a user terminal, the model optimization request at least carrying an original prediction model and an original training data set (S101); performing a sampling operation on the original training data set to obtain a current round of training data set (S102); defining a target function on the basis of the current round of training data set (S103); initializing model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter (S104); calculating gradient data corresponding to the initial decision parameter that needs to be updated in the current round (S105); determining whether the gradient data has been updated (S106); if the gradient data has not been updated, outputting a sampling abnormality signal (S107); if the gradient data has been updated, updating the initial speed parameter on the basis of the gradient data to obtain an updated speed (S108); updating the initial decision parameter on the basis of the updated speed to obtain an updated decision parameter (S109); and when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model (S110). The problem of over-fitting of an embedding layer caused by the use of historical momentum to update the words that have not been sampled in the current batch during training can be effectively avoided.

Description

A model optimization method, device, computer equipment and storage medium

This application is based on the Chinese invention patent application with the application number 202011359384.8 filed on November 27, 2020, entitled "A Model Optimization Method, Apparatus, Computer Equipment and Storage Medium", and claims its priority.

technical field

The present application relates to model optimization of artificial intelligence, and in particular, to a model optimization method, device, computer equipment and storage medium applied to momentum gradient descent.

Background technique

Optimization problem is one of the most important research directions in computational mathematics. In the field of deep learning, optimization algorithms are also one of the key links. Even with the same data set and model architecture, different optimization algorithms are likely to lead to different training results, and even some models do not converge.

There is an existing model optimization method. In the model training process of deep learning, an exponentially weighted moving average is used to train the model based on the momentum accumulated with historical gradients, so as to improve the accuracy of the model.

However, the applicant realizes that the applicant finds that the traditional model optimization method is generally unintelligent, and the Embedding layer may have an overfitting problem during the model optimization process.

SUMMARY OF THE INVENTION

The purpose of the embodiments of the present application is to propose a model optimization method, device, computer equipment and storage medium applied to momentum gradient descent, so as to solve the problem that the traditional model optimization method will overfit the Embedding layer during the model optimization process. .

In order to solve the above technical problems, the embodiments of the present application provide a model optimization method applied to momentum gradient descent, which adopts the following technical solutions:

receiving a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;

Perform a sampling operation in the original training data set to obtain the current round of training data sets;

Define an objective function based on the current round of training data sets;

Initializing the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision parameters;

The gradient data corresponding to the initial decision parameter needs to be updated in the calculation of the current round;

determine whether the gradient data has been updated;

If the gradient data is not updated, output a sampling abnormal signal;

If the gradient data has been updated, update the initial speed parameter based on the gradient data to obtain an update speed;

Update the initial decision parameters based on the update speed to obtain update decision parameters;

When the initial decision parameter and the updated decision parameter satisfy the convergence condition, a target prediction model is obtained.

In order to solve the above technical problems, the embodiment of the present application also provides a model optimization device applied to momentum gradient descent, which adopts the following technical solutions:

a request receiving module, configured to receive a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;

a sampling operation module, used for sampling operation in the original training data set to obtain the training data set of this round;

a function definition module for defining an objective function based on the current round of training data sets;

an initialization module for initializing the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision-making parameters;

a gradient calculation module, used to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in this round;

a gradient judgment module for judging whether the gradient data has been updated;

An abnormality confirmation module, used for outputting a sampling abnormality signal if the gradient data is not updated;

a speed parameter update module, configured to update the initial speed parameter based on the gradient data to obtain an update speed if the gradient data has been updated;

a decision parameter update module, configured to update the initial decision parameter based on the update speed to obtain an update decision parameter;

A target model obtaining module, configured to obtain a target prediction model when the initial decision parameters and the updated decision parameters satisfy a convergence condition.

In order to solve the above-mentioned technical problems, the embodiment of the present application also provides a computer device, which adopts the following technical solutions:

It includes a memory and a processor, the memory stores computer-readable instructions, and when the processor executes the computer-readable instructions, the processor implements the steps of the model optimization method applied to the momentum gradient descent as described below:

Define an objective function based on the current round of training data sets;

determine whether the gradient data has been updated;

If the gradient data is not updated, output a sampling abnormal signal;

In order to solve the above technical problems, the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:

The computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, implements the steps of the model optimization method applied to the momentum gradient descent as described below:

Define an objective function based on the current round of training data sets;

determine whether the gradient data has been updated;

If the gradient data is not updated, output a sampling abnormal signal;

Compared with the prior art, the model optimization method, device, computer equipment and storage medium applied to momentum gradient descent provided by the embodiments of the present application mainly have the following beneficial effects:

The present application provides a model optimization method applied to momentum gradient descent, which receives a model optimization request sent by a user terminal, where the model optimization request at least carries an original prediction model and an original training data set; Sampling operation to obtain the current round of training data sets; define an objective function based on the current round of training data sets; initialize model optimization algorithm parameters to obtain initial speed parameters and initial decision parameters; calculate the gradient corresponding to the initial decision parameters that need to be updated in this round data; determine whether the gradient data has been updated; if the gradient data has not been updated, output a sampling abnormal signal; if the gradient data has been updated, update the initial speed parameter based on the gradient data to obtain an update speed; The initial decision parameter is updated based on the update speed to obtain an updated decision parameter; when the initial decision parameter and the updated decision parameter satisfy a convergence condition, a target prediction model is obtained. During the training process of stochastic gradient descent with momentum, the training data of the current round is not sampled, and the gradient update of this round will still use historical momentum to update, which may lead to overfitting of the Embedding layer. Before the gradient, by confirming whether the gradient data has been updated, the training data of this round is confirmed to be sampled, and then the gradient update operation is performed, thereby effectively avoiding the words that have not been sampled in the current batch during training, and the history will still be used. Momentum to update causes the problem of overfitting of the Embedding layer.

Description of drawings

In order to illustrate the solutions in the present application more clearly, the following will briefly introduce the accompanying drawings used in the description of the embodiments of the present application. For those of ordinary skill, other drawings can also be obtained from these drawings without any creative effort.

Fig. 1 is the realization flow chart of the model optimization method applied to momentum gradient descent provided by the first embodiment of the present application;

Fig. 2 is the realization flow chart of step S103 in Fig. 1;

Fig. 3 is the realization flow chart of step S110 in Fig. 1;

4 is a schematic structural diagram of a model optimization device applied to momentum gradient descent provided by Embodiment 2 of the present application;

Fig. 5 is the structural representation of function definition module 103 in Fig. 4;

FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed ways

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field of this application; the terms used herein in the specification of the application are for the purpose of describing specific embodiments only It is not intended to limit the application; the terms "comprising" and "having" and any variations thereof in the description and claims of this application and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the description and claims of the present application or the above drawings are used to distinguish different objects, rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor a separate or alternative embodiment that is mutually exclusive of other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.

In order to make those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.

Example 1

As shown in FIG. 1 , it shows the implementation flow chart of the model optimization method applied to the momentum gradient descent provided according to the first embodiment of the present application. For the convenience of description, only the part related to the present application is shown.

In step S101, a model optimization request sent by a user terminal is received, where the model optimization request at least carries the original prediction model and the original training data set.

In this embodiment of the present application, a user terminal refers to a terminal device used to execute the image processing method for preventing credential abuse provided by the present application, and the current terminal may be, for example, a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, Mobile terminals such as PDAs (Personal Digital Assistants), PADs (Tablet Computers), PMPs (Portable Multimedia Players), navigation devices, etc., as well as stationary terminals such as digital TVs, desktop computers, etc. The examples are only for the convenience of understanding, and are not used to limit the present application.

In the embodiment of the present application, the original prediction model is not a prediction model optimized by gradient descent.

In step S102, a sampling operation is performed in the original training data set to obtain the current round of training data sets.

In the embodiment of the present application, the sampling operation refers to the process of extracting individuals or samples from the overall training data, that is, the process of performing experiments or observations on the overall training data. There are two types of random sampling and non-random sampling. The former refers to a sampling method that draws samples from the population in accordance with the principle of randomization, without any subjectivity, including simple random sampling, systematic sampling, cluster sampling and stratified sampling. The latter is a method of extracting samples based on the researcher's point of view, experience or related knowledge, with obvious subjective color.

In the embodiment of the present application, the training data set of the current round refers to a training data set with a small amount of data selected after the above sampling operation, so as to reduce the training time of the model.

In step S103, an objective function is defined based on the current round of training data sets.

In this embodiment of the present application, a user-text matrix R may be generated based on a data set of user texts, and the user-text matrix R may be decomposed based on the singular value decomposition method to obtain a user-hidden feature matrix P and a latent feature-text matrix Q. , construct the objective function based on the user-text matrix R

objective function

Expressed as:

Among them, R ^(Λ) represents the user-text matrix R user's scoring data set of text; p _m` represents the latent feature corresponding to the mth user in the user-hidden feature matrix P; q _n` represents the latent feature-text matrix Q The hidden feature corresponding to the n-th text in ; r _m,n represents the rating data of user m for text n;

Represents the rating data of user m to text n in the rating data set; λ ₂ represents the regularization factor of the latent feature matrix.

In step S104, the model optimization parameters of the original prediction model are initialized to obtain initial speed parameters and initial decision parameters.

In the embodiment of the present application, initialization is to assign a variable to a default value and a control to a default state. Specifically, it includes an initialization learning rate ∈, a momentum parameter α, an initial decision parameter θ, and an initial velocity v.

In step S105, the gradient data corresponding to the initial decision parameters that need to be updated in the current round is calculated.

In the embodiment of the present application, the gradient data is expressed as:

Among them, g represents the gradient data; m represents the total number of training data in this round; θ represents the initial decision parameter; x ⁽ⁱ⁾ represents the i-th training data in this round;

represents the objective function.

In step S106, it is determined whether the gradient data has been updated.

In the embodiment of the present application, after a training data is sampled, the gradient of its Embedding is not 0. Based on the characteristics of the sampling, it is possible to know whether the training data has been sampled by judging whether the gradient data has been updated.

In step S107, if the gradient data is not updated, a sampling abnormal signal is output.

In the embodiment of the present application, if the gradient data has not been updated, it means that the training data has not been sampled before performing subsequent update operations, and there is no training data that has been repeatedly sampled, and the corresponding Embedding layer will also be repeatedly trained based on historical momentum. Update, resulting in overfitting.

In step S108, if the gradient data has been updated, the initial speed parameter is updated based on the gradient data to obtain the update speed.

In the embodiment of the present application, the update speed is expressed as:

v _new = αv _old -∈g

Among them, v _new represents the update speed; v _old represents the initial speed parameter; α represents the momentum parameter; ∈ represents the learning rate; g represents the gradient data.

In step S109, the initial decision parameter is updated based on the update speed to obtain the update decision parameter.

In the embodiment of the present application, the update decision parameter is expressed as:

θ _new = θ _old +v _new

Among them, θ _new represents the update decision parameter; θ _old represents the initial decision parameter; v _new represents the update speed.

In step S110, when the initial decision parameters and the updated decision parameters satisfy the convergence condition, a target prediction model is obtained.

The model optimization method applied to momentum gradient descent provided by the first embodiment of the present application receives a model optimization request sent by a user terminal, and the model optimization request carries at least the original prediction model and the original training data set; the sampling operation is performed in the original training data set, Obtain the training data set of this round; define the objective function based on the training data set of this round; initialize the parameters of the model optimization algorithm to obtain the initial speed parameters and initial decision parameters; calculate the gradient data corresponding to the initial decision parameters that need to be updated in this round; judge whether the gradient data has been Update; if the gradient data is not updated, the sampling abnormal signal is output; if the gradient data has been updated, the initial speed parameter is updated based on the gradient data to obtain the update speed; the initial decision parameter is updated based on the update speed, and the updated decision parameter is obtained; when the initial decision parameter And when the updated decision parameters meet the convergence conditions, the target prediction model is obtained. During the training process of stochastic gradient descent with momentum, the training data of the current round is not sampled, and the gradient update of this round will still use historical momentum to update, which may lead to overfitting of the Embedding layer. Before the gradient, confirm whether the gradient data has been updated, so as to confirm that the training data of this round is definitely sampled, and then perform the gradient update operation, thereby effectively avoiding the words that have not been sampled in the current batch during training, and still use the history. Momentum to update causes the problem of overfitting of the Embedding layer.

Continuing to refer to FIG. 2 , a flowchart of the implementation of step S103 in FIG. 1 is shown. For the convenience of description, only the parts related to the present application are shown.

In some optional implementation manners of Embodiment 1 of the present application, the foregoing step S103 specifically includes: step S201 , step S202 and step S203 .

In step S201, a user-text matrix R is generated based on a data set of user texts.

In step S202, the user-text matrix R is decomposed based on the singular value decomposition method to obtain the user-hidden feature matrix P and the latent feature-text matrix Q.

In the embodiments of the present application, singular value decomposition (Singular Value Decomposition) is an important matrix decomposition in linear algebra, and singular value decomposition is a generalization of eigen decomposition on any matrix.

In step S203, an objective function is constructed based on the user-text matrix R.

In this embodiment of the present application, the objective function

Expressed as:

Continuing to refer to FIG. 3 , a flowchart of the implementation of step S110 in FIG. 1 is shown. For the convenience of description, only the parts related to the present application are shown.

In some optional implementation manners of Embodiment 1 of the present application, the foregoing step S110 specifically includes: step S301 , step S302 , step S303 and step S304 .

In step S301, the initial decision parameter and the decision parameter difference of the updated decision parameter are calculated.

In the embodiment of the present application, the difference value of the decision parameter is mainly used to judge the change amount of the current model parameter and the model parameter of the previous round. When the change amount is less than a certain value, it is considered that the decision parameter tends to a certain stable value, so that the The predictive model reaches stability.

In step S302, it is determined whether the decision parameter difference is smaller than a preset convergence threshold.

In this embodiment of the present application, the user can adjust the preset convergence threshold according to the actual situation.

In step S303, if the decision parameter difference is less than or equal to the preset convergence threshold, it is determined that the current prediction model is converged, and the current prediction model is used as the target prediction model.

In the embodiment of the present application, when the decision parameter difference is less than or equal to the preset convergence threshold, it means that the decision parameter tends to a certain stable value, and the prediction model is stable.

In step S304, if the difference of the decision parameters is greater than the preset convergence threshold, it is determined that the current prediction model has not converged, and the parameter optimization operation is continued.

In the embodiment of the present application, when the difference of the decision parameters is greater than the preset convergence threshold, it means that the decision parameters have not reached a certain stable value, and the parameters of the prediction model still need to be optimized.

In some optional implementations of Embodiment 1 of the present application, the gradient data is represented as:

represents the objective function.

In some optional implementation manners of Embodiment 1 of the present application, the update speed is expressed as:

v _new = αv _old -∈g

In some optional implementations of Embodiment 1 of the present application, the update decision parameter is expressed as:

θ _new = θ _old +v _new

To sum up, the model optimization method applied to momentum gradient descent provided by the first embodiment of the present application receives a model optimization request sent by a user terminal, and the model optimization request at least carries the original prediction model and the original training data set; Sampling operation to obtain the training data set of this round; define the objective function based on the training data set of this round; initialize the parameters of the model optimization algorithm to obtain the initial speed parameters and initial decision parameters; calculate the gradient data corresponding to the initial decision parameters that need to be updated in this round; judge the gradient Whether the data has been updated; if the gradient data has not been updated, output a sampling abnormal signal; if the gradient data has been updated, update the initial speed parameter based on the gradient data to obtain the update speed; update the initial decision parameter based on the update speed to obtain the update decision parameter; when When the initial decision parameters and the updated decision parameters satisfy the convergence conditions, the target prediction model is obtained. During the training process of stochastic gradient descent with momentum, the training data of the current round is not sampled, and the gradient update of this round will still use historical momentum to update, which may lead to overfitting of the Embedding layer. Before the gradient, by confirming whether the gradient data has been updated, the training data of this round is confirmed to be sampled, and then the gradient update operation is performed, thereby effectively avoiding the words that have not been sampled in the current batch during training, and the history will still be used. Momentum to update causes the problem of overfitting of the Embedding layer.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through computer-readable instructions, and the computer-readable instructions can be stored in a computer-readable storage medium. , when the computer-readable instructions are executed, the processes of the above-mentioned method embodiments may be included. Wherein, the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.

It should be understood that although the various steps in the flowchart of the accompanying drawings are sequentially shown in the order indicated by the arrows, these steps are not necessarily executed in sequence in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order and may be performed in other orders. Moreover, at least a part of the steps in the flowcharts of the accompanying drawings may include multiple sub-steps or multiple stages, and these sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and the execution sequence is also It does not have to be performed sequentially, but may be performed alternately or alternately with other steps or at least a portion of sub-steps or stages of other steps.

Embodiment 2

Further referring to FIG. 4 , as an implementation of the method shown in FIG. 1 , the present application provides an embodiment of a model optimization device applied to momentum gradient descent, which is similar to the method embodiment shown in FIG. 1 . Correspondingly, the apparatus can be specifically applied to various electronic devices.

As shown in FIG. 4 , the model optimization device 100 applied to momentum gradient descent in this embodiment includes: a request receiving module 101 , a sampling operation module 102 , a function definition module 103 , an initialization module 104 , a gradient calculation module 105 , and a gradient judgment module 106 , an abnormality confirmation module 107 , a speed parameter update module 108 , a decision parameter update module 109 and a target model acquisition module 110 . in:

a request receiving module 101, configured to receive a model optimization request sent by a user terminal, where the model optimization request at least carries the original prediction model and the original training data set;

The sampling operation module 102 is used to perform sampling operation in the original training data set to obtain the training data set of this round;

The function definition module 103 is used to define an objective function based on the current round of training data sets;

The initialization module 104 is used to initialize the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision parameters;

The gradient calculation module 105 is used to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in this round;

The gradient judgment module 106 is used for judging whether the gradient data has been updated;

An abnormality confirmation module 107, configured to output a sampling abnormality signal if the gradient data is not updated;

a speed parameter update module 108, configured to update the initial speed parameter based on the gradient data to obtain the update speed if the gradient data has been updated;

A decision parameter update module 109, configured to update the initial decision parameter based on the update speed to obtain the updated decision parameter;

The target model obtaining module 110 is configured to obtain the target prediction model when the initial decision parameters and the updated decision parameters satisfy the convergence condition.

objective function

Expressed as:

represents the objective function.

In the embodiment of the present application, when a training data is sampled, the gradient of its Embedding is not 0. Based on the characteristics of the sampling, it can be known whether the training data has been sampled by judging whether the gradient data has been updated.

In the embodiment of the present application, the update speed is expressed as:

v _new = αv _old -∈g

θ _new = θ _old +v _new

In the model optimization device applied to momentum gradient descent provided by the second embodiment of the present application, since the stochastic gradient descent with momentum is in the training process, the training data of the current round has not been sampled, and the gradient update of this round will still use the history Momentum is used to update, which may lead to overfitting of the Embedding layer. Before updating the gradient, this application confirms whether the gradient data has been updated to confirm that the training data of this round is definitely sampled, and then performs the gradient update operation, thereby effectively avoiding Words that have not been sampled in the current batch during training will still use historical momentum to update the problem of overfitting the Embedding layer.

Continuing to refer to FIG. 5 , a schematic structural diagram of the function definition module 103 in FIG. 4 is shown. For the convenience of description, only the parts related to the present application are shown.

In some optional implementations of Embodiment 1 of the present application, the function definition module 103 specifically includes: a matrix generation submodule 1031 , a matrix decomposition submodule 1032 , and a function construction submodule 1033 . in:

a matrix generation submodule 1031, configured to generate a user-text matrix based on a data set of user texts;

The matrix decomposition submodule 1032 is configured to perform a decomposition operation on the user-text matrix based on the singular value decomposition method to obtain the user-hidden feature matrix and the latent feature-text matrix;

The function construction sub-module 1033 is used to construct an objective function based on the user-text matrix.

In this embodiment of the present application, the objective function

Expressed as:

In some optional implementations of the second embodiment of the present application, the gradient data is represented as:

represents the objective function.

In some optional implementations of the second embodiment of the present application, the update speed is expressed as:

v _new = αv _old -∈g

In some optional implementations of the second embodiment of the present application, the update decision parameter is expressed as:

θ _new = θ _old +v _new

In some implementations of the second embodiment of the present application, the target model obtaining module 110 specifically includes: a difference calculation submodule, a convergence judgment submodule, a convergence confirmation submodule, and a non-convergence confirmation submodule. in:

a difference calculation submodule, configured to calculate the difference between the initial decision parameter and the decision parameter of the updated decision parameter;

a convergence judgment submodule, configured to judge whether the decision parameter difference is less than the preset convergence threshold;

a convergence confirmation submodule, configured to determine that the current prediction model is converged if the decision parameter difference is less than or equal to the preset convergence threshold, and use the current prediction model as the target prediction model;

The non-convergence confirmation sub-module is configured to determine that the current prediction model is not converged and continue to perform the parameter optimization operation if the decision parameter difference is greater than the preset convergence threshold.

To sum up, the model optimization device applied to momentum gradient descent provided by the second embodiment of the present application includes: a request receiving module, configured to receive a model optimization request sent by a user terminal, where the model optimization request at least carries the original prediction model and original training data set; the sampling operation module is used to perform sampling operation in the original training data set to obtain the current round of training data set; the function definition module is used to define the objective function based on the current round of training data set; the initialization module is used to initialize the original prediction model The model optimizes the parameters to obtain the initial speed parameters and the initial decision parameters; the gradient calculation module is used to calculate the gradient data corresponding to the initial decision parameters that need to be updated in this round; the gradient judgment module is used to judge whether the gradient data has been updated; the abnormal confirmation module is used to If the gradient data has not been updated, the sampling abnormal signal will be output; the speed parameter update module is used to update the initial speed parameter based on the gradient data if the gradient data has been updated to obtain the update speed; the decision parameter update module is used to update based on the update speed. The initial decision parameters are used to obtain the updated decision parameters; the target model acquisition module is used to obtain the target prediction model when the initial decision parameters and the updated decision parameters satisfy the convergence condition. During the training process of stochastic gradient descent with momentum, the training data of the current round is not sampled, and the gradient update of this round will still use historical momentum to update, which may lead to overfitting of the Embedding layer. Before the gradient, by confirming whether the gradient data has been updated, the training data of this round is confirmed to be sampled, and then the gradient update operation is performed, thereby effectively avoiding the words that have not been sampled in the current batch during training, and the history will still be used. Momentum to update causes the problem of overfitting of the Embedding layer.

To solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 6 for details. FIG. 6 is a block diagram of the basic structure of a computer device according to this embodiment.

The computer device 200 includes a memory 210 , a processor 220 , and a network interface 230 that communicate with each other through a system bus. It should be noted that only the computer device 200 with components 210-230 is shown in the figure, but it should be understood that implementation of all of the shown components is not required, and more or less components may be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to pre-set or stored instructions, and its hardware includes but is not limited to microprocessors, special-purpose Integrated circuit (Application Specific Integrated Circuit, ASIC), programmable gate array (Field-Programmable Gate Array, FPGA), digital processor (Digital Signal Processor, DSP), embedded equipment, etc.

The computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment. The computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.

The memory 210 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (eg, SD or DX memory, etc.), random access memory (RAM), static Random Access Memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), magnetic memory, magnetic disks, optical disks, etc., the computer readable storage Media can be non-volatile or volatile. In some embodiments, the memory 210 may be an internal storage unit of the computer device 200 , such as a hard disk or a memory of the computer device 200 . In other embodiments, the memory 210 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc. Of course, the memory 210 may also include both the internal storage unit of the computer device 200 and its external storage device. In this embodiment, the memory 210 is generally used to store the operating system and various application software installed in the computer device 200, such as computer-readable instructions applied to the model optimization method of momentum gradient descent. In addition, the memory 210 can also be used to temporarily store various types of data that have been output or will be output.

The processor 220 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments. The processor 220 is typically used to control the overall operation of the computer device 200 . In this embodiment, the processor 220 is configured to execute the computer-readable instructions stored in the memory 210 or process data, for example, the computer-readable instructions for executing the model optimization method applied to momentum gradient descent.

The network interface 230 may include a wireless network interface or a wired network interface, and the network interface 230 is generally used to establish a communication connection between the computer device 200 and other electronic devices.

In the model optimization method applied to momentum gradient descent provided by this application, since the stochastic gradient descent with momentum is in the training process, the training data of the current round is not sampled, and the gradient update of this round will still use historical momentum to update , which may lead to overfitting of the Embedding layer. Before updating the gradient, this application confirms whether the gradient data has been updated to confirm that the training data of this round is definitely sampled, and then performs the gradient update operation, thereby effectively avoiding the training process. Words that have not been sampled in the current batch will still use historical momentum to update the problem of overfitting of the Embedding layer.

The present application also provides another embodiment, that is, to provide a computer-readable storage medium, where the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor to The at least one processor is caused to perform the steps of the model optimization method applied to momentum gradient descent as described above.

From the description of the above embodiments, those skilled in the art can clearly understand that the method of the above embodiment can be implemented by means of software plus a necessary general hardware platform, and of course can also be implemented by hardware, but in many cases the former is better implementation. Based on this understanding, the technical solution of the present application can be embodied in the form of a software product in essence or in a part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, CD-ROM), including several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of this application.

Obviously, the above-described embodiments are only a part of the embodiments of the present application, rather than all of the embodiments. The accompanying drawings show the preferred embodiments of the present application, but do not limit the scope of the patent of the present application. This application may be embodied in many different forms, rather these embodiments are provided so that a thorough and complete understanding of the disclosure of this application is provided. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or perform equivalent replacements for some of the technical features. . Any equivalent structure made by using the contents of the description and drawings of the present application, which is directly or indirectly used in other related technical fields, is also within the scope of protection of the patent of the present application.

Claims

A model optimization method applied to momentum gradient descent, which includes the following steps:

receiving a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;

Perform a sampling operation in the original training data set to obtain the current round of training data sets;

Define an objective function based on the current round of training data sets;

Initializing the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision parameters;

The gradient data corresponding to the initial decision parameter needs to be updated in the calculation of the current round;

determine whether the gradient data has been updated;

If the gradient data is not updated, output a sampling abnormal signal;

If the gradient data has been updated, update the initial speed parameter based on the gradient data to obtain an update speed;

Update the initial decision parameters based on the update speed to obtain update decision parameters;

When the initial decision parameter and the updated decision parameter satisfy the convergence condition, a target prediction model is obtained.
The model optimization method applied to momentum gradient descent according to claim 1, wherein the current round of training data set includes a data set of user text, and the step of defining an objective function based on the current round of training data set, specifically include:

generating a user-text matrix based on the dataset of user texts;

Decomposing the user-text matrix based on the singular value decomposition method to obtain a user-hidden feature matrix and a latent feature-text matrix;

The objective function is constructed based on the user-text matrix, and the objective function RSE R(Λ) is expressed as:

Among them, R (Λ) represents the user-text matrix R user's scoring data set of text; p m` represents the latent feature corresponding to the mth user in the user-hidden feature matrix P; q n` represents the latent feature-text matrix Q The hidden feature corresponding to the n-th text in ; r m,n represents the rating data of user m for text n;
Represents the rating data of user m to text n in the rating data set; λ 2 represents the regularization factor of the latent feature matrix.
The model optimization method applied to momentum gradient descent according to claim 1, wherein the gradient data is expressed as:

Among them, g represents the gradient data; m represents the total number of the current round of training data; θ represents the initial decision parameter; x (i) represents the i-th current round of training data; RSE R(Λ) represents the objective function.
The model optimization method applied to momentum gradient descent according to claim 3, wherein the update speed is expressed as:

v new = αv old -∈g

Wherein, v new represents the update speed; v old represents the initial speed parameter; α represents the momentum parameter; ∈ represents the learning rate; g represents the gradient data.
The model optimization method applied to momentum gradient descent according to claim 1, wherein the update decision parameter is expressed as:

θ new = θ old +v new

Wherein, θ new represents the update decision parameter; θ old represents the initial decision parameter; v new represents the update speed.
The model optimization method applied to momentum gradient descent according to claim 5, wherein the convergence condition is a preset convergence threshold; the target is obtained when the initial decision parameter and the updated decision parameter satisfy the convergence condition The steps of the prediction model, including:

calculating the decision parameter difference between the initial decision parameter and the updated decision parameter;

judging whether the decision parameter difference is less than the preset convergence threshold;

If the decision parameter difference is less than or equal to the preset convergence threshold, determine that the current prediction model is converged, and use the current prediction model as the target prediction model;

If the decision parameter difference is greater than the preset convergence threshold, it is determined that the current prediction model has not converged, and the parameter optimization operation is continued.
A model optimization device applied to momentum gradient descent, comprising:

a request receiving module, configured to receive a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;

a sampling operation module, used for sampling operation in the original training data set to obtain the training data set of this round;

a function definition module for defining an objective function based on the current round of training data sets;

an initialization module for initializing the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision parameters;

a gradient calculation module, used to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in this round;

a gradient judgment module for judging whether the gradient data has been updated;

An abnormality confirmation module, used for outputting a sampling abnormality signal if the gradient data is not updated;

a speed parameter update module, configured to update the initial speed parameter based on the gradient data to obtain an update speed if the gradient data has been updated;

a decision parameter update module, configured to update the initial decision parameter based on the update speed to obtain an update decision parameter;

A target model obtaining module, configured to obtain a target prediction model when the initial decision parameters and the updated decision parameters satisfy a convergence condition.
The model optimization device applied to momentum gradient descent according to claim 7, wherein the function definition module comprises:

a matrix generation submodule for generating a user-text matrix based on the data set of the user text;

a matrix decomposition submodule, configured to perform a decomposition operation on the user-text matrix based on the singular value decomposition method to obtain a user-hidden feature matrix and a latent feature-text matrix;

A function construction submodule for constructing an objective function based on the user-text matrix, and the objective function RSER (Λ) is expressed as:

Among them, R (Λ) represents the user-text matrix R user's scoring data set of text; p m` represents the latent feature corresponding to the mth user in the user-hidden feature matrix P; q n` represents the latent feature-text matrix Q The hidden feature corresponding to the n-th text in ; r m, n represents the rating data of user m for text n;
Represents the rating data of user m to text n in the rating data set; λ 2 represents the regularization factor of the latent feature matrix.
A computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the processor implements the steps of a model optimization method applied to momentum gradient descent as described below :

receiving a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;

Perform a sampling operation in the original training data set to obtain the current round of training data sets;

Define an objective function based on the current round of training data sets;

Initializing the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision parameters;

The gradient data corresponding to the initial decision parameter needs to be updated in the calculation of the current round;

determine whether the gradient data has been updated;

If the gradient data is not updated, output a sampling abnormal signal;

If the gradient data has been updated, update the initial speed parameter based on the gradient data to obtain an update speed;

Update the initial decision parameters based on the update speed to obtain update decision parameters;

When the initial decision parameter and the updated decision parameter satisfy the convergence condition, a target prediction model is obtained.
The computer device according to claim 9, wherein the current round of training data set includes a data set of user text, and the step of defining an objective function based on the current round of training data set specifically includes:

generating a user-text matrix based on the dataset of user texts;

Decomposing the user-text matrix based on the singular value decomposition method to obtain a user-hidden feature matrix and a latent feature-text matrix;

The objective function is constructed based on the user-text matrix, and the objective function RSE R(Λ) is expressed as:

Among them, R (Λ) represents the user-text matrix R user's scoring data set of text; p m` represents the latent feature corresponding to the mth user in the user-hidden feature matrix P; q n` represents the latent feature-text matrix Q The hidden feature corresponding to the n-th text in ; r m,n represents the rating data of user m for text n;
Represents the rating data of user m to text n in the rating data set; λ 2 represents the regularization factor of the latent feature matrix.
The computer device of claim 9, wherein the gradient data is represented as:

Among them, g represents the gradient data; m represents the total number of the current round of training data; θ represents the initial decision parameter; x (i) represents the i-th current round of training data; RSE R(Λ) represents the objective function.
The computer device of claim 11, wherein the update rate is expressed as:

v new = αv old -∈g

Wherein, v new represents the update speed; v old represents the initial speed parameter; α represents the momentum parameter; ∈ represents the learning rate; g represents the gradient data.
The computer device according to claim 9, wherein the update decision parameter is expressed as:

θ new = θ old +v new

Wherein, θ new represents the update decision parameter; θ old represents the initial decision parameter; v new represents the update speed.
The computer device according to claim 13, wherein the convergence condition is a preset convergence threshold; the step of obtaining the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition, specifically comprises: :

calculating the decision parameter difference between the initial decision parameter and the updated decision parameter;

judging whether the decision parameter difference is less than the preset convergence threshold;

If the decision parameter difference is less than or equal to the preset convergence threshold, determine that the current prediction model is converged, and use the current prediction model as the target prediction model;

If the decision parameter difference is greater than the preset convergence threshold, it is determined that the current prediction model has not converged, and the parameter optimization operation is continued.
A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following model optimization method applied to momentum gradient descent is implemented A step of:

receiving a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;

Perform a sampling operation in the original training data set to obtain the current round of training data sets;

Define an objective function based on the current round of training data sets;

Initializing the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision parameters;

The gradient data corresponding to the initial decision parameter needs to be updated in the calculation of the current round;

determine whether the gradient data has been updated;

If the gradient data is not updated, output a sampling abnormal signal;

If the gradient data has been updated, update the initial speed parameter based on the gradient data to obtain an update speed;

Update the initial decision parameters based on the update speed to obtain update decision parameters;

When the initial decision parameter and the updated decision parameter satisfy the convergence condition, a target prediction model is obtained.
The computer-readable storage medium according to claim 15, wherein the current round of training data set includes a data set of user text, and the step of defining an objective function based on the current round of training data set specifically includes:

generating a user-text matrix based on the dataset of user texts;

Decomposing the user-text matrix based on the singular value decomposition method to obtain a user-hidden feature matrix and a latent feature-text matrix;

The objective function is constructed based on the user-text matrix, and the objective function RSE R(Λ) is expressed as:

Among them, R (Λ) represents the user-text matrix R user's scoring data set of text; p m` represents the latent feature corresponding to the mth user in the user-hidden feature matrix P; q n` represents the latent feature-text matrix Q The hidden feature corresponding to the n-th text in ; r m, n represents the rating data of user m for text n;
Represents the rating data of user m to text n in the rating data set; λ 2 represents the regularization factor of the latent feature matrix.
The computer-readable storage medium of claim 15, wherein the gradient data is represented as:

Among them, g represents the gradient data; m represents the total number of the current round of training data; θ represents the initial decision parameter; x (i) represents the i-th current round of training data; RSE R(Λ) represents the objective function.
The computer-readable storage medium of claim 17, wherein the update rate is expressed as:

v new = αv old -∈g

Wherein, v new represents the update speed; v old represents the initial speed parameter; α represents the momentum parameter; ∈ represents the learning rate; g represents the gradient data.
The computer-readable storage medium of claim 15, wherein the update decision parameter is represented as:

θ new = θ old +v new

Wherein, θ new represents the update decision parameter; θ old represents the initial decision parameter; v new represents the update speed.
The computer-readable storage medium according to claim 19, wherein the convergence condition is a preset convergence threshold; the step of obtaining a target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition , including:

calculating the decision parameter difference between the initial decision parameter and the updated decision parameter;

judging whether the decision parameter difference is less than the preset convergence threshold;

If the decision parameter difference is less than or equal to the preset convergence threshold, determine that the current prediction model is converged, and use the current prediction model as the target prediction model;

If the decision parameter difference is greater than the preset convergence threshold, it is determined that the current prediction model has not converged, and the parameter optimization operation is continued.