WO2022110640A1 - Model optimization method and apparatus, computer device and storage medium - Google Patents

Model optimization method and apparatus, computer device and storage medium

Info

Publication number
WO2022110640A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
update
updated
decision parameter
gradient
Prior art date
Application number
PCT/CN2021/090501
Other languages
English (en)
Chinese (zh)
Inventor
莫琪
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022110640A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present application relates to model optimization of artificial intelligence, and in particular, to a model optimization method, device, computer equipment and storage medium applied to momentum gradient descent.
  • the optimization problem is one of the most important research directions in computational mathematics. In the field of deep learning, optimization algorithms are likewise a key link: even with the same data set and model architecture, different optimization algorithms are likely to lead to different training results, and some models may not even converge.
  • the applicant has realized that traditional model optimization methods are generally unintelligent, and that the Embedding layer may suffer from overfitting during the model optimization process.
  • the purpose of the embodiments of the present application is to propose a model optimization method, device, computer equipment and storage medium applied to momentum gradient descent, so as to solve the problem that traditional model optimization methods overfit the Embedding layer during model optimization.
  • the embodiments of the present application provide a model optimization method applied to momentum gradient descent, which adopts the following technical solutions:
  • receiving a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;
  • calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
  • the embodiment of the present application also provides a model optimization device applied to momentum gradient descent, which adopts the following technical solutions:
  • a request receiving module configured to receive a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;
  • a sampling operation module used for sampling operation in the original training data set to obtain the training data set of this round
  • a function definition module for defining an objective function based on the current round of training data sets
  • an initialization module for initializing the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision-making parameters
  • a gradient calculation module used to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in this round
  • a gradient judgment module for judging whether the gradient data has been updated
  • An abnormality confirmation module used for outputting a sampling abnormality signal if the gradient data is not updated
  • a speed parameter update module configured to update the initial speed parameter based on the gradient data to obtain an update speed if the gradient data has been updated
  • a decision parameter update module configured to update the initial decision parameter based on the update speed to obtain an update decision parameter
  • a target model obtaining module configured to obtain a target prediction model when the initial decision parameters and the updated decision parameters satisfy a convergence condition.
  • the embodiment of the present application also provides a computer device, which adopts the following technical solutions:
  • the memory stores computer-readable instructions
  • the processor executes the computer-readable instructions, the processor implements the steps of the model optimization method applied to the momentum gradient descent as described below:
  • receiving a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;
  • calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
  • the computer-readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, implements the steps of the model optimization method applied to the momentum gradient descent as described below:
  • receiving a model optimization request sent by the user terminal, where the model optimization request at least carries the original prediction model and the original training data set;
  • calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
  • the model optimization method, device, computer equipment and storage medium applied to momentum gradient descent provided by the embodiments of the present application mainly have the following beneficial effects:
  • the present application provides a model optimization method applied to momentum gradient descent, which: receives a model optimization request sent by a user terminal, where the model optimization request at least carries an original prediction model and an original training data set; performs a sampling operation in the original training data set to obtain the current round of training data; defines an objective function based on the current round of training data; initializes the model optimization parameters to obtain initial speed parameters and initial decision parameters; calculates the gradient data corresponding to the initial decision parameters that need to be updated in the current round; determines whether the gradient data has been updated; if the gradient data has not been updated, outputs a sampling abnormality signal; if the gradient data has been updated, updates the initial speed parameter based on the gradient data to obtain an update speed; updates the initial decision parameter based on the update speed to obtain an updated decision parameter; and, when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtains a target prediction model.
  • if the training data of the current round were not sampled, a gradient update in this round would still be driven by historical momentum, which may lead to overfitting of the Embedding layer; by confirming that this round's training data has actually been sampled before performing the gradient update operation, the method effectively avoids the problem that words not sampled in the current batch would still be updated with historical momentum during training, causing the Embedding layer to overfit.
  • Fig. 1 is an implementation flowchart of the model optimization method applied to momentum gradient descent provided by the first embodiment of the present application;
  • Fig. 2 is an implementation flowchart of step S103 in Fig. 1;
  • Fig. 3 is an implementation flowchart of step S110 in Fig. 1;
  • Fig. 4 is a schematic structural diagram of a model optimization device applied to momentum gradient descent provided by the second embodiment of the present application;
  • Fig. 5 is a schematic structural diagram of the function definition module 103 in Fig. 4;
  • FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • FIG. 1 shows the implementation flowchart of the model optimization method applied to momentum gradient descent provided according to the first embodiment of the present application; for convenience of description, only the parts related to the present application are shown.
  • in step S101, a model optimization request sent by a user terminal is received, where the model optimization request at least carries the original prediction model and the original training data set.
  • a user terminal refers to a terminal device used to execute the model optimization method provided by the present application;
  • the user terminal may be, for example, a mobile terminal such as a mobile phone, a smartphone, a notebook computer, a digital broadcast receiver, a PDA (Personal Digital Assistant), a PAD (tablet computer), a PMP (Portable Multimedia Player) or a navigation device, or a stationary terminal such as a digital TV or a desktop computer.
  • the examples are only for the convenience of understanding, and are not used to limit the present application.
  • the original prediction model refers to a prediction model that has not yet been optimized by gradient descent.
  • in step S102, a sampling operation is performed in the original training data set to obtain the current round of training data.
  • the sampling operation refers to the process of extracting individuals or samples from the overall training data, that is, the process of performing experiments or observations on the overall training data.
  • sampling operations can be divided into probability sampling and non-probability sampling: the former refers to a sampling method that draws samples from the population in accordance with the principle of randomization, without any subjectivity, and includes simple random sampling, systematic sampling, cluster sampling and stratified sampling; the latter extracts samples based on the researcher's point of view, experience or related knowledge, and has an obviously subjective character.
  • the training data set of the current round refers to a training data set with a small amount of data selected after the above sampling operation, so as to reduce the training time of the model.
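  • purely as an illustration (the helper below and its use of NumPy are the editor's assumptions, not part of the disclosure), one round of training data could be drawn by simple random sampling as follows:

```python
import numpy as np

def sample_current_round(data: np.ndarray, batch_size: int) -> np.ndarray:
    """Draw this round's training data from the original training set
    by simple random sampling without replacement."""
    indices = np.random.choice(len(data), size=batch_size, replace=False)
    return data[indices]
```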
  • in step S103, an objective function is defined based on the current round of training data.
  • a user-text matrix R may be generated based on a data set of user texts, and the user-text matrix R may be decomposed based on the singular value decomposition method to obtain a user-hidden feature matrix P and a latent feature-text matrix Q.
  • the objective function is constructed based on the user-text matrix R and is expressed as J(P, Q) = Σ_{(m,n)∈R(τ)} (r_{m,n} − r̂_{m,n})² + λ(‖p_m‖² + ‖q_n‖²), where R(τ) represents the set of observed rating data of users for texts in the user-text matrix R; p_m represents the latent feature corresponding to the m-th user in the user-hidden feature matrix P; q_n represents the latent feature corresponding to the n-th text in the latent feature-text matrix Q; r_{m,n} represents the rating data of user m for text n; r̂_{m,n} = p_m · q_n represents the predicted rating data of user m for text n; and λ represents the regularization factor of the latent feature matrices.
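  • for concreteness, a minimal sketch of the objective as reconstructed above follows; the function and variable names are assumptions, and the explicit loop is chosen for readability rather than speed:

```python
import numpy as np

def objective(P: np.ndarray, Q: np.ndarray, ratings, lam: float) -> float:
    """Regularized squared rating error over the observed (user, text) pairs.
    P: (num_users, k) user latent features; Q: (num_texts, k) text latent
    features; ratings: iterable of (m, n, r_mn) observed scores; lam: the
    regularization factor."""
    total = 0.0
    for m, n, r_mn in ratings:
        r_hat = P[m] @ Q[n]                          # predicted rating p_m . q_n
        total += (r_mn - r_hat) ** 2                 # squared rating error
        total += lam * (P[m] @ P[m] + Q[n] @ Q[n])   # L2 penalty on latent features
    return total
```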
  • in step S104, the model optimization parameters of the original prediction model are initialized to obtain initial speed parameters and initial decision parameters.
  • initialization assigns variables their default values and controls their default states; specifically, it includes initializing the learning rate η, the momentum parameter α, the initial decision parameter θ, and the initial velocity v.
  • in step S105, the gradient data corresponding to the initial decision parameters that need to be updated in the current round is calculated.
  • the gradient data is expressed as g = (1/m) · ∇_θ Σ_{i=1}^{m} L(x⁽ⁱ⁾; θ), where g represents the gradient data; m represents the total number of training data in this round; θ represents the initial decision parameter; x⁽ⁱ⁾ represents the i-th training data in this round; and L(·) denotes the per-example loss given by the objective function.
  • in step S106, it is determined whether the gradient data has been updated.
  • if a piece of training data has been sampled in the current round, the gradient of its Embedding is not 0; based on this characteristic of sampling, whether the training data has been sampled can be known by judging whether the gradient data has been updated.
  • in step S107, if the gradient data has not been updated, a sampling abnormality signal is output.
  • if the gradient data has not been updated, it means that the training data has not been sampled in this round; if the subsequent update operations were performed anyway, the corresponding Embedding layer would still be updated based on historical momentum, resulting in overfitting.
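  • a minimal sketch of this check, assuming (as the preceding paragraph suggests) that "updated" means a non-zero gradient for the corresponding Embedding row:

```python
import numpy as np

def gradient_updated(grad_row: np.ndarray) -> bool:
    """True if the Embedding row received a non-zero gradient this round,
    i.e. the corresponding training data was actually sampled; a row whose
    gradient stayed exactly zero should trigger the sampling abnormality
    signal instead of a momentum update."""
    return bool(np.any(grad_row != 0))
```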
  • in step S108, if the gradient data has been updated, the initial speed parameter is updated based on the gradient data to obtain the update speed.
  • the update speed is expressed as v_new = α · v_old − η · g, where v_new represents the update speed; v_old represents the initial speed parameter; α represents the momentum parameter; η represents the learning rate; and g represents the gradient data.
  • in step S109, the initial decision parameter is updated based on the update speed to obtain the updated decision parameter.
  • the update decision parameter is expressed as θ_new = θ_old + v_new, where θ_new represents the updated decision parameter; θ_old represents the initial decision parameter; and v_new represents the update speed.
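  • putting steps S108 and S109 together, a hedged sketch of the reconstructed update rules (α denotes the momentum parameter and η the learning rate, as in the formulas above):

```python
import numpy as np

def momentum_step(theta_old: np.ndarray, v_old: np.ndarray, g: np.ndarray,
                  alpha: float = 0.9, eta: float = 0.01):
    """One momentum gradient descent step:
    v_new = alpha * v_old - eta * g   (step S108)
    theta_new = theta_old + v_new     (step S109)"""
    v_new = alpha * v_old - eta * g
    theta_new = theta_old + v_new
    return theta_new, v_new
```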
  • in step S110, when the initial decision parameters and the updated decision parameters satisfy the convergence condition, a target prediction model is obtained.
  • the model optimization method applied to momentum gradient descent provided by this embodiment receives a model optimization request sent by a user terminal, the model optimization request carrying at least the original prediction model and the original training data set; performs a sampling operation in the original training data set to obtain the training data of this round; defines the objective function based on this round's training data; initializes the model optimization parameters to obtain the initial speed parameter and the initial decision parameter; calculates the gradient data corresponding to the initial decision parameter that needs to be updated in this round; judges whether the gradient data has been updated; outputs a sampling abnormality signal if the gradient data has not been updated; updates the initial speed parameter based on the gradient data to obtain the update speed if the gradient data has been updated; updates the initial decision parameter based on the update speed to obtain the updated decision parameter; and obtains the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition.
  • if the training data of the current round were not sampled, a gradient update in this round would still use historical momentum, which may lead to overfitting of the Embedding layer; by checking the gradient data to confirm whether an update has occurred, the method ensures that this round's training data has definitely been sampled before the gradient update operation is performed, effectively avoiding the problem that words not sampled in the current batch would still be updated with historical momentum during training, causing the Embedding layer to overfit.
  • referring to FIG. 2, an implementation flowchart of step S103 in FIG. 1 is shown; for convenience of description, only the parts related to the present application are shown.
  • step S103 specifically includes step S201, step S202 and step S203.
  • in step S201, a user-text matrix R is generated based on a data set of user texts.
  • in step S202, the user-text matrix R is decomposed based on the singular value decomposition method to obtain the user-hidden feature matrix P and the latent feature-text matrix Q.
  • singular value decomposition is an important matrix decomposition in linear algebra; it generalizes the eigendecomposition to arbitrary matrices.
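  • a minimal NumPy sketch of step S202 follows; folding the singular values into the user-side factor is one common convention assumed here, not something prescribed by the disclosure:

```python
import numpy as np

R = np.random.rand(5, 7)                 # toy 5-user x 7-text rating matrix
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 3                                    # number of latent features kept
P = U[:, :k] * s[:k]                     # user latent feature matrix, 5 x k
Q = Vt[:k, :].T                          # latent feature-text matrix, 7 x k
print(np.linalg.norm(R - P @ Q.T))       # rank-k reconstruction error
```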
  • in step S203, an objective function is constructed based on the user-text matrix R.
  • the objective function is expressed as J(P, Q) = Σ_{(m,n)∈R(τ)} (r_{m,n} − r̂_{m,n})² + λ(‖p_m‖² + ‖q_n‖²), where R(τ) represents the set of observed rating data of users for texts in the user-text matrix R; p_m represents the latent feature corresponding to the m-th user in the user-hidden feature matrix P; q_n represents the latent feature corresponding to the n-th text in the latent feature-text matrix Q; r_{m,n} represents the rating data of user m for text n; r̂_{m,n} = p_m · q_n represents the predicted rating data of user m for text n; and λ represents the regularization factor of the latent feature matrices.
  • referring to FIG. 3, an implementation flowchart of step S110 in FIG. 1 is shown; for convenience of description, only the parts related to the present application are shown.
  • step S110 specifically includes step S301, step S302, step S303 and step S304.
  • in step S301, the decision parameter difference between the initial decision parameter and the updated decision parameter is calculated.
  • the decision parameter difference is mainly used to judge the amount of change between the current model parameters and the model parameters of the previous round; if the amount of change is less than a certain value, the decision parameter is considered to tend to a stable value, so that the prediction model reaches stability.
  • in step S302, it is determined whether the decision parameter difference is smaller than a preset convergence threshold.
  • the user can adjust the preset convergence threshold according to the actual situation.
  • in step S303, if the decision parameter difference is less than or equal to the preset convergence threshold, it is determined that the current prediction model has converged, and the current prediction model is used as the target prediction model.
  • when the decision parameter difference is less than or equal to the preset convergence threshold, it means that the decision parameter tends to a stable value, and the prediction model has reached stability.
  • in step S304, if the decision parameter difference is greater than the preset convergence threshold, it is determined that the current prediction model has not converged, and the parameter optimization operation is continued.
  • when the decision parameter difference is greater than the preset convergence threshold, it means that the decision parameter has not yet reached a stable value, and the parameters of the prediction model still need to be optimized.
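  • a short sketch of this convergence test; measuring the decision parameter difference with a Euclidean norm is an assumption, since the disclosure does not fix a particular distance:

```python
import numpy as np

def has_converged(theta_old: np.ndarray, theta_new: np.ndarray,
                  threshold: float = 1e-6) -> bool:
    """Step S301: compute the decision parameter difference; steps S302/S303:
    converged when the difference is less than or equal to the preset
    convergence threshold."""
    diff = np.linalg.norm(theta_new - theta_old)
    return diff <= threshold
```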
  • the gradient data is expressed as g = (1/m) · ∇_θ Σ_{i=1}^{m} L(x⁽ⁱ⁾; θ), where g represents the gradient data; m represents the total number of training data in this round; θ represents the initial decision parameter; x⁽ⁱ⁾ represents the i-th training data in this round; and L(·) denotes the per-example loss given by the objective function.
  • the update speed is expressed as v_new = α · v_old − η · g, where v_new represents the update speed; v_old represents the initial speed parameter; α represents the momentum parameter; η represents the learning rate; and g represents the gradient data.
  • the update decision parameter is expressed as θ_new = θ_old + v_new, where θ_new represents the updated decision parameter; θ_old represents the initial decision parameter; and v_new represents the update speed.
  • the model optimization method applied to momentum gradient descent receives a model optimization request sent by a user terminal, the model optimization request at least carrying the original prediction model and the original training data set; performs a sampling operation in the original training data set to obtain this round's training data; defines the objective function based on this round's training data; initializes the model optimization parameters to obtain the initial speed parameter and the initial decision parameter; calculates the gradient data corresponding to the initial decision parameter that needs to be updated in this round; judges whether the gradient data has been updated; outputs a sampling abnormality signal if the gradient data has not been updated; updates the initial speed parameter based on the gradient data to obtain the update speed if the gradient data has been updated; updates the initial decision parameter based on the update speed to obtain the updated decision parameter; and obtains the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition.
  • if the training data of the current round were not sampled, a gradient update in this round would still use historical momentum, which may lead to overfitting of the Embedding layer; by confirming that this round's training data has actually been sampled before the gradient update operation is performed, the method effectively avoids the problem that words not sampled in the current batch would still be updated with historical momentum during training, causing the Embedding layer to overfit.
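  • tying the pieces together, an end-to-end sketch of the flow of FIG. 1 (S101-S110), reusing the helper sketches above; grad_fn is a hypothetical callable returning this round's gradient, and all names here are editorial illustrations rather than code from the disclosure:

```python
import numpy as np

def optimize(theta: np.ndarray, grad_fn, data: np.ndarray, batch_size: int,
             alpha: float = 0.9, eta: float = 0.01,
             threshold: float = 1e-6, max_rounds: int = 1000) -> np.ndarray:
    v = np.zeros_like(theta)                               # S104: initial velocity
    for _ in range(max_rounds):
        batch = sample_current_round(data, batch_size)     # S102: this round's data
        g = grad_fn(theta, batch)                          # S105: gradient data
        if not np.any(g):                                  # S106/S107: not sampled,
            raise RuntimeError("sampling abnormality signal")  # skip momentum update
        theta_new, v = momentum_step(theta, v, g, alpha, eta)  # S108/S109
        if has_converged(theta, theta_new, threshold):     # S110: convergence
            return theta_new
        theta = theta_new
    return theta
```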
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
  • the present application provides an embodiment of a model optimization device applied to momentum gradient descent, which corresponds to the method embodiment shown in FIG. 1.
  • the apparatus can be specifically applied to various electronic devices.
  • the model optimization device 100 applied to momentum gradient descent in this embodiment includes: a request receiving module 101, a sampling operation module 102, a function definition module 103, an initialization module 104, a gradient calculation module 105, a gradient judgment module 106, an abnormality confirmation module 107, a speed parameter update module 108, a decision parameter update module 109 and a target model acquisition module 110.
  • a request receiving module 101 configured to receive a model optimization request sent by a user terminal, where the model optimization request at least carries the original prediction model and the original training data set;
  • the sampling operation module 102 is used to perform sampling operation in the original training data set to obtain the training data set of this round;
  • the function definition module 103 is used to define an objective function based on the current round of training data sets
  • the initialization module 104 is used to initialize the model optimization parameters of the original prediction model to obtain initial speed parameters and initial decision parameters;
  • the gradient calculation module 105 is used to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in this round;
  • the gradient judgment module 106 is used for judging whether the gradient data has been updated
  • An abnormality confirmation module 107 configured to output a sampling abnormality signal if the gradient data is not updated
  • a speed parameter update module 108 configured to update the initial speed parameter based on the gradient data to obtain the update speed if the gradient data has been updated;
  • a decision parameter update module 109 configured to update the initial decision parameter based on the update speed to obtain the updated decision parameter
  • the target model obtaining module 110 is configured to obtain the target prediction model when the initial decision parameters and the updated decision parameters satisfy the convergence condition.
  • a user terminal refers to a terminal device used to execute the model optimization method provided by the present application;
  • the user terminal may be, for example, a mobile terminal such as a mobile phone, a smartphone, a notebook computer, a digital broadcast receiver, a PDA (Personal Digital Assistant), a PAD (tablet computer), a PMP (Portable Multimedia Player) or a navigation device, or a stationary terminal such as a digital TV or a desktop computer.
  • the examples are only for the convenience of understanding, and are not used to limit the present application.
  • the original prediction model refers to a prediction model that has not yet been optimized by gradient descent.
  • the sampling operation refers to the process of extracting individuals or samples from the overall training data, that is, the process of performing experiments or observations on the overall training data.
  • sampling operations can be divided into probability sampling and non-probability sampling: the former refers to a sampling method that draws samples from the population in accordance with the principle of randomization, without any subjectivity, and includes simple random sampling, systematic sampling, cluster sampling and stratified sampling; the latter extracts samples based on the researcher's point of view, experience or related knowledge, and has an obviously subjective character.
  • the training data set of the current round refers to a training data set with a small amount of data selected after the above sampling operation, so as to reduce the training time of the model.
  • a user-text matrix R may be generated based on a data set of user texts, and the user-text matrix R may be decomposed based on the singular value decomposition method to obtain a user-hidden feature matrix P and a latent feature-text matrix Q.
  • the objective function is constructed based on the user-text matrix R and is expressed as J(P, Q) = Σ_{(m,n)∈R(τ)} (r_{m,n} − r̂_{m,n})² + λ(‖p_m‖² + ‖q_n‖²), where R(τ) represents the set of observed rating data of users for texts in the user-text matrix R; p_m represents the latent feature corresponding to the m-th user in the user-hidden feature matrix P; q_n represents the latent feature corresponding to the n-th text in the latent feature-text matrix Q; r_{m,n} represents the rating data of user m for text n; r̂_{m,n} = p_m · q_n represents the predicted rating data of user m for text n; and λ represents the regularization factor of the latent feature matrices.
  • initialization assigns variables their default values and controls their default states; specifically, it includes initializing the learning rate η, the momentum parameter α, the initial decision parameter θ, and the initial velocity v.
  • the gradient data is expressed as g = (1/m) · ∇_θ Σ_{i=1}^{m} L(x⁽ⁱ⁾; θ), where g represents the gradient data; m represents the total number of training data in this round; θ represents the initial decision parameter; x⁽ⁱ⁾ represents the i-th training data in this round; and L(·) denotes the per-example loss given by the objective function.
  • if a piece of training data has been sampled in the current round, the gradient of its Embedding is not 0; based on this characteristic of sampling, whether the training data has been sampled can be known by judging whether the gradient data has been updated.
  • if the gradient data has not been updated, it means that the training data has not been sampled in this round; if the subsequent update operations were performed anyway, the corresponding Embedding layer would still be updated based on historical momentum, resulting in overfitting.
  • the update speed is expressed as v_new = α · v_old − η · g, where v_new represents the update speed; v_old represents the initial speed parameter; α represents the momentum parameter; η represents the learning rate; and g represents the gradient data.
  • the update decision parameter is expressed as θ_new = θ_old + v_new, where θ_new represents the updated decision parameter; θ_old represents the initial decision parameter; and v_new represents the update speed.
  • referring to FIG. 5, a schematic structural diagram of the function definition module 103 in FIG. 4 is shown; for convenience of description, only the parts related to the present application are shown.
  • the function definition module 103 specifically includes a matrix generation submodule 1031, a matrix decomposition submodule 1032 and a function construction submodule 1033, wherein:
  • a matrix generation submodule 1031 configured to generate a user-text matrix based on a data set of user texts
  • the matrix decomposition submodule 1032 is configured to perform a decomposition operation on the user-text matrix based on the singular value decomposition method to obtain the user-hidden feature matrix and the latent feature-text matrix;
  • the function construction sub-module 1033 is used to construct an objective function based on the user-text matrix.
  • singular value decomposition is an important matrix decomposition in linear algebra; it generalizes the eigendecomposition to arbitrary matrices.
  • the objective function constructed based on the user-text matrix R is expressed as J(P, Q) = Σ_{(m,n)∈R(τ)} (r_{m,n} − r̂_{m,n})² + λ(‖p_m‖² + ‖q_n‖²), where R(τ) represents the set of observed rating data of users for texts in the user-text matrix R; p_m represents the latent feature corresponding to the m-th user in the user-hidden feature matrix P; q_n represents the latent feature corresponding to the n-th text in the latent feature-text matrix Q; r_{m,n} represents the rating data of user m for text n; r̂_{m,n} = p_m · q_n represents the predicted rating data of user m for text n; and λ represents the regularization factor of the latent feature matrices.
  • the gradient data is expressed as g = (1/m) · ∇_θ Σ_{i=1}^{m} L(x⁽ⁱ⁾; θ), where g represents the gradient data; m represents the total number of training data in this round; θ represents the initial decision parameter; x⁽ⁱ⁾ represents the i-th training data in this round; and L(·) denotes the per-example loss given by the objective function.
  • the update speed is expressed as v_new = α · v_old − η · g, where v_new represents the update speed; v_old represents the initial speed parameter; α represents the momentum parameter; η represents the learning rate; and g represents the gradient data.
  • the update decision parameter is expressed as θ_new = θ_old + v_new, where θ_new represents the updated decision parameter; θ_old represents the initial decision parameter; and v_new represents the update speed.
  • the target model obtaining module 110 specifically includes a difference calculation submodule, a convergence judgment submodule, a convergence confirmation submodule and a non-convergence confirmation submodule, wherein:
  • a difference calculation submodule, configured to calculate the decision parameter difference between the initial decision parameter and the updated decision parameter;
  • a convergence judgment submodule configured to judge whether the decision parameter difference is less than the preset convergence threshold
  • a convergence confirmation submodule configured to determine that the current prediction model is converged if the decision parameter difference is less than or equal to the preset convergence threshold, and use the current prediction model as the target prediction model;
  • the non-convergence confirmation sub-module is configured to determine that the current prediction model is not converged and continue to perform the parameter optimization operation if the decision parameter difference is greater than the preset convergence threshold.
  • the model optimization device applied to momentum gradient descent includes: a request receiving module, configured to receive a model optimization request sent by a user terminal, where the model optimization request at least carries the original prediction model and the original training data set; a sampling operation module, used to perform a sampling operation in the original training data set to obtain the current round of training data; a function definition module, used to define the objective function based on the current round of training data; an initialization module, used to initialize the model optimization parameters of the original prediction model to obtain the initial speed parameter and the initial decision parameter; a gradient calculation module, used to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in this round; a gradient judgment module, used to judge whether the gradient data has been updated; an abnormality confirmation module, used to output a sampling abnormality signal if the gradient data has not been updated; a speed parameter update module, used to update the initial speed parameter based on the gradient data to obtain the update speed if the gradient data has been updated; a decision parameter update module, used to update the initial decision parameter based on the update speed to obtain the updated decision parameter; and a target model acquisition module, used to obtain the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition.
  • if the training data of the current round were not sampled, a gradient update in this round would still use historical momentum, which may lead to overfitting of the Embedding layer; by confirming that this round's training data has actually been sampled before the gradient update operation is performed, the device effectively avoids the problem that words not sampled in the current batch would still be updated with historical momentum during training, causing the Embedding layer to overfit.
  • FIG. 6 is a block diagram of the basic structure of a computer device according to this embodiment.
  • the computer device 200 includes a memory 210, a processor 220, and a network interface 230 that communicate with each other through a system bus. It should be noted that only the computer device 200 with components 210-230 is shown in the figure, but it should be understood that implementation of all of the shown components is not required, and more or fewer components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • the memory 210 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc. The computer-readable storage medium can be non-volatile or volatile.
  • the memory 210 may be an internal storage unit of the computer device 200 , such as a hard disk or a memory of the computer device 200 .
  • the memory 210 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 210 may also include both the internal storage unit of the computer device 200 and its external storage device.
  • the memory 210 is generally used to store the operating system and various application software installed in the computer device 200, such as computer-readable instructions applied to the model optimization method of momentum gradient descent.
  • the memory 210 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 220 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 220 is typically used to control the overall operation of the computer device 200 .
  • the processor 220 is configured to execute the computer-readable instructions stored in the memory 210 or process data, for example, the computer-readable instructions for executing the model optimization method applied to momentum gradient descent.
  • the network interface 230 may include a wireless network interface or a wired network interface, and the network interface 230 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
  • the present application also provides another embodiment, that is, to provide a computer-readable storage medium, where the computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor to The at least one processor is caused to perform the steps of the model optimization method applied to momentum gradient descent as described above.
  • the method of the above embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application can be embodied in the form of a software product in essence or in a part that contributes to the prior art, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, CD-ROM), including several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to a model optimization method and apparatus applied to momentum gradient descent, a computer device and a storage medium. The method comprises: receiving a model optimization request sent by a user terminal, the model optimization request carrying at least an original prediction model and an original training data set (S101); performing a sampling operation on the original training data set to obtain the current round of training data (S102); defining an objective function on the basis of the current round of training data (S103); initializing model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter (S104); calculating gradient data corresponding to the initial decision parameter that needs to be updated in the current round (S105); determining whether the gradient data has been updated (S106); if the gradient data has not been updated, outputting a sampling anomaly signal (S107); if the gradient data has been updated, updating the initial speed parameter on the basis of the gradient data to obtain an updated speed (S108); updating the initial decision parameter on the basis of the updated speed to obtain an updated decision parameter (S109); and, when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model (S110). The present invention effectively avoids the problem of overfitting of an Embedding layer caused by using historical momentum to update words that have not been sampled in the current batch during training.
PCT/CN2021/090501 2020-11-27 2021-04-28 Procédé et appareil d'optimisation de modèle, dispositif informatique et support de stockage WO2022110640A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011359384.8 2020-11-27
CN202011359384.8A CN112488183B (zh) 2020-11-27 2020-11-27 一种模型优化方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022110640A1 (fr)

Family

ID=74935992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090501 WO2022110640A1 (fr) 2020-11-27 2021-04-28 Procédé et appareil d'optimisation de modèle, dispositif informatique et support de stockage

Country Status (2)

Country Link
CN (1) CN112488183B (fr)
WO (1) WO2022110640A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116068903A (zh) * 2023-04-06 2023-05-05 中国人民解放军国防科技大学 一种闭环系统鲁棒性能的实时优化方法、装置及设备
CN116451872A (zh) * 2023-06-08 2023-07-18 北京中电普华信息技术有限公司 碳排放预测分布式模型训练方法、相关方法及装置
CN117077598A (zh) * 2023-10-13 2023-11-17 青岛展诚科技有限公司 一种基于Mini-batch梯度下降法的3D寄生参数的优化方法
CN117596156A (zh) * 2023-12-07 2024-02-23 机械工业仪器仪表综合技术经济研究所 一种工业应用5g网络的评估模型的构建方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488183B (zh) * 2020-11-27 2024-05-10 平安科技(深圳)有限公司 一种模型优化方法、装置、计算机设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170103161A1 (en) * 2015-10-13 2017-04-13 The Governing Council Of The University Of Toronto Methods and systems for 3d structure estimation
CN110390561A (zh) * 2019-07-04 2019-10-29 四川金赞科技有限公司 基于动量加速随机梯度下降的用户-金融产品选用倾向高速预测方法和装置
CN110730037A (zh) * 2019-10-21 2020-01-24 苏州大学 一种基于动量梯度下降法的相干光通信系统光信噪比监测方法
CN111507530A (zh) * 2020-04-17 2020-08-07 集美大学 基于分数阶动量梯度下降的rbf神经网络船舶交通流预测方法
CN111695295A (zh) * 2020-06-01 2020-09-22 中国人民解放军火箭军工程大学 一种光栅耦合器的入射参数反演模型的构建方法
CN112488183A (zh) * 2020-11-27 2021-03-12 平安科技(深圳)有限公司 一种模型优化方法、装置、计算机设备及存储介质

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889509B (zh) * 2019-11-11 2023-04-28 安徽超清科技股份有限公司 一种基于梯度动量加速的联合学习方法及装置
CN111639710B (zh) * 2020-05-29 2023-08-08 北京百度网讯科技有限公司 图像识别模型训练方法、装置、设备以及存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170103161A1 (en) * 2015-10-13 2017-04-13 The Governing Council Of The University Of Toronto Methods and systems for 3d structure estimation
CN110390561A (zh) * 2019-07-04 2019-10-29 四川金赞科技有限公司 基于动量加速随机梯度下降的用户-金融产品选用倾向高速预测方法和装置
CN110730037A (zh) * 2019-10-21 2020-01-24 苏州大学 一种基于动量梯度下降法的相干光通信系统光信噪比监测方法
CN111507530A (zh) * 2020-04-17 2020-08-07 集美大学 基于分数阶动量梯度下降的rbf神经网络船舶交通流预测方法
CN111695295A (zh) * 2020-06-01 2020-09-22 中国人民解放军火箭军工程大学 一种光栅耦合器的入射参数反演模型的构建方法
CN112488183A (zh) * 2020-11-27 2021-03-12 平安科技(深圳)有限公司 一种模型优化方法、装置、计算机设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAO LIU CLASSMATE: "Machine Learning Optimization Methods: Momentum Gradient Descent", CSDN BLOG, 2 December 2019 (2019-12-02), pages 1 - 8, XP055933014, Retrieved from the Internet <URL:https://blog.csdn.net/SweetSeven_/article/details/103353990> [retrieved on 20220620] *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116068903A (zh) * 2023-04-06 2023-05-05 中国人民解放军国防科技大学 一种闭环系统鲁棒性能的实时优化方法、装置及设备
CN116451872A (zh) * 2023-06-08 2023-07-18 北京中电普华信息技术有限公司 碳排放预测分布式模型训练方法、相关方法及装置
CN116451872B (zh) * 2023-06-08 2023-09-01 北京中电普华信息技术有限公司 碳排放预测分布式模型训练方法、相关方法及装置
CN117077598A (zh) * 2023-10-13 2023-11-17 青岛展诚科技有限公司 一种基于Mini-batch梯度下降法的3D寄生参数的优化方法
CN117077598B (zh) * 2023-10-13 2024-01-26 青岛展诚科技有限公司 一种基于Mini-batch梯度下降法的3D寄生参数的优化方法
CN117596156A (zh) * 2023-12-07 2024-02-23 机械工业仪器仪表综合技术经济研究所 一种工业应用5g网络的评估模型的构建方法
CN117596156B (zh) * 2023-12-07 2024-05-07 机械工业仪器仪表综合技术经济研究所 一种工业应用5g网络的评估模型的构建方法

Also Published As

Publication number Publication date
CN112488183B (zh) 2024-05-10
CN112488183A (zh) 2021-03-12

Similar Documents

Publication Publication Date Title
WO2022110640A1 (fr) Procédé et appareil d&#39;optimisation de modèle, dispositif informatique et support de stockage
US20230107574A1 (en) Generating trained neural networks with increased robustness against adversarial attacks
US10936949B2 (en) Training machine learning models using task selection policies to increase learning progress
WO2021155713A1 (fr) Procédé de reconnaissance faciale à base de fusion de modèle de greffage de poids, et dispositif y relatif
US20190303535A1 (en) Interpretable bio-medical link prediction using deep neural representation
WO2021120677A1 (fr) Procédé et appareil d&#39;entraînement de modèle d&#39;entreposage, dispositif informatique et support de stockage
WO2019095570A1 (fr) Procédé de prédiction de popularité d&#39;un événement, serveur et support d&#39;informations lisible par ordinateur
CN111340221B (zh) 神经网络结构的采样方法和装置
CN113435583B (zh) 基于联邦学习的对抗生成网络模型训练方法及其相关设备
CN114780727A (zh) 基于强化学习的文本分类方法、装置、计算机设备及介质
WO2022105117A1 (fr) Procédé et dispositif d&#39;évaluation de qualité d&#39;image, dispositif informatique et support de stockage
WO2020168851A1 (fr) Reconnaissance de comportement
WO2020248365A1 (fr) Procédé et appareil d&#39;attribution intelligente de mémoires d&#39;apprentissage de modèles et support de stockage lisible par ordinateur
WO2022105121A1 (fr) Procédé et appareil de distillation appliqués à un modèle bert, dispositif et support de stockage
WO2020191001A1 (fr) Analyse et prédiction de liaison de réseau du monde réel en utilisant des modèles de factorisation matricielle probabilistes étendus avec des nœuds étiquetés
CN112214775A (zh) 对图数据的注入式攻击方法、装置、介质及电子设备
CN110462638A (zh) 使用后验锐化训练神经网络
WO2023207411A1 (fr) Procédé et appareil de détermination de trafic sur la base de données spatiotemporelles, et dispositif et support
CN115730597A (zh) 多级语义意图识别方法及其相关设备
WO2022116439A1 (fr) Procédé de détection d&#39;image ct basé sur un apprentissage fédéré et dispositif associé
CN113791909A (zh) 服务器容量调整方法、装置、计算机设备及存储介质
CN108475346A (zh) 神经随机访问机器
CN111144473A (zh) 训练集构建方法、装置、电子设备及计算机可读存储介质
CN115099875A (zh) 基于决策树模型的数据分类方法及相关设备
CN115545753A (zh) 一种基于贝叶斯算法的合作伙伴预测方法及相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21896155

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21896155

Country of ref document: EP

Kind code of ref document: A1