WO2022110640A1 - Model optimization method and apparatus, computer device and storage medium - Google Patents
- Publication number: WO2022110640A1
- Application number: PCT/CN2021/090501
- Authority: WO, WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- The present application relates to model optimization in artificial intelligence, and in particular to a model optimization method, device, computer equipment and storage medium applied to momentum gradient descent.
- Optimization is one of the most important research directions in computational mathematics, and optimization algorithms are a key link in deep learning. Even with the same data set and model architecture, different optimization algorithms are likely to lead to different training results, and some models may fail to converge at all.
- The applicant has found that traditional model optimization methods are generally unintelligent, and that the Embedding layer may overfit during model optimization.
- The purpose of the embodiments of the present application is to propose a model optimization method, device, computer equipment and storage medium applied to momentum gradient descent, so as to solve the problem that traditional model optimization methods overfit the Embedding layer during model optimization.
- The embodiments of the present application provide a model optimization method applied to momentum gradient descent, which adopts the following technical solution:
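- receiving a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set;
- performing a sampling operation on the original training data set to obtain the current round of training data;
- defining an objective function based on the current round of training data;
- initializing the model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter;
- calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
- determining whether the gradient data has been updated;
- if the gradient data has not been updated, outputting a sampling abnormality signal;
- if the gradient data has been updated, updating the initial speed parameter based on the gradient data to obtain an update speed;
- updating the initial decision parameter based on the update speed to obtain an updated decision parameter;
- when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.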
- The embodiments of the present application also provide a model optimization device applied to momentum gradient descent, which adopts the following technical solution:
- a request receiving module, configured to receive a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set;
- a sampling operation module, configured to perform a sampling operation on the original training data set to obtain the current round of training data;
- a function definition module, configured to define an objective function based on the current round of training data;
- an initialization module, configured to initialize the model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter;
- a gradient calculation module, configured to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
- a gradient judgment module, configured to determine whether the gradient data has been updated;
- an abnormality confirmation module, configured to output a sampling abnormality signal if the gradient data has not been updated;
- a speed parameter update module, configured to update the initial speed parameter based on the gradient data to obtain an update speed if the gradient data has been updated;
- a decision parameter update module, configured to update the initial decision parameter based on the update speed to obtain an updated decision parameter;
- a target model obtaining module, configured to obtain a target prediction model when the initial decision parameter and the updated decision parameter satisfy a convergence condition.
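- The embodiments of the present application also provide a computer device, which adopts the following technical solution:
- the computer device includes a memory and a processor; the memory stores computer-readable instructions, and when the processor executes the computer-readable instructions, the processor implements the steps of the model optimization method applied to momentum gradient descent as described below:
- receiving a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set;
- performing a sampling operation on the original training data set to obtain the current round of training data;
- defining an objective function based on the current round of training data;
- initializing the model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter;
- calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
- determining whether the gradient data has been updated;
- if the gradient data has not been updated, outputting a sampling abnormality signal;
- if the gradient data has been updated, updating the initial speed parameter based on the gradient data to obtain an update speed;
- updating the initial decision parameter based on the update speed to obtain an updated decision parameter;
- when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.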
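- The embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solution:
- the computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the steps of the model optimization method applied to momentum gradient descent as described below:
- receiving a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set;
- performing a sampling operation on the original training data set to obtain the current round of training data;
- defining an objective function based on the current round of training data;
- initializing the model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter;
- calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
- determining whether the gradient data has been updated;
- if the gradient data has not been updated, outputting a sampling abnormality signal;
- if the gradient data has been updated, updating the initial speed parameter based on the gradient data to obtain an update speed;
- updating the initial decision parameter based on the update speed to obtain an updated decision parameter;
- when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.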
- Compared with the prior art, the model optimization method, device, computer equipment and storage medium applied to momentum gradient descent provided by the embodiments of the present application mainly have the following beneficial effects:
- The present application provides a model optimization method applied to momentum gradient descent, which: receives a model optimization request sent by a user terminal, where the request carries at least an original prediction model and an original training data set; performs a sampling operation on the original training data set to obtain the current round of training data; defines an objective function based on the current round of training data; initializes the model optimization parameters to obtain an initial speed parameter and an initial decision parameter; calculates the gradient data corresponding to the initial decision parameter that needs to be updated in this round; determines whether the gradient data has been updated; outputs a sampling abnormality signal if the gradient data has not been updated; updates the initial speed parameter based on the gradient data to obtain an update speed if the gradient data has been updated; updates the initial decision parameter based on the update speed to obtain an updated decision parameter; and obtains a target prediction model when the initial decision parameter and the updated decision parameter satisfy a convergence condition.
- If the training data of the current round has not been sampled, the gradient update of this round would still be driven by historical momentum, which may lead to overfitting of the Embedding layer.
- By checking whether the gradient data has been updated before the gradient update is performed, the method confirms that the training data of this round has actually been sampled. This avoids the problem that words not sampled in the current batch are still updated with historical momentum during training, which causes the Embedding layer to overfit.
- FIG. 1 is a flowchart of the implementation of the model optimization method applied to momentum gradient descent provided by Embodiment 1 of the present application;
- FIG. 2 is a flowchart of the implementation of step S103 in FIG. 1;
- FIG. 3 is a flowchart of the implementation of step S110 in FIG. 1;
- FIG. 4 is a schematic structural diagram of the model optimization device applied to momentum gradient descent provided by Embodiment 2 of the present application;
- FIG. 5 is a schematic structural diagram of the function definition module 103 in FIG. 4;
- FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
- FIG. 1 shows a flowchart of the implementation of the model optimization method applied to momentum gradient descent according to Embodiment 1 of the present application. For convenience of description, only the parts related to the present application are shown.
- In step S101, a model optimization request sent by a user terminal is received, where the model optimization request carries at least the original prediction model and the original training data set.
- A user terminal refers to a terminal device used to execute the model optimization method provided by the present application. The user terminal may be, for example, a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (Personal Digital Assistant), a PAD (tablet computer), a PMP (Portable Multimedia Player) or a navigation device, or a stationary terminal such as a digital TV or a desktop computer.
- These examples are given only for ease of understanding and do not limit the present application.
- The original prediction model is a prediction model that has not yet been optimized by gradient descent.
- In step S102, a sampling operation is performed on the original training data set to obtain the current round of training data.
- The sampling operation refers to the process of extracting individuals or samples from the overall training data, that is, the process of performing experiments or observations on the overall training data.
- Sampling falls into two kinds: random sampling and non-random sampling. The former draws samples from the population according to the principle of randomization, without any subjectivity, and includes simple random sampling, systematic sampling, cluster sampling and stratified sampling.
- The latter extracts samples based on the researcher's point of view, experience or related knowledge, and is clearly subjective.
- The current round of training data refers to a smaller training data set selected by the above sampling operation, which reduces the training time of the model.
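The sampling step can be illustrated with a minimal Python sketch. It assumes simple random sampling without replacement; the function name `sample_round`, the `(user, text, rating)` record layout and the batch size are hypothetical choices made only for illustration, since the text does not fix any of them.

```python
import random

def sample_round(dataset, batch_size, seed=None):
    """Draw one round of training data by simple random sampling
    without replacement (hypothetical helper, not from the source)."""
    rng = random.Random(seed)
    # Never ask for more items than the data set holds.
    batch_size = min(batch_size, len(dataset))
    return rng.sample(dataset, batch_size)

# Toy "original training data set": (user, text, rating) records.
original = [(f"user{i % 3}", f"text{i}", float(i % 5)) for i in range(20)]
current_round = sample_round(original, batch_size=4, seed=0)
```

Each call returns a smaller subset of the original data, which is what keeps a training round cheap.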
- In step S103, an objective function is defined based on the current round of training data.
- A user-text matrix R may be generated based on a data set of user texts, and R may be decomposed by singular value decomposition to obtain a user-latent feature matrix P and a latent feature-text matrix Q.
- The objective function is constructed based on the user-text matrix R, using the following quantities:
- the objective function is taken over the set of users' ratings of texts recorded in the user-text matrix $R$;
- $p_m$ represents the latent feature corresponding to the m-th user in the user-latent feature matrix $P$;
- $q_n$ represents the latent feature corresponding to the n-th text in the latent feature-text matrix $Q$;
- $r_{m,n}$ represents the rating of user $m$ for text $n$ in the rating data set;
- $\lambda_2$ represents the regularization factor of the latent feature matrices.
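The objective formula itself is not reproduced in this text. As a hedged illustration only, the sketch below evaluates a common regularized matrix-factorization objective: the squared error between each observed rating and the dot product of the user and text latent features, plus an L2 penalty weighted by the regularization factor. This specific form, the function name `objective` and the argument `lam2` are assumptions, not the formula of the application.

```python
def objective(ratings, P, Q, lam2):
    """Assumed objective: sum over observed (m, n) of
    (r_mn - p_m . q_n)^2 + lam2 * (|p_m|^2 + |q_n|^2).

    ratings: dict mapping (m, n) -> rating r_mn
    P: list of user latent feature vectors (one per user)
    Q: list of text latent feature vectors (one per text)
    """
    total = 0.0
    for (m, n), r in ratings.items():
        pred = sum(p * q for p, q in zip(P[m], Q[n]))  # predicted rating
        total += (r - pred) ** 2                        # squared error
        total += lam2 * (sum(p * p for p in P[m]) + sum(q * q for q in Q[n]))
    return total

ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 1): 4.0}
P = [[1.0, 2.0], [0.5, 1.0]]
Q = [[1.0, 2.0], [2.0, 0.5]]
loss = objective(ratings, P, Q, lam2=0.1)
```

Only observed ratings contribute, so unrated user-text pairs do not pull the latent features toward zero.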
- In step S104, the model optimization parameters of the original prediction model are initialized to obtain an initial speed parameter and an initial decision parameter.
- Initialization assigns each variable a default value and sets each control to a default state. Specifically, it covers the learning rate $\epsilon$, the momentum parameter $\alpha$, the initial decision parameter $\theta$ and the initial velocity $v$.
- In step S105, the gradient data corresponding to the initial decision parameter that needs to be updated in the current round is calculated.
- The gradient data is expressed in terms of the following quantities:
- $g$ represents the gradient data;
- $m$ represents the total number of training data in this round;
- $\theta$ represents the initial decision parameter;
- $x^{(i)}$ represents the i-th training datum in this round.
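The gradient formula is not reproduced in this text. A form consistent with the quantities listed above is the standard minibatch gradient estimate, sketched below. The loss function $L$, the model output $f(x^{(i)};\theta)$ and the target $y^{(i)}$ are assumed symbols that do not appear in the definitions above.

```latex
g = \frac{1}{m}\,\nabla_{\theta} \sum_{i=1}^{m} L\big(f(x^{(i)};\theta),\; y^{(i)}\big)
```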
- In step S106, it is determined whether the gradient data has been updated.
- If a piece of training data has been sampled, the gradient of its Embedding is not 0. Based on this property of sampling, whether the training data has been sampled can be determined by checking whether the gradient data has been updated.
- In step S107, if the gradient data has not been updated, a sampling abnormality signal is output.
- If the gradient data has not been updated, the training data was not sampled in this round. If the subsequent update operations were performed anyway, the corresponding Embedding layer would still be updated repeatedly based on historical momentum even though its training data was not sampled again, resulting in overfitting.
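The check in steps S106 and S107 can be sketched as follows. The sketch assumes that "updated" means "at least one non-zero gradient entry", following the statement that a sampled item has a non-zero Embedding gradient. The function names, the `tol` parameter and the signal strings are hypothetical.

```python
def gradient_updated(grad_rows, tol=0.0):
    """True if any Embedding row received a non-zero gradient this round."""
    return any(abs(g) > tol for row in grad_rows for g in row)

def sampled_rows(grad_rows, tol=0.0):
    """Indices of Embedding rows with a non-zero gradient,
    i.e. rows whose items were sampled in this round."""
    return [i for i, row in enumerate(grad_rows)
            if any(abs(g) > tol for g in row)]

def check_round(grad_rows):
    """Return a (hypothetical) signal name for the current round."""
    if not gradient_updated(grad_rows):
        return "SAMPLING_ABNORMAL"  # step S107: nothing was sampled
    return "OK"                     # proceed to step S108
```

Restricting the later momentum update to the indices returned by `sampled_rows` is one possible way to keep unsampled rows from drifting on stale momentum.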
- In step S108, if the gradient data has been updated, the initial speed parameter is updated based on the gradient data to obtain the update speed.
- The update speed is expressed in terms of the following quantities:
- $v_{new}$ represents the update speed;
- $v_{old}$ represents the initial speed parameter;
- $\alpha$ represents the momentum parameter;
- $\epsilon$ represents the learning rate;
- $g$ represents the gradient data.
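The quantities listed above match the standard momentum velocity update, so a reconstruction under that assumption is given below. The symbols $\alpha$ (momentum parameter) and $\epsilon$ (learning rate) are assumed labels.

```latex
v_{new} = \alpha\, v_{old} - \epsilon\, g
```

With $\alpha = 0$ this reduces to plain gradient descent. A larger $\alpha$ lets past gradients keep influencing the update, which is why a parameter whose data was not sampled can still move.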
- In step S109, the initial decision parameter is updated based on the update speed to obtain the updated decision parameter.
- The updated decision parameter is expressed in terms of the following quantities:
- $\theta_{new}$ represents the updated decision parameter;
- $\theta_{old}$ represents the initial decision parameter;
- $v_{new}$ represents the update speed.
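Steps S108 and S109 together form one momentum step. The sketch below assumes the standard rule, in which the update speed is the momentum parameter times the old speed minus the learning rate times the gradient, and the updated decision parameter is the old parameter plus the update speed. The function name `momentum_step` and the default values of `alpha` and `eps` are illustrative assumptions.

```python
def momentum_step(theta_old, v_old, g, alpha=0.9, eps=0.01):
    """One assumed momentum update:
        v_new     = alpha * v_old - eps * g
        theta_new = theta_old + v_new
    All vectors are plain Python lists of floats."""
    v_new = [alpha * v - eps * gi for v, gi in zip(v_old, g)]
    theta_new = [t + v for t, v in zip(theta_old, v_new)]
    return theta_new, v_new

theta, v = momentum_step(
    theta_old=[1.0, -2.0],
    v_old=[0.0, 0.5],
    g=[0.5, -1.0],
    alpha=0.5,
    eps=0.1,
)
```

Note that the second coordinate moves even before its gradient is applied, because `v_old` is non-zero there; this carry-over is the historical momentum the method guards against for unsampled data.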
- In step S110, when the initial decision parameter and the updated decision parameter satisfy the convergence condition, a target prediction model is obtained.
- In summary, the model optimization method applied to momentum gradient descent: receives a model optimization request sent by a user terminal, where the request carries at least the original prediction model and the original training data set; performs a sampling operation on the original training data set to obtain the current round of training data; defines the objective function based on the current round of training data; initializes the model optimization parameters to obtain the initial speed parameter and the initial decision parameter; calculates the gradient data corresponding to the initial decision parameter that needs to be updated in this round; determines whether the gradient data has been updated; outputs a sampling abnormality signal if the gradient data has not been updated; updates the initial speed parameter based on the gradient data to obtain the update speed if the gradient data has been updated; updates the initial decision parameter based on the update speed to obtain the updated decision parameter; and obtains the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition.
- If the training data of the current round has not been sampled, the gradient update of this round would still be driven by historical momentum, which may lead to overfitting of the Embedding layer.
- By checking whether the gradient data has been updated before the gradient update is performed, the method confirms that the training data of this round has actually been sampled. This avoids the problem that words not sampled in the current batch are still updated with historical momentum during training, which causes the Embedding layer to overfit.
- Referring to FIG. 2, a flowchart of the implementation of step S103 in FIG. 1 is shown. For convenience of description, only the parts related to the present application are shown.
- Step S103 specifically includes step S201, step S202 and step S203.
- In step S201, a user-text matrix R is generated based on a data set of user texts.
- In step S202, the user-text matrix R is decomposed by singular value decomposition to obtain the user-latent feature matrix P and the latent feature-text matrix Q.
- Singular value decomposition is an important matrix factorization in linear algebra; it generalizes eigendecomposition to arbitrary matrices.
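The text does not state how the two factor matrices are read off from the decomposition. One common convention, given here purely as an assumption, truncates the decomposition to the $k$ largest singular values and splits the singular values evenly between the two factors:

```latex
R = U \Sigma V^{\top}, \qquad
R \approx U_k \Sigma_k V_k^{\top}, \qquad
P = U_k \Sigma_k^{1/2}, \qquad
Q = \Sigma_k^{1/2} V_k^{\top}
```

Under this convention $PQ \approx R$, the m-th row of $P$ is the latent feature of user $m$, and the n-th column of $Q$ is the latent feature of text $n$. The rank $k$ is a hypothetical hyperparameter not named in the text.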
- In step S203, an objective function is constructed based on the user-text matrix R.
- Referring to FIG. 3, a flowchart of the implementation of step S110 in FIG. 1 is shown. For convenience of description, only the parts related to the present application are shown.
- Step S110 specifically includes step S301, step S302, step S303 and step S304.
- In step S301, the decision parameter difference between the initial decision parameter and the updated decision parameter is calculated.
- The decision parameter difference measures how much the current model parameters have changed from those of the previous round.
- When this change is smaller than a certain value, the decision parameter is considered to have settled at a stable value, so that the prediction model is stable.
- In step S302, it is determined whether the decision parameter difference is less than or equal to a preset convergence threshold.
- The user can adjust the preset convergence threshold according to the actual situation.
- In step S303, if the decision parameter difference is less than or equal to the preset convergence threshold, it is determined that the current prediction model has converged, and the current prediction model is used as the target prediction model.
- A decision parameter difference at or below the preset convergence threshold means that the decision parameter has settled at a stable value and the prediction model is stable.
- In step S304, if the decision parameter difference is greater than the preset convergence threshold, it is determined that the current prediction model has not converged, and the parameter optimization operation continues.
- A decision parameter difference above the preset convergence threshold means that the decision parameter has not yet reached a stable value, and the parameters of the prediction model still need to be optimized.
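Steps S301 to S304 can be sketched as a small loop. The text does not say how the decision parameter difference is measured, so the Euclidean norm used here is an assumption, as are the function names, the standard momentum update inside the loop, the toy quadratic objective and all numeric settings.

```python
import math

def has_converged(theta_old, theta_new, threshold=1e-6):
    """S301-S303: compare the parameter change with a preset threshold.
    The Euclidean norm is an assumed choice of 'difference'."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta_old, theta_new)))
    return diff <= threshold

def optimize(grad_fn, theta, alpha=0.9, eps=0.1, threshold=1e-8,
             max_rounds=10_000):
    """Repeat assumed momentum updates until the convergence check passes."""
    v = [0.0] * len(theta)
    for _ in range(max_rounds):
        g = grad_fn(theta)
        v = [alpha * vi - eps * gi for vi, gi in zip(v, g)]
        theta_new = [t + vi for t, vi in zip(theta, v)]
        if has_converged(theta, theta_new, threshold):
            return theta_new   # S303: converged, use as target model
        theta = theta_new      # S304: not converged, keep optimizing
    return theta

# Toy objective f(theta) = sum(theta_i ** 2), whose gradient is 2 * theta.
result = optimize(lambda th: [2.0 * t for t in th], [3.0, -4.0],
                  alpha=0.1, eps=0.1)
```

The `max_rounds` cap is a safeguard added for the sketch; the text itself only describes the threshold test.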
- The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or it may be a random access memory (RAM) or the like.
- The present application provides an embodiment of a model optimization device applied to momentum gradient descent, which corresponds to the method embodiment shown in FIG. 1.
- The device can be applied to various electronic devices.
- The model optimization device 100 applied to momentum gradient descent in this embodiment includes: a request receiving module 101, a sampling operation module 102, a function definition module 103, an initialization module 104, a gradient calculation module 105, a gradient judgment module 106, an abnormality confirmation module 107, a speed parameter update module 108, a decision parameter update module 109 and a target model obtaining module 110. Wherein:
- the request receiving module 101 is configured to receive a model optimization request sent by a user terminal, where the model optimization request carries at least the original prediction model and the original training data set;
- the sampling operation module 102 is configured to perform a sampling operation on the original training data set to obtain the current round of training data;
- the function definition module 103 is configured to define an objective function based on the current round of training data;
- the initialization module 104 is configured to initialize the model optimization parameters of the original prediction model to obtain the initial speed parameter and the initial decision parameter;
- the gradient calculation module 105 is configured to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in this round;
- the gradient judgment module 106 is configured to determine whether the gradient data has been updated;
- the abnormality confirmation module 107 is configured to output a sampling abnormality signal if the gradient data has not been updated;
- the speed parameter update module 108 is configured to update the initial speed parameter based on the gradient data to obtain the update speed if the gradient data has been updated;
- the decision parameter update module 109 is configured to update the initial decision parameter based on the update speed to obtain the updated decision parameter;
- the target model obtaining module 110 is configured to obtain the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition.
- Referring to FIG. 5, a schematic structural diagram of the function definition module 103 in FIG. 4 is shown. For convenience of description, only the parts related to the present application are shown.
- The function definition module 103 specifically includes a matrix generation submodule 1031, a matrix decomposition submodule 1032 and a function construction submodule 1033. Wherein:
- the matrix generation submodule 1031 is configured to generate a user-text matrix based on a data set of user texts;
- the matrix decomposition submodule 1032 is configured to decompose the user-text matrix by singular value decomposition to obtain the user-latent feature matrix and the latent feature-text matrix;
- the function construction submodule 1033 is configured to construct an objective function based on the user-text matrix.
- singular value decomposition is an important matrix decomposition in linear algebra, and singular value decomposition is a generalization of eigen decomposition on any matrix.
- $R^{(\Lambda)}$ represents the set of user rating data for texts in the user-text matrix $R$
- $p_m$ represents the latent feature corresponding to the m-th user in the user-latent feature matrix $P$
- $q_n$ represents the latent feature corresponding to the n-th text in the latent feature-text matrix $Q$
- $r_{m,n}$ represents the rating data of user m for text n; [symbol missing] represents the rating data of user m for text n in the rating data set
- $\lambda_2$ represents the regularization factor of the latent feature matrices.
- the gradient data is represented as:
- $g$ represents the gradient data
- $m$ represents the total number of training data in the current round
- $\theta$ represents the initial decision parameter
- $x^{(i)}$ represents the i-th training datum in the current round
- The updated speed is expressed as: $v_{new} = \alpha v_{old} - \epsilon g$
- $v_{new}$ represents the updated speed
- $v_{old}$ represents the initial speed parameter
- $\alpha$ represents the momentum parameter
- $\epsilon$ represents the learning rate
- $g$ represents the gradient data.
- The updated decision parameter is expressed as: $\theta_{new} = \theta_{old} + v_{new}$
- $\theta_{new}$ represents the updated decision parameter
- $\theta_{old}$ represents the initial decision parameter
- $v_{new}$ represents the updated speed
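A single speed and decision parameter update can be sketched as below. The speed update follows the formula stated in the description; the parameter update $\theta_{new} = \theta_{old} + v_{new}$ is the conventional momentum form and is assumed here. The function name `momentum_step` and all numeric values are illustrative only.

```python
import numpy as np

def momentum_step(theta_old, v_old, g, alpha, eps):
    """One momentum update: returns (theta_new, v_new)."""
    v_new = alpha * v_old - eps * g   # updated speed
    theta_new = theta_old + v_new     # updated decision parameter (assumed form)
    return theta_new, v_new

# Illustrative values (not taken from the application).
theta_old = np.array([1.0, -2.0])
v_old = np.array([0.5, 0.0])
g = np.array([2.0, -4.0])

theta_new, v_new = momentum_step(theta_old, v_old, g, alpha=0.9, eps=0.1)
```

Note that even when `g` is zero, `v_new` equals `alpha * v_old`, so the parameter still moves on historical momentum alone; this is the behaviour the gradient check is meant to guard against.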
- The target model obtaining module 110 specifically includes a difference calculation submodule, a convergence judgment submodule, a convergence confirmation submodule, and a non-convergence confirmation submodule, wherein:
- the difference calculation submodule is configured to calculate the decision parameter difference between the initial decision parameter and the updated decision parameter;
- the convergence judgment submodule is configured to judge whether the decision parameter difference is less than or equal to the preset convergence threshold;
- the convergence confirmation submodule is configured to determine that the current prediction model has converged if the decision parameter difference is less than or equal to the preset convergence threshold, and to use the current prediction model as the target prediction model;
- the non-convergence confirmation submodule is configured to determine that the current prediction model has not converged and to continue the parameter optimization operation if the decision parameter difference is greater than the preset convergence threshold.
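The convergence test carried out by these submodules can be sketched as follows. The description does not say how the decision parameter difference is measured; the Euclidean norm used here, and the function name `has_converged`, are assumptions for illustration.

```python
import numpy as np

def has_converged(theta_old, theta_new, threshold):
    """Return True when the decision parameter difference is within the threshold.

    The difference is taken as the Euclidean norm of (theta_new - theta_old),
    which is an assumed choice of measure.
    """
    diff = float(np.linalg.norm(np.asarray(theta_new) - np.asarray(theta_old)))
    return diff <= threshold
```

A caller would keep running parameter updates while `has_converged` returns `False` and take the current model as the target prediction model once it returns `True`.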
- The model optimization apparatus applied to momentum gradient descent includes: a request receiving module, configured to receive a model optimization request sent by a user terminal, where the model optimization request carries at least the original prediction model and the original training data set; a sampling operation module, configured to perform a sampling operation on the original training data set to obtain the current-round training data set; a function definition module, configured to define the objective function based on the current-round training data set; an initialization module, configured to initialize the model optimization parameters of the original prediction model to obtain the initial speed parameter and the initial decision parameter; a gradient calculation module, configured to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in the current round; a gradient judgment module, configured to judge whether the gradient data has been updated; an abnormality confirmation module, configured to output a sampling abnormality signal if the gradient data has not been updated; a speed parameter update module, configured to update the initial speed parameter based on the gradient data to obtain the updated speed if the gradient data has been updated; a decision parameter update module, configured to update the initial decision parameter based on the updated speed to obtain the updated decision parameter; and a target model acquisition module, configured to obtain the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition.
- In stochastic gradient descent with momentum, training data that is not sampled in the current round is still updated using historical momentum during that round's gradient update, which may lead to overfitting of the Embedding layer.
- Before updating, the present application checks whether the gradient data has been updated, thereby confirming that the training data of the round has actually been sampled, and only then performs the gradient update operation. This effectively avoids the problem that words not sampled in the current batch are still updated with historical momentum during training, which causes the Embedding layer to overfit.
- FIG. 6 is a block diagram of the basic structure of a computer device according to this embodiment.
- The computer device 200 includes a memory 210, a processor 220, and a network interface 230 that communicate with each other through a system bus. It should be noted that only the computer device 200 with components 210-230 is shown in the figure, but it should be understood that not all of the shown components need to be implemented, and more or fewer components may be implemented instead.
- The computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), and embedded devices.
- the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
- the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
- The memory 210 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like.
- The computer-readable storage medium may be non-volatile or volatile.
- the memory 210 may be an internal storage unit of the computer device 200 , such as a hard disk or a memory of the computer device 200 .
- the memory 210 may also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
- the memory 210 may also include both the internal storage unit of the computer device 200 and its external storage device.
- The memory 210 is generally used to store the operating system and the various application software installed on the computer device 200, such as the computer-readable instructions of the model optimization method applied to momentum gradient descent.
- the memory 210 can also be used to temporarily store various types of data that have been output or will be output.
- the processor 220 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
- the processor 220 is typically used to control the overall operation of the computer device 200 .
- The processor 220 is configured to execute the computer-readable instructions stored in the memory 210 or to process data, for example, to execute the computer-readable instructions of the model optimization method applied to momentum gradient descent.
- the network interface 230 may include a wireless network interface or a wired network interface, and the network interface 230 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
- The present application also provides another embodiment, namely a computer-readable storage medium. The computer-readable storage medium stores computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the model optimization method applied to momentum gradient descent as described above.
- The method of the above embodiment can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, but in many cases the former is the better implementation.
- The technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
Abstract
Provided are a model optimization method and apparatus applied to gradient descent with momentum, and a computer device and a storage medium. The method comprises: receiving a model optimization request sent by a user terminal, the model optimization request at least carrying an original prediction model and an original training data set (S101); performing a sampling operation on the original training data set to obtain a current round of training data set (S102); defining a target function on the basis of the current round of training data set (S103); initializing model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter (S104); calculating gradient data corresponding to the initial decision parameter that needs to be updated in the current round (S105); determining whether the gradient data has been updated (S106); if the gradient data has not been updated, outputting a sampling abnormality signal (S107); if the gradient data has been updated, updating the initial speed parameter on the basis of the gradient data to obtain an updated speed (S108); updating the initial decision parameter on the basis of the updated speed to obtain an updated decision parameter (S109); and when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model (S110). The problem of over-fitting of an embedding layer caused by the use of historical momentum to update the words that have not been sampled in the current batch during training can be effectively avoided.
Description
This application is based on, and claims priority to, Chinese invention patent application No. 202011359384.8, filed on November 27, 2020 and entitled "A Model Optimization Method, Apparatus, Computer Device and Storage Medium".
The present application relates to model optimization in artificial intelligence, and in particular to a model optimization method and apparatus applied to momentum gradient descent, a computer device, and a storage medium.
Optimization is one of the most important research directions in computational mathematics. In the field of deep learning, the optimization algorithm is likewise a key component. Even with exactly the same data set and model architecture, different optimization algorithms are likely to lead to different training results, and some models may even fail to converge.
In one existing model optimization method, an exponentially weighted moving average is used during deep learning model training: the model is trained based on momentum that accumulates historical gradients, so as to improve the accuracy of the model.
However, the applicant has found that traditional model optimization methods are generally not intelligent: the Embedding layer may overfit during model optimization.
SUMMARY OF THE INVENTION
The purpose of the embodiments of the present application is to provide a model optimization method and apparatus applied to momentum gradient descent, a computer device, and a storage medium, so as to solve the problem that the Embedding layer overfits during model optimization in traditional model optimization methods.
In order to solve the above technical problem, an embodiment of the present application provides a model optimization method applied to momentum gradient descent, which adopts the following technical solution:
receiving a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set;
performing a sampling operation on the original training data set to obtain a current-round training data set;
defining an objective function based on the current-round training data set;
initializing model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter;
calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
determining whether the gradient data has been updated;
if the gradient data has not been updated, outputting a sampling abnormality signal;
if the gradient data has been updated, updating the initial speed parameter based on the gradient data to obtain an updated speed;
updating the initial decision parameter based on the updated speed to obtain an updated decision parameter; and
when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.
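The sequence of steps above can be sketched end to end as follows. This is a minimal illustration under several assumptions, not the claimed implementation: a least-squares loss stands in for the objective function, "gradient data not updated" is interpreted as an all-zero gradient, the parameter update is the conventional $\theta_{new} = \theta_{old} + v_{new}$, and the names `SamplingAnomaly` and `optimize` are hypothetical.

```python
import numpy as np

class SamplingAnomaly(Exception):
    """Hypothetical stand-in for the sampling abnormality signal."""

def optimize(X, y, alpha=0.9, eps=0.05, batch=4, tol=1e-6,
             max_rounds=10_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])   # initial decision parameter
    v = np.zeros_like(theta)       # initial speed parameter
    for _ in range(max_rounds):
        # Sample the current-round training data set.
        idx = rng.choice(len(X), size=batch, replace=False)
        xb, yb = X[idx], y[idx]
        # Gradient of the assumed objective: mean of 0.5 * (x . theta - y)^2.
        g = xb.T @ (xb @ theta - yb) / batch
        # An all-zero gradient is treated as "not updated".
        if not np.any(g):
            raise SamplingAnomaly("gradient data was not updated")
        v = alpha * v - eps * g        # updated speed
        theta_new = theta + v          # updated decision parameter
        # Convergence condition: parameter change within the threshold.
        if np.linalg.norm(theta_new - theta) <= tol:
            return theta_new
        theta = theta_new
    return theta
```

Raising an exception is only one way to surface the abnormality signal; a logging call or a returned status flag would serve equally well in this sketch.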
In order to solve the above technical problem, an embodiment of the present application further provides a model optimization apparatus applied to momentum gradient descent, which adopts the following technical solution:
a request receiving module, configured to receive a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set;
a sampling operation module, configured to perform a sampling operation on the original training data set to obtain a current-round training data set;
a function definition module, configured to define an objective function based on the current-round training data set;
an initialization module, configured to initialize model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter;
a gradient calculation module, configured to calculate the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
a gradient judgment module, configured to determine whether the gradient data has been updated;
an abnormality confirmation module, configured to output a sampling abnormality signal if the gradient data has not been updated;
a speed parameter update module, configured to update the initial speed parameter based on the gradient data to obtain an updated speed if the gradient data has been updated;
a decision parameter update module, configured to update the initial decision parameter based on the updated speed to obtain an updated decision parameter; and
a target model obtaining module, configured to obtain a target prediction model when the initial decision parameter and the updated decision parameter satisfy a convergence condition.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solution:
The computer device includes a memory and a processor. The memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the following steps of the model optimization method applied to momentum gradient descent:
receiving a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set;
performing a sampling operation on the original training data set to obtain a current-round training data set;
defining an objective function based on the current-round training data set;
initializing model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter;
calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
determining whether the gradient data has been updated;
if the gradient data has not been updated, outputting a sampling abnormality signal;
if the gradient data has been updated, updating the initial speed parameter based on the gradient data to obtain an updated speed;
updating the initial decision parameter based on the updated speed to obtain an updated decision parameter; and
when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solution:
The computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement the following steps of the model optimization method applied to momentum gradient descent:
receiving a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set;
performing a sampling operation on the original training data set to obtain a current-round training data set;
defining an objective function based on the current-round training data set;
initializing model optimization parameters of the original prediction model to obtain an initial speed parameter and an initial decision parameter;
calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round;
determining whether the gradient data has been updated;
if the gradient data has not been updated, outputting a sampling abnormality signal;
if the gradient data has been updated, updating the initial speed parameter based on the gradient data to obtain an updated speed;
updating the initial decision parameter based on the updated speed to obtain an updated decision parameter; and
when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.
Compared with the prior art, the model optimization method and apparatus applied to momentum gradient descent, the computer device, and the storage medium provided by the embodiments of the present application mainly have the following beneficial effects:
The present application provides a model optimization method applied to momentum gradient descent, comprising: receiving a model optimization request sent by a user terminal, where the model optimization request carries at least an original prediction model and an original training data set; performing a sampling operation on the original training data set to obtain a current-round training data set; defining an objective function based on the current-round training data set; initializing model optimization algorithm parameters to obtain an initial speed parameter and an initial decision parameter; calculating the gradient data corresponding to the initial decision parameter that needs to be updated in the current round; determining whether the gradient data has been updated; if the gradient data has not been updated, outputting a sampling abnormality signal; if the gradient data has been updated, updating the initial speed parameter based on the gradient data to obtain an updated speed; updating the initial decision parameter based on the updated speed to obtain an updated decision parameter; and when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model. In stochastic gradient descent with momentum, training data that is not sampled in the current round is still updated using historical momentum during that round's gradient update, which may cause the Embedding layer to overfit. Before updating, the present application checks whether the gradient data has been updated, thereby confirming that the training data of the round has actually been sampled, and only then performs the gradient update operation. This effectively avoids the problem that words not sampled in the current batch are still updated with historical momentum during training, which causes the Embedding layer to overfit.
In order to illustrate the solutions in the present application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the model optimization method applied to momentum gradient descent provided by Embodiment 1 of the present application;
FIG. 2 is a flow chart of step S103 in FIG. 1;
FIG. 3 is a flow chart of step S110 in FIG. 1;
FIG. 4 is a schematic structural diagram of the model optimization apparatus applied to momentum gradient descent provided by Embodiment 2 of the present application;
FIG. 5 is a schematic structural diagram of the function definition module 103 in FIG. 4;
FIG. 6 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are for the purpose of describing specific embodiments only and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second", and the like in the specification, the claims, or the above drawings are used to distinguish different objects, not to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
In order to help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
Embodiment 1
FIG. 1 shows a flow chart of the model optimization method applied to momentum gradient descent provided by Embodiment 1 of the present application. For convenience of description, only the parts related to the present application are shown.
In step S101, a model optimization request sent by a user terminal is received, where the model optimization request carries at least the original prediction model and the original training data set.
In this embodiment of the present application, the user terminal refers to a terminal device used to execute the model optimization method applied to momentum gradient descent provided by the present application. The user terminal may be a mobile terminal such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a navigation device, or a fixed terminal such as a digital TV or a desktop computer. It should be understood that these examples of the user terminal are given only for ease of understanding and are not intended to limit the present application.
In this embodiment of the present application, the original prediction model is a prediction model that has not yet been optimized by gradient descent.
In step S102, a sampling operation is performed on the original training data set to obtain the current-round training data set.
In this embodiment of the present application, the sampling operation refers to the process of drawing individuals or samples from the overall training data, that is, the process of testing or observing the overall training data. Sampling falls into two types: random sampling and non-random sampling. The former draws samples from the population according to the principle of randomization, without any subjectivity, and includes simple random sampling, systematic sampling, cluster sampling, and stratified sampling. The latter draws samples based on the researcher's views, experience, or relevant knowledge, and is clearly subjective.
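Three of the random sampling schemes named above can be sketched with the Python standard library as follows. The function names, the fixed seeds, and the grouping key are assumptions chosen for illustration; the application does not prescribe a particular scheme.

```python
import random

def simple_random_sample(data, k, seed=0):
    """Draw k distinct items uniformly at random."""
    return random.Random(seed).sample(data, k)

def systematic_sample(data, step, start=0):
    """Take every step-th item, beginning at index start."""
    return data[start::step]

def stratified_sample(data, key, per_stratum, seed=0):
    """Group items by key(item), then draw up to per_stratum items from each group."""
    rng = random.Random(seed)
    strata = {}
    for item in data:
        strata.setdefault(key(item), []).append(item)
    picked = []
    for group in strata.values():
        picked.extend(rng.sample(group, min(per_stratum, len(group))))
    return picked
```

For example, `systematic_sample(list(range(10)), 3)` keeps the items at indices 0, 3, 6, and 9, while `stratified_sample` guarantees that every group defined by `key` contributes to the current-round training data set.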
In this embodiment of the present application, the current-round training data set refers to the smaller training data set selected by the above sampling operation, which reduces the training time of the model.
In step S103, an objective function is defined based on the current-round training data set.
In this embodiment of the present application, a user-text matrix $R$ may be generated based on a data set of user texts; the user-text matrix $R$ is decomposed by singular value decomposition to obtain a user-latent feature matrix $P$ and a latent feature-text matrix $Q$; and an objective function is constructed based on the user-text matrix $R$. The objective function is defined over the following quantities:
$R^{(\Lambda)}$ represents the set of user rating data for texts in the user-text matrix $R$; $p_m$ represents the latent feature corresponding to the m-th user in the user-latent feature matrix $P$; $q_n$ represents the latent feature corresponding to the n-th text in the latent feature-text matrix $Q$; $r_{m,n}$ represents the rating data of user m for text n; [symbol missing] represents the rating data of user m for text n in the rating data set; $\lambda_2$ represents the regularization factor of the latent feature matrices.
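For orientation, a commonly used regularized matrix-factorization objective that is consistent with the symbols defined above can be written as follows. This form is an assumption: the formula itself is not reproduced in this text, $\Lambda$ is taken here to be the index set of observed ratings, and the exact weighting of the regularization term in the application may differ.

```latex
\min_{P,\,Q}\;
  \sum_{(m,n)\in\Lambda} \left( r_{m,n} - p_m^{\top} q_n \right)^2
  + \lambda_2 \left( \lVert P \rVert_F^2 + \lVert Q \rVert_F^2 \right)
```

The first term penalizes the error between each observed rating and the inner product of the corresponding user and text latent features; the second term, scaled by $\lambda_2$, discourages large latent feature values.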
在步骤S104中,初始化原始预测模型的模型优化参数,得到初始速度参数以及初始决策参数。In step S104, the model optimization parameters of the original prediction model are initialized to obtain initial speed parameters and initial decision parameters.
在本申请实施例中,初始化就是把变量赋为默认值,把控件设为默认状态,具体的,包括初始化学习率∈、动量参数α、初始决策参数θ和初始速度v。In the embodiment of the present application, initialization is to assign a variable to a default value and a control to a default state. Specifically, it includes an initialization learning rate ∈, a momentum parameter α, an initial decision parameter θ, and an initial velocity v.
In step S105, the gradient data for the initial decision parameter to be updated in the current round is calculated.
In this embodiment of the present application, the gradient data is expressed as:
[formula not reproduced in the source text]
where $g$ denotes the gradient data; $m$ denotes the total number of training samples in the current round; $\theta$ denotes the initial decision parameter; $x^{(i)}$ denotes the i-th training sample of the current round; and a further symbol, also not reproduced, denotes the objective function.
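The gradient formula is likewise missing from this text. Based on the symbols defined above, a mini-batch gradient is typically an average over the $m$ samples of the current round. The sketch below uses $J$ as a placeholder name for the objective function, which is an assumption; the application's own symbol is not recoverable.

```latex
g = \frac{1}{m} \, \nabla_{\theta} \sum_{i=1}^{m} J\!\left( x^{(i)}; \theta \right)
```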
In step S106, it is determined whether the gradient data has been updated.
In this embodiment of the present application, once a training sample has been sampled, the gradient of its Embedding is non-zero. Based on this property, checking whether the gradient data has been updated reveals whether the training sample was sampled.
In step S107, if the gradient data has not been updated, a sampling abnormality signal is output.
In this embodiment of the present application, gradient data that has not been updated indicates that the training sample was not sampled before the subsequent update operation. If the update were performed anyway, the Embedding layer corresponding to training data that is not being sampled would still be updated repeatedly from historical momentum, which leads to overfitting.
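The check described in steps S106 and S107 can be sketched as follows. This is a minimal illustration, assuming the gradient of the Embedding layer is available as one row per embedding and that "not updated" means an all-zero row. The function names are hypothetical, and modelling the sampling abnormality signal as a raised exception is an assumption.

```python
from typing import List, Sequence


def rows_with_updated_gradient(grad: Sequence[Sequence[float]], tol: float = 0.0) -> List[bool]:
    """Return one flag per embedding row: True if any gradient component is non-zero."""
    return [any(abs(component) > tol for component in row) for row in grad]


def check_sampling(grad: Sequence[Sequence[float]], expected_rows: Sequence[int]) -> List[bool]:
    """Raise if a row that should have been sampled has an all-zero gradient."""
    flags = rows_with_updated_gradient(grad)
    missing = [row for row in expected_rows if not flags[row]]
    if missing:
        # Stand-in for the "sampling abnormality signal" of step S107 (assumption).
        raise RuntimeError(f"sampling abnormality: no gradient for rows {missing}")
    return flags


# Three embedding rows; only rows 0 and 2 occurred in the current batch.
grad = [[0.1, -0.2], [0.0, 0.0], [0.3, 0.0]]
flags = rows_with_updated_gradient(grad)
```

Here `flags` marks row 1 as not updated, so a caller could skip the momentum update for that row or report it.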
In step S108, if the gradient data has been updated, the initial velocity parameter is updated based on the gradient data to obtain the updated velocity.
In this embodiment of the present application, the updated velocity is expressed as:
$$v_{new} = \alpha v_{old} - \epsilon g$$
where $v_{new}$ denotes the updated velocity; $v_{old}$ denotes the initial velocity parameter; $\alpha$ denotes the momentum parameter; $\epsilon$ denotes the learning rate; and $g$ denotes the gradient data.
In step S109, the initial decision parameter is updated based on the updated velocity to obtain the updated decision parameter.
In this embodiment of the present application, the updated decision parameter is expressed as:
$$\theta_{new} = \theta_{old} + v_{new}$$
where $\theta_{new}$ denotes the updated decision parameter; $\theta_{old}$ denotes the initial decision parameter; and $v_{new}$ denotes the updated velocity.
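The two update rules of steps S108 and S109 translate directly into code. The sketch below applies them element-wise to plain Python lists; the function name and the default values for the learning rate and momentum parameter are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def momentum_step(theta: Sequence[float], velocity: Sequence[float], grad: Sequence[float],
                  lr: float = 0.1, alpha: float = 0.9) -> Tuple[List[float], List[float]]:
    """One momentum update: v_new = alpha * v_old - lr * g, then theta_new = theta_old + v_new."""
    new_velocity = [alpha * v - lr * g for v, g in zip(velocity, grad)]
    new_theta = [t + v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


# Starting from zero velocity, the first step is a plain gradient step of size lr * g.
theta, velocity = momentum_step([1.0, -2.0], [0.0, 0.0], [0.5, -1.0], lr=0.1, alpha=0.9)
```

On later steps the term `alpha * v` carries part of the previous step forward, which is why a parameter keeps moving even when its current gradient is zero.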
In step S110, when the initial decision parameter and the updated decision parameter satisfy a convergence condition, the target prediction model is obtained.
The model optimization method applied to momentum gradient descent provided by Embodiment 1 of the present application receives a model optimization request sent by a user terminal, the request carrying at least an original prediction model and an original training data set; performs a sampling operation on the original training data set to obtain the current-round training data set; defines an objective function based on the current-round training data set; initializes the parameters of the model optimization algorithm to obtain an initial velocity parameter and an initial decision parameter; calculates the gradient data for the initial decision parameter to be updated in the current round; determines whether the gradient data has been updated; outputs a sampling abnormality signal if the gradient data has not been updated; if the gradient data has been updated, updates the initial velocity parameter based on the gradient data to obtain the updated velocity; updates the initial decision parameter based on the updated velocity to obtain the updated decision parameter; and obtains the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition. In stochastic gradient descent with momentum, parameters whose training data was not sampled in the current round are nevertheless updated from historical momentum in that round, which may cause the Embedding layer to overfit. Before performing the update, the present application checks whether the gradient data has been updated, thereby confirming that the training data of the round was in fact sampled, and only then performs the update. This effectively avoids the problem that words not sampled in the current batch are still updated from historical momentum, causing the Embedding layer to overfit.
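The overall flow of steps S105 to S110 can be sketched as a small training loop. This is an illustration under several assumptions rather than the application's implementation: each embedding row is a single number, the loss is a toy squared error, batches are supplied by the caller, and a row with zero gradient is simply skipped and recorded in place of a sampling abnormality signal. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Sample = Tuple[int, float]  # (embedding row index, target value); a toy stand-in for real data


def batch_gradient(theta: Sequence[float], batch: Sequence[Sample]) -> Dict[int, float]:
    """Mean gradient of the toy loss (theta[row] - target)**2 over the batch, keyed by row.

    Rows that do not occur in the batch get no entry, i.e. their gradient is zero.
    """
    grad: Dict[int, float] = {}
    for row, target in batch:
        grad[row] = grad.get(row, 0.0) + 2.0 * (theta[row] - target) / len(batch)
    return grad


def train(theta: Sequence[float], batches: Iterable[Sequence[Sample]],
          lr: float = 0.02, alpha: float = 0.5, tol: float = 1e-6):
    theta = list(theta)
    velocity = [0.0] * len(theta)
    skipped: List[Tuple[int, int]] = []  # (round, row) pairs whose gradient was not updated
    for round_no, batch in enumerate(batches):
        grad = batch_gradient(theta, batch)
        max_change, updated = 0.0, False
        for row in range(len(theta)):
            g = grad.get(row, 0.0)
            if g == 0.0:
                # Gradient not updated: do not apply historical momentum to this row.
                skipped.append((round_no, row))
                continue
            velocity[row] = alpha * velocity[row] - lr * g  # velocity update
            theta[row] += velocity[row]                     # decision parameter update
            max_change = max(max_change, abs(velocity[row]))
            updated = True
        if updated and max_change <= tol:  # convergence: parameters barely moved this round
            break
    return theta, skipped


data = [(0, 1.0), (1, -2.0)]                    # row 2 never occurs in the data
batches = ([data[i % 2]] for i in range(2000))  # rounds alternate between the two samples
theta, skipped = train([0.0, 0.0, 5.0], batches)
```

Rows 0 and 1 move toward their targets, while row 2, which is never sampled, keeps its initial value because no momentum is applied to it.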
With continued reference to FIG. 2, a flowchart of an implementation of step S103 in FIG. 1 is shown. For ease of description, only the parts relevant to the present application are shown.
In some optional implementations of Embodiment 1 of the present application, step S103 specifically includes step S201, step S202, and step S203.
In step S201, a user-text matrix R is generated from a data set of user texts.
In step S202, the user-text matrix R is decomposed by singular value decomposition to obtain a user-latent-feature matrix P and a latent-feature-text matrix Q.
In this embodiment of the present application, singular value decomposition (SVD) is an important matrix factorization in linear algebra; it generalizes eigendecomposition to arbitrary matrices.
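The application does not spell out how P and Q are derived from the decomposition. One common way, shown here as an assumption, is to truncate the SVD to the $k$ largest singular values and split the singular values evenly between the two factors; the rank $k$ (the number of latent features) is a hypothetical parameter.

```latex
R \approx U_k \Sigma_k V_k^{\top},
\qquad P = U_k \Sigma_k^{1/2},
\qquad Q = \Sigma_k^{1/2} V_k^{\top},
\qquad \text{so that } R \approx P Q
```

With this choice, row $m$ of $P$ plays the role of the user latent feature $p_m$ and column $n$ of $Q$ the role of the text latent feature $q_n$.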
In step S203, the objective function is constructed based on the user-text matrix R.
In this embodiment of the present application, the objective function is expressed as:
[formula not reproduced in the source text]
where $R(\Lambda)$ denotes the set of user ratings of texts in the user-text matrix R; $p_m$ denotes the latent feature of the m-th user in the user-latent-feature matrix P; $q_n$ denotes the latent feature of the n-th text in the latent-feature-text matrix Q; $r_{m,n}$ denotes the rating of text n by user m; a further symbol, also not reproduced, denotes the rating of text n by user m within the rating data set; and $\lambda_2$ denotes the regularization factor of the latent feature matrices.
With continued reference to FIG. 3, a flowchart of an implementation of step S110 in FIG. 1 is shown. For ease of description, only the parts relevant to the present application are shown.
In some optional implementations of Embodiment 1 of the present application, step S110 specifically includes step S301, step S302, step S303, and step S304.
In step S301, the decision parameter difference between the initial decision parameter and the updated decision parameter is calculated.
In this embodiment of the present application, the decision parameter difference measures how much the model parameters changed between the previous round and the current round. When this change falls below a certain value, the decision parameter is considered to have settled at a stable value, and the prediction model is considered stable.
In step S302, it is determined whether the decision parameter difference is less than a preset convergence threshold.
In this embodiment of the present application, the user may adjust the preset convergence threshold according to the actual situation.
In step S303, if the decision parameter difference is less than or equal to the preset convergence threshold, the current prediction model is determined to have converged and is taken as the target prediction model.
In this embodiment of the present application, a decision parameter difference less than or equal to the preset convergence threshold indicates that the decision parameter has settled at a stable value and that the prediction model is stable.
In step S304, if the decision parameter difference is greater than the preset convergence threshold, the current prediction model is determined not to have converged, and the parameter optimization operation continues.
In this embodiment of the present application, a decision parameter difference greater than the preset convergence threshold indicates that the decision parameter has not yet reached a stable value and that the parameters of the prediction model still need to be optimized.
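Steps S301 to S304 amount to comparing a parameter difference with a threshold. The sketch below is one possible reading: the application does not say how the difference of a parameter vector is measured, so the largest absolute component-wise change is used here as an assumption, and the function name and default threshold are hypothetical.

```python
from typing import Sequence


def has_converged(theta_old: Sequence[float], theta_new: Sequence[float],
                  threshold: float = 1e-6) -> bool:
    """Return True if the decision parameters changed by at most `threshold`.

    The difference is measured as the largest absolute component-wise change,
    which is an assumption; other norms would work the same way.
    """
    difference = max(abs(new - old) for old, new in zip(theta_old, theta_new))
    return difference <= threshold


stable = has_converged([1.0, 2.0], [1.0, 2.0000001])
still_moving = has_converged([1.0, 2.0], [1.1, 2.0])
```

A training loop would call this once per round and stop, keeping the current model as the target prediction model, as soon as it returns True.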
In some optional implementations of Embodiment 1 of the present application, the gradient data is expressed as:
[formula not reproduced in the source text]
where $g$ denotes the gradient data; $m$ denotes the total number of training samples in the current round; $\theta$ denotes the initial decision parameter; $x^{(i)}$ denotes the i-th training sample of the current round; and a further symbol, also not reproduced, denotes the objective function.
In some optional implementations of Embodiment 1 of the present application, the updated velocity is expressed as:
$$v_{new} = \alpha v_{old} - \epsilon g$$
where $v_{new}$ denotes the updated velocity; $v_{old}$ denotes the initial velocity parameter; $\alpha$ denotes the momentum parameter; $\epsilon$ denotes the learning rate; and $g$ denotes the gradient data.
In some optional implementations of Embodiment 1 of the present application, the updated decision parameter is expressed as:
$$\theta_{new} = \theta_{old} + v_{new}$$
where $\theta_{new}$ denotes the updated decision parameter; $\theta_{old}$ denotes the initial decision parameter; and $v_{new}$ denotes the updated velocity.
In summary, the model optimization method applied to momentum gradient descent provided by Embodiment 1 of the present application receives a model optimization request sent by a user terminal, the request carrying at least an original prediction model and an original training data set; performs a sampling operation on the original training data set to obtain the current-round training data set; defines an objective function based on the current-round training data set; initializes the parameters of the model optimization algorithm to obtain an initial velocity parameter and an initial decision parameter; calculates the gradient data for the initial decision parameter to be updated in the current round; determines whether the gradient data has been updated; outputs a sampling abnormality signal if the gradient data has not been updated; if the gradient data has been updated, updates the initial velocity parameter based on the gradient data to obtain the updated velocity; updates the initial decision parameter based on the updated velocity to obtain the updated decision parameter; and obtains the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition. In stochastic gradient descent with momentum, parameters whose training data was not sampled in the current round are nevertheless updated from historical momentum in that round, which may cause the Embedding layer to overfit. Before performing the update, the present application checks whether the gradient data has been updated, thereby confirming that the training data of the round was in fact sampled, and only then performs the update. This effectively avoids the problem that words not sampled in the current batch are still updated from historical momentum, causing the Embedding layer to overfit.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by computer-readable instructions that direct the relevant hardware. The computer-readable instructions may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM), among others.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed one after another but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Embodiment 2
With further reference to FIG. 4, as an implementation of the method shown in FIG. 1, the present application provides an embodiment of a model optimization apparatus applied to momentum gradient descent. This apparatus embodiment corresponds to the method embodiment shown in FIG. 1, and the apparatus can be applied in various electronic devices.
As shown in FIG. 4, the model optimization apparatus 100 applied to momentum gradient descent in this embodiment includes: a request receiving module 101, a sampling operation module 102, a function definition module 103, an initialization module 104, a gradient calculation module 105, a gradient judgment module 106, an abnormality confirmation module 107, a velocity parameter update module 108, a decision parameter update module 109, and a target model acquisition module 110. Wherein:
the request receiving module 101 is configured to receive a model optimization request sent by a user terminal, the model optimization request carrying at least an original prediction model and an original training data set;
the sampling operation module 102 is configured to perform a sampling operation on the original training data set to obtain the current-round training data set;
the function definition module 103 is configured to define an objective function based on the current-round training data set;
the initialization module 104 is configured to initialize the model optimization parameters of the original prediction model to obtain an initial velocity parameter and an initial decision parameter;
the gradient calculation module 105 is configured to calculate the gradient data for the initial decision parameter to be updated in the current round;
the gradient judgment module 106 is configured to determine whether the gradient data has been updated;
the abnormality confirmation module 107 is configured to output a sampling abnormality signal if the gradient data has not been updated;
the velocity parameter update module 108 is configured to update, if the gradient data has been updated, the initial velocity parameter based on the gradient data to obtain the updated velocity;
the decision parameter update module 109 is configured to update the initial decision parameter based on the updated velocity to obtain the updated decision parameter;
the target model acquisition module 110 is configured to obtain the target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition.
In this embodiment of the present application, the user terminal refers to a terminal device used to carry out the model optimization method applied to momentum gradient descent provided by the present application. The terminal may be a mobile terminal such as a mobile phone, a smartphone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), or a navigation device, or a fixed terminal such as a digital TV or a desktop computer. It should be understood that these examples of user terminals are given only to aid understanding and do not limit the present application.
In this embodiment of the present application, the original prediction model refers to a prediction model that has not yet been optimized by gradient descent.
In this embodiment of the present application, the sampling operation refers to the process of drawing individuals or samples from the overall training data, that is, the process of testing or observing the overall training data. There are two types: random sampling and non-random sampling. Random sampling draws samples from the population according to the principle of randomization, without any subjectivity, and includes simple random sampling, systematic sampling, cluster sampling, and stratified sampling. Non-random sampling draws samples according to the researcher's views, experience, or relevant knowledge, and is clearly subjective.
In this embodiment of the present application, the current-round training data set is the smaller training data set selected by the sampling operation described above; using it reduces the training time of the model.
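Two of the random sampling schemes named above can be sketched with the Python standard library. The function names, the `seed` and `start` parameters, and the fixed-step form of systematic sampling are illustrative assumptions; the application does not prescribe a particular scheme.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def simple_random_sample(dataset: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw k distinct items uniformly at random (simple random sampling)."""
    rng = random.Random(seed)  # a private generator keeps the draw reproducible for a given seed
    return rng.sample(list(dataset), k)


def systematic_sample(dataset: Iterable[T], k: int, start: int = 0) -> List[T]:
    """Take every (n // k)-th item beginning at `start` (systematic sampling).

    `start` should be smaller than the step n // k so that k items are available.
    """
    data = list(dataset)
    if not 0 < k <= len(data):
        raise ValueError("k must be between 1 and the size of the data set")
    step = len(data) // k
    return data[start::step][:k]


round_data = simple_random_sample(range(100), 10, seed=0)
every_other = systematic_sample(range(10), 5)
```

Either function returns a smaller current-round data set drawn from the original training data, which is then used for one round of training.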
In this embodiment of the present application, a user-text matrix R may be generated from a data set of user texts. R is then decomposed by singular value decomposition into a user-latent-feature matrix P and a latent-feature-text matrix Q, and the objective function is constructed from the user-text matrix R. The objective function is expressed as:
[formula not reproduced in the source text]
where $R(\Lambda)$ denotes the set of user ratings of texts in the user-text matrix R; $p_m$ denotes the latent feature of the m-th user in the user-latent-feature matrix P; $q_n$ denotes the latent feature of the n-th text in the latent-feature-text matrix Q; $r_{m,n}$ denotes the rating of text n by user m; a further symbol, also not reproduced, denotes the rating of text n by user m within the rating data set; and $\lambda_2$ denotes the regularization factor of the latent feature matrices.
In this embodiment of the present application, initialization means assigning default values to variables and setting controls to their default state. Specifically, it covers the learning rate $\epsilon$, the momentum parameter $\alpha$, the initial decision parameter $\theta$, and the initial velocity $v$.
In this embodiment of the present application, the gradient data is expressed as:
[formula not reproduced in the source text]
where $g$ denotes the gradient data; $m$ denotes the total number of training samples in the current round; $\theta$ denotes the initial decision parameter; $x^{(i)}$ denotes the i-th training sample of the current round; and a further symbol, also not reproduced, denotes the objective function.
In this embodiment of the present application, once a training sample has been sampled, the gradient of its Embedding is non-zero. Based on this property, checking whether the gradient data has been updated reveals whether the training sample was sampled.
In this embodiment of the present application, gradient data that has not been updated indicates that the training sample was not sampled before the subsequent update operation. If the update were performed anyway, the Embedding layer corresponding to training data that is not being sampled would still be updated repeatedly from historical momentum, which leads to overfitting.
In this embodiment of the present application, the updated velocity is expressed as:
$$v_{new} = \alpha v_{old} - \epsilon g$$
where $v_{new}$ denotes the updated velocity; $v_{old}$ denotes the initial velocity parameter; $\alpha$ denotes the momentum parameter; $\epsilon$ denotes the learning rate; and $g$ denotes the gradient data.
In this embodiment of the present application, the updated decision parameter is expressed as:
$$\theta_{new} = \theta_{old} + v_{new}$$
where $\theta_{new}$ denotes the updated decision parameter; $\theta_{old}$ denotes the initial decision parameter; and $v_{new}$ denotes the updated velocity.
The model optimization apparatus applied to momentum gradient descent provided by Embodiment 2 of the present application addresses the following problem. In stochastic gradient descent with momentum, parameters whose training data was not sampled in the current round are nevertheless updated from historical momentum in that round, which may cause the Embedding layer to overfit. Before performing the update, the present application checks whether the gradient data has been updated, thereby confirming that the training data of the round was in fact sampled, and only then performs the update. This effectively avoids the problem that words not sampled in the current batch are still updated from historical momentum, causing the Embedding layer to overfit.
With continued reference to FIG. 5, a schematic structural diagram of the function definition module 103 in FIG. 4 is shown. For ease of description, only the parts relevant to the present application are shown.
In some optional implementations of Embodiment 2 of the present application, the function definition module 103 specifically includes: a matrix generation submodule 1031, a matrix decomposition submodule 1032, and a function construction submodule 1033. Wherein:
the matrix generation submodule 1031 is configured to generate a user-text matrix from a data set of user texts;
the matrix decomposition submodule 1032 is configured to decompose the user-text matrix by singular value decomposition to obtain a user-latent-feature matrix and a latent-feature-text matrix;
the function construction submodule 1033 is configured to construct the objective function based on the user-text matrix.
In this embodiment of the present application, singular value decomposition (SVD) is an important matrix factorization in linear algebra; it generalizes eigendecomposition to arbitrary matrices.
In this embodiment of the present application, the objective function is expressed as:
[formula not reproduced in the source text]
where $R(\Lambda)$ denotes the set of user ratings of texts in the user-text matrix R; $p_m$ denotes the latent feature of the m-th user in the user-latent-feature matrix P; $q_n$ denotes the latent feature of the n-th text in the latent-feature-text matrix Q; $r_{m,n}$ denotes the rating of text n by user m; a further symbol, also not reproduced, denotes the rating of text n by user m within the rating data set; and $\lambda_2$ denotes the regularization factor of the latent feature matrices.
In some optional implementations of Embodiment 2 of the present application, the gradient data is expressed as:
[formula not reproduced in the source text]
where $g$ denotes the gradient data; $m$ denotes the total number of training samples in the current round; $\theta$ denotes the initial decision parameter; $x^{(i)}$ denotes the i-th training sample of the current round; and a further symbol, also not reproduced, denotes the objective function.
In some optional implementations of Embodiment 2 of the present application, the updated velocity is expressed as:
$$v_{new} = \alpha v_{old} - \epsilon g$$
where $v_{new}$ denotes the updated velocity; $v_{old}$ denotes the initial velocity parameter; $\alpha$ denotes the momentum parameter; $\epsilon$ denotes the learning rate; and $g$ denotes the gradient data.
In some optional implementations of Embodiment 2 of the present application, the updated decision parameter is expressed as:
$$\theta_{new} = \theta_{old} + v_{new}$$
where $\theta_{new}$ denotes the updated decision parameter; $\theta_{old}$ denotes the initial decision parameter; and $v_{new}$ denotes the updated velocity.
In some optional implementations of Embodiment 2 of the present application, the target model acquisition module 110 specifically includes: a difference calculation submodule, a convergence judgment submodule, a convergence confirmation submodule, and a non-convergence confirmation submodule. Wherein:
the difference calculation submodule is configured to calculate the decision parameter difference between the initial decision parameter and the updated decision parameter;
the convergence judgment submodule is configured to determine whether the decision parameter difference is less than the preset convergence threshold;
the convergence confirmation submodule is configured to determine, if the decision parameter difference is less than or equal to the preset convergence threshold, that the current prediction model has converged, and to take the current prediction model as the target prediction model;
the non-convergence confirmation submodule is configured to determine, if the decision parameter difference is greater than the preset convergence threshold, that the current prediction model has not converged, and to continue the parameter optimization operation.
In summary, the model optimization apparatus applied to momentum gradient descent provided by Embodiment 2 of the present application includes: a request receiving module, configured to receive a model optimization request sent by a user terminal, the request carrying at least an original prediction model and an original training data set; a sampling operation module, configured to perform a sampling operation on the original training data set to obtain a current-round training data set; a function definition module, configured to define an objective function based on the current-round training data set; an initialization module, configured to initialize the model optimization parameters of the original prediction model to obtain an initial velocity parameter and an initial decision parameter; a gradient calculation module, configured to calculate the gradient data corresponding to the initial decision parameter to be updated in the current round; a gradient judgment module, configured to judge whether the gradient data has been updated; an abnormality confirmation module, configured to output a sampling abnormality signal if the gradient data has not been updated; a velocity parameter update module, configured to update the initial velocity parameter based on the gradient data to obtain an updated velocity if the gradient data has been updated; a decision parameter update module, configured to update the initial decision parameter based on the updated velocity to obtain an updated decision parameter; and a target model acquisition module, configured to obtain a target prediction model when the initial decision parameter and the updated decision parameter satisfy a convergence condition. When training with stochastic gradient descent with momentum, a parameter whose training data was not sampled in the current round is still updated with the historical momentum, which may cause the Embedding layer to overfit. Before applying a gradient update, the present application therefore checks whether the gradient data has been updated, which confirms that the corresponding training data was actually sampled in this round, and only then performs the update. This effectively avoids the problem that words not sampled in the current batch are still updated by historical momentum during training, causing the Embedding layer to overfit.
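The module chain summarized above can be sketched as a small training loop. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `train_gated_momentum`, the toy data, the per-row squared-error objective, and the choice to leave the velocity of an unsampled row untouched are all hypothetical. The list `skipped` stands in for the sampling abnormality signal.

```python
import random


def train_gated_momentum(targets, alpha=0.9, lr=0.1, batch_size=2,
                         tol=1e-6, max_rounds=5000, seed=0):
    """Fit one scalar per row id with momentum SGD that skips unsampled rows.

    targets: list of (row_id, value) pairs (hypothetical toy data).
    Returns the fitted parameters and a log of (round, row_id) skips.
    """
    rng = random.Random(seed)
    row_ids = sorted({rid for rid, _ in targets})
    theta = {rid: 0.0 for rid in row_ids}      # initial decision parameters
    velocity = {rid: 0.0 for rid in row_ids}   # initial velocity parameters
    skipped = []                               # stand-in for the abnormality signal

    for round_no in range(max_rounds):
        # Sampling operation: draw the current-round training data set.
        batch = rng.sample(targets, min(batch_size, len(targets)))

        # Gradient of the mean squared error; only sampled rows get an entry.
        grads = {}
        for rid, y in batch:
            grads[rid] = grads.get(rid, 0.0) + 2.0 * (theta[rid] - y) / len(batch)

        max_change = 0.0
        for rid in row_ids:
            if rid not in grads:               # gradient not updated this round
                skipped.append((round_no, rid))
                continue
            velocity[rid] = alpha * velocity[rid] - lr * grads[rid]
            updated = theta[rid] + velocity[rid]
            max_change = max(max_change, abs(updated - theta[rid]))
            theta[rid] = updated

        # Convergence: largest parameter change at or below the threshold.
        if max_change <= tol:
            break
    return theta, skipped


toy_targets = [("cat", 1.0), ("dog", 2.0), ("emu", 3.0)]
theta, skipped = train_gated_momentum(toy_targets)
```

With three rows and a batch of two, one row is left out every round, so `skipped` records one entry per round while the sampled rows follow the usual momentum update.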
To solve the above technical problem, an embodiment of the present application further provides a computer device. Refer to FIG. 6, which is a block diagram of the basic structure of the computer device according to this embodiment.
The computer device 200 includes a memory 210, a processor 220, and a network interface 230 that are communicatively connected to one another through a system bus. Note that the figure shows only the computer device 200 with components 210-230; not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, and the like.
The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device can interact with a user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.
The memory 210 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. The computer-readable storage medium may be non-volatile or volatile. In some embodiments, the memory 210 may be an internal storage unit of the computer device 200, such as its hard disk or internal memory. In other embodiments, the memory 210 may be an external storage device of the computer device 200, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 200. The memory 210 may also include both an internal storage unit and an external storage device of the computer device 200. In this embodiment, the memory 210 is generally used to store the operating system and the application software installed on the computer device 200, for example the computer-readable instructions of the model optimization method applied to momentum gradient descent. In addition, the memory 210 can be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 220 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 220 is generally used to control the overall operation of the computer device 200. In this embodiment, the processor 220 is configured to run the computer-readable instructions stored in the memory 210 or to process data, for example to run the computer-readable instructions of the model optimization method applied to momentum gradient descent.
The network interface 230 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 200 and other electronic devices.
The model optimization method applied to momentum gradient descent provided by the present application addresses the following problem: when training with stochastic gradient descent with momentum, a parameter whose training data was not sampled in the current round is still updated with the historical momentum, which may cause the Embedding layer to overfit. Before applying a gradient update, the present application checks whether the gradient data has been updated, which confirms that the corresponding training data was actually sampled in this round, and only then performs the update. This effectively avoids the problem that words not sampled in the current batch are still updated by historical momentum during training, causing the Embedding layer to overfit.
The present application further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the model optimization method applied to momentum gradient descent described above.
The model optimization method applied to momentum gradient descent provided by the present application addresses the following problem: when training with stochastic gradient descent with momentum, a parameter whose training data was not sampled in the current round is still updated with the historical momentum, which may cause the Embedding layer to overfit. Before applying a gradient update, the present application checks whether the gradient data has been updated, which confirms that the corresponding training data was actually sampled in this round, and only then performs the update. This effectively avoids the problem that words not sampled in the current batch are still updated by historical momentum during training, causing the Embedding layer to overfit.
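The effect described above can be seen on a single embedding entry that carries leftover velocity but is not sampled for a few rounds. The sketch below is an illustration under assumptions: the helper names `plain_momentum` and `gated_momentum`, the starting values, and the convention of passing `None` for "no fresh gradient" are hypothetical, and holding the velocity unchanged for a skipped entry is one possible reading of the method.

```python
def plain_momentum(theta, v, g, alpha=0.9, lr=0.1):
    """Standard momentum step; an unsampled entry simply sees g == 0.0."""
    v = alpha * v - lr * g
    return theta + v, v


def gated_momentum(theta, v, g, alpha=0.9, lr=0.1):
    """Gated step: g is None when the entry received no fresh gradient."""
    if g is None:
        return theta, v          # skip the update entirely
    v = alpha * v - lr * g
    return theta + v, v


# One entry with leftover velocity, unsampled for three consecutive rounds.
theta_plain, v_plain = 1.0, 0.5
theta_gated, v_gated = 1.0, 0.5
for _ in range(3):
    theta_plain, v_plain = plain_momentum(theta_plain, v_plain, 0.0)
    theta_gated, v_gated = gated_momentum(theta_gated, v_gated, None)
```

Under plain momentum the entry drifts from 1.0 to about 2.22 on historical velocity alone, while the gated step leaves it at 1.0 until a fresh gradient arrives.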
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
Obviously, the embodiments described above are only some of the embodiments of the present application, not all of them. The accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application may be implemented in many different forms; these embodiments are provided so that the disclosure of the present application can be understood more thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the description and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.
Claims (20)
- A model optimization method applied to momentum gradient descent, comprising the following steps: receiving a model optimization request sent by a user terminal, the model optimization request carrying at least an original prediction model and an original training data set; performing a sampling operation on the original training data set to obtain a current-round training data set; defining an objective function based on the current-round training data set; initializing model optimization parameters of the original prediction model to obtain an initial velocity parameter and an initial decision parameter; calculating gradient data corresponding to the initial decision parameter to be updated in the current round; judging whether the gradient data has been updated; if the gradient data has not been updated, outputting a sampling abnormality signal; if the gradient data has been updated, updating the initial velocity parameter based on the gradient data to obtain an updated velocity; updating the initial decision parameter based on the updated velocity to obtain an updated decision parameter; and when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.
- The model optimization method applied to momentum gradient descent according to claim 1, wherein the current-round training data set includes a data set of user texts, and the step of defining an objective function based on the current-round training data set specifically comprises: generating a user-text matrix based on the data set of user texts; decomposing the user-text matrix by singular value decomposition to obtain a user-latent-feature matrix and a latent-feature-text matrix; and constructing the objective function based on the user-text matrix, the objective function $\mathrm{RSE}_{R(\Lambda)}$ being expressed as: [formula not reproduced in the source text], where $R(\Lambda)$ denotes the set of rating data given by users to texts in the user-text matrix $R$; $p_m$ denotes the latent feature corresponding to the $m$-th user in the user-latent-feature matrix $P$; $q_n$ denotes the latent feature corresponding to the $n$-th text in the latent-feature-text matrix $Q$; $r_{m,n}$ denotes the rating given by user $m$ to text $n$; a further symbol (also not reproduced) denotes the rating of text $n$ by user $m$ in the rating data set; and $\lambda_2$ denotes the regularization factor of the latent feature matrices.
- The model optimization method applied to momentum gradient descent according to claim 1, wherein the gradient data is expressed as: [formula not reproduced in the source text], where $g$ denotes the gradient data; $m$ denotes the total number of current-round training data items; $\theta$ denotes the initial decision parameter; $x^{(i)}$ denotes the $i$-th current-round training data item; and $\mathrm{RSE}_{R(\Lambda)}$ denotes the objective function.
- The model optimization method applied to momentum gradient descent according to claim 3, wherein the updated velocity is expressed as: $v_{new} = \alpha v_{old} - \epsilon g$, where $v_{new}$ denotes the updated velocity; $v_{old}$ denotes the initial velocity parameter; $\alpha$ denotes the momentum parameter; $\epsilon$ denotes the learning rate; and $g$ denotes the gradient data.
- The model optimization method applied to momentum gradient descent according to claim 1, wherein the updated decision parameter is expressed as: $\theta_{new} = \theta_{old} + v_{new}$, where $\theta_{new}$ denotes the updated decision parameter; $\theta_{old}$ denotes the initial decision parameter; and $v_{new}$ denotes the updated velocity.
- The model optimization method applied to momentum gradient descent according to claim 5, wherein the convergence condition is a preset convergence threshold, and the step of obtaining a target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition specifically comprises: calculating a decision parameter difference between the initial decision parameter and the updated decision parameter; judging whether the decision parameter difference is less than the preset convergence threshold; if the decision parameter difference is less than or equal to the preset convergence threshold, determining that the current prediction model has converged and taking the current prediction model as the target prediction model; and if the decision parameter difference is greater than the preset convergence threshold, determining that the current prediction model has not converged and continuing the parameter optimization operation.
- A model optimization apparatus applied to momentum gradient descent, comprising: a request receiving module, configured to receive a model optimization request sent by a user terminal, the model optimization request carrying at least an original prediction model and an original training data set; a sampling operation module, configured to perform a sampling operation on the original training data set to obtain a current-round training data set; a function definition module, configured to define an objective function based on the current-round training data set; an initialization module, configured to initialize model optimization parameters of the original prediction model to obtain an initial velocity parameter and an initial decision parameter; a gradient calculation module, configured to calculate gradient data corresponding to the initial decision parameter to be updated in the current round; a gradient judgment module, configured to judge whether the gradient data has been updated; an abnormality confirmation module, configured to output a sampling abnormality signal if the gradient data has not been updated; a velocity parameter update module, configured to update the initial velocity parameter based on the gradient data to obtain an updated velocity if the gradient data has been updated; a decision parameter update module, configured to update the initial decision parameter based on the updated velocity to obtain an updated decision parameter; and a target model acquisition module, configured to obtain a target prediction model when the initial decision parameter and the updated decision parameter satisfy a convergence condition.
- The model optimization apparatus applied to momentum gradient descent according to claim 7, wherein the function definition module comprises: a matrix generation submodule, configured to generate a user-text matrix based on the data set of user texts; a matrix decomposition submodule, configured to decompose the user-text matrix by singular value decomposition to obtain a user-latent-feature matrix and a latent-feature-text matrix; and a function construction submodule, configured to construct the objective function based on the user-text matrix, the objective function $\mathrm{RSE}_{R(\Lambda)}$ being expressed as: [formula not reproduced in the source text], where $R(\Lambda)$ denotes the set of rating data given by users to texts in the user-text matrix $R$; $p_m$ denotes the latent feature corresponding to the $m$-th user in the user-latent-feature matrix $P$; $q_n$ denotes the latent feature corresponding to the $n$-th text in the latent-feature-text matrix $Q$; $r_{m,n}$ denotes the rating given by user $m$ to text $n$; a further symbol (also not reproduced) denotes the rating of text $n$ by user $m$ in the rating data set; and $\lambda_2$ denotes the regularization factor of the latent feature matrices.
- A computer device comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements the following steps of a model optimization method applied to momentum gradient descent: receiving a model optimization request sent by a user terminal, the model optimization request carrying at least an original prediction model and an original training data set; performing a sampling operation on the original training data set to obtain a current-round training data set; defining an objective function based on the current-round training data set; initializing model optimization parameters of the original prediction model to obtain an initial velocity parameter and an initial decision parameter; calculating gradient data corresponding to the initial decision parameter to be updated in the current round; judging whether the gradient data has been updated; if the gradient data has not been updated, outputting a sampling abnormality signal; if the gradient data has been updated, updating the initial velocity parameter based on the gradient data to obtain an updated velocity; updating the initial decision parameter based on the updated velocity to obtain an updated decision parameter; and when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.
- The computer device according to claim 9, wherein the current-round training data set includes a data set of user texts, and the step of defining an objective function based on the current-round training data set specifically comprises: generating a user-text matrix based on the data set of user texts; decomposing the user-text matrix by singular value decomposition to obtain a user-latent-feature matrix and a latent-feature-text matrix; and constructing the objective function based on the user-text matrix, the objective function $\mathrm{RSE}_{R(\Lambda)}$ being expressed as: [formula not reproduced in the source text], where $R(\Lambda)$ denotes the set of rating data given by users to texts in the user-text matrix $R$; $p_m$ denotes the latent feature corresponding to the $m$-th user in the user-latent-feature matrix $P$; $q_n$ denotes the latent feature corresponding to the $n$-th text in the latent-feature-text matrix $Q$; $r_{m,n}$ denotes the rating given by user $m$ to text $n$; a further symbol (also not reproduced) denotes the rating of text $n$ by user $m$ in the rating data set; and $\lambda_2$ denotes the regularization factor of the latent feature matrices.
- The computer device according to claim 9, wherein the gradient data is expressed as: [formula not reproduced in the source text], where $g$ denotes the gradient data; $m$ denotes the total number of current-round training data items; $\theta$ denotes the initial decision parameter; $x^{(i)}$ denotes the $i$-th current-round training data item; and $\mathrm{RSE}_{R(\Lambda)}$ denotes the objective function.
- The computer device according to claim 11, wherein the updated velocity is expressed as: $v_{new} = \alpha v_{old} - \epsilon g$, where $v_{new}$ denotes the updated velocity; $v_{old}$ denotes the initial velocity parameter; $\alpha$ denotes the momentum parameter; $\epsilon$ denotes the learning rate; and $g$ denotes the gradient data.
- The computer device according to claim 9, wherein the updated decision parameter is expressed as: $\theta_{new} = \theta_{old} + v_{new}$, where $\theta_{new}$ denotes the updated decision parameter; $\theta_{old}$ denotes the initial decision parameter; and $v_{new}$ denotes the updated velocity.
- The computer device according to claim 13, wherein the convergence condition is a preset convergence threshold, and the step of obtaining a target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition specifically comprises: calculating a decision parameter difference between the initial decision parameter and the updated decision parameter; judging whether the decision parameter difference is less than the preset convergence threshold; if the decision parameter difference is less than or equal to the preset convergence threshold, determining that the current prediction model has converged and taking the current prediction model as the target prediction model; and if the decision parameter difference is greater than the preset convergence threshold, determining that the current prediction model has not converged and continuing the parameter optimization operation.
- A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps of a model optimization method applied to momentum gradient descent: receiving a model optimization request sent by a user terminal, the model optimization request carrying at least an original prediction model and an original training data set; performing a sampling operation on the original training data set to obtain a current-round training data set; defining an objective function based on the current-round training data set; initializing model optimization parameters of the original prediction model to obtain an initial velocity parameter and an initial decision parameter; calculating gradient data corresponding to the initial decision parameter to be updated in the current round; judging whether the gradient data has been updated; if the gradient data has not been updated, outputting a sampling abnormality signal; if the gradient data has been updated, updating the initial velocity parameter based on the gradient data to obtain an updated velocity; updating the initial decision parameter based on the updated velocity to obtain an updated decision parameter; and when the initial decision parameter and the updated decision parameter satisfy a convergence condition, obtaining a target prediction model.
- The computer-readable storage medium according to claim 15, wherein the current-round training data set includes a data set of user texts, and the step of defining an objective function based on the current-round training data set specifically comprises: generating a user-text matrix based on the data set of user texts; decomposing the user-text matrix by singular value decomposition to obtain a user-latent-feature matrix and a latent-feature-text matrix; and constructing the objective function based on the user-text matrix, the objective function $\mathrm{RSE}_{R(\Lambda)}$ being expressed as: [formula not reproduced in the source text], where $R(\Lambda)$ denotes the set of rating data given by users to texts in the user-text matrix $R$; $p_m$ denotes the latent feature corresponding to the $m$-th user in the user-latent-feature matrix $P$; $q_n$ denotes the latent feature corresponding to the $n$-th text in the latent-feature-text matrix $Q$; $r_{m,n}$ denotes the rating given by user $m$ to text $n$; a further symbol (also not reproduced) denotes the rating of text $n$ by user $m$ in the rating data set; and $\lambda_2$ denotes the regularization factor of the latent feature matrices.
- The computer-readable storage medium according to claim 15, wherein the gradient data is expressed as: [formula not reproduced in the source text], where $g$ denotes the gradient data; $m$ denotes the total number of current-round training data items; $\theta$ denotes the initial decision parameter; $x^{(i)}$ denotes the $i$-th current-round training data item; and $\mathrm{RSE}_{R(\Lambda)}$ denotes the objective function.
- The computer-readable storage medium according to claim 17, wherein the updated velocity is expressed as: $v_{new} = \alpha v_{old} - \epsilon g$, where $v_{new}$ denotes the updated velocity; $v_{old}$ denotes the initial velocity parameter; $\alpha$ denotes the momentum parameter; $\epsilon$ denotes the learning rate; and $g$ denotes the gradient data.
- The computer-readable storage medium according to claim 15, wherein the updated decision parameter is expressed as: $\theta_{new} = \theta_{old} + v_{new}$, where $\theta_{new}$ denotes the updated decision parameter; $\theta_{old}$ denotes the initial decision parameter; and $v_{new}$ denotes the updated velocity.
- The computer-readable storage medium according to claim 19, wherein the convergence condition is a preset convergence threshold, and the step of obtaining a target prediction model when the initial decision parameter and the updated decision parameter satisfy the convergence condition specifically comprises: calculating a decision parameter difference between the initial decision parameter and the updated decision parameter; judging whether the decision parameter difference is less than the preset convergence threshold; if the decision parameter difference is less than or equal to the preset convergence threshold, determining that the current prediction model has converged and taking the current prediction model as the target prediction model; and if the decision parameter difference is greater than the preset convergence threshold, determining that the current prediction model has not converged and continuing the parameter optimization operation.
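The formulas recited in the claims can be collected in one place as follows. The velocity and decision parameter updates are taken directly from the claims. The objective and gradient expressions did not survive in this text, so the first two lines are only an assumed reconstruction from the symbol definitions and from the conventional regularized squared error of latent factor models; in particular, reading the unnamed symbol as a predicted rating $\hat{r}_{m,n} = p_m q_n$ is an assumption, and the patent's actual formulas may differ. Note that, as in the claims, $m$ indexes users in the objective but counts training items in the gradient.

```latex
\begin{aligned}
\mathrm{RSE}_{R(\Lambda)} &= \sum_{r_{m,n} \in R(\Lambda)}
  \Big[ \big(r_{m,n} - \hat{r}_{m,n}\big)^2
  + \lambda_2 \big( \lVert p_m \rVert^2 + \lVert q_n \rVert^2 \big) \Big],
  \qquad \hat{r}_{m,n} = p_m q_n
  && \text{(assumed form)} \\
g &= \frac{1}{m} \nabla_{\theta} \sum_{i=1}^{m}
  \mathrm{RSE}_{R(\Lambda)}\big(x^{(i)}; \theta\big)
  && \text{(assumed form)} \\
v_{new} &= \alpha v_{old} - \epsilon g
  && \text{(as claimed)} \\
\theta_{new} &= \theta_{old} + v_{new}
  && \text{(as claimed)}
\end{aligned}
```

As a worked example with illustrative values $\alpha = 0.9$, $\epsilon = 0.1$, $v_{old} = 0.5$, $g = 2$ and $\theta_{old} = 1$: $v_{new} = 0.9 \cdot 0.5 - 0.1 \cdot 2 = 0.25$ and $\theta_{new} = 1 + 0.25 = 1.25$. The decision parameter difference is $|\theta_{new} - \theta_{old}| = 0.25$, so with a preset convergence threshold of, say, $0.01$ the model would be judged not yet converged and optimization would continue.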
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011359384.8 | 2020-11-27 | ||
CN202011359384.8A CN112488183B (en) | 2020-11-27 | 2020-11-27 | Model optimization method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022110640A1 (en) | 2022-06-02 |
Family
ID=74935992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/090501 WO2022110640A1 (en) | 2020-11-27 | 2021-04-28 | Model optimization method and apparatus, computer device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112488183B (en) |
WO (1) | WO2022110640A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112488183B (en) * | 2020-11-27 | 2024-05-10 | 平安科技(深圳)有限公司 | Model optimization method, device, computer equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170103161A1 (en) * | 2015-10-13 | 2017-04-13 | The Governing Council Of The University Of Toronto | Methods and systems for 3d structure estimation |
CN110390561A (en) * | 2019-07-04 | 2019-10-29 | 四川金赞科技有限公司 | User-financial product of stochastic gradient descent is accelerated to select tendency ultra rapid predictions method and apparatus based on momentum |
CN110730037A (en) * | 2019-10-21 | 2020-01-24 | 苏州大学 | Optical signal-to-noise ratio monitoring method of coherent optical communication system based on momentum gradient descent method |
CN111507530A (en) * | 2020-04-17 | 2020-08-07 | 集美大学 | RBF neural network ship traffic flow prediction method based on fractional order momentum gradient descent |
CN111695295A (en) * | 2020-06-01 | 2020-09-22 | 中国人民解放军火箭军工程大学 | Method for constructing incident parameter inversion model of grating coupler |
CN112488183A (en) * | 2020-11-27 | 2021-03-12 | 平安科技(深圳)有限公司 | Model optimization method and device, computer equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110889509B (en) * | 2019-11-11 | 2023-04-28 | 安徽超清科技股份有限公司 | Gradient momentum acceleration-based joint learning method and device |
CN111639710B (en) * | 2020-05-29 | 2023-08-08 | 北京百度网讯科技有限公司 | Image recognition model training method, device, equipment and storage medium |
- 2020-11-27: CN202011359384.8A, granted as CN112488183B (status: Active)
- 2021-04-28: PCT/CN2021/090501, published as WO2022110640A1 (status: Application Filing)
Non-Patent Citations (1)
Title |
---|
XIAO LIU CLASSMATE: "Machine Learning Optimization Methods: Momentum Momentum Gradient Descent", CSDN BLOG, 2 December 2019 (2019-12-02), pages 1 - 8, XP055933014, Retrieved from the Internet <URL:https://blog.csdn.net/SweetSeven_/article/details/103353990> [retrieved on 20220620] * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116068903A (en) * | 2023-04-06 | 2023-05-05 | 中国人民解放军国防科技大学 | Real-time optimization method, device and equipment for robustness performance of closed-loop system |
CN116451872A (en) * | 2023-06-08 | 2023-07-18 | 北京中电普华信息技术有限公司 | Carbon emission prediction distributed model training method, related method and device |
CN116451872B (en) * | 2023-06-08 | 2023-09-01 | 北京中电普华信息技术有限公司 | Carbon emission prediction distributed model training method, related method and device |
CN117033352A (en) * | 2023-07-03 | 2023-11-10 | 深圳大学 | Data restoration method and device, terminal equipment and storage medium |
CN117350360A (en) * | 2023-09-21 | 2024-01-05 | 摩尔线程智能科技(北京)有限责任公司 | Fine tuning method and device for large model, electronic equipment and storage medium |
CN117077598A (en) * | 2023-10-13 | 2023-11-17 | 青岛展诚科技有限公司 | 3D parasitic parameter optimization method based on Mini-batch gradient descent method |
CN117350564A (en) * | 2023-10-13 | 2024-01-05 | 内蒙古电力勘测设计院有限责任公司 | Investment prediction method and device for power transmission and transformation project |
CN117077598B (en) * | 2023-10-13 | 2024-01-26 | 青岛展诚科技有限公司 | 3D parasitic parameter optimization method based on Mini-batch gradient descent method |
CN117596156A (en) * | 2023-12-07 | 2024-02-23 | 机械工业仪器仪表综合技术经济研究所 | Construction method of evaluation model of industrial application 5G network |
CN117596156B (en) * | 2023-12-07 | 2024-05-07 | 机械工业仪器仪表综合技术经济研究所 | Construction method of evaluation model of industrial application 5G network |
Also Published As
Publication number | Publication date |
---|---|
CN112488183A (en) | 2021-03-12 |
CN112488183B (en) | 2024-05-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2022110640A1 (en) | Model optimization method and apparatus, computer device and storage medium | |
US11481617B2 (en) | Generating trained neural networks with increased robustness against adversarial attacks | |
US10936949B2 (en) | Training machine learning models using task selection policies to increase learning progress | |
WO2021155713A1 (en) | Weight grafting model fusion-based facial recognition method, and related device | |
CN113435583B (en) | Federal learning-based countermeasure generation network model training method and related equipment thereof | |
US20190303535A1 (en) | Interpretable bio-medical link prediction using deep neural representation | |
WO2021120677A1 (en) | Warehousing model training method and device, computer device and storage medium | |
WO2019095570A1 (en) | Method for predicting popularity of event, server, and computer readable storage medium | |
CN111340221B (en) | Neural network structure sampling method and device | |
WO2020168851A1 (en) | Behavior recognition | |
CN110462638A (en) | Training neural network is sharpened using posteriority | |
WO2022105121A1 (en) | Distillation method and apparatus applied to bert model, device, and storage medium | |
WO2020191001A1 (en) | Real-world network link analysis and prediction using extended probabilistic matrix factorization models with labeled nodes | |
CN112214775A (en) | Injection type attack method and device for graph data, medium and electronic equipment | |
WO2022116439A1 (en) | Federated learning-based ct image detection method and related device | |
CN112651436A (en) | Optimization method and device based on uncertain weight graph convolution neural network | |
CN108475346A (en) | Neural random access machine | |
CN115730597A (en) | Multi-level semantic intention recognition method and related equipment thereof | |
CN113420161B (en) | Node text fusion method and device, computer equipment and storage medium | |
CN114241411A (en) | Counting model processing method and device based on target detection and computer equipment | |
CN113961720A (en) | Method for predicting entity relationship and method and device for training relationship prediction model | |
CN113791909A (en) | Server capacity adjusting method and device, computer equipment and storage medium | |
CN111144473A (en) | Training set construction method and device, electronic equipment and computer readable storage medium | |
CN115099875A (en) | Data classification method based on decision tree model and related equipment | |
CN115545753A (en) | Partner prediction method based on Bayesian algorithm and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 21896155; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 21896155; Country of ref document: EP; Kind code of ref document: A1 |