CN117114127A - Model training method, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN117114127A
Authority
CN
China
Prior art keywords
learning rate
loss
training
processing model
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310859847.4A
Other languages
Chinese (zh)
Inventor
徐晓健
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202310859847.4A
Publication of CN117114127A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Feedback Control In General (AREA)

Abstract

The application relates to a model training method, apparatus, computer device, and storage medium. The method comprises the following steps: determining a first learning rate and a second learning rate according to an initial learning rate and preset training times; performing a first round of training on an initial processing model based on the first learning rate to obtain a first loss and a first intermediate processing model; performing a second round of training on the initial processing model after the first round based on the second learning rate to obtain a second loss and a second intermediate processing model; determining a third learning rate according to the first loss, the second learning rate, and the preset training times; and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model. Because the training method adjusts the parameters of the model according to the convergence behavior of the model, convergence efficiency is greatly improved.

Description

Model training method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of financial information processing technologies, and in particular, to a model training method, apparatus, computer device, and storage medium.
Background
With the continued development of artificial intelligence (Artificial Intelligence, AI) technology, many new models have emerged, such as classification models, detection models, and recognition models. However, in most training or optimization processes involving financial models (e.g., bill classification models), financial information is difficult to process, so the whole training process of the corresponding processing model takes a long time, and it is difficult to meet the demand for processing large amounts of financial information.
Therefore, how to improve the training efficiency of the model is a problem to be solved.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a model training method, apparatus, computer device, and storage medium that can improve the training efficiency of a model.
In a first aspect, the present application provides a method of training a model. The method comprises the following steps:
determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
based on a first learning rate, performing a first round of training on the initial processing model to obtain a first loss and a first intermediate processing model;
based on a second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
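As an illustrative reading of the five steps above, the claimed procedure can be sketched in code. The function and parameter names (`three_round_schedule`, `train_one_round`) are assumptions for illustration, and the learning-rate formulas used are the ones given in the embodiments of the detailed description (first rate η/M; third rate (η/M) times e^(1 - L2/L1)); the patent itself prescribes no particular implementation.

```python
import math

def three_round_schedule(model, eta, M, train_one_round):
    """Run the three claimed rounds once.

    `train_one_round` is a placeholder: it trains `model` for one round
    at the given learning rate and returns (loss, updated_model).
    Returns (third_loss, third_intermediate_model, third_learning_rate).
    """
    s1 = s2 = eta / M                           # first and second learning rates
    loss1, model = train_one_round(model, s1)   # first round  -> first loss
    loss2, model = train_one_round(model, s2)   # second round -> second loss
    # third learning rate from the first loss, second loss and preset times M
    s3 = (eta / M) * math.exp(1.0 - loss2 / loss1)
    loss3, model = train_one_round(model, s3)   # train the second intermediate model
    return loss3, model, s3
```

With a toy `train_one_round` that halves the loss each round, the function returns the third loss and the model after the three rounds, ready for the convergence check described below.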
In one embodiment, training is completed according to the convergence of the third loss, and a trained processing model is obtained, including:
if the third loss converges, training is completed to obtain a processing model;
if the third loss does not converge, the third learning rate is taken as a new second learning rate, the second loss is taken as a new first loss, the third loss is taken as a new second loss, the trained second intermediate processing model is taken as a new second intermediate processing model, and the process returns to the step of determining the third learning rate according to the first loss, the second learning rate and the preset training times.
In one embodiment, if the first learning rate is the same as the second learning rate, determining the first learning rate according to the initial learning rate and the preset training times includes:
and multiplying the reciprocal of the preset training times by the initial learning rate to determine the first learning rate.
In one embodiment, determining the third learning rate based on the first loss, the second learning rate, and the preset training times includes:
determining a loss fluctuation parameter based on the first loss and the second loss;
and determining a third learning rate according to the loss fluctuation parameter, the second learning rate and the preset training times.
In one embodiment, determining the loss fluctuation parameter from the first loss and the second loss comprises:
multiplying the second loss by the reciprocal of the first loss to determine a ratio;
and performing exponentiation on the ratio to determine the loss fluctuation parameter.
In one embodiment, determining the third learning rate according to the loss fluctuation parameter, the second learning rate, and the preset training number includes:
performing product operation on an inverse proportion value of the preset training times and the initial learning rate, and determining an intermediate learning rate;
and performing product operation on the intermediate learning rate and the loss fluctuation parameter to determine a third learning rate.
In one embodiment, the method further comprises:
and training the trained processing model in the second stage based on the initial learning rate to obtain a trained target processing model.
In a second aspect, the application further provides a training device of the model. The device comprises:
the first determining module is used for determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
the first training module is used for carrying out first training on the initial processing model based on a first learning rate to obtain a first loss and a first intermediate processing model;
the second training module is used for carrying out second-round training on the initial processing model after the first-round training based on a second learning rate to obtain a second loss and a second intermediate processing model;
the second determining module is used for determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and the third training module is used for training the second intermediate processing model based on a third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
based on a first learning rate, performing a first round of training on the initial processing model to obtain a first loss and a first intermediate processing model;
based on a second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
based on a first learning rate, performing a first round of training on the initial processing model to obtain a first loss and a first intermediate processing model;
based on a second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
based on a first learning rate, performing a first round of training on the initial processing model to obtain a first loss and a first intermediate processing model;
based on a second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
The training method, the device, the computer equipment and the storage medium of the model are used for determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times, then performing first-round training on the initial processing model based on the first learning rate to obtain a first loss and a first intermediate processing model, then performing second-round training on the initial processing model subjected to the first-round training based on the second learning rate to obtain a second loss and a second intermediate processing model, and determining a third learning rate according to the first loss, the second learning rate and the preset training times; and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model. According to the model training method, the parameters of the model are adjusted based on the loss between the input value and the output value of the model until the model converges, and the determined parameters of the model at the moment are the parameters of the model after training, namely, the parameters of the model are adjusted according to the convergence condition of the model.
Drawings
FIG. 1 is a diagram of an application environment for a training method for a model in one embodiment;
FIG. 2 is a flow diagram of a training method for a model in one embodiment;
FIG. 3 is a flowchart illustrating the step S205 in the embodiment of FIG. 2;
FIG. 4 is a flowchart illustrating the step S204 in the embodiment of FIG. 2;
FIG. 5 is a flowchart illustrating the step S401 in the embodiment of FIG. 4;
FIG. 6 is a flowchart illustrating the step S402 in the embodiment of FIG. 4;
FIG. 7 is a flow chart of a training method for a model in another embodiment;
FIG. 8 is a flow chart of a training method for a model in yet another embodiment;
FIG. 9 is a block diagram of a model training apparatus in one embodiment;
FIG. 10 is a block diagram of the internal structure of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The model training method provided by the embodiment of the application can be applied to the computer device shown in FIG. 1. The computer device comprises a processor, a memory, and a computer program stored in the memory; the processor and the memory are connected through a system bus, and the processor, when executing the computer program, can perform the steps of the method embodiments described below. Optionally, the computer device may further comprise an input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium, which stores an operating system, a computer program, and a database, and an internal memory. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used for communicating with an external terminal through a network connection. Optionally, the computer device may be a server, a personal computer, a personal digital assistant, or another terminal device such as a tablet computer or a mobile phone, or a cloud or remote server; the embodiment of the present application does not limit the specific form of the computer device.
It will be appreciated by those skilled in the art that the architecture shown in fig. 1 is merely a block diagram of some of the architecture associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements are applied, and that a particular terminal may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
After the application scenario of the model training method provided by the embodiment of the present application is described, the model training method described by the present application is described in detail below.
In one embodiment, as shown in fig. 2, a training method of a model is provided, and an example of application of the method to the computer device in fig. 1 is described, including the following steps:
s201, determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times.
The learning rate is a parameter to be adjusted in the model training process; the initial learning rate is its initial value; the preset training times is the number of training iterations preset in the computer device before model training starts; the first learning rate is the learning rate used by the model during the first round of training; the second learning rate is the learning rate used during the second round. The first learning rate and the second learning rate may be the same or different.
In the embodiment of the application, the initial learning rate and the preset training times of the model to be trained are determined according to the type of the model, its complexity, and the amount of sample data; a numerical operation is then performed on the initial learning rate and the preset training times to determine the first learning rate used in the first round of training and the second learning rate used in the second round. For example, if the model to be trained is a bill classification model in the financial field with many bill varieties and a large sample size, the initial learning rate for training the bill classification model is determined to be η and the preset training times to be M; multiplying the initial learning rate η by the reciprocal of the preset training times M gives the first learning rate for the first round of training, and a similar numerical operation on η and M gives the second learning rate for the second round.
S202, performing a first round of training on the initial processing model based on the first learning rate to obtain a first loss and a first intermediate processing model.
The first loss is the difference value between the input data and the output data of the initial processing model after the first round of training, and the smaller the value of the first loss is, the better the performance of the initial processing model after the first round of training is; the first intermediate process model is a model after a first round of training of the initial process model.
In the embodiment of the application, after the first learning rate is determined, the computer equipment sets the determined first learning rate as the learning rate in the initial processing model, inputs input data into the initial processing model for first round training, obtains a first intermediate processing model and a model processing result after the first round training is performed on the initial processing model, and determines the first loss according to the model processing result and the tag value of the verification data.
And S203, performing a second training on the initial processing model after the first training based on the second learning rate to obtain a second loss and a second intermediate processing model.
The second loss is the difference value between the input data and the output data of the initial processing model after the second round of training, and the smaller the value of the second loss is, the better the performance of the initial processing model after the second round of training is; the second intermediate process model is a model after a second round of training the initial process model.
In the embodiment of the application, after the second learning rate is determined, the computer equipment sets the determined second learning rate as the learning rate in the initial processing model after the first round of training, inputs the input data into the initial processing model after the first round of training for the second round of training, obtains a second intermediate processing model and a model prediction result after the second round of training for the initial processing model, and determines the second loss according to the model prediction result and the label value of the verification data.
S204, determining a third learning rate according to the first loss, the second learning rate and the preset training times.
In the embodiment of the application, after the first loss, the second learning rate and the preset training times are obtained, the first loss, the second learning rate and the preset training times are subjected to numerical operation to obtain the third learning rate.
S205, training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
The third loss is the difference between the input data and the output data of the second intermediate processing model; the smaller the third loss, the better the performance of the trained second intermediate processing model. The third loss may or may not meet the convergence condition.
In the embodiment of the application, after the third learning rate is determined, the computer equipment sets the determined third learning rate as the learning rate in the second intermediate processing model, inputs the input data into the second intermediate processing model for third training, obtains a trained processing model and a model prediction result, determines third loss according to the model prediction result and the label value of the verification data, and completes training according to the convergence condition of the third loss to obtain the trained processing model.
The computer device sets the determined third learning rate as the learning rate in the second intermediate processing model, inputs the input data into the second intermediate processing model to perform third training, obtains a trained processing model and a model prediction result, determines a third loss according to the model prediction result and a label value of verification data, and determines that the model trained on the second intermediate processing model is the trained processing model if the third loss meets a preset convergence condition.
If instead the third loss does not meet the preset convergence condition, the computer device determines a fourth learning rate according to the third loss, the second loss, the third learning rate and the preset training times, and trains the model obtained from the third round of training based on the fourth learning rate to obtain a fourth loss and a fourth intermediate processing model. In general, the nth learning rate is determined according to the (n-1)th loss, the (n-2)th loss, the (n-1)th learning rate and the preset training times, and the (n-1)th intermediate processing model is trained based on the nth learning rate to obtain the nth loss; this continues until the nth loss meets the convergence condition, which indicates that the initial processing model has converged, and the trained processing model is obtained.
According to the training method of the model provided by the embodiment of the application, the first learning rate and the second learning rate are determined according to the initial learning rate and the preset training times, then the first round of training is carried out on the initial processing model based on the first learning rate to obtain a first loss and a first intermediate processing model, then the second round of training is carried out on the initial processing model after the first round of training based on the second learning rate to obtain a second loss and a second intermediate processing model, and the third learning rate is determined according to the first loss, the second learning rate and the preset training times; and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model. According to the model training method, the parameters of the model are adjusted based on the loss between the input value and the output value of the model until the model converges, and the determined parameters of the model at the moment are the parameters of the model after training, namely, the parameters of the model are adjusted according to the convergence condition of the model.
In one embodiment, the process of completing training according to the convergence of the third loss to obtain a trained processing model may be described on the basis of the embodiment shown in fig. 2, as shown in fig. 3, where S205 "complete training according to the convergence of the third loss to obtain a trained processing model" includes:
s301, judging whether the third loss is converged or not.
In the embodiment of the present application, after the third loss is obtained, the computer device may compare the third loss with a preset threshold; if the third loss is smaller than the preset threshold, the third loss has converged, and otherwise it has not.
S302, if yes, training is completed to obtain a processing model.
In the embodiment of the application, if the third loss converges, the initial processing model training is completed to obtain the processing model.
And S303, if not, taking the third learning rate as a new second learning rate, taking the second loss value as a new first loss value, taking the third loss value as a new second loss value, taking the trained second intermediate processing model as a new second intermediate processing model, and returning to execute the step of determining the third learning rate according to the first loss, the second learning rate and the preset training times.
In the embodiment of the application, if the third loss does not converge, training of the initial processing model is not complete. The third learning rate is taken as a new second learning rate, the second loss is taken as a new first loss, the third loss is taken as a new second loss, and the trained second intermediate processing model is taken as a new second intermediate processing model. The step of determining the third learning rate according to the first loss, the second learning rate and the preset training times is then executed again, the second intermediate processing model is trained based on the new third learning rate to obtain a new third loss, and whether that third loss converges is judged again. If it converges, training is completed and the processing model is obtained; if not, the substitutions above are repeated, and the process iterates until the convergence condition of the initial processing model is satisfied.
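The rebinding described above amounts to a loop over the rate-update and training steps. A minimal sketch follows, assuming the learning-rate formula of the embodiments (new rate = (η/M) times e^(1 - L_curr/L_prev)) and a threshold-based convergence test as in S301; all names are hypothetical, not from the patent:

```python
import math

def train_until_converged(model, eta, M, train_one_round, threshold):
    """Iterate the rebinding loop: compute a new rate from the two most
    recent losses, train one round, and stop once the latest loss drops
    below `threshold` (the convergence test of S301)."""
    rate = eta / M                                     # first rate == second rate here
    loss_prev, model = train_one_round(model, rate)    # first round  -> first loss
    loss_curr, model = train_one_round(model, rate)    # second round -> second loss
    while True:
        # new learning rate from the previous two losses and the preset times M
        rate = (eta / M) * math.exp(1.0 - loss_curr / loss_prev)
        loss_next, model = train_one_round(model, rate)
        if loss_next < threshold:                      # third loss converges: done
            return model, loss_next
        loss_prev, loss_curr = loss_curr, loss_next    # third -> new second (S303)
```

With a toy round function whose loss halves every call, the loop terminates as soon as the loss falls under the chosen threshold.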
According to the model training method, the parameters of the model are adjusted based on the loss between the input value and the output value of the model until the model converges, and the determined parameters of the model at the moment are the parameters of the model after training, namely, the parameters of the model are adjusted according to the convergence condition of the model.
In one embodiment, on the basis of the embodiment shown in fig. 2 or fig. 3, if the first learning rate and the second learning rate are the same, a process of determining the first learning rate according to the initial learning rate and the preset training number may be described, where the step S201 "determining the first learning rate according to the initial learning rate and the preset training number" includes:
and carrying out product operation on the inverse proportion value of the preset training times and the initial learning rate, and determining a first learning rate.
In the embodiment of the present application, after determining the initial learning rate and the preset training times of the model to be trained according to the type of the model, its complexity, and the amount of sample data, the computer device may multiply the reciprocal of the preset training times by the initial learning rate to determine the first learning rate, as expressed by the following formula (1):

S1 = η * (1/M)    (1)

where S1 is the first learning rate, η is the initial learning rate, and M is the preset training times.
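Formula (1) is a single multiplication; a minimal sketch in code, with variable names assumed for illustration:

```python
def first_learning_rate(eta, M):
    """Formula (1): S1 = eta * (1 / M)."""
    return eta * (1.0 / M)
```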
Further, in an embodiment, a process of determining the third learning rate according to the first loss, the second learning rate, and the preset training number may be described based on the above embodiment, as shown in fig. 4, S204 "determining the third learning rate according to the first loss, the second learning rate, and the preset training number", including:
s401, determining a loss fluctuation parameter according to the first loss and the second loss.
In the embodiment of the application, after the first loss and the second loss are obtained, a numerical operation may be performed on them, and the result of the operation is the loss fluctuation parameter; for example, the first loss and the second loss may be summed, and the sum determined as the loss fluctuation parameter.
Alternatively, a method of determining the loss fluctuation parameter based on the first loss and the second loss is provided below.
As shown in fig. 5, S401 "determines a loss fluctuation parameter according to the first loss and the second loss", including:
s501, multiplying the reciprocal of the first loss by the second loss to determine a ratio.
In the embodiment of the present application, after the first loss and the second loss are determined, the computer device may multiply the reciprocal of the first loss by the second loss and determine the result as the ratio. The process is expressed by the following formula (2):
R = L2 / L1    (2)
where R is the ratio, L1 is the first loss, and L2 is the second loss.
S502, performing exponential processing on the ratio to determine the loss fluctuation parameter.
In the embodiment of the present application, after determining the ratio, the computer device may perform exponential processing on the ratio and determine the result as the loss fluctuation parameter. The process is expressed by the following formula (3):
P = e^(1-R)    (3)
where P is the loss fluctuation parameter and R is the ratio.
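Formulas (2) and (3) can be sketched as follows (illustrative names, not from the patent). The ratio compares consecutive losses; the exponential maps a falling loss (R less than 1) to a factor above 1 and a rising loss to a factor below 1:

```python
import math

def loss_fluctuation(first_loss: float, second_loss: float) -> float:
    """Formulas (2)-(3): R = L2 / L1, then P = exp(1 - R)."""
    r = second_loss / first_loss
    return math.exp(1.0 - r)

# equal losses give R = 1, so P = e^0 = 1.0 (no adjustment)
p = loss_fluctuation(0.8, 0.8)
```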
S402, determining a third learning rate according to the loss fluctuation parameter, the second learning rate and the preset training times.
In the embodiment of the application, after the loss fluctuation parameter, the second learning rate and the preset training times are obtained, an operation may be performed on the loss fluctuation parameter, the second learning rate and the preset training times, and the result of the operation is the third learning rate; for example, the loss fluctuation parameter may be multiplied by the second learning rate to obtain a first result, and the first result multiplied by the preset training times to obtain a second result, namely the third learning rate.
Optionally, a method for determining the third learning rate according to the loss fluctuation parameter, the second learning rate, and the preset training number is provided below.
As shown in fig. 6, S402 "determines a third learning rate according to the loss fluctuation parameter, the second learning rate, and the preset training number", including:
s601, multiplying the initial learning rate by the reciprocal of the preset training times to determine an intermediate learning rate.
In the embodiment of the present application, after determining the preset training times and the initial learning rate, the computer device may operate on the preset training times and the initial learning rate and determine the result as the intermediate learning rate. The process is expressed by the following formula (4):
S2 = η * i / M    (4)
where S2 is the intermediate learning rate, η is the initial learning rate, i is the current training round number, and M is the preset training times.
S602, performing product operation on the intermediate learning rate and the loss fluctuation parameter to determine a third learning rate.
In the embodiment of the present application, after the intermediate learning rate and the loss fluctuation parameter are determined, the computer device may multiply the intermediate learning rate by the loss fluctuation parameter and determine the result as the third learning rate. The process is expressed by the following formula (5):
S3 = S2 * P    (5)
where S3 is the third learning rate, S2 is the intermediate learning rate, and P is the loss fluctuation parameter.
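Formulas (4) and (5) together can be sketched as follows (names are illustrative, not from the patent):

```python
def third_learning_rate(initial_lr: float, round_i: int,
                        preset_times: int, fluctuation_p: float) -> float:
    """Formula (4): S2 = eta * i / M; formula (5): S3 = S2 * P."""
    s2 = initial_lr * round_i / preset_times   # intermediate learning rate
    return s2 * fluctuation_p                  # scaled by the loss fluctuation

# round 5 of 100 with no fluctuation (P = 1) keeps the plain linear ramp
s3 = third_learning_rate(0.1, 5, 100, 1.0)
```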
According to the model training method, the parameters of the model are adjusted based on the loss between the input value and the output value of the model until the model converges; the parameters determined at that point are the parameters of the trained model. That is, the parameters of the model are adjusted according to the convergence of the model.
In one embodiment, based on the embodiment shown in fig. 2, as shown in fig. 7, the method further includes:
s206, training the trained processing model in the second stage based on the initial learning rate to obtain a trained target processing model.
In the embodiment of the application, under the condition that the third loss converges, a trained processing model is obtained, and the second stage training is performed on the trained processing model based on the initial learning rate, so as to obtain a trained target processing model.
In one embodiment, as shown in fig. 8, a training method for a complete model is provided, comprising:
s10, multiplying the initial learning rate by the reciprocal of the preset training times to determine a first learning rate and a second learning rate;
S11, performing first-round training on the initial processing model based on a first learning rate to obtain a first loss and a first intermediate processing model;
s12, based on a second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
s13, multiplying the reciprocal of the first loss by the second loss to determine a ratio;
s14, performing exponential processing on the ratio to determine a loss fluctuation parameter;
s15, multiplying the initial learning rate by the reciprocal of the preset training times to determine an intermediate learning rate;
s16, performing product operation on the intermediate learning rate and the loss fluctuation parameter to determine a third learning rate;
s17, training the second intermediate processing model based on the third learning rate to obtain a third loss, and judging whether the third loss is converged or not;
s18, if convergence is achieved, training is completed to obtain a processing model;
s19, if the third loss does not converge, taking the third learning rate as a new second learning rate, taking the second loss as a new first loss, taking the third loss as a new second loss, taking the trained second intermediate processing model as a new second intermediate processing model, and returning to execute the step of determining the third learning rate according to the first loss, the second learning rate and the preset training times;
And S20, training the trained processing model in a second stage based on the initial learning rate to obtain a trained target processing model.
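The complete flow S10 to S20 can be sketched end to end on a toy one-parameter least-squares problem. This is an illustrative reading of the patent's scheme, not its implementation; the toy model, the convergence tolerance, and the length of the second stage are all assumptions:

```python
import math

def train_round(w, lr, data):
    """One gradient step on a 1-D least-squares fit (stand-in for a training round)."""
    x, y = data
    loss = (w * x - y) ** 2
    grad = 2 * (w * x - y) * x
    return w - lr * grad, loss

def two_stage_training(initial_lr=0.1, preset_times=50, tol=1e-6):
    data = (1.0, 3.0)                               # fit w so that w * x ≈ y
    w = 0.0
    s1 = s2 = initial_lr / preset_times             # S10: first/second learning rates
    w, l1 = train_round(w, s1, data)                # S11: first round of training
    w, l2 = train_round(w, s2, data)                # S12: second round of training
    for i in range(3, preset_times + 1):
        p = math.exp(1.0 - l2 / l1)                 # S13-S14: ratio and fluctuation
        s3 = initial_lr * i / preset_times * p      # S15-S16: third learning rate
        w, l3 = train_round(w, s3, data)            # S17: train with the third rate
        if abs(l3 - l2) < tol:                      # S18: converged, training done
            break
        s2, l1, l2 = s3, l2, l3                     # S19: shift rates and losses
    for _ in range(10):                             # S20: second stage at initial rate
        w, _ = train_round(w, initial_lr, data)
    return w
```

On this toy problem the fitted weight approaches 3.0; the loop widens the learning rate when the loss is falling (P above 1) and narrows it when the loss rises.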
According to the model training method, the parameters of the model are adjusted based on the loss between the input value and the output value of the model until the model converges; the parameters determined at that point are the parameters of the trained model. That is, the parameters of the model are adjusted according to the convergence of the model.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the execution order of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily executed at the same moment but may be executed at different moments; their execution order is also not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a training device for a model, which is used for implementing the training method of the model mentioned above. The implementation of the solution provided by the device is similar to that described in the above method, so for the specific limitations in the embodiments of the one or more model training devices provided below, reference may be made to the limitations of the model training method above, which are not repeated here.
In one embodiment, as shown in fig. 9, there is provided a training apparatus of a model, including: a first determination module 10, a first training module 11, a second training module 12, a second determination module 13 and a third training module 14, wherein:
a first determining module 10, configured to determine a first learning rate and a second learning rate according to an initial learning rate and a preset training frequency;
a first training module 11, configured to perform a first training on the initial processing model based on a first learning rate, to obtain a first loss and a first intermediate processing model;
a second training module 12, configured to perform a second training on the initial processing model after the first training based on a second learning rate, to obtain a second loss and a second intermediate processing model;
A second determining module 13, configured to determine a third learning rate according to the first loss, the second learning rate, and the preset training times;
and the third training module 14 is configured to train the second intermediate processing model based on a third learning rate, obtain a third loss, and complete training according to a convergence condition of the third loss to obtain a trained processing model.
In one embodiment, the third training module 14 is configured to obtain the trained processing model if the third loss converges; and if the third loss does not converge, take the third learning rate as a new second learning rate, the second loss as a new first loss, the third loss as a new second loss, and the trained second intermediate processing model as a new second intermediate processing model, and return to execute the step of determining the third learning rate according to the first loss, the second learning rate and the preset training times.
In one embodiment, if the first learning rate is the same as the second learning rate, the first determining module 10 is configured to multiply the initial learning rate by the reciprocal of the preset training times to determine the first learning rate.
In one embodiment, the second determining module 13 comprises a first determining unit and a second determining unit, wherein:
a first determining unit, specifically configured to determine a loss fluctuation parameter according to the first loss and the second loss;
the second determining unit is specifically configured to determine a third learning rate according to the loss fluctuation parameter, the second learning rate, and the preset training number.
In one embodiment, the first determining unit is specifically configured to multiply the reciprocal of the first loss by the second loss to determine a ratio, and to perform exponential processing on the ratio to determine the loss fluctuation parameter.
In one embodiment, the second determining unit is specifically configured to multiply the initial learning rate by the reciprocal of the preset training times to determine an intermediate learning rate, and to multiply the intermediate learning rate by the loss fluctuation parameter to determine the third learning rate.
In one embodiment, as shown in fig. 10, the apparatus further includes: and the obtaining module 15 is configured to perform second-stage training on the trained processing model based on the initial learning rate, so as to obtain a trained target processing model.
The various modules in the training device of the model described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in FIG. 1. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing learning rate data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a training method for a model.
It will be appreciated by those skilled in the art that the architecture shown in fig. 1 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting as to the computer device to which the present inventive arrangements may be implemented, as a particular computer device may include more or less components than those shown, or may be combined with some components, or may have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
based on a first learning rate, performing a first round of training on the initial processing model to obtain a first loss and a first intermediate processing model;
based on a second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
In one embodiment, the processor when executing the computer program further performs the steps of:
if the third loss converges, training is completed to obtain a processing model;
if the third loss does not converge, the third learning rate is taken as a new second learning rate, the second loss is taken as a new first loss, the third loss is taken as a new second loss, the trained second intermediate processing model is taken as a new second intermediate processing model, and the step of determining the third learning rate according to the first loss, the second learning rate and the preset training times is executed again.
In one embodiment, the processor when executing the computer program further performs the steps of:
and multiplying the initial learning rate by the reciprocal of the preset training times to determine a first learning rate.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a loss fluctuation parameter based on the first loss and the second loss;
and determining a third learning rate according to the loss fluctuation parameter, the second learning rate and the preset training times.
In one embodiment, the processor when executing the computer program further performs the steps of:
multiplying the reciprocal of the first loss by the second loss to determine a ratio;
and performing exponential processing on the ratio to determine the loss fluctuation parameter.
In one embodiment, the processor when executing the computer program further performs the steps of:
multiplying the initial learning rate by the reciprocal of the preset training times to determine an intermediate learning rate;
and multiplying the intermediate learning rate by the loss fluctuation parameter to determine a third learning rate.
In one embodiment, the processor when executing the computer program further performs the steps of:
and training the trained processing model in the second stage based on the initial learning rate to obtain a trained target processing model.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
based on a first learning rate, performing a first round of training on the initial processing model to obtain a first loss and a first intermediate processing model;
based on a second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
Determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
if the third loss converges, training is completed to obtain a processing model;
if the third loss does not converge, the third learning rate is taken as a new second learning rate, the second loss is taken as a new first loss, the third loss is taken as a new second loss, the trained second intermediate processing model is taken as a new second intermediate processing model, and the step of determining the third learning rate according to the first loss, the second learning rate and the preset training times is executed again.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and multiplying the initial learning rate by the reciprocal of the preset training times to determine a first learning rate.
In one embodiment, the computer program when executed by the processor further performs the steps of:
Determining a loss fluctuation parameter based on the first loss and the second loss;
and determining a third learning rate according to the loss fluctuation parameter, the second learning rate and the preset training times.
In one embodiment, the computer program when executed by the processor further performs the steps of:
multiplying the reciprocal of the first loss by the second loss to determine a ratio;
and performing exponential processing on the ratio to determine the loss fluctuation parameter.
In one embodiment, the computer program when executed by the processor further performs the steps of:
multiplying the initial learning rate by the reciprocal of the preset training times to determine an intermediate learning rate;
and multiplying the intermediate learning rate by the loss fluctuation parameter to determine a third learning rate.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and training the trained processing model in the second stage based on the initial learning rate to obtain a trained target processing model.
In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:
determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
Based on a first learning rate, performing a first round of training on the initial processing model to obtain a first loss and a first intermediate processing model;
based on a second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
In one embodiment, the computer program when executed by the processor further performs the steps of:
if the third loss converges, training is completed to obtain a processing model;
if the third loss does not converge, the third learning rate is taken as a new second learning rate, the second loss is taken as a new first loss, the third loss is taken as a new second loss, the trained second intermediate processing model is taken as a new second intermediate processing model, and the step of determining the third learning rate according to the first loss, the second learning rate and the preset training times is executed again.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and multiplying the initial learning rate by the reciprocal of the preset training times to determine a first learning rate.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a loss fluctuation parameter based on the first loss and the second loss;
and determining a third learning rate according to the loss fluctuation parameter, the second learning rate and the preset training times.
In one embodiment, the computer program when executed by the processor further performs the steps of:
multiplying the reciprocal of the first loss by the second loss to determine a ratio;
and performing exponential processing on the ratio to determine the loss fluctuation parameter.
In one embodiment, the computer program when executed by the processor further performs the steps of:
multiplying the initial learning rate by the reciprocal of the preset training times to determine an intermediate learning rate;
and multiplying the intermediate learning rate by the loss fluctuation parameter to determine a third learning rate.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and training the trained processing model in the second stage based on the initial learning rate to obtain a trained target processing model.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples represent only a few embodiments of the application and are described in detail, but they should not be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within the protection scope of the application. Accordingly, the scope of the application shall be subject to the appended claims.

Claims (10)

1. A method of training a model, the method comprising:
determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
based on the first learning rate, performing a first training on the initial processing model to obtain a first loss and a first intermediate processing model;
based on the second learning rate, performing a second training on the initial processing model after the first training to obtain a second loss and a second intermediate processing model;
Determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
2. The method of claim 1, wherein the training is performed based on the convergence of the third loss to obtain a trained process model, comprising:
if the third loss converges, training is completed to obtain a processing model;
and if the third loss does not converge, taking the third learning rate as a new second learning rate, taking the second loss as a new first loss, taking the third loss as a new second loss, taking the trained second intermediate processing model as a new second intermediate processing model, and returning to execute the step of determining the third learning rate according to the first loss, the second learning rate and the preset training times.
3. The method according to claim 1 or 2, wherein if the first learning rate and the second learning rate are the same, the determining the first learning rate according to the initial learning rate and the preset training number comprises:
and multiplying the initial learning rate by the reciprocal of the preset training times to determine a first learning rate.
4. The method according to claim 1 or 2, wherein said determining a third learning rate based on said first loss, said second learning rate and said preset training times comprises:
determining a loss ripple parameter from the first loss and the second loss;
and determining the third learning rate according to the loss fluctuation parameter, the second learning rate and the preset training times.
5. The method of claim 4, wherein said determining a loss ripple parameter from said first loss and said second loss comprises:
multiplying the reciprocal of the first loss by the second loss to determine a ratio;
and performing exponential processing on the ratio to determine a loss fluctuation parameter.
6. The method of claim 4, wherein said determining said third learning rate based on said loss fluctuation parameter, said second learning rate, and said preset training times comprises:
multiplying the initial learning rate by the reciprocal of the preset training times to determine an intermediate learning rate;
and multiplying the intermediate learning rate by the loss fluctuation parameter to determine the third learning rate.
7. The method according to claim 1, wherein the method further comprises:
and training the trained processing model in a second stage based on the initial learning rate to obtain a trained target processing model.
8. A training device for a model, the device comprising:
the first determining module is used for determining a first learning rate and a second learning rate according to the initial learning rate and the preset training times;
the first training module is used for carrying out first training on the initial processing model based on the first learning rate to obtain a first loss and a first intermediate processing model;
the second training module is used for carrying out second training on the initial processing model after the first training based on the second learning rate to obtain a second loss and a second intermediate processing model;
the second determining module is used for determining a third learning rate according to the first loss, the second learning rate and the preset training times;
and the third training module is used for training the second intermediate processing model based on the third learning rate to obtain a third loss, and completing training according to the convergence condition of the third loss to obtain a trained processing model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
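The staged schedule that the claims describe can be sketched end to end. The formulas for the first and second learning rates, and the exact definition of the loss fluctuation parameter, are not fixed by the claims quoted here, so those derivations (and all names below) are hypothetical placeholders; only the overall flow — two probing training rounds, a loss-fluctuation parameter, then training at the third learning rate until the loss converges — follows the claims:

```python
def train_with_staged_lr(model, train_step, initial_lr, preset_training_times,
                         loss_tolerance=1e-4, max_epochs=1000):
    """Hypothetical sketch of the claimed training flow.

    `train_step(model, lr)` runs one round of training at learning
    rate `lr` and returns the resulting loss. How the first and
    second learning rates derive from `initial_lr` is an assumption,
    not specified in the claims quoted above.
    """
    first_lr = initial_lr / preset_training_times         # assumed formula
    second_lr = initial_lr / preset_training_times ** 2   # assumed formula
    first_loss = train_step(model, first_lr)              # first training
    second_loss = train_step(model, second_lr)            # second training
    # Loss fluctuation parameter: relative change between the two losses
    # (one plausible definition; the claims do not pin it down here).
    loss_fluctuation = abs(first_loss - second_loss) / max(first_loss, 1e-12)
    # Claim 6: third lr = (initial lr / preset training times) * fluctuation
    third_lr = (initial_lr / preset_training_times) * loss_fluctuation
    prev_loss = second_loss
    for _ in range(max_epochs):
        third_loss = train_step(model, third_lr)          # third training
        if abs(prev_loss - third_loss) < loss_tolerance:  # convergence check
            return model, third_lr
        prev_loss = third_loss
    return model, third_lr
```

A toy `train_step` that shrinks a stored loss in proportion to the learning rate is enough to exercise the flow; in practice it would wrap one or more optimizer epochs.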
CN202310859847.4A 2023-07-13 2023-07-13 Model training method, device, computer equipment and storage medium Pending CN117114127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310859847.4A CN117114127A (en) 2023-07-13 2023-07-13 Model training method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117114127A true CN117114127A (en) 2023-11-24

Family

ID=88808145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310859847.4A Pending CN117114127A (en) 2023-07-13 2023-07-13 Model training method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117114127A (en)

Similar Documents

Publication Publication Date Title
US20190370659A1 (en) Optimizing neural network architectures
CN110929047A (en) Knowledge graph reasoning method and device concerning neighbor entities
CN115729796B (en) Abnormal operation analysis method based on artificial intelligence and big data application system
CN115293919A (en) Graph neural network prediction method and system oriented to social network distribution generalization
CN117033039A (en) Fault detection method, device, computer equipment and storage medium
CN115081613A (en) Method and device for generating deep learning model, electronic equipment and storage medium
CN114065003A (en) Network structure searching method, system and medium oriented to super large searching space
CN116800671A (en) Data transmission method, apparatus, computer device, storage medium, and program product
CN117114127A (en) Model training method, device, computer equipment and storage medium
CN115202879A (en) Multi-type intelligent model-based cloud edge collaborative scheduling method and application
CN116738429B (en) Target detection engine optimization method, device and system based on generation countermeasure
CN117932245B (en) Financial data missing value completion method, device and storage medium
CN116484280A (en) Training method of object classification model, object classification method and device
CN116881122A (en) Test case generation method, device, equipment, storage medium and program product
CN116881450A (en) Information classification method, apparatus, computer device, storage medium, and program product
CN117459576A (en) Data pushing method and device based on edge calculation and computer equipment
CN118133044A (en) Problem extension method, device, computer equipment, storage medium and product
CN115934394A (en) Data processing method, device, equipment and storage medium
CN117172896A (en) Prediction method, prediction apparatus, computer device, storage medium, and program product
CN117407418A (en) Information acquisition method, information acquisition device, computer apparatus, storage medium, and program product
CN118195769A (en) Product overdue state prediction method, device, computer equipment, storage medium and computer program product
CN117372148A (en) Method, apparatus, device, medium and program product for determining credit risk level
CN116306887A (en) Model optimization method, device, computer equipment and storage medium
CN116861047A (en) Man-machine interaction method, device, equipment and medium
CN115618221A (en) Model training method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination