CN114004358B - Deep learning model training method - Google Patents


Info

Publication number
CN114004358B
Authority
CN
China
Prior art keywords
model
training
node
deep learning
data set
Prior art date
Legal status
Active
Application number
CN202111635664.1A
Other languages
Chinese (zh)
Other versions
CN114004358A
Inventor
张家兴
李鹏飞
郑海波
王昊
王瑞
吴晓均
Current Assignee
International Digital Economy Academy IDEA
Original Assignee
International Digital Economy Academy IDEA
Priority date
Filing date
Publication date
Application filed by International Digital Economy Academy IDEA
Priority to CN202111635664.1A
Publication of CN114004358A
Application granted
Publication of CN114004358B
Legal status: Active

Classifications

    • G06N3/08 Learning methods (G: Physics > G06: Computing; calculating or counting > G06N: Computing arrangements based on specific computational models > G06N3/00: Computing arrangements based on biological models > G06N3/02: Neural networks)
    • G06N3/045 Combinations of networks (under G06N3/02: Neural networks > G06N3/04: Architecture, e.g. interconnection topology)
    • G06N3/105 Shells for specifying net layout (under G06N3/02: Neural networks > G06N3/10: Interfaces, programming languages or software development kits, e.g. for simulating neural networks)

Abstract

The invention discloses a deep learning model training method, which comprises the following steps: acquiring a first data set; setting initial model parameters and a training suspension condition for the deep learning model, performing multiple iterations of training on the deep learning model based on the first data set, and calculating the model parameter values generated by each iteration from the initial model parameters; generating a model node when the training suspension condition is satisfied; and saving the model parameter values generated by the last iteration of training as the node information of the model node. The deep learning model training method provided by the invention can form at least two model nodes in the training process and store their node information. Furthermore, a display interface for the node information can be formed, and interactive operations for deep learning model training can be provided on that interface, so that a user can make timely adjustments according to the node information of each model node during training and obtain an optimal deep learning model more quickly.

Description

Deep learning model training method
Technical Field
The invention relates to the technical field of deep learning, in particular to a deep learning model training method.
Background
With the growing maturity of AI (Artificial Intelligence) technology and theory, AI algorithms are being applied in ever wider fields, and AI business is developing rapidly. To free algorithm engineers from tedious engineering development tasks so that they can concentrate on algorithm research and development, various machine learning algorithm platforms have emerged. In the prior art, most AI algorithm platforms only provide a training record list ordered by time; the training process itself is not managed, and interactivity with the user during model training is poor.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a deep learning model training method that addresses the above defects in the prior art, namely the poor interactivity with users during model training.
The technical scheme adopted by the invention for solving the technical problem is as follows:
a deep learning model training method comprises the following steps:
acquiring a first data set;
setting initial model parameters and training suspension conditions of the deep learning model, carrying out multiple iterative training on the deep learning model based on the first data set, and calculating model parameter values generated by each iteration according to the initial model parameters;
generating a model node when the training suspension condition is satisfied;
and saving the model parameter values generated by the last iterative training as the node information of the model nodes.
The first data set comprises training samples and test samples;
performing multiple times of iterative training on the deep learning model based on the training samples;
and the training suspension condition is that the number of iterations of training reaches a preset training number and the last iteration of training has ended.
Testing the deep learning model after the last iterative training based on the test sample to obtain test data and then generating a model node;
the test data comprises at least one of model accuracy, sample classification accuracy, confidence and test classification result.
The node information of the model node further includes at least: the first data set, the test data, the model node generation time, and the computational power consumed by the deep learning model training.
Generating a display interface according to the node information of the model node;
the information of the display interface is as follows: and generating at least one of a relationship graph and the node information according to the relationship of the model node generation time.
The display interface further comprises at least one interactive operation: the interactive operation of obtaining the first data set, and the interactive operation of setting the initial model parameters and training suspension condition of the deep learning model.
After the step of generating a model node when the training suspension condition is met, a retraining instruction of a user is received and a second data set is acquired, wherein the second data set comprises the first data set and a sample data set, and the training samples in the sample data set have corresponding weights;
the retraining instruction further comprises selecting one model node as a parent model node; multiple iterations of training are performed on the deep learning model based on the second data set, and the model parameter values generated by each iteration are calculated by taking the model parameter values saved by the parent model node as the initial model parameters;
generating a child model node when the training suspension condition is met;
and saving the model parameter values generated by the last iteration of training as the node information of the child model node.
The step of selecting a model node as a parent model node comprises:
and taking the model node with the maximum accuracy as a parent model node.
The step of selecting a model node as a parent model node comprises:
and taking the model node with the maximum accuracy rate on the sample data set as a father model node.
The step of selecting a model node as a parent model node comprises:
and taking the model node with the maximum accuracy rate on part of training samples in the sample data set as a father model node.
The step of selecting a model node as a parent model node comprises:
and taking the model node with the model generation time arranged in the front as a parent model node.
In the step of setting the initial model parameters and training suspension condition of the deep learning model, a starting model node is generated;
the step of selecting a model node as a parent model node comprises:
And taking the starting model node as a parent model node.
Generating a display interface according to the node information of the parent model node and the node information of the child model node;
the information of the display interface is at least one of: the relationship graph of the parent model node and the child model node, and the node information.
Preferably, the display interface is further provided with a user interaction interface, and the user interaction interface comprises at least one interactive operation: obtaining the second data set, selecting one model node as the parent model node, setting a suspension condition, deleting from and adding to the sample data set, and modifying the weights of the training samples in the sample data set.
A computer device comprising a memory storing a computer program and a processor, wherein the processor implements the steps of any of the methods described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the method of any of the above.
Advantageous effects: the deep learning model training method provided by the invention generates a model node each time the training suspension condition is met, so a plurality of model nodes can be formed in the training process and their node information saved. The invention can form a display interface according to the node information, displaying the node information of the model nodes and visually displaying the relationship graph among the model nodes. Furthermore, interactive operations for deep learning model training are provided on the display interface, so that a user can make timely adjustments according to the node information of each model node during training and obtain an optimal deep learning model more quickly.
Drawings
FIG. 1 is a flow chart of a deep learning model training method in the present invention.
FIG. 2 is a first diagram of a journey tree of the deep learning model training method of the present invention.
FIG. 3 is a second diagram of a journey tree of the deep learning model training method of the present invention.
FIG. 4 is an interactive interface diagram of the deep learning model training method of the present invention.
FIG. 5 is a schematic diagram of a sample data set of the deep learning model training method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
Referring to fig. 1-5, the present invention provides some embodiments of a deep learning model training method.
As shown in fig. 1, the deep learning model training method according to the embodiment of the present invention includes the following steps:
step S100, a first data set is obtained.
Specifically, the first data set includes samples and the labels corresponding to the samples. According to whether it serves training or testing, the first data set is divided into a training set and a test set, and the samples are divided into training samples and test samples. The training samples are used to train the deep learning model, yielding the training result of the model. The test samples are used to test the deep learning model: after the model has been trained on the training samples, it is tested on the test samples to obtain test data, where the test data includes at least one of model accuracy, sample classification accuracy, confidence, and test classification results.
Model accuracy refers to the proportion of correctly predicted samples among all samples. Sample classification accuracy refers to the proportion of correctly predicted samples of a given class among all samples of that class. Confidence is the probability that the true value falls within a certain range centered on the measured value. The test classification result is the result of classifying the samples during testing.
The test classification results can be displayed with a confusion matrix. Taking binary classification as an example, samples are classified as positive or negative during testing, and four cases can occur: true positive, a positive-class sample predicted as positive; false positive, a negative-class sample predicted as positive; true negative, a negative-class sample predicted as negative; and false negative, a positive-class sample predicted as negative. As another example, with 10 classification labels (label 1 to label 10), 10 samples with label 1 may be predicted as label 1, 2 samples with label 1 predicted as label 2, and 2 samples with label 1 predicted as label 3.
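The four binary-classification cases above can be tallied into a confusion matrix. The following is a minimal illustrative sketch (the function and the data are hypothetical, not taken from the patent):

```python
# Minimal confusion-matrix sketch for binary classification.
# Rows are true labels, columns are predicted labels (hypothetical data).
def confusion_matrix(y_true, y_pred, num_classes=2):
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Negative class = 0, positive class = 1.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
m = confusion_matrix(y_true, y_pred)
tn, fp = m[0]  # true negatives, false positives
fn, tp = m[1]  # false negatives, true positives
```

Each cell (true class, predicted class) counts one of the four cases described above.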
According to whether weights are carried, a data set is divided into an unweighted data set and a (weighted) sample data set. To distinguish the two, a sample in the unweighted data set is denoted as a first sample and a sample in the sample data set as a second sample. The unweighted data set includes the first samples and the labels corresponding to them. The sample data set includes the second samples, the labels corresponding to them, and the weights corresponding to them. It should be noted that the first data set is an unweighted data set, i.e., it includes first samples and their corresponding labels; and since the first samples are divided into training samples and test samples, the first data set includes the first training samples with their labels and the first test samples with their labels.
In retraining, the second data set may include an unweighted data set and/or a sample data set. Since the second samples are also divided into training samples and test samples, the sample data set includes the second training samples with their labels and weights, and the second test samples with their labels and weights. The unweighted data set in the second data set may be the same as the first data set, i.e., it includes the first training samples with their labels and the first test samples with their labels. In some embodiments, the unweighted data set in the second data set may instead differ from the first data set; that is, the first data set may be adjusted to form the unweighted part of the second data set.
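The two record shapes described above, an unweighted (sample, label) pair and a weighted (sample, label, weight) triple, might be sketched as follows (the class, field and variable names are illustrative assumptions):

```python
# Sketch of the two record shapes described above (illustrative names).
# An unweighted record carries (sample, label); a weighted one adds a weight.
from dataclasses import dataclass

@dataclass
class Record:              # entry of the unweighted (first) data set
    sample: str
    label: str

@dataclass
class WeightedRecord:      # entry of the weighted sample data set
    sample: str
    label: str
    weight: float

first_data_set = [Record("news text A", "sports")]
sample_data_set = [WeightedRecord("news text B", "economy", 2.0)]
# A retraining (second) data set combines both, per the description above:
second_data_set = first_data_set + sample_data_set
```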
It should be noted that different deep learning models may be adopted for different training tasks, with correspondingly different data sets. Deep learning models include, but are not limited to, the BERT (Bidirectional Encoder Representations from Transformers) model, the GPT (Generative Pre-Training) model, and ResNet (Residual Neural Network) models.
Training tasks include, but are not limited to, at least one of:
target segmentation for segmenting a target and a background in an input image;
classifying objects in the input image or classifying texts;
target tracking based on the input image;
medical image based diagnostic assistance;
speech recognition (classification) and speech correction based on input speech.
Taking a classification-type training task as an example, each training sample in the data set has a corresponding label. If the task is classifying objects in an input image, the training image serves as the training sample and the object category serves as the label; training images can be crawled from the internet with a web crawler and then cleaned, which saves cost. If the task is speech recognition of input speech, the training speech serves as the training sample and the text corresponding to the speech serves as the label.
Taking text classification as an illustration, the first training sample is a first news sample, the second training sample is a second news sample, and the labels include sports news, entertainment news and economic news, as shown in fig. 4.
S200, setting initial model parameters and training suspension conditions of the deep learning model, carrying out multiple iterative training on the deep learning model based on the first data set, and calculating model parameter values generated by each iteration according to the initial model parameters.
Before training, initial model parameters are configured for the deep learning model; default values can be adopted, or the deep learning model can be pre-trained to obtain the initial model parameters. The training suspension condition is the condition under which training is suspended and a model node is generated, for example, the condition that the number of iterations of training reaches a preset training number and the last iteration has ended. The preset training number may be set as needed. For example, if one round of training requires 100 iterations, the preset number may be set to 100: after the 100th iteration finishes, the training suspension condition is satisfied. As another example, when there are many samples, one round of training may require 10000 iterations so that all samples are sufficiently trained; because the number of iterations is large, the preset training number may still be set to 100, so that the suspension condition is satisfied every 100 iterations.
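The behavior described here, generating a model node each time the iteration count reaches the preset training number, can be sketched as a counter check inside the training loop (a toy sketch; the parameter update is a stand-in for real training):

```python
# Sketch: generate a model node each time the iteration count reaches a
# preset multiple (here 100), as in the training suspension condition above.
def run_training(total_iterations=300, preset=100):
    nodes = []
    params = {"step": 0}          # stand-in for the model parameter values
    for i in range(1, total_iterations + 1):
        params = {"step": i}      # stand-in for one iteration of training
        if i % preset == 0:       # suspension condition satisfied
            nodes.append({"iteration": i, "params": dict(params)})
    return nodes

nodes = run_training()
# 300 iterations with a preset of 100 yield model nodes at 100, 200 and 300,
# each saving the parameter values of its last iteration.
```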
In the iterative training process of the deep learning model based on the first data set, the initial model parameters are calculated to generate model parameter values, and the generated model parameter values may be different in each iteration.
The initial model parameters include: first training parameters, first model structure parameters, and a learning strategy. Training parameters are parameters whose adjustment is driven by training data, such as the specific kernel parameters of a convolution kernel, i.e. weights and biases. Model structure parameters comprise hyper-parameters: parameters that are not adjusted by training data but are set manually before or during training. Hyper-parameters fall into three types: network parameters, optimization parameters, and regularization parameters. The learning strategy includes the initial learning rate, the decay factor, and the like.
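The three groups of initial model parameters might be organized as a configuration structure such as the following (all concrete names and values are illustrative assumptions, not taken from the patent):

```python
# Illustrative grouping of the initial model parameters described above.
initial_model_parameters = {
    "training_parameters": {    # driven by training data (weights, biases)
        "weight_init": "xavier",
        "bias_init": 0.0,
    },
    "model_structure": {        # hyper-parameters, set manually
        "num_layers": 12,
        "hidden_size": 768,
    },
    "learning_strategy": {      # initial learning rate, decay factor, etc.
        "initial_lr": 1e-4,
        "decay_factor": 0.95,
    },
}
```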
The step S200 includes:
and step S210, generating a starting model node.
And in the step of setting the initial model parameters of the deep learning model, generating an initial model node. Preferably, in the step of setting the initial model parameters as the first training parameters, the first model structure parameters and the learning strategy, the initial model node is generated.
Step S200 further includes:
and S220, performing multiple times of iterative training on the deep learning model based on the training samples.
Specifically, based on a training sample (i.e., the first training sample in the first data set), the deep learning model is iteratively trained a plurality of times starting with the initial model parameters.
First, the initial model parameters are used for the first iteration of training. The model parameter values of each subsequent iteration are calculated from the first training parameters, the first model structure parameters and the learning strategy: second training parameters and second model structure parameters for the second iteration, third training parameters and third model structure parameters for the third iteration, and so on. The deep learning model is trained with the training parameters and model structure parameters corresponding to each iteration, thereby performing multiple iterations of training.
Specifically, the iterative training of the deep learning model comprises multiple iterations. In the first iteration cycle (i.e., the first iteration of training), the deep learning model is trained with the initial model parameters and the first data set. The prediction result of this cycle is compared with the labels of the first training samples in the first data set, and the objective function value is calculated. The model parameter values of the first iteration are then generated by calculation from the initial model parameters and the objective function value.
In the second iteration cycle (i.e., the second iteration of training), the deep learning model is trained with the model parameter values generated by the first iteration and the first data set. The prediction result of this cycle is compared with the labels of the first training samples, and the objective function value is calculated. The model parameter values of the second iteration are then generated by calculation from the initial model parameters and the objective function value.
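The iteration cycle described above, predict, compare with the labels to obtain an objective function value, then compute new model parameter values, can be sketched with a toy one-parameter model and a squared-error objective (an illustrative sketch only, not the patent's model):

```python
# Sketch of the per-iteration cycle: predict, compare with the labels to get
# an objective (squared-error) value, then update the parameter (1-D toy).
def train(samples, labels, lr=0.1, iterations=2):
    w = 0.0                                   # initial model parameter
    history = []
    for _ in range(iterations):
        # gradient of the squared-error objective over the data set
        grad = sum(2 * (w * x - y) * x for x, y in zip(samples, labels))
        w -= lr * grad / len(samples)         # new model parameter value
        history.append(w)
    return history

history = train([1.0, 2.0], [2.0, 4.0])
# history[0] holds the parameter after the first iteration cycle,
# history[1] after the second, each computed from the previous value.
```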
And step 300, generating a model node when the training suspension condition is met.
Specifically, the deep learning model is trained based on the first data set; the corresponding iterations of training are performed with the second training parameters and second model structure parameters, and a model node is generated after the preset number of training iterations is completed.
As shown in FIG. 2, the starting model node is represented by an open circle and the training model nodes by filled circles. The first training parameters, the first model structure parameters and the learning strategy are set at the starting model node. It can be appreciated that when the starting model node is generated, the deep learning model has not yet been trained or tested; the corresponding iterations of training are then performed with the second training parameters and second model structure parameters, and a training model node is generated after the preset number of training iterations is completed.
Step S300 further includes:
and S310, testing the deep learning model after the last iterative training based on the test sample to obtain test data and then generating a model node.
Specifically, after the iterative training reaches a preset training number, the deep learning model after the last iterative training is tested based on a test sample (i.e., a first test sample in the first data set), so as to obtain test data, and generate a model node.
The test data comprises at least one of model accuracy, sample classification accuracy, confidence and test classification result.
Specifically, step S310 includes:
step S311, testing is carried out based on the test sample in the first data set, and a model node is generated after the test data is obtained by comparing the test result with the label of the test sample.
Specifically, after the multiple iterations of training are finished, the model parameter values of the last iteration are obtained; the first test samples in the first data set are input into the deep learning model to obtain their prediction results, and test data, such as the accuracy of the deep learning model, are obtained from the prediction results and the labels of the first test samples.
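The accuracy computation described here, comparing predictions against the test-sample labels, reduces to a simple ratio (the predictions and labels below are hypothetical, echoing the news-classification example):

```python
# Sketch: model accuracy as the fraction of test samples whose prediction
# matches the label (hypothetical predictions and labels).
def accuracy(predictions, labels):
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

preds  = ["sports", "economy", "sports", "entertainment"]
labels = ["sports", "economy", "economy", "entertainment"]
acc = accuracy(preds, labels)   # 3 of 4 predictions are correct
```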
Optionally, the preset training number may be determined according to the training requirements of the deep learning model, and a model node is generated when the number of training iterations reaches it. For example, when one pass over all training samples in the first data set (e.g., 10000 first training samples) takes 100 iterations, the preset training number is set to 100. That is, a model node is obtained after every 100 iterations of training of the deep learning model. After 100 iterations, testing is performed on the test samples in the first data set, and a model node is generated after the test data are obtained by comparing the test results with the labels of the test samples.
And S400, saving the model parameter value generated by the last iterative training as the node information of the model node.
Specifically, within the training of one model node, after one iteration of training of the deep learning model, the next iteration is performed, and the model node is generated once the preset number of iterations is completed. The last iteration generates model parameter values, and these serve as the node information of the model node.
It should be noted that, as iterative training proceeds, a training task may satisfy the training suspension condition multiple times, generating multiple model nodes. Specifically, when the iterative training meets the training suspension condition, a model node is generated; iterative training then continues from that model node until the suspension condition is met again and the next model node is generated. For example, a model node is generated each time the number of iterations reaches the preset training number, and training continues from that node until the overall training termination condition is met. Furthermore, the node information of the multiple model nodes is saved, and a training journey tree comprising at least two model nodes is formed according to the node information. Of course, the same deep learning model may yield different training journey trees for different training tasks.
For example, as shown in fig. 2, the deep learning model is trained and tested based on the first data set, and a first model node (generation time: 14:30 on July 8, 2021) is generated after 100 iterations of training and testing are completed; in each iteration the model parameter values are generated by calculation from the initial model parameters, that is, the second training parameters and second model structure parameters are calculated from the first training parameters, the first model structure parameters and the learning strategy. After the first model node is generated, iterative training continues for another 100 iterations, and a second model node is generated (generation time: 15:00 on July 8, 2021). By analogy, a third model node (generation time: 15:30 on July 8, 2021) and a fourth model node (generation time: 16:00 on July 8, 2021) can be obtained.
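The chain of model nodes with their generation times could be recorded with a simple linked structure such as the following sketch (class and field names are illustrative assumptions, not from the patent):

```python
# Sketch of a training journey: each model node stores its parameter values
# and generation time, and links to the node it continued from.
class ModelNode:
    def __init__(self, params, generated_at, parent=None):
        self.params = params
        self.generated_at = generated_at
        self.parent = parent          # None for the starting model node

start = ModelNode(params="initial", generated_at="14:00")
n1 = ModelNode(params="after 100 iters", generated_at="14:30", parent=start)
n2 = ModelNode(params="after 200 iters", generated_at="15:00", parent=n1)

def lineage(node):
    """Walk back to the starting model node, oldest generation time first."""
    chain = []
    while node is not None:
        chain.append(node.generated_at)
        node = node.parent
    return chain[::-1]
```

A journey-tree display interface could render exactly this parent chain as the relationship graph of generation times.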
The node information further comprises at least the first data set, the test data, the model node generation time, and the computational power consumed by the deep learning model training. Specifically, besides the model parameter values generated by the last iteration of training, the node information of a model node may include at least one of the first data set, the test data, the model node generation time, and the computational power consumed by the training.
As shown in fig. 3, the node information includes the first data set (specifically, test sets) with IDs 20210727, 20210728 and 20210739, the model node generation times, and the accuracy of the model. The generation times of the model nodes are 15:00, 15:30 and 16:00 on July 8, 2021. As shown in fig. 4, the accuracy of the model node is 99.47%.
And S500, generating a display interface according to the node information of the model node.
In the node information stored in each model node, the model parameter values of iterative training are calculated and generated by using the initial model parameters, that is, the second training parameter of each iterative training and the second model structure parameter of the last iterative training are calculated and obtained by using the first training parameter, the first model structure parameter and the learning strategy. Therefore, an internal association relationship exists between the model parameter values in each node information, and a display interface of the node information can be further generated according to the association relationship between the model parameter values. In addition, the generation time of each model node has a sequential relationship.
The information of the display interface further comprises at least one of: a relationship graph generated according to the model node generation times, and the node information.
Preferably, a user interaction interface is further provided on the display interface, so that the user can configure and interact with the training of the deep learning model. The user can thus make timely adjustments according to the node information of each model node during training, obtaining the optimal deep learning model more quickly.
The user interaction interface includes: the interactive operation of obtaining the first data set, and the interactive operation of setting initial model parameters and training suspension conditions of the deep learning model. As shown in FIG. 4, the user interaction interface includes uploading a first data set, which includes a training set and a test set.
Step S600, receiving a retraining instruction of a user and obtaining a second data set, where the second data set includes the first data set and a sample data set, and training samples in the sample data set have corresponding weights.
After the multiple iterative trainings required by the user are finished, one training of the deep learning model is complete. A deep learning training task, however, usually requires several such trainings: after one training is finished, a second and further retrainings can be started on the basis of the last one.
Specifically, a retraining instruction of the user is received and a second data set is obtained, wherein the second data set comprises the first data set and a sample data set, and the training samples in the sample data set have corresponding weights. In contrast, the training samples in the first data set do not have corresponding weights.
The retraining instructions also include selecting a model node as a parent model node.
Step S700, carrying out multiple iterative training on the deep learning model based on the second data set, and calculating a model parameter value generated by each iteration by taking a model parameter value stored in a father model node as an initial model parameter;
step S800, generating a sub-model node after the training suspension condition is met;
and S900, saving the model parameter values generated by the last iterative training as the node information of the sub-model nodes.
To better describe retraining from a parent model node, the new model nodes generated in the multiple iterative trainings started from the parent model node are regarded as its sub-model nodes. The relationship graph of the model nodes comprises a time relationship graph of the model node generation times and a relationship graph of parent model nodes and sub-model nodes.
It should be noted that when a retraining instruction of the user is received, one model node is selected as the parent model node and retraining is performed on its basis; that is, the model parameter values stored in the parent model node (such as the second training parameters and the second model structure parameters) are used as the initial model parameters, and multiple iterative trainings of the deep learning model are performed from these initial parameters based on the second data set. At this point, the second training parameters and second model structure parameters stored in the parent model node play the same role as the first training parameters and first model structure parameters set for the initial model node in the first training. The retraining instruction may further include a new learning strategy: the third training parameters and third model structure parameters of each iterative training are calculated from the second training parameters and second model structure parameters stored in the parent model node together with the learning strategy, and the corresponding iterative training is performed with them.
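Steps S700–S900 above can be sketched as a short loop. The dict-based node layout, the `train_step` callback, and the iteration-count suspension condition are illustrative assumptions, not details fixed by the patent:

```python
def retrain_from_parent(parent_node: dict, second_data_set, train_step, max_iterations: int) -> dict:
    """S700: the parent node's saved model parameter values are the initial
    model parameters; iterate until the training suspension condition (here
    simply an iteration count) is met. S800/S900: generate a sub-model node
    that saves the parameter values of the last iteration."""
    params = dict(parent_node["params"])      # start from the parent's values
    for _ in range(max_iterations):           # training suspension condition
        params = train_step(params, second_data_set)
    return {"params": params, "parent": parent_node}
```

A toy `train_step` that nudges a single weight toward the data mean is enough to exercise the loop.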
The node information of the sub-model node further includes: the second data set, the test data, the sub-model node generation time, and the computing power consumed by the deep learning model training.
In the present invention, the second data set comprises a sample data set in which training samples (i.e. second training samples) have corresponding weights.
Since some training samples are more important than others, the sample data set is formed by attaching a weight to each training sample. The second data set includes the sample data set, and the sample data set includes: the second training samples, the labels corresponding to the second training samples, and the weights corresponding to the second training samples.
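One common way such per-sample weights can enter training is through a weighted loss, so that heavily weighted samples dominate the gradient. The squared-error form below is only an illustrative assumption; the patent does not fix a particular loss function:

```python
def weighted_loss(predictions, labels, weights):
    """Weighted mean squared error: samples with larger weights contribute
    more to the loss, so training focuses on the important samples."""
    total = sum(w * (p - y) ** 2 for p, y, w in zip(predictions, labels, weights))
    return total / sum(weights)
```

With equal weights this reduces to the ordinary mean squared error; raising one sample's weight pulls the loss toward that sample's error.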
The sample data set is obtained by at least one of the following steps:
configuring corresponding weights for the first training samples in the first data set to form the sample data set; and/or
collecting data and forming the sample data set from it.
Specifically, the sample data set may be adjusted. The adjustments include: adding or removing training samples, modifying the labels corresponding to training samples, modifying the weights corresponding to training samples, and so on.
For example, after a second training sample is acquired, the weight corresponding to it in the sample data set may be determined according to a weight configuration instruction. Different second training samples may have different weights, so the weight of each second training sample is determined according to the user's requirements, thereby obtaining the sample data set. The weights may of course also be re-adjusted: after an adjustment, training continues to form the next sub-model node, and whether the adjustment was appropriate can be judged from that node (e.g., from its test data).
For example, as shown in fig. 5, 3 second training samples (text 1, text 2, and text 3) are obtained and configured with weights (0.5, 0.5, and 0.8, respectively). The 3 samples are input into a deep learning model (which may be the deep learning model corresponding to the parent model node), and the prediction results output by the model are obtained (sports news, sports news, and economic news, respectively), while the labels of the 3 samples are sports news, entertainment news, and economic news. Since the weight of the 3rd second training sample is larger, it may be added to the sample data set for further training; and since the prediction result of the 2nd second training sample is inconsistent with its label, indicating that the deep learning model predicted it incorrectly, the 2nd sample is also added to the sample data set for further training. The 1st second training sample may be either added to or deleted from the sample data set.
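The selection rule this example illustrates (keep a sample if its weight is large, or if its prediction disagrees with its label, as also recited in claim 12) can be sketched as follows; the dict keys and the preset weight value are assumptions:

```python
def select_for_sample_set(samples, preset_weight=0.6):
    """Keep a second training sample if its weight exceeds the preset
    weight, or if the model's prediction is inconsistent with its label."""
    return [s for s in samples
            if s["weight"] > preset_weight or s["prediction"] != s["label"]]
```

Applied to data shaped like the Fig. 5 example, text 2 is kept for its wrong prediction and text 3 for its large weight, while text 1 is optional and dropped here.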
It should be noted that the sample data set may also be generated from the first training samples in the first data set: a first training sample is used as a second training sample after a weight is configured for it, thereby obtaining the sample data set.
In the present invention, the retraining instruction further comprises selecting one model node as the parent model node; specifically, this selection covers the following 5 cases:
1. and taking the model node with the maximum accuracy rate as a parent model node.
Specifically, when the user needs to retrain, the user sends a retraining instruction to determine the parent model node. Since test data is obtained for each model node during testing, the user can select the model node whose test accuracy is the highest as the parent model node.
For example, as shown in fig. 2, when the deep learning model has been trained to the fourth model node (model node generation time: 16:00, Jul. 8, 2021), a retraining instruction of the user is received. Since the accuracy of the deep learning model corresponding to the third model node is the greatest among the first, second, and third model nodes, the third model node (model node generation time: 15:30, Jul. 8, 2021) is used as the parent model node.
2. And taking the model node with the maximum accuracy rate on the sample data set as a father model node.
Since the second training samples are important, they are configured with weights, and the model node whose deep learning model has the greatest accuracy on the second training samples is taken as the parent model node.
Specifically, the accuracy of the deep learning model corresponding to each model node is tested on the sample data set. When the second training samples of the sample data set are input into the deep learning model corresponding to a certain model node and the agreement between the prediction results and the labels is high, that deep learning model has high accuracy on the sample data set, and the model node is therefore used as the parent model node.
3. And taking the model node with the maximum accuracy rate on part of training samples in the sample data set as a father model node.
For example, the parent model node may be the model node having the maximum accuracy on the second training samples in which the user is interested, where a second training sample of interest is one whose weight is greater than a preset weight.
Different second training samples have different weights; since the second training samples the user is interested in (or concerned about) are important, their weights are larger, and the model node with the maximum accuracy on those samples is used as the parent model node.
Specifically, the second training samples with larger weights are input into the deep learning model corresponding to a certain model node; when the agreement between the prediction results on those samples and their labels is high, the deep learning model corresponding to that model node has high accuracy on the heavily weighted samples, and the model node is used as the parent model node.
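Cases 2 and 3 both amount to ranking nodes by accuracy on a (possibly weight-filtered) portion of the sample data set. A minimal sketch, assuming each node carries a callable model and each sample a weight (these representations are assumptions for illustration):

```python
def accuracy_on(model, samples, min_weight=0.0):
    """Accuracy of a node's model on the samples whose weight is at least
    `min_weight`: min_weight=0.0 gives case 2 (whole sample data set),
    a positive threshold gives case 3 (samples of interest only)."""
    relevant = [s for s in samples if s["weight"] >= min_weight]
    correct = sum(model(s["x"]) == s["label"] for s in relevant)
    return correct / len(relevant)

def pick_parent_by_accuracy(nodes, samples, min_weight=0.0):
    """Choose the model node with the maximum accuracy as the parent node."""
    return max(nodes, key=lambda n: accuracy_on(n["model"], samples, min_weight))
```

Filtering by weight before scoring is what makes the heavily weighted samples decide the choice of parent.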
4. And taking the model node whose generation time is ranked first (i.e., the earliest) as the parent model node.
For another example, the parent model node is determined according to the generation times of the model nodes. When the deep learning model has been trained to the seventh model node (model node generation time: 09:00, Jul. 9, 2021), the selectable model nodes include the third, fifth, and sixth model nodes (the first model node is not considered because the accuracy of its deep learning model is too low). The model node whose generation time is ranked first is preferred, so the third model node is taken as the parent model node.
5. And taking the initial model node as a parent model node.
Specifically, the parent model node may be the initial model node. For example, when the accuracy of every model node is smaller than a first preset threshold, the user sends a retraining instruction to use the initial model node as the parent model node; or, when the average accuracy of the model nodes is smaller than a second preset threshold, the user likewise sends a retraining instruction to use the initial model node as the parent model node.
If the accuracy of every model node is smaller than the first preset threshold, or the average accuracy of the model nodes is smaller than the second preset threshold, the accuracy of the model nodes is too low overall; the initial model node can then be used as the parent model node, and training is restarted from the beginning.
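The fallback of case 5 can be written as a simple check; the threshold values and the dict layout are assumptions made for illustration:

```python
def choose_parent_node(nodes, initial_node, first_threshold=0.6, second_threshold=0.7):
    """Case 5: when every node's accuracy is below the first preset threshold,
    or the average accuracy is below the second one, restart from the initial
    model node; otherwise keep the most accurate node as the parent."""
    accuracies = [n["accuracy"] for n in nodes]
    if all(a < first_threshold for a in accuracies) \
            or sum(accuracies) / len(accuracies) < second_threshold:
        return initial_node
    return max(nodes, key=lambda n: n["accuracy"])
```

When neither fallback condition fires, this degenerates to case 1 (the most accurate node), one plausible default.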
In the invention, selecting one model node as the parent model node means that the second training parameters and second model structure parameters stored by that parent model node are used as the initial model parameters; multiple iterative trainings of the deep learning model are then performed from these initial parameters based on the second data set.
Specifically, a third training parameter and a third model structure parameter of each iterative training are calculated according to a second training parameter and a second model structure parameter stored in a parent model node and the learning strategy, and the third training parameter and the third model structure parameter are used for performing corresponding iterative training.
In the retraining, the step of generating a sub-model node after the training suspension condition is satisfied further includes:
the second data set further comprises a test sample;
testing based on the test sample in the second data set to obtain test data and then generating a sub-model node;
the test data comprises at least one of model accuracy, sample classification accuracy, confidence and test classification results.
The node information at least comprises: the second data set, the test data, the sub-model node generation time, and the computing power consumed by the deep learning model training.
The consumed computing power refers to the computing power required to train between two model nodes.
And S1000, generating a display interface according to the node information of the father model node and the node information of the sub model node.
After the step of saving the model parameter values generated by the last iterative training as the node information of the sub-model node, a display interface is generated from the node information of the parent model node and the node information of the sub-model node;
the information of the display interface is at least one of: a time relationship graph of the model node generation times, a relationship graph of parent model nodes and sub-model nodes, and the node information. As shown in the relationship graph of the training journey in FIG. 2, the model nodes generated on Jul. 9, 2021 start from the parent model node whose generation time is 15:00, Jul. 8, 2021.
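A minimal way to render the parent/sub-model relationship graph as an indented training journey tree is sketched below; the node layout (name, time, children list) is an assumption, not the patent's interface:

```python
def journey_lines(node, depth=0):
    """Return the indented lines of the training journey tree rooted at
    `node`, one line per model node, sub-model nodes listed under their
    parent with deeper indentation."""
    lines = ["  " * depth + node["name"] + " (" + node["time"] + ")"]
    for child in node.get("children", []):
        lines.extend(journey_lines(child, depth + 1))
    return lines
```

Printing `"\n".join(journey_lines(root))` yields a textual stand-in for the relationship graph of FIG. 2.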
Preferably, the display interface is further provided with a user interaction interface, and the user interaction interface comprises at least one of the following interactive operations: obtaining the second data set; selecting one model node as the parent model node; setting the training suspension condition; setting the learning strategy; and deleting samples from, adding samples to, and modifying the weights of training samples in the sample data set.
As shown in fig. 4, a case set (i.e., a sample data set) may be uploaded and downloaded, samples may be added to or deleted from the case set, and the interactive operations of deleting, adding, and modifying weights may also be performed on selected samples.
The interactive operation of selecting a model node as the parent model node refers to the user selecting one model node as the parent model node.
The deep learning model training method provided by the invention forms at least two model nodes in the training process and stores their node information. A display interface of the node information can further be formed, and interactive operations for deep learning model training are provided on that interface, so that the user can make timely adjustments according to the node information of each model node during training and obtain an optimal deep learning model more quickly. For example, the method can generate a training journey tree from the node information to show the node information of the training process, and the user can interact in time according to this information, adjust the samples in the data set, or select a parent model node for retraining. In addition, the invention adjusts the samples in the data set in the second and subsequent trainings, enabling targeted training; in particular, a sample data set is provided in which weights are assigned to particular (important) training samples, thereby enabling special and targeted training.
Based on the deep learning model training method described in any of the above embodiments, the present invention further provides a preferred embodiment of a computer device:
the computer device comprises a memory and a processor, the memory storing a computer program, the processor implementing the steps when executing the computer program:
acquiring a first data set;
setting initial model parameters and training suspension conditions of the deep learning model, carrying out multiple iterative training on the deep learning model based on the first data set, and calculating model parameter values generated by each iteration according to the initial model parameters;
generating a model node when the training suspension condition is satisfied;
and saving the model parameter values generated by the last iterative training as the node information of the model nodes.
Based on the deep learning model training method described in any of the above embodiments, the present invention further provides a preferred embodiment of a computer-readable storage medium:
the computer device comprises a memory and a processor, on which a computer program is stored, which computer program, when being executed by the processor, carries out the steps of:
acquiring a first data set;
setting initial model parameters and training suspension conditions of the deep learning model, carrying out multiple iterative training on the deep learning model based on the first data set, and calculating model parameter values generated by each iteration according to the initial model parameters;
generating a model node when the training suspension condition is satisfied;
and saving the model parameter values generated by the last iterative training as the node information of the model nodes.
It will be understood that the invention is not limited to the examples described above, but that modifications and variations will occur to those skilled in the art in light of the above teachings, and that all such modifications and variations are considered to be within the scope of the invention as defined by the appended claims.

Claims (15)

1. A deep learning model training method is characterized by comprising the following steps:
acquiring a first data set;
setting initial model parameters and training suspension conditions of the deep learning model, carrying out multiple iterative training on the deep learning model based on the first data set, and calculating model parameter values generated by each iteration according to the initial model parameters;
generating a model node when the training suspension condition is satisfied;
saving the model parameter value generated by the last iterative training as the node information of the model node, and continuing the iterative training on the basis of the model node until the next time of meeting the training suspension condition, generating the next model node;
receiving a retraining instruction of a user and acquiring a second data set, wherein the second data set comprises the first data set and a sample data set, and the sample data set comprises a second training sample, a label corresponding to the second training sample, a weight corresponding to the second training sample, a second test sample, a label corresponding to the second test sample and a weight corresponding to the second test sample;
the retraining instruction also comprises selecting a model node as a father model node, carrying out multiple iterative training on the deep learning model based on the second data set, and calculating a model parameter value generated by each iteration by taking a model parameter value stored by the father model node as an initial model parameter;
generating a sub-model node after the training suspension condition is met;
saving the model parameter value generated by the last iterative training as the node information of the sub-model node;
generating a display interface; wherein, the information of the display interface comprises: generating a relationship graph according to the relationship between the generation time of the model node and the relationship graph between the father model node and the child model node; the display interface is also provided with a user interaction interface, and the user interaction interface comprises interaction operation of selecting one model node as a father model node.
2. The deep learning model training method according to claim 1, wherein: the first data set comprises training samples and test samples;
performing multiple times of iterative training on the deep learning model based on the training samples;
and the training suspension condition is that the number of times of the iterative training reaches a preset training number, whereupon the last iterative training is ended.
3. The deep learning model training method according to claim 2, wherein:
testing the deep learning model after the last iterative training based on the test sample to obtain test data and then generating a model node;
the test data comprises at least one of model accuracy, sample classification accuracy, confidence and test classification result.
4. The deep learning model training method according to claim 3, wherein: the node information of the model node further includes at least: the first data set, the test data, the model node generation time, and the computational power consumed by the deep learning model training.
5. The deep learning model training method according to any one of claims 1 to 4, wherein the information of the presentation interface further comprises: at least one of the node information.
6. The deep learning model training method of claim 5, wherein the presentation interface further comprises at least one interactive operation: the interactive operation of obtaining the first data set, and the interactive operation of setting the initial model parameters and the training suspension conditions of the deep learning model.
7. The deep learning model training method of claim 1, wherein the step of selecting a model node as a parent model node comprises:
and taking the model node with the maximum accuracy rate as a parent model node.
8. The deep learning model training method of claim 1, wherein the step of selecting a model node as a parent model node comprises:
and taking the model node with the maximum accuracy rate on the sample data set as a father model node.
9. The method for training the deep learning model according to claim 1, wherein the step of selecting a model node as a parent model node comprises:
and taking the model node with the maximum accuracy rate on part of training samples in the sample data set as a father model node.
10. The method for training the deep learning model according to claim 1, wherein the step of selecting a model node as a parent model node comprises:
and taking the model node with the model generation time arranged in the front as a parent model node.
11. The deep learning model training method according to claim 1, wherein in the step of setting initial model parameters and training suspension conditions of the deep learning model, a starting model node is generated;
the step of selecting a model node as a parent model node comprises:
and taking the starting model node as a father model node.
12. The deep learning model training method of claim 1, wherein
the sample data set is obtained by the following steps:
after a second training sample is collected, determining the weight corresponding to the second training sample according to a weight configuration instruction;
inputting the second training sample into the deep learning model to obtain a prediction result output by the deep learning model;
and if the weight of the second training sample is greater than the preset weight or the prediction result is inconsistent with the label corresponding to the second training sample, adding the second training sample into a sample data set.
13. The deep learning model training method of claim 1, wherein the user interaction interface further comprises at least one interaction operation: the interactive operation of obtaining the second data set, the interactive operation of setting the initial model parameters and the training suspension conditions of the deep learning model, and the interactive operation of deleting and adding the sample data set and modifying the weight of the training samples in the sample data set.
14. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 13 when executing the computer program.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 13.
CN202111635664.1A 2021-12-29 2021-12-29 Deep learning model training method Active CN114004358B (en)

Publications (2)

Publication Number Publication Date
CN114004358A CN114004358A (en) 2022-02-01
CN114004358B true CN114004358B (en) 2022-06-14



