CN112508126A - Deep learning model training method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN112508126A
CN112508126A
Authority
CN
China
Prior art keywords
deep learning
sample
learning model
samples
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011531212.4A
Other languages
Chinese (zh)
Other versions
CN112508126B (en)
Inventor
赵雪鹏 (Zhao Xuepeng)
聂磊 (Nie Lei)
邹建法 (Zou Jianfa)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011531212.4A priority Critical patent/CN112508126B/en
Publication of CN112508126A publication Critical patent/CN112508126A/en
Application granted granted Critical
Publication of CN112508126B publication Critical patent/CN112508126B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a deep learning model training method and device, electronic equipment and a readable storage medium, relating to the technical fields of computer vision, deep learning and image processing in artificial intelligence. The specific implementation scheme is as follows: when the electronic equipment iteratively trains the deep learning model, in each round of training a sample block is read from a training sample set, and the deep learning model is trained with a different training method according to the type of samples contained in the block. When the samples in the block are basic samples, the deep learning model is trained by a distillation training method; when the samples in the block are misjudged samples, the deep learning model is trained by a deep learning method; and the distillation training method and the deep learning method share the parameters of the deep learning model.

Description

Deep learning model training method and device, electronic equipment and readable storage medium
Technical Field
The application relates to the technical field of computer vision, deep learning and image processing in artificial intelligence, in particular to a deep learning model training method and device, electronic equipment and a readable storage medium.
Background
With the rapid development of Artificial Intelligence (AI), various deep learning models related to image processing are widely used in daily life and production in the field of computer vision. Common deep learning models include face recognition models, image classification models, image segmentation models, detection models, and the like.
The great success of deep learning in the field of computer vision depends mainly on massive data. In the deep learning process, a platform with high computing power performs deep learning on the massive data to obtain a deep learning model. For a large-scale deep learning model, training takes at least a few days and often weeks or even longer. The massive data is also referred to as basic data.
After the deep learning model goes online, if a problem occurs, the deep learning model needs to be optimized in a short time to minimize the loss. However, current deep learning model optimization methods are high in time cost and cannot guarantee the prediction capability on the basic data.
Disclosure of Invention
The application provides a deep learning model training method and device, electronic equipment and a readable storage medium.
According to a first aspect of the application, a deep learning model training method is provided, which includes:
acquiring a sample block in a training sample set, wherein the sample block comprises a preset number of image samples with the same category, the image samples in the training sample set comprise misjudged samples and basic samples, the basic samples are image samples used in training a deep learning model, the misjudged samples are image samples which cannot be correctly identified by the deep learning model, and the difference value between the number of the basic samples and the number of the misjudged samples is smaller than a preset threshold value;
for a sample block containing a basic sample, training the deep learning model by adopting a distillation training method;
and for the sample block containing the misjudged sample, training the deep learning model by adopting a deep learning method, wherein the model training parameters of the distillation training method and the deep learning method are the same.
According to a second aspect of the present application, there is provided a deep learning model training apparatus, comprising:
an acquisition module, configured to acquire a sample block in a training sample set, wherein the sample block comprises a preset number of image samples of the same category, the image samples in the training sample set comprise misjudged samples and basic samples, the basic samples are image samples used in training a deep learning model, the misjudged samples are image samples which cannot be correctly identified by the deep learning model, and the difference value between the number of the basic samples and the number of the misjudged samples is smaller than a preset threshold value;
the distillation module is used for training the deep learning model by adopting a distillation training method for a sample block containing a basic sample;
and the deep learning module is used for training the deep learning model by adopting a deep learning method for the sample block containing the misjudged sample, and the model training parameters of the distillation training method and the deep learning method are the same.
According to a third aspect of the present application, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the first aspect or any possible implementation of the method of the first aspect.
In a fifth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing an electronic device to perform the method of the first aspect or the various possible implementations of the first aspect.
According to the technical scheme of the application, the number of the misjudged samples is close to that of the basic samples, so the sampling number of the misjudged samples is increased and the capability of the deep learning model to learn the misjudged samples is improved; meanwhile, by combining a distillation training method, the electronic equipment solves the problem of forgetting the basic data during model iteration and obtains, through a small number of iterative training rounds, a deep learning model capable of accurately predicting both basic samples and misjudged samples.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a scene schematic diagram of a deep learning model training method provided in an embodiment of the present application;
FIG. 2 is a flowchart of a deep learning model training method provided by an embodiment of the present application;
FIG. 3 is a process diagram of a deep learning model training method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a process of reading a sample block in the deep learning model training method provided by the embodiment of the present application;
FIG. 5 is a schematic diagram of a process of a teacher model guiding deep learning model to learn in a deep learning model training method provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a deep learning model training apparatus according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of another deep learning model training apparatus according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of an example electronic device used to implement embodiments of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
In the process of training the deep learning model, massive data, computing equipment with high computing power and a large amount of time are needed. After the deep learning model is on line, if a problem occurs, the deep learning model needs to be optimized in a short time, so that the performance of the deep learning model is improved, and the loss is reduced to the lowest.
Take a detection model on a parts production line as an example: the input of the detection model is an image of each part produced on the line, and the output indicates whether the part is a qualified product. In general, a workshop production line runs essentially uninterrupted, 24 hours a day, 7 days a week. If the deep learning model has a problem, the throughput of the whole production line is affected. Therefore, when the accuracy of the detection model is low, the detection model needs to be optimized in a short time.
Assume the problematic deep learning model was trained on 10,000 images; these 10,000 images are called basic samples and include positive samples and negative samples. Images that the problematic deep learning model cannot detect correctly are called misjudged samples, for example a qualified product judged as unqualified or an unqualified product judged as qualified; assume the number of misjudged samples is 100. Common optimization methods include the following:
First: copy the 100 misjudged samples into 1000 misjudged samples, and train an optimization model with the 10,000 basic samples and the 1000 misjudged samples, with the same number of iterations as the training of the problematic deep learning model. After the optimization model obtained in this way goes online, its prediction capability on both the basic samples and the misjudged samples is strong, but training takes very long, the time cost is high, and the requirement of fast iteration cannot be met. Here, prediction capability on the basic samples and misjudged samples means that when an image to be detected is similar to a basic sample or a misjudged sample, the detection result can be obtained accurately.
Second: fine-tune the parameters of the problematic deep learning model with the 100 misjudged samples. The optimized deep learning model, however, has reduced prediction capability on the products corresponding to the basic samples.
Third: copy the 100 misjudged samples into 5000 misjudged samples, and train an optimization model with the 10,000 basic samples and the 5000 misjudged samples, with fewer iterations than the original training. This saves time, but the optimization model forgets the basic samples; that is, the prediction capability of the optimized deep learning model on the products corresponding to the basic samples is reduced.
Obviously, none of the above optimization methods can both substantially reduce the iteration cost and preserve the model effect.
The embodiment of the application relates to the technical field of computer vision, deep learning and image processing in artificial intelligence, and the deep learning model is optimized in a knowledge distillation mode, so that the effect of the deep learning model is improved, and the time is saved.
Fig. 1 is a scene schematic diagram of a deep learning model training method provided in an embodiment of the present application. Referring to fig. 1, the scenario includes: database 101, server 102 and network 103, where database 101 and server 102 establish a connection through network 103. Network 103 includes various types of network connections, such as wired links, wireless communication links, or fiber-optic cables.
The database 101 stores mass data, the mass data includes a basic sample set and a misjudgment sample set, the basic sample set includes positive samples and negative samples, and the misjudgment sample set includes the positive samples and the negative samples.
The server 102 is a server capable of providing a variety of services. The server 102 trains a base model with the basic sample set to obtain a deep learning model, also referred to as the unoptimized model. After the deep learning model goes online, some images cannot be correctly identified; these images are called misjudged samples. The server 102 acquires the basic sample set and the misjudged sample set from the database and trains the deep learning model.
The server 102 may be hardware or software. When the server 102 is hardware, the server 102 is a single server or a distributed server cluster composed of a plurality of servers. When the server 102 is software, it may be a plurality of software modules or a single software module, and the like, and the embodiments of the present disclosure are not limited.
It should be understood that the number of databases 101, servers 102, and networks 103 in fig. 1 is merely illustrative. In actual implementation, any number of databases 101, servers 102, and networks 103 are deployed according to actual needs.
In addition, fig. 1 is described taking the database 101 as a remote database as an example. However, the embodiments of the present application are not limited thereto; in other possible implementations, the database 101 may also be a local database of the server 102, in which case the network 103 in fig. 1 may not exist.
Next, the deep learning model training method according to the embodiments of the present application is described in detail based on the architecture shown in fig. 1.
Fig. 2 is a flowchart of a deep learning model training method provided in an embodiment of the present application, where an execution subject of the embodiment is an electronic device, and the electronic device is, for example, the server in fig. 1, and the embodiment of the present application is not limited thereto. The embodiment comprises the following steps:
201. A sample block in a training sample set is obtained.
The sample block comprises a preset number of image samples with the same category, the image samples in the training sample set comprise misjudged samples and basic samples, the basic samples are image samples used in the training of the deep learning model, the misjudged samples are image samples which cannot be correctly identified by the deep learning model, and the difference value between the number of the basic samples and the number of the misjudged samples is smaller than a preset threshold value.
Illustratively, the electronic device trains a model on the basic samples in advance to obtain the deep learning model; after the deep learning model goes online, it needs optimization training.
The database stores the misjudged samples and the basic samples used to train the deep learning model. The misjudged samples are samples the deep learning model cannot correctly identify, for example, a qualified product misidentified as unqualified or an unqualified product misidentified as qualified. The misjudged sample set includes positive samples and/or negative samples; the basic samples include positive samples and negative samples.
When optimization training of the deep learning model begins, the electronic device acquires the training sample set from the database and stores the acquired samples as sample blocks, each containing a preset number of samples of the same category; for example, if the sample block size is 3, each sample block contains 3 basic samples or 3 misjudged samples. The difference between the number of misjudged samples and the number of basic samples in the training sample set is smaller than the preset threshold. The number of samples contained in each sample block is related to the computing power of the electronic device: the higher the computing power, the more samples a sample block contains; the lower the computing power, the fewer samples it contains. A sample block is also called a sample batch.
In the training process, the electronic device reads one sample block from the training sample set at a time; for example, it reads a block at random, or queues the sample blocks and reads them from the queue in sequence. After reading a sample block, the electronic device trains the deep learning model in a different training mode according to the type of samples contained in the block.
During the training process, the electronic device randomly reads a sample block in the training sample set, for example, a base sample block or a misjudged sample block. The samples contained in the basic sample block are all basic samples, and the samples contained in the misjudged sample block are all misjudged samples.
202. For a sample block containing basic samples, the deep learning model is trained by a distillation training method.
203. For a sample block containing misjudged samples, the deep learning model is trained by a deep learning method, where the model training parameters of the distillation training method and the deep learning method are the same.
Illustratively, when a sample block contains samples that are base samples, the sample block is also referred to as a base sample block, and the electronic device trains the deep learning model by using a knowledge distillation method. When the sample contained in the sample block is a misjudged sample, the sample block is also called a misjudged sample block, and at the moment, the electronic equipment trains the deep learning model by adopting a deep learning method.
In the embodiment of the application, the distillation training method and the deep learning method share the parameters of the deep learning model; that is, each round of iterative training builds on the previous round. For example, if in the previous round the electronic device read a basic sample block, it trained the deep learning model with the distillation training method; if in the current round it reads a misjudged sample block, it uses the deep learning method to train the deep learning model whose parameters were adjusted in the previous round.
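For illustration only, the per-round dispatch described above can be sketched as follows; PyTorch, the block layout (a dict with "images", "labels" and a "kind" flag), and the loss functions are assumptions, not part of the patent:

```python
import torch

def train_one_epoch(student, teacher, optimizer, sample_blocks,
                    task_loss_fn, distill_loss_fn):
    for block in sample_blocks:                  # each block holds same-category samples
        images, labels = block["images"], block["labels"]
        optimizer.zero_grad()
        if block["kind"] == "base":              # basic sample block: distillation training
            with torch.no_grad():                # teacher parameters are never updated
                teacher_out = teacher(images)
            student_out = student(images)
            loss = (task_loss_fn(student_out, labels)
                    + distill_loss_fn(student_out, teacher_out))
        else:                                    # misjudged sample block: ordinary deep learning
            student_out = student(images)
            loss = task_loss_fn(student_out, labels)
        loss.backward()
        optimizer.step()                         # the next block trains from the updated weights
```

The single student/optimizer pair makes the shared-parameter point explicit: whichever branch runs, the next block continues from the weights just updated.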
According to the deep learning model training method provided by the embodiments of the application, when the electronic equipment iteratively trains the deep learning model, in each round of training a sample block is read from the training sample set and the deep learning model is trained with a different training method according to the type of samples contained in the block: when the samples in the block are basic samples, the deep learning model is trained by the distillation training method; when the samples are misjudged samples, the deep learning model is trained by the deep learning method; and the two methods share the parameters of the deep learning model. With this scheme, the number of misjudged samples is close to that of the basic samples, so the sampling number of misjudged samples is increased and the model's capability to learn them is improved; meanwhile, by combining the distillation training method, the electronic equipment avoids forgetting the basic data during model iteration, and obtains, in a small number of iterations, a deep learning model that accurately predicts both basic samples and misjudged samples.
In the above embodiment, before the electronic device trains the deep learning model, the base sample set and the misjudged sample set are obtained from the database, and the training sample set is determined according to the base sample set and the misjudged sample set. In the process of determining the training sample set, the electronic equipment copies the misjudgment sample set to obtain N misjudgment sample sets, and the training sample set is generated according to the basic samples in the basic sample set and the misjudgment samples in the N misjudgment sample sets.
Illustratively, mass data are stored in the database, the mass data comprise a basic sample set and a misjudgment sample set, and basic samples in the basic sample set are samples used in training a deep learning model and comprise positive samples and negative samples. The samples in the misjudged sample set are samples which cannot be correctly identified after the deep learning model is on line, for example, qualified products are mistakenly identified as unqualified products, and unqualified products are mistakenly identified as qualified products. The samples in the misjudged sample set include positive samples and/or negative samples.
In order to train the deep learning model, the electronic device obtains a basic sample set and a misjudgment sample set from a database so as to obtain a training sample set. For example, the electronic device obtains a base sample set and a misjudgment sample set from a database, and further obtains the same amount of base samples and misjudgment samples, thereby obtaining a training sample set.
For another example, when the number of misjudged samples is small, the electronic device duplicates the misjudged samples in the misjudged sample set so that their number is multiplied and approaches the number of basic samples in the basic sample set. For example, the basic sample set includes 10,000 basic samples and the misjudged sample set includes 100 misjudged samples. When the preset threshold is 100, the electronic device copies the misjudged sample set 100 times to obtain 10,000 misjudged samples; when the preset threshold is 5000, the electronic device copies the misjudged sample set 50 times to obtain 5000 misjudged samples.
With this scheme, the electronic device rapidly increases the number of misjudged samples by copying the misjudged samples in the misjudged sample set, improving the trained deep learning model's ability to learn the misjudged samples.
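A minimal sketch of this balancing step in plain Python; the function name is hypothetical and the counts mirror the example above:

```python
def balance_samples(base_samples, misjudged_samples, threshold):
    """Replicate the misjudged set until its size is within `threshold` of the base set."""
    copies = []
    while abs(len(base_samples) - len(copies)) >= threshold:
        copies.extend(misjudged_samples)
    return copies

base = list(range(10_000))      # 10,000 basic samples
misjudged = list(range(100))    # 100 misjudged samples
balanced = balance_samples(base, misjudged, threshold=100)
print(len(balanced))            # 10000, i.e. the misjudged set copied 100 times
```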
In the above embodiments, the unoptimized deep learning model is referred to as the base model or the teacher model. Fig. 3 is a process schematic diagram of a deep learning model training method provided in an embodiment of the present application. Referring to fig. 3, the ratio of the number of basic sample sets to the number of misjudged sample sets is M:N, and the electronic device acquires M basic sample sets and N misjudged sample sets. The value of M is, for example, 1, and the value of N is related to the number of misjudged samples in the misjudged sample set. For example, the electronic device obtains one basic sample set and one misjudged sample set from the database, copies the basic sample set M times to obtain M basic sample sets, and copies the misjudged sample set N times to obtain N misjudged sample sets. As another example, the electronic device obtains M basic sample sets and N misjudged sample sets directly from the database.
In summary, the difference between the total number of basic samples in the M basic sample sets and the total number of misjudged samples in the N misjudged sample sets is smaller than a preset threshold (for example, 0); that is, the total number of misjudged samples is close to the total number of basic samples. After obtaining the misjudged samples and basic samples, the electronic device stores them as sample blocks, each containing a preset number of samples of the same category.
Assume that training the deep learning model with the basic samples alone takes y iterations; with the method described in the embodiments of the application, the number of model iterations is y × M/N (for example, y = 10,000 iterations with M = 1 and N = 50 become 200 iterations). Therefore, the training time of the model is greatly reduced, achieving fast iteration. Moreover, increasing the number of misjudged samples and the weight of their second loss value strengthens learning of the misjudged samples, while the distillation training method preserves the prediction capability of the deep learning model on the basic samples.
In the above embodiment, during iterative training of the deep learning model, before reading sample blocks from the training sample set, the electronic device shuffles the basic samples contained in the training sample set and shuffles the misjudged samples contained in the training sample set, then divides the shuffled basic samples into a plurality of basic sample blocks of the preset size and divides the shuffled misjudged samples into a plurality of misjudged sample blocks.
Fig. 4 is a schematic process diagram of reading sample blocks in the deep learning model training method provided by the embodiment of the present application. Referring to fig. 4, the diagonally filled rectangles represent basic samples, labeled 1-9, and the horizontally filled rectangles represent misjudged samples, also labeled 1-9. After shuffling, the order of the basic samples is scrambled, as is the order of the misjudged samples. The electronic device stores the samples as sample blocks, each containing 3 samples.
With this scheme, the electronic device shuffles the samples at the start of each round before reading sample blocks, so the order and composition of the blocks differ from round to round, which improves the training effect.
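A minimal sketch of the shuffle-and-chunk step illustrated in fig. 4, assuming plain Python lists of sample identifiers (a real implementation would shuffle dataset indices):

```python
import random

def make_blocks(samples, block_size):
    shuffled = samples[:]            # copy so the original order is untouched
    random.shuffle(shuffled)         # new random order every training round
    return [shuffled[i:i + block_size]
            for i in range(0, len(shuffled), block_size)]

base_blocks = make_blocks(list(range(1, 10)), block_size=3)       # 9 basic samples -> 3 blocks
misjudged_blocks = make_blocks(list(range(1, 10)), block_size=3)  # 9 misjudged samples -> 3 blocks
```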
In the above embodiments, the deep learning method is a method for training the deep learning model by using the basic sample. The distillation training method is a method for training the deep learning model under the guidance of a teacher model. The process of training the deep learning model is the process of adjusting the parameters of the deep learning model by carrying out iterative training on the deep learning model.
Next, based on fig. 3, how the distillation training method and the deep learning method train the deep learning model, and how the electronic device determines that the deep learning model is optimal, are described in detail.
First, the distillation training method.
When the electronic equipment trains the deep learning model by adopting a distillation training method for a sample block containing a basic sample, firstly, the sample block is input into the deep learning model and a teacher model, and the teacher model is the deep learning model which is not optimized. And then, the electronic equipment supervises the learning of the deep learning model on the sample block according to the learning of the teacher model on the sample block, and adjusts the parameters of the deep learning model to train the deep learning model, wherein the parameters of the teacher model are not updated.
Referring to fig. 3, a basic sample block is input to both the teacher model and the deep learning model. The teacher model learns the samples in the basic sample block, but its parameters do not change during learning; it only supervises the deep learning model's learning of the basic sample block, while the parameters of the deep learning model are continuously adjusted.
This scheme realizes training the deep learning model by the distillation method.
In the above embodiment, when the electronic device determines the distillation loss value from a sample block including a base sample, first, a first intermediate result obtained by a teacher model learning the sample block and a second intermediate result obtained by a deep learning model learning the sample block under the direction of the teacher model are determined. The electronics then determine the distillation loss value based on the first intermediate result and the second intermediate result.
For example, assume the teacher model and the deep learning model each have 5 stages. After the sample block is input to the deep learning model, a result is finally obtained after passing through the 5 stages. The electronic device extracts the features of one or more of the 5 stages as the second intermediate result. Similarly, the electronic device extracts the first intermediate result from the teacher model, and the distillation loss value is obtained by comparing the two intermediate results.
With this scheme, the electronic device accurately obtains the distillation loss value.
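A minimal sketch of comparing intermediate results, assuming PyTorch and a hypothetical `extract_stages` helper that returns the per-stage feature maps of a 5-stage model; mean-squared error is just one possible comparison:

```python
import torch
import torch.nn.functional as F

def stage_distill_loss(student, teacher, images, stage_ids=(2, 4)):
    with torch.no_grad():                          # teacher is frozen
        t_feats = teacher.extract_stages(images)   # hypothetical: list of per-stage features
    s_feats = student.extract_stages(images)
    loss = 0.0
    for i in stage_ids:                            # compare the chosen intermediate results
        loss = loss + F.mse_loss(s_feats[i], t_feats[i])
    return loss
```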
Fig. 5 is a schematic process diagram of the teacher model guiding the learning of the deep learning model in the deep learning model training method provided in the embodiment of the present application. Referring to fig. 5, the deep learning model and the teacher model each include at least one of a backbone network (backbone) layer, a Feature Pyramid Network (FPN) layer, a region of interest (ROI) layer, a bounding-box head (Bbox head) network layer, and a mask head network layer. When determining the second intermediate result obtained by the deep learning model learning the sample block under the guidance of the teacher model, the electronic equipment inputs the sample block into the deep learning model, supervises the output of each corresponding layer of the deep learning model according to the output of each layer of the teacher model, and determines the second intermediate result from the output of the corresponding layers of the deep learning model.
Referring to fig. 5, for the backbone layer, an adaptation layer (adaptive layer) is added between the teacher model and the deep learning model to increase the feature-learning capability of the deep learning model. For the ROI layer, the electronic equipment selects the ROIs of the teacher model and supervises the deep learning model's feature learning at those specific positions on the feature map, reducing the influence of non-ROI areas on the distillation algorithm. For the Bbox head layer and the mask head layer, the electronic equipment adds classification and regression distillation losses between the teacher model and the deep learning model, improving the prediction capability of the deep learning model.
With this scheme, the electronic equipment performs distillation training on the basic samples under strong supervision, so the deep learning model retains its prediction capability on the basic samples; that is, the probability that the deep learning model forgets the basic samples during training is reduced.
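Two of the details in fig. 5 can be sketched as follows, under assumed tensor shapes: a 1x1 convolution as the adaptation layer mapping student backbone features into the teacher's feature space, and an ROI mask so that only teacher-selected regions contribute to the feature distillation loss:

```python
import torch
import torch.nn as nn

class AdaptedFeatureDistill(nn.Module):
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # adaptation layer: maps student features into the teacher's feature space
        self.adapt = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, s_feat, t_feat, roi_mask):
        # roi_mask: (N, 1, H, W), 1 inside teacher ROIs and 0 elsewhere, so
        # non-ROI areas do not influence the distillation loss
        diff = (self.adapt(s_feat) - t_feat) ** 2 * roi_mask
        return diff.sum() / roi_mask.sum().clamp(min=1.0)
```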
Second, the deep learning method.
For a sample block containing misjudged samples, when the electronic equipment trains the deep learning model with the deep learning method, it first inputs the sample block into the deep learning model. The deep learning model then learns the sample block, and the electronic device adjusts the parameters of the deep learning model to train it.
Referring to fig. 3, the misjudged sample block is input to the deep learning model, the deep learning model learns the misjudged samples in the misjudged sample block, and the parameters of the deep learning model are continuously adjusted during the learning process to train the deep learning model.
This scheme realizes training the deep learning model by the deep learning method.
It should be noted that fig. 3 shows two deep learning models only to clearly depict the distillation training process and the deep learning process; in fact, they are the same model. That is, distillation training and deep learning are different iterations within the model's iterative training. For example, in the first round the sample block is a basic sample block and the distillation training method is used; in the second round the sample block is again a basic sample block, and distillation training continues from the first round. In the third round, if the sample block is a misjudged sample block, the deep learning method is used, building on the second round's distillation training.
Finally, determining that the deep learning model reaches the optimum.
The electronic device determines a loss value in each round of training. For a sample block containing basic samples, the round is distillation training, and the electronic device determines two loss values: a distillation loss value and a first loss value. The distillation loss value indicates the difference between the teacher model's prediction capability on the basic samples in the sample block and the deep learning model's prediction capability on those samples; the first loss value is inversely related to the deep learning model's prediction capability on the basic samples in the sample block; and the teacher model is the deep learning model before optimization.
The smaller the first loss value, the more accurately the electronic device can predict the detection result of a basic sample with the deep learning model. For example, if the basic sample is a negative sample, the electronic device can identify it as a negative sample based on the deep learning model. The larger the first loss value, the lower the accuracy of the deep learning model, which then easily identifies a negative sample as positive or a positive sample as negative.
The smaller the distillation loss, the closer the learning capabilities of the deep learning model and the teacher model; conversely, the larger it is, the greater the difference between their learning capabilities.
For a sample block containing misjudged samples, the current round of training is deep learning, and the electronic device determines a second loss value, which is inversely related to the deep learning model's prediction capability on the misjudged samples in the sample block.
The smaller the second loss value, the more accurately the electronic device can predict the detection result of a misjudged sample with the deep learning model. For example, if the misjudged sample is a negative sample, the electronic device can recognize it as a negative sample based on the deep learning model. The larger the second loss value, the lower the accuracy of the deep learning model, which then easily recognizes a negative sample as positive or a positive sample as negative.
In the above embodiment, the first loss value and the second loss value include a classification loss, a regression loss, and the like.
After determining the loss value of each round of training, the electronic device determines whether the deep learning model is optimal according to those loss values. In one approach, the electronic device identifies the x-th round of training, whose loss value is the minimum, and takes the deep learning model after the x-th round as the optimal model. When the x-th round is distillation training, its loss value is the sum, or the weighted sum, of the first loss value and the distillation loss value.
In another mode, when the distillation loss value, the first loss value and the second loss value meet a preset condition, the deep learning model is determined to reach an optimal state.
In this approach, the electronic device identifies an a-th round and a b-th round of training that use different training methods; the two rounds may or may not be adjacent. When they are not adjacent, the rounds between them use the same training method as either the a-th round or the b-th round. Assuming the a-th round is distillation training and the b-th round is deep learning, the electronic device determines that the deep learning model has reached the optimal state when the distillation loss value and first loss value of the a-th round and the second loss value of the b-th round satisfy preset conditions, for example, that the sum of the loss values is less than a preset value, or that the distillation loss value is less than a preset distillation loss value.
By adopting the scheme, the electronic equipment determines whether the deep learning model training is finished according to the distillation loss value, the first loss value and the second loss value, and the purpose of accurately determining the optimized deep learning model is achieved.
In the above embodiment, the preset condition is that a weighted sum is smaller than a preset value. When determining whether the deep learning model has reached the optimal state, the electronic device first determines a first weight, a second weight, and a third weight corresponding in sequence to the distillation loss value, the first loss value, and the second loss value. The electronic device then weights and sums the distillation loss value, the first loss value, and the second loss value according to these weights. When the weighted sum is smaller than the preset value, the deep learning model is determined to have reached the optimal state.
Illustratively, if boosting the loss of the misjudged samples, i.e., the weight of the second loss value, is not considered, the weight ratio of the distillation loss value, the first loss value, and the second loss value is 1:1:1. To strengthen the contribution of the misjudged samples, the weight of the second loss value is increased, giving, for example, a ratio of 1:1:3.
With this scheme, the electronic device accelerates the deep learning model's learning of the misjudged samples by increasing the weight of their second loss value.
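A minimal sketch of this stopping check, using the illustrative 1:1:3 weight ratio from the text:

```python
def weighted_total(distill_loss, first_loss, second_loss,
                   w1=1.0, w2=1.0, w3=3.0):
    # weights correspond in sequence to the distillation, first, and second loss values
    return w1 * distill_loss + w2 * first_loss + w3 * second_loss

def is_optimal(distill_loss, first_loss, second_loss, preset_value):
    return weighted_total(distill_loss, first_loss, second_loss) < preset_value
```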
In the above embodiment, after the electronic device finishes training the deep learning model, the electronic device processes the target image by using the trained deep learning model.
Illustratively, after completing deep learning model training, the electronic device deploys the trained model to a production line or similar setting, photographs workpieces produced on the line to obtain target images, and detects each target image to determine whether the workpiece is qualified.
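A minimal deployment sketch, assuming PyTorch and an assumed label convention (class 0 = qualified); the patent does not prescribe this interface:

```python
import torch

@torch.no_grad()
def inspect_part(model, image_tensor):
    """image_tensor: (3, H, W) float tensor of the photographed workpiece."""
    model.eval()
    logits = model(image_tensor.unsqueeze(0))   # add the batch dimension
    return "qualified" if logits.argmax(dim=1).item() == 0 else "unqualified"
```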
In the above, a specific implementation of the deep learning model training method mentioned in the embodiments of the present application is introduced, and the following is an embodiment of the apparatus of the present application, which can be used to implement the embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 6 is a schematic structural diagram of a deep learning model training apparatus according to an embodiment of the present application. The apparatus may be integrated in or implemented by an electronic device. As shown in fig. 6, in this embodiment, the deep learning model training apparatus 600 may include: an acquisition module 61, a distillation module 62, and a deep learning module 63.
An obtaining module 61, configured to obtain a sample block in a training sample set, where the sample block includes a preset number of image samples with the same category, the image samples in the training sample set include a misjudged sample and a basic sample, the basic sample is an image sample used when a deep learning model is trained, the misjudged sample is an image sample that cannot be correctly identified by the deep learning model, and a difference between the number of the basic sample and the number of the misjudged sample is smaller than a preset threshold;
a distillation module 62, configured to train the deep learning model by using a distillation training method for a sample block containing a base sample;
and the deep learning module 63 is configured to train the deep learning model by using a deep learning method for a sample block containing a misjudged sample, where the model training parameters of the distillation training method and the deep learning method are the same.
In one possible implementation, the distillation module 62 is configured to input the sample block into the deep learning model and a teacher model, where the teacher model is the unoptimized deep learning model, supervise the deep learning model's learning of the sample block according to the teacher model's learning of the sample block, and adjust parameters of the deep learning model to train it, the parameters of the teacher model not being updated.
In a possible implementation, the deep learning module 63 is configured to input the sample block to the deep learning model; and learning the sample block, and adjusting parameters of the deep learning model to train the deep learning model.
Fig. 7 is a schematic structural diagram of another deep learning model training apparatus according to an embodiment of the present application. The acquisition module 71, the distillation module 72, and the deep learning module 73 in this embodiment correspond to the acquisition module 61, the distillation module 62, and the deep learning module 63 in fig. 6, respectively. The deep learning model training apparatus 700 provided in this embodiment further includes:
a first processing module 74, configured to determine, for a sample block containing basic samples, a distillation loss value and a first loss value from the sample block, where the distillation loss value indicates a difference between the teacher model's prediction capability on the basic samples in the sample block and the deep learning model's prediction capability on those samples, the first loss value is inversely related to the deep learning model's prediction capability on the basic samples in the sample block, and the teacher model is the unoptimized deep learning model; for a sample block containing misjudged samples, determine a second loss value according to the sample block, where the second loss value is inversely related to the deep learning model's prediction capability on the misjudged samples in the sample block; and when the distillation loss value, the first loss value, and the second loss value satisfy preset conditions, determine that the deep learning model has reached the optimal state.
In a possible implementation manner, the preset condition is that a weighted summation value is smaller than a preset value, when the distillation loss value, the first loss value, and the second loss value satisfy the preset condition, the first processing module 74 is configured to determine a first weight, a second weight, and a third weight that correspond to the distillation loss value, the first loss value, and the second loss value in sequence, and weight and sum the distillation loss value, the first loss value, and the second loss value according to the first weight, the second weight, and the third weight; and when the weighted sum value is smaller than a preset value, determining that the deep learning model reaches an optimal state.
In one possible implementation, for a sample block containing a base sample, the first processing module 74 is configured to determine a first intermediate result obtained by the teacher model learning the sample block when determining the distillation loss value according to the sample block; determining a second intermediate result obtained by the deep learning model learning the sample block under the guidance of the teacher model; determining the distillation loss value based on the first intermediate result and the second intermediate result.
In a possible implementation manner, when determining the second intermediate result obtained by the deep learning model learning the sample block under the guidance of the teacher model, the first processing module 74 is configured to input the sample block into the deep learning model, supervise the output of the corresponding layers of the deep learning model according to the output of each layer of the teacher model, where each layer of the teacher model includes at least one of a backbone network layer, a feature pyramid layer, a region of interest layer, a bounding-box head network layer, and a mask head network layer, and determine the second intermediate result from the output of the corresponding layers of the deep learning model.
Referring to fig. 7 again, in a possible implementation manner, the deep learning model training apparatus 700 further includes:
a second processing module 75, configured to, before the acquisition module 71 acquires a sample block in the training sample set, shuffle the basic samples contained in the training sample set and shuffle the misjudged samples contained in the training sample set; and divide the shuffled basic samples into a plurality of basic sample blocks of the preset size and the shuffled misjudged samples into a plurality of misjudged sample blocks.
In a feasible implementation manner, the second processing module 75 is further configured to, before the shuffling, copy the misjudged sample set to obtain N misjudged sample sets, and generate the training sample set from the basic samples in the basic sample set and the misjudged samples in the N misjudged sample sets.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
There is also provided, in accordance with an embodiment of the present application, a computer program product, including: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.
FIG. 8 is a schematic block diagram of an example electronic device used to implement embodiments of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only and are not meant to limit the implementations described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 802 or loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 801 performs the methods and processes described above, such as the deep learning model training method. For example, in some embodiments, the deep learning model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the deep learning model training method described above can be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the deep learning model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service extensibility found in conventional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; the present application is not limited in this respect, as long as the desired results of the technical solutions disclosed herein can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (21)

1. A deep learning model training method, comprising:
acquiring a sample block in a training sample set, wherein the sample block comprises a preset number of image samples of the same category, the image samples in the training sample set comprise misjudged samples and base samples, the base samples are image samples used in training a deep learning model, the misjudged samples are image samples that the deep learning model cannot correctly identify, and the difference between the number of base samples and the number of misjudged samples is smaller than a preset threshold value;
for a sample block containing base samples, training the deep learning model by a distillation training method; and
for a sample block containing misjudged samples, training the deep learning model by a deep learning method, wherein the model training parameters of the distillation training method and the deep learning method are the same.
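As a concrete illustration of the dispatch in claim 1, the following is a minimal PyTorch-style sketch of one training epoch. The names (train_one_epoch, student, teacher, is_misjudged) and the KL-divergence form of the distillation loss are assumptions made for the example; the claims do not fix them. What the sketch does show is the claimed structure: both branches update one shared parameter set, and the teacher is only read, never updated.

```python
import torch
import torch.nn.functional as F

def train_one_epoch(blocks, student, teacher, optimizer):
    teacher.eval()                               # teacher parameters stay frozen
    for images, labels, is_misjudged in blocks:  # one block = same-category samples
        optimizer.zero_grad()
        if is_misjudged:
            # misjudged-sample block: plain supervised (deep learning) step
            loss = F.cross_entropy(student(images), labels)
        else:
            # base-sample block: distillation step supervised by the teacher
            with torch.no_grad():
                teacher_logits = teacher(images)
            student_log_probs = F.log_softmax(student(images), dim=-1)
            loss = F.kl_div(student_log_probs,
                            F.softmax(teacher_logits, dim=-1),
                            reduction="batchmean")
        loss.backward()
        optimizer.step()                         # one shared parameter set for both paths
```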
2. The method of claim 1, wherein the training the deep learning model using a distillation training method for a sample block containing base samples comprises:
inputting the sample block to the deep learning model and a teacher model, the teacher model being the deep learning model in its untrained state;
and supervising the deep learning model's learning of the sample block according to the teacher model's learning of the sample block, and adjusting the parameters of the deep learning model to train the deep learning model, wherein the parameters of the teacher model are not updated.
3. The method of claim 1, wherein the training the deep learning model with a deep learning method for a sample block containing misjudged samples comprises:
inputting the sample block to the deep learning model;
and learning the sample block with the deep learning model, and adjusting the parameters of the deep learning model to train the deep learning model.
4. The method of any of claims 1-3, further comprising:
for a sample block containing base samples, determining a distillation loss value and a first loss value according to the sample block, wherein the distillation loss value indicates the difference between the teacher model's prediction capability for the base samples in the sample block and the deep learning model's prediction capability for those same samples, the first loss value is inversely related to the deep learning model's prediction capability for the base samples in the sample block, and the teacher model is the deep learning model in its untrained state;
for a sample block containing misjudged samples, determining a second loss value according to the sample block, wherein the second loss value is inversely related to the prediction capability of the deep learning model on the misjudged samples in the sample block;
and when the distillation loss value, the first loss value and the second loss value meet preset conditions, determining that the deep learning model reaches an optimal state.
5. The method according to claim 4, wherein the preset condition is that a weighted sum value is smaller than a preset value, and wherein the determining that the deep learning model reaches an optimal state when the distillation loss value, the first loss value, and the second loss value satisfy the preset condition comprises:
determining a first weight, a second weight and a third weight which correspond to the distillation loss value, the first loss value and the second loss value in sequence;
weighting and summing the distillation loss value, the first loss value, and the second loss value according to the first weight, the second weight, and the third weight;
and when the weighted sum value is smaller than a preset value, determining that the deep learning model reaches an optimal state.
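Claims 4-5 reduce to a simple check: weight the three losses, sum them, and compare against a preset value. A hedged sketch follows; the weights w1-w3 and the threshold are placeholder hyperparameters, since the claims leave both unspecified.

```python
def reached_optimal_state(distill_loss, first_loss, second_loss,
                          w1=0.5, w2=0.25, w3=0.25, preset_value=0.05):
    # w1-w3 and preset_value are assumed values, not fixed by the patent
    weighted_sum = w1 * distill_loss + w2 * first_loss + w3 * second_loss
    return weighted_sum < preset_value
```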
6. The method of claim 4, wherein the determining a distillation loss value from a sample block containing a base sample comprises:
determining a first intermediate result obtained by the teacher model learning the sample block;
determining a second intermediate result obtained by the deep learning model learning the sample block under the guidance of the teacher model;
determining the distillation loss value based on the first intermediate result and the second intermediate result.
7. The method of claim 6, wherein the determining a second intermediate result of the deep learning model learning the sample block under direction of the teacher model comprises:
inputting the sample block into the deep learning model, and supervising the output of the corresponding layers of the deep learning model according to the output of each layer of the teacher model, wherein the layers of the teacher model comprise at least one of a backbone network layer, a feature pyramid layer, a region-of-interest layer, a head network layer, and a face network layer;
and determining the second intermediate result according to the output of the corresponding layer of the deep learning model.
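Claims 6-7 compare intermediate, per-layer outputs of the teacher (the first intermediate result) with those of the student (the second intermediate result). Below is a minimal sketch of such feature-level supervision; the dictionary keys and the choice of mean-squared error as the per-layer distance are illustrative assumptions only.

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(teacher_feats, student_feats):
    """teacher_feats / student_feats map a layer name (e.g. "backbone",
    "fpn", "roi", "head") to that layer's output tensor."""
    loss = torch.zeros(())
    for name, t_feat in teacher_feats.items():
        # the teacher output is detached: it supervises but is never updated
        loss = loss + F.mse_loss(student_feats[name], t_feat.detach())
    return loss
```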
8. The method of any of claims 1-3, wherein, before the acquiring of the sample block in the training sample set, the method further comprises:
shuffling the base samples contained in the training sample set, and shuffling the misjudged samples contained in the training sample set;
and dividing the shuffled base samples into a plurality of base sample blocks according to the preset number, and dividing the shuffled misjudged samples into a plurality of misjudged sample blocks.
9. The method of claim 8, wherein, before the shuffling of the misjudged samples contained in the training sample set, the method further comprises:
copying the misjudged sample set to obtain N misjudged sample sets;
and generating the training sample set according to the base samples in the base sample set and the misjudged samples in the N misjudged sample sets.
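To make claims 8-9 concrete, the sketch below copies the misjudged set N times to narrow the count gap with the base set, shuffles each set independently, and cuts both into fixed-size blocks. The names n_copies and block_size stand in for the claimed "N" and "preset number"; the per-block same-category grouping of claim 1 is omitted here for brevity.

```python
import random

def make_sample_blocks(base_samples, misjudged_samples, n_copies, block_size):
    misjudged = misjudged_samples * n_copies  # copy the misjudged set N times
    base = list(base_samples)
    random.shuffle(base)                      # shuffle each set separately
    random.shuffle(misjudged)

    def chunk(samples):
        # drop the trailing partial block so every block holds block_size samples
        usable = len(samples) - len(samples) % block_size
        return [samples[i:i + block_size] for i in range(0, usable, block_size)]

    return chunk(base), chunk(misjudged)
```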
10. A deep learning model training apparatus, comprising:
an acquisition module, configured to acquire a sample block in a training sample set, wherein the sample block comprises a preset number of image samples of the same category, the image samples in the training sample set comprise misjudged samples and base samples, the base samples are image samples used in training a deep learning model, the misjudged samples are image samples that the deep learning model cannot correctly identify, and the difference between the number of base samples and the number of misjudged samples is smaller than a preset threshold value;
a distillation module, configured to train the deep learning model by a distillation training method for a sample block containing base samples;
and a deep learning module, configured to train the deep learning model by a deep learning method for a sample block containing misjudged samples, wherein the model training parameters of the distillation training method and the deep learning method are the same.
11. The apparatus of claim 10, wherein
the distillation module is configured to input the sample block into the deep learning model and a teacher model, the teacher model being the deep learning model in its untrained state; to supervise the deep learning model's learning of the sample block according to the teacher model's learning of the sample block; and to adjust the parameters of the deep learning model to train the deep learning model, wherein the parameters of the teacher model are not updated.
12. The apparatus of claim 10, wherein
the deep learning module is configured to input the sample block to the deep learning model, learn the sample block, and adjust the parameters of the deep learning model to train the deep learning model.
13. The apparatus of any of claims 10-12, further comprising:
a first processing module, configured to: for a sample block containing base samples, determine a distillation loss value and a first loss value according to the sample block, wherein the distillation loss value indicates the difference between the teacher model's prediction capability for the base samples in the sample block and the deep learning model's prediction capability for those same samples, the first loss value is inversely related to the deep learning model's prediction capability for the base samples in the sample block, and the teacher model is the deep learning model in its untrained state; for a sample block containing misjudged samples, determine a second loss value according to the sample block, wherein the second loss value is inversely related to the deep learning model's prediction capability for the misjudged samples in the sample block; and, when the distillation loss value, the first loss value, and the second loss value satisfy a preset condition, determine that the deep learning model reaches an optimal state.
14. The apparatus according to claim 13, wherein the preset condition is that a weighted sum value is smaller than a preset value, and the first processing module is configured, when the distillation loss value, the first loss value, and the second loss value satisfy the preset condition, to determine a first weight, a second weight, and a third weight corresponding in sequence to the distillation loss value, the first loss value, and the second loss value; to weight and sum the three loss values according to the first weight, the second weight, and the third weight; and, when the weighted sum value is smaller than the preset value, to determine that the deep learning model reaches an optimal state.
15. The apparatus of claim 13, wherein
the first processing module is configured, when determining the distillation loss value according to the sample block, to determine a first intermediate result obtained when the teacher model learns the sample block; to determine a second intermediate result obtained when the deep learning model learns the sample block under the guidance of the teacher model; and to determine the distillation loss value based on the first intermediate result and the second intermediate result.
16. The apparatus of claim 15, wherein
the first processing module is configured, when determining the second intermediate result obtained when the deep learning model learns the sample block under the guidance of the teacher model, to input the sample block into the deep learning model, to supervise the output of the corresponding layers of the deep learning model according to the output of each layer of the teacher model, wherein the layers of the teacher model comprise at least one of a backbone network layer, a feature pyramid layer, a region-of-interest layer, a head network layer, and a face network layer, and to determine the second intermediate result according to the output of the corresponding layers of the deep learning model.
17. The apparatus of any of claims 10-12, further comprising:
a second processing module, configured to shuffle the base samples contained in the training sample set and to shuffle the misjudged samples contained in the training sample set before the acquisition module acquires the sample block in the training sample set; and to divide the shuffled base samples into a plurality of base sample blocks according to the preset number, and divide the shuffled misjudged samples into a plurality of misjudged sample blocks.
18. The apparatus of claim 17, wherein
the second processing module is configured to copy the misjudged sample set, before shuffling the misjudged samples contained in the training sample set, to obtain N misjudged sample sets; and to generate the training sample set according to the base samples in the base sample set and the misjudged samples in the N misjudged sample sets.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.
CN202011531212.4A 2020-12-22 2020-12-22 Deep learning model training method and device, electronic equipment and readable storage medium Active CN112508126B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011531212.4A CN112508126B (en) 2020-12-22 2020-12-22 Deep learning model training method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112508126A 2021-03-16
CN112508126B CN112508126B (en) 2023-08-01

Family

ID=74923470

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020249961A1 (en) * 2019-06-14 2020-12-17 Vision Semantics Limited Optimised machine learning
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning
CN111598842A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Method and system for generating model of insulator defect sample and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FURLANELLO et al.: "Born-Again Neural Networks", https://arxiv.org/pdf/1805.04770.pdf, pages 1-10 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435522A (en) * 2021-06-30 2021-09-24 平安科技(深圳)有限公司 Image classification method, device, equipment and storage medium
CN114494784A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Deep learning model training method, image processing method and object recognition method
CN116704264A (en) * 2023-07-12 2023-09-05 北京万里红科技有限公司 Animal classification method, classification model training method, storage medium, and electronic device
CN116704264B (en) * 2023-07-12 2024-01-30 北京万里红科技有限公司 Animal classification method, classification model training method, storage medium, and electronic device
CN117521848A (en) * 2023-11-10 2024-02-06 中国科学院空天信息创新研究院 Remote sensing basic model light-weight method and device for resource-constrained scene
CN117521848B (en) * 2023-11-10 2024-05-28 中国科学院空天信息创新研究院 Remote sensing basic model light-weight method and device for resource-constrained scene

Similar Documents

Publication Publication Date Title
CN113657465B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN112508126B (en) Deep learning model training method and device, electronic equipment and readable storage medium
EP3882820A1 (en) Node classification method, model training method, device, apparatus, and storage medium
CN113033537B (en) Method, apparatus, device, medium and program product for training a model
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
CN113691733B (en) Video jitter detection method and device, electronic equipment and storage medium
US20230162477A1 (en) Method for training model based on knowledge distillation, and electronic device
CN112560985A (en) Neural network searching method and device and electronic equipment
CN115147687A (en) Student model training method, device, equipment and storage medium
CN112650885A (en) Video classification method, device, equipment and medium
CN113657467A (en) Model pre-training method and device, electronic equipment and storage medium
CN114494776A (en) Model training method, device, equipment and storage medium
CN114972877A (en) Image classification model training method and device and electronic equipment
CN113657468A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113642727A (en) Training method of neural network model and processing method and device of multimedia information
CN113657249A (en) Training method, prediction method, device, electronic device, and storage medium
CN113420792A (en) Training method of image model, electronic equipment, road side equipment and cloud control platform
CN113657466B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN116363444A (en) Fuzzy classification model training method, fuzzy image recognition method and device
CN113516185B (en) Model training method, device, electronic equipment and storage medium
CN114419327A (en) Image detection method and training method and device of image detection model
CN113326885A (en) Method and device for training classification model and data classification
CN114241243B (en) Training method and device for image classification model, electronic equipment and storage medium
CN114494818B (en) Image processing method, model training method, related device and electronic equipment
US20220222941A1 (en) Method for recognizing action, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant