CN113516185B - Model training method, device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN113516185B
Application number: CN202110777357.0A
Authority: CN (China)
Prior art keywords: target, original, result, model, classification model
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN113516185A
Inventor: 戴兵
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN202110777357.0A
Publication of CN113516185A; application granted; publication of CN113516185B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval of still image data
    • G06F16/55 Clustering; Classification
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The disclosure provides a model training method, apparatus, electronic device and storage medium, and relates to the field of deep learning, in particular to the field of model training. The specific implementation scheme is as follows: acquiring a target sample set, where the target sample set comprises part of the sample objects belonging to the original categories in an original sample set together with the sample objects belonging to the newly added categories in a newly added sample set; and training a target classification model to be trained with the target sample set, based on original information corresponding to the original classification model, to obtain a trained target classification model. The categories over which the target classification model performs inference comprise the original categories and the newly added categories, and the original information corresponding to the original classification model comprises, for each sample object in the target sample set, the classification result obtained when that sample object is classified with the original classification model. Through the scheme of the present disclosure, both the efficiency and the accuracy of model training can be achieved.

Description

Model training method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of deep learning, in particular to the field of model training, and specifically provides a model training method, apparatus, electronic device and storage medium.
Background
An object classification model is a model used to classify objects. When using an object classification model, it is often desirable to increase the number of classification categories the model can recognize.
In the related art, in order to train such a new model, the sample set used by the original classification model and the newly added sample set are often combined into a new sample set, and a new object classification model is trained on the combined sample set, finally yielding a new model that can recognize both the original categories of the original classification model and the newly added categories.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for model training that can balance the efficiency and the accuracy of model training.
According to an aspect of the present disclosure, there is provided a method of model training, comprising:
acquiring a target sample set; the target sample set comprises part of the sample objects belonging to the original categories in an original sample set and the sample objects belonging to the newly added categories in a newly added sample set, wherein the original sample set is the sample set used to train a pre-trained original classification model;
training a target classification model to be trained with the target sample set, based on original information corresponding to the original classification model, to obtain a trained target classification model;
wherein the categories over which the target classification model performs inference comprise the original categories and the newly added categories; and the original information corresponding to the original classification model comprises, for each sample object in the target sample set, the classification result obtained when that sample object is classified with the original classification model.
According to another aspect of the present disclosure, there is provided an apparatus for model training, including:
a sample set acquisition module, configured to acquire a target sample set; the target sample set comprises part of the sample objects belonging to the original categories in an original sample set and the sample objects belonging to the newly added categories in a newly added sample set, wherein the original sample set is the sample set used to train a pre-trained original classification model;
a model training module, configured to train a target classification model to be trained with the target sample set, based on the original information corresponding to the original classification model, to obtain a trained target classification model;
wherein the categories over which the target classification model performs inference comprise the original categories and the newly added categories; and the original information corresponding to the original classification model comprises, for each sample object in the target sample set, the classification result obtained when that sample object is classified with the original classification model.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of model training.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method of model training.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method of model training.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram of model distillation provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 6 is a block diagram of an electronic device for implementing a method of model training of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An object classification model is a model for classifying objects. In the use of object classification models, it is often desirable to increase the number of classification categories that the object classification model can identify.
For example, suppose the original classification model is an image classification model trained on an original sample set containing 1000 classes of labels and 1,000,000 sample objects in total (1000 samples per class), so that it can recognize 1000 classification categories. When a newly added sample set containing 20 classes of labels and 4000 samples in total becomes available, a new classification model, one capable of recognizing 1020 classification categories, can be trained based on the original sample set and the newly added sample set.
In the related art, in order to train such a new model, the original sample set and the newly added sample set usually need to be combined into a new sample set, and a new object classification model is then trained on the combined sample set, finally yielding a new model that can recognize both the original categories of the original classification model and the newly added categories.
Because the original sample set and the newly added sample set are combined into one new sample set, the number of samples in the combined set is large, training the new model is time-consuming, and the efficiency is low.
For example, if the original sample set contains 1000 classes of labels and 1,000,000 sample objects in total, and the newly added sample set contains 20 classes of labels and 4000 sample objects in total, then, because the original sample set contributes so many sample objects, the combined sample set contains more than 1,000,000 sample objects; training the new model is therefore very time-consuming and degrades the user's experience.
In order to achieve both efficiency and accuracy of model training, embodiments of the present disclosure provide a method for model training.
It should be noted that, in a specific application, the method for model training provided in the embodiments of the present disclosure may be applied to various electronic devices, for example, a personal computer, a server, and other devices having data processing capabilities. In addition, it can be understood that the model training method provided by the embodiment of the present disclosure may be implemented by software, hardware, or a combination of software and hardware.
The method for training the model provided by the embodiment of the disclosure can comprise the following steps:
acquiring a target sample set; the target sample set comprises part of the sample objects belonging to the original categories in an original sample set and the sample objects belonging to the newly added categories in a newly added sample set, wherein the original sample set is the sample set used to train a pre-trained original classification model;
training a target classification model to be trained with the target sample set, based on the original information corresponding to the original classification model, to obtain a trained target classification model;
wherein the categories over which the target classification model performs inference comprise the original categories and the newly added categories; and the original information corresponding to the original classification model comprises, for each sample object in the target sample set, the classification result obtained when that sample object is classified with the original classification model.
According to the scheme provided by the disclosure, the target sample set contains only part of the sample objects belonging to the original categories in the original sample set, so the number of sample objects in the target sample set is smaller than the total number of sample objects; the time required to train the new model is therefore reduced. Meanwhile, training the new model with the original information of the already-trained original classification model improves the accuracy of the new model. The scheme provided by the disclosure can thus achieve both the efficiency and the accuracy of model training.
A method for model training provided by embodiments of the present disclosure is described below with reference to the accompanying drawings.
As shown in fig. 1, the method for training a model according to the embodiment of the disclosure may include the following steps:
s101, acquiring a target sample set; the target sample set comprises part of sample objects belonging to an original category in an original sample set and sample objects belonging to a new category in a new sample set, wherein the original sample set is a sample set utilized by an original classification model which is trained in advance;
the original classification model may be a model for classifying objects such as images and audio. The sample object may be an object of an image, audio, etc., corresponding to the original classification model. For example, if the original classification model is an image classification model, such as a model constructed based on CNN (Convolutional Neural Network ), the sample object may be a sample image.
The original sample set is the sample set used to train the pre-trained original classification model; that is, the original classification model is trained from the sample objects in the original sample set. The original categories are the classification categories to which the sample objects in the original sample set belong. For example, if the original sample set contains 1,000,000 sample objects belonging to categories 1 through 1000, then categories 1 through 1000 are all original categories. Because the original classification model was trained on the original sample set, the original categories are exactly the categories the original classification model can infer.
When there are multiple original categories, the target sample set includes part of the sample objects of each original category in the original sample set. For example, if each class in the original sample set contains 1000 sample objects, the target sample set may contain 200 of the 1000 sample images of each class.
The newly added sample set contains sample objects of newly added categories, which are classification categories different from the original categories. If the original categories are categories 1 through 1000, the newly added categories may be category 1001, category 1002, and so on.
The target sample set includes all sample objects in the newly added sample set that belong to the newly added categories.
Because the target sample set includes only part of the sample objects belonging to the original categories in the original sample set, the number of sample objects in the target sample set is smaller than the total number of sample objects, so the time required to train the target classification model is reduced.
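The assembly of the target sample set described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: the function name, the dict-of-lists representation, and the per-class count of 200 are assumptions taken from the worked example.

```python
import random

def build_target_sample_set(original_samples, new_samples, per_class_keep=200, seed=0):
    """Assemble the target sample set: a random subset of each original class
    plus all samples of the newly added classes.

    original_samples / new_samples: dict mapping class label -> list of samples.
    """
    rng = random.Random(seed)
    target = []
    for label, samples in original_samples.items():
        # only part of each original class is kept
        k = min(per_class_keep, len(samples))
        target.extend((s, label) for s in rng.sample(samples, k))
    for label, samples in new_samples.items():
        # every newly added-category sample object is kept
        target.extend((s, label) for s in samples)
    return target
```

With 1000 original classes sampled at 200 each plus 4000 new samples, the target set holds about 204,000 objects instead of over a million, which is where the training-time saving comes from.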
S102, training a target classification model to be trained with the target sample set, based on the original information corresponding to the original classification model, to obtain a trained target classification model.
The target classification model is the new object classification model to be obtained, and the categories it infers over comprise the original categories and the newly added categories. For example, if there are 1000 original categories and 20 newly added categories, the target classification model is used to infer over 1020 categories.
The original information corresponding to the original classification model comprises, for each sample object in the target sample set, the classification result obtained when that sample object is classified with the original classification model. Because the original classification model infers over the original categories, after a sample object is input into the original classification model, the model outputs the probability that the sample object belongs to each original category, i.e., the classification result for that sample object. Note that when there are multiple original categories, the classification result output by the original classification model includes the probability of the sample object belonging to each original category.
Because the original information contains the classification result obtained by classifying each sample object in the target sample set with the original classification model, these classification results can serve as reference information for training a specified inference result of the target classification model. The specified inference result refers to the part of the target classification model's output for the sample object that concerns the original categories. In this way, even though the target sample set contains only part of the sample objects belonging to the original categories in the original sample set, a target classification model with higher accuracy can be obtained through training. Training the target classification model with the target sample set based on the original information corresponding to the original classification model may be called distillation training of the target classification model by the original classification model.
According to the scheme provided by the disclosure, the target sample set contains only part of the sample objects belonging to the original categories in the original sample set, so the number of sample objects in the target sample set is smaller than the total number of sample objects, and the time required to train the new model is reduced; meanwhile, training the new model with the original information of the trained original classification model improves the accuracy of the target classification model. The scheme provided by the disclosure can thus achieve both the efficiency and the accuracy of model training.
Based on the embodiment of fig. 1, as shown in fig. 2, the method for model training provided in another embodiment of the disclosure, S102 described above may include steps S1021-S1024:
S1021, acquiring a sample object from the target sample set as the target sample object;
The target sample set includes part of the sample objects belonging to the original categories in the original sample set and the sample objects belonging to the newly added categories in the newly added sample set; a sample object can be acquired from this set as the target sample object.
Optionally, in order to make full use of every sample object in the target sample set when training the target classification model, each sample object already used may be recorded, so that, when acquiring a sample object, an unused sample object can be taken from the target sample set as the target sample object.
S1022, inputting the target sample object into the target classification model to obtain a first classification result, and inputting the target sample object into the original classification model to obtain a second classification result;
The first classification result contains the probabilities, predicted by the target classification model, that the target sample object belongs to each original category and each newly added category. The second classification result contains the probabilities, predicted by the original classification model, that the target sample object belongs to each original category.
For example, suppose the original categories are category 1, category 2, and category 3, and the newly added category is category 4. After sample image A is input into the target classification model, the first classification result obtained comprises: a 10% probability of category 1, a 10% probability of category 2, a 75% probability of category 3, and a 5% probability of category 4. After sample image A is input into the original classification model, the second classification result obtained comprises: a 10% probability of category 1, a 10% probability of category 2, and an 80% probability of category 3.
S1023, determining the loss of the target classification model as its result loss, using the difference between the result for the original categories in the first classification result and the second classification result;
The result loss is also called the distillation loss, i.e., the loss used to adjust the classification result output by the target classification model by reference to the original-category classification result output by the original classification model.
The difference between the result for the original categories in the first classification result and the second classification result reflects the gap between the predictive ability of the target classification model and that of the original classification model. Therefore, this difference can be used to determine the loss of the target classification model, taken as its result loss.
Optionally, the loss of the target classification model may be determined from the difference between the result for the original categories in the first classification result and the second classification result in various ways, including:
Mode one: calculating, for each original category, the difference between its value in the first classification result and the corresponding value in the second classification result, and taking the sum of the calculated differences as the result loss of the target classification model.
Here, the result for the original categories in the first classification result is the set of probabilities the target classification model predicts for the target sample object belonging to each original category. For example, if the original categories are category 1, category 2, and category 3, the newly added category is category 4, and the first classification result comprises a 10% probability of category 1, a 10% probability of category 2, a 75% probability of category 3, and a 5% probability of category 4, then the result for the original categories in the first classification result is: a 10% probability of category 1, a 10% probability of category 2, and a 75% probability of category 3.
In one implementation, the difference between each original category's probability in the result for the original categories in the first classification result and its probability in the second classification result may be calculated first, and the absolute values of the calculated differences then summed to obtain the result loss.
For example, if the result for the original categories in the first classification result is a 10% probability of category 1, a 10% probability of category 2, and a 75% probability of category 3, and the second classification result is a 10% probability of category 1, a 10% probability of category 2, and an 80% probability of category 3, then the calculated difference for category 1 is 0, for category 2 is 0, and for category 3 is 5%, giving a result loss of 5%.
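Mode one above reduces to a sum of absolute differences over the original-category probabilities. A minimal sketch (the function name is an assumption for illustration):

```python
def result_loss_abs(first_result_original, second_result):
    """Mode one: sum of absolute differences between the original-category
    part of the target model's output and the original model's output."""
    assert len(first_result_original) == len(second_result)
    return sum(abs(p - q) for p, q in zip(first_result_original, second_result))
```

Running it on the worked example (probabilities 10%/10%/75% versus 10%/10%/80%) reproduces the 5% result loss from the text.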
Mode two: calculating the divergence between the result for the original categories in the first classification result and the second classification result, and taking this divergence as the result loss of the target classification model.
The divergence may be a KL (Kullback-Leibler, relative entropy) divergence or a JS (Jensen-Shannon) divergence.
The KL divergence is calculated as:

$$\mathrm{KL}(p \,\|\, q) = \sum_{i=1}^{n} p(x_i)\,\log\frac{p(x_i)}{q(x_i)}$$

where $p$ is the result for the original categories in the first classification result, $q$ is the second classification result, $n$ is the number of original categories, $p(x_i)$ is the probability value of the $i$-th category in $p$, and $q(x_i)$ is the probability value of the $i$-th category in the second classification result.

The JS divergence is calculated as:

$$\mathrm{JS}(P_1 \,\|\, P_2) = \frac{1}{2}\,\mathrm{KL}\!\left(P_1 \,\Big\|\, \frac{P_1+P_2}{2}\right) + \frac{1}{2}\,\mathrm{KL}\!\left(P_2 \,\Big\|\, \frac{P_1+P_2}{2}\right)$$

where $P_1$ is the result for the original categories in the first classification result, $P_2$ is the second classification result, and $\frac{P_1+P_2}{2}$ is their average distribution.
If the calculated divergence is the JS divergence, mode two may include:
calculating the JS divergence between the result for the original categories in the first classification result and the second classification result, and taking the JS divergence as the result loss of the target classification model.
Because the JS divergence reflects the difference between the result for the original categories in the first classification result and the second classification result more accurately, the result loss of the target classification model can be made more accurate.
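The two divergence formulas above translate directly into code. This is a plain-Python sketch (function names and the small epsilon guard against log of zero are assumptions, not from the patent):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p(x_i) * log(p(x_i) / q(x_i))."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def js_divergence(p1, p2):
    """JS(P1 || P2) = 1/2 KL(P1 || M) + 1/2 KL(P2 || M), M = (P1 + P2) / 2."""
    m = [(a + b) / 2 for a, b in zip(p1, p2)]
    return 0.5 * kl_divergence(p1, m) + 0.5 * kl_divergence(p2, m)
```

Unlike KL, the JS divergence is symmetric and bounded by log 2, which makes it a well-behaved choice for the result loss.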
S1024, adjusting the parameters of the target classification model based on the result loss and, until all sample objects in the target sample set have been used, returning to the step of acquiring a sample object from the target sample set as the target sample object, so as to continue training the target classification model.
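Steps S1021 through S1024 can be sketched as a single training pass. The models and the update function are stubbed as callables here; this illustrates the control flow only, not the patent's actual training code:

```python
def distillation_epoch(target_set, target_model, original_model,
                       n_original, result_loss_fn, update_fn):
    """One pass over the target sample set (steps S1021-S1024):
    classify each sample with both models, compute the result
    (distillation) loss on the original-category part of the target
    model's output, and let update_fn adjust the target model."""
    losses = []
    for sample, _label in target_set:
        first = target_model(sample)       # probs over original + new categories
        second = original_model(sample)    # probs over original categories only
        loss = result_loss_fn(first[:n_original], second)
        update_fn(loss)                    # parameter adjustment from the loss
        losses.append(loss)
    return losses
```

Wrapping this pass in an outer loop over epochs, with the bookkeeping of used samples described in S1021, gives the full procedure.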
For a neural network model, the larger the loss, the larger the adjustment made to the parameters being tuned, so the parameters of the target classification model can be adjusted based on the result loss in accordance with the actual situation and requirements.
Optionally, in one implementation, adjusting the parameters of the target classification model based on the result loss may be realized through the following steps one and two:
Step one: weighting the result loss and the classification loss corresponding to the target sample object to obtain the model loss of the target classification model;
The classification loss is a loss determined from the difference between the first classification result and the class calibration result (i.e., the labeled category) of the target sample object. Specifically, it is computed whether the category finally output by the target classification model for the target sample object differs from the calibrated category: if the two are the same, the target classification model has accurately recognized the sample object's category and the classification loss is 0; otherwise, the target classification model has failed to recognize the category, and the classification loss is determined from the difference between the predicted category and the calibrated category.
After the classification loss is determined, the result loss and the classification loss corresponding to the target sample object may be weighted according to weights configured in advance for the two losses, obtaining the model loss of the target classification model.
Step two: based on the model loss, adjusting parameters of the target classification model according to a preset parameter adjustment mode.
The predetermined parameter adjustment manner may be the same as existing loss-based parameter adjustment methods, which are not described again in the embodiments of the present disclosure. For example, the predetermined parameter adjustment manner may be stochastic gradient descent, batch gradient descent, or the like.
Because the classification loss is determined based on the difference between the first classification result and the class calibration result of the target sample object, the accuracy of model training can be further improved through the classification loss.
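Step one's weighted combination can be sketched as follows. The patent only says the weights are configured in advance; the equal weights, the cross-entropy form of the classification loss, and the function names below are all assumptions for illustration:

```python
import math

def classification_loss(first_result, true_index, eps=1e-12):
    """Cross-entropy against the calibrated (labeled) category.
    Cross-entropy is an assumed concrete form; the patent only states
    the loss comes from the difference with the calibration result."""
    return -math.log(first_result[true_index] + eps)

def model_loss(result_loss, cls_loss, w_result=0.5, w_cls=0.5):
    """Weighted combination of distillation (result) loss and
    classification loss; the 0.5/0.5 weights are assumed values."""
    return w_result * result_loss + w_cls * cls_loss
```

Tuning `w_result` up emphasizes staying close to the original model on the original categories; tuning `w_cls` up emphasizes fitting the labels of the target sample set.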
Optionally, in one implementation, in order to further improve the efficiency of model training, the target classification model may share the parameters of the designated network layers of the original classification model, where the designated network layers are all layers except the fully connected layer that outputs the classification categories. In this case, only the parameters in the fully connected layer of the target classification model need to be adjusted based on the model loss.
Since only the parameters of the fully connected layer need to be adjusted during the training of the target classification model, training efficiency is further improved.
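A minimal sketch of this idea (an assumption-laden toy, not the disclosure's implementation): parameters are held in a plain dict, and a gradient step is applied only to layers whose names match a `trainable` prefix, leaving the layers shared with the original classification model frozen.

```python
import numpy as np

def sgd_step(params: dict, grads: dict, lr: float = 0.1,
             trainable: tuple = ("fc",)) -> dict:
    """Apply a gradient step only to layers whose name starts with one
    of the `trainable` prefixes; all other layers stay frozen (shared
    with the original classification model)."""
    return {
        name: (w - lr * grads[name]) if name.startswith(trainable) else w
        for name, w in params.items()
    }

params = {"conv1": np.ones(3), "fc": np.ones(3)}
grads = {"conv1": np.full(3, 0.5), "fc": np.full(3, 0.5)}
new_params = sgd_step(params, grads)
```

Here only `"fc"` is updated; `"conv1"` is returned unchanged, mirroring the fully-connected-only adjustment described above.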
According to the solution provided by the present disclosure, the target sample set includes only part of the sample objects belonging to the original categories in the original sample set, so the number of sample objects in the target sample set is smaller than the total number of sample objects, and the time required to train the newly added model can therefore be reduced. Meanwhile, adjusting the parameters of the newly added model by using the result loss between the trained original model and the newly added model can improve the accuracy of the newly added model. The solution provided by the present disclosure thus takes both the efficiency and the accuracy of model training into account.
Optionally, in an embodiment, when the number of sample objects of each newly added category in the newly added sample set is small, the trained model can be prevented from being biased toward the original categories by increasing the proportion of newly added-category sample objects among the total sample objects, so that the accuracy of the trained model can be further improved.
In one implementation manner, upsampling may be achieved by increasing the number of sample objects of the newly added categories within the target sample set. For example, each sample object belonging to a newly added category may be replicated multiple times, thereby increasing the proportion of newly added-category sample objects among the total sample objects.
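The replication variant can be sketched as follows (an illustrative toy; the label convention marking newly added classes is a hypothetical choice, not from the disclosure):

```python
def upsample(samples, is_new_class, factor: int = 5):
    """Replicate each newly added-class sample `factor` times so that
    new classes carry more weight among the total sample objects."""
    out = []
    for s in samples:
        out.extend([s] * factor if is_new_class(s) else [s])
    return out

# Hypothetical toy set: labels >= 1000 denote newly added classes.
samples = [("img_a", 3), ("img_b", 1001)]
balanced = upsample(samples, lambda s: s[1] >= 1000, factor=5)
```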
In another implementation manner, the model may instead be trained multiple times with each newly added-category sample object. In this case, the embodiment of the present disclosure may further include:
after the parameters of the target classification model are adjusted based on the result loss, it may be identified whether the target sample object is a sample object of an original category or of a newly added category.
If the target sample object is a sample object belonging to an original category, the step of acquiring a sample object from the target sample set as the target sample object is performed again.
If the target sample object is a sample object belonging to a newly added category, it is determined whether the number of times the target sample object has been utilized is smaller than a preset threshold; if the number of times is not smaller than the preset threshold, the step of acquiring a sample object from the target sample set as the target sample object is performed again; otherwise, the step of inputting the target sample object into the target classification model to obtain the first classification result is performed again.
The number of times the target sample object has been utilized can be obtained by recording: each time the model is trained with the target sample object, its utilization count is increased by 1. The preset threshold may be determined based on actual use and requirements; for example, in the case where the number of sample objects of each original category is 1000 and the number of sample objects of each newly added category is 200, the preset threshold may be 5.
If the number of times the target sample object has been utilized is smaller than the preset threshold, the target classification model continues to be trained with the target sample object, that is, the step of inputting the target sample object into the target classification model to obtain the first classification result is performed again, until the utilization count is no longer smaller than the preset threshold.
If the number of times the target sample object has been utilized is not smaller than the preset threshold, a sample object needs to be re-acquired to train the target classification model, that is, the step of acquiring a sample object from the target sample set as the target sample object is performed again.
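The training order implied by this reuse rule can be sketched as follows (a simplified toy: the naming convention for samples is hypothetical, and the whole per-sample schedule is flattened into a list for clarity):

```python
def schedule(samples, is_new_class, threshold: int = 3):
    """Yield the training order implied by the reuse rule: original-class
    samples are used once; newly added-class samples are reused until
    their utilization count reaches `threshold`."""
    order = []
    for s in samples:
        uses = threshold if is_new_class(s) else 1
        order.extend([s] * uses)  # each repetition is one training step
    return order

samples = ["orig_1", "new_1"]
order = schedule(samples, lambda s: s.startswith("new"), threshold=3)
```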
According to the solution provided by the present disclosure, the efficiency and accuracy of model training can both be taken into account. Meanwhile, when the number of sample objects of each newly added category in the newly added sample set is small, increasing the proportion of newly added-category sample objects in the target sample set can further improve the accuracy of the trained model.
Optionally, in an embodiment, after all the sample objects in the target sample set have been utilized, that is, after the training of the target classification model is completed, the target classification model may also be tested with a test sample set, where the sample objects in the test sample set may be sample objects in the target sample set or sample objects collected in other manners. Specifically, sample objects in the test sample set are input into the target classification model; if the model loss of the target classification model satisfies a preset loss condition, it is determined that the training of the target classification model is complete; otherwise, the target classification model needs to be trained again with the target sample set.
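The preset loss condition is not fixed by the disclosure; as one hypothetical instance, it could be a bound on the mean model loss over the test sample set:

```python
def passes_test(losses, max_mean_loss: float = 0.5) -> bool:
    """One possible preset loss condition (an assumption, not specified
    by the disclosure): the mean model loss over the test set stays
    below a configured bound."""
    return sum(losses) / len(losses) < max_mean_loss

done = passes_test([0.2, 0.3, 0.4])  # mean 0.3 < 0.5
```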
In order to better understand the solution provided by the present disclosure, as shown in fig. 3, the solution is introduced below taking an image classification scenario as an example. In this scenario, the original sample set is a base library containing 1000 labeled categories with 1000 images per category, i.e., 1,000,000 images in total; the newly added sample set contains 20 labeled categories with 200 images per category, i.e., 4000 images in total. The method includes the following steps:
1. About 200 samples are taken from each of the 1000 classes of the base library, so that the total size of the base library becomes 200,000 images, four fifths less than the original 1,000,000, i.e., about four fifths of the training time is saved;
2. The 200 images of each of the 20 newly added classes are upsampled about 5 times during training, so that the total number of newly added images becomes 20,000. The reason for this multiplication is that, without it, the 4000 newly added images would account for too low a proportion relative to the 200,000 base-library images; the fully connected layer would then be updated too little for the 20 newly added classes, so the accuracy on the 1000 base-library classes would be high while the accuracy on the 20 newly added classes would be low. After the multiplication, the original 1000 classes retain good accuracy, and the 20 newly added classes also attain good accuracy;
3. During training, the information of the base-library model needs to be preserved as far as possible, and the training of the newly added model can be distilled with the base-library model, as shown in fig. 4 below.
In the figure, x is the input image, n is the number of original categories (1000), and m is the number of newly added categories (20). o_n = [o_1, o_2, …, o_n] is the n-dimensional vector of scores output by the fully connected layer of the base-library model, where each dimension is the probability, as predicted by the original classification model, that the category of the input image is the corresponding original category. o_{n+m} = [o_1, o_2, …, o_n, o_{n+1}, …, o_{n+m}] is the (n+m)-dimensional vector of scores output by the target classification model; similarly, each dimension is the probability, as predicted by the target classification model, that the category of the input image is the corresponding original or newly added category. The first n dimensions of the (n+m)-dimensional scores output by the target classification model are distilled against the n-dimensional scores output by the original classification model, that is, the distillation loss is calculated between them. Meanwhile, the cross-entropy loss of the (n+m)-dimensional score vector of the target classification model is also calculated and used as the classification loss of the target classification model.
4. After the training of the target classification model is completed, the target classification model may be tested with a test sample set, where the sample images in the test sample set may be sample images in the target sample set or sample images collected in other manners. Specifically, sample images in the test sample set are input into the target classification model; if the model loss of the target classification model satisfies the preset loss condition, it is determined that the training of the target classification model is complete; otherwise, the target classification model needs to be trained again with the target sample set.
Through the above online optimized distillation scheme, training with newly added images can be performed on the basis of the original base-library model, and the newly added images can be trained quickly and well while the accuracy on the base-library images is guaranteed, so that the newly added model attains high accuracy while retaining the ability to recognize the base-library images.
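The distillation and classification losses of step 3 can be sketched in a few lines of numpy (an illustrative toy, not the disclosure's implementation: n and m are shrunk to 4 and 2, softmax scores are assumed, the JS-divergence option named earlier is used for the result loss, and renormalizing the student's first n scores is an added assumption):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

n, m = 4, 2                        # toy stand-ins for 1000 and 20
teacher = softmax(np.ones(n))      # base-library model: n scores
student = softmax(np.ones(n + m))  # target model: n+m scores

# Distillation (result) loss: teacher scores vs. the student's first n
# scores, renormalized so both are probability vectors.
student_n = student[:n] / student[:n].sum()
result_loss = js_divergence(teacher, student_n)

# Classification loss: cross-entropy of the full n+m scores against the
# calibrated class label (here label 5, a newly added class).
label = 5
cls_loss = -np.log(student[label])
```

With both outputs uniform, as here, the distillation loss is zero; in training, the two losses would be weighted into the model loss as described earlier.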
According to an embodiment of the present disclosure, as shown in fig. 5, the present disclosure further provides an apparatus for model training, where the apparatus includes:
a sample set acquisition module 501, configured to acquire a target sample set; the target sample set comprises part of sample objects belonging to an original category in an original sample set and sample objects belonging to a new category in a new sample set, wherein the original sample set is a sample set utilized by an original classification model which is trained in advance;
the model training module 502 is configured to train a target classification model to be trained by using a target sample set based on original information corresponding to the original classification model, so as to obtain a trained target classification model;
wherein, each class of the target classification model for reasoning comprises an original class and a newly added class; the original information corresponding to the original classification model comprises: and aiming at each sample object in the target sample set, utilizing the classification result obtained when the original classification model is used for classifying.
Optionally, the model training module includes:
an object acquisition sub-module, configured to acquire a sample object from a target sample set as a target sample object;
the object input sub-module is used for inputting the target sample object into the target classification model to obtain a first classification result, and inputting the target sample object into the original classification model to obtain a second classification result;
a loss determination sub-module, configured to determine a loss of the target classification model by using a difference between the result for the original categories in the first classification result and the second classification result, as a result loss of the target classification model;
and the parameter adjustment sub-module is used for adjusting parameters of the target classification model based on the result loss, and returning to execute the step of acquiring the sample object from the target sample set as the target sample object before the sample objects in the target sample set are all utilized, so as to continuously train the target classification model.
Optionally, the parameter adjustment sub-module includes:
the loss weighting unit is used for weighting the result loss and the classification loss corresponding to the target sample object to obtain the model loss of the classification model; wherein the classification loss is a loss determined based on a difference of the first classification result and a class calibration result of the target sample object;
And the parameter adjustment unit is used for adjusting the parameters of the target classification model according to a preset parameter adjustment mode based on the model loss.
Optionally, the parameters of the designated network layer of the target classification model are the same as those of the designated network layer of the original classification model, and the designated network layer is a network layer except for a full connection layer for outputting classification types;
and the parameter adjusting unit is also used for adjusting parameters in the full-connection layer of the target classification model based on the model loss.
Optionally, the loss determination submodule includes:
a difference calculation sub-module, configured to calculate, for each original category, a difference between the result for that category in the first classification result and the corresponding result in the second classification result, and take the sum of the calculated differences as the result loss of the target classification model; or,
a divergence calculation sub-module, configured to calculate a divergence between the result for the original categories in the first classification result and the second classification result, as the result loss of the target classification model.
Optionally, the divergence calculation sub-module is further configured to calculate a Jensen-Shannon (JS) divergence between the result for the original categories in the first classification result and the second classification result, as the result loss of the target classification model.
Optionally, the parameter adjustment sub-module is further configured to, after adjusting the parameters of the target classification model based on the result loss: if the target sample object is a sample object belonging to an original category, perform again the step of acquiring a sample object from the target sample set as the target sample object; if the target sample object is a sample object belonging to a newly added category, determine whether the number of times the target sample object has been utilized is smaller than a preset threshold; if the number of times is not smaller than the preset threshold, perform again the step of acquiring a sample object from the target sample set as the target sample object; otherwise, perform again the step of inputting the target sample object into the target classification model to obtain the first classification result.
According to the solution provided by the present disclosure, the target sample set includes only part of the sample objects belonging to the original categories in the original sample set, so the number of sample objects in the target sample set is smaller than the total number of sample objects, and the time required to train the newly added model can therefore be reduced. Meanwhile, training the newly added model with the original information of the trained original classification model can improve the accuracy of the newly added model. The solution provided by the present disclosure thus takes both the efficiency and the accuracy of model training into account.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
The embodiment of the disclosure provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of model training.
The present disclosure provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method of model training.
The present disclosure provides a computer program product comprising a computer program which, when executed by a processor, implements a method of model training.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as the method of model training. For example, in some embodiments, the method of model training may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the method of model training described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the method of model training in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (12)

1. A method of model training, comprising:
acquiring a target sample set; the target sample set comprises part of sample objects belonging to an original category in an original sample set and sample objects belonging to a new category in a new sample set, wherein the original sample set is a sample set utilized by an original classification model which is trained in advance, the sample objects in the original sample set and the new sample set are images, and the original classification model is a model for classifying the images;
training a target classification model to be trained by adopting the target sample set based on the original information corresponding to the original classification model to obtain the target classification model after training;
the target classification model is a model for classifying images, and each class of the target classification model for reasoning comprises the original class and the newly added class; the original information corresponding to the original classification model comprises: aiming at each sample object in the target sample set, classifying by using the original classification model to obtain a classification result;
The training of the target classification model to be trained by adopting the target sample set based on the original information corresponding to the original classification model comprises the following steps:
obtaining a sample object from the target sample set as a target sample object;
inputting the target sample object into the target classification model to obtain a first classification result, and inputting the target sample object into the original classification model to obtain a second classification result; the first classification result is the probability that the target sample object belongs to the original category and the newly added category, and the second classification result is the probability that the target sample object belongs to the original category;
determining the loss of the target classification model by using a difference between the result for the original categories in the first classification result and the second classification result, and taking the loss as the result loss of the target classification model;
based on the result loss, adjusting parameters of the target classification model, and before all sample objects in the target sample set are utilized, returning to execute the step of acquiring the sample objects from the target sample set as target sample objects, and continuing to train the target classification model;
Wherein said adjusting parameters of said target classification model based on said resulting loss comprises:
weighting the result loss and the classification loss corresponding to the target sample object to obtain model loss of the classification model; wherein the classification loss is a loss determined based on a difference of the first classification result and a class calibration result of the target sample object;
and adjusting the parameters of the target classification model according to a preset parameter adjustment mode based on the model loss.
2. The method of claim 1, wherein the parameters of a designated network layer of the target classification model are the same as those of the designated network layer of the original classification model, the designated network layer being a network layer other than a fully connected layer for outputting classification categories;
the step of adjusting the parameters of the target classification model according to a preset parameter adjustment mode based on the model loss comprises the following steps:
based on the model loss, parameters in a fully connected layer of the target classification model are adjusted.
3. The method of claim 1, wherein the determining the loss of the target classification model by using the difference between the result for the original categories in the first classification result and the second classification result, as the result loss of the target classification model, comprises:
calculating, for each original category, a difference between the result for that category in the first classification result and the corresponding result in the second classification result, and taking the sum of the calculated differences as the result loss of the target classification model; or,
calculating a divergence between the result for the original categories in the first classification result and the second classification result, as the result loss of the target classification model.
4. A method according to claim 3, wherein said calculating a divergence between the result for the original categories in the first classification result and the second classification result, as the result loss of the target classification model, comprises:
calculating a Jensen-Shannon (JS) divergence between the result for the original categories in the first classification result and the second classification result, as the result loss of the target classification model.
5. The method of claim 1, wherein after said adjusting parameters of said target classification model in a predetermined parameter adjustment based on said resulting loss, further comprising:
if the target sample object is the sample object belonging to the original category, returning to the step of acquiring the sample object from the target sample set as the target sample object;
if the target sample object is a sample object belonging to the newly added category, determining whether the number of times the target sample object has been utilized is smaller than a preset threshold; if not, performing again the step of acquiring a sample object from the target sample set as a target sample object; otherwise, performing again the step of inputting the target sample object into the target classification model to obtain the first classification result, until the number of times of utilization is not smaller than the preset threshold.
6. An apparatus for model training, comprising:
the sample set acquisition module is used for acquiring a target sample set; the target sample set comprises part of sample objects belonging to an original category in an original sample set and sample objects belonging to a new category in a new sample set, wherein the original sample set is a sample set utilized by an original classification model which is trained in advance, the sample objects in the original sample set and the new sample set are images, and the original classification model is a model for classifying the images;
the model training module is used for training the target classification model to be trained by adopting the target sample set based on the original information corresponding to the original classification model to obtain the target classification model after training;
The target classification model is a model for classifying images, and each class of the target classification model for reasoning comprises the original class and the newly added class; the original information corresponding to the original classification model comprises: aiming at each sample object in the target sample set, classifying by using the original classification model to obtain a classification result;
the model training module comprises:
an object acquisition sub-module, configured to acquire a sample object from the target sample set, as a target sample object;
the object input sub-module is used for inputting the target sample object into the target classification model to obtain a first classification result, and inputting the target sample object into the original classification model to obtain a second classification result; the first classification result comprises the probabilities that the target sample object belongs to the original classes and the newly added classes, and the second classification result comprises the probabilities that the target sample object belongs to the original classes;
a loss determination sub-module, configured to determine the loss of the target classification model by using the difference between the results for the original classes in the first classification result and the second classification result, as the result loss of the target classification model;
the parameter adjustment sub-module is used for adjusting the parameters of the target classification model based on the result loss, and, before all sample objects in the target sample set have been utilized, returning to execute the step of acquiring a sample object from the target sample set as the target sample object, so as to continue training the target classification model;
wherein the parameter adjustment sub-module comprises:
the loss weighting unit is used for weighting the result loss and the classification loss corresponding to the target sample object to obtain the model loss of the target classification model; wherein the classification loss is a loss determined based on the difference between the first classification result and the class calibration result of the target sample object;
and the parameter adjustment unit is used for adjusting the parameters of the target classification model in a predetermined parameter adjustment manner based on the model loss.
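The loss weighting unit of claim 6 can be sketched as follows. This is a minimal illustration rather than the patented implementation: `alpha` is an assumed weighting factor (the claim only specifies that the two losses are weighted), the classification loss is taken as cross-entropy against the class calibration result, and the result loss uses the per-class difference sum of claim 8's first branch.

```python
import numpy as np

def classification_loss(first_probs, label):
    # Loss based on the difference between the first classification result
    # and the class calibration (ground-truth) result; cross-entropy is an
    # assumed concrete choice.
    return float(-np.log(first_probs[label] + 1e-12))

def result_loss(first_probs, second_probs, n_original):
    # Sum of per-class differences between the first result, restricted to
    # the original classes, and the second result of the original model.
    return float(np.abs(first_probs[:n_original] - second_probs).sum())

def model_loss(first_probs, second_probs, n_original, label, alpha=0.5):
    # Weighted combination of result loss and classification loss gives
    # the model loss used to adjust the target classification model.
    return alpha * result_loss(first_probs, second_probs, n_original) \
        + (1.0 - alpha) * classification_loss(first_probs, label)
```

When the target model's probabilities over the original classes match the original model's output exactly, the result loss vanishes and only the classification loss drives the update.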
7. The apparatus of claim 6, wherein the parameters of a designated network layer of the target classification model are the same as those of the original classification model, the designated network layer being a network layer other than the fully connected layer used for outputting classification classes;
the parameter adjustment unit is specifically configured to adjust the parameters in the fully connected layer of the target classification model based on the model loss.
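Claim 7 keeps every layer except the fully connected output layer identical to the original model and updates only the FC layer. A schematic NumPy sketch, where layer names such as "backbone" and "fc" are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def build_target_model(original_params, n_new, seed=0):
    """Copy the designated network layers (everything except the FC layer)
    from the original model, then widen the FC layer so its outputs cover
    the original plus the newly added classes."""
    rng = np.random.default_rng(seed)
    target = {name: w.copy() for name, w in original_params.items() if name != "fc"}
    in_dim, n_orig = original_params["fc"].shape
    fc = np.zeros((in_dim, n_orig + n_new))
    fc[:, :n_orig] = original_params["fc"]  # reuse original-class weights
    fc[:, n_orig:] = 0.01 * rng.standard_normal((in_dim, n_new))
    target["fc"] = fc
    return target

def adjust_fc_only(target, fc_grad, lr=0.1):
    """Parameter adjustment per claim 7: only the FC layer moves;
    the shared (designated) layers stay frozen."""
    target["fc"] = target["fc"] - lr * fc_grad
    return target
```

In a deep-learning framework the same effect would typically be achieved by freezing the shared layers' gradients, but the dictionary form above keeps the sketch self-contained.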
8. The apparatus of claim 6, wherein the loss determination submodule comprises:
the difference calculation sub-module is used for calculating, for each original class, the difference between the result for that class in the first classification result and the corresponding result in the second classification result, and taking the sum of the calculated differences as the result loss of the target classification model; or,
the divergence calculation sub-module is used for calculating the divergence between the results for the original classes in the first classification result and the second classification result, as the result loss of the target classification model.
9. The apparatus of claim 8, wherein the divergence calculation sub-module is further configured to calculate the Jensen-Shannon (JS) divergence between the results for the original classes in the first classification result and the second classification result, as the result loss of the target classification model.
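The JS-divergence result loss of claim 9 can be sketched as below. Renormalizing the first result over the original classes (so both inputs are probability distributions) is an assumption the claim leaves open:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # Kullback-Leibler divergence KL(p || q), with a small epsilon
    # to guard against log of zero.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js_result_loss(first_probs, second_probs, n_original):
    """Jensen-Shannon divergence between the first classification result,
    restricted (and renormalized) to the original classes, and the second
    classification result produced by the original model."""
    p = np.asarray(first_probs[:n_original], dtype=float)
    p = p / p.sum()
    q = np.asarray(second_probs, dtype=float)
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike a raw difference sum, the JS divergence is symmetric and bounded by ln 2, which makes it easy to weight against the classification loss.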
10. The apparatus of claim 6, wherein the parameter adjustment sub-module is further configured to, after adjusting the parameters of the target classification model in the predetermined parameter adjustment manner based on the result loss: if the target sample object is a sample object belonging to the original classes, return to execute the step of acquiring a sample object from the target sample set as the target sample object; if the target sample object is a sample object belonging to the newly added classes, determine whether the number of times the target sample object has been utilized is smaller than a preset threshold; if not, return to execute the step of acquiring a sample object from the target sample set as the target sample object; otherwise, execute the step of inputting the target sample object into the target classification model to obtain a first classification result, until the number of utilized times is not less than the preset threshold.
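The replay rule of claims 5 and 10 (new-class samples are re-fed to the model until they have been used a threshold number of times, while original-class samples are used only once) can be sketched as a small control function; the dict-based usage bookkeeping and the action names are illustrative assumptions:

```python
def next_step(sample_id, is_new_class, used_counts, threshold):
    """Decide the next training action after one parameter update.
    Returns "acquire_next" (fetch a fresh sample from the target set)
    or "reinput_sample" (feed the same sample to the model again)."""
    if not is_new_class:
        return "acquire_next"  # original-class samples are used once
    used_counts[sample_id] = used_counts.get(sample_id, 0) + 1
    if used_counts[sample_id] < threshold:
        return "reinput_sample"  # below threshold: reuse this sample
    return "acquire_next"  # threshold reached: move on to the next sample
```

Repeating new-class samples in this way compensates for their being outnumbered by the retained original-class samples during incremental training.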
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202110777357.0A 2021-07-09 2021-07-09 Model training method, device, electronic equipment and storage medium Active CN113516185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110777357.0A CN113516185B (en) 2021-07-09 2021-07-09 Model training method, device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113516185A CN113516185A (en) 2021-10-19
CN113516185B true CN113516185B (en) 2023-10-31

Family

ID=78066769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110777357.0A Active CN113516185B (en) 2021-07-09 2021-07-09 Model training method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113516185B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114443849B (en) * 2022-02-09 2023-10-27 北京百度网讯科技有限公司 Labeling sample selection method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764281A (en) * 2018-04-18 2018-11-06 South China University of Technology Image classification method for cross-task deep networks based on semi-supervised self-paced learning
CN109829541A (en) * 2019-01-18 2019-05-31 Shanghai Jiao Tong University Deep neural network incremental training method and system based on learning automata
US10402691B1 * 2018-10-04 2019-09-03 Capital One Services, LLC Adjusting training set combination based on classification accuracy
JP2020123830A (en) * 2019-01-30 2020-08-13 Kyocera Document Solutions Inc. Image processing apparatus, image reading device, image forming apparatus, image processing method, and image processing program
CN111881926A (en) * 2020-08-24 2020-11-03 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Image generation method, image generation model training method, image generation device, image generation equipment and image generation medium
CN112270379A (en) * 2020-11-13 2021-01-26 Beijing Baidu Netcom Science and Technology Co., Ltd. Training method of classification model, sample classification method, device and equipment
CN112949767A (en) * 2021-04-07 2021-06-11 Beijing Baidu Netcom Science and Technology Co., Ltd. Sample image increment, image detection model training and image detection method
CN113012712A (en) * 2021-03-03 2021-06-22 North China University of Science and Technology Face video synthesis method and device based on generative adversarial network
CN113033631A (en) * 2021-03-09 2021-06-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Model incremental training method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on incremental learning of convolutional neural networks based on typical samples; Huang Weinan; Zhu Qiuyu; Wang Yue; Wang Jiayang; Electronic Measurement Technology (Issue 06); full text *

Also Published As

Publication number Publication date
CN113516185A (en) 2021-10-19

Similar Documents

Publication Publication Date Title
CN112560996B (en) User portrait identification model training method, device, readable storage medium and product
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN115690443B (en) Feature extraction model training method, image classification method and related devices
CN113657483A (en) Model training method, target detection method, device, equipment and storage medium
CN114065864A (en) Federal learning method, federal learning device, electronic device, and storage medium
US20230005572A1 (en) Molecular structure acquisition method and apparatus, electronic device and storage medium
CN113516185B (en) Model training method, device, electronic equipment and storage medium
CN114494814A (en) Attention-based model training method and device and electronic equipment
CN114360027A (en) Training method and device for feature extraction network and electronic equipment
CN113657249A (en) Training method, prediction method, device, electronic device, and storage medium
CN115906921B (en) Training method of deep learning model, target object detection method and device
CN113961765B (en) Searching method, searching device, searching equipment and searching medium based on neural network model
CN116468479A (en) Method for determining page quality evaluation dimension, and page quality evaluation method and device
CN113449778B (en) Model training method for quantum data classification and quantum data classification method
CN114882315A (en) Sample generation method, model training method, device, equipment and medium
CN113361621A (en) Method and apparatus for training a model
CN114610953A (en) Data classification method, device, equipment and storage medium
CN114037060A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN114037057B (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113361712B (en) Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment
CN116416500B (en) Image recognition model training method, image recognition device and electronic equipment
CN115034388B (en) Determination method and device for quantization parameters of ranking model and electronic equipment
CN116151392B (en) Training sample generation method, training method, recommendation method and device
CN113361575B (en) Model training method and device and electronic equipment
US20230206075A1 (en) Method and apparatus for distributing network layers in neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant