CN113326852A

CN113326852A - Model training method, device, equipment, storage medium and program product

Info

Publication number: CN113326852A
Application number: CN202110650902.XA
Authority: CN
Inventors: 杨馥魁
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2021-08-31

Abstract

The disclosure discloses a model training method, a model training device, model training equipment, storage media and a program product, and relates to the technical field of artificial intelligence, in particular to the technical field of computer vision and deep learning. The specific implementation scheme is as follows: acquiring a sample image set; inputting sample images in the sample image set to a pre-trained first model to obtain a first feature set, wherein the first model is a teacher model; determining a first positive sample feature set and a first negative sample feature set based on the first feature set, and generating a sample feature pair; and taking the sample images in the sample image set as input, taking the second feature set as output, and training a second model to obtain a feature extraction model, wherein the second model is a student model, and a loss function of the second model is constructed based on the sample feature pairs of the first model and the sample feature pairs of the second model. The knowledge distillation method for the positive and negative sample characteristic pair relation is realized by the scheme, and the distillation precision is improved.

Description

Model training method, device, equipment, storage medium and program product

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and in particular, to the field of computer vision and deep learning technologies, and in particular, to a model training method and apparatus, an electronic device, a storage medium, and a program product.

Background

With the wide application of deep learning in multiple fields such as image recognition, voice recognition, natural language processing and the like, most models are too complicated in calculation processing and cannot run on a mobile terminal or embedded equipment, so that the models need to be compressed, and a knowledge distillation technology is one of important technologies in model compression.

The traditional knowledge distillation technology takes a soft target related to a teacher model as a part of total loss to induce the training of the student model and realize knowledge transfer, and the knowledge distillation algorithm is mostly completed by fitting the characteristics of the teacher model and the student model. Specifically, a teacher model is first trained on a set of labeled samples, and then the features learned by the teacher model are migrated to the student models. In the migration process, the student model learns the output of the teacher model on the one hand and the real label of the image on the other hand. And after the distillation process is finished, the student model is the image recognition model in the image recognition task.

Disclosure of Invention

The present disclosure provides a model training method, apparatus, device, storage medium, and program product and a method, apparatus, device, storage medium, and program product for generating information.

According to a first aspect of the present disclosure, there is provided a model training method, comprising: obtaining a sample image set, wherein the sample image set comprises sample images marked with labels; inputting sample images in the sample image set to a pre-trained first model to obtain a first feature set corresponding to the sample images in the input sample image set, wherein the first model is a trained teacher model; determining a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set based on the first feature set, and generating a sample feature pair of a first model for characterizing the first positive sample feature set and the first negative sample feature set; taking the sample images in the sample image set as input, taking a second feature set corresponding to the sample images in the input sample image set as output, and training a second model to obtain a feature extraction model, wherein the second model is a student model to be trained, a loss function of the second model is constructed based on a sample feature pair of the first model and a sample feature pair of the second model, and the sample feature pair of the second model is used for representing a positive sample feature set in the second feature set and a negative sample feature set in the second feature set.

According to a second aspect of the present disclosure, there is provided a method for generating information, comprising: acquiring a target image; and inputting the target image into a pre-trained feature extraction model to generate a feature set corresponding to the target image, wherein the feature extraction model is obtained by training with the method of any embodiment of the model training method.

According to a third aspect of the present disclosure, there is provided a model training apparatus comprising: an obtaining unit configured to obtain a sample image set, wherein the sample image set includes labeled sample images; the input unit is configured to input the sample images in the sample image set to a pre-trained first model to obtain a first feature set corresponding to the sample images in the input sample image set, wherein the first model is a trained teacher model; the generating unit is configured to determine a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set based on the first feature set, and generate a sample feature pair of a first model representing the first positive sample feature set and the first negative sample feature set; and the training unit is configured to take the sample images in the sample image set as input, take a second feature set corresponding to the sample images in the input sample image set as output, train a second model to obtain a feature extraction model, wherein the second model is a student model to be trained, a loss function of the second model is constructed based on a sample feature pair of the first model and a sample feature pair of the second model, and the sample feature pair of the second model is used for representing a positive sample feature set in the second feature set and a negative sample feature set in the second feature set.

According to a fourth aspect of the present disclosure, there is provided an apparatus for generating information, comprising: an image acquisition unit configured to acquire a target image; and the information generating unit is configured to input the target image to a pre-trained feature extraction model and generate a feature set corresponding to the target image, wherein the feature extraction model is obtained by training through the method of any one embodiment of the above model training method.

According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect or the second aspect.

According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions, wherein the computer instructions are configured to cause a computer to perform the method as described in any one implementation manner of the first aspect or the second aspect.

According to a seventh aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method as described in any of the implementations of the first or second aspect.

According to the technology disclosed by the invention, a sample image set is obtained, sample images in the sample image set are input to a pre-trained first model, a first characteristic set corresponding to the sample images in the input sample image set is obtained, wherein the first model is a trained teacher model, a first positive sample characteristic set corresponding to the first characteristic set and a first negative sample characteristic set corresponding to the first characteristic set are determined based on the first characteristic set, a sample characteristic pair of the first model for representing the first positive sample characteristic set and the first negative sample characteristic set is generated, the sample images in the sample image set are used as input, a second characteristic set corresponding to the sample images in the input sample image set is used as output, the second model is trained, a characteristic extraction model is obtained, wherein the second model is a student model to be trained, and a loss function of the second model is constructed based on the sample characteristic pair of the first model and the sample characteristic pair of the second model, the sample feature pairs of the second model are used for representing the positive sample feature set in the second feature set and the negative sample feature set in the second feature set, so that the knowledge distillation method for the relation of the positive sample feature pairs and the negative sample feature pairs is realized, and the distillation precision is improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a schematic diagram of a first embodiment of a model training method according to the present disclosure;

FIG. 2 is a diagram of a scenario in which a model training method of an embodiment of the present disclosure may be implemented;

FIG. 3 is a schematic diagram of a second embodiment of a model training method according to the present disclosure;

FIG. 4 is a schematic diagram of a first embodiment of a method for generating information in accordance with the present disclosure;

FIG. 5 is a schematic block diagram of one embodiment of a model training apparatus according to the present disclosure;

FIG. 6 is a schematic block diagram illustrating one embodiment of an apparatus for generating information according to the present disclosure;

FIG. 7 is a block diagram of an electronic device for implementing a model training method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 shows a schematic diagram 100 of a first embodiment of a model training method according to the present disclosure. The model training method comprises the following steps:

step 101, a sample image set is obtained.

In this embodiment, the executing subject (e.g., server) may obtain the sample image set from other electronic devices or locally by way of a wired or wireless connection. Wherein the sample image set may comprise a large number of sample images. The sample image may be an image obtained by photographing various kinds of objects. Typically, the sample image may be a labeled sample image. It should be noted that the above-mentioned wireless connection means may include, but is not limited to, 3G, 4G, 5G connection, WiFi connection, bluetooth connection, WiMAX connection, Zigbee connection, uwb (ultra wideband) connection, and other now known or later developed wireless connection means.

Step 102, inputting the sample images in the sample image set to a pre-trained first model, and obtaining a first feature set corresponding to the sample images in the input sample image set.

In this embodiment, for the sample images in the sample image set acquired in step 101, the executing subject may input the sample images in the sample image set to the first model trained in advance, and obtain a first feature set corresponding to the sample images in the input sample image set. Wherein, the first model is a trained teacher model. The teacher model can be a model obtained by performing supervised training on any neural network available for classification in advance by using a labeled sample set. In general, a teacher model is a complex model with high precision and slow reasoning speed, and the capability of the teacher model is generally larger than that of a student model. The first set of features may characterize a correspondence between labels of sample images in the sample image set and feature information of corresponding sample images in the sample image set. The feature information of the image may be information for characterizing features of the image, and the features of the image may be various basic elements of the image (e.g., color, shape, lines, texture, etc.).

Step 103, based on the first feature set, determining a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set, and generating a sample feature pair of the first model characterizing the first positive sample feature set and the first negative sample feature set.

In this embodiment, the executive body may classify the obtained first feature set by using a sample feature classification manner, generate a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set and characterizing the positive sample features, and then summarize the first positive sample feature set and the first negative sample feature set, so as to generate a sample feature pair of the first model characterizing the first positive sample feature set and the first negative sample feature set. The positive sample features can represent features of sample images with high similarity belonging to the same category, and the negative sample features can represent features of sample images with low similarity not belonging to the same category.

It should be noted that the sample feature classification model may be a corresponding relationship table, which is pre-established by technicians based on a large amount of statistical calculations and used for characterizing image feature information and positive sample features and negative sample features; the model may be obtained by training using various conventional Logistic Regression models (LR).

And 104, taking the sample images in the sample image set as input, taking a second feature set corresponding to the sample images in the input sample image set as output, and training a second model to obtain a feature extraction model.

In this embodiment, the executing entity may use a machine learning algorithm to train the second model by taking the sample images in the sample image set as input and taking the second feature set corresponding to the sample images in the input sample image set as output, so as to obtain the feature extraction model. The second model is a student model to be trained, a loss function of the second model is constructed based on the sample feature pairs of the first model and the sample feature pairs of the second model, and the sample feature pairs of the second model are used for representing a positive sample feature set in the second feature set and a negative sample feature set in the second feature set. The number of features of the first set of features and the second set of features may be the same or may be different. The generation process of the positive sample feature set, the negative sample feature set and the sample feature pairs of the second model is basically the same as that of the first model. The student model can be any neural network that can be used for classification, whether or not it is pre-trained. Student models are compact and low complexity models, often with less capability than teacher models. Here, the feature extraction model can be obtained by performing supervised training on the student model. Generally, the trained student model can be directly used as a feature extraction model, when an image to be recognized is received, the trained student model is directly used for feature extraction or image recognition of the image to be recognized, a teacher model is not used, and therefore the image recognition speed is increased.

It should be noted that the execution subject may store a first model, a second model, and a feature extraction model that are trained in advance, a network architecture of each model is defined in advance, and each model may be, for example, a data table or a calculation formula, and the present embodiment does not limit this aspect at all. The machine learning algorithm is a well-known technology widely studied and applied at present, and is not described herein again.

For ease of understanding, a scenario is provided in which the model training method of the embodiment of the present disclosure may be implemented, and referring to fig. 2, the model training method 200 of the embodiment runs in a server 201. The server 201 first obtains a sample image set 202, wherein the sample image set includes labeled sample images, then the server 201 inputs the sample images in the sample image set to a pre-trained first model to obtain a first feature set 203 corresponding to the sample images in the input sample image set, then the server 201 determines a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set based on the first feature set to generate a sample feature pair 204 of the first model characterizing the first positive sample feature set and the first negative sample feature set, and finally the server 201 takes the sample images in the sample image set as input, takes a second feature set corresponding to the sample images in the input sample image set as output, trains the second model to obtain a feature extraction model 205, wherein the first model is a trained teacher model, the second model is a student model to be trained, a loss function of the second model is constructed based on the sample feature pairs of the first model and the sample feature pairs of the second model, and the sample feature pairs of the second model are used for representing a positive sample feature set in the second feature set and a negative sample feature set in the second feature set.

The model training method provided by the above embodiment of the present disclosure includes obtaining a sample image set, inputting sample images in the sample image set to a pre-trained first model, obtaining a first feature set corresponding to the sample images in the input sample image set, where the first model is a trained teacher model, determining a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set based on the first feature set, generating a sample feature pair of the first model characterizing the first positive sample feature set and the first negative sample feature set, taking the sample images in the sample image set as input, taking a second feature set corresponding to the sample images in the input sample image set as output, training the second model, and obtaining a feature extraction model, where the second model is a student model to be trained, and a loss function of the second model is constructed based on the sample feature pair of the first model and the sample feature pair of the second model The sample feature pairs of the second model are used for representing the positive sample feature set in the second feature set and the negative sample feature set in the second feature set, the problem that information constraint in the class is lacked in the prior distillation technology is solved, a knowledge distillation method for the positive and negative sample feature pair relation is realized, the positive and negative sample feature pair relation output by the student model is made to approach the positive and negative sample feature pair relation output by the teacher model as much as possible, the purposes of small intra-class distance and large inter-class distance are achieved, and the distillation precision is improved.

With further reference to FIG. 3, a schematic diagram 300 of a second embodiment of a model training method is shown. The process of the method comprises the following steps:

in step 301, a sample image set is obtained.

Step 302, inputting the sample images in the sample image set to a pre-trained first model, and obtaining a first feature set corresponding to the sample images in the input sample image set.

Step 303, based on the first feature set, performing sample division on the sample images in the input sample image set to obtain a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set.

In this embodiment, the execution subject may analyze the first feature set, and perform sample division on the sample images in the input sample image set based on the analysis result to obtain a first positive sample set representing a positive sample corresponding to the sample images in the sample image set and a first negative sample set representing a negative sample corresponding to the sample images in the sample image set. The positive samples may represent samples belonging to a certain category, and the negative samples may represent samples not belonging to a certain category, for example, to perform image recognition of the letter a, a sample that is the letter a belongs to the positive sample, and a sample that is not the letter a belongs to the negative sample.

In some optional implementation manners of this embodiment, based on the first feature set, performing sample division on sample images in the input sample image set to obtain a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set, including: analyzing the first feature set to generate a similarity matrix corresponding to the first feature set, wherein the similarity matrix represents the similarity degree of each sample image in the input sample image set; according to the similarity matrix and the similarity threshold, sample division is carried out on the sample images in the input sample image set to obtain a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set, the positive sample set represents samples with similarity values larger than the similarity threshold in the similarity matrix, and the negative sample set represents samples with similarity values not larger than the similarity threshold in the similarity matrix. An efficient and accurate positive and negative sample division method is realized.

Step 304, selecting the first feature set according to the first positive sample set and the first negative sample set to obtain a first positive sample feature set corresponding to the first positive sample set and a first negative sample feature set corresponding to the first negative sample set.

In this embodiment, the executing subject may select the first feature set according to the first positive sample set and the first negative sample set obtained in step 303, so as to obtain a first positive sample feature set corresponding to the first positive sample set and a first negative sample feature set corresponding to the first negative sample set. The first positive sample feature set is a set of features of each positive sample, and the first negative sample feature set is a set of features of each negative sample.

Step 305, merging the first positive sample feature set and the first negative sample feature set, and generating a sample feature pair of the first model for characterizing the first positive sample feature set and the first negative sample feature set.

In this embodiment, the executing subject may combine the first positive sample feature set and the first negative sample feature set according to the sample feature pair generation rule, and generate a sample feature pair of the first model that characterizes the first positive sample feature set and the first negative sample feature set.

Step 306, inputting the sample images in the sample image set to the second model, and obtaining a second feature set corresponding to the sample images in the input sample image set.

In this embodiment, the executing subject may input the sample images in the sample image set to the second model, and obtain the second feature set corresponding to the sample images in the input sample image set.

And 307, determining a second positive sample feature set corresponding to the second feature set and a second negative sample feature set corresponding to the second feature set based on the second feature set, and generating a sample feature pair of a second model characterizing the second positive sample feature set and the second negative sample feature set.

In this embodiment, the executive subject may classify the second feature set by using a sample feature classification manner, generate a second positive sample feature set representing the positive sample features and a second negative sample feature set representing the negative sample features, which correspond to the second feature set, and then summarize the second positive sample feature set and the second negative sample feature set, so as to generate a sample feature pair of the second model representing the second positive sample feature set and the second negative sample feature set.

In some optional implementations of this embodiment, determining, based on the second feature set, a second positive sample feature set corresponding to the second feature set and a second negative sample feature set corresponding to the second feature set, and generating a sample feature pair of a second model characterizing the second positive sample feature set and the second negative sample feature set includes: based on the second feature set, carrying out sample division on the sample images in the input sample image set to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set; selecting a second feature set according to the second positive sample set and the second negative sample set to obtain a second positive sample feature set corresponding to the second positive sample set and a second negative sample feature set corresponding to the second negative sample set; and merging the second positive sample feature set and the second negative sample feature set to generate a sample feature pair of the second model for characterizing the second positive sample feature set and the second negative sample feature set. And the generation of the sample feature pairs is completed based on the division of the samples, and another positive and negative sample feature pair division method is realized and is used as the data support of the subsequent model training.

In some optional implementation manners of this embodiment, based on the second feature set, performing sample division on sample images in the input sample image set to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set, including: analyzing the second feature set to generate a similarity matrix corresponding to the second feature set; and according to the similarity matrix and the similarity threshold corresponding to the second feature set, performing sample division on the sample images in the input sample image set to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set. The method for efficiently and accurately dividing the positive and negative samples is realized, and the loss function is accurately and reasonably calculated by keeping the sample characteristic pairs consistent with the sample characteristic pairs of the first model.

Step 308, a loss function is calculated based on the pair of sample features of the first model and the pair of sample features of the second model.

In this embodiment, the executing entity may calculate the loss function based on the pair of sample features of the first model generated in step 305 and the pair of sample features of the second model generated in step 307. Wherein the objective of the loss function is to make the distance of the negative examples as far as possible and the distance of the positive examples as close as possible, and to make the distance between the positive and negative examples of the second model close to the distance between the positive and negative examples of the first model. Here, the formula of the loss function may be expressed as:

where m is a constant and e is a constant ≈ 2.71828. The model training achieves the purpose that the positive and negative sample characteristic pair relation output by the student model approaches the positive and negative sample characteristic pair relation output by the teacher model as much as possible, and the purposes that the intra-class distance is small enough and the inter-class distance is large enough are achieved.

And 309, performing back propagation on the loss function based on gradient descent, and updating parameters of the second model to obtain a feature extraction model.

In this embodiment, the executing entity may perform back propagation on the loss function based on gradient descent, and update the parameters of the second model to obtain the feature extraction model.

In general, for each iteration of the training, the performing agent may first determine whether the loss function calculated in the iteration of the training is minimized. If the loss function is minimized, the student model is converged, and the student model at the moment can be used as an image recognition model. If the loss function is not minimized, the student model is not converged, and then the loss function is reversely propagated based on gradient descent, the parameters of the student model are updated, and the next round of iterative training is continued. The more the number of rounds of iterative training is, the higher the precision of the feature extraction model obtained by training is.

In some optional implementation manners of this embodiment, when training the student model, the adopted loss function may be a cross entropy loss function, the adopted learning rate may be a cosine descent strategy, and the adopted optimizer may be a momentum gradient descent optimizer. A cross entropy loss function is adopted during the training of the student model, the cross entropy loss function does not relate to other hyper-parameters (such as temperature parameters), manpower and material resources are not consumed for parameter adjustment, and iteration is faster and easier.

In some optional implementations of this embodiment, the method further includes: performing image preprocessing on a sample image in a sample image set, and adding the processed sample image to the sample image set, wherein the image preprocessing comprises at least one of the following steps: image negation, image compression and contrast stretching. By preprocessing the image, irrelevant information in the image is eliminated, useful real information is recovered, the detectability of the relevant information is enhanced, and the data is simplified to the maximum extent; by adding the processed sample images into the sample image set, sample data can be expanded, and the accuracy of model training is improved.

In this embodiment, the specific operations of

steps

301 and 302 are substantially the same as the operations of

steps

101 and 102 in the embodiment shown in fig. 1, and are not described again here.

As can be seen from fig. 3, compared with the embodiment corresponding to fig. 1, the schematic diagram 300 of the model training method in this embodiment employs dividing the training samples to obtain the divided positive and negative sample feature pairs, calculating the loss function based on the sample feature pair of the first model and the sample feature pair of the second model, wherein the loss function is aimed at making the distance between the negative samples as far as possible and the distance between the positive samples as close as possible, making the distance between the positive sample and the negative sample of the second model close to the distance between the positive sample and the negative sample of the first model, performing back propagation on the loss function based on gradient descent, updating the parameters of the second model to obtain the feature extraction model, and implementing another knowledge distillation method for the positive and negative sample feature pair relationship to make the positive and negative sample feature pair relationship output by the student model approach to the positive and negative sample feature pair relationship output by the teacher model as close as possible, the purposes that the intra-class distance is small enough and the inter-class distance is large enough are achieved, and the distillation precision is improved.

With further reference to fig. 4, a schematic diagram 400 of a first embodiment of a method for generating information according to the present disclosure is presented. The method for generating information comprises the following steps:

step 401, a target image is acquired.

In this embodiment, the execution subject (e.g., a server or a terminal device) may acquire the target image from other electronic devices or locally by means of wired connection or wireless connection.

Step 402, inputting the target image into a pre-trained feature extraction model, and generating a feature set corresponding to the target image.

In this embodiment, the executing subject may input the target image acquired in step 401 to a pre-trained feature extraction model, and generate a feature set corresponding to the target image. The feature extraction model is obtained by training through the method of any one embodiment of the model training method.

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 1, the flow 400 of the method for generating information in the present embodiment highlights the step of generating the feature set corresponding to the target image by using the trained feature extraction model. Therefore, the scheme described in the embodiment can utilize a more accurate model to realize the extraction of the features with different types, different levels and different depths and rich pertinence.

With further reference to fig. 5, as an implementation of the method shown in fig. 1 to 3, the present disclosure provides an embodiment of a model training apparatus, which corresponds to the embodiment of the method shown in fig. 1, and besides the features described below, the embodiment of the apparatus may further include the same or corresponding features as the embodiment of the method shown in fig. 1, and produce the same or corresponding effects as the embodiment of the method shown in fig. 1, and the apparatus may be specifically applied to various electronic devices.

As shown in fig. 5, the model training apparatus 500 of the present embodiment includes: the system comprises an acquisition unit 501, an input unit 502, a generation unit 503 and a training unit 504, wherein the acquisition unit is configured to acquire a sample image set, wherein the sample image set comprises sample images marked with labels; the input unit is configured to input the sample images in the sample image set to a pre-trained first model to obtain a first feature set corresponding to the sample images in the input sample image set, wherein the first model is a trained teacher model; the generating unit is configured to determine a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set based on the first feature set, and generate a sample feature pair of a first model representing the first positive sample feature set and the first negative sample feature set; and the training unit is configured to take the sample images in the sample image set as input, take a second feature set corresponding to the sample images in the input sample image set as output, train a second model to obtain a feature extraction model, wherein the second model is a student model to be trained, a loss function of the second model is constructed based on a sample feature pair of the first model and a sample feature pair of the second model, and the sample feature pair of the second model is used for representing a positive sample feature set in the second feature set and a negative sample feature set in the second feature set.

In this embodiment, specific processes of the obtaining unit 501, the input unit 502, the generating unit 503, and the training unit 504 of the model training apparatus 500 and technical effects thereof may refer to the related descriptions of step 101 to step 104 in the embodiment corresponding to fig. 1, and are not described herein again.

In some optional implementations of this embodiment, the generating unit includes: the dividing module is configured to perform sample division on sample images in the input sample image set based on the first feature set to obtain a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set; the selecting module is configured to select the first feature set according to the first positive sample set and the first negative sample set to obtain a first positive sample feature set corresponding to the first positive sample set and a first negative sample feature set corresponding to the first negative sample set; a merging module configured to merge the first positive sample feature set and the first negative sample feature set to generate a sample feature pair of the first model characterizing the first positive sample feature set and the first negative sample feature set.

In some optional implementations of this embodiment, the dividing module is further configured to: analyzing the first feature set to generate a similarity matrix corresponding to the first feature set; according to the similarity matrix and the similarity threshold, sample division is carried out on the sample images in the input sample image set to obtain a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set, the positive sample set represents samples with similarity values larger than the similarity threshold in the similarity matrix, and the negative sample set represents samples with similarity values not larger than the similarity threshold in the similarity matrix.

In some optional implementations of this embodiment, the training unit includes: an input module configured to input sample images in the sample image set to a second model, resulting in a second feature set corresponding to the sample images in the input sample image set; the generating module is configured to determine a second positive sample feature set corresponding to the second feature set and a second negative sample feature set corresponding to the second feature set based on the second feature set, and generate a sample feature pair of a second model representing the second positive sample feature set and the second negative sample feature set; a calculation module configured to calculate a loss function based on the pair of sample features of the first model and the pair of sample features of the second model; an updating module configured to back-propagate the loss function based on the gradient descent, updating parameters of the second model.

In some optional implementations of this embodiment, the generating module includes: the dividing submodule is configured to perform sample division on the sample images in the input sample image set based on the second feature set to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set; the selecting submodule is configured to select the second feature set according to the second positive sample set and the second negative sample set to obtain a second positive sample feature set corresponding to the second positive sample set and a second negative sample feature set corresponding to the second negative sample set; and the merging submodule is configured to merge the second positive sample feature set and the second negative sample feature set and generate a sample feature pair of the second model for characterizing the second positive sample feature set and the second negative sample feature set.

In some optional implementations of this embodiment, the partitioning sub-module is further configured to: analyzing the second feature set to generate a similarity matrix corresponding to the second feature set; and according to the similarity matrix and the similarity threshold corresponding to the second feature set, performing sample division on the sample images in the input sample image set to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set.

In some optional implementations of the present embodiment, the penalty function objective is to approximate the distance between the positive and negative examples of the second model to the distance between the positive and negative examples of the first model.

In some optional implementations of this embodiment, the apparatus further includes: a preprocessing unit configured to perform image preprocessing on a sample image in a sample image set and add the processed sample image to the sample image set, wherein the image preprocessing includes at least one of: image negation, image compression and contrast stretching.

The above embodiment of the present disclosure provides a model training apparatus, where a sample image set is obtained by an obtaining unit, an input unit inputs sample images in the sample image set to a first model trained in advance, a first feature set corresponding to the sample images in the input sample image set is obtained, a generating unit determines, based on the first feature set, a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set, and generates a sample feature pair of the first model characterizing the first positive sample feature set and the first negative sample feature set, a training unit takes the sample images in the sample image set as input, takes a second feature set corresponding to the sample images in the input sample image set as output, trains a second model to obtain a feature extraction model, where the second model is a student model to be trained, and a loss function of the second model is constructed based on the sample feature pair of the first model and the sample feature pair of the second model The sample characteristic pair of the second model is used for representing the positive sample characteristic set and the negative sample characteristic set in the second characteristic set, model training modes based on knowledge distillation technology are enriched, the purposes that the intra-class distance is small enough and the inter-class distance is large enough are achieved, and the distillation precision is improved.

With continuing reference to fig. 6, as an implementation of the method shown in fig. 4 described above, the present disclosure provides an embodiment of an apparatus for generating information, the apparatus embodiment corresponds to the method embodiment shown in fig. 4, and in addition to the features described below, the apparatus embodiment may further include the same or corresponding features as the method embodiment shown in fig. 4, and produce the same or corresponding effects as the method embodiment shown in fig. 4, and the apparatus may be applied to various electronic devices.

As shown in fig. 6, the apparatus 600 for generating information of the present embodiment includes: an image acquisition unit 601 and an information generation unit 602, wherein the image acquisition unit is configured to acquire a target image; and the information generating unit is configured to input the target image to a pre-trained feature extraction model and generate a feature set corresponding to the target image, wherein the feature extraction model is obtained by training through the method of any one embodiment of the above model training method.

In this embodiment, specific processes of the image obtaining unit 601 and the information generating unit 602 of the apparatus 600 for generating information and technical effects brought by the processes can refer to the related descriptions of step 401 to step 402 in the embodiment corresponding to fig. 4, which are not described herein again.

It should be noted that, in the technical solution of the present disclosure, if the acquisition, storage, application, etc. of the personal information of the user are involved, the requirements of the relevant laws and regulations are met, and the customs of the public order is not violated.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

Computing unit 701 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the various methods and processes described above, such as method XXX. For example, in some embodiments, method XXX may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 708. In some embodiments, part or all of a computer program may be loaded onto and/or installed onto device 700 via ROM 702 and/or communications unit 709. When the computer program is loaded into RAM703 and executed by computing unit 701, one or more steps of method XXX described above may be performed. Alternatively, in other embodiments, computing unit 701 may be configured to perform method XXX by any other suitable means (e.g., by way of firmware).

Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A model training method, comprising:

obtaining a sample image set, wherein the sample image set comprises sample images marked with labels;

inputting the sample images in the sample image set to a pre-trained first model to obtain a first feature set corresponding to the input sample images in the sample image set, wherein the first model is a trained teacher model;

determining a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set based on the first feature set, and generating a sample feature pair of the first model for characterizing the first positive sample feature set and the first negative sample feature set;

taking the sample images in the sample image set as input, taking a second feature set corresponding to the input sample images in the sample image set as output, and training a second model to obtain a feature extraction model, wherein the second model is a student model to be trained, a loss function of the second model is constructed based on a sample feature pair of the first model and a sample feature pair of the second model, and the sample feature pair of the second model is used for representing a positive sample feature set in the second feature set and a negative sample feature set in the second feature set.

2. The method of claim 1, wherein the determining, based on the first feature set, a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set, and generating sample feature pairs of the first model characterizing the first positive sample feature set and the first negative sample feature set, comprises:

based on the first feature set, performing sample division on the input sample images in the sample image set to obtain a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set;

selecting the first feature set according to the first positive sample set and the first negative sample set to obtain a first positive sample feature set corresponding to the first positive sample set and a first negative sample feature set corresponding to the first negative sample set;

merging the first positive sample feature set and the first negative sample feature set to generate a sample feature pair of the first model characterizing the first positive sample feature set and the first negative sample feature set.

3. The method according to claim 2, wherein the sample partitioning of the input sample images in the sample image set based on the first feature set to obtain a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set comprises:

analyzing the first feature set to generate a similarity matrix corresponding to the first feature set;

according to the similarity matrix and the similarity threshold, sample division is carried out on the input sample images in the sample image set, a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set are obtained, the positive sample set represents samples with similarity values larger than the similarity threshold in the similarity matrix, and the negative sample set represents samples with similarity values not larger than the similarity threshold in the similarity matrix.

4. The method according to one of claims 1 to 3, wherein the training of the second model using the input corresponding sample images in the sample image set as input and the second feature set corresponding to the input sample images in the sample image set as output comprises:

inputting sample images in the sample image set into the second model to obtain a second feature set corresponding to the input sample images in the sample image set;

determining a second positive sample feature set corresponding to the second feature set and a second negative sample feature set corresponding to the second feature set based on the second feature set, and generating a sample feature pair of the second model characterizing the second positive sample feature set and the second negative sample feature set;

calculating a loss function based on the pair of sample features of the first model and the pair of sample features of the second model;

and reversely propagating the loss function based on gradient descent, and updating the parameters of the second model.

5. The method of claim 4, wherein the determining, based on the second feature set, a second positive sample feature set corresponding to the second feature set and a second negative sample feature set corresponding to the second feature set, and generating sample feature pairs of the second model characterizing the second positive sample feature set and the second negative sample feature set, comprises:

based on the second feature set, performing sample division on the input sample images in the sample image set to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set;

selecting the second feature set according to the second positive sample set and the second negative sample set to obtain a second positive sample feature set corresponding to the second positive sample set and a second negative sample feature set corresponding to the second negative sample set;

merging the second positive sample feature set and the second negative sample feature set to generate a sample feature pair of the second model characterizing the second positive sample feature set and the second negative sample feature set.

6. The method according to claim 5, wherein the sample partitioning of the input sample images in the sample image set based on the second feature set to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set comprises:

analyzing the second feature set to generate a similarity matrix corresponding to the second feature set;

and according to the similarity matrix and the similarity threshold corresponding to the second feature set, performing sample division on the input sample images in the sample image set to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set.

7. The method of claim 1, wherein the penalty function objective is to approximate a distance between positive and negative samples of the second model to a distance between positive and negative samples of the first model.

8. The method of one of claims 1 to 7, further comprising:

performing image preprocessing on a sample image in the sample image set, and adding the processed sample image to the sample image set, wherein the image preprocessing includes at least one of: image negation, image compression and contrast stretching.

9. A method for generating information, comprising:

acquiring a target image;

inputting the target image into a pre-trained feature extraction model, and generating a feature set corresponding to the target image, wherein the feature extraction model is obtained by training according to the method of any one of claims 1 to 8.

10. A model training apparatus comprising:

an obtaining unit configured to obtain a sample image set, wherein the sample image set includes labeled sample images;

the input unit is configured to input sample images in the sample image set to a pre-trained first model, and obtain a first feature set corresponding to the input sample images in the sample image set, wherein the first model is a trained teacher model;

a generating unit configured to determine, based on the first feature set, a first positive sample feature set corresponding to the first feature set and a first negative sample feature set corresponding to the first feature set, and generate a sample feature pair of the first model characterizing the first positive sample feature set and the first negative sample feature set;

a training unit configured to take the sample images in the sample image set as input, take a second feature set corresponding to the input sample images in the sample image set as output, train a second model, and obtain a feature extraction model, where the second model is a student model to be trained, a loss function of the second model is constructed based on a sample feature pair of the first model and a sample feature pair of the second model, and the sample feature pair of the second model is used for representing a positive sample feature set in the second feature set and a negative sample feature set in the second feature set.

11. The apparatus of claim 10, wherein the generating unit comprises:

a dividing module configured to perform sample division on the input sample images in the sample image set based on the first feature set to obtain a first positive sample set corresponding to the sample images in the sample image set and a first negative sample set corresponding to the sample images in the sample image set;

a selecting module configured to select the first feature set according to the first positive sample set and the first negative sample set to obtain a first positive sample feature set corresponding to the first positive sample set and a first negative sample feature set corresponding to the first negative sample set;

a merging module configured to merge the first positive sample feature set and the first negative sample feature set to generate a sample feature pair of the first model characterizing the first positive sample feature set and the first negative sample feature set.

12. The apparatus of claim 11, wherein the partitioning module is further configured to:

13. The apparatus according to one of claims 10-12, wherein the training unit comprises:

an input module configured to input sample images in the sample image set to the second model, resulting in a second feature set corresponding to the input sample images in the sample image set;

a generating module configured to determine, based on the second feature set, a second positive sample feature set corresponding to the second feature set and a second negative sample feature set corresponding to the second feature set, and generate a sample feature pair of the second model characterizing the second positive sample feature set and the second negative sample feature set;

a calculation module configured to calculate a loss function based on the pair of sample features of the first model and the pair of sample features of the second model;

an update module configured to back-propagate the loss function based on gradient descent, updating parameters of the second model.

14. The apparatus of claim 13, wherein the generating means comprises:

a dividing submodule configured to perform sample division on the input sample images in the sample image set based on the second feature set, so as to obtain a second positive sample set corresponding to the sample images in the sample image set and a second negative sample set corresponding to the sample images in the sample image set;

the selecting submodule is configured to select the second feature set according to the second positive sample set and the second negative sample set to obtain a second positive sample feature set corresponding to the second positive sample set and a second negative sample feature set corresponding to the second negative sample set;

a merging submodule configured to merge the second positive sample feature set and the second negative sample feature set, generating a sample feature pair of the second model characterizing the second positive sample feature set and the second negative sample feature set.

15. The apparatus of claim 14, wherein the partitioning sub-module is further configured to:

16. The apparatus of claim 10, wherein the penalty function objective is to approximate a distance between positive and negative samples of the second model to a distance between positive and negative samples of the first model.

17. The apparatus of one of claims 10-16, further comprising:

a preprocessing unit configured to perform image preprocessing on a sample image in the sample image set and add the processed sample image to the sample image set, wherein the image preprocessing includes at least one of: image negation, image compression and contrast stretching.

18. An apparatus for generating information, comprising:

an image acquisition unit configured to acquire a target image;

an information generating unit configured to input the target image to a pre-trained feature extraction model, and generate a feature set corresponding to the target image, wherein the feature extraction model is trained by the method according to one of claims 1 to 8.

19. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.

20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-9.

21. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-9.