CN114372580A - Model training method, storage medium, electronic device, and computer program product - Google Patents


Info

Publication number
CN114372580A
Authority
CN
China
Prior art keywords
sample
feature
model
sample feature
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111671078.2A
Other languages
Chinese (zh)
Inventor
郑凯 (Zheng Kai)
王远江 (Wang Yuanjiang)
袁野 (Yuan Ye)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Kuangshi Jinzhi Technology Co ltd
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Shenzhen Kuangshi Jinzhi Technology Co ltd
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Kuangshi Jinzhi Technology Co ltd, Beijing Kuangshi Technology Co Ltd, and Beijing Megvii Technology Co Ltd
Priority to CN202111671078.2A priority Critical patent/CN114372580A/en
Publication of CN114372580A publication Critical patent/CN114372580A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/23: Clustering techniques


Abstract

An embodiment of the present application provides a model training method, a storage medium, an electronic device, and a computer program product. The model training method includes: inputting a first sample image into a first student model for feature extraction to obtain a first query sample feature; inputting a second sample image into a first homogeneous teacher model for feature extraction to obtain a first candidate sample feature; inputting the first sample image and the second sample image into a heterogeneous teacher model for feature extraction to obtain image features, where the image features include a second query sample feature; determining a first sample feature relationship between the first query sample feature and the first candidate sample feature; determining second sample feature relationships between the second query sample feature and each of a plurality of second candidate sample features recorded by the heterogeneous teacher model; and training the first student model according to the first sample feature relationship and the second sample feature relationships. With this technical solution, the performance of the first student model can be improved.

Description

Model training method, storage medium, electronic device, and computer program product
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a model training method, a storage medium, an electronic device, and a computer program product.
Background
Knowledge distillation is a common model compression method: within a student-teacher framework, the feature-representation "knowledge" learned by a complex teacher model with strong learning ability is distilled and transferred to a student network with fewer parameters and weaker learning ability.
At present, existing training methods for student models usually rely on historical knowledge provided by a homogeneous teacher model. However, because the knowledge a homogeneous teacher model can provide is limited, the performance of the trained student model is poor.
Disclosure of Invention
An object of the embodiments of the present application is to provide a model training method, a storage medium, an electronic device, and a computer program product, so as to improve performance of a first student model.
In a first aspect, an embodiment of the present application provides a model training method, including: acquiring a first sample image and a second sample image, where the first sample image and the second sample image are two images obtained by applying different data enhancement modes to the same image; inputting the first sample image into a first student model for feature extraction to obtain a first query sample feature output by the first student model; inputting the second sample image into a first homogeneous teacher model for feature extraction to obtain a first candidate sample feature output by the first homogeneous teacher model; inputting the first sample image and the second sample image into a heterogeneous teacher model for feature extraction to obtain image features output by the heterogeneous teacher model, where the image features include a second query sample feature; determining a first sample feature relationship between the first query sample feature and the first candidate sample feature; determining second sample feature relationships between the second query sample feature and each of a plurality of second candidate sample features recorded by the heterogeneous teacher model; and training the first student model according to the first sample feature relationship and the second sample feature relationships.
Therefore, on the basis of the existing first homogeneous teacher model, a heterogeneous teacher model is introduced to compensate for the knowledge constraints of the homogeneous teacher model. By determining the first sample feature relationship and the second sample feature relationships and training the first student model according to both, the relationships among training sample features are also taken into account during training, so the performance of the first student model can be improved.
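The first-aspect flow can be sketched end to end with toy linear models. The weights, feature sizes, and augmentations below are illustrative stand-ins under assumed dimensions, not the networks the application actually contemplates:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract(weights, image):
    """Toy feature extractor: a linear map followed by L2 normalization."""
    feat = weights @ image.ravel()
    return feat / np.linalg.norm(feat)

# Stand-ins for the first student model, the first homogeneous teacher
# model, and the heterogeneous teacher model (all assumed shapes).
student_w = rng.normal(size=(8, 16))
homo_teacher_w = rng.normal(size=(8, 16))
hetero_teacher_w = rng.normal(size=(8, 16))

# Two differently augmented views of the same image.
image = rng.normal(size=(4, 4))
first_sample = image + 0.1 * rng.normal(size=image.shape)  # first sample image
second_sample = 0.9 * image                                # second sample image

q1 = extract(student_w, first_sample)         # first query sample feature
k1 = extract(homo_teacher_w, second_sample)   # first candidate sample feature
q2 = extract(hetero_teacher_w, first_sample)  # second query sample feature

# First sample feature relationship, here measured as cosine similarity.
first_rel = float(q1 @ k1)
```

A real implementation would replace the linear maps with convolutional networks and derive a training loss from `first_rel` together with the second sample feature relationships.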
In one possible embodiment, the heterogeneous teacher model includes a second student model and a second homogeneous teacher model, and the image features further include a second candidate sample feature. Inputting the first sample image and the second sample image into the heterogeneous teacher model for feature extraction to obtain the image features output by the heterogeneous teacher model includes: inputting the first sample image into the second student model for feature extraction to obtain the second query sample feature output by the second student model; and inputting the second sample image into the second homogeneous teacher model for feature extraction to obtain the second candidate sample feature output by the second homogeneous teacher model. The plurality of second candidate sample features recorded by the heterogeneous teacher model include the second candidate sample feature currently output by the second homogeneous teacher model and at least one second candidate sample feature output previously.
In one possible embodiment, determining the second sample feature relationships between the second query sample feature and the plurality of second candidate sample features recorded by the heterogeneous teacher model includes: determining, for each second candidate sample feature, whether the second query sample feature and that second candidate sample feature belong to the same category; if they belong to the same category, determining that the second sample feature relationship between them is a positive sample relationship; and if they belong to different categories, determining that the second sample feature relationship between them is a negative sample relationship.
In one possible embodiment, determining whether the second query sample feature and each second candidate sample feature belong to the same category includes: acquiring a plurality of cluster centers; performing similarity calculation between the second query sample feature and each of the plurality of cluster centers to obtain a first similarity score between the second query sample feature and each cluster center; taking the cluster center with the highest score among the plurality of first similarity scores as the first target cluster center corresponding to the second query sample feature; performing similarity calculation between each second candidate sample feature and each cluster center to obtain a second similarity score between each second candidate sample feature and each cluster center; taking, for each second candidate sample feature, the cluster center with the highest score among its second similarity scores as the second target cluster center corresponding to that second candidate sample feature; if the first target cluster center and the second target cluster center are the same cluster center, determining that the second query sample feature and that second candidate sample feature belong to the same category; and if they are different cluster centers, determining that the second query sample feature and that second candidate sample feature belong to different categories.
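The cluster-center comparison above reduces to two argmax lookups. A minimal sketch, assuming 2-D features, cosine similarity, and two cluster centers (all illustrative values):

```python
import numpy as np

def nearest_center(feature, centers):
    """Index of the cluster center with the highest cosine similarity."""
    feature = feature / np.linalg.norm(feature)
    centers = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return int(np.argmax(centers @ feature))

def same_category(query, candidate, centers):
    """Same category iff both features map to the same target cluster center."""
    return nearest_center(query, centers) == nearest_center(candidate, centers)

centers = np.array([[1.0, 0.0], [0.0, 1.0]])   # assumed cluster centers
query = np.array([0.9, 0.1])                   # second query sample feature
cand_pos = np.array([0.8, 0.2])  # nearest to the same center -> positive
cand_neg = np.array([0.1, 0.9])  # nearest to the other center -> negative

print(same_category(query, cand_pos, centers))  # True
print(same_category(query, cand_neg, centers))  # False
```

In the embodiment, the `True`/`False` outcome corresponds to labeling the pair with a positive or negative sample relationship.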
In one possible embodiment, training the first student model according to the first sample feature relationship and the second sample feature relationships includes: generating a first supervision sub-signal using the first sample feature relationship, where the first supervision sub-signal includes a first sample feature relationship identification representing the sample feature relationship between the first query sample feature and the first candidate sample feature; generating a second supervision sub-signal using the second sample feature relationships, where the second supervision sub-signal includes second sample feature relationship identifications representing the sample feature relationships between the second query sample feature and the recorded second candidate sample features; generating a supervision signal from the first supervision sub-signal and the second supervision sub-signal; and training the first student model using the supervision signal.
In one possible embodiment, generating the supervision signal from the first supervision sub-signal and the second supervision sub-signal includes: splicing (concatenating) the first supervision sub-signal and the second supervision sub-signal to obtain the supervision signal.
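Under the assumption that each relationship identification is encoded as 1 for a positive sample relationship and 0 for a negative one (an encoding chosen here for illustration), the splicing step is a plain concatenation:

```python
import numpy as np

# Hypothetical relationship identifications: 1 = positive sample
# relationship, 0 = negative sample relationship.
first_sub_signal = np.array([1])         # q1 vs. the single first candidate
second_sub_signal = np.array([1, 0, 0])  # q2 vs. three recorded second candidates

# Splicing the two sub-signals yields the supervision signal.
supervision = np.concatenate([first_sub_signal, second_sub_signal])
print(supervision)  # [1 1 0 0]
```

The concatenated vector can then serve as the target in a contrastive-style loss over all candidate features at once.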
In a second aspect, an embodiment of the present application provides a model training apparatus, including: an acquisition module configured to acquire a first sample image and a second sample image, where the two images are obtained by applying different data enhancement modes to the same image; a first input module configured to input the first sample image into a first student model for feature extraction to obtain a first query sample feature output by the first student model; a second input module configured to input the second sample image into a first homogeneous teacher model for feature extraction to obtain a first candidate sample feature output by the first homogeneous teacher model; a third input module configured to input the first sample image and the second sample image into a heterogeneous teacher model for feature extraction to obtain image features output by the heterogeneous teacher model, where the image features include a second query sample feature; a first determination module configured to determine a first sample feature relationship between the first query sample feature and the first candidate sample feature; a second determination module configured to determine second sample feature relationships between the second query sample feature and a plurality of second candidate sample features recorded by the heterogeneous teacher model; and a training module configured to train the first student model according to the first sample feature relationship and the second sample feature relationships.
In a possible embodiment, the determination module is specifically configured to: perform similarity calculation between the query sample feature and the candidate sample feature, as well as each historical candidate sample feature, to obtain a plurality of similarity scores; and determine, according to the plurality of similarity scores, the sample feature relationship between the query sample feature and the candidate sample feature and the sample feature relationship between the query sample feature and each historical candidate sample feature.
In a possible embodiment, the determination module is specifically configured to: acquire a cluster center pool containing a plurality of cluster centers; add the candidate sample feature to a memory queue to obtain a new memory queue; and perform similarity calculation between the query sample feature, as well as each candidate sample feature in the new memory queue, and each cluster center in the cluster center pool to obtain a plurality of similarity scores.
In one possible embodiment, the sample feature relationships include positive sample relationships and negative sample relationships;
a determination module specifically configured to: take the cluster center with the highest similarity score among the plurality of similarity scores corresponding to the query sample feature as the first target cluster center corresponding to the query sample feature; take the cluster center with the highest similarity score among the plurality of similarity scores corresponding to a first candidate sample feature as the second target cluster center corresponding to that candidate sample feature, where the first candidate sample feature is any one of the candidate sample features in the new memory queue; if the first target cluster center and the second target cluster center are the same cluster center, determine that the query sample feature and the first candidate sample feature have a positive sample relationship; and if they are different cluster centers, determine that the query sample feature and the first candidate sample feature have a negative sample relationship.
In a possible embodiment, the training module is specifically configured to: generate a supervision signal using the sample feature relationships, where the supervision signal includes a plurality of relationship identifications in one-to-one correspondence with the candidate sample features in the new memory queue, and each relationship identification represents the sample feature relationship between the corresponding candidate sample feature in the new memory queue and the query sample feature; and update and train the student model using the supervision signal.
In one possible embodiment, the training module is further configured to perform update training on the heterogeneous teacher model by using the supervision signal.
In a third aspect, an embodiment of the present application provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the computer program performs the method according to the first aspect or any optional implementation manner of the first aspect.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the method of the first aspect or any of the alternative implementations of the first aspect.
In a fifth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the method of the first aspect or any possible implementation manner of the first aspect.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
FIG. 1 illustrates a flow chart of a model training method provided by an embodiment of the present application;
FIG. 2 is a block diagram illustrating a model training apparatus according to an embodiment of the present disclosure;
fig. 3 shows a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. AI is a comprehensive discipline involving a wide range of technical fields such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision, an important branch of artificial intelligence, studies how machines can perceive and understand the world; computer vision technologies generally include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning.
With the research and progress of artificial intelligence technology, the technology is applied to various fields, such as security, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, testimony verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
At present, in the process of training a student model, all training sample features (for example, candidate sample features and query sample features) are treated as independent categories, and the constructed training task distinguishes all training sample features indiscriminately. The relationships and feature similarities between training sample features are therefore largely ignored, which hinders further improvement of the student model's performance.
Referring to fig. 1, fig. 1 shows a flowchart of a model training method provided in an embodiment of the present application. It is to be understood that the model training method may be performed by a model training apparatus, and the model training apparatus may be the model training apparatus shown in fig. 2. Meanwhile, the model training device can be set according to actual requirements, for example, the model training device can be a computer or the like. Specifically, the model training method includes S110-S170.
Step S110, acquiring a first sample image and a second sample image; wherein the first sample image and the second sample image are two images obtained by applying different data enhancement modes to the same image.
It should be understood that the specific enhancement mode corresponding to the data enhancement mode may be set according to actual requirements, and the embodiment of the present application is not limited thereto.
For example, the data enhancement mode may be color change, cropping, or the like.
Correspondingly, the color change applied in the first data enhancement mode may differ from that applied in the second data enhancement mode, or the cropping region of the first data enhancement mode may differ from that of the second data enhancement mode.
It should be understood that the data enhancement modes may also include, for example, rotating the image, changing the color of the image, distorting the image, adding noise to the image, and the like, which is not limited in the embodiments of the present application.
It should also be understood that the specific manner in which the model training apparatus obtains the first sample image and the second sample image may also be set according to actual requirements.
For example, the model training device may perform data enhancement on the preset image in a first data enhancement manner to obtain a first sample image, and the model training device may further perform data enhancement on the preset image in a second data enhancement manner to obtain a second sample image. The first data enhancement mode and the second data enhancement mode are different data enhancement modes.
For another example, the model training device may receive the first sample image and the second sample image transmitted by the other device.
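The first example can be sketched with two assumed enhancement modes, a brightness shift and a fixed-position crop, applied to the same preset image (both functions are illustrative, not the modes the application prescribes):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_color(image, shift):
    """First data enhancement mode (assumed): a simple brightness shift."""
    return np.clip(image + shift, 0.0, 1.0)

def augment_crop(image, top, left, size):
    """Second data enhancement mode (assumed): a fixed-position crop."""
    return image[top:top + size, left:left + size]

preset = rng.uniform(size=(8, 8))              # the preset image
first_sample = augment_color(preset, 0.2)      # first sample image
second_sample = augment_crop(preset, 1, 1, 6)  # second sample image

print(first_sample.shape, second_sample.shape)  # (8, 8) (6, 6)
```

Because both views come from the same preset image, their extracted features should be labeled as a positive pair downstream.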
Step S120, inputting the first sample image into a first student model for feature extraction, and obtaining first query sample features output by the first student model.
It should be understood that the specific model of the first student model and the parameters and structures related to the model thereof may be set according to actual requirements, and the embodiment of the present application is not limited thereto.
Step S130, inputting the second sample image into the first homogeneous teacher model for feature extraction, and obtaining the first candidate sample feature output by the first homogeneous teacher model.
It should be understood that the specific model of the first homogeneous teacher model, and its parameters and structure, may be set according to actual requirements, and the embodiments of the present application are not limited thereto.
For example, the model structure of the first homogeneous teacher model and the model structure of the first student model may be the same, but the model parameters of the first homogeneous teacher model and the model parameters of the first student model may be different.
It should be noted here that, in addition to the first candidate sample feature output this time, the first homogeneous teacher model may also record the first candidate sample feature output at each previous time.
For example, during the Nth training of the first student model, the first homogeneous teacher model outputs the current first candidate sample feature, and it also records the first candidate sample feature output during each of the 1st through Nth trainings. That is, the plurality of first candidate sample features recorded by the first homogeneous teacher model include the first candidate sample feature output at the Nth training and the first candidate sample features output in the previous N-1 trainings, where N is a positive integer greater than or equal to 2.
It should be understood that the specific manner in which the first homogeneous teacher model records the first candidate sample features may be set according to actual requirements.
For example, a first memory queue for recording first candidate sample features may be maintained in the first homogeneous teacher model; when the first homogeneous teacher model outputs a new first candidate sample feature, the model training apparatus may add the new first candidate sample feature to the first memory queue to update the first memory queue.
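The first memory queue behaves like a fixed-capacity FIFO, in the style of momentum-contrast feature queues. A sketch with an assumed capacity of 4 (the capacity and the stand-in features are illustrative):

```python
from collections import deque

import numpy as np

# Fixed-capacity first memory queue: when a new first candidate sample
# feature is enqueued beyond capacity, the oldest feature is evicted.
memory_queue = deque(maxlen=4)

for step in range(6):
    new_candidate = np.full(3, float(step))  # stand-in candidate feature
    memory_queue.append(new_candidate)       # updates the memory queue

print(len(memory_queue))             # 4
print([f[0] for f in memory_queue])  # [2.0, 3.0, 4.0, 5.0]
```

After six updates the queue holds only the four most recent candidate features, which is exactly the "recorded plurality of candidate sample features" the relationship computation iterates over.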
Step S140, inputting the first sample image and the second sample image into the heterogeneous teacher model for feature extraction, and obtaining image features output by the heterogeneous teacher model.
It should be understood that the specific model of the heterogeneous teacher model and the parameters and structures related to the model may be set according to actual requirements, and the embodiments of the present application are not limited thereto.
For example, the heterogeneous teacher model may include a second student model and a second homogeneous teacher model. The second student model may be a more complex model than the first student model and able to extract more detailed features, i.e., the second query sample feature carries more detail than the first query sample feature; likewise, the second homogeneous teacher model may be a more complex model than the first homogeneous teacher model and able to extract more detailed features, i.e., the second candidate sample feature carries more detail than the first candidate sample feature.
It should be noted here that a more complex structure may mean that the second homogeneous teacher model has more layers than the first homogeneous teacher model (or that the second student model has more layers than the first student model), or that the logical relationships between the layers of the second homogeneous teacher model are more complex than those of the first homogeneous teacher model (or, likewise, that the logical relationships between the layers of the second student model are more complex than those of the first student model).
For example, the model structure of the second homogeneous teacher model and the model structure of the second student model may be the same, but their model parameters may be different.
In order to facilitate understanding of step S140, the following description is made by way of specific embodiments.
Specifically, when the heterogeneous teacher model includes a second student model and a second homogeneous teacher model, the first sample image may be input into the second student model for feature extraction to obtain the second query sample feature output by the second student model, and the second sample image may be input into the second homogeneous teacher model for feature extraction to obtain the second candidate sample feature output by the second homogeneous teacher model.
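This routing of the two views to the two sub-models can be sketched as follows, again with toy linear stand-ins and assumed sizes rather than the actual sub-networks:

```python
import numpy as np

rng = np.random.default_rng(1)

def extract(weights, image):
    """Toy feature extractor: linear map plus L2 normalization."""
    feat = weights @ image.ravel()
    return feat / np.linalg.norm(feat)

# The heterogeneous teacher bundles two sub-models (assumed shapes).
second_student_w = rng.normal(size=(8, 16))
second_homo_teacher_w = rng.normal(size=(8, 16))

def heterogeneous_teacher(first_view, second_view):
    """Route each view to its matching sub-model, per the embodiment."""
    q2 = extract(second_student_w, first_view)        # second query sample feature
    k2 = extract(second_homo_teacher_w, second_view)  # second candidate sample feature
    return q2, k2

view1 = rng.normal(size=(4, 4))
view2 = rng.normal(size=(4, 4))
q2, k2 = heterogeneous_teacher(view1, view2)
print(q2.shape, k2.shape)  # (8,) (8,)
```

The returned `k2` would then be enqueued into the second memory queue described below.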
The second homogeneous teacher model may also record a plurality of second candidate sample features, in a manner similar to that in which the first homogeneous teacher model records the plurality of first candidate sample features; refer to the description of step S130, which is not repeated here.
For example, where the plurality of second candidate sample features recorded by the second homogeneous teacher model include the second candidate sample feature currently output and at least one second candidate sample feature output previously, the second homogeneous teacher model may also maintain a second memory queue; when the second homogeneous teacher model outputs a new second candidate sample feature, the model training apparatus may add the new second candidate sample feature to the second memory queue to update the second memory queue.
Step S150, a first sample feature relationship between the first query sample feature and the first candidate sample feature is determined.
It should be understood that the first sample feature relationship may be used to represent a first association relationship between the first query sample feature and the first candidate sample feature, and the first association relationship is used to represent whether an object corresponding to the first query sample feature and an object corresponding to the first candidate sample feature are the same object.
For example, in the case that the first sample image and the second sample image are obtained by applying different data enhancement methods to images of the same preset object (e.g., a lion), both the first query sample feature and the first candidate sample feature are features of that same preset object; that is, the object corresponding to the first query sample feature and the object corresponding to the first candidate sample feature are the same object. The specific type of the preset object may be set according to actual requirements, and the embodiment of the application is not limited thereto.
It should also be understood that the first sample feature relationship may include a positive sample relationship and a negative sample relationship. The positive sample relationship indicates that the object corresponding to the first query sample feature and the object corresponding to the first candidate sample feature are the same object; the negative sample relationship indicates that the object corresponding to the first query sample feature and the object corresponding to the first candidate sample feature are different objects.
It should also be understood that the specific manner of determining the first sample feature relationship between the first query sample feature and the first candidate sample feature may be set according to actual requirements, and the embodiments of the present application are not limited thereto.
For example, the process of determining the first sample characteristic relationship and the process of subsequently determining the second sample characteristic relationship may be similar, and specific reference may be made to the related description of subsequently determining the second sample characteristic relationship.
Step S160, respectively determining a second sample feature relationship between the second query sample feature and the plurality of second candidate sample features recorded by the heterogeneous teacher model.
It should also be understood that the second sample feature relationship is similar to the first sample feature relationship; reference may be made to the related description of the first sample feature relationship in step S150, which is not repeated here.
For example, the second sample feature relationship is used to represent a second association between the second query sample feature and each of the plurality of second candidate sample features, and the second association may be used to represent whether the object to which the second query sample feature corresponds and the object to which each of the second candidate sample features corresponds are the same object.
It should be noted here that, since the images input into the heterogeneous teacher model at different times may depict different objects, the model training apparatus needs to determine the second sample feature relationship between the second query sample feature currently output by the second student model and each of the second candidate sample features recorded by the second isomorphic teacher model.
It should also be understood that the specific process of determining the second sample feature relationship between the second query sample feature and the plurality of second candidate sample features recorded by the heterogeneous teacher model may be set according to actual needs, and the embodiments of the present application are not limited thereto.
Optionally, in step S160, it is determined whether the second query sample feature and each second candidate sample feature belong to the same category; if the two belong to the same category, the second sample feature relationship between the second query sample feature and that second candidate sample feature is determined to be a positive sample relationship; and if the two belong to different categories, the second sample feature relationship between the second query sample feature and that second candidate sample feature is determined to be a negative sample relationship.
It should also be understood that the specific manner of determining whether the second query sample feature and each second candidate sample feature belong to the same category may be set according to actual requirements, and the embodiment of the present application is not limited thereto. Optionally, feature similarity between the second query sample feature and the current second candidate sample feature may be calculated, and if the feature similarity is greater than or equal to a preset similarity, it is determined that the second query sample feature and the current second candidate sample feature belong to the same category; and if the feature similarity is smaller than the preset similarity, determining that the second query sample feature and the current second candidate sample feature belong to different categories. Wherein the current second candidate sample feature is any one of the plurality of second candidate sample features.
It should be understood that the specific value of the preset similarity may be set and stored according to actual requirements, and the embodiment of the present application is not limited thereto.
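As a minimal sketch of this feature-similarity check: cosine similarity is assumed as the similarity measure and 0.5 as the preset similarity, though the application fixes neither choice; the function name is likewise illustrative.

```python
import numpy as np


def same_category_by_similarity(query_feature, candidate_feature,
                                preset_similarity=0.5):
    """Hypothetical sketch: decide whether the second query sample
    feature and the current second candidate sample feature belong to
    the same category by thresholding their cosine similarity."""
    q = np.asarray(query_feature, dtype=float)
    c = np.asarray(candidate_feature, dtype=float)
    similarity = float(q @ c / (np.linalg.norm(q) * np.linalg.norm(c)))
    # Similarity at or above the preset threshold -> same category
    # (positive sample relationship); below it -> different categories.
    return similarity >= preset_similarity
```

Identical features yield a similarity of 1 and are treated as the same category; orthogonal features yield 0 and are treated as different categories.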
Optionally, determining whether the second query sample feature and each second candidate sample feature belong to the same category includes: acquiring a plurality of clustering centers; similarity calculation is carried out on the second query sample characteristic and each clustering center of the plurality of clustering centers, and a first similarity score between the second query sample characteristic and each clustering center is obtained; taking the cluster center with the highest similarity score in the plurality of first similarity scores as a first target cluster center corresponding to the second query sample characteristic; carrying out similarity calculation on the current second candidate sample characteristic and each clustering center to obtain a second similarity score between the current second candidate sample characteristic and each clustering center; taking the cluster center with the highest similarity score in the plurality of second similarity scores as a second target cluster center corresponding to the current second candidate sample characteristic; if the first target clustering center and the second target clustering center are the same clustering center, determining that the second query sample feature and the current second candidate sample feature belong to the same category; and if the first target clustering center and the second target clustering center are different clustering centers, determining that the second query sample feature and the current second candidate sample feature belong to different categories. Wherein the current second candidate sample feature is any one of the plurality of second candidate sample features.
It should also be understood that the plurality of clustering centers may be the cluster centers obtained by performing feature clustering on all of the second candidate sample features.
Step S170, training the first student model according to the first sample feature relationship and the second sample feature relationship.
It should be understood that the specific process of training the first student model according to the first sample feature relationship and the second sample feature relationship may be set according to actual needs, and the embodiment of the present application is not limited thereto.
Optionally, training the first student model according to the first sample feature relationship and the second sample feature relationship includes: generating a first supervision sub-signal by using the first sample feature relationship; generating a second supervision sub-signal by using the second sample feature relationship; generating a supervision signal according to the first supervision sub-signal and the second supervision sub-signal; and training the first student model using the supervision signal. The first supervision sub-signal includes a first sample feature relationship identification, which is used to represent the sample feature relationship between the first query sample feature and the first candidate sample feature; the second supervision sub-signal includes a second sample feature relationship identification, which is used to represent the sample feature relationship between the second query sample feature and the historical candidate sample features.
It should also be understood that the specific form of the first sample feature relationship identification can be set according to actual requirements.
For example, positive sample relationships are represented by plus signs and negative sample relationships are represented by minus signs.
Correspondingly, the specific form of the second sample feature relationship identification is similar to that of the first sample feature relationship identification, and reference may be made to the description of the first sample feature relationship identification.
It should also be understood that the specific manner of generating the supervision signal from the first supervision sub-signal and the second supervision sub-signal may be set according to actual requirements, and the embodiments of the present application are not limited thereto.
For example, the first supervisory sub-signal and the second supervisory sub-signal may be spliced to obtain the supervisory signal.
As another example, the first supervisory sub-signal and the second supervisory sub-signal may be weighted to obtain the supervisory signal.
It should also be understood that the specific process of training the first student model using the supervision signal may refer to training the first student model by a gradient descent method, which pulls the two features in a positive sample relationship closer together and pushes the two features in a negative sample relationship farther apart.
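As a hedged illustration of such a supervision signal: the application does not specify the loss function, so the InfoNCE-style contrastive form below, the temperature value, and the use of '+'/'-' identifications are all assumptions layered on the plus/minus identification example given earlier.

```python
import numpy as np


def contrastive_loss(query_feature, candidate_features, relation_ids,
                     temperature=0.07):
    """Hypothetical sketch: a contrastive loss driven by sample feature
    relationship identifications ('+' for a positive sample relationship,
    '-' for a negative one)."""
    q = np.asarray(query_feature, dtype=float)
    q = q / np.linalg.norm(q)
    cands = np.asarray(candidate_features, dtype=float)
    cands = cands / np.linalg.norm(cands, axis=1, keepdims=True)
    # Scaled similarity logits, then a softmax over all candidates.
    logits = cands @ q / temperature
    probs = np.exp(logits - logits.max())
    probs = probs / probs.sum()
    positive = np.array([r == '+' for r in relation_ids])
    # Minimizing this loss (e.g. by gradient descent) concentrates
    # probability mass on positive candidates, pulling them closer to
    # the query feature while pushing negative candidates away.
    return float(-np.log(probs[positive].sum()))
```

When the positive candidate already matches the query, the loss is near zero; mislabeling the distant candidate as positive produces a much larger loss, which is the gradient signal that reshapes the feature space.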
In addition, the second student model can be trained by using the supervision signal (or the second supervision sub-signal), so that the second query sample features output by the subsequent second student model contain more details, and the training effect of the first student model is further improved.
It should be understood that the specific process of training the second student model using the supervision signal (or the second supervision sub-signal) may refer to the related description of training the first student model using the supervision signal, which is not repeated here.
In addition, after the update training of the first student model is completed, the parameters of the first student model can be used to update the parameters of the first isomorphic teacher model.
For example, after the current update training of the first student model is completed, a new parameter may be determined using the parameter B and the parameter A, and the new parameter may be used as a parameter of the first isomorphic teacher model; that is, the new parameter of the first isomorphic teacher model may be a × B + (1 − a) × A. Here, a is a preset weight, and its specific value may be set according to actual requirements, which is not limited in this embodiment of the application.
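This is a momentum-style (exponential moving average) update. A minimal sketch follows, under the assumption — common in such schemes but not stated explicitly in the text — that B denotes a parameter of the first student model after the current update training, A denotes the corresponding current parameter of the first isomorphic teacher model, and the function name and the example weight are illustrative.

```python
def update_teacher_parameters(student_params, teacher_params, a=0.01):
    """Hypothetical sketch of the update new = a * B + (1 - a) * A,
    computed elementwise over corresponding parameters, where B is the
    student's parameter, A is the teacher's current parameter, and a is
    the preset weight (its default value here is an assumption)."""
    return [a * b + (1.0 - a) * t
            for b, t in zip(student_params, teacher_params)]
```

With a small weight a, the teacher parameters change slowly, tracking a smoothed average of successive student parameters; the same update would apply to the second isomorphic teacher model and second student model.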
Correspondingly, the parameters of the second isomorphic teacher model can also be updated by using the parameters of the second student model, and the specific process is similar to the process of updating the parameters of the first isomorphic teacher model by using the parameters of the first student model, which can be specifically referred to the above related description.
Therefore, on the basis of the existing first isomorphic teacher model, the heterogeneous teacher model is introduced to compensate for the knowledge limitations of the isomorphic teacher model. By determining the first sample feature relationship and the second sample feature relationship and training the first student model according to both, the relationships among training sample features are also taken into account during training, so that the performance of the first student model can be improved.
It should be understood that the above model training method is only exemplary, and those skilled in the art can make various modifications according to the above method, and the solution after the modification also falls within the scope of the present application.
Referring to fig. 2, fig. 2 is a block diagram illustrating a model training apparatus 200 according to an embodiment of the present disclosure. It should be understood that the model training apparatus 200 is capable of performing the steps of the above method embodiments, the specific functions of the model training apparatus 200 can be referred to the above description, and the detailed description is omitted here to avoid redundancy. The model training apparatus 200 includes at least one software function module that can be stored in a memory in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the model training apparatus 200. Specifically, the model training apparatus 200 includes:
an obtaining module 210, configured to obtain a first sample image and a second sample image; the first sample image and the second sample image are two images obtained by adopting different data enhancement modes for the same image;
the first input module 220 is configured to input the first sample image into the first student model for feature extraction, so as to obtain a first query sample feature output by the first student model;
the second input module 230 is configured to input the second sample image into the first isomorphic teacher model for feature extraction, so as to obtain a first candidate sample feature output by the first isomorphic teacher model;
a third input module 240, configured to input the first sample image and the second sample image into the heterogeneous teacher model for feature extraction, so as to obtain image features output by the heterogeneous teacher model; wherein the image features comprise second query sample features;
a first determining module 250 for determining a first sample feature relationship between the first query sample feature and the first candidate sample feature;
a second determining module 260, configured to determine a second sample feature relationship between the second query sample feature and a plurality of second candidate sample features recorded by the heterogeneous teacher model;
and a training module 270 configured to train the first student model according to the first sample feature relationship and the second sample feature relationship.
In one possible embodiment, the heterogeneous teacher model includes a second student model and a second homogeneous teacher model, the image features further include a second candidate sample feature;
the third input module 240 is specifically configured to: inputting the first sample image into a second student model for feature extraction to obtain second query sample features output by the second student model; inputting the second sample image into a second isomorphic teacher model for feature extraction to obtain a second candidate sample feature output by the second isomorphic teacher model; wherein the plurality of second candidate sample features recorded by the heterogeneous teacher model include a second candidate sample feature currently output by the second homogeneous teacher model and at least one second candidate sample feature output at least once before.
In a possible embodiment, the second determining module 260 is specifically configured to: respectively determine whether the second query sample feature and each second candidate sample feature belong to the same category; if the two belong to the same category, determine that the second sample feature relationship between the second query sample feature and that second candidate sample feature is a positive sample relationship; and if the two belong to different categories, determine that the second sample feature relationship between the second query sample feature and that second candidate sample feature is a negative sample relationship.
In a possible embodiment, the second determining module 260 is specifically configured to: acquiring a plurality of clustering centers; similarity calculation is carried out on the second query sample characteristic and each clustering center of the plurality of clustering centers, and a first similarity score between the second query sample characteristic and each clustering center is obtained; taking the cluster center with the highest similarity score in the plurality of first similarity scores as a first target cluster center corresponding to the second query sample characteristic; carrying out similarity calculation on the plurality of second candidate sample characteristics and each clustering center to obtain a second similarity score between each second candidate sample characteristic and each clustering center; taking the cluster center with the highest similarity score in the plurality of second similarity scores as a second target cluster center corresponding to each second candidate sample feature; if the first target clustering center and the second target clustering center are the same clustering center, determining that the second query sample features and each second candidate sample feature belong to the same category; and if the first target clustering center and the second target clustering center are different clustering centers, determining that the second query sample feature and each second candidate sample feature belong to different categories.
In a possible embodiment, the training module 270 is specifically configured to: generating a first supervision sub-signal by utilizing the first sample characteristic relation; wherein the first supervisory sub-signal comprises a first sample feature relationship identification, and the first sample feature relationship identification is used to represent a sample feature relationship of the first query sample feature and the first candidate sample feature; generating a second supervision sub-signal by utilizing the second sample characteristic relation; wherein the second supervisory sub-signal comprises a second sample feature relationship identification, and the second sample feature relationship identification is used to represent a sample feature relationship of the second query sample feature and the historical candidate sample features; generating a supervision signal according to the first supervision sub-signal and the second supervision sub-signal; the first student model is trained using the supervisory signals.
In a possible embodiment, the training module 270 is specifically configured to splice the first supervisory sub-signal and the second supervisory sub-signal to obtain the supervisory signal.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
Referring to fig. 3, fig. 3 is a block diagram illustrating an electronic device 300 according to an embodiment of the present disclosure. As shown in fig. 3, the electronic device 300 may include a processor 310, a communication interface 320, a memory 330, and at least one communication bus 340, where the communication bus 340 is used to realize direct connection communication among these components. The communication interface 320 of the device in the embodiment of the present application is used for signaling or data communication with other node devices. The processor 310 may be an integrated circuit chip having signal processing capabilities. The processor 310 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and can implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 310 may be any conventional processor, or the like.
The memory 330 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 330 stores computer-readable instructions that, when executed by the processor 310, enable the electronic device 300 to perform the steps of the above method embodiments.
The electronic device 300 may further include a memory controller, an input-output unit, an audio unit, and a display unit.
The memory 330, the memory controller, the processor 310, the peripheral interface, the input/output unit, the audio unit, and the display unit are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, these elements may be electrically connected to each other via one or more communication buses 340. The processor 310 is used to execute executable modules stored in the memory 330, such as software functional modules or computer programs included in the electronic device 300.
The input/output unit is used for a user to provide input data, so as to realize interaction between the user and the server (or the local terminal). The input/output unit may be, but is not limited to, a mouse, a keyboard, and the like.
The audio unit provides an audio interface to the user, which may include one or more microphones, one or more speakers, and audio circuitry.
The display unit provides an interactive interface (e.g., a user interface) between the electronic device and the user, or is used for displaying image data for the user's reference. In this embodiment, the display unit may be a liquid crystal display or a touch display. In the case of a touch display, the display can be a capacitive or resistive touch screen supporting single-point and multi-point touch operations, meaning that the touch display can sense touch operations generated simultaneously from one or more positions on the display and send the sensed touch operations to the processor for calculation and processing.
It will be appreciated that the configuration shown in fig. 3 is merely illustrative and that electronic device 300 may include more or fewer components than shown in fig. 3 or have a different configuration than shown in fig. 3. The components shown in fig. 3 may be implemented in hardware, software, or a combination thereof.
The present application provides a storage medium having stored thereon a computer program which, when executed by a processor, performs the method of the above method embodiments.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the method of the method embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (9)

1. A method of model training, comprising:
acquiring a first sample image and a second sample image; the first sample image and the second sample image are two images obtained by adopting different data enhancement modes for the same image;
inputting the first sample image into a first student model for feature extraction, and obtaining first query sample features output by the first student model;
inputting the second sample image into a first isomorphic teacher model for feature extraction, and obtaining a first candidate sample feature output by the first isomorphic teacher model;
inputting the first sample image and the second sample image into a heterogeneous teacher model for feature extraction, and obtaining image features output by the heterogeneous teacher model; wherein the image features comprise second query sample features;
determining a first sample feature relationship between the first query sample feature and the first candidate sample feature;
determining a second sample feature relationship between the second query sample feature and a plurality of second candidate sample features recorded by the heterogeneous teacher model, respectively;
and training the first student model according to the first sample characteristic relation and the second sample characteristic relation.
2. The model training method of claim 1, wherein the heterogeneous teacher model comprises a second student model and a second homogeneous teacher model, and the image features further comprise a second candidate sample feature;
the inputting the first sample image and the second sample image into a heterogeneous teacher model for feature extraction to obtain image features output by the heterogeneous teacher model includes:
inputting the first sample image into the second student model for feature extraction to obtain a second query sample feature output by the second student model; and
inputting the second sample image into the second isomorphic teacher model for feature extraction, and obtaining a second candidate sample feature output by the second isomorphic teacher model;
wherein the plurality of second candidate sample features recorded by the heterogeneous teacher model include the second candidate sample feature currently output by the second homogeneous teacher model and at least one second candidate sample feature previously output at least once.
3. The model training method of claim 2, wherein the determining a second sample feature relationship between the second query sample feature and each of the plurality of second candidate sample features recorded by the heterogeneous teacher model comprises:
determining, for each second candidate sample feature, whether the second query sample feature and that second candidate sample feature belong to the same category;
if they belong to the same category, determining that the second sample feature relationship between the second query sample feature and that second candidate sample feature is a positive sample relationship; and
if they belong to different categories, determining that the second sample feature relationship between the second query sample feature and that second candidate sample feature is a negative sample relationship.
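Claim 3 reduces to labeling each query-candidate pair as positive or negative by category agreement. A minimal sketch, with the +1/-1 encoding and the function name chosen here for illustration only:

```python
def sample_relationships(query_category, candidate_categories):
    """Claim-3-style labels: +1 (positive sample relationship) when a candidate
    shares the query's category, -1 (negative sample relationship) otherwise."""
    return [1 if c == query_category else -1 for c in candidate_categories]

print(sample_relationships("cat", ["cat", "dog", "cat"]))  # [1, -1, 1]
```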
4. The model training method of claim 3, wherein the determining whether the second query sample feature and each second candidate sample feature belong to the same category comprises:
acquiring a plurality of cluster centers;
performing a similarity calculation between the second query sample feature and each of the plurality of cluster centers to obtain a first similarity score between the second query sample feature and each cluster center;
taking the cluster center with the highest first similarity score as the first target cluster center corresponding to the second query sample feature;
performing a similarity calculation between each second candidate sample feature and each cluster center to obtain a second similarity score between that second candidate sample feature and each cluster center;
taking the cluster center with the highest second similarity score as the second target cluster center corresponding to that second candidate sample feature;
if the first target cluster center and the second target cluster center are the same cluster center, determining that the second query sample feature and that second candidate sample feature belong to the same category; and
if the first target cluster center and the second target cluster center are different cluster centers, determining that the second query sample feature and that second candidate sample feature belong to different categories.
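Claim 4's category test is a nearest-cluster-center assignment: each feature is assigned to its highest-scoring center, and two features share a category exactly when their target centers coincide. The sketch below assumes cosine similarity as the similarity calculation (the claim leaves the measure open) and uses invented names:

```python
import numpy as np

def nearest_center(feature, centers):
    """Index of the cluster center with the highest cosine similarity score."""
    f = feature / np.linalg.norm(feature)
    c = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return int(np.argmax(c @ f))

def same_category(query, candidate, centers):
    """Claim-4 test: features share a category iff their target centers match."""
    return nearest_center(query, centers) == nearest_center(candidate, centers)

centers = np.eye(3)  # 3 toy cluster centers along the coordinate axes
print(same_category(np.array([0.9, 0.1, 0.0]),
                    np.array([0.8, 0.2, 0.1]), centers))  # True
```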
5. The model training method of claim 1, wherein the training the first student model according to the first sample feature relationship and the second sample feature relationships comprises:
generating a first supervisory sub-signal using the first sample feature relationship, wherein the first supervisory sub-signal comprises a first sample feature relationship identifier representing the sample feature relationship between the first query sample feature and the first candidate sample feature;
generating a second supervisory sub-signal using the second sample feature relationships, wherein the second supervisory sub-signal comprises second sample feature relationship identifiers representing the sample feature relationships between the second query sample feature and the recorded second candidate sample features;
generating a supervisory signal from the first supervisory sub-signal and the second supervisory sub-signal; and
training the first student model using the supervisory signal.
6. The model training method of claim 5, wherein the generating a supervisory signal from the first supervisory sub-signal and the second supervisory sub-signal comprises:
concatenating the first supervisory sub-signal and the second supervisory sub-signal to obtain the supervisory signal.
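The "splicing" of claim 6 is a plain concatenation of the two sub-signals into one supervisory signal. A toy sketch, where the +1/-1 relationship identifiers are an assumed encoding:

```python
import numpy as np

# Hypothetical sub-signals built from the relationship identifiers of claim 5:
first_sub = np.array([1])            # one identifier for the first relationship
second_sub = np.array([1, -1, -1])   # one identifier per recorded candidate

supervisory = np.concatenate([first_sub, second_sub])  # claim-6 "splicing"
print(supervisory.tolist())  # [1, 1, -1, -1]
```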
7. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the model training method according to any one of claims 1 to 6.
8. An electronic device comprising a processor, a memory, and a computer program stored on the memory, wherein the processor executes the computer program to implement the model training method of any one of claims 1 to 6.
9. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the model training method of any one of claims 1 to 6.
CN202111671078.2A 2021-12-31 2021-12-31 Model training method, storage medium, electronic device, and computer program product Pending CN114372580A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111671078.2A CN114372580A (en) 2021-12-31 2021-12-31 Model training method, storage medium, electronic device, and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111671078.2A CN114372580A (en) 2021-12-31 2021-12-31 Model training method, storage medium, electronic device, and computer program product

Publications (1)

Publication Number Publication Date
CN114372580A true CN114372580A (en) 2022-04-19

Family

ID=81142963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111671078.2A Pending CN114372580A (en) 2021-12-31 2021-12-31 Model training method, storage medium, electronic device, and computer program product

Country Status (1)

Country Link
CN (1) CN114372580A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115170455A (en) * 2022-08-17 2022-10-11 荣耀终端有限公司 Image processing method and related device
CN118015431A (en) * 2024-04-03 2024-05-10 阿里巴巴(中国)有限公司 Image processing method, apparatus, storage medium, and program product


Similar Documents

Publication Publication Date Title
CN111078940B (en) Image processing method, device, computer storage medium and electronic equipment
CN114372580A (en) Model training method, storage medium, electronic device, and computer program product
CN113139628A (en) Sample image identification method, device and equipment and readable storage medium
CN113177559B (en) Image recognition method, system, equipment and medium combining breadth and dense convolutional neural network
CN111931628B (en) Training method and device of face recognition model and related equipment
CN111798259A (en) Application recommendation method and device, storage medium and electronic equipment
CN114880041A (en) Tree structure data processing method, electronic equipment and storage medium
CN114943937A (en) Pedestrian re-identification method and device, storage medium and electronic equipment
CN115222443A (en) Client group division method, device, equipment and storage medium
CN112560823B (en) Adaptive variance and weight face age estimation method based on distribution learning
CN114461853A (en) Training sample generation method, device and equipment of video scene classification model
CN111488887B (en) Image processing method and device based on artificial intelligence
CN111310595B (en) Method and device for generating information
CN113705293A (en) Image scene recognition method, device, equipment and readable storage medium
CN117435999A (en) Risk assessment method, apparatus, device and medium
CN113572981A (en) Video dubbing method and device, electronic equipment and storage medium
WO2024051146A1 (en) Methods, systems, and computer-readable media for recommending downstream operator
CN116453226A (en) Human body posture recognition method and device based on artificial intelligence and related equipment
CN112529116B (en) Scene element fusion processing method, device and equipment and computer storage medium
CN111796663B (en) Scene recognition model updating method and device, storage medium and electronic equipment
CN114627085A (en) Target image identification method and device, storage medium and electronic equipment
CN113869367A (en) Model capability detection method and device, electronic equipment and computer readable medium
CN112560690A (en) Multi-modal characteristic character attribute labeling method, device, equipment and medium
CN113011320A (en) Video processing method and device, electronic equipment and storage medium
CN112528140A (en) Information recommendation method, device, equipment, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination