CN116563581A - Training method and device for image detection model

Training method and device for image detection model

Info

Publication number
CN116563581A
CN116563581A (application number CN202310491483.9A)
Authority
CN
China
Prior art keywords
sample image
feature
sample
image
trained
Prior art date
Legal status
Pending
Application number
CN202310491483.9A
Other languages
Chinese (zh)
Inventor
戚风亮 (Qi Fengliang)
胡玉琛 (Hu Yuchen)
王洪彬 (Wang Hongbin)
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310491483.9A priority Critical patent/CN116563581A/en
Publication of CN116563581A publication Critical patent/CN116563581A/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this specification provide a training method and apparatus for an image detection model. The training method comprises: inputting a first sample image set and a second sample image set into a teacher network to be trained for category matching, obtaining a category matching probability; inputting the first sample image set and the second sample image set into a student network to be trained, which generates multivariate features and performs feature fusion on each sample image according to a similarity matrix corresponding to the two sets, obtaining a multivariate feature set and a fusion feature set; calculating a training loss based on the category matching probability, the multivariate feature set, and the fusion feature set; and adjusting the parameters of the student network to be trained and the teacher network to be trained based on the training loss, thereby realizing cooperative training of the student network and the teacher network.

Description

Training method and device for image detection model
Technical Field
The present document relates to the field of data processing technologies, and in particular, to a training method and apparatus for an image detection model.
Background
In the field of machine learning there are two conventional learning approaches: supervised learning and unsupervised learning. Semi-supervised learning, a key problem studied in pattern recognition and machine learning, combines the two: it addresses how to train and classify with a small number of labeled samples and a large number of unlabeled samples, and mainly comprises semi-supervised classification, semi-supervised regression, semi-supervised clustering, and semi-supervised dimensionality reduction algorithms. How to use semi-supervised learning for more varied and comprehensive model training is receiving increasing attention.
Disclosure of Invention
One or more embodiments of the present specification provide a training method for an image detection model. The method comprises: inputting a first sample image set and a second sample image set into a teacher network to be trained for category matching, obtaining a category matching probability, where the first sample image set is composed of first sample images labeled with image categories and the second sample image set of second sample images without image category labels; inputting the first sample image set and the second sample image set into a student network to be trained, which generates multivariate features and performs feature fusion on each sample image according to a similarity matrix corresponding to the two sets, obtaining a multivariate feature set and a fusion feature set; and calculating a training loss based on the category matching probability, the multivariate feature set, and the fusion feature set, and adjusting the parameters of the student network to be trained and the teacher network to be trained based on the training loss, so that an image detection model can be constructed from the trained teacher network.
One or more embodiments of the present specification provide an image detection processing method, comprising: acquiring at least two images to be detected; inputting the at least two images to be detected into an image detection model for image detection, obtaining feature vectors, where the image detection model is constructed from the trained teacher network; and calculating the category matching probability of the at least two images to be detected based on the feature vectors. The teacher network and the student network are trained cooperatively; the cooperative training comprises: performing category matching on the first sample image set and the second sample image set by the teacher network to be trained, obtaining a sample category matching probability; and adjusting, via the student network to be trained, the parameters of the teacher network to be trained and the student network to be trained according to the sample category matching probability and the multivariate feature set and fusion feature set corresponding to the two sample image sets, obtaining the trained teacher network.
One or more embodiments of the present specification provide a training apparatus for an image detection model, comprising: a category matching module configured to input a first sample image set and a second sample image set into a teacher network to be trained for category matching, obtaining a category matching probability, where the first sample image set is composed of first sample images labeled with image categories and the second sample image set of second sample images without image category labels; a feature fusion module, run by inputting the two sets into the student network to be trained, configured to generate multivariate features and perform feature fusion on each sample image according to the similarity matrix corresponding to the two sets, obtaining a multivariate feature set and a fusion feature set; and a parameter adjustment module configured to calculate a training loss based on the category matching probability, the multivariate feature set, and the fusion feature set, and to adjust the parameters of the student network to be trained and the teacher network to be trained based on the training loss, so that an image detection model can be constructed from the trained teacher network.
One or more embodiments of the present specification provide an image detection processing apparatus, comprising: an image acquisition module configured to acquire at least two images to be detected; an image detection module configured to input the at least two images to be detected into an image detection model for image detection, obtaining feature vectors, where the image detection model is constructed from the trained teacher network; and a probability calculation module configured to calculate the category matching probability of the at least two images to be detected based on the feature vectors. The teacher network and the student network are trained cooperatively, as described for the image detection processing method above.
One or more embodiments of the present specification provide a training device for an image detection model, comprising a processor and a memory configured to store computer-executable instructions that, when executed, cause the processor to carry out the training method of the image detection model described above.
One or more embodiments of the present specification provide an image detection processing device, comprising a processor and a memory configured to store computer-executable instructions that, when executed, cause the processor to carry out the image detection processing method described above.
One or more embodiments of the present specification provide a storage medium storing computer-executable instructions that, when executed by a processor, implement the training method of the image detection model described above.
One or more embodiments of the present specification provide another storage medium storing computer-executable instructions that, when executed by a processor, implement the image detection processing method described above.
Drawings
For a clearer description of one or more embodiments of the present specification or of prior-art solutions, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below are only some of the embodiments of the present specification; other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an implementation environment of a training method for an image detection model according to one or more embodiments of the present disclosure;
FIG. 2 is a process flow diagram of a training method for an image detection model according to one or more embodiments of the present disclosure;
FIG. 3 is a process flow diagram of an image detection processing method according to one or more embodiments of the present disclosure;
FIG. 4 is a schematic diagram of an embodiment of a training apparatus for an image detection model according to one or more embodiments of the present disclosure;
FIG. 5 is a schematic diagram of an embodiment of an image detection processing device according to one or more embodiments of the present disclosure;
FIG. 6 is a schematic structural diagram of a training device for an image detection model according to one or more embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of an image detection processing apparatus according to one or more embodiments of the present disclosure.
Detailed Description
To enable a person skilled in the art to better understand the technical solutions in one or more embodiments of the present specification, these solutions are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present specification; all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of protection of this document.
As shown in FIG. 1, in one aspect of one or more embodiments of the present specification, the implementation environment includes a model framework with two branches: a teacher network to be trained (Teacher Model) and a student network to be trained (Student Model).
The student network to be trained comprises three parts: a feature extraction layer (Backbone), a pooling layer (Pooler), and a feature conversion layer (Conv+Pooler). The Backbone extracts a feature map from an input image; the Pooler extracts the feature vector corresponding to a feature map; the Conv+Pooler extracts the feature vector of a spliced, combined feature map.
The teacher network to be trained comprises a feature extraction layer (Backbone), a feature conversion layer (Conv+Pooler), and a fully connected layer. Its Backbone extracts a feature map from an input image; its Conv+Pooler extracts the feature vector of a spliced, combined feature map; the fully connected layer performs classification prediction on the combined features. In addition, the teacher network to be trained also comprises a pooling layer (Pooler) for extracting the feature vectors corresponding to the feature maps.
The teacher network to be trained and the student network to be trained share the structures of the Backbone and the Conv+Pooler, and the parameter values of the teacher network's Backbone and Conv+Pooler are updated from the corresponding parameter values of the student network's Backbone and Conv+Pooler by exponential moving average (EMA).
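The EMA update can be written compactly. The following is a minimal sketch, not part of the patent text, assuming PyTorch-style modules named `backbone` and `conv_pooler` and an assumed decay value:

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999):
    # teacher_param <- decay * teacher_param + (1 - decay) * student_param
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

# After each student optimizer step, the shared structures are synchronized:
# ema_update(teacher.backbone, student.backbone)
# ema_update(teacher.conv_pooler, student.conv_pooler)
```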
In this implementation environment, after acquiring a first sample image set labeled with image categories and a second sample image set without image category labels, the teacher network to be trained performs the following operations: it extracts features from each sample image in the two sets through the Backbone, obtaining first sample image features and second sample image features; it combines the features of pairs of sample images, obtaining a combined feature for each pair; it extracts features from each combined feature through the Conv+Pooler, obtaining combined feature vectors; and it inputs the combined feature vectors into the fully connected layer for classification prediction, obtaining the predicted probability that the two sample images corresponding to each combined feature vector belong to the same image category.
The student network to be trained performs the following operations: it extracts features from each sample image in the first and second sample image sets through the Backbone, obtaining first sample image features and second sample image features; it extracts a feature vector for each sample image through the Pooler, obtaining first sample feature vectors and second sample feature vectors; it calculates a similarity matrix from these feature vectors and generates multivariate features based on the similarity matrix, obtaining a multivariate feature set; it combines image features from the first and second sample image features and inputs the combined features into the Conv+Pooler for feature extraction, obtaining the fusion feature vectors of the fusion features; the multivariate features and fusion features that contain second sample image features are labeled based on the teacher network to be trained; the training loss is calculated from the labeled multivariate feature set and fusion feature set; and the parameters of the teacher network to be trained and the student network to be trained are adjusted based on the training loss.
After the teacher network and the student network are trained, an image detection model can be built from the relevant parts of the teacher network according to actual requirements.
One or more embodiments of a training method for an image detection model provided in the present specification are as follows:
In the training method of the image detection model provided in this embodiment, the teacher network to be trained predicts the probability that a second sample image without an image category label belongs to the same image category as a first sample image. The student network to be trained generates, at two levels, multivariate features and fusion features, which are labeled with the category matching probabilities predicted by the teacher network to be trained; the training loss is then calculated from the labeled multivariate feature set and fusion feature set, and the parameters of the teacher network to be trained and the student network to be trained are adjusted accordingly. Because the teacher network to be trained predicts the category matching probability between the unlabeled second sample images and the labeled first sample images, the unlabeled samples are put to use: the student network and the teacher network are applied within a semi-supervised learning framework, the effectiveness of the training samples is improved, and the generalization ability of the resulting teacher and student networks is improved.
Referring to fig. 2, the training method of the image detection model provided in the present embodiment specifically includes steps S202 to S204.
Step S202, inputting the first sample image set and the second sample image set into a teacher network to be trained to perform category matching, and obtaining category matching probability.
In this embodiment, the sample images in the first sample image set are sample images labeled with image categories, and the sample images in the second sample image set are sample images without image category labels. Optionally, the first sample image set is composed of first sample images labeled with image categories, and the second sample image set of second sample images without image category labels. Optionally, the first sample image set is composed of a first number of labeled first sample images, where the first number is calculated from the number of image categories and the number of sample images per image category, the number of sample images being equal across image categories; the second sample image set is composed of a second number of unlabeled second sample images.
For example, the first sample image set is composed of P×K images, where P is the number of image categories (i.e., image IDs) and K is the number of images under each image ID; the second sample image set is composed of any M images of unknown image category.
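As an illustration only (not from the patent), a P×K batch of labeled images plus M unlabeled images could be drawn as follows; the data structures and function name are assumptions:

```python
import random

def sample_batches(images_by_id: dict, unlabeled_images: list, P: int, K: int, M: int):
    # Draw P image IDs, then K images per ID, for the first (labeled) sample image set.
    ids = random.sample(list(images_by_id), P)
    labeled = [img for i in ids for img in random.sample(images_by_id[i], K)]
    # Draw any M images of unknown category for the second (unlabeled) sample image set.
    unlabeled = random.sample(unlabeled_images, M)
    return labeled, unlabeled
```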
In this embodiment, the teacher network and the student network are each composed of a feature extraction layer, a pooling layer and/or a feature conversion layer; the teacher network may additionally include a fully connected layer. Optionally, the parameter values of the feature extraction layer in the teacher network are obtained by exponential moving averaging of the parameter values of the feature extraction layer in the student network, and the parameter values of the feature conversion layer in the teacher network are likewise obtained by exponential moving averaging of those in the student network.
In a specific implementation, to improve the effectiveness of training, after an initial sample image set is acquired, its sample images are sampled to obtain the first sample image set and the second sample image set. Optionally, in an optional implementation provided in this embodiment, the two sets are obtained as follows:
acquiring an initial sample image set;
and sampling the sample image of the initial sample image set to obtain the first sample image set and the second sample image set.
Specifically, the first sample image set and the second sample image set are obtained by sampling the sample images of the initial sample image set, where a first number of first sample images and a second number of second sample images may be sampled from it.
Further, after the first sample image set and the second sample image set are obtained, the sample images without image category labels are used as effective samples, so that training of the teacher network and the student network can be carried out.
In a specific execution process, after the first sample image set and the second sample image set are acquired, they are input into the teacher network to be trained for category matching, obtaining the category matching probability. In an optional implementation provided in this embodiment, the teacher network performs category matching as follows:
inputting the first sample image set and the second sample image set into a feature extraction layer for feature extraction to obtain first sample image features and second sample image features;
performing feature combination on the first sample image features and the second sample image features to obtain a combined feature set, and inputting the combined feature set into a feature conversion layer to perform feature conversion to obtain a combined feature vector set;
and inputting the combined feature vectors in the combined feature vector set that contain a feature vector corresponding to a second sample image into the fully connected layer for classification prediction, obtaining the category matching probability of whether the images of each combination belong to the same category.
Specifically, the feature extraction layer in the teacher network to be trained extracts features from the first sample image set and the second sample image set, obtaining first sample image features and second sample image features; the teacher network to be trained combines the first and second sample image features, obtaining a combined feature set; the feature conversion layer converts each combined feature in the set, obtaining a combined feature vector set; and the fully connected layer performs classification prediction on the combined feature vectors containing features of second sample images, obtaining the category matching probability of whether the images corresponding to each such combined feature belong to the same image category.
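A hedged sketch of this teacher-side matching, assuming PyTorch, channel-wise concatenation as the feature combination, and module names (`backbone`, `conv_pooler`, `fc`) that are not fixed by the patent:

```python
import torch

@torch.no_grad()
def teacher_match(teacher, labeled_imgs: torch.Tensor, unlabeled_imgs: torch.Tensor):
    f1 = teacher.backbone(labeled_imgs)    # first sample image features, (N1, C, H, W)
    f2 = teacher.backbone(unlabeled_imgs)  # second sample image features, (N2, C, H, W)
    probs = torch.zeros(f1.size(0), f2.size(0))
    for i in range(f1.size(0)):
        for j in range(f2.size(0)):
            combo = torch.cat([f1[i], f2[j]], dim=0).unsqueeze(0)   # combined feature
            vec = teacher.conv_pooler(combo)                        # combined feature vector
            probs[i, j] = torch.sigmoid(teacher.fc(vec)).squeeze()  # same-category probability
    return probs
```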
In addition, in this embodiment, the feature combination of the first sample image features and the second sample image features may be performed according to the similarity matrix corresponding to the first sample image set and the second sample image set. Specifically, in an optional implementation provided in this embodiment, after the feature extraction that yields the first and second sample image features, and before the combined feature set is input into the feature conversion layer, the following operations may also be performed to obtain the combined feature vector set:
inputting the first sample image features and the second sample image features into the pooling layer for pooling, obtaining first sample feature vectors and second sample feature vectors;
and calculating similarities based on the first sample feature vectors and the second sample feature vectors, obtaining a similarity matrix.
Correspondingly, the feature combination and conversion step becomes: combining the first sample image features and the second sample image features based on the similarity matrix obtained from the similarity calculation to obtain the combined feature set, and inputting the combined feature set into the feature conversion layer for feature conversion to obtain the combined feature vector set.
Optionally, the category matching probability is used to label the multivariate feature set generated by the student network to be trained and the fusion feature set it obtains by feature fusion; specifically, it labels the multivariate features in the multivariate feature set that contain a second sample image and the fusion features in the fusion feature set that contain a second sample image.
Step S204, inputting the first sample image set and the second sample image set into the student network to be trained, which performs the following operations: generating multivariate features and performing feature fusion on each sample image according to the similarity matrix corresponding to the two sets, obtaining a multivariate feature set and a fusion feature set; calculating the training loss based on the category matching probability, the multivariate feature set, and the fusion feature set; and adjusting the parameters of the student network to be trained and the teacher network to be trained based on the training loss, so that an image detection model can be constructed from the trained teacher network.
The similarity matrix is formed by the similarities between each sample image in the first sample image set and each sample image in the second sample image set; optionally, it is calculated from the feature vectors of the sample images in the two sets.
The multivariate feature of any sample image in the multivariate feature set is a feature group formed by that sample image and at least one associated sample image; it may be a triplet of that sample image. The fusion feature of any sample image is the feature obtained by fusing that sample image with an associated sample image. The multivariate features involve no feature fusion, whereas a fusion feature is obtained precisely by feature fusion of a sample image with its associated sample image.
The image detection model is constructed from the parts extracted from the teacher network according to actual requirements.
In a specific implementation, after the first sample image set and the second sample image set are acquired, they are input into the student network to be trained; the student network to be trained calculates a similarity matrix based on the two sets, generates multivariate features for each sample image according to the similarity matrix to obtain the multivariate feature set, and performs feature fusion on each sample image according to the similarity matrix to obtain the fusion feature set. Optionally, the multivariate feature set is composed of the multivariate features of each sample image, and the fusion feature set of the fusion features of each sample image.
To improve the effectiveness of the obtained multivariate feature set and fusion feature set, in this embodiment the similarity matrix is first calculated from the first sample image set and the second sample image set. To improve its accuracy, in an optional implementation provided in this embodiment it is calculated as follows:
Inputting the first sample image set and the second sample image set into a feature extraction layer for feature extraction to obtain first sample image features and second sample image features;
inputting the first sample image features and the second sample image features into the pooling layer for pooling, obtaining first sample feature vectors and second sample feature vectors;
and carrying out similarity calculation based on the first sample feature vector and the second sample feature vector to obtain the similarity matrix.
Specifically, the first sample image set and the second sample image set are input into the feature extraction layer of the student network to be trained for feature extraction, obtaining first sample image features and second sample image features; these are input into the pooling layer of the student network to be trained for pooling, obtaining first sample feature vectors and second sample feature vectors; finally the student network to be trained calculates similarities based on these feature vectors, obtaining a similarity matrix formed by the similarity between each pair of sample images in the first and second sample image sets.
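A minimal sketch of the similarity matrix computation; cosine similarity is an assumption, since the patent only specifies a "similarity" calculation:

```python
import torch
import torch.nn.functional as F

def similarity_matrix(student, labeled_imgs: torch.Tensor, unlabeled_imgs: torch.Tensor):
    feats = student.backbone(torch.cat([labeled_imgs, unlabeled_imgs]))  # feature maps
    vecs = F.normalize(student.pooler(feats), dim=1)  # sample feature vectors, (N, D)
    return vecs @ vecs.t()                            # pairwise similarities, (N, N)
```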
Further, after the similarity matrix is calculated, multivariate features are generated for each sample image according to it, obtaining the multivariate feature set, and feature fusion is performed on each sample image according to it, obtaining the fusion feature set. In an optional implementation provided in this embodiment, this is done as follows:
generating multivariate features for each sample image in the first sample image set and the second sample image set based on the similarity matrix, obtaining the multivariate feature set;
and performing feature fusion on each sample image according to the similarity matrix, obtaining the fusion feature set.
To improve the efficiency of parameter adjustment, and to avoid the parameter adjustment becoming meaningless because identical images, or images of the same image category, are selected for the training loss calculation, this embodiment selects and determines the sample images associated with any given sample image from the perspective of hard examples. Specifically, in an optional implementation provided in this embodiment, taking the triplet as the example, the multivariate feature of any sample image is generated as follows (a code sketch follows the list):
from the first sample image set and the second sample image set, screening out the first target sample image, whose similarity to the given sample image is lower than that of every other image within the given sample image's image category; and,
screening out the second target sample image, whose similarity to the given sample image is higher than that of every other image outside that image category;
and generating the multivariate feature of the given sample image from the given sample image, the first target sample image, and the second target sample image.
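A hedged sketch of this hard-example selection, assuming a precomputed similarity matrix `sim` and a label vector; the masking details are assumptions for illustration:

```python
import torch

def mine_triplet(sim: torch.Tensor, labels: torch.Tensor, anchor: int):
    same = labels == labels[anchor]  # anchor's image category (includes the anchor)
    pos_mask = same.clone()
    pos_mask[anchor] = False         # exclude the anchor itself
    # First target: lowest similarity within the same image category.
    pos = sim[anchor].masked_fill(~pos_mask, float("inf")).argmin()
    # Second target: highest similarity among the other image categories.
    neg = sim[anchor].masked_fill(same, float("-inf")).argmax()
    return anchor, pos.item(), neg.item()  # indices forming the triplet
```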
In an optional implementation provided in this embodiment, the fusion features of any sample image are generated as follows:
inputting the given sample image and its corresponding first target sample image into the feature fusion layer for feature fusion, obtaining the first fusion feature of the given sample image; and,
inputting the given sample image and its corresponding second target sample image into the feature fusion layer for feature fusion, obtaining the second fusion feature of the given sample image; the first and second fusion features together are taken as the fusion features of the given sample image.
Specifically, to improve the usefulness of the obtained multivariate features and fusion features in model training, for any sample image a triplet is generated from the lowest-similarity sample image within the same image category and the highest-similarity sample image among the other image categories, and feature fusion is performed with each of these two images in turn, as sketched below.
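The fusion step might look as follows; channel-wise concatenation of feature maps before the Conv+Pooler is an assumption:

```python
import torch

def fuse_features(student, feat_maps: torch.Tensor, anchor: int, pos: int, neg: int):
    # First fusion feature: anchor fused with its first target (hardest positive).
    first = student.conv_pooler(
        torch.cat([feat_maps[anchor], feat_maps[pos]], dim=0).unsqueeze(0))
    # Second fusion feature: anchor fused with its second target (hardest negative).
    second = student.conv_pooler(
        torch.cat([feat_maps[anchor], feat_maps[neg]], dim=0).unsqueeze(0))
    return first, second
```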
Optionally, each sample image corresponds one-to-one with its multivariate feature and one-to-many with its fusion features.
In addition, the step of generating multivariate features and performing feature fusion for each sample image according to the similarity matrix corresponding to the first and second sample image sets may be replaced by: determining, according to that similarity matrix together with the category matching probability, the first target sample image and second target sample image corresponding to each sample image, and generating the multivariate features and performing feature fusion from each sample image and its corresponding first and second target sample images, obtaining the multivariate feature set and the fusion feature set; this forms a new implementation together with the other processing provided in this embodiment. Optionally, the first target sample image is the sample image whose similarity to the corresponding sample image is lower than that of all other sample images in the same image category, and the second target sample image is the one whose similarity is higher than that of all other sample images in the other image categories.
It should be noted that the multivariate features are generated in the feature vector dimension, while the fusion features are generated in the feature-map dimension. In an optional implementation provided in this embodiment, the multivariate feature set and the fusion feature set may also be obtained as follows:
generating multivariate features for each sample feature vector among the first sample feature vectors and the second sample feature vectors based on the similarity matrix, obtaining the multivariate feature set;
and performing feature fusion on each sample image feature among the first sample image features and the second sample image features according to the similarity matrix, obtaining the fusion feature set.
In addition, the step above may also be replaced by performing multivariate feature generation and feature fusion on each sample image according to the category matching probability together with the similarity matrix, obtaining the multivariate feature set and the fusion feature set; this, too, forms a new implementation together with the other processing provided in this embodiment.
It should be noted that, since the second sample images carry no image category labels, multivariate feature generation and feature fusion may also be performed on the first sample images only; that is, the step above may be replaced by performing multivariate feature generation and feature fusion on each first sample image in the first sample image set according to the similarity matrix corresponding to the two sets, obtaining the multivariate feature set and the fusion feature set, forming a new implementation together with the other processing provided in this embodiment.
Besides the approach described above of determining the first and second target sample images and generating the multivariate features and fusion features from each sample image and its targets, the multivariate features and fusion features of any sample image may also be generated as follows:
extracting sample images other than the given sample image from the first sample image set and the second sample image set and generating multivariate features from them together with the given sample image; and,
extracting sample images other than the given sample image from the first sample image set and the second sample image set and performing feature fusion of them with the given sample image;
optionally, the extracted images are any other, different sample images besides the given sample image.
In a specific implementation, after the multivariate feature set and the fusion feature set are obtained, the training loss is calculated based on the category matching probability, the multivariate feature set, and the fusion feature set; the parameters of the student network to be trained and the teacher network to be trained are adjusted based on the training loss, yielding the trained student and teacher networks, and the image detection model is constructed from parts of the teacher network. Specifically, in the training loss calculation, the multivariate features in the multivariate feature set that contain a second sample image are labeled based on the category matching probability, the fusion features in the fusion feature set that contain a second sample image are likewise labeled, and the training loss is calculated from the labeled multivariate feature set and fusion feature set.
To improve the effectiveness of the obtained training loss from the two aspects of the multivariate feature set and the fusion feature set, in an optional implementation provided in this embodiment the training loss is calculated as follows (a code sketch follows the list):
labeling, based on the category matching probability, the multivariate features in the multivariate feature set that contain a second sample image and the fusion features in the fusion feature set that contain a second sample image, obtaining a multivariate feature set and a fusion feature set containing classification labels;
calculating a multivariate feature loss from the multivariate feature set containing the classification labels, and a binary classification loss from the fusion feature set containing the classification labels;
and calculating the training loss based on the multivariate feature loss and the binary classification loss.
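A sketch of one plausible combination, not fixed by the patent: a margin-based triplet loss as the multivariate feature loss and binary cross-entropy as the binary classification loss, with assumed margin and weights:

```python
import torch
import torch.nn.functional as F

def training_loss(anchor_v, pos_v, neg_v, fused_logits, fused_labels,
                  margin: float = 0.3, w_triplet: float = 1.0, w_cls: float = 1.0):
    # Multivariate (triplet) feature loss over anchor / positive / negative vectors.
    triplet = F.triplet_margin_loss(anchor_v, pos_v, neg_v, margin=margin)
    # Binary classification loss over the labeled fusion features.
    cls = F.binary_cross_entropy_with_logits(fused_logits, fused_labels)
    return w_triplet * triplet + w_cls * cls
```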
Optionally, the multivariate features that do not contain a second sample image and the fusion features that do not contain a second sample image are labeled based on the image categories of the first sample images in the first sample image set;
the multivariate features that contain a second sample image and the fusion features that contain a second sample image are labeled based on the category matching probability.
Specifically, the multivariate features in the multivariate feature set and the fusion features in the fusion feature set are labeled based on the category matching probability, obtaining a multivariate feature set and a fusion feature set containing classification labels; the multivariate feature loss is calculated from the labeled multivariate feature set, the binary classification loss from the labeled fusion feature set, and the training loss from the multivariate feature loss and the binary classification loss.
In addition, calculating the training loss based on the category matching probability, the multivariate feature set, and the fusion feature set may be replaced by calculating it based on the category matching probability, the image categories of the first sample images, the multivariate feature set, and the fusion feature set; likewise, labeling the multivariate and fusion features based on the category matching probability alone may be replaced by labeling them based on both the category matching probability and the image categories of the first sample images, again obtaining the multivariate feature set and fusion feature set containing classification labels.
The labels of the multivariate features and the fusion features are "0" (not belonging to the same image category) and "1" (belonging to the same image category). When labeling based on the category matching probability, if the probability exceeds a preset threshold, the two corresponding images are taken to belong to the same image category; if it is less than or equal to the threshold, they are taken not to.
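As a one-line illustration of this rule (the threshold value is an assumption):

```python
def pseudo_label(match_prob: float, threshold: float = 0.5) -> int:
    # 1: the two images belong to the same image category; 0: they do not.
    return 1 if match_prob > threshold else 0
```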
After the training loss is calculated, the parameters of the student network to be trained and the teacher network to be trained are adjusted based on it until both networks converge. During parameter adjustment, optionally, after the parameters of the student network's Backbone and Conv+Pooler are adjusted, the parameters of the teacher network's Backbone and Conv+Pooler are updated by exponential moving average (see the EMA sketch above).
In addition, corresponding to the variant in which sample images other than the given sample image are extracted for multivariate feature generation and feature fusion, the training loss calculation first labels the multivariate features and fusion features based on the category prediction probability and the image categories corresponding to them, and then calculates the training loss from the multivariate feature set and fusion feature set carrying the classification labels. Optionally, to train at the hard-example level, a multivariate feature subset, consisting of the multivariate features formed by each sample image together with its associated first and second target sample images, may be extracted from the multivariate feature set, and a fusion feature subset, consisting of the fusion features obtained by fusing each sample image with its associated first and/or second target sample image, may be extracted from the fusion feature set; the training loss is then calculated from these subsets.
In calculating the training loss from the labeled multivariate feature set and fusion feature set, the loss may be computed per sample image from its multivariate feature and fusion features, followed by parameter adjustment of the student and teacher networks to be trained; alternatively, the multivariate feature loss may be calculated over the whole multivariate feature set and the binary classification loss over the whole fusion feature set, with the training loss calculated from the two. The training loss may also be calculated with other weightings; this embodiment places no limitation here.
In a specific execution process, after the student network and the teacher network are trained, an image detection model is constructed from the trained teacher network to carry out image detection. Optionally, the image detection model may include part of the teacher network, all of it, or a combination of part or all of the teacher network with part of the student network; for example, for image classification, an image detection model may be constructed from the Backbone and pooling layer of the teacher network. In an optional implementation provided in this embodiment, image category detection is performed as follows:
Acquiring at least two images to be detected;
inputting the at least two images to be detected into an image detection model constructed based on a feature extraction layer and a pooling layer in the trained teacher network for image detection to obtain feature vectors;
and calculating the category matching probability of the at least two images to be detected based on the feature vector.
Specifically, the images to be detected are input into an image detection model constructed from the feature extraction layer and the pooling layer of the teacher network obtained after training is completed, image detection is performed to obtain the feature vectors, and the category matching probability of the images to be detected is then calculated based on the feature vectors.
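A minimal sketch of this detection flow for two images follows. The mapping from cosine similarity to a matching probability is an assumption of the sketch; this embodiment does not fix how the probability is derived from the feature vectors:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def category_match_probability(model: torch.nn.Module,
                                   img_a: torch.Tensor,
                                   img_b: torch.Tensor) -> float:
        # model: teacher feature extraction layer + pooling layer.
        feat_a = model(img_a.unsqueeze(0)).flatten(1)
        feat_b = model(img_b.unsqueeze(0)).flatten(1)
        sim = F.cosine_similarity(feat_a, feat_b).item()  # in [-1, 1]
        return (sim + 1.0) / 2.0                          # map to [0, 1]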
It should be noted that, in practical application, the student network and the teacher network trained in this embodiment may be used to construct an image detection model according to practical needs, and the model may be applied to various scenarios besides category matching, such as similarity sorting and candidate screening, which are not limited herein.
It should be noted that, in the specific execution of step S202 and step S204 in this embodiment, step S202 may be executed first and then step S204, step S204 may be executed first and then step S202, or step S202 and step S204 may be executed simultaneously; the execution order of step S202 and step S204 is not limited herein.
One or more embodiments of an image detection processing method provided in the present specification are as follows:
since the image detection processing method provided in this embodiment is similar in content to the training method of the image detection model provided in the above embodiment, the related content may be cross-referenced when reading this embodiment and is not repeated herein.
Referring to fig. 3, the image detection processing method provided in the present embodiment specifically includes steps S302 to S306.
Step S302, at least two images to be detected are acquired.
Step S304, inputting the at least two images to be detected into an image detection model for image detection to obtain feature vectors.
Optionally, the image detection model is constructed based on the trained teacher network.
Step S306, calculating the category matching probability of the at least two images to be detected based on the feature vectors.
Optionally, the teacher network and the student network cooperate for training; the cooperation training comprises the following steps: performing category matching on the first sample image set and the second sample image set based on a teacher network to be trained to obtain sample category matching probability; based on the student network to be trained, parameter adjustment is carried out on the teacher network to be trained and the student network to be trained according to the sample category matching probability, the multi-feature set and the fusion feature set corresponding to the first sample image set and the second sample image set, so as to obtain the trained teacher network.
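For orientation only, one step of this cooperative training may be sketched as follows; `teacher`, `student`, `compute_training_loss` and `ema_update` are assumed interfaces standing in for the components described above, not APIs defined by this embodiment:

    import torch

    def cooperative_training_step(teacher, student, optimizer,
                                  labeled_imgs, labels, unlabeled_imgs,
                                  compute_training_loss, ema_update):
        # Teacher produces sample category matching probabilities
        # (used to mark features that involve unlabeled images).
        with torch.no_grad():
            match_prob = teacher(labeled_imgs, unlabeled_imgs)
        # Student produces the multivariate feature set and fusion feature set.
        multi_feats, fused_feats = student(labeled_imgs, unlabeled_imgs)
        loss = compute_training_loss(match_prob, multi_feats, fused_feats, labels)
        optimizer.zero_grad()
        loss.backward()               # gradient step on the student
        optimizer.step()
        ema_update(teacher, student)  # teacher then tracks the student
        return loss.detach()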
In this embodiment, by capturing multiple images of one or more documents, it may be identified whether the documents use the same template. Optionally, steps S302 to S306 may be replaced by: acquiring a plurality of document images obtained by image capturing of at least two documents; inputting the plurality of document images into an image detection model for image detection to obtain feature vectors; and calculating the pairwise category matching probability among the plurality of document images based on the feature vectors. Optionally, "a plurality" means at least two.
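A sketch of this document-template use case follows, computing pairwise matching probabilities for a batch of captured document images; as above, the cosine-based probability mapping is an illustrative assumption:

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def pairwise_template_match(model: torch.nn.Module,
                                doc_images: torch.Tensor) -> torch.Tensor:
        # doc_images: (N, C, H, W) batch of captured document images.
        feats = F.normalize(model(doc_images).flatten(1), dim=1)
        sim = feats @ feats.t()      # N x N cosine similarities in [-1, 1]
        return (sim + 1.0) / 2.0     # pairwise matching probabilities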
One or more embodiments of a training apparatus for an image detection model provided in the present specification are as follows:
in the foregoing embodiments, a training method of an image detection model is provided, and a training device of the image detection model is provided correspondingly, which is described below with reference to the accompanying drawings.
Referring to fig. 4, a schematic diagram of an embodiment of a training device for an image detection model according to the present embodiment is shown.
Since the apparatus embodiments correspond to the method embodiments, the description is relatively simple; for the relevant portions, refer to the corresponding descriptions of the method embodiments provided above. The device embodiments described below are merely illustrative.
The present embodiment provides a training device for an image detection model, including:
the class matching module 402 is configured to input the first sample image set and the second sample image set into a teacher network to be trained for class matching, so as to obtain class matching probability; the first sample image set is composed of first sample images subjected to image category marking; the second sample image set consists of second sample images which are not marked by image categories;
inputting the first sample image set and the second sample image set into a student network to be trained to run the following modules:
the feature fusion module 404 is configured to perform multi-element feature generation and feature fusion on each sample image according to the similarity matrix corresponding to the first sample image set and the second sample image set, so as to obtain a multi-element feature set and a fusion feature set;
and a parameter adjustment module 406, configured to calculate a training loss based on the category matching probability, the multi-feature set and the fusion feature set, and perform parameter adjustment on the student network to be trained and the teacher network to be trained based on the training loss, so as to construct an image detection model according to the teacher network obtained by training.
One or more embodiments of an image detection processing apparatus provided in the present specification are as follows:
in the above-described embodiments, an image detection processing method and an image detection processing apparatus corresponding thereto are provided, and the following description is made with reference to the accompanying drawings.
Referring to fig. 5, a schematic diagram of an embodiment of an image detection processing apparatus provided in this embodiment is shown.
Since the apparatus embodiments correspond to the method embodiments, the description is relatively simple; for the relevant portions, refer to the corresponding descriptions of the method embodiments provided above. The device embodiments described below are merely illustrative.
The present embodiment provides an image detection processing apparatus including:
an image acquisition module 502 configured to acquire at least two images to be detected;
the image detection module 504 is configured to input the at least two images to be detected into an image detection model for image detection to obtain feature vectors; the image detection model is constructed based on a trained teacher network;
a probability calculation module 506 configured to calculate a category matching probability of the at least two images to be detected based on the feature vector;
Wherein the teacher network and the student network cooperate for training; the cooperation training comprises the following steps: performing category matching on the first sample image set and the second sample image set based on a teacher network to be trained to obtain sample category matching probability; based on the student network to be trained, parameter adjustment is carried out on the teacher network to be trained and the student network to be trained according to the sample category matching probability, the multi-feature set and the fusion feature set corresponding to the first sample image set and the second sample image set, so as to obtain the trained teacher network.
One or more embodiments of a training apparatus for an image detection model provided in the present specification are as follows:
corresponding to the training method of an image detection model described above, one or more embodiments of the present disclosure further provide a training device of an image detection model for performing the training method of the image detection model provided above; fig. 6 is a schematic structural diagram of the training device of the image detection model provided by one or more embodiments of the present disclosure.
The training device for an image detection model provided in this embodiment includes:
As shown in fig. 6, the training device of the image detection model may vary considerably in configuration or performance, and may include one or more processors 601 and a memory 602, where the memory 602 may store one or more applications or data. The memory 602 may be transient storage or persistent storage. The application program stored in the memory 602 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the training device of the image detection model. Further, the processor 601 may be configured to communicate with the memory 602 and execute the series of computer-executable instructions in the memory 602 on the training device of the image detection model. The training device of the image detection model may also include one or more power supplies 603, one or more wired or wireless network interfaces 604, one or more input/output interfaces 605, one or more keyboards 606, and the like.
In a specific embodiment, the training device of the image detection model includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the training device of the image detection model, and the one or more programs are configured to be executed by the one or more processors and include computer-executable instructions for:
Inputting the first sample image set and the second sample image set into a teacher network to be trained to perform category matching, and obtaining category matching probability; the first sample image set is composed of first sample images subjected to image category marking; the second sample image set consists of second sample images which are not marked by image categories;
inputting the first sample image set and the second sample image set into a student network to be trained to perform the following operations:
according to the similarity matrix corresponding to the first sample image set and the second sample image set, generating multiple characteristics and fusing the characteristics of each sample image to obtain multiple characteristic sets and fused characteristic sets;
and calculating training loss based on the category matching probability, the multi-element feature set and the fusion feature set, and carrying out parameter adjustment on the student network to be trained and the teacher network to be trained based on the training loss so as to construct an image detection model according to the teacher network obtained through training.
One or more embodiments of an image detection processing apparatus provided in the present specification are as follows:
corresponding to the image detection processing method described above, and based on the same technical concept, one or more embodiments of the present disclosure further provide an image detection processing device for performing the image detection processing method provided above; fig. 7 is a schematic structural diagram of the image detection processing device provided by one or more embodiments of the present disclosure.
The image detection processing device provided in this embodiment includes:
As shown in fig. 7, the image detection processing device may vary considerably in configuration or performance, and may include one or more processors 701 and a memory 702, where the memory 702 may store one or more applications or data. The memory 702 may be transient storage or persistent storage. The application programs stored in the memory 702 may include one or more modules (not shown in the figure), and each module may include a series of computer-executable instructions for the image detection processing device. Further, the processor 701 may be configured to communicate with the memory 702 and execute the series of computer-executable instructions in the memory 702 on the image detection processing device. The image detection processing device may also include one or more power supplies 703, one or more wired or wireless network interfaces 704, one or more input/output interfaces 705, one or more keyboards 706, and the like.
In a specific embodiment, the image detection processing device includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the image detection processing device, and the one or more programs are configured to be executed by the one or more processors and include computer-executable instructions for:
Acquiring at least two images to be detected;
inputting the at least two images to be detected into an image detection model for image detection to obtain feature vectors; the image detection model is constructed based on a trained teacher network;
calculating the category matching probability of the at least two images to be detected based on the feature vector;
wherein the teacher network and the student network cooperate for training; the cooperation training comprises the following steps: performing category matching on the first sample image set and the second sample image set based on a teacher network to be trained to obtain sample category matching probability; based on the student network to be trained, parameter adjustment is carried out on the teacher network to be trained and the student network to be trained according to the sample category matching probability, the multi-feature set and the fusion feature set corresponding to the first sample image set and the second sample image set, so as to obtain the trained teacher network.
One or more embodiments of a storage medium provided in the present specification are as follows:
one or more embodiments of the present disclosure further provide a storage medium, based on the same technical concept, corresponding to the training method of an image detection model described above.
The storage medium provided in this embodiment is configured to store computer executable instructions that, when executed by a processor, implement the following flow:
inputting the first sample image set and the second sample image set into a teacher network to be trained to perform category matching, and obtaining category matching probability; the first sample image set is composed of first sample images subjected to image category marking; the second sample image set consists of second sample images which are not marked by image categories;
inputting the first sample image set and the second sample image set into a student network to be trained to perform the following operations:
according to the similarity matrix corresponding to the first sample image set and the second sample image set, generating multiple characteristics and fusing the characteristics of each sample image to obtain multiple characteristic sets and fused characteristic sets;
and calculating training loss based on the category matching probability, the multi-element feature set and the fusion feature set, and carrying out parameter adjustment on the student network to be trained and the teacher network to be trained based on the training loss so as to construct an image detection model according to the teacher network obtained through training.
It should be noted that, in the present specification, the embodiment about the storage medium and the embodiment about the training method of the image detection model in the present specification are based on the same inventive concept, so that the specific implementation of this embodiment may refer to the implementation of the foregoing corresponding method, and the repetition is omitted.
One or more embodiments of another storage medium provided in the present specification are as follows:
in correspondence with the above-described image detection processing method, one or more embodiments of the present specification further provide a storage medium based on the same technical idea.
The storage medium provided in this embodiment is configured to store computer executable instructions that, when executed by a processor, implement the following flow:
acquiring at least two images to be detected;
inputting the at least two images to be detected into an image detection model for image detection to obtain feature vectors; the image detection model is constructed based on a trained teacher network;
calculating the category matching probability of the at least two images to be detected based on the feature vector;
wherein the teacher network and the student network cooperate for training; the cooperation training comprises the following steps: performing category matching on the first sample image set and the second sample image set based on a teacher network to be trained to obtain sample category matching probability; based on the student network to be trained, parameter adjustment is carried out on the teacher network to be trained and the student network to be trained according to the sample category matching probability, the multi-feature set and the fusion feature set corresponding to the first sample image set and the second sample image set, so as to obtain the trained teacher network.
It should be noted that the embodiments related to the storage medium in the present specification and the embodiments related to the image detection processing method in the present specification are based on the same inventive concept, so the specific implementation of this embodiment may refer to the implementation of the foregoing corresponding method; repeated details are omitted.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, device and storage medium embodiments are similar to the method embodiments and are therefore described relatively simply; when reading the apparatus, device and storage medium embodiments, refer to the corresponding parts of the description of the method embodiments for relevant content.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In the 1990s, improvements to a technology could be clearly distinguished as improvements in hardware (e.g., improvements to circuit structures such as diodes, transistors, and switches) or improvements in software (improvements to method flows). However, as technology has developed, improvements to many of today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer "integrates" a digital system onto a PLD by programming, without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, nowadays, instead of manually manufacturing integrated circuit chips, such programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the original code before compiling must also be written in a specific programming language, called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many kinds, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used. It will also be apparent to those skilled in the art that a hardware circuit implementing a logic method flow can easily be obtained by merely logically programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller; examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller implements the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. Or even the means for performing various functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function. Of course, when implementing the embodiments of the present specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
One skilled in the relevant art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
One or more embodiments of the present specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively simply; for the relevant parts, refer to the corresponding description of the method embodiments.
The foregoing description is by way of example only and is not intended to limit the present disclosure. Various modifications and changes may occur to those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. that fall within the spirit and principles of the present document are intended to be included within the scope of the claims of the present document.

Claims (20)

1. A training method of an image detection model, comprising:
inputting the first sample image set and the second sample image set into a teacher network to be trained to perform category matching, and obtaining category matching probability; the first sample image set is composed of first sample images subjected to image category marking; the second sample image set consists of second sample images which are not marked by image categories;
inputting the first sample image set and the second sample image set into a student network to be trained to perform the following operations:
According to the similarity matrix corresponding to the first sample image set and the second sample image set, generating multiple characteristics and fusing the characteristics of each sample image to obtain multiple characteristic sets and fused characteristic sets;
and calculating training loss based on the category matching probability, the multi-element feature set and the fusion feature set, and carrying out parameter adjustment on the student network to be trained and the teacher network to be trained based on the training loss so as to construct an image detection model according to the teacher network obtained through training.
2. The training method of an image detection model according to claim 1, wherein the similarity matrix is obtained by calculating in the following manner:
inputting the first sample image set and the second sample image set into a feature extraction layer for feature extraction to obtain first sample image features and second sample image features;
inputting the first sample image feature and the second sample image feature into a pooling layer for pooling treatment to obtain a first sample feature vector and a second sample feature vector;
and carrying out similarity calculation based on the first sample feature vector and the second sample feature vector to obtain the similarity matrix.
3. The training method of an image detection model according to claim 1, wherein the performing multi-element feature generation and feature fusion on each sample image according to the similarity matrix corresponding to the first sample image set and the second sample image set to obtain a multi-element feature set and a fused feature set includes:
based on the similarity matrix, generating multiple characteristics of each sample image in the first sample image set and the second sample image set to obtain the multiple characteristic sets;
and carrying out feature fusion on each sample image according to the similarity matrix to obtain the fusion feature set.
4. The training method of an image detection model according to claim 3, wherein the multivariate feature of any one of the sample images is generated by:
screening out, from the first sample image set and the second sample image set, a first target sample image whose similarity to the arbitrary sample image is smaller than the similarities of the other sample images under the image category of the arbitrary sample image; and,
screening out a second target sample image whose similarity to the arbitrary sample image is larger than the similarities of the other sample images outside the image category;
A multivariate feature of the arbitrary sample image is generated based on the arbitrary sample image, the first target sample image, and the second target sample image.
5. The training method of the image detection model according to claim 3, wherein the fusion feature of any sample image in each sample image is generated in the following manner:
inputting the arbitrary sample image and the corresponding first target sample image into a feature fusion layer for feature fusion to obtain a first fusion feature of the arbitrary sample image; and,
inputting the arbitrary sample image and the corresponding second target sample image into the feature fusion layer for feature fusion, obtaining a second fusion feature of the arbitrary sample image, and taking the first fusion feature and the second fusion feature as fusion features of the arbitrary sample image.
6. The training method of the image detection model according to claim 1, the calculating training loss based on the category matching probability, the multivariate feature set, and the fusion feature set, comprising:
based on the category matching probability, marking the multi-element features comprising the second sample image in the multi-element feature set, and marking the fusion features comprising the second sample image in the fusion feature set to obtain a multi-element feature set and a fusion feature set comprising a category mark;
Calculating a multi-feature loss from the multi-feature set containing the classification markers, and calculating a classification loss from the fused feature set containing the classification markers;
the training loss is calculated based on the multivariate feature loss and the classification loss.
7. The training method of an image detection model according to claim 6, wherein, for the multivariate features in the multivariate feature set that do not contain the second sample image and the fusion features in the fusion feature set that do not contain the second sample image, the marking processing is performed based on the image categories marked for the first sample images in the first sample image set;
and, for the multivariate features in the multivariate feature set that contain the second sample image and the fusion features in the fusion feature set that contain the second sample image, the marking processing is performed based on the category matching probability.
8. The training method of an image detection model according to claim 1, the category matching comprising:
inputting the first sample image set and the second sample image set into a feature extraction layer for feature extraction to obtain first sample image features and second sample image features;
performing feature combination on the first sample image features and the second sample image features to obtain a combined feature set, and inputting the combined feature set into a feature conversion layer to perform feature conversion to obtain a combined feature vector set;
and inputting the combined features, in the combined feature vector set, that contain the feature vector corresponding to the second sample image into a fully connected layer for classification prediction, so as to obtain the category matching probability of whether the combined features belong to the same category.
9. The training method of an image detection model according to claim 8, wherein the parameter values of the feature extraction layer in the teacher network are obtained by performing an exponential moving average process on the parameter values of the feature extraction layer in the student network;
the parameter values of the feature conversion layer in the teacher network are obtained by performing exponential moving average processing on the parameter values of the feature conversion layer in the student network.
10. The training method of an image detection model according to claim 1, the first sample image set being composed of a first number of first sample images subjected to image class marking; the first quantity is obtained by calculation based on the quantity of the image categories and the quantity of the sample images corresponding to the image categories; the number of the sample images corresponding to each image category is equal;
the second sample image set is composed of a second number of second sample images that are not image class labeled.
11. The training method of an image detection model according to claim 1, further comprising:
Acquiring at least two images to be detected;
inputting the at least two images to be detected into an image detection model constructed based on a feature extraction layer and a pooling layer in the trained teacher network for image detection to obtain feature vectors;
and calculating the category matching probability of the at least two images to be detected based on the feature vector.
12. The training method of an image detection model according to claim 2, wherein the performing multi-element feature generation and feature fusion on each sample image according to the similarity matrix corresponding to the first sample image set and the second sample image set to obtain a multi-element feature set and a fused feature set includes:
based on the similarity matrix, generating multiple characteristics of each sample characteristic vector in the first sample characteristic vector and the second sample characteristic vector to obtain the multiple characteristic set;
and carrying out feature fusion on each sample image feature in the first sample image feature and the second sample image feature according to the similarity matrix to obtain the fusion feature set.
13. The training method of an image detection model according to claim 1, wherein the step of inputting the first sample image set and the second sample image set into a teacher network to be trained to perform category matching, and before the step of obtaining a category matching probability is performed, further comprises:
Acquiring an initial sample image set;
and sampling the sample image of the initial sample image set to obtain the first sample image set and the second sample image set.
14. An image detection processing method, comprising:
acquiring at least two images to be detected;
inputting the at least two images to be detected into an image detection model for image detection to obtain feature vectors; the image detection model is constructed based on a trained teacher network;
calculating the category matching probability of the at least two images to be detected based on the feature vector;
wherein the teacher network and the student network cooperate for training; the cooperation training comprises the following steps: performing category matching on the first sample image set and the second sample image set based on a teacher network to be trained to obtain sample category matching probability; based on the student network to be trained, parameter adjustment is carried out on the teacher network to be trained and the student network to be trained according to the sample category matching probability, the multi-feature set and the fusion feature set corresponding to the first sample image set and the second sample image set, so as to obtain the trained teacher network.
15. A training apparatus for an image detection model, comprising:
The class matching module is configured to input the first sample image set and the second sample image set into a teacher network to be trained for class matching, so as to obtain class matching probability; the first sample image set is composed of first sample images subjected to image category marking; the second sample image set consists of second sample images which are not marked by image categories;
inputting the first sample image set and the second sample image set into a student network to be trained to run the following modules:
the feature fusion module is configured to perform multi-element feature generation and feature fusion on each sample image according to a similarity matrix corresponding to the first sample image set and the second sample image set to obtain a multi-element feature set and a fusion feature set;
and the parameter adjustment module is configured to calculate training loss based on the category matching probability, the multi-element feature set and the fusion feature set, and perform parameter adjustment on the student network to be trained and the teacher network to be trained based on the training loss so as to construct an image detection model according to the teacher network obtained through training.
16. An image detection processing apparatus comprising:
The image acquisition module is configured to acquire at least two images to be detected;
the image detection module is configured to input the at least two images to be detected into an image detection model for image detection to obtain feature vectors; the image detection model is constructed based on a trained teacher network;
a probability calculation module configured to calculate a category matching probability of the at least two images to be detected based on the feature vector;
wherein the teacher network and the student network cooperate for training; the cooperation training comprises the following steps: performing category matching on the first sample image set and the second sample image set based on a teacher network to be trained to obtain sample category matching probability; based on the student network to be trained, parameter adjustment is carried out on the teacher network to be trained and the student network to be trained according to the sample category matching probability, the multi-feature set and the fusion feature set corresponding to the first sample image set and the second sample image set, so as to obtain the trained teacher network.
17. A training apparatus of an image detection model, comprising:
a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to:
Inputting the first sample image set and the second sample image set into a teacher network to be trained to perform category matching, and obtaining category matching probability; the first sample image set is composed of first sample images subjected to image category marking; the second sample image set consists of second sample images which are not marked by image categories;
inputting the first sample image set and the second sample image set into a student network to be trained to perform the following operations:
according to the similarity matrix corresponding to the first sample image set and the second sample image set, generating multiple characteristics and fusing the characteristics of each sample image to obtain multiple characteristic sets and fused characteristic sets;
and calculating training loss based on the category matching probability, the multi-element feature set and the fusion feature set, and carrying out parameter adjustment on the student network to be trained and the teacher network to be trained based on the training loss so as to construct an image detection model according to the teacher network obtained through training.
18. An image detection processing apparatus comprising:
a processor; and a memory configured to store computer-executable instructions that, when executed, cause the processor to:
Acquiring at least two images to be detected;
inputting the at least two images to be detected into an image detection model for image detection to obtain feature vectors; the image detection model is constructed based on a trained teacher network;
calculating the category matching probability of the at least two images to be detected based on the feature vector;
wherein the teacher network and the student network cooperate for training; the cooperation training comprises the following steps: performing category matching on the first sample image set and the second sample image set based on a teacher network to be trained to obtain sample category matching probability; based on the student network to be trained, parameter adjustment is carried out on the teacher network to be trained and the student network to be trained according to the sample category matching probability, the multi-feature set and the fusion feature set corresponding to the first sample image set and the second sample image set, so as to obtain the trained teacher network.
19. A storage medium storing computer-executable instructions that when executed by a processor implement the following:
inputting the first sample image set and the second sample image set into a teacher network to be trained to perform category matching, and obtaining category matching probability; the first sample image set is composed of first sample images subjected to image category marking; the second sample image set consists of second sample images which are not marked by image categories;
Inputting the first sample image set and the second sample image set into a student network to be trained to perform the following operations:
according to the similarity matrix corresponding to the first sample image set and the second sample image set, generating multiple characteristics and fusing the characteristics of each sample image to obtain multiple characteristic sets and fused characteristic sets;
and calculating training loss based on the category matching probability, the multi-element feature set and the fusion feature set, and carrying out parameter adjustment on the student network to be trained and the teacher network to be trained based on the training loss so as to construct an image detection model according to the teacher network obtained through training.
20. A storage medium storing computer-executable instructions that when executed by a processor implement the following:
acquiring at least two images to be detected;
inputting the at least two images to be detected into an image detection model for image detection to obtain feature vectors; the image detection model is constructed based on a trained teacher network;
calculating the category matching probability of the at least two images to be detected based on the feature vector;
wherein the teacher network and the student network cooperate for training; the cooperation training comprises the following steps: performing category matching on the first sample image set and the second sample image set based on a teacher network to be trained to obtain sample category matching probability; based on the student network to be trained, parameter adjustment is carried out on the teacher network to be trained and the student network to be trained according to the sample category matching probability, the multi-feature set and the fusion feature set corresponding to the first sample image set and the second sample image set, so as to obtain the trained teacher network.
CN202310491483.9A 2023-05-04 2023-05-04 Training method and device for image detection model Pending CN116563581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310491483.9A CN116563581A (en) 2023-05-04 2023-05-04 Training method and device for image detection model


Publications (1)

Publication Number Publication Date
CN116563581A (en) 2023-08-08

Family

ID=87487262


Country Status (1)

Country Link
CN (1) CN116563581A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination