CN112257808B - Integrated collaborative training method and device for zero sample classification and terminal equipment - Google Patents

Integrated collaborative training method and device for zero sample classification and terminal equipment

Info

Publication number
CN112257808B
CN112257808B, CN202011202927.5A
Authority
CN
China
Prior art keywords
invisible
class
attribute
network
training
Prior art date
Legal status
Active
Application number
CN202011202927.5A
Other languages
Chinese (zh)
Other versions
CN112257808A (en)
Inventor
郭毅博
范一鸣
王海迪
孟文化
姜晓恒
徐明亮
Current Assignee
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202011202927.5A priority Critical patent/CN112257808B/en
Publication of CN112257808A publication Critical patent/CN112257808A/en
Application granted granted Critical
Publication of CN112257808B publication Critical patent/CN112257808B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/259 Fusion by voting

Abstract

The invention relates to an integrated collaborative training method and device for zero sample classification and to terminal equipment. An acquired data set is divided into a training set and a test set, which are respectively called the visible class and the invisible class. Attribute prediction networks with different structures are trained, and two of them are selected as the main network and the secondary network. Attribute mapping parameters are calculated, and virtual features of the invisible classes are synthesized according to the attribute mapping parameters; the virtual features are combined with a plurality of classifiers to complete the training of the classifiers. The invisible class features are then extracted with the main and secondary networks and predicted with the classifiers; according to a classifier voting mechanism, pseudo labels are assigned to the invisible class samples that satisfy the conditions, and the pseudo-labeled samples are added to the training set to train the attribute prediction networks again, which improves the prediction accuracy of the network model. Meanwhile, different ZSL embedding methods can be used for training and for selecting the main and secondary networks, so the method is easily extended to other zero sample learning methods and improves their performance.

Description

Integrated collaborative training method and device for zero sample classification and terminal equipment
Technical Field
The invention relates to an integrated collaborative training method and device for zero sample classification and terminal equipment.
Background
Owing to the effectiveness of deep learning on image recognition problems, supervised image recognition methods have achieved remarkable results in many fields. However, a considerable number of labeled samples are usually needed to train a sufficiently good recognition model, and a model trained on known samples can only recognize the object classes contained in the training set; it lacks the ability to recognize object classes not contained in the training set. In real life, image data for some categories is scarce, the number of image categories to be recognized keeps growing, and retraining the model every time data of a new category is added is costly, so the image recognition field should not rely entirely on methods that require a large number of samples. The more challenging zero-sample learning is therefore proposed, which aims to recognize target instances from images of categories never seen before.
Early research on zero-sample learning dates back to 2008, when Larochelle H et al. applied a zero-data learning method to a character classification problem. Palatucci M et al. formally proposed the concept of zero-shot learning (ZSL) in the following year. In the same year, Lampert C H et al. proposed an attribute-based class transfer learning mechanism and the Animals with Attributes (AWA) data set, and first proposed Direct Attribute Prediction (DAP) and Indirect Attribute Prediction (IAP) with image recognition as the application scenario. Because this line of research differs from the way of thinking of the traditional image recognition task and meets the development needs of the image recognition field, zero-sample learning began to attract wide attention. In a zero-sample task, every category is provided with a relevant description, such as common attribute features shared among categories (color, wings, crawling, tail, and the like), and the mapping between images and category labels in the supervised image recognition problem is converted into mappings between images, semantics and categories.
In early ZSL methods such as DAP and IAP, the classifier for each attribute is trained independently, and the relationships between attributes within a class are not considered. Therefore, recent ZSL methods almost all design different constraint terms on image visual features or semantics to learn the mapping between image visual features and class embeddings, or construct a common embedding space for images and semantic attributes. For example, the SJE proposed by Akata Z et al. in 2015 completes compatibility modeling from the visual space to the semantic space by training a structured support vector machine; the EXEM proposed by Changpinyo S et al. in 2017 projects semantic information onto visual feature centers in the visual space; and the LDF proposed by Li Y et al. in 2018 constructs a latent feature embedding space to associate visual and semantic information. However, the final objective of ZSL is to predict object classes not contained in the training set. Since the same attribute in known and unknown classes often has different appearances (taking the tail as an example, a pig's tail differs greatly in appearance from the tails of animals such as tigers and zebras), a domain shift problem may arise, that is, the visual features corresponding to the same attribute may differ greatly between classes. When the network model is used to classify the test set, new classes that have never been seen are often classified into the known classes of the training set, resulting in poor prediction accuracy of the network model.
Disclosure of Invention
The invention aims to provide an integrated collaborative training method and device for zero sample classification and terminal equipment, so as to solve the problem that network models obtained by existing training methods have poor prediction accuracy.
In order to solve the problems, the invention adopts the following technical scheme:
an integrated collaborative training method for zero sample classification, comprising:
acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of the different networks to invisible classes;
calculating the mapping relation between the visible class attributes and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and a secondary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters;
the virtual features and the classifiers are combined to complete the training of the classifiers, the invisible features are extracted by using a main network and a secondary network, the invisible features are predicted by using the classifiers, the invisible classes meeting the conditions are endowed with pseudo labels according to a classifier voting mechanism, and the invisible classes endowed with the pseudo labels are added into a training set to train the attribute prediction network again.
Preferably, the dividing of the data set into a training set and a test set, which are respectively referred to as a visible class and an invisible class, includes:
dividing the data set into a training set and a test set, which are respectively called the visible class D^S and the invisible class D^U;
wherein the visible class samples are labeled, D^S = {(x_i^s, y_i^s)}, i = 1, …, l, where x_i^s denotes the i-th picture in the visible class data set D^S, y_i^s is the category label of x_i^s, and y_i^s ∈ Y^S; the invisible class samples are unlabeled, D^U = {x_j^u}, j = 1, …, n, with label space Y^U, where Y^U ∩ Y^S = ∅ and Y^U ∪ Y^S = Y; for each y ∈ Y, there is a semantic attribute A_y = {a_1, a_2, …, a_(l+n)} associated with it.
Preferably, the training of attribute prediction networks with different structures and the selection of two networks as a primary network and a secondary network according to the robustness and generalization capability of the different networks to invisible classes includes:
training attribute prediction networks of different structures, wherein an attribute prediction network consists of a feature extraction function θ(·) and a classification function φ_main, the feature extraction function θ(·) being given by formula (1) and the classification function φ_main by formula (2):
θ(x) = CNN(x; W_cnn)   (1)
φ_main(θ(x)) = W_main · θ(x)   (2)
wherein W_cnn denotes the parameters of the convolutional layers in the network, x denotes the input image sample, and W_main denotes the parameters of the fully connected layer of the network;
sending the visible classes into the attribute prediction network for training through formula (3), the optimizer being an adaptive moment estimation optimizer:
L = - Σ_i [ a_i · log σ(φ_main(θ(x_i))) + (1 - a_i) · log(1 - σ(φ_main(θ(x_i)))) ]   (3)
wherein σ denotes the sigmoid activation function and a_i is the attribute label of x_i;
the output of the network is the predicted invisible class attribute, which is input into formula (4) to predict the category of the invisible class:
ŷ = argmax_(c ∈ Y^U) cos(φ_pre, A_c^U)   (4)
wherein φ_pre denotes the semantic attribute predicted by the network and A_c^U denotes the real invisible class semantic attributes in the attribute library;
taking formula (5) as the evaluation index, and selecting the two networks with the highest evaluation index as the primary and secondary networks:
acc = (1/γ) · Σ_(c=1…γ) acc_c^(Top-1)   (5)
wherein acc_c^(Top-1) denotes the Top-1 accuracy of the c-th invisible class and γ denotes the total number of invisible classes in the test set.
Preferably, the calculating of the mapping relationship between visible class and invisible class attributes in the attribute library to obtain attribute mapping parameters, extracting image features of the visible classes with the primary and secondary networks respectively, and synthesizing virtual features of the invisible classes according to the attribute mapping parameters includes:
regularizing the attributes in the attribute library, and calculating the attribute mapping parameters W_map between the invisible classes and the visible classes through formula (6):
W_map = argmin_W ||A^U - W·A^S||² + λ·||W||²   (6)
wherein A^U denotes the attributes of the invisible classes and A^S denotes the attributes of the visible classes;
the visible class features are obtained as the class-wise prototype (mean) features extracted by the primary and secondary networks;
the virtual features of the invisible classes are obtained by applying the attribute mapping parameters to the visible class features, i.e. P^U = W_map · P^S.
Preferably, the combining of the virtual features with a plurality of classifiers to complete the training of the classifiers, extracting the invisible class features with the primary and secondary networks, predicting the invisible class features with the classifiers, assigning pseudo labels to the invisible classes that satisfy the conditions according to a classifier voting mechanism, and adding the pseudo-labeled invisible classes to the training set to train the attribute prediction network again includes:
training the classifiers according to the synthesized virtual features of the invisible classes and their corresponding labels, obtaining the predicted semantic attributes through formula (7), and predicting the invisible classes through formula (8):
φ_pre = F_classification(θ(x^u))   (7)
ŷ = argmax_(c ∈ Y^U) cos(φ_pre, A_c^U)   (8)
wherein F_classification denotes the classifier used by the network;
and according to the classifier voting mechanism, assigning pseudo labels to the invisible classes that satisfy the conditions, adding the pseudo-labeled invisible classes to the training set, and training the attribute prediction network again until training ends.
An integrated co-training apparatus for zero sample classification, comprising:
the data dividing module is used for acquiring a data set and an attribute library thereof, dividing the data set into a training set and a test set, and respectively calling the training set and the test set as a visible class and an invisible class;
the main network acquisition module and the auxiliary network acquisition module are used for training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of the different networks to invisible classes;
the invisible class virtual feature synthesis module is used for calculating the mapping relation between visible class attributes and invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image features of the visible class by using the main network and the auxiliary network, and synthesizing the invisible class virtual features according to the attribute mapping parameters;
and the network training module is used for combining the virtual features with the plurality of classifiers to complete the training of the classifiers, extracting invisible features by using the main and auxiliary networks, predicting the invisible features by using the classifiers, endowing the invisible classes meeting the conditions with pseudo labels according to a classifier voting mechanism, adding the invisible classes endowed with the pseudo labels into a training set, and training the attribute prediction network again.
A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the integrated co-training method for zero sample classification as described above when executing the computer program.
The invention has the beneficial effects that: the data set is divided into a training set and a test set, respectively called the visible class and the invisible class; attribute prediction networks with different structures are trained, and two of them are selected as the main and auxiliary networks according to the robustness and generalization capability of the different networks to the invisible classes; invisible class features are then synthesized by combining the attribute mapping parameters between the visible and invisible classes; finally, a plurality of classifiers trained on the synthesized features are used to assign pseudo labels to the invisible classes, and the pseudo-labeled samples are added to the training set to train the attribute prediction networks again, which improves the prediction accuracy of the network model.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings required to be used in the embodiments will be briefly described as follows:
fig. 1 is a schematic overall flowchart of an integrated collaborative training method for zero sample classification according to an embodiment of the present application;
FIG. 2 is a flowchart of an algorithm corresponding to the integrated co-training method for zero sample classification;
FIG. 3 is a graph showing the relationship between the number of training cycles and TOP-1 accuracy;
FIG. 4 is a schematic overall structure diagram of an integrated cooperative training apparatus for zero sample classification according to a second embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal device according to a third embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless otherwise specifically stated.
The integrated collaborative training method for zero sample classification provided by the embodiment of the application can be applied to terminal equipment such as a mobile phone, a tablet computer, a notebook computer and a personal computer, and the embodiment of the application does not limit the specific type of the terminal equipment. That is, the carrier of the client corresponding to the integrated collaborative training method for zero sample classification provided in the embodiment of the present application may be any one of the above terminal devices.
In order to explain the technical means described in the present application, the following description will be given by way of specific embodiments.
The first part is network selection: the fully connected layers of pre-trained convolutional neural networks with different structures are modified and used directly to learn the mapping between visual features and semantics, and two networks are selected as the main network and the auxiliary network according to the robustness and generalization capability of the different networks to the test samples. The second part is pseudo label prediction: first, features are extracted from the training classes and the test classes with the main and auxiliary networks, the mapping parameters between the semantics of the test class samples and the training class samples are calculated, and these parameters are combined with the training class features to generate virtual features of the test classes; then, different classifiers are constructed, the virtual features of the test classes and their corresponding labels are sent to the classifiers for training, the features extracted from the test classes are predicted, pseudo labels are assigned to the test class samples that satisfy the conditions according to a classifier voting mechanism, the pseudo-labeled test class samples are added to the training set, and the convolutional neural networks are trained again. Finally, the pseudo label prediction process is repeated until the network accuracy no longer changes obviously.
Referring to fig. 1, which is a flowchart of an implementation process of an integrated collaborative training method for zero sample classification according to an embodiment of the present application, for convenience of description, only a part related to the embodiment of the present application is shown.
The integrated collaborative training method for zero sample classification comprises the following steps:
step S101: acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class:
Acquiring a data set and its attribute library, and dividing the data set into a training set and a test set, for example according to a preset ratio; the training set and the test set are respectively called the visible class D^S and the invisible class D^U.
Wherein the visible class samples are labeled, D^S = {(x_i^s, y_i^s)}, i = 1, …, l, i.e. there are l labeled pictures in total, where x_i^s denotes the i-th picture in the visible class data set D^S and y_i^s ∈ Y^S is the category label of x_i^s; the invisible class samples are unlabeled, D^U = {x_j^u}, j = 1, …, n, i.e. there are n unlabeled pictures in total, with label space Y^U, where Y^U ∩ Y^S = ∅ and Y^U ∪ Y^S = Y. For each y ∈ Y, there is a semantic attribute A_y = {a_1, a_2, …, a_(l+n)} associated with it.
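Purely for illustration, the following minimal sketch shows one way the notation above could be represented in code; the class names, image shapes and attribute dimension are hypothetical examples rather than values prescribed by the method.

```python
import numpy as np

# Hypothetical illustration of step S101: a visible (seen) class set D_S with labels,
# an invisible (unseen) class set D_U without labels, and a semantic attribute library A_y.
rng = np.random.default_rng(0)

visible_classes = ["horse", "zebra", "tiger"]      # Y_S (example names only)
invisible_classes = ["pig", "whale"]               # Y_U, disjoint from Y_S
all_classes = visible_classes + invisible_classes  # Y = Y_S ∪ Y_U

n_attributes = 16                                  # size of each A_y (arbitrary here)
attribute_library = {y: rng.random(n_attributes) for y in all_classes}

# D_S: l labelled samples (x_i^s, y_i^s); random tensors stand in for images.
l = 30
D_S = [(rng.random((3, 64, 64)), rng.choice(visible_classes)) for _ in range(l)]

# D_U: n unlabelled samples x_j^u whose true classes come only from Y_U.
n = 10
D_U = [rng.random((3, 64, 64)) for _ in range(n)]

print(len(D_S), "labelled visible-class samples,", len(D_U), "unlabelled invisible-class samples")
```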
Step S102: training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of different networks to invisible classes:
Fig. 2 is a flowchart of the algorithm corresponding to the integrated collaborative training method for zero sample classification provided by the present invention. The algorithm starts from network selection, the purpose of which is to select two networks with strong robustness and generalization capability for the invisible classes to form a collaborative network; according to their performance, the two networks are called the main network and the auxiliary network. For the same data set, models learned by different network architectures produce different prediction distributions on the test set. Meanwhile, following the standard co-training rule, random noise is added to the training set to construct different data sets, and after the main and auxiliary networks are selected, the auxiliary network is retrained on the data set constructed with random noise; the assumption is that the data can be classified from different angles to achieve a complementary effect. For the embedding-based ZSL method, the training data needs to be sent into a convolutional neural network to obtain visual features, which are projected into the semantic space. Based on the idea of the co-training algorithm, by combining the noise-added data set, networks with different structures and semantic classifiers, and exploiting the correlation between global features and image semantics, the prediction error in embedding the visible classes is alleviated adaptively from different angles of the image, and the predicted invisible classes are added to the training set, thereby alleviating the domain shift problem caused by using only the visible classes during training.
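The "random noise is added to the training set to construct different data sets" idea can be sketched as follows; the Gaussian noise level and the toy image tensors are illustrative assumptions only.

```python
import numpy as np

def make_noisy_copy(images, noise_std=0.05, seed=0):
    """Return a noise-perturbed copy of the training images so that the auxiliary
    network can be retrained on a slightly different view of the same data,
    in the spirit of diversifying the two co-training views."""
    rng = np.random.default_rng(seed)
    noisy = []
    for img in images:
        noisy_img = img + rng.normal(0.0, noise_std, size=img.shape)
        noisy.append(np.clip(noisy_img, 0.0, 1.0))  # keep pixel values in a valid range
    return noisy

# Example: perturb a small batch of random "images" for the auxiliary network.
images = [np.random.default_rng(1).random((3, 64, 64)) for _ in range(4)]
noisy_images = make_noisy_copy(images)
print(len(noisy_images), "noisy copies created")
```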
The specific implementation of step S102 is given below:
Training attribute prediction networks of different structures, wherein an attribute prediction network consists of a feature extraction function θ(·) and a classification function φ_main, the feature extraction function θ(·) being given by formula (1) and the classification function φ_main by formula (2):
θ(x) = CNN(x; W_cnn)   (1)
φ_main(θ(x)) = W_main · θ(x)   (2)
wherein W_cnn denotes the parameters of the convolutional layers in the network, x denotes the input image sample, and W_main denotes the parameters of the fully connected layer of the network.
The visible classes are fed into the attribute prediction network and trained through formula (3). For the training problem of the network in this embodiment, the loss function is minimized over the labeled training samples (x_i, a_i), and different embedding methods have different loss functions; formula (3) uses a binary cross-entropy loss to update the network parameters, and the optimizer is an adaptive moment estimation (Adam) optimizer:
L = - Σ_i [ a_i · log σ(φ_main(θ(x_i))) + (1 - a_i) · log(1 - σ(φ_main(θ(x_i)))) ]   (3)
wherein σ denotes the sigmoid activation function and a_i is the attribute label of x_i.
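A minimal PyTorch-style sketch of this training step is given below, assuming a ResNet backbone plays the role of θ(·; W_cnn) and a single fully connected layer plays φ_main(·; W_main), with a binary cross-entropy loss and the Adam optimizer; the backbone choice, attribute dimension and hyperparameters are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn
from torchvision import models

class AttributePredictionNet(nn.Module):
    """θ(·): convolutional feature extractor; φ_main(·): fully connected head
    that maps visual features to the semantic attribute vector."""
    def __init__(self, n_attributes=16):
        super().__init__()
        backbone = models.resnet18()            # any CNN backbone can play θ here;
        in_features = backbone.fc.in_features   # pretrained weights would normally be loaded
        backbone.fc = nn.Identity()             # keep only the feature extractor θ(x)
        self.theta = backbone
        self.phi_main = nn.Linear(in_features, n_attributes)

    def forward(self, x):
        return self.phi_main(self.theta(x))     # raw attribute logits

net = AttributePredictionNet(n_attributes=16)
criterion = nn.BCEWithLogitsLoss()              # sigmoid + binary cross-entropy, formula (3)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

# One illustrative training step on a toy batch (random data stands in for D_S).
x = torch.rand(8, 3, 224, 224)
a = torch.rand(8, 16).round()                   # binary attribute labels a_i
loss = criterion(net(x), a)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```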
The output of the network is the predicted invisible class attribute, which is input into formula (4) to predict the category of the invisible class; in this embodiment, cosine similarity is used as the metric of closeness between the predicted semantics and the invisible class semantics:
ŷ = argmax_(c ∈ Y^U) cos(φ_pre, A_c^U)   (4)
wherein φ_pre denotes the semantic attribute predicted by the network and A_c^U denotes the real invisible class semantic attributes in the attribute library.
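A small numpy sketch of this cosine-similarity decision follows; the attribute matrix and the predicted attribute vector are random stand-ins used only for illustration.

```python
import numpy as np

def predict_class_by_cosine(phi_pre, unseen_attribute_matrix):
    """Return the index of the unseen class whose semantic attribute vector
    is closest (by cosine similarity) to the predicted attributes phi_pre."""
    a = unseen_attribute_matrix / np.linalg.norm(unseen_attribute_matrix, axis=1, keepdims=True)
    p = phi_pre / np.linalg.norm(phi_pre)
    return int(np.argmax(a @ p))

rng = np.random.default_rng(0)
A_U = rng.random((5, 16))     # 5 unseen classes x 16 attributes (arbitrary toy sizes)
phi_pre = rng.random(16)      # network-predicted semantic attributes
print("predicted unseen class index:", predict_class_by_cosine(phi_pre, A_U))
```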
Taking formula (5) as the evaluation index, the two networks with the highest evaluation index are selected as the main and auxiliary networks:
acc = (1/γ) · Σ_(c=1…γ) acc_c^(Top-1)   (5)
wherein acc_c^(Top-1) denotes the Top-1 accuracy of the c-th invisible class and γ denotes the total number of invisible classes in the test set.
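The sketch below illustrates the mean per-class Top-1 evaluation index of formula (5) and the selection of the two best candidate networks; the candidate scores at the end are placeholder values for illustration, not experimental results.

```python
import numpy as np

def mean_per_class_top1(y_true, y_pred):
    """acc = (1/γ) Σ_c acc_c^(Top-1): average the Top-1 accuracy over the γ
    invisible classes so that rare classes count as much as frequent ones."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(per_class))

def select_main_and_auxiliary(scores_by_network):
    """Return the names of the two candidate networks with the highest evaluation index."""
    ranked = sorted(scores_by_network, key=scores_by_network.get, reverse=True)
    return ranked[0], ranked[1]

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print("mean per-class Top-1:", mean_per_class_top1(y_true, y_pred))

# Placeholder evaluation scores for four candidate backbones (made-up example values).
scores = {"ResNet-101": 0.61, "ResNet-50": 0.58, "VGG-19": 0.52, "GoogLeNet": 0.50}
print("main, auxiliary =", select_main_and_auxiliary(scores))
```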
It should be understood that, in order to select networks with stronger robustness and generalization capability, as a specific embodiment, experiments are performed on the current mainstream convolutional neural networks VGG, GoogLeNet and ResNet as well as EfficientNet, the strongest network on ImageNet, and the two networks with the best performance are selected as the main and auxiliary networks. As shown in Table 1, which lists the average TOP-1 accuracy of different convolutional neural networks on different data sets, the ResNet series networks perform best.
TABLE 1: average TOP-1 accuracy of different convolutional neural networks on different data sets (values shown as an image in the original)
Step S103: calculating the mapping relation between the visible class attributes and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and an auxiliary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters:
Regularizing the attributes in the attribute library, and calculating the attribute mapping parameters W_map between the invisible classes and the visible classes through formula (6):
W_map = argmin_W ||A^U - W·A^S||² + λ·||W||²   (6)
wherein A^U denotes the attributes of the invisible classes and A^S denotes the attributes of the visible classes.
Prototype features are the central features of all samples of each class; based on the semantic information, virtual prototypes of the invisible classes are synthesized from the visible class prototype features and used as the input of the semantic classifiers in the proposed algorithm. The visible class prototype features (i.e. the visible class features) are obtained by averaging, for each visible class, the features θ(x) extracted by the main and auxiliary networks over all samples of that class.
The virtual prototype features of the invisible classes (i.e. the virtual features of the invisible classes) are obtained by ridge regression, i.e. by applying the attribute mapping parameters to the visible class prototype features: P^U = W_map · P^S.
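Under the assumption that the attribute mapping of formula (6) is a ridge-regression style linear map from visible class attributes to invisible class attributes, the synthesis of virtual prototypes can be sketched as follows; the regularization strength and matrix sizes are illustrative only.

```python
import numpy as np

def attribute_mapping(A_unseen, A_seen, lam=1.0):
    """Closed-form ridge solution of W ≈ argmin ||A_unseen - W A_seen||^2 + lam ||W||^2:
    W = A_unseen A_seen^T (A_seen A_seen^T + lam I)^(-1)."""
    gram = A_seen @ A_seen.T + lam * np.eye(A_seen.shape[0])
    return A_unseen @ A_seen.T @ np.linalg.inv(gram)

def class_prototypes(features, labels, classes):
    """Prototype feature of each class = mean feature of its samples."""
    return np.stack([features[labels == c].mean(axis=0) for c in classes])

rng = np.random.default_rng(0)
A_seen = rng.random((3, 16))     # 3 visible classes x 16 attributes (toy sizes)
A_unseen = rng.random((2, 16))   # 2 invisible classes x 16 attributes

feats = rng.random((30, 512))    # features θ(x) extracted by the main/auxiliary network
labels = rng.integers(0, 3, 30)  # visible-class labels for those samples

P_seen = class_prototypes(feats, labels, classes=[0, 1, 2])
W_map = attribute_mapping(A_unseen, A_seen)
P_unseen_virtual = W_map @ P_seen   # invisible-class virtual prototype features
print(P_unseen_virtual.shape)       # (2, 512)
```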
step S104: the virtual features and the classifiers are combined to complete the training of the classifiers, the invisible features are extracted by using a main network and a secondary network, the invisible features are predicted by using the classifiers, pseudo labels are given to the invisible classes meeting the conditions according to a classifier voting mechanism, the invisible classes given the pseudo labels are added into a training set to train the attribute prediction network again:
The virtual features are combined with a plurality of classifiers to complete the training of the classifiers, and the invisible class features are extracted with the main and auxiliary networks and predicted with the classifiers: the synthesized invisible class virtual features P^U and their corresponding labels are fed into the classifiers, and the classifiers are trained. In this embodiment, current mainstream classifiers such as LASSO regression, ridge regression, Bayesian ridge regression, linear regression, support vector machines and random forests can be considered.
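A hedged scikit-learn sketch of training such a classifier bank on the synthesized virtual prototypes is shown below, with the regressors mapping features to semantic attribute vectors (formula (7)); all data are random toy values, and the particular regressors kept (LASSO, ridge, Bayesian ridge) follow the choice discussed around Table 2.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, BayesianRidge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
P_virtual = rng.random((20, 512))   # synthesized invisible-class virtual features
A_virtual = rng.random((20, 16))    # their corresponding semantic attribute targets

# One bank of attribute regressors per network (main and auxiliary); BayesianRidge
# is single-output, so it is wrapped to predict the whole attribute vector.
classifier_bank = {
    "lasso": MultiOutputRegressor(Lasso(alpha=0.01)),
    "ridge": Ridge(alpha=1.0),
    "bayesian_ridge": MultiOutputRegressor(BayesianRidge()),
}
for name, clf in classifier_bank.items():
    clf.fit(P_virtual, A_virtual)

# Formula (7): predicted semantic attributes for one unseen-class feature vector.
theta_xu = rng.random((1, 512))
phi_pre = {name: clf.predict(theta_xu)[0] for name, clf in classifier_bank.items()}
print({name: p.shape for name, p in phi_pre.items()})
```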
The predicted semantic attributes are obtained by formula (7):
φ_pre = F_classification(θ(x^u))   (7)
wherein F_classification denotes the classifier used by the network. Considering the samples from multiple angles improves generalization performance and reduces the risk of a single classifier falling into a local minimum. The invisible classes are then predicted by formula (8):
ŷ = argmax_(c ∈ Y^U) cos(φ_pre, A_c^U)   (8)
As a specific embodiment, under certain hardware conditions, for example with an i7-8700K CPU, the average TOP-1 accuracy and training time of different classifiers on the CUB data set are counted, as shown in Table 2. Considering both accuracy and time, LASSO regression, ridge regression and Bayesian ridge regression can be selected as the classifiers.
TABLE 2: average TOP-1 accuracy and training time of different classifiers on the CUB data set (values shown as an image in the original)
Then, according to the classifier voting mechanism, pseudo labels are assigned to the invisible class samples that satisfy the conditions. The general classifier voting rule means that several classifiers often give different prediction results, so a voting classifier is built on top of these base classifiers and the class with the most votes is taken as the predicted class. Here, the main and auxiliary networks each have three corresponding classifiers, and the following voting rule is formulated: the best 5 prediction results are selected from the classifier prediction results according to accuracy; it is then judged whether the number of invisible class samples labeled under a 4-vote rule is more than half of the total number of invisible class samples; if so, the 4-vote rule is adopted, otherwise a 3-vote rule is adopted to assign pseudo labels to D^U, and the 3-vote rule is adopted once the fourth loop is reached. The accuracy of the network model on the SUN data set is counted under different vote numbers Z, as shown in FIG. 3; it can be seen that the accuracy no longer increases in the third cycle, and, considering this trend, the network model reaches its best performance under this labeling rule.
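A simplified sketch of the voting idea follows: each classifier votes for an invisible class, and a sample only receives a pseudo label when at least a minimum number of votes agree. This is a simplified stand-in for the 5-best / 4-vote / 3-vote rule described above, not its exact implementation.

```python
from collections import Counter

def vote_pseudo_label(votes, min_votes):
    """Return the majority class if it received at least `min_votes` votes,
    otherwise None (the sample is left unlabeled in this cycle)."""
    cls, count = Counter(votes).most_common(1)[0]
    return cls if count >= min_votes else None

# Each row: the class indices predicted for one unseen sample by the classifiers
# of the main and auxiliary networks (toy values for illustration only).
all_votes = [
    [1, 1, 1, 2, 1],   # strong agreement  -> pseudo label 1
    [0, 1, 2, 3, 4],   # no agreement      -> no pseudo label this cycle
]
for votes in all_votes:
    print(votes, "->", vote_pseudo_label(votes, min_votes=3))
```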
After the invisible class samples are labeled, they are added to the training set and the network is trained again until the accuracy no longer changes obviously (generally, the accuracy no longer changes obviously by the 5th cycle); that is, the invisible class samples assigned pseudo labels are added to the training set and the attribute prediction networks are retrained (steps S103 and S104 are repeated) until training ends (for example, the accuracy no longer changes obviously, or the number of cycles reaches a preset number).
Table 3 compares the experimental results of the method provided in the present application with other existing methods; the LFGAA proposed by Liu Y et al. in 2019 is also introduced as the method used to train the networks in the network selection module. It can be seen that the performance of the method provided in the present application is improved compared with other existing methods.
TABLE 3: comparison of experimental results with existing methods (values shown as an image in the original)
In Table 3, y_u denotes the average Top-1 accuracy on the invisible class samples, y_s denotes the average Top-1 accuracy on the tested visible class samples, and "-" indicates that the prior art did not disclose this result.
As can be seen from the above steps, this embodiment makes use of invisible class virtual prototype features synthesized from semantic information. Although these virtual prototype features cannot be guaranteed to be identical to real images, reliable invisible class labels are obtained from them and put into the training set to retrain the network, so the prediction error in embedding the visible classes can be overcome and the domain shift problem caused by training only with the visible classes is further alleviated.
Fig. 4 shows a structural block diagram of an integrated cooperative training apparatus for zero sample classification provided in embodiment two of the present application, and for convenience of description, only the parts related to the embodiment of the present application are shown.
Referring to fig. 4, an integrated co-training apparatus 200 for zero sample classification includes:
the data dividing module 201 is configured to obtain a data set and an attribute library thereof, divide the data set into a training set and a test set, and respectively refer to the training set and the test set as a visible class and an invisible class;
the primary and secondary network acquisition module 202 is used for training attribute prediction networks with different structures, and selecting two networks as primary and secondary networks according to the robustness and generalization capability of the different networks to invisible classes;
the invisible class virtual feature synthesis module 203 is used for calculating the mapping relationship between the visible class and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image features of the visible class by using the main and auxiliary networks, and synthesizing the invisible class virtual features according to the attribute mapping parameters;
the network training module 204 is configured to combine the virtual features with multiple classifiers to complete training of the classifiers, extract invisible features using the primary and secondary networks, predict the invisible features using the classifiers, assign pseudo labels to the invisible classes meeting the conditions according to a classifier voting mechanism, add the invisible classes assigned with the pseudo labels into a training set, and train the attribute prediction network again.
It should be noted that, since the above devices/modules are based on the same concept as the method embodiment, the information interaction and execution process between them, as well as their specific functions and technical effects, can be found in the section on the embodiment of the integrated collaborative training method for zero sample classification and are not described herein again.
It is clear to those skilled in the art that, for the convenience and simplicity of description, the above division of the functional modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the integrated cooperative training apparatus for zero sample classification 200 is divided into different functional modules to perform all or part of the above described functions. Each functional module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional modules are only used for distinguishing one functional module from another, and are not used for limiting the protection scope of the application. The specific working process of each functional module in the above description may refer to the corresponding process in the foregoing embodiment of the integrated collaborative training method for zero sample classification, and is not described herein again.
Fig. 5 is a schematic structural diagram of a terminal device according to a third embodiment of the present application. As shown in fig. 5, the terminal device 300 includes: a processor 302, a memory 301, and a computer program 303 stored in the memory 301 and operable on the processor 302. The number of the processors 302 is at least one, and fig. 5 takes one as an example. The processor 302, when executing the computer program 303, implements the implementation steps of the above-described integrated co-training method for zero sample classification, i.e., the steps shown in fig. 1.
The specific implementation process of the terminal device 300 can be seen in the above embodiment of the integrated collaborative training method for zero sample classification.
Illustratively, the computer program 303 may be partitioned into one or more modules/units that are stored in the memory 301 and executed by the processor 302 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 303 in the terminal device 300.
The terminal device 300 may be a computing device such as a desktop computer, a notebook, a palm computer, a main control, or a mobile terminal such as a mobile phone. Terminal device 300 may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 5 is only an example of the terminal device 300 and does not constitute a limitation of the terminal device 300, and may include more or less components than those shown, or combine some of the components, or different components, for example, the terminal device 300 may further include input and output devices, network access devices, buses, etc.
The Processor 302 may be a CPU (Central Processing Unit), other general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 301 may be an internal storage unit of the terminal device 300, such as a hard disk or a memory. The memory 301 may also be an external storage device of the terminal device 300, such as a plug-in hard disk, SMC (Smart Media Card), SD (Secure Digital Card), flash Card, or the like provided on the terminal device 300. Further, the memory 301 may also include both an internal storage unit of the terminal device 300 and an external storage device. The memory 301 is used for storing an operating system, application programs, a boot loader, data, and other programs, such as program codes of the computer program 303. The memory 301 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program may implement the steps in the above embodiment of the integrated collaborative training method for zero sample classification.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the embodiment of the integrated collaborative training method for zero sample classification described above may be implemented by instructing relevant hardware by a computer program, and the computer program 303 may be stored in a computer-readable storage medium, and when being executed by the processor 302, the computer program 303 may implement the steps of the embodiment of the integrated collaborative training method for zero sample classification described above. Wherein the computer program 303 comprises computer program code, and the computer program 303 code may be in a source code form, an object code form, an executable file or some intermediate form, and the like. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer Memory, ROM (Read-Only Memory), RAM (Random Access Memory), electrical carrier wave signal, telecommunication signal, and software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc. In some jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and proprietary practices.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one type of logical function division, and other division manners may be available in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (4)

1. An integrated collaborative training method for zero sample classification, comprising:
acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of different networks to invisible classes;
calculating the mapping relation between the visible class attributes and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and a secondary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters;
combining the virtual features with a plurality of classifiers to complete the training of the classifiers, extracting invisible features by using a main network and a secondary network, predicting the invisible features by using the classifiers, giving pseudo labels to the invisible classes meeting conditions according to a classifier voting mechanism, adding the invisible classes given with the pseudo labels into a training set, and re-training an attribute prediction network;
the dividing of the data set into a training set and a test set, which are respectively referred to as a visible class and an invisible class, includes:
dividing the data set into a training set and a test set, which are respectively called the visible class D^S and the invisible class D^U;
wherein the visible class samples are labeled, D^S = {(x_i^s, y_i^s)}, i = 1, …, l, where x_i^s denotes the i-th picture in the visible class data set D^S, y_i^s is the category label of x_i^s, and y_i^s ∈ Y^S; the invisible class samples are unlabeled, D^U = {x_j^u}, j = 1, …, n, with label space Y^U, where Y^U ∩ Y^S = ∅ and Y^U ∪ Y^S = Y; for each y ∈ Y, there is a semantic attribute A_y = {a_1, a_2, …, a_(l+n)} associated with it;
the training of attribute prediction networks with different structures and the selection of two networks as a main network and an auxiliary network according to the robustness and generalization capability of the different networks to invisible classes includes:
training attribute prediction networks of different structures, wherein an attribute prediction network consists of a feature extraction function θ(·) and a classification function φ_main, the feature extraction function θ(·) being given by formula (1) and the classification function φ_main by formula (2):
θ(x) = CNN(x; W_cnn)   (1)
φ_main(θ(x)) = W_main · θ(x)   (2)
wherein W_cnn denotes the parameters of the convolutional layers in the network, x denotes the input image sample, and W_main denotes the parameters of the fully connected layer of the network;
sending the visible classes into the attribute prediction network for training through formula (3), the optimizer being an adaptive moment estimation optimizer:
L = - Σ_i [ a_i · log σ(φ_main(θ(x_i))) + (1 - a_i) · log(1 - σ(φ_main(θ(x_i)))) ]   (3)
wherein σ denotes the sigmoid activation function and a_i is the attribute label of x_i;
the output of the network is the predicted invisible class attribute, which is input into formula (4) to predict the category of the invisible class:
ŷ = argmax_(c ∈ Y^U) cos(φ_pre, A_c^U)   (4)
wherein φ_pre denotes the semantic attribute predicted by the network and A_c^U denotes the real invisible class semantic attributes in the attribute library;
taking formula (5) as the evaluation index, and selecting the two networks with the highest evaluation index as the main network and the auxiliary network:
acc = (1/γ) · Σ_(c=1…γ) acc_c^(Top-1)   (5)
wherein acc_c^(Top-1) denotes the Top-1 accuracy of the c-th invisible class and γ denotes the total number of invisible classes in the test set;
the method comprises the following steps of calculating the mapping relation between attributes of a visible class and an invisible class in an attribute library to obtain attribute mapping parameters, respectively extracting image features of the visible class by using a main network and a secondary network, and synthesizing virtual features of the invisible class according to the attribute mapping parameters, and comprises the following steps:
regularizing the attributes in the attribute library, and calculating the attribute mapping parameters between the invisible classes and the visible classes through a formula (6)
Figure FDA0003878169950000029
Figure FDA0003878169950000031
Wherein the content of the first and second substances,
Figure FDA0003878169950000032
an attribute representing an invisible class is provided,
Figure FDA0003878169950000033
an attribute representing a visible class;
visible type feature usage
Figure FDA0003878169950000034
Obtaining;
virtual feature adoption of invisible classes
Figure FDA0003878169950000035
In a manner described above.
2. The integrated collaborative training method for zero sample classification as claimed in claim 1, wherein the training of the classifier is completed by combining the virtual features with a plurality of classifiers, the invisible class features are extracted by using a primary and secondary network, the invisible class features are predicted by using the classifiers, the invisible class meeting the conditions is assigned with a pseudo label according to a classifier voting mechanism, and the invisible class assigned with the pseudo label is added into a training set to train the attribute prediction network again, including:
training the classifiers according to the synthesized virtual features of the invisible classes and their corresponding labels, obtaining the predicted semantic attributes through formula (7), and predicting the invisible classes through formula (8):
φ_pre = F_classification(θ(x^u))   (7)
ŷ = argmax_(c ∈ Y^U) cos(φ_pre, A_c^U)   (8)
wherein F_classification denotes the classifier used by the network;
and according to a classifier voting mechanism, giving a pseudo label to the invisible class meeting the conditions, adding the invisible class given with the pseudo label into a training set, and training the attribute prediction network again until the training is finished.
3. An integrated co-training apparatus for zero sample classification, comprising:
the data dividing module is used for acquiring the data set and the attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
the main and auxiliary network acquisition module is used for training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of the different networks to invisible classes;
the invisible class virtual feature synthesis module is used for calculating the mapping relation between visible class attributes and invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image features of the visible class by using the main network and the auxiliary network, and synthesizing the invisible class virtual features according to the attribute mapping parameters;
the network training module is used for combining the virtual features with the plurality of classifiers to complete the training of the classifiers, extracting invisible features by using a main network and a secondary network, predicting the invisible features by using the classifiers, endowing the invisible classes meeting the conditions with pseudo labels according to a classifier voting mechanism, adding the invisible classes endowed with the pseudo labels into a training set, and training the attribute prediction network again;
the dividing of the data set into a training set and a test set, which are respectively called the visible classes and the invisible classes, includes:
dividing the data set into a training set and a test set, which are respectively called the visible class D_S and the invisible class D_U;
wherein the visible-class samples and their labels form the set shown in the corresponding expression [formula images not reproduced], in which the i-th picture of the visible-class data set D_S is paired with its category label; the invisible-class samples and their labels form the corresponding set [formula images not reproduced]; Y_U ∪ Y_S = Y; and for each y ∈ Y there is an associated semantic attribute A_y = {a_1, a_2, ..., a_{l+n}};
the training of attribute prediction networks with different structures and the selection of two networks as the main network and the auxiliary network according to the robustness and generalization capability of the different networks on the invisible classes includes:
training attribute prediction networks of different structures, wherein each attribute prediction network consists of a feature extraction function and a classification function φ_main, the feature extraction function being formula (1) and the classification function φ_main being formula (2) [formula images not reproduced], wherein W_cnn denotes the parameters of the convolutional layers in the network, x denotes an input image sample, and W_main denotes the parameters of the fully connected layer of the network;
sending the visible classes into the attribute prediction network for training through formula (3) [formula image not reproduced], the optimizer being an adaptive moment estimation (Adam) optimizer, wherein σ denotes the sigmoid activation function and a_i is the attribute label of x_i;
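Formulas (1) to (3) appear only as images in the original filing. One plausible reading, sketched below in PyTorch-style Python, is a convolutional feature extractor (parameters W_cnn) followed by a fully connected attribute head (parameters W_main) with a sigmoid activation, trained with a binary cross-entropy style objective and the Adam optimizer; the class and function names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttributePredictor(nn.Module):
    """Sketch of an attribute prediction network: convolutional feature
    extraction (assumed formula (1)) plus a fully connected layer mapping
    features to semantic-attribute scores (assumed formulas (2)-(3))."""
    def __init__(self, backbone, feat_dim, attr_dim):
        super().__init__()
        self.backbone = backbone                 # convolutional layers, parameters W_cnn
        self.fc = nn.Linear(feat_dim, attr_dim)  # fully connected layer, parameters W_main

    def forward(self, x):
        features = self.backbone(x)
        return torch.sigmoid(self.fc(features))  # sigmoid activation σ on the attribute scores

def train_step(model, optimizer, images, attribute_labels):
    """One optimization step with the adaptive moment estimation (Adam) optimizer."""
    optimizer.zero_grad()
    predicted = model(images)
    loss = nn.functional.binary_cross_entropy(predicted, attribute_labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```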
the output of the network is the predicted invisible-class attributes, and the predicted invisible-class attributes are input into formula (4) to predict the category of the invisible class [formula image not reproduced], wherein φ_pre denotes the semantic attribute predicted by the network and the other term denotes the real invisible-class semantic attributes in the attribute library;
taking formula (5) as the evaluation index [formula image not reproduced], and selecting the two networks with the highest evaluation index as the main network and the auxiliary network, wherein the per-class term denotes the top-1 accuracy of the c-th invisible category and γ denotes the total number of invisible categories in the test set;
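Formulas (4) and (5) are likewise shown only as images. Under the surrounding definitions, a natural reading, sketched here as an assumption rather than the patent's exact formulation, assigns each sample to the invisible class whose library attribute vector is nearest to the predicted attributes φ_pre, and ranks candidate networks by their mean per-class top-1 accuracy over the γ invisible categories; the distance measure and function names are illustrative.

```python
import numpy as np

def predict_unseen_class(pred_attrs, class_attrs):
    """Assumed reading of formula (4): pick, for each sample, the invisible
    class whose semantic attribute vector is closest to the predicted one.
    pred_attrs: (n_samples, attr_dim); class_attrs: (n_unseen, attr_dim).
    Returned labels index the rows of class_attrs."""
    dists = np.linalg.norm(pred_attrs[:, None, :] - class_attrs[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def mean_per_class_top1(y_true, y_pred):
    """Assumed reading of formula (5): average the top-1 accuracy over the
    invisible categories present in the test set, so each class counts equally."""
    classes = np.unique(y_true)
    per_class = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(per_class))
```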
the calculating of the mapping relation between the attributes of the visible classes and the attributes of the invisible classes in the attribute library to obtain attribute mapping parameters, the respectively extracting of the image features of the visible classes by using the main network and the auxiliary network, and the synthesizing of the virtual features of the invisible classes according to the attribute mapping parameters includes:
regularizing the attributes in the attribute library, and calculating the attribute mapping parameters between the invisible classes and the visible classes through formula (6) [formula image not reproduced], wherein the two attribute terms in formula (6) denote the attributes of the invisible classes and the attributes of the visible classes, respectively;
the features of the visible classes are obtained with the corresponding feature expression, and the virtual features of the invisible classes are obtained with the corresponding synthesis expression [both shown only as images in the original filing].
4. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the integrated co-training method for zero sample classification according to any of claims 1-2 when executing the computer program.
CN202011202927.5A 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment Active CN112257808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011202927.5A CN112257808B (en) 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011202927.5A CN112257808B (en) 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment

Publications (2)

Publication Number Publication Date
CN112257808A CN112257808A (en) 2021-01-22
CN112257808B true CN112257808B (en) 2022-11-11

Family

ID=74267569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011202927.5A Active CN112257808B (en) 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment

Country Status (1)

Country Link
CN (1) CN112257808B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949688A (en) * 2021-02-01 2021-06-11 哈尔滨市科佳通用机电股份有限公司 Motor train unit bottom plate rubber damage fault detection method, system and device
CN113283514A (en) * 2021-05-31 2021-08-20 高新兴科技集团股份有限公司 Unknown class classification method, device and medium based on deep learning
CN113688879A (en) * 2021-07-30 2021-11-23 南京理工大学 Generalized zero sample learning classification method based on confidence degree distribution external detection
CN113807420B (en) * 2021-09-06 2024-03-19 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN114005005B (en) * 2021-12-30 2022-03-22 深圳佑驾创新科技有限公司 Double-batch standardized zero-instance image classification method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN110826638A (en) * 2019-11-12 2020-02-21 福州大学 Zero sample image classification model based on repeated attention network and method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10908616B2 (en) * 2017-05-05 2021-02-02 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Global Semantic Consistency for Zero-Shot Learning;Fan Wu et al.;《https://arxiv.org/abs/1806.08503》;20180625;第1-18页 *
Zero-shot multi-label image classification based on deep ranking learning; Li Huihui; China Master's Theses Full-text Database (Information Science and Technology); 20200615; Vol. 2020, No. 06; pp. I138-960 *

Also Published As

Publication number Publication date
CN112257808A (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN112257808B (en) Integrated collaborative training method and device for zero sample classification and terminal equipment
EP3598342B1 (en) Method and device for identifying object
Geman et al. Visual turing test for computer vision systems
CN111797893B (en) Neural network training method, image classification system and related equipment
WO2019100724A1 (en) Method and device for training multi-label classification model
Kao et al. Visual aesthetic quality assessment with a regression model
CN103268317B (en) Image is carried out the system and method for semantic annotations
CN110827129B (en) Commodity recommendation method and device
Yang et al. Benchmarking commercial emotion detection systems using realistic distortions of facial image datasets
CN109284675B (en) User identification method, device and equipment
US20230022387A1 (en) Method and apparatus for image segmentation model training and for image segmentation
JP2017062781A (en) Similarity-based detection of prominent objects using deep cnn pooling layers as features
CN110245714B (en) Image recognition method and device and electronic equipment
CN110377733B (en) Text-based emotion recognition method, terminal equipment and medium
CN115443490A (en) Image auditing method and device, equipment and storage medium
WO2020023760A1 (en) System and method for clustering products by combining attribute data with image recognition
Karaoglu et al. Detect2rank: Combining object detectors using learning to rank
Crabbé et al. Label-free explainability for unsupervised models
JP2022014776A (en) Activity detection device, activity detection system, and activity detection method
CN111325237A (en) Image identification method based on attention interaction mechanism
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN114419378B (en) Image classification method and device, electronic equipment and medium
CN113657087B (en) Information matching method and device
CN113837257A (en) Target detection method and device
CN109885745A (en) A kind of user draws a portrait method, apparatus, readable storage medium storing program for executing and terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant