CN112257808A - Integrated collaborative training method and device for zero sample classification and terminal equipment - Google Patents

Integrated collaborative training method and device for zero sample classification and terminal equipment

Info

Publication number
CN112257808A
Authority
CN
China
Prior art keywords
invisible
class
training
attribute
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011202927.5A
Other languages
Chinese (zh)
Other versions
CN112257808B (en)
Inventor
郭毅博
范一鸣
王海迪
孟文化
姜晓恒
徐明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202011202927.5A priority Critical patent/CN112257808B/en
Publication of CN112257808A publication Critical patent/CN112257808A/en
Application granted granted Critical
Publication of CN112257808B publication Critical patent/CN112257808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/259Fusion by voting

Abstract

The invention relates to an integrated collaborative training method and device for zero sample classification, and to terminal equipment. The obtained data set is divided into a training set and a test set, respectively called the visible class and the invisible class; attribute prediction networks with different structures are trained, and two of them are selected as a main network and an auxiliary network; attribute mapping parameters between the visible-class and invisible-class attributes are calculated, and virtual features of the invisible classes are synthesized according to these parameters; the virtual features are combined with a plurality of classifiers to complete the training of the classifiers; the invisible-class features are extracted with the main and auxiliary networks and predicted with the classifiers; pseudo labels are assigned, according to a classifier voting mechanism, to the invisible-class samples that satisfy the conditions, and the pseudo-labelled invisible classes are added to the training set to train the attribute prediction networks again, which improves the prediction accuracy of the network model. At the same time, different ZSL embedding methods can be used for training when selecting the main and auxiliary networks, so the method is easily extended to other zero-sample learning methods and improves their performance.

Description

Integrated collaborative training method and device for zero sample classification and terminal equipment
Technical Field
The invention relates to an integrated collaborative training method and device for zero sample classification and terminal equipment.
Background
Owing to the effectiveness of deep learning on image recognition problems, supervised image recognition methods have achieved remarkable results in many fields. However, a considerable number of labelled samples is usually needed to train a sufficiently good network recognition model, and a model trained on known samples can only recognize the object classes contained in the training set; it lacks the ability to recognize object classes that are not contained in the training set. In real life, image data for some categories is scarce, the image categories to be recognized keep increasing, and the cost of retraining the model is high every time data of a different category is added, so the image recognition field should not depend entirely on methods that need a large number of samples. For this reason the more challenging zero-sample learning has been proposed, which aims to recognize target instances from new-category images that have never been seen.
Early research on zero-sample learning dates back to 2008, when Larochelle et al. applied a zero-data learning method to a character classification problem; Palatucci et al. then formally proposed the concept of zero-shot learning (ZSL). In the same period, Lampert et al. proposed an attribute-based class transfer learning mechanism and the Animals with Attributes (AwA) data set, and first proposed Direct Attribute Prediction (DAP) and Indirect Attribute Prediction (IAP) with image recognition as the application scenario. Because this research differs from the way of thinking of the traditional image recognition task and matches the development needs of the image recognition field, zero-sample learning began to attract wide attention. In the zero-sample task, every category is provided with a relevant description, such as attribute features shared among categories (colour, wings, crawling, tail and the like), and the mapping between images and category labels in the supervised image recognition problem is converted into a mapping between images, semantics and categories.
In the early ZSL methods, the classifiers of each attribute in DAP and IAP are trained independently, and the relationships between attributes within a class are not considered. Recent ZSL methods therefore almost all design different constraint terms on the image visual features or semantics to learn the mapping between image visual features and class embeddings, or construct a common embedding space for images and their semantic attributes. For example, SJE, proposed by Akata et al. in 2015, completes compatibility modelling from the visual space to the semantic space by training a structured support vector machine; EXEM, proposed by Changpinyo et al. in 2017, projects semantic information onto visual feature centres in the visual space; and LDF, proposed by Li et al. in 2018, constructs a latent feature embedding space to associate visual and semantic information. However, the final objective of ZSL is to predict object classes not contained in the training set. Since the same attribute in a known class and an unknown class often has different appearances (for the attribute "tail", for example, a pig's tail differs greatly in appearance from the tails of animals such as tigers and zebras), a domain shift problem can arise, that is, the corresponding visual features of the same attribute can differ greatly between classes. When the network model is used to classify the test set, the new classes that have never been seen are often classified into the known classes of the training set, resulting in poor prediction accuracy of the network model.
Disclosure of Invention
The invention aims to provide an integrated collaborative training method, an integrated collaborative training device and terminal equipment for zero sample classification, and aims to solve the problem that a network model obtained by the existing training method is poor in prediction accuracy.
In order to solve the problems, the invention adopts the following technical scheme:
an integrated collaborative training method for zero sample classification, comprising:
acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of different networks to invisible classes;
calculating the mapping relation between the visible class attributes and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and a secondary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters;
the virtual features and the classifiers are combined to complete the training of the classifiers, the invisible features are extracted by using a main network and a secondary network, the invisible features are predicted by using the classifiers, the invisible classes meeting the conditions are endowed with pseudo labels according to a classifier voting mechanism, and the invisible classes endowed with the pseudo labels are added into a training set to train the attribute prediction network again.
Preferably, the dividing the data set into a training set and a test set, and the training set and the test set being respectively referred to as a visible class and an invisible class includes:
the data set is divided into a training set and a test set, and the training set and the test set are respectively called a visible class DSAnd invisible class DU
Wherein the visible sample label is
Figure BDA0002756017490000031
Figure BDA0002756017490000032
Representing a visible class data set DSThe picture of (a) is the ith picture,
Figure BDA0002756017490000033
is composed of
Figure BDA0002756017490000034
The category label of (a) is set,
Figure BDA0002756017490000035
invisible class sample label is
Figure BDA0002756017490000036
YU∪YSY; for each Y ∈ Y, there is storedAt semantic attribute A associated therewithy={a1,a2…,al+n}。
Preferably, the training of the attribute prediction networks with different structures selects two networks as a primary network and a secondary network according to the robustness and generalization capability of the different networks to invisible classes, and includes:
training attribute prediction networks of different structures, wherein each attribute prediction network consists of a feature extraction function $\theta(\cdot)$ and a classification function $\phi_{main}$, the feature extraction function $\theta(\cdot)$ being given by equation (1) and the classification function $\phi_{main}$ by equation (2):

$\theta(x)=f(x;W_{cnn})$  (1)

$\phi_{main}(x)=W_{main}^{\mathsf T}\,\theta(x)$  (2)

wherein $W_{cnn}$ represents the parameters of the convolutional layers in the network, $x$ represents an input image sample, and $W_{main}$ represents the parameters of the fully connected layer of the network;

feeding the visible classes into the attribute prediction network for training through equation (3), the optimizer being an adaptive moment estimation optimizer:

$\mathcal{L}=-\dfrac{1}{l}\sum_{i=1}^{l}\bigl[a_i\log\sigma(\phi_{main}(x_i))+(1-a_i)\log\bigl(1-\sigma(\phi_{main}(x_i))\bigr)\bigr]$  (3)

wherein $\sigma$ represents the sigmoid activation function and $a_i$ is the attribute label of $x_i$;

the output of the network is the predicted invisible-class attribute, which is input into equation (4) to predict the class of the invisible-class sample:

$\hat{y}=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (4)

wherein $\phi_{pre}$ represents the semantic attribute predicted by the network and $A_c^u$ represents the real invisible-class semantic attribute of class $c$ in the attribute library;

taking equation (5) as the evaluation index and selecting the two networks with the highest evaluation index as the main and auxiliary networks:

$acc=\dfrac{1}{\gamma}\sum_{c=1}^{\gamma}acc_c^{u}$  (5)

wherein $acc_c^{u}$ denotes the Top-1 accuracy of the $c$-th invisible class and $\gamma$ denotes the total number of invisible classes in the test set.
Preferably, the calculating a mapping relationship between attributes of a visible class and an invisible class in an attribute library to obtain attribute mapping parameters, respectively extracting image features of the visible class by using a primary network and a secondary network, and synthesizing virtual features of the invisible class according to the attribute mapping parameters includes:
regularizing the attributes in the attribute library, and calculating the attribute mapping parameter $\beta$ between the invisible classes and the visible classes by equation (6):

$\beta=\arg\min_{\beta}\;\bigl\|A^u-\beta A^s\bigr\|^2+\lambda\|\beta\|^2$  (6)

wherein $A^u$ represents the attributes of the invisible classes and $A^s$ represents the attributes of the visible classes;

the visible-class features are obtained as the class prototype features $p_c^s=\frac{1}{N_c}\sum_{x_i\in D_c^S}\theta(x_i)$, i.e. the mean of the extracted features of each visible class $c$;

the virtual features of the invisible classes are obtained as $P^U=\beta\,P^S$, where $P^S$ stacks the visible-class prototype features.
Preferably, the training of the classifier is completed by combining the virtual features with a plurality of classifiers, the invisible features are extracted by using a primary network and a secondary network, the invisible features are predicted by using the classifier, the invisible classes meeting the conditions are endowed with pseudo labels according to a classifier voting mechanism, the invisible classes endowed with the pseudo labels are added into a training set to train the attribute prediction network again, and the method includes:
training the classifiers according to the synthesized virtual features of the invisible classes and the corresponding labels, obtaining the predicted semantic attribute through equation (7), and predicting the invisible class through equation (8):

$\phi_{pre}=F_{classification}\bigl(\theta(x^u)\bigr)$  (7)

$\hat{y}^u=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (8)

wherein $F_{classification}$ represents a classifier used by the network;
and according to a classifier voting mechanism, giving a pseudo label to the invisible class meeting the conditions, adding the invisible class given with the pseudo label into a training set, and training the attribute prediction network again until the training is finished.
An integrated collaborative training apparatus for zero sample classification, comprising:
the data dividing module is used for acquiring the data set and the attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
the main and auxiliary network acquisition module is used for training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of the different networks to invisible classes;
the invisible class virtual feature synthesis module is used for calculating the mapping relation between visible class attributes and invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image features of the visible class by using the main network and the auxiliary network, and synthesizing the invisible class virtual features according to the attribute mapping parameters;
and the network training module is used for combining the virtual features with the plurality of classifiers to complete the training of the classifiers, extracting invisible features by using the main and auxiliary networks, predicting the invisible features by using the classifiers, endowing the invisible classes meeting the conditions with pseudo labels according to a classifier voting mechanism, adding the invisible classes endowed with the pseudo labels into a training set, and training the attribute prediction network again.
A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the integrated co-training method for zero sample classification as described above when executing the computer program.
The invention has the following beneficial effects: first, the data set is divided into a training set and a test set, respectively called the visible class and the invisible class, attribute prediction networks with different structures are trained, and two networks are selected as the main and auxiliary networks according to the robustness and generalization capability of the different networks to the invisible classes; then the invisible-class features are synthesized by combining the attribute mapping parameters between the visible and invisible classes; finally, invisible-class pseudo labels are assigned to the synthesized features by using a plurality of classifiers. Because the invisible classes are predicted with a plurality of classifiers integrated over a plurality of networks, prediction errors in the embedding of the visible-class labels can be adaptively alleviated from different viewpoints of the samples. Pseudo labels are assigned to the invisible-class samples that satisfy the conditions according to a classifier voting mechanism, and the pseudo-labelled invisible classes are added to the training set to train the attribute prediction networks again; the invisible classes are relabelled in every cycle, which avoids the problem of common pseudo-labelling methods that a label, once assigned, can never be corrected. This solves the problem that the prediction accuracy of a network model obtained by the existing training method is poor. Moreover, different ZSL embedding methods can be used for training when selecting the main and auxiliary networks, i.e. the integrated collaborative training method for zero sample classification provided by the invention is easily extended to other embedding-based zero-sample learning methods and improves their performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below:
fig. 1 is a schematic overall flowchart of an integrated collaborative training method for zero sample classification according to an embodiment of the present application;
FIG. 2 is a flowchart of an algorithm corresponding to the integrated co-training method for zero sample classification;
FIG. 3 is a graph showing the relationship between the number of training cycles and the accuracy of TOP-1;
FIG. 4 is a schematic overall structure diagram of an integrated cooperative training apparatus for zero sample classification according to a second embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal device according to a third embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The integrated collaborative training method for zero sample classification provided by the embodiment of the application can be applied to terminal equipment such as mobile phones, tablet computers, notebook computers and personal computers, and the embodiment of the application does not limit the specific types of the terminal equipment. That is, the carrier of the client corresponding to the integrated collaborative training method for zero sample classification provided in the embodiment of the present application may be any one of the above terminal devices.
In order to explain the technical means described in the present application, the following description will be given by way of specific embodiments.
The method comprises two parts. The first part is network selection: the fully connected layers of pre-trained convolutional neural networks with different structures are modified and used directly to learn the mapping between visual features and semantics, and two networks are selected as the main and auxiliary networks according to the robustness and generalization capability of the different networks to the test samples. The second part is pseudo-label prediction: first, the main and auxiliary networks are used to extract the features of the training classes and the test classes, the mapping parameters between the semantics of the test-class samples and the training-class samples are calculated, and the mapping parameters are combined with the training-class features to generate virtual test-class features; then different classifiers are constructed, the virtual test-class features and the corresponding labels are fed into the classifiers for training, the features extracted from the test classes are predicted, pseudo labels are assigned to the test-class samples that satisfy the conditions according to a classifier voting mechanism, the pseudo-labelled test-class samples are added to the training set, and the convolutional neural network is trained again. Finally, the pseudo-label prediction process is executed repeatedly until the network accuracy no longer changes significantly.
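Purely as an illustration of this two-part procedure, the following is a minimal, runnable toy sketch of the outer co-training loop. Everything in it is an assumption made for the example: the data are random arrays, the "networks" are least-squares linear maps rather than convolutional networks, and agreement between three simple predictors stands in for the full classifier voting rule of the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 5 visible classes, 3 invisible classes, 10-dim features, 6-dim attributes.
A_vis = rng.random((5, 6))                      # visible-class attribute vectors
A_unv = rng.random((3, 6))                      # invisible-class attribute vectors
X_vis = rng.random((50, 10))                    # "extracted" visible-class features
y_vis = np.repeat(np.arange(5), 10)             # 10 samples per visible class
X_unv = rng.random((30, 10))                    # unlabelled invisible-class features

def nearest(att, class_attrs):
    """Assign each predicted attribute vector to the class with highest cosine similarity."""
    sim = (att @ class_attrs.T) / (np.linalg.norm(att, axis=1, keepdims=True)
                                   * np.linalg.norm(class_attrs, axis=1))
    return sim.argmax(axis=1)

train_X, train_attr = X_vis, A_vis[y_vis]
for cycle in range(3):                          # repeat the pseudo-label prediction part
    # 1) "Train" a main and an auxiliary attribute predictor (least-squares stand-ins);
    #    the auxiliary one sees a noise-perturbed copy of the training set.
    W_main, *_ = np.linalg.lstsq(train_X, train_attr, rcond=None)
    noisy_X = train_X + 0.01 * rng.standard_normal(train_X.shape)
    W_aux, *_ = np.linalg.lstsq(noisy_X, train_attr, rcond=None)
    # 2) Attribute-mapping parameters and virtual invisible-class prototypes.
    beta = A_unv @ A_vis.T @ np.linalg.inv(A_vis @ A_vis.T + 0.1 * np.eye(5))
    proto_vis = np.stack([X_vis[y_vis == c].mean(axis=0) for c in range(5)])
    proto_unv = beta @ proto_vis
    # 3) Predict the unlabelled samples with both predictors and with the virtual
    #    prototypes, and keep only the samples on which all three agree.
    p_main = nearest(X_unv @ W_main, A_unv)
    p_aux = nearest(X_unv @ W_aux, A_unv)
    p_proto = (((X_unv[:, None, :] - proto_unv[None, :, :]) ** 2).sum(-1)).argmin(axis=1)
    agree = (p_main == p_aux) & (p_main == p_proto)
    # 4) Add the pseudo-labelled samples to the training set and train again.
    train_X = np.vstack([X_vis, X_unv[agree]])
    train_attr = np.vstack([A_vis[y_vis], A_unv[p_main[agree]]])
    print(f"cycle {cycle}: {int(agree.sum())} pseudo-labelled samples added")
```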
Referring to fig. 1, it is a flowchart of an implementation procedure of an integrated collaborative training method for zero sample classification provided in an embodiment of the present application, and for convenience of explanation, only a part related to the embodiment of the present application is shown.
The integrated collaborative training method for zero sample classification comprises the following steps:
step S101: acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class:
Acquiring a data set and an attribute library thereof, and dividing the data set into a training set and a test set, for example according to a preset proportion; the training set and the test set are respectively called the visible class $D^S$ and the invisible class $D^U$.

The visible-class samples and labels are $D^S=\{(x_i^s,y_i^s)\}_{i=1}^{l}$, i.e. there are $l$ labelled pictures in total, where $x_i^s$ denotes the $i$-th picture in the visible-class data set $D^S$, $y_i^s$ is the class label of $x_i^s$, and $y_i^s\in Y^S$; the invisible-class samples are $D^U=\{x_j^u\}_{j=1}^{n}$, i.e. there are $n$ unlabelled pictures in total, with label set $Y^U$, and $Y^U\cup Y^S=Y$. For each $y\in Y$ there exists an associated semantic attribute $A_y=\{a_1,a_2,\ldots,a_{l+n}\}$.
Step S102: training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of different networks to invisible classes:
Fig. 2 is a flowchart of the algorithm corresponding to the integrated collaborative training method for zero sample classification provided by the invention. The algorithm starts from network selection, which aims to select two networks with strong robustness and generalization capability for the invisible classes to form a collaborative network; according to their performance, the two networks are called the main network and the auxiliary network respectively. For the same data set, the models learned with different network architectures differ in their prediction distributions on the test set. At the same time, following the standard co-training rule, random noise is added to the training set to construct different data sets, and after the main and auxiliary networks are selected, the auxiliary network is retrained on the data set constructed with random noise. The assumption is that the data can be classified from different viewpoints to achieve a complementary effect. For the embedding-based ZSL methods, the training data need to be fed into a convolutional neural network to obtain visual features, which are then projected into the semantic space. Based on the idea of the co-training algorithm, by combining the noise-added data sets, the networks with different structures and the semantic classifiers, and by exploiting the correlation between global features and image semantics, prediction errors in the embedding of the visible-class labels are adaptively alleviated from different viewpoints of the images, and the predicted invisible-class labels are added to the training set, thereby reducing the domain shift problem caused by using only the visible classes during training.
The specific implementation of step S102 is given below:
Attribute prediction networks of different structures are trained. Each attribute prediction network consists of a feature extraction function $\theta(\cdot)$ and a classification function $\phi_{main}$, where the feature extraction function $\theta(\cdot)$ is given by equation (1) and the classification function $\phi_{main}$ by equation (2):

$\theta(x)=f(x;W_{cnn})$  (1)

$\phi_{main}(x)=W_{main}^{\mathsf T}\,\theta(x)$  (2)

where $W_{cnn}$ represents the parameters of the convolutional layers in the network, $x$ represents an input image sample, and $W_{main}$ represents the parameters of the fully connected layer of the network.
The visible classes are fed into the attribute prediction network for training through equation (3). In this embodiment, given the labelled training samples $\{(x_i,a_i)\}_{i=1}^{l}$, the loss function is minimized; different embedding methods have different loss functions. Equation (3) uses a binary cross-entropy loss to update the network parameters, and the optimizer is the adaptive moment estimation (Adam) optimizer:

$\mathcal{L}=-\dfrac{1}{l}\sum_{i=1}^{l}\bigl[a_i\log\sigma(\phi_{main}(x_i))+(1-a_i)\log\bigl(1-\sigma(\phi_{main}(x_i))\bigr)\bigr]$  (3)

where $\sigma$ represents the sigmoid activation function and $a_i$ is the attribute label of $x_i$.
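A minimal PyTorch-style sketch of this training step is given below. The tiny convolutional stand-in for the backbone, the attribute dimension of 85 and the learning rate are illustrative assumptions; the embodiment itself modifies the fully connected layer of a pre-trained network such as ResNet.

```python
import torch
import torch.nn as nn

num_attributes = 85                      # e.g. the attribute dimension of AwA (an assumption)

# θ(x): a small stand-in convolutional feature extractor (W_cnn); in the embodiment this
# would be a pre-trained backbone such as ResNet with its original classifier removed.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
# φ_main: the fully connected layer (W_main) that maps features to attribute logits.
classifier_head = nn.Linear(32, num_attributes)
net = nn.Sequential(feature_extractor, classifier_head)

criterion = nn.BCEWithLogitsLoss()       # sigmoid + binary cross entropy, as in equation (3)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

# One optimisation step on a dummy batch of visible-class images and attribute labels.
images = torch.randn(8, 3, 64, 64)
attribute_labels = torch.randint(0, 2, (8, num_attributes)).float()

optimizer.zero_grad()
predicted_attributes = net(images)       # σ is applied inside BCEWithLogitsLoss
loss = criterion(predicted_attributes, attribute_labels)
loss.backward()
optimizer.step()
print(float(loss))
```

In the embodiment the same loop would run over the real visible-class training data, and the trained network would then be evaluated with equations (4) and (5) to decide whether it becomes the main or the auxiliary network.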
The output of the network is the predicted invisible-class attribute, which is input into equation (4) to predict the class of the invisible-class sample; in this embodiment, cosine similarity is used as the metric of closeness between the predicted semantics and the invisible-class semantics:

$\hat{y}=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (4)

where $\phi_{pre}$ represents the semantic attribute predicted by the network and $A_c^u$ represents the real invisible-class semantic attribute of class $c$ in the attribute library.
Equation (5) is taken as the evaluation index, and the two networks with the highest evaluation index are selected as the main and auxiliary networks:

$acc=\dfrac{1}{\gamma}\sum_{c=1}^{\gamma}acc_c^{u}$  (5)

where $acc_c^{u}$ denotes the Top-1 accuracy of the $c$-th invisible class and $\gamma$ denotes the total number of invisible classes in the test set.
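The prediction and evaluation of equations (4) and (5) can be sketched as follows; the array contents and helper names are placeholders introduced only for this example.

```python
import numpy as np

def predict_classes(pred_attr, unseen_attr):
    """Equation (4): assign each sample to the invisible class whose semantic attribute
    vector has the highest cosine similarity with the predicted attribute vector."""
    pred = pred_attr / np.linalg.norm(pred_attr, axis=1, keepdims=True)
    attr = unseen_attr / np.linalg.norm(unseen_attr, axis=1, keepdims=True)
    return (pred @ attr.T).argmax(axis=1)

def per_class_top1(pred_labels, true_labels, num_classes):
    """Equation (5): mean of the per-class Top-1 accuracies over the invisible classes."""
    accs = [(pred_labels[true_labels == c] == c).mean() for c in range(num_classes)]
    return float(np.mean(accs))

# Dummy example: 3 invisible classes, 6-dimensional attributes, 30 test samples.
rng = np.random.default_rng(0)
unseen_attr = rng.random((3, 6))
true_labels = rng.integers(0, 3, 30)
pred_attr = unseen_attr[true_labels] + 0.1 * rng.standard_normal((30, 6))

pred_labels = predict_classes(pred_attr, unseen_attr)
print(per_class_top1(pred_labels, true_labels, 3))
```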
It should be understood that, in order to select networks with stronger robustness and generalization capability, as a specific embodiment, experiments are performed on the current mainstream convolutional neural networks VGG, GoogLeNet and ResNet, as well as EfficientNet, which performs best on ImageNet, and the two networks with the best performance are selected as the main and auxiliary networks. Table 1 lists the average TOP-1 accuracy of the different convolutional neural networks on the different data sets; it can be seen that the ResNet-series networks perform best.
TABLE 1
(Average TOP-1 accuracy of the candidate convolutional neural networks on the different data sets; the values are provided as an image in the original publication.)
Step S103: calculating the mapping relation between the visible class and the invisible class in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and a secondary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters:
The attributes in the attribute library are regularized, and the attribute mapping parameter $\beta$ between the invisible classes and the visible classes is calculated by equation (6):

$\beta=\arg\min_{\beta}\;\bigl\|A^u-\beta A^s\bigr\|^2+\lambda\|\beta\|^2$  (6)

where $A^u$ represents the attributes of the invisible classes and $A^s$ represents the attributes of the visible classes.
Prototype features are the central features of all samples of each class; in the proposed algorithm, virtual prototypes of the invisible classes are synthesized from the visible-class prototype features on the basis of the semantic information and used as the input of the semantic classifiers. The visible-class prototype features (i.e. the visible-class features) are obtained as $p_c^s=\frac{1}{N_c}\sum_{x_i\in D_c^S}\theta(x_i)$, the mean of the extracted features of each visible class $c$ with $N_c$ samples.

The virtual prototype features of the invisible classes (i.e. the virtual features of the invisible classes) can then be obtained with the ridge-regression mapping of equation (6): $P^U=\beta\,P^S$, where $P^S$ stacks the visible-class prototype features.
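A numpy sketch of this synthesis step, under the interpretation above (ridge regression between the class attribute matrices, then transfer of the mapping to the visible-class prototype features), is given below; the regularization weight, dimensions and array contents are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

num_seen, num_unseen, attr_dim, feat_dim = 5, 3, 6, 10
A_s = rng.random((num_seen, attr_dim))       # visible-class attributes
A_u = rng.random((num_unseen, attr_dim))     # invisible-class attributes
A_s /= np.linalg.norm(A_s, axis=1, keepdims=True)   # simple regularisation/normalisation
A_u /= np.linalg.norm(A_u, axis=1, keepdims=True)

# Equation (6): ridge-regression mapping β such that A_u ≈ β A_s.
lam = 0.1
beta = A_u @ A_s.T @ np.linalg.inv(A_s @ A_s.T + lam * np.eye(num_seen))

# Visible-class prototype features: mean extracted feature of each visible class.
features = rng.random((50, feat_dim))
labels = np.repeat(np.arange(num_seen), 10)
P_s = np.stack([features[labels == c].mean(axis=0) for c in range(num_seen)])

# Virtual prototype features of the invisible classes.
P_u = beta @ P_s
print(P_u.shape)        # (num_unseen, feat_dim)
```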
step S104: the virtual features and the classifiers are combined to complete the training of the classifiers, the invisible features are extracted by using a main network and a secondary network, the invisible features are predicted by using the classifiers, pseudo labels are given to the invisible classes meeting the conditions according to a classifier voting mechanism, the invisible classes given the pseudo labels are added into a training set to train the attribute prediction network again:
The virtual features and a plurality of classifiers are combined to complete the training of the classifiers: the synthesized invisible-class virtual features $P^U$ and their corresponding labels are fed into the classifiers for training, the invisible-class features are extracted with the main and auxiliary networks, and the invisible-class features are predicted with the classifiers. In this embodiment, current mainstream classifiers such as lasso regression, ridge regression, Bayesian ridge regression, linear regression, support vector machines and random forests can be considered.
The predicted semantic attributes are obtained by equation (7):

$\phi_{pre}=F_{classification}\bigl(\theta(x^u)\bigr)$  (7)

where $F_{classification}$ represents a classifier used by the network. Considering the samples from multiple viewpoints improves generalization and reduces the risk that a single classifier falls into a local minimum. The invisible class is then predicted by equation (8):

$\hat{y}^u=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (8)
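A scikit-learn sketch of this classifier stage is given below; the noisy replication of the virtual prototypes into training rows, the regularization settings and the dimensions are assumptions made so the toy example runs end to end.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, BayesianRidge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
num_unseen, feat_dim, attr_dim = 3, 10, 6
P_u = rng.random((num_unseen, feat_dim))            # virtual prototype features
A_u = rng.random((num_unseen, attr_dim))            # invisible-class attributes

# Training rows: noisy copies of each virtual prototype, labelled with its class attributes.
X_train = np.repeat(P_u, 20, axis=0) + 0.05 * rng.standard_normal((num_unseen * 20, feat_dim))
y_train = np.repeat(A_u, 20, axis=0)

classifiers = [
    MultiOutputRegressor(Lasso(alpha=0.01)),
    Ridge(alpha=1.0),                                # Ridge supports multi-output directly
    MultiOutputRegressor(BayesianRidge()),
]
for clf in classifiers:
    clf.fit(X_train, y_train)

# Equations (7)-(8): predict attributes for extracted invisible-class features, then take
# the invisible class whose attribute vector has the highest cosine similarity.
X_test = np.repeat(P_u, 5, axis=0) + 0.05 * rng.standard_normal((num_unseen * 5, feat_dim))
A_norm = A_u / np.linalg.norm(A_u, axis=1, keepdims=True)
for clf in classifiers:
    phi_pre = clf.predict(X_test)
    phi_pre = phi_pre / np.linalg.norm(phi_pre, axis=1, keepdims=True)
    print((phi_pre @ A_norm.T).argmax(axis=1))
```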
As a specific embodiment, under a given hardware condition (for example a CPU of i7-8700K), the average TOP-1 accuracy and the training time of the different classifiers on the CUB data set are counted, as shown in Table 2. Considering both accuracy and time, lasso regression, ridge regression and Bayesian ridge regression can be selected as the classifiers.
TABLE 2
(Average TOP-1 accuracy and training time of the candidate classifiers on the CUB data set; the values are provided as an image in the original publication.)
Then, according to a classifier voting mechanism, pseudo labels are assigned to the invisible-class samples that satisfy the conditions. The general classifier voting rule means that several classifiers often give different prediction results, so a voting classifier is built on top of the base classifiers and the class with the most votes is taken as the predicted class. Here there are the main and auxiliary networks, with three classifiers corresponding to each network, and the following voting rule is formulated: the best five prediction results are selected from the classifier predictions according to accuracy; if the number of invisible-class samples labelled under the four-classifier voting rule is more than half of the total number of invisible-class samples, the four-classifier voting rule is adopted, otherwise the three-classifier voting rule is adopted to assign pseudo labels to $D^U$; from the fourth cycle onwards the three-classifier voting rule is used. The accuracy of the network model on the SUN data set is counted under different numbers of votes Z, as shown in FIG. 3; it can be seen that the accuracy no longer increases by the third cycle, and under this labelling rule the network model reaches its best performance.
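A simplified sketch of the voting idea follows; the fixed agreement threshold Z and the random predictions are assumptions, and the actual rule in the embodiment (switching between the four- and three-classifier rules across cycles) is richer than this majority vote.

```python
import numpy as np

def assign_pseudo_labels(predictions, Z):
    """predictions: (num_classifiers, num_samples) array of predicted invisible-class
    indices, one row per classifier. A sample receives a pseudo label only if at least
    Z classifiers agree on the same class; otherwise it stays unlabelled (-1)."""
    num_samples = predictions.shape[1]
    pseudo = np.full(num_samples, -1)
    for i in range(num_samples):
        votes = np.bincount(predictions[:, i])
        if votes.max() >= Z:
            pseudo[i] = votes.argmax()
    return pseudo

# Six "classifiers" (three per network for the main and auxiliary networks) voting
# on ten invisible-class samples drawn from three classes.
rng = np.random.default_rng(0)
predictions = rng.integers(0, 3, size=(6, 10))
print(assign_pseudo_labels(predictions, Z=4))   # labelled samples get a class index, rest -1
```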
After the labelling of the invisible classes is completed, they are added to the training set and the network is trained again until the accuracy no longer changes significantly (generally by the time the 5th cycle is executed); that is, the invisible classes assigned pseudo labels are added to the training set to train the attribute prediction networks again (steps S103 and S104 are executed repeatedly) until the training is finished (for example, until the accuracy no longer changes significantly or the number of cycles reaches a preset number).
Table 3 compares the experimental results of the method provided by the present application with those of other existing methods; in addition, LFGAA, proposed by Liu et al. in 2019, is introduced as the method used to train the networks in the network selection module. The results show that the performance of the method provided by the present application is improved compared with the other existing methods.
TABLE 3
(Comparison of the average Top-1 accuracy of the proposed method with existing methods; the values are provided as an image in the original publication.)
In Table 3, $y_u$ denotes the average Top-1 accuracy on the invisible-class samples, $y_s$ denotes the average Top-1 accuracy on the tested visible-class samples, and "-" indicates that the corresponding existing method did not report this result.
As can be seen from the above steps, this embodiment uses invisible-class virtual prototype features synthesized from semantic information; although they cannot be guaranteed to be identical to the real images, reliable invisible-class labels are obtained and put into the training set to retrain the network, so that prediction errors in the embedding of the visible-class labels can be overcome and the domain shift problem caused by training only with the visible classes is further alleviated.
Fig. 4 shows a structural block diagram of an integrated collaborative training apparatus for zero sample classification provided in the second embodiment of the present application, and for convenience of explanation, only the parts related to the second embodiment of the present application are shown.
Referring to fig. 4, an integrated co-training apparatus 200 for zero sample classification includes:
the data dividing module 201 is configured to obtain a data set and an attribute library thereof, divide the data set into a training set and a test set, and respectively refer to the training set and the test set as a visible class and an invisible class;
the primary and secondary network acquisition module 202 is used for training attribute prediction networks with different structures, and selecting two networks as primary and secondary networks according to the robustness and generalization capability of the different networks to invisible classes;
the invisible class virtual feature synthesis module 203 is used for calculating the mapping relationship between the visible class and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image features of the visible class by using the main and auxiliary networks, and synthesizing the invisible class virtual features according to the attribute mapping parameters;
the network training module 204 is configured to combine the virtual features with the multiple classifiers to complete training of the classifiers, extract the invisible features by using the primary and secondary networks, predict the invisible features by using the classifiers, assign pseudo labels to the invisible classes meeting the conditions according to a classifier voting mechanism, add the invisible classes assigned with the pseudo labels into a training set, and train the attribute prediction network again.
It should be noted that, for the information interaction, the execution process, and other contents between the above-mentioned devices/modules, because the same concept is based on, the specific functions and the technical effects of the embodiment of the integrated collaborative training method for zero sample classification in the present application may be specifically referred to the section of the embodiment of the integrated collaborative training method for zero sample classification, and are not described herein again.
It is clear to those skilled in the art that, for the convenience and simplicity of description, the above division of the functional modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the integrated cooperative training apparatus for zero sample classification 200 is divided into different functional modules to perform all or part of the above described functions. Each functional module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional modules are only used for distinguishing one functional module from another, and are not used for limiting the protection scope of the application. The specific working process of each functional module in the above description may refer to the corresponding process in the foregoing embodiment of the integrated collaborative training method for zero sample classification, and is not described herein again.
Fig. 5 is a schematic structural diagram of a terminal device according to a third embodiment of the present application. As shown in fig. 5, the terminal device 300 includes: a processor 302, a memory 301, and a computer program 303 stored in the memory 301 and operable on the processor 302. The number of the processors 302 is at least one, and fig. 5 takes one as an example. The implementation steps of the integrated co-training method for zero sample classification described above, i.e. the steps shown in fig. 1, are implemented when the processor 302 executes the computer program 303.
The specific implementation process of the terminal device 300 can be seen in the above embodiment of the integrated collaborative training method for zero sample classification.
Illustratively, the computer program 303 may be partitioned into one or more modules/units that are stored in the memory 301 and executed by the processor 302 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 303 in the terminal device 300.
The terminal device 300 may be a desktop computer, a notebook, a palm computer, a main control and other computing devices, or may be a mobile terminal such as a mobile phone. Terminal device 300 may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 5 is only an example of the terminal device 300 and does not constitute a limitation of the terminal device 300, and may include more or less components than those shown, or combine some of the components, or different components, for example, the terminal device 300 may further include input and output devices, network access devices, buses, etc.
The Processor 302 may be a CPU (Central Processing Unit), other general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 301 may be an internal storage unit of the terminal device 300, such as a hard disk or a memory. The memory 301 may also be an external storage device of the terminal device 300, such as a plug-in hard disk, SMC (Smart Media Card), SD (Secure Digital Card), Flash Card, or the like provided on the terminal device 300. Further, the memory 301 may also include both an internal storage unit of the terminal device 300 and an external storage device. The memory 301 is used for storing an operating system, application programs, a boot loader, data, and other programs, such as program codes of the computer program 303. The memory 301 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program can implement the steps in the above embodiment of the integrated collaborative training method for zero sample classification.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the embodiment of the integrated co-training method for zero sample classification described above may be implemented by a computer program to instruct related hardware to perform the steps, where the computer program 303 may be stored in a computer-readable storage medium, and when being executed by the processor 302, the computer program 303 may implement the steps of the embodiment of the integrated co-training method for zero sample classification described above. Wherein the computer program 303 comprises computer program code, and the computer program 303 code may be in a source code form, an object code form, an executable file or some intermediate form, and the like. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer Memory, ROM (Read-Only Memory), RAM (Random Access Memory), electrical carrier wave signal, telecommunication signal, and software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc. In certain jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (7)

1. An integrated collaborative training method for zero sample classification, comprising:
acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of different networks to invisible classes;
calculating the mapping relation between the visible class attributes and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and a secondary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters;
the virtual features and the classifiers are combined to complete the training of the classifiers, the invisible features are extracted by using a main network and a secondary network, the invisible features are predicted by using the classifiers, the invisible classes meeting the conditions are endowed with pseudo labels according to a classifier voting mechanism, and the invisible classes endowed with the pseudo labels are added into a training set to train the attribute prediction network again.
2. The integrated collaborative training method for zero sample classification according to claim 1, wherein the dividing of the data set into a training set and a test set, and the training set and the test set respectively referred to as a visible class and an invisible class, comprises:
the data set is divided into a training set and a test set, which are respectively called the visible class $D^S$ and the invisible class $D^U$;

wherein the visible-class samples and labels are $D^S=\{(x_i^s,y_i^s)\}_{i=1}^{l}$, $x_i^s$ denotes the $i$-th picture in the visible-class data set $D^S$, $y_i^s$ is the class label of $x_i^s$, and $y_i^s\in Y^S$; the invisible-class samples are $D^U=\{x_j^u\}_{j=1}^{n}$ with label set $Y^U$, and $Y^U\cup Y^S=Y$; for each $y\in Y$ there exists an associated semantic attribute $A_y=\{a_1,a_2,\ldots,a_{l+n}\}$.
3. The integrated collaborative training method for zero sample classification as claimed in claim 2, wherein the training of the attribute prediction networks of different structures selects two networks as the primary and secondary networks according to the robustness and generalization capability of the different networks to invisible classes, and comprises:
training attribute prediction networks of different structures, wherein each attribute prediction network consists of a feature extraction function $\theta(\cdot)$ and a classification function $\phi_{main}$, the feature extraction function $\theta(\cdot)$ being given by equation (1) and the classification function $\phi_{main}$ by equation (2):

$\theta(x)=f(x;W_{cnn})$  (1)

$\phi_{main}(x)=W_{main}^{\mathsf T}\,\theta(x)$  (2)

wherein $W_{cnn}$ represents the parameters of the convolutional layers in the network, $x$ represents an input image sample, and $W_{main}$ represents the parameters of the fully connected layer of the network;

feeding the visible classes into the attribute prediction network for training through equation (3), the optimizer being an adaptive moment estimation optimizer:

$\mathcal{L}=-\dfrac{1}{l}\sum_{i=1}^{l}\bigl[a_i\log\sigma(\phi_{main}(x_i))+(1-a_i)\log\bigl(1-\sigma(\phi_{main}(x_i))\bigr)\bigr]$  (3)

wherein $\sigma$ represents the sigmoid activation function and $a_i$ is the attribute label of $x_i$;

the output of the network is the predicted invisible-class attribute, which is input into equation (4) to predict the class of the invisible-class sample:

$\hat{y}=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (4)

wherein $\phi_{pre}$ represents the semantic attribute predicted by the network and $A_c^u$ represents the real invisible-class semantic attribute of class $c$ in the attribute library;

taking equation (5) as the evaluation index and selecting the two networks with the highest evaluation index as the main and auxiliary networks:

$acc=\dfrac{1}{\gamma}\sum_{c=1}^{\gamma}acc_c^{u}$  (5)

wherein $acc_c^{u}$ denotes the Top-1 accuracy of the $c$-th invisible class and $\gamma$ denotes the total number of invisible classes in the test set.
4. The integrated collaborative training method for zero sample classification according to claim 3, wherein the calculating of the mapping relationship between the attributes of the visible class and the invisible class in the attribute library to obtain attribute mapping parameters, the extracting of the image features of the visible class using the primary and secondary networks, and the synthesizing of the virtual features of the invisible class according to the attribute mapping parameters comprises:
regularizing the attributes in the attribute library, and calculating the attribute mapping parameter $\beta$ between the invisible classes and the visible classes by equation (6):

$\beta=\arg\min_{\beta}\;\bigl\|A^u-\beta A^s\bigr\|^2+\lambda\|\beta\|^2$  (6)

wherein $A^u$ represents the attributes of the invisible classes and $A^s$ represents the attributes of the visible classes;

the visible-class features are obtained as the class prototype features $p_c^s=\frac{1}{N_c}\sum_{x_i\in D_c^S}\theta(x_i)$, i.e. the mean of the extracted features of each visible class $c$;

the virtual features of the invisible classes are obtained as $P^U=\beta\,P^S$, where $P^S$ stacks the visible-class prototype features.
5. The integrated collaborative training method for zero sample classification as claimed in claim 4, wherein the training of the classifier is completed by combining the virtual features with a plurality of classifiers, the invisible class features are extracted by using a primary and secondary network, the invisible class features are predicted by using the classifiers, the invisible class meeting the condition is assigned with a pseudo label according to a classifier voting mechanism, and the invisible class assigned with the pseudo label is added into a training set to train the attribute prediction network again, and the method comprises:
training the classifiers according to the synthesized virtual features of the invisible classes and the corresponding labels, obtaining the predicted semantic attribute through equation (7), and predicting the invisible class through equation (8):

$\phi_{pre}=F_{classification}\bigl(\theta(x^u)\bigr)$  (7)

$\hat{y}^u=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (8)

wherein $F_{classification}$ represents a classifier used by the network;
and according to a classifier voting mechanism, giving a pseudo label to the invisible class meeting the conditions, adding the invisible class given with the pseudo label into a training set, and training the attribute prediction network again until the training is finished.
6. An integrated collaborative training apparatus for zero sample classification, comprising:
the data dividing module is used for acquiring the data set and its attribute library and dividing the data set into a training set and a test set, referred to as the visible classes and the invisible classes, respectively;
the primary and secondary network acquisition module is used for training attribute prediction networks with different structures and selecting two of them as the primary network and the secondary network according to the robustness and generalization ability of the different networks on the invisible classes;
the invisible-class virtual feature synthesis module is used for calculating the mapping relationship between the visible-class attributes and the invisible-class attributes in the attribute library to obtain the attribute mapping parameters, extracting the image features of the visible classes with the primary and secondary networks, and synthesizing the virtual features of the invisible classes according to the attribute mapping parameters;
and the network training module is used for combining the virtual features with a plurality of classifiers to complete the training of the classifiers, extracting the invisible-class features with the primary and secondary networks, predicting the invisible-class features with the classifiers, assigning pseudo labels to the invisible classes that meet the condition according to a classifier voting mechanism, adding the pseudo-labeled invisible classes to the training set, and training the attribute prediction network again.
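Read as software, the four modules of the apparatus correspond one-to-one to the steps of the method claims. A hypothetical skeleton (the class and method names are illustrative, not taken from the patent) could be organized as:

```python
class ZeroShotCoTrainingPipeline:
    """Hypothetical skeleton of the four modules of the apparatus."""

    def divide_data(self, dataset, attribute_library):
        """Data dividing module: split the data set into a visible-class
        training set and an invisible-class test set."""
        raise NotImplementedError

    def acquire_networks(self, candidate_networks, visible_set):
        """Primary/secondary network acquisition module: train the candidate
        attribute-prediction networks and keep the two best ones."""
        raise NotImplementedError

    def synthesize_features(self, attribute_library, visible_features):
        """Virtual feature synthesis module: compute the attribute mapping
        parameters and synthesize invisible-class virtual features."""
        raise NotImplementedError

    def co_train(self, virtual_features, invisible_features):
        """Network training module: train the classifiers, vote on pseudo
        labels, enlarge the training set, and retrain the networks."""
        raise NotImplementedError
```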
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the integrated co-training method for zero sample classification according to any of claims 1-5 when executing the computer program.
CN202011202927.5A 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment Active CN112257808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011202927.5A CN112257808B (en) 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment

Publications (2)

Publication Number Publication Date
CN112257808A (en) 2021-01-22
CN112257808B CN112257808B (en) 2022-11-11

Family

ID=74267569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011202927.5A Active CN112257808B (en) 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment

Country Status (1)

Country Link
CN (1) CN112257808B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190025848A1 (en) * 2017-05-05 2019-01-24 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN110826638A (en) * 2019-11-12 2020-02-21 福州大学 Zero sample image classification model based on repeated attention network and method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fan Wu et al.: "Global Semantic Consistency for Zero-Shot Learning", https://arxiv.org/abs/1806.08503 *
Li Huihui: "Zero-shot multi-label image classification based on deep ranking learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949688A (en) * 2021-02-01 2021-06-11 哈尔滨市科佳通用机电股份有限公司 Motor train unit bottom plate rubber damage fault detection method, system and device
CN113283514A (en) * 2021-05-31 2021-08-20 高新兴科技集团股份有限公司 Unknown class classification method, device and medium based on deep learning
CN113688879A (en) * 2021-07-30 2021-11-23 南京理工大学 Generalized zero sample learning classification method based on confidence degree distribution external detection
CN113807420A (en) * 2021-09-06 2021-12-17 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN113807420B (en) * 2021-09-06 2024-03-19 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN114005005A (en) * 2021-12-30 2022-02-01 深圳佑驾创新科技有限公司 Double-batch standardized zero-instance image classification method

Also Published As

Publication number Publication date
CN112257808B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN112257808B (en) Integrated collaborative training method and device for zero sample classification and terminal equipment
CN111797893B (en) Neural network training method, image classification system and related equipment
WO2019100724A1 (en) Method and device for training multi-label classification model
Lu et al. Dense and sparse reconstruction error based saliency descriptor
Kao et al. Visual aesthetic quality assessment with a regression model
CN103268317B (en) System and method for semantic annotation of images
CN109634698B (en) Menu display method and device, computer equipment and storage medium
CN112330685B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN109284675B (en) User identification method, device and equipment
CN110827129A (en) Commodity recommendation method and device
CN110245714B (en) Image recognition method and device and electronic equipment
CN110377733B (en) Text-based emotion recognition method, terminal equipment and medium
Del Rincón et al. Common-sense reasoning for human action recognition
WO2020023760A1 (en) System and method for clustering products by combining attribute data with image recognition
Karaoglu et al. Detect2rank: Combining object detectors using learning to rank
Crabbé et al. Label-free explainability for unsupervised models
CN111507285A (en) Face attribute recognition method and device, computer equipment and storage medium
CN111325237A (en) Image identification method based on attention interaction mechanism
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN112990318A (en) Continuous learning method, device, terminal and storage medium
CN113657087B (en) Information matching method and device
CN113837257A (en) Target detection method and device
CN109885745A (en) User portrait method and apparatus, readable storage medium, and terminal device
CN113780365A (en) Sample generation method and device
CN112614111A (en) Video tampering operation detection method and device based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant