CN112257808A - Integrated collaborative training method and device for zero sample classification and terminal equipment - Google Patents

Integrated collaborative training method and device for zero sample classification and terminal equipment

Info

Publication number
CN112257808A
Authority
CN
China
Prior art keywords
invisible
class
training
attribute
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011202927.5A
Other languages
Chinese (zh)
Other versions
CN112257808B (en)
Inventor
郭毅博
范一鸣
王海迪
孟文化
姜晓恒
徐明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University filed Critical Zhengzhou University
Priority to CN202011202927.5A priority Critical patent/CN112257808B/en
Publication of CN112257808A publication Critical patent/CN112257808A/en
Application granted granted Critical
Publication of CN112257808B publication Critical patent/CN112257808B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/259Fusion by voting

Abstract

The invention relates to an integrated collaborative training method and device for zero sample classification, and to terminal equipment. The obtained data set is divided into a training set and a test set, respectively called the visible class and the invisible class; attribute prediction networks with different structures are trained, and two of them are selected as a main network and an auxiliary network; attribute mapping parameters between the visible-class and invisible-class attributes are calculated, and virtual features of the invisible classes are synthesized according to these parameters; the virtual features are combined with a plurality of classifiers to complete the training of the classifiers; the invisible-class features are extracted with the main and auxiliary networks and predicted with the classifiers; pseudo labels are assigned, according to a classifier voting mechanism, to the invisible-class samples that satisfy the conditions, and the pseudo-labelled invisible classes are added to the training set to train the attribute prediction networks again, which improves the prediction accuracy of the network model. At the same time, different ZSL embedding methods can be used for training when selecting the main and auxiliary networks, so the method is easily extended to other zero-sample learning methods and improves their performance.

Description

Integrated collaborative training method and device for zero sample classification and terminal equipment
Technical Field
The invention relates to an integrated collaborative training method and device for zero sample classification and terminal equipment.
Background
Owing to the effectiveness of deep learning on image recognition problems, supervised image recognition methods have achieved remarkable results in many fields. However, a considerable number of labelled samples is usually needed to train a sufficiently good network recognition model, and a model trained on known samples can only recognize the object classes contained in the training set; it lacks the ability to recognize object classes that are not contained in the training set. In real life, image data for some categories is scarce, the image categories to be recognized keep increasing, and the cost of retraining the model is high every time data of a different category is added, so the image recognition field should not depend entirely on methods that need a large number of samples. For this reason the more challenging zero-sample learning has been proposed, which aims to recognize target instances from new-category images that have never been seen.
Early research on zero-sample learning dates back to 2008, when Larochelle et al. applied a zero-data learning method to a character classification problem; Palatucci et al. then formally proposed the concept of zero-shot learning (ZSL). In the same period, Lampert et al. proposed an attribute-based class transfer learning mechanism and the Animals with Attributes (AwA) data set, and first proposed Direct Attribute Prediction (DAP) and Indirect Attribute Prediction (IAP) with image recognition as the application scenario. Because this research differs from the way of thinking of the traditional image recognition task and matches the development needs of the image recognition field, zero-sample learning began to attract wide attention. In the zero-sample task, every category is provided with a relevant description, such as attribute features shared among categories (colour, wings, crawling, tail and the like), and the mapping between images and category labels in the supervised image recognition problem is converted into a mapping between images, semantics and categories.
In the early ZSL methods, the classifiers of each attribute in DAP and IAP are trained independently, and the relationships between attributes within a class are not considered. Recent ZSL methods therefore almost all design different constraint terms on the image visual features or semantics to learn the mapping between image visual features and class embeddings, or construct a common embedding space for images and their semantic attributes. For example, SJE, proposed by Akata et al. in 2015, completes compatibility modelling from the visual space to the semantic space by training a structured support vector machine; EXEM, proposed by Changpinyo et al. in 2017, projects semantic information onto visual feature centres in the visual space; and LDF, proposed by Li et al. in 2018, constructs a latent feature embedding space to associate visual and semantic information. However, the final objective of ZSL is to predict object classes not contained in the training set. Since the same attribute in a known class and an unknown class often has different appearances (for the attribute "tail", for example, a pig's tail differs greatly in appearance from the tails of animals such as tigers and zebras), a domain shift problem can arise, that is, the corresponding visual features of the same attribute can differ greatly between classes. When the network model is used to classify the test set, the new classes that have never been seen are often classified into the known classes of the training set, resulting in poor prediction accuracy of the network model.
Disclosure of Invention
The invention aims to provide an integrated collaborative training method, an integrated collaborative training device and terminal equipment for zero sample classification, and aims to solve the problem that a network model obtained by the existing training method is poor in prediction accuracy.
In order to solve the problems, the invention adopts the following technical scheme:
an integrated collaborative training method for zero sample classification, comprising:
acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of different networks to invisible classes;
calculating the mapping relation between the visible class attributes and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and a secondary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters;
the virtual features and the classifiers are combined to complete the training of the classifiers, the invisible features are extracted by using a main network and a secondary network, the invisible features are predicted by using the classifiers, the invisible classes meeting the conditions are endowed with pseudo labels according to a classifier voting mechanism, and the invisible classes endowed with the pseudo labels are added into a training set to train the attribute prediction network again.
Preferably, the dividing the data set into a training set and a test set, and the training set and the test set being respectively referred to as a visible class and an invisible class includes:
the data set is divided into a training set and a test set, and the training set and the test set are respectively called a visible class DSAnd invisible class DU
Wherein the visible sample label is
Figure BDA0002756017490000031
Figure BDA0002756017490000032
Representing a visible class data set DSThe picture of (a) is the ith picture,
Figure BDA0002756017490000033
is composed of
Figure BDA0002756017490000034
The category label of (a) is set,
Figure BDA0002756017490000035
invisible class sample label is
Figure BDA0002756017490000036
YU∪YSY; for each Y ∈ Y, there is storedAt semantic attribute A associated therewithy={a1,a2…,al+n}。
Preferably, the training of the attribute prediction networks with different structures selects two networks as a primary network and a secondary network according to the robustness and generalization capability of the different networks to invisible classes, and includes:
training attribute prediction networks of different structures, wherein each attribute prediction network consists of a feature extraction function $\theta(\cdot)$ and a classification function $\phi_{main}$, the feature extraction function $\theta(\cdot)$ being given by equation (1) and the classification function $\phi_{main}$ by equation (2):

$\theta(x)=f(x;W_{cnn})$  (1)

$\phi_{main}(x)=W_{main}^{\mathsf T}\,\theta(x)$  (2)

wherein $W_{cnn}$ represents the parameters of the convolutional layers in the network, $x$ represents an input image sample, and $W_{main}$ represents the parameters of the fully connected layer of the network;

feeding the visible classes into the attribute prediction network for training through equation (3), the optimizer being an adaptive moment estimation optimizer:

$\mathcal{L}=-\dfrac{1}{l}\sum_{i=1}^{l}\bigl[a_i\log\sigma(\phi_{main}(x_i))+(1-a_i)\log\bigl(1-\sigma(\phi_{main}(x_i))\bigr)\bigr]$  (3)

wherein $\sigma$ represents the sigmoid activation function and $a_i$ is the attribute label of $x_i$;

the output of the network is the predicted invisible-class attribute, which is input into equation (4) to predict the class of the invisible-class sample:

$\hat{y}=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (4)

wherein $\phi_{pre}$ represents the semantic attribute predicted by the network and $A_c^u$ represents the real invisible-class semantic attribute of class $c$ in the attribute library;

taking equation (5) as the evaluation index and selecting the two networks with the highest evaluation index as the main and auxiliary networks:

$acc=\dfrac{1}{\gamma}\sum_{c=1}^{\gamma}acc_c^{u}$  (5)

wherein $acc_c^{u}$ denotes the Top-1 accuracy of the $c$-th invisible class and $\gamma$ denotes the total number of invisible classes in the test set.
Preferably, the calculating a mapping relationship between attributes of a visible class and an invisible class in an attribute library to obtain attribute mapping parameters, respectively extracting image features of the visible class by using a primary network and a secondary network, and synthesizing virtual features of the invisible class according to the attribute mapping parameters includes:
regularizing the attributes in the attribute library, and calculating the attribute mapping parameter $\beta$ between the invisible classes and the visible classes by equation (6):

$\beta=\arg\min_{\beta}\;\bigl\|A^u-\beta A^s\bigr\|^2+\lambda\|\beta\|^2$  (6)

wherein $A^u$ represents the attributes of the invisible classes and $A^s$ represents the attributes of the visible classes;

the visible-class features are obtained as the class prototype features $p_c^s=\frac{1}{N_c}\sum_{x_i\in D_c^S}\theta(x_i)$, i.e. the mean of the extracted features of each visible class $c$;

the virtual features of the invisible classes are obtained as $P^U=\beta\,P^S$, where $P^S$ stacks the visible-class prototype features.
Preferably, the training of the classifier is completed by combining the virtual features with a plurality of classifiers, the invisible features are extracted by using a primary network and a secondary network, the invisible features are predicted by using the classifier, the invisible classes meeting the conditions are endowed with pseudo labels according to a classifier voting mechanism, the invisible classes endowed with the pseudo labels are added into a training set to train the attribute prediction network again, and the method includes:
training the classifiers according to the synthesized virtual features of the invisible classes and the corresponding labels, obtaining the predicted semantic attribute through equation (7), and predicting the invisible class through equation (8):

$\phi_{pre}=F_{classification}\bigl(\theta(x^u)\bigr)$  (7)

$\hat{y}^u=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (8)

wherein $F_{classification}$ represents a classifier used by the network;
and according to a classifier voting mechanism, giving a pseudo label to the invisible class meeting the conditions, adding the invisible class given with the pseudo label into a training set, and training the attribute prediction network again until the training is finished.
An integrated collaborative training apparatus for zero sample classification, comprising:
the data dividing module is used for acquiring the data set and the attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
the main and auxiliary network acquisition module is used for training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of the different networks to invisible classes;
the invisible class virtual feature synthesis module is used for calculating the mapping relation between visible class attributes and invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image features of the visible class by using the main network and the auxiliary network, and synthesizing the invisible class virtual features according to the attribute mapping parameters;
and the network training module is used for combining the virtual features with the plurality of classifiers to complete the training of the classifiers, extracting invisible features by using the main and auxiliary networks, predicting the invisible features by using the classifiers, endowing the invisible classes meeting the conditions with pseudo labels according to a classifier voting mechanism, adding the invisible classes endowed with the pseudo labels into a training set, and training the attribute prediction network again.
A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the integrated co-training method for zero sample classification as described above when executing the computer program.
The invention has the following beneficial effects: first, the data set is divided into a training set and a test set, respectively called the visible class and the invisible class, attribute prediction networks with different structures are trained, and two networks are selected as the main and auxiliary networks according to the robustness and generalization capability of the different networks to the invisible classes; then the invisible-class features are synthesized by combining the attribute mapping parameters between the visible and invisible classes; finally, invisible-class pseudo labels are assigned to the synthesized features by using a plurality of classifiers. Because the invisible classes are predicted with a plurality of classifiers integrated over a plurality of networks, prediction errors in the embedding of the visible-class labels can be adaptively alleviated from different viewpoints of the samples. Pseudo labels are assigned to the invisible-class samples that satisfy the conditions according to a classifier voting mechanism, and the pseudo-labelled invisible classes are added to the training set to train the attribute prediction networks again; the invisible classes are relabelled in every cycle, which avoids the problem of common pseudo-labelling methods that a label, once assigned, can never be corrected. This solves the problem that the prediction accuracy of a network model obtained by the existing training method is poor. Moreover, different ZSL embedding methods can be used for training when selecting the main and auxiliary networks, i.e. the integrated collaborative training method for zero sample classification provided by the invention is easily extended to other embedding-based zero-sample learning methods and improves their performance.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below:
fig. 1 is a schematic overall flowchart of an integrated collaborative training method for zero sample classification according to an embodiment of the present application;
FIG. 2 is a flowchart of an algorithm corresponding to the integrated co-training method for zero sample classification;
FIG. 3 is a graph showing the relationship between the number of training cycles and the accuracy of TOP-1;
FIG. 4 is a schematic overall structure diagram of an integrated cooperative training apparatus for zero sample classification according to a second embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal device according to a third embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The integrated collaborative training method for zero sample classification provided by the embodiment of the application can be applied to terminal equipment such as mobile phones, tablet computers, notebook computers and personal computers, and the embodiment of the application does not limit the specific types of the terminal equipment. That is, the carrier of the client corresponding to the integrated collaborative training method for zero sample classification provided in the embodiment of the present application may be any one of the above terminal devices.
In order to explain the technical means described in the present application, the following description will be given by way of specific embodiments.
The method comprises two parts. The first part is network selection: the fully connected layers of pre-trained convolutional neural networks with different structures are modified and used directly to learn the mapping between visual features and semantics, and two networks are selected as the main and auxiliary networks according to the robustness and generalization capability of the different networks to the test samples. The second part is pseudo-label prediction: first, the main and auxiliary networks are used to extract the features of the training classes and the test classes, the mapping parameters between the semantics of the test-class samples and the training-class samples are calculated, and the mapping parameters are combined with the training-class features to generate virtual test-class features; then different classifiers are constructed, the virtual test-class features and the corresponding labels are fed into the classifiers for training, the features extracted from the test classes are predicted, pseudo labels are assigned to the test-class samples that satisfy the conditions according to a classifier voting mechanism, the pseudo-labelled test-class samples are added to the training set, and the convolutional neural network is trained again. Finally, the pseudo-label prediction process is executed repeatedly until the network accuracy no longer changes significantly.
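Purely as an illustration of this two-part procedure, the following is a minimal, runnable toy sketch of the outer co-training loop. Everything in it is an assumption made for the example: the data are random arrays, the "networks" are least-squares linear maps rather than convolutional networks, and agreement between three simple predictors stands in for the full classifier voting rule of the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 5 visible classes, 3 invisible classes, 10-dim features, 6-dim attributes.
A_vis = rng.random((5, 6))                      # visible-class attribute vectors
A_unv = rng.random((3, 6))                      # invisible-class attribute vectors
X_vis = rng.random((50, 10))                    # "extracted" visible-class features
y_vis = np.repeat(np.arange(5), 10)             # 10 samples per visible class
X_unv = rng.random((30, 10))                    # unlabelled invisible-class features

def nearest(att, class_attrs):
    """Assign each predicted attribute vector to the class with highest cosine similarity."""
    sim = (att @ class_attrs.T) / (np.linalg.norm(att, axis=1, keepdims=True)
                                   * np.linalg.norm(class_attrs, axis=1))
    return sim.argmax(axis=1)

train_X, train_attr = X_vis, A_vis[y_vis]
for cycle in range(3):                          # repeat the pseudo-label prediction part
    # 1) "Train" a main and an auxiliary attribute predictor (least-squares stand-ins);
    #    the auxiliary one sees a noise-perturbed copy of the training set.
    W_main, *_ = np.linalg.lstsq(train_X, train_attr, rcond=None)
    noisy_X = train_X + 0.01 * rng.standard_normal(train_X.shape)
    W_aux, *_ = np.linalg.lstsq(noisy_X, train_attr, rcond=None)
    # 2) Attribute-mapping parameters and virtual invisible-class prototypes.
    beta = A_unv @ A_vis.T @ np.linalg.inv(A_vis @ A_vis.T + 0.1 * np.eye(5))
    proto_vis = np.stack([X_vis[y_vis == c].mean(axis=0) for c in range(5)])
    proto_unv = beta @ proto_vis
    # 3) Predict the unlabelled samples with both predictors and with the virtual
    #    prototypes, and keep only the samples on which all three agree.
    p_main = nearest(X_unv @ W_main, A_unv)
    p_aux = nearest(X_unv @ W_aux, A_unv)
    p_proto = (((X_unv[:, None, :] - proto_unv[None, :, :]) ** 2).sum(-1)).argmin(axis=1)
    agree = (p_main == p_aux) & (p_main == p_proto)
    # 4) Add the pseudo-labelled samples to the training set and train again.
    train_X = np.vstack([X_vis, X_unv[agree]])
    train_attr = np.vstack([A_vis[y_vis], A_unv[p_main[agree]]])
    print(f"cycle {cycle}: {int(agree.sum())} pseudo-labelled samples added")
```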
Referring to fig. 1, it is a flowchart of an implementation procedure of an integrated collaborative training method for zero sample classification provided in an embodiment of the present application, and for convenience of explanation, only a part related to the embodiment of the present application is shown.
The integrated collaborative training method for zero sample classification comprises the following steps:
step S101: acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class:
Acquiring a data set and an attribute library thereof, and dividing the data set into a training set and a test set, for example according to a preset proportion; the training set and the test set are respectively called the visible class $D^S$ and the invisible class $D^U$.

The visible-class samples and labels are $D^S=\{(x_i^s,y_i^s)\}_{i=1}^{l}$, i.e. there are $l$ labelled pictures in total, where $x_i^s$ denotes the $i$-th picture in the visible-class data set $D^S$, $y_i^s$ is the class label of $x_i^s$, and $y_i^s\in Y^S$; the invisible-class samples are $D^U=\{x_j^u\}_{j=1}^{n}$, i.e. there are $n$ unlabelled pictures in total, with label set $Y^U$, and $Y^U\cup Y^S=Y$. For each $y\in Y$ there exists an associated semantic attribute $A_y=\{a_1,a_2,\ldots,a_{l+n}\}$.
Step S102: training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of different networks to invisible classes:
Fig. 2 is a flowchart of the algorithm corresponding to the integrated collaborative training method for zero sample classification provided by the invention. The algorithm starts from network selection, which aims to select two networks with strong robustness and generalization capability for the invisible classes to form a collaborative network; according to their performance, the two networks are called the main network and the auxiliary network respectively. For the same data set, the models learned with different network architectures differ in their prediction distributions on the test set. At the same time, following the standard co-training rule, random noise is added to the training set to construct different data sets, and after the main and auxiliary networks are selected, the auxiliary network is retrained on the data set constructed with random noise. The assumption is that the data can be classified from different viewpoints to achieve a complementary effect. For the embedding-based ZSL methods, the training data need to be fed into a convolutional neural network to obtain visual features, which are then projected into the semantic space. Based on the idea of the co-training algorithm, by combining the noise-added data sets, the networks with different structures and the semantic classifiers, and by exploiting the correlation between global features and image semantics, prediction errors in the embedding of the visible-class labels are adaptively alleviated from different viewpoints of the images, and the predicted invisible-class labels are added to the training set, thereby reducing the domain shift problem caused by using only the visible classes during training.
The specific implementation of step S102 is given below:
Attribute prediction networks of different structures are trained. Each attribute prediction network consists of a feature extraction function $\theta(\cdot)$ and a classification function $\phi_{main}$, where the feature extraction function $\theta(\cdot)$ is given by equation (1) and the classification function $\phi_{main}$ by equation (2):

$\theta(x)=f(x;W_{cnn})$  (1)

$\phi_{main}(x)=W_{main}^{\mathsf T}\,\theta(x)$  (2)

where $W_{cnn}$ represents the parameters of the convolutional layers in the network, $x$ represents an input image sample, and $W_{main}$ represents the parameters of the fully connected layer of the network.
The visible classes are fed into the attribute prediction network for training through equation (3). In this embodiment, given the labelled training samples $\{(x_i,a_i)\}_{i=1}^{l}$, the loss function is minimized; different embedding methods have different loss functions. Equation (3) uses a binary cross-entropy loss to update the network parameters, and the optimizer is the adaptive moment estimation (Adam) optimizer:

$\mathcal{L}=-\dfrac{1}{l}\sum_{i=1}^{l}\bigl[a_i\log\sigma(\phi_{main}(x_i))+(1-a_i)\log\bigl(1-\sigma(\phi_{main}(x_i))\bigr)\bigr]$  (3)

where $\sigma$ represents the sigmoid activation function and $a_i$ is the attribute label of $x_i$.
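A minimal PyTorch-style sketch of this training step is given below. The tiny convolutional stand-in for the backbone, the attribute dimension of 85 and the learning rate are illustrative assumptions; the embodiment itself modifies the fully connected layer of a pre-trained network such as ResNet.

```python
import torch
import torch.nn as nn

num_attributes = 85                      # e.g. the attribute dimension of AwA (an assumption)

# θ(x): a small stand-in convolutional feature extractor (W_cnn); in the embodiment this
# would be a pre-trained backbone such as ResNet with its original classifier removed.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
# φ_main: the fully connected layer (W_main) that maps features to attribute logits.
classifier_head = nn.Linear(32, num_attributes)
net = nn.Sequential(feature_extractor, classifier_head)

criterion = nn.BCEWithLogitsLoss()       # sigmoid + binary cross entropy, as in equation (3)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

# One optimisation step on a dummy batch of visible-class images and attribute labels.
images = torch.randn(8, 3, 64, 64)
attribute_labels = torch.randint(0, 2, (8, num_attributes)).float()

optimizer.zero_grad()
predicted_attributes = net(images)       # σ is applied inside BCEWithLogitsLoss
loss = criterion(predicted_attributes, attribute_labels)
loss.backward()
optimizer.step()
print(float(loss))
```

In the embodiment the same loop would run over the real visible-class training data, and the trained network would then be evaluated with equations (4) and (5) to decide whether it becomes the main or the auxiliary network.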
The output of the network is the predicted invisible-class attribute, which is input into equation (4) to predict the class of the invisible-class sample; in this embodiment, cosine similarity is used as the metric of closeness between the predicted semantics and the invisible-class semantics:

$\hat{y}=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (4)

where $\phi_{pre}$ represents the semantic attribute predicted by the network and $A_c^u$ represents the real invisible-class semantic attribute of class $c$ in the attribute library.
Equation (5) is taken as the evaluation index, and the two networks with the highest evaluation index are selected as the main and auxiliary networks:

$acc=\dfrac{1}{\gamma}\sum_{c=1}^{\gamma}acc_c^{u}$  (5)

where $acc_c^{u}$ denotes the Top-1 accuracy of the $c$-th invisible class and $\gamma$ denotes the total number of invisible classes in the test set.
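The prediction and evaluation of equations (4) and (5) can be sketched as follows; the array contents and helper names are placeholders introduced only for this example.

```python
import numpy as np

def predict_classes(pred_attr, unseen_attr):
    """Equation (4): assign each sample to the invisible class whose semantic attribute
    vector has the highest cosine similarity with the predicted attribute vector."""
    pred = pred_attr / np.linalg.norm(pred_attr, axis=1, keepdims=True)
    attr = unseen_attr / np.linalg.norm(unseen_attr, axis=1, keepdims=True)
    return (pred @ attr.T).argmax(axis=1)

def per_class_top1(pred_labels, true_labels, num_classes):
    """Equation (5): mean of the per-class Top-1 accuracies over the invisible classes."""
    accs = [(pred_labels[true_labels == c] == c).mean() for c in range(num_classes)]
    return float(np.mean(accs))

# Dummy example: 3 invisible classes, 6-dimensional attributes, 30 test samples.
rng = np.random.default_rng(0)
unseen_attr = rng.random((3, 6))
true_labels = rng.integers(0, 3, 30)
pred_attr = unseen_attr[true_labels] + 0.1 * rng.standard_normal((30, 6))

pred_labels = predict_classes(pred_attr, unseen_attr)
print(per_class_top1(pred_labels, true_labels, 3))
```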
It should be understood that, in order to select networks with stronger robustness and generalization capability, as a specific embodiment, experiments are performed on the current mainstream convolutional neural networks VGG, GoogLeNet and ResNet, as well as EfficientNet, which performs best on ImageNet, and the two networks with the best performance are selected as the main and auxiliary networks. Table 1 lists the average TOP-1 accuracy of the different convolutional neural networks on the different data sets; it can be seen that the ResNet-series networks perform best.
TABLE 1
(Average TOP-1 accuracy of the candidate convolutional neural networks on the different data sets; the values are provided as an image in the original publication.)
Step S103: calculating the mapping relation between the visible class and the invisible class in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and a secondary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters:
The attributes in the attribute library are regularized, and the attribute mapping parameter $\beta$ between the invisible classes and the visible classes is calculated by equation (6):

$\beta=\arg\min_{\beta}\;\bigl\|A^u-\beta A^s\bigr\|^2+\lambda\|\beta\|^2$  (6)

where $A^u$ represents the attributes of the invisible classes and $A^s$ represents the attributes of the visible classes.
Prototype features are the central features of all samples of each class; in the proposed algorithm, virtual prototypes of the invisible classes are synthesized from the visible-class prototype features on the basis of the semantic information and used as the input of the semantic classifiers. The visible-class prototype features (i.e. the visible-class features) are obtained as $p_c^s=\frac{1}{N_c}\sum_{x_i\in D_c^S}\theta(x_i)$, the mean of the extracted features of each visible class $c$ with $N_c$ samples.

The virtual prototype features of the invisible classes (i.e. the virtual features of the invisible classes) can then be obtained with the ridge-regression mapping of equation (6): $P^U=\beta\,P^S$, where $P^S$ stacks the visible-class prototype features.
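A numpy sketch of this synthesis step, under the interpretation above (ridge regression between the class attribute matrices, then transfer of the mapping to the visible-class prototype features), is given below; the regularization weight, dimensions and array contents are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

num_seen, num_unseen, attr_dim, feat_dim = 5, 3, 6, 10
A_s = rng.random((num_seen, attr_dim))       # visible-class attributes
A_u = rng.random((num_unseen, attr_dim))     # invisible-class attributes
A_s /= np.linalg.norm(A_s, axis=1, keepdims=True)   # simple regularisation/normalisation
A_u /= np.linalg.norm(A_u, axis=1, keepdims=True)

# Equation (6): ridge-regression mapping β such that A_u ≈ β A_s.
lam = 0.1
beta = A_u @ A_s.T @ np.linalg.inv(A_s @ A_s.T + lam * np.eye(num_seen))

# Visible-class prototype features: mean extracted feature of each visible class.
features = rng.random((50, feat_dim))
labels = np.repeat(np.arange(num_seen), 10)
P_s = np.stack([features[labels == c].mean(axis=0) for c in range(num_seen)])

# Virtual prototype features of the invisible classes.
P_u = beta @ P_s
print(P_u.shape)        # (num_unseen, feat_dim)
```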
step S104: the virtual features and the classifiers are combined to complete the training of the classifiers, the invisible features are extracted by using a main network and a secondary network, the invisible features are predicted by using the classifiers, pseudo labels are given to the invisible classes meeting the conditions according to a classifier voting mechanism, the invisible classes given the pseudo labels are added into a training set to train the attribute prediction network again:
The virtual features and a plurality of classifiers are combined to complete the training of the classifiers: the synthesized invisible-class virtual features $P^U$ and their corresponding labels are fed into the classifiers for training, the invisible-class features are extracted with the main and auxiliary networks, and the invisible-class features are predicted with the classifiers. In this embodiment, current mainstream classifiers such as lasso regression, ridge regression, Bayesian ridge regression, linear regression, support vector machines and random forests can be considered.
The predicted semantic attributes are obtained by equation (7):

$\phi_{pre}=F_{classification}\bigl(\theta(x^u)\bigr)$  (7)

where $F_{classification}$ represents a classifier used by the network. Considering the samples from multiple viewpoints improves generalization and reduces the risk that a single classifier falls into a local minimum. The invisible class is then predicted by equation (8):

$\hat{y}^u=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (8)
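A scikit-learn sketch of this classifier stage is given below; the noisy replication of the virtual prototypes into training rows, the regularization settings and the dimensions are assumptions made so the toy example runs end to end.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, BayesianRidge
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
num_unseen, feat_dim, attr_dim = 3, 10, 6
P_u = rng.random((num_unseen, feat_dim))            # virtual prototype features
A_u = rng.random((num_unseen, attr_dim))            # invisible-class attributes

# Training rows: noisy copies of each virtual prototype, labelled with its class attributes.
X_train = np.repeat(P_u, 20, axis=0) + 0.05 * rng.standard_normal((num_unseen * 20, feat_dim))
y_train = np.repeat(A_u, 20, axis=0)

classifiers = [
    MultiOutputRegressor(Lasso(alpha=0.01)),
    Ridge(alpha=1.0),                                # Ridge supports multi-output directly
    MultiOutputRegressor(BayesianRidge()),
]
for clf in classifiers:
    clf.fit(X_train, y_train)

# Equations (7)-(8): predict attributes for extracted invisible-class features, then take
# the invisible class whose attribute vector has the highest cosine similarity.
X_test = np.repeat(P_u, 5, axis=0) + 0.05 * rng.standard_normal((num_unseen * 5, feat_dim))
A_norm = A_u / np.linalg.norm(A_u, axis=1, keepdims=True)
for clf in classifiers:
    phi_pre = clf.predict(X_test)
    phi_pre = phi_pre / np.linalg.norm(phi_pre, axis=1, keepdims=True)
    print((phi_pre @ A_norm.T).argmax(axis=1))
```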
As a specific embodiment, under a given hardware condition (for example a CPU of i7-8700K), the average TOP-1 accuracy and the training time of the different classifiers on the CUB data set are counted, as shown in Table 2. Considering both accuracy and time, lasso regression, ridge regression and Bayesian ridge regression can be selected as the classifiers.
TABLE 2
(Average TOP-1 accuracy and training time of the candidate classifiers on the CUB data set; the values are provided as an image in the original publication.)
Then, according to a classifier voting mechanism, pseudo labels are assigned to the invisible-class samples that satisfy the conditions. The general classifier voting rule means that several classifiers often give different prediction results, so a voting classifier is built on top of the base classifiers and the class with the most votes is taken as the predicted class. Here there are the main and auxiliary networks, with three classifiers corresponding to each network, and the following voting rule is formulated: the best five prediction results are selected from the classifier predictions according to accuracy; if the number of invisible-class samples labelled under the four-classifier voting rule is more than half of the total number of invisible-class samples, the four-classifier voting rule is adopted, otherwise the three-classifier voting rule is adopted to assign pseudo labels to $D^U$; from the fourth cycle onwards the three-classifier voting rule is used. The accuracy of the network model on the SUN data set is counted under different numbers of votes Z, as shown in FIG. 3; it can be seen that the accuracy no longer increases by the third cycle, and under this labelling rule the network model reaches its best performance.
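A simplified sketch of the voting idea follows; the fixed agreement threshold Z and the random predictions are assumptions, and the actual rule in the embodiment (switching between the four- and three-classifier rules across cycles) is richer than this majority vote.

```python
import numpy as np

def assign_pseudo_labels(predictions, Z):
    """predictions: (num_classifiers, num_samples) array of predicted invisible-class
    indices, one row per classifier. A sample receives a pseudo label only if at least
    Z classifiers agree on the same class; otherwise it stays unlabelled (-1)."""
    num_samples = predictions.shape[1]
    pseudo = np.full(num_samples, -1)
    for i in range(num_samples):
        votes = np.bincount(predictions[:, i])
        if votes.max() >= Z:
            pseudo[i] = votes.argmax()
    return pseudo

# Six "classifiers" (three per network for the main and auxiliary networks) voting
# on ten invisible-class samples drawn from three classes.
rng = np.random.default_rng(0)
predictions = rng.integers(0, 3, size=(6, 10))
print(assign_pseudo_labels(predictions, Z=4))   # labelled samples get a class index, rest -1
```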
After the labelling of the invisible classes is completed, they are added to the training set and the network is trained again until the accuracy no longer changes significantly (generally by the time the 5th cycle is executed); that is, the invisible classes assigned pseudo labels are added to the training set to train the attribute prediction networks again (steps S103 and S104 are executed repeatedly) until the training is finished (for example, until the accuracy no longer changes significantly or the number of cycles reaches a preset number).
Table 3 compares the experimental results of the method provided by the present application with those of other existing methods; in addition, LFGAA, proposed by Liu et al. in 2019, is introduced as the method used to train the networks in the network selection module. The results show that the performance of the method provided by the present application is improved compared with the other existing methods.
TABLE 3
(Comparison of the average Top-1 accuracy of the proposed method with existing methods; the values are provided as an image in the original publication.)
In Table 3, $y_u$ denotes the average Top-1 accuracy on the invisible-class samples, $y_s$ denotes the average Top-1 accuracy on the tested visible-class samples, and "-" indicates that the corresponding existing method did not report this result.
As can be seen from the above steps, this embodiment uses invisible-class virtual prototype features synthesized from semantic information; although they cannot be guaranteed to be identical to the real images, reliable invisible-class labels are obtained and put into the training set to retrain the network, so that prediction errors in the embedding of the visible-class labels can be overcome and the domain shift problem caused by training only with the visible classes is further alleviated.
Fig. 4 shows a structural block diagram of an integrated collaborative training apparatus for zero sample classification provided in the second embodiment of the present application, and for convenience of explanation, only the parts related to the second embodiment of the present application are shown.
Referring to fig. 4, an integrated co-training apparatus 200 for zero sample classification includes:
the data dividing module 201 is configured to obtain a data set and an attribute library thereof, divide the data set into a training set and a test set, and respectively refer to the training set and the test set as a visible class and an invisible class;
the primary and secondary network acquisition module 202 is used for training attribute prediction networks with different structures, and selecting two networks as primary and secondary networks according to the robustness and generalization capability of the different networks to invisible classes;
the invisible class virtual feature synthesis module 203 is used for calculating the mapping relationship between the visible class and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image features of the visible class by using the main and auxiliary networks, and synthesizing the invisible class virtual features according to the attribute mapping parameters;
the network training module 204 is configured to combine the virtual features with the multiple classifiers to complete training of the classifiers, extract the invisible features by using the primary and secondary networks, predict the invisible features by using the classifiers, assign pseudo labels to the invisible classes meeting the conditions according to a classifier voting mechanism, add the invisible classes assigned with the pseudo labels into a training set, and train the attribute prediction network again.
It should be noted that, for the information interaction, the execution process, and other contents between the above-mentioned devices/modules, because the same concept is based on, the specific functions and the technical effects of the embodiment of the integrated collaborative training method for zero sample classification in the present application may be specifically referred to the section of the embodiment of the integrated collaborative training method for zero sample classification, and are not described herein again.
It is clear to those skilled in the art that, for the convenience and simplicity of description, the above division of the functional modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the integrated cooperative training apparatus for zero sample classification 200 is divided into different functional modules to perform all or part of the above described functions. Each functional module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional modules are only used for distinguishing one functional module from another, and are not used for limiting the protection scope of the application. The specific working process of each functional module in the above description may refer to the corresponding process in the foregoing embodiment of the integrated collaborative training method for zero sample classification, and is not described herein again.
Fig. 5 is a schematic structural diagram of a terminal device according to a third embodiment of the present application. As shown in fig. 5, the terminal device 300 includes: a processor 302, a memory 301, and a computer program 303 stored in the memory 301 and operable on the processor 302. The number of the processors 302 is at least one, and fig. 5 takes one as an example. The implementation steps of the integrated co-training method for zero sample classification described above, i.e. the steps shown in fig. 1, are implemented when the processor 302 executes the computer program 303.
The specific implementation process of the terminal device 300 can be seen in the above embodiment of the integrated collaborative training method for zero sample classification.
Illustratively, the computer program 303 may be partitioned into one or more modules/units that are stored in the memory 301 and executed by the processor 302 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 303 in the terminal device 300.
The terminal device 300 may be a desktop computer, a notebook, a palm computer, a main control and other computing devices, or may be a mobile terminal such as a mobile phone. Terminal device 300 may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 5 is only an example of the terminal device 300 and does not constitute a limitation of the terminal device 300, and may include more or less components than those shown, or combine some of the components, or different components, for example, the terminal device 300 may further include input and output devices, network access devices, buses, etc.
The Processor 302 may be a CPU (Central Processing Unit), other general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The storage 301 may be an internal storage unit of the terminal device 300, such as a hard disk or a memory. The memory 301 may also be an external storage device of the terminal device 300, such as a plug-in hard disk, SMC (Smart Media Card), SD (Secure Digital Card), Flash Card, or the like provided on the terminal device 300. Further, the memory 301 may also include both an internal storage unit of the terminal device 300 and an external storage device. The memory 301 is used for storing an operating system, application programs, a boot loader, data, and other programs, such as program codes of the computer program 303. The memory 301 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program can implement the steps in the above embodiment of the integrated collaborative training method for zero sample classification.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the embodiment of the integrated co-training method for zero sample classification described above may be implemented by a computer program to instruct related hardware to perform the steps, where the computer program 303 may be stored in a computer-readable storage medium, and when being executed by the processor 302, the computer program 303 may implement the steps of the embodiment of the integrated co-training method for zero sample classification described above. Wherein the computer program 303 comprises computer program code, and the computer program 303 code may be in a source code form, an object code form, an executable file or some intermediate form, and the like. The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer Memory, ROM (Read-Only Memory), RAM (Random Access Memory), electrical carrier wave signal, telecommunication signal, and software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc. In certain jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (7)

1. An integrated collaborative training method for zero sample classification, comprising:
acquiring a data set and an attribute library thereof, dividing the data set into a training set and a testing set, and respectively calling the training set and the testing set as a visible class and an invisible class;
training attribute prediction networks with different structures, and selecting two networks as a main network and an auxiliary network according to the robustness and generalization capability of different networks to invisible classes;
calculating the mapping relation between the visible class attributes and the invisible class attributes in the attribute library to obtain attribute mapping parameters, respectively extracting the image characteristics of the visible class by using a main network and a secondary network, and synthesizing the virtual characteristics of the invisible class according to the attribute mapping parameters;
the virtual features and the classifiers are combined to complete the training of the classifiers, the invisible features are extracted by using a main network and a secondary network, the invisible features are predicted by using the classifiers, the invisible classes meeting the conditions are endowed with pseudo labels according to a classifier voting mechanism, and the invisible classes endowed with the pseudo labels are added into a training set to train the attribute prediction network again.
2. The integrated collaborative training method for zero sample classification according to claim 1, wherein the dividing of the data set into a training set and a test set, and the training set and the test set respectively referred to as a visible class and an invisible class, comprises:
the data set is divided into a training set and a test set, which are respectively called the visible class $D^S$ and the invisible class $D^U$;

wherein the visible-class samples and labels are $D^S=\{(x_i^s,y_i^s)\}_{i=1}^{l}$, $x_i^s$ denotes the $i$-th picture in the visible-class data set $D^S$, $y_i^s$ is the class label of $x_i^s$, and $y_i^s\in Y^S$; the invisible-class samples are $D^U=\{x_j^u\}_{j=1}^{n}$ with label set $Y^U$, and $Y^U\cup Y^S=Y$; for each $y\in Y$ there exists an associated semantic attribute $A_y=\{a_1,a_2,\ldots,a_{l+n}\}$.
3. The integrated collaborative training method for zero sample classification as claimed in claim 2, wherein the training of the attribute prediction networks of different structures selects two networks as the primary and secondary networks according to the robustness and generalization capability of the different networks to invisible classes, and comprises:
training attribute prediction networks of different structures, wherein each attribute prediction network consists of a feature extraction function $\theta(\cdot)$ and a classification function $\phi_{main}$, the feature extraction function $\theta(\cdot)$ being given by equation (1) and the classification function $\phi_{main}$ by equation (2):

$\theta(x)=f(x;W_{cnn})$  (1)

$\phi_{main}(x)=W_{main}^{\mathsf T}\,\theta(x)$  (2)

wherein $W_{cnn}$ represents the parameters of the convolutional layers in the network, $x$ represents an input image sample, and $W_{main}$ represents the parameters of the fully connected layer of the network;

feeding the visible classes into the attribute prediction network for training through equation (3), the optimizer being an adaptive moment estimation optimizer:

$\mathcal{L}=-\dfrac{1}{l}\sum_{i=1}^{l}\bigl[a_i\log\sigma(\phi_{main}(x_i))+(1-a_i)\log\bigl(1-\sigma(\phi_{main}(x_i))\bigr)\bigr]$  (3)

wherein $\sigma$ represents the sigmoid activation function and $a_i$ is the attribute label of $x_i$;

the output of the network is the predicted invisible-class attribute, which is input into equation (4) to predict the class of the invisible-class sample:

$\hat{y}=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (4)

wherein $\phi_{pre}$ represents the semantic attribute predicted by the network and $A_c^u$ represents the real invisible-class semantic attribute of class $c$ in the attribute library;

taking equation (5) as the evaluation index and selecting the two networks with the highest evaluation index as the main and auxiliary networks:

$acc=\dfrac{1}{\gamma}\sum_{c=1}^{\gamma}acc_c^{u}$  (5)

wherein $acc_c^{u}$ denotes the Top-1 accuracy of the $c$-th invisible class and $\gamma$ denotes the total number of invisible classes in the test set.
4. The integrated collaborative training method for zero sample classification according to claim 3, wherein the calculating of the mapping relationship between the attributes of the visible class and the invisible class in the attribute library to obtain attribute mapping parameters, the extracting of the image features of the visible class using the primary and secondary networks, and the synthesizing of the virtual features of the invisible class according to the attribute mapping parameters comprises:
regularizing the attributes in the attribute library, and calculating the attribute mapping parameter $\beta$ between the invisible classes and the visible classes by equation (6):

$\beta=\arg\min_{\beta}\;\bigl\|A^u-\beta A^s\bigr\|^2+\lambda\|\beta\|^2$  (6)

wherein $A^u$ represents the attributes of the invisible classes and $A^s$ represents the attributes of the visible classes;

the visible-class features are obtained as the class prototype features $p_c^s=\frac{1}{N_c}\sum_{x_i\in D_c^S}\theta(x_i)$, i.e. the mean of the extracted features of each visible class $c$;

the virtual features of the invisible classes are obtained as $P^U=\beta\,P^S$, where $P^S$ stacks the visible-class prototype features.
5. The integrated collaborative training method for zero sample classification as claimed in claim 4, wherein the training of the classifier is completed by combining the virtual features with a plurality of classifiers, the invisible class features are extracted by using a primary and secondary network, the invisible class features are predicted by using the classifiers, the invisible class meeting the condition is assigned with a pseudo label according to a classifier voting mechanism, and the invisible class assigned with the pseudo label is added into a training set to train the attribute prediction network again, and the method comprises:
training the classifiers according to the synthesized virtual features of the invisible classes and the corresponding labels, obtaining the predicted semantic attribute through equation (7), and predicting the invisible class through equation (8):

$\phi_{pre}=F_{classification}\bigl(\theta(x^u)\bigr)$  (7)

$\hat{y}^u=\arg\max_{c\in Y^U}\cos\bigl(\phi_{pre},A_c^u\bigr)$  (8)

wherein $F_{classification}$ represents a classifier used by the network;
and according to a classifier voting mechanism, giving a pseudo label to the invisible class meeting the conditions, adding the invisible class given with the pseudo label into a training set, and training the attribute prediction network again until the training is finished.
6. An integrated collaborative training apparatus for zero sample classification, comprising:
the data dividing module is used for acquiring the data set and its attribute library and dividing the data set into a training set and a test set, referred to as the visible classes and the invisible classes, respectively;
the primary and secondary network acquisition module is used for training attribute prediction networks with different structures and selecting two of them as the primary network and the secondary network according to the robustness and generalization ability of the different networks on the invisible classes;
the invisible-class virtual feature synthesis module is used for calculating the mapping relationship between the visible-class attributes and the invisible-class attributes in the attribute library to obtain the attribute mapping parameters, extracting the image features of the visible classes with the primary and secondary networks, and synthesizing the virtual features of the invisible classes according to the attribute mapping parameters;
and the network training module is used for combining the virtual features with a plurality of classifiers to complete the training of the classifiers, extracting the invisible-class features with the primary and secondary networks, predicting the invisible-class features with the classifiers, assigning pseudo labels to the invisible classes that meet the condition according to a classifier voting mechanism, adding the pseudo-labeled invisible classes to the training set, and training the attribute prediction network again.
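Read as software, the four modules of the apparatus correspond one-to-one to the steps of the method claims. A hypothetical skeleton (the class and method names are illustrative, not taken from the patent) could be organized as:

```python
class ZeroShotCoTrainingPipeline:
    """Hypothetical skeleton of the four modules of the apparatus."""

    def divide_data(self, dataset, attribute_library):
        """Data dividing module: split the data set into a visible-class
        training set and an invisible-class test set."""
        raise NotImplementedError

    def acquire_networks(self, candidate_networks, visible_set):
        """Primary/secondary network acquisition module: train the candidate
        attribute-prediction networks and keep the two best ones."""
        raise NotImplementedError

    def synthesize_features(self, attribute_library, visible_features):
        """Virtual feature synthesis module: compute the attribute mapping
        parameters and synthesize invisible-class virtual features."""
        raise NotImplementedError

    def co_train(self, virtual_features, invisible_features):
        """Network training module: train the classifiers, vote on pseudo
        labels, enlarge the training set, and retrain the networks."""
        raise NotImplementedError
```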
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the integrated co-training method for zero sample classification according to any of claims 1-5 when executing the computer program.
CN202011202927.5A 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment Active CN112257808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011202927.5A CN112257808B (en) 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment

Publications (2)

Publication Number Publication Date
CN112257808A (en) 2021-01-22
CN112257808B CN112257808B (en) 2022-11-11

Family

ID=74267569

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011202927.5A Active CN112257808B (en) 2020-11-02 2020-11-02 Integrated collaborative training method and device for zero sample classification and terminal equipment

Country Status (1)

Country Link
CN (1) CN112257808B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190025848A1 (en) * 2017-05-05 2019-01-24 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations
CN110163258A (en) * 2019-04-24 2019-08-23 浙江大学 A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
CN110826638A (en) * 2019-11-12 2020-02-21 福州大学 Zero sample image classification model based on repeated attention network and method thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fan Wu et al.: "Global Semantic Consistency for Zero-Shot Learning", https://arxiv.org/abs/1806.08503 *
Li Huihui: "Zero-shot multi-label image classification based on deep ranking learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949688A (en) * 2021-02-01 2021-06-11 哈尔滨市科佳通用机电股份有限公司 Motor train unit bottom plate rubber damage fault detection method, system and device
CN113283514A (en) * 2021-05-31 2021-08-20 高新兴科技集团股份有限公司 Unknown class classification method, device and medium based on deep learning
CN113688879A (en) * 2021-07-30 2021-11-23 南京理工大学 Generalized zero sample learning classification method based on confidence degree distribution external detection
CN113807420A (en) * 2021-09-06 2021-12-17 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN113807420B (en) * 2021-09-06 2024-03-19 湖南大学 Domain self-adaptive target detection method and system considering category semantic matching
CN114005005A (en) * 2021-12-30 2022-02-01 深圳佑驾创新科技有限公司 Double-batch standardized zero-instance image classification method

Also Published As

Publication number Publication date
CN112257808B (en) 2022-11-11

Similar Documents

Publication Publication Date Title
CN112257808B (en) Integrated collaborative training method and device for zero sample classification and terminal equipment
CN111797893B (en) Neural network training method, image classification system and related equipment
WO2019100724A1 (en) Method and device for training multi-label classification model
Lu et al. Dense and sparse reconstruction error based saliency descriptor
Kao et al. Visual aesthetic quality assessment with a regression model
CN103268317B (en) System and method for semantic annotation of images
CN109634698B (en) Menu display method and device, computer equipment and storage medium
CN112330685B (en) Image segmentation model training method, image segmentation device and electronic equipment
CN109284675B (en) User identification method, device and equipment
CN110827129A (en) Commodity recommendation method and device
CN110245714B (en) Image recognition method and device and electronic equipment
CN110377733B (en) Text-based emotion recognition method, terminal equipment and medium
Del Rincón et al. Common-sense reasoning for human action recognition
WO2020023760A1 (en) System and method for clustering products by combining attribute data with image recognition
Karaoglu et al. Detect2rank: Combining object detectors using learning to rank
Crabbé et al. Label-free explainability for unsupervised models
CN111507285A (en) Face attribute recognition method and device, computer equipment and storage medium
CN111325237A (en) Image identification method based on attention interaction mechanism
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN112990318A (en) Continuous learning method, device, terminal and storage medium
CN113657087B (en) Information matching method and device
CN113837257A (en) Target detection method and device
CN109885745A (en) User portrait method and apparatus, readable storage medium, and terminal device
CN113780365A (en) Sample generation method and device
CN112614111A (en) Video tampering operation detection method and device based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant