CN109993197A - Zero-shot multi-label classification method based on deep end-to-end instance differentiation - Google Patents

Zero-shot multi-label classification method based on deep end-to-end instance differentiation

Info

Publication number
CN109993197A
Authority
CN
China
Prior art keywords
label
training
sample
labels
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811495479.5A
Other languages
Chinese (zh)
Other versions
CN109993197B (en)
Inventor
冀中
李慧慧
庞彦伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201811495479.5A
Publication of CN109993197A
Application granted
Publication of CN109993197B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques

Abstract

A zero-shot multi-label classification method based on deep end-to-end instance differentiation. The training stage comprises: training a multi-instance feature extraction network; extracting the label features corresponding to the training samples; training a cross-modal mapping network from visual features to the label feature space, realizing multi-modal fusion and mining the association relations between labels and between samples and labels; building a constraint module between the labels of the training samples and the labels of the test samples; and optimizing the final objective function of the training stage. The testing stage directly uses the network obtained in the training stage to realize zero-shot multi-label classification, comprising: extracting the multi-instance features of the test samples with the multi-instance feature extraction network; extracting the label features corresponding to the test samples; and performing multi-label classification of the test samples. The invention can thus annotate unlabeled images with multiple labels.

Description

Zero-shot multi-label classification method based on deep end-to-end instance differentiation
Technical Field
The invention relates to zero-shot multi-label classification methods, and in particular to a zero-shot multi-label classification method based on deep end-to-end instance differentiation.
Background
As data has grown explosively, so has the motivation to use it intelligently and to mine it for useful information. The capacity of machine learning models to model and solve complex tasks has advanced greatly, mainly for two reasons: more computing power and more labeled data. Traditional single-label image classification (Single-Label Classification) assigns one label to an image containing a single class: to recognize a class of images accurately, a classifier must be learned from a known training dataset and then used to classify test images, whose classes are assumed to have appeared in the training stage. In practice, training data and annotation information are often hard to obtain: on the one hand, the things of the world are extremely varied and keep increasing; on the other hand, any given category can be subdivided further into many subclasses. Visual recognition systems are therefore generally limited to the classes of their training samples, and their capacity to extend to new classes suffers. To solve this problem, early studies proposed classifying classes unseen during training with the help of auxiliary semantic information such as text; this is called zero-shot learning (Zero-Shot Learning) and derives from the human ability to recognize new things from a description alone. At present, zero-shot learning is mainly applied to single-label image classification, yet in practical applications different regions of one image often correspond to several classes, and assigning each region its class is multi-label image classification. The zero-shot multi-label image classification task combines the two; it meets this practical need and helps solve the label-missing problem.
The zero-shot multi-label classification task is more challenging than either sub-problem alone. Specifically, it faces the challenges of zero-shot learning, such as the semantic gap, domain shift, and hubness problems; it faces the semantic explosion problem of multi-label classification; and it must consider not only the complex semantic relations among seen classes, but also the semantic relations to unseen classes. For example, given a multi-label observation sample x containing n classes, conventional multi-label image classification treats the problem as n independent single-label classification problems, a process that is redundant and imprecise; efficient and accurate labeling hinges on exploiting the internal associations between images and classes. The zero-shot multi-label classification problem thus reduces to two key problems: (1) a cross-modal mapping model from the visual representation of a sample x to its multi-label semantic representation, which transfers knowledge from seen to unseen classes while establishing the association between vision and semantics; (2) reasonable modeling of the interrelations between classes and images and among classes themselves, to achieve efficient and accurate multi-label classification.
Representation learning (Representation Learning) is the umbrella term for techniques that learn feature representations; in deep learning it refers to characterizing a sample x effectively in some form. Three common data representations in deep learning are local, sparse, and distributed representations, and typical representation-learning models include supervised feature extraction with CNNs, unsupervised feature characterization based on variational autoencoders and Boltzmann machines, and some fine-tuning semi-supervised learning mechanisms. One of the main reasons deep learning has strong modeling and knowledge-extraction abilities is that it represents observation samples effectively, so an effective representation matters for simplifying the learning task and improving learning performance. The most direct way to evaluate a representation-learning model is to use the features it produces for classification, for example CNN-based feature extraction evaluated with a softmax classifier. The distributed feature representation of auxiliary semantic information in zero-shot learning, i.e., word vectors (common models: Word2Vec and GloVe), is an effective embodiment of representation learning; the other kind of mid-level auxiliary semantic information, attribute features, belongs to the sparse representation mode. Visual feature extraction based on a VGG network is likewise a typical representation-learning method: the visual feature of a sample x is characterized as a D-dimensional vector in R^D. For a multi-label image, besides a reasonable representation of the labels' auxiliary semantics, the targets are varied and the features rich, so the single-vector feature expressiveness of a classical CNN is insufficient; richer multi-channel, multi-dimensional visual feature representations and corresponding representation-learning models are needed.
Multi-instance multi-label learning (MIML) targets scenes where objects contain different targets and belong to different categories; in text classification, for example, each document has some of its sentences as instances and corresponds to many categories. When common machine learning techniques solve a practical problem, the usual practice is to extract object features and describe the object with a feature vector, obtaining an instance, and then associate the instance with the class label of the object; given a large enough instance set, a learning algorithm can learn a mapping between the instance space and the label space (or the label word-vector space), and this mapping can predict labels for unseen instances. However, real-world objects often carry multiple semantics: an image may contain "elephant", "blue sky", "white cloud", and "grassland". Extracting one feature vector for the whole image to obtain a single instance and training n binary classifiers separately, i.e., seeking a one-to-one classification relation, is inefficient; training a single n-way classifier, i.e., seeking a one-to-many relation, is more efficient, but the visual feature representation remains single-valued, every class corresponds to the same visual feature vector, and the classifier lacks discriminability and interpretability. Hence the idea of instance differentiation: represent the different targets of a complex object as different instance feature vectors and classify with a many-to-one or many-to-many strategy, which better fits the needs of real scenes.
Traditional zero-shot classification methods seek a one-to-many relation; they ignore the rich information in multi-label sample images, oversimplify the visual feature characterization of a sample, lose part of the useful information, and constrain the subsequent learning and classification stages. The invention uses a deep learning network to learn a multi-instance feature representation of complex images, makes full use of the relations between multi-label sample images and categories and among categories, improves existing multi-label image classification techniques, realizes a classification technique suitable for zero-shot multi-label images, improves image annotation precision, and alleviates the label-missing problem to some extent.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a zero-shot multi-label classification method based on deep end-to-end instance differentiation.
The technical scheme adopted by the invention is as follows: a zero-shot multi-label classification method based on deep end-to-end instance differentiation comprises a training stage and a testing stage, wherein
the training stage obtains an end-to-end network composed of a multi-instance feature extraction network, a cross-modal mapping network, and a constraint module between the labels of the training samples and the labels of the test samples; the training stage specifically comprises:
11) training the multi-instance feature extraction network;
12) extracting the label features corresponding to the training samples;
13) training the cross-modal mapping network from visual features to the label feature space, and mining the association relations between labels and between samples and labels;
14) building the constraint module between the labels of the training samples and the labels of the test samples;
15) optimizing the final objective function of the training stage;
the testing stage directly uses the end-to-end network obtained in the training stage to realize zero-shot multi-label classification; the testing stage specifically comprises:
21) extracting the multi-instance features of the test samples with the multi-instance feature extraction network;
22) extracting the label features corresponding to the test samples;
23) multi-label classification of the test samples.
The multi-instance feature extraction network of step 11) takes the structure of the VGG-16 network up to and including its third-from-last layer, and the output of that 3-D convolutional layer is taken as the multi-instance visual feature x_i ∈ R^{t×p} of a training sample, where i = 1, …, n, p is the dimension of the multi-instance visual feature space, n is the number of training samples, and t is the number of instances per training sample.
Step 12) inputs the label information of the images into a distributed language model to obtain the training-sample label semantic features Y = [y_1, …, y_s] ∈ R^{q×s}, where q is the dimension of the semantic vectors and s is the number of labels corresponding to the training samples.
Step 13) comprises: after the training samples undergo cross-modal feature transformation through the cross-modal mapping network W, the following assumption is satisfied in the semantic space: the similarity between a training sample and its related labels is large, while the similarity between the sample and unrelated labels is small. That is, the cross-modally transformed multi-instance visual features x_i W^T of a training sample are scored against the semantic feature y_j of a label by the similarity measurement

F(x_i, y_j) = x_i W^T y_j

where F is the vector of correlation scores between the training sample and the label, one score per instance, and x_i is the multi-instance visual feature of the training sample;

the average similarity score f(x_i, y_j) between the multi-instance visual feature x_i of a training sample and the label semantic feature y_j is

f(x_i, y_j) = avg_t F(x_i, y_j)

where t is the number of instances per training sample;

the labels related to a training sample, the labels unrelated to it, and the training sample itself satisfy the relation

f(x_i, y_j) > f(x_i, ŷ_k)

where ŷ_k is a label unrelated to the training sample;

based on the maximum-margin framework of multi-label learning, the objective function over all training samples in the training stage is

min_{W,M} Σ_i Σ_{j,k} ℓ(x_i, y_j, ŷ_k) + λ Ω(W, M)

where ℓ is the maximum-margin ranking loss, Ω(W, M) is the regularization term, a norm of the network parameters, W is the cross-modal mapping network, M is the multi-instance feature network, and λ is the parameter balancing the regularization term and the ranking loss; the maximum-margin ranking loss satisfies

ℓ(x_i, y_j, ŷ_k) = max(0, f_0 - f(x_i, y_j) + f(x_i, ŷ_k))

where f_0 is a similarity threshold that can be tuned experimentally.
This objective over all training samples realizes the multi-modal mapping from the visual space of the training samples to their semantic space, and unifies into one optimization the association information between the training samples and their corresponding labels and the ranking information among the labels.
Step 14) the constraint module comprises:
(1) a similarity constraint module between labels:

S(h, z) = 1 / path_len(h, z)

where S(h, z) is the similarity relation between labels h and z, obtained by the WordNet dictionary method: WordNet is organized as a tree-shaped hierarchy, the similarity relation between two labels is reflected by the path connecting them, and the similarity is defined as the reciprocal of the path length path_len(h, z);
(2) a statistical co-occurrence constraint module between labels: the statistical co-occurrence relation C(h, z) between labels is computed from HC(h, z), the number of times labels h and z co-occur, HC(h), the number of times label h occurs, and HC(z), the number of times label z occurs.
Step 15) combines the association relations between the original training images and their labels with the relations between the labels of the training samples and the labels of the test samples, giving the final objective function

min_{W,M} Σ_i Σ_{j,k} ℓ(x_i, y_j, ŷ_k) + λ Ω(W, M) + Φ(Y, Y')

where ℓ is the maximum-margin ranking loss, Ω(W, M) is the regularization term, a norm of the network parameters, W is the cross-modal mapping network, M is the multi-instance feature network, λ is the parameter balancing the regularization term and the ranking loss, and Φ(Y, Y') is the constraint between the labels of the training samples and the labels of the test samples, taken either as the similarity constraint module S or as the statistical co-occurrence constraint module C; the maximum-margin ranking loss satisfies

ℓ(x_i, y_j, ŷ_k) = max(0, f_0 - f(x_i, y_j) + f(x_i, ŷ_k))

where f_0 is a similarity threshold that can be tuned experimentally.

The final goal of the training stage is to optimize the end-to-end network, yielding the cross-modal mapping network W and the multi-instance feature network M.
Step 21) comprises: inputting the test-sample images into the trained multi-instance feature extraction network to obtain the multi-instance features x'_l ∈ R^{r×p} of the test samples, where l = 1, …, m, m is the number of test samples, p is the dimension of the multi-instance visual feature space, and r is the number of instances per test sample.
Step 22) inputs the label information of the images into the distributed language model to obtain the test-sample label semantic features Y' = [y'_1, …, y'_u] ∈ R^{q×u}, where q is the semantic vector dimension and u is the number of labels corresponding to the test samples.
Step 23) maps the multi-instance features x'_l of a test sample into the test-sample label semantic feature space as x'_l W^T, where W is the cross-modal mapping network, and directly measures the similarity between the mapped multi-instance features x'_l W^T and each candidate test label y'_o, o = 1, …, u:

r(x'_l) = avg_r F(x'_l, y'_o), with F(x'_l, y'_o) = x'_l W^T y'_o

where r(x'_l) is the degree of similarity between the mapped multi-instance features of the test sample and the candidate label; when this similarity exceeds a set threshold, the multi-instance feature x'_l of the test sample is judged to contain that candidate label.
Aiming at the multi-label zero-shot image classification problem, the zero-shot multi-label classification method based on deep end-to-end instance differentiation analyzes the feasibility and limitations of existing schemes, improves the feature representation of complex-scene images, fully mines the ambiguity of an image, and on this basis realizes the association between the multiple labels and their semantic word vectors, so that multi-label annotation can be performed on unlabeled images. Its advantages are mainly reflected in:
(1) Novelty: zero-shot multi-label image classification aims at classifying and labeling unseen classes. Since label information and samples of complex real scenes are hard to obtain, the method achieves classification by means of intermediate auxiliary information combined with the zero-shot learning idea. It is a bold attempt in the field of multimedia understanding: it breaks the conventional visual feature characterization mode and performs instance-differentiated segmentation of the image information, a new breakthrough and attempt for this research task.
(2) Multi-modality: zero-shot learning belongs to multi-modal learning, since learning and prediction of unseen classes requires auxiliary information obtained from channels other than vision; both single-label and multi-label zero-shot classification therefore have a multi-modal character. In particular, the method involves the two modalities of vision and semantics and touches on transfer learning, cross-modal learning, and related fields.
(3) End-to-end: the three functions of multi-instance feature representation learning, multi-instance multi-label classification, and multi-modal mapping are unified in one network framework, and a single objective constraint function uniformly adjusts the network parameters to achieve optimal classification performance.
(4) Practicality: single-label zero-shot image classification only suits sample images that each correspond to a single annotation category, whereas images in real life often contain complex background information of multiple categories; multi-label zero-shot image classification annotates such complex scene images and better meets real-world needs.
Drawings
FIG. 1 is a block diagram of a framework for an end-to-end network in accordance with the present invention;
FIG. 2 is a flow chart of a training process for solving the multi-label zero-sample classification problem in the present invention.
Detailed Description
The zero-shot multi-label classification method based on deep end-to-end instance differentiation of the present invention is described in detail below with reference to embodiments and the accompanying drawings.
The zero-shot multi-label classification method based on deep end-to-end instance differentiation comprises a training stage and a testing stage, wherein
the training stage obtains an end-to-end network composed of a multi-instance feature extraction network, a cross-modal mapping network, and a constraint module between the labels of the training samples and the labels of the test samples, as shown in FIG. 1; the training stage, shown in FIG. 2, specifically comprises:
11) training the multi-instance feature extraction network;
the multi-instance feature extraction network takes the structure of the VGG-16 network up to and including its third-from-last layer, and the output of that 3-D convolutional layer is taken as the multi-instance visual feature x_i ∈ R^{t×p} of a training sample, where i = 1, …, n, p is the dimension of the multi-instance visual feature space, n is the number of training samples, and t is the number of instances per training sample.
12) extracting the label features corresponding to the training samples: the label information of the images is input into a distributed language model to obtain the training-sample label semantic features Y = [y_1, …, y_s] ∈ R^{q×s}, where q is the dimension of the semantic vectors and s is the number of labels corresponding to the training samples.
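A minimal sketch of step 12), assuming gensim with a pretrained GloVe model as the distributed language model (the background section names Word2Vec and GloVe as typical choices; the specific model name and q = 300 are assumptions):

```python
import numpy as np
import gensim.downloader as api

# Assumption: a 300-d GloVe model stands in for the distributed language model.
word_vectors = api.load("glove-wiki-gigaword-300")

def label_semantic_features(labels: list[str]) -> np.ndarray:
    """labels: s label words -> Y = [y_1, ..., y_s] of shape (q, s)."""
    return np.stack([word_vectors[w] for w in labels], axis=1)

Y = label_semantic_features(["elephant", "sky", "cloud", "grassland"])  # (300, 4)
```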
13) training the cross-modal mapping network from the visual features to the label feature space, realizing multi-modal fusion and mining the association relations between labels and between samples and labels; this comprises the following steps:
after the training samples undergo cross-modal feature transformation through the cross-modal mapping network W, the following assumption is satisfied in the semantic space: the similarity between a training sample and its related labels is large, while the similarity between the sample and unrelated labels is small. That is, the cross-modally transformed multi-instance visual features x_i W^T of a training sample are scored against the semantic feature y_j of a label by the similarity measurement

F(x_i, y_j) = x_i W^T y_j

where F is the vector of correlation scores between the training sample and the label, one score per instance, and x_i is the multi-instance visual feature of the training sample;

the average similarity score f(x_i, y_j) between the multi-instance visual feature x_i of a training sample and the label semantic feature y_j is

f(x_i, y_j) = avg_t F(x_i, y_j)

where t is the number of instances per training sample;

the labels related to a training sample, the labels unrelated to it, and the training sample itself satisfy the relation

f(x_i, y_j) > f(x_i, ŷ_k)

where ŷ_k is a label unrelated to the training sample;

based on the maximum-margin framework of multi-label learning, the objective function over all training samples in the training stage is

min_{W,M} Σ_i Σ_{j,k} ℓ(x_i, y_j, ŷ_k) + λ Ω(W, M)

where ℓ is the maximum-margin ranking loss, Ω(W, M) is the regularization term, a norm of the network parameters, W is the cross-modal mapping network, M is the multi-instance feature network, and λ is the parameter balancing the regularization term and the ranking loss; the maximum-margin ranking loss satisfies

ℓ(x_i, y_j, ŷ_k) = max(0, f_0 - f(x_i, y_j) + f(x_i, ŷ_k))

where f_0 is a similarity threshold that can be tuned experimentally.
This objective over all training samples realizes the multi-modal mapping from the visual space of the training samples to their semantic space, and unifies into one optimization the association information between the training samples and their corresponding labels and the ranking information among the labels.
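A minimal PyTorch sketch of the scoring function f and the maximum-margin ranking loss defined above; modeling W as a single linear map from R^p to R^q and the explicit loop over related/unrelated label pairs are illustrative assumptions:

```python
import torch

def f_score(x_i: torch.Tensor, W: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """f(x_i, y) = avg_t F(x_i, y): score each of the t instances against the
    label vector y after mapping through W, then average over instances.
    x_i: (t, p) multi-instance features, W: (q, p) mapping, y: (q,) label vector."""
    return (x_i @ W.T @ y).mean()

def max_margin_ranking_loss(x_i, W, related, unrelated, f0: float = 0.1):
    """Sum over label pairs of max(0, f0 - f(x_i, y_j) + f(x_i, y_hat_k))."""
    loss = x_i.new_zeros(())
    for y_j in related:        # labels related to the training sample
        for y_k in unrelated:  # labels unrelated to the training sample
            loss = loss + torch.clamp(
                f0 - f_score(x_i, W, y_j) + f_score(x_i, W, y_k), min=0.0)
    return loss
```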
14) building the constraint module between the labels of the training samples and the labels of the test samples; the constraint module comprises:
(1) a similarity constraint module between labels:

S(h, z) = 1 / path_len(h, z)

where S(h, z) is the similarity relation between labels h and z, obtained by the WordNet dictionary method: WordNet is organized as a tree-shaped hierarchy, the similarity relation between two labels is reflected by the path connecting them, and the similarity is defined as the reciprocal of the path length path_len(h, z);
(2) a statistical co-occurrence constraint module between labels: the statistical co-occurrence relation C(h, z) between labels is computed from HC(h, z), the number of times labels h and z co-occur, HC(h), the number of times label h occurs, and HC(z), the number of times label z occurs.
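A sketch of how the two constraint ingredients can be computed, assuming NLTK for WordNet access; note that NLTK's path_similarity equals 1/(shortest path length + 1), used here as a stand-in for the reciprocal path length 1/path_len(h, z), and taking the first synset per label is an assumption:

```python
from collections import Counter
from itertools import combinations
from nltk.corpus import wordnet as wn  # needs a one-time nltk.download("wordnet")

def label_similarity(h: str, z: str) -> float:
    """WordNet-based similarity between labels h and z over the tree hierarchy.
    path_similarity = 1 / (shortest path length + 1), close to 1 / path_len(h, z)."""
    syn_h, syn_z = wn.synsets(h)[0], wn.synsets(z)[0]  # assumption: first sense
    return syn_h.path_similarity(syn_z) or 0.0

def cooccurrence_counts(image_label_sets: list[set[str]]):
    """HC(h): number of images in which label h occurs; HC(h, z): images with both."""
    hc, hc_pair = Counter(), Counter()
    for labels in image_label_sets:
        hc.update(labels)
        hc_pair.update(frozenset(pair) for pair in combinations(sorted(labels), 2))
    return hc, hc_pair
```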
15) optimizing the final objective function of the training stage:
combining the association relations between the original training images and their labels with the relations between the labels of the training samples and the labels of the test samples gives the final objective function

min_{W,M} Σ_i Σ_{j,k} ℓ(x_i, y_j, ŷ_k) + λ Ω(W, M) + Φ(Y, Y')

where ℓ is the maximum-margin ranking loss, Ω(W, M) is the regularization term, a norm of the network parameters, W is the cross-modal mapping network, M is the multi-instance feature network, λ is the parameter balancing the regularization term and the ranking loss, and Φ(Y, Y') is the constraint between the labels of the training samples and the labels of the test samples, taken either as the similarity constraint module S or as the statistical co-occurrence constraint module C; the maximum-margin ranking loss satisfies

ℓ(x_i, y_j, ŷ_k) = max(0, f_0 - f(x_i, y_j) + f(x_i, ŷ_k))

where f_0 is a similarity threshold that can be tuned experimentally.

The final goal of the training stage is to optimize the end-to-end network, yielding the cross-modal mapping network W and the multi-instance feature network M.
The testing stage directly uses the end-to-end network obtained in the training stage to realize zero-shot multi-label classification; it specifically comprises:
21) extracting the multi-instance features of the test samples with the multi-instance feature extraction network; this comprises:
inputting the test-sample images into the trained multi-instance feature extraction network to obtain the multi-instance features x'_l ∈ R^{r×p} of the test samples, where l = 1, …, m, m is the number of test samples, p is the dimension of the multi-instance visual feature space, and r is the number of instances per test sample.
22) extracting the label features corresponding to the test samples: the label information of the images is input into the distributed language model to obtain the test-sample label semantic features Y' = [y'_1, …, y'_u] ∈ R^{q×u}, where q is the semantic vector dimension and u is the number of labels corresponding to the test samples.
23) multi-label classification of the test samples:
the multi-instance features x'_l of a test sample are mapped into the test-sample label semantic feature space as x'_l W^T, where W is the cross-modal mapping network, and the similarity between the mapped multi-instance features x'_l W^T and each candidate test label y'_o, o = 1, …, u, is directly measured:

r(x'_l) = avg_r F(x'_l, y'_o), with F(x'_l, y'_o) = x'_l W^T y'_o

where r(x'_l) is the degree of similarity between the mapped multi-instance features of the test sample and the candidate label; when this similarity exceeds a set threshold, the multi-instance feature x'_l of the test sample is judged to contain that candidate label.
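For the testing stage, a sketch of the thresholded prediction under the same formulation; the threshold value 0.5 is an assumption (the filing only says the similarity is compared with a set threshold):

```python
import torch

def predict_labels(x_test: torch.Tensor, W: torch.Tensor,
                   candidates: torch.Tensor, threshold: float = 0.5) -> list[int]:
    """x_test: (r, p) multi-instance features of one test sample; W: (q, p) trained
    mapping; candidates: (u, q) word vectors y'_o of the unseen candidate labels.
    Returns the indices o whose averaged similarity r(x'_l) exceeds the threshold."""
    mapped = x_test @ W.T                     # (r, q): instances in semantic space
    scores = (mapped @ candidates.T).mean(0)  # (u,): average over the r instances
    return [o for o, s in enumerate(scores.tolist()) if s > threshold]
```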

Claims (9)

1. A zero-shot multi-label classification method based on deep end-to-end instance differentiation, characterized by comprising a training stage and a testing stage, wherein
the training stage obtains an end-to-end network composed of a multi-instance feature extraction network, a cross-modal mapping network, and a constraint module between the labels of the training samples and the labels of the test samples; the training stage specifically comprises:
11) training the multi-instance feature extraction network;
12) extracting the label features corresponding to the training samples;
13) training the cross-modal mapping network from visual features to the label feature space, and mining the association relations between labels and between samples and labels;
14) building the constraint module between the labels of the training samples and the labels of the test samples;
15) optimizing the final objective function of the training stage;
the testing stage directly uses the end-to-end network obtained in the training stage to realize zero-shot multi-label classification; the testing stage specifically comprises:
21) extracting the multi-instance features of the test samples with the multi-instance feature extraction network;
22) extracting the label features corresponding to the test samples;
23) multi-label classification of the test samples.
2. The method according to claim 1, wherein the multi-instance feature extraction network of step 11) takes the structure of the VGG-16 network up to and including its third-from-last layer, and the output of that 3-D convolutional layer is taken as the multi-instance visual feature x_i ∈ R^{t×p} of a training sample, where i = 1, …, n, p is the dimension of the multi-instance visual feature space, n is the number of training samples, and t is the number of instances per training sample.
3. The method according to claim 1, wherein step 12) inputs the label information of the images into a distributed language model to obtain the training-sample label semantic features Y = [y_1, …, y_s] ∈ R^{q×s}, where q is the dimension of the semantic vectors and s is the number of labels corresponding to the training samples.
4. The method according to claim 1, wherein step 13) comprises: after the training samples undergo cross-modal feature transformation through the cross-modal mapping network W, the following assumption is satisfied in the semantic space: the similarity between a training sample and its related labels is large, while the similarity between the sample and unrelated labels is small; that is, the cross-modally transformed multi-instance visual features x_i W^T of a training sample are scored against the semantic feature y_j of a label by the similarity measurement

F(x_i, y_j) = x_i W^T y_j

where F is the vector of correlation scores between the training sample and the label, one score per instance, and x_i is the multi-instance visual feature of the training sample;

the average similarity score f(x_i, y_j) between the multi-instance visual feature x_i of a training sample and the label semantic feature y_j is

f(x_i, y_j) = avg_t F(x_i, y_j)

where t is the number of instances per training sample;

the labels related to a training sample, the labels unrelated to it, and the training sample itself satisfy

f(x_i, y_j) > f(x_i, ŷ_k)

where ŷ_k is a label unrelated to the training sample;

based on the maximum-margin framework of multi-label learning, the objective function over all training samples in the training stage is

min_{W,M} Σ_i Σ_{j,k} ℓ(x_i, y_j, ŷ_k) + λ Ω(W, M)

where ℓ is the maximum-margin ranking loss, Ω(W, M) is the regularization term, a norm of the network parameters, W is the cross-modal mapping network, M is the multi-instance feature network, and λ is the parameter balancing the regularization term and the ranking loss, the maximum-margin ranking loss satisfying

ℓ(x_i, y_j, ŷ_k) = max(0, f_0 - f(x_i, y_j) + f(x_i, ŷ_k))

where f_0 is a similarity threshold that can be tuned experimentally;

this objective realizes the multi-modal mapping from the visual space of the training samples to their semantic space, and unifies into one optimization the association information between the training samples and their corresponding labels and the ranking information among the labels.
5. The method according to claim 1, wherein the constraint module of step 14) comprises:
(1) a similarity constraint module between labels:

S(h, z) = 1 / path_len(h, z)

where S(h, z) is the similarity relation between labels h and z, obtained by the WordNet dictionary method: WordNet is organized as a tree-shaped hierarchy, the similarity relation between two labels is reflected by the path connecting them, and the similarity is defined as the reciprocal of the path length path_len(h, z);
(2) a statistical co-occurrence constraint module between labels: the statistical co-occurrence relation C(h, z) between labels is computed from HC(h, z), the number of times labels h and z co-occur, HC(h), the number of times label h occurs, and HC(z), the number of times label z occurs.
6. The method according to claim 1, wherein step 15) combines the association relations between the original training images and their labels with the relations between the labels of the training samples and the labels of the test samples, giving the final objective function

min_{W,M} Σ_i Σ_{j,k} ℓ(x_i, y_j, ŷ_k) + λ Ω(W, M) + Φ(Y, Y')

where ℓ is the maximum-margin ranking loss, Ω(W, M) is the regularization term, a norm of the network parameters, W is the cross-modal mapping network, M is the multi-instance feature network, λ is the parameter balancing the regularization term and the ranking loss, and Φ(Y, Y') is the constraint between the labels of the training samples and the labels of the test samples, taken either as the similarity constraint module S or as the statistical co-occurrence constraint module C, the maximum-margin ranking loss satisfying

ℓ(x_i, y_j, ŷ_k) = max(0, f_0 - f(x_i, y_j) + f(x_i, ŷ_k))

where f_0 is a similarity threshold that can be tuned experimentally;

the final goal of the training stage is to optimize the end-to-end network, yielding the cross-modal mapping network W and the multi-instance feature network M.
7. The method according to claim 1, wherein step 21) comprises: inputting the test-sample images into the trained multi-instance feature extraction network to obtain the multi-instance features x'_l ∈ R^{r×p} of the test samples, where l = 1, …, m, m is the number of test samples, p is the dimension of the multi-instance visual feature space, and r is the number of instances per test sample.
8. The method according to claim 1, wherein step 22) inputs the label information of the images into the distributed language model to obtain the test-sample label semantic features Y' = [y'_1, …, y'_u] ∈ R^{q×u}, where q is the semantic vector dimension and u is the number of labels corresponding to the test samples.
9. The method according to claim 1, wherein step 23) maps the multi-instance features x'_l of a test sample into the test-sample label semantic feature space as x'_l W^T, where W is the cross-modal mapping network, and directly measures the similarity between the mapped multi-instance features x'_l W^T and each candidate test label y'_o, o = 1, …, u:

r(x'_l) = avg_r F(x'_l, y'_o), with F(x'_l, y'_o) = x'_l W^T y'_o

where r(x'_l) is the degree of similarity between the mapped multi-instance features of the test sample and the candidate label; when this similarity exceeds a set threshold, the multi-instance feature x'_l of the test sample is judged to contain that candidate label.
CN201811495479.5A 2018-12-07 2018-12-07 Zero sample multi-label classification method based on depth end-to-end example differentiation Active CN109993197B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811495479.5A CN109993197B (en) 2018-12-07 2018-12-07 Zero sample multi-label classification method based on depth end-to-end example differentiation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811495479.5A CN109993197B (en) 2018-12-07 2018-12-07 Zero sample multi-label classification method based on depth end-to-end example differentiation

Publications (2)

Publication Number Publication Date
CN109993197A (en) 2019-07-09
CN109993197B CN109993197B (en) 2023-04-28

Family

ID=67128980

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811495479.5A Active CN109993197B (en) 2018-12-07 2018-12-07 Zero sample multi-label classification method based on depth end-to-end example differentiation

Country Status (1)

Country Link
CN (1) CN109993197B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512679A (en) * 2015-12-02 2016-04-20 天津大学 Zero sample classification method based on extreme learning machine
CN106203483A (en) * 2016-06-29 2016-12-07 天津大学 A kind of zero sample image sorting technique of multi-modal mapping method of being correlated with based on semanteme
US20200187841A1 (en) * 2017-02-01 2020-06-18 Cerebian Inc. System and Method for Measuring Perceptual Experiences
CN107766873A (en) * 2017-09-06 2018-03-06 天津大学 The sample classification method of multi-tag zero based on sequence study
CN108399421A (en) * 2018-01-31 2018-08-14 南京邮电大学 A kind of zero sample classification method of depth of word-based insertion
CN108629367A (en) * 2018-03-22 2018-10-09 中山大学 A method of clothes Attribute Recognition precision is enhanced based on depth network
CN108376267A (en) * 2018-03-26 2018-08-07 天津大学 A kind of zero sample classification method based on classification transfer

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MIN-LING ZHANG ET AL.: "Multi-label learning by instance differentiation", Proceedings of the 22nd AAAI Conference on Artificial Intelligence *
YANG ZHANG ET AL.: "Fast Zero-Shot Image Tagging", 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
姜文晖: "Research on object retrieval and localization techniques (物体检索与定位技术研究)", China Doctoral Dissertations Full-text Database, Information Science and Technology series *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580501A (en) * 2019-08-20 2019-12-17 天津大学 Zero sample image classification method based on variational self-coding countermeasure network
CN110580501B (en) * 2019-08-20 2023-04-25 天津大学 Zero sample image classification method based on variational self-coding countermeasure network
CN110598759A (en) * 2019-08-23 2019-12-20 天津大学 Zero sample classification method for generating countermeasure network based on multi-mode fusion
CN110795590A (en) * 2019-09-30 2020-02-14 武汉大学 Multi-label image retrieval method and device based on direct-push zero-sample hash
CN110795590B (en) * 2019-09-30 2023-04-18 武汉大学 Multi-label image retrieval method and device based on direct-push zero-sample hash
CN110765912B (en) * 2019-10-15 2022-08-05 武汉大学 SAR image ship target detection method based on statistical constraint and Mask R-CNN
CN110765912A (en) * 2019-10-15 2020-02-07 武汉大学 SAR image ship target detection method based on statistical constraint and Mask R-CNN
CN111291618A (en) * 2020-01-13 2020-06-16 腾讯科技(深圳)有限公司 Labeling method, device, server and storage medium
CN111291618B (en) * 2020-01-13 2024-01-09 腾讯科技(深圳)有限公司 Labeling method, labeling device, server and storage medium
CN111325281A (en) * 2020-03-05 2020-06-23 新希望六和股份有限公司 Deep learning network training method and device, computer equipment and storage medium
CN111325281B (en) * 2020-03-05 2023-10-27 新希望六和股份有限公司 Training method and device for deep learning network, computer equipment and storage medium
CN111563554A (en) * 2020-05-08 2020-08-21 河北工业大学 Zero sample image classification method based on regression variational self-encoder
CN111563554B (en) * 2020-05-08 2022-05-17 河北工业大学 Zero sample image classification method based on regression variational self-encoder
CN111816255B (en) * 2020-07-09 2024-03-08 江南大学 RNA binding protein recognition incorporating multi-view and optimal multi-tag chain learning
CN111816255A (en) * 2020-07-09 2020-10-23 江南大学 RNA-binding protein recognition by fusing multi-view and optimal multi-tag chain learning
CN112308115B (en) * 2020-09-25 2023-05-26 安徽工业大学 Multi-label image deep learning classification method and equipment
CN112308115A (en) * 2020-09-25 2021-02-02 安徽工业大学 Multi-label image deep learning classification method and equipment
CN112364895B (en) * 2020-10-23 2023-04-07 天津大学 Graph convolution network zero sample learning method based on attribute inheritance
CN112364895A (en) * 2020-10-23 2021-02-12 天津大学 Graph convolution network zero sample learning method based on attribute inheritance
CN112749738A (en) * 2020-12-30 2021-05-04 之江实验室 Zero sample object detection method for performing super-class inference by fusing context
CN112801105A (en) * 2021-01-22 2021-05-14 之江实验室 Two-stage zero sample image semantic segmentation method
CN112819052A (en) * 2021-01-25 2021-05-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-modal fine-grained mixing method, system, device and storage medium
CN113139664B (en) * 2021-04-30 2023-10-10 中国科学院计算技术研究所 Cross-modal migration learning method
CN113139664A (en) * 2021-04-30 2021-07-20 中国科学院计算技术研究所 Cross-modal transfer learning method
CN114882279A (en) * 2022-05-10 2022-08-09 西安理工大学 Multi-label image classification method based on direct-push type semi-supervised deep learning
CN114882279B (en) * 2022-05-10 2024-03-19 西安理工大学 Multi-label image classification method based on direct-push semi-supervised deep learning

Also Published As

Publication number Publication date
CN109993197B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN109993197B (en) Zero sample multi-label classification method based on depth end-to-end example differentiation
Yang et al. Visual sentiment prediction based on automatic discovery of affective regions
Tang et al. Visual and semantic knowledge transfer for large scale semi-supervised object detection
Wang et al. Beyond object recognition: Visual sentiment analysis with deep coupled adjective and noun neural networks.
Bhagat et al. Image annotation: Then and now
CN109002834B (en) Fine-grained image classification method based on multi-modal representation
Sudderth et al. Learning hierarchical models of scenes, objects, and parts
Gkelios et al. Deep convolutional features for image retrieval
CN104063683A (en) Expression input method and device based on face identification
CN105844292A (en) Image scene labeling method based on conditional random field and secondary dictionary study
CN112580362A (en) Visual behavior recognition method and system based on text semantic supervision and computer readable medium
Lee et al. Save: A framework for semantic annotation of visual events
Feng et al. Beyond tag relevance: integrating visual attention model and multi-instance learning for tag saliency ranking
Mesnil et al. Learning semantic representations of objects and their parts
CN116737979A (en) Context-guided multi-modal-associated image text retrieval method and system
Sun et al. Detection and recognition of text traffic signs above the road
Song et al. Sparse multi-modal topical coding for image annotation
Ke et al. A two-level model for automatic image annotation
CN113516118B (en) Multi-mode cultural resource processing method for joint embedding of images and texts
Tian et al. Scene graph generation by multi-level semantic tasks
Su et al. Cross-modality based celebrity face naming for news image collections
Xu et al. Image annotation by learning label-specific distance metrics
Dharsini et al. Captioning based image using Euclidean distance and resNet-50
Sreenivasulu et al. Adaptive inception based on transfer learning for effective visual recognition
Sharma et al. Optical Character Recognition Using Hybrid CRNN Based Lexicon-Free Approach with Grey Wolf Hyperparameter Optimization

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant