CN114648635A - Multi-label image classification method fusing strong correlation among labels - Google Patents
- Publication number: CN114648635A
- Application number: CN202210250180.3A
- Authority
- CN
- China
- Prior art keywords: label, matrix, node, layer, occurrence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415: Classification based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/25: Fusion techniques
- G06N3/045: Combinations of networks
- G06N3/047: Probabilistic or stochastic networks
- G06N3/048: Activation functions
- G06N3/08: Learning methods
Abstract
The invention discloses a multi-label image classification method that fuses strong correlations among labels, comprising the following steps: cluster the labels in the data set into M communities and split the conventional label co-occurrence matrix into an M-fold label co-occurrence tensor; feed the training image into a general convolutional neural network and apply M-fold generic pooling after the last stage to obtain multiple generic feature maps; feed the label co-occurrence tensor and the label embedding matrix into a multi-graph convolutional neural network, and fuse the M-fold label representation tensor into a single label representation matrix with an attention fusion mechanism after the last multi-graph convolutional layer; merge the sub-label semantic relations into an intermediate stage of the convolutional neural network; integrate community-coding information into the label representation matrix and perform label-level multiplication with the multiple generic feature maps; and construct a global objective function. The invention learns both the strong correlations inside label communities and the correlations between labels, and integrates them into the feature maps, thereby improving performance on the multi-label classification task.
Description
Technical Field
The invention relates to the technical field of multi-label image classification, and in particular to a multi-label image classification method fusing strong correlations among labels.
Background
Image classification is an important topic in machine learning and a core task of computer vision. The traditional approach extracts image features with a convolutional neural network such as AlexNet, VGG or ResNet, and achieves high accuracy in single-label image classification. In multi-label image recognition, however, label correlations are difficult for such networks to represent, so in recent years researchers have tried various ways to incorporate the correlations between labels into convolutional neural networks.
Previous researchers dealing with multi-label image classification have often adopted recurrent neural networks or graph convolutional neural networks to model the correlations between labels. However, since recurrent neural networks generally process serialized data, they struggle to express the complex internal co-occurrence relations when modeling label correlations. Graph convolutional neural networks, by contrast, have strong modeling capacity on non-Euclidean structured data (such as graph data), so the co-occurrence relations between labels can be learned well by feeding a graph convolutional network a label co-occurrence graph constructed in advance. Researchers typically use a word embedding model to build a label node representation matrix, construct a label co-occurrence adjacency matrix from the co-occurrence relations between labels in the data set, feed both into a graph convolutional network to learn the correlations between labels, and finally fuse the learned label correlations into the last-stage feature map of the convolutional neural network.
The conventional approach of constructing label correlations with a graph convolutional network and merging them into the last layer of the convolutional neural network, as in the ML-GCN method, has two disadvantages. First, only the last convolutional layer receives the merged label correlations, while convolutional neural networks, especially ResNet-family models, usually stack very many convolutional layers, so the network absorbs the label correlations insufficiently. Second, conventional methods of constructing the label-dependency adjacency matrix tend to simply record co-occurrence relations between labels across the whole data set, ignoring the strong relations inside label subsets. For example, {"person", "cup", "bowl", "table", "umbrella", "chair", "car"} may all have some relevance to one another, but {"cup", "bowl", "table", "chair"} and {"person", "umbrella", "car"} have stronger internal relevance.
The prior MGTN model fuses the strong connections inside the labels into the convolutional neural network by the Graph Transformer Networks method: it divides the label nodes into different communities with a community detection algorithm and fuses the label nodes of the different communities into different feature maps using Multiple CNNs (a set of several convolutional neural networks). However, Multiple CNNs consume a very large amount of computation to learn even a small number of communities, so the number of communities is hard to scale up.
Disclosure of Invention
1. Technical problem to be solved by the invention
In view of the defects of the prior art, the invention provides a multi-label image classification method fusing strong correlations among labels. The invention converts the traditional label co-occurrence adjacency matrix into several sub-label co-occurrence adjacency matrices and learns the strong correlations inside the sub-labels through a multi-graph convolutional neural network, so that the sub-label co-occurrence information is merged into the convolutional neural network more fully and the network's image classification performance improves.
2. Technical scheme
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
The invention discloses a multi-label image classification method fusing strong correlations among labels, which comprises the following steps:
S1, constructing a label co-occurrence matrix from the label co-occurrence relations in the data set; dividing the label nodes into M communities with a community detection algorithm; setting thresholds to divide the label co-occurrence matrix into M co-occurrence subgraphs;
S2, obtaining the image files of the training data, feeding them into a convolutional neural network to extract image features and obtain a feature map, converting the feature map into M two-dimensional generic feature maps through multiple generic pooling operations, and concatenating these generic feature maps side by side;
S3, acquiring a label embedding matrix and a label co-occurrence tensor, feeding the M slices of the label co-occurrence tensor and the label embedding matrix into a multi-graph convolutional neural network to obtain an M-fold label node representation tensor, and fusing the M-fold tensor into a single label node representation matrix with a label-level attention fusion mechanism;
S4, fusing the M-fold label node representation tensor produced by the multi-graph convolutional layers into a feature map output by an intermediate stage of the general convolutional neural network, using an attention mechanism;
S5, building a community-coding M matrix from the M communities into which the community detection algorithm divided the label nodes, multiplying the M matrix with the label node representation matrix produced by the fusion mechanism to obtain a community-coded label node representation matrix, multiplying that matrix with the column-wise concatenated multiple generic feature maps, and using the resulting predicted labels for classification;
and S6, constructing a loss function comprising a multi-label classification loss and an objective function for learning the parameter matrices in the multiple generic pooling layer.
3. Advantageous effects
Compared with the prior art, the technical scheme provided by the invention has the following notable effects:
(1) Traditional multi-label deep learning algorithms usually use a convolutional neural network to learn the correlations between labels but ignore the strong connections inside label subsets. The present method converts the traditional label co-occurrence adjacency matrix into several sub-label co-occurrence adjacency matrices and learns the strong correlations inside the sub-labels through a multi-graph convolutional neural network, which is more conducive to improving multi-label image classification performance.
(2) In the present method, the label-level attention fusion mechanism learns the correlations between label nodes, rather than only the correlations between subgraphs as in the Graph Transformer algorithm. The subgraph label node representations are fused with an attention mechanism, so that the sub-label co-occurrence information is merged into the convolutional neural network more fully and its image classification performance improves.
Drawings
FIG. 1 is a flow chart of a multi-label image classification method fusing strong correlation between labels according to the present invention;
FIG. 2 is a schematic diagram of Resnet-101 extracting image features according to the present invention;
FIG. 3 is a schematic diagram of generic pooling of the present invention;
FIG. 4 is a schematic diagram of the multiple generic pooling of the present invention;
FIG. 5 is a schematic diagram of the fusion of multiple convolutional layers and a sub-graph tag representation matrix according to the present invention;
FIG. 6 is a schematic diagram of an intermediate layer of the present invention for fusing sub-graph label correlations to a convolutional neural network;
FIG. 7 is an internal structure diagram of the M matrix according to the present invention;
FIG. 8 is a schematic diagram of the tag level multiplication of the present invention.
Detailed Description
For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings and examples.
Example 1
The multi-label image classification method fusing strong correlations among labels learns both the relations between labels and the strong relations inside label communities, and thereby performs the multi-label classification task better. The method comprises the following steps:
S1, constructing a label co-occurrence matrix from the label co-occurrence relations in the data set; dividing the label nodes into M communities with a community detection algorithm; setting thresholds to divide the label co-occurrence matrix into M co-occurrence subgraphs, i.e. the label co-occurrence tensor. Specifically:
(1-1) Obtain the image files of the training data and their labels, and build a co-occurrence count matrix Z for the labels, where Z_ij records how often label L_i and label L_j appear together. Using the basic conditional probability formula and the Z matrix, a conditional probability matrix P ∈ R^{C×C} can be constructed, where P_ij denotes the probability that label L_j also appears when label L_i appears. A threshold τ is then set to binarize the P matrix, yielding the label co-occurrence matrix A ∈ R^{C×C}: A_ij = 1 if P_ij ≥ τ (label L_j appears with a certain probability when label L_i appears), otherwise A_ij = 0.
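A minimal sketch of step (1-1), assuming the training labels are given as per-image sets of label indices; the function name and return convention are illustrative, not from the patent:

```python
import numpy as np

def cooccurrence_matrix(label_sets, num_labels, tau=0.4):
    """Build the binarized label co-occurrence matrix A from per-image label sets.

    label_sets: list of sets of label indices observed on each training image.
    Returns (Z, P, A): raw co-occurrence counts, conditional probabilities
    P[i, j] = P(L_j | L_i), and the tau-thresholded adjacency matrix, all C x C.
    """
    Z = np.zeros((num_labels, num_labels))  # Z[i, j]: images containing both L_i and L_j
    occur = np.zeros(num_labels)            # occur[i]: images containing L_i
    for labels in label_sets:
        for i in labels:
            occur[i] += 1
            for j in labels:
                if i != j:
                    Z[i, j] += 1
    # Conditional probability P[i, j] = Z[i, j] / count(L_i)
    P = Z / np.maximum(occur[:, None], 1)
    A = (P >= tau).astype(int)              # binarize with threshold tau
    return Z, P, A
```

With three images labelled {0,1}, {0,1}, {0,2}, label 1 co-occurs with label 0 in two of label 0's three images, so the edge 0→1 survives a threshold of 0.5 while 0→2 does not.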
(1-2) Obtain the label co-occurrence adjacency matrix and divide it into M communities and M subgraphs (the label sub-relationship adjacency tensor).
The label nodes are divided into M communities with a community detection algorithm. First, the modularity of the adjacency matrix A is computed:
Q = (1/2m) Σ_{i,j} [ A_ij - d_i d_j / (2m) ] δ(c_i, c_j)
where 2m = Σ_{i,k} A_ik is the total degree of the graph, d_i = Σ_k A_ik is the degree of the ith node, and δ(c_i, c_j) tests whether nodes i and j lie in the same community: δ(c_i, c_j) = 1 if they do, otherwise δ(c_i, c_j) = 0.
When the community detection algorithm starts, each node forms its own community; label nodes are then merged into communities so as to maximize the modularity, until the modularity no longer changes and M communities have been detected.
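The modularity formula above can be computed directly; the sketch below evaluates Q for a given partition (the greedy merging loop of the detection algorithm itself is omitted):

```python
import numpy as np

def modularity(A, communities):
    """Modularity Q of a partition of the co-occurrence graph A.

    communities: array of community ids, one per label node.
    Q = (1/2m) * sum_ij [A_ij - d_i*d_j/(2m)] * delta(c_i, c_j)
    """
    d = A.sum(axis=1)   # node degrees d_i = sum_k A_ik
    two_m = d.sum()     # 2m: total degree of the graph
    Q = 0.0
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # delta(c_i, c_j) = 1
                Q += A[i, j] - d[i] * d[j] / two_m
    return Q / two_m
```

For a graph of two disjoint edges, the partition that puts each edge in its own community scores Q = 0.5, the maximum for that graph.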
(1-3) Construct the label sub-relationship adjacency tensor by thresholding: set a vector of thresholds T = [t_1, ..., t_M] on the conditional probability matrix P, where each t_k ∈ [0, 1], and build the label sub-relationship adjacency tensor Ã ∈ R^{M×C×C} as follows:
Ã_{k,ij} = 1 if P_ij ≥ t_k, otherwise Ã_{k,ij} = 0, for k ∈ {1, ..., M}
where the kth slice Ã_k is the kth subgraph of the label co-occurrence adjacency matrix, in which label L_j appears with a certain probability when label L_i appears.
In this embodiment, the number of communities is equal to the number of subgraphs.
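Step (1-3) reduces to stacking one thresholded copy of P per subgraph; a minimal sketch (function name illustrative):

```python
import numpy as np

def label_subgraph_tensor(P, thresholds):
    """Split the conditional probability matrix P into M co-occurrence subgraphs.

    thresholds: list [t_1, ..., t_M]; the kth slice keeps edges with P_ij >= t_k.
    Returns an (M, C, C) binary tensor, the label sub-relationship adjacency tensor.
    """
    return np.stack([(P >= t).astype(int) for t in thresholds])
```

Lower thresholds yield denser subgraphs; higher thresholds keep only the strongest co-occurrence edges.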
S2, obtaining the image files of the training data, feeding them into a convolutional neural network to extract image features and obtain a feature map, converting the feature map into M two-dimensional generic feature maps through multiple generic pooling operations, and concatenating these generic feature maps side by side. Specifically:
(2-1) Acquire the image files of the training data, crop each image to a uniform size (3 × 448 × 448) and input it into the ResNet-101 network. Inside the network, the feature map passes through a 7 × 7 convolutional layer, a pooling layer, and four intermediate stages: layer1, layer2, layer3 and layer4. The feature map output by the sth intermediate stage is F_s ∈ R^{C_s×H_s×W_s},
where s is the stage number with value range {1, 2, 3, 4}, and W_s, H_s and C_s are the width, height and channel count of the sth stage's output feature map. Concretely, the four stage outputs have dimensions F_1 ∈ R^{256×112×112}, F_2 ∈ R^{512×56×56}, F_3 ∈ R^{1024×28×28} and F_4 ∈ R^{2048×14×14}. A multiple generic pooling layer is then attached to obtain the multiple generic feature maps, as shown in FIG. 2.
(2-2) Take the last-stage feature map F_4 ∈ R^{2048×14×14} and write it as X_conv ∈ R^{D_c×H×W}, where H, W and D_c are the height, width and channel count of the feature map. The generic pooling that extracts the picture features is:
X_lsp = W_lsp × R(X_conv)
where X_conv ∈ R^{D_c×H×W} is the feature map output by the last stage of ResNet-101. The dimension transformation R(·) reshapes X_conv from a three-dimensional tensor (dimension D_c × H × W) into a two-dimensional matrix (dimension HW × D_c), i.e. R(X_conv) ∈ R^{HW×D_c}; left-multiplying by the parameter matrix W_lsp ∈ R^{C×HW} yields the generic-pooled matrix X_lsp ∈ R^{C×D_c}, where C is the number of label categories. The generic pooling method is shown in FIG. 3.
(2-3) Finally, M generic pooling operations are performed in the same way:
X^i_lsp = W^i_lsp × R(X_conv), i = 1, ..., M
where i is a positive integer from 1 to M and M is the number of subgraphs (equal to the number of communities). The M generic feature maps X^i_lsp ∈ R^{C×D_c} are concatenated along the column direction to obtain X_mullsp ∈ R^{C×MD_c}. The multiple generic pooling layer is shown in FIG. 4.
An l2-norm term keeps each parameter matrix W^i_lsp sparse, and the label co-occurrence tensor is used to guide the learning of the parameter matrices W^i_lsp: this accelerates their learning and fuses the different subgraphs differentially so that the generic feature maps learn the sub-label semantic relationships. Multiple generic pooling thus yields generic feature maps X^i_lsp containing different features.
S3, acquiring a label embedding matrix and a label co-occurrence tensor, feeding the M slices of the label co-occurrence tensor and the label embedding matrix into a multi-graph convolutional neural network to obtain an M-fold label node representation tensor, and fusing the M-fold tensor into a single label node representation matrix with a label-level attention fusion mechanism. Specifically:
(3-1) The subgraph label-relationship adjacency tensor Ã ∈ R^{M×C×C} and the label embedding matrix E ∈ R^{C×D} are fed into the multi-graph convolutional neural network, as follows:
The first layer of the multi-graph convolution takes the label embedding matrix E ∈ R^{C×D} and the M normalized adjacency subgraphs Â_1, ..., Â_M. Each subgraph is sent into its own graph convolution branch, and after the multi-graph convolutional layers an M-fold label representation tensor H^l ∈ R^{M×C×L_l} is obtained:
H^{l+1}_k = h( Â_k H^l_k W^l ), k = 1, ..., M
where L_l is the node dimension of the label representation tensor input to the lth graph convolution layer, l ≥ 2, h(·) is a nonlinear activation function, and W^l is the parameter matrix of the lth graph convolution layer; the node representation dimension of each layer is determined by its parameter matrix. The formula above is the expression of a multi-graph convolutional layer beyond the first; similarly, the first layer is represented by H^2_k = h( Â_k E W^1 ).
(3-2) The M-fold label node representation tensor G̃ must now be fused into a single label node representation matrix G ∈ R^{C×L} with the label-level attention fusion mechanism, as follows:
G̃ can be written as G̃ = [G^1, ..., G^M], where G^k denotes the kth node representation matrix and k is an integer between 1 and M.
The importance of each subgraph's node representation matrix is learned first:
w_k = q · tanh( W G^k + b )
where W is a parameter vector W ∈ R^{1×C}, b an offset vector b ∈ R^{1×C}, tanh(·) an activation function, and q a parameter vector q ∈ R^{1×C} whose purpose is to convert the vector produced by tanh(·) into a scalar, so that both sides of the equation are scalar values. G^k_i is the label node representation of the ith node in the kth representation matrix, i an integer between 1 and C, and w_k is the importance weight, learned through the graph convolutional network, of the label embedding matrix together with the kth subgraph.
After the node weight w_k of each subgraph has been computed, it is normalized with the softmax(·) function:
β_k = exp(w_k) / Σ_{j=1}^{M} exp(w_j)
This yields the importance scores (β_1, ..., β_M) of the nodes in each subgraph. The importance scores are multiplied by the corresponding label node representations, summed, and then multiplied by a parameter matrix W_G for a dimension transformation, giving the fused label node representation matrix G:
G = ( Σ_{k=1}^{M} β_k G^k ) W_G
where G ∈ R^{C×L}, C is the number of label nodes, and the representation dimension L is determined by the parameter matrix W_G. The multi-graph convolutional layer and label-level attention fusion network structure is shown in FIG. 5.
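A sketch of the label-level attention fusion of step (3-2). The parameter shapes here are chosen for internal consistency (per-node projection, tanh squash, scalar score per subgraph) rather than copied from the patent's stated dimensions, and the function name is illustrative:

```python
import numpy as np

def attention_fuse(G_stack, W, b, q, W_G):
    """Fuse M subgraph label representations into one matrix.

    G_stack: (M, C, Lp) per-subgraph label node representations.
    s_k = q . tanh(G_k W + b) scores subgraph k as a scalar; beta = softmax(s);
    the beta-weighted sum of the G_k is projected by W_G into the fused G (C, L).
    """
    scores = np.array([q @ np.tanh(Gk @ W + b) for Gk in G_stack])  # (M,) scalars
    e = np.exp(scores - scores.max())
    beta = e / e.sum()                                              # softmax weights
    fused = np.tensordot(beta, G_stack, axes=1)                     # (C, Lp) weighted sum
    return fused @ W_G, beta                                        # (C, L), (M,)
```

The softmax guarantees the subgraph weights beta are positive and sum to one, so the fusion is a convex combination before the final projection.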
S4, fusing the M-fold label node representation tensor produced by the multi-graph convolutional layers into a feature map output by an intermediate stage of the general convolutional neural network, using an attention mechanism. The method comprises the following steps:
(4-1) The label correlations are fused, with an attention mechanism, into a feature map output by one of the intermediate stages of ResNet-101, as follows:
The first three stages of ResNet-101 output feature maps F_s ∈ R^{C_s×W_s×H_s}, 1 ≤ s ≤ 3, where C_s, W_s and H_s are the channel count, width and height of the sth intermediate stage. Denote the stage's feature map by X_l ∈ R^{C_l×W×H}, where C_l = C_s, W = W_s and H = H_s, and l indicates that F_s is fused with the label node correlations output by the lth graph convolution layer.
Since the node dimension L_l of H^l is determined by a parameter matrix, one can set L_l = C_l = P, i.e. X_l ∈ R^{P×W×H}.
Using the dimension transformation R(·), X_l and H^l are reshaped to X_R ∈ R^{WH×P} and H_R ∈ R^{C×P}. Taking X_R ∈ R^{WH×P} as key and value and H_R as query, they are fed into a Transformer decoder whose multi-head attention mechanism fuses the label correlations:
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h) W^O
This embodiment uses a standard Transformer structure; the difference from a conventional Transformer is where the inputs come from. Here, key and value in the MultiHead structure are derived from the feature map output by an intermediate stage of the convolutional neural network ResNet-101, and query is derived from the label node representation tensor output by the multi-graph convolutional network. The correlation between the image features and the sub-labels is learned through query and key: QK^T measures the correlation of the Q and K matrices, normalization by the softmax(·) function gives the correlation scores, and multiplying the scores onto value merges the label correlations into the feature map. The multi-head attention mechanism itself is the same as in a conventional Transformer, i.e. head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V) with Attention(Q, K, V) = softmax(QK^T / √d_k) V, where h = 8 and P = h·d_k.
The feature X_R with the label correlations fused in is passed through the inverse of the dimension transformation R(·), converting X_R ∈ R^{WH×P} into X_M ∈ R^{P×W×H}. The newly obtained X_M ∈ R^{P×W×H} and the original feature map X_l ∈ R^{P×W×H} are combined in the Add & Norm layer of the Transformer decoder, which performs normalization and a residual connection; the label-correlated feature map X_M ∈ R^{P×W×H} is then fed into the FFN layer of the Transformer decoder:
X_F = FFN(X_M)
where FFN(x) = max(0, x W_1 + b_1) W_2 + b_2, i.e. two linear transformations with a ReLU activation in between.
The FFN layer in a conventional Transformer consists of two linear layers with a ReLU activation between them; this embodiment differs from the conventional design in that the parameters W_1 and W_2 are three-dimensional tensors, where d_ff is the dimension of the first linear layer's output.
The newly obtained X_F ∈ R^{P×W×H} and the previously label-correlated feature map X_M ∈ R^{P×W×H} are again combined in the Add & Norm layer for normalization and a residual connection. The final label-correlated feature map X_l ∈ R^{P×W×H} is obtained and sent on to the next stage of convolutional neural network training. The structure by which the convolutional neural network fuses the subgraph label correlations through attention is shown in FIG. 6.
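The core of the decoder fusion in S4 is scaled dot-product attention between the label representations and the flattened feature map. A single-head sketch with stand-in tensors (the patent uses h = 8 heads and learned projections, omitted here for brevity):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    As in S4, query comes from the label node representations and key/value
    from the reshaped CNN feature map.
    """
    d_k = query.shape[-1]
    attn = softmax(query @ key.T / np.sqrt(d_k))  # (C, WH) relevance scores
    return attn @ value                           # label-correlated features

C, WH, P = 6, 49, 32
rng = np.random.default_rng(1)
H_R = rng.standard_normal((C, P))    # query: label nodes
X_R = rng.standard_normal((WH, P))   # key and value: flattened feature map
out = cross_attention(H_R, X_R, X_R)
```

Each row of the attention matrix is a softmax over spatial positions, so every label gathers a convex combination of image features weighted by relevance.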
S5, building a community-coding M matrix from the M communities into which the community detection algorithm divided the label nodes, multiplying the M matrix with the fused label node representation matrix (Hadamard product) to obtain a community-coded label node representation matrix, multiplying that matrix with the column-wise concatenated multiple generic feature maps (label-level multiplication), and using the resulting predicted labels for classification. The method comprises the following steps:
(5-1) The label nodes in the data set are divided into M communities with the community detection method, M being kept equal to the number of subgraphs. Since the community detection algorithm assigns every label node to a community, the communities are numbered p in the order {1, ..., M}, where M is the number of communities, and the community number of each label node is stored in a vector S ∈ Z^C, where C is the number of label categories and S_i = p, i ∈ {1, ..., C}.
The multiple generic pooling layers produce the multi-label generic feature map X_mullsp, which can be written as the combination of generic feature maps X_mullsp = [X^1_lsp, ..., X^M_lsp], with i a positive integer between 1 and M. The different X^i_lsp are distinguished and matched to different communities: the community code p determines which generic label map the pth community is matched with.
The community-coding M matrix is constructed as follows:
where τ ∈ [0, 1]; S_i = p means that the community of the ith label is p, and that node therefore needs to match the generic feature map X^p_lsp belonging to community p. The positions in the M matrix corresponding to that feature map are set to τ, and the other positions are set to a different, smaller constant. Clearly, this achieves the purpose of fusing different label nodes with different generic features differentially. The internal structure of the community-coding M matrix is shown in FIG. 7.
(5-2) The community-coding M matrix and the label node representation matrix G ∈ R^{C×L} are combined by Hadamard product, and the result is multiplied with X_mullsp at the label level so that the model fuses different generic features with the label nodes of different communities, as follows:
Since the dimension L of G ∈ R^{C×L} is determined by a parameter matrix, L can be set equal to M·D_c. Multiplying the community-coding M matrix and the label node representation G element-wise (Hadamard product) merges the community-coding information into the label nodes, giving the matrix W ∈ R^{C×MD_c}:
W = M · G
(5-3) As shown in FIG. 8, W ∈ R^{C×MD_c} and X_mullsp ∈ R^{C×MD_c} are multiplied at the label level for the final classification, as follows:
W can be written as W = [w_1, ..., w_C]^T and X_mullsp as X_mullsp = [x_1, ..., x_C]^T, where C is the number of label categories; the prediction function is:
ŷ_i = w_i · x_i, i ∈ {1, ..., C}
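Steps (5-2) and (5-3) amount to a Hadamard product followed by row-wise dot products, one score per label. A small sketch (function name illustrative; the equations rendered as images in the source are reconstructed from the surrounding text):

```python
import numpy as np

def predict_scores(M_mat, G, X_mullsp):
    """Community coding plus label-level multiplication, as in S5.

    W = M . G (Hadamard) injects the community coding into the label
    representations; each class score is then the dot product of the matching
    rows of W and X_mullsp: y_hat_i = w_i . x_i.
    """
    Wmat = M_mat * G                              # Hadamard product, (C, M*D_c)
    return np.einsum('cd,cd->c', Wmat, X_mullsp)  # row-wise dot products, (C,)
```

With M_mat all ones the community coding is a no-op and each score is just the dot product of a label's representation with its generic features.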
and S6, constructing a loss function, wherein the loss function comprises the multi-label classification loss function and an objective function for learning the parameter matrices in the multiple generic pooling layers. The method comprises the following steps:
(6-1) defining the objective function as:
where L (-) is a loss function of the multi-label classification, LlspFor accelerating learning parametersAnd is used to distinguish different subgraphs, theta is a global parameter that needs to be learned.
(6-2) using a binary classification loss function as the multi-label classification loss function, as follows:
the objective function L_lsp(θ) for guiding the generic features is:
(6-3) therefore, the overall loss function is:
Through the above steps, semantic correlations of different strengths among the labels are merged into the convolutional neural network to obtain the predicted labels; the loss between the real labels and the predicted labels is minimized, optimizing with stochastic gradient descent (SGD), so as to improve the multi-label image classification performance.
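As a concrete illustration of the binary cross-entropy loss used for the multi-label classification above, the sketch below evaluates it on toy values; the labels and raw scores are invented for the example.

```python
import numpy as np

def multilabel_bce(y, scores):
    """Binary cross-entropy over C labels; y is multi-hot, scores are raw logits."""
    p = 1.0 / (1.0 + np.exp(-scores))     # sigmoid of the predicted scores
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0])             # toy ground-truth label vector
scores = np.array([2.0, -1.5, 0.5])       # toy predicted scores
loss = multilabel_bce(y, scores)          # scalar loss to minimize with SGD
```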
In this embodiment, a ResNet network is used as the backbone to extract image features, and the traditional average or maximum pooling layer after the last stage is replaced with multiple generic pooling layers that transform the feature map dimensions. Parameters in the label sub-relation adjacency tensor are learned to obtain generic feature maps with different characteristics. This replaces the multiple-CNNs matching of different communities used in MGTN, so the amount of computation is greatly reduced while more communities are learned, and the parameters in the adjacency matrices are learned with different sub-labels so as to distinguish the label features in different subgraphs.
In addition, this embodiment trains the sub-label co-occurrence adjacency tensor together with the label node representation matrix using a multi-graph convolutional neural network. Unlike the method in MGTN, which uses Graph Transformer Networks to learn only the correlations of the different sub-label co-occurrence adjacency matrices (i.e., the correlations between meta-paths), this embodiment can learn the semantic relationships of the label nodes under the different sub-label co-occurrence adjacency matrices.
This embodiment integrates the label correlations into the intermediate layers of the convolutional neural network with an attention mechanism, namely a Transformer decoder structure. Using the Multi-Head Attention mechanism, the feature map output by the convolutional neural network serves as key and value, and the label node representation tensor output by the multi-graph convolutional neural network serves as query, so that the sub-label correlations are merged into the intermediate-layer feature maps of the convolutional neural network and the degree of fusion between the label correlations and the network is improved.
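The Multi-Head Attention fusion described above can be sketched as follows, with the label node representation as query and the flattened CNN feature map as key and value. All sizes, the number of heads, and the random projection weights are toy stand-ins for learned parameters.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d_k)
    S = np.exp(S - S.max(axis=-1, keepdims=True))   # numerically stable softmax
    return (S / S.sum(axis=-1, keepdims=True)) @ V

rng = np.random.default_rng(3)
C, WH, d, heads = 4, 9, 6, 2          # labels, flattened spatial size, dim, heads

query = rng.normal(size=(C, d))       # label node representation (query)
feats = rng.normal(size=(WH, d))      # flattened CNN feature map (key and value)

outs = []
for _ in range(heads):                # per-head projections W_i^Q, W_i^K, W_i^V
    Wq, Wk, Wv = (rng.normal(size=(d, d // heads)) for _ in range(3))
    outs.append(attention(query @ Wq, feats @ Wk, feats @ Wv))

fused = np.concatenate(outs, axis=-1) @ rng.normal(size=(d, d))  # W^O projection
```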
In summary, this embodiment can learn both the correlations between labels and the correlations within labels, fusing the strong correlations among labels into the convolutional neural network more fully; the method also scales more easily in the number of communities and the number of subgraphs, which is more conducive to improving multi-label image classification performance.
It is worth noting that the number of multi-graph convolutional layers is not necessarily limited to three. If there are three layers, they must correspond one-to-one with the first three intermediate layers of the convolutional neural network, so that the sub-label correlations are merged into those intermediate layers; if there are fewer than three layers, the sub-label correlations are selectively fused into some of the first three intermediate layers; the number of multi-graph convolutional layers, however, cannot exceed three.
If there are three multi-graph convolutional layers, the label representation tensor output by each layer must correspond one-to-one with the feature maps output by the first three intermediate layers of the convolutional neural network, and the Transformer decoder merges the sub-label correlations into those intermediate layers. The dimensions of the feature maps output by the first three intermediate layers are therefore F_1 ∈ R^{256×112×112}, F_2 ∈ R^{512×56×56} and F_3 ∈ R^{1024×28×28}, and the label representation tensors output by the multi-graph convolutional layers are dimensioned accordingly. If there are fewer than three multi-graph convolutional layers, the feature map and label representation tensor dimensions are kept consistent in a similar manner.
Compared with matching different communities by the multiple-CNNs method, the present method can expand the number of subgraphs or communities at a smaller computational cost and learns the label correlations and the strong correlations among sub-labels more easily, thereby improving multi-label image classification performance.
The present invention and its embodiments have been described above schematically and without limitation; what is shown in the drawings is only one embodiment of the invention, and the actual structure is not limited thereto. Therefore, structural modes and embodiments similar to this technical solution that a person skilled in the art, taught by the invention, designs without inventive effort and without departing from the spirit of the invention shall fall within the scope of protection of the invention.
Claims (7)
1. A multi-label image classification method fusing strong correlation among labels is characterized by comprising the following steps:
s1, constructing a label co-occurrence matrix by using the label co-occurrence relation in the data set; dividing the label nodes into M communities by using a community detection algorithm; setting a threshold value to divide the label co-occurrence matrix into M co-occurrence subgraphs;
s2, obtaining an image file of training data, sending the image file into a convolutional neural network to extract image features to obtain a feature map, converting the feature map into M two-dimensional generic feature maps through multiple generic pooling operations, and splicing the two-dimensional generic feature maps in parallel;
s3, acquiring a label embedding matrix, acquiring a label co-occurrence tensor, sending the M label co-occurrence tensors and the label embedding matrix into a multi-graph convolutional neural network to obtain an M-fold label node representation tensor, and fusing the M-fold label node representation tensors into a label node representation matrix by utilizing a label level attention fusion mechanism;
s4, fusing the M-fold label node representation tensor generated by the multi-graph convolutional layers into the feature maps output by the intermediate layers of the convolutional neural network using an attention mechanism;
s5, constructing a community coding M matrix according to the M communities into which the label nodes are divided by the community detection algorithm, taking the Hadamard product of the M matrix and the label node representation matrix obtained by the fusion mechanism to obtain a label node representation matrix with community coding, multiplying this matrix with the multiple generic feature maps spliced in the column direction, and using the obtained predicted labels for classification;
and S6, constructing a loss function, wherein the loss function comprises a multi-label classification loss function and an objective function for learning a parameter matrix in the multiple generic pooling layer.
2. The multi-label image classification method fusing strong correlation between labels as claimed in claim 1, wherein step S1 specifically includes the following steps:
(1-1) obtaining the image files of the training data and the labels therein, and establishing a co-occurrence adjacency matrix A ∈ R^{C×C} for the labels, wherein A_ij = 1 indicates that when label L_i appears, label L_j is also likely to appear, and otherwise A_ij = 0, as follows:
firstly, the co-occurrence relations of the labels in the data set are counted to construct a matrix Z ∈ R^{C×C}, where Z_ij denotes the number of times labels L_i and L_j appear together in the data set; from the Z matrix and the basic conditional probability formula, a conditional probability matrix P ∈ R^{C×C} is constructed, with P_ij = P(L_j | L_i) denoting the probability that label L_j appears when label L_i appears; a threshold τ is then set to binarize the P matrix, constructing the label co-occurrence matrix A ∈ R^{C×C} as follows:
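A minimal sketch of step (1-1), assuming a toy data set of four images and three labels; the threshold τ = 0.5 is an illustrative value:

```python
import numpy as np

labels = np.array([  # each row: one image's multi-hot label vector (C = 3)
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])

Z = labels.T @ labels                 # Z_ij: co-occurrence count of L_i and L_j
N = np.diag(Z).astype(float)          # N_i: number of images containing L_i
P = Z / N[:, None]                    # P_ij = P(L_j | L_i) = Z_ij / N_i
np.fill_diagonal(P, 0.0)              # self-correlation is not used

tau = 0.5                             # binarization threshold (assumed value)
A = (P >= tau).astype(int)            # A_ij = 1 iff P(L_j | L_i) >= tau
```

Note that P, and hence A, is asymmetric: P(L_j | L_i) and P(L_i | L_j) generally differ.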
(1-2) dividing the label nodes into M communities with a community detection algorithm, wherein the modularity of the adjacency matrix A is defined as follows:
wherein d_i = Σ_k A_ik is the degree of the i-th node, m = ½Σ_ij A_ij is the total edge weight, and δ(c_i, c_j) detects whether nodes i and j are in the same community: if so, δ(c_i, c_j) = 1, otherwise δ(c_i, c_j) = 0;
each node initially forms its own community, and other label nodes are merged into a community whenever doing so increases the modularity;
when the modularity no longer changes, the M communities have been detected, i.e., all the label nodes in the data set have been divided into M communities;
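The modularity maximized in step (1-2) can be evaluated directly from its matrix form. The sketch below computes Q for a toy graph of two triangles joined by one edge, with an assumed community assignment:

```python
import numpy as np

# two triangles {0,1,2} and {3,4,5} joined by the edge (2,3)
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)
c = np.array([0, 0, 0, 1, 1, 1])      # community label c_i of each node

d = A.sum(axis=1)                     # d_i = sum_k A_ik, the degree of node i
two_m = A.sum()                       # 2m, twice the number of edges

delta = (c[:, None] == c[None, :])    # delta(c_i, c_j) indicator matrix
Q = ((A - np.outer(d, d) / two_m) * delta).sum() / two_m
```

With each triangle as one community, Q = 5/14 ≈ 0.357, a clearly positive modularity.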
(1-3) setting thresholds to construct the label co-occurrence tensor as follows: thresholds T = [t_1, ..., t_M] are set on the conditional probability matrix P, where t_i ∈ [0, 1], and the label co-occurrence tensor in R^{M×C×C} is constructed as follows:
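A minimal sketch of step (1-3), assuming a toy conditional probability matrix P and three thresholds:

```python
import numpy as np

P = np.array([[0.0, 0.7, 0.3],        # toy conditional probability matrix
              [0.6, 0.0, 0.9],
              [0.2, 0.8, 0.0]])
T = [0.25, 0.5, 0.75]                 # thresholds t_1 ... t_M, each in [0, 1]

# A_tensor[s] is the co-occurrence subgraph obtained with threshold t_s:
# larger thresholds keep only the stronger label correlations
A_tensor = np.stack([(P >= t).astype(int) for t in T])
```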
3. The multi-label image classification method fusing the strong correlation between labels as claimed in claim 2, wherein: step S2 specifically includes the following steps:
(2-1) acquiring the image files of the training data and inputting them into a ResNet-101 convolutional neural network; obtaining the feature map output by the final stage of ResNet-101 feature extraction, denoted X_conv ∈ R^{H×W×D_c}, wherein H, W and D_c are respectively the height, width and number of channels of the feature map;
(2-2) changing the dimensions of the original feature map X_conv:
X_lsp = W_lsp × R(X_conv)
the dimension transformation operation R(·) maps X_conv from a three-dimensional tensor to a two-dimensional matrix, which is left-multiplied by the parameter matrix W_lsp ∈ R^{C×HW}, where C is the number of label categories, yielding the dimension-transformed generic feature map;
(2-3) repeating the same operation M times, specifically as follows:
wherein i is a positive integer between 1 and M, and M is the number of label co-occurrence graphs, equal to the number of communities; the M generic feature maps X_lsp^i are spliced in the column direction to obtain the multiple generic feature map X_mullsp;
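Steps (2-2) and (2-3) can be sketched with random stand-ins for the learned parameter matrices W_lsp^i; all sizes are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, Dc, C, M = 4, 4, 8, 5, 3        # toy grid, channels, labels, subgraphs

X_conv = rng.normal(size=(Dc, H, W))  # backbone output feature map
X_flat = X_conv.reshape(Dc, H * W).T  # R(.): flatten to an (HW x Dc) matrix

X_parts = []
for i in range(M):                    # one learned projection per subgraph
    W_lsp = rng.normal(size=(C, H * W))   # stand-in for learned W_lsp^i
    X_parts.append(W_lsp @ X_flat)        # X_lsp^i in R^{C x Dc}

X_mullsp = np.concatenate(X_parts, axis=1)  # splice column-wise: C x (M*Dc)
```

Each projection replaces global pooling with a learned, per-label spatial weighting, so the M blocks of X_mullsp can specialize to different subgraphs.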
(2-4) learning the parameter matrices using the label co-occurrence relations after the subgraphs are segmented, as follows:
4. The multi-label image classification method fusing strong correlation between labels as claimed in claim 3, wherein step S3 specifically comprises the following steps:
(3-1) the label co-occurrence tensor and the label embedding matrix E ∈ R^{C×D} are fed into the multi-graph convolutional neural network, as follows:
the first-layer input of the multi-graph convolutional layers is the label embedding matrix E ∈ R^{C×D} and the normalized multiple adjacency subgraphs; the subgraphs are fed into different graph convolutional layers, and after the multiple graph convolutional layers the label representation tensor is obtained, wherein L_l is the node dimension of the label representation tensor input to the l-th graph convolutional layer, with l ≥ 2, and h(·) is a nonlinear activation function; the above formula is the expression of a multi-graph convolutional layer beyond the second layer, in which the parameter matrix of each graph convolutional layer determines the node representation dimension of that layer; the first-layer representation is expressed similarly;
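One layer of the multi-graph convolution in step (3-1) can be sketched as H' = h(Â H W) applied per subgraph. The normalized adjacencies, the LeakyReLU choice for h(·), and all sizes are illustrative assumptions:

```python
import numpy as np

def gcn_layer(A_hat, H, W):
    """One graph-convolution layer: H' = h(A_hat @ H @ W), with LeakyReLU as h."""
    Z = A_hat @ H @ W
    return np.where(Z > 0, Z, 0.2 * Z)

rng = np.random.default_rng(1)
C, D, L1 = 4, 6, 5                    # labels, embedding dim, output node dim
E = rng.normal(size=(C, D))           # label embedding matrix E

# M = 2 normalized co-occurrence subgraphs (toy row-stochastic matrices)
A_subs = [np.full((C, C), 1.0 / C) for _ in range(2)]

# each subgraph gets its own graph convolution (its own parameter matrix),
# producing an M x C x L1 label node representation tensor
H_stack = np.stack([gcn_layer(A, E, rng.normal(size=(D, L1))) for A in A_subs])
```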
(3-2) the label node representation tensor learned by the multi-graph convolutional layers is fused into a single label node representation matrix G ∈ R^{C×L} using a label-level attention fusion mechanism, as follows:
the tensor is written as a set of matrices H^k, where H^k denotes the k-th node representation matrix and k is an integer between 1 and M; the importance of the label node representation matrix learned from each subgraph is computed as follows:
wherein W is a parameter vector W ∈ R^{1×C}, b is a bias vector b ∈ R^{1×C}, tanh(·) is an activation function, q is a parameter vector q ∈ R^{1×C}, h_i^k is the representation of the i-th label node in the k-th label representation matrix, and w_k represents the importance weight of the label embedding matrix and the k-th subgraph learned by the graph convolutional neural network, with i an integer from 1 to C;
after computing the node weight w_k on each subgraph, normalization is carried out through the softmax(·) function, and the importance score of the nodes on the subgraph is computed as follows:
the importance scores (β_1, ..., β_M) of the nodes on each subgraph are thereby obtained; the scores are multiplied with the corresponding label node representation matrices, the results are summed, and the sum is multiplied by a parameter matrix for dimension transformation, yielding the fused label node representation matrix G, as follows:
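The label-level attention fusion of step (3-2) can be sketched as follows; the exact parameter shapes in the patent are not fully reproduced here, so simplified stand-ins are used:

```python
import numpy as np

rng = np.random.default_rng(2)
M, C, L = 3, 4, 5
H = rng.normal(size=(M, C, L))        # M label node representation matrices H^k

q = rng.normal(size=(1, C))           # scoring vector q
Wp = rng.normal(size=(1, L))          # stand-in for the parameter W
b = rng.normal(size=(1, C))           # bias vector b

# w_k: importance of subgraph k, averaging per-node scores q . tanh(W h_i + b)
w = np.array([(q @ np.tanh(Wp @ H[k].T + b).T).mean() for k in range(M)])
beta = np.exp(w - w.max())            # softmax normalization of the w_k
beta = beta / beta.sum()

W_G = rng.normal(size=(L, L))         # final dimension-transformation matrix
G = (beta[:, None, None] * H).sum(axis=0) @ W_G   # fused matrix G in R^{C x L}
```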
5. The multi-label image classification method fusing strong correlation between labels as claimed in claim 4, wherein step S4 specifically comprises the following steps:
(4-1) the multiple label node representation matrices output by the multi-graph convolutional neural network, wherein M is the number of subgraphs, C is the number of label categories and L_l is the label node representation dimension output by the l-th graph convolutional layer, are fused with the feature maps output by the intermediate layers of ResNet-101 using an attention mechanism, as follows:
the feature maps output by the first three intermediate layers of ResNet-101 are F_s, wherein 1 ≤ s ≤ 3 and C_s, W_s and H_s are the number of channels, the width and the height of the s-th intermediate layer; the fused map is denoted X_l, wherein C_l = C_s, W = W_s and H = H_s, and l indicates that F_s is fused with the label node correlations output by the l-th graph convolutional layer;
since L_l is determined by a parameter matrix, L_l can be made equal to C_l = P, i.e., X_l ∈ R^{P×W×H}; using the dimension transformation operation R(·), X_l and H_l are transformed to X_R ∈ R^{WH×P} and fed into the decoder of the Transformer model, where the label correlations are fused using the Multi-Head Attention mechanism in the Transformer decoder, as follows:
wherein MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O,
the obtained features X_R fused with the label correlations undergo the inverse of the dimension transformation R(·), converting X_R ∈ R^{WH×P} to X_M ∈ R^{P×W×H}; the newly obtained X_M ∈ R^{P×W×H} and the original feature map X_l ∈ R^{P×W×H} are fed into the Add & Norm layer of the Transformer decoder for normalization and residual linking, and the feature map X_M ∈ R^{P×W×H} fused with the label correlations is then fed into the FFN layer of the Transformer decoder, as follows:
X_F = FFN(X_M)
wherein FFN(x) = max(0, xW_1 + b_1)W_2 + b_2; the inner layer comprises two linear transformations with a ReLU activation in between;
the newly obtained X_F ∈ R^{P×W×H} and the label-correlated feature map X_M ∈ R^{P×W×H} are fed into the Add & Norm layer of the Transformer decoder for normalization and residual linking; the final label-correlation-fused feature map X_l ∈ R^{P×W×H} is obtained and sent to the next layer of the convolutional neural network for training.
6. The method for classifying multi-label images fused with strong correlation between labels as claimed in claim 5, wherein step S5 comprises the following steps:
(5-1) dividing the labels in the data set into M communities with the community detection algorithm, so that each label node has its own community; the communities are numbered p in the order {1, ..., M}, wherein M is the number of communities, and the community number of each label node is stored in S ∈ Z^C, wherein C is the number of label categories and S_i = p, i ∈ {1, ..., C};
obtaining the multiple generic feature map X_mullsp, wherein X_mullsp can be expressed as a combination of the generic feature maps X_lsp^i;
the superscript i distinguishes and matches the different communities; the community code p determines which generic feature map the p-th community is matched with, and the community coding M matrix is constructed as follows:
wherein τ ∈ [0, 1];
(5-2) taking the Hadamard product of the community coding M matrix and the label node representation matrix G ∈ R^{C×L}, and multiplying the result at the label level with the multiple generic feature map X_mullsp, so that the model fuses different generic features with the label nodes of different communities, as follows:
since L in G ∈ R^{C×L} is determined by a parameter matrix, L can be made equal to D_C·M; the community coding matrix M and the label node representation G are then multiplied elementwise, so that the label nodes absorb the community coding information, obtaining the matrix W as follows:
W = M ⊙ G
(5-3) multiplying W and X_mullsp at the label level and using the obtained predicted labels for the final classification, wherein W can be expressed as W = [w_1, ..., w_C]^T, the multiple generic feature map X_mullsp can be expressed as X_mullsp = [x_1, ..., x_C], and C is the number of label categories; the prediction function is ŷ_i = w_i^T · x_i, i ∈ {1, ..., C}.
7. The multi-label image classification method fusing strong correlation between labels as claimed in claim 6, wherein step S6 comprises the following steps:
the objective function is defined as:
where L (-) is a loss function of the multi-label classification, Llsp(theta) is a parameter matrix objective function for learning the multiple generic pooling layers, and theta is a global parameter to be learned;
using a binary classification loss function as the multi-label classification loss function, as follows:
the objective function L_lsp(θ) for guiding the generic features is:
thus, the overall loss function is:
through the above steps, semantic correlations of different strengths among the labels are merged into the convolutional neural network to obtain the predicted labels; the loss between the real labels and the predicted labels is minimized, optimizing with stochastic gradient descent (SGD), so as to improve the multi-label image classification performance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210250180.3A CN114648635B (en) | 2022-03-15 | 2022-03-15 | Multi-label image classification method fusing strong correlation among labels |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114648635A true CN114648635A (en) | 2022-06-21 |
CN114648635B CN114648635B (en) | 2024-07-09 |
Family
ID=81993189
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210250180.3A Active CN114648635B (en) | 2022-03-15 | 2022-03-15 | Multi-label image classification method fusing strong correlation among labels |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114648635B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115964626A (en) * | 2022-10-27 | 2023-04-14 | 河南大学 | Community detection method based on dynamic multi-scale feature fusion network |
CN117893839A (en) * | 2024-03-15 | 2024-04-16 | 华东交通大学 | Multi-label classification method and system based on graph attention mechanism |
CN118279610A (en) * | 2024-06-03 | 2024-07-02 | 之江实验室 | Soybean phenotype identification method based on image phenotype matching, electronic equipment and medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019100723A1 (en) * | 2017-11-24 | 2019-05-31 | 华为技术有限公司 | Method and device for training multi-label classification model |
WO2019100724A1 (en) * | 2017-11-24 | 2019-05-31 | 华为技术有限公司 | Method and device for training multi-label classification model |
US20200210773A1 (en) * | 2019-01-02 | 2020-07-02 | Boe Technology Group Co., Ltd. | Neural network for image multi-label identification, related method, medium and device |
CN111476315A (en) * | 2020-04-27 | 2020-07-31 | 中国科学院合肥物质科学研究院 | Image multi-label identification method based on statistical correlation and graph convolution technology |
CN112308115A (en) * | 2020-09-25 | 2021-02-02 | 安徽工业大学 | Multi-label image deep learning classification method and equipment |
CN112906720A (en) * | 2021-03-19 | 2021-06-04 | 河北工业大学 | Multi-label image identification method based on graph attention network |
CN113657425A (en) * | 2021-06-28 | 2021-11-16 | 华南师范大学 | Multi-label image classification method based on multi-scale and cross-modal attention mechanism |
Non-Patent Citations (2)
Title |
---|
Zhang Huiyi et al., "Multi-label image classification model based on graph attention network," Journal of Chongqing Technology and Business University, vol. 39, no. 1, 28 February 2022 (2022-02-28), pages 34-41 *
Chen Kejun; Zhang Ye, "Multi-label classification of aerial images with recurrent neural networks," Optics and Precision Engineering, no. 06, 9 June 2020 (2020-06-09) *
Also Published As
Publication number | Publication date |
---|---|
CN114648635B (en) | 2024-07-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||