CN111476315B - Image multi-label identification method based on statistical correlation and graph convolution technology - Google Patents

Image multi-label identification method based on statistical correlation and graph convolution technology

Info

Publication number
CN111476315B
CN111476315B (application CN202010342622.8A)
Authority
CN
China
Prior art keywords
image
label
network
graph
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010342622.8A
Other languages
Chinese (zh)
Other versions
CN111476315A (en
Inventor
王儒敬
滕越
谢成军
张洁
李�瑞
陈天娇
陈红波
胡海瀛
刘海云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS filed Critical Hefei Institutes of Physical Science of CAS
Priority to CN202010342622.8A priority Critical patent/CN111476315B/en
Publication of CN111476315A publication Critical patent/CN111476315A/en
Application granted granted Critical
Publication of CN111476315B publication Critical patent/CN111476315B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image multi-label identification method based on statistical correlation and graph convolution technology, which overcomes the defect of the prior art that the relations between objects in a multi-label image are not fully considered. The method comprises the following steps: collecting and preprocessing multi-label images; calculating the correlation between labels; constructing an image multi-label recognition network; training the image multi-label recognition network; acquiring a multi-label image to be detected; and obtaining the image multi-label recognition result. The method learns the adjacency matrix from the image label data, updates the object feature representations in the image through the graph convolutional network, and combines them with the global feature residual to improve the multi-label classification performance of the image.

Description

Image multi-label identification method based on statistical correlation and graph convolution technology
Technical Field
The invention relates to the technical field of image analysis, in particular to an image multi-label identification method based on statistical correlation and graph convolution technology.
Background
In recent years, convolutional neural networks have advanced rapidly in the field of computer vision, particularly in image classification. Due to the limited local receptive field of the convolution kernel, a convolutional neural network is good at identifying a single object but ignores the relations between objects. In an image, many related objects tend to appear together, such as teacher and student, mouse and keyboard, goat and grassland; other pairs of objects hardly ever appear in the same image, such as dogs and planes, yaks and the sea, snowflakes and swimwear. An image therefore contains a large number of dependency relations, yet current convolutional neural networks cannot model the dependencies between the objects in the training data to improve classification accuracy.
Graph convolutional networks are widely used to address these inherent limitations of convolutional neural networks; their main components are an adjacency matrix, a node feature representation matrix, and a learnable weight matrix. Much of the research focuses on the adjacency matrix. Some works construct it from semantic networks, context information, knowledge graphs and the like, but message passing between nodes is then limited to first-order neighbors. Furthermore, a graph built from external information may not fit the image dataset being learned, so the knowledge graph can mislead training.
In multi-label images in particular, a single image contains several objects to be recognized (labeled), and there is usually some correlation between them. For example, when a keyboard and a mouse are found in part of an image, it is highly likely that a computer is also present, and correspondingly a monitor can be expected with high probability. In other words, recognizing a keyboard and a mouse in a picture increases the probability that a host computer and a monitor are present, and decreases the probability of objects such as planes or elephants. Modeling and reasoning over the dependencies between objects in an image are therefore critical.
Therefore, given that convolutional neural networks ignore the relations between objects, how to model the dependency relations between objects directly from image data has become a technical problem to be solved.
Disclosure of Invention
The invention aims to overcome the defect of the prior art that the relations between objects in a multi-label image are not fully considered, and provides an image multi-label identification method based on statistical correlation and graph convolution technology to solve this problem.
In order to achieve the above object, the technical scheme of the present invention is as follows:
an image multi-label identification method based on statistical correlation and graph convolution technology comprises the following steps:
collecting and preprocessing multi-label images: collecting multi-label images and processing the labels into an N x C matrix, wherein N is the number of samples and C is the number of label classes;
calculating the correlation between labels: calculating the mutual dependency between labels using mutual information, constructing a fully connected dependency graph and normalizing it to obtain an adjacency matrix;
constructing an image multi-label recognition network: constructing an image multi-label recognition network based on the graph convolutional network;
training the image multi-label recognition network: training the graph convolutional network and the fully connected layers in the image multi-label recognition network;
acquiring a multi-label image to be detected: acquiring a multi-label image to be detected;
obtaining an image multi-label recognition result: inputting the multi-label image to be detected into a trained image multi-label recognition network to obtain a final multi-label classification result.
The collecting and preprocessing of the multi-label images comprises the following steps:
constructing an N x C all-zero matrix D, wherein N is the number of images in the training set, C is the total number of categories in the training set, and the C categories are arranged in an arbitrary but fixed order;
converting the image annotation data into the label data matrix D, wherein each image and its annotation information correspond to one row of the matrix D; for every image in the annotation data, if a certain label is present in the image, the corresponding row and column are found in the label data matrix D and the entry is set to '1', indicating that the label exists.
The calculating of the correlation between labels includes the following steps:
for each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y)=H(X)-H(X|Y)
H(X|Y)=-∑_{Y=y} P(y) ∑_{X=x} P(x|y)*logP(x|y)
H(X)=-∑_{X=x} P(x)*logP(x),
wherein X and Y are random variables representing label classes, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that X=x, P(x|y) is the conditional probability, H(X) is the information entropy, and H(X|Y) is the conditional entropy;
each column of the label data is regarded as a random variable X or Y and the value in each row is regarded as x or y; the mutual information between nodes is calculated and stored in a C x C matrix A, where A_ij is the mutual information value of the ith column and the jth column;
matrix A is normalized to give the adjacency matrix of the graph convolutional network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
wherein A_ij is the mutual information value of the ith class and the jth class, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix.
The construction of the image multi-label recognition network comprises the following steps:
setting Fast R-CNN as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
setting the initial feature representation X_i^(0) of each bounding box using ROI pooling;
obtaining the fully connected adjacency matrix with the mutual information method and normalizing it, with the expressions:
I(X;Y)=H(X)-H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
setting up the graph convolutional network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which is fed together with the fully connected adjacency matrix Â into the graph convolutional network; after L layers of graph convolution the feature representation X^(L) is obtained, with the expression:
X^(l+1) = σ(Â X^(l) W^(l))
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is the activation function;
setting up the fully connected layers: the whole-image feature and the bounding-box features output by the graph convolutional network are concatenated, passed through two fully connected layers, and activated by softmax to obtain the final classification result.
The training of the graph convolutional network comprises the following steps:
obtaining the global feature representation of the image, the bounding boxes of the objects in the image, and the feature representations of those objects by using Fast R-CNN and ROI pooling;
taking the feature representations of the objects as the input of the graph convolutional network and updating the corresponding node representations:
X^(l+1) = σ(Â X^(l) W^(l))
wherein X^(l+1) is the (l+1)-th layer graph convolution feature, σ is the nonlinear activation function, Â is the normalized global adjacency matrix obtained in the second step, X^(l) is the l-th layer feature representation, and W is a learnable parameter;
concatenating the global feature representation of the image with the object representations updated by the graph convolutional network, passing them through two FC layers, and finally normalizing with the softmax function to obtain the final multi-label recognition result.
The training of the full connection layer comprises the following steps:
inputting the training image into a network to obtain a training result;
correcting the connection weight of the full-connection network layer according to a gradient descent algorithm;
the graph roll-up network parameter W is modified according to a gradient descent algorithm.
The obtaining of the image multi-label recognition result comprises the following steps:
using Fast R-CNN as the baseline module, obtaining the feature X_I of the multi-label image to be detected and its bounding boxes;
obtaining the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected using ROI pooling;
merging all X_i^(0) of each image to be detected into X^(0) as the input of the graph convolutional network;
concatenating the output of the graph convolutional network with the whole-image initial feature X_I and feeding it into the fully connected network;
obtaining the final multi-label classification result by passing the output of the two trained fully connected layers through the softmax function.
Advantageous effects
Compared with the prior art, the image multi-label identification method based on statistical correlation and graph convolution technology learns the adjacency matrix from image label data, updates the object feature representations in the image through the graph convolutional network, and combines the global feature residual to improve the image multi-label classification performance.
The method can well combine the image feature extraction capability of the convolutional neural network and the mutual dependence relationship of the labels, thereby improving the precision of multi-label classification.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
For a further understanding and appreciation of the structural features and advantages achieved by the present invention, a presently preferred embodiment is described below in connection with the accompanying drawings, in which:
as shown in fig. 1, the image multi-label identification method based on the statistical correlation and graph convolution technology of the present invention includes the following steps:
the first step, collecting and preprocessing the multi-label image: and collecting multi-label images, and processing labels into a matrix of N x C, wherein N is the number of samples, and C is the type or class number of the labels. The method comprises the following specific steps:
(1) And constructing an all-zero matrix D of N x C, wherein N is the number of images in the training set, C is the total number of categories in the training set, and C is arranged according to any rule.
(2) Converting the image marking data into a tag data matrix D, wherein one image and standard information thereof correspond to one line of data in the tag data matrix D; for all the images in the labeling data, if a certain label exists in the images, the corresponding row and column are found in the label data matrix D, and the value is assigned as '1', which represents that the label exists.
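As a concrete illustration of this first step, the label matrix D can be built with a few lines of NumPy. This is only a sketch: the names build_label_matrix, annotations and classes are illustrative and do not appear in the patent, and the annotation format (one list of label names per image) is an assumption.

    import numpy as np

    def build_label_matrix(annotations, classes):
        # annotations: one list of label names per training image (assumed format)
        # classes: the C label names in a fixed, arbitrary order
        class_index = {name: j for j, name in enumerate(classes)}
        D = np.zeros((len(annotations), len(classes)), dtype=np.int64)  # N x C all-zero matrix
        for i, labels in enumerate(annotations):
            for name in labels:
                D[i, class_index[name]] = 1  # a '1' marks that the label is present in image i
        return D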
Second step, calculating the correlation between labels: calculate the mutual dependency between labels using mutual information, construct a fully connected dependency graph and normalize it to obtain an adjacency matrix. Modeling the statistical correlation between labels improves the performance of multi-label classification. The adjacency matrix guides the message passing of features between objects in the graph convolutional network, thereby enhancing the feature representations of associated objects and reducing message passing between statistically unrelated objects. Most current methods construct the adjacency matrix from external knowledge (such as semantic networks or knowledge graphs), but such external knowledge may not match the training dataset well, so that the adjacency matrix misleads message passing; we therefore model the statistical correlation between labels from the label data of the training dataset itself. Conventional statistical correlation modeling often requires independence tests on the label data, which is a time-consuming and labor-intensive task. Information entropy describes the amount of uncertainty contained in a random variable, and mutual information describes how much the uncertainty of one random variable decreases when another random variable is known. In addition, the computational complexity of mutual information is far lower than that of the chi-square test, so we use mutual information to calculate the correlation between image labels and normalize it into an adjacency matrix that guides the message passing between the objects recognized in the multi-label image.
The method comprises the following specific steps:
(1) For each column in the label data matrix D, calculate the mutual information between that column and every other column as follows:
I(X;Y)=H(X)-H(X|Y)
H(X|Y)=-∑_{Y=y} P(y) ∑_{X=x} P(x|y)*logP(x|y)
H(X)=-∑_{X=x} P(x)*logP(x),
wherein X and Y are random variables representing label classes, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that X=x, P(x|y) is the conditional probability, H(X) is the information entropy, and H(X|Y) is the conditional entropy. Information entropy describes how much uncertainty the information contains; here mutual information is used, in place of the conditional independence test, to quantitatively describe the correlation between the picture-class labels.
Each column of the label data is regarded as a random variable X or Y and the value in each row is regarded as x or y; the mutual information between nodes is calculated and stored in a C x C matrix A, where A_ij is the mutual information value of the ith column and the jth column.
(2) Normalize matrix A to obtain the adjacency matrix of the graph convolutional network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
wherein A_ij is the mutual information value of the ith class and the jth class, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix.
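A minimal NumPy sketch of this second step, assuming the binary label matrix D produced in the first step; the helper names entropy, mutual_information and adjacency_from_labels are illustrative only:

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def mutual_information(x, y):
        # I(X;Y) = H(X) - H(X|Y) for two binary label columns x, y
        h_x = entropy(np.array([np.mean(x == 0), np.mean(x == 1)]))
        h_x_given_y = 0.0
        for v in (0, 1):
            p_y = np.mean(y == v)
            if p_y > 0:
                p_x_given_y = np.array([np.mean(x[y == v] == 0), np.mean(x[y == v] == 1)])
                h_x_given_y += p_y * entropy(p_x_given_y)
        return h_x - h_x_given_y

    def adjacency_from_labels(D):
        C = D.shape[1]
        A = np.zeros((C, C))
        for i in range(C):
            for j in range(C):
                A[i, j] = mutual_information(D[:, i], D[:, j])   # C x C mutual information matrix
        expA = np.exp(A - A.max(axis=1, keepdims=True))          # row-wise softmax normalization
        return expA / expA.sum(axis=1, keepdims=True)            # normalized adjacency matrix A_hat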
Third step, constructing the image multi-label recognition network: an image multi-label recognition network is constructed based on the graph convolutional network.
The graph convolutional network can effectively combine the connectionist framework of convolutional networks with a symbolic reasoning framework, performing message passing and reasoning between image objects under the guidance of the adjacency matrix and thereby improving multi-label classification performance. Conventional graph convolutional networks often use external knowledge as the adjacency matrix and semantic vectors as node feature vectors, but external knowledge and external node representations may not fit the training dataset well. We therefore use Fast R-CNN and ROI (region of interest) pooling to extract the feature representation of each object as its node feature vector. At the same time, the feature representation of the whole image is concatenated with the node (object) representations produced by graph convolution message passing and fed into a two-layer fully connected network to obtain the final classification result. The benefits are mainly twofold: 1. the message passing and feature enhancement of the graph convolutional network are tailored to the training data and are not misled by external knowledge; 2. concatenating the whole-image features with the graph convolutional network node features ensures that the classifier does not lose the global information of the image when attending to local object regions, giving a stable classification effect.
The construction of the image multi-label recognition network comprises the following steps:
(1) Setting Fast R-CNN as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
(2) Setting the initial feature representation X_i^(0) of each bounding box using ROI (region of interest) pooling;
(3) Obtaining the fully connected adjacency matrix with the mutual information method and normalizing it, with the expressions:
I(X;Y)=H(X)-H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
(4) Setting up the graph convolutional network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which is fed together with the fully connected adjacency matrix Â into the graph convolutional network; after L layers of graph convolution the feature representation X^(L) is obtained, with the expression:
X^(l+1) = σ(Â X^(l) W^(l))
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is the activation function.
(5) Setting up the fully connected layers: the whole-image feature and the bounding-box features output by the graph convolutional network are concatenated, passed through two fully connected layers, and activated by softmax to obtain the final classification result.
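A simplified PyTorch sketch of such a network head. It assumes that the Fast R-CNN baseline has already produced the per-box ROI features and the whole-image feature, and that the class-level adjacency matrix Â is turned into a box-level matrix by indexing it with each box's detected class; the class names GraphConvLayer and MultiLabelGCNHead, the mean pooling of box features, and all dimensions are illustrative, not from the patent.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphConvLayer(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.weight = nn.Linear(in_dim, out_dim, bias=False)   # learnable parameter W

        def forward(self, x, adj):
            # X^(l+1) = sigma(A_hat X^(l) W^(l))
            return F.relu(self.weight(adj @ x))

    class MultiLabelGCNHead(nn.Module):
        def __init__(self, box_dim, img_dim, hidden_dim, num_classes, num_layers=2):
            super().__init__()
            dims = [box_dim] + [hidden_dim] * num_layers
            self.gcn = nn.ModuleList(GraphConvLayer(d_in, d_out)
                                     for d_in, d_out in zip(dims[:-1], dims[1:]))
            # two fully connected layers on the concatenation of the whole-image
            # feature and the (pooled) graph-convolved box features
            self.fc1 = nn.Linear(img_dim + hidden_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, num_classes)

        def forward(self, box_feats, img_feat, adj):
            x = box_feats                       # X^(0): one row per bounding box
            for layer in self.gcn:
                x = layer(x, adj)               # message passing guided by A_hat
            pooled = x.mean(dim=0)              # aggregate the updated object features (assumed pooling)
            fused = torch.cat([img_feat, pooled], dim=-1)
            return torch.softmax(self.fc2(F.relu(self.fc1(fused))), dim=-1)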
Fourth step, training the image multi-label recognition network: the graph convolutional network and the fully connected layers in the image multi-label recognition network are trained.
Training the graph convolutional network comprises the following steps:
(1) Obtaining the global feature representation of the image, the bounding boxes of the objects in the image, and the feature representations of those objects by using Fast R-CNN and ROI pooling;
(2) Taking the feature representations of the objects as the input of the graph convolutional network and updating the corresponding node representations:
X^(l+1) = σ(Â X^(l) W^(l))
wherein X^(l+1) is the (l+1)-th layer graph convolution feature, σ is the nonlinear activation function, Â is the normalized global adjacency matrix obtained in the second step, X^(l) is the l-th layer feature representation, and W is a learnable parameter;
(3) Concatenating the global feature representation of the image with the object representations updated by the graph convolutional network, passing them through two FC layers, and finally normalizing with the softmax function to obtain the final multi-label recognition result.
Training the fully connected layers uses the conventional method and comprises the following steps:
(1) Inputting the training images into the network to obtain training results;
(2) Correcting the connection weights of the fully connected layers according to a gradient descent algorithm;
(3) Correcting the graph convolutional network parameter W according to a gradient descent algorithm.
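A hedged sketch of this training step, assuming the MultiLabelGCNHead sketched above and a data loader yielding (box_feats, img_feat, adj, target) tuples with multi-hot float targets; the patent only specifies gradient descent on the fully connected and graph convolution parameters, so the optimizer and loss choices here are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def train(model, data_loader, num_epochs=10, lr=1e-3):
        # plain stochastic gradient descent over the FC weights and the GCN parameters W
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for epoch in range(num_epochs):
            for box_feats, img_feat, adj, target in data_loader:
                pred = model(box_feats, img_feat, adj)          # forward pass through GCN + FC + softmax
                loss = F.binary_cross_entropy(pred, target)     # assumed multi-label loss
                optimizer.zero_grad()
                loss.backward()                                  # gradients for W and the FC connection weights
                optimizer.step()                                 # gradient-descent correction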
Fifth step, acquiring the multi-label image to be detected: acquire the multi-label image to be detected.
Sixth step, obtaining the image multi-label recognition result: inputting the multi-label image to be detected into the trained image multi-label recognition network to obtain the final multi-label classification result. The specific steps are as follows:
(1) Obtaining the feature X_I of the multi-label image to be detected and its bounding boxes, using Fast R-CNN as the baseline module;
(2) Obtaining the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected using ROI pooling;
(3) Merging all X_i^(0) of each image to be detected into X^(0) as the input of the graph convolutional network;
(4) Concatenating the output of the graph convolutional network with the whole-image initial feature X_I and feeding it into the fully connected network;
(5) Obtaining the final multi-label classification result by passing the output of the two trained fully connected layers through the softmax function.
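Inference then reuses the same trained head. A sketch under the same assumptions (backbone is taken to wrap Fast R-CNN plus ROI pooling and to return the whole-image feature X_I, the per-box features X_i^(0) and the detected box classes; the 0.5 threshold on the softmax scores is an illustrative choice, not specified by the patent):

    import torch

    @torch.no_grad()
    def predict(model, backbone, image, adj_full, threshold=0.5):
        img_feat, box_feats, box_classes = backbone(image)        # X_I, X^(0), detected classes
        adj = adj_full[box_classes][:, box_classes]               # box-level slice of the C x C matrix A_hat
        scores = model(box_feats, img_feat, adj)                  # softmax score per label class
        return (scores > threshold).nonzero(as_tuple=True)[0]     # indices of the predicted labels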
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiment described above; the above embodiment and description merely illustrate the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. An image multi-label identification method based on statistical correlation and graph convolution technology, characterized by comprising the following steps:
11) collecting and preprocessing multi-label images: collecting multi-label images and processing the labels into an N x C matrix, wherein N is the number of samples and C is the number of label classes;
12) calculating the correlation between labels: calculating the mutual dependency between labels using mutual information, constructing a fully connected dependency graph and normalizing it to obtain an adjacency matrix;
the calculating of the correlation between labels includes the following steps:
121) for each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y)=H(X)-H(X|Y)
H(X|Y)=-∑_{Y=y} P(y) ∑_{X=x} P(x|y)*logP(x|y)
H(X)=-∑_{X=x} P(x)*logP(x),
wherein X and Y are random variables representing label classes, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that X=x, P(x|y) is the conditional probability, H(X) is the information entropy, and H(X|Y) is the conditional entropy;
each column of the label data is regarded as a random variable X or Y and the value in each row is regarded as x or y; the mutual information between nodes is calculated and stored in a C x C matrix A, where A_ij is the mutual information value of the ith column and the jth column;
122) matrix A is normalized to give the adjacency matrix of the graph convolutional network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
wherein A_ij is the mutual information value of the ith class and the jth class, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix;
13) constructing an image multi-label recognition network: constructing an image multi-label recognition network based on the graph convolutional network;
the constructing of the image multi-label recognition network includes the following steps:
131) setting Fast R-CNN as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
132) setting the initial feature representation X_i^(0) of each bounding box using ROI pooling;
133) obtaining the fully connected adjacency matrix with the mutual information method and normalizing it, with the expressions:
I(X;Y)=H(X)-H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
134) setting up the graph convolutional network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which is fed together with the fully connected adjacency matrix Â into the graph convolutional network; after L layers of graph convolution the feature representation X^(L) is obtained, with the expression:
X^(l+1) = σ(Â X^(l) W^(l))
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is the activation function;
135) setting up the fully connected layers: the whole-image feature and the bounding-box features output by the graph convolutional network are concatenated, passed through two fully connected layers, and activated by softmax to obtain the final classification result;
14) training the image multi-label recognition network: training the graph convolutional network and the fully connected layers in the image multi-label recognition network;
15) acquiring the multi-label image to be detected: acquiring a multi-label image to be detected;
16) obtaining the image multi-label recognition result: inputting the multi-label image to be detected into the trained image multi-label recognition network to obtain the final multi-label classification result.
2. The image multi-label identification method based on statistical correlation and graph convolution technology according to claim 1, wherein the collecting and preprocessing of the multi-label images comprises the following steps:
21) constructing an N x C all-zero matrix D, wherein N is the number of images in the training set, C is the total number of categories in the training set, and the C categories are arranged in an arbitrary but fixed order;
22) converting the image annotation data into the label data matrix D, wherein each image and its annotation information correspond to one row of the matrix D; for every image in the annotation data, if a certain label is present in the image, the corresponding row and column are found in the label data matrix D and the entry is set to '1', indicating that the label exists.
3. The image multi-label identification method based on statistical correlation and graph convolution technology according to claim 1, wherein said training of the graph convolutional network comprises the following steps:
31) obtaining the global feature representation of the image, the bounding boxes of the objects in the image, and the feature representations of those objects by using Fast R-CNN and ROI pooling;
32) taking the feature representations of the objects as the input of the graph convolutional network and updating the corresponding node representations:
X^(l+1) = σ(Â X^(l) W^(l))
wherein X^(l+1) is the (l+1)-th layer graph convolution feature, σ is the nonlinear activation function, Â is the normalized global adjacency matrix obtained in step 12), X^(l) is the l-th layer feature representation, and W is a learnable parameter;
33) concatenating the global feature representation of the image with the object representations updated by the graph convolutional network, passing them through two FC layers, and finally normalizing with the softmax function to obtain the final multi-label recognition result.
4. The image multi-label identification method based on statistical correlation and graph convolution technology according to claim 1, wherein said training of the fully connected layers comprises the following steps:
41) inputting the training images into the network to obtain training results;
42) correcting the connection weights of the fully connected layers according to a gradient descent algorithm;
43) correcting the graph convolutional network parameter W according to a gradient descent algorithm.
5. The image multi-label identification method based on statistical correlation and graph convolution technology according to claim 1, wherein the obtaining of the image multi-label recognition result comprises the following steps:
51) using Fast R-CNN as the baseline module, obtaining the feature X_I of the multi-label image to be detected and its bounding boxes;
52) obtaining the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected using ROI pooling;
53) merging all X_i^(0) of each image to be detected into X^(0) as the input of the graph convolutional network;
54) concatenating the output of the graph convolutional network with the whole-image initial feature X_I and feeding it into the fully connected network;
55) obtaining the final multi-label classification result by passing the output of the two trained fully connected layers through the softmax function.
CN202010342622.8A 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology Active CN111476315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010342622.8A CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010342622.8A CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Publications (2)

Publication Number Publication Date
CN111476315A CN111476315A (en) 2020-07-31
CN111476315B true CN111476315B (en) 2023-05-05

Family

ID=71763058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010342622.8A Active CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Country Status (1)

Country Link
CN (1) CN111476315B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183299B (en) * 2020-09-23 2024-02-09 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN112487207A (en) * 2020-12-09 2021-03-12 Oppo广东移动通信有限公司 Image multi-label classification method and device, computer equipment and storage medium
CN112862089B (en) * 2021-01-20 2023-05-23 清华大学深圳国际研究生院 Medical image deep learning method with interpretability
CN112906720B (en) * 2021-03-19 2022-03-22 河北工业大学 Multi-label image identification method based on graph attention network
CN113204659B (en) * 2021-03-26 2024-01-19 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113988147B (en) * 2021-12-08 2022-04-26 南京信息工程大学 Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN114648635A (en) * 2022-03-15 2022-06-21 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN114550310A (en) * 2022-04-22 2022-05-27 杭州魔点科技有限公司 Method and device for identifying multi-label behaviors
CN115031794A (en) * 2022-04-29 2022-09-09 天津大学 Novel gas-solid two-phase flow measuring method of multi-characteristic-diagram convolution
CN117475240A (en) * 2023-12-26 2024-01-30 创思(广州)电子科技有限公司 Vegetable checking method and system based on image recognition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network
WO2020048119A1 (en) * 2018-09-04 2020-03-12 Boe Technology Group Co., Ltd. Method and apparatus for training a convolutional neural network to detect defects

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020048119A1 (en) * 2018-09-04 2020-03-12 Boe Technology Group Co., Ltd. Method and apparatus for training a convolutional neural network to detect defects
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-label food raw material recognition based on graph convolutional networks; 李辉 et al.; Journal of Nanjing University of Information Science & Technology (Natural Science Edition) (No. 06); full text *
Multi-label classification algorithm of convolutional neural networks based on label correlation; 蒋俊钊 et al.; Industrial Control Computer (No. 07); full text *

Also Published As

Publication number Publication date
CN111476315A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
CN111476315B (en) Image multi-label identification method based on statistical correlation and graph convolution technology
CN110084296B (en) Graph representation learning framework based on specific semantics and multi-label classification method thereof
CN108875827B (en) Method and system for classifying fine-grained images
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
Torralba et al. Contextual models for object detection using boosted random fields
Goodfellow et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks
CN112015863B (en) Multi-feature fusion Chinese text classification method based on graphic neural network
CN113657425A (en) Multi-label image classification method based on multi-scale and cross-modal attention mechanism
CN112308115B (en) Multi-label image deep learning classification method and equipment
CN111475622A (en) Text classification method, device, terminal and storage medium
CN113449801B (en) Image character behavior description generation method based on multi-level image context coding and decoding
CN115410088B (en) Hyperspectral image field self-adaption method based on virtual classifier
CN115131613B (en) Small sample image classification method based on multidirectional knowledge migration
CN111079847A (en) Remote sensing image automatic labeling method based on deep learning
CN111582506A (en) Multi-label learning method based on global and local label relation
CN110689049A (en) Visual classification method based on Riemann kernel dictionary learning algorithm
CN114863091A (en) Target detection training method based on pseudo label
CN113642602B (en) Multi-label image classification method based on global and local label relation
CN116681128A (en) Neural network model training method and device with noisy multi-label data
CN116089605A (en) Text emotion analysis method based on transfer learning and improved word bag model
CN113259369B (en) Data set authentication method and system based on machine learning member inference attack
CN114724167A (en) Marketing text recognition method and system
CN111767402B (en) Limited domain event detection method based on counterstudy
CN111259176B (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information
CN114417938A (en) Electromagnetic target classification method using knowledge vector embedding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant