CN111476315A - Image multi-label identification method based on statistical correlation and graph convolution technology - Google Patents

Image multi-label identification method based on statistical correlation and graph convolution technology Download PDF

Info

Publication number
CN111476315A
Authority
CN
China
Prior art keywords
image
label
network
graph convolution
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010342622.8A
Other languages
Chinese (zh)
Other versions
CN111476315B (en)
Inventor
王儒敬
滕越
谢成军
张洁
李�瑞
陈天娇
陈红波
胡海瀛
刘海云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS
Priority to CN202010342622.8A
Publication of CN111476315A
Application granted
Publication of CN111476315B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 — Complex mathematical operations
    • G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/20 — Image preprocessing
    • G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 — Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 — Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image multi-label identification method based on statistical correlation and graph convolution technology, which overcomes the defect of the prior art that the relationships between objects in a multi-label image are not fully considered. The invention comprises the following steps: collecting and preprocessing multi-label images; calculating the correlation between labels; constructing an image multi-label identification network; training the image multi-label identification network; acquiring a multi-label image to be detected; and obtaining the image multi-label identification result. The method learns the adjacency matrix from image label data, updates the object feature representations in the image through a graph convolution network, and improves multi-label classification performance by combining global feature residuals.

Description

Image multi-label identification method based on statistical correlation and graph convolution technology
Technical Field
The invention relates to the technical field of image analysis, in particular to an image multi-label identification method based on statistical correlation and graph convolution technology.
Background
In recent years, convolutional neural networks have advanced dramatically in the field of computer vision, especially in image classification. Because of the limited local receptive field of convolution kernels, a convolutional neural network is better at recognizing a single object and tends to ignore the relationships between objects. In an image, multiple related objects usually appear simultaneously, for example: teachers and students, mice and keyboards, goats and grasslands. There are also object pairs that hardly ever appear in the same image, for example: dogs and airplanes, yaks and the sea, snowflakes and swimsuits. An image therefore contains a large number of dependency relationships, but current convolutional neural networks cannot model these dependencies between objects from the training data to improve classification accuracy.
Graph convolution networks are widely used to overcome these inherent limitations of convolutional neural networks; their main components are an adjacency matrix, a node feature representation matrix and a learnable weight matrix. Much of the research focuses on the adjacency matrix. Some studies construct the adjacency matrix from semantic networks, context information, knowledge graphs and other external sources, but message passing between nodes is then limited to each node's first-order neighbors. Furthermore, a graph obtained from external information may not fit the image dataset being learned, so the knowledge graph can mislead training.
In particular, a multi-label image contains several labeled objects, and there is usually some correlation between them. For example, when objects such as a keyboard and a mouse are found in part of an image, the image is very likely to also contain a computer, and correspondingly a monitor can be inferred to exist in the image. In other words, recognizing the keyboard and the mouse in the picture increases the probability that a computer host and a monitor are present, and decreases the probability that objects such as airplanes or elephants are present. Modeling and reasoning about the dependencies between multiple objects in an image is therefore crucial.
Therefore, given that current convolutional neural networks ignore the relationships between objects, how to model the dependency relationships between objects from image data has become an urgent technical problem to be solved.
Disclosure of Invention
The invention aims to overcome the defect of the prior art that the relationships between objects in a multi-label image are not fully considered, and provides an image multi-label identification method based on statistical correlation and graph convolution technology to solve this problem.
In order to achieve the purpose, the technical scheme of the invention is as follows:
an image multi-label identification method based on statistical correlation and graph convolution technology comprises the following steps:
collecting and preprocessing multi-label images: collecting multi-label images and processing their labels into an N × C matrix, wherein N is the number of samples and C is the number of label categories;
calculating the correlation between the labels: calculating the mutual dependency relationships among the labels by means of mutual information, constructing a fully connected dependency graph and normalizing it to obtain the adjacency matrix;
constructing an image multi-label identification network: constructing an image multi-label identification network based on the graph convolution network;
training the image multi-label recognition network: training a graph convolution network and a full connection layer in the image multi-label identification network;
acquiring a multi-label image to be detected: acquiring a multi-label image to be detected;
obtaining an image multi-label identification result: and inputting the multi-label image to be detected into the trained image multi-label identification network to obtain a final multi-label classification result.
The collection and pre-processing of the multi-label image comprises the steps of:
constructing an N × C all-zero matrix D, wherein N is the number of images in the training set and C is the total number of categories in the training set, the categories being arranged in any fixed order;
converting the image annotation data into the label data matrix D, wherein each image and its annotation information correspond to one row of D; for each image in the annotated data, if a certain label exists in the image, the corresponding row and column in D are found and assigned the value "1", indicating that the label is present.
Calculating the correlation between the labels comprises the following steps:
for each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y) = H(X) - H(X|Y)
H(X|Y) = -∑_y P(y) ∑_x P(x|y) log P(x|y)
H(X) = -∑_x P(x) log P(x),
wherein X and Y are random variables representing label categories, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that the random variable X takes the value x, P(x|y) is the conditional probability of X = x given Y = y, H(X) is the information entropy, and H(X|Y) is the conditional information entropy;
each column of the label data is regarded as a random variable X or Y and the value in each row as x or y; the mutual information between every pair of columns is computed and stored in a C-row, C-column matrix A, where A_ij is the mutual information value of the ith and jth columns;
the matrix A is normalized to obtain the adjacency matrix Â of the graph convolution network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij),
wherein A_ij is the mutual information value of the ith and jth categories, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix.
Constructing the image multi-label identification network comprises the following steps:
Fast R-CNN is used as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
the ROI (Region of Interest) operation is used to obtain the initial feature representation X_i^(0) of each bounding box;
the fully connected adjacency matrix is obtained by the mutual information method and normalized, with the expressions:
I(X;Y) = H(X) - H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij);
setting the graph convolution network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which together with the fully connected adjacency matrix Â is used as the input of the graph convolution network; L layers of graph convolution yield the feature representation X^(L), with the expression:
X^(l+1) = σ(Â X^(l) W),
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is an activation function;
setting the fully connected layers: the overall image feature and the bounding-box features output by the graph convolution network are concatenated, passed through a two-layer fully connected neural network and activated by softmax to obtain the final classification result.
Training the graph convolution network comprises the following steps:
the global feature representation of the image and the bounding boxes of the objects in the image, together with their feature representations, are obtained using Fast R-CNN and ROI;
the feature representations of the objects are used as the input of the graph convolution network to update the corresponding node representations:
X^(l+1) = σ(Â X^(l) W),
wherein X^(l+1) is the graph convolution feature of layer l+1, σ is a nonlinear activation function, Â is the normalized global adjacency matrix obtained in the second step, X^(l) is the feature representation of layer l, and W is a learnable parameter;
the global feature representation of the image and the object representations updated by the graph convolution network are concatenated, passed through two FC layers, and finally normalized by a softmax function to obtain the final multi-label identification result.
Training the fully connected layers comprises the following steps:
inputting a training image into the network to obtain a training result;
modifying the connection weights of the fully connected layers according to a gradient descent algorithm;
and correcting the graph convolution network parameters W according to the gradient descent algorithm.
Obtaining the image multi-label identification result comprises the following steps:
the feature X_I of the multi-label image to be detected and its bounding boxes are obtained using Fast R-CNN as the baseline module;
the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected is obtained using ROI;
all X_i^(0) of the image to be detected are merged into X^(0) as the input of the graph convolution network;
the output of the graph convolution network is concatenated with the overall initial feature X_I of the image and connected to the fully connected network;
and the output of the two trained fully connected layers is passed through a softmax function to obtain the final multi-label classification result.
Advantageous effects
Compared with the prior art, the image multi-label identification method based on statistical correlation and graph convolution technology learns the adjacency matrix from image label data, updates the object feature representations in the image through the graph convolution network, and improves image multi-label classification performance by combining global feature residuals.
The method combines the image feature extraction capability of the convolutional neural network with the mutual dependency relationships of the labels, thereby improving the precision of multi-label classification.
Drawings
FIG. 1 is a sequence diagram of the method of the present invention.
Detailed Description
So that the above recited features of the present invention can be clearly understood, a more particular description of the invention, briefly summarized above, is given below with reference to embodiments, some of which are illustrated in the appended drawings, wherein:
as shown in fig. 1, the image multi-label identification method based on statistical correlation and graph convolution technology according to the present invention includes the following steps:
the first step, the collection and preprocessing of multi-label images: collecting multi-label images, and processing labels into a matrix of N × C, wherein N is the number of samples, and C is the type or category number of the labels. The method comprises the following specific steps:
(1) and constructing an all-zero matrix D of N x C, wherein N is the number of images in the training set, C is the total number of categories in the training set, and C is arranged according to any rule.
(2) Converting the image labeling data into a label data matrix D, wherein one image and standard information thereof correspond to one row of data in the label data matrix D; for the images in all the labeled data, if a certain label exists in the image, the corresponding row and column are found in the label data matrix D, and are assigned as "1", which represents that the label exists.
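As a concrete illustration, the following is a minimal Python sketch of this preprocessing step, assuming the annotations are available as a list of per-image label-name lists; the names annotations, categories and build_label_matrix are illustrative and not taken from the patent.

    import numpy as np

    def build_label_matrix(annotations, categories):
        """Build the N x C binary label matrix D described above.

        annotations: list of length N; annotations[i] lists the label names
                     present in image i (assumed input format).
        categories:  list of the C category names, in any fixed order.
        """
        col_index = {name: j for j, name in enumerate(categories)}
        D = np.zeros((len(annotations), len(categories)), dtype=np.int64)  # N x C all-zero matrix
        for i, labels in enumerate(annotations):
            for name in labels:
                D[i, col_index[name]] = 1  # label present in image i -> assign "1"
        return D

    # Tiny usage example
    categories = ["keyboard", "mouse", "monitor", "airplane"]
    annotations = [["keyboard", "mouse"], ["monitor"], ["keyboard", "mouse", "monitor"]]
    D = build_label_matrix(annotations, categories)
    print(D)  # 3 x 4 matrix of 0/1 entries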
Secondly, calculating the correlation between the labels: the mutual dependency relationships among the labels are calculated by means of mutual information, and a fully connected dependency graph is constructed and normalized to obtain the adjacency matrix. Modeling the statistical correlation among the labels can improve multi-label classification performance. The adjacency matrix guides the message passing of features between objects in the graph convolution network, thereby enhancing the feature representations of related objects and reducing message passing between objects that are not statistically related. Most existing methods construct the adjacency matrix from external knowledge (such as semantic networks or knowledge graphs), but external knowledge may not fit the training dataset well, so the adjacency matrix can mislead message passing; the statistical correlation among labels is therefore modeled directly from the label data of the training set. Traditional statistical correlation modeling often requires independence tests on the label data, which are time-consuming and labor-intensive. Information entropy describes the uncertainty of the information contained in a random variable, and mutual information describes how much the uncertainty of one random variable is reduced by knowing another. In addition, the computational complexity of mutual information is far lower than that of the chi-square test, so mutual information is used to compute the correlation between image labels, and the result is normalized to serve as the adjacency matrix guiding message passing between objects in image multi-label identification.
The method comprises the following specific steps:
(1) For each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y) = H(X) - H(X|Y)
H(X|Y) = -∑_y P(y) ∑_x P(x|y) log P(x|y)
H(X) = -∑_x P(x) log P(x),
wherein X and Y are random variables representing label categories, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that the random variable X takes the value x, P(x|y) is the conditional probability of X = x given Y = y, H(X) is the information entropy, and H(X|Y) is the conditional information entropy. The information entropy describes the uncertainty of the information; mutual information is innovatively used here in place of a conditional independence test to quantitatively describe the correlation between picture category labels.
Each column of the label data is regarded as a random variable X or Y and the value in each row as x or y; the mutual information between every pair of columns is computed and stored in a C-row, C-column matrix A, where A_ij is the mutual information value of the ith and jth columns.
(2) The matrix A is normalized to obtain the adjacency matrix Â of the graph convolution network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij),
wherein A_ij is the mutual information value of the ith and jth categories, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix. A minimal code sketch of this computation is given below.
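The following minimal Python sketch computes the pairwise mutual information between the binary label columns of D and row-normalizes the result with softmax. It uses the equivalent form I(X;Y) = ∑ P(x,y)·log(P(x,y)/(P(x)P(y))); estimating the probabilities from relative frequencies in D is an assumption, since the patent does not specify the estimator.

    import numpy as np

    def mutual_information(col_x, col_y):
        """I(X;Y) for two binary label columns, with probabilities estimated
        from relative frequencies; numerically equal to H(X) - H(X|Y)."""
        mi = 0.0
        for x in (0, 1):
            for y in (0, 1):
                p_xy = np.mean((col_x == x) & (col_y == y))
                if p_xy > 0:
                    p_x = np.mean(col_x == x)
                    p_y = np.mean(col_y == y)
                    mi += p_xy * np.log(p_xy / (p_x * p_y))
        return mi

    def adjacency_from_labels(D):
        """C x C mutual-information matrix A, softmax-normalized over each row."""
        C = D.shape[1]
        A = np.zeros((C, C))
        for i in range(C):
            for j in range(C):
                A[i, j] = mutual_information(D[:, i], D[:, j])
        expA = np.exp(A - A.max(axis=1, keepdims=True))  # numerically stable softmax
        return expA / expA.sum(axis=1, keepdims=True)    # normalized adjacency matrix

    A_hat = adjacency_from_labels(D)  # D from the preprocessing sketch above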
Thirdly, constructing an image multi-label identification network: and constructing an image multi-label identification network based on the graph convolution network.
The graph convolution network can effectively fuse a connectionist framework, such as a convolutional network, with a symbolic reasoning framework, and carries out message passing and reasoning between image objects under the guidance of the adjacency matrix, thereby improving multi-label classification performance. Most conventional applications of graph convolution networks use external knowledge as the adjacency matrix and semantic vectors as the node feature vectors. However, external knowledge and external node vector representations do not fit the training dataset well, so Fast R-CNN and ROI (Region of Interest) are used here to extract the feature representation of each object as the feature vector of its node. Meanwhile, the feature representation of the whole image and the node (object) feature representations obtained by graph convolution are concatenated and passed to a two-layer fully connected network to obtain the final classification result. This has two main benefits: 1. the message passing and feature enhancement of the graph convolution network are targeted at the training data and are not misled by external knowledge; 2. concatenating the overall image feature with the graph convolution node features ensures that the global information of the image is not lost while the classification receptive field focuses on local object regions, thereby achieving a stable classification effect.
Constructing the image multi-label identification network comprises the following steps (a minimal network sketch is given after this list):
(1) Fast R-CNN is used as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
(2) the ROI (Region of Interest) operation is used to obtain the initial feature representation X_i^(0) of each bounding box;
(3) the fully connected adjacency matrix is obtained by the mutual information method and normalized, with the expressions:
I(X;Y) = H(X) - H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij);
(4) setting the graph convolution network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which together with the fully connected adjacency matrix Â is used as the input of the graph convolution network; L layers of graph convolution yield the feature representation X^(L), with the expression:
X^(l+1) = σ(Â X^(l) W),
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is an activation function.
(5) Setting the fully connected layers: the overall image feature and the bounding-box features output by the graph convolution network are concatenated, passed through a two-layer fully connected neural network and activated by softmax to obtain the final classification result.
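The following PyTorch-style sketch shows one possible realization of this network structure, assuming the Fast R-CNN / ROI features have already been extracted. The hidden sizes, the number of graph convolution layers, the mean-pooling of the updated box features before concatenation, and the box_adjacency lookup that maps the C x C label adjacency to a per-image box adjacency via each box's predicted class are all illustrative assumptions not fixed by the patent.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def box_adjacency(A_hat_labels, box_classes):
        """Per-image adjacency for detected boxes, looked up from the C x C
        label adjacency through each box's predicted class (an assumption)."""
        idx = torch.as_tensor(box_classes, dtype=torch.long)
        return A_hat_labels[idx][:, idx]

    class GraphConvLayer(nn.Module):
        """One graph convolution layer: X_(l+1) = sigma(A_hat @ X_l @ W)."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.W = nn.Linear(in_dim, out_dim, bias=False)  # learnable parameter W

        def forward(self, A_hat, X):
            return F.relu(self.W(A_hat @ X))

    class MultiLabelGCN(nn.Module):
        def __init__(self, box_dim, img_dim, hidden_dim, num_classes, num_layers=2):
            super().__init__()
            dims = [box_dim] + [hidden_dim] * num_layers
            self.gcn = nn.ModuleList(GraphConvLayer(a, b) for a, b in zip(dims[:-1], dims[1:]))
            # two fully connected layers on [global image feature ; pooled box features]
            self.fc = nn.Sequential(
                nn.Linear(img_dim + hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_classes),
            )

        def forward(self, A_box, box_feats, img_feat):
            X = box_feats                      # (num_boxes, box_dim) from ROI pooling
            for layer in self.gcn:
                X = layer(A_box, X)            # message passing guided by the adjacency
            pooled = X.mean(dim=0)             # pool the updated object representations
            fused = torch.cat([img_feat, pooled], dim=-1)  # concatenate global image feature
            return F.softmax(self.fc(fused), dim=-1)       # softmax activation as in the text

The softmax output follows the wording of the patent; a per-class sigmoid is the more common choice for multi-label outputs and would be a straightforward substitution.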
Fourthly, training the image multi-label recognition network: and training a graph convolution network and a full connection layer in the image multi-label identification network.
Training the graph convolution network comprises the following steps:
(1) the global feature representation of the image and the bounding boxes of the objects in the image, together with their feature representations, are obtained using Fast R-CNN and ROI;
(2) the feature representations of the objects are used as the input of the graph convolution network to update the corresponding node representations:
X^(l+1) = σ(Â X^(l) W),
wherein X^(l+1) is the graph convolution feature of layer l+1, σ is a nonlinear activation function, Â is the normalized global adjacency matrix obtained in the second step, X^(l) is the feature representation of layer l, and W is a learnable parameter;
(3) the global feature representation of the image and the object representations updated by the graph convolution network are concatenated, passed through two FC layers, and finally normalized by a softmax function to obtain the final multi-label identification result.
The fully connected layers are trained by a conventional method, which comprises the following steps (a minimal training sketch is given after this list):
(1) a training image is input into the network to obtain a training result;
(2) the connection weights of the fully connected layers are modified according to a gradient descent algorithm;
(3) the graph convolution network parameters W are corrected according to the gradient descent algorithm.
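A minimal training sketch consistent with these steps, using the MultiLabelGCN and box_adjacency definitions from the sketch above; the optimizer (plain SGD), learning rate, epoch count and the binary cross-entropy loss on the softmax scores are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def train(model, dataset, A_hat_labels, epochs=10, lr=1e-3):
        """dataset is assumed to be a list of per-image tuples
        (img_feat, box_feats, box_classes, target), where target is the
        multi-hot label vector of length C."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent
        model.train()
        for epoch in range(epochs):
            total = 0.0
            for img_feat, box_feats, box_classes, target in dataset:
                A_box = box_adjacency(A_hat_labels, box_classes)
                scores = model(A_box, box_feats, img_feat)
                loss = F.binary_cross_entropy(scores, target.float())  # assumed loss choice
                optimizer.zero_grad()
                loss.backward()  # gradients reach both the FC weights and the GCN parameter W
                optimizer.step()
                total += loss.item()
            print(f"epoch {epoch}: mean loss {total / max(len(dataset), 1):.4f}")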
And fifthly, acquiring the multi-label image to be detected: and acquiring a multi-label image to be detected.
Sixthly, obtaining the image multi-label identification result: the multi-label image to be detected is input into the trained image multi-label identification network to obtain the final multi-label classification result. The specific steps are as follows (a minimal inference sketch is given after this list):
(1) the feature X_I of the multi-label image to be detected and its bounding boxes are obtained using Fast R-CNN as the baseline module;
(2) the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected is obtained using ROI;
(3) all X_i^(0) of the image to be detected are merged into X^(0) as the input of the graph convolution network;
(4) the output of the graph convolution network is concatenated with the overall initial feature X_I of the image and connected to the fully connected network;
(5) the output of the two trained fully connected layers is passed through a softmax function to obtain the final multi-label classification result.
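Finally, a minimal inference sketch matching steps (1)–(5), again assuming the detector outputs (global feature, box features and box classes) for the test image are already available; the 0.5 score threshold used to turn the softmax scores into a label set is an illustrative assumption.

    import torch

    @torch.no_grad()
    def predict_labels(model, A_hat_labels, img_feat, box_feats, box_classes,
                       categories, threshold=0.5):
        """Run the trained network on one test image and return its predicted labels."""
        model.eval()
        A_box = box_adjacency(A_hat_labels, box_classes)
        scores = model(A_box, box_feats, img_feat)   # softmax-normalized class scores
        return [name for name, s in zip(categories, scores.tolist()) if s >= threshold]

    # Example: A_hat from the mutual-information sketch, converted to a tensor
    # A_hat_labels = torch.tensor(A_hat, dtype=torch.float32)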
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which merely illustrate the principles of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (7)

1. An image multi-label identification method based on statistical correlation and graph convolution technology is characterized by comprising the following steps:
11) collecting and preprocessing multi-label images: collecting multi-label images and processing their labels into an N × C matrix, wherein N is the number of samples and C is the number of label categories;
12) calculating the correlation between the labels: calculating the mutual dependency relationship among the labels by utilizing the mutual information, constructing a dependency relationship full-link graph and normalizing the dependency relationship full-link graph to obtain an adjacency matrix;
13) constructing an image multi-label identification network: constructing an image multi-label identification network based on the graph convolution network;
14) training the image multi-label recognition network: training a graph convolution network and a full connection layer in the image multi-label identification network;
15) acquiring a multi-label image to be detected: acquiring a multi-label image to be detected;
16) obtaining an image multi-label identification result: and inputting the multi-label image to be detected into the trained image multi-label identification network to obtain a final multi-label classification result.
2. The method for image multi-label recognition based on statistical correlation and graph convolution technology as claimed in claim 1, wherein the collecting and preprocessing of the multi-label image comprises the following steps:
21) constructing an N × C all-zero matrix D, wherein N is the number of images in the training set and C is the total number of categories in the training set, the categories being arranged in any fixed order;
22) converting the image annotation data into the label data matrix D, wherein each image and its annotation information correspond to one row of D; for each image in the annotated data, if a certain label exists in the image, the corresponding row and column in D are found and assigned the value "1", indicating that the label is present.
3. The method for image multi-label recognition based on statistical correlation and graph convolution technology as claimed in claim 1, wherein said calculating the correlation between labels comprises the following steps:
31) for each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y) = H(X) - H(X|Y)
H(X|Y) = -∑_y P(y) ∑_x P(x|y) log P(x|y)
H(X) = -∑_x P(x) log P(x),
wherein X and Y are random variables representing label categories, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that the random variable X takes the value x, P(x|y) is the conditional probability of X = x given Y = y, H(X) is the information entropy, and H(X|Y) is the conditional information entropy;
each column of the label data is regarded as a random variable X or Y and the value in each row as x or y; the mutual information between every pair of columns is computed and stored in a C-row, C-column matrix A, where A_ij is the mutual information value of the ith and jth columns;
32) the matrix A is normalized to obtain the adjacency matrix Â of the graph convolution network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij),
wherein A_ij is the mutual information value of the ith and jth categories, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix.
4. The image multi-label identification method based on statistical correlation and graph convolution technology as claimed in claim 1, wherein said constructing image multi-label identification network comprises the following steps:
41) Fast R-CNN is used as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
42) the ROI operation is used to obtain the initial feature representation X_i^(0) of each bounding box;
43) the fully connected adjacency matrix is obtained by the mutual information method and normalized, with the expressions:
I(X;Y) = H(X) - H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij);
44) setting the graph convolution network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which together with the fully connected adjacency matrix Â is used as the input of the graph convolution network; L layers of graph convolution yield the feature representation X^(L), with the expression:
X^(l+1) = σ(Â X^(l) W),
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is an activation function;
45) setting the fully connected layers: the overall image feature and the bounding-box features output by the graph convolution network are concatenated, passed through a two-layer fully connected neural network and activated by softmax to obtain the final classification result.
5. The method of claim 1, wherein the training of the graph convolution network comprises the following steps:
51) the global feature representation of the image and the bounding boxes of the objects in the image, together with their feature representations, are obtained using Fast R-CNN and ROI;
52) the feature representations of the objects are used as the input of the graph convolution network to update the corresponding node representations:
X^(l+1) = σ(Â X^(l) W),
wherein X^(l+1) is the graph convolution feature of layer l+1, σ is a nonlinear activation function, Â is the normalized global adjacency matrix obtained in step 12), X^(l) is the feature representation of layer l, and W is a learnable parameter;
53) the global feature representation of the image and the object representations updated by the graph convolution network are concatenated, passed through two FC layers, and finally normalized by a softmax function to obtain the final multi-label identification result.
6. The method of claim 1, wherein the training of the fully connected layer comprises the following steps:
61) inputting a training image into the network to obtain a training result;
62) modifying the connection weights of the fully connected layers according to a gradient descent algorithm;
63) correcting the graph convolution network parameters W according to the gradient descent algorithm.
7. The method for image multi-label recognition based on statistical correlation and graph convolution technology as claimed in claim 1, wherein the obtaining of the image multi-label recognition result comprises the following steps:
71) the feature X_I of the multi-label image to be detected and its bounding boxes are obtained using Fast R-CNN as the baseline module;
72) the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected is obtained using ROI;
73) all X_i^(0) of the image to be detected are merged into X^(0) as the input of the graph convolution network;
74) the output of the graph convolution network is concatenated with the overall initial feature X_I of the image and connected to the fully connected network;
75) the output of the two trained fully connected layers is passed through a softmax function to obtain the final multi-label classification result.
CN202010342622.8A 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology Active CN111476315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010342622.8A CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010342622.8A CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Publications (2)

Publication Number Publication Date
CN111476315A true CN111476315A (en) 2020-07-31
CN111476315B CN111476315B (en) 2023-05-05

Family

ID=71763058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010342622.8A Active CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Country Status (1)

Country Link
CN (1) CN111476315B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183299A (en) * 2020-09-23 2021-01-05 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN112487207A (en) * 2020-12-09 2021-03-12 Oppo广东移动通信有限公司 Image multi-label classification method and device, computer equipment and storage medium
CN112862089A (en) * 2021-01-20 2021-05-28 清华大学深圳国际研究生院 Medical image deep learning method with interpretability
CN112906720A (en) * 2021-03-19 2021-06-04 河北工业大学 Multi-label image identification method based on graph attention network
CN113204659A (en) * 2021-03-26 2021-08-03 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113988147A (en) * 2021-12-08 2022-01-28 南京信息工程大学 Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN114550310A (en) * 2022-04-22 2022-05-27 杭州魔点科技有限公司 Method and device for identifying multi-label behaviors
CN114648635A (en) * 2022-03-15 2022-06-21 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN115031794A (en) * 2022-04-29 2022-09-09 天津大学 Novel gas-solid two-phase flow measuring method of multi-characteristic-diagram convolution
CN117475240A (en) * 2023-12-26 2024-01-30 创思(广州)电子科技有限公司 Vegetable checking method and system based on image recognition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network
WO2020048119A1 (en) * 2018-09-04 2020-03-12 Boe Technology Group Co., Ltd. Method and apparatus for training a convolutional neural network to detect defects

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020048119A1 (en) * 2018-09-04 2020-03-12 Boe Technology Group Co., Ltd. Method and apparatus for training a convolutional neural network to detect defects
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI HUI ET AL.: "Multi-label food raw material recognition based on graph convolutional network", Journal of Nanjing University of Information Science & Technology (Natural Science Edition) *
JIANG JUNZHAO ET AL.: "Multi-label classification algorithm of convolutional neural network based on label correlation", Industrial Control Computer *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183299B (en) * 2020-09-23 2024-02-09 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN112183299A (en) * 2020-09-23 2021-01-05 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN112487207A (en) * 2020-12-09 2021-03-12 Oppo广东移动通信有限公司 Image multi-label classification method and device, computer equipment and storage medium
CN112862089B (en) * 2021-01-20 2023-05-23 清华大学深圳国际研究生院 Medical image deep learning method with interpretability
CN112862089A (en) * 2021-01-20 2021-05-28 清华大学深圳国际研究生院 Medical image deep learning method with interpretability
CN112906720A (en) * 2021-03-19 2021-06-04 河北工业大学 Multi-label image identification method based on graph attention network
CN112906720B (en) * 2021-03-19 2022-03-22 河北工业大学 Multi-label image identification method based on graph attention network
CN113204659B (en) * 2021-03-26 2024-01-19 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113204659A (en) * 2021-03-26 2021-08-03 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113988147B (en) * 2021-12-08 2022-04-26 南京信息工程大学 Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN113988147A (en) * 2021-12-08 2022-01-28 南京信息工程大学 Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN114648635A (en) * 2022-03-15 2022-06-21 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN114648635B (en) * 2022-03-15 2024-07-09 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN114550310A (en) * 2022-04-22 2022-05-27 杭州魔点科技有限公司 Method and device for identifying multi-label behaviors
CN115031794A (en) * 2022-04-29 2022-09-09 天津大学 Novel gas-solid two-phase flow measuring method of multi-characteristic-diagram convolution
CN115031794B (en) * 2022-04-29 2024-07-26 天津大学 Novel gas-solid two-phase flow measuring method based on multi-feature graph convolution
CN117475240A (en) * 2023-12-26 2024-01-30 创思(广州)电子科技有限公司 Vegetable checking method and system based on image recognition

Also Published As

Publication number Publication date
CN111476315B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN111476315B (en) Image multi-label identification method based on statistical correlation and graph convolution technology
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
Kauffmann et al. From clustering to cluster explanations via neural networks
CN108875827B (en) Method and system for classifying fine-grained images
US11003949B2 (en) Neural network-based action detection
CN112906720B (en) Multi-label image identification method based on graph attention network
CN113657425B (en) Multi-label image classification method based on multi-scale and cross-modal attention mechanism
CN110909820A (en) Image classification method and system based on self-supervision learning
CN112116599B (en) Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN112308115B (en) Multi-label image deep learning classification method and equipment
CN110705490B (en) Visual emotion recognition method
CN111612051A (en) Weak supervision target detection method based on graph convolution neural network
Cholakkal et al. Backtracking spatial pyramid pooling-based image classifier for weakly supervised top–down salient object detection
Hossain et al. Recognition and solution for handwritten equation using convolutional neural network
CN115131613B (en) Small sample image classification method based on multidirectional knowledge migration
CN114332893A (en) Table structure identification method and device, computer equipment and storage medium
CN112183464A (en) Video pedestrian identification method based on deep neural network and graph convolution network
Juyal et al. Multilabel image classification using the CNN and DC-CNN model on Pascal VOC 2012 dataset
CN108960005B (en) Method and system for establishing and displaying object visual label in intelligent visual Internet of things
CN113553326A (en) Spreadsheet data processing method, device, computer equipment and storage medium
CN111259176B (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information
Liu et al. Self-supervised image co-saliency detection
CN114299342B (en) Unknown mark classification method in multi-mark picture classification based on deep learning
Kanungo Analysis of Image Classification Deep Learning Algorithm
CN112232398B (en) Semi-supervised multi-category Boosting classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant