CN111476315A - Image multi-label identification method based on statistical correlation and graph convolution technology - Google Patents

Image multi-label identification method based on statistical correlation and graph convolution technology Download PDF

Info

Publication number
CN111476315A
Authority
CN
China
Prior art keywords
image
label
network
graph convolution
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010342622.8A
Other languages
Chinese (zh)
Other versions
CN111476315B (en)
Inventor
王儒敬
滕越
谢成军
张洁
李�瑞
陈天娇
陈红波
胡海瀛
刘海云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS
Priority to CN202010342622.8A
Publication of CN111476315A
Application granted
Publication of CN111476315B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 — Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 — Complex mathematical operations
    • G06F17/16 — Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/20 — Image preprocessing
    • G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 — Scenes; Scene-specific elements
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 — Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 — Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image multi-label identification method based on statistical correlation and graph convolution technology, which overcomes the defect of the prior art that the relationships between objects in a multi-label image are not fully considered. The invention comprises the following steps: collecting and preprocessing multi-label images; calculating the correlation between labels; constructing an image multi-label identification network; training the image multi-label identification network; acquiring a multi-label image to be detected; and obtaining the image multi-label identification result. The method learns the adjacency matrix from image label data, updates the object feature representations in the image through a graph convolution network, and improves multi-label classification performance by combining global feature residuals.

Description

Image multi-label identification method based on statistical correlation and graph convolution technology
Technical Field
The invention relates to the technical field of image analysis, in particular to an image multi-label identification method based on statistical correlation and graph convolution technology.
Background
In recent years, convolutional neural networks have advanced dramatically in the field of computer vision, especially in image classification. Because of the limited local receptive field of convolution kernels, a convolutional neural network is better at recognizing a single object and tends to ignore the relationships between objects. In an image, multiple related objects usually appear simultaneously, for example: teachers and students, mice and keyboards, goats and grasslands. There are also object pairs that hardly ever appear in the same image, for example: dogs and airplanes, yaks and the sea, snowflakes and swimsuits. An image therefore contains a large number of dependency relationships, but current convolutional neural networks cannot model these dependencies between objects from the training data to improve classification accuracy.
Graph convolution networks are widely used to overcome these inherent limitations of convolutional neural networks; their main components are an adjacency matrix, a node feature representation matrix and a learnable weight matrix. Much of the research focuses on the adjacency matrix. Some studies construct the adjacency matrix from semantic networks, context information, knowledge graphs and other external sources, but message passing between nodes is then limited to each node's first-order neighbors. Furthermore, a graph obtained from external information may not fit the image dataset being learned, so the knowledge graph can mislead training.
In particular, a multi-label image contains several labeled objects, and there is usually some correlation between them. For example, when objects such as a keyboard and a mouse are found in part of an image, the image is very likely to also contain a computer, and correspondingly a monitor can be inferred to exist in the image. In other words, recognizing the keyboard and the mouse in the picture increases the probability that a computer host and a monitor are present, and decreases the probability that objects such as airplanes or elephants are present. Modeling and reasoning about the dependencies between multiple objects in an image is therefore crucial.
Therefore, given that current convolutional neural networks ignore the relationships between objects, how to model the dependency relationships between objects from image data has become an urgent technical problem to be solved.
Disclosure of Invention
The invention aims to overcome the defect of the prior art that the relationships between objects in a multi-label image are not fully considered, and provides an image multi-label identification method based on statistical correlation and graph convolution technology to solve this problem.
In order to achieve the purpose, the technical scheme of the invention is as follows:
an image multi-label identification method based on statistical correlation and graph convolution technology comprises the following steps:
collecting and preprocessing multi-label images: collecting multi-label images and processing their labels into an N × C matrix, wherein N is the number of samples and C is the number of label categories;
calculating the correlation between the labels: calculating the mutual dependency relationships among the labels by means of mutual information, constructing a fully connected dependency graph and normalizing it to obtain the adjacency matrix;
constructing an image multi-label identification network: constructing an image multi-label identification network based on the graph convolution network;
training the image multi-label recognition network: training a graph convolution network and a full connection layer in the image multi-label identification network;
acquiring a multi-label image to be detected: acquiring a multi-label image to be detected;
obtaining an image multi-label identification result: and inputting the multi-label image to be detected into the trained image multi-label identification network to obtain a final multi-label classification result.
The collection and pre-processing of the multi-label image comprises the steps of:
constructing an N × C all-zero matrix D, wherein N is the number of images in the training set and C is the total number of categories in the training set, the categories being arranged in any fixed order;
converting the image annotation data into the label data matrix D, wherein each image and its annotation information correspond to one row of D; for each image in the annotated data, if a certain label exists in the image, the corresponding row and column in D are found and assigned the value "1", indicating that the label is present.
Calculating the correlation between the labels comprises the following steps:
for each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y) = H(X) - H(X|Y)
H(X|Y) = -∑_y P(y) ∑_x P(x|y) log P(x|y)
H(X) = -∑_x P(x) log P(x),
wherein X and Y are random variables representing label categories, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that the random variable X takes the value x, P(x|y) is the conditional probability of X = x given Y = y, H(X) is the information entropy, and H(X|Y) is the conditional information entropy;
each column of the label data is regarded as a random variable X or Y and the value in each row as x or y; the mutual information between every pair of columns is computed and stored in a C-row, C-column matrix A, where A_ij is the mutual information value of the ith and jth columns;
the matrix A is normalized to obtain the adjacency matrix Â of the graph convolution network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij),
wherein A_ij is the mutual information value of the ith and jth categories, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix.
Constructing the image multi-label identification network comprises the following steps:
Fast R-CNN is used as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
the ROI (Region of Interest) operation is used to obtain the initial feature representation X_i^(0) of each bounding box;
the fully connected adjacency matrix is obtained by the mutual information method and normalized, with the expressions:
I(X;Y) = H(X) - H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij);
setting the graph convolution network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which together with the fully connected adjacency matrix Â is used as the input of the graph convolution network; L layers of graph convolution yield the feature representation X^(L), with the expression:
X^(l+1) = σ(Â X^(l) W),
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is an activation function;
setting the fully connected layers: the overall image feature and the bounding-box features output by the graph convolution network are concatenated, passed through a two-layer fully connected neural network and activated by softmax to obtain the final classification result.
Training the graph convolution network comprises the following steps:
the global feature representation of the image and the bounding boxes of the objects in the image, together with their feature representations, are obtained using Fast R-CNN and ROI;
the feature representations of the objects are used as the input of the graph convolution network to update the corresponding node representations:
X^(l+1) = σ(Â X^(l) W),
wherein X^(l+1) is the graph convolution feature of layer l+1, σ is a nonlinear activation function, Â is the normalized global adjacency matrix obtained in the second step, X^(l) is the feature representation of layer l, and W is a learnable parameter;
the global feature representation of the image and the object representations updated by the graph convolution network are concatenated, passed through two FC layers, and finally normalized by a softmax function to obtain the final multi-label identification result.
Training the fully connected layers comprises the following steps:
inputting a training image into the network to obtain a training result;
modifying the connection weights of the fully connected layers according to a gradient descent algorithm;
and correcting the graph convolution network parameters W according to the gradient descent algorithm.
Obtaining the image multi-label identification result comprises the following steps:
the feature X_I of the multi-label image to be detected and its bounding boxes are obtained using Fast R-CNN as the baseline module;
the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected is obtained using ROI;
all X_i^(0) of the image to be detected are merged into X^(0) as the input of the graph convolution network;
the output of the graph convolution network is concatenated with the overall initial feature X_I of the image and connected to the fully connected network;
and the output of the two trained fully connected layers is passed through a softmax function to obtain the final multi-label classification result.
Advantageous effects
Compared with the prior art, the image multi-label identification method based on statistical correlation and graph convolution technology learns the adjacency matrix from image label data, updates the object feature representations in the image through the graph convolution network, and improves image multi-label classification performance by combining global feature residuals.
The method combines the image feature extraction capability of the convolutional neural network with the mutual dependency relationships of the labels, thereby improving the precision of multi-label classification.
Drawings
FIG. 1 is a sequence diagram of the method of the present invention.
Detailed Description
So that the above recited features of the present invention can be clearly understood, a more particular description of the invention, briefly summarized above, is given below with reference to embodiments, some of which are illustrated in the appended drawings, wherein:
as shown in fig. 1, the image multi-label identification method based on statistical correlation and graph convolution technology according to the present invention includes the following steps:
the first step, the collection and preprocessing of multi-label images: collecting multi-label images, and processing labels into a matrix of N × C, wherein N is the number of samples, and C is the type or category number of the labels. The method comprises the following specific steps:
(1) and constructing an all-zero matrix D of N x C, wherein N is the number of images in the training set, C is the total number of categories in the training set, and C is arranged according to any rule.
(2) Converting the image labeling data into a label data matrix D, wherein one image and standard information thereof correspond to one row of data in the label data matrix D; for the images in all the labeled data, if a certain label exists in the image, the corresponding row and column are found in the label data matrix D, and are assigned as "1", which represents that the label exists.
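As a concrete illustration, the following is a minimal Python sketch of this preprocessing step, assuming the annotations are available as a list of per-image label-name lists; the names annotations, categories and build_label_matrix are illustrative and not taken from the patent.

    import numpy as np

    def build_label_matrix(annotations, categories):
        """Build the N x C binary label matrix D described above.

        annotations: list of length N; annotations[i] lists the label names
                     present in image i (assumed input format).
        categories:  list of the C category names, in any fixed order.
        """
        col_index = {name: j for j, name in enumerate(categories)}
        D = np.zeros((len(annotations), len(categories)), dtype=np.int64)  # N x C all-zero matrix
        for i, labels in enumerate(annotations):
            for name in labels:
                D[i, col_index[name]] = 1  # label present in image i -> assign "1"
        return D

    # Tiny usage example
    categories = ["keyboard", "mouse", "monitor", "airplane"]
    annotations = [["keyboard", "mouse"], ["monitor"], ["keyboard", "mouse", "monitor"]]
    D = build_label_matrix(annotations, categories)
    print(D)  # 3 x 4 matrix of 0/1 entries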
Secondly, calculating the correlation between the labels: the mutual dependency relationships among the labels are calculated by means of mutual information, and a fully connected dependency graph is constructed and normalized to obtain the adjacency matrix. Modeling the statistical correlation among the labels can improve multi-label classification performance. The adjacency matrix guides the message passing of features between objects in the graph convolution network, thereby enhancing the feature representations of related objects and reducing message passing between objects that are not statistically related. Most existing methods construct the adjacency matrix from external knowledge (such as semantic networks or knowledge graphs), but external knowledge may not fit the training dataset well, so the adjacency matrix can mislead message passing; the statistical correlation among labels is therefore modeled directly from the label data of the training set. Traditional statistical correlation modeling often requires independence tests on the label data, which are time-consuming and labor-intensive. Information entropy describes the uncertainty of the information contained in a random variable, and mutual information describes how much the uncertainty of one random variable is reduced by knowing another. In addition, the computational complexity of mutual information is far lower than that of the chi-square test, so mutual information is used to compute the correlation between image labels, and the result is normalized to serve as the adjacency matrix guiding message passing between objects in image multi-label identification.
The method comprises the following specific steps:
(1) For each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y) = H(X) - H(X|Y)
H(X|Y) = -∑_y P(y) ∑_x P(x|y) log P(x|y)
H(X) = -∑_x P(x) log P(x),
wherein X and Y are random variables representing label categories, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that the random variable X takes the value x, P(x|y) is the conditional probability of X = x given Y = y, H(X) is the information entropy, and H(X|Y) is the conditional information entropy. The information entropy describes the uncertainty of the information; mutual information is innovatively used here in place of a conditional independence test to quantitatively describe the correlation between picture category labels.
Each column of the label data is regarded as a random variable X or Y and the value in each row as x or y; the mutual information between every pair of columns is computed and stored in a C-row, C-column matrix A, where A_ij is the mutual information value of the ith and jth columns.
(2) The matrix A is normalized to obtain the adjacency matrix Â of the graph convolution network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij),
wherein A_ij is the mutual information value of the ith and jth categories, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix. A minimal code sketch of this computation is given below.
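The following minimal Python sketch computes the pairwise mutual information between the binary label columns of D and row-normalizes the result with softmax. It uses the equivalent form I(X;Y) = ∑ P(x,y)·log(P(x,y)/(P(x)P(y))); estimating the probabilities from relative frequencies in D is an assumption, since the patent does not specify the estimator.

    import numpy as np

    def mutual_information(col_x, col_y):
        """I(X;Y) for two binary label columns, with probabilities estimated
        from relative frequencies; numerically equal to H(X) - H(X|Y)."""
        mi = 0.0
        for x in (0, 1):
            for y in (0, 1):
                p_xy = np.mean((col_x == x) & (col_y == y))
                if p_xy > 0:
                    p_x = np.mean(col_x == x)
                    p_y = np.mean(col_y == y)
                    mi += p_xy * np.log(p_xy / (p_x * p_y))
        return mi

    def adjacency_from_labels(D):
        """C x C mutual-information matrix A, softmax-normalized over each row."""
        C = D.shape[1]
        A = np.zeros((C, C))
        for i in range(C):
            for j in range(C):
                A[i, j] = mutual_information(D[:, i], D[:, j])
        expA = np.exp(A - A.max(axis=1, keepdims=True))  # numerically stable softmax
        return expA / expA.sum(axis=1, keepdims=True)    # normalized adjacency matrix

    A_hat = adjacency_from_labels(D)  # D from the preprocessing sketch above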
Thirdly, constructing an image multi-label identification network: and constructing an image multi-label identification network based on the graph convolution network.
The graph convolution network can effectively fuse a connectionist framework, such as a convolutional network, with a symbolic reasoning framework, and carries out message passing and reasoning between image objects under the guidance of the adjacency matrix, thereby improving multi-label classification performance. Most conventional applications of graph convolution networks use external knowledge as the adjacency matrix and semantic vectors as the node feature vectors. However, external knowledge and external node vector representations do not fit the training dataset well, so Fast R-CNN and ROI (Region of Interest) are used here to extract the feature representation of each object as the feature vector of its node. Meanwhile, the feature representation of the whole image and the node (object) feature representations obtained by graph convolution are concatenated and passed to a two-layer fully connected network to obtain the final classification result. This has two main benefits: 1. the message passing and feature enhancement of the graph convolution network are targeted at the training data and are not misled by external knowledge; 2. concatenating the overall image feature with the graph convolution node features ensures that the global information of the image is not lost while the classification receptive field focuses on local object regions, thereby achieving a stable classification effect.
Constructing the image multi-label identification network comprises the following steps (a minimal network sketch is given after this list):
(1) Fast R-CNN is used as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
(2) the ROI (Region of Interest) operation is used to obtain the initial feature representation X_i^(0) of each bounding box;
(3) the fully connected adjacency matrix is obtained by the mutual information method and normalized, with the expressions:
I(X;Y) = H(X) - H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij);
(4) setting the graph convolution network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which together with the fully connected adjacency matrix Â is used as the input of the graph convolution network; L layers of graph convolution yield the feature representation X^(L), with the expression:
X^(l+1) = σ(Â X^(l) W),
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is an activation function.
(5) Setting the fully connected layers: the overall image feature and the bounding-box features output by the graph convolution network are concatenated, passed through a two-layer fully connected neural network and activated by softmax to obtain the final classification result.
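The following PyTorch-style sketch shows one possible realization of this network structure, assuming the Fast R-CNN / ROI features have already been extracted. The hidden sizes, the number of graph convolution layers, the mean-pooling of the updated box features before concatenation, and the box_adjacency lookup that maps the C x C label adjacency to a per-image box adjacency via each box's predicted class are all illustrative assumptions not fixed by the patent.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def box_adjacency(A_hat_labels, box_classes):
        """Per-image adjacency for detected boxes, looked up from the C x C
        label adjacency through each box's predicted class (an assumption)."""
        idx = torch.as_tensor(box_classes, dtype=torch.long)
        return A_hat_labels[idx][:, idx]

    class GraphConvLayer(nn.Module):
        """One graph convolution layer: X_(l+1) = sigma(A_hat @ X_l @ W)."""
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.W = nn.Linear(in_dim, out_dim, bias=False)  # learnable parameter W

        def forward(self, A_hat, X):
            return F.relu(self.W(A_hat @ X))

    class MultiLabelGCN(nn.Module):
        def __init__(self, box_dim, img_dim, hidden_dim, num_classes, num_layers=2):
            super().__init__()
            dims = [box_dim] + [hidden_dim] * num_layers
            self.gcn = nn.ModuleList(GraphConvLayer(a, b) for a, b in zip(dims[:-1], dims[1:]))
            # two fully connected layers on [global image feature ; pooled box features]
            self.fc = nn.Sequential(
                nn.Linear(img_dim + hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_classes),
            )

        def forward(self, A_box, box_feats, img_feat):
            X = box_feats                      # (num_boxes, box_dim) from ROI pooling
            for layer in self.gcn:
                X = layer(A_box, X)            # message passing guided by the adjacency
            pooled = X.mean(dim=0)             # pool the updated object representations
            fused = torch.cat([img_feat, pooled], dim=-1)  # concatenate global image feature
            return F.softmax(self.fc(fused), dim=-1)       # softmax activation as in the text

The softmax output follows the wording of the patent; a per-class sigmoid is the more common choice for multi-label outputs and would be a straightforward substitution.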
Fourthly, training the image multi-label recognition network: and training a graph convolution network and a full connection layer in the image multi-label identification network.
Training the graph convolution network comprises the following steps:
(1) the global feature representation of the image and the bounding boxes of the objects in the image, together with their feature representations, are obtained using Fast R-CNN and ROI;
(2) the feature representations of the objects are used as the input of the graph convolution network to update the corresponding node representations:
X^(l+1) = σ(Â X^(l) W),
wherein X^(l+1) is the graph convolution feature of layer l+1, σ is a nonlinear activation function, Â is the normalized global adjacency matrix obtained in the second step, X^(l) is the feature representation of layer l, and W is a learnable parameter;
(3) the global feature representation of the image and the object representations updated by the graph convolution network are concatenated, passed through two FC layers, and finally normalized by a softmax function to obtain the final multi-label identification result.
The fully connected layers are trained by a conventional method, which comprises the following steps (a minimal training sketch is given after this list):
(1) a training image is input into the network to obtain a training result;
(2) the connection weights of the fully connected layers are modified according to a gradient descent algorithm;
(3) the graph convolution network parameters W are corrected according to the gradient descent algorithm.
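A minimal training sketch consistent with these steps, using the MultiLabelGCN and box_adjacency definitions from the sketch above; the optimizer (plain SGD), learning rate, epoch count and the binary cross-entropy loss on the softmax scores are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def train(model, dataset, A_hat_labels, epochs=10, lr=1e-3):
        """dataset is assumed to be a list of per-image tuples
        (img_feat, box_feats, box_classes, target), where target is the
        multi-hot label vector of length C."""
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # gradient descent
        model.train()
        for epoch in range(epochs):
            total = 0.0
            for img_feat, box_feats, box_classes, target in dataset:
                A_box = box_adjacency(A_hat_labels, box_classes)
                scores = model(A_box, box_feats, img_feat)
                loss = F.binary_cross_entropy(scores, target.float())  # assumed loss choice
                optimizer.zero_grad()
                loss.backward()  # gradients reach both the FC weights and the GCN parameter W
                optimizer.step()
                total += loss.item()
            print(f"epoch {epoch}: mean loss {total / max(len(dataset), 1):.4f}")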
And fifthly, acquiring the multi-label image to be detected: and acquiring a multi-label image to be detected.
Sixthly, obtaining the image multi-label identification result: the multi-label image to be detected is input into the trained image multi-label identification network to obtain the final multi-label classification result. The specific steps are as follows (a minimal inference sketch is given after this list):
(1) the feature X_I of the multi-label image to be detected and its bounding boxes are obtained using Fast R-CNN as the baseline module;
(2) the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected is obtained using ROI;
(3) all X_i^(0) of the image to be detected are merged into X^(0) as the input of the graph convolution network;
(4) the output of the graph convolution network is concatenated with the overall initial feature X_I of the image and connected to the fully connected network;
(5) the output of the two trained fully connected layers is passed through a softmax function to obtain the final multi-label classification result.
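Finally, a minimal inference sketch matching steps (1)–(5), again assuming the detector outputs (global feature, box features and box classes) for the test image are already available; the 0.5 score threshold used to turn the softmax scores into a label set is an illustrative assumption.

    import torch

    @torch.no_grad()
    def predict_labels(model, A_hat_labels, img_feat, box_feats, box_classes,
                       categories, threshold=0.5):
        """Run the trained network on one test image and return its predicted labels."""
        model.eval()
        A_box = box_adjacency(A_hat_labels, box_classes)
        scores = model(A_box, box_feats, img_feat)   # softmax-normalized class scores
        return [name for name, s in zip(categories, scores.tolist()) if s >= threshold]

    # Example: A_hat from the mutual-information sketch, converted to a tensor
    # A_hat_labels = torch.tensor(A_hat, dtype=torch.float32)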
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which merely illustrate the principles of the invention; various changes and modifications may be made without departing from the spirit and scope of the invention, and such changes and modifications fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (7)

1. An image multi-label identification method based on statistical correlation and graph convolution technology is characterized by comprising the following steps:
11) collecting and preprocessing multi-label images: collecting multi-label images and processing their labels into an N × C matrix, wherein N is the number of samples and C is the number of label categories;
12) calculating the correlation between the labels: calculating the mutual dependency relationship among the labels by utilizing the mutual information, constructing a dependency relationship full-link graph and normalizing the dependency relationship full-link graph to obtain an adjacency matrix;
13) constructing an image multi-label identification network: constructing an image multi-label identification network based on the graph convolution network;
14) training the image multi-label recognition network: training a graph convolution network and a full connection layer in the image multi-label identification network;
15) acquiring a multi-label image to be detected: acquiring a multi-label image to be detected;
16) obtaining an image multi-label identification result: and inputting the multi-label image to be detected into the trained image multi-label identification network to obtain a final multi-label classification result.
2. The method for image multi-label recognition based on statistical correlation and graph convolution technology as claimed in claim 1, wherein the collecting and preprocessing of the multi-label image comprises the following steps:
21) constructing an N × C all-zero matrix D, wherein N is the number of images in the training set and C is the total number of categories in the training set, the categories being arranged in any fixed order;
22) converting the image annotation data into the label data matrix D, wherein each image and its annotation information correspond to one row of D; for each image in the annotated data, if a certain label exists in the image, the corresponding row and column in D are found and assigned the value "1", indicating that the label is present.
3. The method for image multi-label recognition based on statistical correlation and graph convolution technology as claimed in claim 1, wherein said calculating the correlation between labels comprises the following steps:
31) for each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y) = H(X) - H(X|Y)
H(X|Y) = -∑_y P(y) ∑_x P(x|y) log P(x|y)
H(X) = -∑_x P(x) log P(x),
wherein X and Y are random variables representing label categories, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that the random variable X takes the value x, P(x|y) is the conditional probability of X = x given Y = y, H(X) is the information entropy, and H(X|Y) is the conditional information entropy;
each column of the label data is regarded as a random variable X or Y and the value in each row as x or y; the mutual information between every pair of columns is computed and stored in a C-row, C-column matrix A, where A_ij is the mutual information value of the ith and jth columns;
32) the matrix A is normalized to obtain the adjacency matrix Â of the graph convolution network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij),
wherein A_ij is the mutual information value of the ith and jth categories, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix.
4. The image multi-label identification method based on statistical correlation and graph convolution technology as claimed in claim 1, wherein said constructing image multi-label identification network comprises the following steps:
41) Fast R-CNN is used as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
42) the ROI operation is used to obtain the initial feature representation X_i^(0) of each bounding box;
43) the fully connected adjacency matrix is obtained by the mutual information method and normalized, with the expressions:
I(X;Y) = H(X) - H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij);
44) setting the graph convolution network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which together with the fully connected adjacency matrix Â is used as the input of the graph convolution network; L layers of graph convolution yield the feature representation X^(L), with the expression:
X^(l+1) = σ(Â X^(l) W),
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is an activation function;
45) setting the fully connected layers: the overall image feature and the bounding-box features output by the graph convolution network are concatenated, passed through a two-layer fully connected neural network and activated by softmax to obtain the final classification result.
5. The method of claim 1, wherein the training of the graph convolution network comprises the following steps:
51) the global feature representation of the image and the bounding boxes of the objects in the image, together with their feature representations, are obtained using Fast R-CNN and ROI;
52) the feature representations of the objects are used as the input of the graph convolution network to update the corresponding node representations:
X^(l+1) = σ(Â X^(l) W),
wherein X^(l+1) is the graph convolution feature of layer l+1, σ is a nonlinear activation function, Â is the normalized global adjacency matrix obtained in step 12), X^(l) is the feature representation of layer l, and W is a learnable parameter;
53) the global feature representation of the image and the object representations updated by the graph convolution network are concatenated, passed through two FC layers, and finally normalized by a softmax function to obtain the final multi-label identification result.
6. The method of claim 1, wherein the training of the fully connected layer comprises the following steps:
61) inputting a training image into the network to obtain a training result;
62) modifying the connection weights of the fully connected layers according to a gradient descent algorithm;
63) correcting the graph convolution network parameters W according to the gradient descent algorithm.
7. The method for image multi-label recognition based on statistical correlation and graph convolution technology as claimed in claim 1, wherein the obtaining of the image multi-label recognition result comprises the following steps:
71) the feature X_I of the multi-label image to be detected and its bounding boxes are obtained using Fast R-CNN as the baseline module;
72) the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected is obtained using ROI;
73) all X_i^(0) of the image to be detected are merged into X^(0) as the input of the graph convolution network;
74) the output of the graph convolution network is concatenated with the overall initial feature X_I of the image and connected to the fully connected network;
75) the output of the two trained fully connected layers is passed through a softmax function to obtain the final multi-label classification result.
CN202010342622.8A 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology Active CN111476315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010342622.8A CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010342622.8A CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Publications (2)

Publication Number Publication Date
CN111476315A true CN111476315A (en) 2020-07-31
CN111476315B CN111476315B (en) 2023-05-05

Family

ID=71763058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010342622.8A Active CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Country Status (1)

Country Link
CN (1) CN111476315B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183299A (en) * 2020-09-23 2021-01-05 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN112487207A (en) * 2020-12-09 2021-03-12 Oppo广东移动通信有限公司 Image multi-label classification method and device, computer equipment and storage medium
CN112862089A (en) * 2021-01-20 2021-05-28 清华大学深圳国际研究生院 Medical image deep learning method with interpretability
CN112906720A (en) * 2021-03-19 2021-06-04 河北工业大学 Multi-label image identification method based on graph attention network
CN113204659A (en) * 2021-03-26 2021-08-03 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113988147A (en) * 2021-12-08 2022-01-28 南京信息工程大学 Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN114550310A (en) * 2022-04-22 2022-05-27 杭州魔点科技有限公司 Method and device for identifying multi-label behaviors
CN114648635A (en) * 2022-03-15 2022-06-21 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN115031794A (en) * 2022-04-29 2022-09-09 天津大学 Novel gas-solid two-phase flow measuring method of multi-characteristic-diagram convolution
CN117475240A (en) * 2023-12-26 2024-01-30 创思(广州)电子科技有限公司 Vegetable checking method and system based on image recognition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network
WO2020048119A1 (en) * 2018-09-04 2020-03-12 Boe Technology Group Co., Ltd. Method and apparatus for training a convolutional neural network to detect defects

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020048119A1 (en) * 2018-09-04 2020-03-12 Boe Technology Group Co., Ltd. Method and apparatus for training a convolutional neural network to detect defects
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI HUI ET AL.: "Multi-label food raw material recognition based on graph convolutional network", Journal of Nanjing University of Information Science & Technology (Natural Science Edition) *
JIANG JUNZHAO ET AL.: "Multi-label classification algorithm of convolutional neural network based on label correlation", Industrial Control Computer *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183299B (en) * 2020-09-23 2024-02-09 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN112183299A (en) * 2020-09-23 2021-01-05 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN112487207A (en) * 2020-12-09 2021-03-12 Oppo广东移动通信有限公司 Image multi-label classification method and device, computer equipment and storage medium
CN112862089B (en) * 2021-01-20 2023-05-23 清华大学深圳国际研究生院 Medical image deep learning method with interpretability
CN112862089A (en) * 2021-01-20 2021-05-28 清华大学深圳国际研究生院 Medical image deep learning method with interpretability
CN112906720A (en) * 2021-03-19 2021-06-04 河北工业大学 Multi-label image identification method based on graph attention network
CN112906720B (en) * 2021-03-19 2022-03-22 河北工业大学 Multi-label image identification method based on graph attention network
CN113204659B (en) * 2021-03-26 2024-01-19 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113204659A (en) * 2021-03-26 2021-08-03 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113988147B (en) * 2021-12-08 2022-04-26 南京信息工程大学 Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN113988147A (en) * 2021-12-08 2022-01-28 南京信息工程大学 Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN114648635A (en) * 2022-03-15 2022-06-21 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN114648635B (en) * 2022-03-15 2024-07-09 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN114550310A (en) * 2022-04-22 2022-05-27 杭州魔点科技有限公司 Method and device for identifying multi-label behaviors
CN115031794A (en) * 2022-04-29 2022-09-09 天津大学 Novel gas-solid two-phase flow measuring method of multi-characteristic-diagram convolution
CN115031794B (en) * 2022-04-29 2024-07-26 天津大学 Novel gas-solid two-phase flow measuring method based on multi-feature graph convolution
CN117475240A (en) * 2023-12-26 2024-01-30 创思(广州)电子科技有限公司 Vegetable checking method and system based on image recognition

Also Published As

Publication number Publication date
CN111476315B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN111476315B (en) Image multi-label identification method based on statistical correlation and graph convolution technology
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
Kauffmann et al. From clustering to cluster explanations via neural networks
CN108875827B (en) Method and system for classifying fine-grained images
US11003949B2 (en) Neural network-based action detection
CN112906720B (en) Multi-label image identification method based on graph attention network
CN113657425B (en) Multi-label image classification method based on multi-scale and cross-modal attention mechanism
CN110909820A (en) Image classification method and system based on self-supervision learning
CN112116599B (en) Sputum smear tubercle bacillus semantic segmentation method and system based on weak supervised learning
CN112308115B (en) Multi-label image deep learning classification method and equipment
CN110705490B (en) Visual emotion recognition method
CN111612051A (en) Weak supervision target detection method based on graph convolution neural network
Cholakkal et al. Backtracking spatial pyramid pooling-based image classifier for weakly supervised top–down salient object detection
Hossain et al. Recognition and solution for handwritten equation using convolutional neural network
CN115131613B (en) Small sample image classification method based on multidirectional knowledge migration
CN114332893A (en) Table structure identification method and device, computer equipment and storage medium
CN112183464A (en) Video pedestrian identification method based on deep neural network and graph convolution network
Juyal et al. Multilabel image classification using the CNN and DC-CNN model on Pascal VOC 2012 dataset
CN108960005B (en) Method and system for establishing and displaying object visual label in intelligent visual Internet of things
CN113553326A (en) Spreadsheet data processing method, device, computer equipment and storage medium
CN111259176B (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information
Liu et al. Self-supervised image co-saliency detection
CN114299342B (en) Unknown mark classification method in multi-mark picture classification based on deep learning
Kanungo Analysis of Image Classification Deep Learning Algorithm
CN112232398B (en) Semi-supervised multi-category Boosting classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant