CN111476315B - Image multi-label identification method based on statistical correlation and graph convolution technology - Google Patents

Image multi-label identification method based on statistical correlation and graph convolution technology

Info

Publication number
CN111476315B
CN111476315B (application CN202010342622.8A)
Authority
CN
China
Prior art keywords
image
label
network
graph
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010342622.8A
Other languages
Chinese (zh)
Other versions
CN111476315A (en
Inventor
王儒敬
滕越
谢成军
张洁
李�瑞
陈天娇
陈红波
胡海瀛
刘海云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Institutes of Physical Science of CAS
Original Assignee
Hefei Institutes of Physical Science of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Institutes of Physical Science of CAS filed Critical Hefei Institutes of Physical Science of CAS
Priority to CN202010342622.8A priority Critical patent/CN111476315B/en
Publication of CN111476315A publication Critical patent/CN111476315A/en
Application granted granted Critical
Publication of CN111476315B publication Critical patent/CN111476315B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computational Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an image multi-label identification method based on statistical correlation and graph convolution technology, which overcomes the defect of the prior art that the relations between objects in a multi-label image are not fully considered. The method comprises the following steps: collecting and preprocessing multi-label images; calculating the correlation between labels; constructing an image multi-label recognition network; training the image multi-label recognition network; acquiring a multi-label image to be detected; and obtaining the image multi-label recognition result. The method learns the adjacency matrix from the image label data, updates the object feature representations in the image through the graph convolutional network, and combines them with the global feature residual to improve the multi-label classification performance of the image.

Description

Image multi-label identification method based on statistical correlation and graph convolution technology
Technical Field
The invention relates to the technical field of image analysis, in particular to an image multi-label identification method based on statistical correlation and graph convolution technology.
Background
In recent years, convolutional neural networks have advanced rapidly in the field of computer vision, particularly in image classification. Due to the limited local receptive field of the convolution kernel, a convolutional neural network is good at identifying a single object but ignores the relations between objects. In an image, many related objects tend to appear together, such as teacher and student, mouse and keyboard, goat and grassland; other pairs of objects hardly ever appear in the same image, such as dogs and planes, yaks and the sea, snowflakes and swimwear. An image therefore contains a large number of dependency relations, yet current convolutional neural networks cannot model the dependencies between the objects in the training data to improve classification accuracy.
Graph convolutional networks are widely used to address these inherent limitations of convolutional neural networks; their main components are an adjacency matrix, a node feature representation matrix, and a learnable weight matrix. Much of the research focuses on the adjacency matrix. Some works construct it from semantic networks, context information, knowledge graphs and the like, but message passing between nodes is then limited to first-order neighbors. Furthermore, a graph built from external information may not fit the image dataset being learned, so the knowledge graph can mislead training.
In multi-label images in particular, a single image contains several objects to be recognized (labeled), and there is usually some correlation between them. For example, when a keyboard and a mouse are found in part of an image, it is highly likely that a computer is also present, and correspondingly a monitor can be expected with high probability. In other words, recognizing a keyboard and a mouse in a picture increases the probability that a host computer and a monitor are present, and decreases the probability of objects such as planes or elephants. Modeling and reasoning over the dependencies between objects in an image are therefore critical.
Therefore, given that convolutional neural networks ignore the relations between objects, how to model the dependency relations between objects directly from image data has become a technical problem to be solved.
Disclosure of Invention
The invention aims to overcome the defect of the prior art that the relations between objects in a multi-label image are not fully considered, and provides an image multi-label identification method based on statistical correlation and graph convolution technology to solve this problem.
In order to achieve the above object, the technical scheme of the present invention is as follows:
an image multi-label identification method based on statistical correlation and graph convolution technology comprises the following steps:
collecting and preprocessing multi-label images: collecting multi-label images and processing the labels into an N x C matrix, wherein N is the number of samples and C is the number of label classes;
calculating the correlation between labels: calculating the mutual dependency between labels using mutual information, constructing a fully connected dependency graph and normalizing it to obtain an adjacency matrix;
constructing an image multi-label recognition network: constructing an image multi-label recognition network based on the graph convolutional network;
training the image multi-label recognition network: training the graph convolutional network and the fully connected layers in the image multi-label recognition network;
acquiring a multi-label image to be detected: acquiring a multi-label image to be detected;
obtaining an image multi-label recognition result: inputting the multi-label image to be detected into a trained image multi-label recognition network to obtain a final multi-label classification result.
The collecting and preprocessing of the multi-label images comprises the following steps:
constructing an N x C all-zero matrix D, wherein N is the number of images in the training set, C is the total number of categories in the training set, and the C categories are arranged in an arbitrary but fixed order;
converting the image annotation data into the label data matrix D, wherein each image and its annotation information correspond to one row of the matrix D; for every image in the annotation data, if a certain label is present in the image, the corresponding row and column are found in the label data matrix D and the entry is set to '1', indicating that the label exists.
The calculating of the correlation between labels includes the following steps:
for each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y)=H(X)-H(X|Y)
H(X|Y)=-∑_{Y=y} P(y) ∑_{X=x} P(x|y)*logP(x|y)
H(X)=-∑_{X=x} P(x)*logP(x),
wherein X and Y are random variables representing label classes, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that X=x, P(x|y) is the conditional probability, H(X) is the information entropy, and H(X|Y) is the conditional entropy;
each column of the label data is regarded as a random variable X or Y and the value in each row is regarded as x or y; the mutual information between nodes is calculated and stored in a C x C matrix A, where A_ij is the mutual information value of the ith column and the jth column;
matrix A is normalized to give the adjacency matrix of the graph convolutional network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
wherein A_ij is the mutual information value of the ith class and the jth class, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix.
The construction of the image multi-label recognition network comprises the following steps:
setting Fast R-CNN as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
setting the initial feature representation X_i^(0) of each bounding box using ROI pooling;
obtaining the fully connected adjacency matrix with the mutual information method and normalizing it, with the expressions:
I(X;Y)=H(X)-H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
setting up the graph convolutional network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which is fed together with the fully connected adjacency matrix Â into the graph convolutional network; after L layers of graph convolution the feature representation X^(L) is obtained, with the expression:
X^(l+1) = σ(Â X^(l) W^(l))
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is the activation function;
setting up the fully connected layers: the whole-image feature and the bounding-box features output by the graph convolutional network are concatenated, passed through two fully connected layers, and activated by softmax to obtain the final classification result.
The training of the graph convolutional network comprises the following steps:
obtaining the global feature representation of the image, the bounding boxes of the objects in the image, and the feature representations of those objects by using Fast R-CNN and ROI pooling;
taking the feature representations of the objects as the input of the graph convolutional network and updating the corresponding node representations:
X^(l+1) = σ(Â X^(l) W^(l))
wherein X^(l+1) is the (l+1)-th layer graph convolution feature, σ is the nonlinear activation function, Â is the normalized global adjacency matrix obtained in the second step, X^(l) is the l-th layer feature representation, and W is a learnable parameter;
concatenating the global feature representation of the image with the object representations updated by the graph convolutional network, passing them through two FC layers, and finally normalizing with the softmax function to obtain the final multi-label recognition result.
The training of the full connection layer comprises the following steps:
inputting the training image into a network to obtain a training result;
correcting the connection weight of the full-connection network layer according to a gradient descent algorithm;
the graph roll-up network parameter W is modified according to a gradient descent algorithm.
The obtaining of the image multi-label recognition result comprises the following steps:
using Fast R-CNN as the baseline module, obtaining the feature X_I of the multi-label image to be detected and its bounding boxes;
obtaining the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected using ROI pooling;
merging all X_i^(0) of each image to be detected into X^(0) as the input of the graph convolutional network;
concatenating the output of the graph convolutional network with the whole-image initial feature X_I and feeding it into the fully connected network;
obtaining the final multi-label classification result by passing the output of the two trained fully connected layers through the softmax function.
Advantageous effects
Compared with the prior art, the image multi-label identification method based on statistical correlation and graph convolution technology learns the adjacency matrix from image label data, updates the object feature representations in the image through the graph convolutional network, and combines the global feature residual to improve the image multi-label classification performance.
The method can well combine the image feature extraction capability of the convolutional neural network and the mutual dependence relationship of the labels, thereby improving the precision of multi-label classification.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
For a further understanding and appreciation of the structural features and advantages achieved by the present invention, a presently preferred embodiment is described below in connection with the accompanying drawings, in which:
as shown in fig. 1, the image multi-label identification method based on the statistical correlation and graph convolution technology of the present invention includes the following steps:
the first step, collecting and preprocessing the multi-label image: and collecting multi-label images, and processing labels into a matrix of N x C, wherein N is the number of samples, and C is the type or class number of the labels. The method comprises the following specific steps:
(1) And constructing an all-zero matrix D of N x C, wherein N is the number of images in the training set, C is the total number of categories in the training set, and C is arranged according to any rule.
(2) Converting the image marking data into a tag data matrix D, wherein one image and standard information thereof correspond to one line of data in the tag data matrix D; for all the images in the labeling data, if a certain label exists in the images, the corresponding row and column are found in the label data matrix D, and the value is assigned as '1', which represents that the label exists.
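As a concrete illustration of this first step, the label matrix D can be built with a few lines of NumPy. This is only a sketch: the names build_label_matrix, annotations and classes are illustrative and do not appear in the patent, and the annotation format (one list of label names per image) is an assumption.

    import numpy as np

    def build_label_matrix(annotations, classes):
        # annotations: one list of label names per training image (assumed format)
        # classes: the C label names in a fixed, arbitrary order
        class_index = {name: j for j, name in enumerate(classes)}
        D = np.zeros((len(annotations), len(classes)), dtype=np.int64)  # N x C all-zero matrix
        for i, labels in enumerate(annotations):
            for name in labels:
                D[i, class_index[name]] = 1  # a '1' marks that the label is present in image i
        return D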
Second step, calculating the correlation between labels: calculate the mutual dependency between labels using mutual information, construct a fully connected dependency graph and normalize it to obtain an adjacency matrix. Modeling the statistical correlation between labels improves the performance of multi-label classification. The adjacency matrix guides the message passing of features between objects in the graph convolutional network, thereby enhancing the feature representations of associated objects and reducing message passing between statistically unrelated objects. Most current methods construct the adjacency matrix from external knowledge (such as semantic networks or knowledge graphs), but such external knowledge may not match the training dataset well, so that the adjacency matrix misleads message passing; we therefore model the statistical correlation between labels from the label data of the training dataset itself. Conventional statistical correlation modeling often requires independence tests on the label data, which is a time-consuming and labor-intensive task. Information entropy describes the amount of uncertainty contained in a random variable, and mutual information describes how much the uncertainty of one random variable decreases when another random variable is known. In addition, the computational complexity of mutual information is far lower than that of the chi-square test, so we use mutual information to calculate the correlation between image labels and normalize it into an adjacency matrix that guides the message passing between the objects recognized in the multi-label image.
The method comprises the following specific steps:
(1) For each column in the label data matrix D, calculate the mutual information between that column and every other column as follows:
I(X;Y)=H(X)-H(X|Y)
H(X|Y)=-∑_{Y=y} P(y) ∑_{X=x} P(x|y)*logP(x|y)
H(X)=-∑_{X=x} P(x)*logP(x),
wherein X and Y are random variables representing label classes, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that X=x, P(x|y) is the conditional probability, H(X) is the information entropy, and H(X|Y) is the conditional entropy. Information entropy describes how much uncertainty the information contains; here mutual information is used, in place of the conditional independence test, to quantitatively describe the correlation between the picture-class labels.
Each column of the label data is regarded as a random variable X or Y and the value in each row is regarded as x or y; the mutual information between nodes is calculated and stored in a C x C matrix A, where A_ij is the mutual information value of the ith column and the jth column.
(2) Normalize matrix A to obtain the adjacency matrix of the graph convolutional network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
wherein A_ij is the mutual information value of the ith class and the jth class, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix.
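A minimal NumPy sketch of this second step, assuming the binary label matrix D produced in the first step; the helper names entropy, mutual_information and adjacency_from_labels are illustrative only:

    import numpy as np

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def mutual_information(x, y):
        # I(X;Y) = H(X) - H(X|Y) for two binary label columns x, y
        h_x = entropy(np.array([np.mean(x == 0), np.mean(x == 1)]))
        h_x_given_y = 0.0
        for v in (0, 1):
            p_y = np.mean(y == v)
            if p_y > 0:
                p_x_given_y = np.array([np.mean(x[y == v] == 0), np.mean(x[y == v] == 1)])
                h_x_given_y += p_y * entropy(p_x_given_y)
        return h_x - h_x_given_y

    def adjacency_from_labels(D):
        C = D.shape[1]
        A = np.zeros((C, C))
        for i in range(C):
            for j in range(C):
                A[i, j] = mutual_information(D[:, i], D[:, j])   # C x C mutual information matrix
        expA = np.exp(A - A.max(axis=1, keepdims=True))          # row-wise softmax normalization
        return expA / expA.sum(axis=1, keepdims=True)            # normalized adjacency matrix A_hat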
Third step, constructing the image multi-label recognition network: an image multi-label recognition network is constructed based on the graph convolutional network.
The graph convolutional network can effectively combine the connectionist framework of convolutional networks with a symbolic reasoning framework, performing message passing and reasoning between image objects under the guidance of the adjacency matrix and thereby improving multi-label classification performance. Conventional graph convolutional networks often use external knowledge as the adjacency matrix and semantic vectors as node feature vectors, but external knowledge and external node representations may not fit the training dataset well. We therefore use Fast R-CNN and ROI (region of interest) pooling to extract the feature representation of each object as its node feature vector. At the same time, the feature representation of the whole image is concatenated with the node (object) representations produced by graph convolution message passing and fed into a two-layer fully connected network to obtain the final classification result. The benefits are mainly twofold: 1. the message passing and feature enhancement of the graph convolutional network are tailored to the training data and are not misled by external knowledge; 2. concatenating the whole-image features with the graph convolutional network node features ensures that the classifier does not lose the global information of the image when attending to local object regions, giving a stable classification effect.
The construction of the image multi-label recognition network comprises the following steps:
(1) Setting Fast R-CNN as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
(2) Setting the initial feature representation X_i^(0) of each bounding box using ROI (region of interest) pooling;
(3) Obtaining the fully connected adjacency matrix with the mutual information method and normalizing it, with the expressions:
I(X;Y)=H(X)-H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
(4) Setting up the graph convolutional network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which is fed together with the fully connected adjacency matrix Â into the graph convolutional network; after L layers of graph convolution the feature representation X^(L) is obtained, with the expression:
X^(l+1) = σ(Â X^(l) W^(l))
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is the activation function.
(5) Setting up the fully connected layers: the whole-image feature and the bounding-box features output by the graph convolutional network are concatenated, passed through two fully connected layers, and activated by softmax to obtain the final classification result.
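A simplified PyTorch sketch of such a network head. It assumes that the Fast R-CNN baseline has already produced the per-box ROI features and the whole-image feature, and that the class-level adjacency matrix Â is turned into a box-level matrix by indexing it with each box's detected class; the class names GraphConvLayer and MultiLabelGCNHead, the mean pooling of box features, and all dimensions are illustrative, not from the patent.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphConvLayer(nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.weight = nn.Linear(in_dim, out_dim, bias=False)   # learnable parameter W

        def forward(self, x, adj):
            # X^(l+1) = sigma(A_hat X^(l) W^(l))
            return F.relu(self.weight(adj @ x))

    class MultiLabelGCNHead(nn.Module):
        def __init__(self, box_dim, img_dim, hidden_dim, num_classes, num_layers=2):
            super().__init__()
            dims = [box_dim] + [hidden_dim] * num_layers
            self.gcn = nn.ModuleList(GraphConvLayer(d_in, d_out)
                                     for d_in, d_out in zip(dims[:-1], dims[1:]))
            # two fully connected layers on the concatenation of the whole-image
            # feature and the (pooled) graph-convolved box features
            self.fc1 = nn.Linear(img_dim + hidden_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, num_classes)

        def forward(self, box_feats, img_feat, adj):
            x = box_feats                       # X^(0): one row per bounding box
            for layer in self.gcn:
                x = layer(x, adj)               # message passing guided by A_hat
            pooled = x.mean(dim=0)              # aggregate the updated object features (assumed pooling)
            fused = torch.cat([img_feat, pooled], dim=-1)
            return torch.softmax(self.fc2(F.relu(self.fc1(fused))), dim=-1)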
Fourth step, training the image multi-label recognition network: the graph convolutional network and the fully connected layers in the image multi-label recognition network are trained.
Training the graph convolutional network comprises the following steps:
(1) Obtaining the global feature representation of the image, the bounding boxes of the objects in the image, and the feature representations of those objects by using Fast R-CNN and ROI pooling;
(2) Taking the feature representations of the objects as the input of the graph convolutional network and updating the corresponding node representations:
X^(l+1) = σ(Â X^(l) W^(l))
wherein X^(l+1) is the (l+1)-th layer graph convolution feature, σ is the nonlinear activation function, Â is the normalized global adjacency matrix obtained in the second step, X^(l) is the l-th layer feature representation, and W is a learnable parameter;
(3) Concatenating the global feature representation of the image with the object representations updated by the graph convolutional network, passing them through two FC layers, and finally normalizing with the softmax function to obtain the final multi-label recognition result.
Training the fully connected layers uses the conventional method and comprises the following steps:
(1) Inputting the training images into the network to obtain training results;
(2) Correcting the connection weights of the fully connected layers according to a gradient descent algorithm;
(3) Correcting the graph convolutional network parameter W according to a gradient descent algorithm.
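A hedged sketch of this training step, assuming the MultiLabelGCNHead sketched above and a data loader yielding (box_feats, img_feat, adj, target) tuples with multi-hot float targets; the patent only specifies gradient descent on the fully connected and graph convolution parameters, so the optimizer and loss choices here are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def train(model, data_loader, num_epochs=10, lr=1e-3):
        # plain stochastic gradient descent over the FC weights and the GCN parameters W
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for epoch in range(num_epochs):
            for box_feats, img_feat, adj, target in data_loader:
                pred = model(box_feats, img_feat, adj)          # forward pass through GCN + FC + softmax
                loss = F.binary_cross_entropy(pred, target)     # assumed multi-label loss
                optimizer.zero_grad()
                loss.backward()                                  # gradients for W and the FC connection weights
                optimizer.step()                                 # gradient-descent correction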
Fifth step, acquiring the multi-label image to be detected: acquire the multi-label image to be detected.
Sixth step, obtaining the image multi-label recognition result: inputting the multi-label image to be detected into the trained image multi-label recognition network to obtain the final multi-label classification result. The specific steps are as follows:
(1) Obtaining the feature X_I of the multi-label image to be detected and its bounding boxes, using Fast R-CNN as the baseline module;
(2) Obtaining the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected using ROI pooling;
(3) Merging all X_i^(0) of each image to be detected into X^(0) as the input of the graph convolutional network;
(4) Concatenating the output of the graph convolutional network with the whole-image initial feature X_I and feeding it into the fully connected network;
(5) Obtaining the final multi-label classification result by passing the output of the two trained fully connected layers through the softmax function.
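Inference then reuses the same trained head. A sketch under the same assumptions (backbone is taken to wrap Fast R-CNN plus ROI pooling and to return the whole-image feature X_I, the per-box features X_i^(0) and the detected box classes; the 0.5 threshold on the softmax scores is an illustrative choice, not specified by the patent):

    import torch

    @torch.no_grad()
    def predict(model, backbone, image, adj_full, threshold=0.5):
        img_feat, box_feats, box_classes = backbone(image)        # X_I, X^(0), detected classes
        adj = adj_full[box_classes][:, box_classes]               # box-level slice of the C x C matrix A_hat
        scores = model(box_feats, img_feat, adj)                  # softmax score per label class
        return (scores > threshold).nonzero(as_tuple=True)[0]     # indices of the predicted labels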
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiment described above; the above embodiment and description merely illustrate the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (5)

1. An image multi-label identification method based on statistical correlation and graph convolution technology, characterized by comprising the following steps:
11) collecting and preprocessing multi-label images: collecting multi-label images and processing the labels into an N x C matrix, wherein N is the number of samples and C is the number of label classes;
12) calculating the correlation between labels: calculating the mutual dependency between labels using mutual information, constructing a fully connected dependency graph and normalizing it to obtain an adjacency matrix;
the calculating of the correlation between labels includes the following steps:
121) for each column in the label data matrix D, the mutual information between that column and every other column is calculated as follows:
I(X;Y)=H(X)-H(X|Y)
H(X|Y)=-∑_{Y=y} P(y) ∑_{X=x} P(x|y)*logP(x|y)
H(X)=-∑_{X=x} P(x)*logP(x),
wherein X and Y are random variables representing label classes, x and y are the values taken by X and Y with x, y ∈ {0,1}, P(x) is the probability that X=x, P(x|y) is the conditional probability, H(X) is the information entropy, and H(X|Y) is the conditional entropy;
each column of the label data is regarded as a random variable X or Y and the value in each row is regarded as x or y; the mutual information between nodes is calculated and stored in a C x C matrix A, where A_ij is the mutual information value of the ith column and the jth column;
122) matrix A is normalized to give the adjacency matrix of the graph convolutional network:
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
wherein A_ij is the mutual information value of the ith class and the jth class, exp is the exponential function, softmax is the normalization function, and Â is the normalized adjacency matrix;
13) constructing an image multi-label recognition network: constructing an image multi-label recognition network based on the graph convolutional network;
the constructing of the image multi-label recognition network includes the following steps:
131) setting Fast R-CNN as the baseline module to obtain the feature X_I of each picture and its bounding boxes;
132) setting the initial feature representation X_i^(0) of each bounding box using ROI pooling;
133) obtaining the fully connected adjacency matrix with the mutual information method and normalizing it, with the expressions:
I(X;Y)=H(X)-H(X|Y),
Â_ij = softmax(A_ij) = exp(A_ij) / ∑_j exp(A_ij)
134) setting up the graph convolutional network: the initial feature representations X_i^(0) of all bounding boxes are combined into X^(0), which is fed together with the fully connected adjacency matrix Â into the graph convolutional network; after L layers of graph convolution the feature representation X^(L) is obtained, with the expression:
X^(l+1) = σ(Â X^(l) W^(l))
wherein Â is the adjacency matrix, X is the matrix formed by the feature vectors of the nodes, W is a learnable parameter, and σ(·) is the activation function;
135) setting up the fully connected layers: the whole-image feature and the bounding-box features output by the graph convolutional network are concatenated, passed through two fully connected layers, and activated by softmax to obtain the final classification result;
14) training the image multi-label recognition network: training the graph convolutional network and the fully connected layers in the image multi-label recognition network;
15) acquiring the multi-label image to be detected: acquiring a multi-label image to be detected;
16) obtaining the image multi-label recognition result: inputting the multi-label image to be detected into the trained image multi-label recognition network to obtain the final multi-label classification result.
2. The image multi-label identification method based on statistical correlation and graph convolution technology according to claim 1, wherein the collecting and preprocessing of the multi-label images comprises the following steps:
21) constructing an N x C all-zero matrix D, wherein N is the number of images in the training set, C is the total number of categories in the training set, and the C categories are arranged in an arbitrary but fixed order;
22) converting the image annotation data into the label data matrix D, wherein each image and its annotation information correspond to one row of the matrix D; for every image in the annotation data, if a certain label is present in the image, the corresponding row and column are found in the label data matrix D and the entry is set to '1', indicating that the label exists.
3. The image multi-label identification method based on statistical correlation and graph convolution technology according to claim 1, wherein said training of the graph convolutional network comprises the following steps:
31) obtaining the global feature representation of the image, the bounding boxes of the objects in the image, and the feature representations of those objects by using Fast R-CNN and ROI pooling;
32) taking the feature representations of the objects as the input of the graph convolutional network and updating the corresponding node representations:
X^(l+1) = σ(Â X^(l) W^(l))
wherein X^(l+1) is the (l+1)-th layer graph convolution feature, σ is the nonlinear activation function, Â is the normalized global adjacency matrix obtained in step 12), X^(l) is the l-th layer feature representation, and W is a learnable parameter;
33) concatenating the global feature representation of the image with the object representations updated by the graph convolutional network, passing them through two FC layers, and finally normalizing with the softmax function to obtain the final multi-label recognition result.
4. The image multi-label identification method based on statistical correlation and graph convolution technology according to claim 1, wherein said training of the fully connected layers comprises the following steps:
41) inputting the training images into the network to obtain training results;
42) correcting the connection weights of the fully connected layers according to a gradient descent algorithm;
43) correcting the graph convolutional network parameter W according to a gradient descent algorithm.
5. The image multi-label identification method based on statistical correlation and graph convolution technology according to claim 1, wherein the obtaining of the image multi-label recognition result comprises the following steps:
51) using Fast R-CNN as the baseline module, obtaining the feature X_I of the multi-label image to be detected and its bounding boxes;
52) obtaining the initial feature representation X_i^(0) of each bounding box in the multi-label image to be detected using ROI pooling;
53) merging all X_i^(0) of each image to be detected into X^(0) as the input of the graph convolutional network;
54) concatenating the output of the graph convolutional network with the whole-image initial feature X_I and feeding it into the fully connected network;
55) obtaining the final multi-label classification result by passing the output of the two trained fully connected layers through the softmax function.
CN202010342622.8A 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology Active CN111476315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010342622.8A CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010342622.8A CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Publications (2)

Publication Number Publication Date
CN111476315A CN111476315A (en) 2020-07-31
CN111476315B true CN111476315B (en) 2023-05-05

Family

ID=71763058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010342622.8A Active CN111476315B (en) 2020-04-27 2020-04-27 Image multi-label identification method based on statistical correlation and graph convolution technology

Country Status (1)

Country Link
CN (1) CN111476315B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183299B (en) * 2020-09-23 2024-02-09 成都佳华物链云科技有限公司 Pedestrian attribute prediction method and device, electronic equipment and storage medium
CN112487207A (en) * 2020-12-09 2021-03-12 Oppo广东移动通信有限公司 Image multi-label classification method and device, computer equipment and storage medium
CN112862089B (en) * 2021-01-20 2023-05-23 清华大学深圳国际研究生院 Medical image deep learning method with interpretability
CN112906720B (en) * 2021-03-19 2022-03-22 河北工业大学 Multi-label image identification method based on graph attention network
CN113204659B (en) * 2021-03-26 2024-01-19 北京达佳互联信息技术有限公司 Label classification method and device for multimedia resources, electronic equipment and storage medium
CN113988147B (en) * 2021-12-08 2022-04-26 南京信息工程大学 Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN114648635A (en) * 2022-03-15 2022-06-21 安徽工业大学 Multi-label image classification method fusing strong correlation among labels
CN114550310A (en) * 2022-04-22 2022-05-27 杭州魔点科技有限公司 Method and device for identifying multi-label behaviors
CN115031794A (en) * 2022-04-29 2022-09-09 天津大学 Novel gas-solid two-phase flow measuring method of multi-characteristic-diagram convolution
CN117475240A (en) * 2023-12-26 2024-01-30 创思(广州)电子科技有限公司 Vegetable checking method and system based on image recognition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network
WO2020048119A1 (en) * 2018-09-04 2020-03-12 Boe Technology Group Co., Ltd. Method and apparatus for training a convolutional neural network to detect defects

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020048119A1 (en) * 2018-09-04 2020-03-12 Boe Technology Group Co., Ltd. Method and apparatus for training a convolutional neural network to detect defects
CN109816009A (en) * 2019-01-18 2019-05-28 南京旷云科技有限公司 Multi-tag image classification method, device and equipment based on picture scroll product
CN110705425A (en) * 2019-09-25 2020-01-17 广州西思数字科技有限公司 Tongue picture multi-label classification learning method based on graph convolution network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-label food raw material recognition based on graph convolutional networks; 李辉 et al.; Journal of Nanjing University of Information Science & Technology (Natural Science Edition) (No. 06); full text *
Multi-label classification algorithm of convolutional neural networks based on label correlation; 蒋俊钊 et al.; Industrial Control Computer (No. 07); full text *

Also Published As

Publication number Publication date
CN111476315A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
CN111476315B (en) Image multi-label identification method based on statistical correlation and graph convolution technology
CN110084296B (en) Graph representation learning framework based on specific semantics and multi-label classification method thereof
CN108875827B (en) Method and system for classifying fine-grained images
CN114067160B (en) Small sample remote sensing image scene classification method based on embedded smooth graph neural network
Torralba et al. Contextual models for object detection using boosted random fields
Goodfellow et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks
CN112015863B (en) Multi-feature fusion Chinese text classification method based on graphic neural network
CN113657425A (en) Multi-label image classification method based on multi-scale and cross-modal attention mechanism
CN112308115B (en) Multi-label image deep learning classification method and equipment
CN111475622A (en) Text classification method, device, terminal and storage medium
CN113449801B (en) Image character behavior description generation method based on multi-level image context coding and decoding
CN115410088B (en) Hyperspectral image field self-adaption method based on virtual classifier
CN115131613B (en) Small sample image classification method based on multidirectional knowledge migration
CN111079847A (en) Remote sensing image automatic labeling method based on deep learning
CN111582506A (en) Multi-label learning method based on global and local label relation
CN110689049A (en) Visual classification method based on Riemann kernel dictionary learning algorithm
CN114863091A (en) Target detection training method based on pseudo label
CN113642602B (en) Multi-label image classification method based on global and local label relation
CN116681128A (en) Neural network model training method and device with noisy multi-label data
CN116089605A (en) Text emotion analysis method based on transfer learning and improved word bag model
CN113259369B (en) Data set authentication method and system based on machine learning member inference attack
CN114724167A (en) Marketing text recognition method and system
CN111767402B (en) Limited domain event detection method based on counterstudy
CN111259176B (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information
CN114417938A (en) Electromagnetic target classification method using knowledge vector embedding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant