CN113378965A

CN113378965A - Multi-label image identification method and system based on DCGAN and GCN

Info

Publication number: CN113378965A
Application number: CN202110713085.8A
Authority: CN
Inventors: 刘嵩; 来庆涵; 周梓涵
Original assignee: Qilu University of Technology
Current assignee: Qilu University of Technology
Priority date: 2021-06-25
Filing date: 2021-06-25
Publication date: 2021-09-10
Anticipated expiration: 2041-06-25
Also published as: CN113378965B

Abstract

The present disclosure provides a multi-label recognition algorithm based on DCGAN and GCN, including: constructing a DCGAN model based on the GAN model, and generating similar images with the DCGAN model; extracting features with a transfer-based CNN algorithm, in which the parameters of the neural network of the DCGAN model are transferred to the CNN algorithm to extract features of the multi-label images; and generating a class label classifier with a GCN algorithm from a relation graph among the training labels. A pre-trained model is obtained from data generated by the deep convolutional generative adversarial network, and the parameters of its convolutional neural network are transferred to the target task to fine-tune the network, so as to obtain a more accurate image recognition result. In addition, random noise is added when the images are generated, which improves the robustness of the pre-trained model.

Description

Multi-label image identification method and system based on DCGAN and GCN

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular to a multi-label image recognition method and system based on a DCGAN (deep convolutional generative adversarial network) and a GCN (graph convolutional network).

Background

In the internet era, multimedia data such as images and short videos have become the mainstream form of information and have an important influence on people's lives. Image recognition is a branch of computer vision: by assigning appropriate labels to an image, the visual information conveyed by the image is converted into semantic information that people can easily understand, so that the image can be better understood and analyzed. Single-label image classification algorithms, such as support vector machines and random forests, have been studied for many years. Supervised deep learning algorithms such as convolutional neural networks perform very well in single-label image recognition and are widely applied in fields such as transportation and medical treatment. In real life, however, an image usually contains several objects, scenes and the like, and the different objects in a picture are associated with one another. Multi-label image classification is therefore a common problem and is a generalization of the single-label classification problem. The progress of convolutional neural networks in single-label image classification provides a basis for research on multi-label images.

The content of a multi-label image is complex and may involve occluded targets, cluttered backgrounds, inconspicuous targets and similar problems. A processing algorithm designed for single-label images may therefore not be applicable if it is applied directly to multi-label images. A simpler way to handle multi-label image recognition is to convert the multi-label problem into several single-label problems.

One of the main difficulties in multi-label learning is the explosive growth of the output space. To cope with a label space of exponential complexity, the correlation between labels needs to be mined. For example, if an image is labeled with "tropical rainforest" and "soccer", it is highly likely to also carry the label "Brazil", and a document labeled "entertainment" is less likely to be related to politics. Effectively mining the correlation among labels is the key to successful multi-label learning.

The inventor finds that a multi-label image recognition method based on DCGAN and GCN can be formed by a method of generating data by a GAN algorithm and mapping label features to corresponding label classifiers by a graph convolution neural network.

Disclosure of Invention

In order to overcome the defects of the prior art, embodiments of the present disclosure provide a multi-label image recognition method based on DCGAN and GCN, which extracts features by a deep learning method, solves the recognition problem of multi-label images, and reduces labor cost.

To achieve this purpose, the following technical scheme is adopted:

A multi-label recognition algorithm and a multi-label recognition system based on DCGAN and GCN are provided.

In a first aspect, the present disclosure provides a multi-label recognition algorithm based on DCGAN and GCN, including:

constructing a DCGAN model based on the GAN model, and generating similar images with the DCGAN model;

extracting features with a transfer-based CNN algorithm, in which the parameters of the neural network of the DCGAN model are transferred to the CNN algorithm to extract features of the multi-label images, and generating a class label classifier with the GCN algorithm from a relation graph among the training labels;

and classifying and recognizing the multi-label images with the class label classifier generated by the GCN algorithm, in which the features extracted by the CNN algorithm are multiplied (dot product) by the semantic feature vector matrix in the class classifier generated by the GCN algorithm, and the images are recognized by the resulting multi-label classifier.

In a second aspect, the present disclosure provides a multi-label recognition system based on DCGAN and GCN self-supervised learning, comprising:

a picture generation module configured to generate similar images based on a DCGAN model;

a feature extraction module configured to extract features with the transfer-based CNN algorithm, in which the parameters of the neural network of the DCGAN model are transferred to the CNN algorithm to extract features of the multi-label images, and to generate a class label classifier with the GCN algorithm from a relation graph among the training labels;

and an image recognition module configured to classify and recognize the multi-label images based on the GCN algorithm, in which the features extracted by the CNN algorithm are multiplied (dot product) by the semantic feature vector matrix in the class classifier generated by the GCN algorithm, and the images are then recognized by the multi-label classifier.

In a third aspect, the present disclosure provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the multi-label recognition algorithm based on DCGAN and GCN as described in the first aspect.

In a fourth aspect, the present disclosure provides an electronic device comprising a memory, a processor, and computer instructions stored in the memory and executed on the processor, wherein the computer instructions, when executed by the processor, implement the multi-label recognition algorithm based on DCGAN and GCN as described in the first aspect.

Compared with the prior art, the beneficial effects of the present disclosure are as follows:

In the method, the data generated by the DCGAN model and the data in the original data set are mixed into a new data set, so that the multi-label images are more diverse and the various features produced during training are easier to classify and recognize. This alleviates the overfitting that may occur during training, and random noise is added during training to enhance the robustness of the pre-trained model.

In the method, a deep convolutional generative adversarial network (DCGAN) algorithm is selected to generate realistic pictures similar to those in the data set, which preserves the integrity of the pictures and increases the diversity of the pictures in the data set.

A pre-trained model is obtained from data generated by the deep convolutional generative adversarial network, and the parameters of its convolutional neural network are transferred to the target task to fine-tune the network, so as to obtain a more accurate image recognition result. In addition, random noise is added when the images are generated, which improves the robustness of the pre-trained model.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.

FIG. 1 is a schematic flow chart of a multi-label image recognition method based on DCGAN and GCN according to the present disclosure;

FIG. 2 is a schematic flow chart of the DCGAN algorithm of the embodiment of the present disclosure;

FIG. 3 shows an original picture (a) and a picture (b) generated by the DCGAN model according to an embodiment of the present disclosure;

FIG. 4 is a graph of a loss function of an embodiment of the present disclosure;

FIG. 5 shows a ResNet shallow residual unit (a) and a deep residual unit (b) of an embodiment of the present disclosure;

FIG. 6 is a label dependency modeling diagram of an embodiment of the present disclosure;

FIG. 7 is a structure diagram of the graph convolutional neural network according to an embodiment of the present disclosure.

Detailed Description

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

The deep convolutional generative adversarial network (DCGAN) algorithm is an unsupervised generative model that improves on the generative adversarial network (GAN) model. Compared with a general GAN algorithm, it generates data better, trains more stably, and produces more diverse samples.

The graph convolutional network (GCN) algorithm is a semi-supervised learning method that is trained end to end using node attributes and node labels; its core idea is to update the representation of each node by passing information between nodes. The graph convolutional network constructs a directed graph among multiple objects according to their topological structure and expresses the correlation among labels in the form of a graph, which makes the representation simple and easy to understand.

Example 1

As shown in FIG. 1, the present disclosure provides a multi-label recognition algorithm based on DCGAN and GCN, including:

Further, generating similar images based on the DCGAN model comprises generating similar images with the DCGAN model to augment the training data of the data set, and mixing the data generated by the DCGAN model with the data in the original data set into a new data set.

Specifically, the data generated by the DCGAN algorithm serve as a pretext task for pre-training. During training, the added training data alleviate the overfitting problem and enhance the robustness of the algorithm. In the method, a deep convolutional generative adversarial network (DCGAN) algorithm is selected to generate realistic pictures similar to those in the data set, which preserves the integrity of the pictures and increases the diversity of the pictures in the data set.

The DCGAN model turns the generator G and the discriminator D of the GAN model into two convolutional neural networks.

FIG. 2 is a flow chart of the DCGAN algorithm. The generator is replaced by a pre-trained ResNet-101 network. Of the two convolutional neural networks, the G network uses ReLU as its activation function, with a Tanh function in the last layer, and the D network uses LeakyReLU as its activation function. Batch normalization (BN) layers are used in both the generator G and the discriminator D; applying this normalization after the convolution layers helps the network converge quickly.

During model training, the goal of the generator network G is to generate pictures that are as realistic as possible in order to deceive the discriminator network D, while the goal of D is to distinguish the pictures generated by G from real pictures as well as possible. Through this mutual game between the generator G and the discriminator D, pictures that can pass for real ones are generated.
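The game between G and D is commonly expressed as binary cross-entropy losses on the discriminator's outputs. The following is a minimal sketch, assuming the standard GAN objective with the non-saturating generator loss; the text does not state the exact loss used, and the function names and sample values are hypothetical.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy for D: real images should score 1, generated ones 0."""
    d_real = np.clip(d_real, eps, 1.0 - eps)
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(-np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake)))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss for G (an assumption): generated images should be scored as real."""
    d_fake = np.clip(d_fake, eps, 1.0 - eps)
    return float(-np.mean(np.log(d_fake)))

# Hypothetical discriminator outputs: probability that each image is real.
d_real = np.array([0.9, 0.8, 0.95])   # scores on real pictures
d_fake = np.array([0.1, 0.2, 0.05])   # scores on generated pictures

loss_d = discriminator_loss(d_real, d_fake)
loss_g = generator_loss(d_fake)
```

With these values D separates real from generated pictures easily, so its loss is small while the generator's loss is large; as G improves, the two losses move toward each other.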

As shown in FIG. 3, some of the images in the data set are fed into the deep convolutional generative adversarial network (DCGAN) model for training. Through continued training, the "game" between the generator and the discriminator finally produces pictures similar to the images in the original data set.

As shown in FIG. 4, the data generated by the DCGAN model are mixed with the data in the original data set to form a new data set, so that the multi-label images are more diverse and the various features produced during training are easier to classify and recognize. This also alleviates the overfitting that may occur during training, and random noise is added during training to enhance the robustness of the pre-trained model.
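Mixing the generated data with the original data can be sketched as a concatenation followed by a shuffle. The text does not specify how labels are attached to generated pictures, so this sketch only tracks which samples were generated; the function name, array shapes and seed are assumptions made for illustration.

```python
import numpy as np

def mix_datasets(real_images, generated_images, seed=0):
    """Concatenate real and generated images and shuffle them together.

    Returns the mixed images and a boolean mask marking the generated samples.
    """
    images = np.concatenate([real_images, generated_images], axis=0)
    is_generated = np.concatenate([
        np.zeros(len(real_images), dtype=bool),
        np.ones(len(generated_images), dtype=bool),
    ])
    order = np.random.default_rng(seed).permutation(len(images))
    return images[order], is_generated[order]

# Toy stand-ins: 6 "real" and 4 "generated" RGB images of size 8x8.
real = np.zeros((6, 3, 8, 8), dtype=np.float32)
fake = np.ones((4, 3, 8, 8), dtype=np.float32)
mixed, is_generated = mix_datasets(real, fake)
```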

FIG. 4 shows the loss function curves of the deep convolutional generative adversarial network (DCGAN) during training. Experiments were carried out on the PASCAL VOC data set, and this section shows the results of the algorithm trained on that data set.

Further, feature extraction with the transfer-based CNN algorithm includes transferring the parameters of the generator's neural network in the DCGAN algorithm to the CNN algorithm to extract features of the multi-label images in the newly merged training set; after the training set is input and the loss function is calculated, back propagation is used to fine-tune the network.

Further, the residual network corresponding to the generator pre-trained when the DCGAN generates images is used as the CNN algorithm for feature extraction, and the image features are obtained by global max pooling;

optionally, the image label relation graph is input into the GCN algorithm, and a class classifier is generated through training by mapping the labels of the images.

Specifically, the generator G pre-trained when the DCGAN generates images uses the residual network ResNet-101. After pre-training, the ResNet-101 parameters of the generator G of the deep convolutional generative adversarial network (DCGAN) are transferred to the CNN algorithm, so that features can be extracted from the images of the training set; at the same time, after the training set is input and the loss function is calculated, back propagation is used to fine-tune the network.

In the method, the residual network ResNet-101 corresponding to the pre-trained generator G is used as the CNN algorithm for feature extraction, and the feature x of an image is finally obtained by global max pooling:

$$x = f_{GMP}(f_{cnn}(I; \theta_{CNN})) \in \mathbb{R}^{D} \qquad (1)$$

In formula (1), $\theta_{CNN}$ denotes the parameters of the CNN, $D = 2048$, and $I$ denotes the input image.
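Global max pooling in formula (1) keeps the largest activation of each channel. The sketch below illustrates this on a random array standing in for the CNN output; the 14 x 14 spatial size is an assumption, and only D = 2048 comes from the text.

```python
import numpy as np

def global_max_pool(feature_map):
    """Reduce a (D, H, W) feature map to a D-dimensional vector.

    Each output entry is the maximum activation of one channel.
    """
    return feature_map.max(axis=(1, 2))

rng = np.random.default_rng(0)
# Stand-in for f_cnn(I; theta_CNN): 2048 channels on an assumed 14x14 grid.
feature_map = rng.standard_normal((2048, 14, 14))
x = global_max_pool(feature_map)   # x has shape (2048,)
```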

As shown in FIG. 5, ResNet uses two types of residual units: a shallow residual unit and a deep residual unit. The ResNet-101 algorithm uses deep residual units.

The parameters of the ResNet-101 algorithm are shown in Table 1. The main changes in ResNet are that it downsamples with a stride of 2 and replaces the fully connected layer with a global average pooling layer, while maintaining the complexity of the network layers. As can be seen from the table, because the network is deeper, residual learning is performed across three layers, whose convolution kernels are 1x1, 3x3 and 1x1, respectively.

TABLE 1 ResNet-101 Algorithm parameters
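The benefit of the three-layer (1x1, 3x3, 1x1) deep residual unit can be seen by counting convolution weights. The sketch below compares such a unit with a hypothetical two-layer 3x3 unit at the same width. The widths 256 and 64 are assumed for illustration, and biases and batch normalization parameters are ignored.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a 2-D convolution, ignoring biases."""
    return in_channels * out_channels * kernel_size * kernel_size

def deep_unit_weights(channels, reduced):
    """1x1 reduce, 3x3 at the reduced width, then 1x1 expand."""
    return (conv_weights(channels, reduced, 1)
            + conv_weights(reduced, reduced, 3)
            + conv_weights(reduced, channels, 1))

def shallow_unit_weights(channels):
    """Two 3x3 convolutions at full width."""
    return 2 * conv_weights(channels, channels, 3)

deep = deep_unit_weights(256, 64)     # 69632
shallow = shallow_unit_weights(256)   # 1179648
```

At this width the three-layer unit needs roughly 17 times fewer weights, which is what makes very deep stacks such as ResNet-101 affordable.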

First, all labels in the training set are taken as input, the relevance among the class labels is learned through the GCN, and a correlation matrix is used as the initial adjacency matrix for learning the probabilities between labels.

Secondly, the correlation dependence between the labels is modeled in the form of conditional probabilities, and a correlation coefficient matrix is constructed.

Finally, after the features are extracted by the CNN algorithm, they are multiplied (dot product) by the output matrix obtained from GCN training to obtain the vector used for classification, and multi-label classification is performed using a cross-entropy loss function.

The GCN algorithm learns the semantic features of the labels corresponding to the images. The label information is embedded with the GloVe pre-trained language model to obtain the input matrix of the GCN. During training, all labels in the training set are taken as input, the relevance among the class labels is learned through the GCN, and a correlation matrix is used as the initial adjacency matrix for learning the probabilities between labels.

As shown in FIG. 6, the present disclosure models the correlation dependence between labels in the form of conditional probabilities.

As can be seen from FIG. 6, $P(L_j \mid L_i)$ is not equal to $P(L_i \mid L_j)$, and therefore the correlation coefficient matrix is not symmetric.

Constructing the correlation coefficient matrix comprises the following steps:

(1) counting the occurrences of label pairs in the training data set to obtain a co-occurrence matrix M of size C × C;

(2) using the label co-occurrence matrix to obtain the conditional probability matrix: $P_i = M_i / N_i$, where $N_i$ is the number of times label $L_i$ appears in the training data set;

(3) carrying out binarization to eliminate the noise introduced by the co-occurrence probabilities. A threshold τ is used to filter out noisy edges, where A (C × C) is the binary correlation coefficient matrix and $P_{ij} = P(L_j \mid L_i)$:

$$A_{ij} = \begin{cases} 0, & P_{ij} < \tau \\ 1, & P_{ij} \ge \tau \end{cases}$$

(4) during training, the corresponding word embedding vectors are obtained according to the image labels of the input training set, giving the GCN input H (C × d) and the adjacency matrix A. H and A are input into the GCN together, and a C × D output matrix is finally obtained. The output matrix of the GCN is multiplied (dot product) by the output vector of each image from the CNN to obtain the vector used for classification; a cross-entropy loss function is then used for multi-label classification, and the parameters are adjusted by back propagation.
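Steps (1) to (3) can be sketched on a toy label matrix. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the threshold value 0.4, the removal of self co-occurrences from the diagonal, the `>=` comparison, and the three example labels are all assumed.

```python
import numpy as np

def build_adjacency(labels, tau=0.4):
    """Build a binary label-correlation matrix from an (N, C) 0/1 label matrix.

    labels[n, i] == 1 means image n carries label i. tau is an assumed threshold.
    """
    labels = np.asarray(labels, dtype=np.float64)
    counts = labels.sum(axis=0)          # N_i: occurrences of each label
    cooccur = labels.T @ labels          # M[i, j]: images carrying both i and j
    np.fill_diagonal(cooccur, 0.0)       # drop self co-occurrence (assumption)
    # P[i, j] = P(L_j | L_i) = M[i, j] / N_i; guard against unused labels.
    cond_prob = cooccur / np.maximum(counts, 1.0)[:, None]
    adjacency = (cond_prob >= tau).astype(np.float64)
    return cooccur, cond_prob, adjacency

# Toy data set with three hypothetical labels: person, surfboard, dog.
labels = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
])
M, P, A = build_adjacency(labels)
```

Here P(surfboard | person) = 2/4 while P(person | surfboard) = 2/2, so P is not symmetric, and after thresholding the edge from dog to person is kept while the edge from person to dog is dropped.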

In the testing stage, after a picture is input, its features are extracted by the CNN algorithm and multiplied (dot product) by the output matrix obtained from GCN training to obtain the vector used for classification, and multi-label classification is performed by the classifier trained with the cross-entropy loss function.

FIG. 7 shows the structure of the graph convolutional neural network. The whole graph is taken as input. In convolution layer 1, a convolution operation is performed over the neighbors of each node, and the node is updated with the convolution result; the result then passes through an activation function such as ReLU, and then through convolution layer 2 and another activation function. This process is repeated until the number of layers reaches the desired depth.

Similar to the graph neural network (GNN) algorithm, the graph convolutional neural network also has a local output function that converts node states (including hidden states and node features) into task-related labels, for example for classifying spam ("water army") accounts; there are also tasks that classify the whole graph, such as compound classification.

Unlike standard convolution methods, the goal of a graph convolutional neural network is to learn a function $f(\cdot, \cdot)$ on a graph G. The input to this function is a feature description $H^{l}$ and a relationship matrix $A \in \mathbb{R}^{n \times n}$, and the function updates the node features to $H^{l+1} \in \mathbb{R}^{n \times d'}$. Each GCN layer can be written as a non-linear function:

$$H^{l+1} = f(H^{l}, A) \qquad (3)$$

$f(\cdot, \cdot)$ may be expressed as:

As can be seen from the formula, complex relationships between nodes can be modeled by stacking multiple GCN layers.
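The expression for f is not reproduced in this text. As a hedged illustration, the sketch below assumes a widely used propagation rule, H^{l+1} = h(Â H^l W^l), where Â is the adjacency matrix with self-loops after symmetric degree normalization and h is a non-linearity. The normalization, the ReLU activation, the toy graph and all names are assumptions.

```python
import numpy as np

def normalize_adjacency(adjacency):
    """Add self-loops and apply D^{-1/2} (A + I) D^{-1/2} (assumed normalization)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def gcn_layer(h, adjacency, weight, activation=relu):
    """One graph-convolution step: aggregate neighbor features, then transform."""
    return activation(normalize_adjacency(adjacency) @ h @ weight)

rng = np.random.default_rng(0)
n, d, d_out = 4, 5, 3
# Toy path graph 0-1-2-3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = rng.standard_normal((n, d))        # node features H^l, shape (n, d)
W0 = rng.standard_normal((d, d_out))   # layer weights W^l, shape (d, d')
H_next = gcn_layer(H, A, W0)           # H^{l+1}, shape (n, d')
```

Each call mixes a node's features with those of its direct neighbors only, so stacking k such layers lets information travel k hops across the graph.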

A graph is constructed among the target labels of the multi-label images in the data set, in which each node (label) is represented by a word vector (word embedding). The graph convolutional network (GCN) maps the label graph into a set of interdependent target classifiers. During training, a GCN-based mapping function learns the interdependent object classifiers from the label features.

Stacked GCN layers are used in this study, where each GCN layer l takes the node representations $H^{l}$ of the previous layer as input and outputs new node representations $H^{l+1}$. The input to the first layer is the word embedding matrix $H \in \mathbb{R}^{C \times d}$, and the output of the last layer is the classifier matrix $W \in \mathbb{R}^{C \times D}$.

Finally, the semantic feature vector matrix (a C × D matrix) in the class classifier generated by the GCN is multiplied (dot product) by the feature vector extracted by the ResNet-101 algorithm to obtain the vector used for classification, and the classifier is then trained to classify and recognize the multi-label images.
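The mapping from label embeddings to classifiers, followed by the dot product with the image feature, can be sketched with two stacked layers. The embedding size d = 300, the hidden width 64, the LeakyReLU slope, the row-normalized toy adjacency matrix and the random weights are all assumptions; only D = 2048 and the shapes C × d and C × D come from the text.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gcn_classifiers(embeddings, adjacency, w1, w2):
    """Map C x d label embeddings to a C x D classifier matrix with two GCN layers."""
    hidden = leaky_relu(adjacency @ embeddings @ w1)   # C x hidden
    return adjacency @ hidden @ w2                     # C x D

rng = np.random.default_rng(0)
C, d, hidden_dim, D = 3, 300, 64, 2048

# Toy adjacency with self-loops, row-normalized (an assumed normalization).
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)
A_hat = A / A.sum(axis=1, keepdims=True)

H0 = rng.standard_normal((C, d))                   # stand-in for word embeddings
W1 = rng.standard_normal((d, hidden_dim)) * 0.05   # untrained placeholder weights
W2 = rng.standard_normal((hidden_dim, D)) * 0.05

W = gcn_classifiers(H0, A_hat, W1, W2)   # one D-dimensional classifier per label
x = rng.standard_normal(D)               # stand-in for the pooled CNN feature
scores = W @ x                           # one score per label, shape (C,)
```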

By applying the learned classifier to the image features, a prediction score is obtained:

$$\hat{y} = W x$$

Suppose that the true label of an image is $y \in \mathbb{R}^{C}$ and that there are C types of labels in total; the loss function of the whole multi-label classification and recognition network is as follows:

σ(·) in the above formula is the sigmoid function.
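The loss formula itself is not reproduced in this text. As a hedged sketch, the code below assumes the usual multi-label form of cross-entropy: a sigmoid is applied to each of the C scores and the binary cross-entropy terms are summed, written as a negative log-likelihood to be minimized. The sign convention, the 0.5 decision threshold and the example values are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multilabel_loss(scores, targets, eps=1e-12):
    """Sum of per-label binary cross-entropy terms on sigmoid(scores)."""
    p = np.clip(sigmoid(scores), eps, 1.0 - eps)
    return float(-np.sum(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p)))

scores = np.array([2.0, -1.0, 0.0])    # hypothetical prediction scores for C = 3
targets = np.array([1.0, 0.0, 1.0])    # true labels y

loss = multilabel_loss(scores, targets)
predicted = (sigmoid(scores) >= 0.5).astype(int)   # assumed decision threshold
```

Because each label gets its own sigmoid rather than sharing a softmax, several labels can be predicted as present for the same image.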

Example 2

The multi-label recognition system based on DCGAN and GCN self-supervised learning is implemented on a server, and the server comprises:

a picture generation module configured to generate similar images based on a DCGAN model;

Example 3

A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the multi-label recognition algorithm based on DCGAN and GCN as described in the first aspect.

Example 4

An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the multi-label recognition algorithm based on DCGAN and GCN as described in the first aspect.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A multi-label identification algorithm based on DCGAN and GCN is characterized by comprising the following steps:

2. The multi-label recognition algorithm based on DCGAN and GCN of claim 1, wherein generating similar images based on the DCGAN model comprises generating similar images with the DCGAN model to augment the training data of the data set, and mixing the data generated by the DCGAN model with the data in the original data set into a new data set.

3. The multi-label recognition algorithm based on DCGAN and GCN of claim 1, wherein feature extraction with the transfer-based CNN algorithm comprises transferring the neural network parameters of the generator in the DCGAN algorithm into the CNN algorithm to extract features of the multi-label images in the newly merged training set, and, after the training set is input and the loss function is calculated, using back propagation to fine-tune the network;

and the residual network corresponding to the generator pre-trained when the DCGAN generates images is used as the CNN algorithm for feature extraction, and the image features are obtained by global max pooling.

4. The multi-label recognition algorithm based on DCGAN and GCN of claim 1, wherein the residual network corresponding to the generator pre-trained when the DCGAN generates images is used as the CNN algorithm for feature extraction, and the image features are obtained by global max pooling.

5. The multi-label recognition algorithm based on DCGAN and GCN of claim 1, wherein all labels in the training set are taken as input, the correlation among the class labels is learned by the GCN, and the probabilities between labels are learned using the initial adjacency matrix.

6. The multi-label recognition algorithm based on DCGAN and GCN of claim 1,

wherein the correlation dependence between the labels is modeled in the form of conditional probabilities, and a correlation coefficient matrix is constructed.

7. The multi-label recognition algorithm based on DCGAN and GCN of claim 1, wherein, after the features are extracted by the CNN algorithm, they are multiplied (dot product) by the output matrix obtained from GCN training to obtain the vector used for classification, and multi-label classification is performed using a cross-entropy loss function.

8. A multi-label recognition system based on DCGAN and GCN, implemented on a server, characterized in that the server comprises:

a feature extraction module configured to generate similar images based on a DCGAN model, extract features with a transfer-based CNN algorithm, transfer the parameters of the neural network of the DCGAN model into the CNN algorithm to extract features of the multi-label images, and generate a class label classifier with a GCN algorithm from a relation graph among the training labels;

and an image recognition module configured to classify and recognize the multi-label images with the class label classifier generated by the GCN algorithm, multiply (dot product) the features extracted by the CNN algorithm by the semantic feature vector matrix in the class classifier generated by the GCN algorithm, and recognize the images with the multi-label classifier.

9. A computer-readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute the multi-label recognition algorithm based on DCGAN and GCN according to any one of claims 1-7.

10. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions, and the computer-readable storage medium storing a plurality of instructions adapted to be loaded by the processor and to execute the multi-label recognition algorithm based on DCGAN and GCN according to any one of claims 1-7.