CN112685590B - Image retrieval method based on convolutional neural network regularization processing - Google Patents
- Publication number: CN112685590B
- Application number: CN202011597827.7A
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention provides an image retrieval method based on convolutional neural network regularization. Using a regularization scheme based on structure expansion, the neural network structure is represented in the form of a directed acyclic graph, a series of expansions is applied to the structure of that graph, the neural network corresponding to the expanded graph is trained, and finally the layers outside the original structure are deleted from the trained network. Compared with the prior art, the method improves the performance of the neural network without increasing inference cost, and has application prospects across the major directions of computer vision.
Description
Technical Field
The invention relates to image retrieval technology, and in particular to a convolutional neural network regularization technique.
Background
Content-based image retrieval has broad application prospects in many industrial fields. It uses a computer to analyze each image, builds a feature-vector description of it, and stores that description in an image feature library. When a user submits a query image, the same feature extraction method is applied to the query image to obtain a query vector; the similarity between the query vector and each feature in the library is then computed under some similarity measurement criterion; finally, the corresponding images are sorted by similarity and output in order.
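The retrieval loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes the features are fixed-length vectors already extracted into a library, and it uses cosine similarity as the similarity measurement criterion (the patent does not fix a particular criterion); the image identifiers are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Similarity measurement criterion: cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vector, feature_library):
    """Rank images in the feature library by similarity to the query vector.

    feature_library: dict mapping image id -> feature vector, all extracted
    with the same feature extraction method as the query.
    """
    scored = [(image_id, cosine_similarity(query_vector, vec))
              for image_id, vec in feature_library.items()]
    # Sort by similarity, highest first, and output in order.
    return sorted(scored, key=lambda item: item[1], reverse=True)

library = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
ranking = retrieve([1.0, 0.1], library)
```

In a full system the top-ranked identifier (here `"a"`) would be mapped back to its stored picture and returned as the retrieval result.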
Content-based image retrieval hands the representation of image content and the similarity measurement over to the computer for automatic processing, which greatly improves retrieval efficiency and opens a new door to searching massive image libraries. It has a drawback, however: a semantic gap exists between the feature description and the high-level semantics, and this gap is difficult to fill and cannot be fully eliminated. Convolutional neural networks were therefore applied to image retrieval to address the semantic gap, with the features extracted by the network used as the image features for retrieval. Yet, owing to the limitations of the optimization algorithm, a convolutional neural network often suffers from some degree of overfitting. This affects the extraction of image features and ultimately the accuracy of image retrieval.
For the overfitting problem, besides Dropout, Batch Normalization, and the like, there are methods that change the network structure to achieve regularization. The paper "Going Deeper with Convolutions", published at the CVPR conference in 2015, proposed using auxiliary classifiers to propagate gradients to the shallow layers, with an attendant regularization effect. In addition, the paper "FractalNet: Ultra-Deep Neural Networks without Residuals", published at the ICLR conference in 2017, achieves regularization by randomly deleting parts of the network structure during training.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method that improves the structure of a convolutional neural network to mitigate overfitting without increasing inference cost, thereby improving image retrieval accuracy.
The technical scheme adopted by the invention is an image retrieval method based on convolutional neural network regularization processing, which comprises the following steps:
1) Training a neural network:
1-1) Representing the convolutional neural network structure as a directed acyclic bipartite graph; each input tensor and each output tensor of each sub-network in the convolutional neural network forms a node in set Y of the bipartite graph, each sub-network forms a node in set X of the bipartite graph, and the nodes of the bipartite graph are connected in order from the input layer to the output layer; a sub-network is a single operation unit, or a set of connected operation units, in the neural network;
1-2) Setting the number M of selected nodes and the number of expansions N_m of the graph structure corresponding to each node; selecting the M selected nodes from the hidden layers, a hidden layer being the output tensor of a sub-network;
1-3) Determining, in order from nearest the output layer to nearest the input layer, a selected node that has not yet been expanded, and performing N_m expansions on the graph structure behind that selected node; judging whether any selected node has not yet been expanded; if so, returning to step 1-3); otherwise, generating the structure-expanded convolutional neural network corresponding to the expanded graph structure and proceeding to step 1-4);
1-4) Inputting the images of the training set into the structure-expanded convolutional neural network; after training is finished, deleting the network structures of all expanded parts together with their trained weights, and keeping only the original convolutional neural network structure and its trained weights as the trained convolutional neural network;
2) An image retrieval step:
2-1) inputting an image to be retrieved into the trained convolutional neural network, and obtaining the characteristics of the image to be retrieved from a hidden layer in an original convolutional neural network structure;
2-2) Searching the image features of the image library using the features of the image to be retrieved as the query vector, and outputting the picture corresponding to the library feature with the highest similarity to the query vector as the retrieval result.
Convolutional neural networks are typically optimized with a gradient descent algorithm. Under gradient descent, the gradient of any feature map is related to all the weights of the layers on the path from that feature map to the loss layer. Attaching an extra network structure and an extra loss behind a given feature map therefore provides richer gradients and achieves the purpose of regularization. On this basis, the invention proposes a convolutional neural network regularization method based on structure expansion for improving image retrieval accuracy.
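The effect of the extra losses on the training objective can be written as a one-line sketch. The summation below and the `weight` factor are illustrative assumptions; the patent does not specify how the auxiliary losses are combined or weighted.

```python
def total_training_loss(main_loss, auxiliary_losses, weight=1.0):
    """During training, each extra loss layer attached behind a feature map
    contributes an additional gradient term; the overall objective optimized
    by gradient descent is the original loss plus the auxiliary losses.

    `weight` is a hypothetical scaling factor for the auxiliary terms.
    """
    return main_loss + weight * sum(auxiliary_losses)
```

After training, the auxiliary branches (and hence their loss terms) are deleted, so only the original loss path remains at inference time.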
The method provides a more flexible and more general regularization scheme that improves the robustness of the features extracted by a convolutional neural network. It can be applied not only to convolutional neural networks for image retrieval, but also to more basic image classification, to target detection, and to other, more complex convolutional neural networks, improving network performance and image processing accuracy without increasing inference cost.
Drawings
Fig. 1 is a structure diagram of a simple residual block.
Fig. 2 is a diagram of a simple residual network architecture.
Fig. 3 is a directed acyclic graph representation of the residual network shown in fig. 2.
Fig. 4 is a directed acyclic graph after a network structure is expanded once.
Fig. 5 is a directed acyclic graph after two network structure expansions.
Detailed Description
First, a regularization processing method of the present invention is explained, which includes the following steps:
the method comprises the following steps: representing the neural network structure in the form of a directed acyclic graph;
step two: selecting a certain node in the graph, and expanding part of the graph structure several times;
step three: repeating the step two for a plurality of times;
step four: and training the neural network corresponding to the graph obtained in the third step.
The first step is as follows:
(1) Given a convolutional neural network, any operation that takes zero or more tensors as input and outputs zero or more tensors is defined as a layer;
(2) According to actual requirements, several connected layers that together complete an independent function are regarded as a sub-network;
(3) The following two types of nodes are defined in the graph:
i. each sub-network n_i uniquely corresponds to a node v_i; these nodes form the set X;
ii. each input or output tensor t_i of each sub-network uniquely corresponds to a node u_i; these nodes form the set Y;
All nodes form the point set V = X ∪ Y;
(4) Directed edges are added to the graph according to the following rules:
i. if t_i is an output of n_j, add a directed edge (v_j, u_i);
ii. if t_i is an input of n_j, add a directed edge (u_i, v_j);
All edges form the edge set E;
(5) The point set V and the edge set E form a directed graph D(V, E);
(6) Generally, the graph D has the following properties:
i. D is directed and acyclic;
ii. D is a bipartite graph, and (X, Y) is a bipartition of it;
iii. a node with out-degree greater than zero and in-degree zero corresponds to an input layer of the network and represents a tensor;
iv. a node with out-degree zero and in-degree greater than zero corresponds to a loss of the network;
v. a node with out-degree greater than zero and in-degree greater than zero corresponds to a sub-network of the network and represents a number of operations;
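As a sketch of step one, the construction of D(V, E) from a list of sub-networks can be written as follows. This is an illustration under stated assumptions, not the patent's implementation: the dictionary-based network description and the node names are hypothetical.

```python
def build_patent_graph(subnetworks):
    """Build the directed acyclic bipartite graph D(V, E) of step one.

    subnetworks: dict mapping sub-network name n_i -> (input tensor names,
    output tensor names).  Sub-network nodes form set X, tensor nodes form
    set Y, and edges follow rules i and ii above.
    """
    X = set(subnetworks)
    Y = set()
    E = set()
    for n, (inputs, outputs) in subnetworks.items():
        for t in inputs:
            Y.add(t)
            E.add((t, n))   # rule ii: t_i is an input of n_j -> edge (u_i, v_j)
        for t in outputs:
            Y.add(t)
            E.add((n, t))   # rule i: t_i is an output of n_j -> edge (v_j, u_i)
    return X, Y, E

# The residual network of Figs. 2 and 3: five sub-networks n1..n5 chained
# through tensors u1..u6 (names are illustrative).
net = {
    "n1": (["u1"], ["u2"]),  # convolutional layer
    "n2": (["u2"], ["u3"]),  # residual block 1
    "n3": (["u3"], ["u4"]),  # residual block 2
    "n4": (["u4"], ["u5"]),  # fully connected layer
    "n5": (["u5"], ["u6"]),  # loss layer
}
X, Y, E = build_patent_graph(net)
# Every edge joins one tensor node and one sub-network node, so (X, Y) is a bipartition.
assert all((a in X) != (a in Y) and (b in X) != (b in Y) for a, b in E)
```

The bipartition check at the end verifies property ii for this example; properties iii to v can be read off the in- and out-degrees of the resulting edge set.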
the expanding method of the second step is as follows:
(1) Determine the number M of selected nodes and the numbers of expansions N_m ∈ {N_1, …, N_m, …, N_M}, m = 1, …, M; select M tensor nodes u_i from the set Y; the set of all, or part, of the loss-layer nodes reachable from u_i is denoted V_L;
(2) Add |V_L| new loss-layer nodes to X, and add |V_L| new nodes representing the newly added loss values to Y;
(3) Add directed edges such that the whole graph remains a bipartite graph with the sets X and Y as its bipartition;
(4) Add to X and Y, N_m times over, the nodes and directed edges of sub-steps (2) and (3);
(5) Repeat steps (2), (3) and (4) M times, until the expansion of all M selected nodes is completed.
When the structure is expanded, the expanded structure can be the same as the original structure or different from the original structure.
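A minimal sketch of the step-two expansion, under the assumption the note above explicitly allows: each expanded copy has the same structure as the original. The node names, the suffix-renaming scheme, and the chain-shaped example network are illustrative, not from the patent.

```python
def expand(selected, X, Y, E, n_expansions):
    """Duplicate the sub-graph reachable from tensor node `selected`
    n_expansions times, attaching every copy back to the same selected node."""
    # Nodes reachable from `selected` (simple DFS over the directed edges);
    # this covers the path down to the reachable loss nodes.
    reachable = set()
    stack = [selected]
    while stack:
        node = stack.pop()
        for a, b in E:
            if a == node and b not in reachable:
                reachable.add(b)
                stack.append(b)
    X, Y, E = set(X), set(Y), set(E)
    for k in range(1, n_expansions + 1):
        rename = {v: f"{v}_{k}" for v in reachable}
        for v in reachable:
            (X if v in X else Y).add(rename[v])   # keep the bipartition
        for a, b in set(E):                        # snapshot: only original edges match
            if b in reachable:
                # Edges into the copy; an edge leaving `selected` keeps its source,
                # so every copy hangs off the same selected node.
                E.add((rename.get(a, a), rename[b]))
    return X, Y, E

# Chain network of Figs. 2 and 3 (illustrative node names):
X0 = {"n1", "n2", "n3", "n4", "n5"}
Y0 = {"u1", "u2", "u3", "u4", "u5", "u6"}
E0 = {("u1", "n1"), ("n1", "u2"), ("u2", "n2"), ("n2", "u3"),
      ("u3", "n3"), ("n3", "u4"), ("u4", "n4"), ("n4", "u5"),
      ("u5", "n5"), ("n5", "u6")}
# Expand the sub-graph behind the selected node u4 twice (N_1 = 2),
# mirroring the step that produces Fig. 4.
X1, Y1, E1 = expand("u4", X0, Y0, E0, 2)
```

After the call, two copies of the fully-connected-plus-loss branch hang off `u4`, analogous to the dotted nodes of Fig. 4; deleting them after training recovers the original graph.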
The invention has no special requirements on the specific structure of the convolutional neural network. The convolutional neural network comprises an input layer, a hidden layer and an output layer; the hidden layer is produced by different types of sub-networks, including convolutional layers, active layers, pooling layers, fully-connected layers, or more structurally complex residual blocks, etc. The convolution layer, the activation layer, the pooling layer and the full-connection layer are all independent operation units, and the residual block is a set of connected operation units.
Fig. 1 is a simple residual block diagram, in which the convolution Conv, activation function ReLU, add, etc. are all "layers" by definition, and the hexagon represents a tensor. The input tensor of the residual block is simultaneously input to the Conv layer and the Add layer, the output tensor of the Conv layer is also output to the Add layer after passing through the ReLU layer, and the output of the Add layer is the output tensor of the residual block.
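The dataflow of Fig. 1 can be mimicked with plain Python lists. This is a toy sketch: the "Conv" layer is replaced by an elementwise scaling (a hypothetical stand-in, since the point here is the branch-and-add wiring, not the convolution itself).

```python
def relu(x):
    """Activation layer: elementwise max(0, v)."""
    return [max(0.0, v) for v in x]

def conv(x, weight):
    """Stand-in 'Conv' layer: a per-element scaling.  A real convolution
    would slide a kernel over the tensor; `weight` is an illustrative scalar."""
    return [weight * v for v in x]

def residual_block(x, weight=0.5):
    """Fig. 1 dataflow: the input tensor feeds both the Conv layer and the
    Add layer; Conv's output passes through ReLU and is then added to the
    skip connection."""
    branch = relu(conv(x, weight))
    return [a + b for a, b in zip(x, branch)]  # Add layer
```

For example, `residual_block([1.0, -2.0])` scales to `[0.5, -1.0]`, rectifies to `[0.5, 0.0]`, and adds the skip connection to give `[1.5, -2.0]`.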
A convolutional neural network is shown in fig. 2, in which the activation (ReLU) and batch normalization (BN) layers are omitted. The network body comprises four sub-networks n_1 to n_4: a convolutional layer, residual block 1, residual block 2, and a fully connected layer FC. The structures of residual block 1 and residual block 2 are substantially the same as in fig. 1 (each has one more convolutional layer, and residual block 2 has a further convolutional layer for spatial down-sampling). Sub-network n_5 is used to compute the loss.
The input tensor passes through the convolutional layer and the two residual blocks, then through the fully connected layer and the loss layer to the output. By convention the residual network structure is divided into five sub-networks n_1 to n_5: the convolutional layer, residual block 1, residual block 2, the fully connected layer, and the loss layer; forward propagation likewise proceeds from n_1 to n_5, as shown in Table 1:
| Layer | Sub-network |
|---|---|
| Convolutional layer Conv | n_1 |
| Residual block 1 | n_2 |
| Residual block 2 | n_3 |
| Fully connected layer FC | n_4 |
| Loss layer Loss | n_5 |

TABLE 1
In the training process of the convolutional neural network, the regularization processing mode is as follows:
1-1) The convolutional neural network structure is represented as a directed acyclic bipartite graph, as shown in FIG. 3: the circular nodes u_1 to u_6 represent tensors and form the set Y; the square nodes v_1 to v_5 represent the sub-networks n_1 to n_5 and form the set X;
1-2) The number of selected nodes is set to M = 2 and the numbers of expansions of the graph structure to N_m ∈ {N_1 = 2, N_2 = 1}, where N_m is the number of expansions starting from the m-th selected node, m = 1, …, M; the two selected nodes u_4 and u_3 are chosen from the hidden-layer tensors in the set Y;
1-3) In order from nearest the output layer to nearest the input layer, a selected node not yet expanded, u_4, is determined; the structure from the selected node u_4 to its reachable loss node (u_6) is expanded N_1 = 2 times, adding two new loss nodes u_6′ and u_6″; fig. 4 is the directed acyclic graph after this expansion, in which the dotted nodes are the newly added nodes, corresponding to the network structure shown in Table 2;
TABLE 2
The output tensor of residual block 2 serves simultaneously as the input of the fully connected layer, the newly added fully connected layer_1, and the newly added fully connected layer_2; their outputs serve respectively as the inputs of the loss layer, the newly added loss layer_1, and the newly added loss layer_2;
Because there is still a selected node u_3 that has not been expanded, the next step proceeds to expand the selected node u_3;
1-4) In order from nearest the output layer to nearest the input layer, the selected node not yet expanded, u_3, is determined; the network structure from the selected node u_3 to its reachable loss nodes (u_6, u_6′ and u_6″) is expanded N_2 = 1 time; fig. 5 is the directed acyclic graph after this expansion, in which the dotted nodes are the newly added nodes, corresponding to the network structure shown in Table 3;
TABLE 3
The output of residual block 1 serves simultaneously as the input of residual block 2 and residual block 2_1; the output of residual block 2 serves simultaneously as the input of the fully connected layer, fully connected layer_1, and fully connected layer_2; the output of residual block 2_1 serves simultaneously as the input of fully connected layer_3, fully connected layer_4, and fully connected layer_5; the outputs of the fully connected layer and fully connected layers _1 to _5 serve respectively as the inputs of the loss layer and loss layers _1 to _5;
If no selected node remains unexpanded, the current expanded graph structure is used as the structure-expanded convolutional neural network, and the regularization is complete.
In the training process, images of a training set are input to the convolutional neural network based on structure expansion, after training is completed, the network structures of all expansion parts and the weights obtained by corresponding training are deleted, and only the original convolutional neural network structure and the weights obtained by corresponding training are reserved as the trained convolutional neural network.
Then, inputting the image to be retrieved into the trained convolutional neural network, and obtaining the characteristics of the image to be retrieved from a hidden layer in the trained convolutional neural network structure; and searching the image characteristics of the image library by taking the image characteristics to be searched as a query vector, and outputting a picture corresponding to the image characteristics with the highest similarity with the query vector as a search result.
The trained convolutional neural network can also be directly used for image classification to directly obtain image classification results, and can also be used as an image feature extraction module in image processing systems for image retrieval, target detection and the like, namely, image features are obtained from the output of a hidden layer of the trained convolutional neural network and are used for subsequent processing steps.
Training with the network of fig. 2 (Table 1) on the standard data set MNIST gives a final classification accuracy of 99.27%. Using the regularization method of this embodiment, the network structure is expanded as shown in fig. 5 (Table 3) and trained with the same hyper-parameters; after the newly added structures and their weights are deleted, the accuracy rises to 99.48%. The overfitting phenomenon is thus alleviated and network performance improved without any increase in inference cost.
Claims (1)
1. An image retrieval method based on convolutional neural network regularization processing is characterized by comprising the following steps:
1) Training a neural network:
1-1) representing the convolutional neural network structure as a directed acyclic bipartite graph; each input tensor and each output tensor of each sub-network in the convolutional neural network form each node in a set Y of the bipartite graph, each sub-network forms each node in a set X of the bipartite graph, and the nodes in the bipartite graph are connected according to the sequence from an input layer to an output layer; the sub-network is a set of each operation unit or connected operation units in the neural network;
1-2) setting the number M of selected nodes and the number of expansions N_m of the graph structure corresponding to each node; selecting M selected nodes from the set Y;
1-3) determining, in order from nearest the output layer to nearest the input layer, a selected node that has not yet been expanded, and performing N_m expansions on the graph structure behind that selected node; judging whether any selected node has not yet been expanded; if so, returning to step 1-3); otherwise, generating the structure-expanded convolutional neural network corresponding to the expanded graph structure, and then proceeding to step 1-4);
1-4) inputting the images in the training set into a convolutional neural network based on structure expansion, deleting the network structures of all the expansion parts and the weights obtained by corresponding training after the training is finished, and only keeping the original convolutional neural network structure and the weights obtained by corresponding training as the trained convolutional neural network;
2) An image retrieval step:
2-1) inputting an image to be retrieved into a trained convolutional neural network, and obtaining the characteristics of the image to be retrieved from a hidden layer in an original convolutional neural network structure;
2-2) searching the image features of the image library using the features of the image to be retrieved as the query vector, and outputting the image corresponding to the library feature with the highest similarity to the query vector as the retrieval result.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011597827.7A (CN112685590B) | 2020-12-29 | 2020-12-29 | Image retrieval method based on convolutional neural network regularization processing |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN112685590A | 2021-04-20 |
| CN112685590B | 2022-10-14 |
Family ID: 75454192
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818314A (en) * | 2017-11-22 | 2018-03-20 | 北京达佳互联信息技术有限公司 | Face image processing method, device and server |
CN108446307A (en) * | 2018-02-05 | 2018-08-24 | 中国科学院信息工程研究所 | A kind of the binary set generation method and image, semantic similarity search method of multi-tag image |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |