CN112685590B - Image retrieval method based on convolutional neural network regularization processing - Google Patents

Info

Publication number
CN112685590B
Authority
CN
China
Prior art keywords
neural network
convolutional neural
image
network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011597827.7A
Other languages
Chinese (zh)
Other versions
CN112685590A (en)
Inventor
李宏亮 (Li Hongliang)
戚耀 (Qi Yao)
钟子涵 (Zhong Zihan)
李泊琦 (Li Boqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202011597827.7A priority Critical patent/CN112685590B/en
Publication of CN112685590A publication Critical patent/CN112685590A/en
Application granted granted Critical
Publication of CN112685590B publication Critical patent/CN112685590B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides an image retrieval method based on convolutional neural network regularization. It uses a structure-expansion-based regularization of the convolutional neural network: the neural network structure is represented as a directed acyclic graph, a series of expansions is applied to that graph, the neural network corresponding to the expanded graph is trained, and finally the layers outside the original structure are deleted from the trained network. Compared with the prior art, the method improves the performance of the neural network without increasing inference cost, and has application prospects across the major directions of computer vision.

Description

Image retrieval method based on convolutional neural network regularization processing
Technical Field
The invention relates to an image retrieval technology, in particular to a convolutional neural network regularization technology.
Background
Content-based image retrieval has broad application prospects in many industrial fields. It uses a computer to analyze images, builds feature-vector descriptions of them, and stores these descriptions in an image feature library. When a user submits a query image, the same feature extraction method is applied to it to obtain a query vector; the similarity between the query vector and each feature in the library is then computed under some similarity measurement criterion; finally, the corresponding images are sorted by similarity and output in order.
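At query time, the pipeline described above reduces to a nearest-neighbour search over the feature library. A minimal pure-Python sketch, assuming cosine similarity as the measurement criterion (the image names and feature vectors are made-up illustrations, not data from the patent):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, feature_library, top_k=3):
    """Rank library entries by similarity to the query vector."""
    scored = [(name, cosine_similarity(query_vec, vec))
              for name, vec in feature_library.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:top_k]

# Made-up 3-dimensional "features"; real ones would come from a CNN hidden layer.
library = {"img_a": [1.0, 0.0, 0.2],
           "img_b": [0.9, 0.1, 0.3],
           "img_c": [0.0, 1.0, 0.8]}
print(retrieve([1.0, 0.05, 0.25], library, top_k=2))
```

In a real deployment the library vectors would be the hidden-layer features extracted offline for every image in the database.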
Content-based image retrieval hands the representation of image content and the similarity measurement over to the computer for automatic processing, greatly improving retrieval efficiency and opening a new door to retrieval over massive image libraries. It has a disadvantage, however: a semantic gap exists between the feature description and the high-level semantics, and it is difficult to bridge. Convolutional neural networks were therefore applied to image retrieval to address the semantic gap, using the features extracted by the network as the image features for retrieval. Owing to the limitations of the optimization algorithm, convolutional neural networks often suffer from a degree of overfitting. This affects the extraction of image features and ultimately the accuracy of image retrieval.
For the overfitting problem, besides Dropout, Batch Normalization and the like, there are also methods that change the network structure to achieve regularization. The paper "Going Deeper with Convolutions", published at the CVPR conference in 2015, proposed using auxiliary classifiers to propagate gradients to the shallow layers, which also has a regularization effect. In addition, the paper "FractalNet: Ultra-Deep Neural Networks without Residuals", published at the ICLR conference in 2017, achieves regularization by randomly deleting parts of the network structure during training.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method that improves the structure of the convolutional neural network to alleviate the overfitting problem without increasing inference cost, thereby improving image retrieval accuracy.
The invention adopts the technical scheme that the image retrieval method based on the convolutional neural network regularization processing comprises the following steps:
1) Training a neural network:
1-1) representing the convolutional neural network structure as a directed acyclic bipartite graph; each input tensor and each output tensor of each sub-network in the convolutional neural network forms a node in the set Y of the bipartite graph, each sub-network forms a node in the set X of the bipartite graph, and the nodes of the bipartite graph are connected in order from the input layer to the output layer; a sub-network is a single operation unit, or a set of connected operation units, in the neural network;
1-2) setting the number M of selected nodes and the number of expansions N_m of the graph structure corresponding to each selected node; selecting M nodes from the hidden layers, where a hidden layer is the output tensor of a sub-network;
1-3) in order from nearest the output layer to nearest the input layer, determining a selected node that has not yet been expanded, and performing N_m expansions on the graph structure behind it; judging whether any selected node remains unexpanded; if so, returning to step 1-3); otherwise, generating the structure-expanded convolutional neural network corresponding to the expanded graph structure, and then proceeding to step 1-4);
1-4) inputting images in a training set into a convolutional neural network based on structure expansion, deleting network structures of all expansion parts and weights obtained by corresponding training after training is finished, and only keeping the original convolutional neural network structure and the weights obtained by corresponding training as the trained convolutional neural network;
2) An image retrieval step:
2-1) inputting the image to be retrieved into the trained convolutional neural network, and obtaining the features of the image to be retrieved from a hidden layer in the original convolutional neural network structure;
2-2) using the features of the image to be retrieved as the query vector, searching the image features of the image library, and outputting the image corresponding to the feature with the highest similarity to the query vector as the retrieval result.
Convolutional neural networks are typically optimized with a gradient descent algorithm. Under gradient descent, the gradient of any feature map is related to all the weights of the layers passed through on the way from that feature map to the loss layer. Therefore, attaching an extra network structure and loss behind a given feature map yields richer gradients and thereby achieves regularization. On this basis, the invention provides a structure-expansion-based convolutional neural network regularization method that improves image retrieval accuracy.
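The gradient argument can be illustrated in a toy scalar setting: attaching an extra loss behind a feature adds a term to that feature's gradient. A sketch under deliberately simplified assumptions (a one-dimensional "feature", made-up quadratic losses, finite-difference gradients):

```python
def main_loss(f):
    """Original head: L1 = (f - 1)^2 (made-up quadratic for illustration)."""
    return (f - 1.0) ** 2

def aux_loss(f):
    """Hypothetical extra head added by structure expansion: L2 = (f - 2)^2."""
    return (f - 2.0) ** 2

def grad(fn, f, eps=1e-6):
    """Central finite-difference derivative of fn at f."""
    return (fn(f + eps) - fn(f - eps)) / (2 * eps)

f = 0.5
g_main = grad(main_loss, f)                              # gradient from the original loss only
g_total = grad(lambda x: main_loss(x) + aux_loss(x), f)  # with the extra head attached
# The feature's gradient is the sum of the gradients flowing back from
# every loss reachable from it; the extra head enriches that gradient.
print(g_main, g_total)
```

The same additivity holds per-weight in a real network, which is why extra loss branches influence training while costing nothing at inference once removed.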
The method has the advantages that, by using a more flexible and more general regularization scheme, the robustness of the features extracted by the convolutional neural network can be improved. It can be applied to convolutional neural networks for image retrieval, as well as to networks for more basic tasks such as image classification and for more complex ones such as object detection, improving network performance without increasing inference cost and improving the accuracy of image processing.
Drawings
Fig. 1 is a simple residual block diagram.
Fig. 2 is a diagram of a simple residual network architecture.
Fig. 3 is a directed acyclic graph representation of the residual network shown in fig. 2.
Fig. 4 is a directed acyclic graph after a network structure is expanded once.
Fig. 5 is a directed acyclic graph after two network structure expansions.
Detailed Description
First, a regularization processing method of the present invention is explained, which includes the following steps:
the method comprises the following steps: representing the neural network structure in the form of a directed acyclic graph;
step two: selecting a certain node in the graph, and expanding a partial structure of the graph for a plurality of times;
step three: repeating the step two for a plurality of times;
step four: and training the neural network corresponding to the graph obtained in the third step.
The first step is as follows:
(1) Given a convolutional neural network, any operation that takes zero or more tensors as input and outputs zero or more tensors is defined as a layer;
(2) According to actual requirements, several connected layers that together complete an independent function are regarded as a sub-network;
(3) The following two types of nodes are defined in the graph:
i. each sub-network n_i corresponds uniquely to a node v_i; these nodes form the set X;
ii. each input or output tensor t_i of a sub-network corresponds uniquely to a node u_i; these nodes form the set Y;
iii. all nodes together form the node set V = X ∪ Y;
(4) Directed edges are added to the graph according to the following rules:
i. if t_i is the output of n_j, add a directed edge (v_j, u_i);
ii. if t_i is an input of n_j, add a directed edge (u_i, v_j);
iii. all the edges form the edge set E;
(5) The node set V and the edge set E form a directed graph D(V, E);
(6) In general, the graph D has the following properties:
i. D is directed and acyclic;
ii. D is a bipartite graph, and (X, Y) is a division of it;
iii. nodes with out-degree greater than zero and in-degree equal to zero correspond to the input layers of the network, each representing a tensor;
iv. nodes with out-degree equal to zero and in-degree greater than zero correspond to the losses of the network;
v. nodes with out-degree greater than zero and in-degree greater than zero correspond to the sub-networks of the network, each representing one or more operations;
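As a concrete check, properties i to v can be verified on the chain-structured DAG that the embodiment later shows in Fig. 3 (the node names u1 to u6 and v1 to v5 follow the embodiment; the explicit edge list is this sketch's own encoding):

```python
# Set Y (tensors) and set X (sub-networks) for the network of Fig. 2,
# in its Fig. 3 form; the edge list encodes input -> conv -> res1 -> res2 -> FC -> loss.
tensor_nodes = ["u1", "u2", "u3", "u4", "u5", "u6"]   # set Y
subnet_nodes = ["v1", "v2", "v3", "v4", "v5"]          # set X
edges = [("u1", "v1"), ("v1", "u2"), ("u2", "v2"), ("v2", "u3"),
         ("u3", "v3"), ("v3", "u4"), ("u4", "v4"), ("v4", "u5"),
         ("u5", "v5"), ("v5", "u6")]

indeg = {n: 0 for n in tensor_nodes + subnet_nodes}
outdeg = {n: 0 for n in tensor_nodes + subnet_nodes}
for a, b in edges:
    outdeg[a] += 1
    indeg[b] += 1

# property iii: out-degree > 0 and in-degree 0 -> the network's input tensor
inputs = [n for n in tensor_nodes if indeg[n] == 0 and outdeg[n] > 0]
# property iv: out-degree 0 and in-degree > 0 -> the network's loss value
losses = [n for n in tensor_nodes if outdeg[n] == 0 and indeg[n] > 0]
# property ii: every edge crosses between X and Y, so (X, Y) is a division
bipartite = all((a in subnet_nodes) != (b in subnet_nodes) for a, b in edges)

print(inputs, losses, bipartite)   # ['u1'] ['u6'] True
```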
the expanding method of the second step is as follows:
(1) Determine the number M of selected nodes and the expansion counts N_m ∈ {N_1, …, N_M}, m = 1, …, M; select M tensor nodes u_i from the set Y; the set of all (or some) of the loss-layer nodes reachable from u_i is denoted V_L;
(2) Add to X |V_L| new loss-layer nodes {v_j' : v_j ∈ V_L}, one copy for each node in V_L; add to Y |V_L| new nodes {u_j' : v_j ∈ V_L} representing the newly added loss values;
(3) Add directed edges: each newly added loss-layer node v_j' receives directed edges from the same tensor nodes that feed its original counterpart v_j, and a directed edge (v_j', u_j') is added to its newly added loss-value node;
The whole graph is still a bipartite graph, with (X, Y) as its division;
(4) Add to X and Y, N_m times, the nodes and directed edges of sub-steps (2) and (3);
(5) Repeating the steps (2), (3) and (4) M times until the expansion of the M selected nodes is completed.
When the structure is expanded, the expanded structure can be the same as the original structure or different from the original structure.
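The expansion of step two can be sketched on the same chain DAG. The suffix-cloning helper and the node-naming scheme below are this sketch's own assumptions; with M = 2, N_1 = 2 expansions at u_4 and N_2 = 1 at u_3, it reproduces the six loss branches of the embodiment's example:

```python
def reachable(edges, start):
    """Return the set of nodes strictly downstream of `start` in the DAG."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    seen, stack = set(), [start]
    while stack:
        for nxt in adj.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def expand(edges, selected, times, counter):
    """Clone the sub-graph behind the `selected` tensor node `times` times.

    Each clone copies the structure as it existed when this node's expansion
    began, so two expansions give two independent copies (as in Fig. 4).
    """
    suffix = reachable(edges, selected)          # frozen before cloning
    for _ in range(times):
        counter[0] += 1
        tag = "_%d" % counter[0]
        new = set()
        for a, b in edges:
            if a in suffix and b in suffix:
                new.add((a + tag, b + tag))      # internal edge of the copy
            elif a == selected and b in suffix:
                new.add((a, b + tag))            # attach the copy to `selected`
        edges = edges | new
    return edges

# Chain DAG of Fig. 3: u1 -> v1 -> u2 -> ... -> v5 -> u6
chain = ["u1", "v1", "u2", "v2", "u3", "v3", "u4", "v4", "u5", "v5", "u6"]
edges = {(chain[i], chain[i + 1]) for i in range(len(chain) - 1)}

counter = [0]
edges = expand(edges, "u4", 2, counter)   # N_1 = 2 expansions at u_4 (Fig. 4)
edges = expand(edges, "u3", 1, counter)   # N_2 = 1 expansion at u_3 (Fig. 5)

nodes = {n for e in edges for n in e}
loss_tensors = sorted(n for n in nodes if n.startswith("u6"))
print(loss_tensors)   # 6 loss branches, matching Table 3
```

Note how the second round of expansion at u_3 clones the already-expanded suffix, which is why one expansion there adds three new loss branches at once.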
The invention places no special requirements on the specific structure of the convolutional neural network. The convolutional neural network comprises an input layer, hidden layers and an output layer; the hidden layers are produced by different types of sub-networks, including convolutional layers, activation layers, pooling layers, fully-connected layers, or structurally more complex residual blocks, etc. Convolutional, activation, pooling, and fully-connected layers are each a single operation unit, while a residual block is a set of connected operation units.
Fig. 1 is a diagram of a simple residual block, in which the convolution Conv, the activation function ReLU, the addition Add, etc. are all "layers" by the above definition, and each hexagon represents a tensor. The input tensor of the residual block is fed simultaneously into the Conv layer and the Add layer; the output tensor of the Conv layer passes through the ReLU layer and is also fed into the Add layer; the output of the Add layer is the output tensor of the residual block.
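The data flow of Fig. 1 can be mimicked numerically. In the sketch below, a length-3 moving average stands in for the real convolution (the kernel and the zero padding are arbitrary choices for illustration):

```python
def conv(x):
    """Stand-in 'convolution': length-3 moving average with zero padding."""
    padded = [0.0] + list(x) + [0.0]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
            for i in range(len(x))]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x):
    # The input feeds both the Conv branch and the Add (skip) branch,
    # exactly as the hexagonal input tensor does in Fig. 1.
    return [a + b for a, b in zip(relu(conv(x)), x)]

print(residual_block([1.0, -2.0, 3.0]))
```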
Fig. 2 shows a convolutional neural network in which the activation (ReLU) and batch-normalization (BN) layers are omitted. The network body comprises 4 sub-networks n_1 to n_4: a convolutional layer, residual block 1, residual block 2, and a fully-connected layer FC. The structure of residual blocks 1 and 2 is essentially the same as in Fig. 1 (each has one more convolutional layer, and residual block 2 has an additional convolutional layer for spatial down-sampling). Sub-network n_5 computes the loss.
The input tensor passes through the convolutional layer and the 2 residual blocks, then through the fully-connected layer and the loss layer to the output. The residual network structure is thus conventionally divided into 5 sub-networks n_1 to n_5: the convolutional layer, residual block 1, residual block 2, the fully-connected layer, and the loss layer; forward propagation proceeds through n_1 to n_5 in order, as shown in Table 1:
Convolutional layer Conv: sub-network n_1
Residual block 1: sub-network n_2
Residual block 2: sub-network n_3
Fully-connected layer FC: sub-network n_4
Loss layer Loss: sub-network n_5
TABLE 1
In the training process of the convolutional neural network, the regularization processing mode is as follows:
1-1) representing the convolutional neural network structure as a directed acyclic bipartite graph, as shown in Fig. 3: the circular nodes u_1 to u_6 represent tensors and form the set Y; the square nodes v_1 to v_5 represent the sub-networks n_1 to n_5 and form the set X;
1-2) setting the number of selected nodes M = 2 and the expansion counts N_m ∈ {N_1 = 2, N_2 = 1}; N_m denotes the number of expansions starting from the m-th selected node, m = 1, …, M; selecting the 2 nodes u_4 and u_3 from the hidden-layer nodes of the set Y;
1-3) in order from nearest the output layer to nearest the input layer, determining a selected node not yet expanded: u_4; the structure from u_4 to its reachable loss node (u_6) is expanded N_1 = 2 times, giving two newly added loss nodes u_6' and u_6''; Fig. 4 is the directed acyclic graph after this expansion, in which the dotted nodes are the newly added nodes, corresponding to the network structure shown in Table 2;
[Table 2: the fully-connected layer FC and loss layer Loss, together with the newly added fully-connected layers FC_1, FC_2 and newly added loss layers Loss_1, Loss_2]
TABLE 2
The output tensor of residual block 2 serves simultaneously as the input of the fully-connected layer FC and of the newly added FC_1 and FC_2; the outputs of FC, FC_1 and FC_2 serve respectively as the inputs of the loss layer and of the newly added loss layers Loss_1 and Loss_2;
Because the selected node u_3 has not yet been expanded, the procedure continues with the expansion of u_3;
1-4) again in order from nearest the output layer to nearest the input layer, determining the selected node not yet expanded: u_3; the network structure from u_3 to its reachable loss nodes (u_6, u_6' and u_6'') is expanded N_2 = 1 time; Fig. 5 is the directed acyclic graph after this expansion, in which the dotted nodes are the newly added nodes, corresponding to the network structure shown in Table 3;
[Table 3: residual block 2 and the newly added residual block 2_1, the fully-connected layer FC and the newly added FC_1 to FC_5, and the loss layer together with the newly added loss layers Loss_1 to Loss_5]
TABLE 3
The output of residual block 1 serves simultaneously as the input of residual block 2 and of residual block 2_1; the output of residual block 2 serves simultaneously as the input of FC, FC_1 and FC_2; the output of residual block 2_1 serves simultaneously as the input of FC_3, FC_4 and FC_5; the outputs of FC and FC_1 to FC_5 serve respectively as the inputs of the loss layer and of loss layers Loss_1 to Loss_5;
Since no unexpanded selected node remains, the current expanded graph structure is taken as the structure-expanded convolutional neural network, and the regularization is complete.
In the training process, images of a training set are input to the convolutional neural network based on structure expansion, after training is completed, the network structures of all expansion parts and the weights obtained by corresponding training are deleted, and only the original convolutional neural network structure and the weights obtained by corresponding training are reserved as the trained convolutional neural network.
Then, the image to be retrieved is input into the trained convolutional neural network, and the features of the image to be retrieved are obtained from a hidden layer of the trained network; using these features as the query vector, the image features of the image library are searched, and the image corresponding to the feature with the highest similarity to the query vector is output as the retrieval result.
The trained convolutional neural network can also be directly used for image classification to directly obtain image classification results, and can also be used as an image feature extraction module in image processing systems for image retrieval, target detection and the like, namely, image features are obtained from the output of a hidden layer of the trained convolutional neural network and are used for subsequent processing steps.
The network of Fig. 2 (Table 1) was trained on the standard MNIST dataset, reaching a final classification accuracy of 99.27%. Using the regularization method of this embodiment, the network structure was expanded as in Fig. 5 (Table 3) and trained with the same hyper-parameters; after the newly added structures and weights were deleted, the accuracy improved to 99.48%. The overfitting phenomenon is thus alleviated without any increase in inference cost, and network performance is improved.

Claims (1)

1. An image retrieval method based on convolutional neural network regularization processing is characterized by comprising the following steps:
1) Training a neural network:
1-1) representing the convolutional neural network structure as a directed acyclic bipartite graph; each input tensor and each output tensor of each sub-network in the convolutional neural network form each node in a set Y of the bipartite graph, each sub-network forms each node in a set X of the bipartite graph, and the nodes in the bipartite graph are connected according to the sequence from an input layer to an output layer; the sub-network is a set of each operation unit or connected operation units in the neural network;
1-2) setting the number M of selected nodes and the number of expansions N_m of the graph structure corresponding to each node; selecting M selected nodes from the set Y;
1-3) determining a selected node which has not been expanded, in order from nearest the output layer to nearest the input layer, and performing N_m expansions on the graph structure behind the selected node; judging whether a selected node which has not been expanded exists; if so, returning to step 1-3); otherwise, generating the structure-expanded convolutional neural network corresponding to the expanded graph structure, and then proceeding to step 1-4);
1-4) inputting the images in the training set into a convolutional neural network based on structure expansion, deleting the network structures of all the expansion parts and the weights obtained by corresponding training after the training is finished, and only keeping the original convolutional neural network structure and the weights obtained by corresponding training as the trained convolutional neural network;
2) An image retrieval step:
2-1) inputting an image to be retrieved into a trained convolutional neural network, and obtaining the characteristics of the image to be retrieved from a hidden layer in an original convolutional neural network structure;
and 2-2) using the features of the image to be retrieved as the query vector, searching the image features of the image library, and outputting the image corresponding to the image feature with the highest similarity to the query vector as the retrieval result.
CN202011597827.7A 2020-12-29 2020-12-29 Image retrieval method based on convolutional neural network regularization processing Active CN112685590B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011597827.7A CN112685590B (en) 2020-12-29 2020-12-29 Image retrieval method based on convolutional neural network regularization processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011597827.7A CN112685590B (en) 2020-12-29 2020-12-29 Image retrieval method based on convolutional neural network regularization processing

Publications (2)

Publication Number Publication Date
CN112685590A CN112685590A (en) 2021-04-20
CN112685590B true CN112685590B (en) 2022-10-14

Family

ID=75454192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011597827.7A Active CN112685590B (en) 2020-12-29 2020-12-29 Image retrieval method based on convolutional neural network regularization processing

Country Status (1)

Country Link
CN (1) CN112685590B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818314A (en) * 2017-11-22 2018-03-20 北京达佳互联信息技术有限公司 Face image processing method, device and server
CN108446307A (en) * 2018-02-05 2018-08-24 中国科学院信息工程研究所 A kind of the binary set generation method and image, semantic similarity search method of multi-tag image

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9633306B2 (en) * 2015-05-07 2017-04-25 Siemens Healthcare Gmbh Method and system for approximating deep neural networks for anatomical object detection
CN104933428B (en) * 2015-07-23 2018-05-01 苏州大学 A kind of face identification method and device based on tensor description
CN105760872B (en) * 2016-02-03 2019-06-11 苏州大学 A kind of recognition methods and system based on robust image feature extraction
WO2017151757A1 (en) * 2016-03-01 2017-09-08 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Recurrent neural feedback model for automated image annotation
CN110287985B (en) * 2019-05-15 2023-04-18 江苏大学 Depth neural network image identification method based on variable topology structure with variation particle swarm optimization
CN110634170B (en) * 2019-08-30 2022-09-13 福建帝视信息科技有限公司 Photo-level image generation method based on semantic content and rapid image retrieval
CN111951263A (en) * 2020-08-26 2020-11-17 桂林电子科技大学 Mechanical part drawing retrieval method based on convolutional neural network
CN112036512B (en) * 2020-11-03 2021-03-26 浙江大学 Image classification neural network architecture searching method and device based on network clipping


Also Published As

Publication number Publication date
CN112685590A (en) 2021-04-20

Similar Documents

Publication Publication Date Title
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
JP5282658B2 (en) Image learning, automatic annotation, search method and apparatus
CN110188228B (en) Cross-modal retrieval method based on sketch retrieval three-dimensional model
CN111462282A (en) Scene graph generation method
US11804036B2 (en) Person re-identification method based on perspective-guided multi-adversarial attention
CN114398491A (en) Semantic segmentation image entity relation reasoning method based on knowledge graph
CN110110116B (en) Trademark image retrieval method integrating deep convolutional network and semantic analysis
CN109472282B (en) Depth image hashing method based on few training samples
CN113255895A (en) Graph neural network representation learning-based structure graph alignment method and multi-graph joint data mining method
CN104008177B (en) Rule base structure optimization and generation method and system towards linguistic indexing of pictures
CN111597943B (en) Table structure identification method based on graph neural network
CN113052254A (en) Multi-attention ghost residual fusion classification model and classification method thereof
CN113255892A (en) Method and device for searching decoupled network structure and readable storage medium
CN115248876A (en) Remote sensing image overall planning recommendation method based on content understanding
CN114461890A (en) Hierarchical multi-modal intellectual property search engine method and system
CN115858919A (en) Learning resource recommendation method and system based on project field knowledge and user comments
CN108470251B (en) Community division quality evaluation method and system based on average mutual information
CN112685590B (en) Image retrieval method based on convolutional neural network regularization processing
CN116452939A (en) Social media false information detection method based on multi-modal entity fusion and alignment
CN115860119A (en) Low-sample knowledge graph completion method and system based on dynamic meta-learning
Janković Babić A comparison of methods for image classification of cultural heritage using transfer learning for feature extraction
CN111460324B (en) Citation recommendation method and system based on link analysis
CN113869461A (en) Author migration and classification method for scientific cooperation heterogeneous network
CN113032612A (en) Construction method of multi-target image retrieval model, retrieval method and device
Fofana et al. Optimal Flame Detection of Fires in Videos Based on Deep Learning and the Use of Various Optimizers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant