CN112685590B - Image retrieval method based on convolutional neural network regularization processing - Google Patents
- Publication number: CN112685590B
- Application number: CN202011597827.7A
- Authority: CN (China)
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention provides an image retrieval method based on convolutional neural network regularization. Using a regularization scheme based on structure expansion, the neural network structure is represented in the form of a directed acyclic graph, a series of expansions is applied to the structure of that graph, the neural network corresponding to the expanded graph is trained, and finally the layers outside the original structure are deleted from the trained network. Compared with the prior art, the method improves the performance of the neural network without increasing inference cost, and has application prospects across the major directions of computer vision.
Description
Technical Field
The invention relates to image retrieval technology, and in particular to a convolutional neural network regularization technique.
Background
Content-based image retrieval has broad application prospects in many industrial fields. It uses a computer to analyze each image, builds a feature-vector description of it, and stores that description in an image feature library. When a user submits a query image, the same feature extraction method is applied to the query image to obtain a query vector; the similarity between the query vector and each feature in the library is then computed under some similarity measurement criterion; finally, the corresponding images are sorted by similarity and output in order.
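The retrieval loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: it assumes the features are fixed-length vectors already extracted into a library, and it uses cosine similarity as the similarity measurement criterion (the patent does not fix a particular criterion); the image identifiers are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Similarity measurement criterion: cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vector, feature_library):
    """Rank images in the feature library by similarity to the query vector.

    feature_library: dict mapping image id -> feature vector, all extracted
    with the same feature extraction method as the query.
    """
    scored = [(image_id, cosine_similarity(query_vector, vec))
              for image_id, vec in feature_library.items()]
    # Sort by similarity, highest first, and output in order.
    return sorted(scored, key=lambda item: item[1], reverse=True)

library = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
ranking = retrieve([1.0, 0.1], library)
```

In a full system the top-ranked identifier (here `"a"`) would be mapped back to its stored picture and returned as the retrieval result.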
Content-based image retrieval hands the representation of image content and the similarity measurement over to the computer for automatic processing, which greatly improves retrieval efficiency and opens a new door to searching massive image libraries. It has a drawback, however: a semantic gap exists between the feature description and the high-level semantics, and this gap is difficult to fill and cannot be fully eliminated. Convolutional neural networks were therefore applied to image retrieval to address the semantic gap, with the features extracted by the network used as the image features for retrieval. Yet, owing to the limitations of the optimization algorithm, a convolutional neural network often suffers from some degree of overfitting. This affects the extraction of image features and ultimately the accuracy of image retrieval.
For the overfitting problem, besides Dropout, Batch Normalization, and the like, there are methods that change the network structure to achieve regularization. The paper "Going Deeper with Convolutions", published at the CVPR conference in 2015, proposed using auxiliary classifiers to propagate gradients to the shallow layers, with an attendant regularization effect. In addition, the paper "FractalNet: Ultra-Deep Neural Networks without Residuals", published at the ICLR conference in 2017, achieves regularization by randomly deleting parts of the network structure during training.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method that improves the structure of a convolutional neural network to mitigate overfitting without increasing inference cost, thereby improving image retrieval accuracy.
The technical scheme adopted by the invention is an image retrieval method based on convolutional neural network regularization processing, which comprises the following steps:
1) Training a neural network:
1-1) Representing the convolutional neural network structure as a directed acyclic bipartite graph; each input tensor and each output tensor of each sub-network in the convolutional neural network forms a node in set Y of the bipartite graph, each sub-network forms a node in set X of the bipartite graph, and the nodes of the bipartite graph are connected in order from the input layer to the output layer; a sub-network is a single operation unit, or a set of connected operation units, in the neural network;
1-2) Setting the number M of selected nodes and the number of expansions N_m of the graph structure corresponding to each node; selecting the M selected nodes from the hidden layers, a hidden layer being the output tensor of a sub-network;
1-3) Determining, in order from nearest the output layer to nearest the input layer, a selected node that has not yet been expanded, and performing N_m expansions on the graph structure behind that selected node; judging whether any selected node has not yet been expanded; if so, returning to step 1-3); otherwise, generating the structure-expanded convolutional neural network corresponding to the expanded graph structure and proceeding to step 1-4);
1-4) Inputting the images of the training set into the structure-expanded convolutional neural network; after training is finished, deleting the network structures of all expanded parts together with their trained weights, and keeping only the original convolutional neural network structure and its trained weights as the trained convolutional neural network;
2) An image retrieval step:
2-1) inputting an image to be retrieved into the trained convolutional neural network, and obtaining the characteristics of the image to be retrieved from a hidden layer in an original convolutional neural network structure;
2-2) Searching the image features of the image library using the features of the image to be retrieved as the query vector, and outputting the picture corresponding to the library feature with the highest similarity to the query vector as the retrieval result.
Convolutional neural networks are typically optimized with a gradient descent algorithm. Under gradient descent, the gradient of any feature map is related to all the weights of the layers on the path from that feature map to the loss layer. Attaching an extra network structure and an extra loss behind a given feature map therefore provides richer gradients and achieves the purpose of regularization. On this basis, the invention proposes a convolutional neural network regularization method based on structure expansion for improving image retrieval accuracy.
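The effect of the extra losses on the training objective can be written as a one-line sketch. The summation below and the `weight` factor are illustrative assumptions; the patent does not specify how the auxiliary losses are combined or weighted.

```python
def total_training_loss(main_loss, auxiliary_losses, weight=1.0):
    """During training, each extra loss layer attached behind a feature map
    contributes an additional gradient term; the overall objective optimized
    by gradient descent is the original loss plus the auxiliary losses.

    `weight` is a hypothetical scaling factor for the auxiliary terms.
    """
    return main_loss + weight * sum(auxiliary_losses)
```

After training, the auxiliary branches (and hence their loss terms) are deleted, so only the original loss path remains at inference time.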
The method provides a more flexible and more general regularization scheme that improves the robustness of the features extracted by a convolutional neural network. It can be applied not only to convolutional neural networks for image retrieval, but also to more basic image classification, to target detection, and to other, more complex convolutional neural networks, improving network performance and image processing accuracy without increasing inference cost.
Drawings
Fig. 1 is a structure diagram of a simple residual block.
Fig. 2 is a diagram of a simple residual network architecture.
Fig. 3 is a directed acyclic graph representation of the residual network shown in fig. 2.
Fig. 4 is a directed acyclic graph after a network structure is expanded once.
Fig. 5 is a directed acyclic graph after two network structure expansions.
Detailed Description
First, a regularization processing method of the present invention is explained, which includes the following steps:
the method comprises the following steps: representing the neural network structure in the form of a directed acyclic graph;
step two: selecting a certain node in the graph, and expanding part of the graph structure several times;
step three: repeating the step two for a plurality of times;
step four: and training the neural network corresponding to the graph obtained in the third step.
The first step is as follows:
(1) Given a convolutional neural network, any operation that takes zero or more tensors as input and outputs zero or more tensors is defined as a layer;
(2) According to actual requirements, several connected layers that together complete an independent function are regarded as a sub-network;
(3) The following two types of nodes are defined in the graph:
i. each sub-network n_i uniquely corresponds to a node v_i; these nodes form the set X;
ii. each input or output tensor t_i of each sub-network uniquely corresponds to a node u_i; these nodes form the set Y;
All nodes form the point set V = X ∪ Y;
(4) Directed edges are added to the graph according to the following rules:
i. if t_i is an output of n_j, add a directed edge (v_j, u_i);
ii. if t_i is an input of n_j, add a directed edge (u_i, v_j);
All edges form the edge set E;
(5) The point set V and the edge set E form a directed graph D(V, E);
(6) Generally, the graph D has the following properties:
i. D is directed and acyclic;
ii. D is a bipartite graph, and (X, Y) is a bipartition of it;
iii. a node with out-degree greater than zero and in-degree zero corresponds to an input layer of the network and represents a tensor;
iv. a node with out-degree zero and in-degree greater than zero corresponds to a loss of the network;
v. a node with out-degree greater than zero and in-degree greater than zero corresponds to a sub-network of the network and represents a number of operations;
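As a sketch of step one, the construction of D(V, E) from a list of sub-networks can be written as follows. This is an illustration under stated assumptions, not the patent's implementation: the dictionary-based network description and the node names are hypothetical.

```python
def build_patent_graph(subnetworks):
    """Build the directed acyclic bipartite graph D(V, E) of step one.

    subnetworks: dict mapping sub-network name n_i -> (input tensor names,
    output tensor names).  Sub-network nodes form set X, tensor nodes form
    set Y, and edges follow rules i and ii above.
    """
    X = set(subnetworks)
    Y = set()
    E = set()
    for n, (inputs, outputs) in subnetworks.items():
        for t in inputs:
            Y.add(t)
            E.add((t, n))   # rule ii: t_i is an input of n_j -> edge (u_i, v_j)
        for t in outputs:
            Y.add(t)
            E.add((n, t))   # rule i: t_i is an output of n_j -> edge (v_j, u_i)
    return X, Y, E

# The residual network of Figs. 2 and 3: five sub-networks n1..n5 chained
# through tensors u1..u6 (names are illustrative).
net = {
    "n1": (["u1"], ["u2"]),  # convolutional layer
    "n2": (["u2"], ["u3"]),  # residual block 1
    "n3": (["u3"], ["u4"]),  # residual block 2
    "n4": (["u4"], ["u5"]),  # fully connected layer
    "n5": (["u5"], ["u6"]),  # loss layer
}
X, Y, E = build_patent_graph(net)
# Every edge joins one tensor node and one sub-network node, so (X, Y) is a bipartition.
assert all((a in X) != (a in Y) and (b in X) != (b in Y) for a, b in E)
```

The bipartition check at the end verifies property ii for this example; properties iii to v can be read off the in- and out-degrees of the resulting edge set.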
the expanding method of the second step is as follows:
(1) Determine the number M of selected nodes and the numbers of expansions N_m ∈ {N_1, …, N_m, …, N_M}, m = 1, …, M; select M tensor nodes u_i from the set Y; the set of all, or part, of the loss-layer nodes reachable from u_i is denoted V_L;
(2) Add |V_L| new loss-layer nodes to X, and add |V_L| new nodes representing the newly added loss values to Y;
(3) Add directed edges such that the whole graph remains a bipartite graph with the sets X and Y as its bipartition;
(4) Add to X and Y, N_m times over, the nodes and directed edges of sub-steps (2) and (3);
(5) Repeat steps (2), (3) and (4) M times, until the expansion of all M selected nodes is completed.
When the structure is expanded, the expanded structure can be the same as the original structure or different from the original structure.
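A minimal sketch of the step-two expansion, under the assumption the note above explicitly allows: each expanded copy has the same structure as the original. The node names, the suffix-renaming scheme, and the chain-shaped example network are illustrative, not from the patent.

```python
def expand(selected, X, Y, E, n_expansions):
    """Duplicate the sub-graph reachable from tensor node `selected`
    n_expansions times, attaching every copy back to the same selected node."""
    # Nodes reachable from `selected` (simple DFS over the directed edges);
    # this covers the path down to the reachable loss nodes.
    reachable = set()
    stack = [selected]
    while stack:
        node = stack.pop()
        for a, b in E:
            if a == node and b not in reachable:
                reachable.add(b)
                stack.append(b)
    X, Y, E = set(X), set(Y), set(E)
    for k in range(1, n_expansions + 1):
        rename = {v: f"{v}_{k}" for v in reachable}
        for v in reachable:
            (X if v in X else Y).add(rename[v])   # keep the bipartition
        for a, b in set(E):                        # snapshot: only original edges match
            if b in reachable:
                # Edges into the copy; an edge leaving `selected` keeps its source,
                # so every copy hangs off the same selected node.
                E.add((rename.get(a, a), rename[b]))
    return X, Y, E

# Chain network of Figs. 2 and 3 (illustrative node names):
X0 = {"n1", "n2", "n3", "n4", "n5"}
Y0 = {"u1", "u2", "u3", "u4", "u5", "u6"}
E0 = {("u1", "n1"), ("n1", "u2"), ("u2", "n2"), ("n2", "u3"),
      ("u3", "n3"), ("n3", "u4"), ("u4", "n4"), ("n4", "u5"),
      ("u5", "n5"), ("n5", "u6")}
# Expand the sub-graph behind the selected node u4 twice (N_1 = 2),
# mirroring the step that produces Fig. 4.
X1, Y1, E1 = expand("u4", X0, Y0, E0, 2)
```

After the call, two copies of the fully-connected-plus-loss branch hang off `u4`, analogous to the dotted nodes of Fig. 4; deleting them after training recovers the original graph.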
The invention has no special requirements on the specific structure of the convolutional neural network. The convolutional neural network comprises an input layer, a hidden layer and an output layer; the hidden layer is produced by different types of sub-networks, including convolutional layers, active layers, pooling layers, fully-connected layers, or more structurally complex residual blocks, etc. The convolution layer, the activation layer, the pooling layer and the full-connection layer are all independent operation units, and the residual block is a set of connected operation units.
Fig. 1 is a simple residual block diagram, in which the convolution Conv, activation function ReLU, add, etc. are all "layers" by definition, and the hexagon represents a tensor. The input tensor of the residual block is simultaneously input to the Conv layer and the Add layer, the output tensor of the Conv layer is also output to the Add layer after passing through the ReLU layer, and the output of the Add layer is the output tensor of the residual block.
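The dataflow of Fig. 1 can be mimicked with plain Python lists. This is a toy sketch: the "Conv" layer is replaced by an elementwise scaling (a hypothetical stand-in, since the point here is the branch-and-add wiring, not the convolution itself).

```python
def relu(x):
    """Activation layer: elementwise max(0, v)."""
    return [max(0.0, v) for v in x]

def conv(x, weight):
    """Stand-in 'Conv' layer: a per-element scaling.  A real convolution
    would slide a kernel over the tensor; `weight` is an illustrative scalar."""
    return [weight * v for v in x]

def residual_block(x, weight=0.5):
    """Fig. 1 dataflow: the input tensor feeds both the Conv layer and the
    Add layer; Conv's output passes through ReLU and is then added to the
    skip connection."""
    branch = relu(conv(x, weight))
    return [a + b for a, b in zip(x, branch)]  # Add layer
```

For example, `residual_block([1.0, -2.0])` scales to `[0.5, -1.0]`, rectifies to `[0.5, 0.0]`, and adds the skip connection to give `[1.5, -2.0]`.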
A convolutional neural network is shown in fig. 2, in which the activation (ReLU) and batch normalization (BN) layers are omitted. The network body comprises four sub-networks n_1 to n_4: a convolutional layer, residual block 1, residual block 2, and a fully connected layer FC. The structures of residual block 1 and residual block 2 are substantially the same as in fig. 1 (each has one more convolutional layer, and residual block 2 has a further convolutional layer for spatial down-sampling). Sub-network n_5 is used to compute the loss.
The input tensor passes through the convolutional layer and the two residual blocks, then through the fully connected layer and the loss layer to the output. By convention the residual network structure is divided into five sub-networks n_1 to n_5: the convolutional layer, residual block 1, residual block 2, the fully connected layer, and the loss layer; forward propagation likewise proceeds from n_1 to n_5, as shown in Table 1:
| Layer | Sub-network |
|---|---|
| Convolutional layer Conv | n_1 |
| Residual block 1 | n_2 |
| Residual block 2 | n_3 |
| Fully connected layer FC | n_4 |
| Loss layer Loss | n_5 |

TABLE 1
In the training process of the convolutional neural network, the regularization processing mode is as follows:
1-1) The convolutional neural network structure is represented as a directed acyclic bipartite graph, as shown in FIG. 3: the circular nodes u_1 to u_6 represent tensors and form the set Y; the square nodes v_1 to v_5 represent the sub-networks n_1 to n_5 and form the set X;
1-2) The number of selected nodes is set to M = 2 and the numbers of expansions of the graph structure to N_m ∈ {N_1 = 2, N_2 = 1}, where N_m is the number of expansions starting from the m-th selected node, m = 1, …, M; the two selected nodes u_4 and u_3 are chosen from the hidden-layer tensors in the set Y;
1-3) In order from nearest the output layer to nearest the input layer, a selected node not yet expanded, u_4, is determined; the structure from the selected node u_4 to its reachable loss node (u_6) is expanded N_1 = 2 times, adding two new loss nodes u_6′ and u_6″; fig. 4 is the directed acyclic graph after this expansion, in which the dotted nodes are the newly added nodes, corresponding to the network structure shown in Table 2;
TABLE 2
The output tensor of residual block 2 serves simultaneously as the input of the fully connected layer, the newly added fully connected layer_1, and the newly added fully connected layer_2; their outputs serve respectively as the inputs of the loss layer, the newly added loss layer_1, and the newly added loss layer_2;
Because there is still a selected node u_3 that has not been expanded, the next step proceeds to expand the selected node u_3;
1-4) In order from nearest the output layer to nearest the input layer, the selected node not yet expanded, u_3, is determined; the network structure from the selected node u_3 to its reachable loss nodes (u_6, u_6′ and u_6″) is expanded N_2 = 1 time; fig. 5 is the directed acyclic graph after this expansion, in which the dotted nodes are the newly added nodes, corresponding to the network structure shown in Table 3;
TABLE 3
The output of residual block 1 serves simultaneously as the input of residual block 2 and residual block 2_1; the output of residual block 2 serves simultaneously as the input of the fully connected layer, fully connected layer_1, and fully connected layer_2; the output of residual block 2_1 serves simultaneously as the input of fully connected layer_3, fully connected layer_4, and fully connected layer_5; the outputs of the fully connected layer and fully connected layers _1 to _5 serve respectively as the inputs of the loss layer and loss layers _1 to _5;
If no selected node remains unexpanded, the current expanded graph structure is used as the structure-expanded convolutional neural network, and the regularization is complete.
In the training process, images of a training set are input to the convolutional neural network based on structure expansion, after training is completed, the network structures of all expansion parts and the weights obtained by corresponding training are deleted, and only the original convolutional neural network structure and the weights obtained by corresponding training are reserved as the trained convolutional neural network.
Then, inputting the image to be retrieved into the trained convolutional neural network, and obtaining the characteristics of the image to be retrieved from a hidden layer in the trained convolutional neural network structure; and searching the image characteristics of the image library by taking the image characteristics to be searched as a query vector, and outputting a picture corresponding to the image characteristics with the highest similarity with the query vector as a search result.
The trained convolutional neural network can also be directly used for image classification to directly obtain image classification results, and can also be used as an image feature extraction module in image processing systems for image retrieval, target detection and the like, namely, image features are obtained from the output of a hidden layer of the trained convolutional neural network and are used for subsequent processing steps.
Training with the network of fig. 2 (Table 1) on the standard data set MNIST gives a final classification accuracy of 99.27%. Using the regularization method of this embodiment, the network structure is expanded as shown in fig. 5 (Table 3) and trained with the same hyper-parameters; after the newly added structures and their weights are deleted, the accuracy rises to 99.48%. The overfitting phenomenon is thus alleviated and network performance improved without any increase in inference cost.
Claims (1)
1. An image retrieval method based on convolutional neural network regularization processing is characterized by comprising the following steps:
1) Training a neural network:
1-1) representing the convolutional neural network structure as a directed acyclic bipartite graph; each input tensor and each output tensor of each sub-network in the convolutional neural network form each node in a set Y of the bipartite graph, each sub-network forms each node in a set X of the bipartite graph, and the nodes in the bipartite graph are connected according to the sequence from an input layer to an output layer; the sub-network is a set of each operation unit or connected operation units in the neural network;
1-2) setting the number M of selected nodes and the number of expansions N_m of the graph structure corresponding to each node; selecting M selected nodes from the set Y;
1-3) determining, in order from nearest the output layer to nearest the input layer, a selected node that has not yet been expanded, and performing N_m expansions on the graph structure behind that selected node; judging whether any selected node has not yet been expanded; if so, returning to step 1-3); otherwise, generating the structure-expanded convolutional neural network corresponding to the expanded graph structure, and then proceeding to step 1-4);
1-4) inputting the images in the training set into a convolutional neural network based on structure expansion, deleting the network structures of all the expansion parts and the weights obtained by corresponding training after the training is finished, and only keeping the original convolutional neural network structure and the weights obtained by corresponding training as the trained convolutional neural network;
2) An image retrieval step:
2-1) inputting an image to be retrieved into a trained convolutional neural network, and obtaining the characteristics of the image to be retrieved from a hidden layer in an original convolutional neural network structure;
2-2) searching the image features of the image library using the features of the image to be retrieved as the query vector, and outputting the image corresponding to the library feature with the highest similarity to the query vector as the retrieval result.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011597827.7A (CN112685590B) | 2020-12-29 | 2020-12-29 | Image retrieval method based on convolutional neural network regularization processing |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN112685590A | 2021-04-20 |
| CN112685590B | 2022-10-14 |
Family ID: 75454192
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818314A (en) * | 2017-11-22 | 2018-03-20 | 北京达佳互联信息技术有限公司 | Face image processing method, device and server |
CN108446307A (en) * | 2018-02-05 | 2018-08-24 | 中国科学院信息工程研究所 | A kind of the binary set generation method and image, semantic similarity search method of multi-tag image |
Legal Events

| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |