CN110516091A - Image data retrieval method - Google Patents

Image data retrieval method

Info

Publication number
CN110516091A
CN110516091A
Authority
CN
China
Prior art keywords
layer
layers
network
pool
conv
Prior art date
Legal status
Pending
Application number
CN201910815340.2A
Other languages
Chinese (zh)
Inventor
齐峰
张艳明
徐海利
迟言
杨巍巍
Current Assignee
Heilongjiang University of Chinese Medicine
Original Assignee
Heilongjiang University of Chinese Medicine
Priority date
Filing date
Publication date
Application filed by Heilongjiang University of Chinese Medicine
Priority to CN201910815340.2A priority Critical patent/CN110516091A/en
Publication of CN110516091A publication Critical patent/CN110516091A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

An image data retrieval method, belonging to the field of computer science and technology. The invention addresses two problems: some existing models must rely on triplets for training, which makes the training process complicated, and the precision of existing model networks is not high because their depth is seriously limited by gradient problems. In the present invention, a convolution sub-network and a Hash sub-network are connected after the INPUT layer. The Hash sub-network comprises a slicing layer, a fully-connected layer unit, a normalization layer unit, an activation layer unit, a merging layer and a thresholding layer. The slicing layer divides the feature output by the convolution sub-network into slices, each of which is processed by its own fully-connected layer, then by a normalization layer, and then by an activation function. The activated features enter the merging layer and then the thresholding layer, which outputs the Hash code. For a query image, the trained depth Hash model is used to retrieve the target image data. The present invention is suitable for image data retrieval.

Description

Image data retrieval method
Technical field
The present invention relates to an image retrieval method and belongs to the field of computer science and technology.
Background technique
With the development of computer technology, and in particular of artificial intelligence, practical application areas such as face recognition, image processing and natural language processing have come to be implemented with, and even to rely on, deep neural networks. In nearest-neighbor search, given a query, one must find the point in the space closest to it; image retrieval is also typically handled as a form of nearest-neighbor search.
In recent years, image retrieval has been studied very extensively. For example, the group of Pan Yan at Sun Yat-sen University, in cooperation with Yan Shuicheng, published a paper at the AAAI Conference on Artificial Intelligence (AAAI 2014) proposing a method named CNNH (Convolutional Neural Network Hashing), which brought CNN-based deep hashing algorithms to the fore. Later, Yan Shuicheng used a network structure stronger than the one in CNNH, based on Network in Network, called NINH (NIN Hashing) or DNNH (Deep Neural Network Hashing). That network is trained on triplets composed of three images. It achieves good results, but the training process spends considerable time on processing the training images and is comparatively laborious to operate. Other researchers have also carried out a large amount of research and experimentation on solving the image search problem with CNNs and deep hashing algorithms; the deep neural networks they build differ, and the results achieved have their respective strengths and weaknesses. Several classical model structures have emerged in this process, and the classical models are continuously improved and updated because of their scope of application and their specific shortcomings, yet some improvements fail to do better and may even do worse. This is caused by the characteristics of deep neural networks themselves, because the structure of a deep neural network drastically affects its performance. Moreover, relaxation, the choice of loss function (or objective function), and gradient descent during training can all seriously affect the performance of a deep neural network.
Current deep neural networks can already obtain fairly good search results, but how to determine the search results that best match the query still needs further improvement.
Summary of the invention
The present invention aims to solve the problem that some existing models must rely on triplets for training, which makes the training process complicated, and the problem that the precision of existing model networks is not high because their depth is seriously affected by gradient problems.
An image data retrieval method, comprising the following steps:
The depth Hash model is as follows:
A convolution sub-network is connected after the INPUT layer, and a Hash sub-network is connected after the convolution sub-network. The Hash sub-network comprises a slicing layer, a fully-connected layer unit, a normalization layer unit, an activation layer unit, a merging layer and a thresholding layer. The feature output by the convolution sub-network is denoted x, and the slicing layer divides x into n slices, each slice being x_i, i = 1, 2, …, n;
Each slice contains m/n feature dimensions, where m is the dimension of the feature x;
The fully-connected layer unit of the Hash sub-network has n fully-connected layers, and each fully-connected layer processes one x_i; the fully-connected processing is expressed as f_i = W_i·x_i + b_i, where W_i is the weight matrix of the i-th fully-connected layer and b_i is the corresponding bias;
The normalization layer unit has n normalization layers, and each normalization layer processes one f_i; the result after normalization is denoted g_i, with range [-1, 1];
The activation layer unit has n activation-function layers, and each activation-function layer processes one g_i; the activation function is q_i = (e^(β·g_i) - e^(-β·g_i))/(e^(β·g_i) + e^(-β·g_i)), i.e. the hyperbolic tangent tanh(β·g_i), where β is a smoothing control parameter and e is the natural constant; g_i after the activation function is denoted q_i. The activation-layer codes are fed into a Softmax classifier;
All the q_i enter the merging layer, which merges them into an n-dimensional vector q = (q_1, q_2, …, q_n)^T;
The vector then enters the thresholding layer, whose thresholding function is: v(q_i) = 1 if q_i ≥ 0, and v(q_i) = -1 otherwise; the thresholding layer outputs the Hash code;
The loss function of the depth Hash model is L = L_s + l·(‖ |q| - 1 ‖_1)^2, where L_s is the classification loss of the Softmax classifier;
v = (v(q_1), v(q_2), …, v(q_n))^T is the Hash code output by the thresholding layer; (‖ |q| - 1 ‖_1)^2 denotes the square of the L1 norm of |q| - 1; l is a weight factor;
After training on the training set, the depth Hash model becomes the trained depth Hash model;
For a query image, retrieval of the target image data is carried out using the trained depth Hash model.
Further, in the normalization layer g_i = f_i/|f_j|, where f_j is the feature with the largest modulus among f_1 to f_n.
The beneficial effects of the present invention are:
The training process of the invention can use individual images directly, without relying on triplets for training; this saves a large amount of time otherwise spent processing training images, and the training process is simple and easy to implement. A large amount of time spent processing the training set itself is also saved: the training set can be used directly, saving time and effort.
The normalization operation in the normalization layer unit of the present invention avoids the vanishing-gradient problem that the hyperbolic-tangent activation function produces when saturated. This not only guarantees that the present invention can build a sufficiently deep neural network, but also guarantees a relatively good optimization effect during gradient derivation.
The loss function of the depth Hash model gives the present invention better learning and optimization effects.
Detailed description of the invention
Fig. 1 is a schematic diagram of the depth Hash model structure.
Specific embodiment
Specific embodiment 1:
The present embodiment is an image data retrieval method, specifically comprising the following steps:
As shown in Fig. 1, the depth Hash model is as follows:
A convolution sub-network is connected after the INPUT layer, and a Hash sub-network is connected after the convolution sub-network. The Hash sub-network comprises a slicing layer, a fully-connected layer unit, a normalization layer unit, an activation layer unit, a merging layer and a thresholding layer. The INPUT layer is the input layer.
The feature output by the convolution sub-network is denoted x; the slicing layer divides x into n slices, each slice being x_i, i = 1, 2, …, n. Each slice contains m/n feature dimensions, where m is the dimension of x. The present invention strictly controls m/n to be an integer, which can be done by using the last FC layer of the convolution sub-network to control the feature dimension; otherwise this operation would affect the subsequent normalization process, and thereby the activation layer and the effect of the overall model;
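The slicing step can be sketched as follows (a minimal illustration assuming a NumPy feature vector; the function name `slice_feature` and the example sizes are chosen here for illustration and do not appear in the patent):

```python
import numpy as np

def slice_feature(x, n):
    """Split an m-dimensional feature vector x into n equal slices.

    The patent requires m/n to be an integer, e.g. by sizing the
    last FC layer of the convolution sub-network accordingly.
    """
    m = x.shape[0]
    assert m % n == 0, "m/n must be an integer"
    return np.split(x, n)  # n slices, each of dimension m/n

# Example: a 12-dimensional feature split into 4 slices of 3 dimensions.
x = np.arange(12.0)
slices = slice_feature(x, 4)
```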
The fully-connected layer unit of the Hash sub-network has n fully-connected layers, and each fully-connected layer processes one x_i; the fully-connected processing is expressed as f_i = W_i·x_i + b_i, where W_i is the weight matrix of the i-th fully-connected layer and b_i is the corresponding bias;
The normalization layer unit has n normalization layers, and each normalization layer processes one f_i; the result after normalization is denoted g_i, with range [-1, 1];
In some embodiments, g_i = f_i/|f_j|, where f_j is the feature with the largest modulus among f_1 to f_n. This guarantees that the sign of each feature is unchanged and that the normalized values fall within [-1, 1], avoiding the vanishing-gradient problem that the hyperbolic-tangent activation function produces when saturated. This not only guarantees that the present invention can build a sufficiently deep neural network, but also guarantees a relatively good optimization effect during gradient derivation.
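A minimal sketch of this normalization, assuming each fully-connected layer outputs a scalar f_i (the function name `normalize` is chosen for illustration):

```python
import numpy as np

def normalize(f):
    """g_i = f_i / |f_j|, where f_j is the f with the largest modulus.

    Signs are preserved and every g_i falls in [-1, 1].
    """
    f = np.asarray(f, dtype=float)
    return f / np.max(np.abs(f))

f = np.array([2.0, -4.0, 1.0])
g = normalize(f)  # -> [0.5, -1.0, 0.25]
```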
The activation layer unit has n activation-function layers, and each activation-function layer processes one g_i; the activation function is q_i = (e^(β·g_i) - e^(-β·g_i))/(e^(β·g_i) + e^(-β·g_i)), i.e. the hyperbolic tangent tanh(β·g_i), where β is a smoothing control parameter and e is the natural constant; g_i after the activation function is denoted q_i. The activation-layer codes are fed into a Softmax classifier;
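Assuming the hyperbolic-tangent form implied by the description (the printed activation formula is not reproduced in this text, so tanh is a reconstruction), the activation can be sketched as:

```python
import numpy as np

def activate(g, beta=1.0):
    """q = (e^(beta*g) - e^(-beta*g)) / (e^(beta*g) + e^(-beta*g)) = tanh(beta*g).

    beta is the smoothing control parameter; the output lies in (-1, 1).
    """
    return np.tanh(beta * g)

q = activate(np.array([0.0, 1.0, -1.0]), beta=2.0)
```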
All the q_i enter the merging layer, which merges them into an n-dimensional vector q = (q_1, q_2, …, q_n)^T;
The vector then enters the thresholding layer, whose thresholding function is: v(q_i) = 1 if q_i ≥ 0, and v(q_i) = -1 otherwise;
The thresholding layer outputs the Hash code;
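Assuming a sign-based thresholding (the printed thresholding function is not reproduced in this text, so the {-1, +1} code alphabet is a reconstruction), the Hash code can be produced as:

```python
import numpy as np

def threshold(q):
    """Map each q_i in (-1, 1) to a binary code in {-1, +1}."""
    return np.where(q >= 0, 1, -1)

code = threshold(np.array([0.7, -0.2, 0.0, -0.9]))  # -> [1, -1, 1, -1]
```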
The loss function of the depth Hash model is L = L_s + l·(‖ |q| - 1 ‖_1)^2, where L_s is the classification loss of the Softmax classifier;
v = (v(q_1), v(q_2), …, v(q_n))^T is the Hash code output by the thresholding layer; (‖ |q| - 1 ‖_1)^2 denotes the square of the L1 norm of |q| - 1; l is a weight factor. In the present invention ‖ |q| - 1 ‖_1 is an upper bound on the quantization error loss; when this upper bound is minimal, the quantity it bounds is also minimal, so that the present invention has better learning and optimization effects.
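The quantization term of the loss can be sketched as follows (a hedged reconstruction: the Softmax classification part is the ordinary classification loss and is omitted here, and `quantization_penalty` is an illustrative name):

```python
import numpy as np

def quantization_penalty(q, l=0.35):
    """l * (|| |q| - 1 ||_1)^2: squared L1 distance of each |q_i| from 1.

    For q in (-1, 1) and v = sign(q), || q - v ||_1 equals || |q| - 1 ||_1,
    so minimizing this term drives the quantization error toward zero.
    """
    return l * np.sum(np.abs(np.abs(q) - 1.0)) ** 2

# Each component is 0.5 away from +/-1: (0.5 + 0.5)^2 = 1.0
p = quantization_penalty(np.array([0.5, -0.5]), l=1.0)
```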
After training on the training set, the depth Hash model becomes the trained depth Hash model;
For a query image, retrieval of the target image data is carried out using the trained depth Hash model.
Specific embodiment 2:
The present embodiment is an image data retrieval method in which the convolution sub-network comprises at least four Conv layers, at least four Pool layers and at least two FC layers; a Conv layer denotes a convolutional layer, a Pool layer a pooling layer, and an FC layer a fully-connected layer;
In some embodiments, the convolution sub-network comprises four Conv layers and four Pool layers, structured as follows:
The first Conv layer connects to the first Pool layer, the first Pool layer to the second Conv layer, the second Conv layer to the second Pool layer, the second Pool layer to the third Conv layer, the third Conv layer to the third Pool layer, the third Pool layer to the fourth Conv layer, and the fourth Conv layer to the fourth Pool layer; the fourth Pool layer connects in sequence to two FC layers. Each Conv layer uses ReLU as its activation function.
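The chain of four Conv-Pool stages can be traced with a simple shape computation (a sketch under assumed hyperparameters: convolutions that preserve spatial size and 2x2 pooling with stride 2, which the patent does not specify):

```python
def trace_shapes(h, w):
    """Trace spatial dimensions through four Conv-Pool stages.

    Assumes each Conv preserves spatial size ('same' padding) and each
    Pool halves it (2x2, stride 2); these hyperparameters are assumed,
    not stated in the patent.
    """
    shapes = []
    for _ in range(4):
        # Conv: spatial size unchanged; Pool: halved.
        h, w = h // 2, w // 2
        shapes.append((h, w))
    return shapes

# CIFAR-10 input of 32x32 shrinks to 2x2 before the two FC layers.
shapes = trace_shapes(32, 32)
```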
The loss functions of existing models dictate that their model depth cannot be too great. The form of the loss function of the invention allows deeper convolution and Hash sub-networks, so the depth of the model is not limited by gradient problems; it also reduces the influence of relaxation, so the model performs better. In practice the present invention needs only four convolution-pooling stages to obtain good retrieval results, and the overall network model is then especially simple, so the overall computation is faster and more efficient, and neither vanishing nor exploding gradients appear, making the overall model more practical. The model of the invention is of course fully applicable to deeper network models, and its performance can improve further as the network model deepens.
The other steps and the depth Hash model structure are the same as in specific embodiment 1.
Embodiment
The experiment follows the schemes of specific embodiments 1 and 2, where the convolution sub-network comprises four Conv layers and four Pool layers, structured as follows:
The first Conv layer connects to the first Pool layer, the first Pool layer to the second Conv layer, the second Conv layer to the second Pool layer, the second Pool layer to the third Conv layer, the third Conv layer to the third Pool layer, the third Pool layer to the fourth Conv layer, and the fourth Conv layer to the fourth Pool layer; the fourth Pool layer connects in sequence to two FC layers. Each Conv layer uses ReLU as its activation function.
The weight factor l = 0.35.
The experiment uses the CIFAR-10 data set: 1000 image samples are extracted from each class of CIFAR-10 as experimental data, of which 900 image samples serve as training data and the remaining 100 image samples as test data.
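The per-class split can be sketched as follows (illustrative only: the function `split_per_class`, the seed and the index handling are assumptions, not from the patent):

```python
import random

def split_per_class(samples_by_class, per_class=1000, n_train=900, seed=0):
    """From each class take `per_class` samples; the first `n_train`
    become training data, the rest test data."""
    rng = random.Random(seed)
    train, test = [], []
    for cls, samples in samples_by_class.items():
        picked = rng.sample(samples, per_class)
        train += [(s, cls) for s in picked[:n_train]]
        test += [(s, cls) for s in picked[n_train:]]
    return train, test

# Toy example with 2 classes of 1000 dummy samples each.
data = {c: list(range(1000)) for c in ("cat", "dog")}
train, test = split_per_class(data)
```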
The present invention is compared with existing classical methods; see Table 1. The data for the classical methods are data disclosed in existing documents, and the data for the invention are results obtained by actual experiment.
Table 1: mAP comparison
As can be seen from Table 1, the mAP result of the invention is much higher than those of the other methods. Note that the invention obtains this result with only a simple convolution sub-network of four Conv-Pool groups plus two FC layers; with classical Conv-Pool groups or deeper neural networks, the effect of the invention can be further improved. Because the Hash sub-network structure and the loss function of the invention make it fully applicable to deeper neural networks without being affected by gradient problems, while the influence of relaxation can also be controlled to a certain extent, selecting a deeper neural network can yield even higher precision. If the training set data are trained on more often, the mAP value of the invention can also increase further.
Even with the simplest convolution sub-network of four Conv-Pool groups plus two FC layers, the invention already achieves good results. In this case, since the structure of the invention is simple, its computational efficiency is high: compared with a convolution sub-network of five simple Conv-Pool groups plus two FC layers, the efficiency of the invention improves by about 10%.

Claims (5)

1. An image data retrieval method, characterized by comprising the following steps:
The depth Hash model is as follows:
A convolution sub-network is connected after the INPUT layer, and a Hash sub-network is connected after the convolution sub-network; the Hash sub-network comprises a slicing layer, a fully-connected layer unit, a normalization layer unit, an activation layer unit, a merging layer and a thresholding layer;
The feature output by the convolution sub-network is denoted x; the slicing layer divides x into n slices, each slice being x_i, i = 1, 2, …, n; each slice contains m/n feature dimensions, where m is the dimension of the feature x;
The fully-connected layer unit of the Hash sub-network has n fully-connected layers, and each fully-connected layer processes one x_i; the fully-connected processing is expressed as f_i = W_i·x_i + b_i, where W_i is the weight matrix of the i-th fully-connected layer and b_i is the corresponding bias;
The normalization layer unit has n normalization layers, and each normalization layer processes one f_i; the result after normalization is denoted g_i, with range [-1, 1];
The activation layer unit has n activation-function layers, and each activation-function layer processes one g_i; the activation function is q_i = (e^(β·g_i) - e^(-β·g_i))/(e^(β·g_i) + e^(-β·g_i)), i.e. the hyperbolic tangent tanh(β·g_i), where β is a smoothing control parameter and e is the natural constant; g_i after the activation function is denoted q_i; the activation-layer codes are fed into a Softmax classifier;
All the q_i enter the merging layer, which merges them into an n-dimensional vector q = (q_1, q_2, …, q_n)^T;
The vector then enters the thresholding layer, whose thresholding function is: v(q_i) = 1 if q_i ≥ 0, and v(q_i) = -1 otherwise;
The thresholding layer outputs the Hash code;
The loss function of the depth Hash model is L = L_s + l·(‖ |q| - 1 ‖_1)^2, where L_s is the classification loss of the Softmax classifier;
v = (v(q_1), v(q_2), …, v(q_n))^T is the Hash code output by the thresholding layer; (‖ |q| - 1 ‖_1)^2 denotes the square of the L1 norm of |q| - 1; l is a weight factor;
After training on the training set, the depth Hash model becomes the trained depth Hash model;
For a query image, retrieval of the target image data is carried out using the trained depth Hash model.
2. The image data retrieval method according to claim 1, characterized in that in the normalization layer g_i = f_i/|f_j|, where f_j is the feature with the largest modulus among f_1 to f_n.
3. The image data retrieval method according to claim 1, characterized in that the weight factor l = 0.35.
4. The image data retrieval method according to claim 1, 2 or 3, characterized in that the convolution sub-network comprises at least four Conv layers, at least four Pool layers and at least two FC layers; a Conv layer denotes a convolutional layer, a Pool layer a pooling layer, and an FC layer a fully-connected layer.
5. The image data retrieval method according to claim 4, characterized in that the convolution sub-network comprises four Conv layers and four Pool layers, structured as follows:
The first Conv layer connects to the first Pool layer, the first Pool layer to the second Conv layer, the second Conv layer to the second Pool layer, the second Pool layer to the third Conv layer, the third Conv layer to the third Pool layer, the third Pool layer to the fourth Conv layer, and the fourth Conv layer to the fourth Pool layer; the fourth Pool layer connects in sequence to two FC layers. Each Conv layer uses ReLU as its activation function.
CN201910815340.2A 2019-08-30 2019-08-30 A kind of image data retrieval method Pending CN110516091A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910815340.2A CN110516091A (en) 2019-08-30 2019-08-30 A kind of image data retrieval method


Publications (1)

Publication Number Publication Date
CN110516091A true CN110516091A (en) 2019-11-29

Family

ID=68628547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910815340.2A Pending CN110516091A (en) 2019-08-30 2019-08-30 A kind of image data retrieval method

Country Status (1)

Country Link
CN (1) CN110516091A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512289A (en) * 2015-12-07 2016-04-20 郑州金惠计算机系统工程有限公司 Image retrieval method based on deep learning and Hash
CN107092661A (en) * 2017-03-28 2017-08-25 桂林明辉信息科技有限公司 A kind of image search method based on depth convolutional neural networks
CN109165306A (en) * 2018-08-09 2019-01-08 长沙理工大学 Image search method based on the study of multitask Hash


Similar Documents

Publication Publication Date Title
CN109142946A (en) Transformer fault detection method based on ant group algorithm optimization random forest
CN109741318A (en) The real-time detection method of single phase multiple dimensioned specific objective based on effective receptive field
CN110851566A (en) Improved differentiable network structure searching method
CN110321566A (en) Chinese name entity recognition method, device, computer equipment and storage medium
CN108021658A (en) A kind of big data intelligent search method and system based on whale optimization algorithm
CN107403191A (en) A kind of semi-supervised learning machine sorting technique that transfinites with depth structure
CN112732864B (en) Document retrieval method based on dense pseudo query vector representation
CN110287282A (en) The Intelligent dialogue systems response method and Intelligent dialogue system of calculation are assessed based on tree
Cao et al. Improved crow search algorithm optimized extreme learning machine based on classification algorithm and application
CN106096052A (en) A kind of consumer's clustering method towards wechat marketing
CN104731811B (en) A kind of clustering information evolution analysis method towards extensive dynamic short text
CN108399268A (en) A kind of increment type isomery figure clustering method based on game theory
CN107578101B (en) Data stream load prediction method
Cheng et al. Remote sensing image classification based on optimized support vector machine
Jothi et al. Soft set based quick reduct approach for unsupervised feature selection
CN116960991B (en) Probability-oriented power load prediction method based on graph convolution network model
CN110516091A (en) A kind of image data retrieval method
CN103440332B (en) A kind of image search method strengthening expression based on relational matrix regularization
CN109886454A (en) A kind of fresh water environment wawter bloom prediction technique based on self-organizing deepness belief network and Method Using Relevance Vector Machine
Qiu et al. Feature selection using a set based discrete particle swarm optimization and a novel feature subset evaluation criterion
JP2011039977A (en) Word clustering device, method and program, and recording medium storing the program
CN101339615B (en) Method of image segmentation based on similar matrix approximation
Zhai et al. Deep product quantization for large-scale image retrieval
Kurasova et al. Integration of the self-organizing map and neural gas with multidimensional scaling
Yue et al. Study on the deep neural network of intelligent image detection and the improvement of elastic momentum on image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination