CN108304573A - Target retrieval method based on convolutional neural networks and supervision core Hash - Google Patents


Info

Publication number
CN108304573A
CN108304573A
Authority
CN
China
Prior art keywords
image
hash
formula
layer
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810157751.2A
Other languages
Chinese (zh)
Inventor
李弼程 (Li Bicheng)
赵永威 (Zhao Yongwei)
朱彩英 (Zhu Caiying)
陈良浩 (Chen Lianghao)
Current Assignee
Jiangsu Test Joint Space Big Data Application Research Center Ltd Co
Original Assignee
Jiangsu Test Joint Space Big Data Application Research Center Ltd Co
Priority date
Filing date
Publication date
Application filed by Jiangsu Test Joint Space Big Data Application Research Center Ltd Co filed Critical Jiangsu Test Joint Space Big Data Application Research Center Ltd Co
Priority to CN201810157751.2A priority Critical patent/CN108304573A/en
Publication of CN108304573A publication Critical patent/CN108304573A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks


Abstract

The present invention relates to the field of retrieval methods, and in particular to a target retrieval method based on convolutional neural networks and supervised kernel hashing. The retrieval method comprises: (1) introducing a convolutional neural network to learn from training images, using its special network structure to implicitly learn a high-order representation of the image data and generate deep features; (2) introducing supervised kernel hashing, which strengthens the ability to discriminate linearly inseparable data, proposing an objective function based on the equivalence between the Hash-code inner product and the Hamming distance, and performing supervised learning on the high-dimensional image features in combination with the affinity information of the training images to generate Hash codes; (3) constructing an image index with the trained hash functions, realizing retrieval over large-scale image data. Through the target retrieval method based on convolutional neural networks and supervised kernel hashing, the present invention greatly improves target retrieval efficiency and enhances practicality in a big-data environment.

Description

Target retrieval method based on convolutional neural networks and supervised kernel hashing
Technical field
The present invention relates to the field of retrieval methods, and in particular to a target retrieval method based on convolutional neural networks and supervised kernel hashing.
Background technology
With the arrival of the big-data era, Internet video and image resources are growing rapidly, and how to retrieve targets of interest in large-scale video and image resources quickly and effectively to meet user demand is an urgent problem. Although local features such as SIFT (Scale-Invariant Feature Transform) and HOG (Histogram of Oriented Gradients) have shown excellent performance in the image processing field, the fixed coding steps used to generate these descriptors leave them without learning ability, limit their power to express image content, and make it hard to adapt to diverse image data, which to some extent reduces large-scale image target retrieval performance. To capture the intrinsic implicit relations in massive image data and generate more discriminative and representative features, Hinton and other scholars applied deep learning (Deep Learning) to the image processing field, offering a new approach for extracting markedly more effective image features.
The top layer of a deep belief network (Deep Belief Network, DBN) uses a third-order Boltzmann machine (Boltzmann Machine, BM); an improved DBN has been used for object feature extraction, yielding features with higher robustness to target rotation. In addition, researchers constructed the convolutional deep belief network (Convolutional Deep Belief Network, CDBN), with which effective high-order feature representations can be learned from unlabeled natural images. Further, adding an SPP (Spatial Pyramid Pooling) layer between the convolutional layers and the fully connected layers of a convolutional neural network (Convolutional Neural Network, CNN) allows images of different sizes to be learned directly and multi-scale features to be generated.
However, the image features produced by deep learning are of high dimension and suffer from the curse of dimensionality. When the image data scale is large, retrieval with traditional nearest-neighbor methods (such as R-tree and KD-tree) causes retrieval speed to drop sharply, making them difficult to apply to large-scale data. To retrieve large-scale high-dimensional images effectively, researchers proposed approximate nearest neighbor (Approximate Nearest Neighbor, ANN) search strategies. Among these, hashing (Hashing) is the mainstream approach to the approximate nearest-neighbor problem: a family of hash functions maps high-dimensional image features into a low-dimensional space such that points that are close in the original space remain close after the mapping. The hash functions constructed by LSH and its improved algorithms are all data-independent; in recent years, researchers have proposed many algorithms that construct effective, compact hash functions from the data itself. Spectral hashing (Spectral Hashing, SH) first analyzes the eigenvalues and eigenvectors of the Laplacian matrix of a similarity graph and then, by relaxing the constraints, converts the problem of encoding image feature vectors into a dimensionality-reduction problem solved by Laplacian eigenmaps; because it relies on the data itself to generate the index, this method reaches higher accuracy than randomly generated hash functions. But unsupervised methods do not consider the semantic information of images, while users usually care about the semantics of retrieval results. For this reason, researchers proposed semi-supervised hashing (Semi-Supervised Hashing, SSH). Building on semi-supervised learning, researchers also proposed fully supervised hashing methods such as SH (Semantic Hashing), BRE (Binary Reconstructive Embedding), and MLH (Minimal Loss Hashing). Fully supervised hashing can reach higher accuracy than unsupervised methods, but its optimization process is complex and its training efficiency low, which severely limits its application to large-scale datasets.
Invention content
The technical problem to be solved by the present invention is the deficiency of existing unsupervised methods, which do not consider the semantic information of images. The present invention provides a target retrieval method based on convolutional neural networks and supervised kernel hashing, which uses a convolutional neural network to learn large-scale image data features autonomously, enhancing the expressive power of the image features. Then, supervised kernel hashing performs supervised learning on the deep high-dimensional image features, mapping them into a low-dimensional Hamming space and generating compact Hash codes, which greatly improves target retrieval efficiency and enhances practicality in a big-data environment.
The technical solution adopted by the present invention to solve the technical problems is:
A target retrieval method based on convolutional neural networks and supervised kernel hashing comprises the following steps:
(1) introducing a convolutional neural network to learn from training images, using its special network structure to implicitly learn a high-order representation of the image data, and generating deep features;
(2) introducing supervised kernel hashing, which strengthens the ability to discriminate linearly inseparable data; proposing an objective function based on the equivalence between the Hash-code inner product and the Hamming distance; and performing supervised learning on the high-dimensional image features in combination with the affinity information of the training images to generate Hash codes;
(3) constructing an image index with the trained hash functions, realizing retrieval over large-scale image data.
Specifically, the input image size of the convolutional neural network is 227 × 227 and the output is a 4096 × 1 deep image feature; the network contains 5 convolutional layers and 3 subsampling layers. In a convolutional layer, each feature map x_i^{l-1} of the previous layer is convolved with a learnable kernel K_{ij}, and the result of the convolution passes through a nonlinear function f(·) to produce this layer's feature map x_j^l, in the concrete form:

Formula one: x_j^l = f( Σ_{i∈M_j} x_i^{l-1} * K_{ij} + b_j )

where x_j^l is the output of the l-th convolutional layer C_l, * denotes the convolution operation, b_j is a bias, the kernel K_{ij} may form a convolution relation with one or more feature maps of the previous layer, and M_j denotes the set of input feature maps. Common nonlinear functions are f(x) = tanh(x) and f(x) = (1 + e^{-x})^{-1}. The size h_l of the feature maps produced by a convolutional layer can be calculated by formula two:

Formula two: h_l = (h_{l-1} − z_l + 2ρ_l) / λ_l + 1

where h_{l-1} is the size of the feature maps of layer l−1, z_l the kernel size of layer l, λ_l the kernel stride, and ρ_l the number of zero-padding columns added to the edges of the previous layer's feature maps before convolution. The kernel sizes of the layers are Z = {z_1 = 11, z_2 = 5, z_3 = z_4 = z_5 = 3}, the strides Λ = {λ_1 = 4, λ_2 = λ_3 = λ_4 = λ_5 = 1}, and the zero-padding columns P = {ρ_1 = 0, ρ_2 = 2, ρ_3 = ρ_4 = ρ_5 = 1}. The subsampling layers perform overlapping max pooling on the feature maps with a 3 × 3 sampling region and a stride of 2 pixels.
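As an illustration only, the feature-map sizes implied by formula two and the parameters above can be checked with a short script; the placement of the three subsampling layers (after convolutional layers 1, 2, and 5) is an assumption reconstructed from the stated architecture:

```python
# Sketch: propagate the spatial size through the 5 convolutional layers using
# h_l = (h_{l-1} - z_l + 2*rho_l) / lambda_l + 1, with 3x3 stride-2 max pooling
# after layers 1, 2 and 5 (assumed positions of the 3 subsampling layers).
def conv_out(h, z, lam, rho):
    return (h - z + 2 * rho) // lam + 1

def pool_out(h, z=3, s=2):
    return (h - z) // s + 1

h = 227
trace = []
layers = [(11, 4, 0, True), (5, 1, 2, True), (3, 1, 1, False),
          (3, 1, 1, False), (3, 1, 1, True)]
for z, lam, rho, pooled in layers:
    h = conv_out(h, z, lam, rho)
    trace.append(h)
    if pooled:
        h = pool_out(h)
        trace.append(h)
print(trace)  # [55, 27, 27, 13, 13, 13, 13, 6]
```

The final 6 × 6 maps are consistent with feeding a 4096 × 1 fully connected output stage.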
The training of the convolutional neural network is divided into a forward-propagation and a back-propagation stage:

Forward-propagation stage: a sample (X_p, Y_p) is chosen from the training set, and X_p is transformed layer by layer from the input layer to the output layer to compute the actual output:

Formula three: O_p = F_n(…(F_2(F_1(X_p W^{(1)}) W^{(2)})…) W^{(n)})

Back-propagation stage: this is the error-propagation stage, which computes the error between the actual output O_p and the corresponding ideal output Y_p:

Formula four: E_p = (1/2) Σ_k (Y_p^k − O_p^k)²

The error E_p is pushed back layer by layer to obtain the error of each layer, and the neuron weights are adjusted so as to minimize the error. When the total error satisfies E ≤ ε, training of the batch is complete. After all batches have been trained, an image fed into the convolutional neural network passes through each network layer in turn, and its deep feature is obtained at the output.
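A minimal numeric sketch of these two stages, with a single tanh layer standing in for the full network (all sizes and data below are illustrative assumptions, not the patented architecture):

```python
import numpy as np

# Forward propagation O_p = f(X_p W) followed by back-propagation of the
# squared error E_p = 0.5 * sum((Y_p - O_p)^2), adjusting weights by gradient descent.
rng = np.random.default_rng(4)
X = rng.normal(size=(8, 3))
W_true = rng.normal(size=(3, 2))
Y = np.tanh(X @ W_true)                    # ideal outputs from a known target mapping

W = np.zeros((3, 2))
for _ in range(500):
    O = np.tanh(X @ W)                     # forward-propagation stage
    E = 0.5 * np.sum((Y - O) ** 2)         # error of formula four
    dW = -X.T @ ((Y - O) * (1 - O ** 2))   # back-propagated gradient dE/dW
    W -= 0.1 * dW
print("final error:", E)                   # driven close to zero
```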
Specifically, when measuring the distance between images, given the Hash code dimension r, r coefficient vectors a_1, …, a_r are needed to construct the hash functions h_k(x) = sgn(k̄(x)^T a_k), k = 1, …, r, where k̄(x) denotes the zero-centered kernel feature vector of x with respect to the m samples used for training. The label information of the training images can be obtained from the semantic relevance and spatial distance of the images, and is used to define the check matrix S ∈ R^{l×l} describing the correlations among the elements of the label image set χ_l = {x_1, …, x_l}:

Formula five: S_ij = 1 if x_i and x_j are similar; S_ij = −1 if x_i and x_j are dissimilar; S_ij = 0 if their similarity is unknown.

The Hamming distance D_h(x_i, x_j) between the Hash codes of images x_i, x_j should then satisfy:

Formula six: D_h(x_i, x_j) close to 0 when S_ij = 1, and D_h(x_i, x_j) close to r when S_ij = −1.
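For concreteness, the check matrix of formula five can be built from class labels when full supervision is available (the label set below is a toy assumption; entries would be 0 for pairs whose similarity is unknown):

```python
import numpy as np

# Build S: S_ij = 1 for same-class pairs, -1 for different-class pairs, S_ii = 1.
labels = np.array([0, 0, 1, 1, 2])                      # toy label image set, l = 5
S = np.where(labels[:, None] == labels[None, :], 1, -1)
np.fill_diagonal(S, 1)                                  # S_ii = 1 by definition
print(S)
```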
The Hash code distance is computed via the vector inner product. With the Hash code of image x written code_r(x) = [h_1(x), …, h_r(x)] ∈ {1, −1}^{1×r}, the distance between images x_i, x_j is computed as shown in formula seven:

D(x_i, x_j) = code_r(x_i) · code_r(x_j)
 = |{k | h_k(x_i) = h_k(x_j), 1 ≤ k ≤ r}| − |{k | h_k(x_i) ≠ h_k(x_j), 1 ≤ k ≤ r}|
 = r − 2 |{k | h_k(x_i) ≠ h_k(x_j), 1 ≤ k ≤ r}|

Formula seven: D(x_i, x_j) = r − 2 D_h(x_i, x_j)
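The equivalence of formula seven is easy to confirm numerically on random codes:

```python
import numpy as np

# Check code_r(x_i) . code_r(x_j) = r - 2 * D_h(x_i, x_j) for random +/-1 codes.
rng = np.random.default_rng(0)
r = 48
ci = rng.choice([-1, 1], size=r)
cj = rng.choice([-1, 1], size=r)
d_hamming = int(np.sum(ci != cj))
inner = int(ci @ cj)
print(inner == r - 2 * d_hamming)   # True: the inner product encodes the Hamming distance
```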
Since D(x_i, x_j) ∈ [−r, r], normalizing gives D̃(x_i, x_j) = D(x_i, x_j)/r ∈ [−1, 1]. The objective function is defined so that the distance between the similarity matrix (1/r) H_l H_l^T and the check matrix S is minimal:

Formula eight: min ‖ (1/r) H_l H_l^T − S ‖_F²

where ‖·‖_F denotes the Frobenius norm of a matrix and H_l ∈ {1, −1}^{l×r} is the Hash code matrix of the label image set χ_l. Extending sgn(·) to matrices, H_l can be expressed as:

Formula nine: H_l = sgn(K̄_l A)

where K̄_l ∈ R^{l×m} is the zero-centered kernel matrix of χ_l and A = [a_1, …, a_r] ∈ R^{m×r}.
Substituting H_l into formula eight gives:

Formula ten: min_A ‖ (1/r) sgn(K̄_l A) sgn(K̄_l A)^T − S ‖_F²

Formula 11: equivalently, min_A ‖ sgn(K̄_l A) sgn(K̄_l A)^T − rS ‖_F²

Define the matrix R_{k−1} = rS − Σ_{t=1}^{k−1} sgn(K̄_l a_t) sgn(K̄_l a_t)^T, with R_0 = rS. Formula (11) can then be minimized by a greedy algorithm that estimates the a_k one at a time, each step solving formula 12:

Formula 12: min_{a_k} ‖ sgn(K̄_l a_k) sgn(K̄_l a_k)^T − R_{k−1} ‖_F²

Removing the constant terms yields a more concise objective function:

Formula 13: g(a_k) = −(sgn(K̄_l a_k))^T R_{k−1} sgn(K̄_l a_k)
Since the sgn(x) function makes g(a_k) discontinuous and non-convex, it is difficult to minimize directly; when |x| > 6 the continuous function φ(x) = 2/(1 + e^{−x}) − 1 approximates sgn(x) very well, so φ(x) replaces sgn(x), giving the approximate objective function of formula 14:

Formula 14: g̃(a_k) = −(φ(K̄_l a_k))^T R_{k−1} φ(K̄_l a_k)

g̃(a_k) can be minimized by gradient descent; taking the gradient of g̃ with respect to a_k gives:

Formula 15: ∇g̃(a_k) = −K̄_l^T ((R_{k−1} b) ⊙ (1 − b ⊙ b)), where b = φ(K̄_l a_k)

Here ⊙ denotes the Hadamard (element-wise) product. To accelerate the convergence of g̃, the spectral analysis method from spectral hashing is used to generate the initial value a_k^0, further speeding up the gradient search. After the hash functions H and the Hash table H̃ are obtained, the deep feature of a query image is hash-mapped to obtain code_r(x_q); the distances between code_r(x_q) and the Hash codes in the table H̃ are computed, and the images with smaller distances are returned as retrieval results.
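The gradient of formula 15 can be sanity-checked against finite differences of the surrogate objective of formula 14; the sizes and random data below are illustrative assumptions:

```python
import numpy as np

# g~(a) = -(phi(Kbar a))^T R (phi(Kbar a)) with phi(x) = 2/(1+e^{-x}) - 1;
# analytic gradient: -Kbar^T ((R b) * (1 - b*b)), b = phi(Kbar a), R symmetric.
rng = np.random.default_rng(1)
l, m, r = 20, 5, 12
Kbar = rng.normal(size=(l, m))
S = np.triu(rng.choice([-1.0, 1.0], size=(l, l)), 1)
S = S + S.T + np.eye(l)                 # symmetric toy check matrix with S_ii = 1
R = r * S                               # R_0 = r * S
phi = lambda x: 2.0 / (1.0 + np.exp(-x)) - 1.0

def g(a):
    b = phi(Kbar @ a)
    return -b @ R @ b

def grad(a):
    b = phi(Kbar @ a)
    return -Kbar.T @ ((R @ b) * (1.0 - b * b))

a = rng.normal(size=m)
eps = 1e-6
numeric = np.array([(g(a + eps * e) - g(a - eps * e)) / (2 * eps) for e in np.eye(m)])
print(np.max(np.abs(numeric - grad(a))) < 1e-4)   # analytic and numeric gradients agree
```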
The beneficial effects of the invention are as follows: the present invention provides a target retrieval method based on convolutional neural networks and supervised kernel hashing, which uses a convolutional neural network to learn large-scale image data features autonomously, enhancing the expressive power of the image features. Then, supervised kernel hashing performs supervised learning on the deep high-dimensional image features, mapping them into a low-dimensional Hamming space and generating compact Hash codes, which greatly improves target retrieval efficiency and enhances practicality in a big-data environment.
Description of the drawings
The present invention will be further explained below with reference to the attached drawings and embodiments.
Fig. 1 is a structural schematic diagram of the convolutional neural network of the present invention;
Fig. 2 is a structure chart of the convolutional neural network of the present invention used for deep image feature extraction;
Fig. 3 is a schematic diagram of the target retrieval mAP of the present invention on the ImageNet-100 database;
Fig. 4 is a schematic diagram of the training time of the present invention as a function of the Hash code bit number r;
Fig. 5 is a schematic diagram of the target retrieval mAP of the present invention on the Caltech-256 database;
Fig. 6 is a schematic diagram of the Precision-Recall curves of the present invention on the Caltech-256 database.
Specific implementation mode
First, a convolutional neural network is introduced to learn from the training images, using its special network structure to implicitly learn a high-order representation of the image data and generate deep features with stronger discrimination and expressive power.
Secondly, supervised kernel hashing (Kernel-Based Supervised Hashing, KSH) is introduced to strengthen the ability to discriminate linearly inseparable data; a simpler and more effective objective function is proposed using the equivalence between the Hash-code inner product and the Hamming distance, and supervised learning is performed on the high-dimensional image features in combination with the affinity information of the training images to generate compact Hash codes.
Finally, the trained hash functions are used to construct the image index, realizing efficient retrieval over large-scale image data.
In the target retrieval method based on convolutional neural networks and supervised kernel hashing, the deep feature of an image is first extracted with the convolutional neural network structure. The input image size of the network is 227 × 227, the output is a 4096 × 1 deep image feature, and the network contains 5 convolutional layers and 3 subsampling layers. In a convolutional layer, each feature map of the previous layer is convolved with a learnable kernel K_{ij}, and the result passes through a nonlinear function f(·) to produce this layer's feature map, as in formula (1) above, where * denotes the convolution operation, b_j is a bias, the kernel K_{ij} may form a convolution relation with one or more feature maps of the previous layer, and M_j denotes the set of input feature maps. Common nonlinear functions are f(x) = tanh(x) and f(x) = (1 + e^{-x})^{-1}; compared with these, f(x) = max(0, x) can effectively improve training efficiency. The feature map size h_l produced by a convolutional layer is computed by formula (2), where h_{l-1} is the size of the feature maps of layer l−1, z_l the kernel size of layer l, λ_l the kernel stride, and ρ_l the number of zero-padding columns. Here the kernel sizes are Z = {z_1 = 11, z_2 = 5, z_3 = z_4 = z_5 = 3}, the strides Λ = {λ_1 = 4, λ_2 = λ_3 = λ_4 = λ_5 = 1}, and the zero-padding columns P = {ρ_1 = 0, ρ_2 = 2, ρ_3 = ρ_4 = ρ_5 = 1}. For the subsampling layers, the literature shows that, relative to traditional non-overlapping sampling, overlapping sampling not only improves the accuracy of the features but also prevents over-fitting during training; therefore overlapping max pooling is used on the feature maps, with a 3 × 3 sampling region and a stride of 2 pixels.
The training of the convolutional neural network is divided into a forward-propagation and a back-propagation stage:
(1) Forward-propagation stage. A sample (X_p, Y_p) is chosen from the training set, and X_p is transformed layer by layer from the input layer to the output layer to compute the actual output:
O_p = F_n(…(F_2(F_1(X_p W^{(1)}) W^{(2)})…) W^{(n)}) (3)
(2) Back-propagation stage, also called the error-propagation stage. The error between the actual output O_p and the corresponding ideal output Y_p is computed as in formula (4).
The error E_p is pushed back layer by layer to obtain the error of each layer, and the neuron weights are adjusted so as to minimize the error. When the total error satisfies E ≤ ε, training of the batch is complete. After all batches have been trained, an image fed into the convolutional neural network passes through each network layer in turn, and its deep feature is obtained at the output.
In the target retrieval method based on convolutional neural networks and supervised kernel hashing, when measuring the distance between images, given the Hash code dimension r, r coefficient vectors a_1, …, a_r are needed to construct the hash functions. The label information of the training images can be obtained from the semantic relevance and spatial distance of the images: label(x_i, x_j) = 1 indicates that images x_i, x_j are similar, while label(x_i, x_j) = −1 indicates that x_i, x_j are very different. To describe the correlations among the elements of the label image set χ_l = {x_1, …, x_l}, the check matrix of formula (5) is defined, where label(x_i, x_i) ≡ 1, S_ii ≡ 1, and S_ij = 0 indicates that the similarity between x_i and x_j is uncertain. To strengthen the separating capacity of the Hash codes, so that the similarity between images can be judged efficiently in Hamming space, the Hamming distance D_h(x_i, x_j) should as far as possible satisfy formula (6).
Because the Hamming distance formula D_h(x_i, x_j) = |{k | h_k(x_i) ≠ h_k(x_j), 1 ≤ k ≤ r}| has a complicated form that is hard to optimize directly, the Hash code distance is computed here via the vector inner product. With the Hash code of image x written code_r(x) = [h_1(x), …, h_r(x)] ∈ {1, −1}^{1×r}, the distance between images x_i, x_j is computed as:

D(x_i, x_j) = code_r(x_i) · code_r(x_j)
 = |{k | h_k(x_i) = h_k(x_j), 1 ≤ k ≤ r}| − |{k | h_k(x_i) ≠ h_k(x_j), 1 ≤ k ≤ r}|
 = r − 2 |{k | h_k(x_i) ≠ h_k(x_j), 1 ≤ k ≤ r}|
 = r − 2 D_h(x_i, x_j) (7)
Formula (7) shows that the Hash-code inner product operation and the Hamming distance operation are consistent, and D(x_i, x_j) ∈ [−r, r]; normalizing gives D̃(x_i, x_j) = D(x_i, x_j)/r. To make the distance between the similarity matrix (1/r) H_l H_l^T and the check matrix S minimal, the objective function of formula (8) is used, where ‖·‖_F denotes the Frobenius norm and H_l is the Hash code matrix of the label image set χ_l. Extending sgn(·) to matrices, H_l can be expressed as in formula (9), where K̄_l is the zero-centered kernel matrix. Substituting H_l into formula (8) gives formulas (10)-(11).
Compared with BRE and MLH, the objective function Γ(A) computes similarity by inner products, which makes the modeling of the parameter A more intuitive. Assume that at time t = k the vectors a_1*, …, a_{k−1}* are known and a_k is to be estimated; defining the matrix R_{k−1} = rS − Σ_{t=1}^{k−1} sgn(K̄_l a_t*) sgn(K̄_l a_t*)^T with R_0 = rS, a_k can be estimated step by step with a greedy algorithm minimizing formula (12).
Removing the constant terms yields the more concise objective function of formula (13). Since the sgn(x) function makes g(a_k) discontinuous, and g(a_k) is not convex either, it is difficult to minimize g(a_k) directly; when |x| > 6 the continuous function φ(x) = 2/(1 + e^{−x}) − 1 approximates sgn(x) very well, so φ(x) replaces sgn(x) and the approximate objective function g̃(a_k) is as shown in formula (14).
g̃(a_k) can be minimized by gradient descent; its gradient with respect to a_k is given by formula (15), where ⊙ denotes the Hadamard product. The smoothed g̃ is still not a convex function, so the globally optimal solution cannot be guaranteed; to accelerate the convergence of g̃, the present invention uses the spectral analysis method from spectral hashing to generate the initial value a_k^0, further speeding up the gradient search. The concrete implementation steps are as follows:
Supervised kernel hashing implementation steps:
Input: training image set χ = {x_i}, label image set χ_l with check matrix S ∈ R^{l×l}, kernel function κ, Hash code bit number r, and the number m (< l) of samples used for training.
Preprocessing: randomly select m images from χ and compute the zero-centered kernel matrix K̄_l.
Training: initialize R_0 = rS, T_max = 500;
for k = 1, …, r do
  obtain the initial value a_k^0 by spectral analysis; substitute a_k^0 into the objective function g̃(a_k) and compute a_k* by gradient descent;
  h* ← sgn(K̄_l a_k*);
  R_k ← R_{k−1} − h*(h*)^T;
end for
Coding: for i = 1, …, n do code_r(x_i) = [h_1(x_i), …, h_r(x_i)];
Output: hash functions H = {h_k(x) | k ∈ [1, r]} and Hash table H̃ = {code_r(x_i) | i ∈ [1, n]}.
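The steps above can be sketched end to end on toy data; random initial values replace the spectral-analysis initial value a_k^0 for brevity, and all data, kernel, and size choices are illustrative assumptions:

```python
import numpy as np

# Greedy supervised kernel hashing sketch: learn r bits one at a time by gradient
# descent on g~(a_k) = -(phi(Kbar a_k))^T R_{k-1} (phi(Kbar a_k)),
# then update R_k = R_{k-1} - h* (h*)^T after each bit.
rng = np.random.default_rng(2)
l, m, r, steps, lr = 40, 8, 16, 200, 1e-3
labels = rng.integers(0, 4, size=l)
S = np.where(labels[:, None] == labels[None, :], 1.0, -1.0)        # check matrix
X = rng.normal(size=(l, 3)) + labels[:, None]                      # toy clustered features
anchors = X[rng.choice(l, size=m, replace=False)]
K = np.exp(-((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1))  # RBF kernel values
Kbar = K - K.mean(axis=0)                                          # zero-centered kernel matrix
phi = lambda x: 2.0 / (1.0 + np.exp(-x)) - 1.0

R = r * S                                                          # R_0 = r S
A = np.zeros((m, r))
for k in range(r):
    a = rng.normal(size=m)                 # initial value (random, not spectral)
    for _ in range(steps):                 # gradient descent on g~(a_k)
        b = phi(Kbar @ a)
        a += lr * (Kbar.T @ ((R @ b) * (1.0 - b * b)))
    A[:, k] = a
    h = np.sign(Kbar @ a); h[h == 0] = 1
    R = R - np.outer(h, h)                 # R_k = R_{k-1} - h* (h*)^T

H = np.sign(Kbar @ A); H[H == 0] = 1       # Hash table: l codes of r bits each
agreement = np.mean(((H @ H.T) > 0) == (S > 0))
print("pairwise sign agreement with S:", agreement)
```

On this toy set the inner products H H^T should mostly agree in sign with the check matrix S, reflecting the objective of formula (8).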
After the hash functions H and the Hash table H̃ are obtained, the deep feature of a query image is hash-mapped to obtain code_r(x_q); the distances between code_r(x_q) and the Hash codes in the table H̃ are computed, and the images with smaller distances are returned as retrieval results.
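A minimal query sketch against such a table, ranking by Hamming distance via the inner-product identity of formula (7); the table contents here are random stand-ins for real codes:

```python
import numpy as np

# Rank database images by Hamming distance to a query code via the inner product.
rng = np.random.default_rng(3)
n, r = 100, 32
table = rng.choice([-1, 1], size=(n, r))   # Hash table: code_r(x_i) per image
query = table[7].copy()
query[:3] *= -1                            # a query exactly 3 bits from image 7
d_hamming = (r - table @ query) // 2       # D_h = (r - code . code_q) / 2
ranking = np.argsort(d_hamming)
print(ranking[0], int(d_hamming[ranking[0]]))  # nearest image and its distance
```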
Embodiment one: this embodiment is the target retrieval method based on convolutional neural networks and supervised kernel hashing. First, a convolutional neural network is introduced to learn from the training images, using its special network structure to implicitly learn a high-order representation of the image data and generate deep features with stronger discrimination and expressive power. Then, supervised kernel hashing (Kernel-Based Supervised Hashing, KSH) is introduced to strengthen the ability to discriminate linearly inseparable data; a simpler and more effective objective function is proposed using the equivalence between the Hash-code inner product and the Hamming distance, and supervised learning is performed on the high-dimensional image features in combination with the affinity information of the training images to generate compact Hash codes. Finally, the trained hash functions are used to construct the image index, realizing efficient retrieval over large-scale image data.
Compared with traditional target retrieval methods, this effectively enhances the autonomous learning of image features, and supervised kernel hashing generates compact Hash codes, reducing the time overhead and enhancing practicality on large-scale data.
Embodiment two:Referring to Fig. 1, Fig. 2, the target retrieval based on convolutional neural networks and supervision core Hash of the present embodiment Method generates the further feature of image data using following step:
First, the further feature of convolutional neural networks structure extraction image, the input figure of convolutional neural networks used are utilized Picture size is 227 × 227, exports the image further feature for 4096 × 1, includes 5 convolutional layers, 3 sub- sample levels altogether. Convolutional layer, the characteristic pattern of preceding layerWith the convolution kernel K that can learnijConvolution is carried out, the result of convolution is through nonlinear function f () generates the characteristic pattern of this layerConcrete form is as follows:
Wherein,For first of convolutional layer ClOutput,Represent convolution algorithm, bjFor biasing, convolution kernel KijCan with it is previous The one or more features figure of layer determines convolution relation, MjInput feature vector set of graphs is represented, common nonlinear function has f (x) =tanh (x) and f (x)=(1+e-x)-1, compared with above-mentioned nonlinear function, f (x)=max (0, x) can effectively improve trained effect Rate.The characteristic pattern size h that convolutional layer generateslIt can be calculated by formula (2):
Wherein, hl-1For the size of l-1 layers of characteristic pattern, zlIndicate the size of l layers of convolution kernel, λlIt is convolution kernel movement step It is long, ρlTo the columns of preceding layer characteristic pattern edge zero padding when expression convolution algorithm.Here, each layer convolution kernel size Z={ z1=11, z2=5, z3=z4=z5=3 }, moving step length Λ={ λ1=4, λ2345=1 }, characteristic pattern edge zero padding columns P= {ρ1=0, ρ2=2, ρ345=1 }.In sub-sampling layer, document shows, relative to traditional non-overlapping sampling, to use overlapping Sampling can not only improve the accuracy of feature, be also prevented from the training stage and over-fitting occur, therefore, use overlap sampling here Method carries out maximum value sampling to characteristic pattern, and sampling area is 3 × 3, and sampling step length is 2 pixels.
The training of convolutional neural networks is mainly divided to two stages of propagated forward and back-propagating:
(1) the propagated forward stage.Sample (X, a Y are chosen from training samplep), X is from input layer through converting biography step by step It is sent to output layer, calculates corresponding reality output:
Op=Fn(…(F2(F1(XpW(1))W(2))…)W(n)) (3)
(2) back-propagating stage, also referred to as error propagation stage.Calculate reality output OpY is exported with corresponding idealpError:
By error EpReversely successively pusher obtains the error of each layer, and adjusts neuron weights by error approach is minimized, As overall error E≤ε, the training of the batch training sample is completed.After the completion of the training of all batches, by image input convolution god Through in network, image data step by step by each network layer after, can be obtained the further feature of image in output end.
Between measuring image apart from when, give Hash codes dimension r, then need r coefficient vector a1,…,arConstruction is breathed out Uncommon functionThe label information of training image can pass through the semantic dependency and sky of image Between distance obtain, lable (xi,xj)=1 indicates image xi,xjIt is similar;Conversely, lable (xi,xjThe representative image of)=- 1 xi,xjIt is widely different.To describe label image collection χl={ x1,…,xlIn correlation between element, define check matrix
Wherein, lable (xi,xi) ≡ 1, Sii≡ 1, Sij=0 indicates image xi,xjBetween similitude it is uncertain.For enhancing The separating capacity of Hash codes so that the similitude between image can be efficiently judged in Hamming space, it should be as possible so that image xi,xjHamming distance Dh(xi,xj) meet:
Due to Hamming distance calculation formula Dh(xi,xj)=| k | hk(xi)≠hk(xj), 1≤k≤r } | form is complicated, very Hardly possible directly optimizes it, therefore utilizes inner product of vectors operation to calculate Hash codes distance herein.The Hash codes code of image xr (x)=[h1(x),…,hr(x)]∈{1,-1}1×r, then image xi,xjDistance calculate as shown in formula (11):
D(xi,xj)=coder(xi)·coder(xj)
=| k | hk(xi)=hk(xj),1≤k≤r}|-|{k|hk(xi)≠hk(xj),1≤k≤r}|
=r-2 | and k | hk(xi)≠hk(xj),1≤k≤r}|
=r-2Dh(xi,xj) (7)
Formula (7) shows that the Hash code inner product operation and the Hamming distance operation are consistent, with D(xi,xj) ∈ [−r, r]. Normalizing D(xi,xj) gives D̃(xi,xj) = D(xi,xj)/r ∈ [−1, 1]. To make the similarity matrix D̃ as close as possible to the supervision matrix S, the objective function is

Γ(A) = ‖ (1/r) Hl Hl^T − S ‖_F²    (8)

where ‖·‖_F denotes the matrix Frobenius norm and Hl = [code_r(x1); …; code_r(xl)] ∈ {1,−1}^(l×r) is the Hash code matrix of the labelled image set χl. Generalizing sgn(·) to matrix form, by formula (3) Hl can be expressed as

Hl = sgn(K̄l A)    (9)

where K̄l ∈ R^(l×m) is the (centred) kernel matrix between χl and the m sampled anchor points and A = [a1,…,ar] ∈ R^(m×r). Substituting Hl into formula (8) gives

Γ(A) = ‖ (1/r) sgn(K̄l A) sgn(K̄l A)^T − S ‖_F²    (10)
Compared with BRE and MLH, the objective function Γ(A) computes similarity through inner products, which models the parameter A more directly. Suppose that at step t = k the vectors a1*,…,a(k−1)* are already known and ak is to be estimated. Define the residual matrix

R(k−1) = rS − Σ_{t=1}^{k−1} sgn(K̄l at*) sgn(K̄l at*)^T,  with R0 = rS;

then ak can be estimated step by step with a greedy algorithm that minimizes formula (11):

g(ak) = ‖ sgn(K̄l ak) sgn(K̄l ak)^T − R(k−1) ‖_F²    (11)

Since every entry of sgn(K̄l ak) sgn(K̄l ak)^T lies in {1,−1}, its squared Frobenius norm is the constant l²; removing the constant terms yields a more succinct objective function:

g(ak) = − (sgn(K̄l ak))^T R(k−1) sgn(K̄l ak)    (13)

Since the sgn(x) function in the objective makes g(ak) discontinuous, and g(ak) is not a convex function, it is difficult to minimize g(ak) directly. When |x| > 6, the continuous function φ(x) = 2/(1 + e^(−x)) − 1 approximates sgn(x) well, so sgn(x) is replaced by φ(x), giving the approximate objective function g̃(ak) shown in formula (14):

g̃(ak) = − φ(K̄l ak)^T R(k−1) φ(K̄l ak)    (14)

g̃(ak) can be minimized by gradient descent; taking the gradient of g̃(ak) with respect to ak gives:

∇g̃(ak) = − K̄l^T ((R(k−1) b) ⊙ (1 − b ⊙ b)),  b = φ(K̄l ak)    (15)
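The surrogate φ(x) = 2/(1 + e^(−x)) − 1 and the derivative identity φ′(x) = (1 − φ(x)²)/2, which is what produces the Hadamard-product form of the gradient, can both be checked numerically. A small sketch (the sample points are arbitrary):

```python
import numpy as np

def phi(x):
    """Smooth surrogate for sgn(x): phi(x) = 2/(1 + exp(-x)) - 1."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

# For |x| > 6 the surrogate is within about 5e-3 of sgn(x) = 1
x = np.linspace(6.0, 12.0, 50)
assert np.all(np.abs(phi(x) - 1.0) < 1e-2)

# Derivative identity used in the gradient: phi'(x) = (1 - phi(x)^2) / 2
h = 1e-6
num = (phi(0.7 + h) - phi(0.7 - h)) / (2 * h)   # central-difference derivative
ana = (1.0 - phi(0.7) ** 2) / 2.0               # closed form
assert abs(num - ana) < 1e-8
```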
where b = φ(K̄l ak) and ⊙ denotes the Hadamard (element-wise) product. After smoothing, g̃(ak) is still not a convex function, so a globally optimal solution cannot be obtained. To accelerate the convergence of g̃(ak), the present invention uses the spectral analysis method of Spectral Hashing to generate an initial value, further speeding up the gradient search. After the hash functions and the Hash table H have been obtained, the deep feature of a query image is hash-mapped to code_r(xq); the distances between code_r(xq) and the Hash codes in the Hash table H are computed, and the images with the smallest distances are returned as the retrieval result.
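The greedy, bit-by-bit training loop described above can be sketched compactly on toy data: kernel features against a few anchor points, a spectral-style initial value, gradient descent on the smoothed objective, and the residual update. The RBF kernel, anchor count, step size and iteration count are all illustrative assumptions, and the top-eigenvector initializer merely stands in for the Spectral Hashing initialization used in the patent.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy labelled set: two Gaussian clusters in 2-D
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
l, r, m = len(X), 8, 10                 # set size, code length r, anchors m << l

anchors = X[rng.choice(l, m, replace=False)]

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf(X, anchors)
Kbar = K - K.mean(axis=0)               # centred kernel features

S = np.where(labels[:, None] == labels[None, :], 1, -1)   # supervision matrix
R = float(r) * S                        # residual, R_0 = r * S

def phi(x):
    x = np.clip(x, -60.0, 60.0)         # avoid overflow in exp
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

A = np.zeros((m, r))
for k in range(r):
    # Spectral-style initial value (assumption): top eigenvector of Kbar^T R Kbar
    w, V = np.linalg.eigh(Kbar.T @ R @ Kbar)
    a = V[:, -1]
    for _ in range(100):                # gradient descent on the smoothed objective
        b = phi(Kbar @ a)
        grad = -Kbar.T @ ((R @ b) * (1.0 - b * b))   # formula-(15)-style gradient
        a = a - 1e-3 * grad
    A[:, k] = a
    h = np.sign(Kbar @ a)
    h[h == 0] = 1
    R = R - np.outer(h, h)              # greedy residual update

H = np.sign(Kbar @ A)                   # final Hash codes of the labelled set
H[H == 0] = 1
```

On this well-separated toy set, same-class pairs end up closer in Hamming distance than different-class pairs, which is the behaviour the objective is designed to produce.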
Experimental results and analysis
The present invention is assessed on the ImageNet-1000 and Caltech-256 image sets. The ImageNet-1000 image set is a subset of the ImageNet image set and is the evaluation data set of the Large Scale Visual Recognition Challenge (LSVRC); it contains 1000 categories and 1.2 million images in total. The Caltech-256 image set is a data set frequently used in target classification tasks; it contains 256 categories and 30608 images in total, with at least 80 images in each category. The experimental hardware is a GTX Titan GPU device with 6 GB of memory, an Intel Xeon CPU, and a server with 16 GB of memory. The target retrieval performance indicators are precision and recall, defined as follows: precision is the proportion of retrieved images that are relevant to the query, and recall is the proportion of all relevant images in the database that are retrieved.
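The two indicators can be written down directly; a trivial sketch in which the counts are made-up numbers:

```python
def precision_recall(retrieved_relevant, n_retrieved, n_relevant):
    """Precision = relevant retrieved / total retrieved;
    recall    = relevant retrieved / total relevant in the database."""
    precision = retrieved_relevant / n_retrieved
    recall = retrieved_relevant / n_relevant
    return precision, recall

# 30 of 50 retrieved images are relevant; the database holds 80 relevant images
p, r = precision_recall(30, 50, 80)
```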
Influence of parameters
First, to verify how the retrieval performance of the supervised kernel hashing method (abbreviated KSH) varies with the Hash code length r, the present invention is tested on the ImageNet-1000 image set and compared with several current mainstream hashing methods, including LSH (Locality Sensitive Hashing), SKLSH (LSH with Shift-Invariant Kernels), SH (Spectral Hashing), DSH (Density Sensitive Hashing), PCA-ITQ (Iterative Quantization of PCA) and BRE (Binary Reconstructive Embedding). The experiment first randomly selects 50 classes from the ImageNet-1000 image set and extracts GIST features from these 50 classes of images; then, the features of 1000 randomly selected images from each class (50000 images in total) are used as the training set for supervised training of the hash functions, and the remaining images are used as query cases; finally, each hashing method is applied for retrieval. The experimental results obtained are shown in Figure 3.
As can be seen from Figure 3, the MAP value of each method increases as the Hash code length r increases; however, once r grows beyond a certain value, the increase in MAP gradually diminishes and tends to saturate. Comparing the target retrieval MAP values of the hashing methods shows that retrieval with the present invention (KSH) performs better than the other mainstream approaches. This is because the unsupervised hashing methods (such as LSH, SH, DSH and PCA-ITQ) and the supervised hashing method BRE do not make good use of the semantic information of the images when constructing hash functions, which leads to lower retrieval performance, whereas the KSH of the present invention introduces kernel hash functions to strengthen the discrimination of linearly inseparable data and trains the hash functions with the similarity information of the images, generating more compact Hash codes and thereby improving target retrieval performance.
The experiment then compares the hash function training time of the present invention with that of the current mainstream hashing methods, as shown in Figure 4. It is easy to see from Figure 4 that the unsupervised hashing methods consume less time than the supervised ones. This is because most unsupervised hashing methods take preserving the locality of the original features as the optimization objective, whereas supervised hashing methods use the semantic neighborhood information of the images as the supervision signal; their training process is therefore more complex and their time overhead larger than that of the unsupervised methods.
Analysis of experimental performance
To verify the effectiveness of the target retrieval method based on convolutional neural networks and supervised kernel hashing (abbreviated CNN+KSH), 50 classes of images were randomly selected from the ImageNet-1000 image set for the experiment, yielding Table 1. It is easy to see from Table 1 that the MAP value of retrieval based on deep features extracted by the CNN is more than 10% higher than that of retrieval based on global GIST features, showing that the deep image features extracted by the CNN are more discriminative and expressive. This is because the GIST extraction procedure is fixed and has no capability of independent learning, which limits its power of image representation, whereas the CNN can imitate the way the brain processes data when extracting features from an image, and its deep network structure can effectively mine the implicit relations within the image, enhancing the expressive power of the features. The online retrieval time of the present invention (CNN+KSH) is 1.775 × 10⁻⁴ seconds, comparable to the other mainstream methods, while its MAP value reaches 40.79%, a retrieval performance clearly higher than that of the other methods; the present invention therefore has strong applicability in a big data environment.
Table 1. Target retrieval MAP values and retrieval times of different methods (64 bits)
Comparing Figure 5 and Figure 6 shows that the MAP value of the present invention (CNN+KSH) is higher than that of the other mainstream methods under Hash codes of equal length, and that, under the condition of equal precision, CNN+KSH achieves a higher recall than the other methods. The methods in the literature all extract GIST features from the images and then introduce a hashing method to construct the index. Although GIST features obtain global structure and spatial context information by filtering the image with a multi-scale, multi-orientation Gabor filter bank, their granularity is rather coarse and they lack the capability of independent learning, which limits their power of image representation; moreover, the hashing methods in those documents only consider how to construct hash functions from the data features and do not make good use of the semantic information of the images, so their target retrieval performance is relatively low and they are difficult to adapt to large-scale target retrieval. The CNN+KSH method trains the CNN on the ImageNet-1000 image set, and the large number of training images makes the training of the CNN model parameters more adequate, so that the implicit relations within an image can be mined effectively and the expressive power of the features is enhanced; KSH supervises the training of the hash functions with the image similarity information while accelerating the search with a greedy algorithm and gradient descent, so that the present invention outperforms the current mainstream methods. The experimental results also show that the retrieval performance of the present invention (CNN+KSH) on the Caltech-256 image set is clearly better than that of the other methods. Finally, for different query images, the present invention retrieves more images relevant to the query than the other methods do.
1.1 convolutional neural networks
A convolutional neural network (Convolutional Neural Network, CNN) was first put forward by Fukushima. It is the first learning algorithm to truly succeed in training a multi-layer network structure, and it is widely used to solve the problem of extracting and learning the deep features of image data. Its basic idea is to use local receptive regions of the image as the input of the network; the information is then transferred layer by layer, each layer using a digital filter to obtain the most salient features, which are invariant to translation, rotation and scaling.
As shown in Figure 1, a CNN is a multi-layer deep neural network composed of several convolutional layers (C layers) and subsampling layers (S layers); generally each convolutional layer is followed by a subsampling layer, every layer consists of multiple two-dimensional planes, and the network structure can be described by Figure 1. A convolutional layer, also called a feature extraction layer, is composed of multiple feature maps, each of which consists of multiple independent neurons; different feature maps extract different features from the data of the previous layer. All neurons of the same feature plane share one or more connection weights, which drastically reduces the number of parameters that need to be trained and lowers the complexity of the CNN. A subsampling layer samples sub-regions of the feature maps, reducing the dimensionality of the high-dimensional features of the feature maps and preventing the "curse of dimensionality". This characteristic structure of twofold feature extraction makes the features invariant to transformations such as translation, scaling and deformation, and the deep network structure can effectively mine the implicit relations within the image, enhancing the expressive power of the features.
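Using the feature map size formula h_l = (h_(l−1) − z_l + 2ρ_l)/λ_l + 1 together with the kernel sizes, strides and zero padding listed in claim 2, the 227 × 227 input can be traced through the five convolutional layers. Placing the three 3 × 3, stride-2 overlapping pooling layers after conv1, conv2 and conv5 is an assumption made for this sketch, consistent with an AlexNet-style pipeline that ends in a 4096-dimensional feature:

```python
def conv_out(h, z, stride, pad):
    # Formula two: h_l = (h_{l-1} - z_l + 2*rho_l) / lambda_l + 1
    return (h - z + 2 * pad) // stride + 1

def pool_out(h, z=3, stride=2):
    # Overlapping max pooling: 3x3 window, stride 2 (claim 2)
    return (h - z) // stride + 1

h = 227                                 # input image size
sizes = []
convs = [(11, 4, 0), (5, 1, 2), (3, 1, 1), (3, 1, 1), (3, 1, 1)]  # (z, lambda, rho)
pool_after = {0, 1, 4}                  # assumption: pool after conv1, conv2, conv5
for i, (z, s, p) in enumerate(convs):
    h = conv_out(h, z, s, p)
    sizes.append(h)
    if i in pool_after:
        h = pool_out(h)
        sizes.append(h)
```

This reproduces the familiar 227 → 55 → 27 → 13 → 6 spatial progression, with 6 × 6 maps entering the fully connected stage.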
1.2 Kernelized locality-sensitive hashing (KLSH)
To enhance the ability of hash functions to discriminate linearly inseparable high-dimensional data χ = {x1,…,xn} ⊂ R^d, Kulis et al. use a kernel function κ : R^d × R^d → R to build a hash function h : R^d → {1,−1} that maps the high-dimensional data to Hash codes. The concrete form of the hash function is

h(x) = sgn( Σ_{j=1}^{m} aj κ(x^(j), x) − b )

where x^(1),…,x^(m) are m samples randomly selected from χ; to keep the Hash mapping fast, m is a constant much smaller than n. Besides preserving in the low-dimensional Hamming space the similarity of the original high-dimensional space, the hash function h(x) should also ensure that the generated Hash codes are balanced, i.e. h(x) should satisfy Σ_{i=1}^{n} h(xi) = 0. The bias b is therefore taken as the mean response over the data,

b = (1/n) Σ_{i=1}^{n} Σ_{j=1}^{m} aj κ(x^(j), xi)

and substituting this value of b into the hash function gives

h(x) = sgn( a^T k̄(x) ),  k̄(x) = k(x) − (1/n) Σ_{i=1}^{n} k(xi)

where a = [a1,…,am]^T and k(x) = [κ(x^(1), x),…,κ(x^(m), x)]^T. The centring term (1/n) Σ_i k(xi) can be obtained by precomputation, and in KLSH the coefficient vector a is an m-dimensional vector obtained by random sampling. The present invention instead learns the coefficient vector a in a supervised manner from the correlation information of the training data, constructing hash functions that are associated with the data, enhancing the distinctiveness of the generated Hash codes and improving retrieval precision.
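A minimal sketch of the Kulis-style kernelized hash function with the balance-inducing mean bias b described above; the RBF kernel, the data and the random coefficient vector are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

X = rng.normal(size=(100, 5))                      # database chi, n = 100
anchors = X[rng.choice(100, 10, replace=False)]    # m = 10 random samples, m << n

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

a = rng.normal(size=10)        # coefficient vector a (random, as in plain KLSH)
K = rbf(X, anchors)            # kappa(x^(j), x_i) for all data/anchor pairs
# Bias as the mean response, so that sum_i (a^T k(x_i) - b) = 0
b = (K @ a).mean()

def h(x):
    """Hash one or more points: h(x) = sgn(a^T k(x) - b)."""
    return np.where(rbf(x, anchors) @ a - b >= 0, 1, -1)

codes = h(X)                   # one hash bit per database point
```

Centring by the mean makes the pre-threshold responses sum to zero over the database, which is the balance condition the text imposes (exact bit balance would additionally require a median threshold).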
In light of the above-described preferred embodiments of the present invention, and guided by the foregoing description, those skilled in the art can make various changes and modifications without departing from the scope of the technical idea of the present invention. The technical scope of the present invention is not limited to the content of the specification and must be determined according to the claims.

Claims (3)

1. A target retrieval method based on convolutional neural networks and supervised kernel hashing, characterized by comprising the following steps:
(1) introducing a convolutional neural network to learn from the training images, implicitly learning a high-order representation of the image data through its special network structure and generating deep features;
(2) introducing a supervised kernel hashing method that enhances the discrimination of linearly inseparable data, proposing an objective function based on the equivalence between the Hash code inner product and the Hamming distance, performing supervised learning on the deep image features in combination with the similarity information of the training images, and generating Hash codes;
(3) constructing an image index with the trained hash functions, realizing retrieval over large-scale image data.
2. The target retrieval method based on convolutional neural networks and supervised kernel hashing according to claim 1, characterized in that: the input image size of the convolutional neural network is 227 × 227 and the output is a 4096 × 1 deep image feature; the network comprises 5 convolutional layers and 3 subsampling layers in total; in a convolutional layer, the feature maps x_i^(l−1) of the previous layer are convolved with learnable convolution kernels Kij, and the result of the convolution passes through a nonlinear function f(·) to generate the feature maps x_j^l of this layer, in the following concrete form:
Formula one: x_j^l = f( Σ_{i ∈ Mj} x_i^(l−1) * Kij + bj )
where x_j^l is the output of the l-th convolutional layer Cl, * represents the convolution operation, bj is a bias, the convolution kernel Kij can form a convolution relation with one or more feature maps of the previous layer, and Mj represents the set of input feature maps; common nonlinear functions are f(x) = tanh(x) and f(x) = (1 + e^(−x))^(−1); the feature map size hl generated by a convolutional layer can be calculated by formula two:
Formula two: hl = (h(l−1) − zl + 2ρl)/λl + 1
where h(l−1) is the size of the feature maps of layer l−1, zl denotes the size of the convolution kernels of layer l, λl is the moving step of the convolution kernel, and ρl denotes the number of columns of zero padding at the edges of the previous layer's feature maps during the convolution operation; the convolution kernel sizes of the layers are Z = {z1 = 11, z2 = 5, z3 = z4 = z5 = 3}, the moving steps are Λ = {λ1 = 4, λ2 = λ3 = λ4 = λ5 = 1}, and the numbers of zero-padding columns at the feature map edges are P = {ρ1 = 0, ρ2 = 2, ρ3 = ρ4 = ρ5 = 1}; the subsampling layers perform maximum-value sampling on the feature maps with an overlapping sampling method, the sampling area being 3 × 3 and the sampling step being 2 pixels;
the training of the convolutional neural network is mainly divided into two stages, forward propagation and back-propagation:
the forward propagation stage: a sample (Xp, Yp) is chosen from the training samples, and Xp is transmitted from the input layer to the output layer through stage-by-stage transformations, computing the corresponding actual output:
Formula three: Op = Fn(…(F2(F1(Xp W^(1)) W^(2))…) W^(n))
the back-propagation stage: this stage is the error propagation stage, computing the error between the actual output Op and the corresponding ideal output Yp:
Formula four: Ep = (1/2) Σ_j (y_j^p − o_j^p)²
the error Ep is propagated backwards layer by layer to obtain the error of each layer, and the neuron weights are adjusted so as to minimize this error; when the overall error satisfies E ≤ ε, training on that batch of training samples is complete; after all batches have been trained, an image is fed into the convolutional neural network, the image data passes through each network layer in turn, and the deep feature of the image is obtained at the output end.
3. The target retrieval method based on convolutional neural networks and supervised kernel hashing according to claim 1, characterized in that: when measuring the distance between images, given the Hash code length r, r coefficient vectors a1,…,ar are needed to construct the hash functions hk(x), k = 1,…,r; the label information of the training images can be obtained from the semantic relevance and the spatial distance of the images, and a supervision matrix S describing the correlation between the elements of the labelled image set χl = {x1,…,xl} is defined as:
Formula five: Sij = 1 if lable(xi,xj) = 1; Sij = −1 if lable(xi,xj) = −1; Sij = 0 otherwise;
the Hamming distance Dh(xi,xj) of images xi, xj is made to satisfy:
Formula six: Dh(xi,xj) is small for Sij = 1 and large for Sij = −1;
the Hash code distance is computed with a vector inner product; the Hash code of image x is code_r(x) = [h1(x),…,hr(x)] ∈ {1,−1}^(1×r), and the distance between images xi and xj is computed as shown in formula seven:
Formula seven: D(xi,xj) = code_r(xi) · code_r(xj) = r − 2 Dh(xi,xj);
D(xi,xj) ∈ [−r, r], and normalizing D(xi,xj) gives D̃(xi,xj) = D(xi,xj)/r; the objective function that makes the similarity matrix D̃ closest to the supervision matrix S is defined as:
Formula eight: Γ(A) = ‖ (1/r) Hl Hl^T − S ‖_F²
where ‖·‖_F denotes the matrix Frobenius norm and Hl ∈ {1,−1}^(l×r) is the Hash code matrix of the labelled image set χl; generalizing sgn(·) to matrix form, by formula (3) Hl can be expressed as:
Formula nine: Hl = sgn(K̄l A)
where K̄l is the kernel matrix between χl and the m sampled anchor points and A = [a1,…,ar]; substituting Hl into formula eight gives:
Formula ten: Γ(A) = ‖ (1/r) sgn(K̄l A) sgn(K̄l A)^T − S ‖_F²
Formula 11: Γ(A) = (1/r²) ‖ Σ_{k=1}^{r} sgn(K̄l ak) sgn(K̄l ak)^T − rS ‖_F²;
the residual matrix R(k−1) = rS − Σ_{t=1}^{k−1} sgn(K̄l at*) sgn(K̄l at*)^T is defined, where R0 = rS; ak can then be estimated step by step with a greedy algorithm that minimizes formula (11) bit by bit:
Formula 12: g(ak) = ‖ sgn(K̄l ak) sgn(K̄l ak)^T − R(k−1) ‖_F²;
removing the constant terms yields a more succinct objective function:
Formula 13: g(ak) = − (sgn(K̄l ak))^T R(k−1) sgn(K̄l ak);
replacing sgn(x) with φ(x) = 2/(1 + e^(−x)) − 1 gives the approximate objective function g̃(ak) shown in formula 14:
Formula 14: g̃(ak) = − φ(K̄l ak)^T R(k−1) φ(K̄l ak);
g̃(ak) can be minimized by gradient descent, and taking the gradient of g̃(ak) with respect to ak gives:
Formula 15: ∇g̃(ak) = − K̄l^T ((R(k−1) b) ⊙ (1 − b ⊙ b)), with b = φ(K̄l ak),
where ⊙ denotes the Hadamard product; to accelerate the convergence of g̃(ak), the spectral analysis method of Spectral Hashing is used to generate an initial value, further speeding up the gradient search; after the hash functions and the Hash table H have been obtained, the deep feature of a query image is hash-mapped to code_r(xq), the distances between code_r(xq) and the Hash codes in the Hash table H are computed, and the images with the smallest distances are returned as the retrieval result.
CN201810157751.2A 2018-02-24 2018-02-24 Target retrieval method based on convolutional neural networks and supervision core Hash Pending CN108304573A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810157751.2A CN108304573A (en) 2018-02-24 2018-02-24 Target retrieval method based on convolutional neural networks and supervision core Hash


Publications (1)

Publication Number Publication Date
CN108304573A true CN108304573A (en) 2018-07-20

Family

ID=62848988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810157751.2A Pending CN108304573A (en) 2018-02-24 2018-02-24 Target retrieval method based on convolutional neural networks and supervision core Hash

Country Status (1)

Country Link
CN (1) CN108304573A (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503106A (en) * 2016-10-17 2017-03-15 北京工业大学 A kind of image hash index construction method based on deep learning


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ke Shengcai et al., "Image retrieval method based on convolutional neural networks and supervised kernel hashing", Acta Electronica Sinica *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241313A (en) * 2018-08-14 2019-01-18 大连大学 A kind of image search method based on the study of high-order depth Hash
CN109241313B (en) * 2018-08-14 2021-11-02 大连大学 Image retrieval method based on high-order deep hash learning
CN109800314A (en) * 2019-01-03 2019-05-24 上海大学 A method of generating the Hash codes for being used for image retrieval using depth convolutional network
CN109933682A (en) * 2019-01-11 2019-06-25 上海交通大学 A kind of image Hash search method and system based on semanteme in conjunction with content information
CN109885716A (en) * 2019-02-18 2019-06-14 成都快眼科技有限公司 The image search method of discrete Hash is supervised based on heterogeneous multi-task learning depth
CN109947963A (en) * 2019-03-27 2019-06-28 山东大学 A kind of multiple dimensioned Hash search method based on deep learning
CN110083734A (en) * 2019-04-15 2019-08-02 中南大学 Semi-supervised image search method based on autoencoder network and robust core Hash
CN110083734B (en) * 2019-04-15 2024-05-03 中南大学 Semi-supervised image retrieval method based on self-coding network and robust kernel hash
CN113469695A (en) * 2020-03-30 2021-10-01 同济大学 Electronic fraud transaction identification method, system and device based on kernel supervision Hash model
CN113469695B (en) * 2020-03-30 2023-06-30 同济大学 Electronic fraud transaction identification method, system and device based on kernel supervision hash model
CN111628866A (en) * 2020-05-22 2020-09-04 深圳前海微众银行股份有限公司 Neural network verification method, device and equipment and readable storage medium
WO2021233183A1 (en) * 2020-05-22 2021-11-25 深圳前海微众银行股份有限公司 Neural network verification method, apparatus and device, and readable storage medium
CN112861976B (en) * 2021-02-11 2024-01-12 温州大学 Sensitive image identification method based on twin graph convolution hash network
CN112861976A (en) * 2021-02-11 2021-05-28 温州大学 Sensitive image identification method based on twin graph convolution hash network
CN113011359B (en) * 2021-03-26 2023-10-24 浙江大学 Method for simultaneously detecting plane structure and generating plane description based on image and application
CN113011359A (en) * 2021-03-26 2021-06-22 浙江大学 Method for simultaneously detecting plane structure and generating plane description based on image and application
CN113343025A (en) * 2021-08-05 2021-09-03 中南大学 Sparse attack resisting method based on weighted gradient Hash activation thermodynamic diagram
CN113343025B (en) * 2021-08-05 2021-11-02 中南大学 Sparse attack resisting method based on weighted gradient Hash activation thermodynamic diagram
CN114189351A (en) * 2021-10-25 2022-03-15 山东师范大学 Dense image retrieval method and system based on CNN and signcryption technology
CN114189351B (en) * 2021-10-25 2024-02-23 山东师范大学 Dense state image retrieval method and system based on CNN and signature technology
CN114722208A (en) * 2022-06-08 2022-07-08 成都健康医联信息产业有限公司 Automatic classification and safety level grading method for health medical texts
CN114722208B (en) * 2022-06-08 2022-11-01 成都健康医联信息产业有限公司 Automatic classification and safety level grading method for health medical texts


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180720
RJ01 Rejection of invention patent application after publication