CN110019652A

CN110019652A - A cross-modal hash retrieval method based on deep learning

Info

Publication number: CN110019652A
Application number: CN201910196009.7A
Authority: CN
Inventors: 董西伟; 邓安远; 周军; 杨茂保; 孙丽; 胡芳; 贾海英; 王海霞
Original assignee: Jiujiang University
Current assignee: Jiujiang University
Priority date: 2019-03-14
Filing date: 2019-03-14
Publication date: 2019-07-16
Anticipated expiration: 2039-03-14
Also published as: CN110019652B

Abstract

A cross-modal hash retrieval method based on deep learning. Assume that the set of pixel feature vectors of the image modality of $n$ objects is $V=\{v_i\}_{i=1}^{n}$. The method is characterized by comprising the following steps: (1) using an objective function designed with deep learning techniques, obtain the binary hash codes $B$ shared by the image modality and the text modality, the deep neural network parameters $\theta_v$ and $\theta_t$ of the image modality and the text modality, and the projection matrices $P_v$ and $P_t$ of the image modality and the text modality; (2) solve the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternating updates; (3) based on the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$, generate binary hash codes for the query samples and for the samples in the retrieval sample set; (4) based on the generated binary hash codes, compute the Hamming distance from each query sample to every sample in the retrieval sample set; (5) complete the retrieval for the query samples with a cross-modal retriever based on approximate nearest neighbor search. The method effectively improves the performance of cross-modal hash retrieval.

Description

A cross-modal hash retrieval method based on deep learning

Technical field

The present invention relates to a cross-modal hash retrieval method based on deep learning.

Background art

With the rapid development of science, technology and social productivity, the era of big data has quietly arrived. Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within a reasonable time. IBM characterizes big data by five V's: Volume (large amounts of data), Variety (diverse types and sources), Value (low value density, yet sometimes very precious), Velocity (fast data growth) and Veracity (data quality). Big data can also be regarded as an information asset that requires new processing models in order to provide stronger decision-making power, insight discovery and process optimization capability.

Information retrieval is an important aspect of data processing, and how to retrieve information effectively has become an urgent and very challenging problem in the era of big data. For large-scale data retrieval, hash retrieval methods play an important role. A hash retrieval method maps the high-dimensional features of an object into Hamming space and generates a low-dimensional hash code to represent the object. This reduces the memory required by the retrieval system, increases retrieval speed, and is better suited to massive-scale retrieval. The main idea of hash retrieval is to project data represented by high-dimensional vectors into Hamming space and to search for the K nearest neighbors (K ≥ 1) there. For the K nearest neighbors in Hamming space to agree with those in the original space, a hash learning algorithm must be locality preserving, that is, it must keep the similarity between data points before and after projection. With Locality Sensitive Hashing (LSH), two points that are close in the high-dimensional space receive the same hash code with high probability after hashing; conversely, if two points are far apart, the probability that their hash codes are identical is very small.
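The locality-sensitive behavior described above can be illustrated with a minimal sketch. It assumes one particular LSH family, sign random projection (one bit per random hyperplane), which the text does not prescribe; the function names `lsh_codes` and `hamming`, the 32-bit code length and the sample vectors are all hypothetical.

```python
import numpy as np

def lsh_codes(X, n_bits, seed=0):
    """Sign-random-projection LSH (an assumed LSH family): one bit per random hyperplane."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    # A point gets +1 or -1 depending on which side of each hyperplane it lies.
    return np.where(X @ planes >= 0, 1, -1)

def hamming(a, b):
    """Number of positions where two codes differ."""
    return int(np.sum(a != b))

x = np.array([1.0, 0.5, -0.2, 0.3])
near = x + 0.01   # small perturbation of x
far = -x          # points in the opposite direction

codes = lsh_codes(np.stack([x, near, far]), n_bits=32)
d_near = hamming(codes[0], codes[1])   # expected to be small
d_far = hamming(codes[0], codes[2])    # every bit flips for the opposite vector
```

Nearby points fall on the same side of most hyperplanes, so their codes agree in most bits, while the opposite vector disagrees in every bit.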

Cross-modal hash retrieval is mainly used to solve mutual retrieval between data of different modalities, for example retrieving text with an image or retrieving images with text. A cross-modal hash retrieval method hashes the data of the different modalities into compact binary hash codes and then performs mutual retrieval between the modalities based on the generated codes. Ding et al. proposed the Collective Matrix Factorization Hashing (CMFH) method, which uses collective matrix factorization to learn unified hash codes from the different modalities of each instance. To make effective use of class information and preserve local geometric structure in matrix-factorization-based cross-modal hashing, and thereby improve the discriminative ability of the latent semantic features obtained by matrix factorization, Tang et al. proposed the Supervised Matrix Factorization Hashing (SMFH) method. When learning hash codes, SMFH considers not only the consistency of label information between modalities but also the consistency of local geometric structure within each modality. To address the excessively high training time complexity of many supervised cross-modal hashing methods, Zhang et al. proposed a supervised cross-modal hashing method called Semantic Correlation Maximization (SCM), which seamlessly integrates semantic label information into the hash learning process.

The hand-crafted features used by the above cross-modal hashing algorithms based on shallow learning structures may not be optimally compatible with hash code learning. To address this problem, Jiang et al. proposed the Deep Cross-modal Hashing (DCMH) method. DCMH is an end-to-end cross-modal hashing method built on a deep learning framework, which integrates feature learning and hash code learning into a single learning framework. To improve the quantizability of deep feature representations within an end-to-end learning framework, so that deep features can be quantized more effectively, Cao et al. introduced quantization into an end-to-end deep learning framework for cross-modal retrieval and proposed the Collective Deep Quantization (CDQ) method. Through a carefully designed hybrid network and loss function, CDQ jointly learns deep feature representations and quantizers for the two modalities. The hybrid network of CDQ comprises: an image network composed of multiple convolution-pooling layers for extracting image feature representations; a text network composed of multiple fully-connected layers for extracting text feature representations; two fully-connected bottleneck layers for generating optimal low-dimensional feature representations; an adaptive cross-entropy loss for capturing cross-modal correlation; and a collective quantization loss for controlling hash quality and quantizability. In addition, CDQ learns a quantizer codebook shared across modalities, which substantially strengthens the correlation between the two modalities. To effectively capture the intrinsic relationships between different modalities in an end-to-end deep learning framework for cross-modal retrieval, Yang et al. proposed the Pairwise Relationship Guided Deep Hashing (PRDH) method. By integrating different types of pairwise constraints, PRDH learns, from both the intra-modal and the inter-modal perspective, hash codes that better reflect the intrinsic relationships between modalities. PRDH also introduces a decorrelation constraint into the deep learning framework to enhance the discriminative ability of each bit of the hash code.

Cross-modal hash retrieval needs to map the high-dimensional feature data of an object in different modalities into a low-dimensional Hamming space, so that cross-modal information retrieval can be completed quickly and accurately using binary hash codes in Hamming space. Most existing cross-modal hash retrieval methods are based on shallow learning structures. Although these methods can complete retrieval tasks quickly using hash retrieval techniques, a shallow learning structure cannot mine the discriminative information in the original features well. Deep learning has shown excellent feature learning ability in tasks such as classification and object detection, and existing cross-modal hash retrieval methods based on deep learning also indicate that deep learning is beneficial for improving the performance of cross-modal retrieval. Designing a cross-modal hash retrieval method based on deep learning is therefore of great significance and value for cross-modal retrieval tasks in the big data setting.

Summary of the invention

The purpose of the invention is to provide a cross-modal hash retrieval method based on deep learning that solves the problem that existing cross-modal hash retrieval methods based on shallow learning structures cannot mine the discriminative information in the original features well.

To achieve the above purpose, the following technical solution is adopted: a cross-modal hash retrieval method based on deep learning. Assume that the set of pixel feature vectors of the image modality of $n$ objects is $V=\{v_i\}_{i=1}^{n}$, where $v_i$ denotes the pixel feature vector of the $i$-th object in the image modality. Let $T=\{t_i\}_{i=1}^{n}$ denote the feature vectors of these $n$ objects in the text modality, where $t_i$ denotes the feature vector of the $i$-th object in the text modality. The class label vectors of the $n$ objects are denoted $Y=\{y_i\}_{i=1}^{n}$ with $y_i\in\{0,1\}^{c}$, where $c$ is the number of object classes. For the vector $y_i$, if the $i$-th object belongs to the $k$-th class, the $k$-th element of $y_i$ is set to 1; otherwise the $k$-th element of $y_i$ is 0. The method comprises the following steps:

(1) using an objective function designed with deep learning techniques, obtain the binary hash codes $B$ shared by the image modality and the text modality, the deep neural network parameters $\theta_v$ and $\theta_t$ of the image modality and the text modality, and the projection matrices $P_v$ and $P_t$ of the image modality and the text modality;

(2) solve the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternating updates, that is, alternately solve the following three subproblems: fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$; fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$; fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$;

(3) based on the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$, generate binary hash codes for the query samples and for the samples in the retrieval sample set;

(4) based on the generated binary hash codes, compute the Hamming distance from each query sample to every sample in the retrieval sample set;

(5) complete the retrieval for the query samples with a cross-modal retriever based on approximate nearest neighbor search.

The objective function designed with deep learning techniques in step (1) has the following form:

where $\gamma_1$ and $\gamma_2$ are non-negative balance factors, $B=[b_1,b_2,\ldots,b_n]^{T}\in\{-1,+1\}^{n\times k}$ is the binary hash code matrix, $P_v$ and $P_t$ are the projection matrices, $\theta_v$ and $\theta_t$ are the deep neural network parameters, $F$ and $G$ are the deep features of the $n$ objects in the image modality and the text modality respectively, the $i$-th columns of $F$ and $G$ are $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ respectively, $L$ is a Laplacian matrix used to preserve intra-modal and inter-modal consistency, $\mathbf{1}$ is the column vector whose elements are all 1, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, and $(\cdot)^{T}$ denotes the transpose of a matrix.

Solving the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternating updates in step (2) specifically means alternately solving the following three subproblems:

(1) Fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$. When the binary hash codes $B$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the deep neural network parameters $\theta_v$ and $\theta_t$, namely:

(2) Fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$. When the binary hash codes $B$ and the deep neural network parameters $\theta_v$ and $\theta_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the projection matrices $P_v$ and $P_t$, namely:

(3) Fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$. When the deep neural network parameters $\theta_v$ and $\theta_t$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the binary hash codes $B$, namely:

The unknown variable $B$ in formula (4) is solved with a discrete hashing algorithm based on singular value decomposition.

In step (3), binary hash codes are generated for the query samples and for the samples in the retrieval sample set, based on the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$. Specifically, a query sample of the image modality and a query sample of the text modality are each given by a feature vector, and the image-modality and text-modality retrieval sample sets are each given by the features of their samples. The binary hash codes of the query samples and of the retrieval-set samples of both modalities are obtained by applying the sign function $\operatorname{sign}(\cdot)$ to the projected deep features of the corresponding modality.

In step (4), the Hamming distance from each query sample to every sample in the retrieval sample set is computed from the generated binary hash codes. Specifically, the Hamming distance is computed from an image-modality query sample to each sample in the text-modality retrieval sample set, and from a text-modality query sample to each sample in the image-modality retrieval sample set.

In step (5), the retrieval for a query sample is completed with a cross-modal retriever based on approximate nearest neighbor search. Specifically, the computed Hamming distances are sorted in ascending order, and the samples corresponding to the K smallest distances in the text-modality (or image-modality) retrieval sample set are returned as the retrieval result.

Beneficial effect

Compared with the prior art, the present invention has the following advantages.

1. While maintaining retrieval speed, the method uses a deep learning structure to mine more discriminative information, so that cross-modal retrieval can be completed more accurately.

2. By enforcing intra-modal and inter-modal consistency, the method carries the useful information of the original feature space over to Hamming space, which promotes the mining of discriminative information and improves retrieval performance.

3. The proposed discrete hashing algorithm based on singular value decomposition gives the learned binary hash codes more favorable properties, which in turn effectively improves the performance of cross-modal hash retrieval.

Brief description of the drawings

Fig. 1 is the workflow diagram of the cross-modal hash retrieval method based on deep learning proposed by the present invention.

Detailed description of the embodiments

The technical solution of the present invention is described in further detail below with reference to the accompanying drawing.

The invention discloses a cross-modal hash retrieval method based on deep learning. As shown in Fig. 1, its implementation mainly comprises the following steps. Assume that the set of pixel feature vectors of the image modality of $n$ objects is $V=\{v_i\}_{i=1}^{n}$, where $v_i$ denotes the pixel feature vector of the $i$-th object in the image modality. Let $T=\{t_i\}_{i=1}^{n}$ denote the feature vectors of these $n$ objects in the text modality, where $t_i$ denotes the feature vector of the $i$-th object in the text modality. The class label vectors of the $n$ objects are denoted $Y=\{y_i\}_{i=1}^{n}$ with $y_i\in\{0,1\}^{c}$, where $c$ is the number of object classes. For the vector $y_i$, if the $i$-th object belongs to the $k$-th class, the $k$-th element of $y_i$ is set to 1; otherwise the $k$-th element of $y_i$ is 0.
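As a small illustration of the class label vectors just defined, the sketch below builds 0/1 label vectors for three objects. The helper name `label_vector` and the toy class assignments are hypothetical, and the code uses 0-based class indices where the text counts classes from 1.

```python
def label_vector(class_ids, c):
    """Return a label vector of length c with a 1 at every class the object belongs to."""
    y = [0] * c
    for k in class_ids:
        y[k] = 1
    return y

# Three objects and c = 4 classes; the third object belongs to two classes.
Y = [label_vector(ids, 4) for ids in ([0], [2], [0, 3])]
```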

(1) Construction of the deep-learning-based cross-modal hash retrieval objective function

The purpose of the method is to learn hash functions for the image modality and the text modality from the feature data $V$ and $T$ of the two modalities and the class label information of the objects, and to use the learned hash functions to generate the binary hash codes needed for cross-modal hash retrieval. Learning cross-modal hashing directly from the feature data $V$ and $T$ makes it hard to mine discriminative information from the original features and thus to generate high-quality binary hash codes. To better mine discriminative information from the original feature data of the image modality and the text modality, the method builds a deep neural network (DNN) for each of the two modalities to learn deep features.

For the image modality, the method uses a seven-layer convolutional neural network (CNN) adapted from AlexNet to learn deep image features. This CNN model is described in detail below.

The CNN model for learning deep image features contains five convolutional layers and two fully-connected layers, denoted "Conv1-Conv5" and "Fc6-Fc7" respectively. The network takes the pixel features of the image modality as input. The first convolutional layer, Conv1, filters the 227 × 227 × 3 input image with 96 kernels of size 11 × 11 × 3 at a stride of 4 pixels. After Rectified Linear Unit (ReLU) activation, max pooling and Local Response Normalization (LRN), an output feature of size 27 × 27 × 96 is obtained. The second convolutional layer, Conv2, takes the output of Conv1 as input and filters it with 256 kernels of size 5 × 5 × 96. Likewise, after ReLU, max pooling and LRN, an output feature of size 13 × 13 × 256 is obtained. The third, fourth and fifth convolutional layers, Conv3, Conv4 and Conv5, use 384, 384 and 256 kernels of size 3 × 3 × 256, 3 × 3 × 384 and 3 × 3 × 384 respectively, and each layer is followed by ReLU activation. After max pooling, the output of Conv5 is a feature of size 6 × 6 × 256. The fully-connected layer Fc6 has 4096 neurons and uses dropout with a ratio of 0.5 to prevent over-fitting. Fc7 is a fully-connected layer with $d^{(v)}$ neurons and uses the hyperbolic tangent (TanH) function as its activation. Finally, Fc7 yields an output feature of size $d^{(v)}\times 1$.
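The feature-map sizes quoted above can be checked with the usual output-size formula for convolution and pooling windows. The sketch below is only an arithmetic check under assumed AlexNet-style settings that the text does not state: 3 × 3 max pooling with stride 2, padding 2 for Conv2, and padding 1 for Conv3 to Conv5. The helper `out_size` is hypothetical.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial size after a convolution or pooling window (floor division)."""
    return (n + 2 * pad - kernel) // stride + 1

conv1 = out_size(227, kernel=11, stride=4)     # Conv1: 11x11, stride 4
pool1 = out_size(conv1, kernel=3, stride=2)    # assumed 3x3 max pool, stride 2
conv2 = out_size(pool1, kernel=5, pad=2)       # Conv2: 5x5, assumed padding 2
pool2 = out_size(conv2, kernel=3, stride=2)

n = pool2
for _ in range(3):                             # Conv3..Conv5: 3x3, assumed padding 1
    n = out_size(n, kernel=3, pad=1)
pool5 = out_size(n, kernel=3, stride=2)

fc6_inputs = pool5 * pool5 * 256               # flattened input to Fc6
```

Under these assumptions the sizes after the three pooling stages are 27, 13 and 6, matching the 27 × 27 × 96, 13 × 13 × 256 and 6 × 6 × 256 features in the description.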

For the text modality, the method uses a Multilayer Perceptron (MLP) composed of three fully-connected layers to build an MLP deep neural network that maps the features of the text modality from the original feature space to a semantic space. The three fully-connected layers of this network are denoted Fc1, Fc2 and Fc3. Following the way related literature builds MLP networks for text feature learning, the Fc1 and Fc2 layers use ReLU as the nonlinear activation function, while the Fc3 layer uses the hyperbolic tangent (TanH) function as its activation. The Fc3 layer has $d^{(t)}$ neurons, that is, the output feature of the MLP network used to learn deep text features has dimension $d^{(t)}$.
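A forward pass through such a three-layer text network can be sketched as follows. This is a minimal NumPy sketch, not the trained network: the hidden widths (512), the output dimension (64) and the random weights are hypothetical, and the 399-dimensional input follows the word-frequency features used in the experiments.

```python
import numpy as np

def mlp_forward(t, params):
    """Fc1 and Fc2 with ReLU, Fc3 with tanh, as described for the text network."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = np.maximum(0.0, W1 @ t + b1)   # Fc1 + ReLU
    h2 = np.maximum(0.0, W2 @ h1 + b2)  # Fc2 + ReLU
    return np.tanh(W3 @ h2 + b3)        # Fc3 + TanH, values in (-1, 1)

rng = np.random.default_rng(0)
sizes = [399, 512, 512, 64]             # hidden widths and output size are assumptions
params = [(0.01 * rng.standard_normal((o, i)), np.zeros(o))
          for i, o in zip(sizes[:-1], sizes[1:])]

g = mlp_forward(rng.random(399), params)  # deep text feature of one object
```

The tanh output keeps every feature component between -1 and +1, which suits the later sign-based binarization.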

For the $i$-th object, let $f(v_i;\theta_v)$ denote the output feature of the image-modality CNN, where $\theta_v$ denotes the parameters of the image-modality CNN, and let $g(t_i;\theta_t)$ denote the output feature of the text-modality MLP deep neural network, where $\theta_t$ denotes the parameters of the text-modality MLP deep neural network.

Assume that the deep features $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ of the $i$-th object in the image modality and the text modality are projected by the linear projection matrices $P_v$ and $P_t$, giving the projected features $P_v^{T}f(v_i;\theta_v)$ and $P_t^{T}g(t_i;\theta_t)$, where $(\cdot)^{T}$ denotes the transpose of a matrix. Assume further that $P_v^{T}f(v_i;\theta_v)$ and $P_t^{T}g(t_i;\theta_t)$ generate the binary hash codes of the $i$-th object in Hamming space for the image modality and the text modality respectively. Cross-modal hash learning can then be carried out through the following minimization problem:

where $\gamma_1$ and $\gamma_2$ are non-negative balance factors. The third term on the right-hand side is a regularization term that prevents over-fitting. The fourth term on the right-hand side encourages each bit of the hash code to take the values +1 and -1 with equal probability, which maximizes the information carried by each bit.

Intra-modal similarity reflects the neighborhood relationships between the data points formed by the feature vectors within each modality. The intra-modal similarity between two data points $v_i$ and $v_j$ of the image modality can be defined as:

where $N_{k_1}(v_i)$ denotes the set of $k_1$ nearest neighbors of the data point $v_i$, the distance between $v_i$ and $v_j$ is the Euclidean distance $\|v_i-v_j\|_2$, and $\|\cdot\|_2$ denotes the $\ell_2$ norm of a vector. The parameter $\sigma$ controls the decay rate of the similarity. Similarly, the intra-modal similarity of two data points $t_i$ and $t_j$ formed by two feature vectors of the text modality is defined as:

For each modality, in order to keep the local neighborhood structure of the data points in Hamming space consistent with that in the original feature space, that is, to preserve in Hamming space the neighborhood relationships each data point has in the original feature space, the following objective function can be designed:

Based on the class label information of the objects, the semantic association matrix between the data points $v_i$ ($i=1,2,\ldots,n$) of the image modality and the data points $t_j$ ($j=1,2,\ldots,n$) of the text modality can be defined as follows:

Note that $v_i$ and $t_j$ are regarded as having the same semantics as long as they share at least one class. To preserve the inter-modal consistency between the image modality and the text modality in Hamming space, the following objective function can be designed:

Combining the above analysis of deep feature learning for the image modality, deep feature learning for the text modality, and the preservation of intra-modal and inter-modal consistency, the objective function of the method can be designed as:

According to existing work, if data in different modality spaces have the same semantics, these data usually correspond to a common latent space. The present invention therefore assumes that features with the same semantics in the image modality and the text modality can ultimately be represented by the same binary hash code in a common Hamming space, that is, the hash codes of the two modalities coincide. Under this assumption, the optimization problem in formula (7) can be expressed as:

Wherein,

By a simple derivation, the corresponding term can be rewritten in the following form:

where $B=[b_1,b_2,\ldots,b_n]^{T}\in\{-1,+1\}^{n\times k}$, the $i$-th columns of the matrices $F$ and $G$ are $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ respectively, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. Applying the following derivation to the consistency term yields its equivalent form, namely:

where $L=D-W$ is the Laplacian matrix, $D$ is a diagonal matrix whose $i$-th diagonal element is $D_{ii}=\sum_{j}w_{ij}$, $w_{ij}$ is the element in row $i$ and column $j$ of the matrix $W$, and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. According to formula (12) and formula (13), formula (8) can be rewritten as:

(2) Solving the objective function

The objective function in formula (14) contains five unknown variables to be solved: the binary hash code matrix $B$, the linear projection matrices $P_v$ and $P_t$, and the deep neural network parameters $\theta_v$ and $\theta_t$. The objective function in formula (14) is non-convex in these five unknown variables jointly, so an analytic solution for all five cannot be obtained simultaneously. The unknown variables in formula (14) can instead be solved by alternately solving the following three subproblems: fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$; fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$; fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$.

(a) Fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$

When the binary hash codes $B$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the deep neural network parameters $\theta_v$ and $\theta_t$, namely:

The present invention uses the back-propagation (BP) algorithm to learn and update the DNN parameters $\theta_v$. As in most existing deep learning methods, $\theta_v$ is learned with stochastic gradient descent based on back-propagation. Specifically, in each iteration a mini-batch of training samples is drawn from the training set, and $\theta_v$ is then learned on the selected samples with stochastic gradient descent based on back-propagation. For each image-modality feature vector $v_i$ of the selected training samples, the gradient is first computed with the following formula:

Then the gradient with respect to $\theta_v$ is computed from the gradient obtained above using the chain rule. Finally, the computed gradient and the BP algorithm are used to update the DNN parameters $\theta_v$ of the image modality.

Algorithm 1 gives the procedure for solving the DNN parameters $\theta_v$ of the image modality.

Similarly, the deep neural network parameters $\theta_t$ of the text modality are learned and updated with stochastic gradient descent based on back-propagation. For each text-modality feature vector $t_i$ of the selected training samples, the following gradient is first computed:

Then the gradient with respect to $\theta_t$ is computed from the gradient obtained above using the chain rule. Finally, the computed gradient and the BP algorithm are used to update the DNN parameters $\theta_t$ of the text modality. The DNN parameters $\theta_t$ of the text modality can be learned with an algorithm similar to Algorithm 1.

(b) Fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$

When the binary hash codes $B$ and the deep neural network parameters $\theta_v$ and $\theta_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the projection matrices $P_v$ and $P_t$, namely:

Taking the partial derivatives of the objective in formula (18) with respect to $P_v$ and $P_t$ and setting them to 0 gives:

A simple derivation then yields:

$P_v=(FF^{T}+I+F\mathbf{1}\mathbf{1}^{T}F^{T})^{-1}FB$, (21)

$P_t=(GG^{T}+I+G\mathbf{1}\mathbf{1}^{T}G^{T})^{-1}GB$, (22)

where $I$ is the identity matrix and $(\cdot)^{-1}$ denotes the inverse of a matrix.
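The closed-form update in formulas (21) and (22) can be sketched in NumPy as follows. The code implements the expression exactly as printed, without balance factors. The function name `update_projection`, the toy dimensions and the random data are hypothetical. The final check uses an assumed least-squares objective, $\|B-F^{T}P\|_F^2+\|P\|_F^2+\|\mathbf{1}^{T}F^{T}P\|^2$, whose stationary point is the printed formula; the patent's own subproblem may weight these terms differently.

```python
import numpy as np

def update_projection(F, B):
    """P = (F F^T + I + F 1 1^T F^T)^{-1} F B, as printed in formulas (21)/(22).

    F: (d, n) deep features as columns; B: (n, k) codes in {-1, +1}.
    """
    d = F.shape[0]
    s = F.sum(axis=1, keepdims=True)        # F @ 1, shape (d, 1)
    A = F @ F.T + np.eye(d) + s @ s.T
    return np.linalg.solve(A, F @ B)        # solve instead of forming the inverse

rng = np.random.default_rng(0)
d, n, k = 8, 50, 16                          # toy sizes (assumptions)
F = np.tanh(rng.standard_normal((d, n)))
B = np.where(rng.standard_normal((n, k)) >= 0, 1.0, -1.0)
P = update_projection(F, B)

# Gradient of the assumed objective at P; it should vanish at the closed-form solution.
s = F.sum(axis=1, keepdims=True)
grad = 2 * F @ (F.T @ P - B) + 2 * P + 2 * s @ (s.T @ P)
```

Using `np.linalg.solve` avoids computing the matrix inverse explicitly, which is usually cheaper and numerically more stable.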

(c) Fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$

When the deep neural network parameters $\theta_v$ and $\theta_t$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the binary hash codes $B$, namely:

A simple derivation from formula (23) gives:

Because $P_v$, $P_t$, $\theta_v$ and $\theta_t$ are fixed, the terms that depend only on them are constants, and ignoring these terms in formula (24) does not affect the solution for $B$. In addition, because $B\in\{-1,+1\}^{n\times k}$, we have $\|B\|_F^{2}=nk$, which is also a constant. After discarding the constant terms, formula (24) becomes:

Wherein,

The unknown variable in formula (25) is discrete, so in general it is difficult to solve for it directly and obtain an analytic solution. The invention proposes a discrete hashing algorithm based on singular value decomposition to solve the optimization problem over the discrete variable $B$ in formula (25). The algorithm is described in detail below.

Performing singular value decomposition on the matrix $L$ yields a factorization whose middle factor is a diagonal matrix. Substituting this decomposition into formula (25) gives:

Consider the $i$-th rows of the matrix $B$ and of the other two matrices appearing in formula (26), together with the matrices formed by the remaining rows after these $i$-th rows are removed. The following is then obtained:

Similarly:

where $q_i$ denotes the $i$-th column of the matrix $Q$, and the remaining columns of $Q$ after removing $q_i$ form a further matrix.

According to formula (27) and formula (28), the unknown binary hash code matrix $B$ in formula (26) can be obtained by solving the following optimization problem in $b_i$ ($i=1,2,\ldots,n$), namely:

By a simple derivation, formula (29) can be converted into:

The optimization problem in formula (30) has the following analytic solution:

where $\operatorname{sign}(\cdot)$ denotes the sign function.

Algorithm 2 gives the discrete hashing algorithm based on singular value decomposition.

(3) Generating binary hash codes for the query samples and the samples in the retrieval sample set

A query sample of the image modality and a query sample of the text modality are each given by a feature vector, and the image-modality and text-modality retrieval sample sets are each given by the features of their samples. Using the solved projection matrices $P_v$ and $P_t$ and the solved deep neural network parameters $\theta_v$ and $\theta_t$ of the image modality and the text modality, the binary hash codes of the query samples and of the retrieval-set samples of both modalities are obtained by applying the sign function $\operatorname{sign}(\cdot)$ to the projected deep features of the corresponding modality.
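The encoding step can be sketched as taking the sign of the projected deep features, in line with the projected features $P_v^{T}f(v_i;\theta_v)$ and $P_t^{T}g(t_i;\theta_t)$ introduced earlier. In this minimal sketch the deep features are assumed to be already computed by the trained networks; the function name `hash_codes`, the toy matrices and the convention that a zero projection maps to +1 are assumptions.

```python
import numpy as np

def hash_codes(features, P):
    """Binarize projected deep features.

    features: (d, m) matrix with one deep feature per column.
    P: (d, k) projection matrix of the same modality.
    Returns an (m, k) matrix of codes in {-1, +1}; zeros map to +1 (assumed tie rule).
    """
    Z = features.T @ P                  # row j holds P^T f_j
    return np.where(Z >= 0, 1, -1)

# Toy example: two samples with 2-dimensional features and 2-bit codes.
features = np.array([[1.0, -1.0],
                     [0.5,  2.0]])
P = np.array([[1.0,  0.0],
              [0.0, -1.0]])
codes = hash_codes(features, P)
```

The same function serves both modalities: image codes use the image features with $P_v$, and text codes use the text features with $P_t$.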

(4) Computing the Hamming distance from a query sample to each sample in the retrieval sample set

For a query sample of the image modality, the Hamming distance to each sample in the text-modality retrieval sample set is computed from their binary hash codes. For a query sample of the text modality, the Hamming distance to each sample in the image-modality retrieval sample set is computed in the same way.
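For codes with entries in {-1, +1}, one standard way to compute the Hamming distance is the identity $d_H(b_1,b_2)=(k-b_1^{T}b_2)/2$, where $k$ is the code length. The sketch below uses this identity as an assumption; the patent's own formula is not reproduced in this text, and the function name `hamming_distance` is hypothetical.

```python
def hamming_distance(b1, b2):
    """Hamming distance between two codes with entries in {-1, +1}.

    Matching bits contribute +1 to the inner product and differing bits -1,
    so the number of differing bits is (k - <b1, b2>) / 2.
    """
    k = len(b1)
    inner = sum(x * y for x, y in zip(b1, b2))
    return (k - inner) // 2

query = [1, -1, 1, 1]
candidate = [1, 1, -1, 1]
dist = hamming_distance(query, candidate)   # the codes differ in positions 1 and 2
```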

(5) retrieval to query sample is completed using cross-module state searcher

For the task of retrieving text with an image, the computed Hamming distances are first sorted in ascending order; the samples in the text retrieval set that correspond to the K smallest distances are then taken as the retrieval result. Similarly, for the task of retrieving images with text, the computed Hamming distances are first sorted in ascending order; the samples in the image retrieval set that correspond to the K smallest distances are then taken as the retrieval result.
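The ranking step described above can be sketched as follows. The function name `top_k_retrieval` is hypothetical, and the code performs an exhaustive sort over all distances; breaking ties by the smaller retrieval-set index is an assumption, since the text does not say how equal distances are ordered.

```python
from typing import List, Sequence


def top_k_retrieval(distances: Sequence[int], k: int) -> List[int]:
    """Return the indices of the k retrieval-set samples with the smallest
    Hamming distance to the query, in ascending order of distance.

    sorted() is stable, so samples at equal distance keep index order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    order = sorted(range(len(distances)), key=lambda j: distances[j])
    return order[:k]


# Distances from one image query to five text retrieval samples.
dists = [5, 1, 3, 1, 7]
result = top_k_retrieval(dists, 3)  # indices of the 3 closest samples
```

The same routine serves both tasks: for Img2Txt the distances are to text retrieval samples, and for Txt2Img they are to image retrieval samples.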

The beneficial effects of the present invention are illustrated below with specific experiments.

The experiments on the method of the present invention were carried out mainly on the Pascal VOC 2007 dataset, which is briefly introduced first. The Pascal VOC 2007 dataset contains 9963 images belonging to 20 categories (for example, aeroplane, bottle, horse and sofa), and each image is annotated with labels. In the experiments, the dataset is divided into a training set of 5011 image-label pairs and a test set of 4952 image-label pairs. For deep cross-modal hashing methods, the image modality uses raw pixel features as input features. For methods that take hand-crafted features as input, 512-dimensional GIST features are used as input features. For the text modality, 399-dimensional word-frequency features are used as input features. Two cross-modal retrieval tasks are performed in the experiments: retrieving text with images and retrieving images with text, denoted Img2Txt and Txt2Img respectively.

The present invention uses mean average precision (MAP) to measure the performance of cross-modal hash retrieval methods. To obtain MAP, the average precision (AP) is first computed for each query sample. Once the AP of every query sample has been obtained, averaging all AP values gives the MAP.
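The text does not spell out the AP formula. The sketch below uses the common definition, in which AP is the mean of the precision values at the ranks where a relevant sample is retrieved. That definition, the function names, and the relevance criterion (for example, treating a retrieved sample as relevant when it shares a label with the query) are assumptions rather than details taken from the patent.

```python
from typing import List, Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """AP for one query, given relevance flags of its ranked retrieval list.

    relevance[r] is True when the sample at rank r + 1 is relevant.
    Returns 0.0 when the list contains no relevant sample.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_relevance: Sequence[Sequence[bool]]) -> float:
    """MAP: the average of the per-query AP values."""
    aps: List[float] = [average_precision(r) for r in all_relevance]
    return sum(aps) / len(aps) if aps else 0.0


# Two queries with ranked relevance flags.
ranked = [[True, False, True], [False, True]]
map_value = mean_average_precision(ranked)
```

For the first query the precision values at the relevant ranks are 1/1 and 2/3, and for the second query the single value is 1/2; MAP is the mean of the two resulting AP values.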

The method of the present invention uses mini-batch gradient descent with a momentum of 0.9 and a weight decay of 0.0001, and the batch size is set to 128. AlexNet pre-trained on the ImageNet dataset is used to initialize the first five layers of the image-modality deep neural network. The remaining parameters of the deep neural networks are initialized randomly. The output feature dimension of both the image-modality and the text-modality deep neural networks is set to 1024. In the experiments, 5-fold cross-validation is used to determine the optimal values of the parameters γ₁ and γ₂ of the method of the present invention. The parameters of the other methods are set according to the settings recommended for each method. The reported results are averages over 10 random runs.

The methods compared with the method of the present invention are Semantic Correlation Maximization (SCM), Supervised Matrix Factorization Hashing (SMFH), Deep Cross-Modal Hashing (DCMH), and Pairwise Relationship Guided Deep Hashing (PRDH). Table 1 lists the MAP of the method of the present invention and of the compared methods for cross-modal hash retrieval on the Pascal VOC 2007 dataset. As Table 1 shows, for both retrieval tasks and at all three hash code lengths, the deep cross-modal hash retrieval methods DCMH, PRDH and the method of the present invention achieve better retrieval performance than the shallow cross-modal hash retrieval methods SCM and SMFH. This indicates that using deep learning to learn the deep features from which the binary hash codes are generated is beneficial. Table 1 also shows that, for both the Img2Txt and Txt2Img retrieval tasks and at all three hash code lengths, the cross-modal retrieval performance of the method of the present invention is better than that of DCMH and PRDH. This indicates that the method of the present invention is an effective cross-modal hash retrieval method.

Table 1. MAP of each method on the Pascal VOC 2007 dataset

Claims

1. A cross-modal hash retrieval method based on deep learning, wherein a set of pixel feature vectors of n objects in the image modality is given, each element of which is the pixel feature vector of one object in the image modality; a set of feature vectors of the same n objects in the text modality is given, each element of which is the feature vector of one object in the text modality; and the class label vectors of the n objects are given, each having one element per object class; for the label vector of an object, if the object belongs to a given class, the corresponding element of the vector is 1, and otherwise that element is 0; characterized in that the method comprises the following steps:

(1) using an objective function designed on the basis of deep learning technology to obtain the binary hash codes B shared by the image modality and the text modality, the deep neural network parameters θ_v and θ_t of the image modality and the text modality, and the projection matrices P_v and P_t of the image modality and the text modality；

(2) solving for the unknown variables B, θ_v, θ_t, P_v and P_t in the objective function by alternating updates, that is, alternately solving the following three subproblems: fixing B, P_v and P_t and solving for θ_v and θ_t; fixing B, θ_v and θ_t and solving for P_v and P_t; fixing θ_v, θ_t, P_v and P_t and solving for B；

(3) based on the solved deep neural network parameters θ_v and θ_t of the image modality and the text modality and the solved projection matrices P_v and P_t, generating binary hash codes for the query sample and for the samples in the retrieval set；

2. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that the objective function designed on the basis of deep learning technology in step (1) has the following form:

, (1)

where the two balance factors are non-negative, P_v and P_t are the projection matrices, θ_v and θ_t are the deep neural network parameters, the deep features of the i-th object in the image modality and in the text modality are the i-th columns of the image-modality and text-modality deep feature matrices respectively, the Laplacian matrix is used to preserve intra-modal consistency and inter-modal consistency, 1 denotes the column vector whose elements are all 1, ‖·‖_F denotes the Frobenius norm of a matrix, tr(·) denotes the trace of a matrix, and the superscript T denotes the transpose of a matrix.

3. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that solving for the unknown variables B, θ_v, θ_t, P_v and P_t in the objective function by alternating updates in step (2) specifically comprises alternately solving the following three subproblems:

(1) fixing B, P_v and P_t and solving for θ_v and θ_t; when the binary hash codes B and the projection matrices P_v and P_t are fixed, the objective function shown in formula (1) reduces to a subproblem with respect to the deep neural network parameters θ_v and θ_t, namely:

(2)；

(2) fixing B, θ_v and θ_t and solving for P_v and P_t; when the binary hash codes B and the deep neural network parameters θ_v and θ_t are fixed, the objective function shown in formula (1) reduces to a subproblem with respect to the projection matrices P_v and P_t, namely:

(3)；

(3) fixing θ_v, θ_t, P_v and P_t and solving for B; when the deep neural network parameters θ_v and θ_t and the projection matrices P_v and P_t are fixed, the objective function shown in formula (1) reduces to a subproblem with respect to the binary hash codes B, namely:

(4)

The unknown variable B in formula (4) is solved for using the discrete hashing algorithm based on singular value decomposition.

4. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that generating binary hash codes for the query sample and for the samples in the retrieval set in step (3), based on the solved deep neural network parameters θ_v and θ_t of the image modality and the text modality and the solved projection matrices P_v and P_t, specifically comprises: given the feature vector of a query sample of the image modality, the feature vector of a query sample of the text modality, the features of the samples in the image-modality retrieval set and the features of the samples in the text-modality retrieval set, obtaining the binary hash codes of the image-modality and text-modality query samples and of the samples in the corresponding retrieval sets, where sign(·) is the sign function.

5. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that computing, based on the generated binary hash codes, the Hamming distance from the query sample to each sample in the retrieval set in step (4) specifically comprises: computing the Hamming distance from the query sample of the image modality to each sample in the text-modality retrieval set; and computing the Hamming distance from the query sample of the text modality to each sample in the image-modality retrieval set.

6. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that completing the retrieval for the query sample in step (5) using a cross-modal retriever based on approximate nearest neighbor search specifically comprises: sorting the computed Hamming distances in ascending order, and then taking from the text-modality (or image-modality) retrieval set the samples corresponding to the K smallest distances as the retrieval result.