CN110019652A - Cross-modal hash retrieval method based on deep learning - Google Patents

Publication number
CN110019652A
Authority
CN
China
Prior art keywords
sample
retrieval
image modalities
cross
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910196009.7A
Other languages
Chinese (zh)
Other versions
CN110019652B (en
Inventor
董西伟
邓安远
周军
杨茂保
孙丽
胡芳
贾海英
王海霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiujiang University
Original Assignee
Jiujiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiujiang University
Priority to CN201910196009.7A
Publication of CN110019652A
Application granted
Publication of CN110019652B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G06F16/316: Indexing structures
    • G06F16/325: Hash tables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51: Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A cross-modal hash retrieval method based on deep learning. Let $V=\{v_i\}_{i=1}^{n}$ be the set of pixel feature vectors of the image modality of $n$ objects. The method comprises the following steps: (1) using an objective function designed on the basis of deep learning to obtain the binary hash codes $B$ shared by the image and text modalities, the deep neural network parameters $\theta_v$ and $\theta_t$ of the image and text modalities, and the projection matrices $P_v$ and $P_t$ of the image and text modalities; (2) solving for the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternating updates; (3) generating binary hash codes for the query sample and for the samples in the retrieval sample set from the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$; (4) computing, from the generated binary hash codes, the Hamming distance from the query sample to each sample in the retrieval sample set; (5) completing the retrieval for the query sample with a cross-modal retriever based on approximate nearest neighbor search. The method effectively improves the performance of cross-modal hash retrieval.

Description

Cross-modal hash retrieval method based on deep learning
Technical field
The present invention relates to a cross-modal hash retrieval method based on deep learning.
Background art
With the rapid development of science, technology and social productivity, the era of big data has quietly arrived. Big data refers to data sets that cannot be captured, managed and processed with conventional software tools within an acceptable time. IBM characterizes big data by five Vs: Volume (large data volume), Variety (diverse types and sources), Value (low value density, yet sometimes very precious), Velocity (fast data growth) and Veracity (data quality). Big data can also be regarded as an information asset that requires new processing models in order to deliver stronger decision-making power, insight and process-optimization capability.
Information retrieval is an important aspect of data processing, and performing it effectively on big data is an urgent and very challenging problem of the big data era. Hash retrieval methods play an important role in large-scale data retrieval. A hash retrieval method maps the high-dimensional features of an object into Hamming space and represents the object with a low-dimensional hash code. This reduces the memory the retrieval system needs and increases retrieval speed, so the method is better suited to massive-scale retrieval. The main idea of hash retrieval is to project data represented by high-dimensional vectors into Hamming space and to perform K-nearest-neighbor (K ≥ 1) retrieval there. For the K nearest neighbors in Hamming space to agree with those in the original space, a hash learning algorithm must be locality-preserving, that is, it must preserve the similarity of the data before and after projection. Locality Sensitive Hashing (LSH) ensures that two points that are close in the high-dimensional space receive the same hash code with high probability after hashing, whereas two points that are far apart receive the same hash code with only a small probability.
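The locality-sensitive property can be illustrated with a minimal sketch. It assumes random-hyperplane (sign) hashing, which is only one common LSH family and is chosen here for illustration; the function name `lsh_codes` and all sizes are hypothetical.

```python
import numpy as np

def lsh_codes(X, n_bits, seed=0):
    """Hash each row of X to n_bits signs using random hyperplanes (one illustrative LSH family)."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))  # one random hyperplane per bit
    return np.where(X @ planes >= 0, 1, -1)

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
near = x + 0.01 * rng.standard_normal(64)   # small perturbation of x
far = -x                                    # point in the opposite direction
codes = lsh_codes(np.stack([x, near, far]), n_bits=32)
agree_near = int((codes[0] == codes[1]).sum())  # bits shared with the nearby point
agree_far = int((codes[0] == codes[2]).sum())   # bits shared with the distant point
```

The nearby point shares almost all bits with `x`, while the distant point shares far fewer, which is the behavior the paragraph above describes.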
Cross-modal hash retrieval mainly addresses retrieval across data of different modalities, for example retrieving text with an image or retrieving images with text. A cross-modal hash retrieval method hash-encodes the data of the different modalities into compact binary hash codes and then performs retrieval across modalities on those codes. Ding et al. proposed Collective Matrix Factorization Hashing (CMFH), which uses collective matrix factorization to learn unified hash codes from the different modalities of each instance. To make effective use of class information in matrix-factorization-based cross-modal hashing while preserving local geometric structure, and thereby improve the discriminative power of the latent semantic features obtained by factorization, Tang et al. proposed Supervised Matrix Factorization Hashing (SMFH). When learning hash codes, SMFH considers both the consistency of label information across modalities and the consistency of local geometric structure within each modality. To address the excessively high training-time complexity of many supervised cross-modal hashing methods, Zhang et al. proposed a supervised cross-modal hashing method called Semantic Correlation Maximization (SCM), which integrates semantic label information seamlessly into the hash learning process.
The hand-crafted features used by the above cross-modal hashing algorithms, which are based on shallow learning structures, may not be optimally compatible with hash code learning. To solve this problem, Jiang et al. proposed Deep Cross-modal Hashing (DCMH), an end-to-end cross-modal hashing method built on a deep learning architecture that integrates feature learning and hash code learning in a single learning framework. To improve the quantizability of deep feature representations in an end-to-end architecture, so that they can be quantized more effectively, Cao et al. introduced quantization into an end-to-end deep learning architecture for cross-modal retrieval and proposed Collective Deep Quantization (CDQ). CDQ jointly learns deep feature representations and quantizers for the two modalities through a carefully designed hybrid network and loss functions. The hybrid network of CDQ contains: an image network of several convolution-pooling layers for extracting image feature representations; a text network of several fully connected layers for extracting text feature representations; two fully connected bottleneck layers for generating optimal low-dimensional feature representations; an adaptive cross-entropy loss for capturing cross-modal correlation; and a collective quantization loss for controlling hash quality and quantizability. CDQ can also learn a quantizer codebook shared by the modalities, which substantially strengthens the correlation between the two modalities. To capture the intrinsic relationships between different modalities effectively in an end-to-end deep learning architecture for cross-modal retrieval, Yang et al. proposed Pairwise Relationship Guided Deep Hashing (PRDH). PRDH integrates different types of pairwise constraints, from both the intra-modal and the inter-modal perspective, to learn hash codes that better reflect the intrinsic relationships between modalities. In addition, PRDH introduces a decorrelation constraint into the deep learning architecture to strengthen the discriminative power of each bit of the hash code.
Cross-modal hash retrieval needs to map the high-dimensional feature data of objects in different modalities into a low-dimensional Hamming space, so that cross-modal information retrieval can be completed quickly and accurately on binary hash codes in that space. Most existing cross-modal hash retrieval methods are based on shallow learning structures. Although these methods can complete retrieval tasks quickly using hash retrieval techniques, a shallow learning structure cannot mine the discriminative information in the original features well. Deep learning has shown excellent feature learning ability in tasks such as classification and object detection, and existing cross-modal hash retrieval methods based on deep learning also indicate that deep learning is beneficial for improving cross-modal retrieval performance. Designing a cross-modal hash retrieval method based on deep learning is therefore of great significance and value for cross-modal retrieval in big data settings.
Summary of the invention
The purpose of the invention is to provide a cross-modal hash retrieval method based on deep learning that solves the problem that existing cross-modal hash retrieval methods based on shallow learning structures cannot mine the discriminative information in the original features well.
To achieve the above object, the technical solution adopted is a cross-modal hash retrieval method based on deep learning. Let $V=\{v_i\}_{i=1}^{n}$ be the set of pixel feature vectors of the image modality of $n$ objects, where $v_i$ denotes the pixel feature vector of the $i$-th object in the image modality; let $T=\{t_i\}_{i=1}^{n}$ denote the feature vectors of these $n$ objects in the text modality, where $t_i$ denotes the feature vector of the $i$-th object in the text modality; the class label vectors of the $n$ objects are denoted $\{y_i\}_{i=1}^{n}$ with $y_i\in\{0,1\}^{c}$, where $c$ is the number of object classes; if the $i$-th object belongs to the $k$-th class, the $k$-th element of $y_i$ is 1, otherwise the $k$-th element of $y_i$ is 0. The method comprises the following steps:
(1) using an objective function designed on the basis of deep learning to obtain the binary hash codes $B$ shared by the image and text modalities, the deep neural network parameters $\theta_v$ and $\theta_t$ of the image and text modalities, and the projection matrices $P_v$ and $P_t$ of the image and text modalities;
(2) solving for the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternating updates, that is, alternately solving the following three subproblems: fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$; fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$; fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$;
(3) generating binary hash codes for the query sample and for the samples in the retrieval sample set from the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$;
(4) computing, from the generated binary hash codes, the Hamming distance from the query sample to each sample in the retrieval sample set;
(5) completing the retrieval for the query sample with a cross-modal retriever based on approximate nearest neighbor search.
The objective function designed on the basis of deep learning in step (1) has the following form:
where $\gamma_1$ and $\gamma_2$ are non-negative balance factors, $B=[b_1,b_2,\dots,b_n]^T\in\{-1,+1\}^{n\times k}$, $P_v$ and $P_t$ are projection matrices, $\theta_v$ and $\theta_t$ are deep neural network parameters, $F$ and $G$ are the deep features of the $n$ objects in the image and text modalities respectively, the $i$-th columns of $F$ and $G$ are $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ respectively, $L$ is the Laplacian matrix used to preserve intra-modal and inter-modal consistency, $\mathbf{1}$ is the column vector whose elements are all 1, $\|\cdot\|_F$ denotes the Frobenius norm of a matrix, $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, and $(\cdot)^T$ denotes the transpose of a matrix.
Solving for the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternating updates in step (2) means, specifically, alternately solving the following three subproblems:
(1) Fix $B$, $P_v$ and $P_t$, and solve for $\theta_v$ and $\theta_t$. When the binary hash codes $B$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the deep neural network parameters $\theta_v$ and $\theta_t$, namely:
(2) Fix $B$, $\theta_v$ and $\theta_t$, and solve for $P_v$ and $P_t$. When the binary hash codes $B$ and the deep neural network parameters $\theta_v$ and $\theta_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the projection matrices $P_v$ and $P_t$, namely:
(3) Fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$, and solve for $B$. When the deep neural network parameters $\theta_v$ and $\theta_t$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the binary hash codes $B$, namely:
The unknown variable $B$ in formula (4) is solved with a discrete hashing algorithm based on singular value decomposition.
Generating binary hash codes in step (3) for the query sample and for the samples in the retrieval sample set, from the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$, proceeds as follows. Given the feature vector of a query sample of the image modality, the feature vector of a query sample of the text modality, the features of the samples in the image-modality retrieval sample set and the features of the samples in the text-modality retrieval sample set, the binary hash codes of the query samples and retrieval-set samples are obtained by applying the sign function $\operatorname{sign}(\cdot)$ to the projected deep features, that is, codes of the form $\operatorname{sign}(P_v^{T}f(\cdot;\theta_v))$ for the image modality and $\operatorname{sign}(P_t^{T}g(\cdot;\theta_t))$ for the text modality.
Computing the Hamming distance from the query sample to each sample in the retrieval sample set in step (4), from the generated binary hash codes, proceeds as follows: one formula gives the Hamming distance from the query sample of the image modality to each sample in the text-modality retrieval sample set, and a corresponding formula gives the Hamming distance from the query sample of the text modality to each sample in the image-modality retrieval sample set.
Completing the retrieval for the query sample in step (5) with the cross-modal retriever based on approximate nearest neighbor search proceeds as follows: the computed Hamming distances are sorted in ascending order, and the samples of the text-modality (or image-modality) retrieval sample set that correspond to the K smallest distances are returned as the retrieval result.
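The ranking in step (5) can be sketched as follows. This is a minimal illustration that assumes the Hamming distances are already available as an array; the function name `top_k_by_hamming` and the tie-breaking by index order are choices made for this example.

```python
import numpy as np

def top_k_by_hamming(distances, k):
    """Return indices of the k smallest distances, in ascending order of distance."""
    order = np.argsort(distances, kind="stable")  # stable sort keeps ties in index order
    return order[:k]

dists = np.array([5, 1, 3, 1, 7])   # hypothetical distances to five retrieval-set samples
result = top_k_by_hamming(dists, k=3)
```

Here the two samples at distance 1 are returned first, followed by the sample at distance 3.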
Beneficial effects
Compared with the prior art, the present invention has the following advantages.
1. While keeping retrieval speed, the method uses a deep learning structure to mine more discriminative information and can therefore complete cross-modal retrieval more accurately;
2. By preserving both intra-modal and inter-modal consistency, the method carries the useful information of the original feature space over to Hamming space, which promotes the mining of discriminative information and improves retrieval performance;
3. The proposed discrete hashing algorithm based on singular value decomposition gives the resulting binary hash codes more beneficial properties and thereby effectively improves the performance of cross-modal hash retrieval.
Brief description of the drawings
Fig. 1 is the workflow diagram of the proposed cross-modal hash retrieval method based on deep learning.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawing.
The invention discloses a cross-modal hash retrieval method based on deep learning. As shown in Fig. 1, the implementation mainly comprises the steps below. Let $V=\{v_i\}_{i=1}^{n}$ be the set of pixel feature vectors of the image modality of $n$ objects, where $v_i$ denotes the pixel feature vector of the $i$-th object in the image modality; let $T=\{t_i\}_{i=1}^{n}$ denote the feature vectors of these $n$ objects in the text modality, where $t_i$ denotes the feature vector of the $i$-th object in the text modality; the class label vectors of the $n$ objects are denoted $\{y_i\}_{i=1}^{n}$ with $y_i\in\{0,1\}^{c}$, where $c$ is the number of object classes; if the $i$-th object belongs to the $k$-th class, the $k$-th element of $y_i$ is 1, otherwise the $k$-th element of $y_i$ is 0.
(1) Construction of the deep-learning-based cross-modal hash retrieval objective function
The purpose of the method is to use the feature data $V$ and $T$ of the image and text modalities, together with the class label information of the objects, to learn hash functions for the image and text modalities, and to use the learned hash functions to generate the binary hash codes used for cross-modal hash retrieval. Performing cross-modal hash learning directly on the feature data $V$ and $T$ is not conducive to mining discriminative information from the original features and thus to generating high-quality binary hash codes. To better mine discriminative information from the original feature data of the image and text modalities, the method builds a deep neural network (DNN) for the image-modality data and another for the text-modality data to learn deep features.
For the image modality, the method uses a seven-layer convolutional neural network (CNN) adapted from AlexNet to learn deep features. This CNN model is described in detail below.
The CNN model for learning image-modality deep features contains five convolutional layers and two fully connected layers, denoted "Conv1-Conv5" and "Fc6-Fc7" respectively. The network takes the pixel features of the image modality as input. The first convolutional layer Conv1 filters the 227 × 227 × 3 input image with 96 kernels of size 11 × 11 × 3 at a stride of 4 pixels. After Rectified Linear Unit (ReLU) activation, max-pooling and Local Response Normalization (LRN), an output feature of size 27 × 27 × 96 is obtained. The second convolutional layer Conv2 takes the output of Conv1 as input and filters it with 256 kernels of size 5 × 5 × 96. Likewise, after ReLU, max-pooling and LRN, an output feature of size 13 × 13 × 256 is obtained. The third, fourth and fifth convolutional layers Conv3, Conv4 and Conv5 use 384, 384 and 256 kernels of size 3 × 3 × 256, 3 × 3 × 384 and 3 × 3 × 384 respectively, and each layer uses ReLU activation. After max-pooling, Conv5 yields an output feature of size 6 × 6 × 256. The fully connected layer Fc6 has 4096 neurons and uses dropout with a ratio of 0.5 to prevent over-fitting. Fc7 is a fully connected layer with $d^{(v)}$ neurons that uses the hyperbolic tangent (TanH) function as its activation function. Finally, Fc7 produces an output feature of size $d^{(v)}\times 1$.
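The feature-map sizes quoted above can be checked with a short calculation. The sketch below assumes the padding values and the 3 × 3, stride-2 pooling window of standard AlexNet, neither of which is stated in the text; `out_size` is a hypothetical helper.

```python
def out_size(n, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer."""
    return (n + 2 * pad - kernel) // stride + 1

# Padding values and the 3x3 / stride-2 pooling window are assumptions from standard AlexNet.
s = out_size(227, 11, stride=4)      # Conv1: 227 -> 55
s1 = out_size(s, 3, stride=2)        # max-pool: 55 -> 27
s = out_size(s1, 5, pad=2)           # Conv2: 27 -> 27
s2 = out_size(s, 3, stride=2)        # max-pool: 27 -> 13
s = out_size(s2, 3, pad=1)           # Conv3, Conv4, Conv5 each keep 13
s5 = out_size(s, 3, stride=2)        # max-pool after Conv5: 13 -> 6
sizes = (s1, s2, s5)
```

Under these assumptions the spatial sizes after the three pooling stages are 27, 13 and 6, matching the 27 × 27 × 96, 13 × 13 × 256 and 6 × 6 × 256 outputs described above.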
For the text modality, the method builds a multilayer perceptron (MLP) deep neural network of three fully connected layers, which maps the text-modality features from the original feature space to a semantic space. The three fully connected layers are denoted Fc1, Fc2 and Fc3. Following the way MLP deep neural networks are built for text-modality feature learning in the related literature, layers Fc1 and Fc2 use ReLU as the nonlinear activation function, and layer Fc3 uses the hyperbolic tangent (TanH) function as its activation function. Fc3 has $d^{(t)}$ neurons, that is, the output feature of the MLP deep neural network for learning text-modality deep features has dimension $d^{(t)}$.
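A minimal sketch of the forward pass of such a three-layer MLP is given below. The layer widths, the random initialization and the function name `mlp_forward` are hypothetical; only the activation pattern (ReLU, ReLU, TanH) comes from the description.

```python
import numpy as np

def mlp_forward(t, params):
    """Three fully connected layers: ReLU (Fc1), ReLU (Fc2), then tanh (Fc3)."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h = np.maximum(W1 @ t + b1, 0.0)
    h = np.maximum(W2 @ h + b2, 0.0)
    return np.tanh(W3 @ h + b3)

rng = np.random.default_rng(0)
d_in, d_hidden, d_t = 100, 64, 16   # hypothetical sizes; d_t plays the role of d^(t)
params = [(0.1 * rng.standard_normal((o, i)), np.zeros(o))
          for i, o in [(d_in, d_hidden), (d_hidden, d_hidden), (d_hidden, d_t)]]
g = mlp_forward(rng.standard_normal(d_in), params)
```

Because the last activation is tanh, every output component lies in [-1, 1], which makes the subsequent sign-based binarization a small step.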
For the $i$-th object, let $f(v_i;\theta_v)$ denote the output feature of the image-modality CNN, where $\theta_v$ denotes the parameters of that CNN; let $g(t_i;\theta_t)$ denote the output feature of the text-modality MLP deep neural network, where $\theta_t$ denotes the parameters of that network.
Suppose the deep features $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ of the $i$-th object in the image and text modalities are projected by the linear projection matrices $P_v$ and $P_t$, giving the projected features $P_v^{T}f(v_i;\theta_v)$ and $P_t^{T}g(t_i;\theta_t)$, where $(\cdot)^T$ denotes the transpose of a matrix. Suppose further that $P_v^{T}f(v_i;\theta_v)$ and $P_t^{T}g(t_i;\theta_t)$ each generate a binary hash code in Hamming space. Cross-modal hash learning can then be carried out through the following minimization problem:
where $\gamma_1$ and $\gamma_2$ are non-negative balance factors, the third term on the right-hand side is a regularization term that prevents over-fitting, and the fourth term on the right-hand side encourages each bit of the hash code to be +1 or -1 with equal probability, which maximizes the information carried by each bit.
Intra-modal similarity reflects the neighborhood relationships among the data points formed by the feature vectors within each modality. The intra-modal similarity between two data points $v_i$ and $v_j$ of the image modality can be defined as:
where the neighbor set is the set of the $k_1$ nearest neighbors of data point $v_i$, the distance between $v_i$ and $v_j$ is the Euclidean distance ($\|\cdot\|_2$ denotes the $l_2$ norm of a vector), and $\sigma$ controls the decay rate of the similarity. Similarly, the intra-modal similarity of two data points $t_i$ and $t_j$ formed by two feature vectors of the text modality is defined as:
For each modality, to keep the local neighborhood structure of the data points in Hamming space consistent with that in the original feature space, that is, to preserve in Hamming space the neighbor relationships each data point has in the original feature space, the following objective function can be designed:
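A minimal sketch of a k-nearest-neighbor similarity matrix of the kind described above follows. The Gaussian kernel form exp(-d²/(2σ²)) and the symmetrization rule (an edge is kept if either point is among the other's neighbors) are assumptions made for this example, as is the name `knn_similarity`.

```python
import numpy as np

def knn_similarity(X, k1, sigma):
    """W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) if j is among the k1 nearest
    neighbors of i or vice versa, else 0. Kernel form and symmetrization are assumptions."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(sq, np.inf)                          # exclude self from the neighbor search
    nn = np.argsort(sq, axis=1)[:, :k1]                   # indices of the k1 nearest neighbors
    mask = np.zeros(sq.shape, dtype=bool)
    mask[np.repeat(np.arange(len(X)), k1), nn.ravel()] = True
    mask |= mask.T                                        # keep the neighbor graph symmetric
    np.fill_diagonal(sq, 0.0)
    return np.where(mask, np.exp(-sq / (2 * sigma ** 2)), 0.0)

X = np.random.default_rng(0).standard_normal((8, 5))     # hypothetical feature vectors
W = knn_similarity(X, k1=3, sigma=1.0)
```

A smaller `sigma` makes the similarity fall off faster with distance, which is the role the description assigns to σ.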
Based on the class label information of the objects, the semantic association matrix between the image-modality data points $v_i$ ($i=1,2,\dots,n$) and the text-modality data points $t_j$ ($j=1,2,\dots,n$) can be defined as follows:
That is, as long as $v_i$ and $t_j$ belong to at least one common class, they are considered to have the same semantics. To preserve the inter-modal consistency between the image and text modalities in Hamming space, the following objective function can be designed:
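The rule "share at least one class" can be sketched directly from the label vectors. The entries 1 and 0 used below and the name `semantic_matrix` are assumptions for this example.

```python
import numpy as np

def semantic_matrix(Y):
    """S[i, j] = 1 if objects i and j share at least one class label, else 0."""
    return (Y @ Y.T > 0).astype(int)

# Rows are the label vectors y_i of three hypothetical objects over c = 3 classes.
Y = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])
S = semantic_matrix(Y)
```

The inner product of two 0/1 label vectors counts their common classes, so a positive value means the image-modality point and the text-modality point are treated as semantically matching.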
Combining the above analysis of image-modality deep feature learning, text-modality deep feature learning, and the preservation of intra-modal and inter-modal consistency, the objective function of the method can be designed as:
According to existing work, if data in different modality spaces share the same semantics, they usually correspond to a common latent space. The invention therefore assumes that features with the same semantics in the image and text modalities can ultimately be represented by the same binary hash code in a common Hamming space, that is, the image-modality and text-modality hash codes of each object coincide. Under this assumption, the optimization problem in formula (7) can be expressed as:
By a simple derivation, the expression can be rewritten in the following matrix form:
where $B=[b_1,b_2,\dots,b_n]^T\in\{-1,+1\}^{n\times k}$, the $i$-th columns of the matrices $F$ and $G$ are $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ respectively, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. A further derivation for the consistency terms yields the equivalent form:
where $L=D-W$ is the Laplacian matrix, $D$ is a diagonal matrix, $w_{ij}$ is the element in row $i$ and column $j$ of the matrix $W$, and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. According to formula (12) and formula (13), formula (8) can be rewritten as:
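The step from a pairwise consistency term to a trace with the Laplacian can be checked numerically. The sketch below assumes the standard degree matrix (row sums of a symmetric $W$ on the diagonal of $D$) and verifies the well-known identity $\tfrac12\sum_{i,j}w_{ij}\|b_i-b_j\|^2=\operatorname{tr}(B^TLB)$ on random data; all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 4
W = rng.random((n, n)); W = (W + W.T) / 2        # symmetric similarity matrix
np.fill_diagonal(W, 0.0)
D = np.diag(W.sum(axis=1))                       # degree matrix (standard definition, assumed)
L = D - W                                        # graph Laplacian
B = rng.choice([-1.0, 1.0], size=(n, k))         # rows are hash codes b_i

# Pairwise form: half the weighted sum of squared code differences.
pairwise = 0.5 * sum(W[i, j] * np.sum((B[i] - B[j]) ** 2)
                     for i in range(n) for j in range(n))
trace_form = np.trace(B.T @ L @ B)               # equivalent trace form
```

The two quantities agree up to rounding, which is why a sum over pairs of codes can be replaced by a single trace expression in the objective.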
(2) Solving the objective function
The objective function in formula (14) contains five unknown variables: the binary hash code matrix $B$, the linear projection matrices $P_v$ and $P_t$, and the deep neural network parameters $\theta_v$ and $\theta_t$. The objective function is non-convex in these five variables jointly, so their analytic solutions cannot be obtained simultaneously. The unknowns in formula (14) can instead be solved by alternately solving the following three subproblems: fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$; fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$; fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$.
(a) Fix $B$, $P_v$ and $P_t$, and solve for $\theta_v$ and $\theta_t$
When the binary hash codes $B$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the deep neural network parameters $\theta_v$ and $\theta_t$, namely:
The invention uses the back-propagation (BP) algorithm to learn and update the DNN parameters $\theta_v$. As in most existing deep learning methods, $\theta_v$ is learned here with stochastic gradient descent based on back-propagation. Specifically, each iteration draws a mini-batch of training samples, and $\theta_v$ is then learned on the selected samples with back-propagation-based stochastic gradient descent. For each image-modality feature vector $v_i$ of the selected training samples, the gradient with respect to the network output is first computed with the following formula:
Then the gradient with respect to $\theta_v$ is computed from the obtained gradient with the chain rule. Finally, the computed gradient is used with the BP algorithm to update the DNN parameters $\theta_v$ of the image modality.
Algorithm 1 shows the algorithm for solving the image-modality DNN parameters $\theta_v$.
Similarly, the deep neural network parameters $\theta_t$ of the text modality are learned and updated with stochastic gradient descent based on back-propagation. For each text-modality feature vector $t_i$ of the selected training samples, the following gradient is computed first:
Then the gradient with respect to $\theta_t$ is computed from the obtained gradient with the chain rule. Finally, the computed gradient is used with the BP algorithm to update the DNN parameters $\theta_t$ of the text modality. The DNN parameters $\theta_t$ of the text modality can be learned with an algorithm similar to Algorithm 1.
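The first stage of this update, the gradient with respect to the network outputs, can be sketched under an explicit assumption. The sketch assumes the image-side part of the subproblem has the form $\|B-F^TP_v\|_F^2+\|P_v^TF\mathbf{1}\|^2$, which is an assumed form chosen to agree with formula (21) and not a confirmed one; the function names are hypothetical. A finite-difference check confirms the analytic gradient for this assumed loss.

```python
import numpy as np

def loss(F, B, P):
    """Assumed image-side loss: ||B - F^T P||_F^2 + ||P^T F 1||^2."""
    ones = np.ones(F.shape[1])
    return np.sum((B - F.T @ P) ** 2) + np.sum((P.T @ F @ ones) ** 2)

def grad_wrt_features(F, B, P):
    """Gradient of the assumed loss with respect to the deep features F (d x n)."""
    ones = np.ones((F.shape[1], 1))
    return 2 * P @ (P.T @ F - B.T) + 2 * P @ (P.T @ F @ ones) @ ones.T

rng = np.random.default_rng(0)
d, n, k = 5, 7, 3                                  # hypothetical sizes
F = rng.standard_normal((d, n)); P = rng.standard_normal((d, k))
B = rng.choice([-1.0, 1.0], size=(n, k))
G = grad_wrt_features(F, B, P)

# Central finite-difference check on one entry of F.
eps = 1e-6
E = np.zeros_like(F); E[2, 3] = eps
numeric = (loss(F + E, B, P) - loss(F - E, B, P)) / (2 * eps)
```

In practice a deep learning framework would propagate this output gradient back to the network parameters via the chain rule, as the description states.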
(b) Fix $B$, $\theta_v$ and $\theta_t$, and solve for $P_v$ and $P_t$
When the binary hash codes $B$ and the deep neural network parameters $\theta_v$ and $\theta_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the projection matrices $P_v$ and $P_t$, namely:
Taking the partial derivatives of the objective in formula (18) with respect to $P_v$ and $P_t$ and setting them to 0 gives:
A simple derivation then yields:
$$P_v=\left(FF^{T}+I+F\mathbf{1}\mathbf{1}^{T}F^{T}\right)^{-1}FB, \tag{21}$$
$$P_t=\left(GG^{T}+I+G\mathbf{1}\mathbf{1}^{T}G^{T}\right)^{-1}GB, \tag{22}$$
where $I$ is the identity matrix and $(\cdot)^{-1}$ denotes the inverse of a matrix.
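Formula (21) can be evaluated with a linear solve instead of an explicit inverse. The sketch below also checks stationarity under the assumption that the $P_v$-dependent part of the subproblem is $\|B-F^TP_v\|_F^2+\|P_v\|_F^2+\|P_v^TF\mathbf{1}\|^2$; this assumed form is the one whose zero-gradient condition reproduces formula (21). The name `solve_projection` and the sizes are hypothetical.

```python
import numpy as np

def solve_projection(F, B):
    """Formula (21): P = (F F^T + I + F 1 1^T F^T)^{-1} F B, computed via a linear solve."""
    d, n = F.shape
    s = F @ np.ones((n, 1))                       # F 1, the sum of the feature columns
    A = F @ F.T + np.eye(d) + s @ s.T
    return np.linalg.solve(A, F @ B)

rng = np.random.default_rng(0)
d, n, k = 5, 20, 4                                # hypothetical sizes
F = rng.standard_normal((d, n))
B = rng.choice([-1.0, 1.0], size=(n, k))
P = solve_projection(F, B)

# Gradient of the assumed subproblem ||B - F^T P||_F^2 + ||P||_F^2 + ||P^T F 1||^2 at P.
s = F @ np.ones((n, 1))
grad = -2 * F @ (B - F.T @ P) + 2 * P + 2 * s @ (s.T @ P)
```

The gradient vanishes at the returned `P`, and formula (22) follows by replacing `F` with the text-modality feature matrix.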
(c) Fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$, and solve for $B$
When the deep neural network parameters $\theta_v$ and $\theta_t$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the binary hash codes $B$, namely:
A simple derivation from formula (23) gives:
Because $P_v$, $P_t$, $\theta_v$ and $\theta_t$ are fixed, the two terms that do not involve $B$ are constants, so dropping them from formula (24) does not affect the solution for $B$. In addition, because $B\in\{-1,+1\}^{n\times k}$, we have $\operatorname{tr}(B^{T}B)=nk$, which is also a constant. After the constant terms in formula (24) are discarded, formula (24) becomes:
The unknown in formula (25) is a discrete variable, so in general it is difficult to obtain an analytic solution by solving for it directly. The invention proposes a discrete hashing algorithm based on singular value decomposition to solve the optimization problem over the discrete variable $B$ in formula (25). The algorithm is described in detail below.
Applying singular value decomposition to the matrix $L$ gives a factorization whose middle factor is a diagonal matrix. Substituting this factorization into formula (25) gives:
Consider the $i$-th rows of the matrix $B$ and of the two other matrices involved, together with the matrices formed by the remaining rows of each after the $i$-th row is removed. With this notation we obtain:
Similarly, we obtain:
where $q_i$ denotes the $i$-th column of the matrix $Q$, and the remaining factor is the matrix formed by the columns of $Q$ that are left after $q_i$ is removed.
According to formulas (27) and (28), the unknown binary hash code matrix $B$ in formula (26) can be obtained by solving the optimization problem over $b_i$ ($i = 1, 2, \ldots, n$) given as formula (29).
By a simple derivation, formula (29) can be converted into formula (30).
The optimization problem in formula (30) has the analytic solution given as formula (31),
where $\mathrm{sign}(\cdot)$ denotes the sign function.
Algorithm 2 summarizes the discrete hashing algorithm based on singular value decomposition.
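Formulas (29) to (31) are not reproduced in this text, so the following is only a sketch of why a sign-function solution is natural for a per-row discrete problem. It assumes, purely for illustration, that the row problem has the linear form of maximizing $b \cdot z$ over $b \in \{-1,+1\}^k$ for some real vector $z$; in that case the maximizer is $\mathrm{sign}(z)$, which the brute-force check below confirms on a small example. The names `sign_pm1`, `brute_force_best` and the vector `z` are hypothetical.

```python
import itertools
import numpy as np

def sign_pm1(z):
    """Sign function mapping to {-1, +1}; zeros go to +1 by convention."""
    return np.where(z >= 0, 1, -1)

def brute_force_best(z):
    """Enumerate all b in {-1,+1}^k and return one that maximizes b . z."""
    best, best_val = None, -np.inf
    for bits in itertools.product((-1, 1), repeat=len(z)):
        val = float(np.dot(bits, z))
        if val > best_val:
            best, best_val = np.array(bits), val
    return best

# Hypothetical real-valued vector standing in for the linear coefficient of b_i.
z = np.array([0.7, -1.2, 0.05, -0.3])
b_closed = sign_pm1(z)        # closed-form candidate
b_brute = brute_force_best(z) # exhaustive search over 2^k codes
```

The exhaustive search costs 2^k evaluations, whereas the sign rule costs k operations, which is the practical reason an analytic solution of this kind matters for long hash codes.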
(3) Generate binary hash codes for the query samples and for the samples in the retrieval sample set
Assume that the feature vector of a query sample of the image modality and the feature vector of a query sample of the text modality are given, together with the features of the samples in the image-modality retrieval sample set and the features of the samples in the text-modality retrieval sample set, and that the number of samples in the retrieval sample set is known. Using the solved projection matrices $P_v$ and $P_t$ of the image modality and the text modality and the solved deep neural network parameters $\theta_v$ and $\theta_t$ of the image modality and the text modality, the binary hash codes of the image-modality and text-modality query samples and of the samples in the retrieval sample sets are obtained from four corresponding formulas, in which $\mathrm{sign}(\cdot)$ is the sign function.
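The code-generation formulas themselves are not reproduced in this text. As a hedged sketch, the example below assumes that a code is the sign of the projected depth feature, with features stored as columns and the projection matrix of size d x k, which matches the matrix shapes in formulas (21) and (22). The network forward pass is replaced by precomputed features, and the name `hash_codes` is hypothetical.

```python
import numpy as np

def hash_codes(feats, proj):
    """Binarize projected depth features.

    feats: d x m matrix of depth features, one sample per column
           (stand-in for the outputs of the trained network).
    proj:  d x k projection matrix (P_v or P_t).
    Returns an m x k matrix with entries in {-1, +1}; zeros map to +1.
    """
    real_valued = feats.T @ proj
    return np.where(real_valued >= 0, 1, -1).astype(np.int8)

# Hypothetical toy data: d = 2, m = 2 samples, k = 2 bits.
feats = np.array([[1.0, -2.0],
                  [0.5,  3.0]])
proj = np.array([[1.0,  0.0],
                 [0.0, -1.0]])
codes = hash_codes(feats, proj)
```

The same function would be applied to query samples and retrieval-set samples of either modality, each with its own projection matrix.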
(4) Calculate the Hamming distance from the query sample to each sample in the retrieval sample set
For a query sample of the image modality, the Hamming distance from the query sample to each sample in the text-modality retrieval sample set is calculated from their binary hash codes. For a query sample of the text modality, the Hamming distance from the query sample to each sample in the image-modality retrieval sample set is calculated in the same way.
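For codes with entries in {-1, +1} and length k, the Hamming distance between two codes equals (k minus their inner product) divided by 2. The sketch below uses this standard identity; whether the patent's own formula is written in exactly this form is an assumption, and the name `hamming_distances` is hypothetical.

```python
import numpy as np

def hamming_distances(query_code, db_codes):
    """Hamming distances from one query code to every retrieval-set code.

    query_code: shape (k,), entries in {-1, +1}.
    db_codes:   shape (m, k), entries in {-1, +1}.
    Returns an integer array of shape (m,).
    """
    k = query_code.shape[0]
    inner = db_codes.astype(np.int64) @ query_code.astype(np.int64)
    # inner = k - 2 * (number of differing bits), so k - inner is always even.
    return (k - inner) // 2

# Hypothetical 4-bit codes.
query = np.array([1, -1, 1, 1])
database = np.array([[ 1, -1,  1,  1],
                     [-1, -1,  1,  1],
                     [-1,  1, -1, -1]])
dists = hamming_distances(query, database)
```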
(5) Complete the retrieval for the query sample using the cross-modal retriever
For the task of retrieving text with an image, the calculated Hamming distances are first sorted in ascending order, and the samples in the text retrieval sample set that correspond to the K smallest distances are then taken as the retrieval result. Similarly, for the task of retrieving images with text, the calculated Hamming distances are first sorted in ascending order, and the samples in the image retrieval sample set that correspond to the K smallest distances are then taken as the retrieval result.
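The ranking step can be sketched as an ascending sort followed by taking the first K indices. The stable sort used here, which keeps the original order among equal distances, is a choice of this sketch and is not stated in the patent; the name `top_k` and the sample distances are hypothetical.

```python
import numpy as np

def top_k(distances, k):
    """Indices of the k retrieval-set samples with the smallest distances."""
    order = np.argsort(distances, kind="stable")  # ascending; ties keep input order
    return order[:k]

# Hypothetical Hamming distances from one query to five retrieval-set samples.
distances = np.array([3, 0, 2, 0, 5])
result = top_k(distances, 3)
```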
The beneficial effects of the invention are illustrated below with specific experiments.
The experiments for the method of the invention were carried out mainly on the Pascal VOC 2007 data set, which is briefly introduced first. The Pascal VOC 2007 data set contains 9963 images belonging to 20 categories (for example, aeroplane, bottle, horse and sofa), and each image is annotated with labels. In the experiments, the data set is divided into a training set of 5011 image-label pairs and a test set of 4952 image-label pairs. For deep cross-modal hashing methods, the image modality uses raw pixel features as input features. For methods that take hand-crafted features as input, 512-dimensional GIST features are used as input features. For the text modality, 399-dimensional word-frequency features are used as input features. Two cross-modal retrieval tasks are carried out in the experiments: retrieving text with an image and retrieving images with text, denoted Img2Txt and Txt2Img respectively.
The invention uses mean average precision (MAP) to measure the performance of cross-modal hash retrieval methods. To obtain the MAP, the average precision (AP) is first calculated for each query sample. Once the AP of every query sample has been obtained, averaging all AP values gives the MAP.
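The patent text does not spell out the AP formula, so the sketch below uses the common definition: for a ranked result list, AP is the mean of the precision values at the ranks where a relevant sample appears. How relevance is decided (for example, sharing at least one label with the query) is left to the caller and is an assumption here; the function names are hypothetical.

```python
import numpy as np

def average_precision(relevant_ranked):
    """AP for one query.

    relevant_ranked: sequence of 0/1 flags, ordered by ascending Hamming
    distance, where 1 marks a retrieved sample that is relevant to the query.
    """
    rel = np.asarray(relevant_ranked, dtype=float)
    if rel.sum() == 0:
        return 0.0                       # no relevant sample retrieved
    ranks = np.arange(1, len(rel) + 1)
    precision_at_rank = np.cumsum(rel) / ranks
    return float((precision_at_rank * rel).sum() / rel.sum())

def mean_average_precision(relevance_lists):
    """MAP: the mean of the per-query AP values."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))

# Hypothetical relevance flags for two queries.
ap_first = average_precision([1, 0, 1, 0])   # (1/1 + 2/3) / 2
ap_second = average_precision([0, 1])        # (1/2) / 1
map_value = mean_average_precision([[1, 0, 1, 0], [0, 1]])
```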
The method of the invention uses a mini-batch gradient descent algorithm with momentum 0.9 and weight decay 0.0001, and the batch size is set to 128. An AlexNet pre-trained on the ImageNet data set is used to initialize the first five layers of the image-modality deep neural network. The remaining parameters of the deep neural networks are initialized randomly. The output feature dimension of the deep neural networks of both the image modality and the text modality is set to 1024. In the experiments, 5-fold cross-validation is used to determine the optimal values of the parameters $\gamma_1$ and $\gamma_2$ of the method. For the other methods, the parameters are set according to the principles recommended by each method. The reported results are averages over 10 random runs.
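The optimizer settings above can be sketched as a single parameter update plus a batch iterator. The momentum, weight decay and batch size come from the text; the learning rate is not stated and is an assumption, as is the specific momentum convention (velocity accumulates the scaled negative gradient, with weight decay added to the gradient as an L2 term). All names are hypothetical.

```python
import numpy as np

MOMENTUM = 0.9         # from the text
WEIGHT_DECAY = 1e-4    # from the text
BATCH_SIZE = 128       # from the text
LEARNING_RATE = 0.01   # assumption: the text does not state a learning rate

def sgd_momentum_step(param, grad, velocity):
    """One update; weight decay enters as an L2 term added to the gradient."""
    g = grad + WEIGHT_DECAY * param
    velocity = MOMENTUM * velocity - LEARNING_RATE * g
    return param + velocity, velocity

def minibatches(n_samples, rng):
    """Yield shuffled index batches of at most BATCH_SIZE samples."""
    perm = rng.permutation(n_samples)
    for start in range(0, n_samples, BATCH_SIZE):
        yield perm[start:start + BATCH_SIZE]

# One step on a hypothetical scalar parameter.
new_param, new_velocity = sgd_momentum_step(
    np.array([1.0]), np.array([0.5]), np.zeros(1))

# Batch sizes for a training set of 5011 pairs, as in the experiments.
batch_sizes = [len(b) for b in minibatches(5011, np.random.default_rng(0))]
```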
The methods compared with the method of the invention are: Semantic Correlation Maximization (SCM), Supervised Matrix Factorization Hashing (SMFH), Deep Cross-Modal Hashing (DCMH) and Pairwise Relationship Guided Deep Hashing (PRDH). Table 1 lists the mean average precision (MAP) of the method of the invention and of the compared methods for cross-modal hash retrieval on the Pascal VOC 2007 data set. As can be seen from Table 1, for both retrieval tasks and under all three hash code lengths, the deep cross-modal hash retrieval methods DCMH, PRDH and the method of the invention achieve better retrieval performance than the shallow cross-modal hash retrieval methods SCM and SMFH. This indicates that using deep learning to learn the depth features from which the binary hash codes are generated is beneficial. Table 1 also shows that, for the Img2Txt and Txt2Img retrieval tasks and under all three hash code lengths, the cross-modal retrieval performance of the method of the invention is better than that of DCMH and PRDH. This indicates that the method of the invention is an effective cross-modal hash retrieval method.
Table 1. MAP of each method on the Pascal VOC 2007 data set

Claims (6)

1. A cross-modal hash retrieval method based on deep learning, wherein it is assumed that the pixel feature vectors of $n$ objects in the image modality form a set whose $i$-th element is the pixel feature vector of the $i$-th object in the image modality; the feature vectors of these $n$ objects in the text modality form a set whose $i$-th element is the feature vector of the $i$-th object in the text modality; and the class label vectors of the $n$ objects form a set in which each vector has as many elements as there are object classes; for the label vector of the $i$-th object, if the object belongs to the $j$-th class, the $j$-th element of the vector is 1, and otherwise the $j$-th element is 0; characterized in that the method comprises the following steps:
(1) using an objective function designed on the basis of deep learning to obtain the binary hash codes $B$ shared by the image modality and the text modality, the deep neural network parameters $\theta_v$ and $\theta_t$ of the image modality and the text modality, and the projection matrices $P_v$ and $P_t$ of the image modality and the text modality;
(2) solving for the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ of the objective function by alternating updates, that is, alternately solving the following three subproblems: fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$; fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$; fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$;
(3) generating binary hash codes for the query samples and for the samples in the retrieval sample set on the basis of the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$ of the image modality and the text modality;
(4) calculating, on the basis of the generated binary hash codes, the Hamming distance from the query sample to each sample in the retrieval sample set;
(5) completing the retrieval for the query sample using a cross-modal retriever based on approximate nearest neighbor search.
2. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that the objective function designed on the basis of deep learning in step (1) has the form given as formula (1),
where the two balance factors are non-negative; $P_v$ and $P_t$ are projection matrices; $\theta_v$ and $\theta_t$ are deep neural network parameters; the depth features of the $i$-th object in the image modality and in the text modality form the $i$-th columns of the matrices $F$ and $G$ respectively; the Laplacian matrix is used to preserve intra-modal consistency and inter-modal consistency; $\mathbf{1}$ is the column vector whose elements are all 1; $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; $\mathrm{tr}(\cdot)$ denotes the trace of a matrix; and $(\cdot)^{T}$ denotes the transpose of a matrix.
3. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that solving for the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ of the objective function by alternating updates in step (2) specifically comprises alternately solving the following three subproblems:
(1) fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$; when the binary hash codes $B$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the deep neural network parameters $\theta_v$ and $\theta_t$, given as formula (2);
(2) fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$; when the binary hash codes $B$ and the deep neural network parameters $\theta_v$ and $\theta_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the projection matrices $P_v$ and $P_t$, given as formula (3);
(3) fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$; when the deep neural network parameters $\theta_v$ and $\theta_t$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the binary hash codes $B$, given as formula (4);
the unknown variable $B$ in formula (4) is solved using the discrete hashing algorithm based on singular value decomposition.
4. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that generating, in step (3), binary hash codes for the query samples and for the samples in the retrieval sample set on the basis of the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$ of the image modality and the text modality specifically comprises: assuming that the feature vector of a query sample of the image modality, the feature vector of a query sample of the text modality, the features of the samples in the image-modality retrieval sample set and the features of the samples in the text-modality retrieval sample set are given, together with the number of samples in the retrieval sample set; the binary hash codes of the image-modality and text-modality query samples and of the samples in the retrieval sample sets are given by four corresponding formulas, in which $\mathrm{sign}(\cdot)$ is the sign function.
5. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that calculating, in step (4), the Hamming distance from the query sample to each sample in the retrieval sample set on the basis of the generated binary hash codes specifically comprises: using the corresponding formula to calculate the Hamming distance from the query sample of the image modality to each sample in the text-modality retrieval sample set; and using the corresponding formula to calculate the Hamming distance from the query sample of the text modality to each sample in the image-modality retrieval sample set.
6. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that completing, in step (5), the retrieval for the query sample using the cross-modal retriever based on approximate nearest neighbor search specifically comprises: sorting the calculated Hamming distances in ascending order, and then taking the samples in the text-modality (or image-modality) retrieval sample set that correspond to the K smallest distances as the retrieval result.
CN201910196009.7A 2019-03-14 2019-03-14 Cross-modal Hash retrieval method based on deep learning Active CN110019652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910196009.7A CN110019652B (en) 2019-03-14 2019-03-14 Cross-modal Hash retrieval method based on deep learning


Publications (2)

Publication Number Publication Date
CN110019652A (en) 2019-07-16
CN110019652B (en) 2022-06-03

Family

ID=67189652

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910196009.7A Active CN110019652B (en) 2019-03-14 2019-03-14 Cross-modal Hash retrieval method based on deep learning

Country Status (1)

Country Link
CN (1) CN110019652B (en)

Also Published As

Publication number Publication date
CN110019652B (en) 2022-06-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant