CN110019652A - A cross-modal hash retrieval method based on deep learning - Google Patents
- Publication number
- CN110019652A (application CN201910196009.7A)
- Authority
- CN
- China
- Prior art keywords
- sample
- retrieval
- image modalities
- cross
- hash
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/325—Hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A cross-modal hash retrieval method based on deep learning. Assume that the pixel feature vector set of the image modality of $n$ objects is $V=\{v_i\}_{i=1}^{n}$. The method is characterized by the following steps: (1) using an objective function designed with deep learning techniques, obtain the binary hash codes $B$ shared by the image modality and the text modality, the deep neural network parameters $\theta_v$ and $\theta_t$ of the image and text modalities, and the projection matrices $P_v$ and $P_t$ of the image and text modalities; (2) solve for the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternating updates; (3) based on the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$, generate binary hash codes for the query sample and for the samples in the retrieval set; (4) based on the generated binary hash codes, calculate the Hamming distance from the query sample to each sample in the retrieval set; (5) complete retrieval for the query sample using a cross-modal retriever based on approximate nearest neighbor search. The method effectively improves the performance of cross-modal hash retrieval.
Description
Technical field
The present invention relates to a cross-modal hash retrieval method based on deep learning.
Background art
With the rapid development of science, technology and social productivity, the era of big data has quietly arrived. Big data refers to data sets that cannot be captured, managed and processed by conventional software tools within an acceptable period of time. IBM characterizes big data by "5 Vs": Volume (large data volume), Variety (diverse types and sources), Value (low value density, yet occasionally very valuable), Velocity (fast data growth) and Veracity (data quality). Big data can also be regarded as an information asset that requires new processing modes in order to provide stronger decision-making power, insight discovery and process optimization capability.
Information retrieval is an important aspect of data processing, and how to perform it effectively on big data has become an urgent and very challenging problem. For large-scale data retrieval, hash retrieval methods play an important role. A hash retrieval method maps the high-dimensional features of an object into Hamming space and represents the object by a low-dimensional hash code. This reduces the memory that the retrieval system requires and increases retrieval speed, so it is better suited to massive-scale retrieval. The main idea of hash retrieval is to project data represented by high-dimensional vectors into Hamming space and to perform K-nearest-neighbor (K ≥ 1) retrieval there. For the K nearest neighbors in Hamming space to agree with those in the original space, a hash learning algorithm must be locality preserving, that is, it must preserve the similarity of the data before and after projection. Locality Sensitive Hashing (LSH) has the property that two points that are close in the high-dimensional space receive, after the hash functions are applied, the same hash code with high probability; conversely, if the two points are far apart, the probability that their hash codes are identical is very small.
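The locality-sensitive behaviour described above can be illustrated with a minimal sketch. It assumes the random-hyperplane (sign random projection) variant of LSH, which preserves angular similarity; the source does not say which LSH family is meant, and the helper names, the 16-bit code length and the sample vectors are all hypothetical.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_code(x, hyperplanes):
    """One bit per hyperplane: 1 if x lies on its non-negative side, else 0."""
    return [1 if sum(w * xi for w, xi in zip(h, x)) >= 0.0 else 0
            for h in hyperplanes]

def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    return sum(1 for p, q in zip(a, b) if p != q)

planes = make_hyperplanes(num_bits=16, dim=3)
near_a = [1.0, 2.0, 3.0]
near_b = [2.0, 4.0, 6.0]     # same direction as near_a
far = [-1.0, -2.0, -3.0]     # opposite direction to near_a

code_a = lsh_code(near_a, planes)
code_b = lsh_code(near_b, planes)
code_far = lsh_code(far, planes)

print(hamming(code_a, code_b))    # vectors with the same direction collide
print(hamming(code_a, code_far))  # opposite vectors disagree on the bits
```

Two vectors pointing in the same direction fall on the same side of every hyperplane and therefore share a code, while opposite vectors are separated by every hyperplane.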
Cross-modal hash retrieval is mainly used for retrieval between data of different modalities, for example retrieving text with an image or retrieving images with text. A cross-modal hash retrieval method hash-encodes the data of the different modalities into compact binary hash codes and then performs retrieval across modalities based on the generated codes. Ding et al. proposed Collective Matrix Factorization Hashing (CMFH), which uses collective matrix factorization to learn unified hash codes from the different modalities of each instance. To make effective use of class information in matrix-factorization-based cross-modal hashing while preserving local geometric structure, and thereby improve the discriminative ability of the latent semantic features obtained by matrix factorization, Tang et al. proposed Supervised Matrix Factorization Hashing (SMFH). When learning hash codes, SMFH considers not only the consistency of label information between modalities but also the consistency of local geometric structure within each modality. To address the excessive training time complexity of many supervised cross-modal hashing methods, Zhang et al. proposed a supervised cross-modal hashing method called Semantic Correlation Maximization (SCM), which seamlessly integrates semantic label information into the hash learning process.
The hand-crafted features used by the above cross-modal hashing algorithms, which are based on shallow learning structures, may not be optimally compatible with hash code learning. To solve this problem, Jiang et al. proposed Deep Cross-modal Hashing (DCMH), an end-to-end cross-modal hashing method built on a deep learning architecture that integrates feature learning and hash code learning in a single learning framework. To improve the quantizability of deep feature representations within an end-to-end learning architecture, so that the deep features can be quantized more effectively, Cao et al. introduced quantization into an end-to-end deep learning architecture for cross-modal retrieval and proposed Collective Deep Quantization (CDQ). CDQ jointly learns deep feature representations and quantizers for the two modalities through a carefully designed hybrid network and loss function. The hybrid network of CDQ comprises: an image network composed of multiple convolution-pooling layers for extracting image feature representations; a text network composed of multiple fully connected layers for extracting text feature representations; two fully connected bottleneck layers for generating optimal low-dimensional feature representations; an adaptive cross-entropy loss for capturing cross-modal correlation; and a collective quantization loss for controlling hash quality and quantizability. In addition, CDQ learns a quantizer codebook shared by the modalities, which substantially strengthens the correlation between the two modalities. To effectively capture the intrinsic relationships between different modalities in an end-to-end deep learning architecture for cross-modal retrieval, Yang et al. proposed Pairwise Relationship Guided Deep Hashing (PRDH). PRDH integrates different types of pairwise constraints, from both the intra-modal and the inter-modal perspective, to learn hash codes that better reflect the intrinsic relationships between modalities. PRDH also introduces a decorrelation constraint into the deep learning architecture to enhance the discriminative ability of each bit of the hash code.
Cross-modal hash retrieval maps the high-dimensional feature data of objects in different modalities into a low-dimensional Hamming space, so that cross-modal information retrieval can be completed quickly and accurately using binary hash codes in Hamming space. Most existing cross-modal hash retrieval methods are based on shallow learning structures. Although these methods can complete retrieval tasks quickly by means of hash retrieval, shallow learning structures cannot mine the discriminative information in the original features well. Deep learning has shown excellent feature learning ability in tasks such as classification and object detection, and existing deep-learning-based cross-modal hash retrieval methods also indicate that deep learning benefits the performance of cross-modal retrieval. Designing a cross-modal hash retrieval method based on deep learning is therefore of great significance and value for cross-modal retrieval tasks in big data scenarios.
Summary of the invention
The purpose of the present invention is to provide a cross-modal hash retrieval method based on deep learning that solves the problem that existing cross-modal hash retrieval methods based on shallow learning structures cannot mine the discriminative information in the original features well.
To achieve the above purpose, the following technical solution is adopted: a cross-modal hash retrieval method based on deep learning. Assume that the pixel feature vector set of the image modality of $n$ objects is $V=\{v_i\}_{i=1}^{n}$, where $v_i$ denotes the pixel feature vector of the $i$-th object in the image modality. Let $T=\{t_i\}_{i=1}^{n}$ denote the feature vectors of these $n$ objects in the text modality, where $t_i$ denotes the feature vector of the $i$-th object in the text modality. The class label vectors of the $n$ objects are denoted $y_1, y_2, \ldots, y_n$, where $c$ is the number of object classes. For the vector $y_i$, if the $i$-th object belongs to the $k$-th class, the $k$-th element of $y_i$ is set to 1; otherwise the $k$-th element of $y_i$ is 0. The method comprises the following steps:
(1) using an objective function designed with deep learning techniques, obtain the binary hash codes $B$ shared by the image modality and the text modality, the deep neural network parameters $\theta_v$ and $\theta_t$ of the image and text modalities, and the projection matrices $P_v$ and $P_t$ of the image and text modalities;
(2) solve for the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternating updates, that is, alternately solve the following three subproblems: fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$; fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$; fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$;
(3) based on the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$, generate binary hash codes for the query sample and for the samples in the retrieval set;
(4) based on the generated binary hash codes, calculate the Hamming distance from the query sample to each sample in the retrieval set;
(5) complete retrieval for the query sample using a cross-modal retriever based on approximate nearest neighbor search.
The objective function designed with deep learning techniques in step (1) is given by formula (1), where $\gamma_1$ and $\gamma_2$ are non-negative balance factors; $B=[b_1,b_2,\ldots,b_n]^{T}\in\{-1,+1\}^{n\times k}$ is the binary hash code matrix; $P_v$ and $P_t$ are the projection matrices; $\theta_v$ and $\theta_t$ are the deep neural network parameters; $F$ and $G$ are the deep features of the $n$ objects in the image modality and the text modality, whose $i$-th columns are $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ respectively; $L$ is the Laplacian matrix used to preserve intra-modal and inter-modal consistency; $\mathbf{1}$ is the column vector whose elements are all 1; $\|\cdot\|_F$ denotes the Frobenius norm of a matrix; $\mathrm{tr}(\cdot)$ denotes the trace of a matrix; and $(\cdot)^{T}$ denotes the matrix transpose.
The alternating updates in step (2) solve for the unknown variables $B$, $\theta_v$, $\theta_t$, $P_v$ and $P_t$ in the objective function by alternately solving the following three subproblems:
(1) Fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$. When the binary hash codes $B$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the deep neural network parameters $\theta_v$ and $\theta_t$, formula (2).
(2) Fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$. When the binary hash codes $B$ and the deep neural network parameters $\theta_v$ and $\theta_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the projection matrices $P_v$ and $P_t$, formula (3).
(3) Fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$. When the deep neural network parameters $\theta_v$ and $\theta_t$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (1) reduces to a subproblem in the binary hash codes $B$, formula (4).
The unknown variable $B$ in formula (4) is solved with a discrete hashing algorithm based on singular value decomposition.
In step (3), binary hash codes are generated for the query sample and for the samples in the retrieval set from the solved deep neural network parameters $\theta_v$ and $\theta_t$ and projection matrices $P_v$ and $P_t$. Specifically, given the feature vector of a query sample of the image modality, the feature vector of a query sample of the text modality, the features of the samples in the image-modality retrieval set and the features of the samples in the text-modality retrieval set, the binary hash codes of the query samples and retrieval-set samples of both modalities are obtained by applying the sign function $\mathrm{sign}(\cdot)$ to the projected deep features, that is, to $P_v^{T} f(\cdot;\theta_v)$ for image-modality samples and to $P_t^{T} g(\cdot;\theta_t)$ for text-modality samples.
In step (4), the Hamming distance from the query sample to each sample in the retrieval set is calculated from the generated binary hash codes. Specifically, one formula gives the Hamming distance from an image-modality query sample to each sample in the text-modality retrieval set, and a corresponding formula gives the Hamming distance from a text-modality query sample to each sample in the image-modality retrieval set.
In step (5), retrieval for the query sample is completed with a cross-modal retriever based on approximate nearest neighbor search. Specifically, the calculated Hamming distances are sorted in ascending order, and the samples in the text-modality (or image-modality) retrieval set that correspond to the K smallest distances are returned as the retrieval result.
Beneficial effects
Compared with the prior art, the present invention has the following advantages.
1. While maintaining retrieval speed, the method uses a deep learning structure to mine more discriminative information, so that cross-modal retrieval is completed more accurately;
2. By preserving both intra-modal and inter-modal consistency, the method carries the useful information of the original feature space over to Hamming space, which promotes the mining of discriminative information and improves retrieval performance;
3. The proposed discrete hashing algorithm based on singular value decomposition gives the resulting binary hash codes more beneficial properties, which in turn effectively improves the performance of cross-modal hash retrieval.
Brief description of the drawings
Fig. 1 is the workflow diagram of the cross-modal hash retrieval method based on deep learning proposed by the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawing.
The invention discloses a cross-modal hash retrieval method based on deep learning. As shown in Fig. 1, its implementation mainly comprises the steps below. Assume that the pixel feature vector set of the image modality of $n$ objects is $V=\{v_i\}_{i=1}^{n}$, where $v_i$ denotes the pixel feature vector of the $i$-th object in the image modality. Let $T=\{t_i\}_{i=1}^{n}$ denote the feature vectors of these $n$ objects in the text modality, where $t_i$ denotes the feature vector of the $i$-th object in the text modality. The class label vectors of the $n$ objects are denoted $y_1, y_2, \ldots, y_n$, where $c$ is the number of object classes. For the vector $y_i$, if the $i$-th object belongs to the $k$-th class, the $k$-th element of $y_i$ is set to 1; otherwise the $k$-th element of $y_i$ is 0.
(1) Construction of the deep-learning-based cross-modal hash retrieval objective function
The purpose of the method is to learn hash functions for the image modality and the text modality from the feature data $V$ and $T$ of the two modalities and the class label information of the objects, and to use the learned hash functions to generate the binary hash codes used for cross-modal hash retrieval. Performing cross-modal hash learning directly on the feature data $V$ and $T$ is not conducive to mining discriminative information from the original features and generating high-quality binary hash codes. To better mine discriminative information from the original feature data of the image and text modalities, the method builds a deep neural network (DNN) for the image modality and another for the text modality to learn deep features.
For image modalities, the method for the present invention uses the convolutional Neural being made of seven layers improved by AlexNet
Network (Convolutional Neural Network, CNN) carries out the study of image modalities depth characteristic.Below to this CNN
Model describes in detail.
This CNN model for the study of image modalities depth characteristic includes five convolutional layer (Convolution
Layer) and two full articulamentums (Fully Connected Layer), it is expressed as " Conv1-Conv5 " and " Fc6-
Fc7".The network is using the pixel characteristic of image modalities as input.In this CNN, first convolutional layer Conv1 is big with 96
Small is that the input picture that 11 × 11 × 3 verification sizes are 227 × 227 × 3 is filtered using 4 pixels as step-length.By linear
The activation of amending unit (Rectified Linear Unit, ReLU), maximum pond (MAX-pooling) and local acknowledgement are returned
After one changes (Local Response Normalization, LRN), the output feature that size is 27 × 27 × 96 is obtained.Second
A convolutional layer Conv2 is using the output of first convolutional layer Conv1 as input, the core that Conv2 is 5 × 5 × 96 with 256 sizes
Input is filtered.Similarly, after by ReLU, MAX-pooling and LRN, obtaining size is 13 × 13 × 256
Export feature.It is 3 × 3 × 256,3 that third, the 4th and the 5th convolutional layer Conv3, Conv4 and Conv5 have used size respectively
× 3 × 384 and 3 × 3 × 384 384,384 and 256 convolution kernels, and every layer is all activated using ReLU.When Conv5 passes through
The output feature that size is 6 × 6 × 256 is obtained after MAX-pooling.The neuron number of full articulamentum Fc6 is 4096, and is made
Neuron temporarily abandon to prevent over-fitting with 0.5 discarding ratio.Fc7 layers are to contain d(v)A neuron connects entirely
Connect layer, and with tanh (Hyperbolic Tangent, TanH) function as Fc7 layers of activation primitive.Finally, exist
It is d that Fc7 layers, which obtain size,(v)× 1 output feature.
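The feature-map sizes quoted for this CNN can be checked with a short calculation. The sketch below assumes the padding and pooling settings of the standard AlexNet (no padding in Conv1, padding 2 in Conv2, padding 1 in Conv3 to Conv5, and 3 × 3 max pooling with stride 2); the source does not state these values, so they are assumptions, and `out_size` and `pool` are hypothetical helper names.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool(size):
    """Assumed 3x3 max pooling with stride 2, as in the standard AlexNet."""
    return out_size(size, kernel=3, stride=2)

conv1 = out_size(227, kernel=11, stride=4)      # Conv1: 11x11 kernels, stride 4
after1 = pool(conv1)                            # after max pooling
conv2 = out_size(after1, kernel=5, padding=2)   # Conv2: 5x5 kernels (assumed padding 2)
after2 = pool(conv2)
conv3 = out_size(after2, kernel=3, padding=1)   # Conv3 to Conv5: 3x3 kernels (assumed padding 1)
conv4 = out_size(conv3, kernel=3, padding=1)
conv5 = out_size(conv4, kernel=3, padding=1)
after5 = pool(conv5)

print(after1, after2, after5)  # 27 13 6
```

Under these assumptions the spatial sizes 27, 13 and 6 match the 27 × 27 × 96, 13 × 13 × 256 and 6 × 6 × 256 outputs given in the description.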
For text modality, the method for the present invention uses the multi-layer perception (MLP) (Multilayer being made of three full articulamentums
Perceptron, MLP) it is mapped to construct a MLP deep neural network for the feature of text modality from original feature space
Semantic space.Here three full articulamentums in constructed MLP deep neural network are indicated with Fc1, Fc2 and Fc3 respectively.
Similar to pertinent literature carry out text modality feature learning when construct MLP deep neural network way, the present invention constructed by
The Fc1 layer of MLP deep neural network uses ReLU as nonlinear activation function with Fc2 layers.Then hyperbolic is being used just for Fc3 layers
(TanH) function is cut as activation primitive.Fc3 layers of neuron number is d(t), it may be assumed that for learning text mode depth characteristic
The dimension of the output feature of MLP deep neural network is d(t)。
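A minimal forward pass through such a three-layer network might look as follows. This is only a sketch: the layer widths, the random initialization and the input vector are hypothetical, and no training is shown. It illustrates the ReLU, ReLU, TanH activation pattern of Fc1, Fc2 and Fc3.

```python
import math
import random

def dense(x, weights, biases):
    """Fully connected layer; weights holds one row per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(z):
    return [max(0.0, v) for v in z]

def tanh(z):
    return [math.tanh(v) for v in z]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def text_mlp(t, params):
    """Fc1 and Fc2 use ReLU; Fc3 uses TanH and outputs d_t values in (-1, 1)."""
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = relu(dense(t, w1, b1))
    h2 = relu(dense(h1, w2, b2))
    return tanh(dense(h2, w3, b3))

rng = random.Random(0)
d_in, d_hidden, d_t = 10, 8, 4   # hypothetical sizes
params = [init_layer(d_in, d_hidden, rng),
          init_layer(d_hidden, d_hidden, rng),
          init_layer(d_hidden, d_t, rng)]
t_i = [rng.random() for _ in range(d_in)]   # stand-in text feature vector
g_i = text_mlp(t_i, params)                 # plays the role of g(t_i; theta_t)
print(len(g_i))
```

Because Fc3 ends in TanH, every output component lies strictly between -1 and +1, which is convenient when the features are later binarized with a sign function.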
For the $i$-th object, let $f(v_i;\theta_v)$ denote the output feature of the image-modality CNN, where $\theta_v$ denotes the parameters of the image-modality CNN; let $g(t_i;\theta_t)$ denote the output feature of the text-modality MLP deep neural network, where $\theta_t$ denotes the parameters of the text-modality MLP deep neural network.
Assume that the deep features $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ of the $i$-th object in the image modality and the text modality are projected by the linear projection matrices $P_v$ and $P_t$, giving the projected features $P_v^{T} f(v_i;\theta_v)$ and $P_t^{T} g(t_i;\theta_t)$, where $(\cdot)^{T}$ denotes the matrix transpose. Assume further that $P_v^{T} f(v_i;\theta_v)$ and $P_t^{T} g(t_i;\theta_t)$ generate the binary hash codes of the image modality and the text modality in Hamming space, respectively. Cross-modal hash learning can then be carried out through a minimization problem in which $\gamma_1$ and $\gamma_2$ are non-negative balance factors, the third term on the right-hand side is a regularization term that prevents overfitting, and the fourth term on the right-hand side encourages each bit of the hash code to take the values +1 and -1 with equal probability, which maximizes the information provided by each bit.
Intra-modal similarity reflects the neighbor relationships among the data points formed by the feature vectors within each modality. The intra-modal similarity between two data points $v_i$ and $v_j$ of the image modality is defined in terms of the $k_1$-nearest-neighbor set of $v_i$, the Euclidean distance between $v_i$ and $v_j$ (computed with the vector $l_2$ norm), and a parameter $\sigma$ that controls the decay rate of the similarity. The intra-modal similarity between two data points $t_i$ and $t_j$ of the text modality is defined in the same way. For each modality, an objective function can be designed so that the local neighbor structure of the data points in Hamming space is consistent with that in the original feature space, that is, so that the neighbor relationships each data point has in the original feature space are preserved in Hamming space.
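One common way to realize an intra-modal similarity of this kind is a heat kernel restricted to nearest neighbors. The sketch below assumes the form $w_{ij}=\exp(-\|v_i-v_j\|_2^2/(2\sigma^2))$ when either point is among the $k_1$ nearest neighbors of the other, and 0 otherwise. The exact expression is not recoverable from this text, so the kernel form, the symmetric neighbor rule and all names are assumptions.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def knn_sets(points, k):
    """Index set of the k nearest neighbours of every point (excluding itself)."""
    sets = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: sq_dist(p, points[j]))
        sets.append(set(others[:k]))
    return sets

def intra_modal_similarity(points, k, sigma):
    """Heat-kernel weights between neighbouring points, zero elsewhere (assumed form)."""
    n = len(points)
    nbrs = knn_sets(points, k)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and (j in nbrs[i] or i in nbrs[j]):
                w[i][j] = math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
    return w

points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]  # two tight clusters
W = intra_modal_similarity(points, k=1, sigma=1.0)
print(W[0][1], W[0][2])  # large weight inside a cluster, zero across clusters
```

Points in the same cluster receive a weight close to 1, while points that are not neighbors receive weight 0, so only local structure is carried over to Hamming space.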
Based on the class label information of the objects, a semantic association matrix can be defined between the image-modality data points $v_i$ ($i=1,2,\ldots,n$) and the text-modality data points $t_j$ ($j=1,2,\ldots,n$). Note that $v_i$ and $t_j$ are considered to have the same semantics as long as they share at least one class. To preserve in Hamming space the inter-modal consistency between the image modality and the text modality, a corresponding objective function can be designed.
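The rule that two data points share semantics when they have at least one class in common can be sketched as follows. The 0/1 encoding of the matrix entries is an assumption (the source formula is not recoverable from this text), and the label vectors are hypothetical.

```python
def semantic_association(labels):
    """s[i][j] = 1 if objects i and j share at least one class, else 0."""
    n = len(labels)
    return [[1 if any(a and b for a, b in zip(labels[i], labels[j])) else 0
             for j in range(n)]
            for i in range(n)]

# Hypothetical label vectors y_i over c = 3 classes (1 marks class membership).
Y = [
    [1, 0, 0],  # object 0 belongs to class 0
    [1, 1, 0],  # object 1 belongs to classes 0 and 1
    [0, 0, 1],  # object 2 belongs to class 2
]
S = semantic_association(Y)
print(S)  # [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
```

Here row $i$ refers to the image-modality data point of object $i$ and column $j$ to the text-modality data point of object $j$.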
Combining the above analysis of image-modality deep feature learning, text-modality deep feature learning, and the preservation of intra-modal and inter-modal consistency, the objective function of the method can be designed as formula (7).
According to existing work, if data in different modality spaces have the same semantics, these data usually correspond to a common latent space. The present invention therefore assumes that features with the same semantics in the image modality and the text modality can ultimately be represented by the same binary hash code in the common Hamming space, that is, the hash codes of the two modalities coincide. Under this assumption, the optimization problem in formula (7) can be expressed as formula (8).
By a simple derivation, the corresponding terms can be rewritten in matrix form as formula (12), where $B=[b_1,b_2,\ldots,b_n]^{T}\in\{-1,+1\}^{n\times k}$, the $i$-th columns of the matrices $F$ and $G$ are $f(v_i;\theta_v)$ and $g(t_i;\theta_t)$ respectively, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. A further derivation gives the equivalent form in formula (13), where $L=D-W$ is the Laplacian matrix, $d_{ii}$ denotes the $i$-th diagonal element of the diagonal matrix $D$, $w_{ij}$ is the element in row $i$ and column $j$ of the matrix $W$, and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix.
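The reason a Laplacian trace term can encode consistency is the standard identity $\mathrm{tr}(B^{T}LB)=\tfrac{1}{2}\sum_{i,j} w_{ij}\|b_i-b_j\|_2^2$, which holds for a symmetric $W$ with $d_{ii}=\sum_j w_{ij}$. The sketch below checks this identity numerically on a small example; the matrices `W` and `B` are hypothetical, and how the source assembles its own $W$ is not recoverable from this text.

```python
def laplacian(w):
    """L = D - W, with d_ii equal to the i-th row sum of W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]

def trace_btlb(b, lap):
    """tr(B^T L B) = sum over i, j of L_ij * <b_i, b_j>, with b_i the i-th row of B."""
    n = len(b)
    return sum(lap[i][j] * sum(p * q for p, q in zip(b[i], b[j]))
               for i in range(n) for j in range(n))

def pairwise_penalty(b, w):
    """0.5 * sum over i, j of w_ij * ||b_i - b_j||^2."""
    n = len(b)
    return 0.5 * sum(w[i][j] * sum((p - q) ** 2 for p, q in zip(b[i], b[j]))
                     for i in range(n) for j in range(n))

W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]               # symmetric similarity matrix
B = [[1, -1], [1, -1], [-1, 1]]     # three 2-bit codes in {-1, +1}
L = laplacian(W)

print(trace_btlb(B, L), pairwise_penalty(B, W))  # both equal 4.0
```

Minimizing the trace term therefore pulls together the hash codes of samples that have a large similarity weight.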
According to formulas (12) and (13), formula (8) can be rewritten as formula (14).
(2) Solving the objective function
The objective function in formula (14) contains five unknown variables to be solved: the binary hash code matrix $B$, the linear projection matrices $P_v$ and $P_t$, and the deep neural network parameters $\theta_v$ and $\theta_t$. The objective function is non-convex in these five variables jointly, so analytical solutions for all five cannot be obtained simultaneously. The unknown variables in formula (14) can instead be solved by alternately solving the following three subproblems: fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$; fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$; fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$.
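The alternating scheme can be illustrated on a toy problem. The sketch below is an analogy only: it minimizes a hypothetical two-variable quadratic, not the objective of formula (14), by repeatedly fixing one variable and solving for the other in closed form.

```python
def objective(x, y):
    """Toy objective (hypothetical, not the source's): minimized jointly over x and y."""
    return (x - 1.0) ** 2 + (y - 2.0) ** 2 + (x - y) ** 2

def alternate(x=0.0, y=0.0, iterations=50):
    """Block coordinate descent: each update is the exact minimizer of one block."""
    for _ in range(iterations):
        x = (1.0 + y) / 2.0   # fix y, set d(objective)/dx = 0
        y = (2.0 + x) / 2.0   # fix x, set d(objective)/dy = 0
    return x, y

x_opt, y_opt = alternate()
print(x_opt, y_opt, objective(x_opt, y_opt))
```

Each update can only decrease the objective, and here the iterates converge to the joint minimizer $x=4/3$, $y=5/3$. For a non-convex objective such as the one in the description, the same scheme is only guaranteed not to increase the objective; it need not reach a global minimum.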
(a) Fix $B$, $P_v$ and $P_t$ and solve for $\theta_v$ and $\theta_t$
When the binary hash codes $B$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the deep neural network parameters $\theta_v$ and $\theta_t$.
The present invention uses the back-propagation (BP) algorithm to learn and update the DNN parameters $\theta_v$. As in most existing deep learning methods, $\theta_v$ is learned with stochastic gradient descent based on back-propagation. Specifically, in each iteration a mini-batch of training samples is drawn from the training set, and the selected samples are used to learn $\theta_v$ with back-propagation-based stochastic gradient descent. For each image-modality feature vector $v_i$ of the selected training samples, the gradient of the objective with respect to the network output is computed first. Then the chain rule is applied to the obtained gradient to compute the gradient with respect to $\theta_v$. Finally, the computed gradient is used to update the image-modality DNN parameters $\theta_v$ with the BP algorithm.
Algorithm 1 gives the procedure for solving the image-modality DNN parameters $\theta_v$.
Similarly, the deep neural network parameters $\theta_t$ of the text modality are learned and updated with stochastic gradient descent based on back-propagation. For each text-modality feature vector $t_i$ of the selected training samples, the gradient of the objective with respect to the network output is computed first. Then the chain rule is applied to the obtained gradient to compute the gradient with respect to $\theta_t$. Finally, the computed gradient is used to update the text-modality DNN parameters $\theta_t$ with the BP algorithm. The text-modality DNN parameters $\theta_t$ can be learned with an algorithm similar to Algorithm 1.
(b) Fix $B$, $\theta_v$ and $\theta_t$ and solve for $P_v$ and $P_t$
When the binary hash codes $B$ and the deep neural network parameters $\theta_v$ and $\theta_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the projection matrices $P_v$ and $P_t$, formula (18).
Taking the partial derivatives of the objective in formula (18) with respect to $P_v$ and $P_t$ and setting them to zero, a simple derivation gives:
$P_v=(FF^{T}+I+F\mathbf{1}\mathbf{1}^{T}F^{T})^{-1}FB$, (21)
$P_t=(GG^{T}+I+G\mathbf{1}\mathbf{1}^{T}G^{T})^{-1}GB$, (22)
where $I$ is the identity matrix and $(\cdot)^{-1}$ denotes the matrix inverse.
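The closed form in formula (21) can be reproduced by a short derivation. The sketch below assumes that the part of the subproblem that depends on $P_v$ consists of a fitting term, a regularization term and a bit-balance term, each with unit weight. This form is an assumption chosen because it yields formula (21); the source formula (18) is not recoverable from this text.

```latex
\begin{aligned}
J(P_v) &= \|B - F^{T}P_v\|_F^{2} + \|P_v\|_F^{2} + \|\mathbf{1}^{T}F^{T}P_v\|_2^{2},\\
\frac{\partial J}{\partial P_v} &= -2F\,(B - F^{T}P_v) + 2P_v + 2F\mathbf{1}\mathbf{1}^{T}F^{T}P_v = 0,\\
\Rightarrow\quad (FF^{T} + I + F\mathbf{1}\mathbf{1}^{T}F^{T})\,P_v &= FB .
\end{aligned}
```

The matrix on the left-hand side is the sum of the identity and two positive semidefinite matrices, so it is invertible, and multiplying by its inverse gives formula (21). Replacing $F$ with $G$ and $P_v$ with $P_t$ gives formula (22) in the same way.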
(c) Fix $\theta_v$, $\theta_t$, $P_v$ and $P_t$ and solve for $B$
When the deep neural network parameters $\theta_v$ and $\theta_t$ and the projection matrices $P_v$ and $P_t$ are fixed, the objective function in formula (14) reduces to a subproblem in the binary hash codes $B$, formula (23).
A simple derivation from formula (23) gives formula (24). Because $P_v$, $P_t$, $\theta_v$ and $\theta_t$ are fixed, the two terms in formula (24) that depend only on them are constants, and ignoring them does not affect the solution for $B$. In addition, because $B\in\{-1,+1\}^{n\times k}$, $\|B\|_F^{2}=nk$ is also a constant. After the constant terms in formula (24) are dropped, formula (24) becomes formula (25).
The unknown in formula (25) is a discrete variable, so in general it is difficult to obtain an analytical solution directly. The present invention proposes a discrete hashing algorithm based on singular value decomposition to solve the optimization problem over the discrete variable $B$ in formula (25). The algorithm is described in detail below.
Singular value decomposition of the matrix $L$ yields a factorization containing a diagonal matrix; substituting this factorization into formula (25) gives formula (26).
The matrix $B$ and the two auxiliary matrices in formula (26) are then split into their $i$-th rows and the matrices formed by the remaining rows, which yields formula (27). Similarly, formula (28) is obtained, where $q_i$ denotes the $i$-th column of the matrix $Q$ and the remaining columns of $Q$ after removing $q_i$ form a separate matrix.
According to formulas (27) and (28), the unknown binary hash code matrix $B$ in formula (26) can be obtained by solving an optimization problem over $b_i$ ($i=1,2,\ldots,n$), formula (29).
By a simple derivation, formula (29) can be converted into formula (30).
The optimization problem in formula (30) has an analytical solution, formula (31), expressed with the sign function $\mathrm{sign}(\cdot)$.
Algorithm 2 gives the discrete hashing algorithm based on singular value decomposition.
(3) Generating binary hash codes for the query samples and for the samples in the retrieval sample sets
Assume that the feature vector of a query sample of the image modality and the feature vector of a query sample of the text modality are given, together with the features of the samples in the image-modality retrieval sample set and the features of the samples in the text-modality retrieval sample set, where the size of a retrieval sample set denotes the number of samples it contains. Using the solved projection matrices P_v and P_t of the image modality and the text modality, and the solved deep neural network parameters θ_v and θ_t of the two modalities, the binary hash codes of the image-modality and text-modality query samples and of the samples in the retrieval sample sets can be obtained, where sign(·) is the sign function.
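As a rough illustration of this step, the sketch below assumes that a code is the sign of a deep feature vector multiplied by the modality's projection matrix. That form is an assumption, since the patent's exact expression is not reproduced in this text. The function `hash_code`, the toy feature vector, and the toy matrix `P` are all hypothetical; in the patent the deep features are 1024-dimensional network outputs.

```python
def sign(x):
    """Sign function with the convention sign(0) = +1 (an assumption)."""
    return 1 if x >= 0 else -1


def hash_code(feature, projection):
    """Project a d-dim feature with a d x k matrix and binarize to a k-bit +1/-1 code."""
    k = len(projection[0])
    return [
        sign(sum(f * row[j] for f, row in zip(feature, projection)))
        for j in range(k)
    ]


# Toy deep feature (d = 3) and toy projection matrix (d = 3, k = 2).
feature = [0.5, -1.0, 2.0]
P = [
    [1.0, -1.0],
    [0.5, 0.5],
    [-0.25, 1.0],
]
code = hash_code(feature, P)
```

The same function would be applied with the image-modality projection to image features and with the text-modality projection to text features, so that both modalities land in one shared Hamming space.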
(4) Computing the Hamming distance from the query sample to each sample in the retrieval sample set
For a query sample of the image modality, the Hamming distance from the query sample to each sample in the text-modality retrieval sample set is computed from their binary hash codes. For a query sample of the text modality, the Hamming distance from the query sample to each sample in the image-modality retrieval sample set is computed in the same way.
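For codes with entries in {-1, +1}, the Hamming distance can be computed from an inner product, because matching bits contribute +1 and differing bits contribute -1. The sketch below uses this standard identity, d = (k - b1·b2) / 2; whether the patent writes its formula in exactly this form is an assumption, and the function name is hypothetical.

```python
def hamming_distance(b1, b2):
    """Hamming distance between two equal-length codes with entries in {-1, +1}."""
    if len(b1) != len(b2):
        raise ValueError("codes must have the same length")
    k = len(b1)
    inner = sum(x * y for x, y in zip(b1, b2))
    # inner = (#matching bits) - (#differing bits), so #differing = (k - inner) / 2.
    return (k - inner) // 2


query_code = [1, -1, 1, 1]
retrieval_codes = [[1, 1, -1, 1], [1, -1, 1, 1], [-1, 1, -1, -1]]
distances = [hamming_distance(query_code, c) for c in retrieval_codes]
```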
(5) Completing retrieval for the query sample using the cross-modal retriever
For the task of retrieving text with an image, the computed Hamming distances are first sorted in ascending order, and the samples in the text retrieval sample set that correspond to the K smallest distances are then taken as the retrieval result. Similarly, for the task of retrieving images with text, the computed Hamming distances are first sorted in ascending order, and the samples in the image retrieval sample set that correspond to the K smallest distances are then taken as the retrieval result.
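The ranking step can be sketched as a sort over sample indices. The function name `top_k` and the tie-breaking rule (lower index first, which follows from Python's stable sort) are assumptions; the patent only specifies ascending order and the first K results.

```python
def top_k(distances, k):
    """Return indices of the k smallest distances, in ascending order of distance."""
    order = sorted(range(len(distances)), key=lambda j: distances[j])
    return order[:k]


# Hamming distances from one query to five retrieval-set samples.
example_distances = [3, 1, 2, 1, 0]
result = top_k(example_distances, 3)
```

A full sort costs O(N log N) per query; for large retrieval sets a partial selection such as `heapq.nsmallest` would return the same K results with less work.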
The beneficial effects of the present invention are illustrated below with specific experiments.
The experiments for the method of the present invention were carried out mainly on the Pascal VOC 2007 data set, which is briefly introduced first. The Pascal VOC 2007 data set contains 9963 images belonging to 20 categories (for example, aircraft, bottle, horse and sofa), and each image is annotated with labels. In the experiments, the data set is divided into a training set of 5011 image-label pairs and a test set of 4952 image-label pairs. For deep cross-modal hashing methods, the image modality uses raw pixel features as input features. For methods that take hand-crafted features as input, 512-dimensional GIST features are used as input features. For the text modality, 399-dimensional word-frequency features are used as input features. Two cross-modal retrieval tasks are performed in the experiments: retrieving text with an image and retrieving images with text, denoted Img2Txt and Txt2Img respectively.
The present invention uses mean average precision (MAP) to measure the performance of cross-modal hash retrieval methods. To obtain MAP, the average precision (AP) is first computed for each query sample. Once the AP of every query sample has been obtained, averaging all AP values gives the MAP.
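The two-stage computation can be sketched as follows, using the common definition of AP as the mean of the precision values at the ranks where relevant results occur. The patent text does not reproduce its AP formula, so this definition, the normalization by the number of relevant results retrieved, and both function names are assumptions.

```python
def average_precision(relevance):
    """AP for one query; relevance is a list of 0/1 flags in ranked order."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(all_relevance):
    """MAP: the mean of the per-query AP values."""
    return sum(average_precision(r) for r in all_relevance) / len(all_relevance)


# Two toy queries: ranked relevance flags of their retrieved samples.
queries = [[1, 0, 1, 0], [0, 1]]
map_value = mean_average_precision(queries)
```

In a label-based evaluation, a retrieved sample would typically be flagged relevant when it shares at least one class label with the query; that criterion is likewise an assumption here.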
The method of the present invention uses a mini-batch gradient descent algorithm with momentum and weight decay set to 0.9 and 0.0001 respectively, and the batch size is set to 128. AlexNet pre-trained on the ImageNet data set is used to initialize the first five layers of the image-modality deep neural network in the method of the present invention. The other parameters of the deep neural networks are initialized randomly. The output feature dimension of the deep neural networks of both the image modality and the text modality is set to 1024. In the experiments, 5-fold cross-validation is used to determine the optimal values of the parameters γ1 and γ2 of the method of the present invention. For the parameters of the other methods, the parameter-setting principles recommended by each method are followed. The reported results are averages over 10 random experiments.
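A single parameter update under these settings might look like the sketch below, which uses one common formulation of momentum SGD with L2 weight decay. The patent does not state the exact update rule or the learning rate, so the formulation, the function name, and the value `lr=0.1` are assumptions for illustration only.

```python
def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=0.0001):
    """One mini-batch update: weight decay is added to the gradient, then momentum is applied."""
    new_velocity = [
        momentum * v - lr * (g + weight_decay * wi)
        for wi, g, v in zip(w, grad, velocity)
    ]
    new_w = [wi + v for wi, v in zip(w, new_velocity)]
    return new_w, new_velocity


# Toy one-parameter example; grad would be averaged over a batch of 128 samples.
w0, v0 = [1.0], [0.0]
w1, v1 = sgd_momentum_step(w0, grad=[0.5], velocity=v0, lr=0.1)
```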
The methods compared with the method of the present invention are: Semantic Correlation Maximization (SCM), Supervised Matrix Factorization Hashing (SMFH), Deep Cross-Modal Hashing (DCMH), and Pairwise Relationship Guided Deep Hashing (PRDH). Table 1 lists the MAP of the method of the present invention and of the compared methods for cross-modal hash retrieval on the Pascal VOC 2007 data set. As Table 1 shows, for both retrieval tasks and at all three hash code lengths, the deep cross-modal hash retrieval methods DCMH, PRDH and the method of the present invention achieve better retrieval performance than the shallow cross-modal hash retrieval methods SCM and SMFH. This indicates that using deep learning to learn the deep features from which binary hash codes are generated is beneficial. Table 1 also shows that, for both the Img2Txt and Txt2Img retrieval tasks and at all three hash code lengths, the cross-modal retrieval performance of the method of the present invention exceeds that of DCMH and PRDH. This indicates that the method of the present invention is an effective cross-modal hash retrieval method.
Table 1. MAP of each method on the Pascal VOC 2007 data set
Claims (6)
1. A cross-modal hash retrieval method based on deep learning. Assume that the pixel feature vectors of n objects in the image modality form a set, each element of which is the pixel feature vector of one object in the image modality; let a second set contain the feature vectors of these n objects in the text modality, each element of which is the feature vector of one object in the text modality; and let the class label vectors of the n objects form a third set, where the dimension of each label vector equals the number of object classes. For the label vector of an object, if the object belongs to a given class, the corresponding element of the vector is 1; otherwise, that element is 0. The method is characterized by comprising the following steps:
(1) using an objective function designed with deep learning techniques to obtain the binary hash codes B shared by the image modality and the text modality, the deep neural network parameters θ_v and θ_t of the image modality and the text modality, and the projection matrices P_v and P_t of the image modality and the text modality;
(2) solving the unknown variables B, θ_v, θ_t, P_v and P_t in the objective function by alternating updates, that is, alternately solving the following three subproblems: fixing B, P_v and P_t, and solving for θ_v and θ_t; fixing B, θ_v and θ_t, and solving for P_v and P_t; fixing θ_v, θ_t, P_v and P_t, and solving for B;
(3) based on the solved deep neural network parameters θ_v and θ_t and projection matrices P_v and P_t of the image modality and the text modality, generating binary hash codes for the query samples and for the samples in the retrieval sample sets;
(4) based on the generated binary hash codes, computing the Hamming distance from the query sample to each sample in the retrieval sample set;
(5) completing retrieval for the query sample using a cross-modal retriever based on approximate nearest neighbor search.
2. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that the objective function designed with deep learning techniques in step (1) has the following form:
(1)
where the objective contains two non-negative balance factors; P_v and P_t are projection matrices; θ_v and θ_t are deep neural network parameters; the deep features of the i-th object in the image modality and in the text modality are the i-th columns of the image-modality and text-modality deep feature matrices respectively; a Laplacian matrix is used to preserve intra-modal consistency and inter-modal consistency; a column vector whose elements are all 1 also appears; and the remaining operators denote the Frobenius norm of a matrix, the trace of a matrix, and the transpose of a matrix.
3. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that solving the unknown variables B, θ_v, θ_t, P_v and P_t in the objective function by alternating updates in step (2) specifically consists of alternately solving the following three subproblems:
(1) fixing B, P_v and P_t, and solving for θ_v and θ_t; when the binary hash codes B and the projection matrices P_v and P_t are fixed, the objective function shown in formula (1) reduces to a subproblem over the deep neural network parameters θ_v and θ_t, namely:
(2);
(2) fixing B, θ_v and θ_t, and solving for P_v and P_t; when the binary hash codes B and the deep neural network parameters θ_v and θ_t are fixed, the objective function shown in formula (1) reduces to a subproblem over the projection matrices P_v and P_t, namely:
(3);
(3) fixing θ_v, θ_t, P_v and P_t, and solving for B; when the deep neural network parameters θ_v and θ_t and the projection matrices P_v and P_t are fixed, the objective function shown in formula (1) reduces to a subproblem over the binary hash codes B, namely:
(4)
The unknown variable B in formula (4) is solved using the discrete hashing algorithm based on singular value decomposition.
4. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that generating binary hash codes for the query samples and for the samples in the retrieval sample sets in step (3), based on the solved deep neural network parameters θ_v and θ_t and projection matrices P_v and P_t of the image modality and the text modality, specifically comprises: assuming that the feature vector of a query sample of the image modality, the feature vector of a query sample of the text modality, the features of the samples in the image-modality retrieval sample set, and the features of the samples in the text-modality retrieval sample set are given, where the size of a retrieval sample set denotes the number of samples it contains; the binary hash codes of the image-modality and text-modality query samples and of the samples in the retrieval sample sets are then obtained by applying the sign function sign(·).
5. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that computing the Hamming distance from the query sample to each sample in the retrieval sample set in step (4), based on the generated binary hash codes, specifically comprises: using a formula to compute the Hamming distance from the image-modality query sample to each sample in the text-modality retrieval sample set; and using a formula to compute the Hamming distance from the text-modality query sample to each sample in the image-modality retrieval sample set.
6. The cross-modal hash retrieval method based on deep learning according to claim 1, characterized in that completing retrieval for the query sample in step (5) using the cross-modal retriever based on approximate nearest neighbor search specifically comprises: sorting the computed Hamming distances in ascending order, and then taking, from the text-modality (or image-modality) retrieval sample set, the samples corresponding to the K smallest distances as the retrieval result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910196009.7A CN110019652B (en) | 2019-03-14 | 2019-03-14 | Cross-modal Hash retrieval method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110019652A (en) | 2019-07-16 |
CN110019652B (en) | 2022-06-03 |
Family
ID=67189652
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910196009.7A Active CN110019652B (en) | 2019-03-14 | 2019-03-14 | Cross-modal Hash retrieval method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110019652B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN110019652B (en) | 2022-06-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||