CN107402993B - Cross-modal retrieval method based on discriminative correlation maximization hashing - Google Patents
Cross-modal retrieval method based on discriminative correlation maximization hashing
- Publication number
- CN107402993B (application CN201710581083A)
- Authority
- CN
- China
- Prior art keywords
- hash
- text
- data
- image
- objective function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The present invention proposes a cross-modal retrieval method based on discriminative correlation maximization hashing, comprising: performing multi-modal feature extraction on a training dataset to obtain a multi-modal training set; constructing, on this dataset, an objective function for discriminative correlation maximization hashing; solving the objective function to obtain the projection matrices that map image and text features into a common Hamming space and the joint hash codes of the image-text pairs; projecting a test dataset into the common Hamming space and quantizing it into hash codes with the learned hash functions; and performing cross-modal retrieval based on the hash codes. The invention improves both the efficiency and the accuracy of cross-media retrieval.
Description
Technical field
The present invention relates to the field of data retrieval, and in particular to a cross-modal retrieval method based on discriminative correlation maximization hashing.
Background technology
With the development of science and technology, massive multi-modal data has poured into the Internet. In order to retrieve useful information from it, a range of information retrieval techniques has arisen. Traditional information retrieval is single-modal: the input query and the retrieved results belong to the same modality. This severely limits information retrieval, so it is desirable to extend single-modal retrieval to cross-modal retrieval, i.e., given a picture, retrieve the text that describes it, and vice versa.
Because data from different modalities have different characteristics, it is hard to measure their similarity directly; this is the main challenge of cross-modal methods. The most common remedy is subspace learning. Canonical correlation analysis (CCA) is a classic unsupervised subspace learning method: it projects the data of different modalities into the same space while maximizing the correlation between the two modalities. Whereas CCA aims to maximize the correlation between the data of two different modalities, partial least squares (PLS) approaches cross-media retrieval from the angle of covariance. Generalized multiview analysis (GMA) uses category labels as supervision information and is the extension of CCA to the supervised setting.
The cross-media retrieval methods above generally consume large amounts of time and storage space when handling large-scale data. Hashing methods arose to solve this problem. In a hashing method, data are represented by binary hash codes, and measuring the similarity between two items only requires bitwise XOR operations on their hash codes in Hamming space. Hashing thus reduces computational complexity and uses less storage space. Hashing-based cross-modal methods usually project the data of different modalities into a common Hamming space and obtain the hash codes of each modality there, so that similarity between data of different modalities can be measured directly. Hashing-based cross-modal retrieval has already found effective applications: collective matrix factorization hashing (CMFH) learns a common hash code for multi-modal data and uses it for similarity measurement in a common semantic space; latent semantic sparse hashing (LSSH) obtains the high-level semantics of the two modalities with sparse coding and matrix factorization respectively, and then performs cross-media retrieval with hashing.
Although many hashing-based cross-media retrieval methods exist, existing methods do not take the discriminative distribution of the data features into account. A discriminative feature distribution (features of the same class as close as possible, features of different classes as far apart as possible) makes cross-media retrieval more accurate. How to preserve the discriminative distribution of images and text while projecting them into a semantic space, so as to improve retrieval precision, is therefore a technical problem that those skilled in the art still need to solve.
Summary of the invention
To solve the above problems, the present invention proposes a cross-modal retrieval method based on discriminative correlation maximization hashing. After the data features of the text and image modalities are projected into a common Hamming space, each modality still maintains its discriminative distribution and the association between paired multi-modal data is maximized, thereby improving the accuracy of cross-modal retrieval.
The specific technical solution of the present invention is as follows:
A cross-modal retrieval method based on discriminative correlation maximization hashing comprises the following steps:
Step 1: obtain a training dataset in which each sample contains a pair of image and text modal data;
Step 2: perform multi-modal feature extraction on the training dataset to obtain the multi-modal training set Otrain;
Step 3: for the training set Otrain, construct on this dataset an objective function for discriminative correlation maximization hashing;
Step 4: solve the objective function to obtain the projection matrices W1 and W2 that project image and text into the common Hamming space and the joint hash codes B of the image-text pairs, using B as the hash codes of both image and text;
Step 5: obtain a test dataset and perform multi-modal feature extraction on it to obtain the multi-modal test set Otest;
Step 6: for the test set Otest, project the image or text of each test sample into the common Hamming space with the projection matrices W1 and W2 obtained in step 4, and quantize it into hash codes by the hash functions;
Step 7: perform cross-modal retrieval: based on the hash codes, retrieve from the training set the objects of the other modality that are relevant to a query sample in the test set.
The objective function in step 3 is:
$\min_{W_1,W_2,B,Q}\ \lambda\|W_1V-W_2T\|_F^2+\beta\,\mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)+\mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2+\alpha\|Y-QB\|_F^2+\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
$\mathrm{s.t.}\ B\in\{-1,1\}^{L\times N},\ W_1W_1^T=I_k,$
where $V\in\mathbb{R}^{d_v\times N}$ and $T\in\mathbb{R}^{d_t\times N}$ are the feature matrices of image and text respectively and $Y\in\{0,1\}^{c\times N}$ is the label matrix; λ, μ1, μ2, β, α are balance parameters and γ is a regularization parameter.
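For concreteness, the objective of step 3 can be evaluated numerically. The following NumPy sketch assumes the term-by-term form implied by the closed-form update rules given in the embodiment; the function name, argument order, and default parameter values are illustrative, not taken from the patent:

```python
import numpy as np

def objective(V, T, B, W1, W2, Q, Y, Sw, Sb,
              lam=1.0, mu1=1.0, mu2=1.0, beta=1.0, alpha=1.0, gamma=0.01):
    """Evaluate the discriminative correlation-maximization hashing objective.

    V (dv x N), T (dt x N): image/text feature matrices
    B (L x N): joint hash codes; W1 (L x dv), W2 (L x dt): projections
    Q (c x L): classifier matrix; Y (c x N): label matrix
    Sw, Sb (dt x dt): within-/between-class scatter of the text modality
    """
    corr = lam * np.linalg.norm(W1 @ V - W2 @ T, 'fro') ** 2    # pairwise association
    lda = beta * np.trace(W2 @ (Sw - Sb) @ W2.T)                # discriminative term
    quant = mu1 * np.linalg.norm(B - W1 @ V, 'fro') ** 2 \
          + mu2 * np.linalg.norm(B - W2 @ T, 'fro') ** 2        # quantization loss
    cls = alpha * np.linalg.norm(Y - Q @ B, 'fro') ** 2         # label supervision
    reg = gamma * (np.linalg.norm(W1, 'fro') ** 2
                   + np.linalg.norm(W2, 'fro') ** 2
                   + np.linalg.norm(Q, 'fro') ** 2)             # regularization
    return corr + lda + quant + cls + reg
```

Lower values indicate projections whose codes agree across the two modalities while remaining classifiable by Q.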
Further, step 3 comprises:
Step 3-1: let each data sample of the training set Otrain be $o_i=(v_i,t_i,y_i)$, where $v_i\in\mathbb{R}^{d_v}$ is the image feature vector, $t_i\in\mathbb{R}^{d_t}$ is the text feature vector, $y_i\in\{0,1\}^c$ is the category label and N is the number of samples; project the data of the two modalities from their original heterogeneous spaces into a common Hamming space and maximize the association between the paired image and text within each sample:
$\min_{W_1,W_2}\ \|W_1V-W_2T\|_F^2\quad \mathrm{s.t.}\ W_1W_1^T=I_k$
Step 3-2: perform linear discriminant analysis on the text modality data and transfer its discriminative property to the image modality data:
$\min_{W_2}\ \mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)$
Step 3-3: convert the features of the two modalities into hash codes and minimize the quantization loss of obtaining the hash codes through the hash functions:
$\min_{W_1,W_2,B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2\quad \mathrm{s.t.}\ B\in\{-1,1\}^{L\times N}$
Step 3-4: add category labels as supervision information to classify the hash codes:
$\min_{Q,B}\ \|Y-QB\|_F^2$
Step 3-5: add regularization terms to prevent over-fitting, defined as:
$\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
Step 3-6: integrate steps 3-1 to 3-5 to obtain the objective function.
Further, the method for solving the objective function in step 4 is:
Step 4-1: fix the other variables in the objective function and solve for the projection matrix W1 of the image modality;
Step 4-2: fix the other variables in the objective function and solve for the projection matrix W2 of the text modality;
Step 4-3: fix the other variables in the objective function and solve for the joint hash codes B;
Step 4-4: fix the other variables in the objective function and solve for the classifier matrix Q.
Further, the retrieval method also comprises: judging the retrieval accuracy according to the category labels carried by the multi-modal dataset.
According to another aspect of the present invention, there is also provided an objective function construction method for cross-modal retrieval, comprising:
Step 1: obtain a training dataset in which each sample contains a pair of image and text modal data; perform multi-modal feature extraction on the training dataset to obtain the multi-modal training set Otrain;
Step 2: project the data of the two modalities from their original heterogeneous spaces into a common Hamming space and maximize the association between the paired image and text within each sample;
Step 3: perform linear discriminant analysis on the text modality data and transfer its discriminative property to the image modality data;
Step 4: convert the features of the two modalities into hash codes and minimize the quantization loss of obtaining the hash codes through the hash functions;
Step 5: add category labels as supervision information;
Step 6: add regularization terms to prevent over-fitting;
Step 7: integrate steps 2 to 6 to obtain the objective function for discriminative correlation maximization hashing.
Further, maximizing the association between the paired image and text within a sample in step 2 is defined as:
$\min_{W_1,W_2}\ \|W_1V-W_2T\|_F^2$
where V and T are the feature matrices of image and text respectively, and W1 and W2 are the projection matrices that project image and text into the common Hamming space; minimizing the discrepancy between the two projections maximizes the pairwise association.
Further, step 3 comprises: performing linear discriminant analysis on the text modality data to obtain the within-class similarity matrix Sw and the between-class similarity matrix Sb, and transferring this discriminative property to the image modality data, defined as:
$\min_{W_2}\ \mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)$
Further, step 4 defines the quantization loss of obtaining the hash codes through the hash functions as:
$\min_{W_1,W_2,B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2$
where B is the joint hash code matrix.
Further, the category label supervision of step 5 is defined as:
$\min_{Q,B}\ \|Y-QB\|_F^2$
where Q is the classifier matrix.
Further, the regularization term of step 6 is defined as:
$\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
Beneficial effects of the present invention are:
In hashing-based cross-media retrieval, the present invention makes full use of the discriminative distribution of data features by performing linear discriminant analysis on the text modality and passing its discriminative property to the image modality. It also keeps the association of the multi-modal data of the same sample maximized after projection into the common Hamming space. Both make the feature distribution in the Hamming space more discriminative, so that the hash codes quantized from the features are easier to classify, which improves the performance of cross-media retrieval; meanwhile, the use of hashing reduces the time and space consumption of cross-modal retrieval.
Description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application; the illustrative embodiments of the application and their explanations are used to explain the application and do not constitute an improper limitation of it.
Fig. 1 is the general flow chart of cross-modal retrieval based on discriminative correlation maximization hashing;
Fig. 2 is the construction schematic of the objective function based on discriminative correlation maximization hashing;
Fig. 3 is the schematic diagram of solving the objective function.
Specific embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below in conjunction with the drawings and embodiments.
It should be noted that the following detailed description is illustrative and intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the technical field to which the application belongs.
It should also be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the illustrative embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; additionally, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
Embodiment one
This embodiment provides a cross-modal retrieval method based on discriminative correlation maximization hashing, which, as shown in Fig. 1, comprises the following steps:
Step 1: obtain a training dataset in which each sample contains a pair of image and text modal data;
Step 2: perform multi-modal feature extraction on the training dataset to obtain the multi-modal training set Otrain;
Step 3: for the training set Otrain, construct on this dataset an objective function for discriminative correlation maximization hashing;
Step 4: solve the objective function to obtain the projection matrices W1 and W2 that project image and text into the common Hamming space, the joint hash codes B of the image-text pairs and the classifier matrix Q, using B as the hash codes of both image and text;
Step 5: obtain a test dataset and perform multi-modal feature extraction on it to obtain the multi-modal test set Otest;
Step 6: for the test set Otest, project the image or text of each test sample into the common Hamming space with the projection matrices W1 and W2 obtained in step 4, and directly obtain the hash codes of the test images and texts with the learned hash functions f(V) = sgn(W1V) and g(T) = sgn(W2T);
Step 7: perform cross-modal retrieval: based on the hash codes, retrieve from the training set the objects of the other modality that are relevant to a query sample in the test set.
The objective function of discriminative correlation maximization hashing is (as shown in Fig. 2):
$\min_{W_1,W_2,B,Q}\ \lambda\|W_1V-W_2T\|_F^2+\beta\,\mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)+\mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2+\alpha\|Y-QB\|_F^2+\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
$\mathrm{s.t.}\ B\in\{-1,1\}^{L\times N},\ W_1W_1^T=I_k,$
where $V\in\mathbb{R}^{d_v\times N}$ and $T\in\mathbb{R}^{d_t\times N}$ are the feature matrices of image and text respectively and $Y\in\{0,1\}^{c\times N}$ is the label matrix; λ, μ1, μ2, β, α are balance parameters and γ is a regularization parameter.
The construction process of the objective function of discriminative correlation maximization hashing is:
Step 1: obtain a multi-modal dataset O, which contains a multi-modal training subset Otrain and a multi-modal test subset Otest.
Assume each data sample is $o_i=(v_i,t_i,y_i)$, where $v_i\in\mathbb{R}^{d_v}$ is the image feature vector, $t_i\in\mathbb{R}^{d_t}$ is the text feature vector, $y_i\in\{0,1\}^c$ is the category label, and N is the number of samples. Each data sample contains an image-text pair; their physical characteristics are different, but they have the same semantic meaning and belong to the same class.
We assume here that each sample belongs to one of c classes. Then $V=[v_1,\dots,v_N]\in\mathbb{R}^{d_v\times N}$ and $T=[t_1,\dots,t_N]\in\mathbb{R}^{d_t\times N}$ are the feature matrices of image and text respectively, and $Y=[y_1,\dots,y_N]\in\{0,1\}^{c\times N}$ is the label matrix: if the image and text features $v_i$ and $t_i$ of a sample $o_i$ belong to the j-th class, then the j-th element of $y_i$ is 1 and the rest are 0.
Step 2: project the data features from their original heterogeneous spaces into a common Hamming space.
Step 2-1: for each sample $o_i=(v_i,t_i,y_i)$ in Otrain, set the hash functions of the image and text modalities as f(V) = sgn(W1V) and g(T) = sgn(W2T), which project the data of the two modalities from their original heterogeneous spaces into a common Hamming space. Here sgn(·) is the sign function, which discretizes continuous data into binary hash codes, and W1 and W2 are the projection matrices of the two modalities.
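The hash functions are sign-thresholded linear projections. A minimal NumPy sketch follows; the convention of mapping zero-valued projections to +1 is an assumption, as the patent does not specify the behaviour of sgn at zero:

```python
import numpy as np

def hash_codes(W, X):
    """f(X) = sgn(W X): project features into Hamming space, binarize to +/-1.

    W: L x d projection matrix; X: d x N feature matrix.
    Zero projections are mapped to +1 so every bit is a valid code.
    """
    H = W @ X
    return np.where(H >= 0, 1, -1)
```

The same function serves both modalities: f(V) = hash_codes(W1, V) and g(T) = hash_codes(W2, T).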
Step 2-2: because the image and text within a sample are paired in the original state, the association between the paired image and text within a sample should be maximized in the Hamming space after projection, defined as follows:
$\min_{W_1,W_2}\ \|W_1V-W_2T\|_F^2$
where W1 and W2 are the projection matrices of image and text.
Step 2-3: in order to preserve the discriminative characteristics of the data, we introduce linear discriminant analysis (LDA) to process the text modality data and transfer its discriminative property to the image modality.
Linear discriminant analysis projects data from a high-dimensional space into an optimal discriminative space in which the distance between data of different classes is as large as possible and the distance between data of the same class is as small as possible. Performing linear discriminant analysis on the text modality data, we define Sw as the within-class similarity matrix and Sb as the between-class similarity matrix. LDA on the text modality makes the distribution of the text data projected into the common Hamming space discriminative, and this property is transferred to the image modality through Sw and Sb, defined as:
$\min_{W_2}\ \mathrm{tr}\big(W_2S_wW_2^T\big)-\mathrm{tr}\big(W_2S_bW_2^T\big)$
where tr(·) is the trace of a matrix. The formula is equivalent to:
$\min_{W_2}\ \mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)$
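The patent does not give explicit formulas for Sw and Sb; assuming they are the standard LDA within-class and between-class scatter matrices, they can be computed as in the following sketch (function and variable names are illustrative):

```python
import numpy as np

def scatter_matrices(T, labels):
    """Within-class (Sw) and between-class (Sb) scatter of text features.

    T: dt x N feature matrix; labels: length-N array of class indices.
    """
    mean = T.mean(axis=1, keepdims=True)
    dt = T.shape[0]
    Sw = np.zeros((dt, dt))
    Sb = np.zeros((dt, dt))
    for c in np.unique(labels):
        Tc = T[:, labels == c]
        mc = Tc.mean(axis=1, keepdims=True)
        Sw += (Tc - mc) @ (Tc - mc).T                       # scatter around class mean
        Sb += Tc.shape[1] * (mc - mean) @ (mc - mean).T     # class-mean scatter
    return Sw, Sb
```

Minimizing tr(W2 (Sw − Sb) W2ᵀ) then compresses within-class scatter while spreading class means apart in the projected space.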
Step 3: using the hash functions defined in step 2-1, quantize the image and text features projected into the common Hamming space into hash codes.
Because a sample consists of a paired image and text with the same semantic meaning, we introduce an auxiliary variable, the joint hash codes $B\in\{-1,1\}^{L\times N}$ of the two modalities: the paired image and text in a sample share the same hash code. The quantization loss of generating the hash codes should be as small as possible, defined as follows:
$\min_{W_1,W_2,B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2\quad \mathrm{s.t.}\ B\in\{-1,1\}^{L\times N}$
Step 4: add category labels as supervision information so that the learned joint hash codes can easily be used for classification. Specifically, the learned hash codes are B and their semantic information is Y; since the matrix dimensions of Y and B are inconsistent, a classifier matrix Q is introduced for the conversion, defined as:
$\min_{Q,B}\ \|Y-QB\|_F^2$
Step 5: to prevent over-fitting, apply regularization constraints to the projection matrices, defined as:
$\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
Integrating the five steps above, we obtain the complete objective function:
$\min_{W_1,W_2,B,Q}\ \lambda\|W_1V-W_2T\|_F^2+\beta\,\mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)+\mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2+\alpha\|Y-QB\|_F^2+\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
where λ, μ1, μ2, β, α are balance parameters and γ is a regularization parameter (for preventing over-fitting).
Our goal is to obtain the projection matrices W1 and W2 and the joint hash codes B by solving the objective function above. Since the objective function contains multiple unknown variables, it cannot be solved directly. The present invention therefore proposes an iterative solving algorithm: fix the other variables and solve for one variable, which finally yields the optimal solution. In addition, for simplicity of computation, we relax the discrete constraint $B\in\{-1,1\}^{L\times N}$ of the joint hash codes B into the continuous constraint 0 ≤ B ≤ 1.
According to the objective function of discriminative correlation maximization hashing, we propose an iterative solving algorithm (as shown in Fig. 3) for the required projection matrices W1 and W2, joint hash codes B and classifier matrix Q.
Step 1: fix the other variables W2, Q and B in the objective function and solve for the projection matrix W1. The objective function becomes:
$\min_{W_1}\ \lambda\|W_1V-W_2T\|_F^2+\mu_1\|B-W_1V\|_F^2+\gamma\|W_1\|_F^2$
Setting the partial derivative with respect to W1 to zero gives the solution of W1:
$W_1=(\mu_1BV^T+\lambda W_2TV^T)(\mu_1VV^T+\lambda VV^T+\gamma I)^{-1}$
Step 2: fix the other variables W1, Q and B and solve for the projection matrix W2. The objective function becomes:
$\min_{W_2}\ \lambda\|W_1V-W_2T\|_F^2+\beta\,\mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)+\mu_2\|B-W_2T\|_F^2+\gamma\|W_2\|_F^2$
Setting the partial derivative with respect to W2 to zero gives the solution of W2:
$W_2=(\mu_2BT^T+\lambda W_1VT^T)\big(\mu_2TT^T+\lambda TT^T+\beta(S_w-S_b)+\gamma I\big)^{-1}$
Step 3: fix the other variables W1, W2 and Q and solve for the joint hash codes B. The objective function becomes:
$\min_{B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2+\alpha\|Y-QB\|_F^2$
Setting the partial derivative with respect to B to zero gives the solution of B:
$B=(\alpha Q^TQ+(\mu_1+\mu_2)I)^{-1}(\alpha Q^TY+\mu_1W_1V+\mu_2W_2T)$
Step 4: fix the other variables W1, W2 and B and solve for the classifier matrix Q. The objective function becomes:
$\min_{Q}\ \alpha\|Y-QB\|_F^2+\gamma\|Q\|_F^2$
Setting the partial derivative with respect to Q to zero gives the solution of Q:
$Q=(\alpha YB^T)(\alpha BB^T+\gamma I)^{-1}$
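The four closed-form updates can be alternated as sketched below. This is a hedged reading of the algorithm, not a faithful implementation: the LDA term (β, Sw, Sb) is omitted from the W2 update to keep the sketch short, B is kept continuous during the iterations and binarized only at the end, and all hyper-parameter defaults and names are illustrative:

```python
import numpy as np

def solve(V, T, Y, L=16, iters=20, lam=1.0, mu1=1.0, mu2=1.0,
          alpha=1.0, gamma=0.01, seed=0):
    """Alternating closed-form updates for W1, W2, B, Q (relaxed B).

    Each step fixes the other variables and solves a regularized
    least-squares subproblem, following the update rules above.
    """
    rng = np.random.default_rng(seed)
    dv, N = V.shape
    dt = T.shape[0]
    c = Y.shape[0]
    B = rng.standard_normal((L, N))
    W2 = rng.standard_normal((L, dt))
    Q = rng.standard_normal((c, L))
    I = np.eye
    for _ in range(iters):
        # W1 = (mu1 B V^T + lam W2 T V^T)(mu1 V V^T + lam V V^T + gamma I)^-1
        W1 = (mu1 * B @ V.T + lam * W2 @ T @ V.T) @ np.linalg.inv(
            (mu1 + lam) * V @ V.T + gamma * I(dv))
        # Symmetric update for W2 (LDA term dropped in this sketch)
        W2 = (mu2 * B @ T.T + lam * W1 @ V @ T.T) @ np.linalg.inv(
            (mu2 + lam) * T @ T.T + gamma * I(dt))
        # B = (alpha Q^T Q + (mu1+mu2) I)^-1 (alpha Q^T Y + mu1 W1 V + mu2 W2 T)
        B = np.linalg.inv(alpha * Q.T @ Q + (mu1 + mu2) * I(L)) @ (
            alpha * Q.T @ Y + mu1 * W1 @ V + mu2 * W2 @ T)
        # Q = (alpha Y B^T)(alpha B B^T + gamma I)^-1
        Q = (alpha * Y @ B.T) @ np.linalg.inv(alpha * B @ B.T + gamma * I(L))
    return W1, W2, np.where(B >= 0, 1, -1), Q
```

Every subproblem is a ridge-regularized least squares, so each inverse exists and each update has a unique minimizer.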
Finally, we use the joint hash codes B as the hash codes of the training samples; for a new test sample, we obtain its hash codes by quantizing through the hash functions. Cross-media retrieval is then performed by comparing the similarity between hash codes.
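As the background section notes, Hamming-space similarity search reduces to XOR plus a bit count. A small pure-Python sketch follows; packing each ±1 code into an integer is one possible implementation choice, not prescribed by the patent:

```python
def hamming_distance(a, b):
    """Hamming distance between two +/-1 code vectors via XOR popcount.

    Each code is packed into an int whose bits encode the sign
    (+1 -> bit 1, -1 -> bit 0); the distance is the popcount of the XOR.
    """
    pa = sum(1 << i for i, v in enumerate(a) if v > 0)
    pb = sum(1 << i for i, v in enumerate(b) if v > 0)
    return bin(pa ^ pb).count("1")

def retrieve(query_code, database_codes):
    """Rank database items by ascending Hamming distance to the query."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming_distance(query_code, database_codes[i]))
```

Because image and text share the common Hamming space, the same routine serves both retrieval directions (image query against text codes and vice versa).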
The retrieval method also comprises: judging the retrieval accuracy according to the category labels carried by the multi-modal dataset. Here we assess the retrieval accuracy of the method with the commonly used mean average precision (MAP). Given a retrieval set, the average precision (AP) of each query is defined as
$AP=\frac{1}{T}\sum_{r=1}^{n}P(r)\,\delta(r)$
where T is the number of relevant samples in the retrieval set, n is the total number of retrieved samples, P(r) is the ratio of the number of relevant samples among the top r retrieved samples to r, and δ(r) = 1 if the r-th retrieved sample is relevant to the query sample and δ(r) = 0 otherwise. MAP is the mean of the AP values over all queries.
Embodiment two
According to the cross-modal retrieval method based on discriminative correlation maximization hashing above, this embodiment provides the corresponding objective function construction method, which, as shown in Fig. 2, comprises:
Step 1: obtain a training dataset in which each sample contains a pair of image and text modal data; perform multi-modal feature extraction on the training dataset to obtain the multi-modal training set Otrain;
Step 2: project the data of the two modalities from their original heterogeneous spaces into a common Hamming space and maximize the association between the paired image and text within each sample;
Step 3: perform linear discriminant analysis on the text modality data and transfer its discriminative property to the image modality data;
Step 4: convert the features of the two modalities into hash codes and minimize the quantization loss of obtaining the hash codes through the hash functions;
Step 5: add category labels as supervision information;
Step 6: add regularization terms to prevent over-fitting;
Step 7: integrate steps 2 to 6 to obtain the objective function for discriminative correlation maximization hashing.
Maximizing the association between the paired image and text within a sample in step 2 is defined as:
$\min_{W_1,W_2}\ \|W_1V-W_2T\|_F^2$
where V and T are the feature matrices of image and text respectively, and W1 and W2 are the projection matrices that project image and text into the common Hamming space.
Step 3 comprises: performing linear discriminant analysis on the text modality data to obtain the within-class similarity matrix Sw and the between-class similarity matrix Sb, and transferring this discriminative property to the image modality data, defined as:
$\min_{W_2}\ \mathrm{tr}\big(W_2(S_w-S_b)W_2^T\big)$
Step 4 defines the quantization loss of obtaining the hash codes through the hash functions as:
$\min_{W_1,W_2,B}\ \mu_1\|B-W_1V\|_F^2+\mu_2\|B-W_2T\|_F^2$
where B is the joint hash code matrix.
The category label supervision of step 5 is defined as:
$\min_{Q,B}\ \|Y-QB\|_F^2$
where Q is the classifier matrix.
The regularization term of step 6 is defined as:
$\gamma\big(\|W_1\|_F^2+\|W_2\|_F^2+\|Q\|_F^2\big)$
Experimental effect:
The method was verified with the image and text data of the Wiki image-text dataset; the retrieval accuracies are shown in Table 1.
Table 1: comparison of the retrieval accuracy (MAP) of six cross-media retrieval methods (image-to-text and text-to-image) on the Wiki dataset.
It can be seen that the method of the present invention learns a hash function for each of the text and image modalities, projects the original data features into a common Hamming space, and performs linear discriminant analysis (LDA) on the text modality data so that the projected text features remain discriminative, a property that is passed on to the image modality. In the common Hamming space, the data features can be converted into hash codes, and with the class label information the hash codes can easily be classified. These operations yield a good cross-media retrieval effect, while the use of hashing reduces the time and space consumption of cross-modal retrieval.
The above are merely preferred embodiments of the application and are not intended to limit it; for those skilled in the art, the application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the application shall be included within its protection scope.
Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention; those skilled in the art should understand that, based on the technical solutions of the present invention, various modifications or changes that can be made without creative labor still fall within the protection scope of the present invention.
Claims (10)
1. A cross-modal retrieval method based on discriminative correlation maximization hashing, characterized by comprising the following steps:
Step 1: obtaining a training data set, wherein each sample comprises paired data of two modalities, image and text;
Step 2: performing multi-modal feature extraction on the training data set to obtain a training multi-modal data set Otrain;
Step 3: for the training multi-modal data set Otrain, constructing on this data set the objective function of discriminative correlation maximization hashing;
Step 4: solving the objective function to obtain the projection matrices W1 and W2 that project images and texts into a common Hamming space, and the hash codes B of the image-text pairs;
Step 5: obtaining a test data set and performing multi-modal feature extraction on it to obtain a test multi-modal data set Otest;
Step 6: for the test multi-modal data set Otest, projecting the image or text of each sample in the test data set into the common Hamming space according to the projection matrices W1 and W2 obtained in Step 4, and quantizing it into hash codes by a hash function;
Step 7: performing cross-modal retrieval: based on the hash codes, retrieving from the training data set the objects of the other modality that are relevant to a sample to be retrieved in the test set;
wherein the objective function in Step 3 is:
s.t. B ∈ {-1, 1}^(L×N), W1W1^T = Ik,
wherein V and T are the data feature matrices of images and texts, respectively, and Y is the label matrix; λ, μ1, μ2, β, and α are balance parameters and γ is a regularization parameter; Sw is the intra-class similarity matrix and Sb is the inter-class similarity matrix; Q is the classifier matrix, N is the number of samples, and c denotes the number of classes.
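As a non-limiting sketch of Steps 6 and 7 (projecting test samples into the common Hamming space, quantizing them into hash codes, and retrieving by code similarity), the Python fragment below uses sign hashing and Hamming-distance ranking. All dimensions, the random stand-ins for the learned projection matrices W1 and W2, and the helper `hash_codes` are illustrative assumptions, not the patent's learned values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the patent): k-bit codes,
# d1-dim image features, d2-dim text features, N database samples.
k, d1, d2, N = 16, 32, 24, 100

# Stand-ins for the learned projection matrices W1 (image) and W2 (text).
W1 = rng.standard_normal((k, d1))
W2 = rng.standard_normal((k, d2))

V_db = rng.standard_normal((d1, N))   # database image features
t_query = rng.standard_normal(d2)     # one text query

def hash_codes(W, X):
    """Quantize projected features into {-1, 1} hash codes (sign hashing)."""
    return np.where(W @ X >= 0, 1, -1)

B_db = hash_codes(W1, V_db)           # k x N database codes
b_q = hash_codes(W2, t_query)         # k-dim query code

# Hamming distance between a {-1,1} query code and every database code:
# for +/-1 vectors, dot = k - 2 * (#disagreeing bits).
hamming = (k - b_q @ B_db) // 2
ranking = np.argsort(hamming)         # most similar database items first

print(ranking[:5])
```

Because both modalities share one Hamming space, the same ranking step serves image-to-text and text-to-image queries alike.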
2. The cross-modal retrieval method based on discriminative correlation maximization hashing according to claim 1, characterized in that Step 3 comprises:
Step 3-1: letting each data sample of the training multi-modal data set Otrain consist of the feature vector of an image, the feature vector of a text, and a category label yi ∈ {0,1}^c, with N the number of samples; projecting the data of the two modalities from their original heterogeneous spaces into a common Hamming space, and maximizing the correlation between the paired image and text within each sample:
s.t. W1W1^T = Ik,
Step 3-2: performing linear discriminant analysis on the text-modality data, and transmitting its discriminative property to the image-modality data;
Step 3-3: converting the features of the two modalities into hash codes, and minimizing the quantization loss of obtaining the hash codes through the hash function:
s.t. B ∈ {-1, 1}^L, W1W1^T = Ik,
Step 3-4: adding category labels as supervision information to classify the hash codes:
s.t. B ∈ {-1, 1}^L
Step 3-5: adding a regularization term to prevent over-fitting;
Step 3-6: integrating Steps 3-1 to 3-5 to obtain the objective function.
3. The cross-modal retrieval method based on discriminative correlation maximization hashing according to claim 2, characterized in that the method for solving the objective function in Step 4 is:
Step 4-1: fixing the other variables in the objective function, and solving for the projection matrix W1 of the image modality;
Step 4-2: fixing the other variables in the objective function, and solving for the projection matrix W2 of the text modality;
Step 4-3: fixing the other variables in the objective function, and solving for the joint hash codes B;
Step 4-4: fixing the other variables in the objective function, and solving for the classifier matrix Q.
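Steps 4-1 to 4-4 describe an alternating (fix-the-rest, solve-one) optimization. The sketch below mimics that loop structure in Python; the concrete update rules (orthogonal Procrustes for W1, ridge least squares for W2 and Q, a sign step for B) are generic stand-ins chosen only to respect the stated constraints, not the patent's actual closed-form solutions:

```python
import numpy as np

rng = np.random.default_rng(1)
k, d1, d2, c, N = 8, 20, 15, 4, 60    # illustrative dimensions

V = rng.standard_normal((d1, N))                      # image features
T = rng.standard_normal((d2, N))                      # text features
Y = np.eye(c)[rng.integers(0, c, N)].T                # c x N one-hot labels

W1 = np.linalg.qr(rng.standard_normal((d1, k)))[0].T  # k x d1, rows orthonormal
W2 = rng.standard_normal((k, d2))
B = np.sign(rng.standard_normal((k, N)))
Q = rng.standard_normal((c, k))
gamma = 1e-2                                          # ridge regularizer

for _ in range(5):
    # Step 4-1: fix the rest, update W1 under the constraint W1 W1^T = I_k
    # (orthogonal Procrustes: align projected images with the current codes).
    U, _, Vt = np.linalg.svd(B @ V.T, full_matrices=False)
    W1 = U @ Vt
    # Step 4-2: fix the rest, update W2 (ridge least squares onto the codes).
    W2 = B @ T.T @ np.linalg.inv(T @ T.T + gamma * np.eye(d2))
    # Step 4-3: fix the rest, update the joint hash codes B by a sign step.
    B = np.sign(W1 @ V + W2 @ T)
    B[B == 0] = 1
    # Step 4-4: fix the rest, update the classifier matrix Q (ridge regression).
    Q = Y @ B.T @ np.linalg.inv(B @ B.T + gamma * np.eye(k))

print(np.allclose(W1 @ W1.T, np.eye(k)))
```

Each sub-step solves a simpler problem exactly while the others are held fixed, which is the standard way such constrained multi-variable hashing objectives are optimized.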
4. The cross-modal retrieval method based on discriminative correlation maximization hashing according to claim 1, characterized in that the retrieval method further comprises: judging the retrieval accuracy according to the category labels carried by the multi-modal data set.
5. An objective function construction method for cross-modal retrieval, characterized by comprising:
Step 1: obtaining a training data set, wherein each sample comprises paired data of two modalities, image and text; performing multi-modal feature extraction on the training data set to obtain a training multi-modal data set Otrain;
Step 2: projecting the data of the two modalities from their original heterogeneous spaces into a common Hamming space, and maximizing the correlation between the paired image and text within each sample;
Step 3: performing linear discriminant analysis on the text-modality data, and transmitting its discriminative property to the image-modality data;
Step 4: converting the features of the two modalities into hash codes, and minimizing the quantization loss of obtaining the hash codes through the hash function;
Step 5: adding category labels as supervision information;
Step 6: adding a regularization term to prevent over-fitting;
Step 7: integrating Steps 2 to 6 to obtain the objective function of discriminative correlation maximization hashing.
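Step 7 integrates the terms of Steps 2 to 6 into one scalar objective. The fragment below evaluates one plausible such combination; the functional form of every term, the assignment of the balance parameters λ, μ1, μ2, β, α and the regularization parameter γ to terms, and all matrices are illustrative assumptions rather than the claimed formula:

```python
import numpy as np

rng = np.random.default_rng(2)
k, d1, d2, c, N = 8, 20, 15, 4, 60        # illustrative dimensions
V = rng.standard_normal((d1, N))          # image feature matrix
T = rng.standard_normal((d2, N))          # text feature matrix
Y = np.eye(c)[np.arange(N) % c].T         # c x N one-hot label matrix
W1 = rng.standard_normal((k, d1))         # image projection (Step 2)
W2 = rng.standard_normal((k, d2))         # text projection (Step 2)
B = np.sign(rng.standard_normal((k, N)))  # joint hash codes (Step 4)
Q = rng.standard_normal((c, k))           # classifier matrix (Step 5)

# PSD stand-ins for the intra/inter-class similarity matrices of Step 3.
A1 = rng.standard_normal((d2, d2)); Sw = A1 @ A1.T
A2 = rng.standard_normal((d2, d2)); Sb = A2 @ A2.T

# Parameters named as in the patent; the values here are arbitrary.
lam, mu1, mu2, beta, alpha, gamma = 1.0, 0.5, 0.5, 0.1, 0.1, 0.01
sq = lambda M: np.sum(M ** 2)             # squared Frobenius norm

J = (
    lam * sq(W1 @ V - W2 @ T)                   # Step 2: cross-modal correlation gap
    + alpha * np.trace(W2 @ (Sw - Sb) @ W2.T)   # Step 3: LDA-style discriminant term
    + mu1 * sq(B - W1 @ V)                      # Step 4: image quantization loss
    + mu2 * sq(B - W2 @ T)                      # Step 4: text quantization loss
    + beta * sq(Y - Q @ B)                      # Step 5: supervised classification loss
    + gamma * (sq(W2) + sq(Q))                  # Step 6: regularization
)
print(float(J))
```

The point of the sketch is only the shape of the integration: one scalar value in which each construction step contributes a weighted term.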
6. The objective function construction method for cross-modal retrieval according to claim 5, characterized in that maximizing the correlation between the paired image and text within each sample in Step 2 is defined as:
s.t. W1W1^T = Ik,
wherein V and T are the data feature matrices of images and texts, respectively, and W1 and W2 are the projection matrices that project images and texts, respectively, into the common Hamming space.
7. The objective function construction method for cross-modal retrieval according to claim 6, characterized in that Step 3 comprises: performing linear discriminant analysis on the text-modality data to obtain the intra-class similarity matrix Sw and the inter-class similarity matrix Sb, and transmitting this discriminative property to the image-modality data, defined as:
wherein Sw is the intra-class similarity matrix and Sb is the inter-class similarity matrix.
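The intra-class matrix Sw and inter-class matrix Sb of this claim correspond to the classic linear-discriminant-analysis scatter matrices. A small sketch of how such matrices are typically computed from labelled text features (dimensions and data here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
d2, c, N = 6, 3, 30
T = rng.standard_normal((d2, N))         # text feature matrix
labels = np.arange(N) % c                # class label per sample (10 per class)

# Classic LDA scatter matrices, playing the roles of the intra-class (Sw)
# and inter-class (Sb) similarity matrices named in the claim.
mean = T.mean(axis=1, keepdims=True)     # global mean of the text features
Sw = np.zeros((d2, d2))
Sb = np.zeros((d2, d2))
for j in range(c):
    Tj = T[:, labels == j]               # samples of class j
    mj = Tj.mean(axis=1, keepdims=True)  # class mean
    Sw += (Tj - mj) @ (Tj - mj).T        # within-class scatter
    Sb += Tj.shape[1] * (mj - mean) @ (mj - mean).T  # between-class scatter

# Both matrices are symmetric positive semi-definite by construction,
# and they decompose the total scatter: Sw + Sb = (T - mean)(T - mean)^T.
print(np.allclose(Sw + Sb, (T - mean) @ (T - mean).T))
```

A projection that keeps tr(W Sw Wᵀ) small and tr(W Sb Wᵀ) large pulls same-class samples together and pushes different classes apart, which is the discriminative property the claim transmits to the image modality.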
8. The objective function construction method for cross-modal retrieval according to claim 6 or 7, characterized in that Step 4 defines minimizing the quantization loss of obtaining the hash codes through the hash function as:
s.t. B ∈ {-1, 1}^L, W1W1^T = Ik,
wherein B is the joint hash code matrix.
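A useful property behind minimizing such a quantization loss is that, element-wise, sign(·) is the closest point of {-1, 1} to any real value, so B = sign(W1 V) minimizes ||B - W1 V||²_F over binary B for a fixed projection. A small illustration (random stand-ins for W1 and V, not the patent's data):

```python
import numpy as np

rng = np.random.default_rng(4)
k, d1, N = 8, 20, 50
W1 = rng.standard_normal((k, d1))   # stand-in projection matrix
V = rng.standard_normal((d1, N))    # stand-in image features

P = W1 @ V                   # real-valued projections in the Hamming space
B = np.sign(P)               # hash function: element-wise sign
B[B == 0] = 1                # break ties toward +1 (none expected here)

quant_loss = np.sum((B - P) ** 2)   # quantization loss ||B - W1 V||_F^2

# Sanity check: flipping any single bit away from sign(p) can only
# increase the loss, since (-b - p)^2 - (b - p)^2 = 4*b*p = 4*|p| >= 0.
B_flip = B.copy()
B_flip[0, 0] *= -1
print(quant_loss <= np.sum((B_flip - P) ** 2))  # True
```

This is why sign quantization appears both inside the objective and at test time when unseen samples are encoded.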
9. The objective function construction method for cross-modal retrieval according to claim 8, characterized in that adding category labels in Step 5 is defined as:
s.t. B ∈ {-1, 1}^L
wherein Q is the classifier matrix and Y denotes the label matrix.
10. The objective function construction method for cross-modal retrieval according to claim 9, characterized in that the regularization term in Step 6 is defined in terms of the regularization parameter γ.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710581083.1A CN107402993B (en) | 2017-07-17 | 2017-07-17 | Cross-modal retrieval method based on discriminative correlation maximization hashing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107402993A CN107402993A (en) | 2017-11-28 |
CN107402993B true CN107402993B (en) | 2018-09-11 |
Family
ID=60400727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710581083.1A Expired - Fee Related CN107402993B (en) | 2017-07-17 | 2017-07-17 | Cross-modal retrieval method based on discriminative correlation maximization hashing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107402993B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170755B (en) * | 2017-12-22 | 2020-04-07 | 西安电子科技大学 | Cross-modal Hash retrieval method based on triple deep network |
CN109376261B (en) * | 2018-10-29 | 2019-09-24 | 山东师范大学 | Mode independent retrieval method and system based on intermediate text semantic enhancing space |
CN109299216B (en) * | 2018-10-29 | 2019-07-23 | 山东师范大学 | A kind of cross-module state Hash search method and system merging supervision message |
CN109522946A (en) * | 2018-10-31 | 2019-03-26 | 咪咕文化科技有限公司 | A kind of image classification model treatment method, apparatus and storage medium |
CN109766455B (en) * | 2018-11-15 | 2021-09-24 | 南京邮电大学 | Identified full-similarity preserved Hash cross-modal retrieval method |
CN109766481B (en) * | 2019-01-11 | 2021-06-08 | 西安电子科技大学 | Online Hash cross-modal information retrieval method based on collaborative matrix decomposition |
CN111460077B (en) * | 2019-01-22 | 2021-03-26 | 大连理工大学 | Cross-modal Hash retrieval method based on class semantic guidance |
CN110019652B (en) * | 2019-03-14 | 2022-06-03 | 九江学院 | Cross-modal Hash retrieval method based on deep learning |
CN110059198B (en) * | 2019-04-08 | 2021-04-13 | 浙江大学 | Discrete hash retrieval method of cross-modal data based on similarity maintenance |
CN110059154B (en) * | 2019-04-10 | 2022-04-15 | 山东师范大学 | Cross-modal migration hash retrieval method based on inheritance mapping |
CN110188210B (en) * | 2019-05-10 | 2021-09-24 | 山东师范大学 | Cross-modal data retrieval method and system based on graph regularization and modal independence |
CN110674323B (en) * | 2019-09-02 | 2020-06-30 | 山东师范大学 | Unsupervised cross-modal Hash retrieval method and system based on virtual label regression |
CN111259176B (en) * | 2020-01-16 | 2021-08-17 | 合肥工业大学 | Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information |
CN111368176B (en) * | 2020-03-02 | 2023-08-18 | 南京财经大学 | Cross-modal hash retrieval method and system based on supervision semantic coupling consistency |
CN111651577B (en) * | 2020-06-01 | 2023-04-21 | 全球能源互联网研究院有限公司 | Cross-media data association analysis model training and data association analysis method and system |
CN113343014A (en) * | 2021-05-25 | 2021-09-03 | 武汉理工大学 | Cross-modal image audio retrieval method based on deep heterogeneous correlation learning |
CN117033724B (en) * | 2023-08-24 | 2024-05-03 | 广州市景心科技股份有限公司 | Multi-mode data retrieval method based on semantic association |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101996191A (en) * | 2009-08-14 | 2011-03-30 | 北京大学 | Method and system for searching for two-dimensional cross-media element |
CN102629275A (en) * | 2012-03-21 | 2012-08-08 | 复旦大学 | Face and name aligning method and system facing to cross media news retrieval |
CN105205096A (en) * | 2015-08-18 | 2015-12-30 | 天津中科智能识别产业技术研究院有限公司 | Text modal and image modal crossing type data retrieval method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9280587B2 (en) * | 2013-03-15 | 2016-03-08 | Xerox Corporation | Mailbox search engine using query multi-modal expansion and community-based smoothing |
US9830506B2 (en) * | 2015-11-09 | 2017-11-28 | The United States Of America As Represented By The Secretary Of The Army | Method of apparatus for cross-modal face matching using polarimetric image data |
CN106777318B (en) * | 2017-01-05 | 2019-12-10 | 西安电子科技大学 | Matrix decomposition cross-modal Hash retrieval method based on collaborative training |
Non-Patent Citations (1)
Title |
---|
Linear Subspace Ranking Hashing for Cross-Modal Retrieval; Kai Li et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2016-09-19; Vol. 39, No. 9; pp. 1825-1838 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107402993B (en) | Cross-modal retrieval method based on discriminative correlation maximization hashing | |
CN108897989B (en) | Biological event extraction method based on candidate event element attention mechanism | |
Mandal et al. | Generalized semantic preserving hashing for n-label cross-modal retrieval | |
CN106777318B (en) | Matrix decomposition cross-modal Hash retrieval method based on collaborative training | |
CN107256271B (en) | Cross-modal Hash retrieval method based on mapping dictionary learning | |
CN107729513B (en) | Discrete supervision cross-modal Hash retrieval method based on semantic alignment | |
CN106202256B (en) | Web image retrieval method based on semantic propagation and mixed multi-instance learning | |
US11176462B1 (en) | System and method for prediction of protein-ligand interactions and their bioactivity | |
CN109784405B (en) | Cross-modal retrieval method and system based on pseudo-tag learning and semantic consistency | |
Cheng et al. | Semi-supervised multi-graph hashing for scalable similarity search | |
Niu et al. | Knowledge-based topic model for unsupervised object discovery and localization | |
Ji et al. | Image-attribute reciprocally guided attention network for pedestrian attribute recognition | |
CN111126563B (en) | Target identification method and system based on space-time data of twin network | |
Li et al. | Hashing with dual complementary projection learning for fast image retrieval | |
CN112101029B (en) | Bert model-based university teacher recommendation management method | |
Xu et al. | Transductive visual-semantic embedding for zero-shot learning | |
Wang et al. | Asymmetric correlation quantization hashing for cross-modal retrieval | |
Sitaula et al. | Unsupervised deep features for privacy image classification | |
CN109857892B (en) | Semi-supervised cross-modal Hash retrieval method based on class label transfer | |
Shen et al. | Semi-paired hashing for cross-view retrieval | |
Tang et al. | Efficient dictionary learning for visual categorization | |
Yazici et al. | Color naming for multi-color fashion items | |
Wang et al. | Deep hashing with active pairwise supervision | |
CN107885854A (en) | Semi-supervised cross-media retrieval method based on feature selection and virtual data generation | |
Xu et al. | Interaction content aware network embedding via co-embedding of nodes and edges |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20180911 |