CN109871454A - A robust discrete supervised cross-media hashing retrieval method

Info

Publication number: CN109871454A
Application number: CN201910096204.2A
Authority: CN
Inventors: 姚涛; 闫连山; 吕高焕; 崔光海; 岳峻
Original assignee: Ludong University
Current assignee: Ludong University
Priority date: 2019-01-31
Filing date: 2019-01-31
Publication date: 2019-06-11
Anticipated expiration: 2039-01-31
Also published as: CN109871454B

Abstract

The invention discloses a robust discrete supervised cross-media hashing retrieval method. It mines the semantic associations between heterogeneous samples by learning a robust pairwise similarity matrix, and can be used for content-based cross-media retrieval. The method comprises the following steps: building an image and text data set, and extracting visual and text features from the image and text samples in the data set, respectively; constructing pairwise similarity matrices from the class labels, image features and text features of the samples, respectively, and learning a single robust pairwise similarity matrix by exploiting the low-rank property of the pairwise similarity matrix and the sparsity of the sample noise; learning more discriminative hash codes from the robust pairwise similarity matrix; applying an l_{2,1}-norm regularization constraint to the hash functions in order to learn more robust hash functions; and proposing a discrete iterative optimization algorithm that obtains the discrete solution of the hash codes directly. By learning a robust pairwise similarity matrix, the method can effectively resist noise that may be present in the samples, thereby greatly improving the performance of multimedia retrieval.

Description

A robust discrete supervised cross-media hashing retrieval method

Technical field:

The present invention relates to a robust discrete supervised cross-modal hashing retrieval method, and belongs to the fields of multimedia retrieval and machine learning.

Background technique:

In recent years, a large amount of data has been generated on the Internet every day, which poses a great challenge to multimedia retrieval tasks; searching for approximate samples efficiently and effectively has become an urgent need. Hashing methods learn a set of hash functions that map samples from the original feature space to a Hamming space. Because they are fast to compute and save storage space in large-scale applications, they have attracted great attention from researchers. Hash codes cost far less to store than the original features, and the similarity between samples can be computed quickly in the Hamming space using XOR operations. Hashing has been studied extensively, but most work considers only a single modality. On the Internet, however, samples with the same semantics can usually be represented in multiple modalities, which leads to a heterogeneous semantic gap between modalities. For example, an image can be represented by its visual content and by its accompanying text. In addition, when a user submits a query sample to a search engine, the user would prefer the engine to return similar samples in multiple modalities. Cross-media retrieval has therefore attracted more and more attention. The goal of cross-media hashing is to map heterogeneous samples into a shared Hamming space and to preserve the similarity structure of the samples in that space. Specifically, similar heterogeneous samples should have a small Hamming distance in the shared Hamming space, and dissimilar samples a large one. Depending on whether class labels are used during training, cross-media hashing methods can generally be divided into two categories: unsupervised and supervised methods. The former usually learn hash codes by preserving the intra-modality and inter-modality similarity of the samples, whereas the latter can additionally incorporate class labels to learn more discriminative hash codes. Recent work shows that incorporating the class labels of the samples can improve retrieval performance.

Although many supervised cross-modal hashing methods have been proposed and have achieved satisfactory results, some problems remain to be solved. First, real-world samples may contain noise. Most supervised cross-modal hashing methods, however, construct the pairwise similarity matrix using only the class labels of the training data, without considering noise in the samples, such as outliers. Such noisy samples can seriously damage the structure of the pairwise similarity matrix, mislead the learning of the hash codes, and reduce retrieval performance. Second, the discrete constraint on the hash codes leads to a mixed-integer optimization problem that is usually very hard to solve. Most methods first relax the discrete constraint to obtain a continuous solution and then quantize it to generate the hash codes. Quantization, however, causes information loss, which reduces the discriminative power of the hash codes.

Summary of the invention:

The object of the present invention is to overcome the shortcomings of the above prior art and to provide a robust discrete supervised cross-modal hashing retrieval method that learns better-performing hash codes, improves the performance of the algorithm, resists noise better, improves the discriminative power of the hash codes, and is suitable for cross-media retrieval of real-world network data.

The object of the present invention can be achieved by the following measures: a robust discrete supervised cross-modal hashing retrieval method, characterized in that the method comprises the following steps:

Step 1: collect image and text sample pairs with class labels to form an image-text data set for cross-modal retrieval in which images and texts correspond one to one;

Step 2: extract features from the image-modality and text-modality samples, respectively, and remove the mean from the features of each modality so that the feature mean of both modalities is 0;
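The mean removal in Step 2 can be sketched as follows. This is a minimal illustration, assuming features are stored as NumPy arrays with one column per sample (a d×N layout chosen here for illustration); the function name `center_features` is hypothetical. The removed mean is returned as well, on the assumption that the same mean would be subtracted from query features later.

```python
import numpy as np

def center_features(X):
    """Subtract the per-dimension mean so every feature dimension has mean 0.

    X is assumed to have shape (d, N): one column per sample.
    Returns the centered matrix and the mean that was removed.
    """
    mean = X.mean(axis=1, keepdims=True)
    return X - mean, mean

rng = np.random.default_rng(0)
X_img = rng.normal(loc=3.0, size=(150, 8))   # toy image features, 8 samples
X_img_centered, img_mean = center_features(X_img)
```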

Step 3: randomly divide all sample pairs in the data set into a training set and a test set;
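A random split of the sample pairs might look like the sketch below. The function name `split_pairs`, the `train_fraction` argument and the seed are assumptions for illustration. Splitting indices rather than arrays keeps each image aligned with its text and label, since the same index arrays can be applied to all three.

```python
import numpy as np

def split_pairs(n_pairs, train_fraction, seed=0):
    """Return disjoint index arrays (train_idx, test_idx) over n_pairs sample pairs."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_pairs)                 # random order of pair indices
    n_train = int(round(train_fraction * n_pairs))
    return perm[:n_train], perm[n_train:]

train_idx, test_idx = split_pairs(n_pairs=20, train_fraction=0.75)
```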

Step 4: construct pairwise similarity matrices from the class labels of the sample pairs in the training set, the image-modality features and the text-modality features, respectively, and learn a robust pairwise similarity matrix by exploiting the low-rank property of the pairwise similarity matrix and the sparsity of the noisy samples. Let the features of the training sample pairs be X = {X⁽¹⁾, X⁽²⁾}, where X⁽¹⁾ ∈ R^{d₁×N} denotes the image-modality features in the training set and X⁽²⁾ ∈ R^{d₂×N} denotes the text-modality features in the training set; d₁ and d₂ denote the feature dimensions of the image and text modalities, respectively, and N denotes the number of image (or text) samples in the training set. The class labels of the sample pairs are denoted by L = [l₁, l₂, ..., l_N] ∈ {0,1}^{c×N}, where c is the number of classes and l_i ∈ {0,1}^c: l_{ij} = 1 indicates that the i-th sample belongs to the j-th class, and l_{ij} = 0 indicates that it does not. Learning the robust pairwise similarity matrix comprises the following steps:

(1) Compute the pairwise similarity matrix based on image-modality features from the image-modality sample features, defined as follows:

where ||·||_F denotes the Frobenius norm, S⁽¹⁾ denotes the pairwise similarity matrix of the image modality, S⁽¹⁾_{ij} denotes the similarity between the i-th and j-th image samples, and σ₁ is a scale parameter;

(2) Compute the pairwise similarity matrix based on text-modality features from the text-modality sample features, defined as follows:

where S⁽²⁾ denotes the pairwise similarity matrix of the text modality, S⁽²⁾_{ij} denotes the similarity between the i-th and j-th text samples, and σ₂ is a scale parameter;

(3) Compute the pairwise similarity matrix based on class labels from the class labels of the samples, defined as follows:

where S⁽³⁾ denotes the pairwise similarity matrix of the sample-pair labels, and S⁽³⁾_{ij} denotes the similarity between the labels of the i-th and j-th sample pairs;

(4) The objective function for learning the robust pairwise similarity matrix is defined as follows:

min rank(S) + Σᵢ ||E⁽ⁱ⁾||₀,  s.t. S⁽ⁱ⁾ = S + E⁽ⁱ⁾, i = 1, 2, 3

where S denotes the learned robust pairwise similarity matrix, E⁽ⁱ⁾ denotes the noise in the i-th pairwise similarity matrix, rank(·) denotes the rank of a matrix, and ||·||₀ denotes the l₀ norm;

(5) Because the objective function in (4) involves the discrete low-rank term and the l₀ norm, the problem is difficult to solve directly. The two terms can be relaxed to obtain an approximate solution, so the above formula can be rewritten as

min ||S||_* + Σᵢ ||E⁽ⁱ⁾||₁,  s.t. S⁽ⁱ⁾ = S + E⁽ⁱ⁾, i = 1, 2, 3

(6) Solve this problem with the augmented Lagrange multiplier method to obtain the robust pairwise similarity matrix;
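The augmented Lagrange multiplier solution in (6) can be sketched as below. This is an illustrative sketch, not the patent's exact algorithm: it assumes that the relaxation in (5) replaces the rank with the nuclear norm and the l₀ norm with the l₁ norm, and it introduces a hypothetical trade-off weight `alpha` together with a penalty schedule (`rho`, `rho_growth`, `rho_max`) that the text does not specify. The low-rank matrix is updated by singular value thresholding and each noise matrix by element-wise soft thresholding.

```python
import numpy as np

def soft_threshold(M, tau):
    """Element-wise shrinkage: the proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def singular_value_threshold(M, tau):
    """Shrink singular values by tau: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def robust_similarity(S_list, alpha=1.0, rho=1.0, rho_growth=1.5, rho_max=1e6, n_iter=60):
    """Assumed problem: min ||S||_* + alpha * sum_i ||E_i||_1  s.t.  S_i = S + E_i."""
    m = len(S_list)
    S = np.mean(S_list, axis=0)
    E = [np.zeros_like(S) for _ in S_list]
    Y = [np.zeros_like(S) for _ in S_list]          # Lagrange multipliers
    for _ in range(n_iter):
        # S-step: the m quadratic penalties collapse into one around their mean.
        target = np.mean([Si - Ei + Yi / rho for Si, Ei, Yi in zip(S_list, E, Y)], axis=0)
        S = singular_value_threshold(target, 1.0 / (m * rho))
        # E-step: sparse noise of each input similarity matrix.
        E = [soft_threshold(Si - S + Yi / rho, alpha / rho) for Si, Yi in zip(S_list, Y)]
        # Multiplier update on the constraint residual, then increase the penalty.
        Y = [Yi + rho * (Si - S - Ei) for Si, Ei, Yi in zip(S_list, E, Y)]
        rho = min(rho * rho_growth, rho_max)
    return S, E

# Toy data: a block (low-rank) similarity corrupted by a few sparse entries.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 5)
S0 = (labels[:, None] == labels[None, :]).astype(float)
S_list = []
for _ in range(3):
    noise = np.zeros_like(S0)
    idx = rng.choice(S0.size, size=10, replace=False)
    noise.flat[idx] = rng.normal(size=10)
    S_list.append(S0 + noise)
S, E = robust_similarity(S_list)
```

In this sketch the three inputs would be the image, text and label similarity matrices S⁽¹⁾, S⁽²⁾ and S⁽³⁾; the toy data only stands in for them.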

Step 5: construct the objective function, specifically comprising the following steps:

(1) Preserve, in the Hamming space, the similarity given by the robust pairwise similarity matrix. Because the image and text samples of a pair share the same class label, the distance between their hash codes should be as small as possible, so the objective function for hash-code learning is defined as follows:

where k denotes the length of the hash codes, B₁ is the hash codes of the image-modality samples, B₂ is the hash codes of the text-modality samples, and λ is a weight parameter;

(2) Use linear mappings as the hash functions, and use the l_{2,1} norm as a regularization term to constrain the learning of the image- and text-modality hash functions so as to strengthen their resistance to noise. The objective function for learning the hash function of each modality is therefore defined as follows:

where W₁ and W₂ denote the hash functions of the image modality and the text modality, respectively, Reg(·) denotes a regularization term that prevents overfitting, and β_i and μ are weight parameters;

(3) Add the objective functions for hash-code learning and hash-function learning to obtain the objective function of the method, defined as follows:

where β_i is a weight parameter;
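The two ingredients named in Step 5 (2), a linear hash function and the l_{2,1} norm, can be illustrated as follows. The sketch assumes the common row-wise convention for the l_{2,1} norm (the sum of the Euclidean norms of the rows of W) and assumes that codes are binarized with the sign of WᵀX; neither detail is confirmed by the text, and the function names are hypothetical.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (row-wise convention assumed)."""
    return float(np.sqrt((W ** 2).sum(axis=1)).sum())

def linear_hash(W, X):
    """Codes in {-1, +1} from the linear map W^T X; zero is mapped to +1."""
    return np.where(W.T @ X >= 0, 1, -1)

W = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])     # d = 3 features, k = 2 bits
X = np.array([[1.0, -1.0], [2.0, 0.5], [0.0, 2.0]])    # d = 3 features, N = 2 samples
codes = linear_hash(W, X)                               # shape (k, N)
```

Because the row norms are summed without squaring, the penalty tends to drive whole rows of W to zero, which is one reason this norm is often used when some feature dimensions are noisy.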

Step 6: because the objective function contains several unknown variables and the discrete constraint on the hash codes, it is very difficult to solve directly. It can be observed, however, that when the other variables are fixed and one variable is solved for, the subproblem is a convex optimization problem, so an iterative optimization algorithm can be used. The solution comprises the following steps:

(1) Fix W₁, W₂ and B₂, and solve for B₁:

With the constant terms removed, the objective function can be written as:

Because B₁ is discrete, the problem is difficult to solve directly, so it is solved sample by sample. Let b_{1i} denote the i-th column of B₁ and b_{2j} the j-th column of B₂; with the constant terms removed, the objective function can be written as:

This problem is still difficult to solve directly, so it is solved bit by bit using cyclic coordinate gradient descent. Let b_{1im} denote the m-th bit of b_{1i}, and let the remaining bits of b_{1i} (all bits other than the m-th) form a vector; b_{1im} is then obtained by the following formula:

Repeat the above formula until the hash codes of all image-modality samples have been solved;

(2) Fix W₁, W₂ and B₁, and solve for B₂:

Similarly to solving for B₁, we obtain

Repeat the above formula until the hash codes of all text-modality samples have been solved;

(3) Fix W₂, B₁ and B₂, and solve for W₁:

With the constant terms removed, the objective function can be written as:

This problem has a closed-form solution

where D₁ is a diagonal matrix;

(4) Fix W₁, B₁ and B₂, and solve for W₂:

Similarly to solving for W₁, W₂ has a closed-form solution

where D₂ is a diagonal matrix;

(5) Repeat (1)-(4) until the algorithm converges or the maximum number of iterations is reached;
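The sample-by-sample, bit-by-bit update of B₁ in (1) can be sketched as follows. The per-sample cost used here is an assumption made only for illustration (a similarity-preserving term, a term pulling b_{1i} toward b_{2i}, and a term pulling it toward the hash-function output), since the formulas themselves are not reproduced in this text. Each bit is set by comparing the cost of its two candidate values directly, which is a simple form of cyclic coordinate descent and not necessarily the closed-form bit update of the method. `H1` stands in for W₁ᵀX⁽¹⁾, and all function names are hypothetical.

```python
import numpy as np

def sample_cost(b, s_i, B2, b2_i, h_i, lam, beta):
    """Assumed cost of one column b of B1, with every other variable fixed."""
    k = b.shape[0]
    return float(np.sum((k * s_i - B2.T @ b) ** 2)
                 + lam * np.sum((b - b2_i) ** 2)
                 + beta * np.sum((b - h_i) ** 2))

def total_cost(B1, B2, S, H1, lam, beta):
    """Sum of the per-sample costs over all columns of B1."""
    return sum(sample_cost(B1[:, i], S[i], B2, B2[:, i], H1[:, i], lam, beta)
               for i in range(B1.shape[1]))

def update_codes_bitwise(B1, B2, S, H1, lam=1.0, beta=10.0, n_sweeps=2):
    """Cyclic bit-by-bit update of B1; every bit keeps the cheaper of -1 and +1."""
    B1 = B1.copy()
    k, n = B1.shape
    for _ in range(n_sweeps):
        for i in range(n):                      # sample by sample
            for m in range(k):                  # bit by bit
                costs = []
                for bit in (-1.0, 1.0):
                    B1[m, i] = bit
                    costs.append(sample_cost(B1[:, i], S[i], B2, B2[:, i],
                                             H1[:, i], lam, beta))
                B1[m, i] = -1.0 if costs[0] <= costs[1] else 1.0
    return B1

rng = np.random.default_rng(0)
k, n = 4, 6
S = rng.uniform(-1.0, 1.0, size=(n, n))
S = (S + S.T) / 2.0                              # toy symmetric similarity matrix
B1 = np.sign(rng.normal(size=(k, n)))
B2 = np.sign(rng.normal(size=(k, n)))
H1 = rng.normal(size=(k, n))                     # stands in for W1^T X1
before = total_cost(B1, B2, S, H1, 1.0, 10.0)
B1_new = update_codes_bitwise(B1, B2, S, H1)
after = total_cost(B1_new, B2, S, H1, 1.0, 10.0)
```

Because every bit keeps whichever value gives the lower cost, the assumed objective never increases during a sweep, and the codes stay in {-1, +1} throughout, so no quantization step is needed afterwards.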

Step 7: the user inputs a query sample; extract its features and remove the mean from the extracted features;

Step 8: generate the hash code of the query sample using the learned hash function:

Step 9: compute the Hamming distances between the query sample and the heterogeneous samples in the target (training) set, sort the Hamming distances in ascending order, and return the samples corresponding to the first r Hamming distances as the retrieval result.
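The ranking in Step 9 can be sketched with packed bits and XOR, as below. The sketch assumes codes are stored as (k, N) matrices with entries in {-1, +1}; the helper names `pack_codes` and `hamming_rank` are hypothetical.

```python
import numpy as np

def pack_codes(B):
    """Pack a (k, N) matrix of -1/+1 codes into N rows of bytes (bit 1 for +1)."""
    return np.packbits((B.T > 0).astype(np.uint8), axis=1)

def hamming_rank(query_bits, database_bits, r):
    """Indices and distances of the r database codes nearest to the query."""
    xor = np.bitwise_xor(database_bits, query_bits)      # 1 where bits differ
    dist = np.unpackbits(xor, axis=1).sum(axis=1)        # popcount per database code
    order = np.argsort(dist, kind="stable")              # ascending Hamming distance
    return order[:r], dist[order[:r]]

# Four database codes of length k = 4 (one per column) and one query code.
B_db = np.array([[ 1,  1, -1,  1],
                 [ 1, -1, -1,  1],
                 [-1, -1,  1, -1],
                 [-1, -1,  1,  1]])
b_query = np.array([[1], [1], [-1], [-1]])
top_idx, top_dist = hamming_rank(pack_codes(b_query), pack_codes(B_db), r=3)
```

For cross-media retrieval the query code would come from one modality's hash function and the database codes from the other modality.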

Compared with the prior art, the present invention produces the following positive effects: by integrating the class labels and the image- and text-modality features into a single framework, the method learns a robust pairwise similarity matrix, from which better-performing hash codes are learned and the performance of the algorithm is improved; it proposes applying the l_{2,1} norm as a regularization term to constrain the learning of the hash functions so as to resist noise better; and it proposes a discrete optimization algorithm that obtains discrete hash codes directly and improves their discriminative power. The invention is suitable for cross-media retrieval of real-world network data.

Description of the drawings:

Fig. 1 is a flow chart of the robust discrete supervised cross-modal hashing retrieval method of the present invention.

Specific embodiment:

For a clearer understanding of the technical solution of the present invention, the invention is described in further detail below with reference to a specific embodiment, which does not limit its scope of protection.

Embodiment: a robust discrete supervised cross-modal hashing retrieval method, comprising the following steps:

Step 2: extract the features of the images and texts, where each image-modality sample is represented by a 150-dimensional texture feature and each text-modality sample by a 500-dimensional BOW (Bag of Words) feature, and remove the mean from the features so that the feature mean of both modalities is 0;

Step 4: construct pairwise similarity matrices from the class labels of the sample pairs in the training set, the image-modality features and the text-modality features, respectively, and learn a robust pairwise similarity matrix by exploiting the low-rank property of the pairwise similarity matrix and the sparsity of the noisy samples. Let the features of the training sample pairs be X = {X⁽¹⁾, X⁽²⁾}, where X⁽¹⁾ ∈ R^{d₁×N} denotes the image-modality features in the training set and X⁽²⁾ ∈ R^{d₂×N} denotes the text-modality features in the training set; d₁ and d₂ denote the feature dimensions of the image and text modalities, respectively, and N denotes the number of image (or text) samples in the training set. The class labels of the sample pairs are denoted by L = [l₁, l₂, ..., l_N] ∈ {0,1}^{c×N}, where c is the number of classes and l_i ∈ {0,1}^c: l_{ij} = 1 indicates that the i-th sample belongs to the j-th class, and l_{ij} = 0 indicates that it does not. Here d₁ = 150 and d₂ = 500;

Learning the robust pairwise similarity matrix comprises the following steps:

where ||·||_F denotes the Frobenius norm, S⁽¹⁾ denotes the pairwise similarity matrix of the image modality, S⁽¹⁾_{ij} denotes the similarity between the i-th and j-th image samples, and σ₁ is a scale parameter; here σ₁ = 0.8;

where S⁽²⁾ denotes the pairwise similarity matrix of the text modality, S⁽²⁾_{ij} denotes the similarity between the i-th and j-th text samples, and σ₂ is a scale parameter; here σ₂ = 0.3;
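The feature-based similarity matrices with the scale parameters above might be computed as in the following sketch. The Gaussian form exp(-||x_i - x_j||² / (2σ²)) is an assumption suggested by the Frobenius norm and the scale parameter; the exact expression is not reproduced in this text. The function name and the random toy features are hypothetical.

```python
import numpy as np

def gaussian_similarity(X, sigma):
    """Assumed pairwise similarity exp(-||x_i - x_j||^2 / (2 sigma^2)); X is (d, N)."""
    sq_norms = (X ** 2).sum(axis=0)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X.T @ X)
    sq_dist = np.maximum(sq_dist, 0.0)            # clip tiny negatives from round-off
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X_img = rng.normal(size=(150, 5))                 # toy stand-in for texture features
X_txt = rng.normal(size=(500, 5))                 # toy stand-in for BOW features
S_img = gaussian_similarity(X_img, sigma=0.8)     # sigma_1 from the embodiment
S_txt = gaussian_similarity(X_txt, sigma=0.3)     # sigma_2 from the embodiment
```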

min rank(S) + Σᵢ ||E⁽ⁱ⁾||₀,  s.t. S⁽ⁱ⁾ = S + E⁽ⁱ⁾, i = 1, 2, 3

min ||S||_* + Σᵢ ||E⁽ⁱ⁾||₁,  s.t. S⁽ⁱ⁾ = S + E⁽ⁱ⁾, i = 1, 2, 3

where k denotes the length of the hash codes, B₁ is the hash codes of the image-modality samples, B₂ is the hash codes of the text-modality samples, and λ is a weight parameter; here λ = 1;

where W₁ and W₂ denote the hash functions of the image modality and the text modality, respectively, Reg(·) denotes a regularization term that prevents overfitting, and β_i and μ are weight parameters; here β₁ = 10, β₂ = 10 and μ = 0.1;

(1) Fix W₁, W₂ and B₂, and solve for B₁:

With the constant terms removed, the objective function can be written as:

(2) Fix W₁, W₂ and B₁, and solve for B₂:

Similarly to solving for B₁, we obtain

(3) Fix W₂, B₁ and B₂, and solve for W₁:

With the constant terms removed, the objective function can be written as:

This problem has a closed-form solution

where D₁ is a diagonal matrix;

(4) Fix W₁, B₁ and B₂, and solve for W₂:

Similarly to solving for W₁, W₂ has a closed-form solution

where D₂ is a diagonal matrix;

(5) Repeat (1)-(4), and terminate when the absolute difference between the errors of the two most recent iterations is less than 0.01 or the number of iterations exceeds 20;
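The closed-form hash-function update with a diagonal matrix, together with the stopping rule above, can be illustrated by the following sketch. It assumes the subproblem has the form ||B - WᵀX||_F² + μ||W||_{2,1} and solves it by iteratively reweighted least squares, a standard technique for l_{2,1} regularizers in which the diagonal matrix holds 1/(2||w^i||) for each row w^i of W. Whether this matches the method's own D₁ and D₂ is not confirmed by the text. The stopping thresholds 0.01 and 20 come from the embodiment, where they govern the outer loop; they are reused here for the inner loop purely for illustration, and `eps` is a hypothetical smoothing constant.

```python
import numpy as np

def l21_objective(W, X, B, mu, eps):
    """Assumed objective: ||B - W^T X||_F^2 + mu * smoothed sum of row norms of W."""
    fit = np.sum((B - W.T @ X) ** 2)
    reg = np.sqrt((W ** 2).sum(axis=1) + eps).sum()
    return float(fit + mu * reg)

def solve_hash_function(X, B, mu=0.1, tol=0.01, max_iter=20, eps=1e-8):
    """Iteratively reweighted least squares for the assumed objective.

    X: (d, N) centered features, B: (k, N) codes in {-1, +1}.
    Returns W of shape (d, k) and the objective value after each update.
    """
    d = X.shape[0]
    XXt, XBt = X @ X.T, X @ B.T
    W = np.linalg.solve(XXt + mu * np.eye(d), XBt)           # ridge solution as a start
    history = [l21_objective(W, X, B, mu, eps)]
    for _ in range(max_iter):
        # Diagonal reweighting matrix: 1 / (2 * row norm), smoothed by eps.
        D = np.diag(0.5 / np.sqrt((W ** 2).sum(axis=1) + eps))
        W = np.linalg.solve(XXt + mu * D, XBt)               # closed-form update given D
        history.append(l21_objective(W, X, B, mu, eps))
        if abs(history[-2] - history[-1]) < tol:             # change below threshold
            break
    return W, history

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 40))
B = np.sign(rng.normal(size=(4, 40)))
W, history = solve_hash_function(X, B)
```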

Step 9: compute the Hamming distances between the query sample and the heterogeneous samples in the target (training) set, sort the Hamming distances in ascending order, and return the samples corresponding to the first r Hamming distances as the retrieval result; here r = 100.

To verify the effectiveness of the invention, this embodiment uses the public data set Mirflickr25K as an example. The data set contains 20015 image-text pairs, and all sample pairs can be divided into 24 categories. 15011 sample pairs (75%) are randomly selected to form the training set, and the remaining 5004 sample pairs (25%) form the test set. Each image-modality sample is represented by a 150-dimensional texture feature and each text-modality sample by a 500-dimensional BOW (Bag of Words) feature; the mean is removed from the features so that the feature mean of both modalities is 0. To evaluate the retrieval performance of the method objectively, mean average precision (MAP) is used as the evaluation criterion. The MAP results on the Mirflickr25K data set for different hash-code lengths p are shown in Table 1.

Table 1: MAP results on the Mirflickr25K data set

Task	p=16	p=32	p=64	p=96
Image-to-text retrieval	0.6718	0.6785	0.6843	0.6918
Text-to-image retrieval	0.6813	0.6953	0.6977	0.7045
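For reference, MAP can be computed from ranked relevance lists as in the sketch below. The text does not state the relevance criterion or the normalization; this sketch assumes a 0/1 relevance flag per returned sample (for example, sharing a class label with the query) and normalizes by the number of relevant samples in the returned list. Function names are hypothetical.

```python
import numpy as np

def average_precision(relevant):
    """AP for one query; `relevant` is a 0/1 sequence in ranked order."""
    relevant = np.asarray(relevant, dtype=float)
    n_rel = relevant.sum()
    if n_rel == 0:
        return 0.0
    precision_at_k = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
    return float((precision_at_k * relevant).sum() / n_rel)

def mean_average_precision(relevance_lists):
    """Mean of the per-query average precisions."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))

ap = average_precision([1, 0, 1])                         # (1/1 + 2/3) / 2
map_score = mean_average_precision([[1, 0, 1], [0, 1]])   # mean of 5/6 and 1/2
```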

It should be understood that the parts not elaborated in this specification belong to the prior art. The above description of the preferred embodiment is relatively detailed, but it should not therefore be regarded as limiting the scope of protection of this patent.

Claims

1. A robust discrete supervised cross-media hashing retrieval method, characterized in that the method comprises the following steps:

Step 1: collect image and text sample pairs with class labels to form an image-text data set for cross-modal retrieval in which images and texts correspond one to one;

Step 2: extract features from the image-modality and text-modality samples, respectively, and remove the mean from the features of each modality so that the feature mean of both modalities is 0;

Step 4: construct pairwise similarity matrices from the class labels of the sample pairs in the training set, the image-modality features and the text-modality features, respectively, and learn a robust pairwise similarity matrix by exploiting the low-rank property of the pairwise similarity matrix and the sparsity of the noisy samples. Let the features of the training sample pairs be X = {X⁽¹⁾, X⁽²⁾}, where X⁽¹⁾ ∈ R^{d₁×N} denotes the image-modality features in the training set and X⁽²⁾ ∈ R^{d₂×N} denotes the text-modality features in the training set; d₁ and d₂ denote the feature dimensions of the image and text modalities, respectively, and N denotes the number of image (or text) samples in the training set. The class labels of the sample pairs are denoted by L = [l₁, l₂, ..., l_N] ∈ {0,1}^{c×N}, where c is the number of classes and l_i ∈ {0,1}^c: l_{ij} = 1 indicates that the i-th sample belongs to the j-th class, and l_{ij} = 0 indicates that it does not. Learning the robust pairwise similarity matrix comprises the following steps:

(1) Compute the pairwise similarity matrix based on image-modality features from the image-modality sample features, defined as follows:

where ||·||_F denotes the Frobenius norm, S⁽¹⁾ denotes the pairwise similarity matrix of the image modality, S⁽¹⁾_{ij} denotes the similarity between the i-th and j-th image samples, and σ₁ is a scale parameter;

(2) Compute the pairwise similarity matrix based on text-modality features from the text-modality sample features, defined as follows:

where S⁽²⁾ denotes the pairwise similarity matrix of the text modality, S⁽²⁾_{ij} denotes the similarity between the i-th and j-th text samples, and σ₂ is a scale parameter;

where S⁽³⁾ denotes the pairwise similarity matrix of the sample-pair labels, and S⁽³⁾_{ij} denotes the similarity between the labels of the i-th and j-th sample pairs;

min rank(S) + Σᵢ ||E⁽ⁱ⁾||₀,  s.t. S⁽ⁱ⁾ = S + E⁽ⁱ⁾, i = 1, 2, 3

where S denotes the learned robust pairwise similarity matrix, E⁽ⁱ⁾ denotes the noise in the i-th pairwise similarity matrix, rank(·) denotes the rank of a matrix, and ||·||₀ denotes the l₀ norm;

(5) The objective function in (4) involves the discrete low-rank term and the l₀ norm; the above formula can be rewritten as

min ||S||_* + Σᵢ ||E⁽ⁱ⁾||₁,  s.t. S⁽ⁱ⁾ = S + E⁽ⁱ⁾, i = 1, 2, 3

(1) Preserve, in the Hamming space, the similarity given by the robust pairwise similarity matrix; the objective function for hash-code learning is defined as follows:

where k denotes the length of the hash codes, B₁ is the hash codes of the image-modality samples, B₂ is the hash codes of the text-modality samples, and λ is a weight parameter;

(2) Use linear mappings as the hash functions, and use the l_{2,1} norm as a regularization term to constrain the learning of the image- and text-modality hash functions; the objective function for learning the hash function of each modality is defined as follows:

where W₁ and W₂ denote the hash functions of the image modality and the text modality, respectively, Reg(·) denotes a regularization term that prevents overfitting, and β_i and μ are weight parameters;

where β_i is a weight parameter;

Step 6: solve the objective function with an iterative optimization algorithm; the solution comprises the following steps:

(1) Fix W₁, W₂ and B₂, and solve for B₁:

With the constant terms removed, the objective function can be written as:

The problem is solved sample by sample. Let b_{1i} denote the i-th column of B₁ and b_{2j} the j-th column of B₂; with the constant terms removed, the objective function can be written as:

The problem is solved bit by bit using cyclic coordinate gradient descent. Let b_{1im} denote the m-th bit of b_{1i}, and let the remaining bits of b_{1i} (all bits other than the m-th) form a vector; b_{1im} is then obtained by the following formula:

(2) Fix W₁, W₂ and B₁, and solve for B₂:

Similarly to solving for B₁, we obtain

(3) Fix W₂, B₁ and B₂, and solve for W₁:

With the constant terms removed, the objective function can be written as:

This problem has a closed-form solution

where D₁ is a diagonal matrix;

(4) Fix W₁, B₁ and B₂, and solve for W₂:

Similarly to solving for W₁, W₂ has a closed-form solution

where D₂ is a diagonal matrix;

Step 9: compute the Hamming distances between the query sample and the heterogeneous samples in the target (training) set, sort the Hamming distances in ascending order, and return the samples corresponding to the first r Hamming distances as the retrieval result.