CN109871454B - Robust discrete supervision cross-media hash retrieval method - Google Patents

Robust discrete supervision cross-media hash retrieval method

Info

Publication number: CN109871454B
Application number: CN201910096204.2A
Authority: CN (China)
Prior art keywords: samples, sample, similarity matrix, text, hash
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN109871454A (en)
Inventors: 姚涛, 闫连山, 吕高焕, 崔光海, 岳峻
Current assignee: Ludong University
Original assignee: Ludong University
Application filed by Ludong University
Priority to CN201910096204.2A
Publication of CN109871454A
Application granted; publication of CN109871454B

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a robust discrete supervised cross-media hash retrieval method that realizes content-based cross-media retrieval by learning a robust pairwise similarity matrix to mine the semantic associations among heterogeneous samples. The method comprises the following steps: establishing an image-text data set and extracting visual and textual features from the image and text samples in the data set; constructing pairwise similarity matrices from the class labels, image features, and text features of the samples, and learning a robust pairwise similarity matrix by exploiting the low-rank property of the pairwise similarity matrices and the sparsity of sample noise; learning more discriminative hash codes from the robust pairwise similarity matrix; applying an l2,1-norm regularization term to the hash functions to learn more robust hash functions; and providing a discrete iterative optimization algorithm that directly obtains a discrete solution for the hash codes. By learning a robust pairwise similarity matrix, the method effectively resists noise that may exist in the samples, thereby greatly improving the performance of multimedia retrieval.

Description

Robust discrete supervision cross-media hash retrieval method
Technical field:
the invention relates to a robust discrete supervised cross-modal hash retrieval method, belonging to the fields of multimedia retrieval and machine learning.
Background technology:
in recent years, a huge amount of data has been generated on the Internet every day, which poses a great challenge to the task of multimedia retrieval; finding approximate nearest samples efficiently and effectively is therefore an urgent need. Hash methods map samples from the original feature space to Hamming space by learning a set of hash functions, and have attracted great interest from researchers because of their fast computation and low memory footprint in large-scale applications. Hash codes are far cheaper to store than the original features, and the similarity between samples can be computed quickly with XOR operations in Hamming space. Hash methods have been widely studied, but most focus on only one modality. However, samples with the same semantics on the Internet are often represented in multiple modalities, which leads to a heterogeneous semantic gap between modalities; for example, an image may be represented by visual features and corresponding text features. In addition, when a user submits a query sample to a search engine, the user prefers that the engine return similar samples from multiple modalities. Cross-media retrieval is therefore attracting more and more attention. The goal of cross-media hashing is to map heterogeneous samples into a shared Hamming space in which the similarity structure of the samples is maintained; in particular, similar heterogeneous samples have a small Hamming distance in the shared Hamming space, and vice versa. Depending on whether class labels are used during training, cross-media hashing methods can generally be divided into two categories: unsupervised and supervised. The former typically learns hash codes by preserving the intra- and inter-modal similarity of samples, while the latter can learn more discriminative hash codes by further incorporating class labels. Recent work has shown that incorporating the class labels of samples can improve retrieval performance.
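As a minimal illustration of the XOR-based distance computation mentioned above (an illustrative sketch, not part of the patent), the Hamming distance between two hash codes packed into integers can be computed as:

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Number of differing bits between two hash codes packed into integers.

    XOR leaves a 1 exactly where the codes disagree; counting the 1 bits
    gives the Hamming distance.
    """
    return bin(code_a ^ code_b).count("1")

# Two hypothetical 8-bit hash codes differing in two bit positions:
print(hamming_distance(0b10110010, 0b10011010))  # -> 2
```

This constant-time bit counting is what makes nearest-neighbor search in Hamming space so much faster than distance computation in the original feature space.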
While many supervised cross-modal hash methods have been proposed and have achieved satisfactory results, some problems remain to be solved. First, in the real world, samples may contain noise. Most supervised cross-modal hashing methods construct the pairwise similarity matrix using only the class labels of the training data, without considering noise in the samples, such as outliers. These noise samples seriously damage the structure of the pairwise similarity matrix, mislead the learning of the hash codes, and reduce retrieval performance. Second, the mixed-integer optimization problem caused by the discrete constraints on the hash codes is generally difficult to solve; most methods first relax the discrete constraints to obtain a continuous solution and then quantize that solution to generate the hash codes. However, quantization causes information loss, which degrades the discriminative power of the hash codes.
Summary of the invention:
the invention aims to overcome the defects of the prior art and provide a robust discrete supervised cross-modal hash retrieval method that has better learning performance, improved algorithm performance, stronger noise resistance, and more discriminative hash codes, and that is suitable for cross-media retrieval of real network data.
The object of the invention can be achieved by providing the following measures: a robust discrete supervision cross-modal hash retrieval method is characterized by comprising the following steps:
the first step: collecting image and text sample pairs containing class labels to form a cross-modal retrieval image-text data set corresponding to the images and the texts one by one;
and a second step of: extracting features from the image and text modality samples respectively, and subtracting the mean from each modality's features so that the feature data of both modalities have zero mean;
and a third step of: randomly dividing all sample pairs in a data set into a training set and a testing set;
fourth step: constructing a pairwise similarity matrix from each of the class labels, the image features, and the text features of the training sample pairs, and learning a robust pairwise similarity matrix by using the low-rank property of the pairwise similarity matrices and the sparsity of sample noise. Let the features of the training sample pairs be X = {X^(1), X^(2)}, where X^(1) ∈ R^(d1×N) denotes the image-modality features in the training set and X^(2) ∈ R^(d2×N) the text-modality features; d1 and d2 denote the feature dimensions of the image and text modalities, and N denotes the number of image or text samples in the training set. The class labels of the sample pairs are denoted by L ∈ {0,1}^(N×c), where c denotes the number of classes and l_i ∈ {0,1}^c; l_ij = 1 indicates that the i-th sample belongs to the j-th class, and conversely l_ij = 0 indicates that the i-th sample does not belong to the j-th class. Learning the robust pairwise similarity matrix comprises the following steps:
(1) Computing the pairwise similarity matrix of the image modality from the image features, defined as:
S^(1)_ij = exp(−‖x^(1)_i − x^(1)_j‖²_F / σ1)
where ‖·‖_F denotes the Frobenius norm, S^(1) denotes the pairwise similarity matrix of the image modality, S^(1)_ij the similarity between the i-th and j-th image samples, and σ1 a scale parameter;
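The Gaussian-kernel pairwise similarity above can be sketched as follows. The vectorized distance computation and the division by σ (rather than, say, 2σ²) are illustrative assumptions consistent with the text, not the patent's exact implementation:

```python
import numpy as np

def gaussian_similarity(X: np.ndarray, sigma: float) -> np.ndarray:
    """Pairwise similarity S_ij = exp(-||x_i - x_j||^2 / sigma)
    for samples stored as the columns of X (features x samples)."""
    sq = np.sum(X ** 2, axis=0)
    # ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 <x_i, x_j>
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    return np.exp(-np.maximum(d2, 0.0) / sigma)  # clamp tiny negative round-off

# Hypothetical image features: 4 dimensions, 6 samples
S1 = gaussian_similarity(np.random.randn(4, 6), sigma=0.8)
```

The same routine would serve for the text modality in step (2), with scale parameter σ2.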
(2) Computing the pairwise similarity matrix of the text modality from the text features, defined as:
S^(2)_ij = exp(−‖x^(2)_i − x^(2)_j‖²_F / σ2)
where S^(2) denotes the pairwise similarity matrix of the text modality, S^(2)_ij the similarity between the i-th and j-th text samples, and σ2 a scale parameter;
(3) Computing the label-based pairwise similarity matrix from the class labels of the sample pairs, defined as:
S^(3)_ij = (l_i l_jᵀ) / (‖l_i‖2 ‖l_j‖2)
where S^(3) denotes the pairwise similarity matrix of the labels and S^(3)_ij the similarity between the labels of the i-th and j-th sample pairs;
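The label-based similarity can be sketched as cosine similarity between the binary label vectors (an assumption consistent with the normalized inner-product form above):

```python
import numpy as np

def label_similarity(L: np.ndarray) -> np.ndarray:
    """Cosine similarity between the rows of the N x c binary label matrix L."""
    norms = np.linalg.norm(L, axis=1, keepdims=True)
    Ln = L / np.maximum(norms, 1e-12)  # guard against all-zero label rows
    return Ln @ Ln.T

# Three samples, two classes: samples 0 and 2 share no class
L = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
S3 = label_similarity(L)  # S3[0,1] = 1/sqrt(2), S3[0,2] = 0
```

Samples with identical label sets get similarity 1, disjoint label sets get 0, and partial overlaps fall in between.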
(4) The objective function for learning the robust pairwise similarity matrix is defined as:
min_{S, E^(i)} rank(S) + λ Σ_{i=1..3} ‖E^(i)‖0   s.t. S^(i) = S + E^(i), i = 1, 2, 3
where S denotes the learned robust pairwise similarity matrix, E^(i) the noise in the i-th pairwise similarity matrix, rank(·) the rank of a matrix, ‖·‖0 the l0 norm, and λ a weight parameter;
(5) Because of the discrete rank and l0-norm terms, the objective in (4) is difficult to solve directly; relaxing these two terms yields an approximate problem, so the above formula can be rewritten as:
min_{S, E^(i)} ‖S‖* + λ Σ_{i=1..3} ‖E^(i)‖1   s.t. S^(i) = S + E^(i), i = 1, 2, 3
where ‖·‖* denotes the nuclear norm and ‖·‖1 the l1 norm;
(6) Solving the above problem with the augmented Lagrange multiplier (ALM) method to obtain the robust pairwise similarity matrix;
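Step (6) can be sketched with the standard inexact-ALM (ADMM-style) updates for nuclear-norm-plus-l1 problems: singular value thresholding for S and entrywise soft thresholding for each E^(i). This is a hedged illustration; the weight `lam`, penalty `rho`, and iteration count are assumptions, not values from the patent.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, tau):
    """Entrywise soft thresholding: proximal operator of tau * l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def robust_similarity(S_views, lam=0.5, rho=1.0, iters=200):
    """Sketch of: min ||S||_* + lam * sum_i ||E_i||_1  s.t.  S_i = S + E_i."""
    m = len(S_views)
    S = np.mean(S_views, axis=0)
    E = [np.zeros_like(S) for _ in range(m)]
    Y = [np.zeros_like(S) for _ in range(m)]  # Lagrange multipliers
    for _ in range(iters):
        # S-step: nuclear-norm prox at the average of the corrected views
        A = np.mean([Sv - Ei + Yi / rho
                     for Sv, Ei, Yi in zip(S_views, E, Y)], axis=0)
        S = svt(A, 1.0 / (m * rho))
        for i in range(m):
            # E-step: l1 prox, then dual ascent on the constraint residual
            E[i] = soft(S_views[i] - S + Y[i] / rho, lam / rho)
            Y[i] += rho * (S_views[i] - S - E[i])
    return S, E
```

The multiplier updates drive the constraints S^(i) = S + E^(i) toward satisfaction while the two proximal steps keep S low-rank and each E^(i) sparse.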
fifth step: constructing an objective function, specifically comprising the following steps:
(1) The similarity encoded in the robust pairwise similarity matrix is preserved in Hamming space, and since the image and text samples of a pair share the same class labels, the distance between their hash codes should be as small as possible. The objective function of hash code learning is therefore defined as:
min_{B1,B2} ‖kS − B1ᵀB2‖²_F + λ‖B1 − B2‖²_F   s.t. B1, B2 ∈ {−1, 1}^(k×N)
where k denotes the length of the hash code, B1 the hash codes of the image-modality samples, B2 the hash codes of the text-modality samples, and λ a weight parameter;
(2) Linear mappings are used as hash functions, and the l2,1 norm is used as a regularization term to constrain the learning of the image and text modality hash functions and enhance their robustness to noise. The objective for learning each modality's hash function is therefore defined as:
min_{Wi} βi ‖Bi − WiᵀX^(i)‖²_F + μ Reg(Wi), i = 1, 2
where W1 and W2 denote the hash functions of the image and text modalities respectively, Reg(·) denotes the regularization term preventing overfitting, here Reg(Wi) = ‖Wi‖_{2,1}, and βi and μ are weight parameters;
(3) Adding the objectives of hash code learning and hash function learning yields the overall objective function of the method, defined as:
min_{B1,B2,W1,W2} ‖kS − B1ᵀB2‖²_F + λ‖B1 − B2‖²_F + Σ_{i=1,2} βi ‖Bi − WiᵀX^(i)‖²_F + μ Σ_{i=1,2} ‖Wi‖_{2,1}   s.t. B1, B2 ∈ {−1, 1}^(k×N)
where βi are weight parameters;
sixth step: because the objective function contains several unknown variables as well as discrete constraints on the hash codes, it is difficult to solve directly; however, when all but one variable are fixed, the subproblem in the remaining variable is a convex optimization problem, so the objective can be solved with an iterative optimization algorithm. The solving process comprises the following steps:
(1) Fix W1, W2, and B2, and solve for B1.
Removing constant terms, the objective function can be written as:
min_{B1} ‖B1ᵀB2‖²_F − 2 tr(B1ᵀQ1)   s.t. B1 ∈ {−1, 1}^(k×N), with Q1 = kB2Sᵀ + λB2 + β1W1ᵀX^(1)
Because B1 is discrete, this problem is difficult to solve directly; it can be solved sample by sample. Let b_{1i} denote the i-th column of B1 and b_{2j} the j-th column of B2; removing constant terms, the per-sample objective can be written as:
min_{b_{1i}} b_{1i}ᵀ(B2B2ᵀ)b_{1i} − 2 q_{1i}ᵀ b_{1i}   s.t. b_{1i} ∈ {−1, 1}^k
where q_{1i} is the i-th column of Q1. This problem is still difficult to solve directly, so it is solved bit by bit with the cyclic coordinate descent method. Let b_{1im} denote the m-th bit of b_{1i} and b̂_{1i} the vector of the bits of b_{1i} other than the m-th; b_{1im} is obtained by:
b_{1im} = sgn(q_{1im} − M̂_m b̂_{1i})
where M = B2B2ᵀ and M̂_m denotes the m-th row of M with its m-th entry removed;
repeating the above steps until the hash codes of all the image mode samples are solved;
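The sample-by-sample, bit-by-bit sweep above can be sketched as follows. The composition of the linear-term matrix Q1 (k·B2·Sᵀ + λ·B2 + β1·W1ᵀ·X^(1)) is an assumption reconstructed from the objective; here it is simply taken as a precomputed input.

```python
import numpy as np

def update_B1(Q1: np.ndarray, B2: np.ndarray, B1: np.ndarray) -> np.ndarray:
    """Cyclic coordinate descent for
    min_{B1 in {-1,+1}^{k x N}}  sum_i b_i^T (B2 B2^T) b_i - 2 tr(B1^T Q1)."""
    k, N = B1.shape
    M = B2 @ B2.T  # k x k coupling matrix between bit positions
    for i in range(N):          # sample by sample
        for m in range(k):      # bit by bit
            # influence of sample i's other bits, coupled through row m of M
            rest = M[m, :] @ B1[:, i] - M[m, m] * B1[m, i]
            B1[m, i] = 1.0 if Q1[m, i] - rest >= 0 else -1.0
    return B1
```

Each bit update minimizes the objective with every other bit held fixed, so repeated sweeps monotonically decrease this fixed-variable subproblem.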
(2) Fix W1, W2, and B1, and solve for B2.
In the same way as for B1, one obtains:
b_{2jm} = sgn(q_{2jm} − M̂′_m b̂_{2j})
where Q2 = kB1S + λB1 + β2W2ᵀX^(2), M′ = B1B1ᵀ, and q_{2j} is the j-th column of Q2;
Repeating the above steps until the hash codes of all text mode samples are solved;
(3) Fix W2, B1, and B2, and solve for W1.
Removing constant terms, the objective function can be written as:
min_{W1} β1 ‖B1 − W1ᵀX^(1)‖²_F + μ ‖W1‖_{2,1}
This problem has a closed-form solution:
W1 = (X^(1)X^(1)ᵀ + (μ/β1) D1)^(−1) X^(1) B1ᵀ
where D1 is a diagonal matrix whose j-th diagonal entry is 1/(2‖w¹_j‖2), w¹_j denoting the j-th row of W1;
(4) Fix W1, B1, and B2, and solve for W2.
In the same way as for W1, W2 has the closed-form solution:
W2 = (X^(2)X^(2)ᵀ + (μ/β2) D2)^(−1) X^(2) B2ᵀ
where D2 is a diagonal matrix whose j-th diagonal entry is 1/(2‖w²_j‖2), w²_j denoting the j-th row of W2;
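Because the diagonal matrix D depends on the row norms of W, steps (3) and (4) amount to an iteratively reweighted closed-form solve: recompute D from the current W, then re-solve. The sketch below assumes this standard treatment of the l2,1 regularizer; the parameter values and the ridge initialization are illustrative assumptions.

```python
import numpy as np

def solve_W(X, B, mu_over_beta=1e-6, iters=10, eps=1e-8):
    """Iteratively reweighted solve of
    min_W ||B - W^T X||_F^2 + (mu/beta) * ||W||_{2,1},
    via W = (X X^T + (mu/beta) D)^{-1} X B^T with D_jj = 1/(2 ||w_j||_2)."""
    d = X.shape[0]
    XXt = X @ X.T
    XBt = X @ B.T
    W = np.linalg.solve(XXt + mu_over_beta * np.eye(d), XBt)  # ridge init
    for _ in range(iters):
        row_norms = np.maximum(np.linalg.norm(W, axis=1), eps)  # guard zeros
        D = np.diag(1.0 / (2.0 * row_norms))
        W = np.linalg.solve(XXt + mu_over_beta * D, XBt)
    return W
```

The l2,1 penalty shrinks whole rows of W toward zero, which is what gives the learned hash functions their robustness to noisy feature dimensions.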
(5) Repeatedly executing the steps (1) - (4) until the algorithm converges or the maximum iteration number is reached;
seventh step: the user inputs a query sample, extracts the characteristics of the query sample, and removes the average value of the extracted characteristics;
eighth step: generating the hash code of the query sample with the learned hash function: b = sgn(Wiᵀ x), where x is the mean-removed query feature of modality i;
ninth step: calculating the Hamming distances between the query sample and the heterogeneous samples in the target (training) set and sorting them in ascending order; the samples corresponding to the r smallest Hamming distances are the retrieval result.
Compared with the prior art, the invention has the following positive effects: the method integrates class labels and the features of the image and text modalities into one framework to learn a robust pairwise similarity matrix, so that hash codes with better performance are learned and the performance of the algorithm is improved; the l2,1 norm is applied as a regularization term to constrain the learning of the hash functions, so that noise is better resisted; and a discrete optimization algorithm is provided that directly obtains discrete hash codes, improving their discriminative power. The method is suitable for cross-media retrieval of real network data.
Description of the drawings:
FIG. 1 is a flow chart of a robust discrete supervised cross-modal hash retrieval method of the present invention.
The specific embodiment is as follows:
in order to make the technical scheme of the present invention clearer, the invention is further described in detail below with reference to a specific embodiment, which does not limit its scope of protection.
Examples: a robust discrete supervised cross-modal hash retrieval method, comprising the steps of:
the first step: collecting image and text sample pairs containing class labels to form a cross-modal retrieval image-text data set corresponding to the images and the texts one by one;
and a second step of: extracting image and text features, wherein each image-modality sample is represented by a 150-dimensional texture feature and each text-modality sample by a 500-dimensional BOW (Bag of Words) feature; the means are subtracted so that the feature data of both modalities have zero mean;
and a third step of: randomly dividing all sample pairs in a data set into a training set and a testing set;
fourth step: constructing a pairwise similarity matrix from each of the class labels, the image features, and the text features of the training sample pairs, and learning a robust pairwise similarity matrix by using the low-rank property of the pairwise similarity matrices and the sparsity of sample noise. Let the features of the training sample pairs be X = {X^(1), X^(2)}, where X^(1) ∈ R^(d1×N) denotes the image-modality features in the training set and X^(2) ∈ R^(d2×N) the text-modality features; d1 and d2 denote the feature dimensions of the image and text modalities, and N denotes the number of image or text samples in the training set. The class labels of the sample pairs are denoted by L ∈ {0,1}^(N×c), where c denotes the number of classes and l_i ∈ {0,1}^c; l_ij = 1 indicates that the i-th sample belongs to the j-th class, and conversely l_ij = 0 indicates that the i-th sample does not belong to the j-th class; here d1 = 150 and d2 = 500.
Learning the robust pairwise similarity matrix comprises the following steps:
(1) Computing the pairwise similarity matrix of the image modality from the image features, defined as:
S^(1)_ij = exp(−‖x^(1)_i − x^(1)_j‖²_F / σ1)
where ‖·‖_F denotes the Frobenius norm, S^(1) denotes the pairwise similarity matrix of the image modality, S^(1)_ij the similarity between the i-th and j-th image samples, and σ1 a scale parameter; here σ1 = 0.8;
(2) Computing the pairwise similarity matrix of the text modality from the text features, defined as:
S^(2)_ij = exp(−‖x^(2)_i − x^(2)_j‖²_F / σ2)
where S^(2) denotes the pairwise similarity matrix of the text modality, S^(2)_ij the similarity between the i-th and j-th text samples, and σ2 a scale parameter; here σ2 = 0.3;
(3) Computing the label-based pairwise similarity matrix from the class labels of the sample pairs, defined as:
S^(3)_ij = (l_i l_jᵀ) / (‖l_i‖2 ‖l_j‖2)
where S^(3) denotes the pairwise similarity matrix of the labels and S^(3)_ij the similarity between the labels of the i-th and j-th sample pairs;
(4) The objective function for learning the robust pairwise similarity matrix is defined as:
min_{S, E^(i)} rank(S) + λ Σ_{i=1..3} ‖E^(i)‖0   s.t. S^(i) = S + E^(i), i = 1, 2, 3
where S denotes the learned robust pairwise similarity matrix, E^(i) the noise in the i-th pairwise similarity matrix, rank(·) the rank of a matrix, ‖·‖0 the l0 norm, and λ a weight parameter;
(5) Because of the discrete rank and l0-norm terms, the objective in (4) is difficult to solve directly; relaxing these two terms yields an approximate problem, so the above formula can be rewritten as:
min_{S, E^(i)} ‖S‖* + λ Σ_{i=1..3} ‖E^(i)‖1   s.t. S^(i) = S + E^(i), i = 1, 2, 3
where ‖·‖* denotes the nuclear norm and ‖·‖1 the l1 norm;
(6) Solving the above problem with the augmented Lagrange multiplier (ALM) method to obtain the robust pairwise similarity matrix;
fifth step: constructing an objective function, specifically comprising the following steps:
(1) The similarity encoded in the robust pairwise similarity matrix is preserved in Hamming space, and since the image and text samples of a pair share the same class labels, the distance between their hash codes should be as small as possible. The objective function of hash code learning is therefore defined as:
min_{B1,B2} ‖kS − B1ᵀB2‖²_F + λ‖B1 − B2‖²_F   s.t. B1, B2 ∈ {−1, 1}^(k×N)
where k denotes the length of the hash code, B1 the hash codes of the image-modality samples, B2 the hash codes of the text-modality samples, and λ a weight parameter; here λ = 1;
(2) Linear mappings are used as hash functions, and the l2,1 norm is used as a regularization term to constrain the learning of the image and text modality hash functions and enhance their robustness to noise. The objective for learning each modality's hash function is therefore defined as:
min_{Wi} βi ‖Bi − WiᵀX^(i)‖²_F + μ Reg(Wi), i = 1, 2
where W1 and W2 denote the hash functions of the image and text modalities respectively, Reg(·) denotes the regularization term preventing overfitting, here Reg(Wi) = ‖Wi‖_{2,1}, and βi and μ are weight parameters; here β1 = 10, β2 = 10, μ = 0.1;
(3) Adding the objectives of hash code learning and hash function learning yields the overall objective function of the method, defined as:
min_{B1,B2,W1,W2} ‖kS − B1ᵀB2‖²_F + λ‖B1 − B2‖²_F + Σ_{i=1,2} βi ‖Bi − WiᵀX^(i)‖²_F + μ Σ_{i=1,2} ‖Wi‖_{2,1}   s.t. B1, B2 ∈ {−1, 1}^(k×N)
sixth step: because the objective function contains several unknown variables as well as discrete constraints on the hash codes, it is difficult to solve directly; however, when all but one variable are fixed, the subproblem in the remaining variable is a convex optimization problem, so the objective can be solved with an iterative optimization algorithm. The solving process comprises the following steps:
(1) Fix W1, W2, and B2, and solve for B1.
Removing constant terms, the objective function can be written as:
min_{B1} ‖B1ᵀB2‖²_F − 2 tr(B1ᵀQ1)   s.t. B1 ∈ {−1, 1}^(k×N), with Q1 = kB2Sᵀ + λB2 + β1W1ᵀX^(1)
Because B1 is discrete, this problem is difficult to solve directly; it can be solved sample by sample. Let b_{1i} denote the i-th column of B1 and b_{2j} the j-th column of B2; removing constant terms, the per-sample objective can be written as:
min_{b_{1i}} b_{1i}ᵀ(B2B2ᵀ)b_{1i} − 2 q_{1i}ᵀ b_{1i}   s.t. b_{1i} ∈ {−1, 1}^k
where q_{1i} is the i-th column of Q1. This problem is still difficult to solve directly, so it is solved bit by bit with the cyclic coordinate descent method. Let b_{1im} denote the m-th bit of b_{1i} and b̂_{1i} the vector of the bits of b_{1i} other than the m-th; b_{1im} is obtained by:
b_{1im} = sgn(q_{1im} − M̂_m b̂_{1i})
where M = B2B2ᵀ and M̂_m denotes the m-th row of M with its m-th entry removed;
repeating the above steps until the hash codes of all the image mode samples are solved;
(2) Fix W1, W2, and B1, and solve for B2.
In the same way as for B1, one obtains:
b_{2jm} = sgn(q_{2jm} − M̂′_m b̂_{2j})
where Q2 = kB1S + λB1 + β2W2ᵀX^(2), M′ = B1B1ᵀ, and q_{2j} is the j-th column of Q2;
Repeating the above steps until the hash codes of all text mode samples are solved;
(3) Fix W2, B1, and B2, and solve for W1.
Removing constant terms, the objective function can be written as:
min_{W1} β1 ‖B1 − W1ᵀX^(1)‖²_F + μ ‖W1‖_{2,1}
This problem has a closed-form solution:
W1 = (X^(1)X^(1)ᵀ + (μ/β1) D1)^(−1) X^(1) B1ᵀ
where D1 is a diagonal matrix whose j-th diagonal entry is 1/(2‖w¹_j‖2), w¹_j denoting the j-th row of W1;
(4) Fix W1, B1, and B2, and solve for W2.
In the same way as for W1, W2 has the closed-form solution:
W2 = (X^(2)X^(2)ᵀ + (μ/β2) D2)^(−1) X^(2) B2ᵀ
where D2 is a diagonal matrix whose j-th diagonal entry is 1/(2‖w²_j‖2), w²_j denoting the j-th row of W2;
(5) Repeating steps (1)-(4); the iteration ends when the absolute error between the last two iterations is less than 0.01 or the number of iterations exceeds 20;
seventh step: the user inputs a query sample, extracts the characteristics of the query sample, and removes the average value of the extracted characteristics;
eighth step: generating the hash code of the query sample with the learned hash function: b = sgn(Wiᵀ x), where x is the mean-removed query feature of modality i;
ninth step: calculating the Hamming distances between the query sample and the heterogeneous samples in the target (training) set and sorting them in ascending order; the samples corresponding to the r smallest Hamming distances are the retrieval result, where r = 100.
To verify the validity of the invention, this embodiment uses the public dataset Mirflickr25K as an example. The dataset contains 20015 image-text pairs, and all sample pairs are divided into 24 categories; 15011 (75%) sample pairs are randomly selected to form the training set, and the remaining 5004 (25%) form the test set. Each image-modality sample is represented by a 150-dimensional texture feature and each text-modality sample by a 500-dimensional BOW (Bag of Words) feature; the means are subtracted so that the feature data of both modalities have zero mean. To objectively evaluate the retrieval performance of the method, mean average precision (MAP) is used as the evaluation criterion; the MAP results for different hash code lengths p on the Mirflickr25K dataset are shown in Table 1.
TABLE 1 MAP results on the Mirflickr25K dataset

Task                    p=16    p=32    p=64    p=96
Image retrieves text    0.6718  0.6785  0.6843  0.6918
Text retrieves image    0.6813  0.6953  0.6977  0.7045
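For reference, mean average precision as reported in Table 1 can be sketched as follows. This follows the common definition (AP per query normalized by the number of relevant items, then averaged over queries); the exact evaluation protocol used for the table is an assumption.

```python
import numpy as np

def average_precision(relevant: set, ranked: list) -> float:
    """AP for one query: `ranked` is the retrieval order, `relevant` the
    set of ground-truth relevant indices."""
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(ranked, start=1):
        if idx in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / max(len(relevant), 1)

def mean_average_precision(relevant_sets, rankings) -> float:
    """MAP: mean of per-query average precisions."""
    return float(np.mean([average_precision(s, r)
                          for s, r in zip(relevant_sets, rankings)]))

# Toy example: relevant items {0, 2}, retrieved order [0, 1, 2]
# AP = (1/1 + 2/3) / 2 = 5/6
print(average_precision({0, 2}, [0, 1, 2]))
```

For the cross-modal setting, a retrieved sample is typically counted as relevant when it shares at least one class label with the query.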
It should be understood that the parts of this specification not described in detail belong to the prior art; the foregoing is a detailed description of a preferred embodiment and is not intended to limit the scope of the invention.

Claims (1)

1. A robust discrete supervised cross-media hash retrieval method, the method comprising the steps of:
the first step: collecting image and text sample pairs containing class labels to form a cross-modal retrieval image-text data set corresponding to the images and the texts one by one;
and a second step of: extracting features from the image and text modality samples respectively, and subtracting the mean from each modality's features so that the feature data of both modalities have zero mean;
and a third step of: randomly dividing all sample pairs in a data set into a training set and a testing set;
fourth step: constructing a pairwise similarity matrix from each of the class labels, the image features, and the text features of the training sample pairs, and learning a robust pairwise similarity matrix by using the low-rank property of the pairwise similarity matrices and the sparsity of sample noise; the features of the training sample pairs are denoted X = {X^(1), X^(2)}, where X^(1) ∈ R^(d1×N) denotes the image-modality features in the training set and X^(2) ∈ R^(d2×N) the text-modality features, d1 and d2 denote the feature dimensions of the image and text modalities, and N denotes the number of image or text samples in the training set; the class labels of the sample pairs are denoted by L ∈ {0,1}^(N×c), where c denotes the number of classes and l_i ∈ {0,1}^c; l_ij = 1 indicates that the i-th sample belongs to the j-th class, and conversely l_ij = 0 indicates that the i-th sample does not belong to the j-th class; learning the robust pairwise similarity matrix comprises the following steps:
(1) computing the pairwise similarity matrix of the image modality from the image features, defined as:
S^(1)_ij = exp(−‖x^(1)_i − x^(1)_j‖²_F / σ1)
where ‖·‖_F denotes the Frobenius norm, S^(1) denotes the pairwise similarity matrix of the image modality, S^(1)_ij the similarity between the i-th and j-th image samples, and σ1 a scale parameter;
(2) computing the pairwise similarity matrix of the text modality from the text features, defined as:
S^(2)_ij = exp(−‖x^(2)_i − x^(2)_j‖²_F / σ2)
where S^(2) denotes the pairwise similarity matrix of the text modality, S^(2)_ij the similarity between the i-th and j-th text samples, and σ2 a scale parameter;
(3) computing the label-based pairwise similarity matrix from the class labels of the sample pairs, defined as:
S^(3)_ij = (l_i l_jᵀ) / (‖l_i‖2 ‖l_j‖2)
where S^(3) denotes the pairwise similarity matrix of the labels and S^(3)_ij the similarity between the labels of the i-th and j-th sample pairs;
(4) the objective function for learning the robust pairwise similarity matrix is defined as:
min_{S, E^(i)} rank(S) + λ Σ_{i=1..3} ‖E^(i)‖0   s.t. S^(i) = S + E^(i), i = 1, 2, 3
where S denotes the learned robust pairwise similarity matrix, E^(i) the noise in the i-th pairwise similarity matrix, rank(·) the rank of a matrix, ‖·‖0 the l0 norm, and λ a weight parameter;
(5) because the objective in (4) contains the discrete rank and l0-norm terms, relaxing these two terms allows the above formula to be rewritten as:
min_{S, E^(i)} ‖S‖* + λ Σ_{i=1..3} ‖E^(i)‖1   s.t. S^(i) = S + E^(i), i = 1, 2, 3
where ‖·‖* denotes the nuclear norm and ‖·‖1 the l1 norm;
(6) solving the above problem with the augmented Lagrange multiplier method to obtain the robust pairwise similarity matrix;
fifth step: constructing an objective function, specifically comprising the following steps:
(1) the similarity encoded in the robust pairwise similarity matrix is preserved in Hamming space, and the objective function of hash code learning is defined as:
min_{B1,B2} ‖kS − B1ᵀB2‖²_F + λ‖B1 − B2‖²_F   s.t. B1, B2 ∈ {−1, 1}^(k×N)
where k denotes the length of the hash code, B1 the hash codes of the image-modality samples, B2 the hash codes of the text-modality samples, and λ a weight parameter;
(2) linear mappings are used as hash functions, and the l2,1 norm is used as a regularization term to constrain the learning of the image and text modality hash functions; the objective function for learning each modality's hash function is defined as:
min_{Wi} βi ‖Bi − WiᵀX^(i)‖²_F + μ Reg(Wi), i = 1, 2
where W1 and W2 denote the hash functions of the image and text modalities respectively, Reg(·) denotes the regularization term preventing overfitting, here Reg(Wi) = ‖Wi‖_{2,1}, and βi and μ are weight parameters;
(3) adding the objectives of hash code learning and hash function learning yields the objective function of the method, defined as:
min_{B1,B2,W1,W2} ‖kS − B1ᵀB2‖²_F + λ‖B1 − B2‖²_F + Σ_{i=1,2} βi ‖Bi − WiᵀX^(i)‖²_F + μ Σ_{i=1,2} ‖Wi‖_{2,1}   s.t. B1, B2 ∈ {−1, 1}^(k×N)
where βi are weight parameters;
sixth step: the objective function is solved by using an iterative optimization algorithm, and the solving process comprises the following steps:
(1) fix W1, W2, and B2, and solve for B1;
removing constant terms, the objective function can be written as:
min_{B1} ‖B1ᵀB2‖²_F − 2 tr(B1ᵀQ1)   s.t. B1 ∈ {−1, 1}^(k×N), with Q1 = kB2Sᵀ + λB2 + β1W1ᵀX^(1)
the problem can be solved sample by sample; let b_{1i} denote the i-th column of B1 and b_{2j} the j-th column of B2; removing constant terms, the per-sample objective can be written as:
min_{b_{1i}} b_{1i}ᵀ(B2B2ᵀ)b_{1i} − 2 q_{1i}ᵀ b_{1i}   s.t. b_{1i} ∈ {−1, 1}^k
where q_{1i} denotes the i-th column of Q1; the problem is solved bit by bit with the cyclic coordinate gradient descent method; let b_{1im} denote the m-th bit of b_{1i} and b̂_{1i} the vector of the bits of b_{1i} other than the m-th; b_{1im} is obtained by:
b_{1im} = sgn(q_{1im} − M̂_m b̂_{1i})
where M = B2B2ᵀ and M̂_m denotes the m-th row of M with its m-th entry removed;
repeating the above steps until the hash codes of all the image mode samples are solved;
(2) fix W1, W2, and B1, and solve for B2;
in the same way as for B1, one obtains:
b_{2jm} = sgn(q_{2jm} − M̂′_m b̂_{2j})
where Q2 = kB1S + λB1 + β2W2ᵀX^(2), M′ = B1B1ᵀ, and q_{2j} denotes the j-th column of Q2;
Repeating the above steps until the hash codes of all text mode samples are solved;
(3) fix W2, B1, and B2, and solve for W1;
removing constant terms, the objective function can be written as:
min_{W1} β1 ‖B1 − W1ᵀX^(1)‖²_F + μ ‖W1‖_{2,1}
this problem has a closed-form solution:
W1 = (X^(1)X^(1)ᵀ + (μ/β1) D1)^(−1) X^(1) B1ᵀ
where D1 is a diagonal matrix whose j-th diagonal entry is 1/(2‖w¹_j‖2), w¹_j denoting the j-th row of W1;
(4) fix W1, B1, and B2, and solve for W2;
in the same way as for W1, W2 has the closed-form solution:
W2 = (X^(2)X^(2)ᵀ + (μ/β2) D2)^(−1) X^(2) B2ᵀ
where D2 is a diagonal matrix whose j-th diagonal entry is 1/(2‖w²_j‖2), w²_j denoting the j-th row of W2;
(5) Repeatedly executing the steps (1) - (4) until the algorithm converges or the maximum iteration number is reached;
seventh step: the user inputs a query sample; the features of the query sample are extracted, and the mean value is subtracted from the extracted features;
eighth step: generating a hash code of the query sample using the learned hash function:
ninth step: and calculating the hamming distances between the query sample and the heterogeneous samples in the training set, and arranging the hamming distances in an ascending order, wherein the samples corresponding to the first r hamming distances are retrieval results.
CN201910096204.2A 2019-01-31 2019-01-31 Robust discrete supervision cross-media hash retrieval method Active CN109871454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910096204.2A CN109871454B (en) 2019-01-31 2019-01-31 Robust discrete supervision cross-media hash retrieval method

Publications (2)

Publication Number Publication Date
CN109871454A CN109871454A (en) 2019-06-11
CN109871454B true CN109871454B (en) 2023-08-29

Family

ID=66918414

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256271A (en) * 2017-06-27 2017-10-17 鲁东大学 Cross-module state Hash search method based on mapping dictionary learning
WO2017210949A1 (en) * 2016-06-06 2017-12-14 北京大学深圳研究生院 Cross-media retrieval method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Cross-media Retrieval Based on Hashing Methods; Yao Tao; Wanfang China dissertation database; full text *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant