CN111078952B - Cross-modal variable-length hash retrieval method based on hierarchical structure - Google Patents


Info

Publication number
CN111078952B
Authority
CN
China
Prior art keywords
data
image
text
similarity
hash
Legal status
Active
Application number
CN201911141734.0A
Other languages
Chinese (zh)
Other versions
CN111078952A (en)
Inventor
祁晓君
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201911141734.0A
Publication of CN111078952A
Application granted
Publication of CN111078952B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a cross-modal variable-length hash retrieval method based on a hierarchical structure. The method uses hash learning to retrieve cross-modal data. It proposes a variable-length hashing algorithm in which the code length varies with the data of each modality. This reduces memory usage, and it is the first application of variable-length hashing in the cross-modal field. The method also introduces a hierarchical structure model. A hierarchy is built over the data according to similarity, and representative image-text pairs are selected from it to process the training data. This significantly reduces the training time of the subsequent cross-modal variable-length hash retrieval algorithm while effectively improving its retrieval accuracy. The method can be applied to cross-modal retrieval between natural image-text pairs.

Description

Cross-modal variable-length hash retrieval method based on hierarchical structure
Technical Field
The invention belongs to the field of multimodal data retrieval. It combines cross-modal retrieval, hash learning, manifold hierarchical structures and algebraic multigrid ideas to improve cross-modal hash retrieval algorithms.
Background
Before the big-data era, tags were commonly used to retrieve information. The explosive growth of large-scale data in recent years has driven the development of cross-modal retrieval methods, which allow media data of any modality to be queried flexibly. Existing cross-modal hashing algorithms based on unified subspace learning have made some progress. However, they still suffer from heavy computation, high storage cost and insufficient retrieval precision on large-scale multimodal data. Hash-based cross-modal retrieval is widely used in machine learning and information retrieval because it is efficient and saves both time and storage. Hash learning maps the original high-dimensional media data into a low-dimensional space, producing compact hash codes for fast and efficient retrieval. However, most hash-based models map multimodal data to fixed-length hash codes, which yields a suboptimal representation of the multimodal data.
Disclosure of Invention
The present invention aims to solve the above problems of the prior art. It provides a cross-modal variable-length hash retrieval method based on a hierarchical structure. The method introduces a variable-length cross-modal hash retrieval algorithm and combines it with the hierarchical structure of algebraic multigrid to improve retrieval accuracy. The technical solution of the invention is as follows:
A cross-modal variable-length hash retrieval method based on a hierarchical structure comprises the following steps:
Step 1: Construct a neighbor graph for the multimodal dataset, in which the image data correspond one-to-one to the text data. For brevity, a corresponding image and text in the multimodal dataset are referred to as an image-text pair. Extract features from the images and from the texts separately, and combine the image features and the text features to construct a similarity matrix between the image-text pair training set and the image-text pair database;
Step 2: Use the similarity matrix to select representative image-text pairs and build a bottom-up hierarchy of image-text pairs, in which every unselected pair is strongly connected to a selected pair. The pairs selected in each layer become the initial pairs of the next layer, and selection is repeated until the representative points are few enough while still representing the whole dataset. Each top-layer pair then represents the similar image-text pairs of a different local region;
Step 3: Construct a similarity matrix between the top-layer representative image-text pairs. Building on a single-modal supervised discrete hashing retrieval method, assume that the image data and the text data share a common latent abstract semantic space in which queries and retrieval can be performed directly. Project the hash codes of the image data and the text data into this latent space, and compute the similarity matrix between the image data and the text data from their inner products;
Step 4: Project the image data and the text data into hash-code spaces of their respective optimal lengths. Then solve, by iterative optimization, for the corresponding projection matrices, the similarity association matrix, and compact hash codes of optimal length for each modality;
Step 5: Use the similarity transfer matrices to interpolate from the top layer back down to the complete bottom-layer data. This yields a ranking of the hash codes of all database items by their similarity to the hash code of the query;
Step 6: Obtain the data similar to the query according to the similarity ranking, return them to the user, and evaluate their accuracy.
Further, in step 1, a neighbor graph is built over the paired image and text data, and a similarity matrix between the image-text pair training set and the image-text pair database is constructed, specifically:
Let $X \in \mathbb{R}^{d_1 \times n}$ be the image dataset and $Y \in \mathbb{R}^{d_2 \times n}$ the text dataset, where $d_1$ and $d_2$ are the dimensions of the image and text data, respectively, $n$ is the number of image-text pairs, and $\mathbb{R}$ is the set of real numbers;
a $k$-nearest-neighbor graph $G^{[0]}(V^{[0]}, W^{[0]})$ is built over the image-text pairs in the database, where $V^{[0]}$ denotes all image-text pairs in the database and $W^{[0]}$ is the similarity matrix between the image-text pairs.
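The patent does not state how image and text features are combined or how edge weights are chosen. The sketch below assumes that each image-text pair is represented by its concatenated, L2-normalised image and text features, and that edges carry Gaussian (heat-kernel) weights. Under those assumptions, the $k$-nearest-neighbor similarity matrix $W^{[0]}$ could be built as follows. The name `knn_similarity` and the parameters `k` and `sigma` are illustrative only.

```python
import numpy as np

def knn_similarity(img_feats, txt_feats, k=5, sigma=1.0):
    """Symmetric k-nearest-neighbour similarity matrix W0 over n image-text pairs.

    Assumptions (not taken from the patent): a pair is represented by the
    concatenation of its L2-normalised image and text features, and edges are
    weighted with a Gaussian (heat) kernel of width sigma.
    img_feats: (d1, n), txt_feats: (d2, n); columns are samples.
    """
    def normalise(a):
        return a / (np.linalg.norm(a, axis=0, keepdims=True) + 1e-12)

    z = np.vstack([normalise(img_feats), normalise(txt_feats)]).T  # (n, d1 + d2)
    sq = np.sum(z ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * z @ z.T, 0.0)
    n = z.shape[0]
    W0 = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist2[i], kind="stable")
        neighbours = order[order != i][:k]           # k closest pairs, excluding i itself
        W0[i, neighbours] = np.exp(-dist2[i, neighbours] / (2.0 * sigma ** 2))
    return np.maximum(W0, W0.T)                      # keep an edge if either end chose it
```

Symmetrising with the element-wise maximum keeps an edge whenever either pair lists the other among its $k$ nearest neighbors. The dense $O(n^2)$ distance matrix is used for clarity; a large database would need an approximate nearest-neighbor index instead.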
Further, in step 2, the core of the hierarchy is a strong-connection graph between image-text pairs. Consider the $s$-th layer of the bottom-up hierarchy of the image-text pair database. The pairs selected from layer $s-1$ serve as the representative pairs of layer $s$. Every pair that is in layer $s-1$ but not in layer $s$ (i.e., every unselected pair) must be strongly connected to a representative pair of layer $s$. Pairs in adjacent layers are related through the similarity transfer matrices $F^{[i]}$, where the superscript $i$ is the layer index. The similarity matrix of each layer is propagated from the bottom-layer similarity matrix as follows:
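$$W^{[i]} = \left(F^{[i-1]}\right)^{\top} \cdots \left(F^{[1]}\right)^{\top} \left(F^{[0]}\right)^{\top} W^{[0]} F^{[0]} F^{[1]} \cdots F^{[i-1]}, \qquad i = 1, 2, \ldots, s.$$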
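The text gives neither the exact strong-connection criterion nor the entries of $F^{[i]}$. The sketch below therefore assumes a simple rule in the style of algebraic multigrid:

- Pair $j$ depends strongly on pair $c$ if its similarity to $c$ is at least a fraction `theta` of its largest off-diagonal similarity.
- Pairs are visited greedily, and each newly selected representative claims its still-unassigned strong dependents.
- Each unselected pair is interpolated as a similarity-weighted average of its selected neighbors.

Under these assumptions, $F^{[i]}$ has shape $n_i \times n_{i+1}$. The next layer's matrix is the Galerkin product $(F^{[i]})^{\top} W^{[i]} F^{[i]}$, which reproduces the chained formula above. The names `coarsen`, `build_hierarchy`, `theta` and `max_top` are hypothetical.

```python
import numpy as np

def coarsen(W, theta=0.5):
    """Select representative pairs for the next layer and build the transfer matrix F.

    Hypothetical AMG-style rule (an assumption): pair j depends strongly on pair c
    if W[j, c] > 0 and W[j, c] >= theta * max_{k != j} W[j, k]. Every unselected
    pair ends up strongly connected to at least one selected pair.
    Returns the selected indices and F of shape (n, n_coarse).
    """
    n = W.shape[0]
    off = W.copy()
    np.fill_diagonal(off, 0.0)                       # ignore self-similarity
    strong = (off > 0) & (off >= theta * off.max(axis=1, keepdims=True))
    state = np.zeros(n, dtype=int)                   # 0 unassigned, 1 selected, 2 unselected
    for c in np.argsort(-strong.sum(axis=0), kind="stable"):  # most depended-on first
        if state[c] == 0:
            state[c] = 1
            state[strong[:, c] & (state == 0)] = 2
    coarse = np.flatnonzero(state == 1)
    F = np.zeros((n, coarse.size))
    F[coarse, np.arange(coarse.size)] = 1.0          # representatives copy their own value
    for j in np.flatnonzero(state == 2):
        w = off[j, coarse]
        F[j] = w / w.sum()                           # weighted average of selected neighbours
    return coarse, F

def build_hierarchy(W0, max_top=8, theta=0.5, max_levels=10):
    """Coarsen repeatedly; W[i+1] = F[i]^T W[i] F[i] (Galerkin product)."""
    Ws, Fs = [W0], []
    while Ws[-1].shape[0] > max_top and len(Fs) < max_levels:
        _, F = coarsen(Ws[-1], theta)
        if F.shape[1] == F.shape[0]:                 # nothing merged: stop
            break
        Fs.append(F)
        Ws.append(F.T @ Ws[-1] @ F)
    return Ws, Fs
```

Because each unselected row of $F$ sums to one and each selected row is an identity row, the top-layer scores can later be carried back down with the same matrices.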
Further, in step 3, a similarity matrix for the top-layer representative image-text pairs is constructed first. Assume the multimodal dataset has a common latent abstract semantic space $V$ in which multimodal data can be queried and retrieved directly. The image hash codes $B_X$ and the text hash codes $B_Y$ are then projected into this space in the following forms:
The similarity between data in the space $V$ is expressed as follows:
Let $W = W_1^{\top} W_2$, which captures the similarity between images and texts. The objective function is:
where $P_X$ and $P_Y$ are the projection matrices of the image data and the text data, respectively.
Further, in step 4, a mapping function from the raw data of each modality to its hash codes is learned, and this function yields compact hash codes of optimal length for each modality. The solution steps are as follows:
(1) Fix the other variables and solve for $P_X$ and $P_Y$; the objective function reduces to:
Therefore $P_X$ and $P_Y$ have closed-form least-squares (regression) solutions:
$$P_X = B_X X^{\top} \left(X X^{\top}\right)^{-1}, \qquad P_Y = B_Y Y^{\top} \left(Y Y^{\top}\right)^{-1}$$
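The closed form for $P_X$ is the ordinary least-squares solution of $\min_{P_X} \|B_X - P_X X\|_F^2$ when samples are stored as the columns of $X$. The NumPy sketch below assumes $X \in \mathbb{R}^{d \times n}$ and $B_X \in \{-1,+1\}^{h \times n}$. It solves the normal equations rather than forming the inverse explicitly. The optional `ridge` term is a hypothetical addition for a singular $X X^{\top}$ and is zero by default.

```python
import numpy as np

def projection_matrix(B, X, ridge=0.0):
    """P = B X^T (X X^T)^{-1}, the least-squares minimiser of ||B - P X||_F^2.

    B: (h, n) hash codes, X: (d, n) features (columns are samples).
    ridge > 0 adds ridge * I to X X^T; this is an assumption for the case where
    X X^T is singular (e.g. d > n), not part of the formula above.
    """
    gram = X @ X.T + ridge * np.eye(X.shape[0])
    # P gram = B X^T  <=>  gram P^T = X B^T, since gram is symmetric.
    return np.linalg.solve(gram, X @ B.T).T
```

The same function gives $P_Y$ from $B_Y$ and $Y$.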
(2) Fix the other variables and solve for $W$; the objective function reduces to:
This is a bilinear regression model, whose closed-form solution is:
(3) Fix the other variables and solve for $B_X$; the objective function reduces to:
$B_X$ is solved row by row. When one row vector of $B_X$ is being solved, the remaining row vectors are held fixed, and the other rows are then solved iteratively in turn. Expanding and rearranging the objective gives:
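The closed-form expression for $W$ is not reproduced in the text. The sketch assumes the $W$-subproblem is the unregularised bilinear least-squares problem $\min_W \|S - B_X^{\top} W B_Y\|_F^2$, with $S$ an $n \times n$ image-text similarity matrix (an assumption about its role). Setting the gradient to zero then gives $W = (B_X B_X^{\top})^{-1} B_X S B_Y^{\top} (B_Y B_Y^{\top})^{-1}$. The sketch below computes this with two linear solves; `bilinear_W` is a hypothetical name.

```python
import numpy as np

def bilinear_W(B_x, B_y, S):
    """W minimising ||S - B_x^T W B_y||_F^2 (assumed unregularised W-subproblem).

    Zeroing the gradient -2 B_x (S - B_x^T W B_y) B_y^T gives
    W = (B_x B_x^T)^{-1} B_x S B_y^T (B_y B_y^T)^{-1}.
    B_x: (h_x, n), B_y: (h_y, n), S: (n, n); both Gram matrices must be invertible.
    """
    left = np.linalg.solve(B_x @ B_x.T, B_x @ S @ B_y.T)   # (B_x B_x^T)^{-1} B_x S B_y^T
    return np.linalg.solve(B_y @ B_y.T, left.T).T           # right-multiply by (B_y B_y^T)^{-1}
```

$W$ is $h_x \times h_y$, so this step places no restriction on using different code lengths for the two modalities.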
$$\text{s.t.} \quad B_X \in \{-1, +1\}^{h \times n}$$
where $H = W B_Y S^{\top} + P_X X$ and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The solution proceeds as follows. To solve for the $i$-th row vector $b^{\top}$ of $B_X$, let $B_X'$ be $B_X$ with the row vector $b^{\top}$ deleted. Let $g^{\top}$ be the $i$-th row vector of $G$, and let $G'$ be $G$ with $g^{\top}$ deleted. Let $h^{\top}$ be the $i$-th row vector of $H$, and let $H'$ be $H$ with $h^{\top}$ deleted. The solution is:
$$b = \operatorname{sgn}\left(h - B_X' G'^{\top} g\right)$$
Once $b$ is obtained from the above formula, the corresponding row of $B_X$ is updated, and the remaining row vectors are obtained by the same procedure;
(4) Fix the other variables and solve for $B_Y$. The procedure is essentially the same as for $B_X$; see the solution of $B_X$ in (3).
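The row-wise scheme above is a discrete cyclic coordinate update. The sketch assumes the reduced $B_X$-subproblem is $\min_{B_X} \|B_X^{\top} G\|_F^2 - 2\operatorname{tr}(B_X^{\top} H)$, with $B_X \in \{-1,+1\}^{h \times n}$ and $G$ having the same number of rows as $B_X$; the definition of $G$ is not reproduced above. Writing every transpose out explicitly then gives the row update $b = \operatorname{sgn}(h - B_X'^{\top} G' g)$, and this is what the sketch implements. Each row update is the exact minimiser with the other rows fixed, so the objective cannot increase. Here `sgn` maps 0 to +1, a choice the text does not specify, so that every entry stays in $\{-1,+1\}$. The function names are illustrative.

```python
import numpy as np

def sgn(v):
    """Sign with sgn(0) = +1, so codes stay in {-1, +1}."""
    return np.where(v >= 0, 1.0, -1.0)

def dcc_update(B, G, H, sweeps=3):
    """Row-by-row update for  min_B ||B^T G||_F^2 - 2 tr(B^T H),  B in {-1,+1}^{h x n}.

    B: (h, n), G: (h, m), H: (h, n). Each row b of B is replaced by the exact
    minimiser with the other rows fixed: b = sgn(h_i - B'^T G' g_i), where B' and
    G' are B and G with row i removed, and g_i, h_i are row i of G and H.
    """
    B = np.asarray(B, dtype=float).copy()
    rows = np.arange(B.shape[0])
    for _ in range(sweeps):
        for i in rows:
            keep = rows != i
            B[i] = sgn(H[i] - B[keep].T @ (G[keep] @ G[i]))
    return B
```

The $B_Y$ update has the same structure, with the roles of the two modalities exchanged.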
Further, in step 5, the similarity transfer matrices are used to interpolate from the top layer down to the bottom layer. This gives the similarity ranking between the hash code of the query and the hash codes of all data in the database. Specifically, the sequence of similarity transfer matrices $F^{[0]} F^{[1]} \cdots F^{[s-2]} F^{[s-1]}$ is used for top-down interpolation back to the bottom layer, yielding the similarity ranking of the hash codes of all database items with respect to the hash code of the query.
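Step 5 states only that the transfer matrices carry results back down. The sketch below makes two assumptions:

- $F^{[i]}$ has shape $n_i \times n_{i+1}$, as the chained similarity formula implies.
- A query has already been scored against the $n_s$ top-layer representatives, for instance with $B_q^{\top} W B_{\text{top}}$ (a hypothetical choice).

It applies $F^{[0]} F^{[1]} \cdots F^{[s-1]}$ to the top-layer scores and sorts the result. The function names are illustrative.

```python
import numpy as np

def interpolate_to_bottom(top_scores, transfers):
    """Carry query-to-representative scores from the top layer back to all database pairs.

    transfers = [F0, F1, ..., F_{s-1}], where F_i has shape (n_i, n_{i+1}).
    Applying F_{s-1} first and F0 last computes F0 F1 ... F_{s-1} @ top_scores.
    top_scores: (n_s,) or (n_s, n_queries).
    """
    scores = top_scores
    for F in reversed(transfers):
        scores = F @ scores
    return scores

def rank_database(bottom_scores):
    """Database indices sorted from most to least similar (single query)."""
    return np.argsort(-bottom_scores, kind="stable")
```

Applying the matrices from right to left keeps every intermediate result a vector, so the dense product $F^{[0]} \cdots F^{[s-1]}$ is never formed.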
The advantages and beneficial effects of the invention are as follows.
The innovations of the invention are:
1) A variable-length cross-modal hash retrieval method. Existing cross-modal hash retrieval methods retrieve with fixed-length hash codes. Data of different modalities, however, are best represented by hash codes of different lengths, so fixed-length codes cannot express the characteristics of different data accurately. The invention uses variable-length hash codes whose length follows the data, so the resulting codes represent data of different modalities more effectively and accurately.
2) Reduced memory usage. Variable-length hash codes express each modality's data as accurately as possible while effectively reducing memory usage.
3) A hierarchical model. Following the algebraic multigrid idea, the training data are processed by selecting representative data pairs, and the subsequent retrieval algorithm uses only the top-layer representative pairs. This is a new application of the idea.
4) Greatly reduced training time. With the hierarchical model, the training data shrink from the full dataset to top-layer representative data with distinct characteristics, and this large reduction in data volume greatly reduces training time.
5) Improved retrieval accuracy. By combining cross-modal hash learning with the hierarchical model, the invention improves retrieval accuracy while reducing time and memory usage, because the selected representative data are of higher quality.
Table 1 reports the MAP values of image-to-text retrieval for the hierarchical-structure-based cross-modal variable-length hashing method (CVHH) on the Wiki, NUS-WIDE and MIRFlickr datasets.
Table 2 reports the MAP values of text-to-image retrieval for the hierarchical-structure-based cross-modal variable-length hashing method (CVHH) on the Wiki, NUS-WIDE and MIRFlickr datasets.
Table 1 MAP values of image-to-text retrieval
Table 2 MAP values of text-to-image retrieval
Tables 1 and 2 compare the hierarchical-structure-based cross-modal variable-length hash retrieval method with the baseline methods. It achieves higher MAP values at every hash code length tested. The MAP of CVHH also increases as the hash code length grows, i.e., its retrieval performance improves with longer codes. These experimental results verify the effectiveness of the method.
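Tables 1 and 2 report mean average precision (MAP). The patent does not spell out its evaluation protocol. The sketch below assumes the common convention that a database item is relevant to a query when the two share at least one label, and it computes AP over the full ranked database, with no top-$R$ cutoff. The function names are illustrative.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP of one ranked list: mean of precision@k over the positions k of relevant items."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    hits = np.cumsum(rel)
    ranks = np.arange(1, rel.size + 1)
    return float(np.sum(rel * hits / ranks) / rel.sum())

def mean_average_precision(scores, relevant):
    """MAP over queries.

    scores: (n_queries, n_db) similarity scores (higher = more similar).
    relevant: (n_queries, n_db) boolean, True if the database item is relevant
    to the query (e.g. shares a label with it; an assumed convention).
    """
    aps = []
    for s, r in zip(scores, relevant):
        order = np.argsort(-s, kind="stable")
        aps.append(average_precision(r[order]))
    return float(np.mean(aps))
```

Reported MAP values depend on the cutoff and the relevance definition, so numbers computed this way are comparable only under the same protocol.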
Drawings
FIG. 1 is a flow chart of the overall algorithm of a preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical solution to the above technical problems is as follows.
To reduce the training time of the algorithm, a neighbor graph is built over the paired image and text data, and a similarity matrix between the image-text pair training set and the image-text pair database is constructed. In each layer, representative image-text pairs are selected such that the unselected pairs are strongly connected to the selected ones. Repeating this builds a bottom-up hierarchy of image-text pairs, and the most representative top-layer pairs are used as training data. Each selected top-layer pair represents the similar pairs of a different local region. Only the top-layer pairs take part in the subsequent cross-modal variable-length hash retrieval algorithm, which reduces the amount of training data and effectively shortens training time. The hash codes of the image and text modalities are each projected into a latent abstract semantic space, and the corresponding projection matrices are obtained by iterative optimization. An association matrix between the query data and the top-layer data is learned, and the variable-length hash codes of the top-layer image and text data are updated in turn until the objective function converges. Finally, top-down interpolation with the similarity transfer matrices returns to the bottom layer. This gives the similarity ranking between the hash code of the query and the hash codes of all database items, and the data requested by the user are returned according to this ranking.
The technical solution of the invention is described in detail as follows:
A cross-modal variable-length hash retrieval method based on a hierarchical structure comprises the following steps:
First, a neighbor graph is built over the paired image and text data in the dataset. Features are extracted separately from the images and the texts, and the image and text features are combined to construct a similarity matrix between the image-text pair training set and the image-text pair database.
Representative image-text pairs are selected through the similarity matrix to build a bottom-up hierarchy of image-text pairs, in which the unselected pairs are strongly connected to the selected pairs. The pairs selected in each layer serve as the initial pairs of the next layer, and selection is repeated until the representative points are few enough while still representing the whole dataset. Each top-layer pair then represents the similar pairs of a different local region, which reduces the amount of training data and effectively shortens the training time of the algorithm.
A similarity matrix between the top-layer representative image-text pairs is constructed. Building on a single-modal supervised discrete hashing retrieval method, the image data and the text data are assumed to share a common latent abstract semantic space in which queries and retrieval can be performed directly. The hash codes of the image data and the text data are each projected into this latent space, and the similarity matrix between image and text data is computed from their inner products.
The image data and the text data are projected into hash-code spaces of their respective optimal lengths. The corresponding projection matrices, the similarity association matrix and the compact optimal-length hash codes of each modality are then solved by iterative optimization.
Top-down interpolation with the similarity transfer matrices returns to the complete bottom-layer data, giving the similarity ranking between the hash code of the query and the hash codes of all database items.
Finally, the data similar to the query are obtained according to the similarity ranking and returned to the user.
Further, a neighbor graph is built over the paired image and text data, together with a similarity matrix between the image-text pair training set and the image-text pair database. Let $X \in \mathbb{R}^{d_1 \times n}$ be the image dataset and $Y \in \mathbb{R}^{d_2 \times n}$ the text dataset, where $d_1$ and $d_2$ are the dimensions of the image and text data, respectively, $n$ is the number of image-text pairs, and $\mathbb{R}$ is the set of real numbers;
a $k$-nearest-neighbor graph $G^{[0]}(V^{[0]}, W^{[0]})$ is built over the image-text pairs in the database, where $V^{[0]}$ denotes all image-text pairs in the database and $W^{[0]}$ is the similarity matrix between the image-text pairs.
Further, representative image-text pairs are selected from among similar pairs to build a bottom-up hierarchy, and the most representative top-layer pairs are used as training data. Consider the $s$-th layer of the bottom-up hierarchy of the image-text pair database. The pairs selected from layer $s-1$ serve as the representative pairs of layer $s$. Every pair that is in layer $s-1$ but not in layer $s$ (i.e., every unselected pair) must be strongly connected to a representative pair of layer $s$. Pairs in adjacent layers are related through the similarity transfer matrices $F^{[i]}$, where the superscript $i$ is the layer index. The similarity matrix of each layer is propagated from the bottom-layer similarity matrix as follows:
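$$W^{[i]} = \left(F^{[i-1]}\right)^{\top} \cdots \left(F^{[1]}\right)^{\top} \left(F^{[0]}\right)^{\top} W^{[0]} F^{[0]} F^{[1]} \cdots F^{[i-1]}, \qquad i = 1, 2, \ldots, s.$$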
The formula above yields the hierarchy of all image-text pairs in the database, together with the similarity transfer matrices that link the representative pairs of each layer to the adjacent layers. Only the top-layer representative pairs take part in the subsequent cross-modal variable-length hash retrieval algorithm.
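The chained expression can equivalently be read one layer at a time, which suggests building the hierarchy incrementally. Each layer's similarity matrix is then the Galerkin product of the layer below, as in algebraic multigrid. Writing $n_i$ for the number of pairs in layer $i$, the matrix products imply $F^{[i]} \in \mathbb{R}^{n_i \times n_{i+1}}$, and:

$$W^{[i]} = \left(F^{[i-1]}\right)^{\top} W^{[i-1]} F^{[i-1]}, \qquad i = 1, \ldots, s,$$

$$W^{[i]} = \left(P^{[i]}\right)^{\top} W^{[0]} P^{[i]}, \qquad P^{[i]} = F^{[0]} F^{[1]} \cdots F^{[i-1]} \in \mathbb{R}^{n_0 \times n_i}.$$

Since $\left(P^{[i]}\right)^{\top} = \left(F^{[i-1]}\right)^{\top} \cdots \left(F^{[0]}\right)^{\top}$, the second line is exactly the chained formula. $P^{[s]}$ is the same product $F^{[0]} F^{[1]} \cdots F^{[s-1]}$ that is used for top-down interpolation, and every $W^{[i]}$ is symmetric whenever $W^{[0]}$ is.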
Further, assume that the multimodal dataset has a common latent abstract semantic space $V$ in which multimodal data can be queried and retrieved directly. The hash codes of images and texts are then projected into this space in the following forms, respectively:
The similarity between data in the space $V$ is expressed as follows:
Let $W = W_1^{\top} W_2$, which captures the similarity between the modalities. The objective function is:
Further, the projection matrices, the similarity association matrix and the compact optimal-length hash codes of each modality are solved by iterative optimization, as follows:
Fix the other variables and solve for $P_X$ and $P_Y$; the objective function reduces to:
Therefore $P_X$ and $P_Y$ have closed-form least-squares (regression) solutions:
$$P_X = B_X X^{\top} \left(X X^{\top}\right)^{-1}, \qquad P_Y = B_Y Y^{\top} \left(Y Y^{\top}\right)^{-1}$$
Fix the other variables and solve for $W$; the objective function reduces to:
This is a bilinear regression model, whose closed-form solution is:
Fix the other variables and solve for $B_X$; the objective function reduces to:
Because of the binary constraint, solving directly is very complex, so $B_X$ is solved row by row. When one row vector of $B_X$ is being solved, the remaining row vectors are held fixed, and the other rows are then solved iteratively in turn. Expanding and rearranging the objective gives:
$$\text{s.t.} \quad B_X \in \{-1, +1\}^{h \times n}$$
where $H = W B_Y S^{\top} + P_X X$ and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The solution proceeds as follows. To solve for the $i$-th row vector $b^{\top}$ of $B_X$, let $B_X'$ be $B_X$ with the row vector $b^{\top}$ deleted. Let $g^{\top}$ be the $i$-th row vector of $G$, and let $G'$ be $G$ with $g^{\top}$ deleted. Let $h^{\top}$ be the $i$-th row vector of $H$, and let $H'$ be $H$ with $h^{\top}$ deleted. The solution is:
$$b = \operatorname{sgn}\left(h - B_X' G'^{\top} g\right)$$
Once $b$ is obtained from the above formula, the corresponding row of $B_X$ is updated, and the remaining row vectors are obtained by the same procedure;
Fix the other variables and solve for $B_Y$. The procedure is essentially the same as for $B_X$; see the solution of $B_X$.
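The sketch below assembles the four sub-problems into one alternating loop. It assumes the objective $\|S - B_X^{\top} W B_Y\|_F^2 + \|B_X - P_X X\|_F^2 + \|B_Y - P_Y Y\|_F^2$, with $S$ an $n \times n$ image-text similarity matrix. This is an assumption: the patent's objective itself is not reproduced in the text. It is, however, the simplest objective whose $B_X$-subproblem yields exactly $H = W B_Y S^{\top} + P_X X$ with $G = W B_Y$. Moore-Penrose pseudo-inverses replace the explicit inverses. They give the same closed forms when the Gram matrices are invertible, and they remain exact least-squares minimisers otherwise. Every block is therefore minimised exactly, and the recorded objective should not increase from one iteration to the next. All names are hypothetical.

```python
import numpy as np

def sgn(v):
    return np.where(v >= 0, 1.0, -1.0)                 # sgn(0) = +1 keeps codes binary

def dcc_sweep(B, G, H):
    """One row-by-row sweep for min ||B^T G||^2 - 2 tr(B^T H), B in {-1,+1}."""
    B = B.copy()
    rows = np.arange(B.shape[0])
    for i in rows:
        keep = rows != i
        B[i] = sgn(H[i] - B[keep].T @ (G[keep] @ G[i]))
    return B

def objective(S, X, Y, Bx, By, W, Px, Py):
    """Assumed objective: ||S - Bx^T W By||^2 + ||Bx - Px X||^2 + ||By - Py Y||^2."""
    return (np.linalg.norm(S - Bx.T @ W @ By) ** 2
            + np.linalg.norm(Bx - Px @ X) ** 2
            + np.linalg.norm(By - Py @ Y) ** 2)

def train(X, Y, S, hx, hy, n_iter=10, seed=0):
    """Alternate the four sub-problems; hx and hy (code lengths) may differ."""
    rng = np.random.default_rng(seed)
    Bx = sgn(rng.standard_normal((hx, X.shape[1])))
    By = sgn(rng.standard_normal((hy, Y.shape[1])))
    history = []
    for _ in range(n_iter):
        Px = Bx @ np.linalg.pinv(X)                        # = B_X X^T (X X^T)^{-1} if full rank
        Py = By @ np.linalg.pinv(Y)
        W = np.linalg.pinv(Bx.T) @ S @ np.linalg.pinv(By)  # bilinear least squares
        G = W @ By
        Bx = dcc_sweep(Bx, G, G @ S.T + Px @ X)            # H = W B_Y S^T + P_X X
        Gy = W.T @ Bx
        By = dcc_sweep(By, Gy, Gy @ S + Py @ Y)            # same scheme for B_Y
        history.append(objective(S, X, Y, Bx, By, W, Px, Py))
    return Bx, By, W, Px, Py, history
```

In this sketch, the different code lengths `hx` and `hy` only change the shapes of $B_X$, $B_Y$ and $W$; the update rules themselves are unchanged.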
Further, top-down interpolation with the similarity transfer matrices returns to the bottom layer and gives the similarity ranking between the hash code of the query and the hash codes of all database items, specifically:
The sequence of similarity transfer matrices $F^{[0]} F^{[1]} \cdots F^{[s-2]} F^{[s-1]}$ is used for top-down interpolation back to the bottom layer, yielding the similarity ranking between the hash code of the query and the hash codes of all data in the database.
Step one: construction of neighbor graphs and similarity matrices
Let $X \in \mathbb{R}^{d_1 \times n}$ be the image dataset and $Y \in \mathbb{R}^{d_2 \times n}$ the text dataset, where $d_1$ and $d_2$ are the dimensions of the image and text data, respectively, and $n$ is the number of image-text pairs.
A $k$-nearest-neighbor graph $G^{[0]}(V^{[0]}, W^{[0]})$ is built over the image-text pairs in the database, where $V^{[0]}$ denotes all image-text pairs in the database and $W^{[0]}$ is the similarity matrix between the image-text pairs.
Step two: building a bottom-up hierarchy
A strong-connection graph between image-text pairs is constructed. Consider the $s$-th layer of the bottom-up hierarchy of the image-text pair database. The pairs selected from layer $s-1$ serve as the representative pairs of layer $s$. Every pair that is in layer $s-1$ but not in layer $s$ (i.e., every unselected pair) must be strongly connected to a representative pair of layer $s$. Pairs in adjacent layers are related through the similarity transfer matrices $F^{[i]}$, where the superscript $i$ is the layer index. The similarity matrix of each layer is propagated from the bottom-layer similarity matrix as follows:
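$$W^{[i]} = \left(F^{[i-1]}\right)^{\top} \cdots \left(F^{[1]}\right)^{\top} \left(F^{[0]}\right)^{\top} W^{[0]} F^{[0]} F^{[1]} \cdots F^{[i-1]}, \qquad i = 1, 2, \ldots, s.$$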
Step three: First, the similarity matrix of the top-layer representative image-text pairs is constructed. Assume the multimodal dataset has a common latent abstract semantic space $V$ in which multimodal data can be queried and retrieved directly. The hash codes of images and texts are then projected into this space in the following forms, respectively:
The similarity between data in the space $V$ is expressed as follows:
Let $W = W_1^{\top} W_2$, which captures the similarity between the modalities. The objective function is:
Step four: learning the projection matrices, the association matrix and the variable-length hash codes of each modality
Fix the other variables and solve for $P_X$ and $P_Y$; the objective function reduces to:
Therefore $P_X$ and $P_Y$ have closed-form least-squares (regression) solutions:
$$P_X = B_X X^{\top} \left(X X^{\top}\right)^{-1}, \qquad P_Y = B_Y Y^{\top} \left(Y Y^{\top}\right)^{-1}$$
Fix the other variables and solve for $W$; the objective function reduces to:
This is a bilinear regression model, whose closed-form solution is:
Fix the other variables and solve for $B_X$; the objective function reduces to:
Because of the binary constraint, solving directly is very complex, so $B_X$ is solved row by row. When one row vector of $B_X$ is being solved, the remaining row vectors are held fixed, and the other rows are then solved iteratively in turn. Expanding and rearranging the objective gives:
$$\text{s.t.} \quad B_X \in \{-1, +1\}^{h \times n}$$
where $H = W B_Y S^{\top} + P_X X$ and $\operatorname{tr}(\cdot)$ denotes the trace of a matrix. The solution proceeds as follows. To solve for the $i$-th row vector $b^{\top}$ of $B_X$, let $B_X'$ be $B_X$ with the row vector $b^{\top}$ deleted. Let $g^{\top}$ be the $i$-th row vector of $G$, and let $G'$ be $G$ with $g^{\top}$ deleted. Let $h^{\top}$ be the $i$-th row vector of $H$, and let $H'$ be $H$ with $h^{\top}$ deleted. The solution is:
$$b = \operatorname{sgn}\left(h - B_X' G'^{\top} g\right)$$
Once $b$ is obtained from the above formula, the corresponding row of $B_X$ is updated, and the remaining row vectors are obtained by the same procedure;
Fix the other variables and solve for $B_Y$. The procedure is essentially the same as for $B_X$; see the solution of $B_X$.
Step five: obtaining a similarity ranking
The sequence of similarity transfer matrices $F^{[0]} F^{[1]} \cdots F^{[s-2]} F^{[s-1]}$ is used for top-down interpolation back to the bottom layer, yielding the similarity ranking between the hash code of the query and the hash codes of all data in the database.
Step six: returning the requested data
Based on the similarity ranking, the data relevant to the query are obtained and their accuracy is evaluated. Specifically, the data similar to the query are obtained from the similarity matrix and returned to the user.
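Step six names neither a cutoff nor an accuracy measure. The sketch below assumes that the user requests the $k$ most similar database items and that accuracy is checked as precision at $k$. Both are illustrative choices; Tables 1 and 2 report MAP instead. The names `top_k` and `precision_at_k` are hypothetical.

```python
import numpy as np

def top_k(scores, k):
    """Indices of the k highest-scoring database items, most similar first."""
    return np.argsort(-np.asarray(scores), kind="stable")[:k]

def precision_at_k(scores, relevant, k):
    """Fraction of the k returned items that are relevant to the query."""
    returned = top_k(scores, k)
    return float(np.mean(np.asarray(relevant)[returned]))
```

A stable sort keeps the database order among tied scores, so repeated queries return the same items.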
In summary, the innovations and advantages of the present invention are as follows:
The hierarchical-structure-based cross-modal variable-length hash retrieval method enables efficient retrieval between image and text data while greatly reducing training time and memory usage;
The method uses variable-length hash codes whose length follows the data, so the resulting hash codes represent data of different modalities more effectively and accurately.
Variable-length hash codes ensure that the generated codes express each modality's data as accurately as possible while effectively reducing memory usage.
The method applies the algebraic multigrid idea: it selects representative data pairs to process the training data and completes the subsequent retrieval algorithm using only the top-layer representative pairs. This is a new application of the idea.
Adding the hierarchical model changes the training data from the full dataset to top-layer representative data with distinct characteristics, and the large reduction in data volume greatly shortens training time.
By combining cross-modal hash learning with the hierarchical model, and because the selected representative data are the best-suited data, the method improves retrieval accuracy while reducing time and memory usage.
Table 1 reports the MAP values of image-to-text retrieval for the hierarchical-structure-based cross-modal variable-length hash retrieval method (CVHH) on the Wiki, NUS-WIDE, and MIRFlickr datasets.
Table 2 reports the MAP values of text-to-image retrieval for CVHH on the Wiki, NUS-WIDE, and MIRFlickr datasets.
Table 1 MAP values of image-to-text retrieval
Table 2 MAP values of text-to-image retrieval
As Tables 1 and 2 show, the hierarchical-structure-based cross-modal variable-length hash retrieval method achieves better MAP values than the comparison methods for hash codes of different lengths. In addition, the MAP of CVHH increases as the hash code length grows, indicating better retrieval performance. These experimental results verify the effectiveness of the method.
The above examples are illustrative only and do not limit the scope of the invention. After reading the teachings herein, those skilled in the art may make various changes and modifications to the invention, and such equivalent changes and modifications also fall within the scope defined by the appended claims.
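For reference, MAP is the mean over queries of average precision (AP), where AP averages the precision at each rank that holds a relevant item. The sketch below computes AP over the full ranked list. The patent does not say whether its MAP values use the full list or a top-R cutoff, so that choice is an assumption, and the function names are hypothetical.

```python
import numpy as np


def average_precision(ranked_relevance):
    # ranked_relevance[j] is 1 if the item at rank j+1 is relevant to the query, else 0
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_rank = np.cumsum(rel) / np.arange(1, rel.size + 1)
    return float((precision_at_rank * rel).sum() / rel.sum())


def mean_average_precision(ranked_lists):
    # Mean of AP over all queries
    return float(np.mean([average_precision(r) for r in ranked_lists]))
```

For example, a ranking with relevance pattern [1, 0, 1, 0] has AP = (1/1 + 2/3) / 2 = 5/6.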

Claims (4)

1. A cross-modal variable-length hash retrieval method based on a hierarchical structure, characterized by comprising the following steps:
step 1, construct a neighbor graph for a multimodal dataset in which the image data and the text data correspond one to one; for brevity, each image and its corresponding text are referred to as an image-text pair. Extract the features of the images and of the texts separately, and combine the image features and the text features to construct a similarity matrix between the image-text pair training set and the image-text pair database;
step 2, using the similarity matrix, select representative image-text pairs to build a bottom-up hierarchy of image-text pairs, where the selected pairs are strongly connected to the unselected pairs. The pairs selected in each layer serve as the initial pairs of the next layer, from which the next layer's pairs are selected in turn. This continues until the selected representative points are few enough while still representing the whole dataset; at that point, each selected top-level image-text pair represents the similar image-text pairs of a different local region;
step 3, construct a similarity matrix between the top-level representative image-text pairs. Building on a single-modal supervised discrete hashing retrieval method, assume that the image data and the text data share a common latent abstract semantic space in which queries and retrieval can be performed directly. Project the hash codes of the image data and of the text data into this latent abstract semantic space, and compute the similarity matrix between the image data and the text data from their inner product;
step 4, project the image data and the text data into hash coding spaces of their respective optimal lengths, and solve by iterative optimization for the corresponding projection matrices, the similarity correlation matrix, and the compact hash codes of optimal length for each modality;
step 5, use the similarity transfer matrices to interpolate from top to bottom back to the complete bottom-level data, obtaining the similarity ranking between the hash code of the query data and the hash codes of all data in the database;
step 6, obtain the required data similar to the query data according to the similarity ranking, return them to the user, and evaluate their accuracy;
in step 1, a neighbor graph is built over the paired image and text data, and a similarity matrix between the image-text pair training set and the image-text pair database is constructed, specifically:
let $X \in \mathbb{R}^{d_1 \times n}$ be the image dataset and $Y \in \mathbb{R}^{d_2 \times n}$ the text dataset, where $d_1$ and $d_2$ are the dimensions of the image and text data respectively, n is the number of image-text pairs, and $\mathbb{R}$ is the set of real numbers;
build a k-nearest-neighbor graph $G^{[0]}(V^{[0]}, W^{[0]})$ over the image-text pairs in the database, where $V^{[0]}$ denotes all image-text pairs in the database and $W^{[0]}$ is the similarity matrix between the image-text pairs;
in step 3, a similarity matrix of the top-level representative image-text pair data is first constructed. The multimodal dataset is assumed to have a common latent abstract semantic space V in which multimodal data can be queried and retrieved directly. The image hash codes $B_X$ and the text hash codes $B_Y$ are projected into this latent abstract semantic space in the following form:
the similarity between data in space V is expressed as follows:
let $W = W_1^{T} W_2$, where W represents the similarity between images and texts; the specific objective function is:
where $P_X$ and $P_Y$ denote the projection matrices of the image data and the text data, respectively.
2. The hierarchical-structure-based cross-modal variable-length hash retrieval method according to claim 1, wherein in step 2 the core of constructing the hierarchy is to build a strong-connection graph between image-text pairs. For the s-th layer of the bottom-up hierarchy of the image-text pair database, the image-text pairs selected from layer s-1 serve as representative pairs. The pairs that are in layer s-1 but not in layer s, i.e., the unselected pairs, must be strongly connected to the representative pairs of layer s. Image-text pairs in adjacent layers are linked by a similarity transfer matrix $F^{[n]}$, where n denotes the corresponding layer index. The similarity matrix of each layer can be propagated from the bottom-level similarity matrix as follows:
$W^{[i]} = F^{[i-1]T} \cdots F^{[1]T} F^{[0]T} W^{[0]} F^{[0]} F^{[1]} \cdots F^{[i-1]}, \quad i = 1, 2, \ldots, s.$
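The layer-by-layer coarsening of claim 2 can be sketched in the spirit of algebraic multigrid. The patent only requires that each unselected pair be strongly connected to a representative. The following details are assumptions borrowed from classical AMG coarsening: the strength rule (an off-diagonal weight of at least `theta` times the row maximum), the greedy selection order, and interpolation weights normalized over the strongly connected representatives. All function names are hypothetical. Each coarse matrix is formed as $F^{T} W F$, so the levels satisfy the product formula above.

```python
import numpy as np


def select_representatives(W, theta=0.5):
    A = W.copy()
    np.fill_diagonal(A, 0.0)                  # coarse levels can have nonzero diagonals
    row_max = A.max(axis=1, keepdims=True)
    # Assumed rule: j is a strong neighbour of i if A[i, j] >= theta * max_k A[i, k]
    strong = (A > 0) & (A >= theta * row_max)
    selected = np.zeros(W.shape[0], dtype=bool)
    covered = np.zeros(W.shape[0], dtype=bool)
    for i in np.argsort(-strong.sum(axis=0), kind="stable"):  # most-referenced first
        if not covered[i]:
            selected[i] = True
            covered[i] = True
            covered[strong[:, i]] = True      # points strongly connected to i are represented
    return selected, strong, A


def transfer_matrix(A, selected, strong):
    # F has shape (n_fine, n_coarse); each row sums to 1
    coarse = np.flatnonzero(selected)
    F = np.zeros((A.shape[0], coarse.size))
    for i in range(A.shape[0]):
        if selected[i]:
            F[i, np.searchsorted(coarse, i)] = 1.0   # a representative maps to itself
        else:
            cols = np.flatnonzero(strong[i, coarse])  # its strongly connected representatives
            w = A[i, coarse[cols]]
            F[i, cols] = w / w.sum()
    return F


def build_hierarchy(W0, min_size=4, theta=0.5, max_levels=10):
    Ws, Fs = [W0], []
    while Ws[-1].shape[0] > min_size and len(Fs) < max_levels:
        selected, strong, A = select_representatives(Ws[-1], theta)
        if selected.all():
            break                                    # nothing left to coarsen
        F = transfer_matrix(A, selected, strong)
        Fs.append(F)
        Ws.append(F.T @ Ws[-1] @ F)                  # W^[i] = F^[i-1]T W^[i-1] F^[i-1]
    return Ws, Fs
```

Because every unselected point is marked as covered only through a strong link to a selected point, each interpolation row has at least one positive weight to normalize.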
3. The hierarchical-structure-based cross-modal variable-length hash retrieval method according to claim 1, wherein in step 4 mapping functions from the raw data of each modality to the corresponding hash codes are learned, and compact hash codes of optimal length for each modality are obtained through these mapping functions; the specific solution steps are as follows:
(1) Fix the other variables and solve for $P_X$ and $P_Y$; the objective function reduces to the following form:
thus $P_X$ and $P_Y$ have closed-form solutions obtained by regression:
$P_X = B_X X^{T} (X X^{T})^{-1}, \quad P_Y = B_Y Y^{T} (Y Y^{T})^{-1}$
(2) Fix the other variables and solve for W; the objective function reduces to the following form:
this is a bilinear regression model, whose closed-form solution is:
(3) Fix the other variables and solve for $B_X$; the objective function reduces to the following form:
$B_X$ is solved row by row: while one row vector of $B_X$ is being solved, the remaining row vectors are held fixed, and the other rows are then solved iteratively in turn. Expanding the formula, it can be rewritten as:
where $H = W B_Y S^{T} + P_X X$ and $\mathrm{tr}(\cdot)$ denotes the trace of a matrix. The solution proceeds as follows. When solving for the i-th row vector $b^{T}$ of $B_X$, let $B_X'$ be $B_X$ with the row vector $b^{T}$ deleted, $g^{T}$ the i-th row vector of G, $G'$ the matrix G with the row vector $g^{T}$ deleted, $h^{T}$ the i-th row vector of H, and $H'$ the matrix H with the row vector $h^{T}$ deleted; the solution is:
$b = \mathrm{sgn}(h - B_X' G'^{T} g)$
After b is solved with the formula above, the remaining row vectors of $B_X$ are obtained by the same procedure;
(4) Fix the other variables and solve for $B_Y$. The procedure is essentially the same as for $B_X$; see the solution of $B_X$ in step (3).
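Steps (1) to (4) amount to an alternating minimization. The reduced objectives are not reproduced in this text, so the sketch below makes two assumptions. First, the objective over the top-level representative pairs is $\|S - B_X^{T} W B_Y\|_F^2 + \lambda(\|B_X - P_X X\|_F^2 + \|B_Y - P_Y Y\|_F^2)$. Second, $G = W B_Y$. With $\lambda = 1$, this reproduces $H = W B_Y S^{T} + P_X X$ from step (3).

Under these assumptions, each row update of $B_X$ works out to $b = \mathrm{sgn}(h - B_X'^{T} G' g)$. The transpose on $B_X'$ reflects storing the codes as $r_x \times m$ matrices; the claim's notation may place the transposes differently depending on how G is laid out. Because every block update is an exact minimizer, the objective should not increase between iterations. Function names and the weight `lam` are hypothetical.

```python
import numpy as np


def objective(S, Bx, By, W, Px, Py, X, Y, lam):
    # Assumed objective: ||S - Bx^T W By||^2 + lam * (||Bx - Px X||^2 + ||By - Py Y||^2)
    fit = np.linalg.norm(S - Bx.T @ W @ By) ** 2
    reg = np.linalg.norm(Bx - Px @ X) ** 2 + np.linalg.norm(By - Py @ Y) ** 2
    return fit + lam * reg


def update_projection(B, Z):
    # Step (1): P = B Z^T (Z Z^T)^{-1}, computed as a linear solve instead of an explicit inverse
    return np.linalg.solve(Z @ Z.T, Z @ B.T).T


def update_bridge(S, Bx, By):
    # Step (2): least-squares minimizer of ||S - Bx^T W By||^2 via pseudo-inverses
    return np.linalg.pinv(Bx.T) @ S @ np.linalg.pinv(By)


def update_codes(B, G, H):
    # Steps (3)/(4): one sweep of row-wise updates minimizing ||B^T G||^2 - 2 tr(B^T H)
    B = B.copy()
    for i in range(B.shape[0]):
        B_rest = np.delete(B, i, axis=0)      # B with row i removed
        G_rest = np.delete(G, i, axis=0)      # G with row i removed
        new_row = np.sign(H[i] - B_rest.T @ (G_rest @ G[i]))
        B[i] = np.where(new_row == 0, B[i], new_row)  # keep the old bit on exact ties
    return B


def train(S, X, Y, rx, ry, lam=1.0, iters=10, seed=0):
    # S: (m, m) similarity of top-level pairs; X: (d1, m); Y: (d2, m); rx != ry is allowed
    rng = np.random.default_rng(seed)
    m = S.shape[0]
    Bx = rng.choice([-1.0, 1.0], size=(rx, m))
    By = rng.choice([-1.0, 1.0], size=(ry, m))
    history = []
    for _ in range(iters):
        Px, Py = update_projection(Bx, X), update_projection(By, Y)
        W = update_bridge(S, Bx, By)
        Gx = W @ By                                   # (rx, m)
        Bx = update_codes(Bx, Gx, Gx @ S.T + lam * (Px @ X))
        Gy = W.T @ Bx                                 # (ry, m)
        By = update_codes(By, Gy, Gy @ S + lam * (Py @ Y))
        history.append(objective(S, Bx, By, W, Px, Py, X, Y, lam))
    return Bx, By, W, Px, Py, history
```

Letting `rx` and `ry` differ mirrors the variable-length idea. Choosing the optimal lengths themselves is outside the scope of this sketch.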
4. The hierarchical-structure-based cross-modal variable-length hash retrieval method according to claim 1, wherein in step 5 the similarity transfer matrices are used to interpolate from top to bottom back to the bottom layer, yielding the similarity ranking between the hash code of the query data and all data in the database. Specifically, using the sequence of similarity transfer matrices $F^{[0]} F^{[1]} \cdots F^{[s-2]} F^{[s-1]}$, the results are interpolated from the top layer down to the bottom layer to obtain the similarity ranking between the hash code of the query data and the hash codes of all data in the database.
CN201911141734.0A 2019-11-20 2019-11-20 Cross-modal variable-length hash retrieval method based on hierarchical structure Active CN111078952B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911141734.0A CN111078952B (en) 2019-11-20 2019-11-20 Cross-modal variable-length hash retrieval method based on hierarchical structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911141734.0A CN111078952B (en) 2019-11-20 2019-11-20 Cross-modal variable-length hash retrieval method based on hierarchical structure

Publications (2)

Publication Number Publication Date
CN111078952A CN111078952A (en) 2020-04-28
CN111078952B CN111078952B (en) 2023-07-21

Family

ID=70311342

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911141734.0A Active CN111078952B (en) 2019-11-20 2019-11-20 Cross-modal variable-length hash retrieval method based on hierarchical structure

Country Status (1)

Country Link
CN (1) CN111078952B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581510B (en) * 2020-05-07 2024-02-09 Tencent Technology (Shenzhen) Co., Ltd. Shared content processing method, device, computer equipment and storage medium
CN112199531A (en) * 2020-11-05 2021-01-08 Guangzhou GCI Science & Technology Co., Ltd. Cross-modal retrieval method and device based on Hash algorithm and neighborhood map
CN113868366B (en) * 2021-12-06 2022-04-01 Shandong University Streaming data-oriented online cross-modal retrieval method and system

Citations (5)

Publication number Priority date Publication date Assignee Title
WO2005055196A2 (en) * 2003-12-05 2005-06-16 Koninklijke Philips Electronics N.V. System & method for integrative analysis of intrinsic and extrinsic audio-visual data
CN107273517A (en) * 2017-06-21 2017-10-20 Fudan University Image-text cross-modal retrieval method based on graph embedding learning
CN109871454A (en) * 2019-01-31 2019-06-11 Ludong University Robust supervised discrete cross-media hash retrieval method
CN110059198A (en) * 2019-04-08 2019-07-26 Zhejiang University Discrete hash retrieval method for cross-modal data based on similarity preservation
CN110059157A (en) * 2019-03-18 2019-07-26 South China Normal University Image-text cross-modal retrieval method, system, device and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8943011B2 (en) * 2011-06-28 2015-01-27 Salesforce.Com, Inc. Methods and systems for using map-reduce for large-scale analysis of graph-based data

Non-Patent Citations (3)

Title
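Semantic Structural Alignment of Neural Representational Spaces Enables Translation between English and Chinese Words; Benjamin D. Zinszer; Journal of Cognitive Neuroscience; Vol. 28 (No. 11); full text *
Research on image retrieval methods based on deep learning and hashing; He Tao; China Outstanding Master's Theses Database; full text *
A survey of cross-modal retrieval research; Ou Weihua; Journal of Guizhou Normal University; Vol. 138 (No. 2); full text *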

Also Published As

Publication number Publication date
CN111078952A (en) 2020-04-28

Similar Documents

Publication Publication Date Title
WO2022068196A1 (en) Cross-modal data processing method and device, storage medium, and electronic device
CN111078952B (en) Cross-modal variable-length hash retrieval method based on hierarchical structure
US8606774B1 (en) Methods and systems for 3D shape retrieval
JP4634214B2 (en) Method and system for identifying image relevance by utilizing link and page layout analysis
CN110059198A (en) A kind of discrete Hash search method across modal data kept based on similitude
CN106033426B (en) Image retrieval method based on latent semantic minimum hash
TW201606537A (en) Visual interactive search
Xie et al. Fast and accurate near-duplicate image search with affinity propagation on the ImageWeb
CN101842788A (en) Method, apparatus and computer program product for performing a visual search using grid-based feature organization
CN113127632B (en) Text summarization method and device based on heterogeneous graph, storage medium and terminal
CN110083683B (en) Entity semantic annotation method based on random walk
CN113971222A (en) Multi-mode composite coding image retrieval method and system
CN110795613A (en) Commodity searching method, device and system and electronic equipment
CN113344648B (en) Advertisement recommendation method and system based on machine learning
CN111090765A (en) Social image retrieval method and system based on missing multi-modal hash
CN113918807A (en) Data recommendation method and device, computing equipment and computer-readable storage medium
CN111737537B (en) POI recommendation method, device and medium based on graph database
CN112559877A (en) CTR (click-through rate) estimation method and system based on cross-platform heterogeneous data and behavior context
CN115455249A (en) Double-engine driven multi-modal data retrieval method, equipment and system
KR101592670B1 (en) Apparatus for searching data using index and method for using the apparatus
CN110188098B (en) High-dimensional vector data visualization method and system based on double-layer anchor point map projection optimization
Zhong et al. Deep convolutional hamming ranking network for large scale image retrieval
JP2020187644A (en) Information processor, method for processing information, and information processing program
Jain NSF workshop on visual information management systems: workshop report
Dong et al. High-performance image retrieval based on bitrate allocation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant