CN112214623A - Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method - Google Patents

Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method Download PDF

Info

Publication number
CN112214623A
CN112214623A CN202010943065.5A CN202010943065A CN112214623A CN 112214623 A CN112214623 A CN 112214623A CN 202010943065 A CN202010943065 A CN 202010943065A CN 112214623 A CN112214623 A CN 112214623A
Authority
CN
China
Prior art keywords
sample
image
matrix
text
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010943065.5A
Other languages
Chinese (zh)
Inventor
姚涛
刘莉
闫连山
贺文伟
崔光海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai Aidian Information Technology Co ltd
Ludong University
Original Assignee
Yantai Aidian Information Technology Co ltd
Ludong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai Aidian Information Technology Co ltd, Ludong University filed Critical Yantai Aidian Information Technology Co ltd
Priority to CN202010943065.5A priority Critical patent/CN112214623A/en
Publication of CN112214623A publication Critical patent/CN112214623A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • G06F16/316Indexing structures
    • G06F16/325Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/41Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of multimedia, in particular to an image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method, which comprises the following steps: constructing a picture-text sample pair sample set, and labeling semantic categories of sample pairs; extracting the characteristics of the images and the text samples in the sample set, and mapping the characteristics to a nonlinear space by using a radial basis Gaussian kernel function; constructing a graph adjacency matrix of the sample pairs by using the class labels of the sample pairs to obtain a Laplace matrix; mapping the category labels to a potential semantic space by utilizing linear mapping, and keeping the semantic similarity between the modalities and in the modalities of the image and text samples to respectively learn linear mapping matrixes for the image and text modalities; learning an orthogonal rotation matrix minimization quantization error; a discrete iteration optimization algorithm is provided to obtain a discrete solution of the hash code; the invention utilizes the semantic similarity between the modes and the semantic similarity between the modes of the images and the text samples, the similarity based on the class labels and the minimized quantization error to learn the Hash codes, thereby improving the algorithm retrieval performance.

Description

Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method
Technical Field
The invention relates to the technical field of multimedia, in particular to an efficient supervision picture embedding cross-media Hash retrieval method facing to picture and text samples.
Background
With the rapid development of network technologies and portable mobile devices, more and more people are used to share drops in life through a network, for example, when a person passes through a birthday, a birthday photo (image) is published and own mood (text) is described through social software such as WeChat and facial makeup, so that data on the network is increased explosively, and how a user searches for required information in mass data becomes a challenge. On the one hand, the amount of data on the network is large, and the dimensionality of the sample features is usually very high, even up to ten thousand dimensions. The conventional search method needs to calculate the distances between the query sample and all samples to be searched, such as euclidean distance, cosine distance, etc., which may cause excessive computational complexity and memory overhead. On the other hand, the data on the network has multiple modes, and each mode represents the heterogeneity, and how to measure the similarity of heterogeneous samples becomes a challenge. The cross-media hashing method can well solve the above two problems. The supervised cross-media hash method can learn the hash code by utilizing the class label containing high-level semantics, improves the distinguishing capability of the hash code and obtains satisfactory retrieval performance. However, most of the methods have the following problems, and need to be further solved: 1) most methods cannot fully utilize the class labels to improve the performance of the hash codes, and the existing methods mainly learn the hash codes by keeping the similarity based on two similar matrixes, but the two similar matrixes not only cause the loss of class information, but also cause higher calculation complexity and memory overhead; 2) most of the existing discrete hash methods solve the hash code matrix bit by bit in the optimization process, which results in higher computational complexity. The invention provides a high-efficiency Hash retrieval method for embedding a supervision picture facing to a picture-text sample, which can effectively solve the above problems. Firstly, in order to better keep the semantic similarity of the samples, the invention provides the method for simultaneously keeping the semantic similarity among and in the modes of the samples and the similarity based on the class labels, learning the hash code and the linear mapping matrix, learning an orthogonal rotation matrix to reduce the quantization error and further improving the distinguishing capability of the hash code. Then, an iterative optimization algorithm is provided, so that not only can the discrete solution of the Hash code closure of the sample be directly obtained, but also the computational complexity of the algorithm is reduced.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method, which is characterized by utilizing a computer device to realize the following steps:
step 1, collecting images and text samples from a network, taking the images and the text samples belonging to the same webpage as image-text sample pairs to form an image-text sample set, labeling the types of the image-text sample pairs, and dividing the image-text sample pairs into a training set and a test set;
step 2, extracting the characteristics of all images and text samples in the training set and the test set, and normalizing and removing the mean value of the characteristics;
step 3, feature use of image-text sample pair in training set
Figure DEST_PATH_IMAGE001
Is shown in which
Figure 593037DEST_PATH_IMAGE002
Figure DEST_PATH_IMAGE003
Respectively the characteristics of all image samples and text samples in the training set,
Figure 203141DEST_PATH_IMAGE004
Figure DEST_PATH_IMAGE005
which represents a real number of the digital signal,
Figure 811976DEST_PATH_IMAGE006
the dimensions of the features are represented in a graph,
Figure DEST_PATH_IMAGE007
representing the number of pairs of teletext samples in the training set,
Figure 47786DEST_PATH_IMAGE008
class labels representing pairs of samples, wherein
Figure DEST_PATH_IMAGE009
The number of total categories is indicated and,
Figure 372981DEST_PATH_IMAGE010
representing the number of pairs of teletext samples; random selection
Figure DEST_PATH_IMAGE011
A sample pair
Figure 340937DEST_PATH_IMAGE012
As anchor points, wherein
Figure DEST_PATH_IMAGE013
Figure 499386DEST_PATH_IMAGE014
Mapping the characteristics of all image samples and text samples to a nonlinear space by using a Gaussian radial basis function:
Figure DEST_PATH_IMAGE015
wherein
Figure 24040DEST_PATH_IMAGE016
In order to be a scale parameter,
Figure DEST_PATH_IMAGE017
to represent
Figure 951544DEST_PATH_IMAGE018
The norm of the number of the first-order-of-arrival,
Figure DEST_PATH_IMAGE019
represents a transpose of a matrix or vector;
step 4, constructing a graph adjacency matrix of the sample pairs by using the class labels of the image-text sample pairs
Figure 90402DEST_PATH_IMAGE020
Figure DEST_PATH_IMAGE021
Represents a real number, which is defined as follows:
Figure 985414DEST_PATH_IMAGE022
wherein,
Figure DEST_PATH_IMAGE023
representation matrix
Figure 297447DEST_PATH_IMAGE024
To (1) a
Figure DEST_PATH_IMAGE025
Go to the first
Figure 95770DEST_PATH_IMAGE026
The values of the columns are such that,
Figure DEST_PATH_IMAGE027
to represent
Figure 405528DEST_PATH_IMAGE028
A norm;
step 5, further obtaining a graph adjacency matrix
Figure 538569DEST_PATH_IMAGE024
Is the Laplace matrix
Figure DEST_PATH_IMAGE029
Wherein
Figure 591976DEST_PATH_IMAGE030
Is that
Figure DEST_PATH_IMAGE031
Diagonal matrix of (2), diagonal elements thereof
Figure 746270DEST_PATH_IMAGE032
Step 6, constructing an objective function of the method by using inter-modal and intra-modal semantic similarities and minimized quantization errors which keep sample characteristics based on the variables of the steps 1 to 5, wherein the objective function is defined as follows:
Figure DEST_PATH_IMAGE033
wherein
Figure 289247DEST_PATH_IMAGE034
Figure DEST_PATH_IMAGE035
Figure 581688DEST_PATH_IMAGE036
Figure DEST_PATH_IMAGE037
Figure 251835DEST_PATH_IMAGE038
And
Figure DEST_PATH_IMAGE039
in order to be a weight parameter, the weight parameter,
Figure 8438DEST_PATH_IMAGE040
and
Figure DEST_PATH_IMAGE041
represented as linear projection matrices learned for image and text sample modalities respectively,
Figure 971584DEST_PATH_IMAGE042
which indicates the length of the hash code and,
Figure DEST_PATH_IMAGE043
the traces of the matrix are represented by,
Figure 892267DEST_PATH_IMAGE044
in order to be a linear mapping matrix, the mapping matrix is,
Figure DEST_PATH_IMAGE045
for the learned hash code of the image-text sample pair,
Figure 287476DEST_PATH_IMAGE046
is an orthogonal rotation matrix and is characterized in that,
Figure DEST_PATH_IMAGE047
expressed in size of
Figure 898586DEST_PATH_IMAGE048
The unit matrix of (a) is obtained,
Figure DEST_PATH_IMAGE049
representing a regularization term;
and 7, solving the objective function by using an iterative optimization algorithm, which specifically comprises the following steps:
step 71, fixing
Figure 301142DEST_PATH_IMAGE050
Figure DEST_PATH_IMAGE051
Figure 995604DEST_PATH_IMAGE052
And
Figure DEST_PATH_IMAGE053
solving for
Figure 506089DEST_PATH_IMAGE054
: removing and
Figure DEST_PATH_IMAGE055
an irrelevant term, then the objective function becomes:
Figure 971705DEST_PATH_IMAGE056
to the above formula
Figure DEST_PATH_IMAGE057
And let it equal 0, then:
Figure 43697DEST_PATH_IMAGE058
due to Laplace matrix
Figure DEST_PATH_IMAGE059
Is of a size of
Figure 798027DEST_PATH_IMAGE060
So as to calculate
Figure DEST_PATH_IMAGE061
The computational complexity and memory overhead of
Figure 597356DEST_PATH_IMAGE062
The application of the invention in large-scale sample sets is limited, and the above formula can be further rewritten as follows:
Figure DEST_PATH_IMAGE063
however, calculate
Figure 435255DEST_PATH_IMAGE064
And
Figure DEST_PATH_IMAGE065
the computational complexity and memory overhead of
Figure 927416DEST_PATH_IMAGE066
The invention proposes a predefined constant
Figure DEST_PATH_IMAGE067
Then, then
Figure 903462DEST_PATH_IMAGE068
Figure DEST_PATH_IMAGE069
Further predefining constants
Figure 257215DEST_PATH_IMAGE070
Then, then
Figure DEST_PATH_IMAGE071
Can be written as
Figure 963002DEST_PATH_IMAGE072
To calculate
Figure DEST_PATH_IMAGE073
The computational complexity and memory overhead of
Figure 609753DEST_PATH_IMAGE074
(ii) a For the
Figure DEST_PATH_IMAGE075
Can be rewritten as
Figure 338675DEST_PATH_IMAGE076
The computational complexity and memory overhead to calculate this is
Figure DEST_PATH_IMAGE077
Thus calculating
Figure 479806DEST_PATH_IMAGE078
Is reduced to
Figure DEST_PATH_IMAGE079
Step 72, fixing
Figure 790833DEST_PATH_IMAGE080
Figure DEST_PATH_IMAGE081
Figure 296901DEST_PATH_IMAGE082
And
Figure 575435DEST_PATH_IMAGE053
solving for
Figure DEST_PATH_IMAGE083
: and solving for
Figure 457941DEST_PATH_IMAGE084
Similarly, one can obtain:
Figure DEST_PATH_IMAGE085
further utilization and solution
Figure 124939DEST_PATH_IMAGE086
In a similar way, the calculation can be carried out
Figure DEST_PATH_IMAGE087
Is reduced to
Figure 129804DEST_PATH_IMAGE088
Step 73, fixing
Figure DEST_PATH_IMAGE089
Figure 380788DEST_PATH_IMAGE090
Figure DEST_PATH_IMAGE091
And
Figure 801405DEST_PATH_IMAGE092
solving for
Figure DEST_PATH_IMAGE093
: removing and
Figure 336291DEST_PATH_IMAGE093
an irrelevant term, then the objective function becomes:
Figure 495746DEST_PATH_IMAGE094
to the above formula
Figure DEST_PATH_IMAGE095
And let it equal 0, then:
Figure 420977DEST_PATH_IMAGE096
step 74, fixing
Figure DEST_PATH_IMAGE097
Figure 707602DEST_PATH_IMAGE090
Figure 847727DEST_PATH_IMAGE093
And
Figure 132078DEST_PATH_IMAGE092
solving for
Figure 544605DEST_PATH_IMAGE091
: removing and
Figure 369341DEST_PATH_IMAGE091
an irrelevant term, then the objective function becomes:
Figure 550924DEST_PATH_IMAGE098
the above equation can be solved by a Singular Value Decomposition (SVD) algorithm, i.e.
Figure DEST_PATH_IMAGE099
Wherein
Figure 320690DEST_PATH_IMAGE100
In the form of a left-hand singular matrix,
Figure DEST_PATH_IMAGE101
in the form of a right singular matrix,
Figure 282830DEST_PATH_IMAGE102
is a matrix of singular values, then
Figure DEST_PATH_IMAGE103
Step 75, fixing
Figure 661990DEST_PATH_IMAGE104
Figure DEST_PATH_IMAGE105
Figure 760396DEST_PATH_IMAGE106
And
Figure 698134DEST_PATH_IMAGE091
solving for
Figure 85253DEST_PATH_IMAGE092
: removing and
Figure 189475DEST_PATH_IMAGE092
an irrelevant term, then the objective function becomes:
Figure DEST_PATH_IMAGE107
the following can be obtained:
Figure 407967DEST_PATH_IMAGE108
wherein
Figure DEST_PATH_IMAGE109
Representing a symbolic function;
step 76, repeating the steps 71-75 until the algorithm converges or the maximum iteration number is reached;
step 8, a user inputs a query sample, the sample can be an image or a text, the characteristics of the query sample are extracted, the characteristics are normalized and mean-removed, the characteristics of the sample are mapped to a nonlinear space by using a Gaussian radial basis function, and the representation of the query sample is obtained
Figure 18071DEST_PATH_IMAGE110
Step 9, generating a hash code of the query sample by using the learned linear mapping function and the rotation matrix:
Figure DEST_PATH_IMAGE111
step 10, calculating Hamming distance between query sample and hash code of heterogeneous sample in sample setArranged from small to large in Hamming distance and returned to the previous position
Figure 689223DEST_PATH_IMAGE112
And obtaining the retrieval result by the sample.
Compared with the prior art, the invention has the beneficial effects that:
1. the computation complexity and the memory overhead based on the spectrum embedding algorithm are reduced by the introduced constant
Figure DEST_PATH_IMAGE113
Is reduced to
Figure 188949DEST_PATH_IMAGE114
2. The Hash code is learned by keeping semantic similarity in and among the modes and the similarity based on the label, so that the performance of the Hash code is improved.
3. An orthogonal rotation matrix is learned in a supervision mode to reduce quantization errors, so that the distinguishing capability of the hash codes is further enhanced, and the performance of the algorithm is improved.
Drawings
Fig. 1 is a flowchart of the steps of the efficient supervised image embedding cross-media hash retrieval method for image-text samples according to the present invention.
Detailed Description
In order to more fully and clearly describe the technical scheme of the invention, the invention is further described in detail with reference to the specific embodiments, and it should be understood that the embodiments described herein are only used for explaining and explaining the invention, and are not used for limiting the protection scope of the invention.
The invention relates to an image-text sample-oriented efficient supervision image embedding cross-media Hash retrieval method, which comprises the steps of collecting images and text samples on the Internet, forming sample pairs by the images and the text samples from the same webpage, establishing an image-text sample pair set, labeling the types of the sample pairs, and dividing the image-text sample set into a training set and a testing set; extracting the characteristics of all images and text samples in the training set and the test set, and mapping the characteristics of the images and the text samples to a nonlinear space by using a radial basis Gaussian kernel function; constructing a graph adjacency matrix of the sample pair by using the class label of the sample pair, and further obtaining a Laplace matrix of the graph; mapping the category labels to a potential semantic space by utilizing linear mapping, and respectively learning linear mapping matrixes for image and text modes by keeping semantic similarity between the modes and in the modes of the image and text samples in the space; minimizing quantization error by learning an orthogonal rotation matrix; an efficient discrete iterative optimization algorithm is provided, direct solving by using a Laplace matrix is avoided by predefining a plurality of constants, the efficiency of the algorithm is improved, and a discrete solution of a Hash code can be directly obtained; the retrieval performance of the algorithm is improved by utilizing the semantic similarity between the modes and the intra-mode semantic similarity of the images and the text samples, the similarity based on the class labels and the minimized quantization error learning hash code.
Referring to fig. 1, an efficient supervised image embedding cross-media hash retrieval method for image-text samples is characterized in that a computer device is used for implementing the following steps:
the first step is as follows: collecting images and text samples from a network, taking the images and the text samples belonging to the same webpage as image-text sample pairs to form an image-text sample set, labeling the types of the image-text sample pairs, randomly selecting 75% of the image-text sample pairs to form a training set, and forming a test set by the rest of the image-text sample pairs;
the second step is that: extracting 150-dimensional texture features Of all image samples and 500-dimensional BOW (bag Of words) features Of all text samples, and normalizing and removing the mean value Of the features;
the third step: for training features of concentrated pairs of graphic samples
Figure DEST_PATH_IMAGE115
Is shown in which
Figure 261947DEST_PATH_IMAGE116
Figure DEST_PATH_IMAGE117
Respectively representing the characteristics of all the images and text samples in the training set,
Figure 229903DEST_PATH_IMAGE118
Figure DEST_PATH_IMAGE119
Figure 139084DEST_PATH_IMAGE120
which represents the number of pairs of samples,
Figure DEST_PATH_IMAGE121
class labels representing pairs of samples, wherein
Figure 647426DEST_PATH_IMAGE122
Representing the number of sample categories; randomly select 500 samples
Figure DEST_PATH_IMAGE123
(wherein
Figure 89777DEST_PATH_IMAGE124
) As anchor points, the features of the sample are mapped to a non-linear space using gaussian radial basis functions:
Figure DEST_PATH_IMAGE125
wherein
Figure 290952DEST_PATH_IMAGE126
Figure DEST_PATH_IMAGE127
Figure 687429DEST_PATH_IMAGE128
To represent
Figure DEST_PATH_IMAGE129
A norm;
the fourth step: construction of graph adjacency matrix of sample pairs using class labels of graph-text sample pairs
Figure 937145DEST_PATH_IMAGE130
It is defined as follows:
Figure DEST_PATH_IMAGE131
wherein,
Figure 984735DEST_PATH_IMAGE132
representation matrix
Figure DEST_PATH_IMAGE133
To (1) a
Figure DEST_PATH_IMAGE135
Go to the first
Figure 609008DEST_PATH_IMAGE136
The values of the columns are such that,
Figure DEST_PATH_IMAGE137
to represent
Figure 742049DEST_PATH_IMAGE138
A norm;
the fifth step: further obtaining a graph adjacency matrix
Figure DEST_PATH_IMAGE139
Is the Laplace matrix
Figure 795456DEST_PATH_IMAGE140
Wherein
Figure DEST_PATH_IMAGE141
Is a diagonal matrix whose diagonal elements
Figure 448285DEST_PATH_IMAGE142
And a sixth step: based on the above variables, the objective function of the method is constructed by keeping the inter-modal and intra-modal semantic similarity and minimizing the quantization error of the sample features, which is defined as follows:
Figure DEST_PATH_IMAGE143
wherein
Figure 991262DEST_PATH_IMAGE144
Figure DEST_PATH_IMAGE145
Figure 595287DEST_PATH_IMAGE146
Figure DEST_PATH_IMAGE147
Figure 452385DEST_PATH_IMAGE148
Figure DEST_PATH_IMAGE149
Figure 208988DEST_PATH_IMAGE040
And
Figure 860550DEST_PATH_IMAGE150
represented as linear projection matrices learned for image and text sample modalities respectively,
Figure DEST_PATH_IMAGE151
which indicates the length of the hash code and,
Figure 718915DEST_PATH_IMAGE152
the traces of the matrix are represented by,
Figure DEST_PATH_IMAGE153
in order to be a linear mapping matrix, the mapping matrix is,
Figure 176441DEST_PATH_IMAGE154
for the learned hash code of the image-text sample pair,
Figure DEST_PATH_IMAGE155
is an orthogonal rotation matrix and is characterized in that,
Figure 39749DEST_PATH_IMAGE156
expressed in size of
Figure DEST_PATH_IMAGE157
The unit matrix of (a) is obtained,
Figure 127790DEST_PATH_IMAGE158
representing a regularization term;
the seventh step: solving the objective function by using an iterative optimization algorithm, and initializing the iteration times
Figure DEST_PATH_IMAGE159
Maximum number of iterations
Figure 457141DEST_PATH_IMAGE160
Value of the objective function
Figure DEST_PATH_IMAGE161
(a sufficiently large number) and a threshold value of 0.001, comprising in particular the following steps:
(1) fixing
Figure 203511DEST_PATH_IMAGE162
Figure DEST_PATH_IMAGE163
Figure 872389DEST_PATH_IMAGE164
And
Figure DEST_PATH_IMAGE165
solving for
Figure 193649DEST_PATH_IMAGE081
: removing and
Figure 993984DEST_PATH_IMAGE055
an irrelevant term, then the objective function becomes:
Figure 996575DEST_PATH_IMAGE166
to the above formula
Figure DEST_PATH_IMAGE167
And let it equal 0, then:
Figure 582277DEST_PATH_IMAGE168
due to Laplace matrix
Figure DEST_PATH_IMAGE169
Is of a size of
Figure 746542DEST_PATH_IMAGE170
So as to calculate
Figure 801217DEST_PATH_IMAGE078
Is both of complexity and memory overhead
Figure DEST_PATH_IMAGE171
The application of the invention in large-scale sample sets is limited, and the above formula can be further rewritten as follows:
Figure 341920DEST_PATH_IMAGE172
however, calculate
Figure DEST_PATH_IMAGE173
And
Figure 782129DEST_PATH_IMAGE174
is still of complexity and memory overhead
Figure DEST_PATH_IMAGE175
The invention proposes a predefined constant
Figure 697388DEST_PATH_IMAGE176
Then, then
Figure DEST_PATH_IMAGE177
Figure 223048DEST_PATH_IMAGE178
Further predefining constants
Figure DEST_PATH_IMAGE179
Then, then
Figure 380491DEST_PATH_IMAGE180
Can be written as
Figure DEST_PATH_IMAGE181
To calculate
Figure 878468DEST_PATH_IMAGE182
Has a complexity and memory overhead of
Figure DEST_PATH_IMAGE183
(ii) a For the
Figure 446853DEST_PATH_IMAGE184
Can be rewritten as
Figure DEST_PATH_IMAGE185
The complexity and memory overhead of calculating this is
Figure 974655DEST_PATH_IMAGE186
Thus calculating
Figure 857160DEST_PATH_IMAGE061
Is reduced to
Figure 209644DEST_PATH_IMAGE074
(2) Fixing
Figure 948930DEST_PATH_IMAGE162
Figure DEST_PATH_IMAGE187
Figure 386864DEST_PATH_IMAGE188
And
Figure DEST_PATH_IMAGE189
solving for
Figure 886110DEST_PATH_IMAGE190
: and solving for
Figure 155417DEST_PATH_IMAGE055
Similarly, one can obtain:
Figure DEST_PATH_IMAGE191
further utilization and solution
Figure 268867DEST_PATH_IMAGE192
In a similar way, the calculation can be carried out
Figure DEST_PATH_IMAGE193
Is reduced to
Figure 496893DEST_PATH_IMAGE194
(3) Fixing
Figure DEST_PATH_IMAGE195
Figure 783518DEST_PATH_IMAGE196
Figure 110594DEST_PATH_IMAGE188
And
Figure 394945DEST_PATH_IMAGE189
solving for
Figure DEST_PATH_IMAGE197
: removing and
Figure 417259DEST_PATH_IMAGE197
an irrelevant term, then the objective function becomes:
Figure 445257DEST_PATH_IMAGE198
to the above formula
Figure DEST_PATH_IMAGE199
And let it equal 0, then:
Figure 938425DEST_PATH_IMAGE200
(4) fixing
Figure 393677DEST_PATH_IMAGE081
Figure 293500DEST_PATH_IMAGE090
Figure 921927DEST_PATH_IMAGE162
And
Figure 958016DEST_PATH_IMAGE189
solving for
Figure DEST_PATH_IMAGE201
: removing and
Figure 397219DEST_PATH_IMAGE201
an irrelevant term, then the objective function becomes:
Figure 518759DEST_PATH_IMAGE202
the above equation can be solved by a Singular Value Decomposition (SVD) algorithm, i.e.
Figure DEST_PATH_IMAGE203
Wherein
Figure 950877DEST_PATH_IMAGE204
In the form of a left-hand singular matrix,
Figure DEST_PATH_IMAGE205
in the form of a right singular matrix,
Figure 421566DEST_PATH_IMAGE206
is a matrix of singular values, then
Figure 218621DEST_PATH_IMAGE103
(5) Fixing
Figure DEST_PATH_IMAGE207
Figure 889773DEST_PATH_IMAGE163
Figure 610736DEST_PATH_IMAGE208
And
Figure 621417DEST_PATH_IMAGE201
solving for
Figure 651690DEST_PATH_IMAGE053
: removing and
Figure 747822DEST_PATH_IMAGE053
an irrelevant term, then the objective function becomes:
Figure DEST_PATH_IMAGE209
the following can be obtained:
Figure 771011DEST_PATH_IMAGE210
wherein
Figure DEST_PATH_IMAGE211
Representing a symbolic function;
(6) calculating the value of an objective function
Figure 964095DEST_PATH_IMAGE212
And make a judgment on
Figure DEST_PATH_IMAGE213
Or
Figure 916001DEST_PATH_IMAGE214
If yes, stopping iteration; if not, then
Figure DEST_PATH_IMAGE215
Figure 233850DEST_PATH_IMAGE216
And repeatedly executing the steps (1) - (5);
eighth step: a user inputs a query sample, or an image or a text, 150-dimensional textural features of the query sample are extracted if the image is input, 500-dimensional BOW features of the query sample are extracted if the text is input, the features are normalized and mean-removed, and the features of the sample are mapped to a nonlinear space by using a Gaussian radial basis function to obtain the representation of the query sample
Figure DEST_PATH_IMAGE217
The ninth step: generating a hash code of the query sample by using the learned linear mapping function and the rotation matrix:
Figure 811462DEST_PATH_IMAGE218
the tenth step: calculating the Hamming distance between the query sample and the hash code of the heterogeneous sample in the sample set, arranging the Hamming distances from small to large, and returning to the previous step
Figure DEST_PATH_IMAGE219
And obtaining the retrieval result by the sample.
The present example verifies the effectiveness of the method of the present invention on a public sample set Mirflickr25K, where the sample set contains 20015 image text pairs collected from social networking site Flickr, and the sample pairs contain 24 semantic categories; in the embodiment, 75% of the image-text sample pairs are randomly selected as a training set, and the rest 25% are selected as a test set; each image is represented as a Gist feature (texture feature) with 150 dimensions, a text is represented as a BOW (bag Of words) feature with 500 dimensions, and the features are normalized and subjected to mean value removing; in order to evaluate the retrieval performance of the method, average accuracy (MAP @ 100) is used as an evaluation standard, namely MAP is calculated by the first 100 returned samples, MAP @100 results of different hash code lengths on two tasks of image retrieval text and text retrieval image are shown in Table 1, the MAP @100 results of the method on a Mirflickr25K sample set are shown, and the result shows that the retrieval performance of the method is obviously higher than that of the prior art.
TABLE 1
Figure 111250DEST_PATH_IMAGE220

Claims (4)

1. An image-text sample-oriented efficient supervised graph embedding cross-media Hash retrieval method is characterized by comprising the following steps:
step 1, collecting images and text samples from a network, taking the images and the text samples belonging to the same webpage as image-text sample pairs to form an image-text sample set, labeling the types of the image-text sample pairs, and dividing the image-text sample pairs into a training set and a test set;
step 2, extracting the characteristics of all images and text samples in the training set and the test set, and normalizing and removing the mean value of the characteristics;
step 3, feature use of image-text sample pair in training set
Figure DEST_PATH_IMAGE002
Is shown in which
Figure DEST_PATH_IMAGE004
Figure DEST_PATH_IMAGE006
Respectively representing the characteristics of all image samples and text samples in the training set,
Figure DEST_PATH_IMAGE008
Figure DEST_PATH_IMAGE010
which represents a real number of the digital signal,
Figure DEST_PATH_IMAGE012
the dimensions of the features are represented in a graph,
Figure DEST_PATH_IMAGE014
representing the number of pairs of teletext samples in the training set,
Figure DEST_PATH_IMAGE016
class labels representing pairs of samples, wherein
Figure DEST_PATH_IMAGE018
The total number of categories is represented as,
Figure DEST_PATH_IMAGE020
representing the number of pairs of teletext samples; random selection
Figure DEST_PATH_IMAGE022
A sample pair
Figure DEST_PATH_IMAGE024
As anchor points, wherein
Figure DEST_PATH_IMAGE026
Figure DEST_PATH_IMAGE028
Mapping the characteristics of all image samples and text samples to a nonlinear space by using a Gaussian radial basis function:
Figure DEST_PATH_IMAGE030
wherein
Figure DEST_PATH_IMAGE032
In order to be a scale parameter,
Figure DEST_PATH_IMAGE034
to represent
Figure DEST_PATH_IMAGE036
The norm of the number of the first-order-of-arrival,
Figure DEST_PATH_IMAGE038
represents a transpose of a matrix or vector;
step 4, constructing a graph adjacency matrix of the sample pairs by using the class labels of the image-text sample pairs
Figure DEST_PATH_IMAGE040
Figure DEST_PATH_IMAGE042
Represents a real number, which is defined as follows:
Figure DEST_PATH_IMAGE044
wherein,
Figure DEST_PATH_IMAGE046
representation matrix
Figure DEST_PATH_IMAGE048
To (1) a
Figure DEST_PATH_IMAGE050
Go to the first
Figure DEST_PATH_IMAGE052
The values of the columns are such that,
Figure DEST_PATH_IMAGE054
to represent
Figure DEST_PATH_IMAGE056
A norm;
step 5, constructing a graph adjacency matrix
Figure 150601DEST_PATH_IMAGE048
Is the Laplace matrix
Figure DEST_PATH_IMAGE058
Wherein
Figure DEST_PATH_IMAGE060
Is that
Figure DEST_PATH_IMAGE062
Diagonal matrix of (2), diagonal elements thereof
Figure DEST_PATH_IMAGE064
Step 6, combining the steps 1 to 5, constructing a target function of the method by using the inter-modal and intra-modal semantic similarity and the minimized quantization error which keep the characteristics of the samples;
7, solving an objective function by using an iterative optimization algorithm;
step 8, inputting a query sample by a user, extracting the characteristics of the query sample, normalizing and removing the mean value of the characteristics, and mapping the characteristics of the sample to a nonlinear space by using a Gaussian radial basis function to obtain the representation of the query sample
Figure DEST_PATH_IMAGE066
Step 9, generating a hash code of the query sample by utilizing the learned linear mapping function and the rotation matrix;
step 10, calculating Hamming distances of the hash codes of the query samples and the heterogeneous samples in the sample set, arranging the Hamming distances from small to large, and returning to the previous step
Figure DEST_PATH_IMAGE068
And obtaining the retrieval result by the sample.
2. The method for retrieving the cross-media hash of the embedded efficient supervision map for the image-text sample as claimed in claim 1, wherein the objective function in step 6 is defined as follows:
Figure DEST_PATH_IMAGE070
wherein
Figure DEST_PATH_IMAGE072
Figure DEST_PATH_IMAGE074
Figure DEST_PATH_IMAGE076
Figure DEST_PATH_IMAGE078
Figure DEST_PATH_IMAGE080
And
Figure DEST_PATH_IMAGE082
in order to be a weight parameter, the weight parameter,
Figure DEST_PATH_IMAGE084
and
Figure DEST_PATH_IMAGE086
respectively expressed as linear projection matrices learned for image sample and text sample modalities,
Figure DEST_PATH_IMAGE088
which indicates the length of the hash code and,
Figure DEST_PATH_IMAGE090
the traces of the matrix are represented by,
Figure DEST_PATH_IMAGE092
in order to be a linear mapping matrix, the mapping matrix is,
Figure DEST_PATH_IMAGE094
for the learned hash code of the image-text sample pair,
Figure DEST_PATH_IMAGE096
is an orthogonal rotation matrix and is characterized in that,
Figure DEST_PATH_IMAGE098
expressed in size of
Figure DEST_PATH_IMAGE100
The unit matrix of (a) is obtained,
Figure DEST_PATH_IMAGE102
a regularization term is represented.
3. The method for retrieving the cross-media hash embedded in the efficient supervision map facing the image-text sample as claimed in claim 1 or 2, wherein the step 7 of solving the objective function specifically comprises the following steps:
step 71, fixing
Figure DEST_PATH_IMAGE104
Figure DEST_PATH_IMAGE106
Figure DEST_PATH_IMAGE108
And
Figure DEST_PATH_IMAGE110
solving for
Figure DEST_PATH_IMAGE112
: removing and
Figure DEST_PATH_IMAGE114
an irrelevant term, then the objective function becomes:
Figure DEST_PATH_IMAGE116
to the above formula
Figure DEST_PATH_IMAGE118
And let it equal 0, then:
Figure DEST_PATH_IMAGE120
laplace matrix
Figure DEST_PATH_IMAGE122
Is composed of
Figure DEST_PATH_IMAGE124
The matrix is a matrix of a plurality of matrices,
Figure DEST_PATH_IMAGE126
both the computational complexity and the memory overhead of
Figure DEST_PATH_IMAGE128
Figure DEST_PATH_IMAGE130
Figure DEST_PATH_IMAGE132
And
Figure DEST_PATH_IMAGE134
both the computational complexity and the memory overhead of
Figure DEST_PATH_IMAGE136
Predefining a constant
Figure DEST_PATH_IMAGE138
Then, then
Figure DEST_PATH_IMAGE140
Figure DEST_PATH_IMAGE142
(ii) a Predefined constants
Figure DEST_PATH_IMAGE144
Then, then
Figure DEST_PATH_IMAGE146
Can be converted into
Figure DEST_PATH_IMAGE148
To do so
Figure DEST_PATH_IMAGE150
The computational complexity and memory overhead of
Figure DEST_PATH_IMAGE152
Figure DEST_PATH_IMAGE154
Can be converted into
Figure DEST_PATH_IMAGE156
To do so
Figure 506889DEST_PATH_IMAGE156
The computational complexity and memory overhead of
Figure DEST_PATH_IMAGE158
Thus calculating
Figure DEST_PATH_IMAGE160
Is reduced in both computational complexity and memory overhead to
Figure DEST_PATH_IMAGE162
Step 72, fixing
Figure DEST_PATH_IMAGE164
Figure DEST_PATH_IMAGE166
Figure DEST_PATH_IMAGE168
And
Figure 99675DEST_PATH_IMAGE110
solving for
Figure DEST_PATH_IMAGE170
: and solving for
Figure DEST_PATH_IMAGE172
Similarly, one can obtain:
Figure DEST_PATH_IMAGE174
utilization and solution
Figure DEST_PATH_IMAGE176
In a similar way, will calculate
Figure DEST_PATH_IMAGE178
Is reduced in both computational complexity and memory overhead to
Figure DEST_PATH_IMAGE180
Step 73, fixing
Figure DEST_PATH_IMAGE182
Figure DEST_PATH_IMAGE184
Figure DEST_PATH_IMAGE186
And
Figure DEST_PATH_IMAGE188
solving for
Figure DEST_PATH_IMAGE190
: removing and
Figure 174511DEST_PATH_IMAGE190
an irrelevant term, then the objective function becomes:
Figure DEST_PATH_IMAGE192
to the above formula
Figure DEST_PATH_IMAGE194
And let it equal 0, then:
Figure DEST_PATH_IMAGE196
step 74, fixing
Figure DEST_PATH_IMAGE198
Figure 926567DEST_PATH_IMAGE184
Figure 686712DEST_PATH_IMAGE190
And
Figure 953745DEST_PATH_IMAGE188
solving for
Figure 152646DEST_PATH_IMAGE186
: removing and
Figure 759207DEST_PATH_IMAGE186
an irrelevant term, then the objective function becomes:
Figure DEST_PATH_IMAGE200
the above equation can be solved by a Singular Value Decomposition (SVD) algorithm, i.e.
Figure DEST_PATH_IMAGE202
Wherein
Figure DEST_PATH_IMAGE204
In the form of a left-hand singular matrix,
Figure DEST_PATH_IMAGE206
in the form of a right singular matrix,
Figure DEST_PATH_IMAGE208
is a matrix of singular values, then
Figure DEST_PATH_IMAGE210
Step 75, fixing
Figure DEST_PATH_IMAGE212
Figure DEST_PATH_IMAGE214
Figure DEST_PATH_IMAGE216
And
Figure 500374DEST_PATH_IMAGE186
solving for
Figure 192386DEST_PATH_IMAGE188
: removing and
Figure 867081DEST_PATH_IMAGE188
an irrelevant term, then the objective function becomes:
Figure DEST_PATH_IMAGE218
the following can be obtained:
Figure DEST_PATH_IMAGE220
wherein
Figure DEST_PATH_IMAGE222
Representing a symbolic function;
and step 76, repeating the steps 71-75 until the algorithm converges or the maximum iteration number is reached.
4. The method as claimed in claim 3, wherein in step 9, the hash code of the query sample is
Figure DEST_PATH_IMAGE224
CN202010943065.5A 2020-09-09 2020-09-09 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method Withdrawn CN112214623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010943065.5A CN112214623A (en) 2020-09-09 2020-09-09 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010943065.5A CN112214623A (en) 2020-09-09 2020-09-09 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Publications (1)

Publication Number Publication Date
CN112214623A true CN112214623A (en) 2021-01-12

Family

ID=74049225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010943065.5A Withdrawn CN112214623A (en) 2020-09-09 2020-09-09 Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method

Country Status (1)

Country Link
CN (1) CN112214623A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191445A (en) * 2021-05-16 2021-07-30 中国海洋大学 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN113407661A (en) * 2021-08-18 2021-09-17 鲁东大学 Discrete hash retrieval method based on robust matrix decomposition
CN113868366A (en) * 2021-12-06 2021-12-31 山东大学 Streaming data-oriented online cross-modal retrieval method and system
CN117315687A (en) * 2023-11-10 2023-12-29 哈尔滨理工大学 Image-text matching method for single-class low-information-content data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256271A (en) * 2017-06-27 2017-10-17 鲁东大学 Cross-module state Hash search method based on mapping dictionary learning
CN107729513A (en) * 2017-10-25 2018-02-23 鲁东大学 Discrete supervision cross-module state Hash search method based on semanteme alignment
CN108595688A (en) * 2018-05-08 2018-09-28 鲁东大学 Across the media Hash search methods of potential applications based on on-line study
CN109871454A (en) * 2019-01-31 2019-06-11 鲁东大学 A kind of discrete across media Hash search methods of supervision of robust
CN110110100A (en) * 2019-05-07 2019-08-09 鲁东大学 Across the media Hash search methods of discrete supervision decomposed based on Harmonious Matrix

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107256271A (en) * 2017-06-27 2017-10-17 鲁东大学 Cross-module state Hash search method based on mapping dictionary learning
CN107729513A (en) * 2017-10-25 2018-02-23 鲁东大学 Discrete supervision cross-module state Hash search method based on semanteme alignment
CN108595688A (en) * 2018-05-08 2018-09-28 鲁东大学 Across the media Hash search methods of potential applications based on on-line study
CN109871454A (en) * 2019-01-31 2019-06-11 鲁东大学 A kind of discrete across media Hash search methods of supervision of robust
CN110110100A (en) * 2019-05-07 2019-08-09 鲁东大学 Across the media Hash search methods of discrete supervision decomposed based on Harmonious Matrix

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TAO YAO,LIANSHAN YAN, YILAN MA, HONG YU, QINGTANG SU: "《Fast discrete cross-modal hashing with semantic consistency》", 《NEURAL NETWORKS》 *
姚涛: "《基于哈希方法的跨媒体检索研究》", 《中国优秀博硕士学位论文全文数据库(博士)信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191445A (en) * 2021-05-16 2021-07-30 中国海洋大学 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN113191445B (en) * 2021-05-16 2022-07-19 中国海洋大学 Large-scale image retrieval method based on self-supervision countermeasure Hash algorithm
CN113407661A (en) * 2021-08-18 2021-09-17 鲁东大学 Discrete hash retrieval method based on robust matrix decomposition
CN113868366A (en) * 2021-12-06 2021-12-31 山东大学 Streaming data-oriented online cross-modal retrieval method and system
CN117315687A (en) * 2023-11-10 2023-12-29 哈尔滨理工大学 Image-text matching method for single-class low-information-content data
CN117315687B (en) * 2023-11-10 2024-10-08 泓柯垚利(北京)劳务派遣有限公司 Image-text matching method for single-class low-information-content data

Similar Documents

Publication Publication Date Title
CN108334574B (en) Cross-modal retrieval method based on collaborative matrix decomposition
Kulis et al. Fast similarity search for learned metrics
CN112214623A (en) Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method
Kulis et al. Kernelized locality-sensitive hashing
CN106033426B (en) Image retrieval method based on latent semantic minimum hash
Ge et al. Graph cuts for supervised binary coding
CN104820696B (en) A kind of large-scale image search method based on multi-tag least square hash algorithm
CN109697451B (en) Similar image clustering method and device, storage medium and electronic equipment
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN110929080B (en) Optical remote sensing image retrieval method based on attention and generation countermeasure network
Huang et al. Object-location-aware hashing for multi-label image retrieval via automatic mask learning
CN109871454B (en) Robust discrete supervision cross-media hash retrieval method
Ali et al. Modeling global geometric spatial information for rotation invariant classification of satellite images
CN110943981A (en) Cross-architecture vulnerability mining method based on hierarchical learning
Choi et al. Face video retrieval based on the deep CNN with RBF loss
Liu et al. An indoor scene classification method for service robot Based on CNN feature
CN116304307A (en) Graph-text cross-modal retrieval network training method, application method and electronic equipment
JP2014197412A (en) System and method for similarity search of images
CN115795065A (en) Multimedia data cross-modal retrieval method and system based on weighted hash code
CN113656700A (en) Hash retrieval method based on multi-similarity consistent matrix decomposition
Al-Jubouri Content-based image retrieval: Survey
CN108647295B (en) Image labeling method based on depth collaborative hash
Pengcheng et al. Fast Chinese calligraphic character recognition with large-scale data
CN107133348B (en) Approximate searching method based on semantic consistency in large-scale picture set
Sun et al. Search by detection: Object-level feature for image retrieval

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210112

WW01 Invention patent application withdrawn after publication