CN110674333B - Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing - Google Patents

Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing Download PDF

Info

Publication number
CN110674333B
CN110674333B (application CN201910712046.9A)
Authority
CN
China
Prior art keywords
view
code
fusion
hash
views
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910712046.9A
Other languages
Chinese (zh)
Other versions
CN110674333A (en)
Inventor
颜成钢
龚镖
白俊杰
孙垚棋
张继勇
张勇东
沈韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201910712046.9A priority Critical patent/CN110674333B/en
Publication of CN110674333A publication Critical patent/CN110674333A/en
Application granted granted Critical
Publication of CN110674333B publication Critical patent/CN110674333B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/53 - Querying
    • G06F 16/532 - Query formulation, e.g. graphical querying
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 - Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/58 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 - Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a large-scale image high-speed retrieval method based on multi-view enhanced depth hashing. The invention comprises the following steps: step 1, acquiring a multi-view feature representation of an image; step 2, calculating a view relation matrix; step 3, designing a loss function of the model; step 4, fusion and enhancement; step 5, training the built model on a large-scale image training data set; step 6, testing the trained model to generate hash codes, and then performing hash retrieval; and step 7, evaluating indexes in an experiment. The expansion of the Hamming radius has little effect on the result, and the precision remains stable as the code length increases.

Description

Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing
Technical Field
The invention belongs to the technical field of computer vision and artificial intelligence, specifically addresses the problem of high-speed retrieval over large-scale image data sets, and relates to multi-view learning, deep learning and hash learning.
Background
With the explosive growth of image data, many tasks urgently need efficient large-scale image retrieval algorithms. Approximate nearest neighbor search has attracted increasing attention as a way to balance retrieval time and retrieval quality on large-scale data sets. Hashing is an efficient method for nearest neighbor search in a large-scale data space: it embeds high-dimensional feature descriptors into a low-dimensional, similarity-preserving Hamming space. However, large-scale high-speed retrieval with binary codes suffers some loss of retrieval accuracy compared with conventional retrieval methods.
Hash learning is an emerging and highly efficient nearest neighbor search method, but its precision is limited. Learning to hash for large-scale image retrieval mainly aims to automatically generate a hash function. The binary codes output by the hash function can be compared by Hamming distance in Hamming space to obtain nearest neighbors. In recent years, several hash models built on convolutional neural networks have been proposed; supervised hashing is a typical representative of these methods for image retrieval. Meanwhile, many impressive studies, such as deep Cauchy hashing, have greatly improved retrieval precision in Hamming space and pushed efficient large-scale image retrieval into the next stage. These methods automatically learn a good image representation tailored to hashing together with a set of hash functions. However, the foregoing approaches focus on learning binary codes from data with only a single view (i.e., using a single convolutional feature). Recently, many multi-view hashing methods for efficient similarity search have been proposed, such as multi-view anchor graph hashing and multi-view alignment hashing. These methods rely primarily on spectral, graph or deep learning techniques to achieve structure-preserving coding of the data. However, in most cases such hashing methods simply collect multi-view information to supplement the components missing from the single-view hash code, which ignores the relationships between views. In addition, these methods also suffer from high computational complexity.
We propose a supervised multi-view hashing model that can enhance multi-view information through neural networks. The method actively explores the relationships between views by an effective view stability evaluation, which influences the optimization direction of the whole network. We also design multiple data fusion methods in Hamming space to preserve the advantages of both the convolutional and the multi-view representations. The proposed method was evaluated systematically on the CIFAR-10 and NUS-WIDE datasets, and the results show that our method is significantly superior to state-of-the-art single-view and multi-view hashing methods.
Disclosure of Invention
In this context, we propose deep multi-view enhanced hashing (D-MVE-Hash) and multi-view hashing (MV-Hash) for image retrieval. Multi-view hashing is a non-convolutional multi-view submodule of the multi-view enhanced hash that measures view relationships in a manner called view stability evaluation. To obtain a stability assessment of the views, the multi-view hash is first pre-trained on the labeled dataset, and then similar images are repeatedly input in different view spaces to compare their stability. A view relationship matrix quantifies the relationships between views. The whole process satisfies the requirements of back propagation, so it can be optimized by gradient descent.
In our framework, we use three enhancement methods to incorporate the view relationship matrix and the various binary codes learned from the single-view and multi-view spaces into the backbone network. The three deep multi-view enhanced hash fusion methods make full use of the view relationship matrix. Copy fusion (Fusion-R) enhances the effect of dominant views by iteratively repeating specific code segments while attenuating the effect of useless views. View code fusion (Fusion-C) takes into account the most primitive view relationship matrix with the fewest artificial constraints; it overcomes the difficulty that, in copy fusion, the number of repetitions must be set manually to keep the dimensionality of the input data uniform under a dynamic view relation matrix. Probability view pooling (Fusion-P) is a probability-based view pooling approach. We also design a memory network to eliminate the high time complexity of view stability evaluation; it shares its input with the backbone network, making deep multi-view enhanced hashing a two-step model. The pipeline is shown in FIG. 1 and the framework in FIG. 2. The experimental results and the visualization results demonstrate the effectiveness of deep multi-view enhanced hashing and multi-view hashing on the image retrieval task.
The technical scheme adopted by the invention for solving the technical problems is as follows:
1. The large-scale image high-speed retrieval method based on the multi-view enhanced depth hash is characterized by comprising the following steps of:
step 1, acquiring multi-view characteristic representation of an image;
step 2, calculating a view relation matrix;
step 3, designing a loss function of the model;
step 4, fusing and enhancing;
step 5, training the built model on a large-scale image training data set;
step 6, testing the trained model to generate a hash code, and then performing hash retrieval;
and step 7, evaluating indexes in an experiment.
The steps 1 and 2 are realized as follows:
2-1. problem definition and multiview hash description:
suppose that
Figure BDA0002154112120000031
is a set of objects and the corresponding features:
Figure BDA0002154112120000032
where d_m is the dimension of the m-th view, M is the number of views, and N is the number of objects; the integrated binary code matrix is
Figure BDA0002154112120000033
Wherein b isiIs and oiAn associated binary code, and q is the code length;
2-2. setting a mapping function
Figure BDA0002154112120000034
Wherein the mapping function is capable of converting a stack of similar objects into classification scores in different views;
2-3. Defining the potentially desired hash function
Figure BDA0002154112120000035
whose composition is as follows:
Figure BDA0002154112120000036
where ε is an evaluation function; each view network is trained in advance on a labeled data set to perform a classification task before the stability evaluation starts; the following loss function is used:
Figure BDA0002154112120000037
2-4. Abstracting the test process, whose output dimension is consistent with the number of classes;
given images I = {i_1, ..., i_N}, let Q = F(I); the dimensionality of Q is M × N × C, where M is the number of views, N is the number of pictures, and C is the number of categories; ε(F) is defined as follows:
Figure BDA0002154112120000039
ε is expressed as [ε_1, ..., ε_M], and then ε is normalized:
Figure BDA00021541121200000310
the step 3 is specifically realized as follows:
Training a multi-view binary code generation network by using the view relation information; at the beginning, set a pair of images i_1, i_2 and the corresponding binary network outputs b_1, b_2 ∈ B; a relaxation mapping is applied from {-1, +1}^q to [-1, +1]^q; define y = 1 if the pair is similar, otherwise y = -1; the following formula is the loss function for the m-th view:
Figure BDA0002154112120000041
where ‖·‖_1 is the 1-norm, |·| is the absolute value, α > 0 is the boundary (margin) control, and the third term is a regularization term to avoid vanishing gradients; for the more general image set I = {i_1, ..., i_N}, the corresponding output binary codes in the multi-view space are denoted
Figure BDA0002154112120000042
To obtain an equation representation in matrix form, B is introduced
Figure BDA0002154112120000043
The formula is given below:
Figure BDA0002154112120000044
The merged
Figure BDA0002154112120000045
is expressed in the form of the second multiplication term of p(I); then the regularization terms and the similarity matrix are supplemented, and the following global objective function is obtained:
Figure BDA0002154112120000046
the view relationship matrix E is
Figure BDA0002154112120000047
The overall loss function is rewritten as:
Figure BDA0002154112120000048
With this function, the network is trained using the back-propagation algorithm with mini-batch gradient descent, and the view relation matrix E can affect all layers of the network.
The step 4 is specifically realized as follows:
Sorting the view relation matrix E to find important views, and enhancing the importance of a view by repeating the binary code of the corresponding view within the multi-view binary code; specifically, the basic binary code is denoted as B; the intermediate code (the multi-view binary code) is represented as
Figure BDA00021541121200000411
Setting a fusion vector v to guide the multi-view binary code to be repeated under various views; the following formula represents the fusion process:
Figure BDA0002154112120000049
where H represents the input binary code of the fusion layer; φ(·) is a self-concatenation operation of the vector, from 1 to M; the second parameter in φ(·) represents the number of self-replications;
Figure BDA00021541121200000410
is the ranking function in dimension d; the advantage of this fusion method is that it converts E into a discrete control vector, so E only determines the order between views; the strength of the enhancement or weakening is controlled manually through the fusion vector;
In view code fusion (Fusion-C) the fusion vector is eliminated; in copy fusion that vector is needed to keep the dimensionality of the input data uniform under the dynamic view relation matrix; first, the entire binary string H is encoded as a header code (H_h), a middle code (H_m) and a tail code (H_e); H_h is the same as in copy fusion; H_m directly uses the product of the binary code length and the coefficient of the corresponding view as the number of repetitions of the current code segment; this operation produces a series of dummy bytes (i.e., H_e) whose lengths are not equal; secondly, a specific and distinct view codeword is assigned to each view, which is a random number belonging to [-1, 1]; in contrast to H_m, H_e uses the view codeword instead of the multi-view binary code; thus, regardless of the dynamic view relationship matrix and code length, H can be fully populated;
A probability view pool with the view relation matrix is provided as a multi-view fusion method, and a view probability distribution is generated according to E; in each pooling filter, a random sample drawn from the view probability distribution activates the selected view.
The step 5 is specifically realized as follows:
Establishing a module called the memory network, which is independent of the model but participates in training together with it; the memory network learns the view relation matrix E in step 1, and then in step 2 the view relation matrix E is obtained through this module without stability evaluation; the structure of the memory network is a multilayer convolutional neural network, but its output layer corresponds to the view relation matrix E; and the loss function during training is
Figure BDA0002154112120000051
Figure BDA0002154112120000052
l_n = (I_n - E_n)^2
The invention has the following beneficial effects:
Experiments with different code lengths were designed without loss of generality. Compared with single-view hash models and common multi-view hash models, deep multi-view enhanced hashing not only obtains higher mean average precision (mAP) but also has lower computational cost. Especially in long-code environments, deep multi-view enhanced hashing can achieve better retrieval results.
The proposed multi-view hashing achieves better performance on the NUS-WIDE dataset than state-of-the-art multi-view hashing methods. For example, when retrieving with 16-bit, 32-bit and 48-bit hash codes, multi-view hashing achieves gains of 3.44%, 1.65% and 2.46% compared with SSMDH.
In FIG. 3 it can be seen that the performance curve of the original binary code drops drastically as the code length increases, while the performance curve of the deep multi-view enhanced hash is barely affected. The enhanced binary code can maintain stable retrieval performance at long code lengths. In FIG. 5 we see that the mAPs of the 128-bit deep multi-view enhanced hash using view code fusion are between 77.82% and 83.21%, and the best retrieval Hamming radius is 5. The mAPs of the 128-bit deep multi-view enhanced hash using copy fusion are between 76.68% and 83.39%, which is 1.14% and 0.18% lower than using view code fusion.
Two advantages of this method are summarized from the experimental results: (1) the expansion of the Hamming radius has little effect on the result; (2) as the code length increases, the accuracy remains stable. Deep multi-view enhanced hashing not only uses a convolutional neural network to obtain potential hash functions, but also combines the multi-view information in each view to generate a binary code. In contrast to other multi-view approaches, deep multi-view enhanced hashing uses a view relationship matrix, allowing the network to actively consider the relationships between views to control the training direction. Moreover, the view relation matrix is not learned by a pre-existing fixed neural network, so it is not an uninterpretable black box. To visualize the differences more intuitively, the search results are presented in FIG. 7.
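For reference, the retrieval metric discussed above can be computed as in the following NumPy sketch of mean average precision (mAP) restricted to a Hamming radius. It is illustrative only: the array names and the brute-force distance computation are assumptions, not the patented implementation.

```python
import numpy as np

def hamming_dist(query_code, db_codes):
    """Hamming distance between one {-1,+1} query code and all database codes."""
    q = query_code.shape[0]
    # for ±1 codes: distance = (q - inner product) / 2
    return (q - db_codes @ query_code) // 2

def map_within_radius(query_codes, query_labels, db_codes, db_labels, radius=2):
    """mAP over queries, counting only database items within the given Hamming radius."""
    aps = []
    for code, label in zip(query_codes, query_labels):
        dist = hamming_dist(code, db_codes)
        order = np.argsort(dist, kind="stable")
        order = order[dist[order] <= radius]            # prune by Hamming radius
        if order.size == 0:
            aps.append(0.0)
            continue
        rel = (db_labels[order] == label).astype(np.float32)
        cum_rel = np.cumsum(rel)
        precision_at_k = cum_rel / (np.arange(order.size) + 1)
        aps.append(float((precision_at_k * rel).sum() / max(rel.sum(), 1.0)))
    return float(np.mean(aps))
```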
Drawings
FIG. 1 is a pipeline diagram of the present invention for a large-scale image high-speed retrieval method based on multi-view enhanced depth hashing;
FIG. 2 is a global framework architecture diagram of a large-scale image high-speed retrieval method based on multi-view enhanced depth hashing provided by the invention;
FIG. 3 is a graphical illustration of single-view intra-space and multi-view generalization constraint rules under a two-sample condition;
FIG. 4 shows the mean average precision and the precision-recall curves obtained experimentally;
FIG. 5 shows the average retrieval mean average precision under different code lengths and different Hamming radii, obtained experimentally;
FIG. 6 is a graph of the loss variation of the model during training;
FIG. 7 is a visual search result presentation of a model.
Detailed Description
The invention is further illustrated by the following figures and examples.
The invention combines the deep hash learning and the multi-view method for the first time through the deep multi-view enhanced hash. Sub-module multi-view hashing finds and quantifies view relationships under non-deep learning conditions. The deep multi-view enhanced hash retains the inherent advantages of the multi-view approach and can be applied to any single-view hash retrieval model.
The invention comprises the following steps:
step 1, problem definition and multi-view Hash (MV-Hash) detailed solution
Suppose that
Figure BDA0002154112120000061
is a set of objects and the corresponding features:
Figure BDA0002154112120000071
where d_m is the dimension of the m-th view, M is the number of views, and N is the number of objects. We also denote the integrated binary code matrix
Figure BDA0002154112120000072
Figure BDA0002154112120000073
where b_i is the binary code associated with o_i and q is the code length. A mapping function is formulated
Figure BDA0002154112120000074
where the function can convert a stack of similar objects into classification scores in different views. A potentially desirable hash function is then defined
Figure BDA0002154112120000075
whose composition is as follows:
Figure BDA0002154112120000076
ε is an evaluation function; each view network is pre-trained on the labeled dataset to perform the classification task before the stability evaluation starts. The following loss function is used:
Figure BDA0002154112120000077
Abstracting the test process: its output dimension is consistent with the number of classes. Specific to image data, given images I = {i_1, ..., i_N}, let Q = F(I); the dimension of Q is M × N × C, where M is the number of views, N is the number of pictures, and C is the number of categories. ε(F) is defined as follows:
Figure BDA0002154112120000078
Representing ε as [ε_1, ..., ε_M], a simple normalization of ε is then made:
Figure BDA0002154112120000079
Then consider training the multi-view binary code generation network with the view relationship information. At the beginning, consider a pair of images i_1, i_2 and the corresponding binary network outputs b_1, b_2 ∈ B, and apply a relaxation mapping from {-1, +1}^q to [-1, +1]^q; define y = 1 if they are similar, otherwise y = -1. The following formula is the loss function for the m-th view:
Figure BDA00021541121200000710
where ‖·‖_1 is the 1-norm, |·| is the absolute value, α > 0 is the boundary (margin) control, and the third term is a regularization term to avoid vanishing gradients. For the more general image set I = {i_1, ..., i_N}, the corresponding output binary codes in the multi-view space are denoted
Figure BDA0002154112120000081
To obtain an equation representation in matrix form, B is introduced
Figure BDA0002154112120000082
The formula is given below:
Figure BDA0002154112120000083
The merged
Figure BDA0002154112120000084
is expressed in the form of the second multiplication term of p(I). Then the regularization terms and the similarity matrix are supplemented, and the following global objective function is obtained:
Figure BDA0002154112120000085
To intuitively show the effect and location of the view stability assessment, the symbol E is used. The view relationship matrix E is
Figure BDA0002154112120000086
The overall loss function is rewritten as:
Figure BDA0002154112120000087
With this objective function, the network is trained using the back-propagation algorithm with mini-batch gradient descent, and the view relation matrix E can affect all layers of the network.
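The per-view pairwise loss appears only as a formula image above. The PyTorch-style sketch below implements a contrastive hashing loss of the kind the text describes: similar pairs are pulled together, dissimilar pairs are pushed beyond a margin α, and an l1 regularizer keeps the relaxed codes near ±1 to avoid vanishing gradients. The exact form and weighting in the patent are given by the images, so this is an assumption for illustration only.

```python
import torch

def pairwise_view_loss(b1, b2, y, alpha=2.0, lam=0.01):
    """b1, b2: relaxed codes in [-1, 1]^q for one view; y = 1 (similar) or -1 (dissimilar)."""
    y = torch.as_tensor(y, dtype=b1.dtype)
    sim = (y + 1.0) / 2.0                                   # 1 for similar, 0 for dissimilar
    d = torch.norm(b1 - b2, p=2, dim=-1) ** 2               # squared distance of the pair
    contrastive = sim * d + (1.0 - sim) * torch.clamp(alpha - d, min=0.0)
    # l1 regularizer pushing |b| toward 1 (i.e., binary values), as the text notes
    reg = (b1.abs() - 1.0).abs().sum(dim=-1) + (b2.abs() - 1.0).abs().sum(dim=-1)
    return (contrastive + lam * reg).mean()
```

As the rewritten overall objective above suggests, per-view losses of this kind from the M views are combined under the influence of the view relation matrix E during training.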
Step 2, fusion and enhancement
Copy fusion (Fusion-R) is a relatively simple solution that depends on parameters. We rank E to find important views and enhance the importance of a view by repeating the binary code of the corresponding view within the multi-view binary code. Specifically, the basic binary code is denoted B. The intermediate code (the multi-view binary code) is represented as
Figure BDA00021541121200000810
The fusion vector v is set to direct the multi-view binary code to be repeated under various views, and the following formula represents the fusion process:
Figure BDA0002154112120000088
where H represents the input binary code of the fusion layer, φ(·) is a self-concatenation operation of the vector from 1 to M, and the second parameter in φ(·) indicates the number of self-replications.
Figure BDA0002154112120000089
is the ranking function in dimension d. The advantage of this fusion method is that it converts E into a discrete control vector, so E only determines the order between views. The strength of the enhancement or weakening is controlled manually through the fusion vector.
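A minimal sketch of the copy-fusion idea follows: each view's sub-code is repeated a manually chosen number of times, with the repetition counts assigned to views in the order given by E. The concatenation order and the name `fusion_vector` are assumptions for illustration.

```python
import torch

def fusion_r(base_code, view_codes, E, fusion_vector):
    """base_code: (q,) convolutional code B.
    view_codes: list of M per-view codes (the intermediate multi-view code).
    E: (M,) view relation vector; fusion_vector: M repetition counts, one per rank.
    """
    order = torch.argsort(E, descending=True)      # rank views by importance
    parts = [base_code]
    for rank, view_idx in enumerate(order):
        repeats = int(fusion_vector[rank])         # manually set repetition count
        parts.append(view_codes[int(view_idx)].repeat(repeats))
    return torch.cat(parts)                        # fusion-layer input H
```

With `fusion_vector = [3, 2, 1]`, for example, the dominant view's sub-code appears three times and the weakest view's sub-code only once.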
View code fusion (Fusion-C) takes into account the most primitive view relationship matrix and the fewest artificial constraints. In particular, we wish to eliminate the fusion vector, which in copy fusion is used to keep the dimensions of the input data uniform under the dynamic view relation matrix. First, the entire binary string H is encoded as a header code (H_h), a middle code (H_m) and a tail code (H_e). H_h is the same as in copy fusion. H_m directly uses the product of the binary code length and the coefficient of the corresponding view as the number of repetitions of the current code segment. This operation produces a series of dummy bytes (i.e., H_e) whose lengths are not equal. Secondly, we assign a specific and distinct view codeword to each view, which is a random number belonging to [-1, 1]. In contrast to H_m, H_e uses the view codeword instead of the multi-view binary code. Thus, regardless of the dynamic view relationship matrix and code length, H can be fully populated. The advantage of view code fusion is that it makes full use of the information contained in the view relation matrix. In our experiments we found that view code fusion is limited by the view stability assessment, which means that it can exceed copy fusion when the number of views increases.
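A sketch of view code fusion under these assumptions is given below: the header is the base code, the middle repeats each view's code in proportion to its coefficient in E, and the tail pads with a fixed per-view random codeword so the total length is constant. The rounding rule, the padding choice and the parameter `total_len` are illustrative, not taken from the patent.

```python
import torch

def fusion_c(base_code, view_codes, E, total_len, view_codewords):
    """view_codewords: list of M random scalars in [-1, 1], one fixed codeword per view."""
    q = view_codes[0].numel()
    parts = [base_code]                                   # header code H_h
    for m, code in enumerate(view_codes):                 # middle code H_m
        repeats = max(int(round(float(E[m]) * q)), 1)     # repetitions ∝ code length × coefficient
        parts.append(code.repeat(repeats))
    h = torch.cat(parts)
    pad = total_len - h.numel()                           # tail code H_e: dummy bytes
    if pad > 0:
        top = int(torch.argmax(E))                        # illustrative choice of which codeword fills the tail
        h = torch.cat([h, torch.full((pad,), float(view_codewords[top]), dtype=h.dtype)])
    return h[:total_len]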
Probability view pooling (Fusion-P): the invention provides a probability view pool with the view relation matrix as a multi-view fusion method. Conventional pooling operations select a maximum or average value as the result of each pooling cell. A view pool is a dimension-reduction method that uses element-wise maximization across views to unify the data of multiple views into one view. Since pooling operations inevitably cause information loss, we extend the length of the multi-view binary code before the probability view pool to preserve as much multi-view information as possible. A view probability distribution is then generated from E. In each pooling filter, a random sample drawn from the view probability distribution activates the selected view. The code fragment of this view is used for the conventional pooling operation. This ensures that sub-binary codes of high-priority views are more likely to appear during the fusion process.
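A sketch of the probability view pool under these assumptions: the multi-view code is treated as an M × L matrix, a view index is sampled from the distribution derived from E for every pooling window, and a conventional max-pool is applied to the sampled view's fragment. The window size and the use of max (rather than mean) pooling are illustrative choices.

```python
import torch

def fusion_p(multi_view_codes, E, window=4):
    """multi_view_codes: (M, L) relaxed codes, one row per view; E: (M,) view relation vector."""
    probs = torch.softmax(E, dim=0)               # view probability distribution from E
    M, L = multi_view_codes.shape
    out = []
    for start in range(0, L - window + 1, window):
        v = int(torch.multinomial(probs, 1))      # activate one view for this pooling filter
        fragment = multi_view_codes[v, start:start + window]
        out.append(fragment.max())                # conventional pooling on the selected view
    return torch.stack(out)
```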
Step 3, retrieval acceleration
To avoid spending excessive computational resources on stability evaluation during the search process, we build a module called the memory network, which is independent of the model but participates in training together with it. View stability evaluation seeks view relationships in the multi-view space, which is very time consuming and unsuitable for image retrieval. The memory network learns the view relation matrix E in step 1, and then in step 2 we can obtain the view relation matrix E through this module without stability evaluation. The structure of the memory network is a multilayer convolutional neural network (e.g., VGG, ResNet, DenseNet, etc.), but its output layer corresponds to the view relationship matrix E. The loss function during training is
Figure BDA0002154112120000091
l_n = (I_n - E_n)^2. FIG. 1 shows the different states of the deep multi-view enhanced hash in the two steps and the association between them. The model not only contains the stability assessment, but also pre-trains some layers to perform the stability assessment in step 1. In step 2, we can obtain the view relation matrix E without stability evaluation, which greatly raises efficiency.
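A minimal sketch of the memory network under these assumptions: a small convolutional network that shares the image input with the backbone and regresses the M-dimensional view relation vector, trained with the element-wise squared error l_n = (I_n - E_n)^2 against the E obtained from stability evaluation. The layer sizes are illustrative placeholders, not the VGG/ResNet/DenseNet variants named in the text.

```python
import torch
import torch.nn as nn

class MemoryNetwork(nn.Module):
    """Predicts the view relation vector E directly from the input image."""
    def __init__(self, num_views: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_views)        # output layer corresponds to E

    def forward(self, x):
        f = self.features(x).flatten(1)
        return torch.softmax(self.head(f), dim=1)

def memory_loss(pred_E, target_E):
    """Sum over views of l_n = (pred_n - target_n)^2, averaged over the batch."""
    return ((pred_E - target_E) ** 2).sum(dim=1).mean()
```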
Step 4, experimental comparison
In the present invention we provide experiments on several common data sets and compare with the most advanced hashing methods; multi-view hashing methods are also within the comparison scope. Two reference image datasets were used to evaluate our method: CIFAR-10 and NUS-WIDE. We obtain 2D multi-view image information through an RGB color-space color histogram, an HSV color-space color histogram, and texture features. In addition to this multi-view information, we also learn the hash function from the convolutional view using VGG-19. Dropout and batch normalization were used for each fully connected layer to avoid overfitting. The activation function is ReLU and the hidden layer sizes are 4096 × 4096. We used mini-batch stochastic gradient descent (SGD) with 0.9 momentum and a learning-rate scheduler. We follow the standard Hamming-space search evaluation protocol, which includes two consecutive steps: (1) pruning, using a hash table lookup to return the data points within Hamming radius 2 of each query; (2) scanning, re-ranking the returned data points in ascending order of their distance to the query computed with continuous codes.
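The two-step evaluation protocol can be sketched as follows (prune by Hamming radius, then re-rank the survivors by continuous-code distance). The brute-force "hash table" below is a simplification used only to keep the example short.

```python
import numpy as np

def hamming_space_search(query_bin, query_cont, db_bin, db_cont, radius=2):
    """query_bin/db_bin: {-1,+1} binary codes; query_cont/db_cont: continuous codes."""
    q = query_bin.shape[0]
    ham = (q - db_bin @ query_bin) // 2                 # Hamming distance for ±1 codes
    candidates = np.flatnonzero(ham <= radius)          # step 1: pruning within the radius
    d = np.linalg.norm(db_cont[candidates] - query_cont, axis=1)
    return candidates[np.argsort(d)]                    # step 2: scanning, ascending re-rank
```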
We compare the retrieval performance of D-MVE-Hash with several classical hashing methods (ITQ-CCA, KSH), state-of-the-art CNN-based single-view methods (CNNH, HashNet, DCH) and multi-view hashing methods (CHMIS, MVAGH, MAH, SSMDH). Table 1 shows the mAP results for different code lengths on the CIFAR-10 dataset. Since the multi-view hashing methods do not use deep convolution, we split out and name the multi-view part of our method MV-Hash and compare them in Table 2.
TABLE 1 mAP of re-ranking with different numbers of bits on the CIFAR-10 dataset
Figure BDA0002154112120000101
TABLE 2 mAP of re-ranking with different numbers of bits on the NUS-WIDE dataset
Figure BDA0002154112120000102
Compared with state-of-the-art CNN-based single-view methods and traditional single-view hashing methods, the proposed deep multi-view enhanced hashing achieves higher retrieval performance on the CIFAR-10 dataset. For example, deep multi-view enhanced hashing achieves gains of 5.21%, 4.6%, 3.57% and 4.30% over DCH when retrieving with 16-bit, 32-bit, 48-bit and 64-bit hash codes. Similar results were observed in other experiments. The proposed multi-view hashing achieves better performance on the NUS-WIDE dataset than state-of-the-art multi-view hashing methods. For example, when retrieving with 16-bit, 32-bit and 48-bit hash codes, multi-view hashing achieves gains of 3.44%, 1.65% and 2.46% compared with SSMDH.
In FIG. 3 we can see that the performance curve of the original binary code drops drastically as the code length increases, while the performance curve of the deep multi-view enhanced hash is barely affected. The enhanced binary code can maintain stable retrieval performance at long code lengths. In FIG. 5 we see that the mAPs of the 128-bit deep multi-view enhanced hash using view code fusion are between 77.82% and 83.21%, and the best retrieval Hamming radius is 5. The mAPs of the 128-bit deep multi-view enhanced hash using copy fusion are between 76.68% and 83.39%, which is 1.14% and 0.18% lower than using view code fusion.
We summarize two advantages of this approach from the experimental results: (1) the expansion of the Hamming radius has little effect on the result; (2) as the code length increases, the accuracy remains stable. Deep multi-view enhanced hashing not only uses a convolutional neural network to obtain potential hash functions, but also combines the multi-view information in each view to generate a binary code. In contrast to other multi-view approaches, deep multi-view enhanced hashing uses a view relationship matrix, allowing the network to actively consider the relationships between views to control the training direction. Moreover, the view relation matrix is not learned by a pre-existing fixed neural network, so it is not an uninterpretable black box. To visualize the differences more intuitively, we present the search results in FIG. 7.

Claims (4)

1. The large-scale image high-speed retrieval method based on the multi-view enhanced depth hash is characterized by comprising the following steps of:
step 1, acquiring multi-view characteristic representation of an image;
step 2, calculating a view relation matrix;
step 3, designing a loss function of the model;
step 4, fusing and enhancing;
step 5, training the built model on a large-scale image training data set;
step 6, testing the trained model to generate a hash code, and then performing hash retrieval;
step 7, evaluating indexes in an experiment;
the method is characterized in that the steps 1 and 2 are realized as follows:
2-1. problem definition and multiview hash description:
suppose that
Figure FDA0003414986350000011
is a set of objects and the corresponding features:
Figure FDA0003414986350000012
where d_m is the dimension of the m-th view, M is the number of views, and N is the number of objects; the integrated binary code matrix is
Figure FDA0003414986350000013
where b_i is the binary code associated with o_i, and q is the code length;
2-2. setting a mapping function
Figure FDA0003414986350000014
Wherein the mapping function is capable of converting a stack of similar objects into classification scores in different views;
2-3. Defining the potentially desired hash function
Figure FDA0003414986350000015
whose composition is as follows:
Figure FDA0003414986350000016
where ε is an evaluation function; each view network is trained in advance on a labeled data set to perform a classification task before the stability evaluation starts; the following loss function is used:
Figure FDA0003414986350000017
2-4. Abstracting the test process, whose output dimension is consistent with the number of classes;
given images I = {i_1, ..., i_N}, let Q = F(I); the dimension of Q is M × N × C, where M is the number of views, N is the number of pictures, and C is the number of categories; ε(F) is defined as follows:
Figure FDA0003414986350000022
ε is expressed as [ε_1, ..., ε_M], and then ε is normalized:
Figure FDA0003414986350000023
2. The large-scale image high-speed retrieval method based on multi-view enhanced depth hashing according to claim 1, wherein the step 3 is implemented as follows:
Training a multi-view binary code generation network by using the view relation information; at the beginning, set a pair of images i_1, i_2 and the corresponding binary network outputs b_1, b_2 ∈ B; a relaxation mapping is applied from {-1, +1}^q to [-1, +1]^q; define y = 1 if the pair is similar, otherwise y = -1; the following formula is the loss function for the m-th view:
Figure FDA0003414986350000024
where ‖·‖_1 is the 1-norm, |·| is the absolute value, α > 0 is the boundary (margin) control, and the third term is a regularization term to avoid vanishing gradients; for the more general image set I = {i_1, ..., i_N}, the corresponding output binary codes in the multi-view space are denoted
Figure FDA0003414986350000025
To obtain an equation representation in matrix form, B is introduced
Figure FDA0003414986350000026
The formula is given below:
Figure FDA0003414986350000027
The merged
Figure FDA0003414986350000028
is expressed in the form of the second multiplication term of p(I); then the regularization terms and the similarity matrix are supplemented, and the following global objective function is obtained:
Figure FDA0003414986350000029
the view relationship matrix E is
Figure FDA00034149863500000210
The overall loss function is rewritten as:
Figure FDA00034149863500000211
With this function, the network is trained using the back-propagation algorithm with mini-batch gradient descent, and the view relation matrix E can affect all layers of the network.
3. The large-scale image high-speed retrieval method based on multi-view enhanced depth hashing according to claim 2, wherein the step 4 is implemented as follows:
sorting the view relation matrix E to find important views, and enhancing the importance of the views by repeating the binary codes of the corresponding views in the multi-view binary codes; specifically, the basic binary code is denoted as B; the intermediate code is expressed as
Figure FDA0003414986350000031
Setting a fusion vector v to guide the multi-view binary code to be repeated under various views; the following formula represents the fusion process:
Figure FDA0003414986350000032
where H represents the input binary code of the fusion layer; φ(·) is a self-concatenation operation of the vector, from 1 to M; the second parameter in φ(·) represents the number of self-replications;
Figure FDA0003414986350000033
is the ranking function in dimension d; the advantage of this fusion method is that it converts E into a discrete control vector, so E only determines the order between views; the strength of the enhancement or weakening is controlled manually through the fusion vector;
In view code fusion the fusion vector is eliminated; in copy fusion that vector is needed to keep the dimensionality of the input data uniform under the dynamic view relation matrix; first, the entire binary string H is encoded as a header code H_h, a middle code H_m and a tail code H_e; H_h is the same as in copy fusion; H_m directly uses the product of the binary code length and the coefficient of the corresponding view as the number of repetitions of the current code segment; this operation produces a series of dummy bytes, i.e. H_e, whose lengths are not equal; secondly, a specific and distinct view codeword is assigned to each view, which is a random number belonging to [-1, 1]; in contrast to H_m, H_e uses the view codeword instead of the multi-view binary code; thus, regardless of the dynamic view relationship matrix and code length, H can be fully populated;
A probability view pool with the view relation matrix is provided as a multi-view fusion method, and a view probability distribution is generated according to E; in each pooling filter, a random sample drawn from the view probability distribution activates the selected view.
4. The large-scale image high-speed retrieval method based on multi-view enhanced depth hashing according to claim 3, wherein the step 5 is implemented as follows:
Establishing a module called the memory network, which is independent of the model but participates in training together with it; the memory network learns the view relation matrix E in step 1, and then in step 2 the view relation matrix E is obtained through this module without stability evaluation; the structure of the memory network is a multilayer convolutional neural network, but its output layer corresponds to the view relation matrix E; and the loss function during training is
Figure FDA0003414986350000041
Figure FDA0003414986350000042
l_n = (I_n - E_n)^2
CN201910712046.9A 2019-08-02 2019-08-02 Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing Active CN110674333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910712046.9A CN110674333B (en) 2019-08-02 2019-08-02 Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910712046.9A CN110674333B (en) 2019-08-02 2019-08-02 Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing

Publications (2)

Publication Number Publication Date
CN110674333A CN110674333A (en) 2020-01-10
CN110674333B true CN110674333B (en) 2022-04-01

Family

ID=69068682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910712046.9A Active CN110674333B (en) 2019-08-02 2019-08-02 Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing

Country Status (1)

Country Link
CN (1) CN110674333B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310821B (en) * 2020-02-11 2023-11-21 佛山科学技术学院 Multi-view feature fusion method, system, computer equipment and storage medium
CN112907712A (en) * 2021-01-22 2021-06-04 杭州电子科技大学 Three-dimensional model feature representation method based on multi-view hash enhanced hash
CN113377981B (en) * 2021-06-29 2022-05-27 山东建筑大学 Large-scale logistics commodity image retrieval method based on multitask deep hash learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679835A (en) * 2015-02-09 2015-06-03 浙江大学 Book recommending method based on multi-view hash
CN106649715A (en) * 2016-12-21 2017-05-10 中国人民解放军国防科学技术大学 Cross-media retrieval method based on local sensitive hash algorithm and neural network
CN107016708A (en) * 2017-03-24 2017-08-04 杭州电子科技大学 A kind of image Hash coding method based on deep learning
CN110059205A (en) * 2019-03-20 2019-07-26 杭州电子科技大学 A kind of threedimensional model classification retrieving method based on multiple view

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679835A (en) * 2015-02-09 2015-06-03 浙江大学 Book recommending method based on multi-view hash
CN106649715A (en) * 2016-12-21 2017-05-10 中国人民解放军国防科学技术大学 Cross-media retrieval method based on local sensitive hash algorithm and neural network
CN107016708A (en) * 2017-03-24 2017-08-04 杭州电子科技大学 A kind of image Hash coding method based on deep learning
CN110059205A (en) * 2019-03-20 2019-07-26 杭州电子科技大学 A kind of threedimensional model classification retrieving method based on multiple view

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Unsupervised segmentation of multiview feature semantics by hashing model"; Jia Cui et al.; Signal Processing; 2019-02-15; pp. 106-110 *

Also Published As

Publication number Publication date
CN110674333A (en) 2020-01-10

Similar Documents

Publication Publication Date Title
CN108920720B (en) Large-scale image retrieval method based on depth hash and GPU acceleration
CN111723220B (en) Image retrieval method and device based on attention mechanism and Hash and storage medium
US20180276528A1 (en) Image Retrieval Method Based on Variable-Length Deep Hash Learning
CN110674333B (en) Large-scale image high-speed retrieval method based on multi-view enhanced depth hashing
CN110222218B (en) Image retrieval method based on multi-scale NetVLAD and depth hash
Shi et al. Deep adaptively-enhanced hashing with discriminative similarity guidance for unsupervised cross-modal retrieval
CN111198959A (en) Two-stage image retrieval method based on convolutional neural network
CN108875076B (en) Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network
CN111080551B (en) Multi-label image complement method based on depth convolution feature and semantic neighbor
CN114358188A (en) Feature extraction model processing method, feature extraction model processing device, sample retrieval method, sample retrieval device and computer equipment
CN109918507B (en) textCNN (text-based network communication network) improved text classification method
CN109960732B (en) Deep discrete hash cross-modal retrieval method and system based on robust supervision
CN115248876B (en) Remote sensing image overall recommendation method based on content understanding
CN107180079B (en) Image retrieval method based on convolutional neural network and tree and hash combined index
WO2023036157A1 (en) Self-supervised spatiotemporal representation learning by exploring video continuity
KR102305575B1 (en) Method and system for highlighting similar areas using similarity between images
CN110598022A (en) Image retrieval system and method based on robust deep hash network
CN112766458A (en) Double-current supervised depth Hash image retrieval method combining classification loss
Shen et al. Unsupervised multiview distributed hashing for large-scale retrieval
CN117556067B (en) Data retrieval method, device, computer equipment and storage medium
CN117635275B (en) Intelligent electronic commerce operation commodity management platform and method based on big data
CN115564013B (en) Method for improving learning representation capability of network representation, model training method and system
CN115080699A (en) Cross-modal retrieval method based on modal specific adaptive scaling and attention network
CN113641790A (en) Cross-modal retrieval model based on distinguishing representation depth hash
CN114168770A (en) Deep learning-based method and device for searching images by images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant