CN113656632B

CN113656632B - Attribute-aware Hash coding learning method in large-scale fine-grained image retrieval

Info

Publication number: CN113656632B
Application number: CN202111223861.2A
Authority: CN
Inventors: 魏秀参
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2021-10-21
Filing date: 2021-10-21
Publication date: 2022-08-12
Anticipated expiration: 2041-10-21
Also published as: CN113656632A

Abstract

The invention discloses an attribute-aware hash coding learning method for large-scale fine-grained image retrieval, comprising the following steps: extracting global and local feature information from the image through a convolutional neural network; constructing a hash learning module that maps high-dimensional image features into a low-dimensional hash space, and constructing a hash feature decoder that guides, in an unsupervised manner, how attribute features are extracted during hash learning; and enhancing the discriminative power of each attribute dimension learned by the hash module while removing redundant correlation among the dimensions, so that the attribute feature of each dimension has a unique and complete meaning. The method extracts local and global image features through a convolutional neural network and an attention mechanism, guides hash learning to retain relatively complete and important overall image feature information by establishing an attribute feature decoder and making the feature vectors self-orthogonal, and can thereby achieve higher image retrieval accuracy.

Description

Attribute-aware Hash coding learning method in large-scale fine-grained image retrieval

Technical Field

The invention belongs to the field of computer vision, and particularly relates to an attribute perception hash coding learning method in large-scale fine-grained image retrieval.

Background

Fine-grained image retrieval has attracted increasing attention in recent years as an important part of fine-grained image analysis. Fine-grained image recognition is a fundamental research topic in computer vision and pattern recognition. It studies the visual recognition of fine-grained subclasses within a traditional semantic class, for example different breeds of dog, different species of bird, or different models of car. Professor Serge Belongie, a leading international computer vision scholar and winner of the ICCV Helmholtz Prize and the Marr Prize, has described it as cornerstone work for visual perception. Objects in fine-grained images show only subtle visual differences between classes but large variation within a class, for example in pose and scale, which makes retrieval more difficult.

Hash learning uses machine learning to map data into binary strings. It can significantly reduce the storage and communication overhead of the data and thereby improve the efficiency of a learning system. Its purpose is to learn a binary hash code representation of the data such that the hash codes preserve, as far as possible, the neighbor relations of the original space, that is, preserve similarity. Specifically, each data point is encoded as a compact binary string, and two points that are similar in the original space should be mapped to two similar points in the hash space. Hashing methods fall roughly into two types: data-independent and data-dependent. In a data-independent method, the hash function is typically generated randomly and does not depend on any training data, but better retrieval performance has to be paid for with longer hash codes. Data-dependent methods, known as learning-to-hash algorithms, learn the hash function from training data. Compared with data-independent methods, learning-to-hash algorithms achieve higher accuracy with shorter hash codes, and are therefore more popular in practical applications. With the rise of deep learning, some learning-to-hash methods have integrated deep feature learning into the hashing framework and obtained good performance. Many deep hashing methods have been proposed for large-scale image retrieval. Compared with deep unsupervised hashing, deep supervised hashing can fully exploit semantic information and obtain higher retrieval precision.
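The similarity-preserving idea described above can be illustrated with a small sketch: once images are encoded as binary strings, retrieval reduces to ranking database codes by Hamming distance to the query code. The function names and the 4-bit codes below are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query, database):
    """Return database indices sorted by Hamming distance to the query."""
    return sorted(range(len(database)),
                  key=lambda i: hamming_distance(query, database[i]))


# Hypothetical 4-bit codes: the first entry differs from the query in one bit.
database = ["1011", "0011", "0100"]
ranking = rank_by_hamming("1010", database)
```

Items whose codes differ from the query in fewer bits are ranked first, which is why similar inputs should receive similar codes.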

Although current deep hashing algorithms retrieve well, they are limited to coarse-grained retrieval. In many cases, taking a search for a picture of a dog as an example, one wants to retrieve not just dogs rather than other animals, but dogs of a particular breed, such as a Corgi or a Samoyed. In such cases the retrieval accuracy of existing learning-to-hash methods is very low. Moreover, the binary codes produced by existing learning-to-hash methods carry no concrete meaning, so the storage and retrieval results for the images are not interpretable. A new learning-to-hash method is therefore needed whose codes carry practical meaning and which retrieves with high accuracy in a fine-grained setting.

Disclosure of Invention

The invention aims to provide an attribute perception hash coding learning method in large-scale fine-grained image retrieval.

The technical scheme for realizing the purpose of the invention is as follows: an attribute perception hash coding learning method in large-scale fine-grained image retrieval comprises the following steps:

Step 1, extracting global feature and local feature information in an image through a convolutional neural network;

Step 2, constructing a Hash learning module, extracting high-dimensional image feature information into a low-dimensional Hash space, constructing a Hash feature decoder, and guiding an attribute feature extraction mode in the Hash learning process in an unsupervised mode;

Step 3, enhancing the identification capability of each dimension attribute obtained by learning of the Hash module in the step 2, and removing the redundant correlation among the features of each dimension attribute.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above-mentioned attribute-aware hash-code learning method in large-scale fine-grained image retrieval when executing the computer program.

A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the above-described attribute-aware hash-code learning method in large-scale fine-grained image retrieval.

Compared with the prior art, the invention has the following notable advantages: (1) its performance on coarse-grained image retrieval is comparable to that of traditional hash learning methods, while its retrieval accuracy on fine-grained image retrieval far exceeds theirs; (2) an attribute feature decoder is established to guide hash learning to retain relatively complete and important overall image feature information, so that the information contained in the original image is expressed more comprehensively once the features in each dimension of the hash space are recombined; (3) by constructing attribute self-orthogonality, the redundant correlation among the attribute features learned in each dimension is eliminated, so that the attribute feature of each dimension has a unique and complete meaning, that is, each hash dimension represents one deep attribute feature. The invention thus gives an attribute meaning to each dimension of the hash space, which other hash learning methods do not provide.

Drawings

Fig. 1 is a schematic diagram of an attribute-aware hash coding learning method in large-scale fine-grained image retrieval according to the present invention.

Detailed Description

With reference to Fig. 1, the attribute-aware hash coding learning method in large-scale fine-grained image retrieval includes the following steps.

Step 1: Attention plays a very important role in human perception, directing focus to the salient features of an object or scene. An attention mechanism is therefore introduced into the convolutional neural network to obtain global and local features of the image, so that the salient features of each image are better expressed. Specifically, the depth feature of the input image is first extracted by a custom convolutional neural network, as in equation (1), where C, H and W denote the number of channels, the height and the width of the depth feature, respectively.

On the basis of the depth feature obtained in equation (1), C local attention guidance modules are introduced, their number corresponding to the number of channels C of the depth feature, together with one global attention guidance module. These modules produce the local feature outputs and the global feature output of the image.

Global average pooling of the feature outputs yields the global feature vector and the local depth feature vectors of the image. Concatenating them in sequence gives the overall feature vector of the image.
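The pooling and concatenation at the end of step 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the attention guidance modules are represented simply as precomputed H x W masks (the source text does not show how they are computed), feature maps are plain nested lists of shape C x H x W, and all function names are hypothetical.

```python
def global_average_pool(feature_map):
    """Average each channel of a C x H x W nested list down to one value."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def apply_attention(feature_map, mask):
    """Weight every channel of a C x H x W map by one H x W attention mask."""
    return [[[v * m for v, m in zip(row, mask_row)]
             for row, mask_row in zip(channel, mask)]
            for channel in feature_map]


def overall_feature(feature_map, global_mask, local_masks):
    """Pool the global branch and each local branch, then concatenate."""
    vector = global_average_pool(apply_attention(feature_map, global_mask))
    for mask in local_masks:
        vector += global_average_pool(apply_attention(feature_map, mask))
    return vector


# Toy depth feature with C=2, H=2, W=2 and assumed (hand-written) masks.
depth_feature = [[[1, 2], [3, 4]],
                 [[0, 0], [4, 4]]]
global_mask = [[1, 1], [1, 1]]
local_masks = [[[1, 0], [0, 0]],
               [[0, 0], [0, 1]]]
x = overall_feature(depth_feature, global_mask, local_masks)
```

With one global branch and two local branches over two channels, the concatenated vector has 2 + 2 * 2 = 6 entries.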

Step 2: The hash learning module uses a transformation matrix to map the overall feature vector obtained in step 1 into a k-dimensional hash space. The binary hash code of the image is obtained from the mapped vector by two successive activations, as in equation (4). The result of the first activation is an approximate binary code of dimension k, a highly condensed feature expression vector of the image obtained through the transformation matrix. The result of the second activation is the final binary code of the image. The information of the whole image is thus expressed in k bits, which greatly compresses the retrieval space. The first activation, tanh, bounds the mapped vector while remaining differentiable, so that gradients can be back-propagated; the second activation restricts the feature vector to a Hamming code, which speeds up image retrieval.
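The two-activation mapping can be sketched in a few lines. The sketch assumes that the second activation is a sign function producing entries in {-1, +1} (the source text names only tanh), and stores the transformation matrix W as k rows of length d; the matrix values and function names are hypothetical.

```python
import math


def hash_project(W, x):
    """Linear map of a d-dim feature x to k dims; W is k rows of length d."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relaxed_code(W, x):
    """First activation: tanh keeps every entry in (-1, 1), differentiable."""
    return [math.tanh(h) for h in hash_project(W, x)]


def binary_code(W, x):
    """Second activation (assumed to be sign): snap each entry to -1 or +1."""
    return [1 if v >= 0 else -1 for v in relaxed_code(W, x)]


# Hypothetical 4 x 3 transformation matrix: d = 3 features, k = 4 bits.
W = [[0.5, -1.0, 0.0],
     [-0.2, 0.1, 0.3],
     [1.0, 1.0, 1.0],
     [0.0, 0.0, -2.0]]
x = [1.0, 2.0, 0.5]
v = relaxed_code(W, x)   # approximate binary code, used during training
u = binary_code(W, x)    # final k-bit code, used for retrieval
```

The relaxed code v is what a gradient-based optimizer can work with, while the binarized code u is what gets stored and compared at retrieval time.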

When calculating the hash loss, assume there are n query points and m database points. Following equation (4), the hash codes of the query points and of the database points are obtained by applying the same activations to the query points and to the database points, respectively. The hash coding loss is then defined over these two sets of hash codes.

The feature decoder reconstructs the attribute features from the tanh-activated hash-space features and constrains the feature loss. In this loss, d denotes the dimension of each feature vector, the reconstruction matrix is the transpose of the hash transformation matrix, and hyper-parameters are introduced in the loss optimization process. Through this unsupervised coding reconstruction, hash learning is guided to retain relatively complete and important overall image feature information, so that the information contained in the original image is expressed more comprehensively once the features in each dimension of the hash space are recombined.
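The decoder with a transposed transformation matrix can be sketched as below. The text states that the reconstruction matrix is the transpose of the hash transformation matrix; the mean squared error used here is an assumed form of the feature loss, since the exact formula and its hyper-parameters are not shown in the text. Function names are hypothetical, and W is stored as k rows of length d.

```python
import math


def encode(W, x):
    """Hash-space feature after tanh; W is k rows of length d."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def decode(W, v):
    """Reconstruct a d-dim feature using the transpose of W (tied weights)."""
    d = len(W[0])
    return [sum(W[j][i] * v[j] for j in range(len(W))) for i in range(d)]


def reconstruction_loss(W, x):
    """Mean squared error between x and its reconstruction (assumed form)."""
    x_hat = decode(W, encode(W, x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# With an identity transformation, the only error comes from the tanh squash.
W_identity = [[1.0, 0.0], [0.0, 1.0]]
loss_zero = reconstruction_loss(W_identity, [0.0, 0.0])
loss_unit = reconstruction_loss(W_identity, [1.0, 0.0])
```

Because no labels are involved, minimizing such a loss is unsupervised: it only asks that the k hash dimensions together retain enough information to rebuild the input feature.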

Step 3: The feature vectors obtained in step 2 by the hash transformation and tanh activation are used to construct a self-orthogonality loss, defined with respect to an identity matrix. This eliminates the redundant correlation among the attribute features learned in each dimension, so that the attribute feature of each dimension has a unique and complete meaning and each hash dimension represents one deep attribute feature.
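One way to read the self-orthogonality constraint is as a penalty on the distance between the correlation of hash dimensions and the identity matrix. The sketch below assumes a batch of n tanh-activated codes stored as n rows of length k, averages products over the batch, and uses a squared Frobenius distance; the normalization and the norm are assumptions, and the function names are hypothetical.

```python
def gram_of_dimensions(V):
    """k x k matrix of average products between hash dimensions, n samples."""
    n, k = len(V), len(V[0])
    return [[sum(V[s][i] * V[s][j] for s in range(n)) / n for j in range(k)]
            for i in range(k)]


def self_orthogonality_loss(V):
    """Squared Frobenius distance between the Gram matrix and the identity."""
    G = gram_of_dimensions(V)
    k = len(G)
    return sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(k) for j in range(k))


# Two dimensions that vary independently across the batch: no penalty.
decorrelated = [[1, 1], [1, -1]]
# Two dimensions that always agree: fully redundant, penalized.
redundant = [[1, 1], [1, 1]]
loss_decorrelated = self_orthogonality_loss(decorrelated)
loss_redundant = self_orthogonality_loss(redundant)
```

Off-diagonal entries of the Gram matrix measure how much two hash dimensions carry the same information, so driving them to zero pushes each dimension toward its own attribute.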

The overall constraint loss, equation (10), combines the losses above. Two hyper-parameters are introduced in it to align the scales of the loss terms.
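As a worked form, one plausible reading of the overall loss is a weighted sum of the hash loss, the decoder reconstruction loss and the self-orthogonality loss. The symbols below are notation assumed here for illustration, and the text does not show which terms the two hyper-parameters multiply.

```latex
% Assumed notation: L_hash (hash coding loss), L_rec (decoder reconstruction
% loss), L_orth (self-orthogonality loss); alpha and beta are the two
% introduced hyper-parameters that balance the scales of the terms.
\mathcal{L} \;=\; \mathcal{L}_{\mathrm{hash}}
          \;+\; \alpha\,\mathcal{L}_{\mathrm{rec}}
          \;+\; \beta\,\mathcal{L}_{\mathrm{orth}}
```

Under this reading, setting both hyper-parameters to zero would leave only the hash coding loss.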

The binary hash-coded output of the input image is then computed, where GAP(·) denotes global average pooling.

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. An attribute perception hash coding learning method in large-scale fine-grained image retrieval comprises the following steps:

Step 1, extracting global feature and local feature information in an image through a convolutional neural network; the method comprises the following specific steps:

by introducing an attention mechanism into the convolutional neural network, acquiring global and local features of the image to express a salient feature of each image;

first, the depth feature of the input image is extracted through the convolutional neural network, as in equation (1);

on the basis of the depth feature obtained in equation (1), C local attention guidance modules, C being the number of channels of the depth feature, and one global attention guidance module are introduced, which give the local feature output and the global feature output of the image;

global average pooling of the feature outputs yields the global feature vector and the local depth feature vectors of the image, which are concatenated in sequence into the overall feature vector of the image;

Step 2, constructing a Hash learning module, extracting high-dimensional image feature information into a low-dimensional Hash space, constructing a Hash feature decoder, and guiding an attribute feature extraction mode in the Hash learning process in an unsupervised mode, namely guiding Hash learning to keep the whole image feature information so that the information contained in the original image can be expressed after information features in each dimension of Hash space are recombined; the method specifically comprises the following steps:

constructing a hash learning module, which maps the overall feature vector obtained in step 1 into a k-dimensional hash space through a transformation matrix; the binary hash code of the image is obtained from the mapped vector by two successive activations, as in equation (4); the result of the first activation is an approximate binary code of dimension k, an image feature expression vector obtained through the transformation matrix, and the result of the second activation is the final binary code of the image, so that the information of the whole image is expressed in k bits; the first activation, tanh, bounds the mapped vector so that gradients can be back-propagated, and the second activation restricts the feature vector to a Hamming code;

when calculating the hash loss, n query points and m database points are assumed; following equation (4), the hash codes of the query points and of the database points are obtained by activation from the query points and the database points, respectively, and the hash coding loss is defined over these hash codes;

the feature decoder reconstructs the attribute features from the tanh-activated hash-space features and constrains the feature loss, wherein d denotes the dimension of each feature vector, the reconstruction matrix is the transpose of the hash transformation matrix, and hyper-parameters are introduced in the loss optimization process;

step 3, enhancing the identification capability of each dimension attribute obtained by the Hash learning module in the step 2, and removing redundant correlation among the attribute features of each dimension in a mode of constructing attribute self-orthogonality, so that the attribute features of each dimension have unique and complete expression meanings, namely each Hash dimension can represent attribute feature information of one depth; the method specifically comprises the following steps:

the feature vectors obtained in step 2 by the hash transformation and tanh activation are used to construct a self-orthogonality loss defined with respect to an identity matrix, so that the redundant correlation among the attribute features learned in each dimension can be eliminated;

the overall constraint loss, equation (10), combines the above losses, with two introduced hyper-parameters used to align the scales of the loss terms;

the binary hash-coded output of the input image is then computed, wherein GAP(·) denotes global average pooling, which reduces each feature map to a single feature point.

2. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program performs the steps of the attribute-aware hash-code learning method in large-scale fine-grained image retrieval according to claim 1.

3. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of attribute-aware hash-code learning in large-scale fine-grained image retrieval as set forth in claim 1.