CN113656632B - Attribute-aware Hash coding learning method in large-scale fine-grained image retrieval - Google Patents
- Publication number: CN113656632B (application CN202111223861.2A)
- Authority: CN (China)
- Prior art keywords: hash, image, attribute, feature, dimension
- Legal status: Active (as listed by Google Patents; the status is an assumption, not a legal conclusion)
Classifications
- G06F16/583: Retrieval of still image data using metadata automatically derived from the content
- G06F16/51: Indexing; data structures therefor; storage structures (still image data)
- G06N3/045: Neural network architecture: combinations of networks
- G06N3/048: Neural network architecture: activation functions
- G06N3/084: Neural network learning methods: backpropagation, e.g. using gradient descent
- G06N3/088: Neural network learning methods: non-supervised learning, e.g. competitive learning
Abstract
The invention discloses an attribute-aware hash coding learning method for large-scale fine-grained image retrieval, comprising the following steps: extracting global and local feature information from an image with a convolutional neural network; constructing a hash learning module that maps high-dimensional image features into a low-dimensional hash space, and constructing a hash feature decoder that guides, in an unsupervised manner, how attribute features are extracted during hash learning; and enhancing the discriminability of the attribute learned in each dimension by the hash module while removing redundant correlations among the per-dimension attribute features, so that each dimension's attribute feature has a unique and complete meaning. The method extracts local and global image features through a convolutional neural network with an attention mechanism. By building an attribute feature decoder and making the feature vectors self-orthogonal, it guides hash learning to retain relatively complete and important overall image information, and thereby achieves higher image retrieval accuracy.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an attribute-aware hash coding learning method for large-scale fine-grained image retrieval.
Background
Fine-grained image retrieval, an important component of fine-grained image analysis, has attracted increasing attention in recent years. Fine-grained image recognition is a fundamental research topic in computer vision and pattern recognition: it studies the visual recognition of subclasses within a conventional semantic category, for example, … recognizing different breeds of dogs, different species of birds, or different car models. Professor Serge Belongie, an internationally recognized authority in computer vision and a recipient of the ICCV Helmholtz Prize and the Marr Prize, has called it a cornerstone task of visual perception. Objects in fine-grained images differ only subtly between classes yet vary considerably within a class (in pose, scale, and so on), which makes retrieval much harder.
Hash learning uses machine learning to map data into binary strings, which markedly reduces storage and communication overhead and thus improves the efficiency of a learning system. Its goal is to learn binary hash codes that preserve, as far as possible, the neighborhood relations of the original space, i.e. the similarity between data points: each data point is encoded as a compact binary string, and two points that are similar in the original space should be mapped to nearby codes in the hash space. Hashing methods fall roughly into two categories, data-independent and data-dependent. In data-independent methods the hash functions are usually generated randomly without reference to any training data, so better retrieval performance can only be bought with longer hash codes. Data-dependent methods, known as learning-to-hash algorithms, learn the hash functions from training data and reach higher accuracy with shorter codes; they are therefore more popular in practical applications. With the rise of deep learning, several learning-to-hash methods have integrated deep feature learning into the hashing framework with good results, and many deep hashing methods have been proposed for large-scale image retrieval. Compared with deep unsupervised hashing, deep supervised hashing can fully exploit semantic (label) information and achieves higher retrieval precision.
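As a minimal illustration of why compact binary codes make retrieval cheap, the sketch below ranks database codes by Hamming distance to a query. It assumes codes are stored as Python lists of ±1 values; the function names (`hamming_distance`, `rank_by_hamming`, `hamming_from_inner`) are hypothetical and not part of the patent. The last helper shows the standard identity for ±1 codes of length k, d_H(a, b) = (k - <a, b>) / 2, which is why inner-product losses on relaxed codes are a proxy for Hamming distance.

```python
from typing import List, Sequence

def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions where two equal-length {-1, +1} codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def rank_by_hamming(query: Sequence[int], database: Sequence[Sequence[int]]) -> List[int]:
    """Return database indices sorted by Hamming distance (stable: ties keep index order)."""
    return sorted(range(len(database)), key=lambda i: hamming_distance(query, database[i]))

def hamming_from_inner(a: Sequence[int], b: Sequence[int]) -> int:
    """For {-1, +1} codes of length k: d_H(a, b) = (k - <a, b>) / 2."""
    k = len(a)
    return (k - sum(x * y for x, y in zip(a, b))) // 2

query = [1, -1, 1, 1]
database = [[1, -1, 1, -1], [-1, 1, -1, -1], [1, -1, 1, 1]]
print(rank_by_hamming(query, database))  # → [2, 0, 1]
```

In a real system the ±1 codes would typically be packed into machine words and compared with XOR plus popcount; the list form here only keeps the example readable.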
Although current deep hashing algorithms retrieve well, they are limited to coarse-grained data. In many cases, taking a photo search for a dog as an example, a user wants not only to retrieve dogs rather than other animals but also to know which breed the dog is, such as a Corgi or a Samoyed. In such settings, the retrieval accuracy of existing learning-to-hash methods is very low. Moreover, the binary codes produced by existing methods carry no practical meaning, so neither the stored codes nor the retrieval results are interpretable. A new learning-to-hash method is therefore needed whose codes carry practical meaning and that achieves high accuracy in fine-grained retrieval.
Disclosure of Invention
The invention aims to provide an attribute-aware hash coding learning method for large-scale fine-grained image retrieval.
The technical solution for achieving this aim is as follows: an attribute-aware hash coding learning method for large-scale fine-grained image retrieval comprises the following steps:
Step 1, extracting global and local feature information from an image through a convolutional neural network;
Step 2, constructing a hash learning module that maps high-dimensional image features into a low-dimensional hash space, and constructing a hash feature decoder that guides, in an unsupervised manner, how attribute features are extracted during hash learning;
Step 3, enhancing the discriminability of the attribute learned in each dimension by the hash module of Step 2, and removing redundant correlations among the per-dimension attribute features.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above-mentioned attribute-aware hash-code learning method in large-scale fine-grained image retrieval when executing the computer program.
A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the above-described attribute-aware hash-code learning method in large-scale fine-grained image retrieval.
Compared with the prior art, the invention has the following notable advantages: (1) on coarse-grained image retrieval it performs comparably to traditional hash learning methods, while on fine-grained image retrieval its accuracy far exceeds theirs; (2) the attribute feature decoder guides hash learning to retain relatively complete and important overall image information, so that recombining the information features in each dimension of the hash space expresses the content of the original image more comprehensively; (3) the attribute self-orthogonality constraint removes redundant correlations among the attribute features learned in each dimension, so that each dimension's attribute feature has a unique and complete meaning, i.e. each hash dimension represents one deep attribute feature. The invention thus gives every dimension of the hash space an attribute meaning, which other hash learning methods do not provide.
Drawings
Fig. 1 is a schematic diagram of an attribute-aware hash coding learning method in large-scale fine-grained image retrieval according to the present invention.
Detailed Description
With reference to fig. 1, a method for learning attribute-aware hash codes in large-scale fine-grained image retrieval specifically includes the following steps:
Step 1, extracting global and local feature information from the image through a convolutional neural network;
Attention plays a very important role in human perception, directing us to the salient features of an object or scene. An attention mechanism is therefore introduced into the convolutional neural network to acquire global and local features of the image that better express the salient features of each image. Specifically, the depth features of an input image are first extracted by a convolutional neural network:
where the network is a custom convolutional neural network, and C, H and W denote the number of channels, the feature length (height) and the feature width of the depth feature, respectively. On the basis of the depth feature obtained in equation (1), C local attention guidance modules are introduced, their number matching the channel count C of the depth feature, together with one global attention guidance module. The local feature outputs of the image are:
The global feature output of the image is:
Global average pooling over these feature outputs yields a global feature vector and local depth feature vectors of the image, which are concatenated in order to form the overall feature vector of the image.
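The equations for the attention modules are not reproduced in this text, so the following is only a sketch of the data flow described above, under loudly stated assumptions: the depth feature is a C x H x W nested list, each attention module produces one H x W map of weights in (0, 1), and the attention maps are derived by a hypothetical sigmoid of channel responses (the global map from the channel average, local map c from channel c). The names `sigmoid_map`, `attention_gap` and `overall_feature` are illustrative, not the patent's.

```python
import math
from typing import List

Map = List[List[float]]   # one H x W spatial map
Tensor = List[Map]        # C x H x W depth feature

def sigmoid_map(m: Map) -> Map:
    """Squash a spatial map into (0, 1) attention weights (hypothetical module)."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]

def attention_gap(feat: Tensor, attn: Map) -> List[float]:
    """Weight every channel by a spatial attention map, then global-average-pool it."""
    h, w = len(attn), len(attn[0])
    return [sum(attn[i][j] * ch[i][j] for i in range(h) for j in range(w)) / (h * w)
            for ch in feat]

def overall_feature(feat: Tensor) -> List[float]:
    """Concatenate one global and C local attention-pooled vectors (length C * (C + 1))."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    # Hypothetical global module: attention from the channel-averaged response.
    mean_map = [[sum(feat[k][i][j] for k in range(c)) / c for j in range(w)]
                for i in range(h)]
    vec = attention_gap(feat, sigmoid_map(mean_map))
    # Hypothetical local modules: one attention map per channel, C modules in total.
    for k in range(c):
        vec.extend(attention_gap(feat, sigmoid_map(feat[k])))
    return vec
```

With a uniform attention map of ones, `attention_gap` reduces to plain global average pooling, which is a quick sanity check on the pooling step.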
Step 2, constructing a hash learning module that maps high-dimensional image features into a low-dimensional hash space, and constructing a hash feature decoder that guides, in an unsupervised manner, how attribute features are extracted during hash learning;
Through a transformation matrix, the hash learning module maps the overall feature vector obtained in Step 1 into a k-dimensional hash space. The binary hash code of the image is obtained from this projection by two successive activations:
where the first output is a k-dimensional approximate binary code, a highly condensed feature representation of the image obtained through the transformation matrix, and the second is the final binary code of the image. The information of the whole image is thus expressed by k bits, which greatly compresses the retrieval space. The first activation, tanh, bounds the values while keeping the mapping differentiable so that gradients can be back-propagated; the second activation binarizes the feature vector into a Hamming-space code, which speeds up image retrieval.
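The two-activation encoder described above can be sketched as follows. This is a minimal illustration, not the patent's code: it assumes the transformation matrix W is stored as a d x k nested list so that the projection is W^T x, and it assumes the second activation is the sign function with sign(0) mapped to +1 (a convention choice). `project` and `hash_codes` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

def project(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Compute W^T x for a d x k matrix W (nested lists) and a d-dim vector x."""
    d, k = len(W), len(W[0])
    if len(x) != d:
        raise ValueError("x must have length d")
    return [sum(W[i][j] * x[i] for i in range(d)) for j in range(k)]

def hash_codes(W: Sequence[Sequence[float]], x: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Return (relaxed code, binary code) for one feature vector."""
    u = [math.tanh(v) for v in project(W, x)]  # first activation: values in (-1, 1), differentiable
    b = [1 if v >= 0 else -1 for v in u]       # second activation: sign, with sign(0) := +1 by convention
    return u, b

W = [[1.0, -1.0], [0.5, 2.0]]   # d = 2 input dims, k = 2 hash bits (toy values)
u, b = hash_codes(W, [1.0, -1.0])
print(b)  # → [1, -1]
```

During training only the relaxed code `u` would carry gradients; the binary code `b` is what gets stored and compared at retrieval time.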
To compute the hash loss, assume n query points and m database points. Following equation (4), the hash codes of the query points and the database points can be written respectively as:
where the first are the hash codes obtained by activating the query points and the second are those obtained by activating the database points. The hash coding loss can be written as:
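The hash loss formula itself is not reproduced in this text. The setting it describes (n activated query codes compared against m database codes) matches asymmetric deep supervised hashing (ADSH), so the sketch below uses ADSH's pairwise loss purely as an assumed stand-in: for every query/database pair it pushes the inner product of the relaxed query code and the binary database code toward k when the pair is similar and toward -k otherwise. The similarity matrix S with entries in {-1, +1} and the name `asymmetric_hash_loss` are assumptions.

```python
from typing import Sequence

def asymmetric_hash_loss(U: Sequence[Sequence[float]],
                         V: Sequence[Sequence[int]],
                         S: Sequence[Sequence[int]]) -> float:
    """ADSH-style loss (assumed): sum over query i, database j of (<u_i, v_j> - k * S_ij)^2.

    U: n relaxed (tanh) query codes, V: m binary {-1, +1} database codes,
    S: n x m similarity matrix, +1 for same class and -1 otherwise.
    """
    k = len(U[0])
    loss = 0.0
    for i, u in enumerate(U):
        for j, v in enumerate(V):
            inner = sum(a * b for a, b in zip(u, v))
            loss += (inner - k * S[i][j]) ** 2
    return loss
```

Because <u, v> = k - 2 d_H(u, v) for ±1 codes, driving the inner product to ±k is equivalent to driving the Hamming distance to 0 for similar pairs and to k for dissimilar ones.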
The feature decoder maps the tanh-activated hash-space features back to attribute features and constrains the resulting feature loss, written as:
where d denotes the dimension of each feature vector, the reconstruction matrix is the transpose of the hash transformation matrix, and the remaining coefficients are hyperparameters introduced in optimizing the loss. This unsupervised code reconstruction guides hash learning to retain relatively complete and important overall image information, so that recombining the information features in each dimension of the hash space expresses the content of the original image more comprehensively.
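A minimal sketch of the tied-weight decoder idea, assuming the reconstruction loss is a mean squared error over the d feature dimensions (the exact form and its hyperparameters are not reproduced in this text). If encoding uses W^T x with W of shape d x k, then decoding with the transpose of that encoder matrix is simply W u. The function name `reconstruction_loss` is hypothetical.

```python
from typing import Sequence

def reconstruction_loss(W: Sequence[Sequence[float]],
                        u: Sequence[float],
                        x: Sequence[float]) -> float:
    """Mean squared error between x and its tied-weight reconstruction W u (assumed form).

    Encoding is assumed to be u = tanh(W^T x) with W of shape d x k, so decoding
    with the transpose of the encoder matrix W^T is W u, a d-dim vector.
    """
    d = len(W)
    x_hat = [sum(W[i][j] * u[j] for j in range(len(u))) for i in range(d)]
    return sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat)) / d
```

Tying the decoder to the encoder's transpose adds no new parameters, and the reconstruction term then directly penalizes projections W that throw away information in x.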
Step 3, enhancing the discriminability of the attribute learned in each dimension by the hash module of Step 2, and removing redundant correlations among the per-dimension attribute features.
A self-orthogonality loss is constructed on the feature vectors obtained in Step 2 by hash transformation followed by tanh activation:
where the identity matrix has the size of the hash dimension. This loss eliminates the redundant correlations among the attribute features learned in each dimension, so that each dimension's attribute feature has a unique and complete meaning and each hash dimension represents one deep attribute feature.
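The exact norm and normalization of the self-orthogonality loss are not reproduced in this text. The sketch below assumes the n tanh-activated codes are stacked as the rows of an n x k matrix U and penalizes the squared Frobenius norm of U^T U / n - I_k; dividing by n is our assumption, chosen so that codes near ±1 give diagonal entries near 1. `self_orthogonality_loss` is a hypothetical name.

```python
from typing import Sequence

def self_orthogonality_loss(U: Sequence[Sequence[float]]) -> float:
    """Squared Frobenius norm of (U^T U / n - I_k) for n relaxed codes of length k (assumed form).

    Off-diagonal terms penalize correlated hash dimensions; diagonal terms keep
    each dimension's normalized energy near 1.
    """
    n, k = len(U), len(U[0])
    loss = 0.0
    for a in range(k):
        for b in range(k):
            gram = sum(U[i][a] * U[i][b] for i in range(n)) / n
            target = 1.0 if a == b else 0.0
            loss += (gram - target) ** 2
    return loss
```

For example, two codes [1, 1] and [1, -1] make the two hash dimensions uncorrelated and give zero loss, while two identical codes [1, 1] make the dimensions perfectly correlated and give a loss of 2.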
The overall loss can be written as:
The binary hash-coded output of the input image can be written as:
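The patent's formulas for the overall loss and the final output are not reproduced in this text. Under the assumptions used for the individual terms (projection W^T x, tanh relaxation, sign binarization) and with hypothetical weights \(\lambda_1, \lambda_2\) standing in for the patent's hyperparameters, one plausible way to write them is:

```latex
\mathcal{L} = \mathcal{L}_{\text{hash}} + \lambda_1 \mathcal{L}_{\text{rec}} + \lambda_2 \mathcal{L}_{\text{orth}},
\qquad
\mathbf{b} = \operatorname{sgn}\!\left(\tanh\!\left(W^{\top}\mathbf{x}\right)\right) \in \{-1, +1\}^{k}
```

Since tanh preserves sign, \(\operatorname{sgn}(\tanh(z)) = \operatorname{sgn}(z)\), so at inference time the tanh step can be skipped without changing the binary code.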
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (3)
1. An attribute-aware hash coding learning method in large-scale fine-grained image retrieval, comprising the following steps:
Step 1, extracting global and local feature information from an image through a convolutional neural network, specifically:
introducing an attention mechanism into the convolutional neural network to acquire global and local features of the image that express the salient features of each image;
first, extracting the depth features of an input image through a convolutional neural network:
where the network is a custom convolutional neural network, and C, H and W denote the number of channels, the feature length and the feature width of the depth feature, respectively; on the basis of the depth feature obtained in equation (1), introducing C local attention guidance modules and one global attention guidance module, the local feature outputs of the image being:
the global feature output of the image is:
performing global average pooling on the feature outputs to obtain a global feature vector and local depth feature vectors of the image, and concatenating them in order to obtain the overall feature vector of the image;
Step 2, constructing a hash learning module that maps high-dimensional image features into a low-dimensional hash space, and constructing a hash feature decoder that guides, in an unsupervised manner, how attribute features are extracted during hash learning, i.e. guides hash learning to retain the overall image feature information so that the content of the original image can be expressed after the information features in each dimension of the hash space are recombined; specifically:
constructing the hash learning module, which maps the overall feature vector obtained in Step 1 into a k-dimensional hash space through a transformation matrix; the binary hash code of the image is obtained from this projection by two activations:
where the first output is a k-dimensional approximate binary code, the image feature representation obtained through the transformation matrix, and the second is the final binary code of the image, i.e. the information of the whole image is expressed by k bits; the first activation, tanh, keeps the mapping differentiable so that gradients can be back-propagated, and the second activation binarizes the feature vector into a Hamming-space code;
when computing the hash loss, assuming n query points and m database points and following equation (4), the hash codes of the query points and the database points are written respectively as:
where the first are the hash codes obtained by activating the query points and the second are those obtained by activating the database points; the hash coding loss is written as:
the feature decoder maps the tanh-activated hash-space features back to attribute features and constrains the resulting feature loss, written as:
where d denotes the dimension of each feature vector, the reconstruction matrix is the transpose of the hash transformation matrix, and the remaining coefficients are hyperparameters introduced in optimizing the loss;
Step 3, enhancing the discriminability of the attribute learned in each dimension by the hash learning module of Step 2, and removing redundant correlations among the per-dimension attribute features by constructing attribute self-orthogonality, so that each dimension's attribute feature has a unique and complete meaning, i.e. each hash dimension represents one deep attribute feature; specifically:
constructing a self-orthogonality loss on the set of feature vectors obtained by applying the hash transformation matrix of Step 2 followed by tanh activation, written as:
where the identity matrix is used so that the redundant correlations among the attribute features learned in each dimension can be eliminated;
the overall constraint loss is written as:
the binary hash-coded output of the input image is written as:
2. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program performs the steps of the attribute-aware hash-code learning method in large-scale fine-grained image retrieval according to claim 1.
3. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of attribute-aware hash-code learning in large-scale fine-grained image retrieval as set forth in claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111223861.2A (CN113656632B) | 2021-10-21 | 2021-10-21 | Attribute-aware Hash coding learning method in large-scale fine-grained image retrieval |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113656632A | 2021-11-16 |
CN113656632B | 2022-08-12 |
Family ID: 78484339
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110069644B (en) * | 2019-04-24 | 2023-06-06 | 南京邮电大学 | Compressed domain large-scale image retrieval method based on deep learning |
CN111125411B (en) * | 2019-12-20 | 2022-06-21 | 昆明理工大学 | Large-scale image retrieval method for deep strong correlation hash learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Guo et al. | Locally supervised deep hybrid model for scene recognition | |
CN111079532B (en) | Video content description method based on text self-encoder | |
Taylor et al. | Learning invariance through imitation | |
Jiang et al. | Cascaded subpatch networks for effective CNNs | |
CN111723220A (en) | Image retrieval method and device based on attention mechanism and Hash and storage medium | |
Mathur et al. | Camera2Caption: a real-time image caption generator | |
CN111666588B (en) | Emotion differential privacy protection method based on generation countermeasure network | |
CN111400494B (en) | Emotion analysis method based on GCN-Attention | |
CN114896434B (en) | Hash code generation method and device based on center similarity learning | |
Dixit et al. | Object based scene representations using fisher scores of local subspace projections | |
Xie et al. | Feature normalization for part-based image classification | |
CN111325766A (en) | Three-dimensional edge detection method and device, storage medium and computer equipment | |
CN113961736A (en) | Method and device for generating image by text, computer equipment and storage medium | |
CN113780249B (en) | Expression recognition model processing method, device, equipment, medium and program product | |
CN115565238A (en) | Face-changing model training method, face-changing model training device, face-changing model training apparatus, storage medium, and program product | |
CN116229531A (en) | Face front image synthesis method for collaborative progressive generation countermeasure network | |
CN114283352A (en) | Video semantic segmentation device, training method and video semantic segmentation method | |
Robert | The Role of Deep Learning in Computer Vision | |
CN113656632B (en) | Attribute-aware Hash coding learning method in large-scale fine-grained image retrieval | |
CN114399646B (en) | Image description method and device based on transform structure | |
CN111611427B (en) | Image retrieval method and system based on linear discriminant analysis depth hash algorithm | |
CN115457374A (en) | Deep pseudo-image detection model generalization evaluation method and device based on reasoning mode | |
CN114612826A (en) | Video and text similarity determination method and device, electronic equipment and storage medium | |
Li et al. | A semi-supervised learning model based on convolutional autoencoder and convolutional neural network for image classification | |
KR102592515B1 (en) | Apparatus and method for embedding-based data set processing |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |