CN114329031A - Fine-grained bird image retrieval method based on graph neural network and deep hash - Google Patents
- Publication number: CN114329031A (application CN202111521433.8A)
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion)
Abstract
The invention discloses a fine-grained bird image retrieval method based on a graph neural network and deep hashing, belonging to the field of fine-grained image retrieval. The method comprises the following parts: node representation based on local features, local feature enhancement, relevant-part relation mining based on graph convolution, semantic hash coding, and a loss function. By combining a graph neural network with a deep hashing method, the method is suitable for large-scale bird image retrieval and achieves fine-grained bird image retrieval with high efficiency, low storage cost and high precision.
Description
Technical Field
The invention belongs to the field of fine-grained image retrieval, and particularly relates to a fine-grained bird image retrieval method based on a graph neural network and deep hash.
Background
At the present stage, monitoring of birds in nature reserves and national wetlands serves as an important index for evaluating biodiversity and the ecological environment; for farmland and airport areas, bird surveillance affects farmers' income and the normal operation of airports. Since birds are abundant in nature, bird experts need a way to quickly and accurately retrieve the correct birds from a large-scale dataset. Retrieval of bird images differs from general image retrieval in two respects, concerning both database images and query images: (1) the difference between bird species is small and often lies in subtle parts, such as a bird's head or tail; (2) the intra-class difference is large: due to factors such as illumination and posture, different images of the same bird differ greatly. At present, birds are usually identified manually by experts, which is costly and error-prone. Therefore, for large-scale bird datasets, a computer vision technique realizing low-cost, low-storage and efficient fine-grained bird retrieval is of great significance for industry and agriculture.
In recent years, fine-grained image retrieval has attracted attention in the field of computer vision. Early on, Xie et al. calculated the similarity between images based on hand-crafted features: at retrieval time, the coarse class of the image is judged first, and then fine-grained retrieval is performed. With the development of convolutional neural networks, fine-grained retrieval methods based on deep learning have been proposed. These deep methods can be roughly divided into supervised and unsupervised methods. Among the unsupervised retrieval methods, the selective convolutional descriptor aggregation method (SCDA) was proposed, which first locates the object in the fine-grained image and retains the useful deep descriptors for fine-grained image retrieval. However, this method uses a pre-trained model as a fixed feature extractor, does not customize a network to learn the corresponding fine-grained features, and does not consider the structural relations between the fine-grained parts underlying the extracted convolutional descriptors, so its retrieval precision is mediocre.
Among the supervised approaches, fine-grained image retrieval is defined as a deep metric learning problem. Zheng et al. proposed the CRL-WSL method, a unified framework that efficiently learns discriminative features by applying a centralized ranking loss with weakly supervised localization of salient regions on the target contours. Zheng et al. also proposed the DCL-NC method, which improves on CRL-WSL by adding a normalized scale layer and a decorrelated ranking loss, achieving higher retrieval precision. However, both methods use 1024-dimensional codes; such high encoding dimensions run into slow query speed and storage redundancy in practical large-scale image retrieval. Therefore, realizing efficient and high-precision fine-grained image retrieval with low-storage code bits is a major current concern.
Disclosure of Invention
The invention provides a fine-grained bird image retrieval method based on deep hash and a graph neural network, which can realize fine-grained bird image retrieval with high efficiency, low storage and high precision.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
a fine-grained bird image retrieval method based on a graph neural network and deep hashing comprises the following steps:
step 1, sending the image into the backbone network Resnet50 to obtain the image feature F ∈ R^(H×W×N), where H, W and N denote the height, width and number of channels of the feature, respectively; converting the obtained feature map into attention maps, and extracting discriminative local features through the attention maps of different local regions;
step 2, generating an attention-dropped image from the attention maps generated in step 1;
step 3, constructing a graph from the part features extracted in step 2 to mine the relations between the parts and obtain fused features;
step 4, obtaining a hash code through a hash layer from the fused features obtained in step 3;
and step 5, constructing a loss function for the above feature extraction and hash coding, so that the network converges gradually.
In the above steps, a convolution function f() is used in step 1 to convert the feature map into attention maps, calculated as in formula (1):

A = f(F) (1)

M attention maps are generated for each image; A_k ∈ R^(H×W) denotes the attention map corresponding to the kth fine-grained part.

The M part features of the local regions are calculated by formula (2):

f_k = g(A_k ⊙ F), k = 1, 2, …, M (2)

where f_k denotes the kth local feature, ⊙ denotes element-wise multiplication of the feature map F and the kth attention map, and g() is the global average pooling operation;
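The attention pooling of formulas (1)-(2) can be sketched as follows. This is an illustrative numpy example: random tensors stand in for the network's feature map and attention maps, and the shapes are toy-sized; it is not the trained model.

```python
import numpy as np

def part_features(F, A):
    """Attention pooling, formula (2): f_k = g(A_k ⊙ F).

    F: feature map of shape (H, W, N); A: M attention maps of shape (M, H, W).
    Returns an (M, N) matrix with one part feature per attention map.
    """
    M = A.shape[0]
    feats = []
    for k in range(M):
        # A_k ⊙ F: broadcast the kth attention map over the N channels.
        weighted = A[k][:, :, None] * F           # shape (H, W, N)
        feats.append(weighted.mean(axis=(0, 1)))  # g(): global average pooling
    return np.stack(feats)

# Toy shapes: H = W = 4, N = 8 channels, M = 3 attention maps.
rng = np.random.default_rng(0)
F = rng.random((4, 4, 8))
A = rng.random((3, 4, 4))
fk = part_features(F, A)
print(fk.shape)  # (3, 8): one N-dimensional feature per part
```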
the step 2 specifically comprises the following steps:
step 2.1: random attention-force diagram selection
For the M learned spatial attention maps, a common phenomenon is that several attention maps concentrate on similar regions of the object, which greatly suppresses the diversity of the discriminative features. A random dropping strategy is therefore applied: one attention map A_k is randomly selected from the M attention maps, to force the network to search other information-rich local regions;
step 2.2: normalizing selected attention maps
Specifically, for each training image, an attention map A_k is first randomly selected from A. To improve the convergence rate of the model, min-max normalization is adopted, and the value of A_k is smoothed to [0,1], as shown in formula (3):

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k)) (3)

where A_k* denotes the enhanced kth attention map;
step 2.3: building discard masks
A drop threshold T_d ∈ [0,1] is set; the elements of A_k* greater than T_d are set to 0 and the remaining elements are set to 1, constructing the drop mask M_d, as shown in formula (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, otherwise 1 (4)

where A_k*(i, j) denotes the element of the kth local feature at row i, column j, and M_d(i, j) denotes the value of the drop mask at the corresponding location; the threshold T_d is set to 0.5;
step 2.4: attention-directed discarding of images and corresponding features
Given the drop mask M_d and the original image, the new attention-dropped image X_d is obtained by multiplying them element-wise; it is then fed into the network again to learn M new part features f_k^d. Through this set of formulas, the attention maps are encouraged to propose other discriminative parts, ultimately improving localization accuracy and feature quality;
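Steps 2.2-2.4 (normalize one selected attention map, threshold it into a drop mask, mask the image) can be sketched as below. For simplicity this assumes a single-channel toy image; in the actual method the mask multiplies every channel of the input image.

```python
import numpy as np

def attention_drop(image, A_k, T_d=0.5):
    """Formulas (3)-(4): min-max normalize A_k, build the drop mask M_d,
    and produce the attention-dropped image X_d.

    image: (H, W) toy single-channel image; A_k: (H, W) selected attention map.
    """
    # Formula (3): min-max normalization smooths A_k into [0, 1].
    A_star = (A_k - A_k.min()) / (A_k.max() - A_k.min() + 1e-12)
    # Formula (4): high-attention elements are dropped (0), the rest kept (1).
    M_d = np.where(A_star > T_d, 0.0, 1.0)
    # X_d is the element-wise product of mask and image.
    X_d = M_d * image
    return A_star, M_d, X_d

rng = np.random.default_rng(1)
img = rng.random((6, 6))
att = rng.random((6, 6))
A_star, M_d, X_d = attention_drop(img, att)
print(M_d.min(), M_d.max())  # the mask is binary: 0 where attention was high
```

Re-feeding X_d to the network (not shown) is what forces the remaining attention maps toward other discriminative regions.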
the step 3 specifically comprises the following steps:
step 3.1: defining edge characteristics of a graph
The M local discriminative features f_1, f_2, …, f_M obtained through step 2 are used to construct a directed graph G = (V, E) that captures the context among these discrete local features. Given the vertices V = {1, 2, …, M} and the edges E, the edge feature from the ith vertex to the jth vertex is defined as:

e_ij = h_θ;σ(f_i, f_i − f_j) (5)

h_θ;σ() is an asymmetric edge function implemented by a shared MLP, as shown in formula (6):

h_θ;σ(f_i, f_i − f_j) = Relu(θ·f_i + σ·(f_i − f_j)) (6)

where θ and σ denote parameters of the network;
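The edge features of formulas (5)-(6) and the vertex fusion of formula (7) can be sketched as follows. This is a simplified numpy illustration: plain weight matrices stand in for the shared MLP, and a fully connected directed graph stands in for the graph construction (the advantages section mentions a K-nearest-neighbor graph, which would only restrict which edges are pooled).

```python
import numpy as np

def fuse_parts(f, theta, sigma):
    """Formulas (5)-(7): e_ij = Relu(theta·f_i + sigma·(f_i - f_j)) on a
    fully connected directed graph, then global max pooling over the edges
    of each vertex to obtain the fused feature f_i^r.

    f: (M, N) part features; theta, sigma: (D, N) matrices standing in for
    the shared-MLP parameters (an illustrative simplification).
    """
    M = f.shape[0]
    fused = []
    for i in range(M):
        edges = []
        for j in range(M):
            if i == j:
                continue
            # Formula (6): asymmetric edge function with Relu activation.
            e_ij = np.maximum(0.0, theta @ f[i] + sigma @ (f[i] - f[j]))
            edges.append(e_ij)
        # Formula (7): element-wise max over all edges incident to vertex i.
        fused.append(np.max(np.stack(edges), axis=0))
    return np.stack(fused)  # (M, D) fused features f^r

rng = np.random.default_rng(2)
f = rng.random((4, 8))                   # M = 4 parts, N = 8 dimensions
theta, sigma = rng.random((2, 16, 8)) - 0.5
fr = fuse_parts(f, theta, sigma)
print(fr.shape)  # (4, 16)
```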
step 3.2: generating fusion features
As shown in formula (6), the neighborhood information captured by f_i − f_j for the different nodes is gradually combined with the global information captured by f_i. Similar to image convolution, the output of the ith vertex is obtained by applying a global max pooling operation over all edge features associated with the ith vertex, as shown in formula (7):

f_i^r = GMP({e_ij : (i, j) ∈ E}) (7)

where f_i^r denotes the fused feature of the ith vertex and GMP denotes the global max pooling operation. Finally, the fused features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.

The step 4 specifically comprises the following steps:
step 4.1: hash code generation
For the M reconstructed context features f_i^r, the semantic hash coding module outputs a hash code of B bits, calculated according to formula (8):

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, and δ_H ∈ R^B and W_H ∈ R^(M×B) denote the bias and weight of the hash layer, respectively; tanh() denotes the activation function, described by formula (9):

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) (9)
step 4.2: mapping of real-valued hash codes to two-dimensional hash codes
The final hash code can be obtained according to the formula (10), since the value range of tanh () is [ -1,1]If H is presenti>>0,Bi1, otherwise BiSince sign () has zero at a non-zero point, which causes a problem of gradient vanishing, the method will only test the test pattern with HiThe mapping is a two-dimensional hash code,
Bi=sgn(Hi) (10);
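The hash layer of formulas (8)-(10) can be sketched as below: tanh keeps the code real-valued (and differentiable) during training, and sgn() binarizes it only at test time. The weights here are random stand-ins for the learned hash-layer parameters.

```python
import numpy as np

def hash_layer(f_r, W_H, delta_H, training=True):
    """Formulas (8)-(10): H = tanh(W_H^T f^r + delta_H); at test time the
    real-valued code is binarized with sgn() to {-1, +1}.

    f_r: (M,) fused feature; W_H: (M, B) weights; delta_H: (B,) bias.
    """
    H = np.tanh(W_H.T @ f_r + delta_H)    # real-valued code in (-1, 1)
    if training:
        return H                          # keeps gradients usable in training
    return np.where(H > 0, 1.0, -1.0)     # B_i = sgn(H_i), test phase only

rng = np.random.default_rng(3)
f_r = rng.random(8)
W_H = rng.random((8, 16)) - 0.5           # B = 16-bit code
delta_H = rng.random(16) - 0.5
H = hash_layer(f_r, W_H, delta_H, training=True)
B = hash_layer(f_r, W_H, delta_H, training=False)
print(B)  # binary code of +1 / -1 entries
```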
the step 5 specifically comprises the following steps:
5.1: loss of center
When learning the discriminative features in step 1, each attention map A_k belonging to the same class needs to point to a similar part region of the object. A loss L_ctr is first introduced to learn the feature center c_k of each local discriminative feature f_k. L_ctr penalizes the variance of features coming from the same part of different objects with the same class label, represented by formula (11):

L_ctr = Σ_{k=1}^{M} ||f_k − c_k||² (11)

where c_k is the feature center of A_k; it is initialized from zero and updated by the moving average c_k = (1 − μ)c_k + μf_k, where μ controls the update rate of c_k. The L_ctr loss is applied only to the original image.
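One step of the center loss and its moving-average center update can be sketched as follows, assuming the standard squared-distance form for L_ctr described above. The toy loop shows the center drifting from zero toward a fixed part feature.

```python
import numpy as np

def center_loss_step(f_k, c_k, mu=0.05):
    """Per-part center loss ||f_k - c_k||^2 (formula (11)) and the
    moving-average update c_k = (1 - mu) * c_k + mu * f_k."""
    loss = np.sum((f_k - c_k) ** 2)
    c_new = (1.0 - mu) * c_k + mu * f_k
    return loss, c_new

# Centers start from zero and converge toward the (fixed) part feature.
c = np.zeros(4)
target = np.array([1.0, 2.0, 3.0, 4.0])
for _ in range(200):
    loss, c = center_loss_step(target, c)
print(round(float(loss), 6))  # the loss shrinks as the center converges
```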
5.2: loss of classification
The cross-entropy loss L_ce is used to constrain the distance between the predicted classes and the real image labels Y*. The M original local features f_k are simply stacked together and fed into the SoftMax layer to predict their class probabilities Y_ori; the loss is calculated as:

L_ori = L_ce(Y_ori, Y*) (12)

Similarly, the class probabilities Y_drop and Y_recn of the M dropped features f_k^d and the M reconstructed context features f_k^r are predicted separately. The total classification loss is composed of L_ori, L_drop and L_recn, as shown in formula (13):
Lcls=Lori(Yori,Y*)+Ldrop(Ydrop,Y*)+Lrecn(Yrecn,Y*) (13)
5.3: hash loss
Previous deep hashing methods used the Sigmoid function to define the probability function; however, existing hashing methods often lack the ability to concentrate related images within a small Hamming sphere, so they may perform poorly for Hamming-space retrieval. The method therefore optimizes the quantization loss within a Bayesian framework, using the following probability function:

σ(d(h_i, h_j)) = γ / (γ + d(h_i, h_j)) (14)

where γ is the scale parameter of the Cauchy distribution. When the Hamming distance is small, the function drops rapidly, pulling similar points to within a small Hamming radius. For a pair of binary hash codes h_i and h_j, the Hamming distance is:

d(h_i, h_j) = (K/2)(1 − cos(h_i, h_j)) (15)

where K denotes the number of bits of the hash code. The Cauchy quantization loss is then derived as:

L_q = Σ_i log(1 + d(h_i, sgn(h_i))/γ) (16)

In addition to reducing the quantization error, the method also takes the bit-balance property into account, which means that each bit of the hash code has about a 50% chance of being 1 or −1. To produce more discriminative hash codes, the method adds a bit-balance loss:

L_b = Σ (mean(H))² (17)

where H is a K-bit hash code. The purpose of the bit-balance loss is to generate balanced hash codes. The hash loss L_hash is then given by formula (18):
Lhash=Lq+Lb (18)
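The building blocks of the hash loss can be sketched as follows. Note the hedging: the patent's formula images for (14)-(17) are not reproduced in this text, so the cosine-based Hamming distance, the Cauchy probability γ/(γ+d), and the squared-mean bit-balance penalty below are assumptions consistent with the surrounding description, not a verbatim implementation.

```python
import numpy as np

def hamming_from_cos(h_i, h_j):
    """Hamming distance for {-1,+1} codes via d = (K/2)(1 - cos(h_i, h_j))."""
    K = len(h_i)
    cos = np.dot(h_i, h_j) / (np.linalg.norm(h_i) * np.linalg.norm(h_j))
    return 0.5 * K * (1.0 - cos)

def cauchy_prob(d, gamma=10.0):
    """Cauchy probability sigma(d) = gamma / (gamma + d): drops fast as the
    Hamming distance grows, concentrating similar codes in a small radius."""
    return gamma / (gamma + d)

def bit_balance_loss(H):
    """Bit-balance penalty (one common form, assumed here): squared mean of
    the bits of a code; zero when +1s and -1s are balanced."""
    return float(np.sum(np.mean(H, axis=-1) ** 2))

a = np.array([1.0, 1.0, -1.0, -1.0])
b = np.array([1.0, 1.0, -1.0, 1.0])        # differs in one of K = 4 bits
d = hamming_from_cos(a, b)
print(d)                                   # 1.0: exactly one differing bit
print(cauchy_prob(0.0) > cauchy_prob(d))   # closer codes get higher probability
print(bit_balance_loss(a))                 # balanced code -> 0.0
```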
5.4: total loss
Finally, the overall loss of the method is:
L=Lctr+Lcls+Lq+Lb (19)
has the advantages that: the invention provides a fine-grained bird image retrieval method based on deep hash and a graph neural network, wherein a global fine-grained feature aggregation module is designed, the module reconstructs distinguishing features by capturing context correlation based on a K-nearest neighbor graph, can learn each discriminant part (such as bird head, wings and the like) of birds, and establishes a relation graph among different parts, thereby constructing fusion features with discriminant; a semantic hash coding module is designed, and the semantic hash coding module generates a hash code with compact semantics under the guidance of Cauchy quantization loss and bit balance loss, so that the storage overhead in practical application is reduced, and the retrieval speed is improved. Practical application shows that the method is low in storage overhead, performance of the method is superior to that of the most advanced general retrieval method and fine-grained retrieval method all the time, and powerful support is provided for reducing human management cost in practical application.
Drawings
FIG. 1 is a flow chart of a retrieval method according to an embodiment of the present invention;
FIG. 2 is a test result chart in an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and specific examples:
In this embodiment, taking the bird dataset CUB200-2011 as an example, as shown in fig. 1, a fine-grained bird image retrieval method based on deep hash and a graph neural network includes the following steps:
step 1 data preparation
Experiments are carried out on the fine-grained bird dataset CUB200-2011, and the method is compared with other fine-grained retrieval methods. CUB200-2011 contains 200 bird species and 11788 images, of which 5994 are used for training and 5794 for testing. The method uses the test images as the query set and the training images as the retrieval database. All images are resized to 448 × 448 pixels before being fed into the network.
Step 2 local feature based node representation
Step 2.1 local feature based node representation
For an image X, the method feeds it into the backbone network Resnet50 and selects the features of the conv5 layer as its feature F ∈ R^(H×W×N), where H, W and N denote the height, width and number of channels of the feature, respectively.
Step 2.2 attention map generation
M attention maps are generated for each image; in this example M is set to 32. A_k ∈ R^(H×W) denotes the attention map of the kth fine-grained part, which may correspond to a wing or the head of a bird. The method converts the feature map into attention maps using a 1×1 convolution function f(); the calculation is shown in formula (1):

A = f(F) (1)
Step 2.3 Generation of discriminant features
Through the attention maps of the different local regions, discriminative local features are extracted for these parts. The M part features corresponding to the local regions are calculated by formula (2):

f_k = g(A_k ⊙ F), k = 1, 2, …, M (2)

where f_k denotes the kth local feature, ⊙ denotes element-wise multiplication of the feature map F and the kth attention map, and g() is the global average pooling operation.
Step 3 local feature enhancement
Step 3.1 random attention map selection
The method applies a random dropping strategy: one attention map A_k is randomly selected from the M attention maps to force the network to search other information-rich local regions.
Step 3.2 normalizing the selected attention map
Specifically, for each training image, an attention map A_k is first randomly selected from A. To improve the convergence rate of the model, the method adopts min-max normalization and smooths the value of A_k to [0,1], as shown in formula (3):

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k)) (3)

Here, A_k* denotes the enhanced kth attention map.
Step 3.3 building discard mask
A drop threshold T_d ∈ [0,1] is set; the elements of A_k* greater than T_d are set to 0 and the remaining elements to 1, constructing the drop mask M_d, as shown in formula (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, otherwise 1 (4)

where A_k*(i, j) denotes the element of the kth local feature at row i, column j, and M_d(i, j) denotes the value of the drop mask at the corresponding location. Here, the threshold T_d is set to 0.5.
step 3.4 attention-directed discarding of images and corresponding features
Given the drop mask M_d and the original image, the new attention-dropped image X_d is obtained by multiplying them element-wise; it is then fed into the network again to learn M new part features f_k^d.
Step 4, mining related component relation based on graph convolution
Step 4.1 defining edge characteristics of graph
Through step 3, M local discriminative features f_1, f_2, …, f_M are obtained. The method constructs a directed graph G = (V, E) to capture the context among these discrete local features. Given the vertices V = {1, 2, …, M} and the edges E, the edge feature from the ith vertex to the jth vertex is defined as:

e_ij = h_θ;σ(f_i, f_i − f_j) (5)

h_θ;σ() is an asymmetric edge function implemented by a shared MLP, as shown in formula (6):

h_θ;σ(f_i, f_i − f_j) = Relu(θ·f_i + σ·(f_i − f_j)) (6)

where θ and σ denote parameters of the network.
Step 4.2 generating fusion features
As shown in formula (6), the neighborhood information captured by f_i − f_j for the different nodes is gradually combined with the global information captured by f_i. Similar to image convolution, the output of the ith vertex is obtained by applying a global max pooling operation over all edge features associated with the ith vertex, as shown in formula (7):

f_i^r = GMP({e_ij : (i, j) ∈ E}) (7)

where f_i^r denotes the fused feature of the ith vertex and GMP denotes the global max pooling operation. Finally, the fused features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.
Step 5 semantic Hash coding
Step 5.1 Hash code Generation
For the M reconstructed context features f_i^r, the semantic hash coding module outputs a hash code of B bits, calculated according to formula (8):

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, and δ_H ∈ R^B and W_H ∈ R^(M×B) denote the bias and weight of the hash layer, respectively. tanh() denotes the activation function, described by formula (9):

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) (9)
Step 5.2 mapping real-valued hash codes to binary hash codes

The final hash code is obtained according to formula (10). Since the value range of tanh() is [−1, 1], if H_i > 0 then B_i = 1, otherwise B_i = −1. Since sgn() has zero gradient at non-zero points, which causes the gradient-vanishing problem, the method maps H_i to a binary hash code only in the test phase.

B_i = sgn(H_i) (10)
Specifically, in step 5 a loss function is constructed for the feature extraction and hash coding in the above steps, so that the network converges gradually. It comprises the following steps:
step 6 loss function
6.1 center loss
When learning the discriminative features in step 2, each attention map A_k belonging to the same class needs to point to a similar part region of the object. The method first introduces a loss L_ctr to learn the feature center c_k of each local discriminative feature f_k. L_ctr penalizes the variance of features coming from the same part of different objects with the same class label, represented by formula (11):

L_ctr = Σ_{k=1}^{M} ||f_k − c_k||² (11)

where c_k is the feature center of A_k; it is initialized from zero and updated by the moving average c_k = (1 − μ)c_k + μf_k, where μ controls the update rate of c_k. The L_ctr loss is applied only to the original image.
6.2 loss of Classification
The method adopts the cross-entropy loss L_ce to constrain the distance between the predicted classes and the real image labels Y*. The M original local features f_k are simply stacked together and fed into the SoftMax layer to predict their class probabilities Y_ori; the loss is calculated as:

L_ori = L_ce(Y_ori, Y*) (12)

Similarly, the method also predicts the class probabilities Y_drop and Y_recn of the M dropped features f_k^d and the M reconstructed context features f_k^r, respectively. The total classification loss is composed of L_ori, L_drop and L_recn, as shown in formula (13):
Lcls=Lori(Yori,Y*)+Ldrop(Ydrop,Y*)+Lrecn(Yrecn,Y*) (13)
6.3 Hash loss
Previous deep hashing methods used the Sigmoid function to define the probability function; however, existing hashing methods often lack the ability to concentrate related images within a small Hamming sphere, so they may perform poorly for Hamming-space retrieval. The method therefore optimizes the quantization loss within a Bayesian framework, using the following probability function:

σ(d(h_i, h_j)) = γ / (γ + d(h_i, h_j)) (14)

where γ is the scale parameter of the Cauchy distribution. When the Hamming distance is small, the function drops rapidly, pulling similar points to within a small Hamming radius. For a pair of binary hash codes h_i and h_j, the Hamming distance is:

d(h_i, h_j) = (K/2)(1 − cos(h_i, h_j)) (15)

where K is the number of bits of the hash code. Then, the Cauchy quantization loss is derived as:

L_q = Σ_i log(1 + d(h_i, sgn(h_i))/γ) (16)

In addition to reducing the quantization error, the method also takes the bit-balance property into account, which means that each bit of the hash code has about a 50% chance of being 1 or −1. To produce more discriminative hash codes, the method adds a bit-balance loss:

L_b = Σ (mean(H))² (17)

where H is a K-bit hash code. The purpose of the bit-balance loss is to generate balanced hash codes. Then, the hash loss L_hash is given by formula (18):
Lhash=Lq+Lb (18)
6.4 Total loss
Finally, the overall loss of the method is:
L=Lctr+Lcls+Lq+Lb (19)
7 training and testing of bird fine-grained model
7.1 bird dataset training
The method is implemented in PyTorch. The model is trained on 2 RTX 2080ti GPUs with images preprocessed to 448 × 448. In all experiments, Resnet-50 is used as the backbone network to extract features, and the output of the Conv5 layer is selected as the feature map. For attention map generation, the attention maps are obtained by a 1×1 convolution kernel; by default M is 32 and the drop threshold T_d is set to 0.5;
7.2 evaluation index
The method uses mean average precision (mAP) as the evaluation index for a fair comparison with existing methods. The retrieval precision is calculated as:

mAP = (1/n_q) Σ_q (1/N⁺) Σ_{k=1}^{n} (N_k⁺ / k) × pos(k)

where n_q denotes the number of samples in the query set and n the number of returned samples; N⁺ denotes the number of positive samples among the n returned samples, and N_k⁺ the number of positive samples among the first k returned samples. pos(k) is 1 if the kth image is a positive sample, and 0 otherwise.
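The mAP computation described above can be sketched in plain Python; the toy relevance lists below are hypothetical, standing in for real ranked retrieval results.

```python
def average_precision(pos_flags):
    """AP for one query from a ranked relevance list: the mean, over the
    positive ranks k, of precision@k = N_k+ / k, matching the mAP
    definition above."""
    hits, precisions = 0, []
    for k, is_pos in enumerate(pos_flags, start=1):
        if is_pos:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(all_flags):
    """mAP: average the per-query AP over the n_q queries."""
    return sum(average_precision(f) for f in all_flags) / len(all_flags)

# Two toy queries with top-5 relevance lists (1 = positive sample).
queries = [[1, 0, 1, 1, 0], [0, 1, 0, 0, 1]]
print(round(mean_average_precision(queries), 4))  # 0.6278
```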
7.3 model test
All test samples were evaluated, and the method achieves 86.92% mAP with a 48-bit hash code. As shown in fig. 2, two bird pictures were selected; among the first 10 retrieved images, the accuracies are 90% and 80%, respectively. For birds with small inter-class differences, the method shows high-precision retrieval results meeting user expectations with 48-bit codes.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (8)
1. A fine-grained bird image retrieval method based on a graph neural network and deep hash is characterized by comprising the following steps:
step 1, sending the image into the backbone network Resnet50 to obtain the image feature F ∈ R^(H×W×N), where H, W and N denote the height, width and number of channels of the feature, respectively; converting the obtained feature map into attention maps, and extracting discriminative local features through the attention maps of different local regions;
step 2, generating an attention-dropped image from the attention maps generated in step 1;
step 3, constructing a graph from the part features extracted in step 2 to mine the relations between the parts and obtain fused features;
step 4, obtaining a hash code through a hash layer from the fused features obtained in step 3;
and step 5, constructing a loss function for the above feature extraction and hash coding, so that the network converges gradually.
3. The fine-grained bird image retrieval method based on graph neural network and deep hash according to claim 1 or 2, characterized in that M attention maps are generated for each image in step 1, A_k ∈ R^(H×W) being denoted as the attention map of the kth fine-grained part;

the M part features of the local regions are calculated by formula (2):

f_k = g(A_k ⊙ F), k = 1, 2, …, M (2)

where f_k denotes the kth local feature, ⊙ denotes element-wise multiplication of the feature map F and the kth attention map, and g() is the global average pooling operation.
4. The fine-grained bird image retrieval method based on the graph neural network and the deep hash as claimed in claim 1, wherein the step 2 specifically comprises the following steps:
step 2.1: random attention-force diagram selection
For the M learned spatial attention maps, a random dropping strategy is applied: one attention map A_k is randomly selected from the M attention maps to force the network to search other information-rich local regions;
step 2.2: normalizing selected attention maps
For each training image, an attention map A_k is first randomly selected from A. To improve the convergence rate of the model, min-max normalization is adopted, and the value of A_k is smoothed to [0,1], as shown in formula (3):

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k)) (3)

where A_k* denotes the enhanced kth attention map;
step 2.3: building discard masks
A drop threshold T_d ∈ [0,1] is set; the elements of A_k* greater than T_d are set to 0 and the remaining elements to 1, constructing the drop mask M_d, as shown in formula (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, otherwise 1 (4)

where A_k*(i, j) denotes the element of the kth local feature at row i, column j, and M_d(i, j) denotes the value of the drop mask at the corresponding location; the threshold T_d is set to 0.5;
step 2.4: attention-directed discarding of images and corresponding features
The new attention-dropped image X_d is obtained by multiplying the drop mask M_d with the original image, and is sent to the network again to learn M new part features f_k^d; through this set of formulas, the attention maps are encouraged to propose other discriminative parts, ultimately improving localization accuracy and feature quality.
5. The fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 1, wherein step 3 specifically comprises the following steps:
step 3.1: defining edge characteristics of a graph
The M local discriminative features f_1, f_2, …, f_M obtained through step 2 are used to construct a directed graph G = (V, E) that captures the context among these discrete local features. Given the vertices V = {1, 2, …, M} and the edges E, the edge feature from the ith vertex to the jth vertex is defined as:

e_ij = h_θ;σ(f_i, f_i − f_j) (5)

h_θ;σ() is an asymmetric edge function implemented by a shared MLP, as shown in formula (6):

h_θ;σ(f_i, f_i − f_j) = Relu(θ·f_i + σ·(f_i − f_j)) (6)
where θ and σ represent parameters of the network;
step 3.2: generating fusion features
As shown in formula (6), the neighborhood information captured by f_i − f_j for the different nodes is gradually combined with the global information captured by f_i; similar to image convolution, the output of the ith vertex is obtained by applying a global max pooling operation over all edge features associated with the ith vertex, as in formula (7):

f_i^r = GMP({e_ij : (i, j) ∈ E}) (7)

where f_i^r denotes the fused feature of the ith vertex and GMP denotes the global max pooling operation; finally, the fused features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.
6. The fine-grained bird image retrieval method based on graph neural network and deep hash according to claim 1, wherein step 4 specifically comprises the following steps:
step 4.1: hash code generation
For the M reconstructed context features f_i^r, the semantic hash coding module outputs a hash code of B bits, calculated according to formula (8):

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, and δ_H ∈ R^B and W_H ∈ R^(M×B) denote the bias and weight of the hash layer, respectively; tanh() denotes the activation function, described by formula (9):

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)) (9)
step 4.2: mapping of real-valued hash codes to two-dimensional hash codes
The final hash code can be obtained according to the formula (10), since the value range of tanh () is [ -1,1]If H is presenti>>0,Bi1, otherwise BiSince sign () has zero at a non-zero point, which causes a problem of gradient vanishing, the method will only test the test pattern with HiMapping to two dimensionsThe hash code is a code of a hash of the code,
Bi=sgn(Hi) (10)。
7. the fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 1, wherein step 5 specifically comprises the following steps:
5.1: loss of center
When learning the discriminative features in step 1, each attention map A_k belonging to the same class needs to point to a similar part region of the object. A loss L_ctr is first introduced to learn the feature center c_k of each local discriminative feature f_k. L_ctr penalizes the variance of features coming from the same part of different objects with the same class label, represented by formula (11):

L_ctr = Σ_{k=1}^{M} ||f_k − c_k||² (11)

where c_k is the feature center of A_k; it is initialized from zero and updated by the moving average c_k = (1 − μ)c_k + μf_k, μ controlling the update rate of c_k;
5.2: loss of classification
Using cross entropy loss LceTo constrain the prediction classes and real image labels Y*For M original local features fkStacking them together, then feeding them into SoftMax layer, predicting their class probability YoriThe loss can be calculated as:
similarly, the discard characteristics f of M are predicted separatelyk dAnd the reconstructed context features f of Mk rClass probability Y ofdropAnd YrecnTotal classification lossFrom Lori、LdropAnd LrecnComposition, as shown in equation (13):
Lcls=Lori(Yori,Y*)+Ldrop(Ydrop,Y*)+Lrecn(Yrecn,Y*) (13)
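Formula (13) can be illustrated as three per-view cross-entropies summed together; the probability vectors below are made-up SoftMax outputs, not values from the patent:

```python
import numpy as np

def cross_entropy(probs, label):
    """L_ce = -log p(true class), with probs a SoftMax output vector."""
    return float(-np.log(probs[label]))

def total_cls_loss(y_ori, y_drop, y_recn, label):
    """Formula (13): L_cls = L_ori + L_drop + L_recn, one cross-entropy
    per feature view (original, dropped, reconstructed)."""
    return (cross_entropy(y_ori, label)
            + cross_entropy(y_drop, label)
            + cross_entropy(y_recn, label))

label = 2                            # index of the true class Y*
y_ori  = np.array([0.1, 0.1, 0.8])   # made-up SoftMax outputs
y_drop = np.array([0.2, 0.2, 0.6])
y_recn = np.array([0.1, 0.2, 0.7])
loss = total_cls_loss(y_ori, y_drop, y_recn, label)
```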
5.3: hash loss
The quantization loss is optimized using a bayesian framework, using the probability function as follows:
where γ is the scale parameter of Cauchy distribution, when the Hamming distance is small, the function drops rapidly, resulting in similar points being pulled to within a small Hamming radius, for a pair of binary hash codes hiAnd hjThe Hamming distance of (d) is:
where K represents the number of bits of the hash code, then the Cauchy quantization loss is derived as:
In addition to reducing the quantization error, the bit balance property is also taken into account: each bit of the hash code should have about a 50% chance of being 1 or −1. To produce more discriminative hash codes, a bit balance loss is added:

L_b = Σ_{i=1}^{M} (mean(H_i))^2 (17)

The purpose of the bit balance loss is to generate balanced hash codes; the hash loss L_hash is then given by formula (18):

L_hash = L_q + L_b (18)
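A sketch of the hash loss of formula (18) under assumed readings of formulas (15)–(17): the Hamming distance for ±1 codes, a Cauchy-style quantization term pushing real-valued codes toward hypercube vertices, and a per-code bit balance penalty. The value of γ and the exact loss forms are assumptions, not taken from the patent:

```python
import numpy as np

GAMMA = 10.0  # Cauchy scale parameter gamma (value assumed)

def hamming_distance(h_i, h_j):
    """Formula (15): d_H = (K - h_i . h_j) / 2 for K-bit codes in {-1, +1}."""
    K = len(h_i)
    return (K - float(h_i @ h_j)) / 2.0

def cauchy_quantization_loss(H):
    """An assumed reading of formula (16): push each real-valued code
    H_i toward a {-1, +1} hypercube vertex via d_H(|H_i|, 1)."""
    ones = np.ones(H.shape[1])
    return float(sum(np.log1p(hamming_distance(np.abs(h), ones) / GAMMA)
                     for h in H))

def bit_balance_loss(H):
    """An assumed reading of formula (17): each code's bits should
    average to zero, i.e. roughly 50% chance of +1 or -1 per bit."""
    return float(np.sum(np.mean(H, axis=1) ** 2))

H = np.array([[0.9, -0.8, 0.7, -0.9],     # nearly balanced code
              [0.99, 0.99, 0.98, 0.97]])  # badly unbalanced code
L_hash = cauchy_quantization_loss(H) + bit_balance_loss(H)  # formula (18)
```

Note that a perfectly balanced ±1 code contributes zero to both terms, which is the state the loss drives the network toward.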
5.4: total loss
The final overall loss is:
L=Lctr+Lcls+Lq+Lb (19)。
8. The fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 7, wherein the loss L_ctr is applied only to the original image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111521433.8A CN114329031A (en) | 2021-12-13 | 2021-12-13 | Fine-grained bird image retrieval method based on graph neural network and deep hash |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114329031A true CN114329031A (en) | 2022-04-12 |
Family
ID=81051096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111521433.8A Pending CN114329031A (en) | 2021-12-13 | 2021-12-13 | Fine-grained bird image retrieval method based on graph neural network and deep hash |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114329031A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115964527A (en) * | 2023-01-05 | 2023-04-14 | 北京东方通网信科技有限公司 | Label representation construction method for single label image retrieval |
CN115964527B (en) * | 2023-01-05 | 2023-09-26 | 北京东方通网信科技有限公司 | Label characterization construction method for single-label image retrieval |
CN116563607A (en) * | 2023-04-11 | 2023-08-08 | 北京邮电大学 | Fine granularity image recognition method and device based on cross-dataset information mining |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108830296B (en) | Improved high-resolution remote sensing image classification method based on deep learning | |
CN110443143B (en) | Multi-branch convolutional neural network fused remote sensing image scene classification method | |
CN110689086B (en) | Semi-supervised high-resolution remote sensing image scene classification method based on generating countermeasure network | |
CN108960140B (en) | Pedestrian re-identification method based on multi-region feature extraction and fusion | |
Shen et al. | Generative adversarial learning towards fast weakly supervised detection | |
CN114241282B (en) | Knowledge distillation-based edge equipment scene recognition method and device | |
CN106909924B (en) | Remote sensing image rapid retrieval method based on depth significance | |
CN113378632B (en) | Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method | |
Lin et al. | RSCM: Region selection and concurrency model for multi-class weather recognition | |
CN108875816A (en) | Merge the Active Learning samples selection strategy of Reliability Code and diversity criterion | |
CN112633382B (en) | Method and system for classifying few sample images based on mutual neighbor | |
CN109886072B (en) | Face attribute classification system based on bidirectional Ladder structure | |
CN111583263A (en) | Point cloud segmentation method based on joint dynamic graph convolution | |
CN111612051B (en) | Weak supervision target detection method based on graph convolution neural network | |
CN112307995A (en) | Semi-supervised pedestrian re-identification method based on feature decoupling learning | |
CN114329031A (en) | Fine-grained bird image retrieval method based on graph neural network and deep hash | |
CN112434628B (en) | Small sample image classification method based on active learning and collaborative representation | |
Islam et al. | InceptB: a CNN based classification approach for recognizing traditional bengali games | |
CN109784288B (en) | Pedestrian re-identification method based on discrimination perception fusion | |
CN104318271B (en) | Image classification method based on adaptability coding and geometrical smooth convergence | |
CN109165698A (en) | A kind of image classification recognition methods and its storage medium towards wisdom traffic | |
CN109800314A (en) | A method of generating the Hash codes for being used for image retrieval using depth convolutional network | |
Li et al. | Image decomposition with multilabel context: Algorithms and applications | |
CN110503090A (en) | Character machining network training method, character detection method and character machining device based on limited attention model | |
CN116543269B (en) | Cross-domain small sample fine granularity image recognition method based on self-supervision and model thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||