CN114329031A - Fine-grained bird image retrieval method based on graph neural network and deep hash - Google Patents

Fine-grained bird image retrieval method based on graph neural network and deep hash

Info

Publication number
CN114329031A
Authority
CN
China
Prior art keywords
hash
attention
fine-grained
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111521433.8A
Other languages
Chinese (zh)
Inventor
孙涵
郎文溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202111521433.8A priority Critical patent/CN114329031A/en
Publication of CN114329031A publication Critical patent/CN114329031A/en
Pending legal-status Critical Current

Abstract

The invention discloses a fine-grained bird image retrieval method based on a graph neural network and deep hashing, belonging to the field of fine-grained image retrieval. The method comprises: node representation based on local features, local feature enhancement, related-component relation mining based on graph convolution, semantic hash coding, and a loss function. By combining a graph neural network with deep hashing, the method is suitable for large-scale bird image retrieval and achieves fine-grained bird image retrieval with high efficiency, low storage and high precision.

Description

Fine-grained bird image retrieval method based on graph neural network and deep hash
Technical Field
The invention belongs to the field of fine-grained image retrieval, and particularly relates to a fine-grained bird image retrieval method based on a graph neural network and deep hash.
Background
At the present stage, monitoring of birds in nature reserves and national wetlands serves as an important index for evaluating biodiversity and the ecological environment; for farmland and airport areas, bird monitoring affects farmers' income and the normal operation of airports. Because birds are abundant in nature, bird experts need a way to quickly and accurately retrieve the correct bird from a large-scale data set. Retrieval of bird images differs from general image retrieval in two respects between database images and query images: (1) inter-class differences are small and often lie in subtle regions, such as a bird's head or tail; (2) intra-class differences are large, since different images of the same bird species vary greatly due to factors such as illumination and posture. At present, birds are usually identified manually by experts, which is costly and error-prone. Therefore, facing large-scale bird data sets, realizing a low-cost, low-storage and efficient fine-grained bird retrieval technique with computer vision is of great significance to industry and agriculture.
In recent years, fine-grained image retrieval has emerged as a topic in the field of computer vision. In the early days, Xie et al. calculated the similarity between images based on hand-crafted features: during retrieval, the coarse class of the image is judged first, and then fine-grained retrieval is carried out. With the development of convolutional neural networks, fine-grained retrieval methods based on deep learning have been proposed. These deep methods can be roughly classified into supervised and unsupervised methods. Among unsupervised retrieval methods, the SCDA (selective convolutional descriptor aggregation) method first locates the object in the fine-grained image and retains the useful deep descriptors for fine-grained image retrieval. However, this method uses a pre-trained model as a fixed feature extractor, does not customize a network to learn the corresponding fine-grained features, and does not consider the structural relationship between fine-grained parts for the extracted convolutional descriptors, so its retrieval precision is moderate.
In the supervised setting, fine-grained image retrieval is defined as a deep metric learning problem. Zheng et al. proposed the CRL-WSL method, a unified framework that learns discriminative features efficiently with a centralized ranking loss and weakly supervised localization of salient regions on the target contour. Zheng et al. also proposed the DCL-NC method, which improves on CRL-WSL by adding a normalize-scale layer and a decorrelated ranking loss and achieves higher retrieval accuracy. However, both methods use 1024-dimensional codes; such a high coding dimensionality leads to slow queries and storage redundancy in practical large-scale image retrieval. Therefore, realizing efficient, high-precision fine-grained image retrieval with low-storage code lengths is a major current concern.
Disclosure of Invention
The invention provides a fine-grained bird image retrieval method based on deep hash and a graph neural network, which can realize fine-grained bird image retrieval with high efficiency, low storage and high precision.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
a fine-grained bird image retrieval method based on a graph neural network and deep hashing comprises the following steps:
step 1, sending the image to the backbone network ResNet-50 to obtain the feature F ∈ R^{H×W×N} of the image, wherein H, W and N represent the height, width and number of channels of the feature, respectively; converting the obtained feature map into attention maps, and extracting discriminative local features through the attention maps of different local regions;
step 2, generating an attention-dropped image from the attention maps generated in step 1;
step 3, constructing a graph from the part features extracted in step 2, mining the relations between parts, and obtaining fusion features;
step 4, obtaining hash codes through a hash layer from the fusion features obtained in step 3;
and step 5, constructing a loss function for the feature extraction and hash coding in the above steps so that the network gradually converges.
In the above steps, a convolution function f () is used in step 1 to convert the feature map into an attention map, and the calculation method of the attention map is shown as formula (1):
A = f(F) = [A_1, A_2, …, A_M]   (1)

M attention maps are generated for each image, where A_k ∈ R^{H×W} denotes the attention map corresponding to the kth fine-grained component;
the M-part feature of the local region is calculated by equation (2):
fk=g(Ak⊙F),k=1,2,…,M (2)
wherein f iskRepresents the kth local feature, which represents the multiplication of the elements of feature map F and the kth attention map, and g () is the global average pooling operation;
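The following is a minimal PyTorch sketch of this step, shown only for illustration: the backbone feature F is mapped to M attention maps by a 1 × 1 convolution, and each part feature f_k = g(A_k ⊙ F) is obtained by element-wise weighting followed by global average pooling. The module name AttentionPartPooling, the ReLU on the attention maps and the tensor shapes are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn


class AttentionPartPooling(nn.Module):
    def __init__(self, in_channels: int, num_parts: int = 32):
        super().__init__()
        # f(): 1x1 convolution that maps the feature map to M attention maps
        self.attn_conv = nn.Conv2d(in_channels, num_parts, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        # feat: (B, N, H, W) backbone feature F
        attn = torch.relu(self.attn_conv(feat))            # (B, M, H, W) attention maps A_k
        B, M, H, W = attn.shape
        # f_k = g(A_k * F): element-wise product followed by global average pooling
        parts = torch.einsum('bmhw,bnhw->bmn', attn, feat) / (H * W)   # (B, M, N)
        return attn, parts


if __name__ == "__main__":
    backbone_feat = torch.randn(2, 2048, 14, 14)   # e.g. ResNet-50 conv5 output for a 448x448 input
    module = AttentionPartPooling(in_channels=2048, num_parts=32)
    attn_maps, part_feats = module(backbone_feat)
    print(attn_maps.shape, part_feats.shape)        # (2, 32, 14, 14) and (2, 32, 2048)
```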
the step 2 specifically comprises the following steps:
step 2.1: random attention-force diagram selection
For the M learned spatial attention maps, a common phenomenon is that a plurality of attention maps can be concentrated on similar areas of an object, which greatly inhibits the diversity of discriminative features, a random discarding strategy is performed, and an attention map A is randomly selected from the M attention mapskTo force the network to search for other local areas rich in information;
step 2.2: normalizing selected attention maps
Specifically, for each training image, an attention map A_k is first randomly selected from A. To improve the convergence rate of the model, min-max normalization is adopted to smooth the values of A_k into [0, 1], as shown in formula (3):

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k))   (3)

A_k* represents the enhanced kth attention map;
step 2.3: building discard masks
A discard threshold T_d ∈ [0, 1] is set; elements greater than T_d are set to 0 and the other elements are set to 1, constructing the discard mask M_d as shown in equation (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, and M_d(i, j) = 1 otherwise   (4)

where A_k*(i, j) represents the value of the element of the kth attention map at row i and column j, M_d(i, j) represents the value of the discard mask at the corresponding location, and the threshold T_d is set to 0.5;
step 2.4: attention-directed discarding of images and corresponding features
Given the discard mask M_d and the original image, the new attention-dropped image X_d is obtained by multiplying them; X_d is then fed into the network again to learn M new part features f_k^d. Through this group of formulas, the attention maps are encouraged to propose other discriminative parts, which ultimately improves localization accuracy and feature quality;
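A hedged sketch of steps 2.1 to 2.4 follows: one attention map is randomly selected per image, min-max normalized, thresholded at T_d = 0.5 to build the discard mask M_d, and multiplied into the input image to produce X_d. Upsampling the mask to image resolution is an assumption made here so that the shapes match.

```python
import torch
import torch.nn.functional as F_


def attention_drop(image: torch.Tensor, attn: torch.Tensor, t_d: float = 0.5) -> torch.Tensor:
    """image: (B, 3, H_img, W_img), attn: (B, M, H, W) attention maps."""
    B, M, H, W = attn.shape
    # step 2.1: randomly select one attention map per image
    idx = torch.randint(0, M, (B,))
    a_k = attn[torch.arange(B), idx]                              # (B, H, W)
    # step 2.2: min-max normalize A_k into [0, 1]
    a_min = a_k.flatten(1).min(dim=1).values.view(B, 1, 1)
    a_max = a_k.flatten(1).max(dim=1).values.view(B, 1, 1)
    a_star = (a_k - a_min) / (a_max - a_min + 1e-8)
    # step 2.3: discard mask M_d is 0 where the response exceeds T_d, 1 elsewhere
    mask = (a_star <= t_d).float().unsqueeze(1)                   # (B, 1, H, W)
    # step 2.4: upsample the mask and erase the most responsive region
    mask = F_.interpolate(mask, size=image.shape[-2:], mode='nearest')
    return image * mask


if __name__ == "__main__":
    img = torch.randn(2, 3, 448, 448)
    attn = torch.rand(2, 32, 14, 14)
    x_d = attention_drop(img, attn)
    print(x_d.shape)   # torch.Size([2, 3, 448, 448])
```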
the step 3 specifically comprises the following steps:
step 3.1: defining edge characteristics of a graph
M local discriminative features f_1, f_2, …, f_M are obtained through step 2. A directed graph G = (V, E) is constructed to capture the context between these discrete local features, with vertices V = {1, 2, …, M} and edges E ⊆ V × V.
The edge feature from the ith vertex to the jth vertex can be defined as:

e_ij = h_{θ;σ}(f_i, f_i − f_j)   (5)

h_{θ;σ}() is an asymmetric edge function implemented by a shared MLP, as shown in equation (6):

h_{θ;σ}(f_i, f_i − f_j) = ReLU(θ·f_i + σ·(f_i − f_j))   (6)
where θ and σ represent parameters of the network;
step 3.2: generating fusion features
As shown in equation (6), the neighborhood information captured by f_i − f_j for different nodes is gradually combined with the global information captured by f_i. Similar to graph convolution, the output of the ith vertex can be obtained by performing a global max pooling operation over all edge features associated with the ith vertex, as shown in equation (7):

f_i^r = GMP_{j:(i,j)∈E}(e_ij)   (7)

where f_i^r represents the fusion feature of the ith vertex and GMP represents the global max pooling operation. Finally, the fusion features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.
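A minimal sketch of equations (5) to (7) follows: the edge feature e_ij is computed by a shared MLP acting on f_i and f_i − f_j, and the fusion feature f_i^r is the global max pooling over the edge features of vertex i. A fully connected directed graph over the M part features is assumed here; the abstract mentions a k-nearest-neighbor graph, which would simply restrict the pooling to the k nearest vertices.

```python
import torch
import torch.nn as nn


class PartGraphFusion(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        # shared MLP h_{theta;sigma}: theta acts on f_i, sigma on (f_i - f_j)
        self.theta = nn.Linear(feat_dim, feat_dim, bias=False)
        self.sigma = nn.Linear(feat_dim, feat_dim, bias=False)

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        # parts: (B, M, N) local discriminative features f_1 ... f_M
        f_i = parts.unsqueeze(2)                                        # (B, M, 1, N)
        f_j = parts.unsqueeze(1)                                        # (B, 1, M, N)
        edges = torch.relu(self.theta(f_i) + self.sigma(f_i - f_j))     # (B, M, M, N) edge features e_ij
        fused = edges.max(dim=2).values                                 # global max pooling over neighbours j
        return fused                                                    # (B, M, N) fusion features f^r


if __name__ == "__main__":
    parts = torch.randn(2, 32, 2048)
    fusion = PartGraphFusion(feat_dim=2048)
    f_r = fusion(parts)
    print(f_r.shape)   # torch.Size([2, 32, 2048])
```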
step 4.1: hash code generation
For the M reconstructed context features f_i^r, the semantic hash coding module outputs a B-bit hash code, which can be calculated according to formula (8):

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M   (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, δ_H ∈ R^B and W_H ∈ R^{M×B} respectively represent the bias and weight of the hash layer, and tanh() represents the activation function, which can be described by equation (9):

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))   (9)
step 4.2: mapping of real-valued hash codes to two-dimensional hash codes
The final hash code can be obtained according to the formula (10), since the value range of tanh () is [ -1,1]If H is presenti>>0,Bi1, otherwise BiSince sign () has zero at a non-zero point, which causes a problem of gradient vanishing, the method will only test the test pattern with HiThe mapping is a two-dimensional hash code,
Bi=sgn(Hi) (10);
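A hedged sketch of the semantic hash layer follows: each fusion feature f_i^r is mapped to a B-bit real-valued code with tanh((W_H)^T f_i^r + δ_H), and only at test time is the code binarized with sgn(). Averaging the M per-part codes into one image-level code is an assumption; the patent does not state how the M codes are merged.

```python
import torch
import torch.nn as nn


class SemanticHashLayer(nn.Module):
    def __init__(self, feat_dim: int, code_bits: int = 48):
        super().__init__()
        self.fc = nn.Linear(feat_dim, code_bits)    # holds the weight W_H and bias delta_H

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (B, M, N) fusion features -> (B, M, code_bits) real-valued codes H_i
        return torch.tanh(self.fc(fused))

    @torch.no_grad()
    def binarize(self, fused: torch.Tensor) -> torch.Tensor:
        # test-time only: B_i = sgn(H_i); averaging over the M parts first is an assumption
        h = self.forward(fused).mean(dim=1)
        return torch.sign(h)


if __name__ == "__main__":
    f_r = torch.randn(2, 32, 2048)
    hasher = SemanticHashLayer(feat_dim=2048, code_bits=48)
    print(hasher(f_r).shape, hasher.binarize(f_r).shape)   # (2, 32, 48) and (2, 48)
```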
the step 5 specifically comprises the following steps:
5.1: loss of center
In learning the discriminative features in step 1, it is necessary to use the attention map A belonging to the same class for eachkCapable of pointing to similar partial areas of the object, first introducing a loss LctrTo learn each local discriminant feature fkC center of the featurek,LctrPenalizing the variance of features from the same part of different objects with the same class label can be represented by equation (11):
Figure BDA0003407586390000043
wherein c iskIs AkCan be initialized from zero and averaged by moving the average ck=(1-μ)ck+μfkAnd (6) updating. Here, μ control ckUpdate rate of LctrThe loss only applies to the original image:
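A minimal sketch of the center loss L_ctr follows: a moving-average center c_k is kept for each part k and the squared distance between f_k and c_k is penalized, with the update c_k = (1 − μ)c_k + μ f_k as described above. Keeping one center per part (rather than per part and class) and μ = 0.05 are assumptions.

```python
import torch
import torch.nn as nn


class PartCenterLoss(nn.Module):
    def __init__(self, num_parts: int, feat_dim: int, mu: float = 0.05):
        super().__init__()
        self.mu = mu
        self.register_buffer("centers", torch.zeros(num_parts, feat_dim))   # c_k, zero-initialized

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        # parts: (B, M, N) local features f_k from the original image only
        loss = ((parts - self.centers.unsqueeze(0)) ** 2).sum(dim=2).mean()
        with torch.no_grad():
            # moving-average update of the feature centers
            self.centers.mul_(1 - self.mu).add_(self.mu * parts.mean(dim=0))
        return loss


if __name__ == "__main__":
    crit = PartCenterLoss(num_parts=32, feat_dim=2048)
    print(crit(torch.randn(2, 32, 2048)))
```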
5.2: loss of classification
Using cross entropy loss LceTo constrain the prediction classes and real image labels Y*For M original local features fkSimply stack them together and then feed into the SoftMax layer to predict their class probability YoriThe loss can be calculated as:
Figure BDA0003407586390000051
similarly, the discard characteristics f of M are predicted separatelyk dAnd the reconstructed context features f of Mk rClass probability Y ofdropAnd YrecnThe total classification loss is represented by Lori、LdropAnd LrecnComposition, as shown in equation (13):
Lcls=Lori(Yori,Y*)+Ldrop(Ydrop,Y*)+Lrecn(Yrecn,Y*) (13)
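A hedged sketch of this classification loss follows: the M part features are stacked, fed through a linear classifier and the SoftMax cross entropy, and the three terms of equation (13) for the original, dropped and reconstructed features are summed. The classifier shape is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F_


def classification_loss(classifier: nn.Linear,
                        parts_ori: torch.Tensor,
                        parts_drop: torch.Tensor,
                        parts_recn: torch.Tensor,
                        labels: torch.Tensor) -> torch.Tensor:
    """Each parts_* tensor is (B, M, N); labels is (B,)."""
    def ce(parts):
        logits = classifier(parts.flatten(1))          # stack the M features, then SoftMax / cross entropy
        return F_.cross_entropy(logits, labels)
    return ce(parts_ori) + ce(parts_drop) + ce(parts_recn)


if __name__ == "__main__":
    M, N, num_classes = 32, 2048, 200
    clf = nn.Linear(M * N, num_classes)
    loss = classification_loss(clf,
                               torch.randn(2, M, N), torch.randn(2, M, N),
                               torch.randn(2, M, N), torch.tensor([3, 17]))
    print(loss)
```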
5.3: hash loss
Previous deep hashing methods use the Sigmoid function to define the probability function; however, existing hashing methods often lack the ability to concentrate related images within a small Hamming ball, so they may not perform well for Hamming-space retrieval. Therefore, the method uses a Bayesian framework to optimize the quantization loss, with the following probability function:

σ(d(h_i, h_j)) = γ / (γ + d(h_i, h_j))   (14)

where γ is the scale parameter of the Cauchy distribution. When the Hamming distance is small, the function drops rapidly, pulling similar points within a small Hamming radius. For a pair of binary hash codes h_i and h_j, the Hamming distance is:

d(h_i, h_j) = (K − h_i · h_j) / 2   (15)
where K represents the number of bits of the hash code, then the Cauchy quantization loss is derived as:
L_q = Σ_i log(1 + d(|h_i|, 1) / γ)   (16)
In addition to reducing quantization error, the method also takes into account the bit balance property, which means that each bit of the hash code has about a 50% chance of being 1 or −1. To produce more discriminative hash codes, the method adds the bit balance loss:

L_b = (Σ_{k=1}^{K} H_k)^2   (17)

where H is a K-bit hash code. The purpose of the bit balance loss is to generate balanced hash codes; the hash loss L_hash is then given by equation (18):
L_hash = L_q + L_b   (18)
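A hedged sketch of the hash loss follows: the Cauchy quantization term pushes every real-valued code toward {−1, +1}, and the bit balance term pushes the batch mean of every bit toward zero so that each bit is 1 or −1 with roughly equal probability. Both terms are reconstructed from the description above, so treat them as approximations rather than the exact patented formulas.

```python
import torch


def hamming_like_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # d(h_i, h_j) = (K - <h_i, h_j>) / 2 for K-bit codes in {-1, +1}
    k = a.shape[-1]
    return 0.5 * (k - (a * b).sum(dim=-1))


def cauchy_quantization_loss(codes: torch.Tensor, gamma: float = 20.0) -> torch.Tensor:
    # distance of |h_i| to the all-ones vector, wrapped in log(1 + d / gamma)
    d = hamming_like_distance(codes.abs(), torch.ones_like(codes))
    return torch.log1p(d / gamma).mean()


def bit_balance_loss(codes: torch.Tensor) -> torch.Tensor:
    # the mean of every bit over the batch should be close to 0
    return (codes.mean(dim=0) ** 2).sum()


if __name__ == "__main__":
    h = torch.tanh(torch.randn(8, 48))        # a batch of 48-bit real-valued codes
    print(cauchy_quantization_loss(h) + bit_balance_loss(h))
```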
5.4: total loss
Finally, the overall loss of the method is:
L = L_ctr + L_cls + L_q + L_b   (19)
has the advantages that: the invention provides a fine-grained bird image retrieval method based on deep hash and a graph neural network, wherein a global fine-grained feature aggregation module is designed, the module reconstructs distinguishing features by capturing context correlation based on a K-nearest neighbor graph, can learn each discriminant part (such as bird head, wings and the like) of birds, and establishes a relation graph among different parts, thereby constructing fusion features with discriminant; a semantic hash coding module is designed, and the semantic hash coding module generates a hash code with compact semantics under the guidance of Cauchy quantization loss and bit balance loss, so that the storage overhead in practical application is reduced, and the retrieval speed is improved. Practical application shows that the method is low in storage overhead, performance of the method is superior to that of the most advanced general retrieval method and fine-grained retrieval method all the time, and powerful support is provided for reducing human management cost in practical application.
Drawings
FIG. 1 is a flow chart of a retrieval method according to an embodiment of the present invention;
FIG. 2 is a test result chart in an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and specific examples:
in this example, taking bird data sets CUB200-2011 as an example, as shown in fig. 1, a fine-grained bird image retrieval method based on deep hash and a graph neural network includes the following steps:
step 1 data preparation
Experiments are carried out on the fine-grained bird data set CUB200-2011 and compared with other fine-grained retrieval methods. CUB200-2011 comprises 200 bird species and 11,788 images, of which 5,994 are used for training and 5,794 for testing. The method uses the test images as the query set and the training images as the retrieval database. All images are resized to 448 × 448 pixels before being fed into the network.
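A minimal data-preparation sketch follows, assuming CUB200-2011 has been arranged in the torchvision ImageFolder layout (one sub-directory per species); the directory names are hypothetical. The test images serve as the query set and the training images as the retrieval database.

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((448, 448)),       # all images are resized to 448 x 448
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# hypothetical directory layout: <root>/train/<species>/*.jpg and <root>/test/<species>/*.jpg
database_set = datasets.ImageFolder("CUB_200_2011/train", transform=transform)   # 5994 training images
query_set = datasets.ImageFolder("CUB_200_2011/test", transform=transform)       # 5794 test images

database_loader = torch.utils.data.DataLoader(database_set, batch_size=16, shuffle=True, num_workers=4)
query_loader = torch.utils.data.DataLoader(query_set, batch_size=16, shuffle=False, num_workers=4)
```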
Step 2 local feature based node representation
Step 2.1 local feature based node representation
For an image X, the method feeds it into the backbone network ResNet-50 and selects the features of the conv5 layer as its feature F ∈ R^{H×W×N}, where H, W and N represent the height, width and number of channels of the feature, respectively.
Step 2.2 attention map generation
M attention maps are generated for each image (M is set to 32 in this embodiment), where A_k ∈ R^{H×W} denotes the attention map for the kth fine-grained component, which may correspond to, for example, a wing or the head of a bird. The method converts the feature map into attention maps using a 1 × 1 convolution function f(); the attention maps are calculated as shown in formula (1).

A = f(F) = [A_1, A_2, …, A_M]   (1)
Step 2.3 Generation of discriminant features
Through the attention maps of different local regions, discriminative local features are extracted from the parts. The M part features corresponding to these local regions can be calculated by equation (2).

f_k = g(A_k ⊙ F), k = 1, 2, …, M   (2)

where f_k represents the kth local feature, ⊙ represents element-wise multiplication of the feature map F and the kth attention map, and g() is the global average pooling operation.
Step 3 local feature enhancement
Step 3.1 random attention map selection
The method adopts a random dropping strategy and randomly selects one attention map A_k from the M attention maps to force the network to search other information-rich local regions.
Step 3.2 normalizing the selected attention map
Specifically, for each training image, an attention map A_k is first randomly selected from A. To improve the convergence rate of the model, the method adopts min-max normalization to smooth the values of A_k into [0, 1], as shown in formula (3).

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k))   (3)

Here, A_k* denotes the enhanced kth attention map.
Step 3.3 building discard mask
A discard threshold T_d ∈ [0, 1] is set; elements greater than T_d are set to 0 and the other elements are set to 1, constructing the discard mask M_d as shown in equation (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, and M_d(i, j) = 1 otherwise   (4)

where A_k*(i, j) represents the value of the element of the kth attention map at row i and column j, and M_d(i, j) represents the value of the discard mask at the corresponding location. Here, the threshold T_d is set to 0.5;
step 3.4 attention-directed discarding of images and corresponding features
Given the discard mask M_d and the original image, the new attention-dropped image X_d is obtained by multiplying them; it is then fed into the network again to learn M new part features f_k^d.
Step 4, mining related component relation based on graph convolution
Step 4.1 defining edge characteristics of graph
Through step 3, M local discriminative features f_1, f_2, …, f_M are obtained. The method constructs a directed graph G = (V, E) to capture the context between these discrete local features, with vertices V = {1, 2, …, M} and edges E ⊆ V × V.
The edge feature from the ith vertex to the jth vertex can be defined as:

e_ij = h_{θ;σ}(f_i, f_i − f_j)   (5)

h_{θ;σ}() is an asymmetric edge function implemented by a shared MLP, as shown in equation (6):

h_{θ;σ}(f_i, f_i − f_j) = ReLU(θ·f_i + σ·(f_i − f_j))   (6)
where theta and sigma represent parameters of the network.
Step 4.2 generating fusion features
As shown in equation (6), the neighborhood information captured by f_i − f_j for different nodes is gradually combined with the global information captured by f_i. Similar to graph convolution, the output of the ith vertex can be obtained by performing a global max pooling operation over all edge features associated with the ith vertex, as in equation (7).

f_i^r = GMP_{j:(i,j)∈E}(e_ij)   (7)

where f_i^r represents the fusion feature of the ith vertex and GMP represents the global max pooling operation. Finally, the fusion features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.
Step 5 semantic Hash coding
Step 5.1 Hash code Generation
For the M reconstructed context features f_i^r, the semantic hash coding module outputs a B-bit hash code, which can be calculated according to formula (8).

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M   (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, δ_H ∈ R^B and W_H ∈ R^{M×B} respectively represent the bias and weight of the hash layer, and tanh() represents the activation function, which can be described by equation (9).

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))   (9)
Step 5.2 Mapping real-valued hash codes to binary hash codes
According to equation (10), the final hash code can be obtained. Since the value range of tanh() is [−1, 1], B_i = 1 if H_i > 0 and B_i = −1 otherwise. Because sign() has zero gradient at non-zero points, which causes the gradient-vanishing problem, the method maps H_i to a binary hash code only at test time.

B_i = sgn(H_i)   (10)
Specifically, for the feature extraction and hash coding in the above steps, a loss function is constructed so that the network gradually converges, as follows:
step 6 loss function
6.1 center loss
When learning the discriminative features in step 2, each attention map A_k belonging to the same category should point to similar part regions of the object. The method first introduces a center loss L_ctr to learn a feature center c_k for each local discriminative feature f_k. L_ctr penalizes the variance of features from the same part of different objects with the same class label and can be represented by equation (11).

L_ctr = Σ_{k=1}^{M} ‖f_k − c_k‖_2^2   (11)

where c_k is the feature center of A_k, which can be initialized from zero and updated by the moving average c_k = (1 − μ)c_k + μ f_k; here μ controls the update rate of c_k. The L_ctr loss only applies to the original image.
6.2 loss of Classification
The method adopts a cross-entropy loss L_ce to constrain the distance between the predicted classes and the ground-truth image labels Y*. The M original local features f_k are simply stacked together and then fed into the SoftMax layer to predict their class probability Y_ori; the loss can be calculated as:

L_ori = L_ce(Y_ori, Y*) = −Σ Y* log(Y_ori)   (12)

Similarly, the method also predicts the class probabilities Y_drop and Y_recn of the M dropped features f_k^d and the M reconstructed context features f_k^r respectively. The total classification loss is composed of L_ori, L_drop and L_recn, as shown in equation (13):

L_cls = L_ori(Y_ori, Y*) + L_drop(Y_drop, Y*) + L_recn(Y_recn, Y*)   (13)
6.3 Hash loss
Previous deep hashing methods use the Sigmoid function to define the probability function; however, existing hashing methods often lack the ability to concentrate related images within a small Hamming ball, so they may not perform well for Hamming-space retrieval. Therefore, the method uses a Bayesian framework to optimize the quantization loss, with the following probability function:

σ(d(h_i, h_j)) = γ / (γ + d(h_i, h_j))   (14)

where γ is the scale parameter of the Cauchy distribution. When the Hamming distance is small, the function drops rapidly, pulling similar points within a small Hamming radius. For a pair of binary hash codes h_i and h_j, the Hamming distance is:

d(h_i, h_j) = (K − h_i · h_j) / 2   (15)
and K is the bit number of the hash code. Then, the Cauchy quantization loss is derived as:
L_q = Σ_i log(1 + d(|h_i|, 1) / γ)   (16)
In addition to reducing quantization error, the method also takes into account the bit balance property, which means that each bit of the hash code has about a 50% chance of being 1 or −1. To produce more discriminative hash codes, the method adds the bit balance loss:

L_b = (Σ_{k=1}^{K} H_k)^2   (17)

where H is a K-bit hash code. The purpose of the bit balance loss is to generate unbiased, informative hash codes. The hash loss L_hash is then given by equation (18):

L_hash = L_q + L_b   (18)
6.4 Total loss
Finally, the overall loss of the method is:
L = L_ctr + L_cls + L_q + L_b   (19)
Step 7 training and testing of the fine-grained bird model
7.1 bird dataset training
The method is implemented in PyTorch. The model is trained on 2 RTX 2080Ti GPUs with input images preprocessed to 448 × 448. In all experiments, ResNet-50 is used as the backbone network to extract features, and the output of the conv5 layer is selected as the feature map. For attention map generation, the attention maps are obtained by a 1 × 1 convolution kernel, M defaults to 32, and the discard threshold T_d is set to 0.5.
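A minimal sketch of this setup follows: a torchvision ResNet-50 truncated after conv5 (layer4), applied to a 448 × 448 input, yields the 2048-channel, 14 × 14 feature map from which the M = 32 attention maps are produced by a 1 × 1 convolution. The truncation point and weight handling follow standard torchvision usage and are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)        # ImageNet-pretrained weights would normally be loaded here
conv5_extractor = nn.Sequential(*list(backbone.children())[:-2])   # drop avgpool and fc, keep conv5 (layer4)
attn_conv = nn.Conv2d(2048, 32, kernel_size=1)                     # M = 32 attention maps via 1 x 1 convolution

x = torch.randn(1, 3, 448, 448)                 # one preprocessed 448 x 448 image
with torch.no_grad():
    feat = conv5_extractor(x)                    # conv5 feature map F
    attn = attn_conv(feat)                       # attention maps A
print(feat.shape, attn.shape)                    # torch.Size([1, 2048, 14, 14]) torch.Size([1, 32, 14, 14])
```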
7.2 evaluation index
The method uses mean average precision (mAP) as the evaluation index for fair comparison with prior methods, and calculates the retrieval precision as follows:
mAP = (1/n_q) Σ_{q=1}^{n_q} (1/N^+) Σ_{k=1}^{n} pos(k) · (N_k^+ / k)
where n_q represents the number of samples in the query set and n represents the number of returned samples. N^+ represents the number of positive samples among the n returned samples, and N_k^+ refers to the number of positive samples among the first k returned samples. pos(k) is 1 if the kth image is a positive sample, and 0 otherwise.
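A hedged sketch of this evaluation follows: for each query, the database is ranked by Hamming distance between binary codes and the precision values N_k^+ / k are averaged over the positive positions. The helper names are hypothetical.

```python
import torch


def mean_average_precision(query_codes, query_labels, db_codes, db_labels, top_n=None):
    """Codes are (n, K) tensors of +/-1; labels are (n,) class ids."""
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        dist = 0.5 * (db_codes.shape[1] - db_codes @ q_code)        # Hamming distance to every database code
        order = dist.argsort()
        if top_n is not None:
            order = order[:top_n]
        hits = (db_labels[order] == q_label).float()                # pos(k)
        if hits.sum() == 0:
            aps.append(torch.tensor(0.0))
            continue
        ranks = torch.arange(1, len(hits) + 1, dtype=torch.float)
        precision_at_k = hits.cumsum(0) / ranks                     # N_k^+ / k
        aps.append((precision_at_k * hits).sum() / hits.sum())
    return torch.stack(aps).mean()


if __name__ == "__main__":
    q = torch.sign(torch.randn(10, 48))
    db = torch.sign(torch.randn(100, 48))
    print(mean_average_precision(q, torch.randint(0, 5, (10,)), db, torch.randint(0, 5, (100,))))
```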
7.3 model test
All test samples were evaluated, and the method achieves 86.92% mAP with 48-bit hash codes. As shown in FIG. 2, two bird pictures were selected and the top 10 images were retrieved for each, with 90% and 80% accuracy respectively. For birds with small inter-class differences, the method shows high-precision retrieval results that meet user expectations at a code length of 48 bits.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A fine-grained bird image retrieval method based on a graph neural network and deep hash is characterized by comprising the following steps:
step 1, sending the image to the backbone network ResNet-50 to obtain the feature F ∈ R^{H×W×N} of the image, wherein H, W and N represent the height, width and number of channels of the feature, respectively; converting the obtained feature map into attention maps, and extracting discriminative local features through the attention maps of different local regions;
step 2, generating an attention-dropped image from the attention maps generated in step 1;
step 3, constructing a graph from the part features extracted in step 2, mining the relations between parts, and obtaining fusion features;
step 4, obtaining hash codes through a hash layer from the fusion features obtained in step 3;
and step 5, constructing a loss function for the feature extraction and hash coding in the above steps so that the network gradually converges.
2. The fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 1, wherein a convolution function f() is used in step 1 to convert the feature map into attention maps, and the attention maps are calculated as shown in formula (1):
A = f(F) = [A_1, A_2, …, A_M]   (1)
3. The fine-grained bird image retrieval method based on graph neural network and deep hash according to claim 1 or 2, characterized in that M attention maps are generated for each image in step 1, where A_k ∈ R^{H×W} is expressed as the attention map for the kth fine-grained component;
the M part features of the local regions are calculated by equation (2):

f_k = g(A_k ⊙ F), k = 1, 2, …, M   (2)

where f_k represents the kth local feature, ⊙ represents element-wise multiplication of the feature map F and the kth attention map, and g() is the global average pooling operation.
4. The fine-grained bird image retrieval method based on the graph neural network and the deep hash as claimed in claim 1, wherein the step 2 specifically comprises the following steps:
step 2.1: random attention map selection
for the M learned spatial attention maps, a random dropping strategy is carried out, and one attention map A_k is randomly selected from the M attention maps to force the network to search other information-rich local regions;
step 2.2: normalizing selected attention maps
for each training image, an attention map A_k is first randomly selected from A; to improve the convergence rate of the model, min-max normalization is adopted to smooth the values of A_k into [0, 1], as shown in equation (3):

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k))   (3)

A_k* represents the enhanced kth attention map;
step 2.3: building discard masks
a discard threshold T_d ∈ [0, 1] is set; elements greater than T_d are set to 0 and the other elements are set to 1, constructing the discard mask M_d as shown in equation (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, and M_d(i, j) = 1 otherwise   (4)

where A_k*(i, j) represents the value of the element of the kth attention map at row i and column j, M_d(i, j) represents the value of the discard mask at the corresponding location, and the threshold T_d is set to 0.5;
step 2.4: attention-directed discarding of images and corresponding features
the new attention-dropped image X_d is obtained by multiplying the discard mask M_d with the original image, and is sent into the network again to learn M new part features f_k^d;
through this set of formulas, the attention maps are encouraged to propose other discriminative parts, ultimately improving localization accuracy and feature quality.
5. The fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 1, wherein step 3 specifically comprises the following steps:
step 3.1: defining edge characteristics of a graph
M local discriminative features f_1, f_2, …, f_M are obtained through step 2; a directed graph G = (V, E) is constructed to capture the context between these discrete local features, with vertices V = {1, 2, …, M} and edges E ⊆ V × V;
the edge feature from the ith vertex to the jth vertex can be defined as:

e_ij = h_{θ;σ}(f_i, f_i − f_j)   (5)

h_{θ;σ}() is an asymmetric edge function implemented by a shared MLP, as shown in equation (6):

h_{θ;σ}(f_i, f_i − f_j) = ReLU(θ·f_i + σ·(f_i − f_j))   (6)
where θ and σ represent parameters of the network;
step 3.2: generating fusion features
as shown in equation (6), the neighborhood information captured by f_i − f_j for different nodes is gradually combined with the global information captured by f_i; similar to graph convolution, the output of the ith vertex can be obtained by performing a global max pooling operation over all edge features associated with the ith vertex, as in equation (7):

f_i^r = GMP_{j:(i,j)∈E}(e_ij)   (7)

where f_i^r represents the fusion feature of the ith vertex and GMP represents the global max pooling operation; finally, the fusion features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.
6. The fine-grained bird image retrieval method based on graph neural network and deep hash according to claim 1, wherein step 4 specifically comprises the following steps:
step 4.1: hash code generation
for the M reconstructed context features f_i^r, the semantic hash coding module outputs a B-bit hash code, which can be calculated according to formula (8):

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M   (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, δ_H ∈ R^B and W_H ∈ R^{M×B} respectively represent the bias and weight of the hash layer, and tanh() represents the activation function, which can be described by equation (9):

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))   (9)
step 4.2: mapping of real-valued hash codes to two-dimensional hash codes
The final hash code can be obtained according to the formula (10), since the value range of tanh () is [ -1,1]If H is presenti>>0,Bi1, otherwise BiSince sign () has zero at a non-zero point, which causes a problem of gradient vanishing, the method will only test the test pattern with HiMapping to two dimensionsThe hash code is a code of a hash of the code,
Bi=sgn(Hi) (10)。
7. the fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 1, wherein step 5 specifically comprises the following steps:
5.1: loss of center
In learning the discriminative features in step 1, it is necessary to use the attention map A belonging to the same class for eachkCapable of pointing to similar partial areas of the object, first introducing a loss LctrTo learn each local discriminant feature fkC center of the featurek,LctrPenalizing the variance of features from the same part of different objects with the same class label, represented by equation (11):
Figure FDA0003407586380000033
wherein c iskIs AkIs initialized from zero and averaged by moving the average ck=(1-μ)ck+μfkUpdate, μ control ckThe update rate of (d);
5.2: loss of classification
Using cross entropy loss LceTo constrain the prediction classes and real image labels Y*For M original local features fkStacking them together, then feeding them into SoftMax layer, predicting their class probability YoriThe loss can be calculated as:
Figure FDA0003407586380000041
similarly, the discard characteristics f of M are predicted separatelyk dAnd the reconstructed context features f of Mk rClass probability Y ofdropAnd YrecnTotal classification lossFrom Lori、LdropAnd LrecnComposition, as shown in equation (13):
Lcls=Lori(Yori,Y*)+Ldrop(Ydrop,Y*)+Lrecn(Yrecn,Y*) (13)
5.3: hash loss
the quantization loss is optimized using a Bayesian framework, with the following probability function:

σ(d(h_i, h_j)) = γ / (γ + d(h_i, h_j))   (14)

where γ is the scale parameter of the Cauchy distribution; when the Hamming distance is small, the function drops rapidly, pulling similar points within a small Hamming radius; for a pair of binary hash codes h_i and h_j, the Hamming distance is:

d(h_i, h_j) = (K − h_i · h_j) / 2   (15)

where K represents the number of bits of the hash code; the Cauchy quantization loss is then derived as:

L_q = Σ_i log(1 + d(|h_i|, 1) / γ)   (16)
in addition to reducing quantization error, the bit balance property is also taken into account, which means that each bit of the hash code has about a 50% chance of being 1 or −1; to produce more discriminative hash codes, the bit balance loss is added:

L_b = (Σ_{k=1}^{K} H_k)^2   (17)

the purpose of the bit balance loss is to generate balanced hash codes; the hash loss L_hash is then given by equation (18):

L_hash = L_q + L_b   (18)
5.4: total loss
The final overall loss is:
L = L_ctr + L_cls + L_q + L_b   (19).
8. The fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 7, wherein the L_ctr loss only applies to the original image.
CN202111521433.8A 2021-12-13 2021-12-13 Fine-grained bird image retrieval method based on graph neural network and deep hash Pending CN114329031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111521433.8A CN114329031A (en) 2021-12-13 2021-12-13 Fine-grained bird image retrieval method based on graph neural network and deep hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111521433.8A CN114329031A (en) 2021-12-13 2021-12-13 Fine-grained bird image retrieval method based on graph neural network and deep hash

Publications (1)

Publication Number Publication Date
CN114329031A true CN114329031A (en) 2022-04-12

Family

ID=81051096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111521433.8A Pending CN114329031A (en) 2021-12-13 2021-12-13 Fine-grained bird image retrieval method based on graph neural network and deep hash

Country Status (1)

Country Link
CN (1) CN114329031A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964527A (en) * 2023-01-05 2023-04-14 北京东方通网信科技有限公司 Label representation construction method for single label image retrieval
CN115964527B (en) * 2023-01-05 2023-09-26 北京东方通网信科技有限公司 Label characterization construction method for single-label image retrieval
CN116563607A (en) * 2023-04-11 2023-08-08 北京邮电大学 Fine granularity image recognition method and device based on cross-dataset information mining

Similar Documents

Publication Publication Date Title
CN108830296B (en) Improved high-resolution remote sensing image classification method based on deep learning
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110689086B (en) Semi-supervised high-resolution remote sensing image scene classification method based on generating countermeasure network
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
Shen et al. Generative adversarial learning towards fast weakly supervised detection
CN114241282B (en) Knowledge distillation-based edge equipment scene recognition method and device
CN106909924B (en) Remote sensing image rapid retrieval method based on depth significance
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
Lin et al. RSCM: Region selection and concurrency model for multi-class weather recognition
CN108875816A (en) Merge the Active Learning samples selection strategy of Reliability Code and diversity criterion
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN109886072B (en) Face attribute classification system based on bidirectional Ladder structure
CN111583263A (en) Point cloud segmentation method based on joint dynamic graph convolution
CN111612051B (en) Weak supervision target detection method based on graph convolution neural network
CN112307995A (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
CN112434628B (en) Small sample image classification method based on active learning and collaborative representation
Islam et al. InceptB: a CNN based classification approach for recognizing traditional bengali games
CN109784288B (en) Pedestrian re-identification method based on discrimination perception fusion
CN104318271B (en) Image classification method based on adaptability coding and geometrical smooth convergence
CN109165698A (en) A kind of image classification recognition methods and its storage medium towards wisdom traffic
CN109800314A (en) A method of generating the Hash codes for being used for image retrieval using depth convolutional network
Li et al. Image decomposition with multilabel context: Algorithms and applications
CN110503090A (en) Character machining network training method, character detection method and character machining device based on limited attention model
CN116543269B (en) Cross-domain small sample fine granularity image recognition method based on self-supervision and model thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination