CN114329031A - Fine-grained bird image retrieval method based on graph neural network and deep hash - Google Patents

Fine-grained bird image retrieval method based on graph neural network and deep hash

Info

Publication number
CN114329031A
Authority
CN
China
Prior art keywords
hash
attention
fine-grained
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111521433.8A
Other languages
Chinese (zh)
Inventor
孙涵
郎文溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202111521433.8A priority Critical patent/CN114329031A/en
Publication of CN114329031A publication Critical patent/CN114329031A/en
Pending legal-status Critical Current

Abstract

The invention discloses a fine-grained bird image retrieval method based on a graph neural network and deep hashing, belonging to the field of fine-grained image retrieval. The method comprises: node representation based on local features, local feature enhancement, related-component relation mining based on graph convolution, semantic hash coding, and a loss function. By combining a graph neural network with deep hashing, the method is suitable for large-scale bird image retrieval and achieves fine-grained bird image retrieval with high efficiency, low storage and high precision.

Description

Fine-grained bird image retrieval method based on graph neural network and deep hash
Technical Field
The invention belongs to the field of fine-grained image retrieval, and particularly relates to a fine-grained bird image retrieval method based on a graph neural network and deep hash.
Background
At the present stage, monitoring of birds in nature reserves and national wetlands serves as an important index for evaluating biodiversity and the ecological environment; for farmland and airport areas, bird monitoring affects farmers' income and the normal operation of airports. Because birds are abundant in nature, bird experts need a way to quickly and accurately retrieve the correct bird from a large-scale data set. Retrieval of bird images differs from general image retrieval in two respects between database images and query images: (1) inter-class differences are small and often lie in subtle regions, such as a bird's head or tail; (2) intra-class differences are large, since different images of the same bird species vary greatly due to factors such as illumination and posture. At present, birds are usually identified manually by experts, which is costly and error-prone. Therefore, facing large-scale bird data sets, realizing a low-cost, low-storage and efficient fine-grained bird retrieval technique with computer vision is of great significance to industry and agriculture.
In recent years, fine-grained image retrieval has emerged as a topic in the field of computer vision. In the early days, Xie et al. calculated the similarity between images based on hand-crafted features: during retrieval, the coarse class of the image is judged first, and then fine-grained retrieval is carried out. With the development of convolutional neural networks, fine-grained retrieval methods based on deep learning have been proposed. These deep methods can be roughly classified into supervised and unsupervised methods. Among unsupervised retrieval methods, the SCDA (selective convolutional descriptor aggregation) method first locates the object in the fine-grained image and retains the useful deep descriptors for fine-grained image retrieval. However, this method uses a pre-trained model as a fixed feature extractor, does not customize a network to learn the corresponding fine-grained features, and does not consider the structural relationship between fine-grained parts for the extracted convolutional descriptors, so its retrieval precision is moderate.
In the supervised setting, fine-grained image retrieval is defined as a deep metric learning problem. Zheng et al. proposed the CRL-WSL method, a unified framework that learns discriminative features efficiently with a centralized ranking loss and weakly supervised localization of salient regions on the target contour. Zheng et al. also proposed the DCL-NC method, which improves on CRL-WSL by adding a normalize-scale layer and a decorrelated ranking loss and achieves higher retrieval accuracy. However, both methods use 1024-dimensional codes; such a high coding dimensionality leads to slow queries and storage redundancy in practical large-scale image retrieval. Therefore, realizing efficient, high-precision fine-grained image retrieval with low-storage code lengths is a major current concern.
Disclosure of Invention
The invention provides a fine-grained bird image retrieval method based on deep hash and a graph neural network, which can realize fine-grained bird image retrieval with high efficiency, low storage and high precision.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
a fine-grained bird image retrieval method based on a graph neural network and deep hashing comprises the following steps:
step 1, sending the image to the backbone network ResNet-50 to obtain the feature F ∈ R^{H×W×N} of the image, wherein H, W and N represent the height, width and number of channels of the feature, respectively; converting the obtained feature map into attention maps, and extracting discriminative local features through the attention maps of different local regions;
step 2, generating an attention-dropped image from the attention maps generated in step 1;
step 3, constructing a graph from the part features extracted in step 2, mining the relations between parts, and obtaining fusion features;
step 4, obtaining hash codes through a hash layer from the fusion features obtained in step 3;
and step 5, constructing a loss function for the feature extraction and hash coding in the above steps so that the network gradually converges.
In the above steps, a convolution function f () is used in step 1 to convert the feature map into an attention map, and the calculation method of the attention map is shown as formula (1):
A = f(F) = [A_1, A_2, …, A_M]   (1)

M attention maps are generated for each image, where A_k ∈ R^{H×W} denotes the attention map corresponding to the kth fine-grained component;
the M-part feature of the local region is calculated by equation (2):
fk=g(Ak⊙F),k=1,2,…,M (2)
wherein f iskRepresents the kth local feature, which represents the multiplication of the elements of feature map F and the kth attention map, and g () is the global average pooling operation;
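The following is a minimal PyTorch sketch of this step, shown only for illustration: the backbone feature F is mapped to M attention maps by a 1 × 1 convolution, and each part feature f_k = g(A_k ⊙ F) is obtained by element-wise weighting followed by global average pooling. The module name AttentionPartPooling, the ReLU on the attention maps and the tensor shapes are assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn


class AttentionPartPooling(nn.Module):
    def __init__(self, in_channels: int, num_parts: int = 32):
        super().__init__()
        # f(): 1x1 convolution that maps the feature map to M attention maps
        self.attn_conv = nn.Conv2d(in_channels, num_parts, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        # feat: (B, N, H, W) backbone feature F
        attn = torch.relu(self.attn_conv(feat))            # (B, M, H, W) attention maps A_k
        B, M, H, W = attn.shape
        # f_k = g(A_k * F): element-wise product followed by global average pooling
        parts = torch.einsum('bmhw,bnhw->bmn', attn, feat) / (H * W)   # (B, M, N)
        return attn, parts


if __name__ == "__main__":
    backbone_feat = torch.randn(2, 2048, 14, 14)   # e.g. ResNet-50 conv5 output for a 448x448 input
    module = AttentionPartPooling(in_channels=2048, num_parts=32)
    attn_maps, part_feats = module(backbone_feat)
    print(attn_maps.shape, part_feats.shape)        # (2, 32, 14, 14) and (2, 32, 2048)
```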
the step 2 specifically comprises the following steps:
step 2.1: random attention-force diagram selection
For the M learned spatial attention maps, a common phenomenon is that a plurality of attention maps can be concentrated on similar areas of an object, which greatly inhibits the diversity of discriminative features, a random discarding strategy is performed, and an attention map A is randomly selected from the M attention mapskTo force the network to search for other local areas rich in information;
step 2.2: normalizing selected attention maps
Specifically, for each training image, an attention map A_k is first randomly selected from A. To improve the convergence rate of the model, min-max normalization is adopted to smooth the values of A_k into [0, 1], as shown in formula (3):

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k))   (3)

A_k* represents the enhanced kth attention map;
step 2.3: building discard masks
A discard threshold T_d ∈ [0, 1] is set; elements greater than T_d are set to 0 and the other elements are set to 1, constructing the discard mask M_d as shown in equation (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, and M_d(i, j) = 1 otherwise   (4)

where A_k*(i, j) represents the value of the element of the kth attention map at row i and column j, M_d(i, j) represents the value of the discard mask at the corresponding location, and the threshold T_d is set to 0.5;
step 2.4: attention-directed discarding of images and corresponding features
Given the discard mask M_d and the original image, the new attention-dropped image X_d is obtained by multiplying them; X_d is then fed into the network again to learn M new part features f_k^d. Through this group of formulas, the attention maps are encouraged to propose other discriminative parts, which ultimately improves localization accuracy and feature quality;
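A hedged sketch of steps 2.1 to 2.4 follows: one attention map is randomly selected per image, min-max normalized, thresholded at T_d = 0.5 to build the discard mask M_d, and multiplied into the input image to produce X_d. Upsampling the mask to image resolution is an assumption made here so that the shapes match.

```python
import torch
import torch.nn.functional as F_


def attention_drop(image: torch.Tensor, attn: torch.Tensor, t_d: float = 0.5) -> torch.Tensor:
    """image: (B, 3, H_img, W_img), attn: (B, M, H, W) attention maps."""
    B, M, H, W = attn.shape
    # step 2.1: randomly select one attention map per image
    idx = torch.randint(0, M, (B,))
    a_k = attn[torch.arange(B), idx]                              # (B, H, W)
    # step 2.2: min-max normalize A_k into [0, 1]
    a_min = a_k.flatten(1).min(dim=1).values.view(B, 1, 1)
    a_max = a_k.flatten(1).max(dim=1).values.view(B, 1, 1)
    a_star = (a_k - a_min) / (a_max - a_min + 1e-8)
    # step 2.3: discard mask M_d is 0 where the response exceeds T_d, 1 elsewhere
    mask = (a_star <= t_d).float().unsqueeze(1)                   # (B, 1, H, W)
    # step 2.4: upsample the mask and erase the most responsive region
    mask = F_.interpolate(mask, size=image.shape[-2:], mode='nearest')
    return image * mask


if __name__ == "__main__":
    img = torch.randn(2, 3, 448, 448)
    attn = torch.rand(2, 32, 14, 14)
    x_d = attention_drop(img, attn)
    print(x_d.shape)   # torch.Size([2, 3, 448, 448])
```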
the step 3 specifically comprises the following steps:
step 3.1: defining edge characteristics of a graph
M local discriminative features f_1, f_2, …, f_M are obtained through step 2. A directed graph G = (V, E) is constructed to capture the context between these discrete local features, with vertices V = {1, 2, …, M} and edges E ⊆ V × V.
The edge feature from the ith vertex to the jth vertex can be defined as:

e_ij = h_{θ;σ}(f_i, f_i − f_j)   (5)

h_{θ;σ}() is an asymmetric edge function implemented by a shared MLP, as shown in equation (6):

h_{θ;σ}(f_i, f_i − f_j) = ReLU(θ·f_i + σ·(f_i − f_j))   (6)
where θ and σ represent parameters of the network;
step 3.2: generating fusion features
As shown in equation (6), the neighborhood information captured by f_i − f_j for different nodes is gradually combined with the global information captured by f_i. Similar to graph convolution, the output of the ith vertex can be obtained by performing a global max pooling operation over all edge features associated with the ith vertex, as shown in equation (7):

f_i^r = GMP_{j:(i,j)∈E}(e_ij)   (7)

where f_i^r represents the fusion feature of the ith vertex and GMP represents the global max pooling operation. Finally, the fusion features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.
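A minimal sketch of equations (5) to (7) follows: the edge feature e_ij is computed by a shared MLP acting on f_i and f_i − f_j, and the fusion feature f_i^r is the global max pooling over the edge features of vertex i. A fully connected directed graph over the M part features is assumed here; the abstract mentions a k-nearest-neighbor graph, which would simply restrict the pooling to the k nearest vertices.

```python
import torch
import torch.nn as nn


class PartGraphFusion(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        # shared MLP h_{theta;sigma}: theta acts on f_i, sigma on (f_i - f_j)
        self.theta = nn.Linear(feat_dim, feat_dim, bias=False)
        self.sigma = nn.Linear(feat_dim, feat_dim, bias=False)

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        # parts: (B, M, N) local discriminative features f_1 ... f_M
        f_i = parts.unsqueeze(2)                                        # (B, M, 1, N)
        f_j = parts.unsqueeze(1)                                        # (B, 1, M, N)
        edges = torch.relu(self.theta(f_i) + self.sigma(f_i - f_j))     # (B, M, M, N) edge features e_ij
        fused = edges.max(dim=2).values                                 # global max pooling over neighbours j
        return fused                                                    # (B, M, N) fusion features f^r


if __name__ == "__main__":
    parts = torch.randn(2, 32, 2048)
    fusion = PartGraphFusion(feat_dim=2048)
    f_r = fusion(parts)
    print(f_r.shape)   # torch.Size([2, 32, 2048])
```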
step 4.1: hash code generation
For the M reconstructed context features f_i^r, the semantic hash coding module outputs a B-bit hash code, which can be calculated according to formula (8):

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M   (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, δ_H ∈ R^B and W_H ∈ R^{M×B} respectively represent the bias and weight of the hash layer, and tanh() represents the activation function, which can be described by equation (9):

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))   (9)
step 4.2: mapping of real-valued hash codes to two-dimensional hash codes
The final hash code can be obtained according to the formula (10), since the value range of tanh () is [ -1,1]If H is presenti>>0,Bi1, otherwise BiSince sign () has zero at a non-zero point, which causes a problem of gradient vanishing, the method will only test the test pattern with HiThe mapping is a two-dimensional hash code,
Bi=sgn(Hi) (10);
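A hedged sketch of the semantic hash layer follows: each fusion feature f_i^r is mapped to a B-bit real-valued code with tanh((W_H)^T f_i^r + δ_H), and only at test time is the code binarized with sgn(). Averaging the M per-part codes into one image-level code is an assumption; the patent does not state how the M codes are merged.

```python
import torch
import torch.nn as nn


class SemanticHashLayer(nn.Module):
    def __init__(self, feat_dim: int, code_bits: int = 48):
        super().__init__()
        self.fc = nn.Linear(feat_dim, code_bits)    # holds the weight W_H and bias delta_H

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        # fused: (B, M, N) fusion features -> (B, M, code_bits) real-valued codes H_i
        return torch.tanh(self.fc(fused))

    @torch.no_grad()
    def binarize(self, fused: torch.Tensor) -> torch.Tensor:
        # test-time only: B_i = sgn(H_i); averaging over the M parts first is an assumption
        h = self.forward(fused).mean(dim=1)
        return torch.sign(h)


if __name__ == "__main__":
    f_r = torch.randn(2, 32, 2048)
    hasher = SemanticHashLayer(feat_dim=2048, code_bits=48)
    print(hasher(f_r).shape, hasher.binarize(f_r).shape)   # (2, 32, 48) and (2, 48)
```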
the step 5 specifically comprises the following steps:
5.1: loss of center
In learning the discriminative features in step 1, it is necessary to use the attention map A belonging to the same class for eachkCapable of pointing to similar partial areas of the object, first introducing a loss LctrTo learn each local discriminant feature fkC center of the featurek,LctrPenalizing the variance of features from the same part of different objects with the same class label can be represented by equation (11):
Figure BDA0003407586390000043
wherein c iskIs AkCan be initialized from zero and averaged by moving the average ck=(1-μ)ck+μfkAnd (6) updating. Here, μ control ckUpdate rate of LctrThe loss only applies to the original image:
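A minimal sketch of the center loss L_ctr follows: a moving-average center c_k is kept for each part k and the squared distance between f_k and c_k is penalized, with the update c_k = (1 − μ)c_k + μ f_k as described above. Keeping one center per part (rather than per part and class) and μ = 0.05 are assumptions.

```python
import torch
import torch.nn as nn


class PartCenterLoss(nn.Module):
    def __init__(self, num_parts: int, feat_dim: int, mu: float = 0.05):
        super().__init__()
        self.mu = mu
        self.register_buffer("centers", torch.zeros(num_parts, feat_dim))   # c_k, zero-initialized

    def forward(self, parts: torch.Tensor) -> torch.Tensor:
        # parts: (B, M, N) local features f_k from the original image only
        loss = ((parts - self.centers.unsqueeze(0)) ** 2).sum(dim=2).mean()
        with torch.no_grad():
            # moving-average update of the feature centers
            self.centers.mul_(1 - self.mu).add_(self.mu * parts.mean(dim=0))
        return loss


if __name__ == "__main__":
    crit = PartCenterLoss(num_parts=32, feat_dim=2048)
    print(crit(torch.randn(2, 32, 2048)))
```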
5.2: loss of classification
Using cross entropy loss LceTo constrain the prediction classes and real image labels Y*For M original local features fkSimply stack them together and then feed into the SoftMax layer to predict their class probability YoriThe loss can be calculated as:
Figure BDA0003407586390000051
similarly, the discard characteristics f of M are predicted separatelyk dAnd the reconstructed context features f of Mk rClass probability Y ofdropAnd YrecnThe total classification loss is represented by Lori、LdropAnd LrecnComposition, as shown in equation (13):
Lcls=Lori(Yori,Y*)+Ldrop(Ydrop,Y*)+Lrecn(Yrecn,Y*) (13)
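A hedged sketch of this classification loss follows: the M part features are stacked, fed through a linear classifier and the SoftMax cross entropy, and the three terms of equation (13) for the original, dropped and reconstructed features are summed. The classifier shape is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F_


def classification_loss(classifier: nn.Linear,
                        parts_ori: torch.Tensor,
                        parts_drop: torch.Tensor,
                        parts_recn: torch.Tensor,
                        labels: torch.Tensor) -> torch.Tensor:
    """Each parts_* tensor is (B, M, N); labels is (B,)."""
    def ce(parts):
        logits = classifier(parts.flatten(1))          # stack the M features, then SoftMax / cross entropy
        return F_.cross_entropy(logits, labels)
    return ce(parts_ori) + ce(parts_drop) + ce(parts_recn)


if __name__ == "__main__":
    M, N, num_classes = 32, 2048, 200
    clf = nn.Linear(M * N, num_classes)
    loss = classification_loss(clf,
                               torch.randn(2, M, N), torch.randn(2, M, N),
                               torch.randn(2, M, N), torch.tensor([3, 17]))
    print(loss)
```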
5.3: hash loss
Previous deep hashing methods use the Sigmoid function to define the probability function; however, existing hashing methods often lack the ability to concentrate related images within a small Hamming ball, so they may not perform well for Hamming-space retrieval. Therefore, the method uses a Bayesian framework to optimize the quantization loss, with the following probability function:

σ(d(h_i, h_j)) = γ / (γ + d(h_i, h_j))   (14)

where γ is the scale parameter of the Cauchy distribution. When the Hamming distance is small, the function drops rapidly, pulling similar points within a small Hamming radius. For a pair of binary hash codes h_i and h_j, the Hamming distance is:

d(h_i, h_j) = (K − h_i · h_j) / 2   (15)
where K represents the number of bits of the hash code, then the Cauchy quantization loss is derived as:
L_q = Σ_i log(1 + d(|h_i|, 1) / γ)   (16)
In addition to reducing quantization error, the method also takes into account the bit balance property, which means that each bit of the hash code has about a 50% chance of being 1 or −1. To produce more discriminative hash codes, the method adds the bit balance loss:

L_b = (Σ_{k=1}^{K} H_k)^2   (17)

where H is a K-bit hash code. The purpose of the bit balance loss is to generate balanced hash codes; the hash loss L_hash is then given by equation (18):
L_hash = L_q + L_b   (18)
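A hedged sketch of the hash loss follows: the Cauchy quantization term pushes every real-valued code toward {−1, +1}, and the bit balance term pushes the batch mean of every bit toward zero so that each bit is 1 or −1 with roughly equal probability. Both terms are reconstructed from the description above, so treat them as approximations rather than the exact patented formulas.

```python
import torch


def hamming_like_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # d(h_i, h_j) = (K - <h_i, h_j>) / 2 for K-bit codes in {-1, +1}
    k = a.shape[-1]
    return 0.5 * (k - (a * b).sum(dim=-1))


def cauchy_quantization_loss(codes: torch.Tensor, gamma: float = 20.0) -> torch.Tensor:
    # distance of |h_i| to the all-ones vector, wrapped in log(1 + d / gamma)
    d = hamming_like_distance(codes.abs(), torch.ones_like(codes))
    return torch.log1p(d / gamma).mean()


def bit_balance_loss(codes: torch.Tensor) -> torch.Tensor:
    # the mean of every bit over the batch should be close to 0
    return (codes.mean(dim=0) ** 2).sum()


if __name__ == "__main__":
    h = torch.tanh(torch.randn(8, 48))        # a batch of 48-bit real-valued codes
    print(cauchy_quantization_loss(h) + bit_balance_loss(h))
```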
5.4: total loss
Finally, the overall loss of the method is:
L = L_ctr + L_cls + L_q + L_b   (19)
has the advantages that: the invention provides a fine-grained bird image retrieval method based on deep hash and a graph neural network, wherein a global fine-grained feature aggregation module is designed, the module reconstructs distinguishing features by capturing context correlation based on a K-nearest neighbor graph, can learn each discriminant part (such as bird head, wings and the like) of birds, and establishes a relation graph among different parts, thereby constructing fusion features with discriminant; a semantic hash coding module is designed, and the semantic hash coding module generates a hash code with compact semantics under the guidance of Cauchy quantization loss and bit balance loss, so that the storage overhead in practical application is reduced, and the retrieval speed is improved. Practical application shows that the method is low in storage overhead, performance of the method is superior to that of the most advanced general retrieval method and fine-grained retrieval method all the time, and powerful support is provided for reducing human management cost in practical application.
Drawings
FIG. 1 is a flow chart of a retrieval method according to an embodiment of the present invention;
FIG. 2 is a test result chart in an embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and specific examples:
in this example, taking bird data sets CUB200-2011 as an example, as shown in fig. 1, a fine-grained bird image retrieval method based on deep hash and a graph neural network includes the following steps:
step 1 data preparation
Experiments are carried out on the fine-grained bird data set CUB200-2011 and compared with other fine-grained retrieval methods. CUB200-2011 comprises 200 bird species and 11,788 images, of which 5,994 are used for training and 5,794 for testing. The method uses the test images as the query set and the training images as the retrieval database. All images are resized to 448 × 448 pixels before being fed into the network.
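A minimal data-preparation sketch follows, assuming CUB200-2011 has been arranged in the torchvision ImageFolder layout (one sub-directory per species); the directory names are hypothetical. The test images serve as the query set and the training images as the retrieval database.

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((448, 448)),       # all images are resized to 448 x 448
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# hypothetical directory layout: <root>/train/<species>/*.jpg and <root>/test/<species>/*.jpg
database_set = datasets.ImageFolder("CUB_200_2011/train", transform=transform)   # 5994 training images
query_set = datasets.ImageFolder("CUB_200_2011/test", transform=transform)       # 5794 test images

database_loader = torch.utils.data.DataLoader(database_set, batch_size=16, shuffle=True, num_workers=4)
query_loader = torch.utils.data.DataLoader(query_set, batch_size=16, shuffle=False, num_workers=4)
```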
Step 2 local feature based node representation
Step 2.1 local feature based node representation
For an image X, the method feeds it into the backbone network ResNet-50 and selects the features of the conv5 layer as its feature F ∈ R^{H×W×N}, where H, W and N represent the height, width and number of channels of the feature, respectively.
Step 2.2 attention map generation
M attention maps are generated for each image (M is set to 32 in this embodiment), where A_k ∈ R^{H×W} denotes the attention map for the kth fine-grained component, which may correspond to, for example, a wing or the head of a bird. The method converts the feature map into attention maps using a 1 × 1 convolution function f(); the attention maps are calculated as shown in formula (1).

A = f(F) = [A_1, A_2, …, A_M]   (1)
Step 2.3 Generation of discriminant features
Through the attention maps of different local regions, discriminative local features are extracted from the parts. The M part features corresponding to these local regions can be calculated by equation (2).

f_k = g(A_k ⊙ F), k = 1, 2, …, M   (2)

where f_k represents the kth local feature, ⊙ represents element-wise multiplication of the feature map F and the kth attention map, and g() is the global average pooling operation.
Step 3 local feature enhancement
Step 3.1 random attention map selection
The method adopts a random dropping strategy and randomly selects one attention map A_k from the M attention maps to force the network to search other information-rich local regions.
Step 3.2 normalizing the selected attention map
Specifically, for each training image, an attention map A_k is first randomly selected from A. To improve the convergence rate of the model, the method adopts min-max normalization to smooth the values of A_k into [0, 1], as shown in formula (3).

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k))   (3)

Here, A_k* denotes the enhanced kth attention map.
Step 3.3 building discard mask
A discard threshold T_d ∈ [0, 1] is set; elements greater than T_d are set to 0 and the other elements are set to 1, constructing the discard mask M_d as shown in equation (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, and M_d(i, j) = 1 otherwise   (4)

where A_k*(i, j) represents the value of the element of the kth attention map at row i and column j, and M_d(i, j) represents the value of the discard mask at the corresponding location. Here, the threshold T_d is set to 0.5;
step 3.4 attention-directed discarding of images and corresponding features
Given the discard mask M_d and the original image, the new attention-dropped image X_d is obtained by multiplying them; it is then fed into the network again to learn M new part features f_k^d.
Step 4, mining related component relation based on graph convolution
Step 4.1 defining edge characteristics of graph
Through step 3, M local discriminative features f_1, f_2, …, f_M are obtained. The method constructs a directed graph G = (V, E) to capture the context between these discrete local features, with vertices V = {1, 2, …, M} and edges E ⊆ V × V.
The edge feature from the ith vertex to the jth vertex can be defined as:

e_ij = h_{θ;σ}(f_i, f_i − f_j)   (5)

h_{θ;σ}() is an asymmetric edge function implemented by a shared MLP, as shown in equation (6):

h_{θ;σ}(f_i, f_i − f_j) = ReLU(θ·f_i + σ·(f_i − f_j))   (6)
where theta and sigma represent parameters of the network.
Step 4.2 generating fusion features
As shown in equation (6), the neighborhood information captured by f_i − f_j for different nodes is gradually combined with the global information captured by f_i. Similar to graph convolution, the output of the ith vertex can be obtained by performing a global max pooling operation over all edge features associated with the ith vertex, as in equation (7).

f_i^r = GMP_{j:(i,j)∈E}(e_ij)   (7)

where f_i^r represents the fusion feature of the ith vertex and GMP represents the global max pooling operation. Finally, the fusion features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.
Step 5 semantic Hash coding
Step 5.1 Hash code Generation
For the M reconstructed context features f_i^r, the semantic hash coding module outputs a B-bit hash code, which can be calculated according to formula (8).

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M   (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, δ_H ∈ R^B and W_H ∈ R^{M×B} respectively represent the bias and weight of the hash layer, and tanh() represents the activation function, which can be described by equation (9).

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))   (9)
Step 5.2 Mapping real-valued hash codes to binary hash codes
According to equation (10), the final hash code can be obtained. Since the value range of tanh() is [−1, 1], B_i = 1 if H_i > 0 and B_i = −1 otherwise. Because sign() has zero gradient at non-zero points, which causes the gradient-vanishing problem, the method maps H_i to a binary hash code only at test time.

B_i = sgn(H_i)   (10)
Specifically, for the feature extraction and hash coding in the above steps, a loss function is constructed so that the network gradually converges, as follows:
step 6 loss function
6.1 center loss
When learning the discriminative features in step 2, each attention map A_k belonging to the same category should point to similar part regions of the object. The method first introduces a center loss L_ctr to learn a feature center c_k for each local discriminative feature f_k. L_ctr penalizes the variance of features from the same part of different objects with the same class label and can be represented by equation (11).

L_ctr = Σ_{k=1}^{M} ‖f_k − c_k‖_2^2   (11)

where c_k is the feature center of A_k, which can be initialized from zero and updated by the moving average c_k = (1 − μ)c_k + μ f_k; here μ controls the update rate of c_k. The L_ctr loss only applies to the original image.
6.2 loss of Classification
The method adopts a cross-entropy loss L_ce to constrain the distance between the predicted classes and the ground-truth image labels Y*. The M original local features f_k are simply stacked together and then fed into the SoftMax layer to predict their class probability Y_ori; the loss can be calculated as:

L_ori = L_ce(Y_ori, Y*) = −Σ Y* log(Y_ori)   (12)

Similarly, the method also predicts the class probabilities Y_drop and Y_recn of the M dropped features f_k^d and the M reconstructed context features f_k^r respectively. The total classification loss is composed of L_ori, L_drop and L_recn, as shown in equation (13):

L_cls = L_ori(Y_ori, Y*) + L_drop(Y_drop, Y*) + L_recn(Y_recn, Y*)   (13)
6.3 Hash loss
Previous deep hashing methods use the Sigmoid function to define the probability function; however, existing hashing methods often lack the ability to concentrate related images within a small Hamming ball, so they may not perform well for Hamming-space retrieval. Therefore, the method uses a Bayesian framework to optimize the quantization loss, with the following probability function:

σ(d(h_i, h_j)) = γ / (γ + d(h_i, h_j))   (14)

where γ is the scale parameter of the Cauchy distribution. When the Hamming distance is small, the function drops rapidly, pulling similar points within a small Hamming radius. For a pair of binary hash codes h_i and h_j, the Hamming distance is:

d(h_i, h_j) = (K − h_i · h_j) / 2   (15)
and K is the bit number of the hash code. Then, the Cauchy quantization loss is derived as:
L_q = Σ_i log(1 + d(|h_i|, 1) / γ)   (16)
In addition to reducing quantization error, the method also takes into account the bit balance property, which means that each bit of the hash code has about a 50% chance of being 1 or −1. To produce more discriminative hash codes, the method adds the bit balance loss:

L_b = (Σ_{k=1}^{K} H_k)^2   (17)

where H is a K-bit hash code. The purpose of the bit balance loss is to generate unbiased, informative hash codes. The hash loss L_hash is then given by equation (18):

L_hash = L_q + L_b   (18)
6.4 Total loss
Finally, the overall loss of the method is:
L = L_ctr + L_cls + L_q + L_b   (19)
Step 7 training and testing of the fine-grained bird model
7.1 bird dataset training
The method is implemented in PyTorch. The model is trained on 2 RTX 2080Ti GPUs with input images preprocessed to 448 × 448. In all experiments, ResNet-50 is used as the backbone network to extract features, and the output of the conv5 layer is selected as the feature map. For attention map generation, the attention maps are obtained by a 1 × 1 convolution kernel, M defaults to 32, and the discard threshold T_d is set to 0.5.
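A minimal sketch of this setup follows: a torchvision ResNet-50 truncated after conv5 (layer4), applied to a 448 × 448 input, yields the 2048-channel, 14 × 14 feature map from which the M = 32 attention maps are produced by a 1 × 1 convolution. The truncation point and weight handling follow standard torchvision usage and are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=None)        # ImageNet-pretrained weights would normally be loaded here
conv5_extractor = nn.Sequential(*list(backbone.children())[:-2])   # drop avgpool and fc, keep conv5 (layer4)
attn_conv = nn.Conv2d(2048, 32, kernel_size=1)                     # M = 32 attention maps via 1 x 1 convolution

x = torch.randn(1, 3, 448, 448)                 # one preprocessed 448 x 448 image
with torch.no_grad():
    feat = conv5_extractor(x)                    # conv5 feature map F
    attn = attn_conv(feat)                       # attention maps A
print(feat.shape, attn.shape)                    # torch.Size([1, 2048, 14, 14]) torch.Size([1, 32, 14, 14])
```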
7.2 evaluation index
The method uses mean average precision (mAP) as the evaluation index for fair comparison with prior methods, and calculates the retrieval precision as follows:
mAP = (1/n_q) Σ_{q=1}^{n_q} (1/N^+) Σ_{k=1}^{n} pos(k) · (N_k^+ / k)
where n_q represents the number of samples in the query set and n represents the number of returned samples. N^+ represents the number of positive samples among the n returned samples, and N_k^+ refers to the number of positive samples among the first k returned samples. pos(k) is 1 if the kth image is a positive sample, and 0 otherwise.
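A hedged sketch of this evaluation follows: for each query, the database is ranked by Hamming distance between binary codes and the precision values N_k^+ / k are averaged over the positive positions. The helper names are hypothetical.

```python
import torch


def mean_average_precision(query_codes, query_labels, db_codes, db_labels, top_n=None):
    """Codes are (n, K) tensors of +/-1; labels are (n,) class ids."""
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        dist = 0.5 * (db_codes.shape[1] - db_codes @ q_code)        # Hamming distance to every database code
        order = dist.argsort()
        if top_n is not None:
            order = order[:top_n]
        hits = (db_labels[order] == q_label).float()                # pos(k)
        if hits.sum() == 0:
            aps.append(torch.tensor(0.0))
            continue
        ranks = torch.arange(1, len(hits) + 1, dtype=torch.float)
        precision_at_k = hits.cumsum(0) / ranks                     # N_k^+ / k
        aps.append((precision_at_k * hits).sum() / hits.sum())
    return torch.stack(aps).mean()


if __name__ == "__main__":
    q = torch.sign(torch.randn(10, 48))
    db = torch.sign(torch.randn(100, 48))
    print(mean_average_precision(q, torch.randint(0, 5, (10,)), db, torch.randint(0, 5, (100,))))
```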
7.3 model test
All test samples were evaluated, and the method achieves 86.92% mAP with 48-bit hash codes. As shown in FIG. 2, two bird pictures were selected and the top 10 images were retrieved for each, with 90% and 80% accuracy respectively. For birds with small inter-class differences, the method shows high-precision retrieval results that meet user expectations at a code length of 48 bits.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A fine-grained bird image retrieval method based on a graph neural network and deep hash is characterized by comprising the following steps:
step 1, sending the image to the backbone network ResNet-50 to obtain the feature F ∈ R^{H×W×N} of the image, wherein H, W and N represent the height, width and number of channels of the feature, respectively; converting the obtained feature map into attention maps, and extracting discriminative local features through the attention maps of different local regions;
step 2, generating an attention-dropped image from the attention maps generated in step 1;
step 3, constructing a graph from the part features extracted in step 2, mining the relations between parts, and obtaining fusion features;
step 4, obtaining hash codes through a hash layer from the fusion features obtained in step 3;
and step 5, constructing a loss function for the feature extraction and hash coding in the above steps so that the network gradually converges.
2. The fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 1, wherein a convolution function f() is used in step 1 to convert the feature map into attention maps, and the attention maps are calculated as shown in formula (1):
A = f(F) = [A_1, A_2, …, A_M]   (1)
3. The fine-grained bird image retrieval method based on graph neural network and deep hash according to claim 1 or 2, characterized in that M attention maps are generated for each image in step 1, where A_k ∈ R^{H×W} is expressed as the attention map for the kth fine-grained component;
the M part features of the local regions are calculated by equation (2):

f_k = g(A_k ⊙ F), k = 1, 2, …, M   (2)

where f_k represents the kth local feature, ⊙ represents element-wise multiplication of the feature map F and the kth attention map, and g() is the global average pooling operation.
4. The fine-grained bird image retrieval method based on the graph neural network and the deep hash as claimed in claim 1, wherein the step 2 specifically comprises the following steps:
step 2.1: random attention map selection
for the M learned spatial attention maps, a random dropping strategy is carried out, and one attention map A_k is randomly selected from the M attention maps to force the network to search other information-rich local regions;
step 2.2: normalizing selected attention maps
for each training image, an attention map A_k is first randomly selected from A; to improve the convergence rate of the model, min-max normalization is adopted to smooth the values of A_k into [0, 1], as shown in equation (3):

A_k* = (A_k − min(A_k)) / (max(A_k) − min(A_k))   (3)

A_k* represents the enhanced kth attention map;
step 2.3: building discard masks
a discard threshold T_d ∈ [0, 1] is set; elements greater than T_d are set to 0 and the other elements are set to 1, constructing the discard mask M_d as shown in equation (4):

M_d(i, j) = 0 if A_k*(i, j) > T_d, and M_d(i, j) = 1 otherwise   (4)

where A_k*(i, j) represents the value of the element of the kth attention map at row i and column j, M_d(i, j) represents the value of the discard mask at the corresponding location, and the threshold T_d is set to 0.5;
step 2.4: attention-directed discarding of images and corresponding features
the new attention-dropped image X_d is obtained by multiplying the discard mask M_d with the original image, and is sent into the network again to learn M new part features f_k^d;
through this set of formulas, the attention maps are encouraged to propose other discriminative parts, ultimately improving localization accuracy and feature quality.
5. The fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 1, wherein step 3 specifically comprises the following steps:
step 3.1: defining edge characteristics of a graph
M local discriminative features f_1, f_2, …, f_M are obtained through step 2; a directed graph G = (V, E) is constructed to capture the context between these discrete local features, with vertices V = {1, 2, …, M} and edges E ⊆ V × V;
the edge feature from the ith vertex to the jth vertex can be defined as:

e_ij = h_{θ;σ}(f_i, f_i − f_j)   (5)

h_{θ;σ}() is an asymmetric edge function implemented by a shared MLP, as shown in equation (6):

h_{θ;σ}(f_i, f_i − f_j) = ReLU(θ·f_i + σ·(f_i − f_j))   (6)
where θ and σ represent parameters of the network;
step 3.2: generating fusion features
as shown in equation (6), the neighborhood information captured by f_i − f_j for different nodes is gradually combined with the global information captured by f_i; similar to graph convolution, the output of the ith vertex can be obtained by performing a global max pooling operation over all edge features associated with the ith vertex, as in equation (7):

f_i^r = GMP_{j:(i,j)∈E}(e_ij)   (7)

where f_i^r represents the fusion feature of the ith vertex and GMP represents the global max pooling operation; finally, the fusion features f^r of all vertices are sent, as the features of the fine-grained object, to the hash coding module for coding.
6. The fine-grained bird image retrieval method based on graph neural network and deep hash according to claim 1, wherein step 4 specifically comprises the following steps:
step 4.1: hash code generation
for the M reconstructed context features f_i^r, the semantic hash coding module outputs a B-bit hash code, which can be calculated according to formula (8):

H_i = tanh((W_H)^T f_i^r + δ_H), i = 1, 2, …, M   (8)

where H_i ∈ R^B is the output of f_i^r through the hash layer, δ_H ∈ R^B and W_H ∈ R^{M×B} respectively represent the bias and weight of the hash layer, and tanh() represents the activation function, which can be described by equation (9):

tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))   (9)
step 4.2: mapping of real-valued hash codes to two-dimensional hash codes
The final hash code can be obtained according to the formula (10), since the value range of tanh () is [ -1,1]If H is presenti>>0,Bi1, otherwise BiSince sign () has zero at a non-zero point, which causes a problem of gradient vanishing, the method will only test the test pattern with HiMapping to two dimensionsThe hash code is a code of a hash of the code,
Bi=sgn(Hi) (10)。
7. the fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 1, wherein step 5 specifically comprises the following steps:
5.1: loss of center
In learning the discriminative features in step 1, it is necessary to use the attention map A belonging to the same class for eachkCapable of pointing to similar partial areas of the object, first introducing a loss LctrTo learn each local discriminant feature fkC center of the featurek,LctrPenalizing the variance of features from the same part of different objects with the same class label, represented by equation (11):
Figure FDA0003407586380000033
wherein c iskIs AkIs initialized from zero and averaged by moving the average ck=(1-μ)ck+μfkUpdate, μ control ckThe update rate of (d);
5.2: loss of classification
Using cross entropy loss LceTo constrain the prediction classes and real image labels Y*For M original local features fkStacking them together, then feeding them into SoftMax layer, predicting their class probability YoriThe loss can be calculated as:
Figure FDA0003407586380000041
similarly, the discard characteristics f of M are predicted separatelyk dAnd the reconstructed context features f of Mk rClass probability Y ofdropAnd YrecnTotal classification lossFrom Lori、LdropAnd LrecnComposition, as shown in equation (13):
Lcls=Lori(Yori,Y*)+Ldrop(Ydrop,Y*)+Lrecn(Yrecn,Y*) (13)
5.3: hash loss
the quantization loss is optimized using a Bayesian framework, with the following probability function:

σ(d(h_i, h_j)) = γ / (γ + d(h_i, h_j))   (14)

where γ is the scale parameter of the Cauchy distribution; when the Hamming distance is small, the function drops rapidly, pulling similar points within a small Hamming radius; for a pair of binary hash codes h_i and h_j, the Hamming distance is:

d(h_i, h_j) = (K − h_i · h_j) / 2   (15)

where K represents the number of bits of the hash code; the Cauchy quantization loss is then derived as:

L_q = Σ_i log(1 + d(|h_i|, 1) / γ)   (16)
in addition to reducing quantization error, the bit balance property is also taken into account, which means that each bit of the hash code has about a 50% chance of being 1 or −1; to produce more discriminative hash codes, the bit balance loss is added:

L_b = (Σ_{k=1}^{K} H_k)^2   (17)

the purpose of the bit balance loss is to generate balanced hash codes; the hash loss L_hash is then given by equation (18):

L_hash = L_q + L_b   (18)
5.4: total loss
The final overall loss is:
L = L_ctr + L_cls + L_q + L_b   (19).
8. The fine-grained bird image retrieval method based on graph neural network and deep hash as claimed in claim 7, wherein the L_ctr loss only applies to the original image.
CN202111521433.8A 2021-12-13 2021-12-13 Fine-grained bird image retrieval method based on graph neural network and deep hash Pending CN114329031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111521433.8A CN114329031A (en) 2021-12-13 2021-12-13 Fine-grained bird image retrieval method based on graph neural network and deep hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111521433.8A CN114329031A (en) 2021-12-13 2021-12-13 Fine-grained bird image retrieval method based on graph neural network and deep hash

Publications (1)

Publication Number Publication Date
CN114329031A true CN114329031A (en) 2022-04-12

Family

ID=81051096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111521433.8A Pending CN114329031A (en) 2021-12-13 2021-12-13 Fine-grained bird image retrieval method based on graph neural network and deep hash

Country Status (1)

Country Link
CN (1) CN114329031A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964527A (en) * 2023-01-05 2023-04-14 北京东方通网信科技有限公司 Label representation construction method for single label image retrieval
CN115964527B (en) * 2023-01-05 2023-09-26 北京东方通网信科技有限公司 Label characterization construction method for single-label image retrieval
CN116563607A (en) * 2023-04-11 2023-08-08 北京邮电大学 Fine granularity image recognition method and device based on cross-dataset information mining

Similar Documents

Publication Publication Date Title
CN108830296B (en) Improved high-resolution remote sensing image classification method based on deep learning
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN110689086B (en) Semi-supervised high-resolution remote sensing image scene classification method based on generating countermeasure network
CN108960140B (en) Pedestrian re-identification method based on multi-region feature extraction and fusion
Shen et al. Generative adversarial learning towards fast weakly supervised detection
CN114241282B (en) Knowledge distillation-based edge equipment scene recognition method and device
CN106909924B (en) Remote sensing image rapid retrieval method based on depth significance
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
Lin et al. RSCM: Region selection and concurrency model for multi-class weather recognition
CN108875816A (en) Merge the Active Learning samples selection strategy of Reliability Code and diversity criterion
CN112633382B (en) Method and system for classifying few sample images based on mutual neighbor
CN109886072B (en) Face attribute classification system based on bidirectional Ladder structure
CN111583263A (en) Point cloud segmentation method based on joint dynamic graph convolution
CN111612051B (en) Weak supervision target detection method based on graph convolution neural network
CN112307995A (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
CN114329031A (en) Fine-grained bird image retrieval method based on graph neural network and deep hash
CN112434628B (en) Small sample image classification method based on active learning and collaborative representation
Islam et al. InceptB: a CNN based classification approach for recognizing traditional bengali games
CN109784288B (en) Pedestrian re-identification method based on discrimination perception fusion
CN104318271B (en) Image classification method based on adaptability coding and geometrical smooth convergence
CN109165698A (en) A kind of image classification recognition methods and its storage medium towards wisdom traffic
CN109800314A (en) A method of generating the Hash codes for being used for image retrieval using depth convolutional network
Li et al. Image decomposition with multilabel context: Algorithms and applications
CN110503090A (en) Character machining network training method, character detection method and character machining device based on limited attention model
CN116543269B (en) Cross-domain small sample fine granularity image recognition method based on self-supervision and model thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination