CN117033687A - Crop disease and pest image retrieval method and device - Google Patents


Info

Publication number
CN117033687A
Authority
CN
China
Prior art keywords
fusion
image
feature
features
searched
Prior art date
Legal status
Pending
Application number
CN202311008023.2A
Other languages
Chinese (zh)
Inventor
陈亚雄
李小玉
黄景灏
熊盛武
Current Assignee
Sanya Science and Education Innovation Park of Wuhan University of Technology
Original Assignee
Sanya Science and Education Innovation Park of Wuhan University of Technology
Priority date
Filing date
Publication date
Application filed by Sanya Science and Education Innovation Park of Wuhan University of Technology
Priority to CN202311008023.2A
Publication of CN117033687A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval of still image data
    • G06F 16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/583 Retrieval characterised by using metadata automatically derived from the content
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V 10/70 Arrangements using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 Proximity, similarity or dissimilarity measures
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Arrangements using neural networks

Abstract

The invention provides a crop disease and pest image retrieval method and device, wherein the method comprises the following steps: acquiring an image to be retrieved; acquiring a fully trained multi-level feature fusion network, carrying out multi-level feature fusion on the local features and the global features of the image to be retrieved based on the fully trained network to obtain the fusion features of the image to be retrieved, and determining the hash code of the fusion features; and, based on the hash code of the fusion features, calculating the Hamming distance between the image to be retrieved and each retrieval library image, and taking the retrieval library images whose Hamming distance is smaller than a preset value as the retrieved images. The invention processes the image using both the context information of the global features and the high-level semantic information of the local features, thereby improving image retrieval precision.

Description

Crop disease and pest image retrieval method and device
Technical Field
The invention relates to the technical field of image retrieval methods, in particular to a crop disease and pest image retrieval method.
Background
With the rise of digitization, the number of crop pest images has grown explosively, creating a heavy data-processing burden for experts in the agricultural field. Rapid and efficient processing of crop disease and pest images has therefore attracted considerable attention in agriculture. In particular, because farmers and agricultural experts judge diseases and pests differently according to their individual experience, prevention and treatment of crop diseases and insect pests can be inconsistent. To alleviate this problem, crop pest image retrieval has begun to develop: given a query image, it can display other pest images of the same kind, thereby assisting pest prevention and control.
The goal of the crop pest image retrieval task is to retrieve, from a large-scale collection of crop pest images, images of the same kind as a given query image as accurately as possible. As dataset sizes grow, efficiency and accuracy become critical for any retrieval method. Deep-hashing-based methods offer a clear advantage here, thanks to their efficient use of storage and their ability to map high-dimensional features to low-dimensional binary hash codes.
However, existing hash methods applied to crop pest image retrieval still have shortcomings. When performing the retrieval task, many of them rely only on convolution for feature extraction and an attention mechanism to focus on salient areas, while neglecting the multi-scale nature of crop pest images, the small differences between classes, and the large amount of redundant information contained in global features, which ultimately results in low retrieval accuracy.
Disclosure of Invention
In view of this, it is necessary to provide a crop pest image retrieval method that addresses the prior-art problems of ignoring the multi-scale features of crop pest images and their small inter-class differences, and of global information containing a large amount of redundant information, which ultimately lead to the technical problem of low retrieval precision.
In order to solve the above problems, in one aspect, the present invention provides a crop pest image retrieval method, including:
acquiring an image to be retrieved;
acquiring a well-trained multi-level feature fusion network, carrying out multi-level feature fusion on the local features and the global features of the image to be searched based on the well-trained multi-level feature fusion network to obtain the fusion features of the image to be searched, and determining the hash code of the fusion features of the image to be searched;
and calculating the Hamming distance between the image to be searched and the search library image based on the hash code of the fusion characteristic of the image to be searched, and taking the search library image with the Hamming distance smaller than a preset value as the searched image.
In some possible implementations, the well-trained multi-level feature fusion network includes a feature extraction module, a multi-scale feature fusion module, an attention module, a multi-level feature fusion module, and a hash module;
carrying out multi-level feature fusion on the local features and the global features of the image to be searched based on the trained multi-level feature fusion network to obtain fusion features of the image to be searched, and determining a hash code of the fusion features of the image to be searched, wherein the method comprises the following steps:
performing feature extraction on the image to be searched based on a feature extraction module, and determining global features and local features of the image to be searched;
pooling and splicing the local features based on a multi-scale feature fusion module to determine a first fusion feature containing multi-scale information;
performing interactive perception operation on the first fusion features based on the attention module, and determining second fusion features containing multi-scale information and fine-granularity information;
performing feature fusion on the global feature and the second fusion feature based on a multi-level feature fusion module to determine fusion features containing multi-scale information, fine granularity information and global context information;
and calculating the hash code of the fusion characteristic based on a hash module.
In some possible implementations, the determining global features and local features of the image to be retrieved based on feature extraction performed by a feature extraction module on the image to be retrieved includes:
removing the average pooling layer of the ResNet-18 network and changing the last linear layer to extract the global features of the image to be searched, and determining the global features;
and taking a layer4 network of ResNet-18 as a sub-network for extracting local features to extract the local features of the image to be searched, and determining the local features.
In some possible implementations, the pooling and stitching the local features based on the multi-scale feature fusion module, to determine a first fusion feature containing multi-scale information includes:
based on four parallel average pooling layers, carrying out convolution pooling on the local features to obtain four features with different scales;
convolving the four features with different dimensions to reduce the dimension to obtain four features with different dimensions after dimension reduction;
and splicing the features with different dimensions after dimension reduction with the local features to obtain a first fusion feature containing multi-scale information.
In some possible implementations, the determining, based on the attention module performing an interactive awareness operation on the first fusion feature, a second fusion feature containing multi-scale information and fine-grained information includes:
performing convolution processing on the first fusion feature to determine a first guiding feature map;
performing interactive perception operation on the first guidance feature map to determine a first interactive perception attention pattern;
determining a second guide feature map based on the first guide feature map, the first interactive perception attention map and a preset element-by-element algorithm, performing interactive perception operation on the second guide feature map to obtain a second interactive perception attention map, determining a third guide feature map based on the second guide feature map, the second interactive perception attention map and the preset element-by-element algorithm, performing interactive perception operation on the third guide feature map to obtain a third interactive perception attention map, determining a fourth guide feature map based on the third guide feature map, the third interactive perception attention map and the preset element-by-element algorithm, and performing interactive perception operation on the fourth guide feature map to obtain a fourth interactive perception attention map;
and splicing the first interactive perception attention pattern, the second interactive perception attention pattern, the third interactive perception attention pattern and the fourth interactive perception attention pattern to obtain a second fusion characteristic containing multi-scale information and fine-grained information.
In some possible implementations, the preset element-by-element algorithm formula is:
$$T^{(i)} = A^{(i-1)} \otimes T^{(i-1)}, \qquad G^{(i)} = \phi_2\big(\mathrm{BN}(\mathrm{Conv}(T^{(i)};\,\theta_i))\big), \quad i \in \{2,3,4\}$$
where $T^{(i)}$ represents the i-th intermediate quantity, $A^{(i-1)}$ represents the (i-1)-th interactive perception attention map, $\otimes$ represents matrix multiplication, $T^{(i-1)}$ represents the (i-1)-th intermediate quantity, $G^{(i)}$ represents the i-th guidance feature map, $\phi_2$ represents the ReLU activation function, BN represents the batch normalization layer, Conv represents a 1x1 convolution, and $\theta_i$ represents the parameters of the network at this operation, wherein, when i = 2, $T^{(1)} = G^{(1)}$.
In some possible implementations, feature fusion is performed on the global feature and the second fusion feature based on a multi-level feature fusion module, and determining the fusion feature containing multi-scale information, fine-granularity information, and global context information includes:
performing similarity fusion on the global feature and the second fusion feature based on a cross attention mechanism to obtain a third fusion feature containing multi-scale information, fine-granularity information and global context information;
and extracting a salient region of the third fusion feature based on an attention mechanism to obtain the fusion feature containing multi-scale features, fine-granularity feature information and global context information.
In some possible implementations, the similarity between the global feature and the second fusion feature is calculated by the following formula:
$$\delta_{fa} = \rho\!\left(\theta\!\left(\frac{Q_g K_a^{\top}}{\sqrt{d}}\right)\right)$$
where $\delta_{fa}$ represents the similarity between the global feature and the second fusion feature, $Q_g$ and $K_a$ represent linear projections of the global feature and the second fusion feature respectively, and $\sqrt{d}$, $\theta$ and $\rho$ are the scale factor, softmax function and dropout operation respectively.
In some possible implementations, obtaining a training-complete multi-level feature fusion network includes: performing iterative training on the multi-level feature fusion network model by taking a preset loss function as an optimization target until a training termination condition is reached, so as to obtain a fully trained multi-level feature fusion network model;
The preset loss function $\mathcal{L}$ is:
$$\mathcal{L} = \frac{1}{2} S\,\mathrm{dist}(h_i,h_j) + \frac{1}{2}(1-S)\max\big(\gamma - \mathrm{dist}(h_i,h_j),\,0\big) + \eta\big(\big\Vert\,|h_i| - I\,\big\Vert_2^2 + \big\Vert\,|h_j| - I\,\big\Vert_2^2\big) + \mathcal{L}_{cls}(Y_i, Y_i') + \mathcal{L}_{cls}(Y_j, Y_j')$$
wherein a pair of images is selected from the retrieval library images, $Y_i$ and $Y_j$ are the actual labels corresponding to the pair of images, $h_i$ and $h_j$ are the hash-like codes corresponding to the pair of images, $Y_i'$ and $Y_j'$ are the predicted labels corresponding to the pair of images, S is the similarity between the pair of images, $\gamma$ and $\eta$ represent a balance parameter and a weight parameter respectively, $\mathrm{dist}(h_i,h_j)$ represents the Euclidean distance between $h_i$ and $h_j$, I represents a unit vector, $\Vert\cdot\Vert_2$ represents the L2 norm, and $\mathcal{L}_{cls}$ represents the classification loss between actual and predicted labels.
In another aspect, the present invention also provides an image retrieval device for crop diseases and insect pests, including:
the image acquisition unit is used for acquiring the image to be retrieved;
the multi-level feature fusion unit is used for carrying out multi-level feature fusion on the local features and the global features of the image to be searched to obtain fusion features of the image to be searched, and determining hash codes of the fusion features of the image to be searched;
the computing output unit is used for computing the Hamming distance between the image to be retrieved and the retrieval library image hash code, and the retrieval library image with the Hamming distance smaller than the preset value is used as the retrieved image.
The beneficial effects of adopting the above embodiment are as follows: in the crop disease and pest image retrieval method provided by the invention, the image to be retrieved is first acquired and then fed into a fully trained multi-level feature fusion network to obtain its fusion features; the fusion features are converted into a hash code; finally, the Hamming distance between the hash code of the image to be retrieved and those of the retrieval library images is calculated, and the retrieval library images whose Hamming distance is smaller than a preset value are output as the retrieved images. Because the image is processed with both the context information of the global features and the high-level semantic information of the local features, image retrieval precision is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of an embodiment of a method for searching images of crop diseases and insect pests provided by the invention;
FIG. 2 is a flowchart illustrating the step S102 of FIG. 1 according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating the step S201 of FIG. 2 according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating the step S202 of FIG. 2 according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the step S203 of FIG. 2 according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating the step S204 of FIG. 2 according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an embodiment of an image retrieval device for crop diseases and insect pests.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present invention. It should be appreciated that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art under the direction of the present disclosure. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor systems and/or microcontroller systems.
References to "first," "second," etc. in the embodiments of the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of features indicated. Thus, a technical feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. "And/or" describes an association relationship between associated objects, meaning that three relationships are possible; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The invention provides a crop disease and pest image retrieval method, which is described below.
Fig. 1 is a schematic flow chart of an embodiment of a method for searching an image of a crop disease and insect pest, as shown in fig. 1, the method for searching an image of a crop disease and insect pest includes:
s101, acquiring an image to be retrieved;
s102, acquiring a well-trained multi-level feature fusion network, carrying out multi-level feature fusion on local features and global features of an image to be retrieved based on the well-trained multi-level feature fusion network to obtain fusion features of the image to be retrieved, and determining hash codes of the fusion features of the image to be retrieved;
s103, calculating the Hamming distance between the image to be searched and the search library image based on the hash code of the fusion characteristic of the image to be searched, and taking the search library image with the Hamming distance smaller than a preset value as the searched image.
Compared with the prior art, the method first acquires the image to be retrieved and feeds it into a fully trained multi-level feature fusion network to obtain fusion features, converts the fusion features into a hash code, and finally calculates the Hamming distance between this hash code and the hash codes of the retrieval library images, outputting the retrieval library images whose Hamming distance is smaller than a preset value as the retrieved images. By processing the image with both the context information of the global features and the high-level semantic information of the local features, the method improves image retrieval precision.
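As a concrete illustration of the retrieval step S103, the following sketch computes Hamming distances with NumPy and keeps the library images below the preset threshold. The function name, the +/-1 bit convention, and the threshold value are illustrative assumptions, not part of the patent.

    import numpy as np

    def hamming_retrieve(query_code: np.ndarray, db_codes: np.ndarray,
                         threshold: int) -> np.ndarray:
        """Return indices of retrieval-library images whose Hamming distance
        to the query hash code is smaller than the preset value."""
        # With codes stored bitwise (+1/-1 or 0/1 per bit), the Hamming
        # distance is the number of positions where the bits differ.
        dists = np.count_nonzero(db_codes != query_code, axis=1)
        return np.where(dists < threshold)[0]

    # Example: a 48-bit query code against a library of 1000 codes.
    rng = np.random.default_rng(0)
    query = rng.choice([-1, 1], size=48)
    library = rng.choice([-1, 1], size=(1000, 48))
    retrieved = hamming_retrieve(query, library, threshold=10)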
In some embodiments, the multi-level feature fusion network includes a feature extraction module, a multi-scale feature fusion module, an attention module, a multi-level feature fusion module and a hash module. Specifically, as shown in fig. 2, carrying out multi-level feature fusion on the local features and global features of the image to be retrieved based on the fully trained multi-level feature fusion network in step S102, obtaining the fusion features of the image to be retrieved, and determining the hash code of the fusion features, includes:
s201, carrying out feature extraction on an image to be retrieved based on a feature extraction module, and determining global features and local features of the image to be retrieved;
s202, carrying out pooling and splicing on local features based on a multi-scale feature fusion module, and determining a first fusion feature containing multi-scale information;
s203, performing interactive sensing operation on the first fusion characteristic based on the attention module, and determining a second fusion characteristic containing multi-scale information and fine-granularity information;
s204, carrying out feature fusion on the global features and the second fusion features based on a multi-level feature fusion module, and determining fusion features containing multi-scale information, fine granularity information and global context information;
s205, calculating hash codes of fusion features based on the hash module.
To make use of both the context information of the global features and the high-level semantic information of the local features, in some embodiments the initial extraction of the global and local features is performed using a ResNet-18 network (18-layer Residual Network). Specifically, as shown in fig. 3, step S201 includes:
s301, removing an average pooling layer of a ResNet-18 network and changing a last linear layer to extract global features of an image to be retrieved, and determining the global features;
it should be noted that the extraction formula of the global feature is as follows:
$$f_g = \mathcal{F}(x;\,\theta_I)$$
where $f_g$ represents the global features, $\mathcal{F}$ represents the modified ResNet-18 linear-layer operation, $x$ represents the image to be retrieved, and $\theta_I$ represents the parameters of the network at this operation.
S302, extracting local features of the image to be retrieved by taking a layer4 network of ResNet-18 as a sub-network for extracting the local features, and determining the local features.
It should be noted that the local feature extraction formula is as follows:
$$f_l = \mathcal{F}_4(x;\,\theta_l)$$
where $f_l$ represents the local features, $\mathcal{F}_4$ represents the fourth-layer (layer4) network of ResNet-18, and $\theta_l$ represents the parameters of this sub-network.
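The following PyTorch sketch shows one way the described extractor could be assembled from the torchvision ResNet-18: the trunk up to layer3 is shared, layer4 provides the local features (S302), and a replacement linear layer over the flattened layer4 output stands in for the "changed last linear layer" with the average pooling removed (S301). The class name, the exact split point, and the LazyLinear stand-in are assumptions.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class FeatureExtractor(nn.Module):
        """Global/local feature extraction in the spirit of S301/S302."""
        def __init__(self, global_dim: int = 512):
            super().__init__()
            backbone = resnet18(weights=None)
            # Shared trunk up to (and excluding) layer4.
            self.trunk = nn.Sequential(
                backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
                backbone.layer1, backbone.layer2, backbone.layer3,
            )
            # S302: layer4 of ResNet-18 as the local-feature sub-network.
            self.layer4 = backbone.layer4
            # S301: average pooling removed; a changed linear layer maps the
            # flattened layer4 output to the global feature (stand-in choice).
            self.fc = nn.LazyLinear(global_dim)

        def forward(self, x: torch.Tensor):
            local = self.layer4(self.trunk(x))        # B x 512 x H x W local features
            global_feat = self.fc(local.flatten(1))   # B x global_dim global features
            return global_feat, local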
To extract multi-scale information of the image to be retrieved, in some embodiments, multi-scale feature fusion is performed on the image to be retrieved, specifically, as shown in fig. 4, step S202 includes:
s401, carrying out convolution pooling on local features based on four parallel average pooling layers to obtain four features with different scales;
the convolution kernels of the four average pooling layers were 1×1,2×2,3×3, and 6×6, respectively.
S402, carrying out convolution dimension reduction on four features with different dimensions to obtain four features with different dimensions after dimension reduction;
it should be noted that four features of different dimensions each use a convolution with a convolution kernel size of 1×1 to reduce the channel dimension.
It should be further noted that the extraction formula of the four features with different dimensions after the dimension reduction is obtained is as follows:
$$P_n = \mathrm{Conv}\big(\mathrm{Avg}_n(f_l)\big), \quad n \in \{1,2,3,4\}$$
where $P_n$ represents the n-th of the four different-scale features after dimension reduction, Conv represents a 1x1 convolution, $\mathrm{Avg}_n$ represents the n-th pooling operation, and $f_l$ represents the local features.
And S403, splicing the features with different dimensions after dimension reduction with the local features to obtain a first fusion feature containing multi-scale feature information.
It should be noted that, the formula for splicing the features with different dimensions after dimension reduction with the local features is as follows:
$$f_m = \phi_1\big(\mathrm{Cat}(\mathrm{Up}(P_1), \mathrm{Up}(P_2), \mathrm{Up}(P_3), \mathrm{Up}(P_4), f_l)\big)$$
where $f_m$ represents the first fusion feature, $\phi_1$ represents the ReLU activation function, Cat and Up represent the splicing operation and the upsampling operation respectively, $P_n$ represents the four different-scale features after dimension reduction, and $f_l$ represents the local features.
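Read as PSPNet-style pyramid pooling, the multi-scale feature fusion module might look like the sketch below. The interpretation of the 1x1/2x2/3x3/6x6 pools as adaptive output sizes and the branch channel width are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiScaleFusion(nn.Module):
        """Four parallel average pools + 1x1 dimension reduction + splicing."""
        def __init__(self, in_ch: int = 512, branch_ch: int = 128):
            super().__init__()
            # Pool output sizes 1x1, 2x2, 3x3 and 6x6 (PSPNet-style reading).
            self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in (1, 2, 3, 6))
            # 1x1 convolutions reduce the channel dimension of each branch.
            self.reduces = nn.ModuleList(
                nn.Conv2d(in_ch, branch_ch, kernel_size=1) for _ in range(4)
            )

        def forward(self, local: torch.Tensor) -> torch.Tensor:
            h, w = local.shape[2:]
            branches = [
                F.interpolate(red(pool(local)), size=(h, w), mode="bilinear",
                              align_corners=False)
                for pool, red in zip(self.pools, self.reduces)
            ]
            # First fusion feature: upsampled branches spliced with the input,
            # followed by the ReLU activation (phi_1 in the formula above).
            return F.relu(torch.cat([*branches, local], dim=1))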
To focus on fine-grained information on the basis of multi-scale features, in some embodiments, the first fused feature is interactively perceptually manipulated, in particular, as shown in fig. 5, step S203 includes:
s501, carrying out convolution processing on the first fusion feature to determine a first guiding feature map;
it should be noted that, the formula for obtaining the first guiding feature map by performing convolution processing on the first fusion feature is:
$$G^{(1)} = \phi_2\big(\mathrm{BN}(\mathrm{Conv}(f_m;\,\theta_1))\big)$$
where $G^{(1)}$ is the first guidance feature map, $\phi_2$ represents the ReLU activation function, BN represents the batch normalization layer, Conv represents a 1x1 convolution, $f_m$ is the first fusion feature, and $\theta_1$ represents the parameters of the network at this operation.
S502, performing interactive perception operation on the first guidance feature map, and determining a first interactive perception attention pattern;
the formula for performing the interactive sensing operation is as follows:
$$A^{(i)} = \theta\big(\tau(G^{(i)})\big), \quad i \in \{1,2,3,4\}$$
where $A^{(i)}$ represents the i-th interactive perception attention map, $\tau$ represents the interactive perception operation, $\theta$ represents the softmax function, and $G^{(i)}$ is the i-th guidance feature map.
S503, determining a second instruction feature map based on the first instruction feature map, the first interactive perception attention pattern and a preset element-by-element algorithm, performing interactive perception operation on the second instruction feature map to obtain a second interactive perception attention pattern, determining a third instruction feature map based on the second instruction feature map, the second interactive perception attention pattern and the preset element-by-element algorithm, performing interactive perception operation on the third instruction feature map, determining a fourth instruction feature map based on the third instruction feature map, the third interactive perception attention pattern and the preset element-by-element algorithm, and performing interactive perception operation on the fourth instruction feature map to obtain a fourth interactive perception attention pattern;
it should be noted that, the preset element-by-element algorithm formula is:
$$T^{(i)} = A^{(i-1)} \otimes T^{(i-1)}, \qquad G^{(i)} = \phi_2\big(\mathrm{BN}(\mathrm{Conv}(T^{(i)};\,\theta_i))\big), \quad i \in \{2,3,4\}$$
where $T^{(i)}$ represents the i-th intermediate quantity, $A^{(i-1)}$ represents the (i-1)-th interactive perception attention map, $\otimes$ represents matrix multiplication, $T^{(i-1)}$ represents the (i-1)-th intermediate quantity, $G^{(i)}$ represents the i-th guidance feature map, $\phi_2$ represents the ReLU activation function, BN represents the batch normalization layer, Conv represents a 1x1 convolution, and $\theta_i$ represents the parameters of the network at this operation, wherein, when i = 2, $T^{(1)} = G^{(1)}$.
S504, the first interaction perception attention pattern, the second interaction perception attention pattern, the third interaction perception attention pattern and the fourth interaction perception attention pattern are spliced to obtain a second fusion characteristic containing multi-scale information and fine-grained information.
It should be noted that, the formula for stitching the first interaction perception attention pattern, the second interaction perception attention pattern, the third interaction perception attention pattern and the fourth interaction perception attention pattern is as follows:
$$f_a = \mathrm{Cat}\big(A^{(1)}, A^{(2)}, A^{(3)}, A^{(4)}\big)$$
where $f_a$ represents the second fusion feature, Cat represents the splicing operation, and $A^{(1)}$, $A^{(2)}$, $A^{(3)}$ and $A^{(4)}$ represent the first to fourth interactive perception attention maps.
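A sketch of the chain of guidance maps and interactive perception attention maps is given below. The patent does not spell out the operation tau, so the softmax-over-spatial-positions form and the element-wise product standing in for the matrix product are assumptions; the real operation may differ.

    import torch
    import torch.nn as nn

    class InteractivePerceptionAttention(nn.Module):
        """Chain of four guidance maps and attention maps (S501-S504 sketch)."""
        def __init__(self, in_ch: int, steps: int = 4):
            super().__init__()
            self.steps = steps
            # One 1x1 Conv + BN + ReLU block per guidance feature map.
            self.convs = nn.ModuleList(
                nn.Sequential(nn.Conv2d(in_ch, in_ch, 1),
                              nn.BatchNorm2d(in_ch), nn.ReLU())
                for _ in range(steps)
            )

        def forward(self, fused: torch.Tensor) -> torch.Tensor:
            b, c, h, w = fused.shape
            guide = self.convs[0](fused)   # first guidance feature map G(1)
            inter = guide                  # intermediate quantity T(1) = G(1)
            attns = []
            for i in range(self.steps):
                # Interactive perception tau: softmax over spatial positions
                # (an assumed concrete form).
                attn = torch.softmax(guide.flatten(2), dim=-1).view(b, c, h, w)
                attns.append(attn)
                if i + 1 < self.steps:
                    # Element-wise stand-in for T(i) = A(i-1) (x) T(i-1),
                    # then the next guidance map via 1x1 conv + BN + ReLU.
                    inter = attn * inter
                    guide = self.convs[i + 1](inter)
            # Second fusion feature: the four attention maps spliced together.
            return torch.cat(attns, dim=1)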
To add context information of the global feature to the second fused feature and remove redundant information in the global feature, in some embodiments, specifically as shown in fig. 6, step S204 includes:
s601, carrying out similarity fusion on the global feature and the second fusion feature based on a cross attention mechanism to obtain a third fusion feature containing multi-scale information, fine-granularity information global and global context information;
it should be noted that, the similarity between the global feature and the second fusion feature is specifically:
$$\delta_{fa} = \rho\!\left(\theta\!\left(\frac{Q_g K_a^{\top}}{\sqrt{d}}\right)\right)$$
where $\delta_{fa}$ represents the similarity between the global feature and the second fusion feature, $Q_g$ and $K_a$ represent linear projections of the global feature and the second fusion feature respectively, and $\sqrt{d}$, $\theta$ and $\rho$ are the scale factor, softmax function and dropout operation respectively.
It should be further noted that, the formula for fusing the global feature and the second fusion feature is:
$$\delta'_{fa} = \delta_{fa} \otimes V_a$$
where $\delta'_{fa}$ represents the third fusion feature, $\delta_{fa}$ represents the similarity between the global feature and the second fusion feature, and $V_a$ represents a linear projection of the second fusion feature, as above.
It should also be noted that, in order to obtain a more effective third fusion feature, a residual-like operation is applied to $\delta'_{fa}$; the specific operation is:
$$\tilde{\delta}_{fa} = \mathrm{BN}\big(Q_g + \rho(\mathrm{MLP}(\delta'_{fa}))\big)$$
where $\tilde{\delta}_{fa}$ represents the third fusion feature finally obtained, BN represents the batch normalization operation, $\rho$ represents the dropout operation, MLP represents a multi-layer perceptron network, $Q_g$ represents the linear projection of the global feature, and $\delta'_{fa}$ represents the third fusion feature from the previous operation.
It should be further noted that, by using the cross-attention mechanism, the global feature and the second fusion feature are fused, so that a part of redundant information contained in the global feature is removed.
S602, extracting a salient region of the third fusion feature based on an attention mechanism to obtain the fusion feature containing multi-scale features, fine-granularity feature information and global context information.
In order to enhance the saliency of the salient region, $\tilde{\delta}_{fa}$ is first projected as $Q'$, $K'$ and $V'$, and the final fusion feature is obtained with a self-attention operation and a residual-like structure:
$$f_{la} = \mathrm{BN}\!\left(V' + \rho\!\left(\theta\!\left(\frac{Q'K'^{\top}}{\sqrt{d}}\right)V'\right)\right)$$
where $f_{la}$ represents the fusion feature, BN represents the batch normalization operation, and $\sqrt{d}$, $\theta$ and $\rho$ are the scale factor, softmax function and dropout operation respectively.
In some embodiments, in step S205, the hash module includes two parallel fully connected layers: a hash layer and a label layer. The hash layer uses a tanh function to generate K-bit hash codes, and its output consists of K nodes (K being the number of bits of the hash code); the label layer is used to predict the category of the image and consists of S nodes (S being the number of pest image categories).
It should be noted that, when training the multi-level feature fusion network, the tanh function is used to generate hash-like codes so that the network can be optimized, whereas when retrieving an image the Hamming distance must be computed over discrete hash codes; at that point the hash-like code generated by the tanh function is quantized into a discrete hash code, with the specific formula:
$$h = \sigma\big(fc(f_{la};\,W_h)\big), \qquad b = \Omega(h)$$
where b represents the discrete hash code, $\Omega$ represents the sign function, h represents the hash-like code, $\sigma$ represents the tanh activation function, fc represents the hash layer with K nodes, $f_{la}$ represents the fusion feature, and $W_h$ represents the weight parameters of the hash layer.
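A minimal sketch of such a hash module follows; the input dimension, bit count and class count are placeholder values.

    import torch
    import torch.nn as nn

    class HashModule(nn.Module):
        """Two parallel fully connected layers: hash layer and label layer."""
        def __init__(self, in_dim: int = 512, k_bits: int = 48,
                     num_classes: int = 10):
            super().__init__()
            self.hash_layer = nn.Linear(in_dim, k_bits)        # K nodes
            self.label_layer = nn.Linear(in_dim, num_classes)  # S nodes

        def forward(self, fusion_feat: torch.Tensor):
            h = torch.tanh(self.hash_layer(fusion_feat))  # hash-like code in (-1, 1)
            y_logits = self.label_layer(fusion_feat)      # category prediction
            return h, y_logits

        @torch.no_grad()
        def quantize(self, h: torch.Tensor) -> torch.Tensor:
            # Discrete hash code b = sign(h) for Hamming-distance retrieval.
            return torch.sign(h)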
For the pest image retrieval task, the distance between the hash codes of similar images should be as small as possible, and the distance between the hash codes of dissimilar images should be as large as possible, so a loss function is designed to optimize the network. Specifically, in step S102, obtaining the fully trained multi-level feature fusion network includes: performing iterative training on the multi-level feature fusion network with a preset loss function as the optimization target until a training termination condition is reached, so as to obtain the fully trained multi-level feature fusion network model.
It should be noted that the preset loss function $\mathcal{L}$ is:
$$\mathcal{L} = \frac{1}{2} S\,\mathrm{dist}(h_i,h_j) + \frac{1}{2}(1-S)\max\big(\gamma - \mathrm{dist}(h_i,h_j),\,0\big) + \eta\big(\big\Vert\,|h_i| - I\,\big\Vert_2^2 + \big\Vert\,|h_j| - I\,\big\Vert_2^2\big) + \mathcal{L}_{cls}(Y_i, Y_i') + \mathcal{L}_{cls}(Y_j, Y_j')$$
wherein a pair of images is selected from the retrieval library images, $Y_i$ and $Y_j$ are the actual labels corresponding to the pair of images, $h_i$ and $h_j$ are the hash-like codes corresponding to the pair of images, $Y_i'$ and $Y_j'$ are the predicted labels corresponding to the pair of images, S is the similarity between the pair of images, $\gamma$ and $\eta$ represent a balance parameter and a weight parameter respectively, $\mathrm{dist}(h_i,h_j)$ represents the Euclidean distance between $h_i$ and $h_j$, I represents a unit vector, $\Vert\cdot\Vert_2$ represents the L2 norm, and $\mathcal{L}_{cls}$ represents the classification loss between actual and predicted labels.
Wherein the similarity S is:
$$S = \begin{cases} 1, & Y_i = Y_j \\ 0, & Y_i \neq Y_j \end{cases}$$
i.e., S is 1 when the pair of images shares the same class label and 0 otherwise.
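Under this DSH-style reading of the loss, a sketch might look as follows; the exact weighting of the classification term and the default values of gamma and eta are assumptions.

    import torch
    import torch.nn.functional as F

    def pairwise_hash_loss(h_i, h_j, logits_i, logits_j, y_i, y_j,
                           gamma: float = 2.0, eta: float = 0.1):
        """Contrastive hashing loss + quantization regularizer + label loss."""
        s = (y_i == y_j).float()                   # S = 1 for same-class pairs
        dist = torch.sum((h_i - h_j) ** 2, dim=1)  # squared Euclidean distance
        contrastive = (0.5 * s * dist
                       + 0.5 * (1 - s) * torch.clamp(gamma - dist, min=0))
        # Pull |h| towards the unit vector I so codes approach +/-1.
        quant = ((h_i.abs() - 1).pow(2).sum(dim=1)
                 + (h_j.abs() - 1).pow(2).sum(dim=1))
        cls = F.cross_entropy(logits_i, y_i) + F.cross_entropy(logits_j, y_j)
        return (contrastive + eta * quant).mean() + cls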
it should also be noted that in each round of training, the image is randomly selected and resized to 256×256, the hash module is set to generate k-bit hash codes, k is set to 16, 24, 32, 48, and 64, respectively, the Adam strategy is used to optimize the objective function, the weight decay is set to 0.00001, the learning rate is set to 0.0001, the size of the batch size is set to 128, and the training network 80 rounds or until the loss is no longer reduced.
In order to better implement a crop disease and pest image retrieval method in the embodiment of the present invention, correspondingly, as shown in fig. 7, on the basis of the crop disease and pest image retrieval method, the embodiment of the present invention further provides a crop disease and pest image retrieval device 700, including:
an image acquisition unit 701 for acquiring an image to be retrieved;
the multi-level feature fusion unit 702 is configured to perform multi-level feature fusion on the local features and the global features of the image to be retrieved, obtain fusion features of the image to be retrieved, and determine hash codes of the fusion features of the image to be retrieved;
a calculation output unit 703, configured to calculate the Hamming distance between the hash code of the image to be retrieved and those of the retrieval library images, and to take a retrieval library image whose Hamming distance is smaller than a preset value as the retrieved image.
The crop disease and pest image retrieval device provided in the above embodiment can implement the technical scheme described in the above embodiment of the crop disease and pest image retrieval method, and the specific implementation principle of each unit can be referred to the corresponding content in the above embodiment of the crop disease and pest image retrieval method, which is not repeated here.
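A minimal sketch of how the three units of device 700 could be composed is given below; unit and attribute names mirror the text, and hamming_retrieve refers to the hypothetical helper sketched after step S103 above.

    class CropPestRetrievalDevice:
        """Composition of the three units of device 700 (illustrative only)."""
        def __init__(self, fusion_network, library_codes, threshold: int):
            self.fusion_network = fusion_network  # multi-level feature fusion unit
            self.library_codes = library_codes    # hash codes of the retrieval library
            self.threshold = threshold            # preset Hamming-distance value

        def retrieve(self, image):
            # The image acquisition unit supplies `image`; the fusion unit maps
            # it to a hash code; the computing output unit ranks the library.
            code = self.fusion_network(image)
            return hamming_retrieve(code, self.library_codes, self.threshold)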
The above provides a detailed description of the crop disease and pest image retrieval method provided by the invention, and specific examples are applied herein to explain the principle and implementation of the invention; the above description of the examples is only intended to help in understanding the method of the invention and its core idea. Meanwhile, since those skilled in the art may vary the specific embodiment and the application scope according to the ideas of the present invention, the contents of this specification should not be construed as limiting the present invention.

Claims (10)

1. A crop pest image retrieval method, comprising:
acquiring an image to be retrieved;
acquiring a well-trained multi-level feature fusion network, carrying out multi-level feature fusion on the local features and the global features of the image to be searched based on the well-trained multi-level feature fusion network to obtain the fusion features of the image to be searched, and determining the hash code of the fusion features of the image to be searched;
and calculating the Hamming distance between the image to be searched and the search library image based on the hash code of the fusion characteristic of the image to be searched, and taking the search library image with the Hamming distance smaller than a preset value as the searched image.
2. The crop disease and pest image retrieval method according to claim 1, wherein the well-trained multi-level feature fusion network comprises a feature extraction module, a multi-scale feature fusion module, an attention module, a multi-level feature fusion module and a hash module;
carrying out multi-level feature fusion on the local features and the global features of the image to be searched based on the trained multi-level feature fusion network to obtain fusion features of the image to be searched, and determining a hash code of the fusion features of the image to be searched, wherein the method comprises the following steps:
performing feature extraction on the image to be searched based on a feature extraction module, and determining global features and local features of the image to be searched;
pooling and splicing the local features based on a multi-scale feature fusion module to determine a first fusion feature containing multi-scale information;
performing interactive perception operation on the first fusion features based on the attention module, and determining second fusion features containing multi-scale information and fine-granularity information;
performing feature fusion on the global feature and the second fusion feature based on a multi-level feature fusion module to determine fusion features containing multi-scale information, fine granularity information and global context information;
and calculating the hash code of the fusion characteristic based on a hash module.
3. The method for searching for crop pest images according to claim 2, wherein determining global features and local features of the image to be searched for based on feature extraction performed on the image to be searched for by a feature extraction module comprises:
removing the average pooling layer of the ResNet-18 network and changing the last linear layer to extract the global features of the image to be searched, and determining the global features;
and taking a layer4 network of ResNet-18 as a sub-network for extracting local features to extract the local features of the image to be searched, and determining the local features.
4. The method for searching the crop disease and pest image according to claim 2, wherein the step of pooling and stitching the local features based on the multi-scale feature fusion module to determine a first fusion feature containing multi-scale information comprises the steps of:
based on four parallel average pooling layers, carrying out convolution pooling on the local features to obtain four features with different scales;
convolving the four features with different dimensions to reduce the dimension to obtain four features with different dimensions after dimension reduction;
and splicing the features with different dimensions after dimension reduction with the local features to obtain a first fusion feature containing multi-scale information.
5. The method for searching for crop pest images according to claim 2, wherein,
performing an interactive perception operation on the first fusion feature based on the attention module, determining a second fusion feature containing multi-scale information and fine-granularity information, including:
performing convolution processing on the first fusion feature to determine a first guiding feature map;
performing interactive perception operation on the first guidance feature map to determine a first interactive perception attention pattern;
determining a second guide feature map based on the first guide feature map, the first interactive perception attention map and a preset element-by-element algorithm, performing interactive perception operation on the second guide feature map to obtain a second interactive perception attention map, determining a third guide feature map based on the second guide feature map, the second interactive perception attention map and the preset element-by-element algorithm, performing interactive perception operation on the third guide feature map to obtain a third interactive perception attention map, determining a fourth guide feature map based on the third guide feature map, the third interactive perception attention map and the preset element-by-element algorithm, and performing interactive perception operation on the fourth guide feature map to obtain a fourth interactive perception attention map;
and splicing the first interactive perception attention pattern, the second interactive perception attention pattern, the third interactive perception attention pattern and the fourth interactive perception attention pattern to obtain a second fusion characteristic containing multi-scale information and fine-grained information.
6. The method for searching for crop pest images according to claim 5, wherein the predetermined element-by-element algorithm formula is:
$$T^{(i)} = A^{(i-1)} \otimes T^{(i-1)}, \qquad G^{(i)} = \phi_2\big(\mathrm{BN}(\mathrm{Conv}(T^{(i)};\,\theta_i))\big), \quad i \in \{2,3,4\}$$
where $T^{(i)}$ represents the i-th intermediate quantity, $A^{(i-1)}$ represents the (i-1)-th interactive perception attention map, $\otimes$ represents matrix multiplication, $T^{(i-1)}$ represents the (i-1)-th intermediate quantity, $G^{(i)}$ represents the i-th guidance feature map, $\phi_2$ represents the ReLU activation function, BN represents the batch normalization layer, Conv represents a 1x1 convolution, and $\theta_i$ represents the parameters of the network at this operation, wherein, when i = 2, $T^{(1)} = G^{(1)}$.
7. The method of claim 2, wherein determining the fused features containing multi-scale information, fine-grain information, and global context information based on feature fusion of the global features and the second fused features by a multi-level feature fusion module comprises:
performing similarity fusion on the global feature and the second fusion feature based on a cross attention mechanism to obtain a third fusion feature containing multi-scale information, fine-granularity information and global context information;
and extracting a salient region of the third fusion feature based on an attention mechanism to obtain the fusion feature containing multi-scale features, fine-granularity feature information and global context information.
8. The method for searching for crop pest images according to claim 7, wherein the similarity between the global feature and the second fusion feature is calculated by the following formula:
$$\delta_{fa} = \rho\!\left(\theta\!\left(\frac{Q_g K_a^{\top}}{\sqrt{d}}\right)\right)$$
where $\delta_{fa}$ represents the similarity between the global feature and the second fusion feature, $Q_g$ and $K_a$ represent linear projections of the global feature and the second fusion feature respectively, and $\sqrt{d}$, $\theta$ and $\rho$ are the scale factor, softmax function and dropout operation respectively.
9. The method for searching for crop pest images according to claim 1, wherein the step of obtaining a trained multi-level feature fusion network comprises: performing iterative training on the multi-level feature fusion network model by taking a preset loss function as an optimization target until a training termination condition is reached, so as to obtain a fully trained multi-level feature fusion network model;
the preset loss functionThe formula is:
wherein, a pair of images is selected from the images in the search library, Y i And Y j Is an actual label corresponding to a pair of images, h i And h j For a pair of image-corresponding hash-like codes, Y I And Y j For a pair of predictive labels corresponding to an image,for the similarity between a pair of images, γ and η represent a balance parameter and a weight parameter, respectively, dist (h i ,h j ) Represents h i And h j Euclidean distance between them, I represents unit vector,>representing a binary norm.
10. A crop pest image retrieval device, comprising:
the image acquisition unit is used for acquiring the image to be retrieved;
the multi-level feature fusion unit is used for carrying out multi-level feature fusion on the local features and the global features of the image to be searched to obtain fusion features of the image to be searched, and determining hash codes of the fusion features of the image to be searched;
the computing output unit is used for computing the Hamming distance between the image to be retrieved and the retrieval library image hash code, and the retrieval library image with the Hamming distance smaller than the preset value is used as the retrieved image.

Priority Applications (1)

Application Number    Priority Date    Filing Date    Title
CN202311008023.2A     2023-08-10       2023-08-10     Crop disease and pest image retrieval method and device

Applications Claiming Priority (1)

Application Number    Priority Date    Filing Date    Title
CN202311008023.2A     2023-08-10       2023-08-10     Crop disease and pest image retrieval method and device

Publications (1)

Publication Number    Publication Date
CN117033687A          2023-11-10

Family

ID=88622216

Family Applications (1)

Application Number    Title                                                      Priority Date    Filing Date
CN202311008023.2A     Crop disease and pest image retrieval method and device    2023-08-10       2023-08-10

Country Status (1)

Country    Link
CN         CN117033687A (en)


Legal Events

Date Code Title Description
PB01    Publication
SE01    Entry into force of request for substantive examination