CN114187183A

CN114187183A - Fine-grained insect image classification method

Info

Publication number: CN114187183A
Application number: CN202111395529.4A
Authority: CN
Inventors: 徐杰; 方伟政; 李非非; 苏光辉; 余飞; 杨帆
Original assignee: Chengdu Xingyinian Intelligent Technology Co ltd
Current assignee: Chengdu Xingyinian Intelligent Technology Co ltd
Priority date: 2021-11-23
Filing date: 2021-11-23
Publication date: 2022-03-15

Abstract

The invention discloses a method for classifying fine-grained insect images, which comprises the steps of collecting fine-grained images of different forms of different types of insects, deleting repeated, fuzzy and overexposed low-quality images in a manual screening mode, enabling the remaining images to meet the characteristics of large intra-class difference and small inter-class difference, and then establishing image labels and performing enhancement processing; and establishing a neural network model for insect classification and training, and finally, classifying and detecting the fine-grained insect picture to be detected through the trained neural network model, so that the insects in the picture are directly output to belong to the category.

Description

Fine-grained insect image classification method

Technical Field

The invention belongs to the technical field of image classification in the field of computers, and particularly relates to a fine-grained insect image classification method.

Background

The classification of fine-grained images is a very important problem in the computer field and can be applied to many professional valuable application scenes. Although a great deal of deep learning application exists in the field of insect identification, fine-grained image classification algorithm is not well researched.

The fine-grained image classification is a computer vision task which is more difficult than the traditional image classification and has more application value in professional scenes, and the insect image classification based on deep learning has great significance in pest control of agriculture and forestry. The fine-grained image classification technology is applied to insect image recognition, insect species which are difficult to divide can be distinguished, the accuracy of insect classification is further improved, and therefore the practicability and reliability of insect image recognition in the actual production process are improved. However, the current research on fine-grained image classification algorithms in the field of insect image recognition is still insufficient and deep.

The fine-grained image classification algorithm mainly comprises a fine adjustment method based on a general image classification network, a positioning and identification combined method based on a strong supervision or weak supervision attention mechanism, a bilinear pooling method based on high-order feature fusion, a metric learning method and a transformer-based method.

The method mainly aims to solve the problems of small inter-class difference and large intra-class difference by positioning the discriminant region according to the Part-RCNN method based on strong supervision target detection positioning or the RA-CNN method based on weak supervision attention mechanism positioning. Even methods based on metric learning techniques appear to be independent of the location of the discriminative region, essentially suppressing extraneous features and finding key discriminative features by narrowing intra-class distances and expanding inter-class distances.

In biological taxonomy, the classification of species is a system hierarchical structure based on levels, and a fixed number of levels, namely, boundary, phylum, class, order, family, genus and species, are used, biological images are more and more similar in vision along with the descending of classification levels, and the classification difficulty gradually meets the requirement of so-called fine-grained classification in the field of computer vision. Insects recognized in the common sense of human beings generally belong to the class Insecta. Insects with great trait differences are generally divided according to 'meshes', and different insect categories are more and more similar as classification levels go down. In the process of labeling the collected insect images by the insect experts, specific labels of certain insect samples under the 'genus' or 'species' level can be identified according to the rich experience knowledge and the comparison of the insect maps. However, because insects are the most abundant animal types in organisms that can be contacted in daily life, the varieties are various, the shapes are extremely strange, even if the insects are experts in entomology who are experienced, the identification of the class of the specific insect sample, namely the genus and the species, cannot be realized, and only the identification of the class of the specific insect sample, namely the order, the suborder and the family can be realized.

Secondly, the insect classification algorithm is mainly used for identifying agricultural pests when being practically applied. In some cases, the category information required to determine whether an insect is a pest and the corresponding control strategy need not be refined to the "one" level, since insects in the same "family" or "genus" often have similar morphological aspects and traits. Then in this type of application scenario, a good algorithm should meet the following requirements: even if the correct fine-grained classification results are not available, the algorithm should limit the erroneous results to the correct upper-level classes. Finally, the supervision information used by the existing fine-grained image classification problems is a category label with the finest granularity level. However, in the field of fine-grained classification of insects, labeling of advanced tags is not very costly, and the difficulty of obtaining them is far weaker than that of bounding box information in a strong supervision method. Therefore, the performance of the fine-grained classification task should be effectively improved by adding high-level category information into supervised learning.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a fine-grained insect image classification method, which optimizes a fine-grained classification framework based on an improved weak supervision fine-grained classification model and a level multi-label constraint of metric learning and realizes accurate classification of insect images.

In order to achieve the above object, the present invention provides a method for classifying fine-grained insect images, comprising the steps of:

(1) acquiring and preprocessing an image;

fine-grained pictures of different forms of different types of insects are collected, and repeated, fuzzy and overexposed low-quality pictures are deleted in a manual screening mode, so that the residual pictures meet the characteristics of large intra-class difference and small inter-class difference;

(2) establishing a picture label;

each picture is labeled with three layers of category labels, and is marked as (y)₁，y₂，y₃) Tree mapping relation exists among the three layers of category labels; wherein the first layer category label y₁Indicating the "order" of the insect, second-level category label y₂Indicating the same insect category with similar visual characteristics at the eye, and a third layer of category labels y₃Representing a fine-grained category of insects;

(3) image enhancement processing;

(3.1) picture zooming: sampling the length and width of each picture to a fixed pixel size by a bilinear interpolation method;

(3.2) picture random rotation: setting a twiddle factor; randomly sampling a value in [ -factor, factor ] as a rotation angle of the picture and controlling the picture to rotate, wherein when the rotation angle is not 90 ° or 180 °, a pixel-free area left during the rotation operation is filled with black;

(3.3) randomly turning pictures horizontally or vertically: setting a probability factor p; randomly generating a probability value in [0, 1], if the probability value is less than p, carrying out random horizontal or vertical turning operation on the picture, otherwise, not turning the picture;

(3.4) picture random cropping: firstly, amplifying the picture to be 1.25 times of the original picture, and then randomly cutting an area with the size of the network input size in the amplified picture;

(3.5) carrying out color dithering processing on the picture;

(3.5.1) setting a jitter Factor;

(3.5.2), enhancing the brightness of the picture: randomly generating a numerical value s between max (0,1-Factor) and (1+ Factor) as a scaling Factor of the brightness, and multiplying the brightness value bright of the original picture by the scaling Factor s to obtain an enhanced brightness value

(3.5.3), enhancing the contrast of the picture: randomly generating a numerical value s between max (0,1-Factor) and (1+ Factor) as a scaling Factor of the contrast, and multiplying the contrast of the original picture by the scaling Factor s to obtain an enhanced contrast value

(3.5.4), enhancing the saturation of the picture: randomly generating a numerical value s between max (0,1-Factor) and (1+ Factor) as a scaling Factor of the saturation, and multiplying the saturation of the original picture by the scaling Factor s to obtain an enhanced saturation value

(3.5.5) controlling the hue of the enhanced picture through the label: initializing a category set s, and adding the category into the set when the category contains the pictures collected by shooting the specimen. Before the image is subjected to hue enhancement, whether the image belongs to the category of the set s is judged, if yes, the image is subjected to hue enhancement, and if not, the image is not subjected to enhancement. The hue enhancement operation is as follows: a value s is randomly generated between max (0,1-Factor) and (1+ Factor) as a hue shift Factor. Before the hue enhancement is carried out on the picture, the hue value hue of the original picture is subjected to the shift operation on the hue ring according to the shift factor s to obtain the enhanced hue value

(4) Setting a training data set, and setting the enhanced picture and the corresponding label as the training data set;

(5) building a neural network model for insect classification;

based on the existing Resnet50 network, the whole feature extraction part of Resnet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively, wherein the working principle of the channel attention mechanism module is as follows: the output characteristic I of each stage is a three-dimensional matrix of c x w x h, and each element in the matrix is marked as I_i,j,kWherein i ∈ [1, c ]],j∈[1,w],k∈[1,h]C is the number of channels, w and h are width and height;

the channel attention mechanism module adds an attention weight W with the length c to the output characteristic I to obtain the output characteristic of the channel attention mechanism

The values of the respective elements thereof are noted

Wherein, w_iIs a weight coefficient;

after passing through stage5 and a channel attention mechanism module, adding an improved bilinear pooling layer, wherein the specific working principle is as follows: will output the characteristics

Expanded into a two-dimensional vector of length c × (w × h), and is denoted as

Then to

Carrying out bilinear fusion to obtain the final output characteristics

Finally, the output characteristics are further processed

Spread out as a one-dimensional vector of length c × c, noted

(6) Training a neural network model;

inputting the enhanced picture and the corresponding label into a neural network model, and predicting the category of the corresponding insect in the input picture by the neural network model

(6.1) in the training data set, randomly selecting b pictures and corresponding labels as input of the training of the current round, and inputting the b pictures and the corresponding labels into a training neural network model;

(6.2) extracting b images through a neural network modelOutput characteristics of the sheet

i is 1,2, …, b, where the output feature of each picture is a one-dimensional vector of length c × c;

(6.3) output characteristics of each picture

Dimension reduction processing is carried out through a full connection layer to obtain a classification judgment result p of each picture_iWherein the classification discrimination result p_iThe method is a one-dimensional matrix, the length of the one-dimensional matrix is the number M of insect species in a training data set, each item in a classification judgment result is the probability that a network judges the insect species to be the corresponding insect species, and finally, the maximum probability is taken as a prediction result and is recorded as the prediction result

And according to

Get corresponding

(6.4) calculating a loss function value L after the training of the current round is finished;

wherein, beta₁The value of the super-parameter is 0.5 through several groups of comparative experiments,

the loss function value corresponding to the third-level label in the output characteristic,

for the loss function value, L, corresponding to the second level label in the output signature_CELoss function values corresponding to the classification judgment results;

wherein the content of the first and second substances,

showing that in the b input pictures of the training, when the three-level labels of any two pictures i and j are the same

Is 1, otherwise

Is 0, and delta is a preset threshold value; p is an output characteristic

Is p ∈ [1, c × c ∈ ]]；

Wherein λ is_iRepresenting the ratio of the number of pictures of the category to which the ith input picture belongs to the b input pictures in the current round; y is_iτFor the discrimination coefficient, y is set when the real label value of the ith input picture is equal to τ_iτIs 1, otherwise is 0; p is a radical of_iτIs the classification discrimination result p_iThe value of the τ -th element;

(6.5) after the training of the current round is finished, performing back propagation on the loss function value of the current round by a gradient descent method so as to update network parameters, and then returning to the step (6.1) to perform the next round of training until the network converges, thereby obtaining a trained neural network model;

(7) real-time classification of insect pictures;

and zooming the fine-grained insect picture to be detected to the size same as the training data through picture zooming operation, and inputting the fine-grained insect picture to the trained neural network, so that the insect in the picture is directly output to belong to the category.

The invention aims to realize the following steps:

the invention relates to a method for classifying fine-grained insect images, which comprises the steps of collecting fine-grained pictures of different forms of different types of insects, deleting repeated, fuzzy and overexposed low-quality pictures in a manual screening mode, enabling the remaining pictures to meet the characteristics of large intra-class difference and small inter-class difference, and then establishing picture labels and performing enhancement processing; and establishing a neural network model for insect classification and training, and finally, classifying and detecting the fine-grained insect picture to be detected through the trained neural network model, so that the insects in the picture are directly output to belong to the category.

Meanwhile, the fine-grained insect image classification method provided by the invention also has the following beneficial effects:

(1) through the label control hue enhancement method, the color characteristics can be learned through the network, and part of categories can be distinguished without using the color characteristics;

(2) the network can extract visual features of different levels by embedding the channel attention modules in different stages of the network, and simultaneously, high-order fusion is carried out between different features by a bilinear pooling technology so as to generate a large amount of fine-grained features which are more effective for classifying fine-grained images;

(3) and using the 3-level label and combining the loss function of the distance measurement to constrain the classification training process, so that the network can better distinguish the difference of fine-grained classes, expand the distance between dissimilar large classes, and limit the classification error of the sample in the fine-grained classes.

Drawings

FIG. 1 is a flow chart of a fine-grained insect image classification method according to the present invention;

FIG. 2 is a schematic representation of different forms of different species of insects;

FIG. 3 is a map of the mapping between three layers of tags of an insect;

FIG. 4 is a schematic diagram of random rotation of a picture;

FIG. 5 is a schematic diagram of a random flip of a picture;

FIG. 6 is a basic architecture diagram of a neural network model for insect classification;

FIG. 7 is a diagram of the neural network model architecture after improvement of the present invention;

FIG. 8 is a graph of the evaluation of the results of three network classifications;

FIG. 9 is a graph of the results of an effectiveness versus location selection experiment for the SE module;

FIG. 10 is a graph of the results of a validation experiment for bilinear pooling;

FIG. 11 is a graph comparing performance of different network architectures;

FIG. 12 is a comparison graph of classification performance incorporating multi-label supervision constraints;

FIG. 13 is a graph of performance comparison with the introduction of metric loss;

FIG. 14 is a performance comparison graph of distance metric comparison;

FIG. 15 is a performance comparison graph of threshold selection comparisons;

figure 16 is a graph of performance versus loss of several other distance metrics.

Detailed Description

The following description of the embodiments of the present invention is provided in order to better understand the present invention for those skilled in the art with reference to the accompanying drawings. It is to be expressly noted that in the following description, a detailed description of known functions and designs will be omitted when it may obscure the subject matter of the present invention.

Examples

FIG. 1 is a flowchart of a fine-grained insect image classification method according to the present invention.

In this embodiment, as shown in fig. 1, the present invention provides a method for classifying fine-grained insect images, including the following steps:

s1, image acquisition and preprocessing;

in the embodiment, fine-grained pictures of different forms of different types of insects are collected in a common internet search platform and a special insect database website by field outdoor scene shooting, specimen shooting and combining a crawler technology; wherein, the position and angle of the camera can be adjusted to obtain the pictures containing the visual contents of different parts of the insect, as shown in fig. 2. In addition, because the morphological structure of the insect can be changed, such as different curling forms of soft insect larvae and the open and closed forms of adult insect wings, in the field shooting process, pictures of different forms of the same type of insect are collected by prolonging the observation time of the insect.

Through manual screening, repeated pictures collected in different platforms and pictures which are different in hash value and extremely similar in content and are generated by continuous shooting of the same insect in the field shooting process are eliminated. In addition, those low quality pictures that are also blurred, overexposed, etc. due to the shooting problem are eliminated. Therefore, the remained 3719 fine-grained insect images of 100 classes meet the characteristics of large intra-class difference and small inter-class difference.

S2, establishing a picture label;

each picture is labeled with three layers of category labels, and is marked as (y)₁，y₂，y₃) Wherein the first layer class label y₁Denotes the "order" of the insect; second layer category label y₂The insect category with similar visual characteristics under the same eye is represented, and the range is equal to a certain suborder, a certain family and a certain subfamily under the label 'eye'; third layer category label y₃Fine granularity categories of "genus" and "species"; the first layer of tags has 15 types, the second layer of tags has 34 types, and the third layer of tags has 100 types. After the three-layer category label is established, each picture sample corresponds to a triple label (y)₁，y₂，y₃) As shown in fig. 3, a tree-like mapping relationship exists between the three layers of tags.

S3, picture enhancement processing;

s3.1, zooming the picture: sampling the length and width of each picture to a fixed pixel size by a bilinear interpolation method;

s3.2, randomly rotating the picture: setting a twiddle factor; randomly sampling a value in [ -factor, factor ] as a rotation angle of the picture and controlling the picture to rotate, the random rotation is shown in fig. 4, wherein when the rotation angle is not 90 ° or 180 °, the non-pixel area left during the rotation operation is filled with black;

s3.3, randomly turning the picture horizontally or vertically: setting a probability factor p; randomly generating a probability value in [0, 1], if the probability value is less than p, carrying out random horizontal or vertical turning operation on the picture, otherwise, not turning the picture; wherein, the random flipping example is shown in FIG. 5,

s3.4, randomly cutting pictures: firstly, amplifying the picture to be 1.25 times of the original picture, and then randomly cutting an area with the size of the network input size in the amplified picture;

s3.5, carrying out color dithering processing on the picture;

the color dithering method is to change the basic attributes of the original image, namely four attributes of brightness, contrast, saturation and hue; we describe the specific process of color dithering process as follows:

s3.5.1, setting a jitter Factor;

s3.5.2, enhancing the brightness of the picture: randomly generating a numerical value s between max (0,1-Factor) and (1+ Factor) as a scaling Factor of the brightness, and multiplying the brightness value bright of the original picture by the scaling Factor s to obtain an enhanced brightness value

S3.5.3, contrast of enhanced picture: randomly generating a value s between max (0,1-Factor) and (1+ Factor) as a pairThe contrast contast of the original picture is multiplied by a scaling factor s to obtain an enhanced contrast value

S3.5.4, enhancing the saturation of the picture: randomly generating a numerical value s between max (0,1-Factor) and (1+ Factor) as a scaling Factor of the saturation, and multiplying the saturation of the original picture by the scaling Factor s to obtain an enhanced saturation value

S3.5.5, control the hue of the enhanced picture by the label: initializing a category set s, and adding the category into the set when the category contains the pictures collected by shooting the specimen. Before the image is subjected to hue enhancement, whether the image belongs to the category of the set s is judged, if yes, the image is subjected to hue enhancement, and if not, the image is not subjected to enhancement. The hue enhancement operation is as follows: a value s is randomly generated between max (0,1-Factor) and (1+ Factor) as a hue shift Factor. Before the hue enhancement is carried out on the picture, the hue value hue of the original picture is subjected to the shift operation on the hue ring according to the shift factor s to obtain the enhanced hue value

S4, setting a training data set, and setting the enhanced picture and the corresponding label as the training data set;

s5, building a neural network model for insect classification;

in this embodiment, the specific architecture of the neural network model for insect classification is shown in fig. 6, where f (-) is the neural network classification model, we have removed the last full connectivity layer and loss function in fig. 6 where the L1 loss function is the classification function, i.e., the cross entropy loss function, modified herein as the cross entropy loss function of online weight statistics. After the class probability distribution finally output by the network, the class distribution characteristic dimension is consistent with the class number. And the L2 loss function is a distance measurement loss function which is connected with the characteristics output by the network characteristic extraction part, and the dimension of the loss function is consistent with the network output characteristics. The distance measurement loss function can calculate the distance between two features, and the parameters extracted from the network features are optimized through the loss function, so as to obtain the network structure shown in fig. 7, specifically: based on the existing Resnet50 network, the whole feature extraction part of Resnet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively, wherein the working principle of the channel attention mechanism module is as follows: the output characteristic I of each stage is a three-dimensional matrix of c x w x h, and each element in the matrix is marked as I_i，j，kWherein i ∈ [1, c ]]，j∈[1，w]，k∈[1，h]C is the number of channels, w and h are width and height;

The values of the respective elements thereof are noted

Wherein, w_iIs a weight coefficient;

Then to

Carrying out bilinear fusion to obtain the final output characteristics

Finally, the output characteristics are further processed

Spread out as a one-dimensional vector of length c × c, noted

In this embodiment, since the final feature dimension of ResNet50 is 2048 dimensions, and after bilinear pooling, 4194304-dimensional features are generated, and if the full-link layer output 100 classification is reconstructed, a full-link layer network with 400M parameters is generated, so that the dimension of the final feature is reduced to 512 dimensions by using a convolutional layer with the size of 2048 by 1 by 512, and the parameter quantity of the final full-link layer is reduced by 16 times.

S6, training a neural network model;

S6.1, randomly selecting 20 pictures and corresponding labels from the training data set as input of the training of the current round, and inputting the pictures and the corresponding labels into a training neural network model;

s6.2, extracting output characteristics of b pictures through a neural network model

s6.3, outputting characteristics of each picture

And according to

Get corresponding

S6.4, calculating a loss function value L after the training of the current round is finished;

wherein the content of the first and second substances,

Is 1, otherwise

Is 0, and delta is a preset threshold value; p is an output characteristic

Is p ∈ [1, c × c ∈ ]]；

s6.5, after the training of the current round is finished, performing back propagation on the loss function value of the current round by a gradient descent method so as to update network parameters, and then returning to the step S6.1 to perform the next round of training until the network converges so as to obtain a trained neural network model;

s7, real-time classification of insect pictures;

Experiment of

Numerous comparative and ablative experiments are performed to ensure the effectiveness of each part of the network.

Network selection: as can be seen from fig. 8, ResNet50 and inclusion v3 achieve good performance on the task herein. We used the two values of acc and F1 as evaluation indexes, and although the result of ResNet50 is slightly worse than inclusion v3, the difference is not large, and the difference in thousandths is easily smoothed by the randomness of the deep learning process. In addition, as the infrastructure of the ResNet50 network is simpler, the configuration logic of the whole network is simpler, and the change on the infrastructure is easier.

Effectiveness and location selection experiments for SE module: as can be seen from fig. 9, the effect of adding the attention mechanism to stage3, stage4, and stage5 is the best in the experimental conditions of this embodiment, and the experimental effect shows that the channel attention mechanism of the middle-high level features can actually affect the learning process of the features more deeply under the condition of less data volume, and achieve better classification results.

Effectiveness test of bilinear pooling: as shown in FIG. 10, the result is significantly improved by introducing the bioliner pooling method, the accuracy is improved by 1.6 percentage points, and the F1 score is improved by 1 percentage point. The method is explained as an original paper that the method can combine the position characteristic with the content characteristic, thereby being helpful to fine-grained classification. As understood and analyzed herein, the performance of the bilinear posing operation is only manifested when there are more classes to classify, when there are fewer classes to classify, there is no need for as many effective distinguishing features between classes, and when the number of classes to classify increases, the number of effective features needed to completely distinguish all classes increases at a geometric rate. Because the characteristics which finally participate in classification in the original ResNet50 network are only 2048-dimensional, after the outer product operation, the characteristics are subjected to second-order fusion, the characteristic space is increased, and the task of insect classification, which needs a large number of characteristics, is further satisfied.

A comparison structure of performances of different network structures is shown in fig. 11, and the evaluation indexes of the present invention shown in the last column of the figure are obviously superior to the performances of the other network structures.

The acc2-3 is used for the basic model to carry out performance evaluation: as shown in fig. 12, it is clear that the model has a poor accuracy for upper-level classification of insects when no multi-label supervised constraint framework is introduced.

Experiments were introduced to measure loss. As shown in fig. 13, introducing the L + change 3 loss improved the 3-level classification accuracy performance of 6 and 9 thousandths in the two structures, respectively, but the 2-level classification accuracy in the wrong sample set was slightly decreased. The introduction of the L-change 2 loss resulted in a slight 3-class classification accuracy improvement in both structures, respectively, but significantly improved the accuracy of 2-class classification in the wrong sample set, with an improvement of about 30 percentage points for ResNet50 and about 20 percentage points for ResNet50+ SE structure 2. When the two loss functions are combined, the performance of the model is obviously improved on two evaluation indexes.

Distance metric comparison experiment: as shown in FIG. 14, for the L-change 2 loss function, the use of the L1 distance can better improve the performance of acc2-3, and for the L + change 3 loss function, the use of the L1 distance can better improve the performance of acc 3. In addition, because the two loss functions are optimized for the features at the same position in the network, the distance measurement functions in the same form can limit the feature expression in the same feature space, which is better for the training process.

Threshold selection experiment: as shown in fig. 15, the best effect is achieved when the threshold Δ of the L1 distance between L-change 2 and L + change 3 is 0.7 and 0.2, respectively.

In addition, the invention also verifies the invalidity and the inhibitability of other distance measurement losses. As shown in fig. 16, the fine-grained classification results of the 3-level labels can be suppressed to different degrees by 4 kinds of losses. Wherein the loss of L-hinge3 has no obvious influence on the result, the L + hinge2 inhibits the 3-grade fine-grained classification by 2 percentage points, and the L + hinge1 and the L-hinge1 have larger inhibition influence on the experimental result. It can be analyzed that L + change 2 requires feature clustering within class 2, while class 2 classification is similar to conventional image classification, which results in the web learning more conventional discriminative features, while fine-grained classification of class 3 labels requires the web learning as fine-grained discriminative features as possible, which results in the performance of fine-grained classification being suppressed. In addition, the huge inhibition of the level 1 label constraint on the network performance is mainly reflected in that the feature space required by the level 1 classification is not consistent with the level 3 classification, the combination of the feature space and the level 3 classification is similar to the joint learning of two mutually exclusive tasks, and the effect of the level 3 classification is influenced by the network in order to learn more level 1 classification features.

Although illustrative embodiments of the present invention have been described above to facilitate the understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, and various changes may be made apparent to those skilled in the art as long as they are within the spirit and scope of the present invention as defined and defined by the appended claims, and all matters of the invention which utilize the inventive concepts are protected.

Claims

1. A fine-grained insect image classification method is characterized by comprising the following steps:

(1) acquiring and preprocessing an image;

(2) establishing a picture label;

each picture is labeled with three layers of category labels, and is marked as (y)₁，y₂，y₃) Tree mapping relation exists among the three layers of category labels; wherein the first layer category label y₁Indicating the "order" of the insect, second-level category label y₂Indicate the same viewInsect categories with similar visual characteristics, third-level category label y₃Representing a fine-grained category of insects;

(3) image enhancement processing;

(3.5) carrying out color dithering processing on the picture;

(3.5.1) setting a jitter Factor;

(5) building a neural network model for insect classification;

based on the existing Resnet50 network, the whole feature extraction part of Resnet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively, wherein the working principle of the channel attention mechanism module is as follows: the output characteristic I of each stage is a three-dimensional matrix of c x w x h, and each element in the matrix is marked as I_i，j，kWherein i ∈ [1, c ]]，j∈[1，w]，k∈[1，h]C is the number of channels, w and h are width and height;