CN114187183A - Fine-grained insect image classification method - Google Patents

Fine-grained insect image classification method

Info

Publication number
CN114187183A
CN114187183A (application CN202111395529.4A)
Authority
CN
China
Prior art keywords
picture
factor
insect
value
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111395529.4A
Other languages
Chinese (zh)
Inventor
徐杰
方伟政
李非非
苏光辉
余飞
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Xingyinian Intelligent Technology Co ltd
Original Assignee
Chengdu Xingyinian Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Xingyinian Intelligent Technology Co ltd filed Critical Chengdu Xingyinian Intelligent Technology Co ltd
Priority to CN202111395529.4A
Publication of CN114187183A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/60Rotation of whole images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for classifying fine-grained insect images. Fine-grained images of different types of insects in different forms are collected, and repeated, blurred and overexposed low-quality images are deleted by manual screening, so that the remaining images exhibit the characteristics of large intra-class difference and small inter-class difference; image labels are then established and enhancement processing is applied. A neural network model for insect classification is built and trained, and finally the fine-grained insect picture to be detected is classified by the trained neural network model, which directly outputs the category to which the insect in the picture belongs.

Description

Fine-grained insect image classification method
Technical Field
The invention belongs to the technical field of image classification in the field of computers, and particularly relates to a fine-grained insect image classification method.
Background
Fine-grained image classification is an important problem in computer vision and is applicable to many valuable professional scenarios. Although deep learning has been widely applied to insect identification, fine-grained image classification algorithms in this field remain under-researched.
Fine-grained image classification is a computer vision task that is harder than conventional image classification and of greater application value in professional scenarios, and insect image classification based on deep learning is of great significance for pest control in agriculture and forestry. Applying fine-grained image classification technology to insect image recognition makes it possible to distinguish insect species that are hard to separate, further improving the accuracy of insect classification and hence the practicality and reliability of insect image recognition in actual production. However, current research on fine-grained image classification algorithms in the field of insect image recognition is still neither sufficient nor deep.
Mainstream fine-grained image classification algorithms include fine-tuning of general image classification networks, joint localization-and-recognition methods based on strongly or weakly supervised attention mechanisms, bilinear pooling methods based on high-order feature fusion, metric learning methods, and transformer-based methods.
Methods such as Part-RCNN, which localizes discriminative regions through strongly supervised object detection, and RA-CNN, which localizes them through a weakly supervised attention mechanism, mainly aim to alleviate the problems of small inter-class difference and large intra-class difference. Even metric-learning-based methods, which appear independent of discriminative-region localization, essentially suppress irrelevant features and find key discriminative features by narrowing intra-class distances and expanding inter-class distances.
In biological taxonomy, species are organized in a hierarchical system with a fixed number of levels: kingdom, phylum, class, order, family, genus and species. Biological images become more and more visually similar as the classification level descends, and the classification difficulty gradually reaches what computer vision calls fine-grained classification. Insects in the common human sense generally belong to the class Insecta. Insects with greatly differing traits are usually separated at the level of "order", and different insect categories become increasingly similar as the classification level descends. When insect experts label the collected insect images, they can identify the specific label of some insect samples down to the "genus" or "species" level based on rich experience and comparison with insect atlases. However, because insects are the most species-rich animals encountered in daily life, with numerous varieties and extremely varied forms, even experienced entomologists often cannot identify a given insect sample at the fine-grained level of "genus" or "species", and can only identify it at a coarser level such as "order", "suborder" or "family".
Secondly, in practice insect classification algorithms are mainly used for identifying agricultural pests. In some cases, the category information needed to determine whether an insect is a pest, and the corresponding control strategy, need not be refined to the "species" level, since insects in the same "family" or "genus" often share similar morphology and traits. In this type of application scenario, a good algorithm should satisfy the following requirement: even when the fine-grained classification result is wrong, the error should be confined within the correct upper-level class. Finally, the supervision information used by existing fine-grained image classification methods is the category label at the finest granularity. In the field of fine-grained insect classification, however, labeling higher-level tags is not costly, and they are far easier to obtain than the bounding-box information required by strongly supervised methods. Therefore, adding higher-level category information to supervised learning should effectively improve fine-grained classification performance.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a fine-grained insect image classification method that optimizes a fine-grained classification framework based on an improved weakly supervised fine-grained classification model and hierarchical multi-label constraints from metric learning, thereby realizing accurate classification of insect images.
In order to achieve the above object, the present invention provides a method for classifying fine-grained insect images, comprising the steps of:
(1) acquiring and preprocessing an image;
fine-grained pictures of different types of insects in different forms are collected, and repeated, blurred and overexposed low-quality pictures are deleted by manual screening, so that the remaining pictures exhibit the characteristics of large intra-class difference and small inter-class difference;
(2) establishing a picture label;
each picture is labeled with three layers of category labels, denoted (y1, y2, y3), with a tree-like mapping relation among the three layers; the first-layer label y1 indicates the "order" of the insect, the second-layer label y2 indicates insect categories under the same order with similar visual characteristics, and the third-layer label y3 represents the fine-grained category of the insect;
(3) image enhancement processing;
(3.1) picture zooming: sampling the length and width of each picture to a fixed pixel size by a bilinear interpolation method;
(3.2) picture random rotation: setting a rotation factor; a value is randomly sampled in [-factor, factor] as the rotation angle of the picture and the picture is rotated accordingly, wherein when the rotation angle is not 90° or 180°, the pixel-free area left by the rotation operation is filled with black;
(3.3) randomly turning pictures horizontally or vertically: setting a probability factor p; randomly generating a probability value in [0, 1], if the probability value is less than p, carrying out random horizontal or vertical turning operation on the picture, otherwise, not turning the picture;
(3.4) picture random cropping: the picture is first enlarged to 1.25 times its original size, and then a region of the network input size is randomly cropped from the enlarged picture;
(3.5) carrying out color dithering processing on the picture;
(3.5.1) setting a jitter Factor;
(3.5.2), enhancing the brightness of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a brightness scaling factor, and the brightness value bright of the original picture is multiplied by the scaling factor s to obtain the enhanced brightness value bright′:

bright′ = bright × s
(3.5.3), enhancing the contrast of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a contrast scaling factor, and the contrast value contrast of the original picture is multiplied by the scaling factor s to obtain the enhanced contrast value contrast′:

contrast′ = contrast × s
(3.5.4), enhancing the saturation of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a saturation scaling factor, and the saturation value saturation of the original picture is multiplied by the scaling factor s to obtain the enhanced saturation value saturation′:

saturation′ = saturation × s
(3.5.5), controlling hue enhancement through the label: a category set S is initialized, and a category is added to the set when it contains pictures collected by photographing specimens. Before hue enhancement, it is judged whether the picture belongs to a category in S; if so, hue enhancement is performed, otherwise no enhancement is applied. The hue enhancement operation is as follows: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a hue shift factor, and the hue value hue of the original picture is shifted along the hue ring by s to obtain the enhanced hue value hue′:

hue′ = (hue + s) mod 1, with hue normalized to [0, 1]
(4) Setting a training data set: the enhanced pictures and their corresponding labels are taken as the training data set;
(5) building a neural network model for insect classification;
Based on the existing ResNet50 network, the whole feature extraction part of ResNet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively. The channel attention mechanism module works as follows: the output feature I of each stage is a three-dimensional matrix of size c × w × h, and each element of the matrix is denoted I_{i,j,k}, where i ∈ [1, c], j ∈ [1, w], k ∈ [1, h], c is the number of channels, and w and h are the width and height;
the channel attention mechanism module applies an attention weight vector W of length c to the output feature I to obtain the output feature Î of the channel attention mechanism, whose elements are

Î_{i,j,k} = w_i × I_{i,j,k}

wherein w_i is the weight coefficient of the i-th channel;
after stage5 and its channel attention mechanism module, an improved bilinear pooling layer is added, which works as follows: the output feature Î is reshaped into a two-dimensional matrix X of size c × (w × h); bilinear fusion is then applied to X to obtain the fused output feature Y = X Xᵀ of size c × c; finally, Y is flattened into a one-dimensional vector x̂ of length c × c;
(6) Training a neural network model;
Inputting the enhanced pictures and their corresponding labels into the neural network model, the neural network model predicts the category ŷ of the insect in each input picture.
(6.1) randomly selecting b pictures and their corresponding labels from the training data set as the input of the current training round and feeding them into the neural network model;
(6.2) extracting, through the neural network model, the output features x̂_i, i = 1, 2, …, b, of the b pictures, where the output feature of each picture is a one-dimensional vector of length c × c;
(6.3) the output feature x̂_i of each picture undergoes dimension reduction through a fully connected layer to obtain the classification result p_i of each picture, where the classification result p_i is a one-dimensional vector whose length is the number M of insect categories in the training data set, and each of its elements is the probability with which the network judges the picture to be the corresponding insect category; finally, the maximum probability is taken as the prediction result, denoted p̂_i, and the corresponding predicted category ŷ_i is obtained from the index of p̂_i;
(6.4) calculating a loss function value L after the training of the current round is finished;
L = L_CE + β1 · (L+hinge3 + L-hinge2)

wherein β1 is a hyper-parameter whose value, determined through several groups of comparative experiments, is 0.5; L+hinge3 is the metric loss corresponding to the third-layer labels of the output features, L-hinge2 is the metric loss corresponding to the second-layer labels, and L_CE is the loss corresponding to the classification results;

L+hinge3 = Σ_{i,j} γ3_{ij} · max(0, d(x̂_i, x̂_j) − Δ)

L-hinge2 = Σ_{i,j} (1 − γ2_{ij}) · max(0, Δ − d(x̂_i, x̂_j))

d(x̂_i, x̂_j) = Σ_{p=1}^{c×c} |x̂_{i,p} − x̂_{j,p}|

wherein γ3_{ij} indicates whether, among the b input pictures of the current round, the third-layer labels of any two pictures i and j are the same: γ3_{ij} is 1 when they are the same, and 0 otherwise; γ2_{ij} is defined in the same way on the second-layer labels; Δ is a preset threshold; and p indexes the elements of the output feature x̂, with p ∈ [1, c × c];

L_CE = −(1/b) · Σ_{i=1}^{b} (1/λ_i) · Σ_{τ=1}^{M} y_{iτ} · log p_{iτ}

wherein λ_i represents the ratio of the number of pictures in the current round belonging to the same category as the i-th input picture to the b input pictures; y_{iτ} is a discrimination coefficient: when the real label value of the i-th input picture equals τ, y_{iτ} is 1, otherwise 0; and p_{iτ} is the value of the τ-th element of the classification result p_i;
(6.5) after the training of the current round is finished, performing back propagation on the loss function value of the current round by a gradient descent method so as to update network parameters, and then returning to the step (6.1) to perform the next round of training until the network converges, thereby obtaining a trained neural network model;
(7) real-time classification of insect pictures;
The fine-grained insect picture to be detected is scaled, by the picture scaling operation, to the same size as the training data and input into the trained neural network, which directly outputs the category to which the insect in the picture belongs.
The invention aims to realize the following steps:
The method for classifying fine-grained insect images according to the invention collects fine-grained pictures of different types of insects in different forms and deletes repeated, blurred and overexposed low-quality pictures by manual screening, so that the remaining pictures exhibit large intra-class and small inter-class differences; picture labels are then established and enhancement processing is applied. A neural network model for insect classification is built and trained, and finally the fine-grained insect picture to be detected is classified by the trained model, which directly outputs the category to which the insect in the picture belongs.
Meanwhile, the fine-grained insect image classification method provided by the invention also has the following beneficial effects:
(1) with the label-controlled hue enhancement method, the network can still learn color characteristics, while part of the categories can be distinguished without relying on color characteristics;
(2) by embedding channel attention modules at different stages of the network, the network can extract visual features of different levels, while bilinear pooling performs high-order fusion between different features to generate a large number of fine-grained features that are more effective for classifying fine-grained images;
(3) the 3-level labels, combined with a distance-metric loss function, constrain the classification training process, so that the network can better distinguish differences between fine-grained classes, expand the distance between dissimilar superclasses, and confine the classification errors of fine-grained samples within the correct upper-level class.
Drawings
FIG. 1 is a flow chart of a fine-grained insect image classification method according to the present invention;
FIG. 2 is a schematic representation of different forms of different species of insects;
FIG. 3 is a map of the mapping between three layers of tags of an insect;
FIG. 4 is a schematic diagram of random rotation of a picture;
FIG. 5 is a schematic diagram of a random flip of a picture;
FIG. 6 is a basic architecture diagram of a neural network model for insect classification;
FIG. 7 is a diagram of the neural network model architecture after improvement of the present invention;
FIG. 8 is a graph of the evaluation of the results of three network classifications;
FIG. 9 is a graph of the results of an effectiveness versus location selection experiment for the SE module;
FIG. 10 is a graph of the results of a validation experiment for bilinear pooling;
FIG. 11 is a graph comparing performance of different network architectures;
FIG. 12 is a comparison graph of classification performance incorporating multi-label supervision constraints;
FIG. 13 is a graph of performance comparison with the introduction of metric loss;
FIG. 14 is a performance comparison graph of distance metric comparison;
FIG. 15 is a performance comparison graph of threshold selection comparisons;
figure 16 is a graph of performance versus loss of several other distance metrics.
Detailed Description
The following description of the embodiments of the present invention is provided with reference to the accompanying drawings so that those skilled in the art can better understand the present invention. It is expressly noted that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.
Examples
FIG. 1 is a flowchart of a fine-grained insect image classification method according to the present invention.
In this embodiment, as shown in fig. 1, the present invention provides a method for classifying fine-grained insect images, including the following steps:
s1, image acquisition and preprocessing;
In this embodiment, fine-grained pictures of different types of insects in different forms are collected by outdoor field photography, specimen photography, and crawler technology applied to common internet search platforms and specialized insect database websites; the position and angle of the camera are adjusted to obtain pictures containing the visual content of different insect parts, as shown in fig. 2. In addition, because the morphological structure of an insect can change, such as the different curling forms of soft-bodied larvae and the open and closed forms of adult wings, pictures of different forms of the same type of insect are collected during field photography by prolonging the observation time of the insect.
Through manual screening, repeated pictures collected from different platforms are eliminated, as well as pictures that differ in hash value but are extremely similar in content, produced by continuous shooting of the same insect during field photography. In addition, low-quality pictures that are blurred, overexposed, etc. due to shooting problems are eliminated. The remaining 3719 fine-grained insect images in 100 classes thus exhibit the characteristics of large intra-class difference and small inter-class difference.
S2, establishing a picture label;
Each picture is labeled with three layers of category labels, denoted (y1, y2, y3). The first-layer label y1 denotes the "order" of the insect; the second-layer label y2 denotes insect categories under the same order with similar visual characteristics, with a scope equivalent to a suborder, family or subfamily under the "order" label; the third-layer label y3 denotes the fine-grained categories of "genus" and "species". There are 15 first-layer label types, 34 second-layer types and 100 third-layer types. After the three-layer category labels are established, each picture sample corresponds to a triple label (y1, y2, y3); as shown in fig. 3, a tree-like mapping relation exists among the three layers of labels.
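As a concrete illustration (not part of the original disclosure), the following minimal Python sketch shows one way the tree-like mapping among the three label layers could be stored and flattened into per-class triples; all category names in it are invented placeholders, whereas the real data set uses 15/34/100 categories.

```python
# Hypothetical three-layer label tree: order (y1) -> visually similar
# group under that order (y2) -> fine-grained genus/species (y3).
# All names below are invented placeholders.
LABEL_TREE = {
    "Lepidoptera": {                       # y1: "order"
        "Papilionidae-like": [             # y2: visually similar group
            "Papilio machaon",             # y3: fine-grained species
            "Papilio xuthus",
        ],
        "Noctuidae-like": ["Agrotis ipsilon"],
    },
    "Coleoptera": {
        "Coccinellidae-like": ["Harmonia axyridis"],
    },
}

def build_triples(tree):
    """Flatten the tree so each fine-grained class index maps back to its
    (y1, y2, y3) triple, which is what the hierarchical losses consume."""
    triples = []
    for y1, groups in tree.items():
        for y2, species_list in groups.items():
            for y3 in species_list:
                triples.append((y1, y2, y3))
    return triples

TRIPLES = build_triples(LABEL_TREE)        # index -> (order, group, species)
```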
S3, picture enhancement processing;
s3.1, zooming the picture: sampling the length and width of each picture to a fixed pixel size by a bilinear interpolation method;
S3.2, randomly rotating the picture: setting a rotation factor; a value is randomly sampled in [-factor, factor] as the rotation angle of the picture and the picture is rotated accordingly, as shown in fig. 4; when the rotation angle is not 90° or 180°, the pixel-free area left by the rotation operation is filled with black;
S3.3, randomly flipping the picture horizontally or vertically: setting a probability factor p; a probability value is randomly generated in [0, 1]; if it is less than p, a random horizontal or vertical flip is applied to the picture, otherwise the picture is not flipped; a random flipping example is shown in fig. 5.
S3.4, randomly cropping the picture: the picture is first enlarged to 1.25 times its original size, and then a region of the network input size is randomly cropped from the enlarged picture;
s3.5, carrying out color dithering processing on the picture;
The color dithering method changes four basic attributes of the original image: brightness, contrast, saturation and hue. The specific process of color dithering is as follows:
s3.5.1, setting a jitter Factor;
S3.5.2, enhancing the brightness of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a brightness scaling factor, and the brightness value bright of the original picture is multiplied by the scaling factor s to obtain the enhanced brightness value bright′:

bright′ = bright × s
S3.5.3, enhancing the contrast of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a contrast scaling factor, and the contrast value contrast of the original picture is multiplied by the scaling factor s to obtain the enhanced contrast value contrast′:

contrast′ = contrast × s
S3.5.4, enhancing the saturation of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a saturation scaling factor, and the saturation value saturation of the original picture is multiplied by the scaling factor s to obtain the enhanced saturation value saturation′:

saturation′ = saturation × s
S3.5.5, controlling hue enhancement through the label: a category set S is initialized, and a category is added to the set when it contains pictures collected by photographing specimens. Before hue enhancement, it is judged whether the picture belongs to a category in S; if so, hue enhancement is performed, otherwise no enhancement is applied. The hue enhancement operation is as follows: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a hue shift factor, and the hue value hue of the original picture is shifted along the hue ring by s to obtain the enhanced hue value hue′:

hue′ = (hue + s) mod 1, with hue normalized to [0, 1]
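To make steps S3.1 to S3.5 concrete, here is a minimal sketch of the enhancement chain using torchvision; the 448-pixel input size, the rotation and jitter factors, the specimen-category set S and the hue-shift range are all assumed values for illustration rather than figures taken from the patent.

```python
import random
from PIL import Image
from torchvision import transforms
from torchvision.transforms import functional as TF

INPUT = 448                # assumed network input size
FACTOR = 0.4               # assumed jitter Factor
ROT_FACTOR = 30            # assumed rotation factor, in degrees
SPECIMEN_CLASSES = {"Papilio machaon"}   # assumed category set S

# S3.1-S3.5.4 as a torchvision pipeline
augment = transforms.Compose([
    transforms.Resize((INPUT, INPUT)),              # S3.1: bilinear scaling
    transforms.RandomRotation(ROT_FACTOR, fill=0),  # S3.2: black fill outside
    transforms.RandomHorizontalFlip(p=0.5),         # S3.3: random flips
    transforms.RandomVerticalFlip(p=0.5),
    transforms.Resize((int(INPUT * 1.25),) * 2),    # S3.4: enlarge 1.25x ...
    transforms.RandomCrop(INPUT),                   # ... then crop input size
    transforms.ColorJitter(brightness=FACTOR,       # S3.5.2-S3.5.4: jitter
                           contrast=FACTOR,
                           saturation=FACTOR),
])

def label_controlled_hue(img: Image.Image, y3: str) -> Image.Image:
    """S3.5.5: apply hue jitter only to categories that contain specimen
    photos; the shift range here is an assumption."""
    if y3 in SPECIMEN_CLASSES:
        shift = random.uniform(-FACTOR / 2, FACTOR / 2)
        img = TF.adjust_hue(img, shift)   # shift along the hue ring
    return img
```

Note that torchvision's ColorJitter samples each scaling factor uniformly from [max(0, 1-Factor), 1+Factor], which matches the sampling rule described above.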
S4, setting a training data set, and setting the enhanced picture and the corresponding label as the training data set;
s5, building a neural network model for insect classification;
In this embodiment, the specific architecture of the neural network model for insect classification is shown in fig. 6, where f(·) is the neural network classification model with the last fully connected layer and loss function removed. In fig. 6, the L1 loss function is the classification loss, i.e. the cross-entropy loss, modified here into a cross-entropy loss with online weight statistics; it is applied to the class probability distribution finally output by the network, whose dimension matches the number of classes. The L2 loss function is a distance-metric loss connected to the features output by the network's feature extraction part, with dimension matching the network output features. The distance-metric loss computes the distance between two features, and the parameters of the feature extraction network are optimized through this loss, yielding the network structure shown in fig. 7. Specifically: based on the existing ResNet50 network, the whole feature extraction part of ResNet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively. The channel attention mechanism module works as follows: the output feature I of each stage is a three-dimensional matrix of size c × w × h, and each element of the matrix is denoted I_{i,j,k}, where i ∈ [1, c], j ∈ [1, w], k ∈ [1, h], c is the number of channels, and w and h are the width and height;
the channel attention mechanism module applies an attention weight vector W of length c to the output feature I to obtain the output feature Î of the channel attention mechanism, whose elements are

Î_{i,j,k} = w_i × I_{i,j,k}

wherein w_i is the weight coefficient of the i-th channel;
after stage5 and its channel attention mechanism module, an improved bilinear pooling layer is added, which works as follows: the output feature Î is reshaped into a two-dimensional matrix X of size c × (w × h); bilinear fusion is then applied to X to obtain the fused output feature Y = X Xᵀ of size c × c; finally, Y is flattened into a one-dimensional vector x̂ of length c × c.
In this embodiment, since the final feature dimension of ResNet50 is 2048 dimensions, and after bilinear pooling, 4194304-dimensional features are generated, and if the full-link layer output 100 classification is reconstructed, a full-link layer network with 400M parameters is generated, so that the dimension of the final feature is reduced to 512 dimensions by using a convolutional layer with the size of 2048 by 1 by 512, and the parameter quantity of the final full-link layer is reduced by 16 times.
S6, training a neural network model;
Inputting the enhanced pictures and their corresponding labels into the neural network model, the neural network model predicts the category ŷ of the insect in each input picture.
S6.1, randomly selecting b = 20 pictures and their corresponding labels from the training data set as the input of the current training round and feeding them into the neural network model;
S6.2, extracting, through the neural network model, the output features x̂_i, i = 1, 2, …, b, of the b pictures, where the output feature of each picture is a one-dimensional vector of length c × c;
S6.3, the output feature x̂_i of each picture undergoes dimension reduction through a fully connected layer to obtain the classification result p_i of each picture, where the classification result p_i is a one-dimensional vector whose length is the number M of insect categories in the training data set, and each of its elements is the probability with which the network judges the picture to be the corresponding insect category; finally, the maximum probability is taken as the prediction result, denoted p̂_i, and the corresponding predicted category ŷ_i is obtained from the index of p̂_i;
S6.4, calculating a loss function value L after the training of the current round is finished;
L = L_CE + β1 · (L+hinge3 + L-hinge2)

wherein β1 is a hyper-parameter whose value, determined through several groups of comparative experiments, is 0.5; L+hinge3 is the metric loss corresponding to the third-layer labels of the output features, L-hinge2 is the metric loss corresponding to the second-layer labels, and L_CE is the loss corresponding to the classification results;

L+hinge3 = Σ_{i,j} γ3_{ij} · max(0, d(x̂_i, x̂_j) − Δ)

L-hinge2 = Σ_{i,j} (1 − γ2_{ij}) · max(0, Δ − d(x̂_i, x̂_j))

d(x̂_i, x̂_j) = Σ_{p=1}^{c×c} |x̂_{i,p} − x̂_{j,p}|

wherein γ3_{ij} indicates whether, among the b input pictures of the current round, the third-layer labels of any two pictures i and j are the same: γ3_{ij} is 1 when they are the same, and 0 otherwise; γ2_{ij} is defined in the same way on the second-layer labels; Δ is a preset threshold; and p indexes the elements of the output feature x̂, with p ∈ [1, c × c];

L_CE = −(1/b) · Σ_{i=1}^{b} (1/λ_i) · Σ_{τ=1}^{M} y_{iτ} · log p_{iτ}

wherein λ_i represents the ratio of the number of pictures in the current round belonging to the same category as the i-th input picture to the b input pictures; y_{iτ} is a discrimination coefficient: when the real label value of the i-th input picture equals τ, y_{iτ} is 1, otherwise 0; and p_{iτ} is the value of the τ-th element of the classification result p_i;
s6.5, after the training of the current round is finished, performing back propagation on the loss function value of the current round by a gradient descent method so as to update network parameters, and then returning to the step S6.1 to perform the next round of training until the network converges so as to obtain a trained neural network model;
S7, real-time classification of insect pictures;
The fine-grained insect picture to be detected is scaled, by the picture scaling operation, to the same size as the training data and input into the trained neural network, which directly outputs the category to which the insect in the picture belongs.
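For completeness, a minimal inference sketch for step S7 follows, under the same assumed 448-pixel input size; the normalization constants are the standard ImageNet values and are an assumption here, as the patent does not specify them.

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),        # same size as the training data
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

@torch.no_grad()
def classify(model, path: str) -> int:
    model.eval()
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    _, logits = model(x)                  # InsectNet from the sketch above
    return logits.argmax(dim=1).item()    # index of the predicted insect class
```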
Experiment of
Numerous comparative and ablation experiments were performed to verify the effectiveness of each part of the network.
Network selection: as can be seen from fig. 8, ResNet50 and Inception v3 achieve good performance on the task at hand. Accuracy (acc) and F1 are used as evaluation indexes; the result of ResNet50 is slightly worse than that of Inception v3, but the difference is small, and a gap of a few thousandths is easily smoothed out by the randomness of the deep learning process. In addition, since the infrastructure of the ResNet50 network is simpler, the configuration logic of the whole network is simpler, and changes to the infrastructure are easier.
Effectiveness and location selection experiments for the SE module: as can be seen from fig. 9, adding the attention mechanism after stage3, stage4 and stage5 gives the best effect under the experimental conditions of this embodiment, showing that with a smaller data volume, channel attention on middle- and high-level features can influence feature learning more deeply and achieve better classification results.
Effectiveness test of bilinear pooling: as shown in fig. 10, introducing the bilinear pooling method significantly improves the results: accuracy rises by 1.6 percentage points and the F1 score by 1 percentage point. The original paper explains that the method combines position features with content features, which helps fine-grained classification. As understood and analyzed here, the benefit of the bilinear pooling operation only manifests when there are many classes to classify: with fewer classes, fewer effective distinguishing features are needed between classes, and as the number of classes increases, the number of effective features required to completely separate all classes grows at a geometric rate. Because only 2048-dimensional features participate in classification in the original ResNet50 network, the outer-product operation performs second-order fusion of the features and enlarges the feature space, which further satisfies the insect classification task's need for a large number of features.
A performance comparison of the different network structures is shown in fig. 11; the evaluation indexes of the present invention, shown in the last column of the figure, are clearly superior to those of the other network structures.
Performance evaluation of the basic model using acc2-3: as shown in fig. 12, the model clearly has poor accuracy on the upper-level classification of insects when the multi-label supervised constraint framework is not introduced.
Experiments introducing the metric losses: as shown in fig. 13, introducing the L+hinge3 loss improved level-3 classification accuracy by 6 and 9 thousandths in the two structures respectively, although level-2 classification accuracy on the wrongly classified sample set decreased slightly. Introducing the L-hinge2 loss yielded a slight level-3 accuracy improvement in both structures, but significantly improved level-2 accuracy on the wrongly classified sample set, by about 30 percentage points for ResNet50 and about 20 percentage points for ResNet50+SE structure 2. When the two loss functions are combined, the performance of the model improves markedly on both evaluation indexes.
Distance metric comparison experiment: as shown in fig. 14, for the L-hinge2 loss function, the L1 distance better improves acc2-3 performance, and for the L+hinge3 loss function, the L1 distance better improves acc3 performance. In addition, because the two loss functions optimize the features at the same position in the network, distance metric functions of the same form constrain the feature expression in the same feature space, which is better for the training process.
Threshold selection experiment: as shown in fig. 15, the best effect is achieved when the threshold Δ of the L1 distance is 0.7 for L-hinge2 and 0.2 for L+hinge3.
In addition, the invention also verifies the ineffectiveness or suppressive effect of other distance metric losses. As shown in fig. 16, the four losses suppress the level-3 fine-grained classification results to different degrees. The L-hinge3 loss has no obvious influence on the result; L+hinge2 suppresses level-3 fine-grained classification by 2 percentage points; and L+hinge1 and L-hinge1 suppress the experimental results more strongly. The analysis is that L+hinge2 requires feature clustering within level-2 classes, and level-2 classification is similar to conventional image classification, which leads the network to learn more conventional discriminative features, whereas fine-grained classification with level-3 labels requires the network to learn features that are as fine-grained as possible, so fine-grained classification performance is suppressed. Furthermore, the strong suppression of network performance by the level-1 label constraint mainly reflects that the feature space required by level-1 classification is inconsistent with that of level-3 classification; combining the two resembles the joint learning of two mutually exclusive tasks, and the network sacrifices level-3 classification performance in order to learn more level-1 classification features.
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the present invention is not limited to the scope of these embodiments. Various changes will be apparent to those skilled in the art as long as they remain within the spirit and scope of the present invention as defined by the appended claims, and all matters utilizing the inventive concept are protected.

Claims (1)

1. A fine-grained insect image classification method is characterized by comprising the following steps:
(1) acquiring and preprocessing an image;
fine-grained pictures of different types of insects in different forms are collected, and repeated, blurred and overexposed low-quality pictures are deleted by manual screening, so that the remaining pictures exhibit the characteristics of large intra-class difference and small inter-class difference;
(2) establishing a picture label;
each picture is labeled with three layers of category labels, denoted (y1, y2, y3), with a tree-like mapping relation among the three layers; the first-layer label y1 indicates the "order" of the insect, the second-layer label y2 indicates insect categories under the same order with similar visual characteristics, and the third-layer label y3 represents the fine-grained category of the insect;
(3) image enhancement processing;
(3.1) picture zooming: sampling the length and width of each picture to a fixed pixel size by a bilinear interpolation method;
(3.2) picture random rotation: setting a rotation factor; a value is randomly sampled in [-factor, factor] as the rotation angle of the picture and the picture is rotated accordingly, wherein when the rotation angle is not 90° or 180°, the pixel-free area left by the rotation operation is filled with black;
(3.3) randomly turning pictures horizontally or vertically: setting a probability factor p; randomly generating a probability value in [0, 1], if the probability value is less than p, carrying out random horizontal or vertical turning operation on the picture, otherwise, not turning the picture;
(3.4) picture random cropping: the picture is first enlarged to 1.25 times its original size, and then a region of the network input size is randomly cropped from the enlarged picture;
(3.5) carrying out color dithering processing on the picture;
(3.5.1) setting a jitter Factor;
(3.5.2), enhancing the brightness of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a brightness scaling factor, and the brightness value bright of the original picture is multiplied by the scaling factor s to obtain the enhanced brightness value bright′:

bright′ = bright × s
(3.5.3), enhancing the contrast of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a contrast scaling factor, and the contrast value contrast of the original picture is multiplied by the scaling factor s to obtain the enhanced contrast value contrast′:

contrast′ = contrast × s
(3.5.4), enhancing the saturation of the picture: a numerical value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a saturation scaling factor, and the saturation value saturation of the original picture is multiplied by the scaling factor s to obtain the enhanced saturation value saturation′:

saturation′ = saturation × s
(3.5.5), controlling hue enhancement through the label: a category set S is initialized, and a category is added to the set when it contains pictures collected by photographing specimens. Before hue enhancement, it is judged whether the picture belongs to a category in S; if so, hue enhancement is performed, otherwise no enhancement is applied. The hue enhancement operation is as follows: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as a hue shift factor, and the hue value hue of the original picture is shifted along the hue ring by s to obtain the enhanced hue value hue′:

hue′ = (hue + s) mod 1, with hue normalized to [0, 1]
(4) Setting a training data set: the enhanced pictures and their corresponding labels are taken as the training data set;
(5) building a neural network model for insect classification;
Based on the existing ResNet50 network, the whole feature extraction part of ResNet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively. The channel attention mechanism module works as follows: the output feature I of each stage is a three-dimensional matrix of size c × w × h, and each element of the matrix is denoted I_{i,j,k}, where i ∈ [1, c], j ∈ [1, w], k ∈ [1, h], c is the number of channels, and w and h are the width and height;
the channel attention mechanism module applies an attention weight vector W of length c to the output feature I to obtain the output feature Î of the channel attention mechanism, whose elements are

Î_{i,j,k} = w_i × I_{i,j,k}

wherein w_i is the weight coefficient of the i-th channel;
after stage5 and its channel attention mechanism module, an improved bilinear pooling layer is added, which works as follows: the output feature Î is reshaped into a two-dimensional matrix X of size c × (w × h); bilinear fusion is then applied to X to obtain the fused output feature Y = X Xᵀ of size c × c; finally, Y is flattened into a one-dimensional vector x̂ of length c × c;
(6) Training a neural network model;
Inputting the enhanced pictures and their corresponding labels into the neural network model, the neural network model predicts the category ŷ of the insect in each input picture.
(6.1) randomly selecting b pictures and their corresponding labels from the training data set as the input of the current training round and feeding them into the neural network model;
(6.2) extracting, through the neural network model, the output features x̂_i, i = 1, 2, …, b, of the b pictures, where the output feature of each picture is a one-dimensional vector of length c × c;
(6.3) the output feature x̂_i of each picture undergoes dimension reduction through a fully connected layer to obtain the classification result p_i of each picture, where the classification result p_i is a one-dimensional vector whose length is the number M of insect categories in the training data set, and each of its elements is the probability with which the network judges the picture to be the corresponding insect category; finally, the maximum probability is taken as the prediction result, denoted p̂_i, and the corresponding predicted category ŷ_i is obtained from the index of p̂_i;
(6.4) calculating a loss function value L after the training of the current round is finished;
L = L_CE + β1 · (L+hinge3 + L-hinge2)

wherein β1 is a hyper-parameter whose value, determined through several groups of comparative experiments, is 0.5; L+hinge3 is the metric loss corresponding to the third-layer labels of the output features, L-hinge2 is the metric loss corresponding to the second-layer labels, and L_CE is the loss corresponding to the classification results;

L+hinge3 = Σ_{i,j} γ3_{ij} · max(0, d(x̂_i, x̂_j) − Δ)

L-hinge2 = Σ_{i,j} (1 − γ2_{ij}) · max(0, Δ − d(x̂_i, x̂_j))

d(x̂_i, x̂_j) = Σ_{p=1}^{c×c} |x̂_{i,p} − x̂_{j,p}|

wherein γ3_{ij} indicates whether, among the b input pictures of the current round, the third-layer labels of any two pictures i and j are the same: γ3_{ij} is 1 when they are the same, and 0 otherwise; γ2_{ij} is defined in the same way on the second-layer labels; Δ is a preset threshold; and p indexes the elements of the output feature x̂, with p ∈ [1, c × c];

L_CE = −(1/b) · Σ_{i=1}^{b} (1/λ_i) · Σ_{τ=1}^{M} y_{iτ} · log p_{iτ}

wherein λ_i represents the ratio of the number of pictures in the current round belonging to the same category as the i-th input picture to the b input pictures; y_{iτ} is a discrimination coefficient: when the real label value of the i-th input picture equals τ, y_{iτ} is 1, otherwise 0; and p_{iτ} is the value of the τ-th element of the classification result p_i;
(6.5) after the training of the current round is finished, performing back propagation on the loss function value of the current round by a gradient descent method so as to update network parameters, and then returning to the step (6.1) to perform the next round of training until the network converges, thereby obtaining a trained neural network model;
(7) real-time classification of insect pictures;
The fine-grained insect picture to be detected is scaled, by the picture scaling operation, to the same size as the training data and input into the trained neural network, which directly outputs the category to which the insect in the picture belongs.
CN202111395529.4A 2021-11-23 2021-11-23 Fine-grained insect image classification method Pending CN114187183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111395529.4A CN114187183A (en) 2021-11-23 2021-11-23 Fine-grained insect image classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111395529.4A CN114187183A (en) 2021-11-23 2021-11-23 Fine-grained insect image classification method

Publications (1)

Publication Number Publication Date
CN114187183A true CN114187183A (en) 2022-03-15

Family

ID=80541280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111395529.4A Pending CN114187183A (en) 2021-11-23 2021-11-23 Fine-grained insect image classification method

Country Status (1)

Country Link
CN (1) CN114187183A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116453032A (en) * 2023-06-16 2023-07-18 福建农林大学 Marine ecology detecting system
CN116453032B (en) * 2023-06-16 2023-08-25 福建农林大学 Marine ecology detecting system
CN117237814A (en) * 2023-11-14 2023-12-15 四川农业大学 Large-scale orchard insect condition monitoring method based on attention mechanism optimization
CN117237814B (en) * 2023-11-14 2024-02-20 四川农业大学 Large-scale orchard insect condition monitoring method based on attention mechanism optimization

Similar Documents

Publication Publication Date Title
CN106920243B (en) Improved ceramic material part sequence image segmentation method of full convolution neural network
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
Wang et al. A random forest classifier based on pixel comparison features for urban LiDAR data
CN107766933B (en) Visualization method for explaining convolutional neural network
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
Solaiman et al. Multisensor data fusion using fuzzy concepts: application to land-cover classification using ERS-1/JERS-1 SAR composites
CN108052966A (en) Remote sensing images scene based on convolutional neural networks automatically extracts and sorting technique
Zhang et al. PSO and K-means-based semantic segmentation toward agricultural products
Liu et al. Remote sensing image change detection based on information transmission and attention mechanism
CN108492298B (en) Multispectral image change detection method based on generation countermeasure network
CN109840483B (en) Landslide crack detection and identification method and device
CN104850822B (en) Leaf identification method under simple background based on multi-feature fusion
Shahab et al. How salient is scene text?
CN109886161A (en) A kind of road traffic index identification method based on possibility cluster and convolutional neural networks
CN108229550A (en) A kind of cloud atlas sorting technique that network of forests network is cascaded based on more granularities
CN106408030A (en) SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN114187183A (en) Fine-grained insect image classification method
CN110211127B (en) Image partition method based on bicoherence network
CN106022254A (en) Image recognition technology
Patil et al. Enhanced radial basis function neural network for tomato plant disease leaf image segmentation
Su et al. LodgeNet: Improved rice lodging recognition using semantic segmentation of UAV high-resolution remote sensing images
Chen et al. Agricultural remote sensing image cultivated land extraction technology based on deep learning
Ju et al. Classification of jujube defects in small data sets based on transfer learning
Saba et al. Optimization of multiresolution segmentation for object-oriented road detection from high-resolution images
Zhao et al. Butterfly recognition based on faster R-CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination