CN114187183A - Fine-grained insect image classification method - Google Patents
- Publication number
- CN114187183A CN114187183A CN202111395529.4A CN202111395529A CN114187183A CN 114187183 A CN114187183 A CN 114187183A CN 202111395529 A CN202111395529 A CN 202111395529A CN 114187183 A CN114187183 A CN 114187183A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/60—Rotation of whole images or parts thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
Abstract
The invention discloses a fine-grained insect image classification method. Fine-grained images of different forms of different types of insects are collected, and repeated, blurred and overexposed low-quality images are deleted by manual screening, so that the remaining images exhibit the characteristics of large intra-class difference and small inter-class difference; image labels are then established and enhancement processing is performed. A neural network model for insect classification is built and trained, and finally the fine-grained insect picture to be detected is classified by the trained neural network model, which directly outputs the category to which the insect in the picture belongs.
Description
Technical Field
The invention belongs to the technical field of image classification in computing, and particularly relates to a fine-grained insect image classification method.
Background
Fine-grained image classification is an important problem in the computer vision field and applies to many professionally valuable scenarios. Although deep learning has been widely applied in insect identification, fine-grained image classification algorithms in this area remain under-researched.
Fine-grained image classification is a computer vision task that is more difficult than traditional image classification and has more application value in professional scenarios, and insect image classification based on deep learning has great significance for pest control in agriculture and forestry. Applying fine-grained image classification technology to insect image recognition makes it possible to distinguish insect species that are difficult to separate, further improving the accuracy of insect classification and thus the practicality and reliability of insect image recognition in actual production. However, current research on fine-grained image classification algorithms in the field of insect image recognition is still insufficient.
Fine-grained image classification algorithms mainly comprise fine-tuning methods based on general image classification networks, joint localization-and-recognition methods based on strongly or weakly supervised attention mechanisms, bilinear pooling methods based on high-order feature fusion, metric learning methods, and transformer-based methods.
Methods such as Part-RCNN, based on strongly supervised object detection and localization, and RA-CNN, based on weakly supervised attention localization, mainly aim to solve the problems of small inter-class difference and large intra-class difference by locating discriminative regions. Even metric learning methods, which appear independent of discriminative-region localization, essentially suppress irrelevant features and find key discriminative features by narrowing intra-class distances and expanding inter-class distances.
In biological taxonomy, species are organized in a hierarchical system with a fixed number of levels: kingdom, phylum, class, order, family, genus and species. Biological images become more and more visually similar as the classification level descends, and the classification difficulty gradually reaches that of so-called fine-grained classification in computer vision. Insects in the common sense generally belong to the class Insecta. Insects with large trait differences are generally divided by "order", and different insect categories become more and more similar as the classification level descends. When labeling the collected insect images, entomological experts can identify the specific labels of some insect samples at the "genus" or "species" level from rich experience and comparison with insect atlases. However, because insects are the most diverse animals encountered in daily life, with numerous varieties and extremely varied shapes, even experienced entomologists cannot always identify a specific sample down to "genus" and "species", and can only identify its higher-level category, namely "order", "suborder" or "family".
Secondly, insect classification algorithms are mainly used in practice for identifying agricultural pests. In some cases, the category information required to determine whether an insect is a pest, and the corresponding control strategy, need not be refined to the "species" level, since insects in the same "family" or "genus" often have similar morphology and traits. In this type of application scenario, a good algorithm should meet the following requirement: even without the correct fine-grained classification result, the algorithm should limit erroneous results to the correct upper-level class. Finally, the supervision information used by existing fine-grained image classification methods is the category label of the finest granularity. In the field of fine-grained insect classification, however, labeling higher-level tags is not very costly, and obtaining them is far easier than obtaining the bounding-box information required by strongly supervised methods. Therefore, adding high-level category information to supervised learning should effectively improve the performance of the fine-grained classification task.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a fine-grained insect image classification method, which optimizes a fine-grained classification framework based on an improved weak supervision fine-grained classification model and a level multi-label constraint of metric learning and realizes accurate classification of insect images.
In order to achieve the above object, the present invention provides a method for classifying fine-grained insect images, comprising the steps of:
(1) acquiring and preprocessing an image;
fine-grained pictures of different forms of different types of insects are collected, and repeated, blurred and overexposed low-quality pictures are deleted by manual screening, so that the remaining pictures exhibit the characteristics of large intra-class difference and small inter-class difference;
(2) establishing a picture label;
each picture is labeled with three layers of category labels, denoted (y1, y2, y3), between which a tree-shaped mapping relationship exists; the first-layer category label y1 indicates the "order" of the insect, the second-layer category label y2 indicates an insect category under the same "order" with similar visual characteristics, and the third-layer category label y3 represents the fine-grained category of the insect;
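The tree-shaped mapping between the three label layers can be sketched as a pair of parent lookups. The species and group names below are hypothetical examples for illustration only; they are not taken from the patent's data set.

```python
# Illustrative sketch of the three-level label tree described above.
# Each fine-grained (third-level) label maps to exactly one second-level label,
# and each second-level label maps to exactly one first-level "order" label.
PARENT_OF_Y3 = {  # y3 (genus/species) -> y2 (visually similar group), hypothetical
    "Papilio machaon": "Papilionidae-like",
    "Pieris rapae": "Pieridae-like",
    "Apis mellifera": "Apidae-like",
}
PARENT_OF_Y2 = {  # y2 -> y1 ("order"), hypothetical
    "Papilionidae-like": "Lepidoptera",
    "Pieridae-like": "Lepidoptera",
    "Apidae-like": "Hymenoptera",
}

def label_triple(y3):
    """Resolve the full (y1, y2, y3) triple from a fine-grained label."""
    y2 = PARENT_OF_Y3[y3]
    y1 = PARENT_OF_Y2[y2]
    return (y1, y2, y3)

print(label_triple("Papilio machaon"))
```

Because the mapping is a tree, a predicted fine-grained label always determines its upper-level labels uniquely, which is what later steps rely on when rolling a third-level prediction up to the second and first levels.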
(3) image enhancement processing;
(3.1) picture zooming: sampling the length and width of each picture to a fixed pixel size by a bilinear interpolation method;
(3.2) random picture rotation: a rotation factor is set; a value is randomly sampled in [-factor, factor] as the rotation angle of the picture and the picture is rotated accordingly; when the rotation angle is not 90° or 180°, the pixel-free area left by the rotation operation is filled with black;
(3.3) randomly turning pictures horizontally or vertically: setting a probability factor p; randomly generating a probability value in [0, 1], if the probability value is less than p, carrying out random horizontal or vertical turning operation on the picture, otherwise, not turning the picture;
(3.4) picture random cropping: firstly, amplifying the picture to be 1.25 times of the original picture, and then randomly cutting an area with the size of the network input size in the amplified picture;
(3.5) carrying out color dithering processing on the picture;
(3.5.1) setting a jitter Factor;
(3.5.2) enhancing the brightness of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the brightness scaling factor, and the brightness value bright of the original picture is multiplied by s to obtain the enhanced brightness value bright' = bright × s;
(3.5.3) enhancing the contrast of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the contrast scaling factor, and the contrast value contrast of the original picture is multiplied by s to obtain the enhanced contrast value contrast' = contrast × s;
(3.5.4) enhancing the saturation of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the saturation scaling factor, and the saturation value of the original picture is multiplied by s to obtain the enhanced saturation value;
(3.5.5) controlling hue enhancement through labels: a category set S is initialized, and a category is added to the set when it contains pictures collected by photographing specimens. Before hue enhancement, it is judged whether the picture belongs to a category in S; if so, hue enhancement is performed, otherwise the picture is not enhanced. The hue enhancement operation is as follows: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the hue shift factor, and the hue value hue of the original picture is shifted along the hue ring according to s to obtain the enhanced hue value;
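The factor sampling in steps (3.5.1)-(3.5.5) can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the function names are invented, and the wrap-around formula in `shift_hue` is an assumption, since the text only says the hue is shifted along the hue ring.

```python
import random
import numpy as np

def sample_scale(factor):
    """Draw a scaling factor s uniformly from [max(0, 1 - factor), 1 + factor]."""
    return random.uniform(max(0.0, 1.0 - factor), 1.0 + factor)

def jitter_brightness(img, factor):
    """Multiply pixel intensities by s and clip back into the valid [0, 255] range."""
    s = sample_scale(factor)
    return np.clip(img.astype(np.float64) * s, 0, 255).astype(np.uint8)

def shift_hue(hue, factor):
    """Shift hue values (in [0, 1)) around the hue ring by a random offset.
    The modular wrap-around is an assumed formalization of the 'hue ring' shift."""
    s = sample_scale(factor) - 1.0   # offset centered on 0, in [-factor, +factor]
    return (hue + s) % 1.0

img = np.full((4, 4, 3), 128, dtype=np.uint8)
bright = jitter_brightness(img, 0.3)
```

Contrast and saturation jitter follow the same multiply-by-s pattern as `jitter_brightness`, applied to the respective channel statistics instead of raw intensity.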
(4) Setting a training data set, and setting the enhanced picture and the corresponding label as the training data set;
(5) building a neural network model for insect classification;
based on the existing ResNet50 network, the whole feature extraction part of ResNet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively; the working principle of the channel attention mechanism module is as follows: the output feature I of each stage is a three-dimensional matrix of size c × w × h, and each element of the matrix is denoted I_{i,j,k}, where i ∈ [1, c], j ∈ [1, w], k ∈ [1, h]; c is the number of channels, and w and h are the width and height;
the channel attention mechanism module applies an attention weight vector W of length c to the output feature I to obtain the channel attention output feature Î, whose elements are Î_{i,j,k} = w_i × I_{i,j,k}, where w_i is the weight coefficient of the i-th channel;
after stage5 and its channel attention mechanism module, an improved bilinear pooling layer is added, whose working principle is as follows: the output feature Î is reshaped into a two-dimensional matrix X of size c × (w × h); bilinear fusion Y = X·Xᵀ is then carried out to obtain the final output feature Y of size c × c; finally, Y is unrolled into a one-dimensional vector F of length c × c;
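The channel weighting and bilinear pooling above can be sketched in NumPy. How the attention weights themselves are computed (e.g., by a squeeze-and-excitation block) is not detailed in this passage, so they are taken as given here; the function names are illustrative.

```python
# NumPy sketch of channel attention weighting followed by bilinear pooling:
# weight each of the c channels, flatten to (c, w*h), fuse bilinearly into a
# c x c matrix, and unroll it into a vector of length c*c.
import numpy as np

def channel_attention(I, w):
    """Scale each channel I[i] of a (c, w, h) feature map by its weight w[i]."""
    return I * w[:, None, None]

def bilinear_pool(feat):
    """Reshape (c, w, h) -> (c, w*h), compute X @ X.T, flatten to length c*c."""
    c = feat.shape[0]
    X = feat.reshape(c, -1)
    return (X @ X.T).reshape(-1)

c, w, h = 8, 5, 5
I = np.random.rand(c, w, h)
weights = np.random.rand(c)            # assumed given; learned in the real model
F = bilinear_pool(channel_attention(I, weights))
```

Note that the bilinear matrix X·Xᵀ is symmetric by construction, so roughly half of the c × c entries are redundant; the dimension-reduction step discussed later in the description is one way to keep the subsequent fully connected layer tractable.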
(6) Training a neural network model;
the enhanced pictures and their corresponding labels are input into the neural network model, and the neural network model predicts the category of the insect in each input picture;
(6.1) in the training data set, randomly selecting b pictures and corresponding labels as input of the training of the current round, and inputting the b pictures and the corresponding labels into a training neural network model;
(6.2) the output features F_i of the b pictures are extracted through the neural network model, i = 1, 2, …, b, where the output feature of each picture is a one-dimensional vector of length c × c;
(6.3) the output feature F_i of each picture undergoes dimension reduction through a fully connected layer to obtain the classification result p_i of each picture, where p_i is a one-dimensional vector whose length is the number M of insect categories in the training data set; each entry of p_i is the probability with which the network judges the picture to be the corresponding insect category; finally, the category with the maximum probability is taken as the predicted third-level label, and the corresponding second- and first-level labels are obtained from it through the tree mapping;
(6.4) after the current round of training, the loss function value L = L_CE + β1 × (L_d3 + L_d2) is calculated;
where β1 is a hyper-parameter whose value is set to 0.5 through several groups of comparative experiments; L_d3 is the metric loss corresponding to the third-level label in the output features, L_d2 is the metric loss corresponding to the second-level label in the output features, and L_CE is the loss corresponding to the classification results;
where m_{ij} indicates whether, among the b input pictures of the current round, the third-level labels of any two pictures i and j are the same: m_{ij} is 1 when they are the same and 0 otherwise; δ is a preset threshold; and p indexes the components of the output feature, p ∈ [1, c × c];
where λ_i is the ratio of the number of pictures of the category to which the i-th input picture belongs to the b input pictures of the current round; y_{iτ} is the discrimination coefficient: when the true label value of the i-th input picture equals τ, y_{iτ} is 1, otherwise 0; and p_{iτ} is the value of the τ-th element of the classification result p_i;
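A hedged sketch of the loss computation in step (6.4). The exact formulas appear only as figures in the original, so two assumptions are made here: the combined form L = L_CE + β1·(L_d3 + L_d2), and a contrastive pairwise form with threshold δ for the metric terms (same-label pairs pulled together, different-label pairs pushed beyond δ). All function names are illustrative.

```python
import numpy as np

def weighted_cross_entropy(p, y, lam):
    """Cross-entropy with online weights: p is (b, M) probabilities,
    y is (b,) true label indices, lam is (b,) in-batch class ratios."""
    b = p.shape[0]
    return -sum(lam[i] * np.log(p[i, y[i]]) for i in range(b)) / b

def contrastive_metric_loss(F, labels, delta):
    """Assumed pairwise metric loss over output features F of shape (b, c*c)."""
    b, total = F.shape[0], 0.0
    for i in range(b):
        for j in range(i + 1, b):
            d = np.linalg.norm(F[i] - F[j])
            if labels[i] == labels[j]:
                total += d                    # pull same-class pairs together
            else:
                total += max(0.0, delta - d)  # push different-class pairs apart
    return total / (b * (b - 1) / 2)

def total_loss(p, F, y3, y2, lam, beta1=0.5, delta=1.0):
    """Assumed combined form: classification loss plus weighted metric terms
    for the third- and second-level labels."""
    return (weighted_cross_entropy(p, y3, lam)
            + beta1 * (contrastive_metric_loss(F, y3, delta)
                       + contrastive_metric_loss(F, y2, delta)))

# Tiny demo batch: two pictures, three fine-grained classes.
p = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
feats = np.array([[0.0, 0.0], [1.0, 1.0]])
L = total_loss(p, feats, y3=np.array([0, 1]), y2=np.array([0, 0]),
               lam=np.array([0.5, 0.5]))
```

The second-level metric term is what encourages different fine-grained classes under the same upper-level class to stay closer than classes from unrelated upper-level classes, matching the stated goal of confining errors to the correct upper-level class.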
(6.5) after the training of the current round is finished, performing back propagation on the loss function value of the current round by a gradient descent method so as to update network parameters, and then returning to the step (6.1) to perform the next round of training until the network converges, thereby obtaining a trained neural network model;
(7) real-time classification of insect pictures;
the fine-grained insect picture to be detected is scaled to the same size as the training data through the picture scaling operation and input into the trained neural network, which directly outputs the category to which the insect in the picture belongs.
The invention aims to realize the following steps:
the invention relates to a method for classifying fine-grained insect images, which comprises the steps of collecting fine-grained pictures of different forms of different types of insects, deleting repeated, fuzzy and overexposed low-quality pictures in a manual screening mode, enabling the remaining pictures to meet the characteristics of large intra-class difference and small inter-class difference, and then establishing picture labels and performing enhancement processing; and establishing a neural network model for insect classification and training, and finally, classifying and detecting the fine-grained insect picture to be detected through the trained neural network model, so that the insects in the picture are directly output to belong to the category.
Meanwhile, the fine-grained insect image classification method provided by the invention also has the following beneficial effects:
(1) through the label-controlled hue enhancement method, the network can learn color features, while some categories can still be distinguished without using color features;
(2) by embedding channel attention modules at different stages of the network, the network can extract visual features of different levels; meanwhile, high-order fusion between different features is carried out by the bilinear pooling technique, generating a large number of fine-grained features that are more effective for classifying fine-grained images;
(3) the 3-level labels, combined with a distance-metric loss function, constrain the classification training process, so that the network better distinguishes differences between fine-grained classes, expands the distance between dissimilar large classes, and limits the classification errors of a sample to the correct upper-level class.
Drawings
FIG. 1 is a flow chart of a fine-grained insect image classification method according to the present invention;
FIG. 2 is a schematic representation of different forms of different species of insects;
FIG. 3 is a map of the mapping between three layers of tags of an insect;
FIG. 4 is a schematic diagram of random rotation of a picture;
FIG. 5 is a schematic diagram of a random flip of a picture;
FIG. 6 is a basic architecture diagram of a neural network model for insect classification;
FIG. 7 is a diagram of the neural network model architecture after improvement of the present invention;
FIG. 8 is a graph of the evaluation of the results of three network classifications;
FIG. 9 is a graph of the results of an effectiveness versus location selection experiment for the SE module;
FIG. 10 is a graph of the results of a validation experiment for bilinear pooling;
FIG. 11 is a graph comparing performance of different network architectures;
FIG. 12 is a comparison graph of classification performance incorporating multi-label supervision constraints;
FIG. 13 is a graph of performance comparison with the introduction of metric loss;
FIG. 14 is a performance comparison graph of distance metric comparison;
FIG. 15 is a performance comparison graph of threshold selection comparisons;
FIG. 16 is a performance comparison graph of several other distance metric losses.
Detailed Description
The following description of the embodiments of the present invention is provided with reference to the accompanying drawings so that those skilled in the art can better understand the present invention. It is to be expressly noted that, in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the subject matter of the present invention.
Examples
FIG. 1 is a flowchart of a fine-grained insect image classification method according to the present invention.
In this embodiment, as shown in fig. 1, the present invention provides a method for classifying fine-grained insect images, including the following steps:
s1, image acquisition and preprocessing;
in the embodiment, fine-grained pictures of different forms of different types of insects are collected in a common internet search platform and a special insect database website by field outdoor scene shooting, specimen shooting and combining a crawler technology; wherein, the position and angle of the camera can be adjusted to obtain the pictures containing the visual contents of different parts of the insect, as shown in fig. 2. In addition, because the morphological structure of the insect can be changed, such as different curling forms of soft insect larvae and the open and closed forms of adult insect wings, in the field shooting process, pictures of different forms of the same type of insect are collected by prolonging the observation time of the insect.
Through manual screening, repeated pictures collected from different platforms, and pictures that differ in hash value but are extremely similar in content (produced by continuous shooting of the same insect in the field), were eliminated. In addition, low-quality pictures that were blurred, overexposed, etc. due to shooting problems were eliminated. The remaining 3,719 fine-grained insect images in 100 classes thus exhibit the characteristics of large intra-class difference and small inter-class difference.
S2, establishing a picture label;
Each picture is labeled with three layers of category labels, denoted (y1, y2, y3). The first-layer category label y1 denotes the "order" of the insect; the second-layer category label y2 represents an insect category under the same "order" with similar visual characteristics, its scope corresponding to a certain suborder, family or subfamily under the "order" label; and the third-layer category label y3 is the fine-grained category of "genus" or "species". There are 15 first-layer labels, 34 second-layer labels and 100 third-layer labels. After the three-layer category labels are established, each picture sample corresponds to a triple label (y1, y2, y3); as shown in FIG. 3, a tree-shaped mapping relationship exists between the three layers of labels.
S3, picture enhancement processing;
s3.1, zooming the picture: sampling the length and width of each picture to a fixed pixel size by a bilinear interpolation method;
S3.2, random picture rotation: a rotation factor is set; a value is randomly sampled in [-factor, factor] as the rotation angle of the picture and the picture is rotated accordingly, as shown in FIG. 4; when the rotation angle is not 90° or 180°, the pixel-free area left by the rotation operation is filled with black;
S3.3, randomly flipping the picture horizontally or vertically: a probability factor p is set; a probability value is randomly generated in [0, 1]; if the probability value is less than p, a random horizontal or vertical flip is applied to the picture, otherwise the picture is not flipped; a random flipping example is shown in FIG. 5;
s3.4, randomly cutting pictures: firstly, amplifying the picture to be 1.25 times of the original picture, and then randomly cutting an area with the size of the network input size in the amplified picture;
s3.5, carrying out color dithering processing on the picture;
the color dithering method is to change the basic attributes of the original image, namely four attributes of brightness, contrast, saturation and hue; we describe the specific process of color dithering process as follows:
s3.5.1, setting a jitter Factor;
S3.5.2, enhancing the brightness of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the brightness scaling factor, and the brightness value bright of the original picture is multiplied by s to obtain the enhanced brightness value bright' = bright × s;
S3.5.3, enhancing the contrast of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the contrast scaling factor, and the contrast value contrast of the original picture is multiplied by s to obtain the enhanced contrast value contrast' = contrast × s;
S3.5.4, enhancing the saturation of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the saturation scaling factor, and the saturation value of the original picture is multiplied by s to obtain the enhanced saturation value;
S3.5.5, controlling hue enhancement through labels: a category set S is initialized, and a category is added to the set when it contains pictures collected by photographing specimens. Before hue enhancement, it is judged whether the picture belongs to a category in S; if so, hue enhancement is performed, otherwise the picture is not enhanced. The hue enhancement operation is as follows: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the hue shift factor, and the hue value hue of the original picture is shifted along the hue ring according to s to obtain the enhanced hue value;
S4, setting a training data set, and setting the enhanced picture and the corresponding label as the training data set;
s5, building a neural network model for insect classification;
in this embodiment, the specific architecture of the neural network model for insect classification is shown in fig. 6, where f (-) is the neural network classification model, we have removed the last full connectivity layer and loss function in fig. 6 where the L1 loss function is the classification function, i.e., the cross entropy loss function, modified herein as the cross entropy loss function of online weight statistics. After the class probability distribution finally output by the network, the class distribution characteristic dimension is consistent with the class number. And the L2 loss function is a distance measurement loss function which is connected with the characteristics output by the network characteristic extraction part, and the dimension of the loss function is consistent with the network output characteristics. The distance measurement loss function can calculate the distance between two features, and the parameters extracted from the network features are optimized through the loss function, so as to obtain the network structure shown in fig. 7, specifically: based on the existing Resnet50 network, the whole feature extraction part of Resnet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively, wherein the working principle of the channel attention mechanism module is as follows: the output characteristic I of each stage is a three-dimensional matrix of c x w x h, and each element in the matrix is marked as Ii,j,kWherein i ∈ [1, c ]],j∈[1,w],k∈[1,h]C is the number of channels, w and h are width and height;
the channel attention mechanism module applies an attention weight vector W of length c to the output feature I to obtain the channel-attention output feature Ĩ, whose elements are Ĩ_{i,j,k} = w_i · I_{i,j,k}, where w_i is a weight coefficient;
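The per-channel reweighting just described can be sketched in NumPy as follows (a stand-in for the in-network SE-style module; how the weight vector W is computed from I is not spelled out here):

```python
import numpy as np

def channel_attention(I, W):
    """Scale each channel i of a c x w x h feature I by its attention
    weight w_i, giving I~_{i,j,k} = w_i * I_{i,j,k} (sketch)."""
    c, w, h = I.shape
    assert W.shape == (c,), "one weight per channel"
    return I * W[:, None, None]  # broadcast w_i over the spatial dims
```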
after stage5 and its channel attention mechanism module, an improved bilinear pooling layer is added, whose working principle is as follows: the output feature Ĩ is flattened into a two-dimensional matrix X of size c × (w × h); bilinear fusion is then carried out as B = X·Xᵀ, giving the final output feature B of size c × c; finally, B is unrolled into a one-dimensional vector of length c × c;
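The flatten / outer-product / unroll pipeline above can be sketched as (NumPy stand-in; the "improved" normalization details of the patent's layer are not reproduced):

```python
import numpy as np

def bilinear_pool(feat):
    """Bilinear pooling sketch: flatten a c x w x h feature to
    X of size c x (w*h), fuse via the outer product X @ X.T, and
    unroll the resulting c x c matrix to a length-c*c vector."""
    c = feat.shape[0]
    X = feat.reshape(c, -1)  # c x (w*h)
    B = X @ X.T              # c x c bilinear fusion
    return B.reshape(-1)     # one-dimensional, length c*c
```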
In this embodiment, since the final feature of ResNet50 is 2048-dimensional, bilinear pooling produces a 4194304-dimensional feature; a fully connected layer outputting 100 classes on top of it would contain roughly 400M parameters. The channel dimension of the final feature is therefore reduced from 2048 to 512 with a 1 × 1 convolutional layer (2048 input channels, 512 output channels), which reduces the parameter count of the final fully connected layer by a factor of 16.
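The parameter arithmetic behind the 16× claim can be checked directly (pure arithmetic, using only the figures stated above):

```python
# Fully connected layer sizes before/after the 1x1 channel reduction.
c_full, c_red, n_classes = 2048, 512, 100

bilinear_dim = c_full * c_full          # 4194304-dim bilinear feature
fc_without = bilinear_dim * n_classes   # ~419M parameters ("400M" above)
fc_with = (c_red * c_red) * n_classes   # after reducing 2048 -> 512

assert bilinear_dim == 4_194_304
assert fc_without // fc_with == 16      # 16x fewer final-layer parameters
```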
S6, training a neural network model;
inputting the enhanced pictures and their corresponding labels into the neural network model, which predicts the category of the insect in each input picture;
S6.1, randomly selecting b pictures (b = 20 in this embodiment) and their corresponding labels from the training data set as the input of the current training round, and feeding them into the neural network model;
s6.2, extracting the output features f_i of the b pictures through the neural network model, i = 1, 2, …, b, where the output feature of each picture is a one-dimensional vector of length c × c;
s6.3, reducing the dimension of each picture's output feature f_i through a fully connected layer to obtain the classification judgment result p_i of each picture, where p_i is a one-dimensional vector whose length is the number M of insect species in the training data set, and each entry of p_i is the probability, assigned by the network, that the picture shows the corresponding insect species; finally, the species with the maximum probability is taken as the predicted third-level label, and the corresponding second-level label is obtained from it through the tree mapping between label layers;
S6.4, calculating a loss function value L after the training of the current round is finished;
where β1 is a hyper-parameter whose value was set to 0.5 through several groups of comparative experiments, L+hinge3 is the loss-function value corresponding to the third-level label in the output features, L-hinge2 is the loss-function value corresponding to the second-level label in the output features, and L_CE is the loss-function value corresponding to the classification judgment result;
where the indicator term denotes, among the b input pictures of the current round, whether the three-level labels of any two pictures i and j are the same: it takes the value 1 when the labels are the same and 0 otherwise; Δ is a preset threshold; and p indexes the elements of the output feature, p ∈ [1, c × c];
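The exact formula is given in the patent's figures and is not reproduced here; a hedged reconstruction of a hinge-style metric loss consistent with the description (same-label indicator, threshold Δ, L1 distance) might look like:

```python
import numpy as np

def hinge_metric_loss(feats, labels, delta, pull_together=True):
    """Hedged sketch of the distance-metric loss described above.

    For pairs with the same third-level label (pull_together=True) the
    mean L1 distance is hinged against the threshold delta; the mirror
    setting pushes different-label pairs apart instead.
    """
    total, pairs = 0.0, 0
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            same = labels[i] == labels[j]
            d = np.abs(feats[i] - feats[j]).mean()  # L1 distance
            if pull_together and same:
                total += max(0.0, d - delta)  # same label: pull within delta
                pairs += 1
            elif not pull_together and not same:
                total += max(0.0, delta - d)  # different label: push apart
                pairs += 1
    return total / max(pairs, 1)
```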
where λ_i is the ratio of the number of pictures belonging to the same category as the i-th input picture to the b input pictures of the current round; y_{iτ} is a discrimination coefficient: y_{iτ} is 1 when the real label value of the i-th input picture equals τ, and 0 otherwise; and p_{iτ} is the value of the τ-th element of the classification judgment result p_i;
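A hedged sketch of a cross entropy with online (in-batch) class-frequency weights λ_i as described; exactly how λ_i enters the loss is an assumption here (dividing by it, to down-weight over-represented classes):

```python
import numpy as np

def online_weighted_ce(probs, labels):
    """Cross entropy where each sample is reweighted by lambda_i, the
    in-batch frequency of its class (sketch; the exact weighting form
    in the patent's formula is an assumption)."""
    labels = np.asarray(labels)
    b = len(labels)
    loss = 0.0
    for i in range(b):
        lam = np.mean(labels == labels[i])  # lambda_i: in-batch class ratio
        loss += -np.log(probs[i, labels[i]]) / (lam * b)
    return loss / b
```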
s6.5, after the training of the current round is finished, performing back propagation on the loss function value of the current round by a gradient descent method so as to update network parameters, and then returning to the step S6.1 to perform the next round of training until the network converges so as to obtain a trained neural network model;
s7, real-time classification of insect pictures;
the fine-grained insect picture to be classified is scaled, through the picture scaling operation, to the same size as the training data and input into the trained neural network, which directly outputs the category to which the insect in the picture belongs.
Experiment of
Numerous comparison and ablation experiments were performed to verify the effectiveness of each part of the network.
Network selection: as can be seen from fig. 8, ResNet50 and Inception v3 achieve good performance on the task at hand. Using acc and F1 as evaluation indexes, the result of ResNet50 is slightly worse than that of Inception v3, but the difference is small: a gap of a few thousandths is easily smoothed out by the randomness of the deep-learning training process. In addition, since the infrastructure of the ResNet50 network is simpler, the configuration logic of the whole network is simpler and changes to the infrastructure are easier to make.
Effectiveness and location selection experiments for SE module: as can be seen from fig. 9, the effect of adding the attention mechanism to stage3, stage4, and stage5 is the best in the experimental conditions of this embodiment, and the experimental effect shows that the channel attention mechanism of the middle-high level features can actually affect the learning process of the features more deeply under the condition of less data volume, and achieve better classification results.
Effectiveness test of bilinear pooling: as shown in fig. 10, introducing the bilinear pooling method improves the results significantly: accuracy improves by 1.6 percentage points and the F1 score by 1 percentage point. The original paper explains that the method combines position features with content features, which helps fine-grained classification. As understood and analyzed herein, the benefit of the bilinear pooling operation only manifests when there are many classes to classify: with few classes, fewer effective distinguishing features are needed between classes, while as the number of classes grows, the number of effective features needed to distinguish all classes completely grows at a geometric rate. Because only 2048-dimensional features participate in classification in the original ResNet50 network, the outer-product operation fuses the features to second order and enlarges the feature space, thereby better satisfying the insect-classification task, which needs a large number of features.
A comparison structure of performances of different network structures is shown in fig. 11, and the evaluation indexes of the present invention shown in the last column of the figure are obviously superior to the performances of the other network structures.
Performance evaluation of the base model using acc2-3: as shown in fig. 12, the model clearly has poor accuracy on the upper-level classification of insects when the multi-label supervised constraint framework is not introduced.
Experiments introducing the metric losses. As shown in fig. 13, introducing the L+hinge3 loss improves the 3-level classification accuracy by 6 and 9 thousandths in the two structures respectively, but slightly decreases the 2-level classification accuracy on the wrongly classified sample set. Introducing the L-hinge2 loss yields a slight 3-level accuracy improvement in both structures, but significantly improves the 2-level classification accuracy on the wrongly classified sample set: by about 30 percentage points for ResNet50 and about 20 percentage points for the ResNet50+SE structure. When the two loss functions are combined, the performance of the model improves markedly on both evaluation indexes.
Distance-metric comparison experiment: as shown in fig. 14, for the L-hinge2 loss function the L1 distance better improves acc2-3, and for the L+hinge3 loss function the L1 distance better improves acc3. In addition, because the two loss functions optimize features at the same position in the network, distance-metric functions of the same form constrain the feature expression within the same feature space, which benefits the training process.
Threshold selection experiment: as shown in fig. 15, the best effect is achieved when the thresholds Δ of the L1 distance for L-hinge2 and L+hinge3 are 0.7 and 0.2, respectively.
In addition, the invention also verifies the ineffectiveness or suppressive effect of other distance-metric losses. As shown in fig. 16, the four losses suppress the fine-grained classification results on the 3-level labels to different degrees. The L-hinge3 loss has no obvious influence on the result, L+hinge2 suppresses the 3-level fine-grained classification by 2 percentage points, and L+hinge1 and L-hinge1 have a larger suppressive influence on the experimental results. The analysis is that L+hinge2 requires feature clustering within the 2-level classes, and 2-level classification is similar to conventional image classification, which leads the network to learn more conventional discriminative features, whereas fine-grained classification on 3-level labels requires the network to learn features as fine-grained as possible, so fine-grained performance is suppressed. In addition, the heavy suppression of network performance by the 1-level label constraint mainly reflects that the feature space required by 1-level classification is inconsistent with that of 3-level classification; combining them is akin to jointly learning two mutually exclusive tasks, and in order to learn more 1-level classification features the network sacrifices the effect of 3-level classification.
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of those embodiments. To those skilled in the art, various changes are apparent so long as they remain within the spirit and scope of the invention as defined by the appended claims, and all matter utilizing the inventive concepts is protected.
Claims (1)
1. A fine-grained insect image classification method is characterized by comprising the following steps:
(1) acquiring and preprocessing an image;
fine-grained pictures of different types of insects in different forms are collected, and repeated, blurred and overexposed low-quality pictures are deleted by manual screening, so that the remaining pictures exhibit the fine-grained characteristics of large intra-class difference and small inter-class difference;
(2) establishing a picture label;
each picture is labeled with three layers of category labels, denoted (y_1, y_2, y_3), with a tree mapping relation among the three layers; the first-layer category label y_1 indicates the "order" of the insect, the second-layer category label y_2 indicates an insect category of the same order with similar visual characteristics, and the third-layer category label y_3 indicates the fine-grained category of the insect;
(3) image enhancement processing;
(3.1) picture zooming: sampling the length and width of each picture to a fixed pixel size by a bilinear interpolation method;
(3.2) random picture rotation: a rotation factor is set; a value is randomly sampled in [-factor, factor] as the rotation angle of the picture, and the picture is rotated accordingly; when the rotation angle is not 90° or 180°, the pixel-free area left by the rotation operation is filled with black;
(3.3) randomly turning pictures horizontally or vertically: setting a probability factor p; randomly generating a probability value in [0, 1], if the probability value is less than p, carrying out random horizontal or vertical turning operation on the picture, otherwise, not turning the picture;
(3.4) random picture cropping: the picture is first enlarged to 1.25 times its original size, and a region of the network input size is then randomly cropped from the enlarged picture;
(3.5) carrying out color dithering processing on the picture;
(3.5.1) setting a jitter Factor;
(3.5.2), enhancing the brightness of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the brightness scaling factor, and the brightness value of the original picture is multiplied by s to obtain the enhanced brightness value;
(3.5.3), enhancing the contrast of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the contrast scaling factor, and the contrast of the original picture is multiplied by s to obtain the enhanced contrast value;
(3.5.4), enhancing the saturation of the picture: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the saturation scaling factor, and the saturation of the original picture is multiplied by s to obtain the enhanced saturation value;
(3.5.5) controlling the hue of the enhanced picture through the label: a category set s is initialized, and a category is added to the set when it contains pictures collected by photographing specimens. Before hue enhancement, it is judged whether the picture belongs to a category in the set s; if so, hue enhancement is carried out on the picture, and if not, the picture is not enhanced. The hue enhancement operation is as follows: a value s is randomly generated between max(0, 1-Factor) and (1+Factor) as the hue shift factor, and the hue value of the original picture is shifted on the hue ring according to the shift factor s to obtain the enhanced hue value;
(4) Setting a training data set, and setting the enhanced picture and the corresponding label as the training data set;
(5) building a neural network model for insect classification;
based on the existing ResNet50 network, the whole feature-extraction part of ResNet50 is divided into 5 stages, and a channel attention mechanism module is added after stage3, stage4 and stage5 respectively; the working principle of the channel attention mechanism module is as follows: the output feature I of each stage is a three-dimensional matrix of size c × w × h, and each element of the matrix is denoted I_{i,j,k}, where i ∈ [1, c], j ∈ [1, w], k ∈ [1, h], c is the number of channels, and w and h are the width and height;
the channel attention mechanism module applies an attention weight vector W of length c to the output feature I to obtain the channel-attention output feature Ĩ, whose elements are Ĩ_{i,j,k} = w_i · I_{i,j,k}, where w_i is a weight coefficient;
after stage5 and its channel attention mechanism module, an improved bilinear pooling layer is added, whose working principle is as follows: the output feature Ĩ is flattened into a two-dimensional matrix X of size c × (w × h); bilinear fusion is then carried out as B = X·Xᵀ, giving the final output feature B of size c × c; finally, B is unrolled into a one-dimensional vector of length c × c;
(6) Training a neural network model;
inputting the enhanced pictures and their corresponding labels into the neural network model, which predicts the category of the insect in each input picture;
(6.1) randomly selecting b pictures and their corresponding labels from the training data set as the input of the current training round, and feeding them into the neural network model;
(6.2) extracting the output features f_i of the b pictures through the neural network model, i = 1, 2, …, b, where the output feature of each picture is a one-dimensional vector of length c × c;
(6.3) reducing the dimension of each picture's output feature f_i through a fully connected layer to obtain the classification judgment result p_i of each picture, where p_i is a one-dimensional vector whose length is the number M of insect species in the training data set, and each entry of p_i is the probability, assigned by the network, that the picture shows the corresponding insect species; finally, the species with the maximum probability is taken as the predicted third-level label, and the corresponding second-level label is obtained from it through the tree mapping between label layers;
(6.4) calculating a loss function value L after the training of the current round is finished;
where β1 is a hyper-parameter whose value was set to 0.5 through several groups of comparative experiments, L+hinge3 is the loss-function value corresponding to the third-level label in the output features, L-hinge2 is the loss-function value corresponding to the second-level label in the output features, and L_CE is the loss-function value corresponding to the classification judgment result;
where the indicator term denotes, among the b input pictures of the current round, whether the three-level labels of any two pictures i and j are the same: it takes the value 1 when the labels are the same and 0 otherwise; Δ is a preset threshold; and p indexes the elements of the output feature, p ∈ [1, c × c];
where λ_i is the ratio of the number of pictures belonging to the same category as the i-th input picture to the b input pictures of the current round; y_{iτ} is a discrimination coefficient: y_{iτ} is 1 when the real label value of the i-th input picture equals τ, and 0 otherwise; and p_{iτ} is the value of the τ-th element of the classification judgment result p_i;
(6.5) after the training of the current round is finished, performing back propagation on the loss function value of the current round by a gradient descent method so as to update network parameters, and then returning to the step (6.1) to perform the next round of training until the network converges, thereby obtaining a trained neural network model;
(7) real-time classification of insect pictures;
the fine-grained insect picture to be classified is scaled, through the picture scaling operation, to the same size as the training data and input into the trained neural network, which directly outputs the category to which the insect in the picture belongs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111395529.4A CN114187183A (en) | 2021-11-23 | 2021-11-23 | Fine-grained insect image classification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114187183A true CN114187183A (en) | 2022-03-15 |
Family
ID=80541280
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111395529.4A Pending CN114187183A (en) | 2021-11-23 | 2021-11-23 | Fine-grained insect image classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114187183A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116453032A (en) * | 2023-06-16 | 2023-07-18 | 福建农林大学 | Marine ecology detecting system |
CN116453032B (en) * | 2023-06-16 | 2023-08-25 | 福建农林大学 | Marine ecology detecting system |
CN117237814A (en) * | 2023-11-14 | 2023-12-15 | 四川农业大学 | Large-scale orchard insect condition monitoring method based on attention mechanism optimization |
CN117237814B (en) * | 2023-11-14 | 2024-02-20 | 四川农业大学 | Large-scale orchard insect condition monitoring method based on attention mechanism optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |