CN117036948A - Sensitized plant identification method based on attention mechanism - Google Patents

Sensitized plant identification method based on attention mechanism

Info

Publication number
CN117036948A
CN117036948A (application CN202311009797.7A)
Authority
CN
China
Prior art keywords
sensitized
plant
input
output
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311009797.7A
Other languages
Chinese (zh)
Inventor
肖荣波
罗树华
王鹏
黄飞
肖美红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202311009797.7A priority Critical patent/CN117036948A/en
Publication of CN117036948A publication Critical patent/CN117036948A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/188 Vegetation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The invention relates to the technical field of intelligent plant identification and discloses a sensitized plant identification method based on an attention mechanism, comprising the following steps: acquiring sensitized plant images and screening them to construct a sensitized plant image dataset; performing data enhancement on the sensitized plant images in the dataset; labeling the sensitized plant images and dividing the labeled dataset; constructing a sensitized plant identification model based on an attention mechanism and training it with the partitioned sensitized plant image dataset; and storing the trained sensitized plant identification model for sensitized plant identification and detection on images to be detected. The method achieves a marked improvement in Precision, Recall, F1 score, and mAP, giving higher identification precision and accuracy; the invention has broad application prospects and provides a new solution for allergen screening and protection.

Description

Sensitized plant identification method based on attention mechanism
Technical Field
The invention relates to the technical field of plant identification, in particular to a sensitized plant identification method based on an attention mechanism.
Background
Plants are an important component of the ecosystem, providing a number of important ecosystem services to humans and other organisms. However, some plant species may trigger allergic reactions, causing health problems for allergic disease patients and allergen-sensitive populations. Current plant allergen identification methods generally rely on manual observation and chemical analysis, which require expertise and time-consuming experimental manipulations; furthermore, since the form and concentration of allergens in different plant species may vary, accurately identifying allergens in a plant remains challenging. Thus, there is a need for an efficient and accurate intelligent method for identifying allergens in plants. Such a method should be able to take into account the characteristics of the plant, the environmental conditions and the interactions with other organisms.
Disclosure of Invention
The invention aims to provide a sensitized plant identification method improved with an attention mechanism, used to identify allergens in plants efficiently and accurately.
To achieve this aim, the invention adopts the following technical scheme:
a method of identifying sensitized plants based on an attention mechanism comprising the steps of:
acquiring a sensitized plant image and screening to construct a sensitized plant image data set;
performing data enhancement processing on the sensitized plant image in the sensitized plant image data set;
labeling sensitized plant images in the sensitized plant image data set, and dividing the sensitized plant image data set after labeling;
constructing a sensitized plant identification model based on an attention mechanism, and training the model by using the partitioned sensitized plant image dataset; the sensitized plant identification model comprises:
taking YOLOv5 with a lightweight CSPDarknet53 Backbone network as the model framework, which comprises three parts: the feature extraction network Backbone, the feature pyramid module Neck, and the Prediction head;
the Backbone is responsible for extracting features from the input image; an attention mechanism is added after the Bottleneck CSP of the Backbone part, its input being a feature map and its output the feature map after adaptive attention-weight adjustment; an SPPF module replaces the SPP module of YOLOv5 to form the SPPFCSPC module; the SPPF module divides the feature map output by the upper layer into a plurality of grids and then performs a pooling operation in each grid to obtain fixed-size feature vectors; the feature vectors of different scales are fused by weighted summation to obtain the fusion feature Fusion_1, which is output through the SPPFCSPC module; the Bottleneck CSP of the Backbone adopts the CSPC network design strategy: its output features are split into two branches, one passed directly to the Bottleneck CSP module in the Neck, the other subjected to a series of convolution operations to obtain new features; the two are fused by weighted summation to obtain the fusion feature Fusion_2, which is passed to the Neck for further processing;
the Neck further performs multi-scale fusion and context sensing on the features on the basis of the Backbone; the input to the Neck is the feature maps output by the Bottleneck CSP and the SPPFCSPC of the Backbone; the Concat module after the up-sampling module Upsample in the Neck is replaced with a DwConv2 deconvolution layer module, which realizes the up-sampling operation through transposed convolution and up-samples the low-resolution feature map to the same size as the high-resolution feature map;
the Prediction head is the last part of the YOLOv5 model and performs target detection prediction on the feature maps; its input is the output of the Neck and its output is the predicted target boxes and class probabilities; each detection head of the Prediction outputs the bounding box coordinates of the target boxes and the confidence probability of each category, and these outputs are used for subsequent target box screening and non-maximum suppression to obtain the final detection result;
and storing the trained sensitized plant identification model for sensitized plant identification detection of the image to be detected.
Further, the data enhancement processing is performed on the sensitized plant image in the sensitized plant image data set, including:
and (3) brightness adjustment, overturning and noise addition are carried out on the sensitized plant images, so that the number of the sensitized plant images is amplified to 5 times of the original number.
Further, the sensitized plants in the images are labeled manually using the LabelImg labeling tool, generating corresponding xml files that contain the image path and size as well as the category, position, and bounding box information of the labeled objects;
each image is then integrated with its corresponding xml file by a Python program to ensure one-to-one correspondence; through this process a dataset with complete annotation information is obtained, from which the txt annotation data files required by the model are finally generated.
Further, the calculation process of the attention mechanism is as follows:
F ∈ R^(C×H×W) (1)
where F represents the feature map of the input image, R the real number domain, C the number of channels, H the height, and W the width; for each channel, global average pooling is performed on the input feature map F in the spatial dimension to obtain the channel mean f_avg, which can be expressed as:
f_avg(c) = (1/(H×W)) × Σ_{i=1}^{H} Σ_{j=1}^{W} F(c, i, j) (2)
where c represents the channel index, i the height index, and j the width index;
subsequently, the channel mean f_avg is corrected to enhance the response value f_scale of the important channels, expressed as:
f_scale(c) = γ×δ(f_avg(c)) + β (3)
where γ and β represent learnable parameters in the attention mechanism and δ represents the ReLU activation function;
the attention mechanism performs local adaptive weighting by applying a 1D convolution over the channels, taking the inter-channel response values as the attention weights;
finally, the corrected response value f_scale is used to weight the input feature map F along the channel dimension, giving the attention-adjusted feature map F′, expressed as:
F′ = f_scale(c)×F(c, i, j) (4)
further, the input tensor of the DwConv2 deconvolution layer module is a four-dimensional tensor that converts each element in the input signature into a 4x4 small square, leaving some gaps between these small squares, thereby increasing the size of the output signature; the deconvolution layer has 512 output channels in this case, each generating a 4x4 output matrix; the convolution kernel size of this layer is 4x4, the stride is 2, and the padding size is 1.
Further, the computation formula of the output feature map of the DwConv2 deconvolution layer module is as follows:
output_height = (input_height - 1)*stride - 2*padding + k_size (8)
output_width = (input_width - 1)*stride - 2*padding + k_size (9)
where stride represents the stride, padding the padding, and k_size the size of the convolution kernel, a square or rectangular window that slides over the input feature map to perform the convolution; input_height and input_width are the height and width of the input tensor, and output_height and output_width are the height and width of the output tensor.
Further, in the Prediction section, the features are processed as follows:
firstly, a 1x1 convolution operation is performed on the output of the feature pyramid; this convolution is mainly used to reduce the number of channels, lowering the parameter count and computational complexity, and to adjust the feature map dimensions into a form suitable for subsequent prediction; next, the features are predicted through a plurality of detection heads, each responsible for predicting target boxes of a specific scale; each detection head consists of several 3x3 convolution layers and one 1x1 convolution layer, where the 3x3 convolutions further process the features and extract higher-level feature representations and the 1x1 convolution performs the final target prediction; to predict target boxes of different sizes, Anchor Boxes are adopted to define default boxes of different scales; each detection head uses predefined Anchor Boxes to predict the position and size of the target.
Further, in the Prediction, the SIoU loss function is used to improve the loss function in YOLOv5, and the calculation formula is as follows:
SIoU=IoU-p (10)
where IoU is the intersection ratio of the predicted and real frames and p is a smoothing factor for reducing the gradient discontinuity of the loss function; the smoothing factor p is calculated as follows:
where v is a constant for controlling the size of the smoothing factor.
Further, two screening rules, a confidence threshold of 0.5 and a Soft-NMS threshold of 0.45, are set to determine the position and category of the plant.
Compared with the prior art, the invention has the following technical characteristics:
First, by introducing the attention mechanism, SPPF multi-scale feature fusion, and the CSPC network design strategy, the model achieves better multi-scale feature fusion and extracts richer feature information, improving its ability to recognize sensitized plants; the model can thus identify and classify sensitized plants more accurately, providing important technical support for research and management in plant-related fields.
Second, by introducing the deconvolution layer, SIoU, and Soft-NMS and optimizing them, the model performs well in detail recovery and target box selection. The up-sampling operation of the deconvolution layer raises the resolution of the output feature map so that the model can better capture fine features, while the optimized SIoU and Soft-NMS improve the target-box selection and suppression strategy, enhancing the accuracy and stability of the model. These optimizations enable the model to locate and identify sensitized plants more accurately, reducing false positives and false negatives.
Finally, application of the scheme helps improve the monitoring and management of sensitized plants; accurate identification makes it possible to take timely measures, such as adjusting planting layouts or adopting protective steps, to reduce the occurrence and impact of plant allergic reactions.
Drawings
FIG. 1 is a flow chart of an identification method of one embodiment of the present invention;
FIG. 2 is a schematic diagram of a network architecture;
FIG. 3 is a schematic diagram of SPPFCSPC network architecture;
FIG. 4 is a block diagram of a sensitized plant identification model based on an attention mechanism;
FIG. 5 shows a comparison of the detection and recognition results of YOLOv5 (left) and the sensitized plant recognition model (right) on a set of images.
Detailed Description
The invention provides a sensitized plant identification method based on an attention mechanism, which uses computer vision and machine learning to accurately identify allergens in plant images through analysis and processing; by introducing an attention mechanism, the most relevant and important parts of a plant image can be focused on automatically, improving the accuracy and efficiency of identification. The method has broad application prospects, can play an important role in the diagnosis and management of allergic diseases, and provides a new solution for allergen screening and protection. It also offers important technical support for research and practice in botany, ecology, environmental protection, and related fields, and provides powerful tools and data support for in-depth study of the distribution and ecological characteristics of sensitized plants and their relation to human health.
Referring to fig. 1 to 5, a method for identifying sensitized plants based on an attention mechanism comprises the steps of:
s1, acquiring sensitized plant images, screening, and constructing sensitized plant image data sets
By searching and collecting on mainstream social media platforms such as Sina Weibo and Xiaohongshu, the scheme actively obtains rich picture resources for three sensitized plants: the kapok tree, the mango tree, and the albizia (silk tree). As places where users widely share content, these platforms provide a large number of photographs and information about plants. The scheme digs deeply and collects a large number of representative pictures to construct a comprehensive and diverse sensitized plant image dataset.
In the process of collection, the scheme carries out strict manual identification and screening on the pictures, and eliminates pictures with poor quality or irrelevant to scheme contents. This screening process ensures the reliability of the quality and correlation of the final obtained picture.
Finally, the scheme successfully obtains a large number of high-quality plant images, including 118 kapok tree pictures, 102 mango tree pictures, and 110 albizia pictures. The number and variety of these plant images provide a solid basis for the study of this scheme. Meanwhile, through careful selection, the scheme ensures the representativeness and validity of the pictures so as to improve the accuracy and reliability of the subsequent intelligent sensitized plant identification method.
S2, carrying out data enhancement processing on sensitized plant images in the sensitized plant image data set
By writing a Python program, a series of data enhancement operations, including brightness adjustment, flipping, and noise addition, is applied to the collected sensitized plant images of the kapok, mango, and albizia trees; these operations not only amplify the number of pictures but also increase the diversity and robustness of the dataset. The data enhancement amplifies the number of sensitized plant images to 5 times the original. The purpose is to improve the generalization ability of the model so that it can better cope with different illumination conditions, angles, noise, and other factors.
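As an illustration, these three operations can be sketched in a few lines of Python; the brightness factors, the noise amplitude, and the use of Pillow and NumPy are assumptions of this sketch rather than the patent's exact implementation:

import numpy as np
from PIL import Image, ImageEnhance

def augment(img: Image.Image) -> list:
    """Return 4 augmented variants; with the original this gives 5x the data."""
    out = []
    out.append(ImageEnhance.Brightness(img).enhance(1.3))   # brighten (assumed factor)
    out.append(ImageEnhance.Brightness(img).enhance(0.7))   # darken (assumed factor)
    out.append(img.transpose(Image.FLIP_LEFT_RIGHT))        # horizontal flip
    arr = np.asarray(img).astype(np.float32)
    noise = np.random.normal(0.0, 10.0, arr.shape)          # Gaussian noise (assumed sigma)
    out.append(Image.fromarray(np.clip(arr + noise, 0, 255).astype(np.uint8)))
    return out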
S3, labeling the sensitized plant images in the sensitized plant image data set, and dividing the sensitized plant image data set after labeling
According to the scheme, the LabelImg labeling tool is used to manually label the sensitized plants in the sensitized plant images, generating corresponding xml files that contain the image path and size as well as the category, position, and bounding box information of the labeled objects.
Next, each image is integrated with its corresponding xml file by a Python program to ensure one-to-one correspondence. Through this process a dataset with complete annotation information is obtained, from which the txt annotation data files required by the model are finally generated.
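A minimal sketch of this conversion step is shown below; the class names and the Pascal VOC xml layout written by LabelImg are assumptions of this illustration, and the output follows the common YOLO txt convention (class_id x_center y_center width height, normalized to [0, 1]):

import xml.etree.ElementTree as ET

CLASSES = ["kapok", "mango", "albizia"]   # hypothetical label names

def voc_xml_to_yolo_txt(xml_path: str, txt_path: str) -> None:
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        b = obj.find("bndbox")
        x1, y1 = float(b.find("xmin").text), float(b.find("ymin").text)
        x2, y2 = float(b.find("xmax").text), float(b.find("ymax").text)
        # convert corner coordinates to normalized center/size
        lines.append(f"{cls} {(x1 + x2) / 2 / w:.6f} {(y1 + y2) / 2 / h:.6f} "
                     f"{(x2 - x1) / w:.6f} {(y2 - y1) / h:.6f}")
    with open(txt_path, "w") as f:
        f.write("\n".join(lines))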
For model training and performance evaluation, the whole labeled sensitized plant image dataset is divided into a training set and a validation set at a ratio of 9:1; within the training set, a further 10% of the pictures are retained as part of the validation set in order to evaluate the performance of the model during training.
Through the picture labeling and dataset dividing steps, a labeled sensitized plant image dataset is obtained and divided into a training set and a validation set, providing labeled data for subsequent model training and performance evaluation. Such a dataset provides a reliable basis for implementing the invention and helps develop a more accurate and intelligent sensitized plant identification method while ensuring the integrity and practicality of the overall method.
S4, constructing a sensitized plant identification model based on an attention mechanism, and training the model
In this scheme, YOLOv5 with a lightweight CSPDarknet53 Backbone network is taken as the model framework, comprising three parts: the feature extraction network Backbone, the feature pyramid module Neck, and the Prediction head, wherein:
1.Backbone
The Backbone is the core part of the YOLOv5 model and is responsible for extracting features from the input image; it typically consists of a series of convolution layers that progressively reduce the resolution of the input image and extract its high-level features. An attention mechanism (Efficient Channel Attention) is introduced to enhance feature expression. An SPPF (Spatial Pyramid Pooling with Fusion) module replaces the SPP module of YOLOv5 to achieve better multi-scale feature fusion; by adopting the CSPC network design strategy, the feature maps can better exchange and fuse information between different stages, enhancing the feature expression capability and improving the performance and accuracy of the model.
To enhance the feature expression capability, the attention mechanism is added after the 3 Bottleneck CSPs of the Backbone part; its input is a feature map and its output is the feature map after adaptive attention-weight adjustment; as shown in FIG. 2 and FIG. 4, the calculation formula is as follows:
F∈R^(C×H×W) (1)
wherein F represents a feature map of an input image, R represents a real number domain, C represents the number of channels, H represents the height, and W represents the width. The input of the attention mechanism is a feature map of the input image, which is extracted by a convolution layer and contains abundant feature information such as shape, texture, color and the like.
For each channel, global average pooling is performed on the input feature map F in the spatial dimension to obtain the channel mean f_avg, which can be expressed as:
f_avg(c) = (1/(H×W)) × Σ_{i=1}^{H} Σ_{j=1}^{W} F(c, i, j) (2)
where c represents the channel index, i the height index, and j the width index.
Subsequently, the channel mean f_avg is corrected to enhance the response value f_scale of the important channels, expressed as:
f_scale(c) = γ×δ(f_avg(c)) + β (3)
where γ and β represent the learnable parameters in the attention mechanism and δ represents the ReLU activation function.
The attention mechanism performs local adaptive weighting by applying a 1D convolution operation on each channel, which 1D convolution operation takes the response value between channels as the attention weight.
Finally, the corrected response value f_scale is used to weight the input feature map F along the channel dimension, giving the attention-adjusted feature map F′, expressed as:
F′ = f_scale(c)×F(c, i, j) (4)
Through this attention weighting, the focus on important channels is strengthened and the influence of unimportant channels is weakened; the weighted feature map carries an enhanced feature representation, so key information in sensitized plant images is captured better and recognition performance improves.
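For illustration, Eqs. (1)-(4) can be sketched as a small PyTorch module; the 1D kernel size and the sigmoid that squashes the weights into [0, 1] before scaling are assumptions not fixed by the description above:

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, k_size: int = 3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)            # global average pooling, Eq. (2)
        self.gamma = nn.Parameter(torch.ones(channels))    # learnable gamma, Eq. (3)
        self.beta = nn.Parameter(torch.zeros(channels))    # learnable beta, Eq. (3)
        # 1D convolution across channels for local adaptive weighting
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (B, C, H, W)
        b, c, _, _ = x.shape
        f_avg = self.avg_pool(x).view(b, c)                # f_avg, Eq. (2)
        f_scale = self.gamma * torch.relu(f_avg) + self.beta   # Eq. (3), delta = ReLU
        w = self.conv(f_scale.unsqueeze(1)).squeeze(1)     # 1D conv over the channel axis
        w = torch.sigmoid(w).view(b, c, 1, 1)              # assumed gate to [0, 1]
        return x * w                                       # channel-wise weighting, Eq. (4)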
The SPP module of YOLOv5 was replaced with SPPF (Spatial Pyramid Pooling with Fusion) to form an SPPFCSPC module (consisting of SPPF module and CSPC network design strategy) to achieve better multi-scale feature fusion. The SPPF module divides the feature map of the previous layer output into a plurality of grids, which are expressed as:
P_u = [P_1, P_2, …, P_n] (5)
where u is the index of the feature map and P_u is the feature vector of the u-th grid, the grid layout being n×n; a pooling operation is then performed in each grid to obtain fixed-size feature vectors; the feature vectors of different scales are fused by weighted summation, expressed as:
Fusion_1 = α_1×P_1 + α_2×P_2 + … + α_k×P_k (6)
where P_k represents the k-th feature vector and α_k the k-th optimal weight learned automatically by the network; Fusion_1 is finally output through the SPPFCSPC module.
The bottleneck csp of the backbox employs a CSPC (Cross Stage Partial Connection) network design strategy, and features output by the bottleneck csp of the backbox section are split into two branches, one of which is directly passed to the bottleneck csp module in the neg, and the other branch is subjected to a series of convolution operations to obtain new features, denoted as X 'and Y', which are fused by weighted summation, denoted as:
Fusion 2 =μ×X′+(1-μ)×Y′ (7)
wherein μ represents a fusion weight for automatically learning two feature maps by the model; fusion of 2 As input to the next module, to be passed to the negk for further processing. Bottleneck CSP is internally branched, so Fusion 2 Can be split into two branches. Such network design strategies allow features to be shared in multiple phases, reducing computational effort and improving performance. The SPPFCSPC structure is shown in fig. 3.
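The two-branch fusion of Eq. (7) might look as follows in PyTorch; the depth of the convolution stack and the sigmoid that keeps μ inside (0, 1) are assumptions:

import torch
import torch.nn as nn

class CSPCFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # assumed convolution stack producing the convolved branch Y'
        self.convs = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False), nn.BatchNorm2d(channels), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False), nn.BatchNorm2d(channels), nn.SiLU(),
        )
        self.mu = nn.Parameter(torch.tensor(0.5))          # learnable fusion weight mu

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_direct = x                        # branch passed on unchanged (X')
        y_conv = self.convs(x)              # convolved branch (Y')
        mu = torch.sigmoid(self.mu)         # keep mu in (0, 1), an assumption
        return mu * x_direct + (1 - mu) * y_conv   # Fusion_2, Eq. (7)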
The Backbone finally outputs a feature representation after multi-scale feature fusion for the subsequent sensitized plant detection and classification in the Neck and Prediction.
2.Neck
The Neck further performs multi-scale fusion and context sensing on the features on the basis of the Backbone; the input to the Neck is the output feature maps from the Backbone, i.e., the outputs of the two Bottleneck CSPs and the SPPFCSPC.
In this scheme a deconvolution layer is adopted: the Concat modules after the two up-sampling modules Upsample in the Neck are replaced with DwConv2 deconvolution layer modules, the up-sampling operation is realized through transposed convolution, and the low-resolution feature map is up-sampled to the same size as the high-resolution feature map for feature fusion. This module is defined with the nn.ConvTranspose2d function, whose input is a four-dimensional tensor of shape [batch_size, input_channels, input_height, input_width]; its function is to decompress the input feature map from its compressed representation back to the original space. Specifically, it converts each element of the input feature map into a 4x4 small square, leaving some space between the squares and thereby increasing the size of the output feature map. In this case the deconvolution layer has 512 output channels, each generating a 4x4 output matrix. The convolution kernel size of this layer is 4x4, the stride is 2, and the padding is 1. This means the height and width of the output feature map are twice those of the input feature map, and each pixel of the output feature map is affected by 2 rows/columns of surrounding pixels of the input feature map; the output feature map is calculated as follows:
output_height = (input_height - 1)*stride - 2*padding + k_size (8)
output_width = (input_width - 1)*stride - 2*padding + k_size (9)
where stride represents the stride, padding the padding, and k_size the size of the convolution kernel, a square or rectangular window that slides over the input feature map to perform the convolution; input_height and input_width are the height and width of the input tensor, and output_height and output_width are the height and width of the output tensor. With this configuration, the formulas give an output feature map whose height and width are twice those of the input.
By using deconvolution layers in Neck, feature map size can be effectively increased and more detailed information can be introduced, helping the model capture a wider range of context and spatial information. Such operations are typically used to increase feature map resolution in the object detection task, thereby improving the performance and accuracy of the model.
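The described layer maps directly onto PyTorch's nn.ConvTranspose2d; the snippet below checks Eqs. (8)-(9) for a 20x20 input (the input channel count of 512 is an assumption of this sketch):

import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(in_channels=512, out_channels=512,
                            kernel_size=4, stride=2, padding=1)

x = torch.randn(1, 512, 20, 20)     # e.g. a 20x20 low-resolution feature map
y = deconv(x)
# Eq. (8): (20 - 1) * 2 - 2 * 1 + 4 = 40, so the output is 40x40
print(y.shape)                      # torch.Size([1, 512, 40, 40])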
3.Prediction
The Prediction head Prediction is the last part of the YOLOv5 model, which is used for the Prediction of target detection on feature maps. In this scenario, the prediction head is responsible for predicting the location and class information of the sensitized plants in the map.
The input of the Prediction head Prediction is the output of the feature pyramid module Neck, and the output of the Prediction head Prediction is the predicted target frame and class probability; for each detection head of the Prediction, the outputs typically include bounding box coordinates (x, y, width, height) of the target box and confidence probabilities for each class, which will be used for subsequent target box screening and non-maximum suppression, resulting in a final detection result.
In the Prediction section, the processing procedure of the features is as follows:
first, a convolution operation of 1×1 is performed on the output of the feature pyramid. This convolution operation is mainly used to reduce the number of channels to reduce the number of parameters and computational complexity and to adjust the dimensions of the feature map to a form suitable for subsequent predictions. Next, the features are predicted by a plurality of detection heads, each of which is responsible for predicting a target frame of a particular scale. The detection head is typically composed of several 3x3 convolutional layers and one 1x1 convolutional layer. The 3x3 convolution is used to further process the features, extract higher level feature representations, and the 1x1 convolution is used for final target prediction. To predict target frames of different sizes, YOLOv5 uses Anchor Boxes to define default frames of different dimensions; each detection head will use a predefined Anchor Boxes to predict the position and size of the target.
To improve the accuracy and stability of the model, improved optimization strategies such as SIoU (Soft Intersection over Union) and Soft-NMS (soft non-maximum suppression) are introduced.
In the Prediction, the SIoU loss function is used to improve the loss function in YOLOv5. Conventional object detection models typically use IoU (Intersection over Union) as part of the loss function to measure the overlap between the predicted and ground-truth bounding boxes. However, IoU has some computational problems, such as instability and sensitivity to small targets, so SIoU is introduced to address them; the calculation formula is as follows:
SIoU=IoU-p (10)
where IoU is the intersection ratio of the predicted and real frames and p is a smoothing factor for reducing the gradient discontinuity of the loss function. The smoothing factor p is calculated as follows:
where v is a constant, set to 0.5 in this embodiment, used to control the size of the smoothing factor. By introducing the smoothing factor into the IoU calculation, the SIoU loss function makes the result smoother and more stable and avoids a zero denominator. This improves the accuracy of bounding-box matching, especially for small targets; taking SIoU as part of the loss function therefore effectively improves the training process of the model, making it more robust and accurate.
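A sketch of Eq. (10) as a loss term is given below; because the formula for the smoothing factor p is not reproduced here, p is passed in externally, and the (x1, y1, x2, y2) box format is an assumption:

import torch

def siou_loss(pred: torch.Tensor, target: torch.Tensor, p: torch.Tensor) -> torch.Tensor:
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2); p: smoothing factor."""
    x1 = torch.max(pred[:, 0], target[:, 0])               # intersection corners
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)          # standard IoU
    siou = iou - p                                          # Eq. (10)
    return (1.0 - siou).mean()                              # turn the score into a loss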
Then, a confidence threshold of 0.5 and a Soft-NMS (soft non-maximum suppression) value of 0.45 are set to determine the position and category of the plants, improving the accuracy and reliability of the detection results and ensuring that the final position and category information meets expectations and the requirements of the specific application. Notably, this preferred embodiment replaces traditional non-maximum suppression (NMS) with Soft-NMS. Soft-NMS selects the best target box more accurately, especially when target boxes overlap heavily; it retains important information in overlapping boxes and improves the recall and accuracy of target detection. Compared with traditional NMS, Soft-NMS is more flexible, adapts to scenes with different target sizes and densities, and is better suited to identifying sensitized plants in cities.
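For illustration, linear Soft-NMS with the thresholds used here (confidence 0.5, Soft-NMS value 0.45) can be sketched as follows; whether the linear or Gaussian decay variant is used is not stated, so linear decay is assumed:

import torch
from torchvision.ops import box_iou

def soft_nms(boxes: torch.Tensor, scores: torch.Tensor,
             iou_thresh: float = 0.45, conf_thresh: float = 0.5) -> list:
    """boxes: (N, 4) as (x1, y1, x2, y2); scores: (N,). Returns kept indices."""
    scores = scores.clone()
    idxs = torch.arange(boxes.shape[0])
    keep = []
    while idxs.numel() > 0:
        best = torch.argmax(scores[idxs])
        i = idxs[best]
        keep.append(int(i))
        idxs = torch.cat([idxs[:best], idxs[best + 1:]])
        if idxs.numel() == 0:
            break
        ious = box_iou(boxes[i].unsqueeze(0), boxes[idxs]).squeeze(0)
        # linear decay: heavily overlapping boxes are down-weighted, not discarded
        decay = torch.where(ious > iou_thresh, 1.0 - ious, torch.ones_like(ious))
        scores[idxs] = scores[idxs] * decay
        idxs = idxs[scores[idxs] > conf_thresh]              # drop low-confidence boxes
    return keep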
After the model is constructed, the sensitized plant identification model is trained with the training set processed in step S3. In a preferred embodiment of the invention, the hardware environment of the training process is: the CPU is an Intel(R) Core(TM) i7-9700 with a clock frequency of 3.0 GHz and 32 GB of memory. GPU-accelerated training speeds up the model's identification of sensitized plants; the GPU is an NVIDIA GeForce RTX 2060 SUPER with 16 GB of video memory. Software environment of the training process: the operating system is Windows 10 with Python 3.7, and the deep learning framework is PyTorch 1.8.0 with CUDA 11.3.
The sensitized plant identification model is trained with the sensitized plant image dataset, and the training parameters of the network model are set as follows: the input image size is 640×640×3, SGD is adopted as the optimizer with an initial learning rate of 0.01, the batch size is 32, the number of training epochs is 100, and the other training parameters keep their defaults. After training, the best weight file best.pt is saved, yielding the trained sensitized plant identification model and its training results. These training settings and parameter configurations make full use of the provided sensitized plant image dataset, improving model performance through the optimizer and an appropriate learning rate. The suitable batch size and epoch count ensure adequate learning of the features of the different classes of sensitized plants, and saving the optimal weight file yields a fine-tuned model for subsequent testing and application.
S5, identifying a sensitized plant identification model and evaluating performance.
After model training is completed, the trained model is used for the task of identifying and detecting sensitized plants. To evaluate the performance of the model, common evaluation metrics are used, including Precision, Recall, F1 score, and mAP (mean average precision).
The calculation formula is as follows:
Precision:
Precision = TP / (TP + FP)
Recall:
Recall = TP / (TP + FN)
F1 score:
F1 = 2 × P × r / (P + r)
Average precision (AP):
AP = ∫_0^1 P(r) dr
Mean average precision (mAP):
mAP = (1/N) × Σ_{i=1}^{N} AP_i
In the formulas, TP denotes true positives, FP false positives, and FN false negatives; P denotes precision and r recall, with P regarded as a function of r, so AP is the area under the precision-recall curve; N denotes the number of identified categories.
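These definitions translate directly into code, as in the sketch below; the per-class AP values are assumed to be computed elsewhere as the area under each class's precision-recall curve:

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0

def f1_score(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0

def mean_ap(ap_per_class: list) -> float:
    # each AP is the area under that class's precision-recall curve
    return sum(ap_per_class) / len(ap_per_class)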
The Precision metric measures the accuracy of the model's recognition results, and the Recall metric measures the model's ability to detect actual sensitized plants. The F1 score balances Precision and Recall, and the mAP metric averages the precision over the different categories, giving a comprehensive assessment of overall model performance. Evaluating the model on the test dataset gives its recognition performance and accuracy, verifying the effectiveness and practicality of the proposed method for sensitized plant identification. As shown in Table 1, the attention-based sensitized plant recognition model of this preferred embodiment improves on YOLOv5 by 3.3%, 2.5%, 2.7%, and 2.7% in Precision, Recall, F1 score, and mAP, respectively. This means the model identifies sensitized plants more accurately, has a higher recall, and achieves a better balance between accuracy and recall.
TABLE 1 Comparison of evaluation metrics between YOLOv5 and the sensitized plant identification model

Model                                  Precision/%  Recall/%  mAP/%  F1-Score/%
YOLOv5                                 94.1         93.4      96.2   93.7
Sensitized plant identification model  97.4         95.9      98.9   96.4
The attention-based sensitized plant identification model of this embodiment achieves better multi-scale feature fusion and richer feature extraction by introducing key techniques such as the attention mechanism, SPPF multi-scale feature fusion, and the CSPC network design strategy. Meanwhile, the deconvolution layer performs the up-sampling operation, raising the resolution of the output feature map and recovering detail information. The SIoU and Soft-NMS algorithms are introduced to improve the target-box selection and suppression strategy and to increase the accuracy and stability of the model. The combined application of these techniques and optimization strategies markedly improves the performance and accuracy of the attention-based sensitized plant identification model: it handles multi-scale features better and extracts rich feature information accurately, strengthening the model's ability to recognize sensitized plants, while the improved SIoU and Soft-NMS make the selection and suppression of target boxes more accurate and flexible, further raising the quality of the recognition results. These optimizations provide a reliable and efficient technical solution for the attention-based sensitized plant identification model, support the identification of sensitized plants and research applications in related fields, help improve identification accuracy, strengthen the monitoring and management of sensitized plants, and offer important technical tools and support for research and practice in botany, ecology, environmental protection, and other fields.

Claims (9)

1. A method for identifying sensitized plants based on an attention mechanism, comprising the steps of:
acquiring a sensitized plant image and screening to construct a sensitized plant image data set;
performing data enhancement processing on the sensitized plant image in the sensitized plant image data set;
labeling sensitized plant images in the sensitized plant image data set, and dividing the sensitized plant image data set after labeling;
constructing a sensitized plant identification model based on an attention mechanism, and training the model by using the partitioned sensitized plant image dataset; the sensitized plant identification model comprises:
taking YOLOv5 with a lightweight CSPDarknet53 Backbone network as the model framework, which comprises three parts: the feature extraction network Backbone, the feature pyramid module Neck, and the Prediction head;
the Backbone is responsible for extracting features from the input image; an attention mechanism is added after the Bottleneck CSP of the Backbone part, its input being a feature map and its output the feature map after adaptive attention-weight adjustment; an SPPF module replaces the SPP module of YOLOv5 to form the SPPFCSPC module; the SPPF module divides the feature map output by the upper layer into a plurality of grids and then performs a pooling operation in each grid to obtain fixed-size feature vectors; the feature vectors of different scales are fused by weighted summation to obtain the fusion feature Fusion_1, which is output through the SPPFCSPC module; the Bottleneck CSP of the Backbone adopts the CSPC network design strategy: its output features are split into two branches, one passed directly to the Bottleneck CSP module in the Neck, the other subjected to a series of convolution operations to obtain new features; the two are fused by weighted summation to obtain the fusion feature Fusion_2, which is passed to the Neck for further processing;
the Neck further performs multi-scale fusion and context sensing on the features on the basis of the Backbone; the input to the Neck is the feature maps output by the Bottleneck CSP and the SPPFCSPC of the Backbone; the Concat module after the up-sampling module Upsample in the Neck is replaced with a DwConv2 deconvolution layer module, which realizes the up-sampling operation through transposed convolution and up-samples the low-resolution feature map to the same size as the high-resolution feature map;
the Prediction head is the last part of the YOLOv5 model and performs target detection prediction on the feature maps; its input is the output of the Neck and its output is the predicted target boxes and class probabilities; each detection head of the Prediction outputs the bounding box coordinates of the target boxes and the confidence probability of each category, and these outputs are used for subsequent target box screening and non-maximum suppression to obtain the final detection result;
and storing the trained sensitized plant identification model for sensitized plant identification detection of the image to be detected.
2. The attention-based sensitized plant identification method according to claim 1, wherein the data enhancement processing of sensitized plant images in the sensitized plant image dataset comprises:
brightness adjustment, flipping, and noise addition are performed on the sensitized plant images, amplifying the number of sensitized plant images to 5 times the original.
3. The attention mechanism based sensitized plant identification method according to claim 1, wherein the LabelImg labeling tool is used to manually label the sensitized plants in the sensitized plant images, generating corresponding xml files that contain the image path and size as well as the category, position, and bounding box information of the labeled objects;
each image is integrated with its corresponding xml file by a Python program to ensure one-to-one correspondence; through this process a dataset with complete annotation information is obtained, from which the txt annotation data files required by the model are finally generated.
4. The method of claim 1, wherein the attention mechanism is calculated as follows:
F ∈ R^(C×H×W) (1)
where F represents the feature map of the input image, R the real number domain, C the number of channels, H the height, and W the width; for each channel, global average pooling is performed on the input feature map F in the spatial dimension to obtain the channel mean f_avg, which can be expressed as:
f_avg(c) = (1/(H×W)) × Σ_{i=1}^{H} Σ_{j=1}^{W} F(c, i, j) (2)
where c represents the channel index, i the height index, and j the width index;
subsequently, the channel mean f_avg is corrected to enhance the response value f_scale of the important channels, expressed as:
f_scale(c) = γ×δ(f_avg(c)) + β (3)
where γ and β represent the learnable parameters in the attention mechanism and δ represents the ReLU activation function;
the attention mechanism performs local adaptive weighting by applying a 1D convolution over the channels, taking the inter-channel response values as the attention weights;
finally, the corrected response value f_scale is used to weight the input feature map F along the channel dimension, giving the attention-adjusted feature map F′, expressed as:
F′ = f_scale(c)×F(c, i, j) (4)
5. The attention-based sensitized plant identification method according to claim 1, wherein the input to the DwConv2 deconvolution layer module is a four-dimensional tensor; the module converts each element of the input feature map into a 4x4 small square, leaving some gaps between the small squares and thereby increasing the size of the output feature map; in this case the deconvolution layer has 512 output channels, each generating a 4x4 output matrix; the convolution kernel size of this layer is 4x4, the stride is 2, and the padding is 1.
6. The method for identifying sensitized plants based on an attention mechanism according to claim 1, wherein the computation formula of the DwConv2 deconvolution layer module output feature map is as follows:
output_height = (input_height - 1)*stride - 2*padding + k_size (8)
output_width = (input_width - 1)*stride - 2*padding + k_size (9)
where stride represents the stride, padding the padding, and k_size the size of the convolution kernel, a square or rectangular window that slides over the input feature map to perform the convolution; input_height and input_width are the height and width of the input tensor, and output_height and output_width are the height and width of the output tensor.
7. The method for identifying sensitized plants based on an attention mechanism according to claim 1, wherein the processing of the features in the Prediction part is as follows:
firstly, a 1x1 convolution operation is performed on the output of the feature pyramid; this convolution is mainly used to reduce the number of channels, lowering the parameter count and computational complexity, and to adjust the feature map dimensions into a form suitable for subsequent prediction; next, the features are predicted through a plurality of detection heads, each responsible for predicting target boxes of a specific scale; each detection head consists of several 3x3 convolution layers and one 1x1 convolution layer, where the 3x3 convolutions further process the features and extract higher-level feature representations and the 1x1 convolution performs the final target prediction; to predict target boxes of different sizes, Anchor Boxes are adopted to define default boxes of different scales; each detection head uses predefined Anchor Boxes to predict the position and size of the target.
8. The method for identifying sensitized plants based on an attention mechanism according to claim 1, wherein in the Prediction, the SIoU loss function is used to improve the loss function in YOLOv5, and the calculation formula is as follows:
SIoU=IoU-p (10)
where IoU is the intersection ratio of the predicted and real frames and p is a smoothing factor for reducing the gradient discontinuity of the loss function; the smoothing factor p is calculated as follows:
where v is a constant for controlling the size of the smoothing factor.
9. The attention mechanism based sensitized plant identification method according to claim 1, wherein two screening rules, a confidence threshold of 0.5 and a Soft-NMS threshold of 0.45, are set to determine the position and category of the plant.
CN202311009797.7A 2023-08-11 2023-08-11 Sensitized plant identification method based on attention mechanism Pending CN117036948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311009797.7A CN117036948A (en) 2023-08-11 2023-08-11 Sensitized plant identification method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311009797.7A CN117036948A (en) 2023-08-11 2023-08-11 Sensitized plant identification method based on attention mechanism

Publications (1)

Publication Number Publication Date
CN117036948A true CN117036948A (en) 2023-11-10

Family

ID=88642528

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311009797.7A Pending CN117036948A (en) 2023-08-11 2023-08-11 Sensitized plant identification method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN117036948A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117853801A (en) * 2024-01-09 2024-04-09 华中农业大学 Deep learning classification model and application thereof in corn multiple growth stage phenotype detection
CN117952985A (en) * 2024-03-27 2024-04-30 江西师范大学 Image data processing method based on lifting information multiplexing under defect detection scene

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination