CN116824144A - U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves - Google Patents

U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves

Info

Publication number
CN116824144A
Authority
CN
China
Prior art keywords
information
frequency
pixel
segmentation
weights
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310789644.2A
Other languages
Chinese (zh)
Inventor
穆维松
张馨心
郑海颖
范梦杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Agricultural University filed Critical China Agricultural University
Priority to CN202310789644.2A priority Critical patent/CN116824144A/en
Publication of CN116824144A publication Critical patent/CN116824144A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of agricultural information, and particularly relates to a U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves. The method adopts the lightweight convolutional neural network MobileNetV2 and extracts multi-scale feature information through downsampling in a U-shaped pyramid; a context-aware enhancement module extracts a low-frequency global feature map and a high-frequency local feature map; a token aggregation strategy is introduced to reduce the detail information lost when low-frequency and high-frequency features are aggregated directly; and the aggregated tokens are transmitted directly to a lightweight segmentation head to perform the segmentation task. The invention achieves a balance between segmentation performance and speed so as to solve the problem of segmenting small grape leaf spots against the complex background of natural fields.

Description

U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves
Technical Field
The invention belongs to the technical field of agricultural information, and particularly relates to a U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves.
Background
Grape leaf spot is one of the major factors responsible for reduced yield and quality in grape cultivation. Moreover, grape leaf lesions can spread the causal fungus rapidly throughout a plantation and cause field-wide epidemics. Assigning a label to each pixel makes it possible to quickly determine the distribution of disease, and a lightweight segmentation model helps to rapidly diagnose and monitor disease trends on the leaf, so that targeted management measures can improve treatment efficiency and reduce treatment cost. However, heavyweight models are ill-suited to such rapidly spreading plant diseases, which require timely segmentation, and are difficult to deploy on hardware devices with limited resources. To improve segmentation efficiency, the segmentation model must be designed to be accurate, lightweight and fast.
(1) Convolutional neural network
Lightweight visual tasks have long been dominated by convolutional neural networks (CNNs). Their inherent inductive bias and weight-sharing characteristics allow a model to learn representations with fewer parameters. However, they have several problems that limit their performance: 1. the local connectivity of CNNs prevents modeling of long-range dependencies and ignores fine-grained semantic information in complex contexts; 2. the fixed convolution kernels and weights lead to loss of detail information, so the pixel semantics of small-target lesions cannot be extracted. Transformer-based approaches, which learn global visual representations via self-attention instead of convolution, have demonstrated excellent long-range modeling capabilities. However, their heavy weights and time-consuming computation make inference unfriendly in realistic industrial deployment scenarios.
(2) Transformer
To reduce the computational cost of the model, many Transformer variants strive to free the model from the dilemma of time-consuming computation. In addition, much work extracts low-resolution lesion features by introducing convolution operators, downsampling feature maps, employing pyramid hierarchies, and redesigning tokens. When the model is compressed to a size suitable for mobile deployment, the segmentation performance of the corresponding model is sacrificed, which is a serious drawback for a lightweight model. CNNs exhibit impressive performance thanks to the inductive bias of their architectural design, and recent work has therefore attempted to embed the advantages of CNNs into Transformers to achieve an excellent accuracy-efficiency trade-off. MobileFormer combines MobileNet and a Transformer to realize bidirectional fusion of local and global information at low computational cost. Unfortunately, the above approaches focus on capturing low-frequency global information and ignore the importance of high-frequency local information, which helps to extract the features of small lesions.
(3) Feature aggregation strategies
A weakness of directly aggregating high-resolution features with low-frequency feature information is that details are easily overwhelmed or lost. Some pioneering work has explored various aggregation schemes to overcome this problem. ASPP performs dilated convolution with multiple parallel branches having different dilation rates, which enables the model to aggregate local and global context information without a significant increase in computational complexity. The PPM in PSPNet captures multi-scale context information by pyramid pooling of the input feature map, which enhances the model's perception of features at different scales and produces finer segmentation results than ASPP. To obtain more refined context information, subsequent variants such as DAPPM were derived, integrating features through larger convolution kernels and deeper information flows. However, the deep information flows of the above methods are not processed in parallel, and the number of channels at each scale is relatively large, which implies a large amount of computation.
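For reference, a minimal PyTorch sketch of the parallel dilated-convolution idea behind the ASPP module described above is given below; the dilation rates (1, 6, 12, 18), channel widths and layer layout are illustrative assumptions, not values taken from this patent or from DeepLab.
```python
import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    """Parallel dilated convolutions aggregating context at several rates."""
    def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # A 1x1 convolution fuses the concatenated branch outputs back to out_ch channels.
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [b(x) for b in self.branches]          # same spatial size, different receptive fields
        return self.project(torch.cat(feats, dim=1))   # aggregate local and global context

# Example: a 32x32 feature map with 64 channels keeps its spatial size.
y = ASPPSketch(64, 64)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 64, 32, 32])
```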
Disclosure of Invention
The invention discloses a U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves, which adopts the lightweight convolutional neural network MobileNetV2 and extracts multi-scale feature information through downsampling in a U-shaped pyramid; a context-aware enhancement module extracts a low-frequency global feature map and a high-frequency local feature map; a token aggregation strategy is introduced to reduce the detail information lost when low-frequency and high-frequency features are aggregated directly; and the aggregated tokens are transmitted directly to a lightweight segmentation head to perform the segmentation task.
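The overall flow described above can be summarized as a short sketch. This is not the patented implementation: the module objects (backbone, enhance, aggregate, head), the pooling to the smallest pyramid resolution and the toy shapes are assumptions used only to show how the stages connect.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def forward_pipeline(image, backbone, enhance, aggregate, head):
    """Hypothetical end-to-end flow: backbone -> U-shaped pyramid tokens ->
    context-aware enhancement -> token aggregation -> lightweight head."""
    feats = backbone(image)                                     # list of multi-scale feature maps
    target = feats[-1].shape[-2:]                               # smallest resolution in the pyramid
    pooled = [F.adaptive_avg_pool2d(f, target) for f in feats]  # downsample every scale
    tokens = torch.cat(pooled, dim=1)                           # splice scales along the channel dim
    tokens = aggregate(enhance(tokens))                         # global/local enhancement, then fusion
    logits = head(tokens)                                       # lightweight segmentation head
    return F.interpolate(logits, size=image.shape[-2:], mode="bilinear", align_corners=False)

# Toy usage with placeholder modules (shapes only, not the patented architecture):
img = torch.randn(1, 3, 512, 512)
backbone = lambda x: [F.avg_pool2d(x, k) for k in (8, 16, 32)]  # three fake scales, 3 channels each
head = nn.Conv2d(9, 2, kernel_size=1)                           # 3 scales x 3 channels -> 2 classes
out = forward_pipeline(img, backbone, nn.Identity(), nn.Identity(), head)
print(out.shape)  # torch.Size([1, 2, 512, 512])
```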
Preferably, the U-shaped pyramid uses MobileNetV2 to extract feature information, reduces the resolution of the original tokens with an average pooling operator, and concatenates the tokens of different scales along the channel dimension to generate new tokens, which serve as the input of the context-aware enhancement module. The extraction of the multi-scale feature information has low computational complexity, because the multi-scale tokens are downsampled to a small resolution even though the new tokens have a large number of channels.
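A hedged sketch of this token construction using torchvision's MobileNetV2 follows; the stage split indices, the 16×16 target resolution and the use of adaptive average pooling are assumptions made for illustration, not the exact configuration of the invention.
```python
import torch
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

# Assumed tap points in torchvision's MobileNetV2 feature extractor.
features = mobilenet_v2().features.eval()
stages = [features[:4], features[4:7], features[7:14], features[14:18]]

def u_pyramid_tokens(x, target_hw=(16, 16)):
    """Extract multi-scale features, average-pool each scale to a small common
    resolution, and concatenate them along the channel dimension as new tokens."""
    tokens = []
    for stage in stages:
        x = stage(x)                                     # progressively downsampled feature maps
        tokens.append(F.adaptive_avg_pool2d(x, target_hw))
    return torch.cat(tokens, dim=1)                      # many channels, small spatial size

with torch.no_grad():
    t = u_pyramid_tokens(torch.randn(1, 3, 512, 512))
print(t.shape)  # torch.Size([1, 472, 16, 16]) with the assumed splits (24+32+96+320 channels)
```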
Preferably, the context-aware enhancement module includes a prototype-aware branch and a pixel-aware broadcast branch.
Preferably, in the prototype-aware branch, K and V are downsampled to reduce matrix operations; a convolution layer exchanges information between tokens along the spatial dimension, reducing the number of reshaping operations; the nonlinear activation layers used are ReLU6 and GELU; batch normalization is added in each convolution, which is faster at inference than layer normalization; and fine-grained semantic information in the tokens is retained through the residual mapping of the Transformer. The prototype-aware branch effectively achieves a global receptive field and enhances the low-frequency representation at a low computational cost.
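A minimal sketch of such a branch is shown below, assuming strided convolutions for the K/V downsampling, four attention heads and ReLU6 in the output projection; these choices, and the omission of the GELU path, are assumptions for illustration rather than the patented design.
```python
import torch
import torch.nn as nn

class PrototypeAwareBranch(nn.Module):
    """Hedged sketch of a prototype-aware branch: K and V are downsampled so the
    attention matrix stays small, convolutions use BatchNorm, and a residual
    connection keeps fine-grained token semantics. All dimensions are assumptions."""
    def __init__(self, dim: int, heads: int = 4, pool: int = 2):
        super().__init__()
        self.heads, self.scale = heads, (dim // heads) ** -0.5
        self.q = nn.Sequential(nn.Conv2d(dim, dim, 1, bias=False), nn.BatchNorm2d(dim))
        # Strided convolution downsamples K and V, shrinking the attention matrix.
        self.kv = nn.Sequential(nn.Conv2d(dim, 2 * dim, pool, stride=pool, bias=False),
                                nn.BatchNorm2d(2 * dim))
        self.proj = nn.Sequential(nn.Conv2d(dim, dim, 1, bias=False), nn.BatchNorm2d(dim),
                                  nn.ReLU6(inplace=True))

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.q(x).reshape(B, self.heads, C // self.heads, H * W).transpose(-2, -1)
        k, v = self.kv(x).chunk(2, dim=1)                # downsampled keys and values
        k = k.reshape(B, self.heads, C // self.heads, -1)
        v = v.reshape(B, self.heads, C // self.heads, -1).transpose(-2, -1)
        attn = (q @ k * self.scale).softmax(dim=-1)      # global receptive field, small matrix
        out = (attn @ v).transpose(-2, -1).reshape(B, C, H, W)
        return x + self.proj(out)                        # residual keeps fine-grained details

y = PrototypeAwareBranch(64)(torch.randn(1, 64, 16, 16))
print(y.shape)  # torch.Size([1, 64, 16, 16])
```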
Preferably, the pixel-aware broadcast branch adopts a convolution-style attention mechanism and effectively mines contextual weights through shared weights and pixel-aware weights; a linear layer is used to generate the key K, query Q and value V, as follows:
Q, K, V = Linear(X_in)
where X_in represents the features input from the U-shaped pyramid.
Preferably, the pixel-aware broadcast branch is divided into the following steps:
Step 1: local features are extracted by a depth-wise convolution (DWConv) operator, and shared weights are applied to V, as follows:
V = DWConv(V)
Step 2: local enhancement with pixel-aware weights is performed on Q and K; the local information of Q and K is obtained with two convolutions having translational invariance; the values of Q and K are combined by the Hadamard product to serve as the output; the Softmax of traditional attention is replaced by Tanh and Swish to obtain pixel-aware weights between -1 and 1; and a gating mechanism is adopted so that the pixel-aware weights have stronger nonlinearity, which means higher-quality pixel-aware weights. This is expressed as follows:
Q_l = DWConv(Q)
K_l = DWConv(K)
Attn_l = Linear(Swish(Linear(Q_l ⊙ K_l)))
the generated strong nonlinear weight is aggregated with other pixels, and local characteristics are enhanced through Hadamard product operation. The output graph is defined as:
X_local = Attn ⊙ V
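The two steps above can be sketched as follows, realizing the linear layers as 1×1 convolutions and the pixel-aware weights with depth-wise convolutions; the 3×3 kernel size, the SiLU (Swish) activation and the final Tanh used to bound the weights to (-1, 1) are assumptions drawn from the prose, not a verified reproduction of the invention.
```python
import torch
import torch.nn as nn

class PixelAwareBroadcastBranch(nn.Module):
    """Hedged sketch of the pixel-aware broadcast branch described above.
    Linear layers are realized as 1x1 convolutions, and depth-wise convolutions
    (DWConv) provide the shared / pixel-aware weights; sizes are assumptions."""
    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Conv2d(dim, 3 * dim, 1)                    # Q, K, V = Linear(X_in)
        dw = lambda: nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.dw_v, self.dw_q, self.dw_k = dw(), dw(), dw()       # shared / pixel-aware local weights
        self.gate = nn.Sequential(nn.Conv2d(dim, dim, 1),        # Linear
                                  nn.SiLU(),                     # Swish activation
                                  nn.Conv2d(dim, dim, 1),        # Linear
                                  nn.Tanh())                     # bound weights to (-1, 1)

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=1)
        v = self.dw_v(v)                                         # step 1: V = DWConv(V)
        attn = self.gate(self.dw_q(q) * self.dw_k(k))            # step 2: gated Hadamard product
        return attn * v                                          # X_local = Attn ⊙ V

y = PixelAwareBroadcastBranch(64)(torch.randn(1, 64, 16, 16))
print(y.shape)  # torch.Size([1, 64, 16, 16])
```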
Preferably, the token aggregation strategy effectively aggregates low-frequency global information and high-frequency local information while reducing the number of channels at each scale. Global average pooling with different kernel sizes and strides is employed to obtain feature maps at different image resolutions. The channel dimensions of the different scales are transformed by a 1×1 convolution, and the feature maps are upsampled. The original features are then aggregated with the context information of the different scales using a 3×3 convolution. Finally, the feature maps are concatenated and compressed using a 1×1 convolution. Furthermore, to ease optimization, a 1×1 residual mapping is also introduced. Assuming x is the input, the feature at each scale is obtained by global average pooling, 1×1 convolution, and upsampling (denoted Up), after which it is fused with the original features by a 3×3 convolution.
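A hedged PyTorch sketch of this aggregation is given below; the pooling kernel/stride pairs, the reduced channel width and the way the original features are fused into each scale are assumptions chosen to match the prose, not the exact formulation of the invention.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenAggregationSketch(nn.Module):
    """Hedged sketch of the token aggregation strategy: pool at several scales,
    reduce channels with 1x1 convs, upsample, fuse with 3x3 convs, concatenate,
    compress, and add a 1x1 residual. Kernel/stride pairs are assumptions."""
    def __init__(self, in_ch: int, mid_ch: int = 32, pools=((5, 2), (9, 4), (17, 8))):
        super().__init__()
        self.pools = nn.ModuleList(nn.AvgPool2d(k, stride=s, padding=k // 2) for k, s in pools)
        self.reduce = nn.ModuleList(nn.Conv2d(in_ch, mid_ch, 1) for _ in pools)
        self.fuse = nn.ModuleList(nn.Conv2d(mid_ch, mid_ch, 3, padding=1) for _ in pools)
        self.base = nn.Conv2d(in_ch, mid_ch, 1)                       # full-resolution branch
        self.compress = nn.Conv2d(mid_ch * (len(pools) + 1), in_ch, 1)
        self.shortcut = nn.Conv2d(in_ch, in_ch, 1)                    # 1x1 residual mapping

    def forward(self, x):
        size = x.shape[-2:]
        outs = [self.base(x)]
        for pool, red, fuse in zip(self.pools, self.reduce, self.fuse):
            y = F.interpolate(red(pool(x)), size=size, mode="bilinear", align_corners=False)
            outs.append(fuse(y + outs[0]))               # aggregate context with original features
        return self.compress(torch.cat(outs, dim=1)) + self.shortcut(x)

y = TokenAggregationSketch(64)(torch.randn(1, 64, 16, 16))
print(y.shape)  # torch.Size([1, 64, 16, 16])
```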
The invention has the advantage of providing a lightweight U-shaped perception Transformer that takes tokens of different scales, downsampled to a small scale, as input and inherits the advantages of both CNNs and Transformers. Its core component, the context-aware enhancement module, adopts a parallel architecture on the small-scale tokens to achieve superior cost-effectiveness. The enhancement module consists of two branches: the prototype-aware branch learns low-frequency global information by downsampling K and V, while the pixel-aware broadcast branch adopts a gating mechanism to enhance nonlinearity and mines high-frequency local information through shared weights and context-aware weights. A token aggregation strategy is designed to compensate for the sacrificed details without increasing the number of parameters. The invention achieves a balance between efficiency and speed so as to solve the problem of segmenting small grape leaf spots against the complex background of natural fields.
Drawings
FIG. 1 is a diagram of the overall architecture of the U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves;
FIG. 2 is a schematic diagram of a pixel aware broadcast branch;
fig. 3 is a schematic diagram of a token aggregation policy.
Detailed Description
The overall framework of the U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves is shown in FIG. 1; FIG. 2 is a schematic diagram of the pixel-aware broadcast branch in the context-aware enhancement module of the present invention; FIG. 3 is a schematic diagram of the token aggregation strategy of the present disclosure.
In the training stage, this experiment and the comparison experiments were implemented in PyTorch with a semantic segmentation library. All models were trained on an NVIDIA Tesla V100 GPU. For fairness of comparison, the invention follows the same training strategy as previous work. Specifically, images are randomly cropped to 512×512. In the training phase, AdamW with a weight decay of 0.01 is used to optimize the model of the invention. The model is trained with the "poly" LR strategy, lr = base_lr × (1 − iter/max_iter)^power, where the power factor is set to 1, the initial learning rate is 6×10⁻⁶, and a total of 160,000 iterations are run.
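The quoted schedule can be written out directly; the sketch below uses the stated base learning rate of 6×10⁻⁶, power 1.0 and 160,000 iterations, and the helper name poly_lr is an illustrative assumption.
```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 1.0) -> float:
    """'Poly' learning-rate policy: lr = base_lr * (1 - cur_iter / max_iter) ** power."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# Values stated in the description: base LR 6e-6, power 1.0, 160,000 iterations.
for it in (0, 80_000, 160_000):
    print(it, poly_lr(6e-6, it, 160_000))   # 6e-06 at the start, 3e-06 halfway, 0.0 at the end
```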
The present invention evaluates the designed architecture on three datasets: a Field-PV dataset, the Plant Village dataset, and a synthetic Syn-PV dataset. The Field-PV dataset was collected with an OLYMPUS OM-D camera by the Forestry and Fruit Tree Research Institute of the Beijing Academy of Agriculture and Forestry Sciences, China; a total of 400 original images containing natural scenes of grape gray mold were captured. Plant Village is a public, fair dataset dedicated to crop pest and disease identification. It consists of 54,303 high-resolution images covering 38 categories of diseased and healthy plant leaves, all acquired in a controlled laboratory. We used 1,383 grape black measles images and 1,180 grape black rot images. Syn-PV consists of natural field images synthesized from the Plant Village segmentation images obtained in the controlled laboratory; a background replacement method is adopted to synthesize grape disease images with complex backgrounds. All datasets were manually annotated with disease areas and leaf areas by using the labelme tool on the collected images. The annotated data are saved in JavaScript Object Notation (.json) format and then converted to the PASCAL VOC 2012 data format, which provides semantic labels for foreground and background objects. The invention uses the Augmentor module to perform geometric transformations such as random left/right flipping, random cropping, random sampling, and color and brightness enhancement or reduction. During training, the invention applies the basic and powerful data enhancement methods of the semantic segmentation library.
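As an illustration only, the listed augmentations can be sketched with torchvision transforms; this is a stand-in for the Augmentor module named above, and every parameter value is an assumption not specified in the description.
```python
import torchvision.transforms as T

# Illustrative stand-in for the augmentations listed above; for segmentation the
# geometric transforms (flip, crop) must be applied identically to the mask,
# while the photometric jitter is applied to the image only.
train_transform = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                                  # random left/right flipping
    T.RandomResizedCrop(512, scale=(0.5, 1.0)),                     # random cropping to 512x512
    T.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3),    # color/brightness change
    T.ToTensor(),
])
```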
To evaluate the effectiveness of the U-shaped perception Transformer, the model was compared with other segmentation methods: 3 classical segmentation methods (DeepLabv3+, UNet, PSPNet), 4 heavyweight Transformer-based segmentation methods (PVT2, Dual-ViT, SegFormer, SegNeXt), and 7 lightweight Transformer-based segmentation methods (SeaFormer, AFFormer, PoolFormer, EfficientFormer, LVT, NextViT, TopFormer).
Four evaluation indexes, namely accuracy, IoU, recall and Dice, are adopted to measure model performance. Meanwhile, the parameters (Params), giga floating-point operations (GFLOPs), frames per second (FPS) and occupied memory of each model are analyzed.
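For clarity, the four indexes can be computed from pixel-level confusion counts as sketched below for a binary lesion/background mask; the function name and the epsilon smoothing term are assumptions added for illustration.
```python
import torch

def binary_seg_metrics(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Pixel-level metrics named above for a binary mask (1 = lesion, 0 = background)."""
    tp = ((pred == 1) & (target == 1)).sum().float()
    fp = ((pred == 1) & (target == 0)).sum().float()
    fn = ((pred == 0) & (target == 1)).sum().float()
    tn = ((pred == 0) & (target == 0)).sum().float()
    return {
        "accuracy": ((tp + tn) / (tp + tn + fp + fn + eps)).item(),
        "IoU":      (tp / (tp + fp + fn + eps)).item(),
        "recall":   (tp / (tp + fn + eps)).item(),
        "Dice":     (2 * tp / (2 * tp + fp + fn + eps)).item(),
    }

m = binary_seg_metrics(torch.randint(0, 2, (512, 512)), torch.randint(0, 2, (512, 512)))
print(m)
```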
TABLE 1 Quantitative comparison of grape mosaic virus on the Plant Village dataset based on CNN and Transformer methods
TABLE 2 Quantitative comparison of grape leaf and background on the Plant Village dataset based on CNN and Transformer methods
TABLE 3 Quantitative comparison of grape leaf disease and background on the Field-PV dataset based on CNN and Transformer methods
Experimental results show that the segmentation performance of the proposed method is superior to that of current state-of-the-art Transformer methods and other deep-learning-based methods. Taking image segmentation performance and training and inference cost into comprehensive consideration, the invention performs best on the challenging task of segmenting small grape leaf spots and achieves a balance between segmentation performance and speed.

Claims (7)

1. A U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves, characterized in that the lightweight convolutional neural network MobileNetV2 is adopted and multi-scale feature information is extracted through downsampling in a U-shaped pyramid; a context-aware enhancement module extracts a low-frequency global feature map and a high-frequency local feature map; a token aggregation strategy is introduced to reduce the detail information lost when low-frequency and high-frequency features are aggregated directly; and the aggregated tokens are transmitted directly to a lightweight segmentation head to perform the segmentation task.
2. The method of claim 1, wherein the U-shaped pyramid uses MobileNetV2 to extract feature information, uses an average pooling operator to reduce the resolution of the original tokens, and concatenates tokens of different scales along the channel dimension to generate new tokens, which are used as the input of the context-aware enhancement module.
3. The method of claim 1, wherein the context-aware enhancement module comprises a prototype-aware branch and a pixel-aware broadcast branch.
4. The method according to claim 3, wherein, in the prototype-aware branch, K and V are downsampled, a convolution layer exchanges information between tokens along the spatial dimension, the nonlinear activation layers are ReLU6 and GELU, batch normalization is added in each convolution, and fine-grained semantic information in the tokens is retained through the residual mapping of the Transformer.
5. The method according to claim 3, wherein the pixel-aware broadcast branch adopts a convolution-style attention mechanism and effectively mines contextual weights through shared weights and pixel-aware weights; a linear layer is used to generate the key K, query Q and value V, as follows:
Q, K, V = Linear(X_in)
where X_in represents the features input from the U-shaped pyramid.
6. The method according to claim 3, wherein the pixel-aware broadcast branch comprises the following steps:
step 6.1: local features are extracted by a depth-wise convolution (DWConv) operator, and shared weights are applied to V, as follows:
V = DWConv(V)
step 6.2: local enhancement with pixel-aware weights is performed on Q and K; the local information of Q and K is obtained with two convolutions having translational invariance; the values of Q and K are combined by the Hadamard product to serve as the output; the Softmax of traditional attention is replaced by Tanh and Swish; and a gating mechanism is adopted to obtain the pixel-aware weights.
7. The method of claim 1, wherein the token aggregation strategy effectively aggregates low-frequency global information and high-frequency local information while reducing the number of channels at each scale; global average pooling with different kernel sizes and strides is employed to obtain feature maps at different image resolutions.
CN202310789644.2A 2023-06-30 2023-06-30 U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves Pending CN116824144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310789644.2A CN116824144A (en) 2023-06-30 2023-06-30 U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310789644.2A CN116824144A (en) 2023-06-30 2023-06-30 U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves

Publications (1)

Publication Number Publication Date
CN116824144A true CN116824144A (en) 2023-09-29

Family

ID=88142562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310789644.2A Pending CN116824144A (en) 2023-06-30 2023-06-30 U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves

Country Status (1)

Country Link
CN (1) CN116824144A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274607A (en) * 2023-11-23 2023-12-22 吉林大学 Multi-path pyramid-based lightweight medical image segmentation network, method and equipment
CN117274607B (en) * 2023-11-23 2024-02-02 吉林大学 Multi-path pyramid-based lightweight medical image segmentation network, method and equipment

Similar Documents

Publication Publication Date Title
Junos et al. Automatic detection of oil palm fruits from UAV images using an improved YOLO model
Wang et al. NAS-guided lightweight multiscale attention fusion network for hyperspectral image classification
CN108460391B (en) Hyperspectral image unsupervised feature extraction method based on generation countermeasure network
Peng et al. Spatial–spectral transformer with cross-attention for hyperspectral image classification
Su et al. LodgeNet: Improved rice lodging recognition using semantic segmentation of UAV high-resolution remote sensing images
Hao et al. Growing period classification of Gynura bicolor DC using GL-CNN
Ilyas et al. Multi-scale context aggregation for strawberry fruit recognition and disease phenotyping
CN116824144A (en) U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves
Khan et al. End-to-end semantic leaf segmentation framework for plants disease classification
EP3971767A1 (en) Method for constructing farmland image-based convolutional neural network model, and system thereof
CN115909052A (en) Hyperspectral remote sensing image classification method based on hybrid convolutional neural network
Yang et al. Multi-scale spatial-spectral fusion based on multi-input fusion calculation and coordinate attention for hyperspectral image classification
CN115331104A (en) Crop planting information extraction method based on convolutional neural network
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
Sun et al. RL-DeepLabv3+: A lightweight rice lodging semantic segmentation model for unmanned rice harvester
Zheng et al. An efficient mobile model for insect image classification in the field pest management
Yeswanth et al. Residual skip network-based super-resolution for leaf disease detection of grape plant
Devisurya et al. Early detection of major diseases in turmeric plant using improved deep learning algorithm
Sharma et al. Multi classification of tomato leaf diseases: A convolutional neural network model
Shi et al. F 3 Net: Fast Fourier filter network for hyperspectral image classification
CN113221913A (en) Agriculture and forestry disease and pest fine-grained identification method and device based on Gaussian probability decision-level fusion
CN116091770A (en) Grape leaf lesion image segmentation method based on cross-resolution transducer model
Shantkumari et al. Machine learning techniques implementation for detection of grape leaf disease
Yuan et al. Impact of dataset on the study of crop disease image recognition
Jia et al. Semantic segmentation of deep learning remote sensing images based on band combination principle: Application in urban planning and land use

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination