CN116824144A - U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves - Google Patents
- Publication number
- CN116824144A (application CN202310789644.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- frequency
- pixel
- segmentation
- weights
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Image Analysis (AREA)
Abstract
The invention belongs to the technical field of agricultural information, and particularly relates to a U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves. The method adopts the lightweight convolutional neural network MobileNetV2 and extracts multi-scale feature information through the downsampling of a U-shaped pyramid; a context-aware enhancement module extracts low-frequency global features and high-frequency local feature maps; a token aggregation strategy is introduced to reduce the loss of detail information caused by directly aggregating low-frequency and high-frequency features; and the aggregated tokens are passed directly to a lightweight segmentation head to perform the segmentation task. The invention balances segmentation performance and speed, so as to solve the problem of segmenting small grape leaf lesions against the complex backgrounds of natural fields.
Description
Technical Field
The invention belongs to the technical field of agricultural information, and particularly relates to a U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves.
Background
Grape leaf spot is one of the major factors responsible for reduced yield and quality in grape cultivation. Moreover, grape leaf lesions can spread fungi rapidly through a plantation and cause epidemics across the whole field. Assigning a label to each pixel makes the distribution of the disease quickly known; a lightweight segmentation model helps to rapidly diagnose and monitor disease trends on the leaves, and targeted management measures then improve treatment efficiency and reduce treatment cost. However, heavyweight models are ill-suited to these rapidly spreading plant diseases, which require timely segmentation, and they are difficult to deploy on hardware devices with limited resources. To improve segmentation efficiency, the segmentation model needs to be accurate, lightweight, and fast.
(1) Convolutional neural network
Lightweight vision tasks have been overwhelmingly dominated by convolutional neural networks (CNNs). Their inherent inductive bias and weight sharing allow the model to learn representations with fewer parameters. However, CNNs have some problems that limit their performance: 1. The local connectivity of CNNs often prevents them from modeling long-range dependencies, so fine-grained semantic information in complex contexts is ignored. 2. Fixed convolution kernels and weights lead to a loss of detail information, so the pixel-level semantics of small disease targets cannot be extracted. Transformer-based approaches, which learn global visual representations in place of CNNs, have demonstrated excellent long-range modeling capability thanks to the self-attention mechanism. However, their heavy weights and time-consuming computation are ill-suited to inference in realistic industrial deployment scenarios.
(2) Transformer
To reduce the computational cost of the model, many Transformer variants strive to free the model from time-consuming computation. In addition, many works extract low-resolution lesion features by introducing convolution operators, downsampling feature maps, employing pyramid hierarchies, and redesigning tokens. When the model is compressed to a size suitable for mobile devices, however, the segmentation performance of the model is sacrificed, which amounts to a catastrophic failure for a lightweight model. CNNs exhibit impressive performance owing to the inductive bias inherent in their architectural design, and recent work has therefore attempted to embed the advantages of CNNs into Transformers to achieve an excellent accuracy-efficiency trade-off. MobileFormer uses MobileNet and a Transformer to realize bidirectional fusion of local and global information at a lower computational cost. Unfortunately, the above approaches focus on capturing low-frequency global information and ignore the importance of high-frequency local information, which helps to extract the features of small lesions.
(3) Feature aggregation strategy
A weakness of directly aggregating high-resolution features with low-frequency feature information is that details are easily overwhelmed or lost. Some pioneering works have explored various aggregation schemes to overcome this problem. ASPP performs dilated convolution in multiple parallel branches with different dilation rates, which enables the model to aggregate local and global context information without a significant increase in computational complexity. PPM in PSPNet captures multi-scale context information by pyramid pooling of the input feature map, which enhances the model's perception of features at different scales and produces finer segmentation results than ASPP. To obtain more refined context information, subsequent variants such as DAPPM were derived, which integrate features through larger convolution kernels and a deeper information flow. However, the deep information flow of these methods is not processed in parallel, and the number of channels at each scale is relatively large, so the amount of computation is relatively large.
Disclosure of Invention
The invention discloses a U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves. The method adopts the lightweight convolutional neural network MobileNetV2 and extracts multi-scale feature information through the downsampling of a U-shaped pyramid; a context-aware enhancement module extracts low-frequency global features and high-frequency local feature maps; a token aggregation strategy is introduced to reduce the loss of detail information caused by directly aggregating low-frequency and high-frequency features; and the aggregated tokens are passed directly to a lightweight segmentation head to perform the segmentation task.
Preferably, the U-shaped pyramid uses MobileNetV2 to extract feature information, reduces the resolution of the original tokens with an average pooling operator, and concatenates the tokens of different scales along the channel dimension to generate new tokens, which serve as the input of the context-aware enhancement module. Because the multi-scale tokens are downsampled to a small resolution, extracting the multi-scale feature information has low computational complexity even though the new tokens have a large number of channels.
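The pooling-and-concatenation step described above can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the channel counts, spatial sizes, and the function names `avg_pool_to` and `build_pyramid_tokens` are assumptions chosen for illustration, and the random arrays merely stand in for MobileNetV2 stage outputs.

```python
import numpy as np


def avg_pool_to(feature, out_hw):
    """Average-pool a (C, H, W) map down to (C, out_h, out_w).

    H and W must be integer multiples of out_h and out_w.
    """
    c, h, w = feature.shape
    out_h, out_w = out_hw
    assert h % out_h == 0 and w % out_w == 0
    kh, kw = h // out_h, w // out_w
    # Split each spatial axis into (output index, position inside the window).
    return feature.reshape(c, out_h, kh, out_w, kw).mean(axis=(2, 4))


def build_pyramid_tokens(features, out_hw):
    """Pool every scale to the same small resolution, then concatenate channels."""
    pooled = [avg_pool_to(f, out_hw) for f in features]
    return np.concatenate(pooled, axis=0)


# Hypothetical multi-scale backbone outputs (channels, height, width).
rng = np.random.default_rng(0)
features = [
    rng.standard_normal((16, 64, 64)),
    rng.standard_normal((32, 32, 32)),
    rng.standard_normal((64, 16, 16)),
    rng.standard_normal((96, 8, 8)),
]
tokens = build_pyramid_tokens(features, (8, 8))
print(tokens.shape)  # (208, 8, 8): many channels, but only 8 x 8 positions
```

In this sketch the new tokens carry 208 channels but only 64 spatial positions, which is why later attention over them stays cheap.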
Preferably, the context-aware enhancement module includes a prototype-aware branch and a pixel-aware broadcast branch.
Preferably, the prototype-aware branch downsamples K and V to reduce matrix operations; convolution layers exchange information between tokens along the spatial dimension, which reduces the number of reshape operations; the nonlinear activation layer is replaced with ReLU6 and GELU; batch normalization is added to each convolution, since it is faster at inference than layer normalization; and fine-grained semantic information in the tokens is retained through the residual mapping of the Transformer. The prototype-aware branch thus obtains a global receptive field and enhances the low-frequency representation at a low computational cost.
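To show why downsampling K and V reduces matrix operations, the following NumPy sketch computes attention between N full-resolution queries and M pooled keys and values, so the score matrix is N×M instead of N×N. The text does not say how K and V are downsampled or how the scores are normalized; average pooling with stride 2, scaled dot-product Softmax attention, and the names `pool_tokens` and `prototype_attention` are assumptions. The convolution layers, activations, and batch normalization of the branch are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def pool_tokens(tokens, h, w, stride):
    """Average-pool an (H*W, C) token grid by `stride` along both spatial axes."""
    c = tokens.shape[1]
    grid = tokens.reshape(h // stride, stride, w // stride, stride, c)
    return grid.mean(axis=(1, 3)).reshape(-1, c)


def prototype_attention(tokens, h, w, wq, wk, wv, stride=2):
    """Attention with full-resolution Q and downsampled K, V (assumed pooling)."""
    q = tokens @ wq                              # (N, C)
    small = pool_tokens(tokens, h, w, stride)    # (M, C), M = N / stride**2
    k = small @ wk
    v = small @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]))  # (N, M) instead of (N, N)
    return tokens + attn @ v, attn               # residual keeps fine detail


rng = np.random.default_rng(0)
h, w, c = 8, 8, 16
tokens = rng.standard_normal((h * w, c))
wq, wk, wv = (0.1 * rng.standard_normal((c, c)) for _ in range(3))
out, attn = prototype_attention(tokens, h, w, wq, wk, wv)
print(out.shape, attn.shape)  # (64, 16) (64, 16)
```

With stride 2 the score matrix has a quarter of the entries of full self-attention, while every query still attends to the whole image.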
Preferably, the pixel-aware broadcast branch adopts a convolution-style attention mechanism and mines context weights effectively through shared weights and pixel-aware weights. A linear layer is used to generate the key K, the query Q, and the value V, as follows:
$$Q, K, V = \mathrm{Linear}(X_{in})$$
where $X_{in}$ denotes the features input from the U-shaped pyramid.
Preferably, the pixel-aware broadcast branch comprises the following steps:
Step 1: local features are extracted with a depthwise convolution (DWconv) operator, which applies shared weights to V, as follows:
$$V = \mathrm{DWconv}(V)$$
Step 2: local enhancement with pixel-aware weights is applied to Q and K. Two depthwise convolutions with translation invariance obtain the local information of Q and K respectively, and the Hadamard product of the two results is then computed as the output of this step. The Softmax of conventional attention is replaced by Tanh and Swish, so that the pixel-aware weights lie between -1 and 1. A gating mechanism is adopted to obtain the pixel-aware weights, which gives them stronger nonlinearity, and stronger nonlinearity means higher-quality pixel-aware weights. Specifically:
$$Q_l = \mathrm{DWconv}(Q)$$
$$K_l = \mathrm{DWconv}(K)$$
$$\mathrm{Attn}_l = \mathrm{Linear}(\mathrm{Swish}(\mathrm{Linear}(Q_l \odot K_l)))$$
The generated strongly nonlinear weights are aggregated with the other pixels, and local features are enhanced through a Hadamard product operation. The output map is defined as:
$$X_{local} = \mathrm{Attn}_l \odot V$$
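The steps of the pixel-aware broadcast branch can be traced in a small NumPy sketch. It is an illustration under stated assumptions, not the patented implementation: the 3×3 depthwise kernel size, the zero padding, the parameter names in the dictionary `p`, and the point at which Tanh is applied (after the second linear layer, so that the weights fall in [-1, 1] as the text states) are all assumptions, since the formulas above do not fix them.

```python
import numpy as np


def dwconv3x3(x, kernels):
    """Depthwise 3x3 convolution with zero padding.

    x: (C, H, W); kernels: (C, 3, 3), one kernel per channel.
    """
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernels[:, dy, dx, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out


def linear(x, weight):
    """Per-pixel linear layer over channels. x: (C_in, H, W); weight: (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def swish(x):
    return x / (1.0 + np.exp(-x))


def pixel_aware_branch(x_in, p):
    q, k, v = (linear(x_in, p[name]) for name in ("wq", "wk", "wv"))
    v = dwconv3x3(v, p["dw_v"])        # Step 1: shared-weight local features on V
    q_l = dwconv3x3(q, p["dw_q"])      # Step 2: local information of Q
    k_l = dwconv3x3(k, p["dw_k"])      #         and of K
    attn = linear(swish(linear(q_l * k_l, p["w1"])), p["w2"])  # gated weights
    attn = np.tanh(attn)               # assumption: Tanh bounds weights to [-1, 1]
    return attn * v, attn              # X_local = Attn ⊙ V


rng = np.random.default_rng(0)
c, h, w = 8, 6, 6
p = {name: 0.1 * rng.standard_normal((c, c)) for name in ("wq", "wk", "wv", "w1", "w2")}
p.update({name: 0.1 * rng.standard_normal((c, 3, 3)) for name in ("dw_q", "dw_k", "dw_v")})
x_local, attn = pixel_aware_branch(rng.standard_normal((c, h, w)), p)
print(x_local.shape)  # (8, 6, 6)
```

Unlike Softmax attention, no N×N score matrix is formed here: every operation is per-pixel or a small local convolution, so the cost grows linearly with the number of pixels.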
Preferably, the token aggregation strategy effectively aggregates low-frequency global information and high-frequency local information while reducing the number of channels at each scale. Average pooling with different kernel sizes and strides is used to obtain feature maps of different resolutions. The channel dimensions of the different scales are transformed by a 1×1 convolution, and the feature maps are upsampled. The original features are then aggregated with the context information of the different scales using 3×3 convolutions. Finally, the feature maps are concatenated and compressed with a 1×1 convolution. In addition, a 1×1 residual mapping is introduced to ease optimization. Assuming x is the input, the features of each scale can be expressed as:
where Up denotes upsampling.
The invention has the following advantages. It provides a lightweight U-shaped perception Transformer that takes tokens of different scales, downsampled to a small scale, as input and inherits the advantages of both CNNs and Transformers. Its core component, the perception enhancement module, adopts a parallel architecture on the small-scale tokens to achieve superior cost effectiveness. The perception enhancement module consists of two branches: the prototype-aware branch learns low-frequency global information by downsampling K and V, while the pixel-aware broadcast branch adopts a gating mechanism to enhance nonlinearity and mines high-frequency local information through shared weights and context-aware weights. A token aggregation strategy is designed to compensate for the sacrificed details without increasing the number of parameters. The invention balances segmentation performance and speed, so as to solve the problem of segmenting small grape leaf lesions against the complex backgrounds of natural fields.
Drawings
FIG. 1 is a diagram of the overall architecture of the U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves;
FIG. 2 is a schematic diagram of the pixel-aware broadcast branch;
FIG. 3 is a schematic diagram of the token aggregation strategy.
Detailed Description
The overall framework of the U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves is shown in FIG. 1; FIG. 2 is a schematic diagram of the pixel-aware broadcast branch in the context-aware enhancement module of the invention; FIG. 3 is a schematic diagram of the token aggregation strategy of the invention.
In the training stage, the experiments for the invention and for the comparison methods are implemented in PyTorch with a semantic segmentation library. All models are trained on an NVIDIA Tesla V100 GPU. For fairness of comparison, the invention follows the same training strategy as previous work. Specifically, images are randomly cropped to 512×512, and AdamW with a weight decay of 0.01 is used to optimize the model. The learning rate follows the "poly" strategy
$$\mathrm{lr} = \mathrm{base\_lr} \times \left(1 - \frac{\mathrm{iter}}{\mathrm{max\_iter}}\right)^{\mathrm{power}}$$
where the power of the "poly" strategy is set to 1, the initial learning rate is $6 \times 10^{-6}$, and training runs for 160,000 iterations in total.
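The "poly" learning-rate strategy can be written as a short function. The sketch below assumes that the running index in the formula is the current training iteration; the function name `poly_lr` is illustrative, while the base learning rate, power, and iteration count are the values stated in the text.

```python
def poly_lr(base_lr, iteration, max_iter, power=1.0):
    """Polynomial ("poly") decay: base_lr * (1 - iteration / max_iter) ** power."""
    if not 0 <= iteration <= max_iter:
        raise ValueError("iteration must lie in [0, max_iter]")
    return base_lr * (1.0 - iteration / max_iter) ** power


BASE_LR = 6e-6       # initial learning rate stated in the text
MAX_ITER = 160_000   # total number of iterations stated in the text

for it in (0, 40_000, 80_000, 120_000, 160_000):
    print(it, poly_lr(BASE_LR, it, MAX_ITER, power=1.0))
```

With the power set to 1 the schedule reduces to a linear decay from the initial learning rate to zero over the whole run.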
The invention is evaluated on three datasets: the Field-PV dataset, the Plant Village dataset, and the Syn-PV dataset. The Field-PV dataset was collected with an OLYMPUS OM-D camera at the Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, China; in total, 400 original images of grape gray mold in natural scenes were taken. Plant Village is a publicly available dataset used specifically for identifying crop diseases and pests. It consists of 54303 high-resolution images covering 38 categories of diseased and healthy plant leaves, all obtained in a controlled laboratory. From it, 1383 grape black measles images and 1180 grape black rot images are used. Syn-PV consists of natural field images synthesized by background replacement from Plant Village segmentation images obtained in the controlled laboratory, which yields grape disease images with complex backgrounds. For all datasets, the disease areas and leaf areas of the collected images were annotated manually with the labelme tool. The annotations are saved in JSON (.json) format and then converted to the PASCAL VOC 2012 data format, which has semantic labels for foreground and background objects. The invention uses the Augmentor module to perform transformations such as random left/right flipping, random cropping, random sampling, and color and brightness enhancement or reduction. During training, the invention also applies the basic but effective data augmentation methods provided by the semantic segmentation library.
To evaluate the effectiveness of the U-shaped perception Transformer, the model is compared with other segmentation methods: three classical segmentation methods (DeepLabv3+, UNet, and PSPNet); four heavyweight Transformer-based segmentation methods (PVTv2, Dual-ViT, SegFormer, and SegNeXt); and seven lightweight Transformer-based segmentation methods (SeaFormer, AFFormer, PoolFormer, EfficientFormer, LVT, NextViT, and TopFormer).
Four evaluation indexes, namely accuracy, IoU, recall, and Dice, are used to measure the performance of the model. In addition, the number of parameters (Params), the giga floating-point operations (GFLOPs), the frames per second (FPS), and the memory usage of each model are analyzed.
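The text does not spell out the formulas of the four indexes. As a hedged illustration, the sketch below uses their standard definitions for a binary lesion mask, computed from true/false positives and negatives; the function names and the toy masks are hypothetical.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN for two equally long sequences of 0/1 pixel labels."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 0)
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else 0.0


def segmentation_metrics(pred, target):
    """Standard binary definitions (assumed; not given explicitly in the text)."""
    tp, fp, fn, tn = confusion_counts(pred, target)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "iou": _ratio(tp, tp + fp + fn),
        "recall": _ratio(tp, tp + fn),
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy flattened masks: 1 = lesion pixel, 0 = background pixel.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(pred, target)
print(metrics)
```

For small lesions, IoU and Dice are usually more informative than accuracy, because a model that predicts only background can still score a high accuracy.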
Table 1. Quantitative comparison of CNN-based and Transformer-based methods for grape mosaic virus on the Plant Village dataset
Table 2. Quantitative comparison of CNN-based and Transformer-based methods for grape leaf and background on the Plant Village dataset
Table 3. Quantitative comparison of CNN-based and Transformer-based methods for grape leaf disease and background on the Field-PV dataset
The experimental results show that the segmentation performance of the method is superior to that of current state-of-the-art Transformer-based methods and other deep-learning-based methods. Taking both segmentation performance and the cost of training and inference into account, the invention achieves the best performance on the complex task of segmenting small grape leaf lesions and balances segmentation performance and speed.
Claims (7)
1. A U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves, characterized in that the lightweight convolutional neural network MobileNetV2 is adopted and multi-scale feature information is extracted through the downsampling of a U-shaped pyramid; a context-aware enhancement module extracts low-frequency global features and high-frequency local feature maps; a token aggregation strategy is introduced to reduce the loss of detail information caused by directly aggregating low-frequency and high-frequency features; and the aggregated tokens are passed directly to a lightweight segmentation head to perform the segmentation task.
2. The method of claim 1, wherein the U-shaped pyramid uses MobileNetV2 to extract feature information, reduces the resolution of the original tokens with an average pooling operator, and concatenates tokens of different scales along the channel dimension to generate new tokens, which are used as the input of the context-aware enhancement module.
3. The method of claim 1, wherein the context-aware enhancement module comprises a prototype-aware branch and a pixel-aware broadcast branch.
4. The prototype-aware branch according to claim 3, characterized in that K and V are downsampled, convolution layers exchange information between the tokens along the spatial dimension, the nonlinear activation layer is replaced with ReLU6 and GELU, batch normalization is added to each convolution, and fine-grained semantic information in the tokens is retained through the residual mapping of the Transformer.
5. The pixel-aware broadcast branch according to claim 3, wherein a convolution-style attention mechanism is adopted to mine context weights effectively through shared weights and pixel-aware weights; a linear layer is used to generate the key K, the query Q, and the value V, as follows:
$$Q, K, V = \mathrm{Linear}(X_{in})$$
where $X_{in}$ denotes the features input from the U-shaped pyramid.
6. The pixel-aware broadcast branch according to claim 3, characterized by the following steps:
Step 6.1: local features are extracted with a depthwise convolution (DWconv) operator, which applies shared weights to V, as follows:
$$V = \mathrm{DWconv}(V)$$
Step 6.2: local enhancement with pixel-aware weights is applied to Q and K; two depthwise convolutions with translation invariance obtain the local information of Q and K respectively; the Hadamard product of the two results is computed as the output of this step; the Softmax of conventional attention is replaced by Tanh and Swish; and a gating mechanism is adopted to obtain the pixel-aware weights.
7. The method of claim 1, wherein the token aggregation strategy effectively aggregates low-frequency global information and high-frequency local information while reducing the number of channels at each scale; average pooling with different kernel sizes and strides is used to obtain feature maps of different resolutions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310789644.2A CN116824144A (en) | 2023-06-30 | 2023-06-30 | U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310789644.2A CN116824144A (en) | 2023-06-30 | 2023-06-30 | U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116824144A (en) | 2023-09-29 |
Family
ID=88142562
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310789644.2A (Pending) CN116824144A (en) | U-shaped perception lightweight Transformer method for segmenting small lesions of grape leaves | 2023-06-30 | 2023-06-30 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116824144A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117274607A (en) * | 2023-11-23 | 2023-12-22 | Jilin University | Multi-path pyramid-based lightweight medical image segmentation network, method and equipment |
CN117274607B (en) * | 2023-11-23 | 2024-02-02 | Jilin University | Multi-path pyramid-based lightweight medical image segmentation network, method and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||