CN116740121A - Straw image segmentation method based on special neural network and image preprocessing - Google Patents

Straw image segmentation method based on special neural network and image preprocessing Download PDF

Info

Publication number
CN116740121A
Authority
CN
China
Prior art keywords
image
straw
token
network
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310707836.4A
Other languages
Chinese (zh)
Inventor
刘振泽
胡闻捷
臧一凡
陈金炎
董迪锴
王成喜
孙吉
胡海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN202310707836.4A priority Critical patent/CN116740121A/en
Publication of CN116740121A publication Critical patent/CN116740121A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30181Earth observation
    • G06T2207/30188Vegetation; Agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a straw image segmentation method based on a special neural network and image preprocessing, which improves the accuracy and efficiency of straw image segmentation and balances complexity against accuracy, aiming to solve the problem of straw agronomic image segmentation. The algorithm first collects color RGB images with an unmanned aerial vehicle, limits the image size to 512×384 by random cropping, and builds a Straw320 data set for training and testing the subsequent network. Next, a straw image graying method is proposed, which preserves the discrimination between straw and background while reducing the complexity of the subsequent model. Finally, a Straw Mixing Network (SMN) is introduced to process the preprocessed images. The network comprises a position coding module that enhances the relative position information of the straw image and a mixed feature extraction module that balances complexity and accuracy.

Description

Straw image segmentation method based on special neural network and image preprocessing
Technical Field
The invention relates to the field of image processing, in particular to a straw image segmentation method based on a special neural network and image preprocessing. The method can be used for straw processing and utilization in the agricultural field, and provides accurate straw image segmentation results.
Background
The straw has important significance in agriculture. First, the protective work of the straw is critical to soil protection. The straw coverage can reduce the evaporation of water, reduce soil erosion and prevent the degradation of soil quality. Secondly, the straw is used as one of organic matters, and can be returned to the field, so that the organic matter content of the soil is effectively increased, and the soil structure and fertility are improved. Therefore, the treatment and the utilization of the straw are of great significance to agricultural production and environmental protection.
The segmentation of straw images plays a fundamental role in the subsequent processing and utilization of straw. By segmenting the straw image, the region and shape information of the straw can be accurately extracted, providing a reference for subsequent processing and utilization. For example, straw image segmentation can help determine how to collect, compact, and stack the straw, as well as evaluate straw quality and support agronomic research.
However, the segmentation task of straw images faces some challenges. First, straw images often have complex texture, color, and shape variations, making conventional image segmentation algorithms less accurate in segmenting straw images. Secondly, the existing segmentation neural network is not specially designed for straw images, and the effective learning and expression capability of straw characteristics is lacking, so that the segmentation accuracy is further reduced. In addition, the balance between complexity and accuracy of the straw image segmentation algorithm is also a challenging problem.
Therefore, it is necessary to provide a straw image segmentation method based on a special neural network and image preprocessing, so as to improve the accuracy and efficiency of straw image segmentation and realize the balance of complexity and accuracy. By introducing a feature learning and expression mechanism for the straw image, the algorithm can better process the complexity of the straw image and provide accurate reference results for subsequent straw processing and utilization work.
Disclosure of Invention
The invention provides a straw image segmentation method based on a special neural network and image preprocessing, which mainly comprises the following steps:
step 1, color RGB images are collected through an unmanned aerial vehicle, a Straw320 data set is manufactured, and normalization processing is conducted on the data set.
Step 2, designing a straw image-based graying method, aiming at keeping the distinguishing degree of images and reducing the complexity of a subsequent model.
Straw is typically yellow and the background is typically black. To improve the distinction between straw and background and reduce the complexity of the network, the following graying coefficients can be derived from the value (255, 255, 0) of yellow in RGB space: 0.5, 0.5, 0. This means that the pixel values of the red channel and the green channel are averaged with equal weights, while the pixel values of the blue channel are ignored. Thus, the following graying formula can be used:
P=0.5×R+0.5×G+0×B
where P represents the pixel values after graying, R, G and B represent the pixel values of the red, green and blue channels, respectively, in the original color image. In this way, the invention enhances the information of the yellow channel, and reduces the complexity of network processing, so that the subsequent image segmentation task is more efficient and accurate.
Step 3, a Straw Mixing Network (SMN) is proposed to process the preprocessed image. The network comprises a position coding module that enhances the relative position information of the straw image and a mixed feature extraction module that balances complexity and accuracy.
The SMN network takes the classical U-Net segmentation network structure as a reference and makes several key improvements to better fit the characteristics of straw images. The overall structure is shown in fig. 2. First, the invention introduces a position coding module to provide the position information of the straw and the background in the straw image; by introducing this position information, the network can better understand the spatial distribution of the straw and improve segmentation accuracy. Second, the invention designs a mixed feature extraction module that balances complexity and accuracy. The module combines the skip-connection mechanism of U-Net and extracts multi-scale feature information while keeping the network complexity controllable. By integrating features of different scales and performing feature fusion and selection, the network can better distinguish straw from background and improve the accuracy of the segmentation result. Finally, the SMN network uses downsampling and upsampling operations to transfer information and recover features between the encoder and decoder. The downsampling operation reduces the feature size and the number of parameters through a max pooling layer, accelerating training. The upsampling operation restores the spatial resolution of the features using bilinear interpolation, preserving more detail.
Step 3-1, a relative position coding module. In the field of image processing, rich interaction information exists among pixels, and effectively extracting the information between points can improve the accuracy of the overall task.
The invention proposes a relative position coding module for coupling relative position coordinates with an original image in a simple way. To reduce the complexity of the algorithm, the invention divides the original image into small blocks (i.e., patches) of the same size, and then fuses the relative position information with the pixel values of each small block. Assume that the input image has a width W and a height H, and is divided into small blocks of the size patch_size×patch_size. The abscissa range of each tile can be expressed as:
col_indices = [0, 1, ..., W/patch_size − 1]
row_indices = [0, 1, ..., H/patch_size − 1]
for relative position coding, the present invention defines two coded values col_co and row_co, which represent the relative positions of columns and rows, respectively. These encoded values are obtained by dividing col_indices and row_indices by (W/patch_size) and (H/patch_size), as follows:
col_co=col_indices/(W/patch_size)
row_co=row_indices/(H/patch_size)
the present invention then adds these encoded values to the original data to fuse the relative position information. Because the original data is normalized, the value range is between (0, 1), and the direct addition of col_co and row_co codes can mask the original image data information, so that the data is unbalanced, and the convergence of the network is further affected. Therefore, the invention introduces a flexible parameter β to limit the influence of the relative position information, the formula is as follows:
P′=P+β(col_co+row_co)
where P ∈ R^(W×H×1) and P′ ∈ R^(W×H×1).
Step 3-2, the mixed feature extraction module (Mixed-extract feature module) plays a key role in the straw segmentation task, aiming at achieving a balance between accuracy and complexity. To address this challenge, the present invention proposes a hybrid feature extraction module that combines convolution operations and adaptive attention mechanisms, based on the Encoder-Decoder design principle, to achieve more efficient image feature extraction.
The module first performs local feature extraction on the image using convolution operations. The convolution operation can capture the spatial relationship between adjacent features in the image and keep the feature dimensions unchanged. Thus, the invention can fully utilize the local information of the image and extract rich local features.
Second, the hybrid feature extraction module introduces an adaptive attention mechanism for extracting correlations between global features of the image. The adaptive attention mechanism can automatically adjust the importance of different features by learning weights between the features, thereby capturing global context information of the image more accurately. The global feature extraction capability enables the module to better understand the relevance between different areas in the image, so that the performance of an image processing task is improved. In order to reduce the complexity of the attention computation, the present invention proposes a method to add a hidden layer to achieve a balance between accuracy and complexity.
In the design of the module, the invention writes the Encoder function as Z = Encoder(Input) and the Decoder function as Predict = Decoder(Z); the Decoder can also be regarded as an Encoder acting on Predict, i.e., Z = Encoder(Predict). Due to the similarity of the Encoder and Decoder networks, the present invention maintains a consistent design principle in the hybrid feature extraction module. Specifically, when processing features, the Encoder part performs the convolution operation first and then applies the adaptive attention mechanism, while the Decoder part applies the adaptive attention mechanism before performing the convolution operation. The calculation process of the adaptive attention mechanism is as follows:
First, the input features are represented as a tensor of shape (B, H_c, W_c), where B is the batch size and H_c and W_c are the height and width of the feature, respectively.
In order to reduce the memory consumption of the attention calculation, the invention divides the input features into token×token patches: H_c and W_c are each divided by token, giving a feature representation of shape (B, (H_c/token)×(W_c/token), token×token). In this way, the features are partitioned into smaller blocks for subsequent computation.
However, performing the attention calculation directly on the token×token dimension may cause memory overflow. To solve this problem, the invention introduces a hidden layer of size hidden_size, which reduces the feature dimension: the token×token dimension is reduced to hidden_size, giving a shape of (B, (H_c/token)×(W_c/token), hidden_size) and lowering the computation and memory consumption. Next, the invention performs the attention calculation on the reduced features. First, the features are linearly mapped to the Q, K, and V tensors, representing the query, key, and value, respectively. Then the attention layer output Y = softmax(Q, K)V is computed with a softmax function, giving an output of shape (B, (H_c/token)×(W_c/token), hidden_size).
Finally, the invention restores the attention layer output Y to the original dimensions, converting the tensor of shape (B, (H_c/token)×(W_c/token), hidden_size) back into a feature of shape (B, H_c, W_c). The invention thus obtains a feature representation processed by the adaptive attention mechanism, which reduces the computational load of the network by adding a hidden layer.
Through the calculation steps, the self-adaptive attention mechanism can reduce the memory consumption of attention calculation while using a smaller token. The mechanism has important application value in processing image tasks, can reduce the computational complexity while maintaining the accuracy, and brings convenience and efficiency to model training and inference.
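As an illustration of this encoder/decoder ordering, the following PyTorch-style sketch pairs a dilated convolution with an injected attention step and swaps their order between the encoder and decoder paths. The class and argument names, channel handling, and the BatchNorm/ReLU layers are illustrative assumptions and not the patented implementation.

```python
import torch.nn as nn

class MixedFeatureBlock(nn.Module):
    """Sketch of the mixed feature extraction ordering (assumed structure):
    encoder blocks run convolution -> attention, decoder blocks run
    attention -> convolution. The attention module is injected so any
    implementation can be plugged in; nn.Identity() is only a stand-in."""

    def __init__(self, in_ch, out_ch, attention=None, is_encoder=True, dilation=2):
        super().__init__()
        self.is_encoder = is_encoder
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.attn = attention if attention is not None else nn.Identity()

    def forward(self, x):
        if self.is_encoder:                # Encoder: convolution first, then attention
            return self.attn(self.conv(x))
        return self.conv(self.attn(x))     # Decoder: attention first, then convolution
```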
Step 3-3, a downsampling module, in which 2×2 max pooling is used.
Step 3-4, an upsampling module, in which bilinear interpolation is used.
Step 3-5, an SMN neural network is constructed from the relative position coding module, the mixed feature extraction module, the downsampling module, and the upsampling module designed in steps 3-1 to 3-4.
Step 3-6, the SMN network constructed in step 3-5 is trained and tested on the preset Straw320 data set, using cross entropy as the loss function, the Adam optimizer, and a cosine learning-rate scheduler.
The straw image segmentation method based on the special neural network and the image preprocessing has the following technical effects:
1. the image segmentation accuracy is improved: through the graying method and the position coding module, the invention enhances the yellow channel information in the straw image, so that the straw and the background are more clearly distinguished. The position coding module provides relative position information of the straw and the background in the straw image, and is favorable for the network to better understand the spatial distribution of the straw, so that the accuracy of image segmentation is improved.
2. Balance complexity and accuracy: the mixed feature extraction module designed by the invention combines convolution operation and a self-adaptive attention mechanism, and realizes effective extraction of local features and global features of straw images. By adding the hidden layer and reducing the feature dimension, the attention mechanism improves the performance of the image processing task while keeping smaller calculation amount and memory consumption. The design of the balance complexity and accuracy enables the network to obtain accurate segmentation results while keeping high efficiency.
3. Improved U-Net structure: the invention improves the structure based on the U-Net segmentation network, adds a position coding module and a mixed characteristic extraction module, and adapts to the characteristics of straw images. Through fusion and selection of a jump connection mechanism and multi-scale features, the network can better distinguish straw from background, and accuracy of a segmentation result is improved.
4. Efficient image processing: the downsampling and upsampling operations perform information transfer and feature recovery between the encoder and decoder, speeding up training and preserving more detailed information. The upsampling method of bilinear interpolation can restore the spatial resolution of the features and maintain the integrity of the image details.
5. Comprehensive experimental verification: the invention trains and tests on a preset Straw320 data set, adopts cross entropy as the loss function, and uses an Adam optimizer and a cosine learning-rate scheduler. Experiments show that the proposed method performs well on the straw image segmentation task, verifying its accuracy and feasibility.
In summary, the straw image segmentation method provided by the invention can effectively realize accurate segmentation of straw and background through image preprocessing and the design of the special neural network, and improves the efficiency and accuracy of image processing.
Drawings
Fig. 1 is a flowchart of the straw image segmentation method based on a special neural network and image preprocessing.
Fig. 2 is a flow chart of the SMN structure.
Detailed Description
The following detailed description of embodiments of the invention, taken in conjunction with the accompanying drawings, is provided so that the objects, technical solutions, and features of the invention can be more readily understood; it is evident that the described examples are only some of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention. The examples are intended only to illustrate the invention, not to limit it.
The invention relates to the field of image processing, in particular to a straw image segmentation method based on a special neural network and image preprocessing. The method can be used for straw processing and utilization in the agricultural field, and provides accurate straw image segmentation results. The overall flow chart is shown in fig. 1. The specific embodiments can be illustrated by the following steps:
Step 1, color RGB images are collected by an unmanned aerial vehicle, the image size is limited to 512×384 by random cropping, and a Straw320 data set is produced for subsequent network training and testing.
The images were acquired at a fixed height over a specific site using a DJI MINI2 unmanned aerial vehicle. A total of 320 images were acquired at 4K resolution. Because of hardware limitations, the complete 4K frames cannot be fed to the graphics card for training at once. Therefore, for each image, the invention randomly crops a 384×512-pixel region and creates a corresponding mask label for that region, in which "1" denotes the straw portion and "0" denotes the background (land) portion.
For further experiments and evaluations, the present invention randomly divides the data set into training, validation and test sets at a ratio of 6:2:2. Such partitioning helps to monitor the performance of the model during training and evaluate its generalization ability on new data.
In order to improve the model training effect and the convergence rate, the invention normalizes the original RGB data and limits the pixel value to the range of (0, 1). Such a preprocessing step helps to eliminate scale differences in the data, making the model easier to learn meaningful features.
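A minimal NumPy sketch of this preparation step is given below; the function names, the crop handling, and the fixed random seed are illustrative assumptions rather than part of the original description.

```python
import random
import numpy as np

def random_crop(image, mask, crop_h=384, crop_w=512):
    """Randomly crop a 384x512 region from a larger image and its mask (sketch)."""
    h, w = image.shape[:2]
    top = random.randint(0, h - crop_h)
    left = random.randint(0, w - crop_w)
    img_crop = image[top:top + crop_h, left:left + crop_w]
    mask_crop = mask[top:top + crop_h, left:left + crop_w]   # 1 = straw, 0 = background
    return img_crop, mask_crop

def normalize_rgb(image):
    """Scale 8-bit RGB pixel values into the (0, 1) range."""
    return image.astype(np.float32) / 255.0

def split_indices(n_images=320, seed=0):
    """Random 6:2:2 split into train / validation / test indices."""
    idx = list(range(n_images))
    random.Random(seed).shuffle(idx)
    n_train, n_val = int(0.6 * n_images), int(0.2 * n_images)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```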
Step 2, designing a straw image-based graying method, aiming at keeping the distinguishing degree of images and reducing the complexity of a subsequent model.
In the straw image segmentation task, color images acquired by unmanned aerial vehicle cameras and robot cameras are generally used as input. A color image consists of three channels, red (R), green (G) and blue (B), with the pixel value of each channel representing the intensity of that color in the image. However, for the straw image segmentation task, the invention mainly focuses on the degree of distinction between straw and background, rather than the detailed information of color. Thus, converting a color image to a grayscale image may reduce the complexity of the subsequent network while preserving sufficient discrimination.
Straw is typically yellow and the background is typically black. To improve the distinction between straw and background and reduce the complexity of the network, the following graying coefficients can be derived from the value (255, 255, 0) of yellow in RGB space: 0.5, 0.5, 0. This means that the pixel values of the red channel and the green channel are averaged with equal weights, while the pixel values of the blue channel are ignored. Thus, the present invention can use the following graying formula:
P=0.5×R+0.5×G+0×B
in the formula, P represents the pixel value after graying, and R, G and B represent the pixel values of red, green, and blue channels, respectively, in the original color image. In this way, the invention enhances the information of the yellow channel, and reduces the complexity of network processing, so that the subsequent image segmentation task is more efficient and accurate.
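For illustration, a minimal NumPy sketch of this graying step might look as follows; the function name and the assumption that the RGB array is already normalized are hypothetical.

```python
import numpy as np

def gray_straw(rgb):
    """Graying for straw images: P = 0.5*R + 0.5*G + 0*B.

    `rgb` is an (H, W, 3) array already scaled to (0, 1); the blue channel
    is dropped because yellow straw carries its information in the red and
    green channels."""
    r, g = rgb[..., 0], rgb[..., 1]
    return 0.5 * r + 0.5 * g        # shape (H, W); blue channel ignored
```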
Step 3, a Straw Mixing Network (SMN) is proposed to process the preprocessed image. The network comprises a position coding module that enhances the relative position information of the straw image and a mixed feature extraction module that balances complexity and accuracy.
The SMN network takes the classical U-Net segmentation network structure as a reference and makes several key improvements to better fit the characteristics of straw images. The overall structure is shown in fig. 2, where the digits denote the feature dimensions fed into each module. First, the invention introduces a position coding module to provide the position information of the straw and the background in the straw image; by introducing this position information, the network can better understand the spatial distribution of the straw and improve segmentation accuracy. Second, the invention designs a mixed feature extraction module that balances complexity and accuracy. The module combines the skip-connection mechanism of U-Net and extracts multi-scale feature information while keeping the network complexity controllable. By integrating features of different scales and performing feature fusion and selection, the network can better distinguish straw from background and improve the accuracy of the segmentation result. Finally, the SMN network uses downsampling and upsampling operations to transfer information and recover features between the encoder and decoder. The downsampling operation reduces the feature size and the number of parameters through a max pooling layer, accelerating training. The upsampling operation restores the spatial resolution of the features using bilinear interpolation, preserving more detail.
Step 3-1, a relative position coding module. In the field of image processing, rich interaction information exists among pixels, and effectively extracting the information between points can improve the accuracy of the overall task. However, the conventional convolution operation is limited by the size of the convolution kernel and cannot sufficiently extract the correlation features between pixels that are far apart, which is one of the reasons for the currently poor segmentation performance.
The invention proposes a relative position coding module for coupling relative position coordinates with an original image in a simple way. To reduce the complexity of the algorithm, the invention divides the original image into small blocks (i.e., patches) of the same size, and then fuses the relative position information with the pixel values of each small block. Assume that the input image has a width W and a height H, and is divided into small blocks of the size patch_size×patch_size. The abscissa range of each tile can be expressed as:
col_indices = [0, 1, ..., W/patch_size − 1]
row_indices = [0, 1, ..., H/patch_size − 1]
for relative position coding, the present invention defines two coded values col_co and row_co, which represent the relative positions of columns and rows, respectively. These encoded values are obtained by dividing col_indices and row_indices by (W/patch_size) and (H/patch_size), as follows:
col_co=col_indices/(W/patch_size)
row_co=row_indices/(H/patch_size)
the present invention then adds these encoded values to the original data to fuse the relative position information. Because the original data is normalized, the value range is between (0, 1), and the direct addition of col_co and row_co codes can mask the original image data information, so that the data is unbalanced, and the convergence of the network is further affected. Therefore, the invention introduces a flexible parameter β to limit the influence of the relative position information, the formula is as follows:
P′=P+β(col_co+row_co)
where P ∈ R^(W×H×1) and P′ ∈ R^(W×H×1).
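A small NumPy sketch of this coupling, under the assumption that W and H are divisible by patch_size and with illustrative values for patch_size and β, might look as follows.

```python
import numpy as np

def add_relative_position(p, patch_size=32, beta=0.1):
    """Relative position coding sketch: P' = P + beta * (col_co + row_co).

    `p` is an (H, W) grayscale image normalized to (0, 1); patch_size and
    beta are illustrative values, not fixed by the description."""
    h, w = p.shape
    n_cols, n_rows = w // patch_size, h // patch_size
    col_co = np.arange(n_cols) / n_cols        # relative column position of each patch
    row_co = np.arange(n_rows) / n_rows        # relative row position of each patch
    # Expand the per-patch codes to pixel resolution so they can be added to P.
    col_map = np.repeat(col_co, patch_size)[None, :w]
    row_map = np.repeat(row_co, patch_size)[:h, None]
    return p + beta * (col_map + row_map)
```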
Step 3-2, the mixed feature extraction module (Mixed-extract feature module) plays a key role in the straw segmentation task, aiming at achieving a balance between accuracy and complexity. To address this challenge, the present invention proposes a hybrid feature extraction module that combines convolution operations with adaptive attention mechanisms, based on the Encoder-Decoder design principle, to achieve more efficient image feature extraction.
The module first performs local feature extraction on the image using convolution operations. The convolution operation can capture the spatial relationship between adjacent features in the image and keep the feature dimensions unchanged. Thus, the invention can fully utilize the local information of the image and extract rich local features.
Second, the hybrid feature extraction module introduces an adaptive attention mechanism for extracting correlations between global features of the image. The adaptive attention mechanism can automatically adjust the importance of different features by learning weights between the features, thereby capturing global context information of the image more accurately. The global feature extraction capability enables the module to better understand the relevance between different areas in the image, so that the performance of an image processing task is improved. In order to reduce the complexity of the attention computation, the present invention proposes a method to add a hidden layer to achieve a balance between accuracy and complexity.
In the design of the module, the invention writes the Encoder function as Z = Encoder(Input) and the Decoder function as Predict = Decoder(Z); the Decoder can also be regarded as an Encoder acting on Predict, i.e., Z = Encoder(Predict). Due to the similarity of the Encoder and Decoder networks, the present invention maintains a consistent design principle in the hybrid feature extraction module. Specifically, when processing features, the Encoder part performs the convolution operation first and then applies the adaptive attention mechanism, while the Decoder part applies the adaptive attention mechanism before performing the convolution operation.
1. Convolution layer: the convolution layer uses dilated ("hole") convolution (Dilated Convolution), which introduces a dilation factor into the traditional convolution to enlarge the receptive field. Dilated convolution helps the network enlarge its perception area while maintaining computational efficiency, and is well suited to images or features with large-scale spatial structure.
The calculation formula of the dilated (hole) convolution is:
y[i, j] = Σ_k Σ_l Σ_m x[i + r·k, j + r·l, m] · w[k, l, m]
where:
y[i, j] is the value at position (i, j) of the output feature map;
x[i + r·k, j + r·l, m] is the value at position (i + r·k, j + r·l) of channel m of the input feature map, and r is the dilation (hole) factor;
w[k, l, m] is the (k, l, m)-th weight of the convolution kernel, whose size is K × L × M.
The main characteristic of the cavity convolution is that the scope of the receptive field is enlarged by introducing a cavity factor r. Specifically, when r=1, the hole convolution degenerates into a conventional convolution operation; when r >1, there will be a space between the sampling points on the input signature, thereby expanding the receptive field. Therefore, by adjusting the size of the void factor, the computational efficiency can be maintained while increasing the perception range.
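For reference, the following PyTorch snippet contrasts an ordinary 3×3 convolution (r = 1) with a dilated one (r = 2); the tensor sizes and channel counts are illustrative only.

```python
import torch
import torch.nn as nn

# With r = 1 the layer is a plain 3x3 convolution; with r = 2 the sampling
# points are spread out, enlarging the effective receptive field to 5x5
# without adding parameters.
x = torch.randn(1, 64, 96, 128)                       # (B, C, H, W) feature map

plain = nn.Conv2d(64, 64, kernel_size=3, padding=1, dilation=1)
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

print(plain(x).shape)    # torch.Size([1, 64, 96, 128]) - spatial size preserved
print(dilated(x).shape)  # torch.Size([1, 64, 96, 128]) - same size, larger receptive field
```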
2. Adaptive attention mechanism layer (adaptive attention layer): ViT typically segments the input image into multiple sets of tokens and calculates correlations between the tokens with an attention mechanism to extract effective information from the input features. However, if the token size is set too small, the attention computation may become too large, causing problems such as memory overflow. To solve this problem, the invention proposes an adaptive attention mechanism that reduces the intermediate dimension by adding hidden layers, thereby directly reducing the complexity of the attention computation while keeping a small token size.
The calculation process of the adaptive attention mechanism is as follows:
First, the input features are represented as a tensor of shape (B, H_c, W_c), where B is the batch size and H_c and W_c are the height and width of the feature, respectively.
In order to reduce the memory consumption of the attention calculation, the invention divides the input features into token×token patches: H_c and W_c are each divided by token, giving a feature representation of shape (B, (H_c/token)×(W_c/token), token×token). In this way, the features are partitioned into smaller blocks for subsequent computation.
However, performing the attention calculation directly on the token×token dimension may cause memory overflow. To solve this problem, the invention introduces a hidden layer of size hidden_size, which reduces the feature dimension: the token×token dimension is reduced to hidden_size, giving a shape of (B, (H_c/token)×(W_c/token), hidden_size) and lowering the computation and memory consumption.
Next, the invention performs the attention calculation on the reduced features. First, the features are linearly mapped to the Q, K, and V tensors, representing the query, key, and value, respectively. Then the attention layer output Y = softmax(Q, K)V is computed with a softmax function, giving an output of shape (B, (H_c/token)×(W_c/token), hidden_size).
Finally, the invention restores the attention layer output Y to the original dimensions, converting the tensor of shape (B, (H_c/token)×(W_c/token), hidden_size) back into a feature of shape (B, H_c, W_c). The invention thus obtains a feature representation processed by the adaptive attention mechanism, which reduces the computational load of the network by adding a hidden layer.
Through the calculation steps, the self-adaptive attention mechanism can reduce the memory consumption of attention calculation while using a smaller token. The mechanism has important application value in processing image tasks, can reduce the computational complexity while maintaining the accuracy, and brings convenience and efficiency to model training and inference.
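The following PyTorch-style sketch follows the calculation steps above for a single-channel (B, H_c, W_c) feature map (a multi-channel map can be handled by folding the channels into the batch dimension). The linear projection back from hidden_size to token×token, the scaling inside the softmax, and the default token/hidden_size values are assumptions added only to make the sketch runnable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAttention(nn.Module):
    """Sketch of the adaptive attention step: the (B, Hc, Wc) feature map is
    split into token x token patches, each patch is projected down to
    hidden_size before self-attention, and projected back afterwards."""

    def __init__(self, token=16, hidden_size=64):
        super().__init__()
        self.token = token
        self.reduce = nn.Linear(token * token, hidden_size)   # hidden layer that shrinks each patch
        self.q = nn.Linear(hidden_size, hidden_size)
        self.k = nn.Linear(hidden_size, hidden_size)
        self.v = nn.Linear(hidden_size, hidden_size)
        self.restore = nn.Linear(hidden_size, token * token)  # assumed projection back to patch size

    def forward(self, x):                      # x: (B, Hc, Wc), Hc and Wc divisible by token
        b, hc, wc = x.shape
        t = self.token
        # (B, Hc, Wc) -> (B, (Hc/t)*(Wc/t), t*t): one row per patch
        patches = (x.reshape(b, hc // t, t, wc // t, t)
                    .permute(0, 1, 3, 2, 4)
                    .reshape(b, (hc // t) * (wc // t), t * t))
        z = self.reduce(patches)               # (B, N, hidden_size)
        q, k, v = self.q(z), self.k(z), self.v(z)
        attn = F.softmax(q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5), dim=-1)
        y = attn @ v                           # (B, N, hidden_size)
        out = self.restore(y)                  # (B, N, t*t)
        # Reverse the patch split back to (B, Hc, Wc)
        return (out.reshape(b, hc // t, wc // t, t, t)
                   .permute(0, 1, 3, 2, 4)
                   .reshape(b, hc, wc))
```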
Step 3-3, a downsampling module, which uses 2×2 max pooling. The mathematical formulation is as follows:
Denote the input feature map as X_d, of dimension H_d × W_d × C, and the output feature map as Y_d.
The max pooling operation is:
Y_d[i, j, c] = max( X_d[2i, 2j, c], X_d[2i, 2j+1, c], X_d[2i+1, 2j, c], X_d[2i+1, 2j+1, c] )
where i and j are the spatial indices of the output feature map Y_d and c is the channel index.
This formula describes a 2×2 max pooling operation: for each output position (i, j), the maximum of the corresponding four input positions is selected as the output value. In this way the spatial dimensions of the output feature map are halved (both height and width divided by 2) while the number of channels remains unchanged.
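In PyTorch this downsampling module corresponds to a standard 2×2 max pooling layer, as in the short sketch below (tensor sizes are illustrative).

```python
import torch
import torch.nn as nn

# 2x2 max pooling with stride 2: each output position takes the maximum of
# the corresponding 2x2 input window, halving height and width while keeping
# the channel count.
pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 96, 128)     # (B, C, H_d, W_d), sizes illustrative
y = pool(x)
print(y.shape)                      # torch.Size([1, 64, 48, 64])
```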
Step 3-4, an upsampling module using bilinear interpolation (Bilinear Interpolation). The mathematical formulation is as follows:
Input feature map: X_u, of dimension H_u × W_u × C, where H_u is the height, W_u the width, and C the number of channels.
Output feature map of the upsampling operation: Y_u, of dimension 2H_u × 2W_u × C.
For each position (i, j) in the output feature map, the output is computed as
Y_u[i, j, c] = Σ_m Σ_n f(i, j, m, n) · X_u[m, n, c]
where c denotes the channel index and f(i, j, m, n) is the bilinear interpolation weight.
The bilinear weight f(i, j, m, n) is non-zero only for the input positions (m, n) adjacent to the mapped point (i/2, j/2) and can be written as
f(i, j, m, n) = max(0, 1 − |i/2 − m|) · max(0, 1 − |j/2 − n|)
This describes the bilinear interpolation operation: for each output position (i, j), the output value is a weighted average of the neighbouring input values, with weights determined by the relative distances, which gives a smoother upsampling result.
Step 3-5, an SMN neural network is constructed from the relative position coding module, the mixed feature extraction module, the downsampling module, and the upsampling module designed in steps 3-1 to 3-4. The SMN network is configured as follows:
1. The relative position coding module of step 3-1 converts the input P ∈ R^(W×H×1) into P′ ∈ R^(W×H×1).
2. A convolution module raises the dimension of P′ to P^(2) ∈ R^(W×H×64).
3. The mixed feature extraction module of step 3-2 extracts features from P^(2) to obtain P^(3) ∈ R^(W×H×64).
4. The downsampling module of step 3-3 halves the spatial resolution of P^(3) to obtain P^(4).
5. The mixed feature extraction module of step 3-2 extracts features from P^(4) to obtain P^(5).
6. The downsampling module of step 3-3 halves the spatial resolution of P^(5) to obtain P^(6).
7. The mixed feature extraction module of step 3-2 extracts features from P^(6) to obtain P^(7).
8. The downsampling module of step 3-3 halves the spatial resolution of P^(7) to obtain P^(8).
9. The mixed feature extraction module of step 3-2 extracts features from P^(8) to obtain P^(9).
10. The downsampling module of step 3-3 halves the spatial resolution of P^(9) to obtain P^(10).
11. The mixed feature extraction module of step 3-2 extracts features from P^(10) to obtain P^(11).
12. The upsampling module of step 3-4 doubles the spatial resolution of P^(11) to obtain P^(12), which is skip-connected with P^(9).
13. The mixed feature extraction module of step 3-2 extracts features from P^(12) to obtain P^(13).
14. The upsampling module of step 3-4 doubles the spatial resolution of P^(13) to obtain P^(14), which is skip-connected with P^(7).
15. The mixed feature extraction module of step 3-2 extracts features from P^(14) to obtain P^(15).
16. The upsampling module of step 3-4 doubles the spatial resolution of P^(15) to obtain P^(16), which is skip-connected with P^(5).
17. The mixed feature extraction module of step 3-2 extracts features from P^(16) to obtain P^(17).
18. The upsampling module of step 3-4 raises P^(17) back to full resolution, giving P^(18) ∈ R^(W×H×64), which is skip-connected with P^(3).
19. The mixed feature extraction module of step 3-2 extracts features from P^(18) to obtain P^(19) ∈ R^(W×H×64).
20. A convolution module maps P^(19) to the output P^(20) ∈ R^(W×H×2).
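The 20-step configuration above can be summarised by the following shape-level PyTorch skeleton. It is only a structural sketch: the relative position coding of step 1 is assumed to have been applied to the input already, a plain double convolution stands in for the mixed feature extraction module sketched earlier, and the intermediate channel widths (64-128-256-512-1024) follow the usual U-Net doubling convention and are an assumption; the text only fixes 1 input channel, 64 channels at the outermost level, and 2 output classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def double_conv(in_ch, out_ch):
    """Stand-in for the mixed feature extraction module (plain double conv here)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class SMNSkeleton(nn.Module):
    """Shape-level skeleton of steps 1-20: four 2x2 max-pool downsamplings,
    four bilinear upsamplings with skip connections, and a 2-class output."""

    def __init__(self):
        super().__init__()
        chs = [64, 128, 256, 512, 1024]                                                        # assumed widths
        self.inc = double_conv(1, chs[0])                                                      # steps 2-3
        self.down = nn.ModuleList(double_conv(chs[i], chs[i + 1]) for i in range(4))           # steps 4-11
        self.up = nn.ModuleList(double_conv(chs[i + 1] + chs[i], chs[i])
                                for i in reversed(range(4)))                                   # steps 12-19
        self.outc = nn.Conv2d(chs[0], 2, kernel_size=1)                                        # step 20
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):                                    # x: (B, 1, H, W), position-encoded input
        feats = [self.inc(x)]                                # P(3)
        for down in self.down:                               # encoder path: pool, then mixed block
            feats.append(down(self.pool(feats[-1])))
        y = feats[-1]                                        # P(11), the bottleneck feature
        for up, skip in zip(self.up, reversed(feats[:-1])):  # decoder path with skip connections
            y = F.interpolate(y, scale_factor=2, mode="bilinear", align_corners=False)
            y = up(torch.cat([y, skip], dim=1))              # jump connection, then mixed block
        return self.outc(y)                                  # (B, 2, H, W) straw / background logits
```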
Step 3-6, the SMN network constructed in step 3-5 is trained and tested on the preset Straw320 data set, using cross entropy as the loss function, the Adam optimizer, and a cosine learning-rate scheduler.
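A minimal training-loop sketch matching this setup (cross entropy, Adam, cosine learning-rate schedule) is shown below; the stand-in model, dummy tensors, learning rate, and epoch count are illustrative assumptions in place of the Straw320 data set and the full SMN.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Conv2d(1, 2, kernel_size=1)           # stand-in for the SMN network
criterion = nn.CrossEntropyLoss()                # cross entropy as the loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)

# Dummy tensors standing in for grayscale, position-encoded 384x512 crops
# of the Straw320 data set (smaller sizes used here to keep the sketch light).
images = torch.rand(8, 1, 96, 128)
masks = torch.randint(0, 2, (8, 96, 128))        # 1 = straw, 0 = background
loader = DataLoader(TensorDataset(images, masks), batch_size=2, shuffle=True)

for epoch in range(20):
    for x, y in loader:
        loss = criterion(model(x), y)            # per-pixel 2-class cross entropy
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                             # cosine learning-rate schedule
```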

Claims (6)

1. The invention provides a straw image segmentation method based on a special neural network and image preprocessing, which mainly comprises the following steps:
step 1, collecting color RGB images through an unmanned plane, manufacturing a Straw320 data set, and carrying out normalization processing on the data set;
step 2, designing a straw image-based graying method, aiming at keeping the distinguishing degree of images and reducing the complexity of a subsequent model;
step 3, a straw hybrid network SMN is provided for processing the preprocessed image, the network comprising a position coding module for enhancing the relative information of the straw image, and a hybrid feature extraction module for balancing complexity and accuracy.
2. The straw image segmentation method based on the special neural network and the image preprocessing as claimed in claim 1, wherein in the step 2, a straw image graying method is designed, which aims at keeping the degree of distinction of images and reducing the complexity of a subsequent model;
the straw is usually yellow and the background is usually black; to improve the distinction between straw and background and reduce the complexity of the network, the following graying coefficients are derived from the value (255, 255, 0) of yellow in RGB space: 0.5, 0.5, 0, meaning that the pixel values of the red channel and the green channel are averaged with equal weights while the pixel values of the blue channel are ignored, so that the following graying formula can be used:
P=0.5×R+0.5×G+0×B
wherein P represents the pixel value after graying, R, G and B represent the pixel values of red, green and blue channels in the original color image respectively, and through the mode, the information of the yellow channel is enhanced, and meanwhile, the complexity of network processing is reduced, so that the subsequent image segmentation task is more efficient and accurate.
3. The straw image segmentation method based on the special neural network and the image preprocessing as claimed in claim 1, wherein in the step 3, the SMN network refers to a classical U-Net segmentation network structure,
firstly, introducing a position coding module to provide position information of straws and a background in a straw image, and understanding the spatial distribution of the straws by a network through introducing the position information to improve the accuracy of segmentation;
secondly, by means of a mixed feature extraction module with balanced complexity and accuracy, the module combines a jump connection mechanism of U-Net, extracts multi-scale feature information while keeping network complexity controllable, and accurately distinguishes straws and backgrounds by integrating features of different scales and performing feature fusion and selection, so that accuracy of a segmentation result is improved;
and finally, the SMN network adopts downsampling and upsampling operations to transfer information and restore features between the encoder and the decoder, the downsampling operations reduce the feature size and the parameter quantity through a maximum pooling layer, the training speed is accelerated, the upsampling operations restore the spatial resolution of the features by using a bilinear interpolation method, and more detail information is reserved.
4. A straw image segmentation method based on a special neural network and image preprocessing as set forth in claim 3, characterized in that,
step 3-1, a relative position coding module, wherein abundant interaction information exists among pixels in the field of image processing, and the accuracy of an overall task can be improved by effectively extracting information among points;
the relative position coding module couples the relative position coordinates with the original image in a simple manner; to reduce the complexity of the algorithm, the original image is divided into small blocks (patches) of the same size, and the relative position information is then fused with the pixel values of each small block; assuming the input image has width W and height H and is divided into small blocks of size patch_size×patch_size, the column and row index ranges of the small blocks can be expressed as:
col_indices = [0, 1, ..., W/patch_size − 1]
row_indices = [0, 1, ..., H/patch_size − 1]
for relative position coding, two coded values col_co and row_co are defined, which represent the relative positions of columns and rows, respectively, and are obtained by dividing col_indices and row_indices by (W/patch_size) and (H/patch_size), as follows:
col_co=col_indices/(W/patch_size)
row_co=row_indices/(H/patch_size)
then, these coding values are added into the original data to fuse the relative position information, because the original data is normalized, the value range is between 0 and 1, and the direct addition of col_co and row_co codes may mask the original image data information, which results in unbalanced data and further affects the convergence of the network, therefore, a flexible parameter beta is introduced to limit the influence of the relative position information, and the formula is as follows:
P′=P+β(col_co+row_co)
where P ∈ R^(W×H×1) and P′ ∈ R^(W×H×1).
5. The straw image segmentation method based on the special neural network and the image preprocessing as set forth in claim 3 or 4, wherein,
step 3-2, a Mixed feature extraction module Mixed-extract feature module aims at realizing the balance between accuracy and complexity in a straw segmentation task, and a Mixed feature extraction module is provided according to the design principle of an Encoder-Decoder, which integrates convolution operation and a self-adaptive attention mechanism so as to realize more effective image feature extraction;
in the design of the module, the method writes the Encoder function as Z = Encoder(Input) and the Decoder function as Predict = Decoder(Z), the Decoder also being regardable as an Encoder acting on Predict, i.e., Z = Encoder(Predict); due to the similarity of the Encoder and Decoder networks, a consistent design principle is maintained in the mixed feature extraction module: when processing features, the Encoder part performs the convolution operation first and then applies the adaptive attention mechanism, while the Decoder part applies the adaptive attention mechanism before performing the convolution operation.
6. The straw image segmentation method based on a special neural network and image preprocessing as set forth in claim 5, wherein,
the calculation process of the adaptive attention mechanism is as follows:
first, the input features are represented as a tensor of shape (B, H_c, W_c), where B is the batch size and H_c and W_c are the height and width of the feature, respectively;
in order to reduce the memory consumption of the attention calculation, the input features are divided into token×token patches, H_c and W_c each being divided by token to obtain a feature representation of shape (B, (H_c/token)×(W_c/token), token×token), so that the features are partitioned into smaller blocks for subsequent computation;
however, performing the attention calculation directly on the token×token dimension may cause memory overflow, so a hidden layer of size hidden_size is introduced, reducing the token×token dimension to hidden_size and giving a feature representation of shape (B, (H_c/token)×(W_c/token), hidden_size) to reduce computation and memory consumption;
next, the attention calculation is performed on the reduced features: the features are first linearly mapped to the Q, K, and V tensors, representing the query, key, and value respectively, and then the attention layer output Y = softmax(Q, K)V is computed with a softmax function, giving an output of shape (B, (H_c/token)×(W_c/token), hidden_size);
finally, the attention layer output Y is restored to the original dimensions, the tensor of shape (B, (H_c/token)×(W_c/token), hidden_size) being converted back into a feature representation of shape (B, H_c, W_c), so that a feature representation processed by the adaptive attention mechanism is obtained, the attention mechanism reducing the computational load of the network by adding a hidden layer;
through the calculation steps, the self-adaptive attention mechanism can reduce the memory consumption of attention calculation while using a smaller token. The mechanism has important application value in processing image tasks, can reduce the computational complexity while maintaining the accuracy, and brings convenience and efficiency to model training and inference.
CN202310707836.4A 2023-06-15 2023-06-15 Straw image segmentation method based on special neural network and image preprocessing Pending CN116740121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310707836.4A CN116740121A (en) 2023-06-15 2023-06-15 Straw image segmentation method based on special neural network and image preprocessing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310707836.4A CN116740121A (en) 2023-06-15 2023-06-15 Straw image segmentation method based on special neural network and image preprocessing

Publications (1)

Publication Number Publication Date
CN116740121A true CN116740121A (en) 2023-09-12

Family

ID=87911023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310707836.4A Pending CN116740121A (en) 2023-06-15 2023-06-15 Straw image segmentation method based on special neural network and image preprocessing

Country Status (1)

Country Link
CN (1) CN116740121A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117392157A (en) * 2023-12-13 2024-01-12 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117392157B (en) * 2023-12-13 2024-03-19 长春理工大学 Edge-aware protective cultivation straw coverage rate detection method
CN117557807A (en) * 2024-01-11 2024-02-13 齐鲁工业大学(山东省科学院) Convolutional neural network image prediction method based on weighted filtering enhancement
CN117557807B (en) * 2024-01-11 2024-04-02 齐鲁工业大学(山东省科学院) Convolutional neural network image prediction method based on weighted filtering enhancement

Similar Documents

Publication Publication Date Title
CN112183258A (en) Remote sensing image road segmentation method based on context information and attention mechanism
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN110781776B (en) Road extraction method based on prediction and residual refinement network
CN113011499A (en) Hyperspectral remote sensing image classification method based on double-attention machine system
CN116740121A (en) Straw image segmentation method based on special neural network and image preprocessing
CN111695467A (en) Spatial spectrum full convolution hyperspectral image classification method based on superpixel sample expansion
CN113178255A (en) Anti-attack method of medical diagnosis model based on GAN
CN112991354A (en) High-resolution remote sensing image semantic segmentation method based on deep learning
CN111291826B (en) Pixel-by-pixel classification method of multisource remote sensing image based on correlation fusion network
CN107392130A (en) Classification of Multispectral Images method based on threshold adaptive and convolutional neural networks
CN117078943B (en) Remote sensing image road segmentation method integrating multi-scale features and double-attention mechanism
CN114187450A (en) Remote sensing image semantic segmentation method based on deep learning
CN113888547A (en) Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network
CN114841319A (en) Multispectral image change detection method based on multi-scale self-adaptive convolution kernel
CN111339862B (en) Remote sensing scene classification method and device based on channel attention mechanism
CN114943893B (en) Feature enhancement method for land coverage classification
CN113362242B (en) Image restoration method based on multi-feature fusion network
CN111178304A (en) High-resolution remote sensing image pixel level interpretation method based on full convolution neural network
CN114821069A (en) Building semantic segmentation method for double-branch network remote sensing image fused with rich scale features
CN116052016A (en) Fine segmentation detection method for remote sensing image cloud and cloud shadow based on deep learning
CN111274905A (en) AlexNet and SVM combined satellite remote sensing image land use change detection method
CN112560624A (en) High-resolution remote sensing image semantic segmentation method based on model depth integration
CN114092824A (en) Remote sensing image road segmentation method combining intensive attention and parallel up-sampling
CN116030361A (en) CIM-T architecture-based high-resolution image change detection method
CN110188646B (en) Human ear identification method based on fusion of gradient direction histogram and local binary pattern

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination