CN113361373A - Real-time semantic segmentation method for aerial image in agricultural scene - Google Patents


Info

Publication number: CN113361373A
Application number: CN202110612989.1A
Authority: CN (China)
Prior art keywords: semantic segmentation, convolution, real, images, texture
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Language: Chinese (zh)
Inventors: 熊盛武, 刘江梁, 王晓楠, 詹昶, 余涛
Assignee (current and original): Wuhan University of Technology (WUT) (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Filing date: 2021-06-02
Publication date: 2021-09-07
Application filed by Wuhan University of Technology


Classifications

    • G06F18/22 Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06F18/2415 Pattern recognition; Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Computing arrangements based on biological models; Neural networks; Combinations of networks
    • G06N3/08 Computing arrangements based on biological models; Neural networks; Learning methods
    • G06Q50/02 ICT specially adapted for implementation of business processes of specific business sectors; Agriculture; Fishing; Forestry; Mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Husbandry (AREA)
  • Agronomy & Crop Science (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Mining & Mineral Resources (AREA)
  • Probability & Statistics with Applications (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a real-time semantic segmentation method for aerial images in agricultural scenes. A camera carried by an unmanned aerial vehicle (UAV) collects raw farmland image data and transmits it to a server; the server processes the raw images and generates the dataset required for network training. A corresponding semantic segmentation network model is constructed from lightweight modules, so that it meets real-time requirements while retaining a good segmentation effect, and the network is trained with a weighted cross-entropy loss function. In addition, the rich texture information in agricultural-scene images is exploited to improve semantic segmentation. After training, the network model is deployed on the UAV: the UAV captures images of the actual scene, the cropped images are fed to the on-board semantic segmentation network model to generate segmentation results, and the results are transmitted to the server, where the user performs analysis and decision-making.

Description

Real-time semantic segmentation method for aerial image in agricultural scene
Technical Field
The invention relates to the field of image recognition in agricultural scenes, and in particular to real-time semantic segmentation of aerial images in agricultural scenes and the marking of specific regions.
Background
Information collection and analysis in traditional agricultural settings consume substantial labor and are not very efficient. Image recognition technology based on deep learning is developing continuously and being applied ever more widely, and it is now being brought to agricultural scenes as well.
Semantic segmentation assigns a category to every pixel in an image, the categories being predefined and practically meaningful; it is already widely used in autonomous driving and medical image analysis. Semantic segmentation of images in agricultural scenes is of great significance, in particular for monitoring and analyzing farmland conditions: after obtaining the analysis results, farmers can take corresponding countermeasures, increasing the potential income of the whole growing season or reducing losses. However, image data acquisition in agricultural scenes is inefficient, and the resulting datasets suffer from class imbalance, which degrades the performance of image semantic segmentation methods. In addition, the many repeated irregular structures in farm aerial images contain rich texture information, which can be learned to improve the segmentation effect.
As UAV technology is applied ever more widely in agriculture, UAVs can be used to collect and analyze farmland images. However, the hardware a UAV can carry is limited, and in practical applications the segmentation method must achieve real-time performance; it therefore needs to analyze farmland images quickly while occupying little memory.
Disclosure of Invention
In view of these technical problems, the invention provides a real-time semantic segmentation method for aerial images in agricultural scenes. A UAV collects farmland image data and transmits it to a server; the server processes the image data to obtain a training set and a test set; a lightweight semantic segmentation network model is built on deep-learning principles; and the model is then ported to the UAV, so that the UAV can both collect and segment images, meeting real-time requirements with a good segmentation effect and ultimately improving the economic benefit of agriculture.
The technical solution of the invention is a real-time semantic segmentation method for aerial images in agricultural scenes, comprising the following steps:
step 1, collecting original agricultural scene image data;
step 2, preprocessing the original agricultural-scene image data, generating corresponding label images, and then splitting the data into a training set and a validation set;
step 3, constructing a real-time semantic segmentation network model, wherein the semantic segmentation network model comprises a backbone feature extraction network, an atrous spatial pyramid pooling (ASPP) module, a texture feature extraction module, and an upsampling module;
the backbone feature extraction network generates a shallow feature map and a deep feature map; the deep feature map is passed to the ASPP module, which performs multi-scale feature extraction and then concatenates the extracted multi-scale feature maps, improving segmentation accuracy for regions of different scales;
the texture feature extraction module takes the shallow feature map from the backbone feature extraction network as input and extracts multi-scale texture features;
the multi-scale feature map output by the ASPP module is concatenated with the texture feature map output by the texture feature extraction module and fed to the upsampling module, which restores the result to the original image size; a softmax function then computes the per-class probability of each pixel, and the segmentation image is generated;
step 4, training the constructed semantic segmentation network model;
and step 5, feeding the cropped test image data into the trained semantic segmentation network model to generate semantic segmentation results.
Further, the specific implementation of step 2 comprises the following sub-steps:
step 2.1, annotating the original agricultural-scene image data, the annotation categories comprising shadow, drought, nutrient deficiency, weeds, standing water, and canals, and generating the corresponding label images;
step 2.2, cropping the original images and the corresponding label images into multiple images of a fixed size;
step 2.3, deleting images that contain no annotated region, as well as images whose annotated region exceeds a certain threshold, so that every image retains sufficient context information;
step 2.4, computing each category's share of the total number of annotated pixels over all images, and downsampling the images of any category whose share is too large, so as to prevent extreme class imbalance from degrading the training of the semantic segmentation network;
and step 2.5, splitting the processed dataset and label maps in a certain ratio to obtain a training set and a validation set, each with its corresponding label maps.
Further, the backbone feature extraction network first downsamples the image with a 3 × 3 convolution to obtain a shallow feature map, and then applies n bottleneck modules. The bottleneck modules come in stride-1 and stride-2 variants: the stride-1 bottleneck consists of a 1 × 1 convolution, ReLU6 activation, 3 × 3 depthwise separable convolution, ReLU6 activation, 1 × 1 convolution, linear activation, and a skip connection to the initial feature map; the stride-2 bottleneck consists of a 1 × 1 convolution, ReLU6 activation, 3 × 3 depthwise separable convolution with stride 2, ReLU6 activation, 1 × 1 convolution, and linear activation. After the n bottleneck modules, a 1 × 1 convolution, an average pooling operation, and another 1 × 1 convolution are applied, and the deep feature map is output.
Further, the ASPP module consists of a 1 × 1 convolution, a 3 × 3 atrous convolution with dilation rate 6, a 3 × 3 atrous convolution with dilation rate 12, a 3 × 3 atrous convolution with dilation rate 16, and global average pooling; it realizes multi-scale feature extraction and finally performs feature fusion, improving segmentation accuracy for regions of different scales.
Further, the texture feature extraction module takes the shallow feature map from the backbone feature extraction network as input and feeds it into 4 branches to extract multi-scale texture features: the first branch applies a 1 × 1 convolution, the second a 2 × 2 convolution, the third a 3 × 3 convolution, and the fourth an 8 × 8 convolution. A statistical texture quantization operation is then applied to each branch's convolved feature map, followed by a multilayer perceptron operation and upsampling, and finally the outputs of the different branches are concatenated to obtain the final texture features.
Furthermore, statistical texture quantization is built on the idea of statistical texture in traditional digital image processing. Let A denote the feature map produced by the first convolution of a branch of the texture feature extraction module. A global average pooling operation is first applied to A to obtain the average feature g; the cosine similarity between the feature vector of each pixel i and g is then computed, yielding the similarity feature map S:

S_i = \frac{A_i^{\top} g}{\lVert A_i \rVert_2 \, \lVert g \rVert_2}

where \lVert g \rVert_2 denotes the 2-norm of a vector. Quantization statistics are computed over the similarity feature map S to extract an informative representation with N quantization-level features, the nth quantization level being

L_n = \min(S) + \frac{n-1}{N}\bigl(\max(S) - \min(S)\bigr), \qquad n = 1, \dots, N.

S is then quantization-encoded, each pixel value S_i being encoded into an N-dimensional vector E_{i,n} by

E_{i,n} = \begin{cases} 1 - \dfrac{N \lvert S_i - L_n \rvert}{\max(S) - \min(S)}, & \lvert S_i - L_n \rvert < \dfrac{\max(S) - \min(S)}{N}, \\ 0, & \text{otherwise.} \end{cases}

The quantization encoding E_{i,n} is concatenated with the quantization-level feature L_n and passed through a multilayer perceptron; the average feature g is then upsampled, and the upsampled result is concatenated with the output of the multilayer perceptron, finally yielding the statistical texture feature.
Further, the semantic segmentation network model is trained with a weighted cross-entropy loss function L. Specifically, class weights are computed over the training set by the median frequency method: with freq_c the frequency with which class c appears in the training set and median_freq the median of all class frequencies, the weight coefficient of each class is

w_c = \frac{\mathrm{median\_freq}}{\mathrm{freq}_c},

and the corresponding weighted cross-entropy loss function is established as

L = -\sum_{c=1}^{M} w_c \, y_c \log(p_c),

where M denotes the total number of classes, y_c is the ground-truth indicator (1 if the predicted class matches the true class c, 0 otherwise), and p_c is the predicted probability of class c.
Further, when training the semantic segmentation network model, SGD is selected as the optimizer with an initial learning rate of 0.01, weight decay of 0.0001, and batch size of 4; the loss function is the weighted cross-entropy loss computed in the previous step. The training and validation sets are fed in, the semantic segmentation network model is trained for 200,000 iterations, and the trained model parameters are saved.
Further, in step 1, a specific image acquisition path is planned in advance, and a camera carried by a UAV is used to acquire image data over a fixed farm area.
The advantages of the invention are mainly the following. Semantic segmentation of aerial images in agricultural scenes is achieved with deep learning; because in practice the method must be ported to a UAV with limited hardware and must segment in real time, a lightweight network module is used for backbone feature extraction, giving high speed and few parameters. The method also exploits the rich texture information in farm aerial images to improve the segmentation effect, and the ASPP module improves segmentation accuracy across region scales. Finally, to address the class-imbalance problem, dataset downsampling and a weighted cross-entropy loss function are adopted to improve segmentation accuracy for low-share classes.
Drawings
FIG. 1 is a flow chart of the real-time semantic segmentation method for aerial images in an agricultural scene, in accordance with an embodiment of the invention;
FIG. 2 is a network architecture diagram of the semantic segmentation network of the invention;
FIG. 3 is a flow chart of dataset generation in the invention;
FIG. 4 is a network architecture diagram of the backbone feature extraction network of the invention;
FIG. 5 is a flow chart of texture feature extraction in the invention;
FIG. 6 is a flow chart of statistical texture quantization in the invention.
Detailed Description
The method of the invention designs a network structure for the specific problems of semantic segmentation of aerial images in agricultural scenes and adopts several supporting techniques so that the method performs well in real scenes. The embodiments are described in detail below with reference to the accompanying drawings.
Step 1: a camera carried by the UAV collects image data over a fixed farm area, following a specific image acquisition path planned in advance. The aerial images are transmitted to the server in real time over a wireless network, and the server stores the image data;
step 2: processing all the acquired image data by using the server, thereby obtaining a training set and a verification set required by the network;
furthermore, step 2 comprises the following substeps:
step 2.1: the labelme tool is used for marking original aerial images of farms, and marking categories comprise shadows, dryness, nutrient deficiency, weeds, ponding and canals, and the existence of the categories can affect the growth of crops and the final income. The labeling modes belong to important information in the agricultural field and can guide a user to make a next decision. Each category is allocated with a pixel value which is 1 to 6 respectively, the background which does not contain the labeling area is allocated with a pixel point of 0, and a corresponding label image can be generated after the labeling is finished;
step 2.2: cutting an original image and a corresponding label image into a plurality of images with the size of 512 multiplied by 512 by using a sliding window mode;
step 2.3: traversing the label graphs of all the images, and deleting the label graphs and the corresponding images which do not contain the labeling areas and have the labeling areas larger than 90%, so that all the images can keep enough context information, and simultaneously, some redundant information is reduced, and the network can learn enough information;
step 2.4: and traversing all the images, calculating the total number of the labeled pixel points of each category, calculating the respective occupation ratio, and randomly deleting the images corresponding to the categories with the occupation ratio exceeding 30% while only keeping 80% of the number of the images. The network can only learn the information of the category with a large proportion due to the unbalanced category, so that the segmentation effect of the category with a small proportion is poor, and the step is a data down-sampling method for relieving the problem of the extreme unbalanced category;
step 2.5: and dividing the processed data set and the label graph according to the proportion of 7:3 to obtain a training set and a verification set, wherein the training set and the verification set both have corresponding label graphs.
Step 3: a real-time semantic segmentation network model is constructed for the problems of the agricultural scene. The network model mainly comprises a backbone feature extraction network, an ASPP module, a texture feature extraction module, and an upsampling module; the network takes 512 × 512 images as input and outputs a segmentation result map;
the main feature extraction network firstly performs convolution downsampling on an image by 3 x 3, and then 5 bottleeck modules are used, wherein the bottleeck is a lightweight feature extraction module provided by a network MobileNet V2. The bottleeck module is divided into 1 and 2 stride, the 1 stride bottleeck module is composed of 1 × 1 convolution, Relu6 activation function, 3 × 3 depth separable convolution, Relu6 activation function, 1 × 1 convolution, linear activation function and jump connection of the initial feature map, the 2 stride bottleeck module is different in that there is no jump connection and the 3 × 3 depth separable convolution has a step size of 2, and is composed of 1 × 1 convolution, Relu6 activation function, 3 × 3 depth separable convolution (step size of 2), Relu6 activation function, 1 × 1 convolution, linear activation function. The bottleeck module can well improve the network computing rate and reduce the model parameter quantity. After 5 bottleeck modules, performing 1 × 1 convolution, performing average pooling operation and 1 × 1 convolution, and finally outputting a feature map;
and (3) transmitting the characteristic diagram obtained in the previous step into a cavity space pyramid pooling module, wherein cavity convolution is used for improving the receptive field and better acquiring the context information of the image so as to improve the final segmentation precision. The cavity space pyramid pooling module is composed of 1 × 1 convolution, 3 × 3 cavity convolution with expansion rate of 6, 3 × 3 cavity convolution with expansion rate of 12, 3 × 3 cavity convolution with expansion rate of 16 and global average pooling, multi-scale feature extraction is realized, and finally feature fusion is performed, so that the segmentation precision of different scale regions is improved;
the texture feature extraction module takes a shallow feature map obtained by the first layer of convolution in the trunk feature extraction network as an input, because the texture features are mainly contained in the low-dimensional features. Then the multi-scale texture features are transmitted into 4 branches for extraction. Performing 1 × 1 convolution operation on a first branch, performing 2 × 2 convolution operation on a second branch, performing 3 × 3 convolution operation on a third branch, performing 8 × 8 convolution operation on a fourth branch, performing statistical texture quantization operation on feature maps obtained by respective convolution calculation, performing mlp multi-layer perceptron operation and upsampling after quantization is completed, and finally connecting the outputs of different branches to obtain final texture features;
the statistical texture quantization is constructed based on the idea of statistical texture in traditional digital image processing, and is similar to a histogram for modeling the statistical texture of an image. Firstly, inputting a feature map A obtained by the first convolution of each branch of the texture feature extraction module, and firstly carrying out global average pooling operation on the input feature map A to obtain an average feature g. Then calculating the cosine similarity of the feature vector and the average feature g of each pixel in the space to obtain a similarity feature map S, wherein the formula is as follows:
Figure BDA0003096734070000081
wherein | g |2A 2-norm representing a vector; carrying out quantitative statistics on the similarity feature map S, extracting information representation, and obtaining N quantization level features, wherein N is set to be 150, and the nth quantization level is represented as:
Figure BDA0003096734070000082
then, the S is quantized and coded, and each pixel point S is subjected to quantization codingiEncoding into an N-dimensional vector Ei,nThe concrete formula is as follows:
Figure BDA0003096734070000091
encoding quantization characteristic Ei,nAnd a quantization level characteristic LnThe result of the connection is transmitted to the multilayer perceptron mlp, then the average characteristic g is up-sampled, and the up-sampled result is connected with the result output by the multilayer perceptron, and finally the statistical texture characteristic is obtained;
and connecting the characteristic graph output by the cavity space pyramid pooling module with the characteristic graph output by the texture characteristic extraction module. And then, performing up-sampling operation to restore the original image to the original image, calculating the probability of different types of each pixel point by using a softmax function, and then generating a segmentation image.
Step 4: the constructed semantic segmentation network model is trained;
Because data downsampling only mitigates the class-imbalance problem, a weighted cross-entropy loss function L is also required. Class weights are computed over the training set by the median frequency method: with freq_c the frequency with which class c appears in the training set and median_freq the median of all class frequencies, the weight coefficient of each class is

w_c = \frac{\mathrm{median\_freq}}{\mathrm{freq}_c},

and the corresponding weighted cross-entropy loss function is established as

L = -\sum_{c=1}^{M} w_c \, y_c \log(p_c),

where M denotes the total number of classes, y_c is the ground-truth indicator (1 if the predicted class matches the true class c, 0 otherwise), and p_c is the predicted probability of class c.
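A minimal sketch of the weight computation and the loss follows; mapping the per-pixel formula onto nn.CrossEntropyLoss with a weight vector is an assumption about implementation, and num_classes=7 again assumes 6 annotated categories plus background.

```python
import torch
import torch.nn as nn

def median_frequency_weights(label_maps, num_classes=7):
    """w_c = median_freq / freq_c, computed over the training-set label maps."""
    counts = torch.zeros(num_classes)
    for lab in label_maps:                       # lab: LongTensor of class ids
        counts += torch.bincount(lab.flatten(), minlength=num_classes).float()
    freq = counts / counts.sum()
    return freq.median() / freq.clamp_min(1e-12)

# For logits of shape (B, M, H, W) and targets of shape (B, H, W),
# CrossEntropyLoss(weight=w) realizes L = -sum_c w_c * y_c * log(p_c) per pixel.
def make_weighted_ce(label_maps, num_classes=7):
    return nn.CrossEntropyLoss(weight=median_frequency_weights(label_maps, num_classes))
```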
after a network model is constructed, SGD is selected as an optimizer, the initial learning rate is 0.01, and the loss function is a weight cross entropy loss function obtained through the last step of calculation. And transmitting a training set and a verification set, training the network model, wherein the initial learning rate is 0.01, the weight attenuation is 0.0001, the batch size is 4, training is carried out for 200000 times, and the trained model parameters are stored.
Step 5: the trained model and parameters are ported to the UAV;
step 6: the unmanned aerial vehicle collects test image data in an actual farm, firstly zooms the image, and then cuts the image to 512 x 512;
and 7: and transmitting the cut image into a semantic segmentation network model carried by the unmanned aerial vehicle, generating a semantic segmentation result, transmitting the semantic segmentation result to a server in real time, and analyzing and making a decision by a user according to the segmentation result.
The above is a specific embodiment of the invention, given to illustrate the technical solution rather than as an absolute limitation. Those skilled in the art may make modifications and additions without departing from the inventive concept. Processes not described in detail in this specification follow the prior art.

Claims (9)

1. A real-time semantic segmentation method for aerial images in agricultural scenes is characterized by comprising the following steps:
step 1, collecting original agricultural scene image data;
step 2, preprocessing the original agricultural-scene image data, generating corresponding label images, and then splitting the data into a training set and a validation set;
step 3, constructing a real-time semantic segmentation network model, wherein the semantic segmentation network model comprises a backbone feature extraction network, an atrous spatial pyramid pooling (ASPP) module, a texture feature extraction module, and an upsampling module;
the backbone feature extraction network generates a shallow feature map and a deep feature map; the deep feature map is passed to the ASPP module, which performs multi-scale feature extraction and then concatenates the extracted multi-scale feature maps, improving segmentation accuracy for regions of different scales;
the texture feature extraction module takes the shallow feature map from the backbone feature extraction network as input and extracts multi-scale texture features;
the multi-scale feature map output by the ASPP module is concatenated with the texture feature map output by the texture feature extraction module and fed to the upsampling module, which restores the result to the original image size; a softmax function then computes the per-class probability of each pixel, and the segmentation image is generated;
step 4, training the constructed semantic segmentation network model;
and step 5, feeding the cropped test image data into the trained semantic segmentation network model to generate semantic segmentation results.
2. The real-time semantic segmentation method for aerial images in agricultural scenes according to claim 1, characterized in that the specific implementation of step 2 comprises the following sub-steps:
step 2.1, annotating the original agricultural-scene image data, the annotation categories comprising shadow, drought, nutrient deficiency, weeds, standing water, and canals, and generating the corresponding label images;
step 2.2, cropping the original images and the corresponding label images into multiple images of a fixed size;
step 2.3, deleting images that contain no annotated region, as well as images whose annotated region exceeds a certain threshold, so that every image retains sufficient context information;
step 2.4, computing each category's share of the total number of annotated pixels over all images, and downsampling the images of any category whose share is too large, so as to prevent extreme class imbalance from degrading the training of the semantic segmentation network;
and step 2.5, splitting the processed dataset and label maps in a certain ratio to obtain a training set and a validation set, each with its corresponding label maps.
3. The real-time semantic segmentation method for aerial images in agricultural scenes according to claim 1, characterized in that the backbone feature extraction network first downsamples the image with a 3 × 3 convolution to obtain a shallow feature map, and then applies n bottleneck modules; the bottleneck modules come in stride-1 and stride-2 variants: the stride-1 bottleneck consists of a 1 × 1 convolution, ReLU6 activation, 3 × 3 depthwise separable convolution, ReLU6 activation, 1 × 1 convolution, linear activation, and a skip connection to the initial feature map; the stride-2 bottleneck consists of a 1 × 1 convolution, ReLU6 activation, 3 × 3 depthwise separable convolution with stride 2, ReLU6 activation, 1 × 1 convolution, and linear activation; after the n bottleneck modules, a 1 × 1 convolution, an average pooling operation, and another 1 × 1 convolution are applied, and the deep feature map is output.
4. The real-time semantic segmentation method for aerial images in agricultural scenes according to claim 1, characterized in that the ASPP module consists of a 1 × 1 convolution, a 3 × 3 atrous convolution with dilation rate 6, a 3 × 3 atrous convolution with dilation rate 12, a 3 × 3 atrous convolution with dilation rate 16, and global average pooling; it realizes multi-scale feature extraction and finally performs feature fusion, improving segmentation accuracy for regions of different scales.
5. The real-time semantic segmentation method for aerial images in agricultural scenes according to claim 1, characterized in that the texture feature extraction module takes the shallow feature map from the backbone feature extraction network as input and feeds it into 4 branches to extract multi-scale texture features: the first branch applies a 1 × 1 convolution, the second a 2 × 2 convolution, the third a 3 × 3 convolution, and the fourth an 8 × 8 convolution; a statistical texture quantization operation is then applied to each branch's convolved feature map, followed by a multilayer perceptron operation and upsampling, and finally the outputs of the different branches are concatenated to obtain the final texture features.
6. The real-time semantic segmentation method for aerial images in agricultural scenes according to claim 5, characterized in that statistical texture quantization is built on the idea of statistical texture in traditional digital image processing: let A denote the feature map produced by the first convolution of a branch of the texture feature extraction module; a global average pooling operation is first applied to the input feature map A to obtain the average feature g, and the cosine similarity between the feature vector of each pixel i and the average feature g is then computed, yielding the similarity feature map S:

S_i = \frac{A_i^{\top} g}{\lVert A_i \rVert_2 \, \lVert g \rVert_2}

where \lVert g \rVert_2 denotes the 2-norm of a vector; quantization statistics are computed over the similarity feature map S to extract an informative representation with N quantization-level features, the nth quantization level being

L_n = \min(S) + \frac{n-1}{N}\bigl(\max(S) - \min(S)\bigr), \qquad n = 1, \dots, N;

S is then quantization-encoded, each pixel value S_i being encoded into an N-dimensional vector E_{i,n} by

E_{i,n} = \begin{cases} 1 - \dfrac{N \lvert S_i - L_n \rvert}{\max(S) - \min(S)}, & \lvert S_i - L_n \rvert < \dfrac{\max(S) - \min(S)}{N}, \\ 0, & \text{otherwise}; \end{cases}

and the quantization encoding E_{i,n} is concatenated with the quantization-level feature L_n and passed through a multilayer perceptron, after which the average feature g is upsampled and the upsampled result is concatenated with the output of the multilayer perceptron, finally yielding the statistical texture feature.
7. The real-time semantic segmentation method for aerial images in agricultural scenes according to claim 1, characterized in that the semantic segmentation network model is trained with a weighted cross-entropy loss function L; specifically, class weights are computed over the training set by the median frequency method: with freq_c the frequency with which class c appears in the training set and median_freq the median of all class frequencies, the weight coefficient of each class is

w_c = \frac{\mathrm{median\_freq}}{\mathrm{freq}_c},

and the corresponding weighted cross-entropy loss function is established as

L = -\sum_{c=1}^{M} w_c \, y_c \log(p_c),

where M denotes the total number of classes, y_c is the ground-truth indicator (1 if the predicted class matches the true class c, 0 otherwise), and p_c is the predicted probability of class c.
8. The real-time semantic segmentation method for aerial images in agricultural scenes according to claim 7, characterized in that when the semantic segmentation network model is trained, SGD is selected as the optimizer with an initial learning rate of 0.01, weight decay of 0.0001, and batch size of 4; the loss function is the weighted cross-entropy loss computed in the previous step; the training and validation sets are fed in, the semantic segmentation network model is trained for 200,000 iterations, and the trained model parameters are saved.
9. The real-time semantic segmentation method for aerial images in agricultural scenes according to claim 1, characterized in that in step 1, a specific image acquisition path is planned in advance, and a camera carried by a UAV is used to acquire image data over a fixed farm area.
CN202110612989.1A — priority date 2021-06-02, filing date 2021-06-02 — Real-time semantic segmentation method for aerial image in agricultural scene — Pending — CN113361373A (en)

Priority Applications (1)

CN202110612989.1A — priority date 2021-06-02, filing date 2021-06-02 — Real-time semantic segmentation method for aerial image in agricultural scene

Applications Claiming Priority (1)

CN202110612989.1A — priority date 2021-06-02, filing date 2021-06-02 — Real-time semantic segmentation method for aerial image in agricultural scene

Publications (1)

CN113361373A — published 2021-09-07

Family

ID=77531185

Family Applications (1)

CN202110612989.1A (pending, published as CN113361373A) — priority date 2021-06-02, filing date 2021-06-02 — Real-time semantic segmentation method for aerial image in agricultural scene

Country Status (1)

CN (1): CN113361373A (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005356A1 (en) * 2017-06-30 2019-01-03 Canon Kabushiki Kaisha Image recognition apparatus, learning apparatus, image recognition method, learning method, and storage medium
CN109255334A (en) * 2018-09-27 2019-01-22 中国电子科技集团公司第五十四研究所 Remote sensing image terrain classification method based on deep learning semantic segmentation network
WO2020238560A1 (en) * 2019-05-27 2020-12-03 腾讯科技(深圳)有限公司 Video target tracking method and apparatus, computer device and storage medium
CN111079649A (en) * 2019-12-17 2020-04-28 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network
CN111259898A (en) * 2020-01-08 2020-06-09 西安电子科技大学 Crop segmentation method based on unmanned aerial vehicle aerial image
CN111950349A (en) * 2020-06-22 2020-11-17 华中农业大学 Semantic segmentation based field navigation line extraction method
CN112004085A (en) * 2020-08-14 2020-11-27 北京航空航天大学 Video coding method under guidance of scene semantic segmentation result
CN112634276A (en) * 2020-12-08 2021-04-09 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Lanyun Zhu et al., "Learning Statistical Texture for Semantic Segmentation," arXiv.org. *
Liang-Chieh Chen et al., "Rethinking Atrous Convolution for Semantic Image Segmentation," arXiv.org. *
Mark Sandler et al., "MobileNetV2: Inverted Residuals and Linear Bottlenecks," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743417A (en) * 2021-09-03 2021-12-03 北京航空航天大学 Semantic segmentation method and semantic segmentation device
CN113743417B (en) * 2021-09-03 2024-02-23 北京航空航天大学 Semantic segmentation method and semantic segmentation device
CN113516657A (en) * 2021-09-14 2021-10-19 中国石油大学(华东) Self-adaptive weight-based fully-polarized SAR image sea surface oil spill detection method
WO2023087597A1 (en) * 2021-11-19 2023-05-25 苏州浪潮智能科技有限公司 Image processing method and system, device, and medium
CN113822287A (en) * 2021-11-19 2021-12-21 苏州浪潮智能科技有限公司 Image processing method, system, device and medium
CN113822287B (en) * 2021-11-19 2022-02-22 苏州浪潮智能科技有限公司 Image processing method, system, device and medium
CN114037922A (en) * 2021-11-29 2022-02-11 南京审计大学 Aerial image segmentation method based on hierarchical context network
CN114037922B (en) * 2021-11-29 2023-04-07 南京审计大学 Aerial image segmentation method based on hierarchical context network
CN114119621A (en) * 2021-11-30 2022-03-01 云南电网有限责任公司输电分公司 SAR remote sensing image water area segmentation method based on depth coding and decoding fusion network
CN113902765A (en) * 2021-12-10 2022-01-07 聚时科技(江苏)有限公司 Automatic semiconductor partitioning method based on panoramic segmentation
CN114462559A (en) * 2022-04-14 2022-05-10 中国科学技术大学 Target positioning model training method, target positioning method and device
CN114462559B (en) * 2022-04-14 2022-07-15 中国科学技术大学 Target positioning model training method, target positioning method and device
CN114943835A (en) * 2022-04-20 2022-08-26 西北工业大学 Real-time semantic segmentation method for aerial images of ice slush unmanned aerial vehicle in yellow river
CN114943835B (en) * 2022-04-20 2024-03-12 西北工业大学 Real-time semantic segmentation method for yellow river ice unmanned aerial vehicle aerial image
CN117882546A (en) * 2024-03-13 2024-04-16 山西诚鼎伟业科技有限责任公司 Intelligent planting method for agricultural operation robot
CN117882546B (en) * 2024-03-13 2024-05-24 山西诚鼎伟业科技有限责任公司 Intelligent planting method for agricultural operation robot

Similar Documents

Publication Publication Date Title
CN113361373A (en) Real-time semantic segmentation method for aerial image in agricultural scene
CN112052886B (en) Intelligent human body action posture estimation method and device based on convolutional neural network
CN111259898B (en) Crop segmentation method based on unmanned aerial vehicle aerial image
CN110245709B (en) 3D point cloud data semantic segmentation method based on deep learning and self-attention
CN113159051B (en) Remote sensing image lightweight semantic segmentation method based on edge decoupling
CN112991354B (en) High-resolution remote sensing image semantic segmentation method based on deep learning
CN111696101A (en) Light-weight solanaceae disease identification method based on SE-Inception
CN111429460A (en) Image segmentation method, image segmentation model training method, device and storage medium
CN113705580B (en) Hyperspectral image classification method based on deep migration learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN112950780B (en) Intelligent network map generation method and system based on remote sensing image
CN101667292B (en) SAR image segmentation system and segmentation method based on immune clone and projection pursuit
CN112241937B (en) Hyperspectral image reconstruction method based on neural network
CN115272828A (en) Intensive target detection model training method based on attention mechanism
CN113269224A (en) Scene image classification method, system and storage medium
CN114445715A (en) Crop disease identification method based on convolutional neural network
CN110969182A (en) Convolutional neural network construction method and system based on farmland image
CN112597919A (en) Real-time medicine box detection method based on YOLOv3 pruning network and embedded development board
CN115410087A (en) Transmission line foreign matter detection method based on improved YOLOv4
CN114494910A (en) Facility agricultural land multi-class identification and classification method based on remote sensing image
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
CN116543165B (en) Remote sensing image fruit tree segmentation method based on dual-channel composite depth network
CN117349622A (en) Wind power plant wind speed prediction method based on hybrid deep learning mechanism
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination
RJ01 — Rejection of invention patent application after publication (application publication date: 20210907)