CN114463542A - Orchard complex road segmentation method based on lightweight semantic segmentation algorithm - Google Patents


Info

Publication number
CN114463542A
CN114463542A
Authority
CN
China
Prior art keywords
image
algorithm
orchard
convolution
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210075398.XA
Other languages
Chinese (zh)
Inventor
伍荣达
杨尘宇
朱立学
张世昂
郭晓耿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongkai University of Agriculture and Engineering filed Critical Zhongkai University of Agriculture and Engineering
Priority to CN202210075398.XA
Publication of CN114463542A
Legal status: Pending

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00: Pattern recognition
            • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
            • G06F18/24: Classification techniques
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/02: Neural networks
            • G06N3/045: Combinations of networks
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T7/00: Image analysis
            • G06T7/136: Segmentation; edge detection involving thresholding
            • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
          • G06T2207/00: Indexing scheme for image analysis or image enhancement
            • G06T2207/20081: Training; learning
            • G06T2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of orchard road analysis, and in particular discloses an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm. The method comprises: acquiring a data set within a preset range and determining an image recognition model based on the data set; acquiring an image to be detected in real time and extracting features of the image to be detected based on the image recognition model to obtain a feature map; performing a convolution operation on the feature map based on a trained depthwise separable convolution model, wherein the number of output channels of the feature map is unchanged; and determining a path, and the positional relationship of the path within the image to be detected, according to the result of the convolution operation. The recognition effect of this visual recognition method differs little from that of other models, so it meets the requirement of extracting complex orchard path information while effectively improving the computational efficiency of the algorithm, reducing the computing power an agricultural robot needs for visual recognition in an orchard environment, and satisfying the real-time requirement of subsequent visual navigation.

Description

Orchard complex road segmentation method based on lightweight semantic segmentation algorithm
Technical Field
The invention relates to the technical field of orchard road analysis, in particular to an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm.
Background
Hills and mountains are the dominant landform of southern China and are also production areas for important cash crops such as forest fruit. As urbanization advances, the rural labor force keeps shrinking, and the problems of labor shortage and high cost facing agricultural production in hilly and mountainous regions are becoming more acute. Promoting the mechanization and intelligentization of agricultural production is one effective way to solve these problems. Autonomous navigation of agricultural robots is an important step toward automated operation; besides satellite positioning and path planning, perceiving the environment around the robot with a vision system is particularly important in hilly and mountainous orchards so that the robot can navigate autonomously.
However, most existing deployments of deep learning algorithms migrate a model trained on a public data set to a specific application field by means of transfer learning. In this process the accuracy of the algorithm receives more attention while real-time performance is ignored, and the demand on computing resources is extremely high, so such methods are unsuitable for real-time detection and difficult to apply directly on an intelligent robot.
Disclosure of Invention
The invention aims to provide an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm, so as to solve the problems described in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm comprises the following steps:
acquiring a data set within a preset range, and determining an image recognition model based on the data set;
acquiring an image to be detected in real time, and extracting features of the image to be detected based on the image recognition model to obtain a feature map;
performing a convolution operation on the feature map based on a trained depthwise separable convolution model, wherein the number of output channels of the feature map is unchanged;
and determining a path, and the positional relationship of the path within the image to be detected, according to the result of the convolution operation.
As a further scheme of the invention: the step of acquiring a data set within a preset range and determining an image recognition model based on the data set comprises:
acquiring a data set within a preset range; the data set comprises a training set and a test set;
preprocessing the data set; the preprocessing comprises image labeling and data enhancement;
initializing algorithm parameters based on a preset algorithm network initialization method and an activation function, and training the initialized algorithm based on a training set;
and testing the trained algorithm according to the test set to determine an image recognition model.
As a further scheme of the invention: Labelme software is used for image annotation, and an annotated image in png format is generated from the original image annotation; the annotated image is produced by reassigning values to the different image channels according to the annotation of the corresponding original image.
As a further scheme of the invention: the data enhancement steps include image rotation, brightness adjustment, resizing and noise superposition.
As a further scheme of the invention: in the step of initializing the algorithm parameters based on a preset network initialization method and an activation function and training the initialized algorithm on the training set, the network initialization methods include Xavier initialization and MSRA initialization; the activation function is the nonlinear function ReLU.
As a further scheme of the invention: the step of extracting features from the image to be detected based on the image recognition model comprises the following steps:
performing several convolution operations and downsampling operations on the input picture;
the feature extraction process is a binary-classification extraction process, and the extracted content is divided into road and background.
As a further scheme of the invention: the depthwise separable convolution model decouples the convolution operation in the channel dimension from that in the spatial dimension and computes them separately; the mapping operation on the spatial dimension alone is performed as the initial operation step.
As a further scheme of the invention: the operation of the depthwise separable convolution model comprises the following steps:
widening the number of feature-map channels based on 1 × 1 pointwise convolution;
when the feature-map channels are widened, copying and superimposing are performed based on the original feature map.
As a further scheme of the invention: the superposition process of the feature map comprises a depthwise separable convolution step and a global average pooling weighting step.
As a further scheme of the invention: the step of determining the positional relationship includes:
an algorithm network of an encoding-decoding structure extracts the features of the image to be detected and compresses the resolution of the image to obtain a feature map;
decoding the feature maps at different stages based on a decoder and performing up-sampling operation;
and restoring the decoded image to be detected into the resolution of the original image, and determining the spatial position information of the image to be detected.
Compared with the prior art, the invention has the following beneficial effects: the recognition effect of this visual recognition method differs little from that of other models, so it meets the requirement of extracting complex orchard path information while effectively improving the computational efficiency of the algorithm, reducing the computing power the agricultural robot needs for visual recognition in an orchard environment, and satisfying the real-time requirement of subsequent visual navigation.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below; obviously, the drawings described below show only some embodiments of the present invention.
FIG. 1 is a flow chart of an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm.
Fig. 2 is a first exemplary diagram of an orchard unstructured path.
Fig. 3 is a second exemplary diagram of an orchard unstructured path.
Fig. 4 is a graph of the variation of the loss value during training.
FIG. 5 is a graph of the change in pixel classification accuracy during the training process.
Fig. 6 is a graph of the change in intersection-over-union (IoU) during training.
FIG. 7 is a graph comparing the test results.
FIG. 8 is a graph comparing the parameter counts and computation amounts of the algorithms.
FIG. 9 is a graph comparing the accuracy and intersection-over-union of the algorithms.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Hills and mountains are the dominant landform of southern China and are also production areas for important cash crops such as forest fruit. As urbanization advances, the rural labor force keeps shrinking, and the problems of labor shortage and high cost facing agricultural production in hilly and mountainous regions are becoming more acute. Promoting the mechanization and intelligentization of agricultural production is one effective way to solve these problems. Autonomous navigation of agricultural robots is an important step toward automated operation; besides satellite positioning and path planning, perceiving the environment around the robot with a vision system is particularly important in hilly and mountainous orchards so that the robot can navigate autonomously.
Visual navigation systems are widely used for obstacle detection, target tracking, path information extraction and the like, and can provide more real environmental feature information when satellite positioning signals are interrupted or the environment is complex. Existing technical schemes include the following:
Meng et al. used the Cg component of the YCrCg model for fuzzy C-means clustering segmentation and extracted crop-row center lines with a line-scanning method based on crop rows;
Gao et al. applied K-means clustering segmentation to the H component of the HIS model and then extracted path information by line fitting with a Hough algorithm;
Jiang et al. applied a Hough transform and clustering to feature points at the center of wheat rows to extract the wheat-row center line;
Wang et al. optimized a grayscale operator with a genetic algorithm and used it for threshold segmentation, so that maize stubble rows were segmented accurately and quickly;
Chen et al. proposed a multi-crop-row extraction algorithm based on an automatic Hough-transform accumulation threshold;
Zhang et al. binarized the image with a color-difference method and the OTSU method and then fitted a navigation path with a Hough algorithm;
Wan et al. judged whether a headland appears from the jump in gray level between pixels inside and outside the field, then fitted the jump feature points with a robust regression algorithm to obtain a headland guidance line;
Chen et al. proposed a prediction-point Hough transform algorithm to extract a navigation path, addressing the heavy computation of the conventional Hough algorithm. Conventional image-processing algorithms can obtain good results in stable scenes, but when handling complex scenes and filtering interference it is difficult for a hand-designed conventional algorithm to cover every possible scene; moreover, achieving the target effect requires a highly complex algorithm design, which is unfavorable for real-time computation.
In recent years, deep learning and computing hardware have developed rapidly, and artificial intelligence technology has begun to be widely applied in agriculture. Li et al. proposed scene recognition of field roads in hilly and mountainous areas based on an improved dilated convolutional neural network; Zhang et al. used the YOLO algorithm to detect and locate rice seedlings in order to extract crop-row lines; a real-time bilateral semantic segmentation network suitable for infrared images has also been proposed; Zhang et al. used the DeepLabV3+ network to segment wheat growing areas.
However, most existing deployments of deep learning algorithms migrate a model trained on a public data set to a specific field by means of transfer learning, and in this process the accuracy of the algorithm receives more attention while real-time performance is ignored.
Aiming at the problems of a complex orchard environment, blurred path boundaries and unpredictable interference, orchard path information is identified and extracted here with a semantic segmentation algorithm based on a deep convolutional neural network. Meanwhile, to reduce the image-processing cost on the agricultural robot and meet the real-time requirement, several lightweight strategies are adopted in designing the network, reducing the parameter count and computation of the algorithm model, improving its utilization efficiency and processing speed, and laying a foundation for subsequent navigation using the path information.
Example 1
Fig. 1 is a flow chart of the orchard complex road segmentation method based on a lightweight semantic segmentation algorithm. In the embodiment of the present invention, the orchard complex road segmentation method based on the lightweight semantic segmentation algorithm includes:
Step S100: acquiring a data set within a preset range, and determining an image recognition model based on the data set;
Step S200: acquiring an image to be detected in real time, and extracting features of the image to be detected based on the image recognition model to obtain a feature map;
Step S300: performing a convolution operation on the feature map based on a trained depthwise separable convolution model, wherein the number of output channels of the feature map is unchanged;
Step S400: determining a path, and the positional relationship of the path within the image to be detected, according to the result of the convolution operation.
After fully convolutional networks (FCN) were proposed, image semantic segmentation based on pixel classification has been used more and more for image segmentation.
Compared with image semantic segmentation based on region classification (ISSbRC), pixel-classification-based segmentation (ISSbPC) performs better in both segmentation accuracy and speed. The commonly used ISSbPC algorithms at present include FCN, UNet, SegNet and the DeepLab series, but their parameter counts and computation amounts are relatively large, making them hard to deploy for real-time operation in practical application scenarios. During navigation, the agricultural robot must obtain path segmentation information in real time with limited computing resources, so the slow operation and high energy consumption caused by a model with excessive parameters and computation must be avoided. To address these problems, a path recognition method for complex orchard environments is proposed based on a lightweight semantic segmentation algorithm, enabling the agricultural robot to recognize paths in real time in an unstructured orchard environment.
Step S100 is a process of establishing an image recognition model, and further, the step of acquiring a data set within a preset range and determining the image recognition model based on the data set includes:
acquiring a data set within a preset range; the data set comprises a training set and a test set;
preprocessing the data set; the preprocessing comprises image labeling and data enhancement;
initializing algorithm parameters based on a preset algorithm network initialization method and an activation function, and training the initialized algorithm based on a training set;
and testing the trained algorithm according to the test set to determine an image recognition model.
Specifically, Labelme software is adopted for image annotation, and an annotated image in the png format is generated on the basis of original image annotation; the annotated image is generated by carrying out reassignment on different image channels based on the annotation of the corresponding original image; the data enhancement steps include image rotation, brightness adjustment, resizing and noise superposition.
In addition, in the step of initializing the algorithm parameters based on a preset network initialization method and an activation function and training the initialized algorithm on the training set, the network initialization methods include Xavier initialization and MSRA initialization, and the activation function is the nonlinear function ReLU.
For the above, the following are exemplified:
referring to fig. 2 and 3, in an orchard environment, paths generally exist in an unstructured manner, and different states exist on the same road section due to influences of time, place, climate and the like. Taking an experimental object banana garden as an example, the path under the orchard complex environment mainly has the following characteristics: (1) path boundary blurring; (2) interference such as weeds and fallen leaves exists in the path; (3) interference such as uneven light, light spots and the like exists; (4) and fruit tree branches and leaves fall and are shielded. These complex environmental characteristics and interferences pose great difficulties for orchard path identification.
According to the requirements of subsequent autonomous navigation by the agricultural robot, the orchard environment is divided into two categories, path and background. The algorithm fits a model for recognizing the path in an image by iteratively training on the information obtained from pre-segmented images, so that the vision system can extract path information quickly; the division is defined in Table 1.
TABLE 1 orchard road scene object Classification
Semantic segmentation of the orchard environment is performed mainly to extract path information in a complex environment so that the agricultural robot can navigate autonomously. Training data were collected at a banana planting base (21°06′N, 110°06′E) in Zhanjiang, Guangdong Province, during July and August (the banana harvest period). To stay consistent with the subsequent operating scenario of the vehicle, images were collected with a hand-held device kept 90 ± 5 cm above the ground, with the optical axis at an angle of 20-45° to the ground. To improve the diversity of the data set and the generalization ability of the trained model, different acquisition devices were used, and images were captured under different weather conditions and at different times, such as on sunny and cloudy days. In total 4600 images of multiple scenes under different environmental conditions were collected in the orchard as the data set; 4000 images were randomly drawn as the training set and the remainder used as the test set.
The deep convolutional neural network training mode is supervised learning, and before training, a data set needs to be labeled manually. Meanwhile, in order to improve the utilization of data set information and enhance the generalization capability of the network model, the data of the training set is enhanced, and the data quality and the characteristic diversity are effectively improved. Namely, the image preprocessing comprises image labeling and data enhancement.
Labelme software is used for image annotation, and an annotated image in png format is generated from the original image annotation. The annotated image is produced mainly by reassigning values to the different image channels according to the annotation of the corresponding original image. In this work the images are divided into two main categories, background and path: the RGB channels are all assigned 0 at pixels belonging to the background and 1 at pixels belonging to the path; the specific definitions are shown in Table 2.
TABLE 2 orchard scene road image annotation schematic
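As an illustration of the channel reassignment described above, the following is a minimal sketch in Python (the patent does not specify an implementation language); the file names, and the assumption that the Labelme export is an index image whose annotated path pixels are non-zero, are illustrative.

```python
import numpy as np
from PIL import Image

def annotation_to_label(labelme_mask_png: str, out_png: str) -> None:
    """Convert a Labelme-exported index mask to the 0/1 label image described above.

    Hypothetical file names; background pixels are assigned 0 and path pixels 1."""
    mask = np.array(Image.open(labelme_mask_png))       # index image exported by Labelme
    label = np.zeros(mask.shape[:2], dtype=np.uint8)    # background stays 0
    label[mask > 0] = 1                                  # every annotated path pixel becomes 1
    Image.fromarray(label).save(out_png)
```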
Because the training set contains only 4000 images, it is not fully convincing for training a deep learning algorithm, and the small data volume may also cause overfitting during training. The original data set is therefore expanded by data enhancement, designed around the conditions the agricultural robot may encounter in real scenes; this increases the information the data set can effectively provide, strengthens the algorithm model's resistance to interference, and raises the number of images in the data set to 8000.
The specific data enhancement operations mainly include image rotation, brightness adjustment, size scaling and noise superposition. Image rotation corresponds to the camera tilting when the road is bumpy; brightness adjustment corresponds to different light intensities at different times; size scaling corresponds to the fields of view of different cameras; and noise superposition represents disturbance of the camera. Each operation is applied to the original training set at random with a probability of 50%, expanding the training set to 8000 images. The experiments below show that 8000 images are sufficient for the algorithm design: the trained algorithm generalizes well during testing and no overfitting occurs.
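The following is a minimal Python/OpenCV sketch of the four enhancement operations described above, each applied with 50% probability; the parameter ranges (rotation angle, brightness factor, scale, noise level) are illustrative assumptions, not values taken from the patent.

```python
import random
import numpy as np
import cv2

def augment(image: np.ndarray, label: np.ndarray):
    """Randomly apply rotation, brightness change, scaling and Gaussian noise (50% each)."""
    h, w = image.shape[:2]
    if random.random() < 0.5:                              # rotation: camera tilt on bumpy roads
        m = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-10, 10), 1.0)
        image = cv2.warpAffine(image, m, (w, h))
        label = cv2.warpAffine(label, m, (w, h), flags=cv2.INTER_NEAREST)
    if random.random() < 0.5:                              # brightness: different times of day
        image = np.clip(image.astype(np.float32) * random.uniform(0.6, 1.4), 0, 255).astype(np.uint8)
    if random.random() < 0.5:                              # scaling: different camera fields of view
        s = random.uniform(0.8, 1.2)
        image = cv2.resize(image, (int(w * s), int(h * s)))
        label = cv2.resize(label, (int(w * s), int(h * s)), interpolation=cv2.INTER_NEAREST)
    if random.random() < 0.5:                              # noise: sensor disturbance
        noise = np.random.normal(0, 8, image.shape).astype(np.float32)
        image = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return image, label
```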
For steps S200 and S300, the step of extracting features from the image to be detected based on the image recognition model includes:
performing several convolution operations and downsampling operations on the input picture;
the feature extraction process is a binary-classification extraction process, and the extracted content is divided into road and background.
The depthwise separable convolution model decouples the convolution operation in the channel dimension from that in the spatial dimension and computes them separately; the mapping operation on the spatial dimension alone is performed as the initial operation step.
Further, the operation of the depthwise separable convolution model includes:
widening the number of feature-map channels based on 1 × 1 pointwise convolution;
when the feature-map channels are widened, copying and superimposing are performed based on the original feature map.
Specifically, the superposition process of the feature map comprises a depthwise separable convolution step and a global average pooling weighting step.
When constructing a semantic segmentation model, the input picture is usually convolved and downsampled several times in order to extract higher-dimensional feature information, reducing the feature-map size and enlarging the receptive field of the convolution kernels. A semantic segmentation model generally downsamples the input image 4-5 times so that the feature maps carry richer information and the algorithm can classify fine details of the image. Here the objective is narrower: the algorithm model only needs to classify and extract the orchard image, i.e. it is a binary classification task, the path edges are blurred, and only the outline of the path needs to be extracted.
Furthermore, considering the real-time requirement of the segmentation task in visual navigation, and since the image pixels are only classified into two classes, the required network capacity is relatively small, so the network structure is cut in depth and only 3 downsampling and 3 upsampling operations are applied to the original image, which helps raise the operating speed of the network. Reducing the network depth, however, shrinks the overall receptive field, which hinders the extraction of spatial information from the image; the convolution kernels are therefore modified by substituting dilated (atrous) convolution, so that the overall receptive field is not reduced by the shallower network.
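A small PyTorch illustration of the substitution described above (the framework and channel counts are assumptions, not taken from the patent): a 3 × 3 convolution with dilation 2 covers a 5 × 5 window with the same number of weights, so the receptive field need not shrink when the network is made shallower.

```python
import torch
import torch.nn as nn

# A plain 3x3 convolution sees a 3x3 window; the dilated version sees a 5x5 window
# with the same 3x3 weights, compensating for the shallower (3-downsample) encoder.
standard_3x3 = nn.Conv2d(16, 16, kernel_size=3, padding=1)
dilated_3x3 = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 16, 64, 64)
assert standard_3x3(x).shape == dilated_3x3(x).shape   # spatial size is preserved
```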
In particular, depthwise separable convolution is another effective measure in the algorithm design to reduce the network computation. A conventional convolution kernel is a three-dimensional filter that performs a joint mapping over the channel dimension and the spatial dimension of the image. When the kernel is large and the number of output channels is large, many parameters are required. Assuming the convolution kernel size is k × k, the output feature map size is c × c, the number of input channels is n and the number of output channels is m, the computation s1 required for one conventional convolution is:

s1 = n × k × k × m × c × c

Depthwise separable convolution decouples the convolution in the channel dimension from that in the spatial dimension and computes them separately, first performing the mapping over the spatial dimension alone. Compared with conventional convolution, each channel of the input feature map is convolved only with the corresponding channel of the kernel, and the number of output channels equals the number of input channels. To change the number of output channels and let feature-map data interact across the channel dimension, a conventional convolution with a 1 × 1 kernel (pointwise convolution) is usually applied at the output stage. Under the same assumptions (kernel size k × k, output feature map c × c, n input channels, m output channels), the computation s2 required for one depthwise separable convolution plus pointwise convolution is:

s2 = n × k × k × c × c + n × m × c × c

The ratio of the two computation amounts is:

s2 / s1 = (n × k × k × c × c + n × m × c × c) / (n × k × k × m × c × c) = 1/m + 1/k²

It can be seen from this ratio that the kernel size has a large influence on the computation amount. In this design, all convolution layers except the input and output layers are replaced with depthwise separable convolutions with a 3 × 3 kernel, so the computation of each replaced layer is reduced to roughly 1/9 (when the number of output channels m is large, 1/m is small compared with 1/k² = 1/9).
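A minimal PyTorch sketch of a depthwise separable convolution followed by a pointwise convolution, together with the computation ratio derived above; the module and helper names are illustrative, not the patent's exact layers.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def computation_ratio(k: int, m: int) -> float:
    """s2 / s1 = 1/m + 1/k^2 from the formulas above."""
    return 1 / m + 1 / k ** 2

print(computation_ratio(k=3, m=64))   # with a 3x3 kernel, roughly 1/9 of the standard cost
```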
It should be noted that, to reduce the parameter count, the algorithm design replaces conventional convolution with depthwise separable convolution. To widen the number of channels in the network, the number of feature-map channels would normally be adjusted with a 1 × 1 pointwise convolution. However, 1 × 1 convolution occupies a large amount of memory and increases the number of floating-point operations (FLOPs); in particular, when a 1 × 1 convolution changes the channel count, the memory-access cost rises. Given that the feature maps within a layer contain a large amount of redundancy, channel widening is instead achieved by copying and superimposing the original feature map. To reduce memory access during the 1 × 1 convolution, its channel count is left unchanged at the output, and the features are copied and superimposed onto the output feature map, thereby widening the number of feature-map channels.
Assuming that the number of final output channels is twice the number of input channels (i.e., m = 2n) and the feature map size is w × h, the FLOPs for widening the channels with a pointwise convolution and with feature copying are respectively:

Pointwise convolution: FLOPs = w × h × n × m = w × h × n × 2n

Feature copy superposition: essentially no multiply-accumulate operations, since the added channels are duplicated rather than recomputed

It can be seen that feature copy superposition effectively reduces the FLOPs, and the reduction grows with the channel-widening multiple.
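A minimal sketch of widening the channel count by feature copying instead of a 1 × 1 pointwise convolution; PyTorch and the function name are assumptions for illustration.

```python
import torch

def widen_by_copy(x: torch.Tensor, factor: int = 2) -> torch.Tensor:
    """Duplicate the feature map along the channel dimension instead of using a
    1x1 convolution, so no multiplications are spent on widening."""
    return torch.cat([x] * factor, dim=1)

x = torch.randn(1, 32, 60, 80)
print(widen_by_copy(x).shape)   # torch.Size([1, 64, 60, 80])
```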
This design greatly reduces the number of algorithm parameters, but because the features are copied directly, some feature-layer information may be missed during extraction and the final training result suffers; this is confirmed in the experimental comparison below.
To make fuller use of the copied portion of the feature map, it is additionally transformed during the design so that the overall feature map carries richer information. The transformations are (a) computing the copied portion with a depthwise separable convolution and (b) weighting the copied portion after global average pooling; both enrich the superimposed feature map at the cost of only a small amount of extra computation. The experiments show that both improvements contribute to better algorithm results, as sketched below.
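The two transformations can be sketched as follows in PyTorch (module names and layer settings are illustrative assumptions): variant 1 passes the copied portion through a cheap depthwise convolution, and variant 2 re-weights it with global-average-pooled channel weights before concatenation.

```python
import torch
import torch.nn as nn

class CopyWidenDW(nn.Module):
    """Variant 1: the appended channels are a depthwise-convolved view of the input
    rather than a literal copy."""
    def __init__(self, ch: int):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, kernel_size=3, padding=1, groups=ch)

    def forward(self, x):
        return torch.cat([x, self.dw(x)], dim=1)

class CopyWidenGAP(nn.Module):
    """Variant 2: the appended channels are the input re-weighted by channel weights
    obtained from global average pooling."""
    def __init__(self, ch: int):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))                  # global average pooling + weighting
        return torch.cat([x, x * w[:, :, None, None]], dim=1)
```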
In summary, the overall network design is based on an encoding-decoding structure, and the original image is downsampled and upsampled three times, as shown in fig. 4. Except for the input and output layers, every convolution layer is replaced with a depthwise separable convolution plus pointwise convolution to reduce the parameter count and computation. On this basis feature reuse is introduced: feature superposition is used to widen the number of feature-map channels and to reduce the use of 1 × 1 convolution. To enrich the information of the superimposed feature maps, the copied part of the feature layer is processed either with a depthwise separable convolution or with global-average-pooling weighting, further improving the network's use of the image's feature information so that it can segment paths more accurately in a complex orchard environment.
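A structural sketch of such a network is given below: an encoding-decoding outline in PyTorch with three downsampling and three upsampling stages, depthwise separable convolutions with dilation, and two output classes. The channel counts and layer settings are illustrative assumptions, not the patented model's exact configuration.

```python
import torch
import torch.nn as nn

def dsconv(in_ch: int, out_ch: int, dilation: int = 1) -> nn.Sequential:
    """Depthwise separable 3x3 block with batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=dilation, dilation=dilation, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, 1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyRoadSegNet(nn.Module):
    """Encoder-decoder sketch: 3 downsampling and 3 upsampling stages, two classes."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, padding=1)          # conventional conv at the input
        self.enc1 = nn.Sequential(nn.MaxPool2d(2), dsconv(16, 32, dilation=2))
        self.enc2 = nn.Sequential(nn.MaxPool2d(2), dsconv(32, 64, dilation=2))
        self.enc3 = nn.Sequential(nn.MaxPool2d(2), dsconv(64, 64, dilation=2))
        self.dec3, self.dec2, self.dec1 = dsconv(64, 64), dsconv(64, 32), dsconv(32, 16)
        self.head = nn.Conv2d(16, num_classes, 1)           # conventional conv at the output
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, x):
        x = self.enc3(self.enc2(self.enc1(self.stem(x))))
        x = self.dec1(self.up(self.dec2(self.up(self.dec3(self.up(x))))))
        return self.head(x)

print(TinyRoadSegNet()(torch.randn(1, 3, 240, 320)).shape)  # torch.Size([1, 2, 240, 320])
```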
Step S400 is a result output step, in which the determination process of the positional relationship is a core step, and the determination step of the positional relationship includes:
an algorithm network of an encoding-decoding structure extracts the features of the image to be detected and compresses the resolution of the image to obtain a feature map;
decoding the feature maps at different stages based on a decoder and performing up-sampling operation;
and restoring the decoded image to be detected into the resolution of the original image, and determining the spatial position information of the image to be detected.
When identifying paths in a complex orchard environment, obtaining the spatial position relationship of the path within the image is essential for accurately judging the robot's pose on the path during subsequent visual navigation. To better obtain the positional information of pixels in image space, an algorithm network with an encoding-decoding structure is proposed: the encoder extracts features from the original image and compresses its resolution to obtain high-dimensional abstract feature information, while the decoder decodes the feature maps at different stages and performs upsampling, restoring the segmentation result to the resolution of the original image and thereby recovering the spatial position information of the image.
To verify the feasibility of the network designed here and its recognition of paths in a complex environment, the network was implemented and subjected to training tests; the specific experimental environment and configuration are shown in Table 3.
TABLE 3 Experimental Environment and configuration
During network training, the initialization method and the training strategy strongly affect the result. The main network initialization methods at present include Xavier initialization and MSRA initialization. A suitable initialization method prevents problems such as vanishing or exploding gradients during training, so that the algorithm can extract features better. In the algorithm design the nonlinear function ReLU is used as the activation function, calculated as:

ReLU(x) = max(0, x)

In each layer of the network, so that the inputs and outputs follow as similar a distribution as possible, the MSRA initialization method is used to initialize the algorithm parameters.
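A minimal sketch of MSRA (Kaiming) initialization matched to the ReLU activation, assuming a PyTorch implementation:

```python
import torch.nn as nn

def msra_init(module: nn.Module) -> None:
    """Apply Kaiming (MSRA) initialization to convolution weights so that, with ReLU,
    layer inputs and outputs keep a similar distribution."""
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
```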
the learning rate (lr) refers to the amplitude of the back-propagation update of the parameters during network training. The training of the network can be accelerated by a larger learning rate, but the problem of jitter can also be brought; a smaller learning rate may extend the training time of the network and cause overfitting problems. In the network training process, the learning rate is dynamically adjusted, and the network is guided to train better to a certain extent. In the experiment, the learning rate is mainly adjusted through the training loss value condition, and the specific expression is as follows:
[Learning-rate adjustment rule driven by the training loss; given only as an image formula in the original document.]
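Since the exact rule survives only as an image, the sketch below shows one common loss-driven schedule, PyTorch's ReduceLROnPlateau, as an assumed stand-in: it lowers the learning rate when the training loss stops improving. The model, optimizer and hyper-parameters are placeholders.

```python
import torch

model = torch.nn.Conv2d(3, 2, 3, padding=1)            # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)      # halve lr when the loss plateaus

# inside the training loop, after each epoch:
#     scheduler.step(epoch_loss)
```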
in the training process, the Dice Loss and the binary cross entropy Loss are selected for combination, and the specific calculation is as follows:
Dice Loss = 1 - 2·Σ(yc·pc) / (Σyc + Σpc)

BCE Loss = -Σ[ yc·log(pc) + (1 - yc)·log(1 - pc) ]

Loss = Dice Loss + BCE Loss
where yc denotes the true label and pc denotes the predicted label.
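A minimal sketch of a combined Dice plus binary cross-entropy loss for the two-class task, assuming PyTorch and an equal-weight sum of the two terms (the exact weighting used in the patent is given only as an image formula):

```python
import torch
import torch.nn.functional as F

def dice_bce_loss(pred_logits: torch.Tensor, target: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Sum of Dice loss and binary cross-entropy; pred_logits and target are (N, 1, H, W),
    with target holding 0/1 path labels."""
    t = target.float()
    p = torch.sigmoid(pred_logits)
    bce = F.binary_cross_entropy_with_logits(pred_logits, t)
    dice = 1 - (2 * (p * t).sum() + eps) / (p.sum() + t.sum() + eps)
    return bce + dice
```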
In line with the experimental objective, the pixel classification accuracy (PA) and the intersection-over-union (IoU), standard semantic segmentation evaluation metrics, are used as the main evaluation indexes of the experiment. They are calculated as follows:
PA = Σi pii / Σi Σj pij

IoU = pii / ( Σj pij + Σj pji - pii )
and pii represents the total number of pixels with the actual type i and the predicted type i, and pij represents the total number of pixels with the actual type i and the predicted type j.
The algorithm model is constructed according to the lightweight strategy, and the feature-reuse part is improved on that basis, namely by applying one depthwise separable convolution to the copied portion and by weighting it with the result of global average pooling, which yields two improved networks. These are compared with a network that widens the number of feature-map channels with 1 × 1 pointwise convolution, in order to verify and analyse whether an algorithm model using feature reuse can extract the image information effectively and how it performs.
For convenience, in the following description base_net denotes the basic algorithm model designed here; base_net_dc denotes the basic model with a depthwise separable convolution applied to the reused feature layer; base_net_ga denotes the basic model with global-average-pooling weighting applied to the reused feature layer; and base_net_1x1 denotes the model that widens the channel count with 1 × 1 pointwise convolution. Their design parameters and training results are shown in Table 4.
TABLE 4 network model parameters
Table 4 lists the four designed algorithm network models and compares their parameter counts and computation amounts. It can be seen from Table 4 that after the models are constructed with the lightweight strategy, the parameter count of every algorithm is effectively reduced and stays between 40,000 and 50,000; using feature reuse reduces the parameter count further. In terms of computation, widening the channels by feature reuse instead of 1 × 1 pointwise convolution lowers the computation amount; the exception is the global-average-pooling weighting variant, which uses a fully connected layer and therefore achieves a comparatively small reduction in parameters and computation.
In terms of pixel classification accuracy and intersection-over-union, the model that directly superimposes copied feature layers shows lower PA and IoU than the model that widens the channels with 1 × 1 pointwise convolution. The main reason is that directly copying and superimposing the feature layer causes some feature information to be missed during extraction, so the effective information of the feature layer is not fully used. When the algorithm is improved in the two ways described above, applying a depthwise separable convolution to the copied portion or weighting it after global average pooling, the effective information of the feature layer is enriched and its extraction is strengthened: PA improves by 0.29%-0.98% over the original value and IoU by 0.07%-1.63%. In other words, while still reducing parameters and computation, the improved network models achieve higher pixel classification accuracy and intersection-over-union than the model that widens the channels with 1 × 1 pointwise convolution.
During training, the reduced parameter count makes the algorithm model harder to train, as shown in figs. 4 to 6. The accuracy and intersection-over-union curves show that in the early stage of training PA and IoU rise quickly but fluctuate strongly, and a stable result is reached only after many iterations.
For testing, images that do not appear in the training set were selected, mainly of three types: a normal orchard path, a path with weed and fallen-leaf interference on the road surface, and a path with light-spot interference and uneven illumination. The test results (fig. 7) show that all of the above algorithms can extract the main part of the path and filter most of the interference; the differences appear mainly at the path edges rather than in the main body. The network that widens the channel count with 1 × 1 pointwise convolution misidentifies or fails to identify edge details, especially under interference; for example, in the base_net_1x1 result of fig. 7, most of the unrecognized path portions appear on the left and right sides, and under light interference unrecognized portions appear on the right side. When feature superposition replaces 1 × 1 pointwise convolution to widen the channels, the reduced parameter count causes the algorithm to miss some image feature information during extraction, and the same problems appear in the test results, for example the unrecognized regions of base_net in fig. 7 and the misidentification in its second image.
In the tests of the algorithm models with the improved superimposed feature layers, these problems are effectively reduced and the unstructured path edges are fitted more effectively. For example, for base_net_dc in fig. 7, part of the area on the left cannot be recognized when fallen-leaf interference is present, but the edge on the right side of the path is fitted better and no misidentification occurs. Of the two improved feature-superposition methods, the network that applies global-average-pooling weighting to the superimposed part gives the better test results, with fewer cases of missed or incorrect recognition and a stronger ability to fit the path edge, as shown by the test results for the three scenes in fig. 7.
The current mainstream lightweight networks include MobileNetUNet and ENet. These networks are made lightweight mainly to ease deployment on mobile terminals, but they target instance segmentation tasks that require large model capacity, so their parameter counts remain large. The algorithm model designed here targets the path recognition task in an unstructured orchard environment, where only path and background need to be distinguished; the required capacity is small, giving it a clear advantage in parameter reduction.
As shown in fig. 8, in terms of parameter count and computation the proposed algorithm network model base_net_ga has a great advantage over MobileNetUNet and ENet: its parameters and computation are only 11%-12% of those algorithms, its memory and computation requirements are low, it is easier to deploy on the agricultural robot's terminal, and it lowers the hardware threshold. In terms of accuracy and intersection-over-union the difference from the other algorithms is small, with pixel classification accuracy around 95% and intersection-over-union around 90%, which meets the requirement for extracting path information, as shown in fig. 9.
Meanwhile, the problem addressed here is specifically orchard unstructured path recognition. When MobileNetUNet and ENet are trained on it, their large model capacity makes overfitting likely during training and their generalization in testing poor. The capacity of the algorithm model proposed here is better matched to the problem, and with an appropriate initialization method and optimization strategy during training, every network-layer parameter can play its full role, avoiding a large amount of unnecessary computation.
In conclusion, the invention designs a path recognition algorithm for visual navigation according to the needs of an agricultural robot navigating in a complex orchard environment. The algorithm is designed for a specific application scenario, has small parameter and computation requirements, meets the needs of an agricultural robot operating in a complex orchard environment, and has strong robustness and anti-interference ability, filtering most environmental disturbances and extracting the main path information.
The work here focuses on algorithm design and training. In the design, several lightweight strategies reduce the model's parameters and computation and improve memory-access efficiency; the key feature is that the channel count of the feature layer is widened by feature reuse instead of the conventional approach of increasing the number of convolution kernels. On this basis, to preserve network capacity and the richness of the feature layer, two improvements are made to the feature-reuse part at the cost of a small amount of extra computation, giving a better recognition result. In training, the originally collected data set contains only 4600 images and can hardly cover every condition in an orchard environment, so to strengthen the network's generalization and anti-interference ability the existing data set is augmented according to the variation of orchard-environment characteristics, expanding its size. Finally, the trained network is tested and compared with current mainstream lightweight networks: with accuracy and intersection-over-union nearly the same, its parameter count and computation are 11%-12% of the other algorithm models, improving the computational efficiency of the algorithm.
At present, semantic segmentation algorithms based on deep learning show excellent processing ability in many respects. The next step is to continue exploring the network design and to fold more recognition tasks into a single algorithm model, for example obstacle avoidance and depth-information matching during agricultural robot navigation, so as to improve the utilization efficiency of the algorithm model and make the algorithm even lighter, meeting the deployment and real-time requirements of agricultural robots operating outdoors.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. An orchard complex road segmentation method based on a lightweight semantic segmentation algorithm is characterized by comprising the following steps:
acquiring a data set in a preset range, and determining an image recognition model based on the data set;
acquiring an image to be detected in real time, and extracting features of the image to be detected based on the image recognition model to obtain a feature map;
performing a convolution operation on the feature map based on a trained depthwise separable convolution model; wherein the number of output channels of the feature map is unchanged;
and determining a path and the position relation of the path in the image to be detected according to the convolution operation result.
2. The orchard complex road segmentation method based on the light-weight semantic segmentation algorithm according to claim 1, wherein a data set is obtained within a preset range, and the step of determining an image recognition model based on the data set comprises:
acquiring a data set within a preset range; the data set comprises a training set and a test set;
preprocessing the data set; the preprocessing comprises image labeling and data enhancement;
initializing algorithm parameters based on a preset algorithm network initialization method and an activation function, and training the initialized algorithm based on a training set;
and testing the trained algorithm according to the test set to determine an image recognition model.
3. The orchard complex road segmentation method based on the light-weight semantic segmentation algorithm according to claim 2, characterized in that Labelme software is adopted for image annotation, and an annotated image in png format is generated on the basis of original image annotation; the annotated image is generated by carrying out reassignment on different image channels based on the annotation of the corresponding original image.
4. The orchard complex road segmentation method based on the light-weight semantic segmentation algorithm according to claim 2, characterized in that the data enhancement step comprises image rotation, brightness adjustment, size scaling and noise superposition.
5. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 2 is characterized in that algorithm parameters are initialized based on a preset algorithm network initialization method and an activation function, and the algorithm network initialization method in training the initialized algorithm based on a training set comprises Xavier initialization and MSRA initialization; the activation function employs a nonlinear function ReLU.
6. The orchard complex road segmentation method based on the light-weight semantic segmentation algorithm according to claim 1, wherein the step of performing feature extraction on the to-be-detected image based on the image recognition model comprises the following steps:
carrying out convolution operation on an input picture for a plurality of times and carrying out downsampling operation;
the feature extraction process is a two-classification extraction process, and the contents extracted in the two-classification extraction process are divided into roads and backgrounds.
7. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 6, wherein the depthwise separable convolution model decouples the convolution operation in the channel dimension from that in the spatial dimension and computes them separately; wherein the mapping operation on the spatial dimension alone is performed as the initial operation step.
8. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 6 or 7, wherein the operation of the depthwise separable convolution model comprises:
widening the number of feature map channels with a 1×1 pointwise convolution;
when the feature map channels are widened, copying and superimposing on the basis of the original feature map.
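A sketch of one possible reading of claim 8: a 1×1 pointwise convolution supplies the additional channels, and cheap per-channel copies of the original feature map are superimposed (concatenated) with them, in the spirit of a Ghost-style module; the exact construction in the patent may differ, and all names and sizes below are illustrative.

import torch
import torch.nn as nn

class WidenWithCopies(nn.Module):
    # Widen the channel count with a 1x1 pointwise convolution, then stack
    # cheap depthwise "copies" derived from the original feature map on top.
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        assert out_ch > in_ch
        self.pointwise = nn.Conv2d(in_ch, out_ch - in_ch, kernel_size=1, bias=False)
        self.cheap_copy = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                    groups=in_ch, bias=False)  # per-channel copy/transform

    def forward(self, x):
        widened = self.pointwise(x)   # new channels from the 1x1 convolution
        copied = self.cheap_copy(x)   # superimposed copies of the original map
        return torch.cat([widened, copied], dim=1)

print(WidenWithCopies(32, 64)(torch.randn(1, 32, 60, 80)).shape)  # (1, 64, 60, 80)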
9. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 8, wherein the superposition of the feature maps comprises a depthwise separable convolution step and a global average pooling weighting step.
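A sketch of a global-average-pooling weighting step in the squeeze-and-excitation style, which is one common way to realise the weighting recited in claim 9; the reduction ratio, layer layout and class name are assumptions.

import torch
import torch.nn as nn

class GlobalPoolWeighting(nn.Module):
    # Global average pooling produces one weight per channel, which then
    # rescales the feature map.
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # per-channel weights broadcast over H x W

feat = torch.randn(1, 64, 30, 40)
print(GlobalPoolWeighting(64)(feat).shape)  # torch.Size([1, 64, 30, 40])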
10. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 1, wherein the step of determining the positional relation comprises:
extracting features of the image to be detected with an algorithm network of encoder-decoder structure and compressing the image resolution to obtain feature maps;
decoding the feature maps of the different stages with the decoder and performing upsampling operations;
and restoring the decoded image to the resolution of the original image to determine the spatial position information of the image to be detected.
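Finally, a toy encoder-decoder sketch: the encoder compresses resolution while extracting features, the decoder upsamples stage by stage, and the output is restored to the input resolution so per-pixel road positions can be read off; layer sizes and names are illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoderDecoder(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(True))
        self.dec1 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(True))
        self.dec2 = nn.Conv2d(32, num_classes, 3, padding=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f = self.enc2(self.enc1(x))  # 1/4-resolution features
        f = F.interpolate(self.dec1(f), scale_factor=2, mode="bilinear", align_corners=False)
        f = F.interpolate(self.dec2(f), size=(h, w), mode="bilinear", align_corners=False)
        return f                     # logits at the original resolution

logits = TinyEncoderDecoder()(torch.randn(1, 3, 240, 320))
print(logits.shape)           # torch.Size([1, 2, 240, 320])
road_mask = logits.argmax(1)  # per-pixel road / background decision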
CN202210075398.XA 2022-01-22 2022-01-22 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm Pending CN114463542A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210075398.XA CN114463542A (en) 2022-01-22 2022-01-22 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210075398.XA CN114463542A (en) 2022-01-22 2022-01-22 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm

Publications (1)

Publication Number Publication Date
CN114463542A true CN114463542A (en) 2022-05-10

Family

ID=81412153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210075398.XA Pending CN114463542A (en) 2022-01-22 2022-01-22 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm

Country Status (1)

Country Link
CN (1) CN114463542A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634276A (en) * 2020-12-08 2021-04-09 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction
CN113159051A (en) * 2021-04-27 2021-07-23 长春理工大学 Remote sensing image lightweight semantic segmentation method based on edge decoupling
CN113688836A (en) * 2021-09-28 2021-11-23 四川大学 Real-time road image semantic segmentation method and system based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TAHA EMARA et al.: "LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation", arXiv / DICTA *
LI YUNWU et al.: "Field road scene recognition in hilly and mountainous areas based on improved dilated convolutional neural network", Transactions of the Chinese Society of Agricultural Engineering *
MO DONGYAN et al.: "Research progress on autonomous navigation technology of orchard robots based on environment perception", Mechanical and Electrical Engineering Technology *
HAN ZHENHAO et al.: "Orchard visual navigation path recognition method based on U-Net network", Transactions of the Chinese Society for Agricultural Machinery *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024131479A1 (en) * 2022-12-21 2024-06-27 腾讯科技(深圳)有限公司 Virtual environment display method and apparatus, wearable electronic device and storage medium

Similar Documents

Publication Publication Date Title
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
Zheng et al. A mango picking vision algorithm on instance segmentation and key point detection from RGB images in an open orchard
Peng et al. S-FPN: A shortcut feature pyramid network for sea cucumber detection in underwater images
CN110765916B (en) Farmland seedling ridge identification method and system based on semantics and example segmentation
CA3129174A1 (en) Method and apparatus for acquiring boundary of area to be operated, and operation route planning method
CN113609889B (en) High-resolution remote sensing image vegetation extraction method based on sensitive characteristic focusing perception
Foedisch et al. Adaptive real-time road detection using neural networks
Yang et al. Real-time detection of crop rows in maize fields based on autonomous extraction of ROI
Cai et al. Residual-capsule networks with threshold convolution for segmentation of wheat plantation rows in UAV images
Zhan et al. Vegetation land use/land cover extraction from high-resolution satellite images based on adaptive context inference
CN114067206B (en) Spherical fruit identification positioning method based on depth image
de Silva et al. Towards agricultural autonomy: crop row detection under varying field conditions using deep learning
Ye et al. A comparison between Pixel-based deep learning and Object-based image analysis (OBIA) for individual detection of cabbage plants based on UAV Visible-light images
Tabb et al. Automatic segmentation of trees in dynamic outdoor environments
Moreno et al. Analysis of Stable Diffusion-derived fake weeds performance for training Convolutional Neural Networks
Shuai et al. An improved YOLOv5-based method for multi-species tea shoot detection and picking point location in complex backgrounds
CN117611996A (en) Grape planting area remote sensing image change detection method based on depth feature fusion
Lu et al. Citrus green fruit detection via improved feature network extraction
Yan et al. High-resolution mapping of paddy rice fields from unmanned airborne vehicle images using enhanced-TransUnet
CN114463542A (en) Orchard complex road segmentation method based on lightweight semantic segmentation algorithm
CN114581307A (en) Multi-image stitching method, system, device and medium for target tracking identification
Wang et al. Fusing vegetation index and ridge segmentation for robust vision based autonomous navigation of agricultural robots in vegetable farms
Huang et al. A survey of deep learning-based object detection methods in crop counting
Dehkordi et al. Performance Evaluation of Temporal and Spatial-Temporal Convolutional Neural Networks for Land-Cover Classification (A Case Study in Shahrekord, Iran)
Zhang et al. Roadside vegetation segmentation with adaptive texton clustering model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220510