CN114463542A - Orchard complex road segmentation method based on lightweight semantic segmentation algorithm - Google Patents


Info

Publication number
CN114463542A
CN114463542A
Authority
CN
China
Prior art keywords
image
algorithm
orchard
convolution
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210075398.XA
Other languages
Chinese (zh)
Inventor
伍荣达
杨尘宇
朱立学
张世昂
郭晓耿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongkai University of Agriculture and Engineering filed Critical Zhongkai University of Agriculture and Engineering
Priority to CN202210075398.XA
Publication of CN114463542A
Legal status: Pending

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00: Pattern recognition
            • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
            • G06F18/24: Classification techniques
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/02: Neural networks
            • G06N3/045: Combinations of networks
        • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T7/00: Image analysis
            • G06T7/136: Segmentation; edge detection involving thresholding
            • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
          • G06T2207/00: Indexing scheme for image analysis or image enhancement
            • G06T2207/20081: Training; learning
            • G06T2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of orchard road analysis, and in particular discloses an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm. The method comprises: acquiring a data set within a preset range and determining an image recognition model based on the data set; acquiring an image to be detected in real time and extracting features of the image to be detected based on the image recognition model to obtain a feature map; performing a convolution operation on the feature map based on a trained depthwise separable convolution model, wherein the number of output channels of the feature map is unchanged; and determining a path, and the positional relationship of the path within the image to be detected, according to the result of the convolution operation. The recognition effect of this visual recognition method differs little from that of other models, so it meets the requirement of extracting complex orchard path information while effectively improving the computational efficiency of the algorithm, reducing the computing power an agricultural robot needs for visual recognition in an orchard environment, and satisfying the real-time requirement of subsequent visual navigation.

Description

Orchard complex road segmentation method based on lightweight semantic segmentation algorithm
Technical Field
The invention relates to the technical field of orchard road analysis, in particular to an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm.
Background
Hills and mountains are the dominant landform of southern China and are also production areas for important cash crops such as forest fruit. As urbanization advances, the rural labor force keeps shrinking, and the problems of labor shortage and high cost facing agricultural production in hilly and mountainous regions are becoming more acute. Promoting the mechanization and intelligentization of agricultural production is one effective way to solve these problems. Autonomous navigation of agricultural robots is an important step toward automated operation; besides satellite positioning and path planning, perceiving the environment around the robot with a vision system is particularly important in hilly and mountainous orchards so that the robot can navigate autonomously.
However, most existing deployments of deep learning algorithms migrate a model trained on a public data set to a specific application field by means of transfer learning. In this process the accuracy of the algorithm receives more attention while real-time performance is ignored, and the demand on computing resources is extremely high, so such methods are unsuitable for real-time detection and difficult to apply directly on an intelligent robot.
Disclosure of Invention
The invention aims to provide an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm, so as to solve the problems described in the background art.
In order to achieve the purpose, the invention provides the following technical scheme:
an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm comprises the following steps:
acquiring a data set within a preset range, and determining an image recognition model based on the data set;
acquiring an image to be detected in real time, and extracting features of the image to be detected based on the image recognition model to obtain a feature map;
performing a convolution operation on the feature map based on a trained depthwise separable convolution model, wherein the number of output channels of the feature map is unchanged;
and determining a path, and the positional relationship of the path within the image to be detected, according to the result of the convolution operation.
As a further scheme of the invention: the step of acquiring a data set within a preset range and determining an image recognition model based on the data set comprises:
acquiring a data set within a preset range; the data set comprises a training set and a test set;
preprocessing the data set; the preprocessing comprises image labeling and data enhancement;
initializing algorithm parameters based on a preset algorithm network initialization method and an activation function, and training the initialized algorithm based on a training set;
and testing the trained algorithm according to the test set to determine an image recognition model.
As a further scheme of the invention: Labelme software is used for image annotation, and an annotated image in png format is generated from the original image annotation; the annotated image is produced by reassigning values to the different image channels according to the annotation of the corresponding original image.
As a further scheme of the invention: the data enhancement steps include image rotation, brightness adjustment, resizing and noise superposition.
As a further scheme of the invention: in the step of initializing the algorithm parameters based on a preset network initialization method and an activation function and training the initialized algorithm on the training set, the network initialization methods include Xavier initialization and MSRA initialization; the activation function is the nonlinear function ReLU.
As a further scheme of the invention: the step of extracting features from the image to be detected based on the image recognition model comprises the following steps:
performing several convolution operations and downsampling operations on the input picture;
the feature extraction process is a binary-classification extraction process, and the extracted content is divided into road and background.
As a further scheme of the invention: the depthwise separable convolution model decouples the convolution operation in the channel dimension from that in the spatial dimension and computes them separately; the mapping operation on the spatial dimension alone is performed as the initial operation step.
As a further scheme of the invention: the operation of the depthwise separable convolution model comprises the following steps:
widening the number of feature-map channels based on 1 × 1 pointwise convolution;
when the feature-map channels are widened, copying and superimposing are performed based on the original feature map.
As a further scheme of the invention: the superposition process of the feature map comprises a depthwise separable convolution step and a global average pooling weighting step.
As a further scheme of the invention: the step of determining the positional relationship includes:
an algorithm network of an encoding-decoding structure extracts the features of the image to be detected and compresses the resolution of the image to obtain a feature map;
decoding the feature maps at different stages based on a decoder and performing up-sampling operation;
and restoring the decoded image to be detected into the resolution of the original image, and determining the spatial position information of the image to be detected.
Compared with the prior art, the invention has the following beneficial effects: the recognition effect of this visual recognition method differs little from that of other models, so it meets the requirement of extracting complex orchard path information while effectively improving the computational efficiency of the algorithm, reducing the computing power the agricultural robot needs for visual recognition in an orchard environment, and satisfying the real-time requirement of subsequent visual navigation.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below; obviously, the drawings described below show only some embodiments of the present invention.
FIG. 1 is a flow chart of an orchard complex road segmentation method based on a lightweight semantic segmentation algorithm.
Fig. 2 is a first exemplary diagram of an orchard unstructured path.
Fig. 3 is a second exemplary diagram of an orchard unstructured path.
Fig. 4 is a graph of the variation of the loss value during training.
FIG. 5 is a graph of the change in pixel classification accuracy during the training process.
Fig. 6 is a graph of the change in intersection-over-union (IoU) during training.
FIG. 7 is a graph comparing the test results.
FIG. 8 is a graph comparing the parameter counts and computation amounts of the algorithms.
FIG. 9 is a graph comparing the accuracy and intersection-over-union of the algorithms.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Hills and mountains are the dominant landform of southern China and are also production areas for important cash crops such as forest fruit. As urbanization advances, the rural labor force keeps shrinking, and the problems of labor shortage and high cost facing agricultural production in hilly and mountainous regions are becoming more acute. Promoting the mechanization and intelligentization of agricultural production is one effective way to solve these problems. Autonomous navigation of agricultural robots is an important step toward automated operation; besides satellite positioning and path planning, perceiving the environment around the robot with a vision system is particularly important in hilly and mountainous orchards so that the robot can navigate autonomously.
Visual navigation systems are widely used for obstacle detection, target tracking, path information extraction and the like, and can provide more real environmental feature information when satellite positioning signals are interrupted or the environment is complex. Existing technical schemes include the following:
Meng et al. used the Cg component of the YCrCg model for fuzzy C-means clustering segmentation and extracted crop-row center lines with a line-scanning method based on crop rows;
Gao et al. applied K-means clustering segmentation to the H component of the HIS model and then extracted path information by line fitting with a Hough algorithm;
Jiang et al. applied a Hough transform and clustering to feature points at the center of wheat rows to extract the wheat-row center line;
Wang et al. optimized a grayscale operator with a genetic algorithm and used it for threshold segmentation, so that maize stubble rows were segmented accurately and quickly;
Chen et al. proposed a multi-crop-row extraction algorithm based on an automatic Hough-transform accumulation threshold;
Zhang et al. binarized the image with a color-difference method and the OTSU method and then fitted a navigation path with a Hough algorithm;
Wan et al. judged whether a headland appears from the jump in gray level between pixels inside and outside the field, then fitted the jump feature points with a robust regression algorithm to obtain a headland guidance line;
Chen et al. proposed a prediction-point Hough transform algorithm to extract a navigation path, addressing the heavy computation of the conventional Hough algorithm. Conventional image-processing algorithms can obtain good results in stable scenes, but when handling complex scenes and filtering interference it is difficult for a hand-designed conventional algorithm to cover every possible scene; moreover, achieving the target effect requires a highly complex algorithm design, which is unfavorable for real-time computation.
In recent years, deep learning and computing hardware have developed rapidly, and artificial intelligence technology has begun to be widely applied in agriculture. Li et al. proposed scene recognition of field roads in hilly and mountainous areas based on an improved dilated convolutional neural network; Zhang et al. used the YOLO algorithm to detect and locate rice seedlings in order to extract crop-row lines; a real-time bilateral semantic segmentation network suitable for infrared images has also been proposed; Zhang et al. used the DeepLabV3+ network to segment wheat growing areas.
However, most existing deployments of deep learning algorithms migrate a model trained on a public data set to a specific field by means of transfer learning, and in this process the accuracy of the algorithm receives more attention while real-time performance is ignored.
Aiming at the problems of a complex orchard environment, blurred path boundaries and unpredictable interference, orchard path information is identified and extracted here with a semantic segmentation algorithm based on a deep convolutional neural network. Meanwhile, to reduce the image-processing cost on the agricultural robot and meet the real-time requirement, several lightweight strategies are adopted in designing the network, reducing the parameter count and computation of the algorithm model, improving its utilization efficiency and processing speed, and laying a foundation for subsequent navigation using the path information.
Example 1
Fig. 1 is a flow chart of the orchard complex road segmentation method based on a lightweight semantic segmentation algorithm. In the embodiment of the present invention, the orchard complex road segmentation method based on the lightweight semantic segmentation algorithm includes:
Step S100: acquiring a data set within a preset range, and determining an image recognition model based on the data set;
Step S200: acquiring an image to be detected in real time, and extracting features of the image to be detected based on the image recognition model to obtain a feature map;
Step S300: performing a convolution operation on the feature map based on a trained depthwise separable convolution model, wherein the number of output channels of the feature map is unchanged;
Step S400: determining a path, and the positional relationship of the path within the image to be detected, according to the result of the convolution operation.
After fully convolutional networks (FCN) were proposed, image semantic segmentation based on pixel classification has been used more and more for image segmentation.
Compared with image semantic segmentation based on region classification (ISSbRC), pixel-classification-based segmentation (ISSbPC) performs better in both segmentation accuracy and speed. The commonly used ISSbPC algorithms at present include FCN, UNet, SegNet and the DeepLab series, but their parameter counts and computation amounts are relatively large, making them hard to deploy for real-time operation in practical application scenarios. During navigation, the agricultural robot must obtain path segmentation information in real time with limited computing resources, so the slow operation and high energy consumption caused by a model with excessive parameters and computation must be avoided. To address these problems, a path recognition method for complex orchard environments is proposed based on a lightweight semantic segmentation algorithm, enabling the agricultural robot to recognize paths in real time in an unstructured orchard environment.
Step S100 is a process of establishing an image recognition model, and further, the step of acquiring a data set within a preset range and determining the image recognition model based on the data set includes:
acquiring a data set within a preset range; the data set comprises a training set and a test set;
preprocessing the data set; the preprocessing comprises image labeling and data enhancement;
initializing algorithm parameters based on a preset algorithm network initialization method and an activation function, and training the initialized algorithm based on a training set;
and testing the trained algorithm according to the test set to determine an image recognition model.
Specifically, Labelme software is adopted for image annotation, and an annotated image in the png format is generated on the basis of original image annotation; the annotated image is generated by carrying out reassignment on different image channels based on the annotation of the corresponding original image; the data enhancement steps include image rotation, brightness adjustment, resizing and noise superposition.
In addition, in the step of initializing the algorithm parameters based on a preset network initialization method and an activation function and training the initialized algorithm on the training set, the network initialization methods include Xavier initialization and MSRA initialization, and the activation function is the nonlinear function ReLU.
For the above, the following are exemplified:
referring to fig. 2 and 3, in an orchard environment, paths generally exist in an unstructured manner, and different states exist on the same road section due to influences of time, place, climate and the like. Taking an experimental object banana garden as an example, the path under the orchard complex environment mainly has the following characteristics: (1) path boundary blurring; (2) interference such as weeds and fallen leaves exists in the path; (3) interference such as uneven light, light spots and the like exists; (4) and fruit tree branches and leaves fall and are shielded. These complex environmental characteristics and interferences pose great difficulties for orchard path identification.
According to the requirements of subsequent autonomous navigation by the agricultural robot, the orchard environment is divided into two categories, path and background. The algorithm fits a model for recognizing the path in an image by iteratively training on the information obtained from pre-segmented images, so that the vision system can extract path information quickly; the division is defined in Table 1.
TABLE 1 orchard road scene object Classification
Semantic segmentation of the orchard environment is performed mainly to extract path information in a complex environment so that the agricultural robot can navigate autonomously. Training data were collected at a banana planting base (21°06′N, 110°06′E) in Zhanjiang, Guangdong Province, during July and August (the banana harvest period). To stay consistent with the subsequent operating scenario of the vehicle, images were collected with a hand-held device kept 90 ± 5 cm above the ground, with the optical axis at an angle of 20-45° to the ground. To improve the diversity of the data set and the generalization ability of the trained model, different acquisition devices were used, and images were captured under different weather conditions and at different times, such as on sunny and cloudy days. In total 4600 images of multiple scenes under different environmental conditions were collected in the orchard as the data set; 4000 images were randomly drawn as the training set and the remainder used as the test set.
The deep convolutional neural network training mode is supervised learning, and before training, a data set needs to be labeled manually. Meanwhile, in order to improve the utilization of data set information and enhance the generalization capability of the network model, the data of the training set is enhanced, and the data quality and the characteristic diversity are effectively improved. Namely, the image preprocessing comprises image labeling and data enhancement.
Labelme software is used for image annotation, and an annotated image in png format is generated from the original image annotation. The annotated image is produced mainly by reassigning values to the different image channels according to the annotation of the corresponding original image. In this work the images are divided into two main categories, background and path: the RGB channels are all assigned 0 at pixels belonging to the background and 1 at pixels belonging to the path; the specific definitions are shown in Table 2.
TABLE 2 orchard scene road image annotation schematic
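As an illustration of the channel reassignment described above, the following is a minimal sketch in Python (the patent does not specify an implementation language); the file names, and the assumption that the Labelme export is an index image whose annotated path pixels are non-zero, are illustrative.

```python
import numpy as np
from PIL import Image

def annotation_to_label(labelme_mask_png: str, out_png: str) -> None:
    """Convert a Labelme-exported index mask to the 0/1 label image described above.

    Hypothetical file names; background pixels are assigned 0 and path pixels 1."""
    mask = np.array(Image.open(labelme_mask_png))       # index image exported by Labelme
    label = np.zeros(mask.shape[:2], dtype=np.uint8)    # background stays 0
    label[mask > 0] = 1                                  # every annotated path pixel becomes 1
    Image.fromarray(label).save(out_png)
```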
Because the training set contains only 4000 images, it is not fully convincing for training a deep learning algorithm, and the small data volume may also cause overfitting during training. The original data set is therefore expanded by data enhancement, designed around the conditions the agricultural robot may encounter in real scenes; this increases the information the data set can effectively provide, strengthens the algorithm model's resistance to interference, and raises the number of images in the data set to 8000.
The specific data enhancement operations mainly include image rotation, brightness adjustment, size scaling and noise superposition. Image rotation corresponds to the camera tilting when the road is bumpy; brightness adjustment corresponds to different light intensities at different times; size scaling corresponds to the fields of view of different cameras; and noise superposition represents disturbance of the camera. Each operation is applied to the original training set at random with a probability of 50%, expanding the training set to 8000 images. The experiments below show that 8000 images are sufficient for the algorithm design: the trained algorithm generalizes well during testing and no overfitting occurs.
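The following is a minimal Python/OpenCV sketch of the four enhancement operations described above, each applied with 50% probability; the parameter ranges (rotation angle, brightness factor, scale, noise level) are illustrative assumptions, not values taken from the patent.

```python
import random
import numpy as np
import cv2

def augment(image: np.ndarray, label: np.ndarray):
    """Randomly apply rotation, brightness change, scaling and Gaussian noise (50% each)."""
    h, w = image.shape[:2]
    if random.random() < 0.5:                              # rotation: camera tilt on bumpy roads
        m = cv2.getRotationMatrix2D((w / 2, h / 2), random.uniform(-10, 10), 1.0)
        image = cv2.warpAffine(image, m, (w, h))
        label = cv2.warpAffine(label, m, (w, h), flags=cv2.INTER_NEAREST)
    if random.random() < 0.5:                              # brightness: different times of day
        image = np.clip(image.astype(np.float32) * random.uniform(0.6, 1.4), 0, 255).astype(np.uint8)
    if random.random() < 0.5:                              # scaling: different camera fields of view
        s = random.uniform(0.8, 1.2)
        image = cv2.resize(image, (int(w * s), int(h * s)))
        label = cv2.resize(label, (int(w * s), int(h * s)), interpolation=cv2.INTER_NEAREST)
    if random.random() < 0.5:                              # noise: sensor disturbance
        noise = np.random.normal(0, 8, image.shape).astype(np.float32)
        image = np.clip(image.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return image, label
```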
For steps S200 and S300, the step of extracting features from the image to be detected based on the image recognition model includes:
performing several convolution operations and downsampling operations on the input picture;
the feature extraction process is a binary-classification extraction process, and the extracted content is divided into road and background.
The depthwise separable convolution model decouples the convolution operation in the channel dimension from that in the spatial dimension and computes them separately; the mapping operation on the spatial dimension alone is performed as the initial operation step.
Further, the operation of the depthwise separable convolution model includes:
widening the number of feature-map channels based on 1 × 1 pointwise convolution;
when the feature-map channels are widened, copying and superimposing are performed based on the original feature map.
Specifically, the superposition process of the feature map comprises a depthwise separable convolution step and a global average pooling weighting step.
When constructing a semantic segmentation model, the input picture is usually convolved and downsampled several times in order to extract higher-dimensional feature information, reducing the feature-map size and enlarging the receptive field of the convolution kernels. A semantic segmentation model generally downsamples the input image 4-5 times so that the feature maps carry richer information and the algorithm can classify fine details of the image. Here the objective is narrower: the algorithm model only needs to classify and extract the orchard image, i.e. it is a binary classification task, the path edges are blurred, and only the outline of the path needs to be extracted.
Furthermore, considering the real-time requirement of the segmentation task in visual navigation, and since the image pixels are only classified into two classes, the required network capacity is relatively small, so the network structure is cut in depth and only 3 downsampling and 3 upsampling operations are applied to the original image, which helps raise the operating speed of the network. Reducing the network depth, however, shrinks the overall receptive field, which hinders the extraction of spatial information from the image; the convolution kernels are therefore modified by substituting dilated (atrous) convolution, so that the overall receptive field is not reduced by the shallower network.
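A small PyTorch illustration of the substitution described above (the framework and channel counts are assumptions, not taken from the patent): a 3 × 3 convolution with dilation 2 covers a 5 × 5 window with the same number of weights, so the receptive field need not shrink when the network is made shallower.

```python
import torch
import torch.nn as nn

# A plain 3x3 convolution sees a 3x3 window; the dilated version sees a 5x5 window
# with the same 3x3 weights, compensating for the shallower (3-downsample) encoder.
standard_3x3 = nn.Conv2d(16, 16, kernel_size=3, padding=1)
dilated_3x3 = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 16, 64, 64)
assert standard_3x3(x).shape == dilated_3x3(x).shape   # spatial size is preserved
```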
In particular, depthwise separable convolution is another effective measure in the algorithm design to reduce the network computation. A conventional convolution kernel is a three-dimensional filter that performs a joint mapping over the channel dimension and the spatial dimension of the image. When the kernel is large and the number of output channels is large, many parameters are required. Assuming the convolution kernel size is k × k, the output feature map size is c × c, the number of input channels is n and the number of output channels is m, the computation s1 required for one conventional convolution is:

s1 = n × k × k × m × c × c

Depthwise separable convolution decouples the convolution in the channel dimension from that in the spatial dimension and computes them separately, first performing the mapping over the spatial dimension alone. Compared with conventional convolution, each channel of the input feature map is convolved only with the corresponding channel of the kernel, and the number of output channels equals the number of input channels. To change the number of output channels and let feature-map data interact across the channel dimension, a conventional convolution with a 1 × 1 kernel (pointwise convolution) is usually applied at the output stage. Under the same assumptions (kernel size k × k, output feature map c × c, n input channels, m output channels), the computation s2 required for one depthwise separable convolution plus pointwise convolution is:

s2 = n × k × k × c × c + n × m × c × c

The ratio of the two computation amounts is:

s2 / s1 = (n × k × k × c × c + n × m × c × c) / (n × k × k × m × c × c) = 1/m + 1/k²

It can be seen from this ratio that the kernel size has a large influence on the computation amount. In this design, all convolution layers except the input and output layers are replaced with depthwise separable convolutions with a 3 × 3 kernel, so the computation of each replaced layer is reduced to roughly 1/9 (when the number of output channels m is large, 1/m is small compared with 1/k² = 1/9).
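A minimal PyTorch sketch of a depthwise separable convolution followed by a pointwise convolution, together with the computation ratio derived above; the module and helper names are illustrative, not the patent's exact layers.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

def computation_ratio(k: int, m: int) -> float:
    """s2 / s1 = 1/m + 1/k^2 from the formulas above."""
    return 1 / m + 1 / k ** 2

print(computation_ratio(k=3, m=64))   # with a 3x3 kernel, roughly 1/9 of the standard cost
```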
It should be noted that, to reduce the parameter count, the algorithm design replaces conventional convolution with depthwise separable convolution. To widen the number of channels in the network, the number of feature-map channels would normally be adjusted with a 1 × 1 pointwise convolution. However, 1 × 1 convolution occupies a large amount of memory and increases the number of floating-point operations (FLOPs); in particular, when a 1 × 1 convolution changes the channel count, the memory-access cost rises. Given that the feature maps within a layer contain a large amount of redundancy, channel widening is instead achieved by copying and superimposing the original feature map. To reduce memory access during the 1 × 1 convolution, its channel count is left unchanged at the output, and the features are copied and superimposed onto the output feature map, thereby widening the number of feature-map channels.
Assuming that the number of final output channels is twice the number of input channels (i.e., m = 2n) and the feature map size is w × h, the FLOPs for widening the channels with a pointwise convolution and with feature copying are respectively:

Pointwise convolution: FLOPs = w × h × n × m = w × h × n × 2n

Feature copy superposition: essentially no multiply-accumulate operations, since the added channels are duplicated rather than recomputed

It can be seen that feature copy superposition effectively reduces the FLOPs, and the reduction grows with the channel-widening multiple.
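A minimal sketch of widening the channel count by feature copying instead of a 1 × 1 pointwise convolution; PyTorch and the function name are assumptions for illustration.

```python
import torch

def widen_by_copy(x: torch.Tensor, factor: int = 2) -> torch.Tensor:
    """Duplicate the feature map along the channel dimension instead of using a
    1x1 convolution, so no multiplications are spent on widening."""
    return torch.cat([x] * factor, dim=1)

x = torch.randn(1, 32, 60, 80)
print(widen_by_copy(x).shape)   # torch.Size([1, 64, 60, 80])
```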
This design greatly reduces the number of algorithm parameters, but because the features are copied directly, some feature-layer information may be missed during extraction and the final training result suffers; this is confirmed in the experimental comparison below.
To make fuller use of the copied portion of the feature map, it is additionally transformed during the design so that the overall feature map carries richer information. The transformations are (a) computing the copied portion with a depthwise separable convolution and (b) weighting the copied portion after global average pooling; both enrich the superimposed feature map at the cost of only a small amount of extra computation. The experiments show that both improvements contribute to better algorithm results, as sketched below.
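The two transformations can be sketched as follows in PyTorch (module names and layer settings are illustrative assumptions): variant 1 passes the copied portion through a cheap depthwise convolution, and variant 2 re-weights it with global-average-pooled channel weights before concatenation.

```python
import torch
import torch.nn as nn

class CopyWidenDW(nn.Module):
    """Variant 1: the appended channels are a depthwise-convolved view of the input
    rather than a literal copy."""
    def __init__(self, ch: int):
        super().__init__()
        self.dw = nn.Conv2d(ch, ch, kernel_size=3, padding=1, groups=ch)

    def forward(self, x):
        return torch.cat([x, self.dw(x)], dim=1)

class CopyWidenGAP(nn.Module):
    """Variant 2: the appended channels are the input re-weighted by channel weights
    obtained from global average pooling."""
    def __init__(self, ch: int):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))                  # global average pooling + weighting
        return torch.cat([x, x * w[:, :, None, None]], dim=1)
```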
In summary, the overall network design is based on an encoding-decoding structure, and the original image is downsampled and upsampled three times, as shown in fig. 4. Except for the input and output layers, every convolution layer is replaced with a depthwise separable convolution plus pointwise convolution to reduce the parameter count and computation. On this basis feature reuse is introduced: feature superposition is used to widen the number of feature-map channels and to reduce the use of 1 × 1 convolution. To enrich the information of the superimposed feature maps, the copied part of the feature layer is processed either with a depthwise separable convolution or with global-average-pooling weighting, further improving the network's use of the image's feature information so that it can segment paths more accurately in a complex orchard environment.
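A structural sketch of such a network is given below: an encoding-decoding outline in PyTorch with three downsampling and three upsampling stages, depthwise separable convolutions with dilation, and two output classes. The channel counts and layer settings are illustrative assumptions, not the patented model's exact configuration.

```python
import torch
import torch.nn as nn

def dsconv(in_ch: int, out_ch: int, dilation: int = 1) -> nn.Sequential:
    """Depthwise separable 3x3 block with batch norm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=dilation, dilation=dilation, groups=in_ch),
        nn.Conv2d(in_ch, out_ch, 1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TinyRoadSegNet(nn.Module):
    """Encoder-decoder sketch: 3 downsampling and 3 upsampling stages, two classes."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, padding=1)          # conventional conv at the input
        self.enc1 = nn.Sequential(nn.MaxPool2d(2), dsconv(16, 32, dilation=2))
        self.enc2 = nn.Sequential(nn.MaxPool2d(2), dsconv(32, 64, dilation=2))
        self.enc3 = nn.Sequential(nn.MaxPool2d(2), dsconv(64, 64, dilation=2))
        self.dec3, self.dec2, self.dec1 = dsconv(64, 64), dsconv(64, 32), dsconv(32, 16)
        self.head = nn.Conv2d(16, num_classes, 1)           # conventional conv at the output
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

    def forward(self, x):
        x = self.enc3(self.enc2(self.enc1(self.stem(x))))
        x = self.dec1(self.up(self.dec2(self.up(self.dec3(self.up(x))))))
        return self.head(x)

print(TinyRoadSegNet()(torch.randn(1, 3, 240, 320)).shape)  # torch.Size([1, 2, 240, 320])
```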
Step S400 is a result output step, in which the determination process of the positional relationship is a core step, and the determination step of the positional relationship includes:
an algorithm network of an encoding-decoding structure extracts the features of the image to be detected and compresses the resolution of the image to obtain a feature map;
decoding the feature maps at different stages based on a decoder and performing up-sampling operation;
and restoring the decoded image to be detected into the resolution of the original image, and determining the spatial position information of the image to be detected.
When identifying paths in a complex orchard environment, obtaining the spatial position relationship of the path within the image is essential for accurately judging the robot's pose on the path during subsequent visual navigation. To better obtain the positional information of pixels in image space, an algorithm network with an encoding-decoding structure is proposed: the encoder extracts features from the original image and compresses its resolution to obtain high-dimensional abstract feature information, while the decoder decodes the feature maps at different stages and performs upsampling, restoring the segmentation result to the resolution of the original image and thereby recovering the spatial position information of the image.
To verify the feasibility of the network designed here and its recognition of paths in a complex environment, the network was implemented and subjected to training tests; the specific experimental environment and configuration are shown in Table 3.
TABLE 3 Experimental Environment and configuration
During network training, the initialization method and the training strategy strongly affect the result. The main network initialization methods at present include Xavier initialization and MSRA initialization. A suitable initialization method prevents problems such as vanishing or exploding gradients during training, so that the algorithm can extract features better. In the algorithm design the nonlinear function ReLU is used as the activation function, calculated as:

ReLU(x) = max(0, x)

In each layer of the network, so that the inputs and outputs follow as similar a distribution as possible, the MSRA initialization method is used to initialize the algorithm parameters.
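A minimal sketch of MSRA (Kaiming) initialization matched to the ReLU activation, assuming a PyTorch implementation:

```python
import torch.nn as nn

def msra_init(module: nn.Module) -> None:
    """Apply Kaiming (MSRA) initialization to convolution weights so that, with ReLU,
    layer inputs and outputs keep a similar distribution."""
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
```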
the learning rate (lr) refers to the amplitude of the back-propagation update of the parameters during network training. The training of the network can be accelerated by a larger learning rate, but the problem of jitter can also be brought; a smaller learning rate may extend the training time of the network and cause overfitting problems. In the network training process, the learning rate is dynamically adjusted, and the network is guided to train better to a certain extent. In the experiment, the learning rate is mainly adjusted through the training loss value condition, and the specific expression is as follows:
[Learning-rate adjustment rule driven by the training loss; given only as an image formula in the original document.]
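Since the exact rule survives only as an image, the sketch below shows one common loss-driven schedule, PyTorch's ReduceLROnPlateau, as an assumed stand-in: it lowers the learning rate when the training loss stops improving. The model, optimizer and hyper-parameters are placeholders.

```python
import torch

model = torch.nn.Conv2d(3, 2, 3, padding=1)            # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)      # halve lr when the loss plateaus

# inside the training loop, after each epoch:
#     scheduler.step(epoch_loss)
```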
in the training process, the Dice Loss and the binary cross entropy Loss are selected for combination, and the specific calculation is as follows:
Dice Loss = 1 - 2·Σ(yc·pc) / (Σyc + Σpc)

BCE Loss = -Σ[ yc·log(pc) + (1 - yc)·log(1 - pc) ]

Loss = Dice Loss + BCE Loss
where yc denotes the true label and pc denotes the predicted label.
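A minimal sketch of a combined Dice plus binary cross-entropy loss for the two-class task, assuming PyTorch and an equal-weight sum of the two terms (the exact weighting used in the patent is given only as an image formula):

```python
import torch
import torch.nn.functional as F

def dice_bce_loss(pred_logits: torch.Tensor, target: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Sum of Dice loss and binary cross-entropy; pred_logits and target are (N, 1, H, W),
    with target holding 0/1 path labels."""
    t = target.float()
    p = torch.sigmoid(pred_logits)
    bce = F.binary_cross_entropy_with_logits(pred_logits, t)
    dice = 1 - (2 * (p * t).sum() + eps) / (p.sum() + t.sum() + eps)
    return bce + dice
```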
In line with the experimental objective, the pixel classification accuracy (PA) and the intersection-over-union (IoU), standard semantic segmentation evaluation metrics, are used as the main evaluation indexes of the experiment. They are calculated as follows:
PA = Σi pii / Σi Σj pij

IoU = pii / ( Σj pij + Σj pji - pii )
and pii represents the total number of pixels with the actual type i and the predicted type i, and pij represents the total number of pixels with the actual type i and the predicted type j.
The algorithm model is constructed according to the lightweight strategy, and the feature-reuse part is improved on that basis, namely by applying one depthwise separable convolution to the copied portion and by weighting it with the result of global average pooling, which yields two improved networks. These are compared with a network that widens the number of feature-map channels with 1 × 1 pointwise convolution, in order to verify and analyse whether an algorithm model using feature reuse can extract the image information effectively and how it performs.
For convenience, in the following description base_net denotes the basic algorithm model designed here; base_net_dc denotes the basic model with a depthwise separable convolution applied to the reused feature layer; base_net_ga denotes the basic model with global-average-pooling weighting applied to the reused feature layer; and base_net_1x1 denotes the model that widens the channel count with 1 × 1 pointwise convolution. Their design parameters and training results are shown in Table 4.
TABLE 4 network model parameters
Table 4 lists the four designed algorithm network models and compares their parameter counts and computation amounts. It can be seen from Table 4 that after the models are constructed with the lightweight strategy, the parameter count of every algorithm is effectively reduced and stays between 40,000 and 50,000; using feature reuse reduces the parameter count further. In terms of computation, widening the channels by feature reuse instead of 1 × 1 pointwise convolution lowers the computation amount; the exception is the global-average-pooling weighting variant, which uses a fully connected layer and therefore achieves a comparatively small reduction in parameters and computation.
In terms of pixel classification accuracy and intersection-over-union, the model that directly superimposes copied feature layers shows lower PA and IoU than the model that widens the channels with 1 × 1 pointwise convolution. The main reason is that directly copying and superimposing the feature layer causes some feature information to be missed during extraction, so the effective information of the feature layer is not fully used. When the algorithm is improved in the two ways described above, applying a depthwise separable convolution to the copied portion or weighting it after global average pooling, the effective information of the feature layer is enriched and its extraction is strengthened: PA improves by 0.29%-0.98% over the original value and IoU by 0.07%-1.63%. In other words, while still reducing parameters and computation, the improved network models achieve higher pixel classification accuracy and intersection-over-union than the model that widens the channels with 1 × 1 pointwise convolution.
During training, the reduced parameter count makes the algorithm model harder to train, as shown in figs. 4 to 6. The accuracy and intersection-over-union curves show that in the early stage of training PA and IoU rise quickly but fluctuate strongly, and a stable result is reached only after many iterations.
For testing, images that do not appear in the training set were selected, mainly of three types: a normal orchard path, a path with weed and fallen-leaf interference on the road surface, and a path with light-spot interference and uneven illumination. The test results (fig. 7) show that all of the above algorithms can extract the main part of the path and filter most of the interference; the differences appear mainly at the path edges rather than in the main body. The network that widens the channel count with 1 × 1 pointwise convolution misidentifies or fails to identify edge details, especially under interference; for example, in the base_net_1x1 result of fig. 7, most of the unrecognized path portions appear on the left and right sides, and under light interference unrecognized portions appear on the right side. When feature superposition replaces 1 × 1 pointwise convolution to widen the channels, the reduced parameter count causes the algorithm to miss some image feature information during extraction, and the same problems appear in the test results, for example the unrecognized regions of base_net in fig. 7 and the misidentification in its second image.
In the tests of the algorithm models with the improved superimposed feature layers, these problems are effectively reduced and the unstructured path edges are fitted more effectively. For example, for base_net_dc in fig. 7, part of the area on the left cannot be recognized when fallen-leaf interference is present, but the edge on the right side of the path is fitted better and no misidentification occurs. Of the two improved feature-superposition methods, the network that applies global-average-pooling weighting to the superimposed part gives the better test results, with fewer cases of missed or incorrect recognition and a stronger ability to fit the path edge, as shown by the test results for the three scenes in fig. 7.
The current mainstream lightweight networks include MobileNetUNet and ENet. These networks are made lightweight mainly to ease deployment on mobile terminals, but they target instance segmentation tasks that require large model capacity, so their parameter counts remain large. The algorithm model designed here targets the path recognition task in an unstructured orchard environment, where only path and background need to be distinguished; the required capacity is small, giving it a clear advantage in parameter reduction.
As shown in fig. 8, in terms of parameter count and computation the proposed algorithm network model base_net_ga has a great advantage over MobileNetUNet and ENet: its parameters and computation are only 11%-12% of those algorithms, its memory and computation requirements are low, it is easier to deploy on the agricultural robot's terminal, and it lowers the hardware threshold. In terms of accuracy and intersection-over-union the difference from the other algorithms is small, with pixel classification accuracy around 95% and intersection-over-union around 90%, which meets the requirement for extracting path information, as shown in fig. 9.
Meanwhile, the problem addressed here is specifically orchard unstructured path recognition. When MobileNetUNet and ENet are trained on it, their large model capacity makes overfitting likely during training and their generalization in testing poor. The capacity of the algorithm model proposed here is better matched to the problem, and with an appropriate initialization method and optimization strategy during training, every network-layer parameter can play its full role, avoiding a large amount of unnecessary computation.
In conclusion, the invention designs a path recognition algorithm for visual navigation according to the needs of an agricultural robot navigating in a complex orchard environment. The algorithm is designed for a specific application scenario, has small parameter and computation requirements, meets the needs of an agricultural robot operating in a complex orchard environment, and has strong robustness and anti-interference ability, filtering most environmental disturbances and extracting the main path information.
The work here focuses on algorithm design and training. In the design, several lightweight strategies reduce the model's parameters and computation and improve memory-access efficiency; the key feature is that the channel count of the feature layer is widened by feature reuse instead of the conventional approach of increasing the number of convolution kernels. On this basis, to preserve network capacity and the richness of the feature layer, two improvements are made to the feature-reuse part at the cost of a small amount of extra computation, giving a better recognition result. In training, the originally collected data set contains only 4600 images and can hardly cover every condition in an orchard environment, so to strengthen the network's generalization and anti-interference ability the existing data set is augmented according to the variation of orchard-environment characteristics, expanding its size. Finally, the trained network is tested and compared with current mainstream lightweight networks: with accuracy and intersection-over-union nearly the same, its parameter count and computation are 11%-12% of the other algorithm models, improving the computational efficiency of the algorithm.
At present, semantic segmentation algorithms based on deep learning show excellent processing ability in many respects. The next step is to continue exploring the network design and to fold more recognition tasks into a single algorithm model, for example obstacle avoidance and depth-information matching during agricultural robot navigation, so as to improve the utilization efficiency of the algorithm model and make the algorithm even lighter, meeting the deployment and real-time requirements of agricultural robots operating outdoors.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. An orchard complex road segmentation method based on a lightweight semantic segmentation algorithm is characterized by comprising the following steps:
acquiring a data set in a preset range, and determining an image recognition model based on the data set;
acquiring an image to be detected in real time, and extracting features of the image to be detected based on the image recognition model to obtain a feature map;
performing a convolution operation on the feature map based on a trained depthwise separable convolution model; wherein the number of output channels of the feature map is unchanged;
and determining a path and the position relation of the path in the image to be detected according to the convolution operation result.
2. The orchard complex road segmentation method based on the light-weight semantic segmentation algorithm according to claim 1, wherein a data set is obtained within a preset range, and the step of determining an image recognition model based on the data set comprises:
acquiring a data set within a preset range; the data set comprises a training set and a test set;
preprocessing the data set; the preprocessing comprises image labeling and data enhancement;
initializing algorithm parameters based on a preset algorithm network initialization method and an activation function, and training the initialized algorithm based on a training set;
and testing the trained algorithm according to the test set to determine an image recognition model.
3. The orchard complex road segmentation method based on the light-weight semantic segmentation algorithm according to claim 2, characterized in that Labelme software is adopted for image annotation, and an annotated image in png format is generated on the basis of original image annotation; the annotated image is generated by carrying out reassignment on different image channels based on the annotation of the corresponding original image.
4. The orchard complex road segmentation method based on the light-weight semantic segmentation algorithm according to claim 2, characterized in that the data enhancement step comprises image rotation, brightness adjustment, size scaling and noise superposition.
5. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 2 is characterized in that algorithm parameters are initialized based on a preset algorithm network initialization method and an activation function, and the algorithm network initialization method in training the initialized algorithm based on a training set comprises Xavier initialization and MSRA initialization; the activation function employs a nonlinear function ReLU.
6. The orchard complex road segmentation method based on the light-weight semantic segmentation algorithm according to claim 1, wherein the step of performing feature extraction on the to-be-detected image based on the image recognition model comprises the following steps:
carrying out convolution operation on an input picture for a plurality of times and carrying out downsampling operation;
the feature extraction process is a two-classification extraction process, and the contents extracted in the two-classification extraction process are divided into roads and backgrounds.
7. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 6, wherein the depthwise separable convolution model decouples the convolution operation in the channel dimension from that in the spatial dimension and computes them separately; wherein the mapping operation on the spatial dimension alone is performed as the initial operation step.
8. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 6 or 7, wherein the operation of the depthwise separable convolution model comprises:
widening the number of feature map channels with a 1×1 pointwise convolution;
when the feature map channels are widened, copying and superimposing on the basis of the original feature map.
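A sketch of one possible reading of claim 8: a 1×1 pointwise convolution supplies the additional channels, and cheap per-channel copies of the original feature map are superimposed (concatenated) with them, in the spirit of a Ghost-style module; the exact construction in the patent may differ, and all names and sizes below are illustrative.

import torch
import torch.nn as nn

class WidenWithCopies(nn.Module):
    # Widen the channel count with a 1x1 pointwise convolution, then stack
    # cheap depthwise "copies" derived from the original feature map on top.
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        assert out_ch > in_ch
        self.pointwise = nn.Conv2d(in_ch, out_ch - in_ch, kernel_size=1, bias=False)
        self.cheap_copy = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1,
                                    groups=in_ch, bias=False)  # per-channel copy/transform

    def forward(self, x):
        widened = self.pointwise(x)   # new channels from the 1x1 convolution
        copied = self.cheap_copy(x)   # superimposed copies of the original map
        return torch.cat([widened, copied], dim=1)

print(WidenWithCopies(32, 64)(torch.randn(1, 32, 60, 80)).shape)  # (1, 64, 60, 80)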
9. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 8, wherein the superposition of the feature maps comprises a depthwise separable convolution step and a global average pooling weighting step.
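A sketch of a global-average-pooling weighting step in the squeeze-and-excitation style, which is one common way to realise the weighting recited in claim 9; the reduction ratio, layer layout and class name are assumptions.

import torch
import torch.nn as nn

class GlobalPoolWeighting(nn.Module):
    # Global average pooling produces one weight per channel, which then
    # rescales the feature map.
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # per-channel weights broadcast over H x W

feat = torch.randn(1, 64, 30, 40)
print(GlobalPoolWeighting(64)(feat).shape)  # torch.Size([1, 64, 30, 40])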
10. The orchard complex road segmentation method based on the lightweight semantic segmentation algorithm according to claim 1, wherein the step of determining the positional relation comprises:
extracting features of the image to be detected with an algorithm network of encoder-decoder structure and compressing the image resolution to obtain feature maps;
decoding the feature maps of the different stages with the decoder and performing upsampling operations;
and restoring the decoded image to the resolution of the original image to determine the spatial position information of the image to be detected.
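Finally, a toy encoder-decoder sketch: the encoder compresses resolution while extracting features, the decoder upsamples stage by stage, and the output is restored to the input resolution so per-pixel road positions can be read off; layer sizes and names are illustrative only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoderDecoder(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(True))
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(True))
        self.dec1 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(True))
        self.dec2 = nn.Conv2d(32, num_classes, 3, padding=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        f = self.enc2(self.enc1(x))  # 1/4-resolution features
        f = F.interpolate(self.dec1(f), scale_factor=2, mode="bilinear", align_corners=False)
        f = F.interpolate(self.dec2(f), size=(h, w), mode="bilinear", align_corners=False)
        return f                     # logits at the original resolution

logits = TinyEncoderDecoder()(torch.randn(1, 3, 240, 320))
print(logits.shape)           # torch.Size([1, 2, 240, 320])
road_mask = logits.argmax(1)  # per-pixel road / background decision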
CN202210075398.XA 2022-01-22 2022-01-22 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm Pending CN114463542A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210075398.XA CN114463542A (en) 2022-01-22 2022-01-22 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210075398.XA CN114463542A (en) 2022-01-22 2022-01-22 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm

Publications (1)

Publication Number Publication Date
CN114463542A true CN114463542A (en) 2022-05-10

Family

ID=81412153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210075398.XA Pending CN114463542A (en) 2022-01-22 2022-01-22 Orchard complex road segmentation method based on lightweight semantic segmentation algorithm

Country Status (1)

Country Link
CN (1) CN114463542A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634276A (en) * 2020-12-08 2021-04-09 西安理工大学 Lightweight semantic segmentation method based on multi-scale visual feature extraction
CN113159051A (en) * 2021-04-27 2021-07-23 长春理工大学 Remote sensing image lightweight semantic segmentation method based on edge decoupling
CN113688836A (en) * 2021-09-28 2021-11-23 四川大学 Real-time road image semantic segmentation method and system based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TAHA EMARA et al.: "LiteSeg: A Novel Lightweight ConvNet for Semantic Segmentation", arXiv / DICTA *
LI YUNWU et al.: "Field road scene recognition in hilly and mountainous areas based on improved dilated convolutional neural network", Transactions of the Chinese Society of Agricultural Engineering *
MO DONGYAN et al.: "Research progress on autonomous navigation technology of orchard robots based on environment perception", Mechanical and Electrical Engineering Technology *
HAN ZHENHAO et al.: "Orchard visual navigation path recognition method based on U-Net network", Transactions of the Chinese Society for Agricultural Machinery *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024131479A1 (en) * 2022-12-21 2024-06-27 腾讯科技(深圳)有限公司 Virtual environment display method and apparatus, wearable electronic device and storage medium

Similar Documents

Publication Publication Date Title
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
Zheng et al. A mango picking vision algorithm on instance segmentation and key point detection from RGB images in an open orchard
Peng et al. S-FPN: A shortcut feature pyramid network for sea cucumber detection in underwater images
CN110765916B (en) Farmland seedling ridge identification method and system based on semantics and example segmentation
CA3129174A1 (en) Method and apparatus for acquiring boundary of area to be operated, and operation route planning method
CN113609889B (en) High-resolution remote sensing image vegetation extraction method based on sensitive characteristic focusing perception
Foedisch et al. Adaptive real-time road detection using neural networks
Yang et al. Real-time detection of crop rows in maize fields based on autonomous extraction of ROI
Cai et al. Residual-capsule networks with threshold convolution for segmentation of wheat plantation rows in UAV images
Zhan et al. Vegetation land use/land cover extraction from high-resolution satellite images based on adaptive context inference
CN114067206B (en) Spherical fruit identification positioning method based on depth image
de Silva et al. Towards agricultural autonomy: crop row detection under varying field conditions using deep learning
Ye et al. A comparison between Pixel-based deep learning and Object-based image analysis (OBIA) for individual detection of cabbage plants based on UAV Visible-light images
Tabb et al. Automatic segmentation of trees in dynamic outdoor environments
Moreno et al. Analysis of Stable Diffusion-derived fake weeds performance for training Convolutional Neural Networks
Shuai et al. An improved YOLOv5-based method for multi-species tea shoot detection and picking point location in complex backgrounds
CN117611996A (en) Grape planting area remote sensing image change detection method based on depth feature fusion
Lu et al. Citrus green fruit detection via improved feature network extraction
Yan et al. High-resolution mapping of paddy rice fields from unmanned airborne vehicle images using enhanced-TransUnet
CN114463542A (en) Orchard complex road segmentation method based on lightweight semantic segmentation algorithm
CN114581307A (en) Multi-image stitching method, system, device and medium for target tracking identification
Wang et al. Fusing vegetation index and ridge segmentation for robust vision based autonomous navigation of agricultural robots in vegetable farms
Huang et al. A survey of deep learning-based object detection methods in crop counting
Dehkordi et al. Performance Evaluation of Temporal and Spatial-Temporal Convolutional Neural Networks for Land-Cover Classification (A Case Study in Shahrekord, Iran)
Zhang et al. Roadside vegetation segmentation with adaptive texton clustering model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220510