
Unmanned mower map construction method based on semantic segmentation and depth map

Info

Publication number
CN117367402A
Authority
CN
China
Prior art keywords: robot, map, obstacle, feature, image
Prior art date
Legal status
Pending
Application number
CN202311206127.4A
Other languages
Chinese (zh)
Inventor
李沈崇
Current Assignee
Nanjing Tengya Robot Technology Co ltd
Original Assignee
Nanjing Tengya Robot Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Tengya Robot Technology Co ltd filed Critical Nanjing Tengya Robot Technology Co ltd
Priority to CN202311206127.4A
Publication of CN117367402A

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38 Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804 Creation or updating of map data
    • G01C21/3807 Creation or updating of map data characterised by the type of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Abstract

The invention provides a map construction method for an unmanned mower based on semantic segmentation combined with a depth map, comprising the following steps: S1) acquiring a target environment image and performing semantic segmentation to segment the grassland in the image; S2) detecting obstacles in the grassland where the robot is located; S3) obtaining the distance from the robot to the grassland boundary and to the obstacles according to the lawn semantic segmentation map and the obstacle detection map; S4) controlling the robot to travel along the grassland boundary according to the distance between the robot and the boundary; S5) controlling the robot to execute obstacle avoidance according to the distance between the robot and the obstacles; and S6) recording the robot paths and obstacle distance information obtained in steps S4 and S5 by GPS and constructing a map.

Description

Unmanned mower map construction method based on semantic segmentation and depth map
Technical Field
The invention relates to visual detection technology, in particular to an unmanned mower map construction method based on semantic segmentation combined with a depth map, and belongs to the technical field of artificial intelligence.
Background
In recent years, the global lawn market has continued to grow rapidly, and the industry remains in a period of high demand. The annual trimming of lawns, golf courses, stadiums and the like requires a great deal of time, money and manpower. Replacing manual mowing with unmanned mowers has become an inevitable trend of social development. However, constructing the working map of an unmanned mower has long been a difficult problem.
There are three conventional methods for building a mower working map. The first buries magnetic induction wires to delimit the working area and changes direction by random steering after an electromagnetic signal is detected. Its drawbacks are a mediocre trimming result, frequent missed patches, and the need for regular maintenance of the buried wires. The second, borrowed from low-speed unmanned driving, relies on multiple sensors such as ultrasonic and infrared sensors, but is costly. The third uses GPS positioning: before normal operation the mower is driven once around the working boundary so that the map information, including the working boundary and fixed obstacle boundaries, is recorded in advance, but this work is too tedious and time-consuming.
The inventors searched the prior art. CN113115622 ("Vision robot obstacle avoidance control method and device, and mowing robot") is similar in its obstacle detection, but it looks up a mowing strategy matched to the shape and size of the obstacle in a preset strategy database, whereas the present application uses binocular ranging to measure the distance to the obstacle and to the lawn boundary and does not involve shape-matched mowing strategies. CN109601109 ("Anti-collision method of an unmanned mowing vehicle based on binocular vision detection") also adopts binocular detection, but the present application combines depth-map ranging with semantic segmentation from computer vision and detects the lawn boundary, i.e. the robot's range of movement, more accurately.
Disclosure of Invention
To address the problems that constructing a mower working-area map in the prior art requires manual marking and that the process is complex and tedious, this application provides an efficient, high-precision method for constructing a mower working-area map.
The application provides an unmanned mower map construction method based on semantic segmentation combined with a depth map, characterized by comprising the following steps:
s1) acquiring a target environment image, performing semantic segmentation, and segmenting grasslands in the image;
s2) detecting obstacles in the grassland where the robot is located;
s3) obtaining boundary distance information and obstacle distance information of the robot and the grassland according to the lawn semantic segmentation map and the obstacle detection map;
s4) controlling the robot to run around the grassland boundary according to the distance information of the robot and the grassland boundary;
s5) controlling the robot to execute obstacle avoidance operation according to the distance information of the robot and the obstacle;
s6) recording the robot path and obstacle distance information obtained in the step S4 and the step S5 through a GPS, and constructing a map.
Preferably, the semantic segmentation step of the target environment image in step S1 includes:
s11) data preprocessing: splitting the obtained RGB-D data, performing image correction on the RGB image therein, then applying random augmentations, and finally applying normalization to obtain the training data required by the model;
s12) segmenting the dataset and making category labels: the obtained RGB image is made into a lawn segmentation dataset for the lawn scene, and the lawn area and other category labels are designed;
s13) improving a lightweight MobileNetV2 backbone network: embedding an efficient channel attention mechanism in the stride-1 sampling stages, deleting the final average pooling layer and fully connected layer of the original network, and finally generating a CxHxW feature layer;
s14) pyramid pooling layer extracts the overall features: the CxHxW feature layer generated in the previous step is passed through average pooling layers to produce feature layers of sizes Cx1x1, Cx2x2 and Cx4x4; these three feature layers are upsampled back to the original CxHxW size by bilinear interpolation and stacked with the 1x1-convolution result of the original feature layer to obtain a 3CxHxW feature layer, and finally a CxHxW feature layer is output through a 3x3 convolution;
s15) feature fusion: the feature map generated by decoding is fused with the feature layer output by the pyramid in step S14, or with the feature layer generated in step S13, and a feature map is output; the weights of the two feature layers are adaptively calculated to obtain a better fusion, and inputting the preprocessed picture yields the corresponding semantic segmentation result;
s16) target distance mapping
Combining the semantic segmentation result with the original depth map, sampling the depth map and the segmentation result to the same size, obtaining the pixel coordinate set of the target area from the semantic segmentation result, summing and averaging the depth-map pixel values at those coordinates, and dividing by a scale coefficient to obtain the actual distance of each obstacle;
the mapping method is that each pixel in a label corresponding depth map of a segmentation result map is multiplied by a distance scale to obtain an actual measurement distance, the parallel distance between each display point and a depth camera is obtained according to the mapping of the pixel position, and then all the parallel distances in the label are averaged to obtain a final distance.
Preferably, the step S11) of preprocessing data includes:
s111) converting the obtained RGB image into HSV format, wherein the specific conversion formulas are as follows:
Cmax = max(R, G, B)
Cmin = min(R, G, B)
Δ = Cmax - Cmin
V = Cmax
wherein: R, G, B are the pixel values of the red, green and blue channels of the picture; H, S, V are the hue, saturation and brightness values of the picture; V' is the adjusted V; V̄ is the mean value of V; I(X, Y) is the value after three Gaussian convolutions; and X, Y are pixel coordinates;
s112) passing the V channel through three consecutive Gaussian convolutions with kernel size 7 to extract the low-frequency information of the picture, feeding the result into the gamma function γ, recombining with the original H and S to form a new HSV image, and converting back to RGB to obtain the corrected RGB image.
Preferably, the step of S15 feature fusion comprises:
s151) inputting two feature images with different sizes, and interpolating the small-size feature image to the size of the large feature image by a bilinear interpolation up-sampling method;
s152) applying a 1x1 convolution to the feature map upsampled in step S151 to obtain a two-dimensional matrix of the same size, and applying a 1x1 convolution directly to the large-size feature map to obtain another two-dimensional matrix;
s153) obtaining the weight matrix of the two feature maps from the previous step through element-wise matrix multiplication and a Sigmoid function; the weight matrix multiplies the upper branch (the interpolation result of the small-size feature map), (1 - weight matrix) multiplies the lower branch (the large-size feature map), and the two resulting matrices are added to obtain the output value.
Preferably, in step S6, when only the unreachable areas and the cut areas remain on the map, the map unit saves the GPS information of the unreachable areas, divides them into 20 cm x 20 cm grids aligned with the map, and synchronizes the grids to the map saved in memory, thereby obtaining the complete map information.
Preferably, in step S6, the grid cells through which the mower travels are recorded as cut areas, and cells not passed through are uncut areas;
1) When static obstacles such as shrubs, flower beds and trees are encountered while travelling, the mower is controlled to travel to the obstacle boundary according to the obstacle category information and obstacle distance information transmitted by the image processing unit; the robot then travels around the obstacle while recording GPS information, the traversed points are set as unreachable points, and the area enclosed by their connecting line is set as an unreachable area;
2) When dynamic obstacles whose positions change at any time, such as pedestrians, cats and dogs, are encountered while travelling, the robot executes an obstacle-detour instruction according to the obstacle category information and obstacle distance information transmitted by the image processing unit; using binocular ranging, the robot keeps a safe distance of 3 grid cells, i.e. 0.6 m, from the obstacle and preferentially travels toward the uncut area to detour around the dynamic obstacle.
The invention also discloses an unmanned mower map construction model based on semantic segmentation and a depth map, characterized by comprising:
the data preprocessing module is used for performing image correction on the RGB image of the target environment, applying random augmentations and performing normalization;
the pyramid pooling module is used for aggregating context information from different regions and extracting the overall features;
the feature fusion module is used for fusing the feature map generated by decoding with the feature layer output by the pyramid pooling module, or with the feature layer output by the average pooling layer, the weights of the two feature layers being adaptively calculated to obtain a better fusion;
and the target distance mapping module is used for calculating the measured distance from the regions of the other category labels to the robot.
The invention also discloses a vision robot obstacle avoidance control device, which is arranged on the robot and comprises:
1) The image acquisition unit, comprising left and right cameras mounted on the two sides of the robot head, is used for transmitting the acquired image information to the image processing unit for obstacle identification;
2) The image processing unit is used for processing the image information transmitted by the image acquisition unit so as to detect grassland and obstacle information in the grassland environment where the robot is positioned;
3) The post-processing unit is used for processing the grassland boundary and obstacle distance information and sending a robot control instruction;
4) The execution unit is used for controlling the robot to bypass the obstacle and executing mowing operation according to the instruction of the post-processing unit;
5) And the map unit is used for recording GPS information and recording lawn boundaries and static obstacle information.
The invention also discloses an unmanned mowing robot adopting the above vision robot obstacle avoidance control device.
The beneficial technical effects are as follows: the unmanned mower map construction method based on binocular vision detection requires no excessive manual intervention or later maintenance, reducing cost and improving efficiency.
Drawings
Fig. 1 is a flowchart of image correction in embodiment 1;
FIG. 2 is a schematic diagram of the attention mechanism in example 1;
FIG. 3 is a schematic view showing the processing of the feature image through the pyramid pooling layer in example 1;
FIG. 4 is a feature fusion diagram of the feature map in example 1;
FIG. 5 is a schematic diagram of the feature map of example 1 subjected to pyramid pooling and feature fusion;
fig. 6 is a method of automatically constructing map boundaries in embodiment 1;
FIG. 7 is a method of detecting the distance between a robot and a lawn boundary in embodiment 1;
FIG. 8 is a method of maintaining the robot orientation parallel to the lawn boundary in example 1;
fig. 9 is a schematic diagram showing the identification of lawns and other types of labels in example 1.
Fig. 10 is a schematic diagram showing the comparison of RGB image correction in example 1.
Fig. 11 is a schematic diagram of a feature fusion calculation process in embodiment 1.
A method of identifying an obstacle update map in embodiment 2 of fig. 12.
Description of the embodiments
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
Example 1
This embodiment provides an unmanned mower map construction method based on semantic segmentation and a depth map. When the robot enters the working grassland environment for the first time, as shown in fig. 6, the execution steps include:
s1) recognizing the lawn, collecting image information of a target environment by a camera, transmitting the image information to an image processing unit, and performing semantic segmentation on the lawn by the image processing unit through key frames transmitted by the camera to obtain a semantic segmentation map of the lawn. When the front is detected to be the lawn, the robot always runs forward, and when the lawn mower approaches the lawn boundary, the image processing unit detects the distance between the robot and the lawn boundary: the semantic segmentation step comprises the following steps:
s11) data preprocessing: as shown in fig. 1, the RGB image of the target environment is first corrected; random augmentations such as random cropping, random flipping and random Gaussian noise are then applied; finally normalization is applied to obtain the training data required by the model. The step S11) of data preprocessing comprises the following steps:
s111) converting the obtained RGB image into HSV format, the specific conversion formulas being:
Cmax = max(R, G, B);
Cmin = min(R, G, B);
Δ = Cmax - Cmin;
V = Cmax;
wherein: R, G, B are the pixel values of the red, green and blue channels of the picture; H, S, V are the hue, saturation and brightness values of the picture; V' is the adjusted V; V̄ is the mean value of V; I(X, Y) is the value after three Gaussian convolutions; and X, Y are pixel coordinates;
s112) passing the V channel through three consecutive Gaussian convolutions with kernel size 7 to extract the low-frequency information of the picture, feeding the result into the gamma function γ, recombining with the original H and S to form a new HSV image, and converting back to RGB to obtain the corrected RGB image; the images before and after correction are shown in panels a, a-1, b and b-1 of FIG. 10.
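For concreteness, the correction step can be sketched as follows. This is a minimal illustration assuming OpenCV and NumPy; the patent does not spell out the exact gamma function γ, so a common mean-relative adaptive gamma is used here as a placeholder, and the function name correct_rgb is hypothetical.

```python
import cv2
import numpy as np

def correct_rgb(img_bgr: np.ndarray) -> np.ndarray:
    # Split HSV; only the brightness channel V is adjusted.
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    h, s, v = cv2.split(hsv)
    v_norm = v / 255.0

    # Three consecutive 7x7 Gaussian convolutions extract the low-frequency
    # illumination I(x, y) of the V channel, as described in step S112.
    low = v_norm
    for _ in range(3):
        low = cv2.GaussianBlur(low, (7, 7), 0)

    # Placeholder adaptive gamma (assumption, not the patent's exact formula):
    # regions darker than the mean brightness get an exponent below 1 and are
    # lifted; brighter regions are compressed.
    mean_v = max(float(v_norm.mean()), 1e-6)
    gamma = np.power(0.5, (mean_v - low) / mean_v)
    v_corr = np.clip(np.power(v_norm, gamma) * 255.0, 0.0, 255.0)

    # Recombine with the original H and S, then convert back from HSV.
    out = cv2.merge([h, s, v_corr.astype(np.float32)])
    return cv2.cvtColor(out.astype(np.uint8), cv2.COLOR_HSV2BGR)
```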
S12) segmenting the dataset and making category labels: the obtained RGB-D data are made into a lawn segmentation dataset for the lawn scene, and the lawn area and other categories such as trees, stones and roads are labelled, as shown in fig. 9.
S13) improving the lightweight MobileNetV2 backbone network to generate a CxHxW feature layer, as shown in fig. 2: to use the computational power of the mobile device efficiently, an improved lightweight MobileNetV2 backbone network is chosen: an efficient channel attention mechanism is embedded in the stride-1 sampling stages so that the model selects the features of interest, and the network's final average pooling layer and fully connected layer are deleted.
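The patent does not give the exact structure of the attention block; the sketch below shows one common form of efficient channel attention (ECA-style, in PyTorch) that matches the description of re-weighting channels so the model attends to features of interest. The class name and the kernel size k_size = 3 are assumptions.

```python
import torch
import torch.nn as nn

class EfficientChannelAttention(nn.Module):
    """ECA-style gate: global average pool, 1D conv across channels, sigmoid."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)                         # (N, C, H, W) -> (N, C, 1, 1)
        y = y.squeeze(-1).transpose(-1, -2)      # (N, 1, C): channels as a sequence
        y = self.conv(y)                         # local cross-channel interaction
        y = self.sigmoid(y).transpose(-1, -2).unsqueeze(-1)  # back to (N, C, 1, 1)
        return x * y                             # re-weight each channel
```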
S14) pyramid pooling layer extracts the overall features: context information from different regions is aggregated through the pyramid pooling layer, providing globally based guiding information. Although the features extracted by the backbone network are rich, their cross-region context expression is deficient, and the pyramid pooling module extracts the overall features better. The CxHxW feature layer generated in the previous step is passed through average pooling layers to produce feature layers of sizes Cx1x1, Cx2x2 and Cx3x3, which are upsampled back to CxHxW by bilinear interpolation and stacked with the 1x1-convolution result of the original feature layer to obtain a 3CxHxW feature layer; finally a CxHxW feature layer is output through a 3x3 convolution, as shown in fig. 3.
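A minimal PyTorch sketch of such a pyramid pooling layer follows. The pooled sizes 1x1, 2x2 and 3x3 are taken from this paragraph; the channel bookkeeping is an assumption (stacking three pooled branches with the 1x1-convolved input gives 4C channels here rather than the 3C stated in the text, and the final 3x3 convolution maps back to C).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Pool to 1x1/2x2/3x3, upsample back to HxW, stack with a 1x1-convolved
    copy of the input, and fuse everything with a 3x3 convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.fuse = nn.Conv2d(4 * channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [self.local(x)]
        for size in (1, 2, 3):
            pooled = F.adaptive_avg_pool2d(x, size)          # C x size x size
            feats.append(F.interpolate(pooled, size=(h, w),
                                       mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))            # back to C x H x W
```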
S15) feature fusion: the feature map generated by decoding is fused with the feature layer output by the pyramid in step S14, or with the feature layer generated in step S13, and a feature map is output; the weights of the two feature layers are adaptively calculated to obtain a better fusion, and inputting the preprocessed picture yields the corresponding semantic segmentation result. As shown in fig. 4 and fig. 11, the step of S15 feature fusion comprises the following steps:
s151) inputting two feature images with different sizes, and interpolating the small-size feature image to the size of the large feature image by a bilinear interpolation up-sampling method;
s152) applying a 1x1 convolution to the feature map upsampled in step S151 to obtain a two-dimensional matrix of the same size, and applying a 1x1 convolution directly to the large-size feature map to obtain another two-dimensional matrix;
s153) obtaining the weight matrix of the two feature maps from the previous step through element-wise matrix multiplication and a Sigmoid function; the weight matrix multiplies the upper branch (the interpolation result of the small-size feature map), (1 - weight matrix) multiplies the lower branch (the large-size feature map), and the two resulting matrices are added to obtain the output value.
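The weighted fusion of steps S151-S153 can be sketched as below (PyTorch; the class name is assumed): both maps are projected to one channel by 1x1 convolutions, multiplied element-wise, passed through a Sigmoid to give the weight map w, and the output is w times the upsampled small map plus (1 - w) times the large map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Blend two feature maps with a per-pixel weight: w*small_up + (1-w)*large."""
    def __init__(self, channels: int):
        super().__init__()
        self.proj_small = nn.Conv2d(channels, 1, kernel_size=1, bias=False)
        self.proj_large = nn.Conv2d(channels, 1, kernel_size=1, bias=False)

    def forward(self, small: torch.Tensor, large: torch.Tensor) -> torch.Tensor:
        # S151: bilinear upsampling of the small map to the large map's size.
        small_up = F.interpolate(small, size=large.shape[-2:],
                                 mode="bilinear", align_corners=False)
        # S152/S153: 1x1 projections, element-wise product, Sigmoid weight map.
        w = torch.sigmoid(self.proj_small(small_up) * self.proj_large(large))
        return w * small_up + (1.0 - w) * large
```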
S16) target distance mapping
The RGB-D camera produces an RGB picture and a depth map. The semantic segmentation result is combined with the initial depth map: the depth map and the segmentation result are sampled to the same size; where holes occur in the depth map, the corresponding segmentation results are ignored; the pixel coordinate set of the target area is obtained from the semantic segmentation result, the depth-map pixel values at those coordinates are summed and averaged, and the average is divided by a scale factor to obtain the actual distance of each obstacle.
as shown in fig. 5: the mapping method is that each pixel in a label corresponding depth map of a segmentation result map is multiplied by a distance scale to obtain an actual measurement distance, the parallel distance between each display point and a depth camera is obtained according to the mapping of the pixel position, and then all the parallel distances in the label are averaged to obtain a final distance.
S2) detecting obstacles in the grassland where the robot is located. In this embodiment, obstacles are roughly classified into dynamic obstacles, i.e. obstacles that may move at any time, such as people, cats, dogs and cars, and static obstacles whose positions remain unchanged for a long time, such as flower beds, stones and benches.
S3) calculating boundary distance information and obstacle distance information between the robot and the grassland according to the lawn semantic segmentation map and the obstacle detection map;
s4) controlling the robot to run around the grassland boundary according to the distance information of the robot and the grassland boundary;
and when the continuous key frames identify the lawn boundary and the distance information, controlling the mower to drive towards the lawn edge.
The direction is changed when the robot runs to the lawn boundary, and the first steering is prioritized in the left direction. The image processing unit controls the direction of the image processing unit to be parallel to the lawn direction: and the right camera judges the shape of the lawn area in front according to the right side contour line of the lawn semantic segmentation graph, and adjusts the position of the robot until the lower end of the contour line in the segmentation graph moves to a specified position.
When the image processing unit detects that the front boundary turns, the binocular range principle and the image processing unit are also utilized to control the robot to drive to the turning position and turn. And judging the shape of the front lawn area according to the right side contour line of the lawn instance segmentation map, and if the front lawn area is bent leftwards, adjusting the robot to rotate leftwards so as to maintain the right side contour line of the lawn instance segmentation map at an initial position. Thereby keeping the robot parallel to the lawn boundary.
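A rough sketch of this parallel-to-boundary control, under assumed conventions (a binary lawn mask, a target column for the contour's lower end, proportional steering with an assumed sign convention); the helper name heading_correction is hypothetical.

```python
import numpy as np

def heading_correction(lawn_mask: np.ndarray, target_col: int,
                       gain: float = 0.01) -> float:
    """Steer so the lawn's right contour stays at target_col near the image
    bottom. Positive return = turn left, negative = turn right (assumed)."""
    rows = lawn_mask.shape[0]
    band = lawn_mask[int(rows * 0.8):, :]     # lower 20% of the segmentation map
    # Rightmost lawn pixel of each row approximates the right contour line.
    cols = [int(np.where(r > 0)[0].max()) for r in band if r.any()]
    if not cols:
        return 0.0                            # no lawn visible: no correction
    error = target_col - float(np.mean(cols))
    return gain * error                       # proportional steering command
```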
S5) controlling the robot to execute the obstacle avoidance operation according to the distance between the robot and the obstacle. As shown in fig. 7, when robot A, while travelling, encounters flower bed B intersecting the grassland boundary, the robot calculates the shortest distance S between itself and the obstacle (one lower than the robot's height) at the grassland boundary.
Referring to fig. 8, d denotes the longest distance from the robot to the obstacle. Subtracting a preset safety threshold u from S gives the safe distance the mowing robot may travel, after which the robot is controlled to drive to the obstacle boundary. The robot is then made to change direction so that its heading is parallel to the grassland boundary until it bypasses the obstacle, and flower bed B and the driving path are recorded into the map.
When the GPS position coincides with the starting GPS position after one full circuit, the area enclosed by the robot's travel path is designated the mowing area and divided into 20 cm x 20 cm grid cells as the map range.
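The division into 20 cm grid cells can be sketched as follows, assuming the GPS fixes have already been converted to local metric coordinates; matplotlib's point-in-polygon test stands in for whatever geometry routine the actual system uses, and the function name path_to_grid is hypothetical.

```python
import numpy as np
import matplotlib.path as mpath

GRID_M = 0.20  # 20 cm x 20 cm cells, as stated in the text

def path_to_grid(path_xy):
    """Rasterise a closed boundary path (list of (x, y) in local metres) into
    the 20 cm occupancy grid; True marks a mowing-area cell. Also returns the
    grid origin so later GPS fixes can be mapped to cell indices."""
    xs = [p[0] for p in path_xy]
    ys = [p[1] for p in path_xy]
    x0, y0 = min(xs), min(ys)
    nx = int(np.ceil((max(xs) - x0) / GRID_M)) + 1
    ny = int(np.ceil((max(ys) - y0) / GRID_M)) + 1
    # Test every cell centre against the closed travel path.
    cx = x0 + (np.arange(nx) + 0.5) * GRID_M
    cy = y0 + (np.arange(ny) + 0.5) * GRID_M
    centres = np.stack(np.meshgrid(cx, cy), axis=-1).reshape(-1, 2)
    inside = mpath.Path(path_xy).contains_points(centres).reshape(ny, nx)
    return inside, (x0, y0)
```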
S6) recording the robot coordinate paths and the obstacle distance information obtained in the step S4 and the step S5 through a GPS, and constructing a map.
The embodiment also provides a model adopted by the map construction method, which comprises the following steps:
the data preprocessing module is used for performing image correction on the RGB image of the target environment, applying random augmentations and performing normalization;
the pyramid pooling module is used for aggregating context information from different regions and extracting the overall features;
the feature fusion module is used for fusing the feature map generated by decoding with the feature layer output by the pyramid pooling module, or with the feature layer output by the average pooling layer, the weights of the two feature layers being adaptively calculated to obtain a better fusion;
and the target distance mapping module is used for calculating the measured distance from the regions of the other category labels to the robot.
The embodiment also discloses a visual robot obstacle avoidance control device adopted by the method, which comprises the following steps:
1) The image acquisition unit, comprising left and right cameras mounted on the two sides of the robot head, is used for transmitting the acquired image information to the image processing unit for obstacle identification;
2) The image processing unit is used for processing the image information transmitted by the image acquisition unit so as to detect grassland and obstacle information in the grassland environment where the robot is positioned;
3) The post-processing unit is used for processing the grassland boundary and obstacle distance information and sending a robot control instruction;
4) The execution unit is used for controlling the robot to bypass the obstacle and executing mowing operation according to the instruction of the post-processing unit;
5) And the map unit is used for recording GPS information and recording lawn boundaries and static obstacle information.
Example 2
The technical scheme is that, after the robot has travelled one circuit around the lawn, it executes a full-coverage path planning algorithm to perform full-coverage operation on the mowing area in the map. Grid cells the mower passes through are recorded as cut areas, and cells not passed through are uncut areas.
1) When static obstacles such as shrubs, flower beds and trees are encountered during running, the image processing unit identifies them. The mower is controlled to travel to the obstacle boundary according to the obstacle type information and obstacle distance information transmitted by the image processing unit, then travels around the obstacle by the method of the first embodiment while recording GPS information; the traversed points are set as unreachable points, and the area surrounded by their connecting line is set as an unreachable area.
2) If dynamic obstacles that move at any time, such as pedestrians, cats and dogs, are encountered during running, the image processing unit identifies them. The control device instructs the mower to execute an obstacle-detour instruction according to the obstacle category information and obstacle distance information transmitted by the image processing unit. As shown in fig. 12, using binocular ranging the robot keeps a safe distance of 0.6 m (3 grid cells) from the dynamic obstacle and preferentially travels toward the uncut area to detour around it.
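A hypothetical sketch of the "prefer the uncut area" detour choice on the 20 cm grid; the window sizes and the left/right decision rule are assumptions, not taken from the patent.

```python
import numpy as np

SAFE_CELLS = 3  # 3 cells x 20 cm = the 0.6 m safety margin from the text

def detour_direction(grid_cut: np.ndarray, robot_rc) -> str:
    """Pick the detour side around a dynamic obstacle, preferring the side of
    the robot with more uncut cells beyond the safety margin.
    grid_cut is True where a cell has already been cut; robot_rc = (row, col)."""
    r, c = robot_rc

    def uncut_cells(side: int) -> int:
        # A 5-cell-wide window beyond the safety margin on the given side.
        lo, hi = sorted((c + side * SAFE_CELLS, c + side * (SAFE_CELLS + 5)))
        window = grid_cut[max(r - 5, 0):r + 5, max(lo, 0):max(hi, 0)]
        return int((~window).sum())          # count uncut (False) cells

    return "left" if uncut_cells(-1) >= uncut_cells(+1) else "right"
```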
A dynamic obstacle may coincide with a fixed obstacle during map construction; take a person sitting on a chair as an example: the robot first travels around the obstacle together with the person, keeping a distance of 0.6 m from the person.
When only the unreachable areas and the cut areas remain on the map, the map unit stores the GPS information of the unreachable areas, divides them into 20 cm x 20 cm grids aligned with the map, and synchronizes the grids to the map saved in memory, thereby obtaining the complete map information. When the robot identifies the same fixed obstacle again, the obstacle information can be retrieved from the map, reducing repeatedly divided unreachable areas.
The foregoing is merely a preferred embodiment of the invention, and it should be noted that modifications could be made by those skilled in the art without departing from the principles of the invention, which modifications would also be considered to be within the scope of the invention.

Claims (9)

1. An unmanned mower map construction method based on semantic segmentation combined with a depth map, characterized by comprising the following steps:
s1) acquiring a target environment image, performing semantic segmentation, and segmenting grasslands in the image;
s2) detecting obstacles in the grassland where the robot is located;
s3) obtaining boundary distance information and obstacle distance information of the robot and the grassland according to the lawn semantic segmentation map and the obstacle detection map;
s4) controlling the robot to run around the grassland boundary according to the distance information of the robot and the grassland boundary;
s5) controlling the robot to execute obstacle avoidance operation according to the distance information of the robot and the obstacle;
s6) recording the robot path and obstacle distance information obtained in the step S4 and the step S5 through a GPS, and constructing a map.
2. The unmanned mower map construction method based on semantic segmentation and depth map according to claim 1, wherein the semantic segmentation of the target environment image in step S1 comprises:
s11) data preprocessing: splitting the obtained RGB-D data, performing image correction on the RGB image therein, then applying random augmentations, and finally applying normalization to obtain the training data required by the model;
s12) segmenting the dataset and making category labels: the obtained RGB image is made into a lawn segmentation dataset for the lawn scene, and the lawn area and other category labels are designed;
s13) improving a lightweight MobileNetV2 backbone network: embedding an efficient channel attention mechanism in the stride-1 sampling stages, deleting the final average pooling layer and fully connected layer of the original network, and finally generating a CxHxW feature layer;
s14) pyramid pooling layer extracts the overall features: the CxHxW feature layer generated in the previous step is passed through average pooling layers to produce feature layers of sizes Cx1x1, Cx2x2 and Cx4x4; these three feature layers are upsampled back to the original CxHxW size by bilinear interpolation and stacked with the 1x1-convolution result of the original feature layer to obtain a 3CxHxW feature layer, and finally a CxHxW feature layer is output through a 3x3 convolution;
s15) feature fusion: the feature map generated by decoding is fused with the feature layer output by the pyramid in step S14, or with the feature layer generated in step S13, and a feature map is output; the weights of the two feature layers are adaptively calculated to obtain a better fusion, and inputting the preprocessed picture yields the corresponding semantic segmentation result;
s16) target distance mapping
Combining the semantic segmentation result with the initial depth map, sampling the depth map and the segmentation result to the same size, obtaining the pixel coordinates of the target area from the semantic segmentation result, summing and averaging the depth-map pixel values at those coordinates, and dividing by a scale coefficient to obtain the actual distance of each obstacle.
3. The unmanned mower map construction method based on semantic segmentation and depth map according to claim 2, characterized in that the step S11) of data preprocessing comprises the following steps:
s111) converting the obtained RGB image into HSV format, wherein the specific conversion formulas are as follows:
Cmax = max(R, G, B)
Cmin = min(R, G, B)
Δ = Cmax - Cmin
V = Cmax
wherein: R, G, B are the pixel values of the red, green and blue channels of the picture; H, S, V are the hue, saturation and brightness values of the picture; V' is the adjusted V; V̄ is the mean value of V; I(X, Y) is the value after three Gaussian convolutions; and X, Y are pixel coordinates;
s112) passing the V channel through three consecutive Gaussian convolutions with kernel size 7 to extract the low-frequency information of the picture, feeding the result into the gamma function γ, recombining with the original H and S to form a new HSV image, and converting back to RGB to obtain the corrected RGB image.
4. The unmanned mower map construction method based on semantic segmentation and depth map according to claim 2, characterized in that the step of S15 feature fusion comprises the following steps:
s151) inputting two feature images with different sizes, and interpolating the small-size feature image to the size of the large feature image by a bilinear interpolation up-sampling method;
s152) applying a 1x1 convolution to the feature map upsampled in step S151 to obtain a two-dimensional matrix of the same size, and applying a 1x1 convolution directly to the large-size feature map to obtain another two-dimensional matrix;
s153) obtaining the weight matrix of the two feature maps from the previous step through element-wise matrix multiplication and a Sigmoid function; the weight matrix multiplies the upper branch (the interpolation result of the small-size feature map), (1 - weight matrix) multiplies the lower branch (the large-size feature map), and the two resulting matrices are added to obtain the output value.
5. The unmanned mower map construction method based on semantic segmentation and depth map according to claim 1, wherein in step S6, when only the unreachable areas and the cut areas remain on the map, the map unit saves the GPS information of the unreachable areas, divides them into 20 cm x 20 cm grids aligned with the map, and synchronizes the grids to the map saved in memory, thereby obtaining the complete map information.
6. The unmanned mower map construction method based on semantic segmentation and depth map according to claim 1, wherein in step S6, the grid cells through which the mower travels are recorded as cut areas, and cells not passed through are uncut areas;
1) When static obstacles such as shrubs, flower beds and trees are encountered while travelling, the mower is controlled to travel to the obstacle boundary according to the obstacle category information and obstacle distance information transmitted by the image processing unit; the robot then travels around the obstacle while recording GPS information, the traversed points are set as unreachable points, and the area enclosed by their connecting line is set as an unreachable area;
2) When dynamic obstacles whose positions change at any time, such as pedestrians, cats and dogs, are encountered while travelling, the robot executes an obstacle-detour instruction according to the obstacle category information and obstacle distance information transmitted by the image processing unit; using binocular ranging, the robot keeps a safe distance of 3 grid cells from the obstacle and preferentially travels toward the uncut area to detour around the dynamic obstacle.
7. An unmanned mower map construction model based on semantic segmentation and a depth map, characterized by comprising:
the data preprocessing module is used for performing image correction on the RGB image of the target environment, applying random augmentations and performing normalization;
the pyramid pooling module is used for aggregating context information from different regions and extracting the overall features;
the feature fusion module is used for fusing the feature map generated by decoding with the feature layer output by the pyramid pooling module, or with the feature layer output by the average pooling layer, the weights of the two feature layers being adaptively calculated to obtain a better fusion;
and the target distance mapping module is used for calculating the measured distance from the regions of the other category labels to the robot.
8. A vision robot obstacle avoidance control device, the control device being mounted on a robot, comprising:
1) The image acquisition unit, comprising left and right cameras mounted on the two sides of the robot head, is used for transmitting the acquired image information to the image processing unit for obstacle identification;
2) The image processing unit is used for processing the image information transmitted by the image acquisition unit so as to detect grassland and obstacle information in the grassland environment where the robot is positioned;
3) The post-processing unit is used for processing the grassland boundary and obstacle distance information and sending a robot control instruction;
4) The execution unit is used for controlling the robot to bypass the obstacle and executing mowing operation according to the instruction of the post-processing unit;
5) And the map unit is used for recording GPS information and recording lawn boundaries and static obstacle information.
9. An unmanned mowing robot, characterized in that it adopts the vision robot obstacle avoidance control device of claim 8.
CN202311206127.4A 2023-09-18 2023-09-18 Unmanned mower map construction method based on semantic segmentation and depth map Pending CN117367402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311206127.4A CN117367402A (en) 2023-09-18 2023-09-18 Unmanned mower map construction method based on semantic segmentation and depth map


Publications (1)

Publication Number Publication Date
CN117367402A true CN117367402A (en) 2024-01-09

Family

ID=89403094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311206127.4A Pending CN117367402A (en) 2023-09-18 2023-09-18 Unmanned mower map construction method based on semantic segmentation and depth map

Country Status (1)

Country Link
CN (1) CN117367402A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination