CN117889867B - Path planning method based on local self-attention moving window algorithm - Google Patents

Path planning method based on local self-attention moving window algorithm

Info

Publication number
CN117889867B
Authority
CN
China
Prior art keywords
attention
self
image
layer
feature
Prior art date
Legal status
Active
Application number
CN202410304943.7A
Other languages
Chinese (zh)
Other versions
CN117889867A (en)
Inventor
范至正
谢非
杨继全
张策
李艺钧
王鲁睿
孙煜炫
周正亚
陈君
Current Assignee
Nanjing Normal University
Original Assignee
Nanjing Normal University
Priority date
Filing date
Publication date
Application filed by Nanjing Normal University filed Critical Nanjing Normal University
Priority to CN202410304943.7A priority Critical patent/CN117889867B/en
Publication of CN117889867A publication Critical patent/CN117889867A/en
Application granted granted Critical
Publication of CN117889867B publication Critical patent/CN117889867B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01M TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M1/00 Testing static or dynamic balance of machines or structures
    • G01M1/12 Static balancing; Determining position of centre of gravity
    • G01M1/122 Determining position of centre of gravity
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01P MEASURING LINEAR OR ANGULAR SPEED, ACCELERATION, DECELERATION, OR SHOCK; INDICATING PRESENCE, ABSENCE, OR DIRECTION, OF MOVEMENT
    • G01P13/00 Indicating or recording presence, absence, or direction, of movement
    • G01P13/02 Indicating direction only, e.g. by weather vane
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Computational Linguistics (AREA)
  • Automation & Control Theory (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a path planning method based on a local self-attention moving window algorithm. An RGB image of the current working environment is acquired, preprocessed and converted into a grayscale image; edge detection then yields a binarized image, from which an edge self-attention weight is obtained through an activation function; a curvature self-attention weight is obtained from the entropy of the Hessian-matrix eigenvalues of each image block after the grayscale image is partitioned. The self-supervision attention semantic segmentation network is improved with the edge self-attention weight and the curvature self-attention weight; the improved network produces images containing semantic information, and the movement direction of the obstacle is predicted from the images obtained at consecutive moments so as to select the movement direction of the robot. Introducing the entropy of the Hessian matrix to obtain a new self-attention weight enhances the segmentation of object edges; introducing the self-attention weight obtained from edge detection enhances the segmentation of regions with large color changes in the image and improves the accuracy of edge segmentation, thereby improving the success rate of obstacle avoidance.

Description

Path planning method based on local self-attention moving window algorithm
Technical Field
The invention relates to robot vision path planning, in particular to a path planning method based on a local self-attention moving window algorithm.
Background
With the continuous development of computer vision technology, semantic segmentation based on deep learning has been extended to the field of robot visual path planning. This technology uses a deep learning model to perceive the local environment in real time and further improves obstacle recognition accuracy through semantic information, providing more reliable environment perception data for path planning. However, some existing path planning techniques still suffer from inaccurate target recognition and inaccurate edge segmentation.
At present, path planning is a key technology in fields such as autonomous mobile robots and autonomous driving vehicles, and aims to achieve safe and efficient movement in complex and dynamic environments. Traditional path planning methods are mainly based on classical planning algorithms and perform well in static environments, but they have difficulty handling environmental information that changes in real time in dynamic environments. With the continuous development of artificial intelligence technologies such as deep learning and reinforcement learning, new breakthroughs have been made in path planning. Deep learning methods can better understand environmental characteristics and dynamic changes by learning from large amounts of real-scene data, thereby improving the robustness and adaptability of path planning. However, current deep-learning-based path planning methods must cope with scale changes and self-motion of obstacles under dynamic and complex conditions, and obstacle-avoidance errors caused by inaccurate edge segmentation readily occur.
Disclosure of Invention
The invention aims to: in view of the above defects, the invention provides a path planning method based on a local self-attention moving window algorithm that has strong real-time performance, high obstacle segmentation accuracy and strong resistance to background interference.
The technical scheme is as follows: in order to solve the problems, the invention adopts a path planning method based on a local self-attention moving window algorithm, which comprises the following steps:
(1) Acquiring an RGB image of a current operation environment, and preprocessing to acquire an RGB image of the preprocessed operation environment;
(2) Converting the RGB image of the preprocessed working environment into a gray scale image;
(3) Performing edge detection on the converted grayscale image to obtain a binarized image containing edge feature information, and obtaining the edge self-attention weight of the binarized image through an activation function; partitioning the converted grayscale image to obtain a plurality of grayscale image blocks, calculating the entropy of the Hessian-matrix eigenvalues corresponding to each segmented grayscale image block, and calculating the curvature self-attention weight of each grayscale image block through an activation function according to the entropy;
(4) The self-supervision attention semantic segmentation network is improved through edge self-attention weights and curvature self-attention weights, the self-supervision attention semantic segmentation network comprises two continuous self-attention window layers, and the edge self-attention weights and the curvature self-attention weights are respectively added into the attention calculation of the two continuous self-attention window layers;
(5) Processing the preprocessed RGB image of the working environment through the improved self-supervision attention semantic segmentation network to obtain the working-environment RGB image containing semantic information, calculating the position of the obstacle centroid from the working-environment RGB images containing semantic information obtained at consecutive moments, predicting the movement direction of the obstacle, and selecting the movement direction of the robot according to the predicted movement direction of the obstacle.
Further, the preprocessing of the RGB image of the current working environment in the step (1) includes scaling the RGB image of the current working environment, and then performing flipping, affine transformation and noise addition on the scaled RGB image to obtain the RGB image of the preprocessed working environment.
Further, in the step (3), performing edge detection on the transformed gray scale image to obtain a binary image containing edge feature information, and obtaining the edge self-attention weight of the binary image through an activation function includes:
(3.11) carrying out smoothing treatment on the gray level image, then calculating the gradient amplitude of each pixel in the image, carrying out maximum value inhibition operation on the gradient amplitude, and finally carrying out edge detection through a double-threshold algorithm to obtain a binarized image containing edge information;
(3.12) smoothing the confidence of each pixel point in the binarized image containing the edge information, wherein the smoothed confidence of each pixel point is computed from the gray value of that pixel point in the binarized image containing edge information, its distances to the other pixel points, the confidences of the other pixel points, and the total number of pixel points in the binarized image containing edge information;
(3.13) converting the confidence into the edge self-attention weight of each pixel point through an activation function, wherein the edge self-attention weight of each pixel point in the binarized image containing edge information is obtained by applying the activation function to its smoothed confidence.
Further, the entropy of the Hessian-matrix eigenvalues corresponding to each segmented grayscale image block in step (3) is computed from the eigenvalues of the Hessian matrix of that block and from the number of pixel points contained in the block.
Further, the curvature self-attention weight of each grayscale image block in step (3) is obtained by applying the activation function to the entropy of that block, taking into account the total number of grayscale image blocks after segmentation.
Further, processing the preprocessed RGB image of the working environment through the improved self-supervision attention semantic segmentation network in step (5) includes:
(5.1) reducing the RGB image of the preprocessed working environment, and inputting the images before and after reduction into an image block segmentation layer from an original image channel and a reduced image channel respectively for segmentation to obtain a plurality of non-overlapping image blocks respectively;
(5.2) inputting the image blocks obtained by segmentation into a first linear self-attention feature extraction module, wherein the first linear self-attention feature extraction module comprises a linear embedding layer and two continuous self-attention window layers, and obtaining a feature map after the first feature extraction;
(5.3) sequentially inputting the feature map after the first feature extraction into three identical fusion self-attention feature extraction modules, wherein each fusion self-attention feature extraction module comprises a patch fusion layer and two continuous self-attention window layers, and obtaining a feature map after the fourth feature extraction;
and (5.4) after upsampling and skip-connection operations, inputting the feature maps obtained after each feature extraction in the original-image channel and the reduced-image channel into the decoder for semantic segmentation, obtaining the working-environment RGB image containing semantic information.
Further, the patch fusion layer of the first fusion self-attention feature extraction module includes:
inputting the feature map after the first feature extraction in the original-image channel to the patch fusion layer and downsampling it, with the output dimension set accordingly, to obtain the feature map after the second feature extraction; and inputting the feature map after the first feature extraction in the reduced-image channel to the patch fusion layer and downsampling it to obtain its feature map after the second feature extraction, the feature-map dimensions being expressed in terms of the image depth, the image height and the image width;
The patch fusion layer of the second fusion self-attention feature extraction module includes:
inputting the feature map after the second feature extraction in the original-image channel to the patch fusion layer and downsampling it, with the output dimension set accordingly, to obtain the feature map after the third feature extraction; and inputting the feature map after the second feature extraction in the reduced-image channel to the patch fusion layer and downsampling it to obtain its feature map after the third feature extraction;
The patch fusion layer of the third fusion self-attention feature extraction module includes:
inputting the feature map after the third feature extraction in the original-image channel to the patch fusion layer and downsampling it, with the output dimension set accordingly, to obtain the feature map after the fourth feature extraction; and inputting the feature map after the third feature extraction in the reduced-image channel to the patch fusion layer and downsampling it to obtain its feature map after the fourth feature extraction.
Further, the two consecutive self-attention window layers include:
Inputting the feature map output by the linear embedding layer or the patch fusion layer into a first layer normalization layer for normalization; inputting the layer-normalized feature map into the window self-attention calculation layer for window allocation and adding the curvature self-attention weight into the self-attention calculation; sequentially inputting the feature map output by the window self-attention calculation layer and the feature map output by the linear embedding layer or the patch fusion layer into a fully connected layer, a second layer normalization layer and a multi-layer perceptron layer; and finally fully connecting the feature map output by the multi-layer perceptron layer with the feature map output by the window self-attention calculation layer to obtain the feature map output by the first attention window module, where the self-attention calculation function with the added curvature self-attention weight is built from the query vector, the key vector and its transpose, the value vector, the offset, the key dimension, the curvature self-attention weight of each pixel point in the layer-normalized feature map of the current working environment, the total number of pixel points in that feature map, and the improved activation function.
Inputting the feature map output by the first attention window module into a third layer normalization layer for normalization; inputting the layer-normalized feature map into the moving-window multi-head self-attention layer and adding the edge self-attention weight into the self-attention calculation; sequentially inputting the feature map output by the moving-window multi-head self-attention layer and the feature map output by the first attention window module into a fully connected layer, a fourth layer normalization layer and a multi-layer perceptron layer; and, after processing by the activation function of the multi-layer perceptron layer, fully connecting the feature map output by the multi-layer perceptron layer with the feature map output by the moving-window multi-head self-attention layer to obtain the feature map output by the two consecutive attention window modules as a whole, where the self-attention calculation function with the added edge self-attention weight uses the edge self-attention weight of each pixel point in the layer-normalized feature map of the current working environment in place of the curvature self-attention weight.
Further, in step (5.4), inputting the feature maps obtained after each feature extraction in the original-image channel and the reduced-image channel into the decoder for semantic segmentation after upsampling and skip-connection operations includes:
Upsampling the feature map after the fourth feature extraction in the reduced-image channel, skip-connecting it with the feature map after the fourth feature extraction in the original-image channel, and inputting it to the extended self-attention module to produce its output feature map, the extended self-attention module comprising a patch expansion layer and two continuous self-attention layers;
Upsampling the feature map after the third feature extraction in the reduced-image channel, skip-connecting it with the feature map after the third feature extraction in the original-image channel, and inputting it to the extended self-attention module to produce its output feature map;
Upsampling the feature map after the second feature extraction in the reduced-image channel, skip-connecting it with the feature map after the second feature extraction in the original-image channel, and inputting it to the extended self-attention module to produce its output feature map;
Upsampling the feature map after the first feature extraction in the reduced-image channel, skip-connecting it with the feature map after the first feature extraction in the original-image channel, and inputting it to the extended self-attention module to produce its output feature map.
Further, the calculating the position of the center of the obstacle according to the RGB images of the working environment containing semantic information obtained at successive times, predicting the movement direction of the obstacle, and selecting the movement direction according to the predicted movement direction of the obstacle includes:
(5.51) calculating position coordinates of the center point of the obstacle in the operation environment image containing semantic information at each moment;
(5.52) predicting the position of the obstacle at the next moment according to the prediction function and the position coordinates of the obstacle center point, wherein the predicted abscissa of the obstacle center point at the next moment is computed from the abscissas of the obstacle center point in the working-environment RGB images containing semantic information at the successive moments and from the number of obstacle coordinates collected;
(5.53) determining the movement direction of the robot at the next moment according to the predicted position of the obstacle center point at the next moment in the working-environment image containing semantic information, wherein the decision depends on the distance between the predicted obstacle center point at the next moment and the center point of the working-environment RGB image containing semantic information, computed from their abscissas, and on the set rotation angle of the robot during path planning;
(5.54) the robot adjusts the corresponding movement direction and moves forward by a preset distance, and then returns to execute the step (1).
The beneficial effects are that: compared with the prior art, the invention has notable advantages. By introducing a new self-attention weight calculated from the entropy of the Hessian matrix, the self-supervision attention semantic segmentation network focuses more on the segmentation of object edges; by introducing the self-attention weight calculated from edge detection, the network focuses more on segmenting the regions of the image with large color changes, improving the accuracy of edge segmentation and thereby the success rate of obstacle avoidance. The improved activation function used to calculate the self-attention weights makes the self-supervision attention semantic segmentation network better suited to the application scenario of path planning, further improving edge segmentation accuracy and the obstacle-avoidance success rate. Smoothing the binarized image obtained from edge detection avoids excessive attention to edge regions during the self-attention calculation. The obstacle is identified quickly and accurately, background interference is resisted well, and the accuracy of path planning is improved.
Drawings
FIG. 1 is a schematic workflow diagram of a path planning method of the present invention.
Fig. 2 is a gray scale image of the current work environment captured by the camera of the present invention.
FIG. 3 is a binarized image of the present invention for edge detection of the current operating environment.
FIG. 4 is a diagram of a self-supervising attention semantic segmentation network model in accordance with the present invention.
Fig. 5 is a semantic segmentation effect diagram of an RGB image of a current work environment obtained using the prior art UperNet.
Fig. 6 is a semantic segmentation effect diagram of an RGB image of the current working environment obtained by the present invention.
Detailed Description
In the path planning method based on the local self-attention moving window algorithm, a RealSense D435i depth binocular camera is used to collect an RGB image of the current working environment; the image is processed by software on a host computer, the processing result is displayed on a liquid crystal display, and the result is input to the robot's processor for path planning. The method can be applied to the field of robot path planning.
As shown in fig. 1, a path planning method based on a local self-attention moving window algorithm in this embodiment includes the following steps:
(1) The RGB image of the current working environment is obtained through the camera, the RGB image of the current working environment is preprocessed, and the RGB image of the preprocessed working environment is obtained.
(1.1) Scaling the acquired color image to an image size acceptable to the backbone network.
(1.2) Flipping, affine-transforming and adding noise to the scaled color image to obtain the preprocessed RGB image of the current working environment.
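A minimal preprocessing sketch of step (1) in Python is given below, assuming OpenCV and NumPy; the target size, flip direction, affine parameters and noise level are illustrative choices, not values fixed by the patent.

```python
import cv2
import numpy as np

def preprocess(rgb, target=(224, 224), noise_std=5.0):
    """Scale, flip, affine-transform and add noise to a work-environment RGB image."""
    img = cv2.resize(rgb, target, interpolation=cv2.INTER_LINEAR)  # scale to backbone input size
    img = cv2.flip(img, 1)                                         # horizontal flip
    h, w = img.shape[:2]
    # small affine transform (rotation about the image centre)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), 5.0, 1.0)
    img = cv2.warpAffine(img, m, (w, h), borderMode=cv2.BORDER_REFLECT)
    # additive Gaussian noise
    noisy = img.astype(np.float32) + np.random.normal(0.0, noise_std, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```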
(2) The RGB image of the current working environment is converted into a gray image, and the gray image of the current working environment is obtained, as shown in fig. 2.
(3) Performing edge detection on the converted gray level image to obtain a binary image containing edge characteristic information, and obtaining the edge self-attention weight of the binary image through an activation function; the method comprises the following specific steps:
(3.11) Smoothing the preprocessed grayscale image of the current working environment by applying a Gaussian filter with a Gaussian kernel to the original image, obtaining the smoothed grayscale image of the current working environment, the Gaussian kernel being

G(x, y) = (1 / (2πσ²)) · exp(-(x² + y²) / (2σ²))

where σ denotes the bandwidth and (x, y) the coordinates of each pixel point in the preprocessed grayscale image of the current working environment.
(3.12) Calculating first-order partial-derivative finite differences with the Sobel edge detection operator to obtain the gradient magnitude of each pixel in the grayscale image of the current working environment, applying non-maximum suppression to the gradient magnitudes, and finally performing edge detection with a double-threshold algorithm (for the specific method see Yang Lin, "Image edge extraction using an improved operator based on a locally adaptive threshold method" [J], Information & Computer (Theoretical Edition), 2023, 35(14): 78-80; not repeated here), obtaining a binarized image of the current working environment containing edge information, as shown in fig. 3.
(3.13) Taking the pixel value of each pixel point in the binarized image of the current working environment containing edge information as its confidence, and smoothing the confidences, where the smoothed confidence of each pixel point is computed from the gray value of that pixel point, its distances to the other pixel points, the confidences of the other pixel points, and the total number of pixel points in the binarized image containing edge information.
The edge self-attention weight of each pixel point in the smoothed binarized image of the current working environment containing edge information is then calculated from its confidence using the improved activation function, taking into account the total number of pixel points in the binarized image.
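The following Python sketch illustrates steps (3.11)-(3.13), assuming OpenCV and NumPy. cv2.Canny is used as a stand-in for the Sobel-gradient, non-maximum-suppression and double-threshold pipeline, a Gaussian blur stands in for the distance-weighted confidence smoothing, and a plain sigmoid stands in for the improved activation function; the kernel sizes and thresholds are illustrative.

```python
import cv2
import numpy as np

def edge_self_attention_weights(gray, low=50, high=150, sigma=1.0):
    """Edge map -> smoothed per-pixel confidence -> sigmoid edge self-attention weight."""
    blurred = cv2.GaussianBlur(gray, (5, 5), sigma)   # Gaussian smoothing of the grayscale image
    edges = cv2.Canny(blurred, low, high)             # gradients, non-maximum suppression, double threshold
    conf = edges.astype(np.float32) / 255.0           # pixel value of the binarized image as confidence
    conf = cv2.GaussianBlur(conf, (7, 7), 2.0)        # spread confidence to neighbouring pixels (smoothing)
    weights = 1.0 / (1.0 + np.exp(-conf))             # sigmoid activation -> edge self-attention weight
    return weights                                     # one weight per pixel
```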
Partitioning the converted grayscale image to obtain a plurality of grayscale image blocks, calculating the entropy of the Hessian-matrix eigenvalues corresponding to each segmented grayscale image block, and calculating the curvature self-attention weight of each grayscale image block through an activation function according to the entropy; the specific steps include:
(3.21) Extracting image blocks from the grayscale image of the current working environment with a stride of 8, each image block containing 64 pixel points and having a size of 8 × 8, obtaining the segmented grayscale image blocks of the current working environment;
(3.22) Calculating the Hessian matrix of the feature map corresponding to each gray image block, and filtering out point-like structures and noise points. The pixel points in each segmented gray image block of the current working environment are convolved with the second derivatives of the corresponding Gaussian function to obtain the Hessian matrix of that block, from which the eigenvalues of the Hessian matrix of each segmented gray image block of the current working environment are solved. The quantities involved are the gray-value function of the segmented gray image block, the gray value of each pixel point in the block, the number of pixel points in the block, the Hessian matrix of the block, the corresponding Gaussian function, the horizontal and vertical coordinates of each pixel point in the block, the standard deviation of the Gaussian distribution, and the eigenvalues of the Hessian matrix of the block.
(3.23) Calculating the entropy of the eigenvalues corresponding to each segmented gray image block of the current working environment, where the entropy of each block is computed from the eigenvalues of its Hessian matrix and the number of pixel points it contains.
(3.24) Applying the activation function to the entropies of the Hessian matrices corresponding to the segmented gray image blocks of the current working environment to obtain the curvature self-attention weight of each segmented gray image block, where the weight of each block is determined by its entropy and by the total number of segmented gray image blocks.
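A Python sketch of steps (3.21)-(3.24) is shown below, assuming SciPy and NumPy. The Hessian components are obtained with second-order Gaussian derivative filters, the entropy is computed over the normalized magnitudes of the per-pixel Hessian eigenvalues of each 8 × 8 block, and a plain softmax over the block entropies stands in for the activation function; these are assumptions about details given only by the formulas in the original publication.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def curvature_self_attention_weights(gray, block=8, sigma=1.5):
    """Per-block entropy of Hessian eigenvalues, turned into weights with a softmax."""
    g = gray.astype(np.float32)
    # second-order Gaussian derivatives give the Hessian components at every pixel
    hxx = gaussian_filter(g, sigma, order=(0, 2))
    hyy = gaussian_filter(g, sigma, order=(2, 0))
    hxy = gaussian_filter(g, sigma, order=(1, 1))
    h, w = g.shape
    entropies = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            # eigenvalues of the 2x2 Hessian at every pixel of the block
            a = hxx[r:r + block, c:c + block].ravel()
            b = hxy[r:r + block, c:c + block].ravel()
            d = hyy[r:r + block, c:c + block].ravel()
            tr, det = a + d, a * d - b * b
            disc = np.sqrt(np.maximum(tr * tr / 4 - det, 0.0))
            lam = np.abs(np.concatenate([tr / 2 + disc, tr / 2 - disc])) + 1e-8
            p = lam / lam.sum()                       # normalise magnitudes to a distribution
            entropies.append(-(p * np.log(p)).sum())  # entropy of the eigenvalues of the block
    e = np.array(entropies)
    return np.exp(e) / np.exp(e).sum()                # softmax over blocks -> curvature weights
```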
(4) The self-supervision attention semantic segmentation network is improved by edge self-attention weights and curvature self-attention weights, wherein the self-supervision attention semantic segmentation network comprises two continuous self-attention window layers, and the edge self-attention weights and the curvature self-attention weights are respectively added into the attention calculation of the two continuous self-attention window layers.
(5) Processing the preprocessed RGB image of the working environment through the improved self-supervision attention semantic segmentation network to obtain the working-environment RGB image containing semantic information, calculating the position of the obstacle centroid from the working-environment RGB images containing semantic information obtained at consecutive moments, predicting the movement direction of the obstacle, and selecting the movement direction of the robot according to the predicted movement direction of the obstacle. The specific steps, shown in fig. 4, include:
(5.1) Downscaling the preprocessed RGB image of the current working environment by a factor of two, and inputting the images before and after downscaling into the image block segmentation layer through the original-image channel and the reduced-image channel respectively, each being divided into a plurality of non-overlapping image blocks, the block sizes being determined by the compressed image depth and by the height and width of the preprocessed RGB image of the current working environment.
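A sketch of the dual-channel patch partition of step (5.1), assuming PyTorch; the 224 × 224 input size and the 4 × 4 patch size are illustrative.

```python
import torch

def patch_partition(img, patch=4):
    """Split an image tensor (C, H, W) into non-overlapping patch tokens."""
    c, h, w = img.shape
    patches = img.unfold(1, patch, patch).unfold(2, patch, patch)    # (C, H/p, W/p, p, p)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)

rgb = torch.rand(3, 224, 224)                                        # preprocessed work-environment image
half = torch.nn.functional.interpolate(rgb[None], scale_factor=0.5,
                                       mode="bilinear", align_corners=False)[0]
tokens_full = patch_partition(rgb)    # original-image channel tokens
tokens_half = patch_partition(half)   # reduced-image channel tokens
```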
(5.2) Inputting the image block output by the image segmentation layer into a first linear self-attention feature extraction module, wherein the module comprises a linear embedding layer and two continuous self-attention window layers, and a feature map after the first feature extraction is obtained.
And (5.3) sequentially inputting the feature map after the first feature extraction into three identical fusion self-attention feature extraction modules, wherein the modules comprise a patch fusion layer and two continuous self-attention window layers, so as to obtain the feature map after the fourth feature extraction.
The first patch fusion layer fused with the self-attention feature extraction module comprises:
The feature map after the first feature extraction in the original-image channel is input to the first patch fusion layer and downsampled, with its output dimension set accordingly, yielding its downsampled feature map; the feature map after the first feature extraction in the reduced-image channel is input to the first patch fusion layer and downsampled in the same way, yielding its downsampled feature map.
The second patch fusion layer fused with the self-attention feature extraction module comprises:
The feature map after the second feature extraction in the original-image channel is input to the second patch fusion layer and downsampled, with its output dimension set accordingly, yielding its downsampled feature map; the feature map after the second feature extraction in the reduced-image channel is input to the second patch fusion layer and downsampled in the same way, yielding its downsampled feature map.
The third patch fusion layer fused with the self-attention feature extraction module includes:
The feature map after the third feature extraction in the original-image channel is input to the third patch fusion layer and downsampled, with its output dimension set accordingly, yielding its downsampled feature map; the feature map after the third feature extraction in the reduced-image channel is input to the third patch fusion layer and downsampled in the same way, yielding its downsampled feature map.
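A sketch of a patch fusion (patch merging) layer, assuming PyTorch; gathering the four neighbouring patches, layer-normalizing and linearly projecting to the chosen output dimension is the standard Swin-style construction and is used here only as a stand-in for the layer described above.

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Downsample a (B, H, W, C) feature map 2x and project to a chosen output dimension."""
    def __init__(self, dim, out_dim):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduction = nn.Linear(4 * dim, out_dim, bias=False)

    def forward(self, x):                        # x: (B, H, W, C) with H and W even
        x0 = x[:, 0::2, 0::2, :]                 # gather the four neighbouring patches
        x1 = x[:, 1::2, 0::2, :]
        x2 = x[:, 0::2, 1::2, :]
        x3 = x[:, 1::2, 1::2, :]
        x = torch.cat([x0, x1, x2, x3], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(self.norm(x))      # (B, H/2, W/2, out_dim)
```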
The two successive self-attention window layers include:
A first self-attention window layer: the feature map output by the linear embedding layer or the patch fusion layer is input to a first layer normalization layer for normalization; the layer-normalized feature map of the current working environment is input to the window self-attention calculation layer for window allocation, and the obtained curvature self-attention weight is added into the attention calculation; the feature map output by the window self-attention calculation layer and the feature map output by the linear embedding layer or the patch fusion layer are sequentially passed through a fully connected layer, a second layer normalization layer and a multi-layer perceptron layer; finally, the feature map output by the multi-layer perceptron layer and the feature map output by the window self-attention calculation layer are fully connected to obtain the feature map output by the first attention module. The self-attention calculation function with the added curvature self-attention weight is built from the query vector, the key vector and its transpose, the value vector, the offset, the key dimension, the curvature self-attention weight of each pixel point in the layer-normalized feature map of the current working environment, the total number of pixel points in that feature map, and the improved activation function.
A second self-attention window layer: the feature map output by the first attention window module is input to a third layer normalization layer for normalization; the layer-normalized feature map of the current working environment is input to the moving-window multi-head self-attention layer, and the obtained edge self-attention weight is added into the self-attention calculation; the feature map output by the moving-window multi-head self-attention layer and the feature map output by the first attention window module are sequentially passed through a fully connected layer, a fourth layer normalization layer and a multi-layer perceptron layer; after processing by the activation function of the multi-layer perceptron layer, the feature map output by the multi-layer perceptron layer and the feature map output by the moving-window multi-head self-attention layer are fully connected to obtain the feature map output by the two consecutive attention window modules as a whole. The self-attention calculation function with the added edge self-attention weight uses the edge self-attention weight of each pixel point in the layer-normalized feature map of the current working environment in place of the curvature self-attention weight.
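A sketch of the weighted window self-attention used in the two layers above, assuming PyTorch; adding the per-pixel curvature or edge self-attention weight as an extra additive term inside the softmax, alongside the offset B, is an assumption about how the weights enter the computation.

```python
import torch
import torch.nn.functional as F

def weighted_window_attention(q, k, v, rel_pos_bias, extra_weight):
    """Window self-attention with an extra additive weight (curvature or edge) in the softmax.

    q, k, v:       (num_windows, num_tokens, d)
    rel_pos_bias:  (num_tokens, num_tokens) offset term B
    extra_weight:  (num_tokens,) per-token curvature or edge self-attention weight
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # QK^T / sqrt(d)
    scores = scores + rel_pos_bias + extra_weight      # broadcast the per-token weight over keys
    return F.softmax(scores, dim=-1) @ v
```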
And (5.4) sequentially inputting the feature images after the fourth feature extraction into a decoder for semantic segmentation to obtain an RGB image of the current working environment containing semantic information, as shown in fig. 6. The method specifically comprises the following steps:
The feature map obtained by the fourth feature extraction in the reduced-image channel is upsampled, skip-connected with the feature map obtained by the fourth feature extraction in the original-image channel, and input to an extended self-attention module, which comprises a patch expansion layer and two continuous self-attention layers, to produce its output feature map.
The feature map obtained by the third feature extraction in the reduced-image channel is upsampled, skip-connected with the feature map obtained by the third feature extraction in the original-image channel, and input to an extended self-attention module of the same structure to produce its output feature map.
The feature map obtained by the second feature extraction in the reduced-image channel is upsampled, skip-connected with the feature map obtained by the second feature extraction in the original-image channel, and input to an extended self-attention module of the same structure to produce its output feature map.
The feature map obtained by the first feature extraction in the reduced-image channel is upsampled, skip-connected with the feature map obtained by the first feature extraction in the original-image channel, and input to an extended self-attention module of the same structure to produce its output feature map.
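A sketch of one decoder stage (patch expansion plus skip connection), assuming PyTorch; the linear patch-expanding upsampling and channel-concatenation fusion are stand-ins for the extended self-attention module described above.

```python
import torch
import torch.nn as nn

class PatchExpandSkip(nn.Module):
    """Upsample a decoder feature map 2x and fuse it with the encoder skip feature."""
    def __init__(self, dim, skip_dim, out_dim):
        super().__init__()
        self.expand = nn.Linear(dim, 2 * dim, bias=False)   # double channels, then spread into 2x2 pixels
        self.fuse = nn.Linear(dim // 2 + skip_dim, out_dim)

    def forward(self, x, skip):          # x: (B, H, W, dim), dim even; skip: (B, 2H, 2W, skip_dim)
        b, h, w, c = x.shape
        x = self.expand(x).reshape(b, h, w, 2, 2, c // 2)    # channels -> spatial 2x2 blocks
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, 2 * h, 2 * w, c // 2)
        x = torch.cat([x, skip], dim=-1)                     # skip connection with the encoder feature
        return self.fuse(x)
```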
(5.5) Calculating the position of the obstacle centroid from the working-environment RGB images containing semantic information obtained at consecutive moments, predicting the movement direction of the obstacle, and selecting the movement direction of the robot according to the predicted movement direction of the obstacle. The specific steps are as follows:
(5.51) Calculating the position coordinates of the obstacle center point in the image of the current working environment containing semantic information obtained at each moment, where the abscissa of the obstacle center point at each moment is computed from the abscissas of the left edge point and the right edge point of the obstacle in the RGB image of the current working environment containing semantic information at that moment.
(5.52) Predicting the position of the obstacle at the next moment according to the prediction function and the obtained obstacle center-point coordinates, where the predicted abscissa of the obstacle center point at the next moment is computed from the abscissas of the obstacle center point in the RGB images of the current working environment containing semantic information at the successive moments and from the number of obstacle coordinates collected.
(5.53) Calculating the direction in which the robot should move at the next moment according to the predicted position of the obstacle center point at the next moment in the RGB image of the current working environment containing semantic information, based on the distance between the predicted obstacle center point and the center point of that image, computed from their abscissas.
(5.54) The robot adjusts its own movement direction accordingly and moves forward 0.5 m; the above operations are then repeated to realize path planning. The rotation angle used during path planning is a constant, set to 30 degrees in this scenario.
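A Python sketch of steps (5.51)-(5.54), assuming NumPy. The centre abscissa is taken as the midpoint of the left and right edge points, a constant-velocity extrapolation stands in for the prediction function, and the distance threshold used to trigger a turn is illustrative; only the 30-degree rotation and 0.5 m step come from the embodiment.

```python
import numpy as np

def obstacle_center_x(left_x, right_x):
    """Centre abscissa of the obstacle from its left and right edge points."""
    return 0.5 * (left_x + right_x)

def predict_next_x(history):
    """Stand-in prediction: constant-velocity extrapolation over the collected centres."""
    xs = np.asarray(history, dtype=np.float32)
    if len(xs) < 2:
        return float(xs[-1])
    return float(xs[-1] + (xs[-1] - xs[0]) / (len(xs) - 1))   # mean step added to the last centre

def choose_turn(image_center_x, predicted_x, safe_dist=60.0, theta_deg=30.0):
    """Turn away from the obstacle if it is predicted to lie too close to the image centre."""
    dist = abs(image_center_x - predicted_x)
    if dist >= safe_dist:
        return 0.0                                            # keep heading, move straight ahead 0.5 m
    return theta_deg if predicted_x < image_center_x else -theta_deg
```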
The invention also provides a comparison of semantic segmentation results on an RGB image of the same working environment as this embodiment, obtained with the prior-art UperNet. FIG. 5 shows the semantic segmentation result of the RGB image of the current working environment obtained with the prior-art UperNet; FIG. 6 shows the result obtained with the local self-attention moving window algorithm of the invention. The comparison shows that the object-edge segmentation in fig. 6 is clearly better than that in fig. 5, so in this embodiment the self-supervision attention semantic segmentation network of the invention effectively improves the accuracy of semantic segmentation and thereby the obstacle-avoidance accuracy of the robot.

Claims (8)

1. A path planning method based on a local self-attention moving window algorithm, comprising the steps of:
(1) Acquiring an RGB image of a current operation environment, and preprocessing to acquire an RGB image of the preprocessed operation environment;
(2) Converting the RGB image of the preprocessed working environment into a gray scale image;
(3) Performing edge detection on the converted grayscale image to obtain a binarized image containing edge feature information, and obtaining the edge self-attention weight of the binarized image through an activation function, wherein the edge self-attention weight of each pixel point in the binarized image containing edge information is obtained by applying the activation function to its confidence, taking into account the total number of pixel points in the binarized image containing edge information;
partitioning the converted gray image to obtain a plurality of gray image blocks, calculating the entropy of the Hessian-matrix eigenvalues corresponding to each segmented gray image block, and calculating the curvature self-attention weight of each gray image block through an activation function according to the entropy; the specific steps are as follows:
(3.21) dividing the converted gray image to obtain a plurality of gray image blocks;
(3.22) calculating the Hessian matrix of the feature map corresponding to each gray image block, and filtering out point-like structures and noise points; the pixel points in each segmented gray image block of the current working environment are convolved with the second derivatives of the corresponding Gaussian function to obtain the Hessian matrix of that block, from which the eigenvalues of the Hessian matrix of each segmented gray image block of the current working environment are solved, the quantities involved being the gray-value function of the block, the gray value of each pixel point in the block, the number of pixel points in the block, the Hessian matrix of the block, the corresponding Gaussian function, the horizontal and vertical coordinates of each pixel point in the block, the standard deviation of the Gaussian distribution, and the eigenvalues of the Hessian matrix of the block;
(3.23) calculating the entropy of the eigenvalues corresponding to each segmented gray image block of the current working environment, wherein the entropy of each block is computed from the eigenvalues of its Hessian matrix and the number of pixel points it contains;
(3.24) applying the activation function to the entropies of the Hessian matrices corresponding to the segmented gray image blocks of the current working environment to obtain the curvature self-attention weight corresponding to each segmented gray image block, wherein the weight of each block is determined by its entropy and by the total number of segmented gray image blocks;
(4) The self-supervision attention semantic segmentation network is improved through edge self-attention weights and curvature self-attention weights, the self-supervision attention semantic segmentation network comprises two continuous self-attention window layers, and the edge self-attention weights and the curvature self-attention weights are respectively added into the attention calculation of the two continuous self-attention window layers;
(5) Processing the preprocessed RGB image of the working environment through the improved self-supervision attention semantic segmentation network to obtain the working-environment RGB image containing semantic information, calculating the position of the obstacle centroid from the working-environment RGB images containing semantic information obtained at consecutive moments, predicting the movement direction of the obstacle, and selecting the movement direction of the robot according to the predicted movement direction of the obstacle.
2. The path planning method according to claim 1, wherein the preprocessing of the RGB image of the current working environment in step (1) includes scaling the RGB image of the current working environment, and then performing flipping, affine transformation and noise addition on the scaled RGB image to obtain the RGB image of the preprocessed working environment.
3. The path planning method according to claim 1, wherein the step (3) of performing edge detection on the transformed gray scale map to obtain a binary image containing edge feature information, and the step of obtaining the edge self-attention weight of the binary image by activating the function comprises:
(3.11) carrying out smoothing treatment on the gray level image, then calculating the gradient amplitude of each pixel in the image, carrying out maximum value inhibition operation on the gradient amplitude, and finally carrying out edge detection through a double-threshold algorithm to obtain a binarized image containing edge information;
(3.12) smoothing the confidence of each pixel point in the binarized image containing the edge information, wherein the smoothed confidence of each pixel point is computed from the gray value of that pixel point in the binarized image containing edge information, its distances to the other pixel points, the confidences of the other pixel points, and the total number of pixel points in the binarized image containing edge information;
(3.13) converting the confidence into an edge self-attention weight for each pixel point by an activation function.
4. The path planning method according to claim 1, wherein the processing of the RGB image of the pre-processed work environment through the modified self-supervised attention semantic segmentation network in step (5) comprises:
(5.1) reducing the RGB image of the preprocessed working environment, and inputting the images before and after reduction into an image block segmentation layer from an original image channel and a reduced image channel respectively for segmentation to obtain a plurality of non-overlapping image blocks respectively;
(5.2) inputting the image blocks obtained by segmentation into a first linear self-attention feature extraction module, wherein the first linear self-attention feature extraction module comprises a linear embedding layer and two continuous self-attention window layers, and obtaining a feature map after the first feature extraction;
(5.3) sequentially inputting the feature map after the first feature extraction into three fusion self-attention feature extraction modules, wherein each fusion self-attention feature extraction module comprises a patch fusion layer and two continuous self-attention window layers, and obtaining a feature map after the fourth feature extraction;
and (5.4) after upsampling and skip-connection operations, inputting the feature maps obtained after each feature extraction in the original-image channel and the reduced-image channel into the decoder for semantic segmentation, obtaining the working-environment RGB image containing semantic information.
5. The path planning method of claim 4, wherein the first patch fusion layer that fuses the self-attention feature extraction module comprises:
inputting the feature map after the first feature extraction in the original-image channel to the patch fusion layer and downsampling it, with the output dimension set accordingly, to obtain the feature map after the second feature extraction; and inputting the feature map after the first feature extraction in the reduced-image channel to the patch fusion layer and downsampling it to obtain its feature map after the second feature extraction, the feature-map dimensions being expressed in terms of the image depth, the image height and the image width;
The patch fusion layer of the second fusion self-attention feature extraction module comprises:
inputting the feature map after the second feature extraction in the original-image channel to the patch fusion layer and downsampling it, with the output dimension set accordingly, to obtain the feature map after the third feature extraction; and inputting the feature map after the second feature extraction in the reduced-image channel to the patch fusion layer and downsampling it to obtain its feature map after the third feature extraction;
The patch fusion layer of the third fusion self-attention feature extraction module comprises:
inputting the feature map after the third feature extraction in the original-image channel to the patch fusion layer and downsampling it, with the output dimension set accordingly, to obtain the feature map after the fourth feature extraction; and inputting the feature map after the third feature extraction in the reduced-image channel to the patch fusion layer and downsampling it to obtain its feature map after the fourth feature extraction.
6. The path planning method of claim 5, wherein the two consecutive self-attention window layers comprise:
Inputting the feature map output by the linear embedding layer or the patch fusion layer into a first layer normalization layer for normalization; inputting the layer-normalized feature map into the window self-attention calculation layer for window allocation and adding the curvature self-attention weight into the self-attention calculation; sequentially inputting the feature map output by the window self-attention calculation layer and the feature map output by the linear embedding layer or the patch fusion layer into a fully connected layer, a second layer normalization layer and a multi-layer perceptron layer; and finally fully connecting the feature map output by the multi-layer perceptron layer with the feature map output by the window self-attention calculation layer to obtain the feature map output by the first attention window module, wherein the self-attention calculation function with the added curvature self-attention weight is built from the query vector, the key vector and its transpose, the value vector, the offset, the key dimension, the curvature self-attention weight of each pixel point in the layer-normalized feature map of the current working environment, the total number of pixel points in that feature map, and the activation function;
Inputting the feature map output by the first attention window module into a third layer normalization layer for normalization processing, inputting the feature map after layer normalization into a moving window multi-head self-attention layer, adding edge self-attention weight into self-attention calculation, sequentially inputting the feature map output by the moving window multi-head self-attention layer and the feature map output by the first attention window module into a full-connection layer, a fourth layer normalization layer and a multi-layer perceptron layer, and performing full-connection operation on the feature map output by the multi-layer perceptron layer and the feature map output by the moving window multi-head self-attention layer through the processing of an activation function of the multi-layer perceptron layer to obtain the feature map output by the whole of two continuous attention window modules; the self-attention calculation formula is as follows:
wherein, To add the self-attention computation function of the edge self-attention weight,For the/>, in the feature map of the current working environment after layer normalizationThe edges of the individual pixels are weighted from the attention.
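The two self-attention formulas above were embedded as images and are not reproduced in the extracted text. A plausible reading, consistent with the variables listed, is a standard scaled dot-product window attention with a relative-position offset plus an additive per-pixel weight (the curvature weight in the first layer, the edge weight in the second). The sketch below implements that reading; where exactly the patent adds the per-pixel weight is an assumption.

```python
import torch
import torch.nn.functional as F

def weighted_window_attention(q: torch.Tensor,
                              k: torch.Tensor,
                              v: torch.Tensor,
                              offset: torch.Tensor,
                              pixel_weight: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention inside one window with an extra additive
    per-pixel weight (curvature or edge weight), i.e. roughly
        Softmax(Q K^T / sqrt(d) + B + G) V
    The placement of G is an assumption, not taken from the patent text.

    q, k, v:       (num_windows, n_tokens, d)
    offset:        (n_tokens, n_tokens)  relative-position offset B
    pixel_weight:  (n_tokens,)           per-pixel weight G_i, broadcast per row
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # (num_windows, n, n)
    scores = scores + offset                       # add offset B
    scores = scores + pixel_weight.view(1, 1, -1)  # add per-pixel weight G
    attn = F.softmax(scores, dim=-1)
    return attn @ v

if __name__ == "__main__":
    nw, n, d = 4, 49, 32                 # 4 windows of 7x7 tokens, assumed sizes
    q, k, v = (torch.randn(nw, n, d) for _ in range(3))
    offset = torch.zeros(n, n)
    curvature = torch.rand(n)            # e.g. curvature weights per pixel
    print(weighted_window_attention(q, k, v, offset, curvature).shape)
    # torch.Size([4, 49, 32])
```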
7. The path planning method according to claim 6, wherein the step (5.4) of inputting, after up-sampling and skip-connection operations, the feature maps obtained after each feature extraction in the original-image channel and the contracted-map channel into the decoder for semantic segmentation comprises:
up-sampling the feature map after the fourth feature extraction in the contracted-map channel to the required size, skip-connecting it with the feature map after the fourth feature extraction in the original-image channel, and inputting the result to an extended self-attention module, the output image size being enlarged accordingly; the extended self-attention module comprises a patch extension layer and two consecutive self-attention layers;
up-sampling the feature map after the third feature extraction in the contracted-map channel to the required size, skip-connecting it with the feature map after the third feature extraction in the original-image channel, and inputting the result to the extended self-attention module, the output image size being enlarged accordingly;
up-sampling the feature map after the second feature extraction in the contracted-map channel to the required size, skip-connecting it with the feature map after the second feature extraction in the original-image channel, and inputting the result to the extended self-attention module, the output image size being enlarged accordingly;
up-sampling the feature map after the first feature extraction in the contracted-map channel to the required size, skip-connecting it with the feature map after the first feature extraction in the original-image channel, and inputting the result to the extended self-attention module, the output image size being enlarged accordingly.
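A minimal sketch of one decoder step of the kind recited above: the deeper feature map is up-sampled by a patch extension layer, concatenated with the skip-connected feature of the same level, and projected back to the working dimension. The module names, the 2x expansion ratio and the example tensor sizes are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class PatchExtension(nn.Module):
    """Up-sampling counterpart of patch fusion: double resolution, halve depth (assumed)."""
    def __init__(self, dim: int):
        super().__init__()
        self.expand = nn.Linear(dim, 2 * dim, bias=False)   # 2*dim = 2*2*(dim//2)
        self.norm = nn.LayerNorm(dim // 2)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, h*w, dim) -> (B, (2h)*(2w), dim/2)
        b, _, dim = x.shape
        x = self.expand(x)                          # (B, h*w, 2*dim)
        x = x.view(b, h, w, 2, 2, dim // 2)         # split each token into a 2x2 sub-grid
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(b, (2 * h) * (2 * w), dim // 2)
        return self.norm(x)

class DecoderStep(nn.Module):
    """One up-sampling + skip-connection step of the decoder (illustrative only)."""
    def __init__(self, dim: int):
        super().__init__()
        self.up = PatchExtension(dim)
        # After concatenation with the same-level feature the depth doubles,
        # so project it back before the following attention layers.
        self.fuse = nn.Linear(dim, dim // 2)

    def forward(self, deep: torch.Tensor, skip: torch.Tensor, h: int, w: int) -> torch.Tensor:
        up = self.up(deep, h, w)                    # (B, 4*h*w, dim/2)
        x = torch.cat([up, skip], dim=-1)           # skip: (B, 4*h*w, dim/2)
        return self.fuse(x)

if __name__ == "__main__":
    deep = torch.randn(1, 7 * 7, 768)     # deepest feature map, assumed sizes
    skip = torch.randn(1, 14 * 14, 384)   # same-level feature from the other channel
    print(DecoderStep(768)(deep, skip, 7, 7).shape)   # torch.Size([1, 196, 384])
```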
8. The path planning method according to claim 7, wherein calculating the center position of the obstacle from the working-environment RGB images containing semantic information obtained at successive moments, predicting the movement direction of the obstacle, and selecting the movement direction of the robot based on the prediction comprises:
(5.51) calculating position coordinates of the center point of the obstacle in the RGB image of the working environment containing semantic information at each moment;
(5.52) predicting the position of the obstacle at the next moment according to the prediction function and the position coordinates of the center point of the obstacle, wherein the formula is as follows:
wherein the quantities in the formula denote, in order: the abscissa of the predicted obstacle center point at the next moment; the abscissa of the obstacle center point in the working-environment RGB image containing semantic information at each acquisition moment; and the total number of obstacle coordinates acquired;
(5.53) determining the movement direction of the robot at the next moment according to the predicted position of the center point of the obstacle at the next moment in the RGB image of the working environment containing semantic information, wherein the calculation formula is as follows:
wherein the quantities in the formula denote, in order: the abscissa of the center point of the working-environment RGB image containing semantic information; the abscissa of the predicted obstacle center point at the next moment; the distance between the predicted obstacle center point at the next moment and the center point of the working-environment RGB image containing semantic information; and the preset rotation angle of the robot during path planning;
(5.54) adjusting the movement direction of the robot accordingly, moving forward by a preset distance, and then returning to step (1).
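The prediction formula of (5.52) and the direction-decision formula of (5.53) were embedded as images. The sketch below implements one common reading of the listed variables: a constant-velocity extrapolation of the obstacle center abscissa followed by a left/right decision against the image center. Both the extrapolation rule and the placeholder rotation angle are assumptions, not the patent's exact formulas.

```python
from typing import List, Tuple

def predict_next_abscissa(xs: List[float]) -> float:
    """Constant-velocity extrapolation (assumed reading of the claim formula):
    last observed abscissa plus the average per-step displacement over the
    n collected obstacle center coordinates."""
    n = len(xs)
    if n < 2:
        return xs[-1]
    avg_step = (xs[-1] - xs[0]) / (n - 1)
    return xs[-1] + avg_step

def choose_turn(image_center_x: float,
                predicted_x: float,
                rotation_angle_deg: float = 15.0) -> Tuple[float, str]:
    """Decide the robot's turn from the predicted obstacle position.

    rotation_angle_deg stands in for the preset rotation angle mentioned in
    the claim; 15 degrees is a placeholder, not a value from the patent.
    Returns the distance between the predicted obstacle center and the image
    center, and the chosen turn direction (turn away from the obstacle).
    """
    distance = abs(image_center_x - predicted_x)
    direction = "turn_right" if predicted_x < image_center_x else "turn_left"
    return distance, direction

if __name__ == "__main__":
    # Obstacle center abscissas collected from consecutive semantic images.
    xs = [310.0, 318.0, 327.0, 335.0]
    x_pred = predict_next_abscissa(xs)
    dist, turn = choose_turn(image_center_x=320.0, predicted_x=x_pred)
    print(round(x_pred, 1), round(dist, 1), turn)   # 343.3 23.3 turn_left
```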
CN202410304943.7A 2024-03-18 2024-03-18 Path planning method based on local self-attention moving window algorithm Active CN117889867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410304943.7A CN117889867B (en) 2024-03-18 2024-03-18 Path planning method based on local self-attention moving window algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410304943.7A CN117889867B (en) 2024-03-18 2024-03-18 Path planning method based on local self-attention moving window algorithm

Publications (2)

Publication Number Publication Date
CN117889867A CN117889867A (en) 2024-04-16
CN117889867B true CN117889867B (en) 2024-05-24

Family

ID=90639927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410304943.7A Active CN117889867B (en) 2024-03-18 2024-03-18 Path planning method based on local self-attention moving window algorithm

Country Status (1)

Country Link
CN (1) CN117889867B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807257A (en) * 2010-05-12 2010-08-18 上海交通大学 Method for identifying information of image tag
CN104778721A (en) * 2015-05-08 2015-07-15 哈尔滨工业大学 Distance measuring method of significant target in binocular image
CN109766924A (en) * 2018-12-20 2019-05-17 东南大学 Image detecting method based on image information entropy Yu adaptive threshold DAISY characteristic point
WO2021011581A1 (en) * 2019-07-15 2021-01-21 Memorial Sloan Kettering Cancer Center Image-based predictive model for lung cancer
CN115482382A (en) * 2022-09-17 2022-12-16 北京工业大学 Image semantic segmentation method based on Transformer architecture
CN116189180A (en) * 2023-04-28 2023-05-30 青岛理工大学 Urban streetscape advertisement image segmentation method
CN116402742A (en) * 2023-01-17 2023-07-07 有件(嘉兴)网络科技有限公司 Visual detection method and system for surface defects of automobile sheet metal part
CN117036686A (en) * 2023-06-29 2023-11-10 南京邮电大学 Semantic segmentation method based on self-attention and convolution feature fusion
CN117522896A (en) * 2023-11-22 2024-02-06 深圳职业技术大学 Self-attention-based image segmentation method and computer equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8285019B2 (en) * 2006-02-10 2012-10-09 Synarc Inc. Breast tissue density measure
US9799098B2 (en) * 2007-04-24 2017-10-24 Massachusetts Institute Of Technology Method and apparatus for image processing
US20150032449A1 (en) * 2013-07-26 2015-01-29 Nuance Communications, Inc. Method and Apparatus for Using Convolutional Neural Networks in Speech Recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Theoretical Research on Optimal Sensor Placement Considering Uncertainty of Structural Modal Identification; Pei Xueyang; China Doctoral Dissertations Full-text Database, Information Science and Technology Series; 2021-01-15 (No. 01); I140-98 *

Also Published As

Publication number Publication date
CN117889867A (en) 2024-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant