CN111462192A - Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot - Google Patents

Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot

Info

Publication number
CN111462192A
Authority
CN
China
Prior art keywords
neural network
flow
convolutional neural
optical flow
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010112294.2A
Other languages
Chinese (zh)
Inventor
高国琴
王晨钰
方志明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202010112294.2A priority Critical patent/CN111462192A/en
Publication of CN111462192A publication Critical patent/CN111462192A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30261Obstacle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Automation & Control Theory (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Electromagnetism (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for a sidewalk sweeping robot. First, image information of the sidewalk sweeping robot is obtained in real time through a binocular camera. A ranking support vector machine (RankSVM) compresses and assembles the continuous optical flow sequence into a single ordered optical flow graph, realizing the modeling of the video time-domain structure. The processed images are then input into the neural network model: for the spatial domain, single-frame RGB images of the video are taken as input and sent into a VGGNet-16 model; for the time domain, multiple frames of superimposed optical flow images are taken as input and sent into a C3Dnet model. Finally, the multi-frame Softmax outputs of the two models are weight-fused to obtain the output result, yielding a multi-model fused dynamic obstacle avoidance method for the sidewalk sweeping robot. The invention enables the sweeping robot to make more effective use of the motion information of dynamic obstacles in the sidewalk environment and reduces the probability of collision with obstacles, so that the sweeping robot can avoid obstacles in the environment more autonomously, quickly and efficiently.

Description

Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot
Technical Field
The invention relates to obstacle avoidance research based on machine vision, in particular to a dynamic obstacle avoidance method for a sidewalk sweeping robot based on binocular vision.
Background
The sidewalk sweeping robot is an important component of future urban cleaning systems. It is an integrated system combining environmental perception, decision planning, motion control and other functions, involves many advanced technical fields, and can effectively improve the cleaning efficiency of urban roads. Because the sweeping robot must operate in intricate sidewalk environments, ensuring the personal safety of pedestrians on the road has become one of the core problems in research on autonomous control of sidewalk sweeping robots. From the robot's perspective, pedestrians on the road are obstacles that must not be collided with and that move autonomously. Therefore, the dynamic obstacle avoidance method of the sidewalk sweeping robot not only reflects, to a certain extent, the robot's level of intelligence, but is also an important guarantee for autonomous, safe and reliable operation. Commonly used obstacle avoidance methods at present include the artificial potential field method, the fuzzy navigation method, and the VFH obstacle avoidance method. However, these methods have no dynamic prediction capability, and it is difficult for them to avoid obstacles accurately when facing fast or irregularly moving dynamic obstacles. Therefore, some scholars have added prediction to obstacle avoidance methods for dynamic obstacles. Commonly used prediction methods include grey prediction, regression analysis, and time-series methods. However, these methods focus on the analysis of time-series models and causal regression models; the resulting models cannot fully and essentially reflect the internal structure and complex characteristics of the dynamic information, so information is lost. The video captured by the vision system of the sidewalk sweeping robot is a continuous image sequence, and effective use of its dynamic time-domain information is of great significance to the design of an obstacle avoidance method.
The document "behavior recognition method for multi-scale input 3D convolution fusion dual-flow model" (sons li fei et al, computer aided design and graphics bulletin 2018) proposes a 3D convolution neural network structure, which is an extension of the original 2D neural network in the time dimension, so that the temporal characteristics of video segments can be learned. But the input quantity of the deep learning structure is too small, and only a single optical flow frame and a plurality of optical flows sampled at equal intervals in a time domain exist.
The document "infrared behavior recognition based on space-time double-current convolutional neural network" (wuxue Ping et al, applied optics.2018) proposes a space-time double-current deep learning strategy, which is used for respectively extracting the space information and the time information of a video and finally fusing the two information. But their fusion of spatial and temporal features does not take into account the correlation between spatial and temporal features and how these correlations vary over time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and, in view of the characteristics of the sidewalk sweeping robot and its dynamic obstacle avoidance requirements, provides a machine-vision dynamic obstacle avoidance method based on an improved double-current convolutional neural network. The method addresses the problem that image features along the time axis are easily lost when traditional obstacle avoidance methods deal with dynamic obstacles, improving the ability to learn image features along the time axis, and makes effective use of the motion information of obstacles during dynamic obstacle avoidance so as to improve obstacle avoidance accuracy.
The technical scheme of the invention is as follows: a method for dynamically avoiding obstacles by a space-time double-flow fusion convolutional neural network of a sidewalk sweeping robot comprises the following steps:
step 1, image acquisition based on binocular vision: acquiring an original image of the pavement sweeping robot in the operation process based on binocular vision, and acquiring image information of the pavement sweeping robot in real time through a binocular camera, wherein the camera is adjusted to a proper position to ensure that an obstacle needing to be avoided is always within the visual field range of the camera in the pavement sweeping operation process;
step 2, image processing and acquisition of the optical flow graph: processing the acquired original RGB images, and compressing and integrating the continuous optical flow sequence into a single ordered optical flow graph by means of a ranking support vector machine (RankSVM), realizing the modeling of the video time-domain structure;
step 3, improving modeling of the double-current convolutional neural network: the established neural network model comprises a space domain and a time domain, and respectively corresponds to the position information and the motion information of the dynamic barrier; for a spatial domain, a single frame RGB image of a video is taken as input and sent into a VGGNet-16 neural network model; for a time domain, taking the optical flow graph as input and sending the optical flow graph into a C3Dnet neural network model;
step 4, model fusion: performing weighted fusion on the multi-frame Softmax outputs of the spatial-stream convolutional neural network and the temporal-stream convolutional neural network to obtain the probability vectors of the prediction samples belonging to each category, selecting the category with the maximum probability as the classification result, and performing the corresponding obstacle avoidance action.
Further, the specific process of step 2 is:
obtaining an n-frame continuous optical flow sequence F = [f_1, f_2, …, f_n] through the binocular camera, where f_i ∈ ℝ^{2×d_1×d_2}, d_1 and d_2 are the height and width of the optical flow graph, and each optical flow graph is a two-channel image containing the horizontal component f_i^x and the vertical component f_i^y of the optical flow; defining the weighted moving average of the optical flow graph f_t of the t-th frame as

$$\hat{f}_t = \frac{1}{t}\sum_{i=1}^{t} f_i \qquad (1)$$

the weighted averaging of equation (1) reduces both the error rate of the optical flow estimation and the influence of white noise;
carrying out the ordered optical flow graph calculation on the weighted moving averages of the optical flow sequence, the calculation formula being

$$\min_{G}\;\frac{1}{2}\lVert G\rVert^{2}+C\sum_{i>j}\xi_{ij}\quad\text{s.t.}\;\langle G,\hat{f}_i\rangle-\langle G,\hat{f}_j\rangle\ge 1-\xi_{ij},\;\xi_{ij}\ge 0,\;\forall\, i>j \qquad (2)$$

in equation (2), \hat{f}_i is the weighted moving average of the optical flow sequence, G is the ordered optical flow graph, C is the trade-off between the margin size and the training error, ξ_{ij} is the slack variable, and ⟨·,·⟩ is the inner product; the constraint ⟨G, \hat{f}_i⟩ − ⟨G, \hat{f}_j⟩ ≥ 1 − ξ_{ij} for i > j preserves the order information of the optical flow frames; the parameter G obtained by training and learning is in fact of the same size as the optical flow graph, so G is defined as the ordered optical flow graph; solving equation (2) is equivalent to the unconstrained optimization problem of minimizing the Hinge Loss function:

$$E(G)=\frac{\lambda}{2}\lVert G\rVert^{2}+\frac{2}{n(n-1)}\sum_{i>j}\bigl[\,1-\langle G,\hat{f}_i\rangle+\langle G,\hat{f}_j\rangle\,\bigr]_{+} \qquad (3)$$

in equation (3), [·]_+ is the function max(0, x) and λ is the reciprocal of C;
equation (2) is solved separately for the two channels corresponding to the horizontal and vertical components of the optical flow:

$$G^{x}=\arg\min_{G}E\bigl(G;\hat{f}^{x}_{1},\dots,\hat{f}^{x}_{n}\bigr),\qquad G^{y}=\arg\min_{G}E\bigl(G;\hat{f}^{y}_{1},\dots,\hat{f}^{y}_{n}\bigr) \qquad (4)$$

in equation (4), G^x is the horizontal component of the ordered optical flow graph G, G^y is its vertical component, \hat{f}^x_i is the horizontal-component estimate of the two-channel optical flow image, and \hat{f}^y_i is the vertical-component estimate; the obtained G^x and G^y are converted to the [0, 255] range by min-max normalization and superimposed to generate the ordered optical flow graph, which is taken as the input of the deep network; through the above process, the n-frame optical flow sequence is mapped to a single ordered optical flow graph, and the ordered optical flow graph can express the motion information of the multi-frame video sequence.
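As a minimal illustration of the ordered optical flow graph construction in step 2, the sketch below (Python with NumPy and scikit-learn) computes the weighted moving averages of a flow sequence and fits one linear ranker per flow channel. It is a sketch under stated assumptions, not the patent's implementation: LinearSVR regression on the frame index stands in for the pairwise RankSVM solver of equation (2), the form of the weighted moving average follows equation (1) as reconstructed above, and all function and variable names are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVR

def ordered_flow_graph(flow_seq, c=1.0):
    """Compress an optical flow sequence into one ordered optical flow graph.

    flow_seq: array of shape (n, 2, d1, d2), the horizontal and vertical
              flow components of n consecutive frames.
    Returns an array of shape (2, d1, d2) rescaled to [0, 255].
    """
    n, _, d1, d2 = flow_seq.shape
    # Weighted moving averages: hat_f_t = (1/t) * sum_{i<=t} f_i  (assumed form of Eq. (1))
    smoothed = np.cumsum(flow_seq, axis=0) / np.arange(1, n + 1).reshape(-1, 1, 1, 1)

    channels = []
    for ch in range(2):  # one ranker per flow channel (horizontal, vertical)
        X = smoothed[:, ch].reshape(n, -1)       # each row: flattened hat_f_t
        y = np.arange(1, n + 1, dtype=float)     # frame order as ranking target
        # LinearSVR on the frame index approximates the pairwise RankSVM of Eq. (2).
        ranker = LinearSVR(C=c, fit_intercept=False, max_iter=10000)
        ranker.fit(X, y)
        G = ranker.coef_.reshape(d1, d2)
        # Min-max normalisation to [0, 255] before superposition.
        G = 255.0 * (G - G.min()) / (G.max() - G.min() + 1e-8)
        channels.append(G)
    return np.stack(channels)   # superimposed horizontal/vertical ordered flow graph
```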
Further, the specific process of step 3 is:
for the position information and the motion information of the dynamic barrier in the image information, an improved double-current convolution neural network model is established, and the improved double-current convolution neural network model corresponds to a space flow convolution neural network and a time flow convolution neural network respectively;
the method comprises the steps of establishing a VGGNet-16 model as a spatial flow convolutional neural network model, wherein the VGGNet-16 model is a model with 1000 classifications obtained by training on a database ImageNet, a 16-layer deep network is adopted, the deep network comprises 13 convolutional layers and 3 full-connection layers, all convolutional layers use convolutional kernels with the size of 3 × 3, and the convolution step size is reduced to 1;
the temporal-stream convolutional neural network is used to extract optical flow information, so a C3Dnet model pre-trained on optical flow images is established; C3Dnet comprises 8 convolutional layers (conv x) with kernel size 3 × 3 × 3 and stride 1, 5 max-pooling layers (pool y) whose pooling kernels are 2 × 2 × 2 except for pool 1, whose kernel is 1 × 2 × 2, 2 fully connected layers whose output responses are 4096-dimensional, and 1 Softmax output layer; the network takes 16-frame segments as input units, adjacent segments overlap by 8 frames, and the input picture size is 224 × 224; the fc 6-layer responses of all segments of a video are averaged and L2-normalized, and the obtained 4096-dimensional vector is used as the C3D feature of the video.
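For orientation, the two branches described in step 3 could be organised as below (a Python/PyTorch sketch, assuming torchvision is available; the temporal branch here is a reduced stand-in with far fewer layers than the 8-convolution C3Dnet specified above, and all class and parameter names are illustrative rather than the patent's implementation):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SpatialStream(nn.Module):
    """VGGNet-16 backbone on single RGB frames, re-headed for 3 actions."""
    def __init__(self, num_classes=3):
        super().__init__()
        # torchvision >= 0.13 weights API; pre-trained on ImageNet.
        self.backbone = vgg16(weights="IMAGENET1K_V1")
        self.backbone.classifier[6] = nn.Linear(4096, num_classes)

    def forward(self, rgb):                 # rgb: (B, 3, 224, 224)
        return self.backbone(rgb)           # class logits

class TemporalStream(nn.Module):
    """Reduced C3D-style network on stacked two-channel optical flow maps."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(2, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                     # pool 1: 1 x 2 x 2
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((2, 2, 2)),                     # remaining pools: 2 x 2 x 2
            nn.AdaptiveAvgPool3d(1),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(128, 4096),
                                nn.ReLU(), nn.Linear(4096, num_classes))

    def forward(self, flow):                # flow: (B, 2, T, 224, 224)
        return self.fc(self.features(flow))
```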
Further, the specific process of step 4 is:
for a moving pedestrian, the spatial-stream network identifies the position of the pedestrian relative to the robot, and the temporal-stream network identifies whether the pedestrian is moving away from or toward the robot, or to the robot's left or right; combining the two networks, the obstacle avoidance action the robot should take can be determined; the obstacle avoidance actions are divided into left turn, going straight and right turn, which serve as the three classification results of the convolutional neural network;
for the obstacle avoidance method of the pavement sweeping robot, the probability vectors output for all images in a video must be fused to obtain the prediction probability vector of a single model for that video; the multi-frame Softmax outputs of the spatial-stream convolutional neural network and the temporal-stream convolutional neural network are then weight-fused to obtain the probability vector V_{ec} of the prediction sample belonging to each category:

$$V_{ec}=\lambda\cdot\frac{1}{n}\sum_{i=1}^{n}V^{(i)}_{ec,\mathrm{spatial}}+(1-\lambda)\cdot\frac{1}{n}\sum_{i=1}^{n}V^{(i)}_{ec,\mathrm{temporal}} \qquad (5)$$

In equation (5), V_{ec} is the fused probability vector, λ is the weight of the spatial-stream convolutional neural network, n is the number of video frames, V_{ec,spatial} is the probability vector of the spatial-stream convolutional neural network, and V_{ec,temporal} is the probability vector of the temporal-stream convolutional neural network. Finally, the category with the maximum probability is selected as the classification result, and the corresponding obstacle avoidance action is executed.
The invention provides a space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for a sidewalk sweeping robot, which has the following beneficial effects by adopting the technical scheme: aiming at the problem that the image characteristics on a time axis are easily lost when the traditional obstacle avoidance method is used for dynamically avoiding obstacles, the invention designs an improved deep convolutional neural network structure to improve the learning capability of the image characteristics on the time axis; aiming at the problem that the convolutional neural network can not fully utilize the motion information of the dynamic barrier when singly processing the RGB image, a multi-model fusion method is designed by fusing a space flow convolutional neural network model and a time flow convolutional neural network model. The motion information of the dynamic barrier is fully utilized, the collision probability with the barrier is reduced, and the dynamic barrier avoidance accuracy is improved, so that the barrier avoidance problem of the sweeping robot in a sidewalk environment is solved, and the sweeping robot can avoid the barrier in the environment at high speed and high efficiency more autonomously.
Drawings
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a sidewalk sweeping robot mechanism diagram.
Fig. 2 is a flow chart of a space-time double-flow fusion convolutional neural network dynamic obstacle avoidance method of the sidewalk sweeping robot.
Fig. 3 is an acquired image dataset.
FIG. 4 shows the recognition results of different sub-sequence lengths.
FIG. 5 is a diagram of a dual-stream convolutional neural network model.
FIG. 6 is an input graph source.
FIG. 7 is a test result graph a.
FIG. 8 is a test result chart b.
FIG. 9 is a test result chart c.
FIG. 10 is a graph of training loss and validation rate statistics.
Detailed Description
The following further describes the embodiments of the present invention with reference to the drawings.
The invention provides a space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for a sidewalk sweeping robot. The method aims to solve the problem that image characteristics on a time axis are easily lost when the traditional obstacle avoidance methods such as a fuzzy logic method, an artificial potential field method, a shallow neural network and the like are used for dynamic obstacle avoidance, and the method aims to effectively utilize motion information of a dynamic obstacle so as to improve the accuracy of dynamic obstacle avoidance. The invention provides a space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for a sidewalk sweeping robot. Firstly, original images of the pavement sweeping robot in the operation process are collected based on binocular vision, image information of the pavement sweeping robot is obtained in real time through a binocular camera, and the camera is adjusted to a proper position to ensure that obstacles needing to be avoided are always within the visual field range of the camera in the pavement sweeping operation process. Secondly, processing the acquired original RGB image, compressing and integrating the continuous optical flow sequence into a single ordered optical flow graph by using a Rank Support Vector Machine (SVM) method, and realizing the modeling of the video time domain structure. And thirdly, the established neural network model comprises a space domain and a time domain which respectively correspond to the position information and the motion information of the dynamic barrier. For a spatial domain, a single frame RGB image of a video is taken as input and sent into a VGGNet-16 neural network model; for the time domain, the optical flow map is fed into the C3Dnet neural network model as input. And finally, performing weighted fusion on the multi-frame Softmax output of the spatial flow convolutional neural network and the time flow convolutional neural network to obtain probability vectors of prediction samples belonging to various categories. And selecting the category with the maximum probability as a classification result, and performing corresponding obstacle avoidance action.
The specific implementation mode is described by taking a sidewalk sweeping robot researched and developed by the subject group as a research object, referring to fig. 1, the overall structure of the sidewalk sweeping robot mainly comprises 4 single-shaft intelligent motor modules, a battery module, a main body frame, a control cabin module and a binocular camera, and each module is electrically connected through a waterproof aviation plug.
The method comprises the following specific steps:
1. referring to fig. 2, a flow chart of a space-time double-flow fusion convolutional neural network dynamic obstacle avoidance method of the sidewalk sweeping robot. The sidewalk sweeping robot can make a correct obstacle avoidance decision through image acquisition, image processing and improvement of the double-flow neural network.
2. Referring to fig. 3, an n-frame continuous optical flow sequence F = [f_1, f_2, …, f_n] is obtained via the binocular camera, where f_i ∈ ℝ^{2×d_1×d_2}, d_1 and d_2 are the height and width of the optical flow graph, and each optical flow graph is a two-channel image containing the horizontal component f_i^x and the vertical component f_i^y of the optical flow. The weighted moving average of the optical flow graph f_t of the t-th frame is defined as

$$\hat{f}_t = \frac{1}{t}\sum_{i=1}^{t} f_i \qquad (6)$$

the weighted averaging of equation (6) reduces both the error rate of the optical flow estimation and the influence of white noise;
carrying out the ordered optical flow graph calculation on the weighted moving averages of the optical flow sequence, the calculation formula being

$$\min_{G}\;\frac{1}{2}\lVert G\rVert^{2}+C\sum_{i>j}\xi_{ij}\quad\text{s.t.}\;\langle G,\hat{f}_i\rangle-\langle G,\hat{f}_j\rangle\ge 1-\xi_{ij},\;\xi_{ij}\ge 0,\;\forall\, i>j \qquad (7)$$

in equation (7), \hat{f}_i is the weighted moving average of the optical flow sequence, G is the ordered optical flow graph, C is the trade-off between the margin size and the training error, ξ_{ij} is the slack variable, and ⟨·,·⟩ is the inner product; the constraint ⟨G, \hat{f}_i⟩ − ⟨G, \hat{f}_j⟩ ≥ 1 − ξ_{ij} for i > j preserves the order information of the optical flow frames; the parameter G obtained by training and learning is in fact of the same size as the optical flow graph, so G is defined as the ordered optical flow graph; solving equation (7) is equivalent to the unconstrained optimization problem of minimizing the Hinge Loss function:

$$E(G)=\frac{\lambda}{2}\lVert G\rVert^{2}+\frac{2}{n(n-1)}\sum_{i>j}\bigl[\,1-\langle G,\hat{f}_i\rangle+\langle G,\hat{f}_j\rangle\,\bigr]_{+} \qquad (8)$$

in equation (8), [·]_+ is the function max(0, x) and λ is the reciprocal of C;
equation (7) is solved separately for the two channels corresponding to the horizontal and vertical components of the optical flow:

$$G^{x}=\arg\min_{G}E\bigl(G;\hat{f}^{x}_{1},\dots,\hat{f}^{x}_{n}\bigr),\qquad G^{y}=\arg\min_{G}E\bigl(G;\hat{f}^{y}_{1},\dots,\hat{f}^{y}_{n}\bigr) \qquad (9)$$

in equation (9), G^x is the horizontal component of the ordered optical flow graph G, G^y is its vertical component, \hat{f}^x_i is the horizontal-component estimate of the two-channel optical flow image, and \hat{f}^y_i is the vertical-component estimate. The obtained G^x and G^y are converted to the [0, 255] range by min-max normalization and superimposed to generate the ordered optical flow graph, which is taken as the input of the deep network. Through the above process, the n-frame optical flow sequence is mapped to a single ordered optical flow graph, and the ordered optical flow graph can express the motion information of the multi-frame video sequence.
3. First, the optical flow sequence is divided along the time dimension into sub-sequences of w frames with an interval of w/2, i.e., adjacent sub-sequences overlap by w/2 frames. An ordered optical flow graph is then built on each sub-sequence and input into C3Dnet, with the input image size likewise adjusted to 224 × 224. The fc 6-layer responses of all ordered optical flow graphs are averaged and L2-normalized to obtain the C3D feature.
If the number of frames in a sub-sequence is too small, the purpose of modeling the time-domain structure cannot be achieved; if it is too large, part of the motion information may be lost. A reasonable sub-sequence length therefore has to be determined first. Fig. 4 shows the recognition results for different sub-sequence lengths w on two data sets when the time-domain convolutional neural network is used alone for recognition. As can be seen from fig. 4, the highest recognition results are obtained when w is 24 and 28; the sub-sequence length in the present invention is therefore taken as the intermediate value of 26 frames, as in the splitting sketch below.
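A sketch of this overlapping sub-sequence splitting (Python; the window length w = 26 and stride w/2 follow the text, the function name is illustrative):

```python
def split_into_subsequences(flow_frames, w=26):
    """Split an optical flow sequence into windows of w frames that
    overlap by w // 2 frames, as described above."""
    step = w // 2
    return [flow_frames[i:i + w]
            for i in range(0, len(flow_frames) - w + 1, step)]

# Example: 100 flow frames -> windows starting at frames 0, 13, 26, ...
windows = split_into_subsequences(list(range(100)), w=26)
```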
4. Referring to fig. 5, the dual-stream convolutional neural network is composed of 2 branch streams, one is called a spatial stream; the other is called a time stream. The network is pre-trained with a single frame of image in the spatial stream, and with an optical flow picture formed with adjacent frame images in the temporal stream. By utilizing a double-flow method, optical flow information is added, namely time information contained before and after a video is added, a single-frame image of an image to be recognized and the corresponding optical flow image are recognized through a pre-trained network respectively, and scores of two paths are fused through a neural network, so that the category corresponding to the image can be obtained more accurately.
The structure of the original double-current convolutional neural network model is basically the same as that of the AlexNet model, comprising 5 convolutional layers and 3 fully connected layers, with the input image size fixed at 224 × 224. Compared with AlexNet, the original double-current convolutional neural network contains more convolutional filters, the kernel size of the first convolutional layer is reduced to 7 × 7, and the convolution stride is reduced to 2; the parameters of the other layers are the same as those of AlexNet.
The VGGNet-16 model inherits the network framework of the AlexNet model, adopting a 16-layer deep network comprising 13 convolutional layers and 3 fully connected layers. Compared with the AlexNet model, VGGNet-16 uses a deeper network; all convolutional layers use 3 × 3 convolution kernels and the convolution stride is also reduced to 1, which simulates a larger receptive field while reducing the number of free parameters.
In addition, the time-flow convolutional neural network is used for extracting optical flow information, so the invention adopts a C3Dnet model pre-trained on optical flow images. C3Dnet comprises 8 convolutional layers (conv x) with kernel size 3 × 3 × 3 and stride 1; 5 max-pooling layers (pool y), where the pooling kernel of pool 1 is 1 × 2 × 2 and all other pooling kernels are 2 × 2 × 2; 2 fully connected layers (fc z), each with a 4096-dimensional output response; and 1 softmax output layer.
5. The fusion of the space-time network uses the correlation between the spatial and temporal characteristics of the video to judge the position and motion state of the obstacle, so that the sweeping robot can take the correct obstacle avoidance action. For a moving pedestrian, the spatial-stream network identifies the position of the pedestrian relative to the robot, and the temporal-stream network identifies whether the pedestrian is moving away from or toward the robot, or to the robot's left or right; combining the two networks, the obstacle avoidance action the robot should take can be determined. The obstacle avoidance actions are divided into left turn, going straight and right turn, which serve as the three classification results of the convolutional neural network.
For the obstacle avoidance method of the pavement sweeping robot, the input of each model is a single-frame image while the samples are in video units; therefore, the probability vectors output for all images in a video must be fused to obtain the prediction probability vector of a single model for that video, and the multi-frame Softmax outputs of the spatial-stream convolutional neural network and the temporal-stream convolutional neural network are then weight-fused to obtain the probability vector V_{ec} of the prediction sample belonging to each category:

$$V_{ec}=\lambda\cdot\frac{1}{n}\sum_{i=1}^{n}V^{(i)}_{ec,\mathrm{spatial}}+(1-\lambda)\cdot\frac{1}{n}\sum_{i=1}^{n}V^{(i)}_{ec,\mathrm{temporal}} \qquad (10)$$

In equation (10), V_{ec} is the fused probability vector, λ is the weight of the spatial-stream convolutional neural network, n is the number of video frames, V_{ec,spatial} is the probability vector of the spatial-stream convolutional neural network, and V_{ec,temporal} is the probability vector of the temporal-stream convolutional neural network. Finally, the category with the maximum probability is selected as the classification result, and the corresponding obstacle avoidance action is executed.
Examples
The invention provides a space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for a sidewalk sweeping robot, and solves the problem that the obstacle avoidance accuracy based on vision is low due to the fact that the working environment and road conditions are complex, light interference exists and dynamic obstacle conditions are variable in the operation process of the sidewalk sweeping robot.
The specific embodiment describes the pavement sweeping robot researched and developed by the subject group as a research object, and the specific implementation mode is as follows:
1. Original images of obstacles in front of the sidewalk sweeping robot are collected. The obstacle images are acquired through the binocular camera (model KS861-60), which is installed above the sweeping robot; its height above the ground and its angle can be flexibly adjusted. In addition, the detection system uses a computer running Windows 7 with an Intel(R) Core(TM) i7-3770 processor, a main frequency of 3.40 GHz and 8 GB of memory.
2. The data set is established mainly by manually controlling the robot to avoid obstacles while video is shot; the video is then cut into frames, and the pictures are labelled as left turn, straight, or right turn.
In order to ensure the validity of the data set, its acquisition must meet the following requirements: 1) collect data in as many different scenarios as possible; 2) collect data under different illumination conditions, weather conditions, different times of day, and so on; 3) keep the numbers of straight-going and steering commands as balanced as possible under the same conditions; 4) avoid steering commands when there is no obstacle, and so on. According to these requirements, data sets were collected on sidewalks under different illumination and environmental conditions, and 500 images labelled left turn, straight and right turn were obtained. To ensure that the number of images is sufficient to prevent overfitting, the data set was augmented by adding Gaussian noise, salt-and-pepper noise and the like to the original images, finally yielding a data set of 1500 images. Part of the training set data is shown in Fig. 3.
To facilitate training of the network, the input data are batch-processed: the images are decoded, and each classified image is preprocessed by rotation, scaling, cropping and normalization. The preprocessed images are then visualized and stored to obtain 224 × 224 image data; the processed images are shown in Fig. 6.
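A possible preprocessing and noise-augmentation pipeline matching the description above (Python with OpenCV and NumPy; the resize target of 224 × 224 and the two noise types follow the text, while the noise parameters and function names are assumptions):

```python
import cv2
import numpy as np

def preprocess(img_bgr):
    """Resize one frame to 224 x 224 and scale pixel values to [0, 1]."""
    img = cv2.resize(img_bgr, (224, 224))
    return img.astype(np.float32) / 255.0

def add_gaussian_noise(img, sigma=0.02):
    """Additive Gaussian noise on an image with values in [0, 1]."""
    noisy = img + np.random.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def add_salt_pepper_noise(img, amount=0.01):
    """Set a small random fraction of pixels to black (pepper) or white (salt)."""
    noisy = img.copy()
    mask = np.random.rand(*img.shape[:2])
    noisy[mask < amount / 2] = 0.0
    noisy[mask > 1 - amount / 2] = 1.0
    return noisy

# The original frame plus the two noisy variants roughly triples the 500
# collected images toward the 1500-image data set mentioned above.
```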
3. An n-frame continuous optical flow sequence F = [f_1, f_2, …, f_n] is obtained through the binocular camera, where f_i ∈ ℝ^{2×d_1×d_2}, d_1 and d_2 are the height and width of the optical flow graph, and each optical flow graph is a two-channel image containing the horizontal component f_i^x and the vertical component f_i^y of the optical flow. The weighted moving average of the optical flow graph f_t of the t-th frame is defined as

$$\hat{f}_t = \frac{1}{t}\sum_{i=1}^{t} f_i \qquad (11)$$

the weighted averaging of equation (11) reduces both the error rate of the optical flow estimation and the influence of white noise;
carrying out the ordered optical flow graph calculation on the weighted moving averages of the optical flow sequence, the calculation formula being

$$\min_{G}\;\frac{1}{2}\lVert G\rVert^{2}+C\sum_{i>j}\xi_{ij}\quad\text{s.t.}\;\langle G,\hat{f}_i\rangle-\langle G,\hat{f}_j\rangle\ge 1-\xi_{ij},\;\xi_{ij}\ge 0,\;\forall\, i>j \qquad (12)$$

in equation (12), \hat{f}_i is the weighted moving average of the optical flow sequence, G is the ordered optical flow graph, C is the trade-off between the margin size and the training error, ξ_{ij} is the slack variable, and ⟨·,·⟩ is the inner product; the constraint ⟨G, \hat{f}_i⟩ − ⟨G, \hat{f}_j⟩ ≥ 1 − ξ_{ij} for i > j preserves the order information of the optical flow frames; the parameter G obtained by training and learning is in fact of the same size as the optical flow graph, so G is defined as the ordered optical flow graph; solving equation (12) is equivalent to the unconstrained optimization problem of minimizing the Hinge Loss function:

$$E(G)=\frac{\lambda}{2}\lVert G\rVert^{2}+\frac{2}{n(n-1)}\sum_{i>j}\bigl[\,1-\langle G,\hat{f}_i\rangle+\langle G,\hat{f}_j\rangle\,\bigr]_{+} \qquad (13)$$

in equation (13), [·]_+ is the function max(0, x) and λ is the reciprocal of C;
equation (12) is solved separately for the two channels corresponding to the horizontal and vertical components of the optical flow:

$$G^{x}=\arg\min_{G}E\bigl(G;\hat{f}^{x}_{1},\dots,\hat{f}^{x}_{n}\bigr),\qquad G^{y}=\arg\min_{G}E\bigl(G;\hat{f}^{y}_{1},\dots,\hat{f}^{y}_{n}\bigr) \qquad (14)$$

in equation (14), G^x is the horizontal component of the ordered optical flow graph G, G^y is its vertical component, \hat{f}^x_i is the horizontal-component estimate of the two-channel optical flow image, and \hat{f}^y_i is the vertical-component estimate. The obtained G^x and G^y are converted to the [0, 255] range by min-max normalization and superimposed to generate the ordered optical flow graph, which is taken as the input of the deep network. Through the above process, the n-frame optical flow sequence is mapped to a single ordered optical flow graph, and the ordered optical flow graph can express the motion information of the multi-frame video sequence.
4. First, the optical flow sequence is divided along the time dimension into sub-sequences of w frames with an interval of w/2, i.e., adjacent sub-sequences overlap by w/2 frames. An ordered optical flow graph is then built on each sub-sequence and input into C3Dnet, with the input image size likewise adjusted to 224 × 224. The fc 6-layer responses of all ordered optical flow graphs are averaged and L2-normalized to obtain the C3D feature.
If the number of frames in a sub-sequence is too small, the purpose of modeling the time-domain structure cannot be achieved; if it is too large, part of the motion information may be lost. A reasonable sub-sequence length therefore has to be determined first. Fig. 4 shows the recognition results for different sub-sequence lengths w on two data sets when the time-domain convolutional neural network is used alone for recognition. As can be seen from fig. 4, the highest recognition results are obtained when w is 24 and 28; the sub-sequence length in the present invention is therefore taken as the intermediate value of 26 frames.
5. According to the number of classes contained in the data set, the classification parameter of the last fully connected layer of the VGG-16 model and the C3Dnet model is set to 3. The sizes of the RGB images and the optical flow images are normalized to 224 × 224, and every three frames of optical flow images are superimposed to form one input sample. Single-frame RGB original images and the superimposed optical flow images are then input into the VGGNet-16 model and the C3Dnet model respectively. The initial learning rate of the VGGNet-16 model is set to 0.001 and is reduced to 10% after 10000 iterations, with 60000 iterations in total; the initial learning rate of the C3Dnet model is set to 0.001 and is reduced to 10% after 2000 iterations, with 10000 iterations in total.
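One reading of this schedule is a step decay that multiplies the learning rate by 0.1 every fixed number of iterations; a PyTorch-style sketch under that assumption (the choice of SGD with momentum is also an assumption, not stated in the text):

```python
import torch

def make_optimizer_and_scheduler(model, step_size, base_lr=1e-3):
    """Step decay: multiply the learning rate by 0.1 every `step_size` steps.

    step_size = 10000 for the VGGNet-16 branch (60000 iterations in total),
    step_size = 2000  for the C3Dnet branch   (10000 iterations in total).
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=0.1)
    return optimizer, scheduler

# Per training iteration: optimizer.step() followed by scheduler.step().
```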
After 6 h of training, generation of the training network was successfully completed, with a maximum training step length of 600 steps. The accuracy on the verification set reached 100% and remained stable. The training loss and verification accuracy statistics for part of the training process are shown in Fig. 10.
6. The fusion of the space-time network uses the correlation between the spatial and temporal characteristics of the video to judge the position and motion state of the obstacle, so that the sweeping robot can take the correct obstacle avoidance action. For a moving pedestrian, the spatial-stream network identifies the position of the pedestrian relative to the robot, and the temporal-stream network identifies whether the pedestrian is moving away from or toward the robot, or to the robot's left or right; combining the two networks, the obstacle avoidance action the robot should take can be determined. The obstacle avoidance actions are divided into left turn, going straight and right turn, which serve as the three classification results of the convolutional neural network.
For the obstacle avoidance method of the pavement sweeping robot, the input of each model is a single-frame image while the samples are in video units; therefore, the probability vectors output for all images in a video must be fused to obtain the prediction probability vector of a single model for that video, and the multi-frame Softmax outputs of the spatial-stream convolutional neural network and the temporal-stream convolutional neural network are then weight-fused to obtain the probability vector V_{ec} of the prediction sample belonging to each category:

$$V_{ec}=\lambda\cdot\frac{1}{n}\sum_{i=1}^{n}V^{(i)}_{ec,\mathrm{spatial}}+(1-\lambda)\cdot\frac{1}{n}\sum_{i=1}^{n}V^{(i)}_{ec,\mathrm{temporal}} \qquad (15)$$

In equation (15), V_{ec} is the fused probability vector, λ is the weight of the spatial-stream convolutional neural network, n is the number of video frames, V_{ec,spatial} is the probability vector of the spatial-stream convolutional neural network, and V_{ec,temporal} is the probability vector of the temporal-stream convolutional neural network. Finally, the category with the maximum probability is selected as the classification result, and the corresponding obstacle avoidance action is executed.
The unlabeled test set is input into the trained VGG-16 model and the C3Dnet model respectively to obtain the prediction results of each model; the test result graphs are shown in Figs. 7 to 9.
The predicted values obtained by the two models are fused, and the final recognition results are obtained by fusing with 5 different weights. The recognition results under the 5 weights are compared with the labels of the test set, and the obstacle avoidance accuracy under each weight is calculated.
The accuracy of the single-frame RGB (λ = 1) convolutional neural network model is 78.5%; that of the single-frame optical flow (λ = 0) model is 84.48%; that of the 1/3 spatial stream + 2/3 temporal stream (λ = 1/3) model is 97.14%; that of the 1/2 spatial stream + 1/2 temporal stream (λ = 1/2) model is 94.09%; and that of the 2/3 spatial stream + 1/3 temporal stream (λ = 2/3) model is 92.76%.
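The weight comparison above can be reproduced by sweeping λ over the candidate values and measuring how often the fused prediction matches the test label; a Python/NumPy sketch with placeholder data (the candidate λ values follow the text, everything else is illustrative):

```python
import numpy as np

def accuracy_for_lambda(spatial_probs, temporal_probs, labels, lam):
    """Fused accuracy for one spatial-stream weight lam.

    spatial_probs, temporal_probs: (n_videos, 3) per-video averaged softmax
    outputs of the two streams; labels: (n_videos,) ground-truth classes.
    """
    fused = lam * spatial_probs + (1.0 - lam) * temporal_probs
    return float(np.mean(np.argmax(fused, axis=1) == labels))

if __name__ == "__main__":
    # Placeholder data only; real values would come from the test set.
    rng = np.random.default_rng(0)
    s = rng.dirichlet(np.ones(3), size=20)
    t = rng.dirichlet(np.ones(3), size=20)
    y = rng.integers(0, 3, size=20)
    for lam in (1.0, 0.0, 1/3, 1/2, 2/3):
        print(f"lambda={lam:.2f}  accuracy={accuracy_for_lambda(s, t, y, lam):.3f}")
```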
Therefore, the time flow convolution neural network has a better identification effect than the space flow convolution neural network model, the identification effect obtained through model fusion is related to the proportion of different model prediction results, generally speaking, the classification effect of the model fusion method is better than that of a single model, and when the outputs of the space flow convolution neural network model and the time flow convolution neural network model are fused with the proportion of 1:2, the obtained final classification result has the best effect.
The accuracy of the method provided by the invention is compared with that of other mobile-robot obstacle avoidance methods: the accuracy of the convolutional neural network obstacle avoidance method is 86.09%, that of the P-convolutional neural network obstacle avoidance method is 93.6%, that of the original double-current convolutional neural network obstacle avoidance method is 92.2%, and that of the improved double-current convolutional neural network obstacle avoidance method proposed here is 97.14%.
Therefore, compared with the original convolutional neural network and the double-current convolutional neural network, the improved double-current convolutional neural network provided by the invention has the advantage that the accuracy rate of the obstacle avoidance method is improved.
The operation speeds of the method on the VGG-16 model and the C3Dnet model are 68 frame·s⁻¹ and 51 frame·s⁻¹ respectively. Including the image processing and model fusion processes, a single obstacle avoidance decision can be completed within 0.1 s, meeting the real-time requirement.
In conclusion, the invention provides a space-time double-flow fusion convolutional neural network dynamic obstacle avoidance method for a sidewalk sweeping robot. Firstly, image information of the pavement sweeping robot is obtained in real time through a binocular camera. And compressing and assembling the continuous optical flow sequence into a single ordered optical flow graph by using a Rank Support Vector Machine (SVM) method, thereby realizing the modeling of the video time domain structure. Then inputting the processed image into a neural network model, regarding a spatial domain, taking a single-frame RGB image of a video as input, and sending the input into a VGGNet-16 model; for the time domain, a plurality of frames of superimposed optical flow images are taken as input and sent into a C3Dnet model. And finally, performing multi-frame Softmax output weighted fusion on the two models to obtain an output result, and obtaining the multi-model fused dynamic obstacle avoidance method for the pavement sweeping robot. The invention can enable the sweeping robot to more effectively utilize the motion information of the dynamic barrier in the pavement environment and reduce the collision probability with the barrier, thereby solving the problem of obstacle avoidance of the sweeping robot in the pavement environment and enabling the sweeping robot to more autonomously avoid the obstacle in the environment at high speed and high efficiency.

Claims (4)

1. A method for dynamically avoiding obstacles by a space-time double-flow fusion convolutional neural network of a sidewalk sweeping robot is characterized by comprising the following steps:
step 1, image acquisition based on binocular vision: acquiring an original image of the pavement sweeping robot in the operation process based on binocular vision, and acquiring image information of the pavement sweeping robot in real time through a binocular camera, wherein the camera is adjusted to a proper position to ensure that an obstacle needing to be avoided is always within the visual field range of the camera in the pavement sweeping operation process;
step 2, image processing and acquisition of the optical flow graph: processing the acquired original RGB images, and compressing and integrating the continuous optical flow sequence into a single ordered optical flow graph by means of a ranking support vector machine (RankSVM), realizing the modeling of the video time-domain structure;
step 3, improving modeling of the double-current convolutional neural network: the established neural network model comprises a space domain and a time domain, and respectively corresponds to the position information and the motion information of the dynamic barrier; for a spatial domain, a single frame RGB image of a video is taken as input and sent into a VGGNet-16 neural network model; for a time domain, taking the optical flow graph as input and sending the optical flow graph into a C3Dnet neural network model;
step 4, model fusion: performing weighted fusion on the multi-frame Softmax outputs of the spatial-stream convolutional neural network and the temporal-stream convolutional neural network to obtain the probability vectors of the prediction samples belonging to each category, selecting the category with the maximum probability as the classification result, and performing the corresponding obstacle avoidance action.
2. The method for dynamically avoiding the obstacles by the space-time double-flow fusion convolutional neural network of the sidewalk sweeping robot as claimed in claim 1, which is characterized in that the specific process of the step 2 is as follows:
obtaining an n-frame continuous optical flow sequence F = [f_1, f_2, …, f_n] through the binocular camera, where f_i ∈ ℝ^{2×d_1×d_2}, d_1 and d_2 are the height and width of the optical flow graph, and each optical flow graph is a two-channel image containing the horizontal component f_i^x and the vertical component f_i^y of the optical flow; defining the weighted moving average of the optical flow graph f_t of the t-th frame as

$$\hat{f}_t = \frac{1}{t}\sum_{i=1}^{t} f_i \qquad (1)$$

the weighted averaging of equation (1) reduces both the error rate of the optical flow estimation and the influence of white noise;
carrying out the ordered optical flow graph calculation on the weighted moving averages of the optical flow sequence, the calculation formula being

$$\min_{G}\;\frac{1}{2}\lVert G\rVert^{2}+C\sum_{i>j}\xi_{ij}\quad\text{s.t.}\;\langle G,\hat{f}_i\rangle-\langle G,\hat{f}_j\rangle\ge 1-\xi_{ij},\;\xi_{ij}\ge 0,\;\forall\, i>j \qquad (2)$$

in equation (2), \hat{f}_i is the weighted moving average of the optical flow sequence, G is the ordered optical flow graph, C is the trade-off between the margin size and the training error, ξ_{ij} is the slack variable, and ⟨·,·⟩ is the inner product; the constraint ⟨G, \hat{f}_i⟩ − ⟨G, \hat{f}_j⟩ ≥ 1 − ξ_{ij} for i > j preserves the order information of the optical flow frames; the parameter G obtained by training and learning is in fact of the same size as the optical flow graph, so G is defined as the ordered optical flow graph; solving equation (2) is equivalent to the unconstrained optimization problem of minimizing the Hinge Loss function:

$$E(G)=\frac{\lambda}{2}\lVert G\rVert^{2}+\frac{2}{n(n-1)}\sum_{i>j}\bigl[\,1-\langle G,\hat{f}_i\rangle+\langle G,\hat{f}_j\rangle\,\bigr]_{+} \qquad (3)$$

in equation (3), [·]_+ is the function max(0, x) and λ is the reciprocal of C;
equation (2) is solved separately for the two channels corresponding to the horizontal and vertical components of the optical flow:

$$G^{x}=\arg\min_{G}E\bigl(G;\hat{f}^{x}_{1},\dots,\hat{f}^{x}_{n}\bigr),\qquad G^{y}=\arg\min_{G}E\bigl(G;\hat{f}^{y}_{1},\dots,\hat{f}^{y}_{n}\bigr) \qquad (4)$$

in equation (4), G^x is the horizontal component of the ordered optical flow graph G, G^y is its vertical component, \hat{f}^x_i is the horizontal-component estimate of the two-channel optical flow image, and \hat{f}^y_i is the vertical-component estimate; the obtained G^x and G^y are converted to the [0, 255] range by min-max normalization and superimposed to generate the ordered optical flow graph, which is taken as the input of the deep network; through the above process, the n-frame optical flow sequence is mapped to a single ordered optical flow graph, and the ordered optical flow graph can express the motion information of the multi-frame video sequence.
3. The method for dynamically avoiding the obstacles by the space-time double-flow fusion convolutional neural network of the sidewalk sweeping robot as claimed in claim 2, which is characterized in that the specific process of the step 3 is as follows:
for the position information and the motion information of the dynamic barrier in the image information, an improved double-current convolution neural network model is established, and the improved double-current convolution neural network model corresponds to a space flow convolution neural network and a time flow convolution neural network respectively;
the method comprises the steps of establishing a VGGNet-16 model as a spatial flow convolutional neural network model, wherein the VGGNet-16 model is a model with 1000 classifications obtained by training on a database ImageNet, a 16-layer deep network is adopted, the deep network comprises 13 convolutional layers and 3 full-connection layers, all convolutional layers use convolutional kernels with the size of 3 × 3, and the convolution step size is reduced to 1;
the temporal-stream convolutional neural network is used to extract optical flow information, so a C3Dnet model pre-trained on optical flow images is established; C3Dnet comprises 8 convolutional layers (conv x) with kernel size 3 × 3 × 3 and stride 1, 5 max-pooling layers (pool y) whose pooling kernels are 2 × 2 × 2 except for pool 1, whose kernel is 1 × 2 × 2, 2 fully connected layers whose output responses are 4096-dimensional, and 1 Softmax output layer; the network takes 16-frame segments as input units, adjacent segments overlap by 8 frames, and the input picture size is 224 × 224; the fc 6-layer responses of all segments of a video are averaged and L2-normalized, and the obtained 4096-dimensional vector is used as the C3D feature of the video.
4. The method for dynamically avoiding the obstacles by the space-time double-flow fusion convolutional neural network of the sidewalk sweeping robot as claimed in claim 3, which is characterized in that the specific process of the step 4 is as follows:
for a moving pedestrian, the spatial-stream network identifies the position of the pedestrian relative to the robot, and the temporal-stream network identifies whether the pedestrian is moving away from or toward the robot, or to the robot's left or right; combining the two networks, the obstacle avoidance action the robot should take can be determined; the obstacle avoidance actions are divided into left turn, going straight and right turn, which serve as the three classification results of the convolutional neural network;
for the obstacle avoidance method of the pavement sweeping robot, the probability vectors output for all images in a video must be fused to obtain the prediction probability vector of a single model for that video; the multi-frame Softmax outputs of the spatial-stream convolutional neural network and the temporal-stream convolutional neural network are then weight-fused to obtain the probability vector V_{ec} of the prediction sample belonging to each category:

$$V_{ec}=\lambda\cdot\frac{1}{n}\sum_{i=1}^{n}V^{(i)}_{ec,\mathrm{spatial}}+(1-\lambda)\cdot\frac{1}{n}\sum_{i=1}^{n}V^{(i)}_{ec,\mathrm{temporal}} \qquad (5)$$

In equation (5), V_{ec} is the fused probability vector, λ is the weight of the spatial-stream convolutional neural network, n is the number of video frames, V_{ec,spatial} is the probability vector of the spatial-stream convolutional neural network, and V_{ec,temporal} is the probability vector of the temporal-stream convolutional neural network. Finally, the category with the maximum probability is selected as the classification result, and the corresponding obstacle avoidance action is executed.
CN202010112294.2A 2020-02-24 2020-02-24 Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot Pending CN111462192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010112294.2A CN111462192A (en) 2020-02-24 2020-02-24 Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010112294.2A CN111462192A (en) 2020-02-24 2020-02-24 Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot

Publications (1)

Publication Number Publication Date
CN111462192A true CN111462192A (en) 2020-07-28

Family

ID=71679964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010112294.2A Pending CN111462192A (en) 2020-02-24 2020-02-24 Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot

Country Status (1)

Country Link
CN (1) CN111462192A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180218203A1 (en) * 2017-02-01 2018-08-02 The Government Of The United States Of America, As Represented By The Secretary Of The Navy Recognition Actions on Event Based Cameras with Motion Event Features
CN110598598A (en) * 2019-08-30 2019-12-20 西安理工大学 Double-current convolution neural network human behavior identification method based on finite sample set

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Yijia, Mao Yaobin: "An Improved Human Action Recognition Algorithm Based on Two-Stream Convolutional Neural Networks", Computer Measurement & Control *
Li Qinghui, Li Aihua, Wang Tao, Cui Zhigao: "Action Recognition Combining Ordered Optical Flow Images and Two-Stream Convolutional Networks", Acta Optica Sinica *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112734805A (en) * 2021-01-11 2021-04-30 北京深睿博联科技有限责任公司 Pedestrian motion trajectory prediction method and device based on deep learning
CN112989955A (en) * 2021-02-20 2021-06-18 北方工业大学 Method for recognizing human body actions based on space-time double-current heterogeneous grafting convolutional neural network
CN112989955B (en) * 2021-02-20 2023-09-29 北方工业大学 Human body action recognition method based on space-time double-flow heterogeneous grafting convolutional neural network
CN113158937A (en) * 2021-04-28 2021-07-23 合肥移瑞通信技术有限公司 Sleep monitoring method, device, equipment and readable storage medium
WO2023070841A1 (en) * 2021-10-26 2023-05-04 美智纵横科技有限责任公司 Robot control method and apparatus, and storage medium
CN115797817A (en) * 2023-02-07 2023-03-14 科大讯飞股份有限公司 Obstacle identification method, obstacle display method, related equipment and system
CN115797817B (en) * 2023-02-07 2023-05-30 科大讯飞股份有限公司 Obstacle recognition method, obstacle display method, related equipment and system
CN116820132A (en) * 2023-07-06 2023-09-29 杭州牧星科技有限公司 Flight obstacle avoidance early warning prompting method and system based on remote vision sensor
CN116820132B (en) * 2023-07-06 2024-01-09 杭州牧星科技有限公司 Flight obstacle avoidance early warning prompting method and system based on remote vision sensor
CN116721093A (en) * 2023-08-03 2023-09-08 克伦斯(天津)轨道交通技术有限公司 Subway rail obstacle detection method and system based on neural network
CN116721093B (en) * 2023-08-03 2023-10-31 克伦斯(天津)轨道交通技术有限公司 Subway rail obstacle detection method and system based on neural network

Similar Documents

Publication Publication Date Title
CN111462192A (en) Space-time double-current fusion convolutional neural network dynamic obstacle avoidance method for sidewalk sweeping robot
CN110837778A (en) Traffic police command gesture recognition method based on skeleton joint point sequence
CN111652903B (en) Pedestrian target tracking method based on convolution association network in automatic driving scene
Lin et al. Learning temporary block-based bidirectional incongruity-aware correlation filters for efficient UAV object tracking
Sun et al. Unmanned surface vessel visual object detection under all-weather conditions with optimized feature fusion network in YOLOv4
CN116343330A (en) Abnormal behavior identification method for infrared-visible light image fusion
Zheng et al. Genad: Generative end-to-end autonomous driving
Liu et al. Data augmentation technology driven by image style transfer in self-driving car based on end-to-end learning
Wang et al. Pointmotionnet: Point-wise motion learning for large-scale lidar point clouds sequences
Shao et al. Failure detection for motion prediction of autonomous driving: An uncertainty perspective
CN117709602A (en) Urban intelligent vehicle personification decision-making method based on social value orientation
CN117576149A (en) Single-target tracking method based on attention mechanism
CN117593794A (en) Improved YOLOv7-tiny model and human face detection method and system based on model
Lu et al. Hybrid deep learning based moving object detection via motion prediction
CN117115911A (en) Hypergraph learning action recognition system based on attention mechanism
CN116820131A (en) Unmanned aerial vehicle tracking method based on target perception ViT
CN117058641A (en) Panoramic driving perception method based on deep learning
Zhao et al. End-to-end spatiotemporal attention model for autonomous driving
Shi et al. Attention-YOLOX: Improvement in On-Road Object Detection by Introducing Attention Mechanisms to YOLOX
CN115100740A (en) Human body action recognition and intention understanding method, terminal device and storage medium
CN114463844A (en) Fall detection method based on self-attention double-flow network
Ranjan et al. Video Frame Prediction by Joint Optimization of Direct Frame Synthesis and Optical-Flow Estimation
Zhao et al. End-to-end autonomous driving based on the convolution neural network model
Tian et al. Lightweight dual-task networks for crowd counting in aerial images
Yang et al. Design and Implementation of Driverless Perceptual System Based on CPU+ FPGA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2020-07-28