CN112965081A

CN112965081A - Simulated learning social navigation method based on feature map fused with pedestrian information

Info

Publication number: CN112965081A
Application number: CN202110163401.9A
Authority: CN
Inventors: 熊蓉; 崔瑜翔; 王越
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2021-02-05
Filing date: 2021-02-05
Publication date: 2021-06-15
Anticipated expiration: 2041-02-05
Also published as: CN112965081B

Abstract

The invention discloses a learning-simulating social navigation method based on a feature map fused with pedestrian information. The invention guides the robot to simulate the motion habit of experts by introducing the simulation learning method, plans the navigation method which meets the social standard, improves the planning efficiency, relieves the locking problem of the robot, and helps the robot to better integrate into the man-machine co-fusion environment. The method obtains the time sequence motion state of the pedestrian through pedestrian detection and tracking and three-dimensional point cloud alignment in the sequence RGB image. And then, combining the two-dimensional laser data and the social force model to obtain a local characteristic map marked with the pedestrian dynamic information. And finally, establishing a deep network with the local feature map, the current speed of the robot and the relative position of the target as input and the control instruction of the robot as output, and training by taking expert teaching data as supervision to obtain a navigation strategy meeting the social standard.

Description

Simulated learning social navigation method based on feature map fused with pedestrian information

Technical Field

The invention belongs to the field of mobile robot navigation, and particularly relates to a learning-simulated social navigation algorithm based on a feature map fused with pedestrian dynamic information.

Background

The positioning of the service robot determines a large characteristic of its working environment, man-machine hybrid. From a conventional static scene to a man-machine co-fusion scene with complex dynamic characteristics, the great expansion of the activity range puts higher requirements on the behavior specification of the robot, and the robot meets the social specification. On one hand, the service type robot can timely sense the state of human beings through harmonious human-computer interaction, know the demand of human beings, find the best scheme, assist human beings to work with high quality and high efficiency, and on the other hand, the service type robot can also guarantee the safety of human beings around in the course of the work, and meanwhile, the comfort level of human movement is considered, and no obstruction is generated to the activity of human beings.

The service robot generally acquires an intelligent autonomous movement capability by using a navigation system with a good mounting. Under the guidance of a navigation system, the robot can complete service tasks in a larger range, so that a more flexible service effect is realized. In a static or approximately static environment, a traditional navigation mode can realize good path planning, and guide the robot to reach a target point without colliding with obstacles in the environment. However, the man-machine shared environment has a high dynamic characteristic, the set conditions of the traditional navigation mode are destroyed by the complex pedestrian movement, and it is difficult to plan a smooth path in a dense environment by continuously applying the traditional navigation system, so that the comfort level of the surrounding pedestrian movement is influenced, and even collision is caused. Therefore, the research of the navigation algorithm oriented to the man-machine shared environment is urgently needed to be solved.

In recent years, the development of deep learning has greatly promoted the research and development and application of robotics. By establishing the artificial neural network, the deep learning technology can extract the characteristic representation of information from a large amount of data, so that a high-dimensional function model is established to solve the problem of complex artificial intelligence, and the high efficiency and the mobility of the high-dimensional function model are verified in multiple fields. Therefore, the sensor information can be analyzed and processed by utilizing deep learning, and the mapping between the environment information and the navigation decision of the mobile robot is established, so that the navigation planning problem in the man-machine co-fusion environment is solved, and the method has high research and practical values.

Disclosure of Invention

The invention aims to solve the problems in the prior art and provides a learning-simulated social navigation method based on a feature map fused with pedestrian information. The pedestrian motion state in the field of view of the robot is obtained by means of a pedestrian detection and tracking module based on RGB images and a pedestrian three-dimensional position estimation module fusing three-dimensional point cloud information, and a local feature map marking pedestrian dynamic information is further obtained by combining laser information. And the strategy network takes the characteristic map as input and takes expert teaching data as supervision to train and obtain the social navigation decision network.

In order to achieve the purpose of the invention, the invention specifically adopts the following technical scheme:

a method for simulating learning social navigation based on a feature map fused with pedestrian information comprises the following steps:

s1, constructing a pedestrian simulation environment based on the social force model to simulate a man-machine coexistence environment;

s2, constructing a feature map acquisition module fused with pedestrian dynamic information, and processing sensor information of the robot to represent the comprehensive environment condition under the robot coordinate system; the process in the feature map acquisition module is as follows from S21 to S24:

s21, acquiring two-dimensional laser information of a plane with a specified height based on a three-dimensional laser radar carried on the robot, and recovering the two-dimensional laser information into a local obstacle map form;

s22, acquiring a position sequence of a pedestrian in a scene in an image coordinate system by utilizing a pedestrian tracking algorithm based on an RGB camera carried on the robot;

s23, based on the three-dimensional laser radar carried on the robot, combining the pedestrian detection result obtained in S22, obtaining multi-frame pedestrian position information under a robot coordinate system by using a three-dimensional point cloud alignment algorithm, and further extracting speed information of pedestrians;

s24, calculating potential field information of each pedestrian according to the speed and the direction difference by using the social force model, and marking the potential field information of each pedestrian on the local obstacle map obtained in S21 according to different colors to obtain a feature map fused with pedestrian dynamic information;

s3, manually operating the robot to avoid dynamic obstacles in the pedestrian simulation environment and reach a target point, and acquiring a large amount of teaching data for training a strategy network; the teaching data comprise a feature map fused with pedestrian dynamic information, a current speed state of the robot and a corresponding control instruction;

s4, establishing a deep neural network, training the deep neural network by using the teaching data, and gradually approaching the robot motion decision behavior meeting the social standard;

and S5, generating a control instruction by using the trained deep neural network, and using the control instruction to control the robot.

Preferably, the specific implementation method of step S1 is:

building a training environment by adopting Gazebo simulation, wherein the training environment comprises a plurality of common pedestrian interaction scenes, and each scene comprises one or more dynamic obstacles for simulating pedestrians; in the simulation, a mobile robot is selected to verify the navigation decision effect, and the robot utilizes an ROS communication framework and is controlled by a teaching expert through a game handle or directly controlled by a deep neural network; the training environment forms a man-machine hybrid dynamic environment by randomly generating a plurality of simulated pedestrians moving according to the social force model.

Preferably, in step S2, an Intel RealSense D435 depth camera and a Velodyne32 laser are respectively used as sensing elements to acquire an RGB image and three-dimensional laser point cloud information.

Preferably, in step S21, the local obstacle map in the robot coordinate system is restored from the direction and distance information of the laser spot using the two-dimensional laser information; the robot judges the distribution condition of the obstacles under the view angle of the self coordinate system according to the angle distance information returned by the laser sensor, and expresses the obstacles in the form of a binary image, wherein the obstacles are represented by white points, and the open area is represented by black blocks.

Preferably, in step S22, the Deep SORT algorithm is used to extract the pedestrian position sequence in the RGB image coordinate system, and the three-dimensional point cloud alignment algorithm in step S23 is used to obtain the pedestrian position in the robot coordinate system, and the clustering and filtering methods are used in the alignment to ensure the accuracy of determining the pedestrian position.

Preferably, the specific implementation flow of step S23 is as follows:

firstly, aligning an image coordinate system and a point cloud coordinate system by using poses and parameters of a camera and a laser radar; secondly, segmenting a corresponding part in the three-dimensional point cloud according to the position of a pedestrian detection frame in the image coordinate system; then, screening the divided point clouds according to a filtering and clustering algorithm to obtain a three-dimensional boundary frame of the point clouds corresponding to a single pedestrian, wherein the central position is used as the position estimation of the current pedestrian; and finally, averaging the position difference among frames of the same target in a preset time window to obtain the approximate motion state of the pedestrian in the robot coordinate system.

Preferably, in step S24, a motion potential field is established according to a repulsive force of a pedestrian in the social force model, and then the pedestrian with a difference in motion state is distinguished and labeled by using an equipotential line, which specifically includes:

firstly, determining a boundary equipotential line according to a preset pedestrian repulsion receiving range, dividing a comfortable range of pedestrians on a local obstacle map obtained in S21, marking an occupied area for each pedestrian obtained in S22 detection, wherein the size of the marked occupied area is positively correlated with the speed of the pedestrian, so that the individuals have difference; then, coloring the occupied area of each pedestrian on the local barrier map according to the movement direction of each pedestrian; and finally, obtaining a feature map fused with pedestrian dynamic information, and comprehensively displaying the environment state of the robot under the coordinate system.

Preferably, in step S3, the teaching expert uses the game handle to control the movement of the mobile robot in the Gazebo through the ROS communication framework to simulate the pedestrian reaching the target point in the evasive scene; and saving the local obstacle map information obtained in the step S24, the self state information of the robot, the relative position of the target and the corresponding expert control information in the moving process of the robot, so as to obtain an expert teaching data set.

Preferably, in step S4, a deep neural network is established, and iterative training is performed under an expert teaching data set with local obstacle map information and self-state information of the robot as inputs and a control command as an output, so as to gradually approach an expert control criterion and learn a social navigation strategy.

In the deep neural network, a feature map fused with pedestrian dynamic information extracts hidden variables through convolution layers, the self state information and the target relative position of the robot respectively extract the hidden variables through full-connection layers, and after three kinds of hidden variables are spliced, control instructions are output through two full-connection layers.

Preferably, in step S4, the teaching learning algorithm trained on line is adopted, and the teaching data set is updated in real time by means of data aggregation, where the specific training process is as follows: a teaching expert controls the mobile robot to move to a target point in real time in a simulation environment and avoids a simulated pedestrian in a scene; the deep neural network carries out iterative training on the teaching data set updated and stored in real time; with the progress of training, the control frequency of experts is gradually reduced, so that the strategy network obtains the control right of the robot with a certain probability, on one hand, the performance of the network is evaluated, on the other hand, the teaching data distribution is enriched, and the network is helped to improve the capability of recovering from the deviated track.

Compared with the prior art, the invention has the following beneficial effects:

the invention utilizes the characteristic map fused with the pedestrian dynamic information to comprehensively process the local obstacle information and the dynamic pedestrian information under the robot coordinate system, and helps the robot to sense the environmental state more reasonably and efficiently. On the basis of obtaining the information, the algorithm utilizes the teaching information of the experts to guide the deep neural network to update and iterate, gradually approaches the expert strategy habit and imitates the expert decision-making mode, so that the robot can move in a complex crowd according to the moving mode similar to the expert. The deep neural network can respond to complex and variable pedestrian environments by simulating expert behaviors, a pedestrian trajectory prediction module required by a traditional algorithm is omitted, the feasible region of the robot is enlarged, and the problem of locking in the traditional algorithm is avoided. Meanwhile, due to the reasonable comprehensive environment representation used by the algorithm, the execution efficiency of the algorithm is improved.

Drawings

FIG. 1 is a flow chart of a method for learning-simulated social navigation based on a pedestrian information-fused feature map;

FIG. 2 is a frame diagram of a learning-mimicking social navigation method based on a pedestrian information-fused feature map;

FIG. 3 is a diagram of pedestrian detection and tracking and three-dimensional point cloud segmentation effects;

FIG. 4 is a schematic diagram of a social force model;

FIG. 5 is a diagram of the effects of a human-machine hybrid simulation environment;

FIG. 6 is a characteristic map effect diagram of fusing pedestrian dynamic information;

FIG. 7 is a diagram of a deep neural network architecture;

FIG. 8 is a social navigation effect diagram.

Detailed Description

The invention will be further elucidated and described with reference to the drawings and the detailed description. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.

In a preferred embodiment of the invention, a learning-simulated social navigation method based on a feature map fused with pedestrian information is provided, and the method is oriented to the navigation problem of a mobile robot in a man-machine coexistence environment. Most of the traditional navigation algorithms are applied to static or approximately static simple scenes, and when the traditional navigation algorithms are directly transferred to a man-machine co-fusion environment with complex dynamic characteristics, smooth tracks are difficult to plan to avoid pedestrians, so that the motion safety of the pedestrians is threatened. The existing improved method further limits the feasible region of the robot by introducing pedestrian detection and pedestrian trajectory prediction information, however, on one hand, the method introduces more information processing pressure and prediction uncertainty, and on the other hand, the robot is easy to generate a 'locking problem' because the motion range of the robot is excessively limited. The invention guides the robot to simulate the motion habit of experts by introducing the simulation learning method, plans a navigation algorithm meeting the social standard, improves the planning efficiency, relieves the locking problem of the robot, and helps the robot to better integrate into a man-machine co-fusion environment. The algorithm acquires the time sequence motion state of the pedestrian through pedestrian detection and tracking and three-dimensional point cloud alignment in the sequence RGB image. And then, combining the two-dimensional laser data and the social force model to obtain a local characteristic map marked with the pedestrian dynamic information. And finally, establishing a deep network with the local feature map, the current speed of the robot and the relative position of the target as input and the control instruction of the robot as output, and training by taking expert teaching data as supervision to obtain a navigation strategy meeting the social standard.

The specific steps of the method are shown in fig. 1, and are described in detail as follows:

and S1, constructing a pedestrian simulation environment based on the social force model to simulate a man-machine coexistence environment.

And S2, constructing a feature map acquisition module fused with pedestrian dynamic information, and processing the sensor information of the robot to represent the comprehensive environment condition under the robot coordinate system.

S3, manually operating the robot to avoid dynamic obstacles in the pedestrian simulation environment and reach a target point, and acquiring a large amount of teaching data for training a strategy network; the teaching data comprise a feature map fused with pedestrian dynamic information, a current speed state of the robot and a corresponding control instruction.

S4, establishing a deep neural network, and training the deep neural network by using the teaching data to gradually approach the robot motion decision behavior meeting the social standard.

The core idea of the navigation method is shown in figure 2, and the method is based on the idea that a social force model is used for fusing pedestrian dynamic information and a local feature map, and a simulation learning network is built to learn the social behavior specification from expert teaching data, so that the robot is helped to make reasonable navigation decision in a man-machine co-fusion environment. The following describes a specific implementation form of the above steps in this embodiment.

The specific implementation method of the step S1 is as follows:

building a training environment by adopting Gazebo simulation, wherein the training environment comprises a plurality of common pedestrian interaction scenes, and each scene comprises one or more dynamic obstacles for simulating pedestrians; in the simulation, a Turtlebot2 mobile robot is selected to verify the navigation decision effect, and the robot utilizes an ROS communication framework and is controlled by a teaching expert through a Switch controller pro game handle or directly controlled by a deep neural network. The training environment forms a man-machine hybrid dynamic environment by randomly generating a plurality of simulated pedestrians moving according to the social force model. The specific form of the social force model can be seen in the prior art, and for ease of understanding, the following description is set forth.

As shown in fig. 4, the social force model describes the relationship between pedestrians and the surrounding environment in a complex dynamic environment and the relationship between individual pedestrians inside the crowd in a dynamic modeling manner. The model comprehensively considers various influence factors in a complex environment, converts the influence factors into a force expression mode, and quantitatively describes the constraints of a pedestrian caused by a target position, obstacle distribution, social regulations and the like by acting force with certain magnitude and direction.

Considering that the condition of close contact among individual pedestrians is basically absent in the conventional man-machine hybrid environment, the space occupation of single pedestrians is small, and the volume factor of the pedestrians and the mutual extrusion condition caused by mutual crowding can be ignored. Therefore, in order to unify the expression form of the interaction force, the point model is adopted to express the pedestrian and the obstacle in the specific implementation. The single pedestrian corresponds to a single particle model, and obstacles with different shapes are replaced by dot matrixes which conform to the outline characteristics of the single pedestrian, so that the environment representation of the dot model is formed. When analyzing a single pedestrian, the resultant force generated by all the mass points except the current point is considered. The expression of resultant force is as (1)

Resultant force

Attraction of a pedestrian by a target point

Mutual repulsion force between pedestrians

Repulsion of pedestrian by obstacle

And attractiveness of hotspots in the scene

Four items are formed.

The attraction force of the target point is a driving force in the movement of the pedestrian, and guides the pedestrian to move toward the target position. The attraction adjusts the speed direction of the pedestrian to gradually approach the direction of the target point, and meanwhile, the pedestrian is gradually accelerated to the ideal speed. In the case of no obstacle, the pedestrian will make a uniform acceleration movement until the maximum speed is reached, and therefore the effect of the attraction of the target point expressed in acceleration is selected here, the expression being (2)

Wherein

The speed of the motor is the ideal speed,

is a unit direction of a target directionThe amount of the compound (A) is,

is the current velocity vector. The pedestrian has certain reaction time and peripheral environment can also bring certain interference, so that the pedestrian is difficult to reach an ideal state in actual movement. The relaxation time tau is obtained by adding a correction factor in the formula_αThis phenomenon is described. The relaxation time expresses the time length required by the pedestrian to adjust the self motion state under the actual condition, and the pedestrian gradually approaches the ideal speed in the time interval.

The repulsive force of the dynamic pedestrian and the static obstacle to the current pedestrian obstructs the pedestrian from going to the target point. Because the point model is selected to express the current environment, the repulsive force of the pedestrian and the obstacle is converted into the repulsive force between the points. The repulsive force increases as the distance between two points decreases, but the decreasing speeds differ in each direction of the pedestrian. According to the social standard of pedestrians in public places, the pedestrians need a certain comfortable movement space. The space extends back and forth along the direction of movement and is relatively short in the vertical direction of movement, representing the general area of movement of the pedestrian, i.e. the area required for the necessary avoidance. This region is described here in terms of elliptical equipotential lines, as shown in fig. 4.

The ellipse is defined as shown in formula (3)

The current pedestrian is defined as A, and the surrounding pedestrians or obstacle points are defined as B. The current pedestrian step length is the focal length. The sum of the distances A 'B between the current distances AB and B when A reaches A' after walking one step along the current motion direction is the long axis length. The ellipse constructed by the method is the approximate avoidance range of the pedestrian A in the pedestrian AB interaction process. Since the longer the minor axis B of the ellipse, the larger the space for the pedestrian to escape, and the relatively weaker the uncomfortable feeling given to the pedestrian a by the pedestrian B, we define equation (4) and express the change in the repulsive force action in the form of an exponential function.

The parameters M and N are related to the scale of a scene, the blocking characteristic of an obstacle, the characteristics of a crowd and the like, express the strength of interaction between pedestrians and are adjusted in the specific test of an experiment.

By using the known relationship among the interaction force direction among the pedestrians, the moving direction of the pedestrians and the target direction, whether the surrounding pedestrians enter the current view angle range of the pedestrians can be judged. Calculating the acting force of the surrounding pedestrians on the current pedestrian

In the target direction

Size d of projection on_nowAnd rotating the force to the edge of the field of view, i.e. at

At the position in the target direction

Size d of projection on_minAnd comparing the two to judge whether the surrounding pedestrians exceed the visual field. If the former is larger, it indicates that the pedestrian is in the view field of the current pedestrian, and a larger influence weight should be given. Otherwise, the influence of the pedestrian on the current pedestrian is weakened or even ignored.

The social force model is used for carrying out abstract processing on interaction between pedestrians and the environment, quantitative analysis is carried out in a unified mode, reasonable crowd motion simulation can be achieved, and therefore a man-machine interaction simulation environment is constructed.

As shown in fig. 5, a man-machine hybrid simulation environment effect diagram constructed by the present invention.

In the step S2, hardware devices used by the robot to realize sensing may be adjusted as needed, and in this embodiment, an Intel RealSense D435 depth camera and a Velodyne32 laser are respectively used as sensing elements to obtain RGB images and three-dimensional laser point cloud information.

In step S2, the present embodiment specifically executes the process in the feature map obtaining module as in S21 to S24:

and S22, acquiring a position sequence of the pedestrian in the scene in the image coordinate system by utilizing a pedestrian tracking algorithm based on the RGB camera mounted on the robot.

And S23, based on the three-dimensional laser radar carried on the robot, combining the pedestrian detection result obtained in S22, and utilizing a three-dimensional point cloud alignment algorithm to obtain multi-frame pedestrian position information under the robot coordinate system, and further extracting the speed information of the pedestrian.

Fig. 3 is a diagram of pedestrian detection and tracking and three-dimensional point cloud segmentation effects obtained in a scene by the method.

And S24, calculating potential field information of each pedestrian according to the speed and the direction difference by using the social force model, and marking the potential field information of each pedestrian on the local obstacle map obtained in the S21 according to different colors to obtain a feature map fused with pedestrian dynamic information. According to the invention, the current environment information is integrated by selecting the local characteristic map labeled with the pedestrian dynamic information to form the input of a strategy network, and the robot can be helped to better perceive and understand the environment by effectively combining the multi-sensor information.

In step S21, a local obstacle map in the robot coordinate system is restored from the direction and distance information of the laser spot using the two-dimensional laser information; the robot judges the distribution condition of the obstacles under the view angle of the self coordinate system according to the angle distance information returned by the laser sensor, and expresses the obstacles in the form of a binary image, wherein the obstacles are represented by white points, and the open area is represented by black blocks.

In step S22, a Deep SORT algorithm is used to extract a pedestrian position sequence in an RGB image coordinate system, and the pedestrian position in the robot coordinate system is obtained through a three-dimensional point cloud alignment algorithm in S23, and a clustering and filtering method is used in the alignment to ensure the accuracy of pedestrian position determination.

Of course, in a real environment, the motion state information of the pedestrian needs to be acquired through detection and tracking. In the simulation environment, the acquisition can be directly carried out through an environment interface. Therefore, in the strategy training, the dynamic information of the simulated pedestrians is acquired by utilizing the Gazebo environment interface in order to facilitate the data acquisition and the network effect verification.

In step S23, the specific implementation flow is as follows:

The Deep SORT algorithm is used for realizing pedestrian detection and tracking based on RGB images in a real environment and preliminarily determining the pedestrian position in an image coordinate system. The Deep SORT realizes a more robust tracking effect by introducing the correlation measurement integrating the texture information and a cascade matching mechanism on the basis of the SORT algorithm, and is a more mainstream multi-target tracking algorithm at present. Wherein the textural features of the target are extracted by a convolutional neural network pre-trained on a large-scale pedestrian data set. Appearance similarity measurement is obtained by comparing the difference of the inter-frame texture features of the detection frame, and the appearance similarity measurement is combined with the movement distance measurement in the SORT algorithm to form a comprehensive criterion of the association degree. The inter-frame data association is carried out by taking the criterion as a criterion, and the occurrence probability of the identity interleaving problem in the adjacent track tracking is greatly reduced.

By combining the detection tracking result and the three-dimensional point cloud information in the image coordinate system, the pedestrian state information can be further converted into the robot coordinate system, and reference is provided for the navigation decision of the robot. Firstly, the alignment of an image coordinate system and a point cloud coordinate system is realized by using the poses and parameters of a camera and a laser radar. And then, segmenting a corresponding part in the three-dimensional point cloud according to the position of the pedestrian detection frame in the image coordinate system. And then, screening the divided point clouds according to a filtering and clustering algorithm to obtain a three-dimensional boundary frame of the point clouds corresponding to the single pedestrian, wherein the central position is used as the position estimation of the current pedestrian. And finally, averaging the position difference among frames of the same target in a proper time window to obtain the approximate motion state of the pedestrian in the robot coordinate system.

In the step S24, a motion potential field is established according to the repulsive force of the pedestrian in the social force model, and then the pedestrian with a difference in motion state is distinguished and labeled by using equipotential lines, which specifically includes:

firstly, determining a boundary equipotential line according to a preset pedestrian repulsion receiving range, dividing a comfortable range of pedestrians on a local obstacle map obtained in S21, marking an occupied area for each pedestrian obtained in S22 detection, wherein the size of the marked occupied area is positively correlated with the speed of the pedestrian, so that the individuals have difference; then, coloring the occupied area of each pedestrian on the local barrier map according to the movement direction of each pedestrian; finally, a feature map fused with pedestrian dynamic information as shown in fig. 6 is obtained, and the environmental state of the robot in the coordinate system is comprehensively displayed.

In the step S3, the teaching expert uses the game handle to control the movement of the mobile robot in the Gazebo through the ROS communication architecture to simulate the pedestrian to reach the target point in the evasive scene; and saving the local obstacle map information obtained in the step S24, the self state information of the robot, the relative position of the target and the corresponding expert control information in the moving process of the robot, so as to obtain an expert teaching data set. The teaching expert as used herein refers to a person who can thoroughly practice a robot.

In the step S4, by establishing a deep neural network, taking local obstacle map information and self-state information of the robot as input and a control instruction as output, iterative training is performed under an expert teaching data set, so as to gradually approach an expert control criterion, and a social navigation strategy is learned. In the deep neural network, hidden variables are extracted from a feature map fused with pedestrian dynamic information through convolution layers, hidden variables are extracted from the self state information and the target relative position of the robot through full-connection layers, and control instructions are output after the three hidden variables are spliced and pass through two full-connection layers.

The specific policy network structure is shown in fig. 7. The network takes a local characteristic map marked with pedestrian dynamic information, the relative position of a target point and the current speed of the robot as input, and directly outputs a control instruction. As can be seen from the figure, the image part is processed by using a multilayer convolution network, the target position and the robot speed part are coded by using a full-connection layer, intermediate layer hidden variable representations obtained by the two parts are spliced to be used as current comprehensive state information, and finally, a final control command is output through the multilayer full-connection layer.

In the above step S4, the present embodiment adopts a teaching learning algorithm of online training, and updates a teaching data set in real time by a data aggregation manner, where a specific training flow is as follows: a teaching expert controls the mobile robot to move to a target point in real time in a simulation environment and avoids a simulated pedestrian in a scene; the deep neural network carries out iterative training on the teaching data set updated and stored in real time; with the progress of training, the control frequency of experts is gradually reduced, so that the strategy network obtains the control right of the robot with a certain probability, on one hand, the performance of the network is evaluated, on the other hand, the teaching data distribution is enriched, and the network is helped to improve the capability of recovering from the deviated track.

And returning to the original simulation environment for testing and evaluation, and generating a control instruction by using the trained deep neural network obtained after training from S1 to S4 to replace a teaching expert for carrying out a robot control experiment.

The effectiveness of the social force model was verified experimentally setting a randomly initialized "corridor" scenario. Under the guidance of the social force model, pedestrians can keep a relative distance as far as possible and avoid each other. The model also presents stronger adaptability to the change of the pedestrian density, so the model is considered to be more reasonable and can be used for simulating the pedestrian environment.

Experiments a plurality of human-computer coexistence scenes are built in a Gazebo and used for training a strategy network, as shown in FIG. 5. Before each round of training is started, the pedestrian simulation parameters including initial positions, initial speeds, target positions and the like are initialized randomly, the complexity of a scene is enhanced, and the strategy model is prevented from being over-fitted. The simulated pedestrians avoid each other in the interaction process, certain social characteristics are shown, the crowd simulation requirements are met, and the method can be used for follow-up strategy training.

The experiment selects a plurality of randomly initialized scenes to evaluate the strategy performance obtained by the simulated learning. And the task requirement strategy network controls the mobile robot to pass through a man-machine coexistence environment and finally reach a target point. If the vehicle successfully reaches the range of 0.5m near the target point, the task is considered to be completed, and if the vehicle collides with an obstacle or a simulated pedestrian, the task is considered to be failed. In forty navigation tasks, the social navigation strategy performance effect based on the imitation learning is shown in table 1, and the task requirements are basically met.

TABLE 1 mimic learning strategy Performance

In the test process, the robot can flexibly make a navigation decision for dynamic pedestrians, and generates interaction effects of avoidance from the right side, deceleration following and the like, as shown in fig. 8, so that the method can be considered to learn social norms from expert teaching to a certain extent, and the safety and the comfort of pedestrian movement are ensured on the basis of completing a navigation task.

The above-described embodiments are merely preferred embodiments of the present invention, which should not be construed as limiting the invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, the technical scheme obtained by adopting the mode of equivalent replacement or equivalent transformation is within the protection scope of the invention.

Claims

1. A learning-simulated social navigation method based on a feature map fused with pedestrian information is characterized by comprising the following steps:

2. The method for simulating learning social navigation based on the pedestrian information fusion feature map as claimed in claim 1, wherein the step S1 is implemented by:

3. The method as claimed in claim 1, wherein in step S2, an Intel RealSense D435 depth camera and a Velodyne32 laser are used as sensing elements to obtain RGB images and three-dimensional laser point cloud information.

4. The method for social navigation based on learning-by-imitation of a feature map fused with pedestrian information according to claim 1, wherein in step S21, the local obstacle map under the robot coordinate system is restored according to the direction and distance information of the laser point by using the two-dimensional laser information; the robot judges the distribution condition of the obstacles under the view angle of the self coordinate system according to the angle distance information returned by the laser sensor, and expresses the obstacles in the form of a binary image, wherein the obstacles are represented by white points, and the open area is represented by black blocks.

5. The method for social navigation based on learning-by-imitation of a feature map fused with pedestrian information according to claim 1, wherein in step S22, a Deep SORT algorithm is used to extract a pedestrian position sequence in an RGB image coordinate system, and a three-dimensional point cloud alignment algorithm in step S23 is used to obtain the pedestrian position in a robot coordinate system, and a clustering and filtering method is used to ensure the accuracy of pedestrian position determination in the alignment.

6. The method for learning-by-imitation social navigation based on the pedestrian information fusion feature map as claimed in claim 1, wherein the specific implementation flow of step S23 is as follows:

7. The method according to claim 1, wherein in step S24, a motion potential field is established according to the repulsive force of the pedestrian in the social force model, and then the pedestrian with difference in motion state is labeled with equipotential lines in a distinguishing manner, which comprises the following steps:

8. The method for social navigation based on learning-simulated of feature map fused with pedestrian information according to claim 1, wherein in step S3, a teaching expert uses a game handle to control the movement of a mobile robot in a Gazebo to simulate a pedestrian reaching a target point in an evasive scene through an ROS communication architecture; and saving the local obstacle map information obtained in the step S24, the self state information of the robot, the relative position of the target and the corresponding expert control information in the moving process of the robot, so as to obtain an expert teaching data set.

9. The method according to claim 1, wherein in step S4, by establishing a deep neural network, taking local obstacle map information and self-state information of the robot as input and a control command as output, iterative training is performed under an expert teaching data set to gradually approach expert control criteria, and a social navigation strategy is learned.

10. The method for social navigation based on learning-by-imitation of a feature map fused with pedestrian information as claimed in claim 1, wherein in step S4, a teaching learning algorithm of online training is adopted, and a teaching data set is updated in real time in a data aggregation manner, and a specific training flow is as follows: a teaching expert controls the mobile robot to move to a target point in real time in a simulation environment and avoids a simulated pedestrian in a scene; the deep neural network carries out iterative training on the teaching data set updated and stored in real time; with the progress of training, the control frequency of experts is gradually reduced, so that the strategy network obtains the control right of the robot with a certain probability, on one hand, the performance of the network is evaluated, on the other hand, the teaching data distribution is enriched, and the network is helped to improve the capability of recovering from the deviated track.