CN112965081B - Simulated learning social navigation method based on feature map fused with pedestrian information - Google Patents


Info

Publication number
CN112965081B
CN112965081B (grant) · CN112965081A (application publication) · Application number CN202110163401.9A
Authority
CN
China
Prior art keywords
pedestrian
robot
information
coordinate system
social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110163401.9A
Other languages
Chinese (zh)
Other versions
CN112965081A (en)
Inventor
熊蓉 (Xiong Rong)
崔瑜翔 (Cui Yuxiang)
王越 (Wang Yue)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority claimed from application CN202110163401.9A
Publication of application CN112965081A
Application granted
Publication of grant CN112965081B
Legal status: Active

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/93Lidar systems specially adapted for specific applications for anti-collision purposes
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Physics & Mathematics (AREA)
  • Electromagnetism (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an imitation-learning social navigation method based on a feature map fused with pedestrian information. By introducing imitation learning, the robot is guided to mimic the movement habits of a human expert, so that navigation conforming to social norms is planned, planning efficiency is improved, the robot freezing problem is alleviated, and the robot is helped to integrate better into a shared human-robot environment. The method obtains the time-series motion states of pedestrians through pedestrian detection and tracking in sequential RGB images combined with three-dimensional point-cloud alignment. Two-dimensional laser data and a social force model are then combined to obtain a local feature map annotated with pedestrian dynamic information. Finally, a deep network is constructed that takes the local feature map, the robot's current velocity, and the relative target position as input and outputs a robot control command; it is trained under the supervision of expert teaching data to obtain a navigation policy that conforms to social norms.

Description

Simulated learning social navigation method based on feature map fused with pedestrian information
Technical Field
The invention belongs to the field of mobile robot navigation, and in particular relates to an imitation-learning social navigation algorithm based on a feature map fused with pedestrian dynamic information.
Background
The role of a service robot largely determines a key characteristic of its working environment: humans and robots share the same space. Moving from conventional static scenes to human-robot coexistence scenes with complex dynamics, the wider range of activity places higher demands on the robot's behaviour, which must conform to social norms. On the one hand, the service robot should perceive human states in a timely manner through harmonious human-robot interaction, understand human needs, find an optimal plan, and assist humans efficiently and with high quality; on the other hand, it must ensure the safety of surrounding people during operation, take the comfort of human movement into account, and avoid obstructing human activities.
Service robots generally obtain intelligent autonomous mobility through a well-designed navigation system. Under its guidance, the robot can complete service tasks over a larger area and thus provide more flexible service. In a static or nearly static environment, traditional navigation can plan good paths, guiding the robot to the target point without colliding with obstacles in the environment. A shared human-robot environment, however, is highly dynamic: complex pedestrian motion violates the assumptions of traditional navigation, so a traditional system struggles to plan a smooth path in a dense environment, disturbs the comfort of surrounding pedestrians, and may even cause collisions. Research on navigation algorithms for shared human-robot environments is therefore urgently needed.
In recent years, the development of deep learning has greatly advanced robot technology and its applications. By building artificial neural networks, deep learning can extract feature representations from large amounts of data and thereby fit high-dimensional function models for complex artificial-intelligence problems; its efficiency and transferability have been verified in many fields. Sensor information can therefore be analysed with deep learning to establish a mapping from environment information to the mobile robot's navigation decisions, addressing the navigation planning problem in human-robot coexistence environments, which has considerable research and practical value.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provides an imitation-learning social navigation method based on a feature map fused with pedestrian information. The pedestrian motion state within the robot's field of view is obtained with an RGB-image-based pedestrian detection and tracking module and a pedestrian three-dimensional position estimation module that fuses three-dimensional point-cloud information; combined with laser information, this yields a local feature map annotated with pedestrian dynamic information. A policy network takes the feature map as input and, supervised by expert teaching data, is trained into a social navigation decision network.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
An imitation-learning social navigation method based on a feature map fused with pedestrian information, comprising the following steps:
s1, constructing a pedestrian simulation environment based on a social force model to simulate a man-machine coexistence environment;
S2, constructing a feature map acquisition module fusing pedestrian dynamic information, used to process the robot's sensor information and represent the comprehensive environment state in the robot coordinate system; the flow in the feature map acquisition module is as follows:
s21, acquiring two-dimensional laser information of a specified height plane based on a three-dimensional laser radar carried on the robot, and recovering the two-dimensional laser information into a local obstacle map form;
s22, based on an RGB camera carried on the robot, acquiring a position sequence of a pedestrian in the scene in an image coordinate system by using a pedestrian tracking algorithm;
s23, based on the three-dimensional laser radar carried on the robot, combining the pedestrian detection result obtained in the S22, acquiring multi-frame pedestrian position information under a robot coordinate system by utilizing a three-dimensional point cloud alignment algorithm, and further extracting the speed information of the pedestrian;
S24, using the social force model, calculating each pedestrian's potential-field information according to differences in speed and direction, and marking it in different colours on the local obstacle map obtained in S21 to obtain a feature map fused with dynamic pedestrian information;
S3, manually controlling the robot in the pedestrian simulation environment to avoid the dynamic obstacles and reach a target point, collecting a large amount of teaching data for training the policy network; the teaching data comprise the feature map fused with pedestrian dynamic information, the robot's current velocity state, and the corresponding control command;
S4, establishing a deep neural network and training it with the teaching data so that it gradually approximates robot motion decisions that conform to social norms;
S5, generating control commands with the trained deep neural network to control the robot.
Preferably, the specific implementation method of the step S1 is as follows:
A training environment is built in Gazebo simulation and contains several common pedestrian-interaction scenes, each with one or more simulated pedestrians acting as dynamic obstacles. A mobile robot is placed in the simulation to verify the navigation decisions; it uses the ROS communication architecture and is controlled either by a teaching expert with a gamepad or directly by the deep neural network. The training environment forms a mixed human-robot dynamic environment by randomly generating several simulated pedestrians that move according to the social force model.
Preferably, in step S2, an Intel RealSense D435 depth camera and a Velodyne 32-beam lidar are used as the sensing elements to obtain RGB images and three-dimensional laser point-cloud information, respectively.
Preferably, in step S21, the local obstacle map in the robot coordinate system is recovered from the direction and distance of each laser return in the two-dimensional laser data; the robot judges the obstacle distribution from its own viewpoint using the angle-distance readings of the laser sensor and expresses it as a binary image, in which obstacles are white points and open areas are black.
Preferably, in step S22 the Deep SORT algorithm extracts the pedestrian position sequence in the RGB image coordinate system, and the three-dimensional point-cloud alignment algorithm of S23 converts it to pedestrian positions in the robot coordinate system; clustering and filtering during alignment ensure the accuracy of the position estimates.
Preferably, the specific implementation procedure of the step S23 is as follows:
First, the image coordinate system and the point-cloud coordinate system are aligned using the poses and parameters of the camera and the lidar. Then the part of the three-dimensional point cloud corresponding to each pedestrian detection box in the image is segmented out. The segmented points are filtered and clustered to obtain a three-dimensional bounding box for the single pedestrian's points, whose centre serves as the current position estimate. Finally, inter-frame differences of the same target are averaged over a preset time window to obtain the pedestrian's approximate motion state in the robot coordinate system.
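The last two stages of S23 (centre-of-points position estimate, then inter-frame averaging of the same track) can be sketched as below. This is a minimal illustration: it assumes the points have already been back-projected, distance-filtered, and clustered, and the function names and the fixed frame interval `dt` are placeholders, not the patent's implementation.

```python
def pedestrian_centroid(points):
    """Position estimate for one pedestrian: centre of the lidar points
    that fall inside the back-projected detection box (assumed already
    filtered and clustered, as S23 describes)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def estimate_velocity(track, dt):
    """Average the inter-frame centroid differences of one tracked target
    over a time window to get its approximate planar velocity."""
    steps = len(track) - 1
    vx = sum(b[0] - a[0] for a, b in zip(track, track[1:])) / (steps * dt)
    vy = sum(b[1] - a[1] for a, b in zip(track, track[1:])) / (steps * dt)
    return vx, vy

center = pedestrian_centroid([(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)])
vx, vy = estimate_velocity([(0.0, 0, 0), (0.1, 0, 0), (0.2, 0, 0)], dt=0.1)
```

A real pipeline would associate detections across frames with the tracker IDs from S22 before averaging; here the track list stands in for that association.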
Preferably, in step S24 a motion potential field is established from the pedestrian repulsive force in the social force model, and equipotential lines are used to distinguish and label pedestrians whose motion states differ. The specific process is as follows:
First, boundary equipotential lines are determined from a preset range of accepted pedestrian repulsion, dividing out a pedestrian comfort range on the local obstacle map obtained in S21; an occupied area is marked for every pedestrian detected in S22, its size positively related to the pedestrian's speed, which differentiates the individuals. Then each pedestrian's occupied area on the local obstacle map is coloured according to that pedestrian's direction of motion. The result is a feature map fused with pedestrian dynamic information that comprehensively represents the environment state in the robot coordinate system.
Preferably, in step S3 the teaching expert uses a gamepad, through the ROS communication architecture, to drive the mobile robot in Gazebo to avoid the simulated pedestrians in the scene and reach the target point; the feature map obtained in S24, the robot's state information, the relative target position, and the corresponding expert control commands recorded while the robot moves form the expert teaching dataset.
Preferably, in step S4 a deep neural network is established that takes the feature map and the robot's own state information as input and outputs a control command; it is trained iteratively on the expert teaching dataset so as to gradually approximate the expert's control criteria and obtain a social navigation policy.
In the deep neural network, the feature map fused with pedestrian dynamic information passes through convolutional layers to extract a latent variable, while the robot's state information and the relative target position each pass through a fully connected layer to extract latent variables; the three latent variables are concatenated and then passed through two fully connected layers to output the control command.
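The three-branch architecture just described can be checked with a shape walk-through. The kernel sizes, strides, channel count, and embedding widths below are illustrative assumptions, since the text does not specify them; the point is only how the map branch, state branch, and goal branch meet at the concatenation.

```python
def conv2d_out(hw, kernel, stride=1, padding=0):
    """Output spatial size of a conv layer: (x + 2p - k) // s + 1 per axis."""
    h, w = hw
    f = lambda x: (x + 2 * padding - kernel) // stride + 1
    return f(h), f(w)

# Map branch: an assumed stack of three conv layers over a 64 x 64 feature map.
map_hw = (64, 64)
for k, s in [(5, 2), (3, 2), (3, 2)]:
    map_hw = conv2d_out(map_hw, k, s)
map_feat = map_hw[0] * map_hw[1] * 32       # 32 output channels, flattened

# State and goal branches: one fully connected embedding each (widths assumed).
state_feat, goal_feat = 32, 32

# Concatenate the three latent variables; two further FC layers (not shown)
# would map `fused` to the control command.
fused = map_feat + state_feat + goal_feat
```

Tracking shapes this way is a cheap sanity check before committing to a framework implementation of the network.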
Preferably, step S4 uses an online imitation-learning algorithm in which the teaching dataset is updated in real time by data aggregation. The training process is as follows: the teaching expert controls the mobile robot in the simulation environment in real time, moving it toward the target point while avoiding the simulated pedestrians; the deep neural network trains iteratively on the continuously updated teaching dataset; as training proceeds, the expert's control frequency is gradually reduced so that the policy network takes control of the robot with a certain probability, which both evaluates the network's performance and enriches the distribution of teaching data, helping the network learn to recover from deviated trajectories.
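The data-aggregation scheme above can be sketched as a single rollout in which the expert labels every state but executes the action only with a decaying probability. The toy policy, expert, and state transition below are stand-ins; the structure (always aggregate the expert label, sometimes cede control to the policy) is the part that mirrors the step.

```python
import random

def dagger_rollout(policy_act, expert_act, steps, expert_prob, dataset, seed=0):
    """One online rollout: the expert labels every visited state (data
    aggregation), but drives the robot only with probability expert_prob;
    otherwise the policy network acts, enriching the state distribution."""
    rng = random.Random(seed)
    state = 0.0
    for _ in range(steps):
        label = expert_act(state)
        dataset.append((state, label))   # expert label is always stored
        action = label if rng.random() < expert_prob else policy_act(state)
        state += action                  # toy one-dimensional transition
    return state

# With expert_prob = 1.0 the expert drives every step (early training).
data = []
final_state = dagger_rollout(lambda s: 0.0, lambda s: 1.0,
                             steps=5, expert_prob=1.0, dataset=data)
```

Over training, the caller would lower `expert_prob` toward zero, exactly the gradual hand-over of control the step describes.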
Compared with the prior art, the invention has the following beneficial effects:
The invention uses the feature map fused with pedestrian dynamic information to jointly process local obstacle information and dynamic pedestrian information in the robot coordinate system, helping the robot perceive the environment state more reasonably and efficiently. On this basis, the algorithm uses expert teaching information to guide the update and iteration of the deep neural network, gradually approximating the expert's policy habits and imitating the expert's decision pattern, so that the robot can move through dense crowds in an expert-like manner. By imitating expert behaviour, the deep neural network can cope with complex and changeable pedestrian environments; the pedestrian-trajectory prediction module required by traditional algorithms is no longer needed, the robot's feasible area is enlarged, and the freezing problem of traditional algorithms is avoided. At the same time, the compact comprehensive environment representation improves the execution efficiency of the algorithm.
Drawings
FIG. 1 is a flow chart of a simulated learning social navigation method based on feature maps fused with pedestrian information;
FIG. 2 is a frame diagram of a simulated learning social navigation method based on feature maps fused with pedestrian information;
FIG. 3 is a graph of pedestrian detection and tracking and three-dimensional point cloud segmentation effects;
FIG. 4 is a schematic diagram of a social force model;
FIG. 5 is a graph of the effect of human-machine hybrid simulation environment;
FIG. 6 is a feature map effect diagram incorporating pedestrian dynamic information;
FIG. 7 is a block diagram of a deep neural network;
FIG. 8 is a social navigation effect diagram.
Detailed Description
The invention is further illustrated and described below with reference to the drawings and the detailed description. The technical features of the embodiments can be combined as long as they do not conflict with one another.
In a preferred embodiment of the invention, an imitation-learning social navigation method based on a feature map fused with pedestrian information is provided for the mobile-robot navigation problem in human-robot coexistence environments. Traditional navigation algorithms mostly target static or nearly static simple scenes; transferred directly to a human-robot coexistence environment with complex dynamics, they struggle to plan smooth trajectories that avoid pedestrians, threatening pedestrian safety. Existing improved methods further restrict the robot's feasible region by introducing pedestrian detection and trajectory prediction, but this adds information-processing load and prediction uncertainty, and over-restricting the robot's range of motion easily produces the robot freezing problem. The invention guides the robot to imitate an expert's movement habits through imitation learning, planning navigation that conforms to social norms, improving planning efficiency, alleviating the freezing problem, and helping the robot integrate better into the shared human-robot environment. The algorithm obtains the time-series motion states of pedestrians through pedestrian detection and tracking in sequential RGB images and three-dimensional point-cloud alignment, then combines the two-dimensional laser data with the social force model to obtain a local feature map annotated with pedestrian dynamic information.
Finally, a deep network taking the local feature map, the robot's current velocity, and the relative target position as input and outputting robot control commands is trained under the supervision of expert teaching data to obtain a navigation policy that conforms to social norms.
The specific steps of the method are shown in fig. 1, and are described in detail below:
s1, constructing a pedestrian simulation environment based on a social force model so as to simulate a man-machine coexistence environment.
S2, constructing a characteristic map acquisition module fusing pedestrian dynamic information, which is used for processing sensor information of the robot and representing comprehensive environmental conditions under a robot coordinate system.
S3, the robot is manually controlled to avoid dynamic obstacles in the pedestrian simulation environment and reach a target point, a large amount of teaching data is obtained, and the teaching data are used for training a strategy network; the teaching data comprises a characteristic map fused with pedestrian dynamic information and a current speed state of the robot and a corresponding control instruction thereof.
And S4, establishing a deep neural network, training the deep neural network by using teaching data, and gradually approaching the robot movement decision behavior conforming to the social specification.
S5, generating a control instruction by using the trained deep neural network, and controlling the robot.
The key idea of the navigation method is shown in fig. 2: the social force model fuses pedestrian dynamic information into the local feature map, and an imitation-learning network learns social behaviour norms from expert teaching data, helping the robot make reasonable navigation decisions in a human-robot coexistence environment. The implementation of each step in this embodiment is described below.
The specific implementation method of the step S1 is as follows:
A training environment is built in Gazebo simulation, containing several common pedestrian-interaction scenes, each with one or more simulated pedestrians acting as dynamic obstacles. A Turtlebot2 mobile robot is used in the simulation to verify the navigation decisions; through the ROS communication architecture it is controlled either by the teaching expert with a Switch Pro Controller gamepad or directly by the deep neural network. The training environment forms a mixed human-robot dynamic environment by randomly generating several simulated pedestrians that move according to the social force model. The specific form of the social force model is known in the prior art; for ease of understanding it is described below.
As shown in fig. 4, the social force model describes, through dynamic modelling, the relationship between a pedestrian and the surrounding environment and the relationships among pedestrians in a crowd in a complex dynamic environment. The model considers the various influencing factors of a complex environment, converts them into forces, and quantitatively describes the constraints imposed by the pedestrian's target position, obstacle distribution, social norms, and so on as forces with certain magnitudes and directions.
In a normal mixed human-robot environment there is essentially no close contact between pedestrians, and a single pedestrian occupies little space, so pedestrian volume and the mutual squeezing caused by crowding can be ignored. To unify the expression of interaction forces, pedestrians and obstacles are therefore represented by point models in the concrete implementation: a single pedestrian corresponds to a single particle, and obstacles of different shapes are replaced by lattices of points matching their contours, forming a point-model representation of the environment. When analysing a single pedestrian, the resultant of the forces produced by all particles other than the current point is considered. The resultant force is expressed as in (1), reconstructed here in standard social-force notation:

$$\vec{F}_{\alpha} = \vec{F}_{\alpha}^{\,0} + \sum_{\beta} \vec{F}_{\alpha\beta} + \sum_{B} \vec{F}_{\alpha B} + \sum_{i} \vec{F}_{\alpha i} \tag{1}$$

The resultant force $\vec{F}_{\alpha}$ comprises four terms: the attraction $\vec{F}_{\alpha}^{\,0}$ of the target point on the pedestrian, the mutual repulsion $\vec{F}_{\alpha\beta}$ between pedestrians, the repulsion $\vec{F}_{\alpha B}$ of obstacles on the pedestrian, and the attraction $\vec{F}_{\alpha i}$ exerted by attractive objects.
The attraction of the target point is the driving force of pedestrian motion, guiding the pedestrian toward the target position. It adjusts the pedestrian's velocity direction toward the target while accelerating the pedestrian toward an ideal speed. Without obstruction by obstacles, the pedestrian accelerates uniformly until reaching the maximum speed, so the attraction of the target point is expressed here as an acceleration, as in (2):

$$\vec{F}_{\alpha}^{\,0} = \frac{1}{\tau_{\alpha}}\left(v_{\alpha}^{0}\,\vec{e}_{\alpha} - \vec{v}_{\alpha}\right) \tag{2}$$

where $v_{\alpha}^{0}$ is the ideal speed, $\vec{e}_{\alpha}$ is the unit vector toward the target, and $\vec{v}_{\alpha}$ is the current velocity vector. Because pedestrians have a certain reaction time and the surrounding environment interferes, an ideal state is hard to reach in actual motion. A correction factor, the relaxation time $\tau_{\alpha}$, is added to describe this phenomenon: it expresses the time a pedestrian needs to adjust its motion state under real conditions, during which the pedestrian gradually approaches the ideal speed.
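As a numerical check of the relaxation-time attraction term, a minimal sketch follows; the ideal speed, target direction, current velocity, and relaxation time are arbitrary example values, not parameters from the patent.

```python
def goal_attraction(v0, e, v, tau):
    """Target-point attraction of the social force model:
    a = (v0 * e - v) / tau, relaxing the current velocity v toward the
    ideal velocity v0 * e within relaxation time tau."""
    return tuple((v0 * ei - vi) / tau for ei, vi in zip(e, v))

# Pedestrian moving at (0.4, 0.3) m/s, ideal speed 1.2 m/s toward +x,
# relaxation time 0.5 s.
a = goal_attraction(1.2, (1.0, 0.0), (0.4, 0.3), tau=0.5)
```

Note the lateral component is negative: the term both accelerates the pedestrian toward the ideal speed and damps velocity components perpendicular to the target direction, which is exactly the behaviour the text describes.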
Dynamic pedestrians and static obstacles exert repulsive forces on the current pedestrian that hinder its motion toward the target point. Since the point model is used to express the environment, the repulsion of pedestrians and obstacles becomes repulsion between points. The repulsive force grows as the distance between two points decreases, but it decays at different rates in different directions around the pedestrian. According to the social norms of pedestrians in public places, a pedestrian needs a certain comfortable movement space: it extends forward and backward along the direction of motion and is relatively short perpendicular to it, representing the general area the pedestrian needs for necessary avoidance. This area is described here by elliptical equipotential lines, as shown in fig. 4.
The ellipse is defined as in (3). Let the current pedestrian be A and a surrounding pedestrian or obstacle point be B. With B held fixed, let A' be the position A reaches after one step of length $v_{A}\,\Delta t$ along its current direction of motion $\vec{e}_{A}$; the step length gives the focal distance, and the ellipse is the locus where the sum of the distances AB and A'B is constant. Reconstructed in the standard Helbing-Molnar form, the semi-minor axis $b$ satisfies

$$2b = \sqrt{\left(\lVert \vec{r}_{AB} \rVert + \lVert \vec{r}_{AB} - v_{A}\,\Delta t\,\vec{e}_{A} \rVert\right)^{2} - \left(v_{A}\,\Delta t\right)^{2}} \tag{3}$$

The ellipse constructed this way is the approximate avoidance range of the current pedestrian A in the A-B interaction. The longer the semi-minor axis $b$, the larger the pedestrian's avoidance space and the less discomfort B causes A; expression (4) therefore expresses the change of the repulsive effect as an exponential function of $b$:

$$F_{\mathrm{rep}}(b) = M\,e^{-b/N} \tag{4}$$
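The elliptical equipotential construction and the exponential repulsion above can be checked numerically; the parameter values for M and N below are placeholders (the text says they are tuned experimentally), and the vector arguments are example inputs.

```python
import math

def semi_minor_axis(r_ab, v, dt, e):
    """Semi-minor axis b of the avoidance ellipse:
    2b = sqrt((|r_AB| + |r_AB - v*dt*e|)^2 - (v*dt)^2),
    where v*dt is the step taken along the unit direction e."""
    step = v * dt
    shifted = tuple(ri - step * ei for ri, ei in zip(r_ab, e))
    s = math.hypot(*r_ab) + math.hypot(*shifted)
    return 0.5 * math.sqrt(s * s - step * step)

def repulsion_magnitude(b, M=2.1, N=0.3):
    """Exponential decay of the repulsive effect with b (formula (4));
    M and N are placeholder values, to be tuned per scene."""
    return M * math.exp(-b / N)

# Degenerate check: with zero step the ellipse collapses to a circle,
# so b equals the plain distance |r_AB|.
b_static = semi_minor_axis((1.0, 0.0), v=0.0, dt=0.5, e=(1.0, 0.0))
```

Because the repulsion is monotonically decreasing in b, a pedestrian crossing just ahead of A (small b) is penalised far more than one passing at the same distance but off to the side.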
The parameters M and N relate to the scale of the scene, the blocking characteristics of obstacles, the characteristics of the crowd, and so on; they express the strength of interpersonal interaction and are tuned in the specific experiments.
Using the known relationships among the interaction force between pedestrians, the pedestrian's direction of motion, and the target direction, it can be judged whether a surrounding pedestrian lies within the current pedestrian's field of view. Compute the projection $d_{now}$ of the force $\vec{f}$ exerted by a surrounding pedestrian on the current pedestrian onto the target direction $\vec{e}$, and the projection $d_{min}$ obtained by rotating $\vec{f}$ to the edge of the field of view, i.e. to the half view angle $\varphi$, so that $d_{min} = \lVert \vec{f} \rVert \cos\varphi$. Comparing the two determines whether the surrounding pedestrian is outside the field of view: if $d_{now}$ is larger, the pedestrian lies within the current pedestrian's field of view and should receive a larger influence weight; otherwise its influence on the current pedestrian is attenuated or even ignored.
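The field-of-view weighting just described can be sketched as below; the half view angle and the attenuation factor `c` for out-of-view neighbours are assumed values, since the text only says the influence is "attenuated or even ignored".

```python
import math

def in_field_of_view(force, e_target, half_angle):
    """Compare d_now (projection of the interaction force on the target
    direction) with d_min (the same force rotated to the edge of the
    field of view). The neighbour is visible when d_now >= d_min."""
    d_now = force[0] * e_target[0] + force[1] * e_target[1]
    d_min = math.hypot(*force) * math.cos(half_angle)
    return d_now >= d_min

def weighted_force(force, e_target, half_angle, c=0.5):
    """Apply full weight inside the field of view, attenuation c outside."""
    w = 1.0 if in_field_of_view(force, e_target, half_angle) else c
    return tuple(w * f for f in force)

visible = in_field_of_view((1.0, 0.0), (1.0, 0.0), math.pi / 2)
behind = weighted_force((-1.0, 0.0), (1.0, 0.0), math.pi / 2)
```

This anisotropy is what makes the simulated pedestrians react mainly to what is in front of them, matching real crowd behaviour.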
The interaction between pedestrians and the environment is abstracted by using the social force model, and quantitative analysis is performed in a unified mode, so that reasonable crowd motion simulation can be realized, and a man-machine interaction simulation environment is constructed.
As shown in FIG. 5, a man-machine hybrid simulation environment effect diagram constructed by the invention is shown.
In step S2, the hardware used for sensing can be adjusted as needed; in this embodiment an Intel RealSense D435 depth camera and a Velodyne 32-beam lidar serve as the sensing elements, providing RGB images and three-dimensional laser point-cloud information, respectively.
In step S2, the flow specifically executed in the feature map acquisition module in this embodiment is as follows:
S21, acquiring two-dimensional laser information of a specified height plane based on the three-dimensional laser radar carried on the robot, and recovering it into a local obstacle map form;
S22, based on the RGB camera carried on the robot, acquiring the position sequence of pedestrians in the scene in the image coordinate system by using a pedestrian tracking algorithm;
S23, based on the three-dimensional laser radar carried on the robot, combining the pedestrian detection result obtained in S22, acquiring multi-frame pedestrian position information in the robot coordinate system by using a three-dimensional point cloud alignment algorithm, and further extracting the speed information of the pedestrians.
Fig. 3 shows the pedestrian detection-and-tracking and three-dimensional point cloud segmentation results obtained in a scene by the method.
And S24, calculating potential field information of each pedestrian according to the speed and direction differences by using the social force model, and marking it on the local obstacle map obtained in S21 in different colors to obtain a feature map fused with dynamic pedestrian information. The invention selects the local feature map marked with pedestrian dynamic information to integrate the current environment information, forming the input of the strategy network; this effectively combines multi-sensor information and helps the robot better perceive and understand the environment.
In step S21, the local obstacle map in the robot coordinate system is recovered from the two-dimensional laser information according to the direction and distance of each laser point; the robot judges the obstacle distribution within the view of its own coordinate system from the angle and distance information returned by the laser sensor and expresses it as a binary image, in which obstacles are represented by white points and open areas by black blocks.
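A sketch of recovering the binary local obstacle map from a two-dimensional scan follows; the map size, grid resolution, and maximum range are assumed parameters, not values given in the description.

```python
import numpy as np

def laser_to_obstacle_map(angles, ranges, map_size=200, resolution=0.05, max_range=5.0):
    """Render a 2D laser scan as a binary local obstacle map centred on the robot.

    Obstacles -> white (255), open areas -> black (0), as in the description.
    angles/ranges: per-beam direction (rad) and distance (m) in the robot frame.
    """
    grid = np.zeros((map_size, map_size), dtype=np.uint8)
    cx = cy = map_size // 2                       # robot at the map centre
    for a, r in zip(angles, ranges):
        if not np.isfinite(r) or r > max_range:
            continue                               # no return within range
        x, y = r * np.cos(a), r * np.sin(a)        # endpoint in robot frame
        px = int(cx + x / resolution)
        py = int(cy + y / resolution)
        if 0 <= px < map_size and 0 <= py < map_size:
            grid[py, px] = 255                     # mark obstacle pixel white
    return grid
```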
In step S22, a Deep SORT algorithm is adopted to extract a pedestrian position sequence under an RGB image coordinate system, and the pedestrian position under a robot coordinate system is obtained through a three-dimensional point cloud alignment algorithm in S23, wherein the precision of pedestrian position determination is ensured by adopting a clustering and filtering method in alignment.
Of course, in a real environment the pedestrian motion state must be acquired by detection and tracking, whereas in the simulation environment it can be obtained directly through an environment interface. Therefore, during strategy training, to facilitate data acquisition and verification of the network effect, the dynamic information of the simulated pedestrians is also obtained through the Gazebo environment interface.
In the above step S23, the specific implementation flow is as follows:
firstly, aligning an image coordinate system and a point cloud coordinate system by using pose and parameters of a camera and a laser radar; then, according to the pedestrian detection frame position under the image coordinate system, dividing the corresponding part in the three-dimensional point cloud; then, screening the segmented point clouds according to filtering and clustering algorithms to obtain a three-dimensional boundary frame of the point clouds corresponding to the single pedestrian, wherein the central position is used as the position estimation of the current pedestrian; and finally, averaging the inter-frame difference of the same target in a preset time window to obtain the approximate motion state of the pedestrian under the robot coordinate system.
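The final averaging step can be sketched as follows, assuming a fixed frame interval dt and an already-associated position track for a single pedestrian.

```python
import numpy as np

def estimate_velocity(track, dt):
    """Average inter-frame displacement of one tracked pedestrian over a window.

    track: (T, 2) array of positions in the robot frame; dt: frame interval (s).
    Returns the mean velocity, an approximation of the pedestrian motion state.
    """
    track = np.asarray(track, dtype=float)
    diffs = np.diff(track, axis=0)        # per-frame displacement
    return diffs.mean(axis=0) / dt        # mean displacement per second
```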
The Deep SORT algorithm is used for realizing pedestrian detection and tracking based on RGB images in a real environment and preliminarily determining the positions of pedestrians under an image coordinate system. The Deep SORT realizes more robust tracking effect by introducing the correlation measurement integrating texture information and the cascade matching mechanism on the basis of the SORT algorithm, and is a mainstream multi-target tracking algorithm at present. Wherein the texture features of the target are extracted by a convolutional neural network pre-trained on a large-scale pedestrian dataset. Appearance similarity measurement is obtained by comparing the inter-frame texture feature difference of the detection frame, and the appearance similarity measurement is combined with the motion distance measurement in the SORT algorithm, so that a comprehensive criterion of the association degree can be formed. And the criterion is used for carrying out inter-frame data association, so that the occurrence probability of the identity interleaving problem in adjacent track tracking is greatly reduced.
By combining the detection tracking result under the image coordinate system and the three-dimensional point cloud information, the pedestrian state information can be further converted into the robot coordinate system, and a reference is provided for the navigation decision of the robot. Firstly, the pose and parameters of a camera and a laser radar are utilized to realize the alignment of an image coordinate system and a point cloud coordinate system. And then, according to the positions of the pedestrian detection frames in the image coordinate system, dividing the corresponding parts in the three-dimensional point cloud. And then, screening the segmented point clouds according to filtering and clustering algorithms to obtain a three-dimensional boundary frame of the point clouds corresponding to the single pedestrian, wherein the central position is used as the position estimation of the current pedestrian. And finally, averaging the inter-frame difference of the same target in a proper time window to obtain the approximate motion state of the pedestrian under the robot coordinate system.
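The alignment-and-segmentation step can be sketched as a projection of lidar points into the image followed by a crop by the detection box; the extrinsic matrix T_cam_lidar, intrinsic matrix K, and the box format are assumptions about the calibration interface, not specifics of the patent.

```python
import numpy as np

def points_in_detection_box(points_lidar, T_cam_lidar, K, box):
    """Segment the lidar points that project inside a 2D pedestrian detection box.

    points_lidar: (N, 3) points in the lidar frame; T_cam_lidar: 4x4 extrinsic
    taking lidar to camera coordinates; K: 3x3 intrinsic; box: (u1, v1, u2, v2).
    """
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]
    front = pts_cam[:, 2] > 0                      # keep points in front of camera
    uv = (K @ pts_cam[front].T).T
    uv = uv[:, :2] / uv[:, 2:3]                    # perspective division
    u1, v1, u2, v2 = box
    inside = (uv[:, 0] >= u1) & (uv[:, 0] <= u2) & (uv[:, 1] >= v1) & (uv[:, 1] <= v2)
    return points_lidar[front][inside]
```

The returned subset would then be filtered and clustered to obtain the single pedestrian's three-dimensional bounding box, whose centre serves as the position estimate.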
In the step S24, a motion potential field is established according to the pedestrian repulsion force in the social force model, and then the equipotential lines are used for distinguishing and labeling pedestrians with different motion states, and the specific process is as follows:
Firstly, the demarcation equipotential line is determined according to a preset acceptable range of pedestrian repulsion, and the pedestrian comfort range is divided on the local obstacle map obtained in step S21: an occupied area is marked for each pedestrian detected in step S22, and the size of the marked area is positively related to the pedestrian's speed, so that it differs between individuals. Then, each pedestrian's occupied area is colored on the local obstacle map according to that pedestrian's movement direction. Finally, the feature map fused with pedestrian dynamic information shown in fig. 6 is obtained, comprehensively displaying the environment state in the robot coordinate system.
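The labelling process can be sketched as stamping speed-scaled, direction-colored occupancy regions onto the obstacle map; the radius law and the hue-based color scheme are illustrative assumptions (the description specifies only that region size grows with speed and color encodes motion direction).

```python
import colorsys
import numpy as np

def direction_color(heading):
    """Map a heading angle in [-pi, pi] to an RGB colour via hue (assumed scheme)."""
    hue = (heading + np.pi) / (2 * np.pi)
    return tuple(int(255 * c) for c in colorsys.hsv_to_rgb(hue, 1.0, 1.0))

def draw_pedestrian_layer(obstacle_map, peds, resolution=0.05, base_radius=0.3, k=0.2):
    """Stamp a coloured occupancy disc for each pedestrian on the obstacle map.

    peds: iterable of (px, py, vx, vy) in the robot frame; the disc radius is
    positively related to speed, as in step S24.
    """
    h, w = obstacle_map.shape
    rgb = np.dstack([obstacle_map] * 3).astype(np.uint8)   # grey map -> RGB
    cx, cy = w // 2, h // 2
    yy, xx = np.mgrid[0:h, 0:w]
    for px, py, vx, vy in peds:
        speed = float(np.hypot(vx, vy))
        r_px = (base_radius + k * speed) / resolution      # speed-scaled radius
        u = cx + px / resolution
        v = cy + py / resolution
        mask = (xx - u) ** 2 + (yy - v) ** 2 <= r_px ** 2
        rgb[mask] = direction_color(np.arctan2(vy, vx))    # colour by heading
    return rgb
```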
In the step S3, the teaching expert controls the movement of the mobile robot in the Gazebo to simulate the pedestrian to reach the target point in the avoidance scene by using the game handle through the ROS communication architecture; and (3) storing the local obstacle map information obtained in the step (S24), the state information of the robot, the relative position of the target and the corresponding expert control information in the moving process of the robot, so as to obtain an expert teaching data set. The teaching expert refers to a person who can proficiently control the robot.
In the step S4, by establishing the deep neural network, taking the local obstacle map information and the state information of the robot as input and the control instruction as output, iterative training is performed under the expert teaching data set, so that the expert control criterion is gradually approximated, and the social navigation strategy is learned. In the deep neural network, the characteristic map fused with pedestrian dynamic information extracts hidden variables through a convolution layer, the state information of the robot and the relative position of a target extract hidden variables through a full-connection layer respectively, and after three hidden variables are spliced, control instructions are output through two full-connection layers.
A specific policy network structure is shown in fig. 7. The network takes a local feature map marked with pedestrian dynamic information, the relative position of a target point and the current speed of the robot as inputs, and directly outputs a control instruction. From the figure, it can be seen that the image portion is processed by using a multi-layer convolution network, the target position and the robot speed portion are encoded by using a full-connection layer, intermediate layer hidden variable representations obtained by the two portions are spliced to serve as current comprehensive state information, and finally a final control instruction is output through the multi-layer full-connection layer.
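A PyTorch sketch of the described structure follows; all layer sizes and the 96x96 input resolution are assumptions, since the description specifies only the convolutional/fully-connected split, the concatenation of the three hidden variables, and the two fully-connected output layers.

```python
import torch
import torch.nn as nn

class SocialNavPolicy(nn.Module):
    """Conv encoder for the feature map, FC encoders for the target position and
    robot speed; hidden variables are concatenated and decoded to a command."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(                      # feature-map branch
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.goal_fc = nn.Sequential(nn.Linear(2, 32), nn.ReLU())  # target position
        self.vel_fc = nn.Sequential(nn.Linear(2, 32), nn.ReLU())   # robot speed
        with torch.no_grad():                           # infer conv output width
            n_conv = self.conv(torch.zeros(1, 3, 96, 96)).shape[1]
        self.head = nn.Sequential(                      # two FC layers -> command
            nn.Linear(n_conv + 64, 128), nn.ReLU(),
            nn.Linear(128, 2),                          # e.g. linear and angular speed
        )

    def forward(self, feat_map, goal, vel):
        z = torch.cat([self.conv(feat_map), self.goal_fc(goal), self.vel_fc(vel)], dim=1)
        return self.head(z)
```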
In the step S4, this embodiment adopts an online teaching-learning algorithm and updates the teaching data set in real time by data aggregation. The specific training process is as follows: the teaching expert controls the mobile robot in real time in the simulation environment to move toward the target point while avoiding the simulated pedestrians in the scene; the deep neural network is iteratively trained on the teaching data set, which is updated and stored in real time; as training proceeds, the expert's control frequency is gradually reduced so that the strategy network obtains control of the robot with a certain probability, which on the one hand evaluates the performance of the network and on the other hand enriches the distribution of the teaching data, helping the network improve its ability to recover from deviated trajectories.
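The online data-aggregation training loop can be sketched as follows (in the style of DAgger); the environment, expert, and policy interfaces and the decay schedule for the expert's control probability are assumptions made for illustration.

```python
import random

def dagger_train(env, expert, policy, rounds=10, episodes_per_round=5,
                 beta0=1.0, decay=0.8):
    """Online imitation learning with data aggregation (sketch).

    With probability beta the expert acts; the expert's action is always
    recorded, so the aggregated dataset covers states visited by the mixed
    policy, including recovery from deviated trajectories.
    """
    dataset = []
    beta = beta0
    for _ in range(rounds):
        for _ in range(episodes_per_round):
            state = env.reset()
            done = False
            while not done:
                expert_action = expert(state)
                dataset.append((state, expert_action))   # always label with expert
                action = expert_action if random.random() < beta else policy.act(state)
                state, done = env.step(action)
        policy.fit(dataset)          # retrain on the aggregated teaching data
        beta *= decay                # expert control frequency gradually reduced
    return policy
```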
And returning to the original simulation environment for testing and evaluation, and generating a control instruction by using the trained deep neural network obtained after the training of the S1-S4 to replace a teaching expert to perform a robot control experiment.
The experiments set up a randomly initialized "corridor" scenario to verify the validity of the social force model. Under the guidance of the social force model, pedestrians keep their relative distances as far as possible and avoid each other. The model also adapts well to changes in pedestrian density, so it is considered reasonable and usable for pedestrian environment simulation.
The experiments set up multiple human-machine coexistence scenarios in Gazebo for training the policy network, as shown in fig. 5. Before each round of training starts, the parameters of the simulated pedestrians, including initial positions, initial speeds, target positions and the like, are randomly initialized, which increases scene complexity and prevents the strategy model from overfitting. The simulated pedestrians avoid each other during interaction, exhibiting certain social characteristics; this meets the crowd simulation requirement, so the environment can be used for subsequent strategy training.
Multiple randomly initialized scenarios were chosen in the experiments to evaluate the performance of the imitation learning strategy. The task requires the strategy network to control the mobile robot through the human-machine coexistence environment and finally reach the target point. A task is considered complete if the robot reaches within 0.5 m of the target point, and failed if it collides with an obstacle or a simulated pedestrian. Over forty navigation tasks, the performance of the imitation-learning-based social navigation strategy is shown in Table 1, basically meeting the task requirements.
Table 1. Performance of the imitation learning strategy
During testing, the robot makes flexible navigation decisions around dynamic pedestrians and produces interaction behaviors such as avoiding to the right and decelerating to follow, as shown in fig. 8. The method can therefore be considered to have acquired social norms from expert teaching to a certain extent, ensuring the safety and comfort of pedestrian movement while completing the navigation task.
The above embodiment is only a preferred embodiment of the present invention, but it is not intended to limit the present invention. Various changes and modifications may be made by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present invention. Therefore, all the technical schemes obtained by adopting the equivalent substitution or equivalent transformation are within the protection scope of the invention.

Claims (8)

1. A simulated learning social navigation method based on a feature map fused with pedestrian information is characterized by comprising the following steps:
s1, constructing a pedestrian simulation environment based on a social force model to simulate a man-machine coexistence environment;
s2, constructing a characteristic map acquisition module fused with pedestrian dynamic information, which is used for processing sensor information of the robot and representing the comprehensive environmental condition under a robot coordinate system; the flow in the feature map acquisition module is as follows S21-S24:
s21, acquiring two-dimensional laser information of a specified height plane based on a three-dimensional laser radar carried on the robot, and recovering the two-dimensional laser information into a local obstacle map form;
s22, based on an RGB camera carried on the robot, acquiring a position sequence of a pedestrian in the scene in an image coordinate system by using a pedestrian tracking algorithm;
s23, based on the three-dimensional laser radar carried on the robot, combining the pedestrian detection result obtained in the S22, acquiring multi-frame pedestrian position information under a robot coordinate system by utilizing a three-dimensional point cloud alignment algorithm, and further extracting the speed information of the pedestrian;
s24, calculating potential field information of each pedestrian according to the speed and direction difference by using a social force model, and marking the potential field information of each pedestrian on the local obstacle map obtained in the S21 according to different colors to obtain a characteristic map fused with dynamic pedestrian information;
s3, the robot is manually controlled to avoid dynamic obstacles in the pedestrian simulation environment and reach a target point, a large amount of teaching data is obtained, and the teaching data are used for training a strategy network; the teaching data comprises a characteristic map fused with pedestrian dynamic information and a current speed state of the robot and a corresponding control instruction;
s4, establishing a deep neural network, taking local obstacle map information and self state information of the robot as input, taking a control instruction as output, and performing iterative training under an expert teaching data set, so as to gradually approach an expert control criterion and acquire a social navigation strategy;
in the deep neural network, the characteristic map fused with pedestrian dynamic information extracts hidden variables through a convolution layer, the state information of the robot and the relative position of a target extract hidden variables through a full-connection layer respectively, and after three hidden variables are spliced, control instructions are output through two full-connection layers;
the iterative training adopts an online training teaching learning algorithm, and adopts a data aggregation mode to update a teaching data set in real time, and the specific training flow is as follows: the teaching expert controls the mobile robot to move towards the target point in real time in the simulation environment and avoids the simulation pedestrian in the scene; the deep neural network carries out iterative training on a teaching data set which is updated and stored in real time; along with the training, the control frequency of an expert is gradually reduced, so that a strategy network obtains the control right of the robot with a certain probability, on one hand, the performance of the network is evaluated, on the other hand, the distribution of teaching data is enriched, and the network is helped to improve the recovery capability from the deviated track;
s5, generating a control instruction by using the trained deep neural network, and controlling the robot.
2. The method for simulated learning social navigation based on feature maps fused with pedestrian information as claimed in claim 1, wherein the specific implementation method of step S1 is as follows:
constructing a training environment by adopting Gazebo simulation, wherein the training environment comprises a plurality of common pedestrian interaction scenes, and each scene comprises one or a plurality of dynamic barriers for pedestrian simulation; a mobile robot is selected in the simulation to verify the navigation decision effect, and the robot uses an ROS communication architecture, is controlled by a teaching expert through a game handle, or is directly controlled by a deep neural network; the training environment forms a man-machine hybrid dynamic environment by randomly generating a plurality of simulated pedestrians which move according to the social force model.
3. The method for learning social navigation based on the simulation of feature map fusing pedestrian information as claimed in claim 1, wherein Intel RealSense D435 depth camera and Velodyne32 laser are adopted as sensing elements in the step S2, respectively, to obtain RGB image and three-dimensional laser point cloud information.
4. The method for learning social navigation based on simulation of feature map with pedestrian information fusion as set forth in claim 1, wherein in the step S21, the local obstacle map under the robot coordinate system is restored by using two-dimensional laser information according to the direction and distance information of the laser point; the robot judges the distribution condition of the obstacle under the view angle of the self coordinate system according to the angle distance information returned by the laser sensor, and expresses the obstacle in a binary image mode, wherein the obstacle is represented by a white point, and the open area is represented by a black block.
5. The method for simulated learning social navigation based on the feature map fused with pedestrian information as claimed in claim 1, wherein in the step S22, the Deep SORT algorithm is adopted to extract the pedestrian position sequence under the RGB image coordinate system, and the pedestrian position under the robot coordinate system is obtained through the three-dimensional point cloud alignment algorithm in S23, and the precision of pedestrian position determination is ensured by adopting the clustering and filtering method in alignment.
6. The method for learning social navigation based on the imitation of feature map fusing pedestrian information as set forth in claim 1, wherein the specific implementation procedure of step S23 is as follows:
firstly, aligning an image coordinate system and a point cloud coordinate system by using pose and parameters of a camera and a laser radar; then, according to the pedestrian detection frame position under the image coordinate system, dividing the corresponding part in the three-dimensional point cloud; then, screening the segmented point clouds according to filtering and clustering algorithms to obtain a three-dimensional boundary frame of the point clouds corresponding to the single pedestrian, wherein the central position is used as the position estimation of the current pedestrian; and finally, averaging the inter-frame difference of the same target in a preset time window to obtain the approximate motion state of the pedestrian under the robot coordinate system.
7. The method for simulated learning social navigation based on the feature map fused with pedestrian information as claimed in claim 1, wherein in the step S24, a motion potential field is established according to pedestrian repulsion in the social force model, and then the difference between the motion states of pedestrians is marked by using equipotential lines, which comprises the following specific steps:
firstly, determining a demarcation equipotential line according to a preset acceptable range of pedestrian repulsion, and dividing the pedestrian comfort range on the local obstacle map obtained in step S21: marking an occupied area for each pedestrian detected in step S22, the size of the marked occupied area being positively related to the pedestrian's speed and therefore differing between individuals; then, coloring each pedestrian's occupied area on the local obstacle map according to that pedestrian's movement direction; and finally, obtaining a feature map fused with pedestrian dynamic information, comprehensively displaying the environment state in the robot coordinate system.
8. The method for simulated learning social navigation based on the feature map fused with pedestrian information as claimed in claim 1, wherein in said step S3, the teaching expert controls the mobile robot in Gazebo to move in the avoidance scene by using the gamepad through ROS communication architecture to simulate the pedestrian to reach the target point; and (3) storing the local obstacle map information obtained in the step (S24), the state information of the robot, the relative position of the target and the corresponding expert control information in the moving process of the robot, so as to obtain an expert teaching data set.
CN202110163401.9A 2021-02-05 2021-02-05 Simulated learning social navigation method based on feature map fused with pedestrian information Active CN112965081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110163401.9A CN112965081B (en) 2021-02-05 2021-02-05 Simulated learning social navigation method based on feature map fused with pedestrian information


Publications (2)

Publication Number Publication Date
CN112965081A CN112965081A (en) 2021-06-15
CN112965081B true CN112965081B (en) 2023-08-01

Family

ID=76274706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110163401.9A Active CN112965081B (en) 2021-02-05 2021-02-05 Simulated learning social navigation method based on feature map fused with pedestrian information

Country Status (1)

Country Link
CN (1) CN112965081B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113467462B (en) * 2021-07-14 2023-04-07 中国人民解放军国防科技大学 Pedestrian accompanying control method and device for robot, mobile robot and medium
CN113486871B (en) * 2021-09-07 2021-11-16 中国人民解放军国防科技大学 Unmanned vehicle local autonomous control method, device and equipment based on depth map
CN114296455B (en) * 2021-12-27 2023-11-10 东南大学 Mobile robot obstacle avoidance method based on pedestrian prediction
CN114529588B (en) * 2022-04-24 2022-07-26 中国电子科技集团公司第二十八研究所 Moving target polymerization method based on relative position
CN115129049B (en) * 2022-06-17 2023-03-28 广东工业大学 Mobile service robot path planning system and method with social awareness
CN115204221B (en) * 2022-06-28 2023-06-30 深圳市华屹医疗科技有限公司 Method, device and storage medium for detecting physiological parameters
CN115252992B (en) * 2022-07-28 2023-04-07 北京大学第三医院(北京大学第三临床医学院) Trachea cannula navigation system based on structured light stereoscopic vision
CN115131407B (en) * 2022-09-01 2022-11-22 湖南超能机器人技术有限公司 Robot target tracking method, device and equipment oriented to digital simulation environment
CN116703161B (en) * 2023-06-13 2024-05-28 湖南工商大学 Prediction method and device for man-machine co-fusion risk, terminal equipment and medium
CN118010009B (en) * 2024-04-10 2024-06-11 北京爱宾果科技有限公司 Multi-mode navigation system of educational robot in complex environment

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103558856A (en) * 2013-11-21 2014-02-05 东南大学 Service mobile robot navigation method in dynamic environment
CN107493400A (en) * 2016-06-13 2017-12-19 谷歌公司 Upgrading to human operator who
CN108255182A (en) * 2018-01-30 2018-07-06 上海交通大学 A kind of service robot pedestrian based on deeply study perceives barrier-avoiding method
JP2019036192A (en) * 2017-08-18 2019-03-07 東日本旅客鉄道株式会社 Mobile robot which simulates walking of pedestrian
CN109947119A (en) * 2019-04-23 2019-06-28 东北大学 A kind of autonomous system for tracking of mobile robot based on Multi-sensor Fusion and method
CN110032949A (en) * 2019-03-22 2019-07-19 北京理工大学 A kind of target detection and localization method based on lightweight convolutional neural networks
CN110244322A (en) * 2019-06-28 2019-09-17 东南大学 Pavement construction robot environment sensory perceptual system and method based on Multiple Source Sensor
CN110285813A (en) * 2019-07-01 2019-09-27 东南大学 A kind of man-machine co-melting navigation device of indoor mobile robot and method
CN110675431A (en) * 2019-10-08 2020-01-10 中国人民解放军军事科学院国防科技创新研究院 Three-dimensional multi-target tracking method fusing image and laser point cloud
CN111289002A (en) * 2019-09-24 2020-06-16 陈水弟 Robot path planning method and system
CN111367282A (en) * 2020-03-09 2020-07-03 山东大学 Robot navigation method and system based on multimode perception and reinforcement learning
CN111429515A (en) * 2020-03-19 2020-07-17 佛山市南海区广工大数控装备协同创新研究院 Learning method of robot obstacle avoidance behavior based on deep learning
WO2020164270A1 (en) * 2019-02-15 2020-08-20 平安科技(深圳)有限公司 Deep-learning-based pedestrian detection method, system and apparatus, and storage medium
CN111708042A (en) * 2020-05-09 2020-09-25 汕头大学 Robot method and system for pedestrian trajectory prediction and following
CN111754566A (en) * 2020-05-12 2020-10-09 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Robot scene positioning method and construction operation method
CN111752276A (en) * 2020-06-23 2020-10-09 深圳市优必选科技股份有限公司 Local path planning method and device, computer readable storage medium and robot
CN111781922A (en) * 2020-06-15 2020-10-16 中山大学 Multi-robot collaborative navigation method based on deep reinforcement learning and suitable for complex dynamic scene
CN111932588A (en) * 2020-08-07 2020-11-13 浙江大学 Tracking method of airborne unmanned aerial vehicle multi-target tracking system based on deep learning
CN111949032A (en) * 2020-08-18 2020-11-17 中国科学技术大学 3D obstacle avoidance navigation system and method based on reinforcement learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10768631B2 (en) * 2018-03-27 2020-09-08 Beijing Jingdong Shangke Information Technology Co., Ltd. Method and apparatus for controlling a mobile robot
EP3918590A4 (en) * 2019-01-30 2023-02-08 Perceptive Automata, Inc. Neural network based navigation of autonomous vehicles through traffic entities


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Humanoid Balancing Behavior Featured by Underactuated Foot Motion; Yuxiang Cui et al.; IEEE Transactions on Robotics; Vol. 33, No. 2; full text *
A control method for autonomous pedestrian avoidance by an indoor mobile robot; Yu Jiayuan; Zhang Lei; Zhang Kaibo; Journal of Chinese Computer Systems; Vol. 41, No. 8; full text *
Deep reinforcement learning robot path planning under depth-of-field constraints; Wang Ke; Bu Xiangjin; Li Ruifeng; Zhao Lijun; Journal of Huazhong University of Science and Technology (Natural Science Edition); Vol. 46, No. 12; full text *

Also Published As

Publication number Publication date
CN112965081A (en) 2021-06-15

Similar Documents

Publication Publication Date Title
CN112965081B (en) Simulated learning social navigation method based on feature map fused with pedestrian information
Zhu et al. Starnet: Pedestrian trajectory prediction using deep neural network in star topology
CN110285813B (en) Man-machine co-fusion navigation device and method for indoor mobile robot
CN110007675B (en) Vehicle automatic driving decision-making system based on driving situation map and training set preparation method based on unmanned aerial vehicle
CN106875424B (en) A kind of urban environment driving vehicle Activity recognition method based on machine vision
Rudenko et al. Joint long-term prediction of human motion using a planning-based social force approach
CN100573548C (en) The method and apparatus of tracking bimanual movements
Bera et al. Aggressive, Tense or Shy? Identifying Personality Traits from Crowd Videos.
CN114970321A (en) Scene flow digital twinning method and system based on dynamic trajectory flow
CN106780735A (en) A kind of semantic map constructing method, device and a kind of robot
CN102968643B (en) A kind of multi-modal emotion identification method based on the theory of Lie groups
Arif et al. Automated body parts estimation and detection using salient maps and Gaussian matrix model
US20230015773A1 (en) Crowd motion simulation method based on real crowd motion videos
CN112106060A (en) Control strategy determination method and system
KR20210108044A (en) Video analysis system for digital twin technology
CN110347035A (en) Method for autonomous tracking and device, electronic equipment, storage medium
Wang et al. End-to-end self-driving approach independent of irrelevant roadside objects with auto-encoder
Eiffert et al. Predicting responses to a robot's future motion using generative recurrent neural networks
CN116595871A (en) Vehicle track prediction modeling method and device based on dynamic space-time interaction diagram
Ingersoll Vision based multiple target tracking using recursive RANSAC
Zhang et al. Crowd evacuation simulation using hierarchical deep reinforcement learning
Walters et al. EVReflex: Dense time-to-impact prediction for event-based obstacle avoidance
Stein et al. Leader following: A study on classification and selection
Nguyen et al. Deep learning-based multiple objects detection and tracking system for socially aware mobile robot navigation framework
Hao et al. Adversarial safety-critical scenario generation using naturalistic human driving priors

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant