CN114964247A - Crowd sensing navigation method and system based on high-order graph convolution neural network - Google Patents

Crowd sensing navigation method and system based on high-order graph convolution neural network

Info

Publication number
CN114964247A
CN114964247A (application CN202210278766.0A)
Authority
CN
China
Prior art keywords
robot
order
pedestrian
pedestrians
neural network
Prior art date
Legal status
Pending
Application number
CN202210278766.0A
Other languages
Chinese (zh)
Inventor
周风余
侯林飞
薛秉鑫
Current Assignee
Shandong University
Original Assignee
Shandong University
Priority date
Application filed by Shandong University
Priority to CN202210278766.0A
Publication of CN114964247A

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C 21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention belongs to the technical field of crowd sensing navigation, and provides a crowd sensing navigation method and system based on a high-order graph convolutional neural network. The method obtains real-time observable state information of the robot and the pedestrians and constructs a directed graph of the relationship between the robot and the pedestrians, regarding the pedestrians within a preset distance of the robot as a first-order neighborhood and the other pedestrians as a second-order neighborhood; calculates, through a social attention mechanism, the attention values of the pedestrians in the first-order and second-order neighborhoods of the robot, and extracts the interaction features among all agents from the robot-pedestrian relationship directed graph based on the attention values and a high-order graph convolutional neural network, where the agents comprise the robot and the pedestrians; and searches for the optimal action strategy of the robot based on the extracted interaction features and a deep reinforcement learning network built on the Dueling DQN framework, so as to maximize the expected reward of the robot for reaching the target point.

Description

Crowd sensing navigation method and system based on high-order graph convolution neural network
Technical Field
The invention belongs to the technical field of crowd sensing navigation, and particularly relates to a crowd sensing navigation method and system based on a high-order graph convolution neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Graph neural networks are currently among the most powerful tools for modeling objects and the interactions among them. Crowd interaction produces non-Euclidean data, which graph neural networks handle much better than other networks, so crowd interaction can be modeled with a graph neural network. The graph convolutional network (GCN) is a variant of the graph neural network (GNN): when modeling crowd interaction, the pedestrians and the robot are defined as nodes of the graph, and the interaction relations among pedestrians and between pedestrians and the robot are represented by an adjacency matrix.
The spatial relationships among pedestrians and between pedestrians and robots are very important for a robot to predict future human behavior and to navigate. Although crowds have been modeled as graph structures in the past, the inventors found that only the simplest interaction prediction was performed and that people were not classified according to their spatial relationships, so the robot ignored much spatial information (such as distance and speed). As a result, an ordinary GCN models crowd relationships poorly under complex conditions, the common interaction relationships cannot be learned with an ordinary graph convolutional neural network, and the robot therefore cannot navigate accurately through the crowd.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a crowd sensing navigation method and system based on a high-order graph convolutional neural network, which can enable a robot to accurately navigate in crowds.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention provides a crowd sensing navigation method based on a high-order graph convolutional neural network, which comprises the following steps:
acquiring real-time observable state information of the robot and the pedestrians, constructing a relation directed graph between the robot and the pedestrians, regarding the pedestrians within a preset distance near the robot as a first-order neighborhood, and regarding other pedestrians as a second-order neighborhood;
calculating the attention value of a corresponding pedestrian in a first-order neighborhood and a second-order neighborhood related to the robot through a social attention mechanism, and extracting the interactive features among all agents from a robot and pedestrian relation directed graph based on the attention value and a high-order graph convolutional neural network; wherein the agent comprises a robot and a pedestrian;
and searching for the optimal action strategy of the robot based on the extracted interaction features among the agents and a deep reinforcement learning network built on the Dueling DQN framework, so as to maximize the expected reward of the robot for reaching the target point.
The second aspect of the present invention provides a crowd sensing navigation system based on a high-order graph convolutional neural network, which includes:
the social relation directed graph building module is used for obtaining real-time observable state information of the robot and the pedestrians, building a relation directed graph between the robot and the pedestrians, regarding the pedestrians within a preset distance near the robot as a first-order neighborhood, and regarding other pedestrians as a second-order neighborhood;
the interactive feature extraction module is used for calculating the attention value of a corresponding pedestrian in a first-order neighborhood and a second-order neighborhood related to the robot through a social attention mechanism, and extracting interactive features among all agents from a robot and pedestrian relation directed graph based on the attention value and a high-order graph convolutional neural network; wherein the agent comprises a robot and a pedestrian;
and the deep reinforcement learning module is used for searching for the optimal action strategy of the robot based on the extracted interaction features among the agents and a deep reinforcement learning network built on the Dueling DQN framework, so as to maximize the expected reward of the robot for reaching the target point.
A third aspect of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the crowd sensing navigation method based on a high-order graph convolutional neural network as described above.
A fourth aspect of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps in the crowd sensing navigation method based on the high-order graph convolutional neural network as described above.
Compared with the prior art, the invention has the beneficial effects that:
the method comprises the steps of dividing pedestrians into second-order neighborhoods, modeling each agent and corresponding first-order and second-order interaction by adopting a social attention mechanism to achieve the purpose of modeling crowd interaction, calculating the attention value of the corresponding pedestrian in the first-order neighborhoods and the second-order neighborhoods related to the robot through the social attention mechanism, and extracting interaction features among the agents from a robot and pedestrian relation directed graph based on the attention value and a high-order graph convolutional neural network; and finally, based on the target of the robot, the extracted interactive characteristics among the agents and the deep reinforcement learning network based on the Dueling DQN framework, searching for the optimal action strategy of the robot so as to ensure the expected reward maximization of the robot when the robot reaches the target point, improve the accuracy of the robot in navigating the crowd and avoid the collision between the robot and the pedestrian.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a flow chart of a crowd sensing navigation method based on a high-order graph convolution neural network according to an embodiment of the present invention;
FIG. 2 is a setting of first and second order neighborhoods for an embodiment of the present invention;
FIG. 3 is a network structure of a high-level graph convolutional neural network for extracting crowd information according to an embodiment of the present invention;
FIG. 4 is a layer-by-layer convolution case of an embodiment of the present invention;
FIG. 5(a) is a first extreme case in the navigation process of an embodiment of the present invention;
FIG. 5(b) is a second extreme case in the navigation process of an embodiment of the present invention;
FIG. 6 is a graph of success rate of various algorithms versus the number of people in the environment, in accordance with an embodiment of the present invention;
fig. 7 is a relationship between the minimum distance between the robot and the pedestrian and the number of pedestrians in the environment according to the embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of the stated features, steps, operations, devices, components and/or combinations thereof.
Example one
Referring to fig. 1, the embodiment provides a crowd sensing navigation method based on a high-order graph convolution neural network, which includes the following steps:
step 1: the method comprises the steps of obtaining real-time observable state information of the robot and pedestrians, constructing a directed graph of the relationship between the robot and the pedestrians, regarding the pedestrians in a preset distance near the robot as a first-order neighborhood, and regarding other pedestrians as a second-order neighborhood.
Specifically, the observable state information includes position, velocity and radius information.
The task of the robot is to traverse an area containing an unknown number of pedestrians without collision and reach a target point; this can be formulated as a sequential decision problem under a reinforcement learning framework. Assume there are n agents in the area in total (including the robot and the pedestrians). All agents are indexed by natural numbers, with 0 denoting the robot and i denoting the i-th pedestrian, and the states of all agents are described in a coordinate system centered on the initial position of the robot whose X axis is the line from the robot to the target point. To simplify the description of agent states, every agent is approximated as a circle of radius r. The state of an agent is divided into two parts, an observable state and a hidden state. At each time step, an agent can obtain its own complete state (observable plus unobservable) and the observable states of the other agents. The observable state of an agent consists of the position p = [p_x, p_y], the velocity v = [v_x, v_y] and the radius r; the unobservable state consists of the goal g = [g_x, g_y], the preferred speed v_pref and the heading angle θ. The input state of the network contains the complete state of the robot, denoted w_0, and the observable states of all pedestrians, the observable state of pedestrian i (i > 0) being denoted w_i. The complete input of the network is therefore:
s = [w_0, w_1, ..., w_n]
w_0 = [p_x, p_y, v_x, v_y, r, g_x, g_y, v_pref, θ]
w_i = [p_x, p_y, v_x, v_y, r]
the motion space of the robot comprises a stopping motion and 80 discrete motions, wherein the 80 discrete motions are composed of 5 speeds and 16 directions, and the 5 speeds are uniformly distributed in [0, v ] pref ]16 directions are uniformly distributed in [0,2 pi ]]At time t, the view of the robotThe state is observed as S t Assuming the velocity v of the successive person t Can issue command instruction a t Immediately thereafter, the equivalent is: v. of t =a t
A high-order GCN is adopted to model the crowd relations, and a social attention mechanism computes the attention value between each agent and its related first-order and second-order agents, so that crowd interaction is modeled and the motion trajectories of the crowd are predicted. The division into high-order neighborhoods is shown in fig. 2: taking distance as the main criterion, the 5 pedestrians closest to the robot form its first-order neighborhood, and the remaining pedestrians form its second-order neighborhood. Experiments show that dividing pedestrians into even higher-order groups yields only a slight performance gain at a higher computational cost; to balance performance and cost, this embodiment uses the second-order neighborhood scheme.
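A minimal sketch of this neighborhood division, using Euclidean distance as stated above:

```python
import numpy as np

def split_neighborhoods(robot_pos, pedestrian_pos, k_first: int = 5):
    """Split pedestrian indices into a first-order neighborhood (the k_first
    pedestrians closest to the robot) and a second-order neighborhood
    (all remaining pedestrians), using Euclidean distance."""
    robot_pos = np.asarray(robot_pos, dtype=float)
    peds = np.asarray(pedestrian_pos, dtype=float)        # shape (N, 2)
    dists = np.linalg.norm(peds - robot_pos, axis=1)
    order = np.argsort(dists)
    first_order = order[:k_first].tolist()
    second_order = order[k_first:].tolist()
    return first_order, second_order

# example: 8 pedestrians scattered around a robot at the origin
rng = np.random.default_rng(0)
fo, so = split_neighborhoods([0.0, 0.0], rng.uniform(-4, 4, size=(8, 2)))
```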
Step 2: calculating the attention value of a corresponding pedestrian in a first-order neighborhood and a second-order neighborhood related to the robot through a social attention mechanism, and extracting the interactive features among all agents from a robot and pedestrian relation directed graph based on the attention value and a high-order graph convolutional neural network; wherein the agent includes a robot and a pedestrian.
In this embodiment, a social attention mechanism is introduced into the graph convolution network to learn the importance of each pedestrian to the robot. Attention weights are computed for all agents, so that both the interaction between the robot and the pedestrians and the interaction among the pedestrians can be modeled by graph convolution.
Handling a varying and highly random number of people is a key problem in crowd navigation, and graph networks solve it elegantly. There are two main reasons for using a GCN (graph convolutional network) to model the interaction between the pedestrians and the robot. First, an interactive scene can be modeled as a graph: all agents in the scene (all pedestrians and the robot) are regarded as nodes, and GCNs have proved highly successful at feature extraction on graph-structured data. Second, scenes differ in the number of pedestrians, and a GCN handles this well, because only the number of nodes of the graph changes while the basic structure of the network is unaffected; a GCN-based network therefore generalizes well to scenes with different numbers of people.
The relationship between the robot and the pedestrians is modeled as a directed graph G_t = (V_t, E_t). Because the attention weight of a pedestrian with respect to the robot is not necessarily equal to that of the robot with respect to the pedestrian, a directed graph represents the interaction between the crowd and the robot better than an undirected graph. |V| = N + 1, where N is the number of pedestrians, which is assumed known in the simulated environment but varies dynamically. The edge e_ij represents the importance of node j to node i. Because the state dimensions of the pedestrians and the robot differ, an MLP network is first used to map the states of the different agents to hidden states of the same dimension, which are then fed into the subsequent network. The structure of the high-order graph convolutional neural network is shown in fig. 3.
The hidden states of the robot and the pedestrians can be expressed as:
h_0 = f_r(w_0; W_r)
h_i = f_h(w_i; W_h), i = 1, 2, 3, ..., N
where f_r and f_h are three-layer multilayer perceptron networks used to unify the state dimensions of the robot and the pedestrians, with network weights W_r and W_h respectively. A two-layer GCN is then used to extract the interaction features between agents; the number of layers is denoted L, and the input of the first layer is H^0 = [h_0, h_1, h_2, h_3, ..., h_{N-1}, h_N].
For the output of the different layers, the features of layer l are aggregated with the attention-weighted adjacency matrix in the standard graph-convolution form
H^{l+1} = σ(A_att H^l W^l)
where A_att collects the normalized attention weights α_ij defined below. First, the attention scores between agents are computed:
e_ij = LeakyReLU(a^T [W_i h_i ‖ W_j h_j])
α_ij = exp(e_ij) / Σ_{k ∈ N_1(i) ∪ N_2(i)} exp(e_ik)
wherein W_i and W_j are parameters to be learned in a fully connected neural network whose output dimension is 32, a is the attention function, ‖ denotes the concatenation operation, α_ij is the attention weight after the softmax and indicates how important node j is to node i, and N_1(i) and N_2(i) denote, respectively, the first-order and second-order neighborhoods of the robot divided according to the criterion above.
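A minimal sketch of this pairwise attention computation, assuming a 32-dimensional hidden state and PyTorch as the framework (layer sizes and bias choices are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SocialAttention(nn.Module):
    """Compute e_ij = LeakyReLU(a^T [W_i h_i || W_j h_j]) for every agent
    pair and normalize it with a softmax over j, yielding alpha[i, j],
    the importance of node j to node i."""

    def __init__(self, hidden_dim: int = 32):
        super().__init__()
        self.w_i = nn.Linear(hidden_dim, 32, bias=False)
        self.w_j = nn.Linear(hidden_dim, 32, bias=False)
        self.a = nn.Linear(2 * 32, 1, bias=False)   # attention function a

    def forward(self, h):                     # h: (n_agents, hidden_dim)
        n = h.size(0)
        hi = self.w_i(h).unsqueeze(1).expand(n, n, 32)
        hj = self.w_j(h).unsqueeze(0).expand(n, n, 32)
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        alpha = F.softmax(e, dim=1)           # alpha[i, j]
        return alpha
```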
Pedestrians within a short distance of the robot are regarded as its first-order neighborhood. Pedestrians in the second-order neighborhood do not directly influence the motion of the robot, but they influence the motion of the surrounding pedestrians and thereby the motion planning of the robot. Moreover, enlarging the range of the first-order neighborhood would introduce irrelevant pedestrians that have no direct influence on the path planning of the robot. Therefore only the 5 pedestrians nearest to the robot are taken as its first-order neighborhood and the other pedestrians are handled as the second-order neighborhood. The advantage is obvious: pedestrians in different neighborhoods can be processed with matrices of different parameters, so the different influences of the first-order and second-order neighborhoods on the agent can be taken into account.
The performance also depends on the number of GCN layers: a two-layer GCN improves greatly over a single-layer GCN, but networks with three or more layers improve little over the two-layer network while increasing the complexity considerably. To balance performance and computational cost, the number of GCN layers is set to 2 in this embodiment.
Therefore, the embodiment adopts a two-layer high-order graph convolutional neural network to extract the interactive features between the agents from the robot and pedestrian relationship directed graph.
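A sketch of such a two-layer extraction is given below; the use of separate weight matrices for the first-order and second-order neighborhoods and their fusion by summation are assumptions made for illustration, with A1 and A2 denoting the attention-weighted adjacency matrices of the two neighborhoods:

```python
import torch
import torch.nn as nn

class TwoLayerHighOrderGCN(nn.Module):
    """Two graph-convolution layers in which first-order and second-order
    neighbors are aggregated with separate weight matrices and summed."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.w1 = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])  # first-order weights
        self.w2 = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])  # second-order weights

    def forward(self, h, a1, a2):       # h: (n, dim), a1/a2: (n, n)
        for layer in range(2):
            h = torch.relu(a1 @ self.w1[layer](h) + a2 @ self.w2[layer](h))
        return h                        # interaction features of all agents
```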
Step 3: searching for the optimal action strategy of the robot based on the extracted interaction features among the agents and a deep reinforcement learning network built on the Dueling DQN framework, so as to maximize the expected reward of the robot for reaching the target point.
In this embodiment, the deep reinforcement learning network based on the Dueling DQN architecture is used to directly evaluate the action value (i.e. the Q value) of the agent and to select the action with the highest reward. The complete state of the robot and the observable states of the pedestrians are taken as the input. There are two main reasons for taking only processed agent-level states as input rather than complete sensor data: first, the amount of agent-level data is far smaller than that of sensor-level data, which greatly reduces the computational cost; second, agent-level state information helps the robot better understand the scene.
Suppose the robot performs an action at time t and the reward earned at time t+1 is r_t = R(s_t, a_t). The goal of the Dueling DQN network is to find the optimal policy π*: s_t → a_t that maximizes the expected reward of the robot for reaching the target point:
π*(s_t) = argmax_{a_t} Q*(s_t, a_t)
V*(s_t) = max_{a_t} Q*(s_t, a_t)
where Q*(s_t, a_t) is the optimal action-state function:
Q*(s_t, a_t) = Σ_{s′, r} P(s′, r | s_t, a_t) [ R(s_t, a_t, s′) + γ^{Δt · v_pref} V*(s′) ]
where s′ and R(s, a, s′) are the successor state and the immediate reward received at time t respectively, γ ∈ (0, 1) is the discount factor, V* is the optimal value function, v_pref and Δt are used as a normalization term, and P(s′, r | s, a) is the transition probability from time t to t + Δt, which depends on knowledge of the system dynamics and of the environment. In practice this transition probability is usually unknown to the agent. Existing methods assume that the transition probability function is known during training and model it with a simple linear model; this assumption greatly simplifies the navigation problem, since the agent can essentially solve it by searching the next state space. Here it is instead assumed that the agent does not know the transition probability, and a model-based approach is used to predict human motion, which is more realistic.
Because the reward is very sparse, training may fail to converge in complex scenes; to ensure convergence, this embodiment redesigns the reward function.
In the process of finding the optimal action strategy of the robot, the expression of the reward function R is as follows:
R = R_g + R_c + R_d + R_v
wherein R_g rewards the robot for continuously approaching the target point; its purpose is to guide the robot toward the target during motion and to avoid the robot waiting indefinitely;
R_c is the reward for a collision between the robot and a pedestrian; this reward is negative and can be understood as a penalty for a collision event;
R_d is the reward for the distance between the robot and a pedestrian being smaller than the comfortable distance; it is also negative and can be understood as a penalty for the robot violating social etiquette by getting too close to a pedestrian;
R_v penalizes the robot for approaching a pedestrian along the velocity direction; this penalty ensures that the robot does not get excessively close to pedestrians, guaranteeing their safety.
(The piecewise expressions of R_g, R_c, R_d and R_v are given as formulas in the original document; they are defined in terms of the quantities described below.)
The quantities involved are the target position of the robot and its coordinate position at time t, the safe distances d_s and d_r that the robot needs to keep from a person, the distance between the robot and the i-th person at time t, the angle between the line connecting the i-th pedestrian and the robot at time t and the velocity direction of the pedestrian, and the corresponding angle with respect to the velocity direction of the robot. The parameter values are: r_static = 0.2, m_v = 0.15, m_R = 0.3.
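The composite reward can be sketched as below. Since the piecewise expressions appear only as formulas in the original, the concrete forms and numerical constants in this sketch (other than r_static, m_v and m_R, whose values are given above, although their exact roles are also assumed) are illustrative assumptions that only mirror the qualitative description of R_g, R_c, R_d and R_v:

```python
def reward(dist_to_goal, prev_dist_to_goal, min_ped_dist, reached, collided,
           closing_speed, d_comf=0.2, m_v=0.15, m_r=0.3, r_static=0.2):
    """Illustrative composite reward R = R_g + R_c + R_d + R_v."""
    # R_g: approach reward; a constant per-step cost discourages waiting,
    # and a large bonus is granted when the target is reached
    r_g = 1.0 if reached else (prev_dist_to_goal - dist_to_goal) - r_static
    # R_c: collision penalty
    r_c = -0.25 if collided else 0.0
    # R_d: discomfort penalty when closer than the comfortable distance
    r_d = -0.1 * (d_comf - min_ped_dist) if min_ped_dist < d_comf else 0.0
    # R_v: penalty for closing in on a pedestrian along the velocity direction
    r_v = -m_v * closing_speed if closing_speed > m_r else 0.0
    return r_g + r_c + r_d + r_v
```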
In order to enhance the effect of the DQN (deep Q-network), this embodiment changes the original DQN network into a Dueling DQN network, as shown in fig. 4. In many cases the reward obtained is almost the same no matter what action the robot takes: as shown in fig. 5(a), when the robot is relatively far from all pedestrians, it can select any action with little consequence; but as shown in fig. 5(b), when the robot is very close to a pedestrian, the choice of action has a very large influence on the reward return of the robot. This embodiment therefore adopts the Dueling DQN network form.
The Dueling DQN network divides the Q function into two parts. The first part is the value function, which depends only on the current input state and is denoted V(H; α, β); the second part depends on both the input state and the action a and is called the advantage function, denoted D(H, a; α, η). The final Q function can therefore be expressed as:
Q(H, a; α, β, η) = V(H; α, β) + D(H, a; α, η)
Meanwhile, to prevent the situation in which only the advantage function is updated while the value function is not, the advantage function is centred by subtracting its mean over all actions, so the final formula becomes:
Q(H, a; α, β, η) = V(H; α, β) + ( D(H, a; α, η) − (1/|A|) Σ_{a′} D(H, a′; α, η) )
where |A| equals the number of all actions that the robot can take.
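A minimal sketch of the corresponding dueling head; the hidden-layer sizes are assumptions, and the 81 actions follow from the action space described earlier:

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Split the crowd feature H into a state-value stream V(H) and an
    advantage stream D(H, a); the advantage is centred by its mean over the
    |A| = 81 discrete actions before the two streams are summed into Q(H, a)."""

    def __init__(self, feat_dim: int = 32, n_actions: int = 81):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.advantage = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, h):                               # h: (batch, feat_dim)
        v = self.value(h)                               # (batch, 1)
        d = self.advantage(h)                           # (batch, n_actions)
        return v + d - d.mean(dim=1, keepdim=True)      # Q(H, a)
```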
To verify the effectiveness of the method, the algorithm is evaluated in both a simple and a complex setting. In the simple environment, 5 pedestrians are placed on a circle with a radius of 4 m and move in a circular-interaction mode: the target point of each pedestrian lies on the opposite side of its initial position, so the robot and the pedestrians interact near the center of the circle. Random perturbations are added to the initial and target positions, and when a pedestrian reaches its initial target point a new target point is selected on the circle, which increases the interaction complexity and the number of training samples. Under this setting almost all interactions occur near the center of the circle, where the obstacle-avoidance pressure on the robot is the largest.
The complex environment is set as follows: the number of pedestrians is 7-9 and their speeds are 0.7-1.8 m/s. The interaction mode combines circular interaction and square interaction; in square interaction the initial and target positions of a pedestrian both lie on a square with a side length of 10 m that is concentric with the circle. The number of pedestrians in circular interaction is set to 4-6, which increases the interaction complexity and avoids the situation in which the robot and the pedestrians interact only near the circle. As in the simple environment, when a pedestrian first reaches its target point it randomly selects a new point on the circle or on the square frame as the next target and continues moving.
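The two interaction setups can be sketched as follows; the magnitude of the random perturbation and the exact sampling of points on the square perimeter are assumptions:

```python
import numpy as np

def circle_crossing(n_peds=5, radius=4.0, noise=0.3, rng=np.random.default_rng(0)):
    """Circular interaction: each pedestrian starts on a circle of the given
    radius and heads for the diametrically opposite point, with a small
    random perturbation on both positions."""
    starts, goals = [], []
    for _ in range(n_peds):
        angle = rng.uniform(0, 2 * np.pi)
        start = radius * np.array([np.cos(angle), np.sin(angle)]) + rng.normal(0, noise, 2)
        goal = -start + rng.normal(0, noise, 2)   # opposite side of the circle
        starts.append(start)
        goals.append(goal)
    return starts, goals

def square_crossing(n_peds=3, side=10.0, rng=np.random.default_rng(0)):
    """Square interaction: start and goal are sampled on the perimeter of a
    square (side length 10 m) concentric with the circle."""
    def point_on_square():
        t = rng.uniform(0, 4)
        edge, frac = int(t), t - int(t)
        half = side / 2
        return {0: np.array([-half + side * frac, -half]),
                1: np.array([half, -half + side * frac]),
                2: np.array([half - side * frac, half]),
                3: np.array([-half, half - side * frac])}[edge]
    return ([point_on_square() for _ in range(n_peds)],
            [point_on_square() for _ in range(n_peds)])
```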
It should be noted that in the experiments of this embodiment the robot is invisible to the pedestrians, in both the simple and the complex setting. If the robot were visible, pedestrians would actively dodge it, which lowers the requirement on the robot's active obstacle-avoidance ability; it was found that even an early version of the algorithm achieves very good results when pedestrians avoid the robot actively. The visible case is therefore ignored in this experimental design, and both simulations are set so that the robot is invisible to the pedestrians. As a result, pedestrians never actively avoid the robot, and the obstacle-avoidance strategy of the robot during path planning must be more far-sighted.
The experimental results are as follows:
table 1 experimental results in a simple scenario
(The contents of Table 1 are provided as an image in the original document and are not reproduced here.)
Table 2 experimental results in complex scenarios
(The contents of Table 2 are provided as an image in the original document and are not reproduced here.)
To assess the advancement of the algorithm, three existing state-of-the-art methods, GAT-RGL, CEM-RGL and CAWR, are used as baselines. These three algorithms are chosen mainly because they all perform robot navigation in a crowd using a graph neural network.
Three existing graph-neural-network methods, GAT-RGL, CEM-RGL and CAWR, are compared as baselines. To test the performance of the reward function, the reward-function setting of CAWR is used in the former algorithms, and the reward function of the SA-GCN network is then replaced by the reward function described above as a further comparison, so as to verify separately the superiority of the network and of the reward function proposed in this embodiment.
All algorithms are tested 10 times in the same environment; each test evaluates the model on 1000 random test cases, and the random seeds used are 2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52 and 57. During the algorithm comparison all RL algorithms use the same reward function. To verify the effectiveness of the newly designed reward function, the reward function of SA-GCN is replaced by the new setting to form the algorithm SA-GCN-AZ, and the two reward-function settings are compared. Both SA-GCN and SA-GCN-AZ are tested with the proposed high-order GCN network and the Dueling DQN network. The test results show that all of the listed baselines achieve good results in the simple-interaction case; in terms of success rate, running time and minimum distance to the crowd, the two proposed algorithms improve markedly over the conventional graph convolution algorithms. The method with the original reward function can somewhat reduce the time the robot needs to reach the target point, but it increases the frequency with which the robot enters the uncomfortable zone around pedestrians during operation, which is unacceptable. In summary, the proposed algorithm improves considerably over previous algorithms in success rate, running time and minimum distance.
The quantitative evaluation indices are: success rate, the probability that the robot successfully reaches the target point; collision rate, the probability that the robot collides with a pedestrian during navigation; time, the time taken by the robot to reach the target point; danger rate, the proportion of steps in which the distance between the robot and a pedestrian is smaller than the safe distance; and distance, the minimum distance between the robot and the pedestrians during operation.
During testing the robot needs to anticipate the behavior of the pedestrians, keep a certain distance from them and continuously approach the target in order to obtain a high reward. The discomfort frequency is defined as t_disc / T, where t_disc is the time during which the distance between the robot and a pedestrian satisfies d_t < 0.2 m. In the simple environment all algorithms perform well: the success rates of SA-GCN and SA-GCN-AZ both reach 0.99 or above, and apart from a slight increase in discomfort frequency their navigation time, minimum distance to pedestrians and other indices are superior to those of the other algorithms.
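A sketch of computing these indices from per-episode logs; the log structure used here is an assumption:

```python
def navigation_metrics(episodes, d_disc=0.2):
    """Compute success rate, collision rate, mean navigation time of
    successful runs, discomfort (danger) rate t_disc / T and the minimum
    robot-pedestrian distance.  Each episode is assumed to be a dict with
    'outcome', 'time' and a list of per-step minimum distances 'min_dists'."""
    n = len(episodes)
    n_success = sum(e['outcome'] == 'success' for e in episodes)
    success = n_success / n
    collision = sum(e['outcome'] == 'collision' for e in episodes) / n
    avg_time = (sum(e['time'] for e in episodes if e['outcome'] == 'success')
                / max(1, n_success))
    steps = [d for e in episodes for d in e['min_dists']]
    danger_rate = sum(d < d_disc for d in steps) / len(steps)
    return {'success': success, 'collision': collision, 'time': avg_time,
            'danger': danger_rate, 'min_dist': min(steps)}
```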
As shown in fig. 6, the advantages of the algorithm are larger in the complex case than in the simple case: the success rate of SA-GCN-AZ drops the least from the simple to the complex setting, and the SA-GCN and SA-GCN-AZ algorithms, which use the second-order neighborhood, perform best in the complex case. Because the second-order neighborhood is used, the interactions among people are modeled better and the robot can make more informed judgments, achieving a better navigation effect.
To compare the performance of the algorithms more intuitively, an additional experiment is set up in which the number of pedestrians in the environment is gradually increased from 3 up to a maximum of 10, and the different algorithms are tested in the same environment. All algorithms are run 10 times with the same random seeds and the results are averaged. The evaluation indices are the success rate of the robot reaching the target, shown in fig. 6, and the minimum distance between the robot and the pedestrians during navigation, shown in fig. 7. When the number of pedestrians grows beyond 7, the success rate and the minimum distance of the three baseline algorithms CEM-RGL, GAT-RGL and CAWR drop rapidly, whereas the SA-GCN and SA-GCN-AZ algorithms, which use the high-order attention mechanism, degrade much less. SA-GCN-AZ, which uses the newly designed reward function, performs best and maintains a success rate above 0.98 in an environment with 10 pedestrians.
Example two
The embodiment provides a crowd sensing navigation system based on a high-order graph convolution neural network, which comprises the following modules:
the social relation directed graph building module is used for obtaining real-time observable state information of the robot and the pedestrians, building a relation directed graph between the robot and the pedestrians, regarding the pedestrians within a preset distance near the robot as a first-order neighborhood, and regarding other pedestrians as a second-order neighborhood;
the interactive feature extraction module is used for calculating the attention value of a corresponding pedestrian in a first-order neighborhood and a second-order neighborhood related to the robot through a social attention mechanism, and extracting interactive features among all agents from a robot and pedestrian relation directed graph based on the attention value and a high-order graph convolutional neural network; wherein the agent comprises a robot and a pedestrian;
and the deep reinforcement learning module is used for searching for the optimal action strategy of the robot based on the extracted interaction features among the agents and a deep reinforcement learning network built on the Dueling DQN framework, so as to maximize the expected reward of the robot for reaching the target point.
It should be noted that, each module in the present embodiment corresponds to each step in the first embodiment one to one, and the specific implementation process is the same, which is not described herein again.
Example three
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the crowd sensing navigation method based on the high-order graph convolutional neural network as described above.
Example four
The embodiment provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to implement the steps in the crowd sensing navigation method based on the high-order graph convolutional neural network as described above.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A crowd sensing navigation method based on a high-order graph convolution neural network is characterized by comprising the following steps:
acquiring real-time observable state information of the robot and the pedestrians, constructing a relation directed graph between the robot and the pedestrians, regarding the pedestrians within a preset distance near the robot as a first-order neighborhood, and regarding other pedestrians as a second-order neighborhood;
calculating the attention value of a corresponding pedestrian in a first-order neighborhood and a second-order neighborhood related to the robot through a social attention mechanism, and extracting the interactive features among all agents from a robot and pedestrian relation directed graph based on the attention value and a high-order graph convolutional neural network; wherein the agent comprises a robot and a pedestrian;
and searching for the optimal action strategy of the robot based on the extracted interaction features among the agents and a deep reinforcement learning network built on the Dueling DQN framework, so as to maximize the expected reward of the robot for reaching the target point.
2. The method of claim 1, wherein the observable state information comprises position, velocity and radius information.
3. The crowd sensing navigation method based on the high-order graph convolution neural network as claimed in claim 1, wherein in the process of finding the optimal action strategy of the robot, the expression of the reward function R is as follows:
R = R_g + R_c + R_d + R_v
wherein R_g is a reward for the robot to continuously approach the target point; R_c is a reward for a collision between the robot and the pedestrian; R_d is a reward for the distance between the robot and the pedestrian being less than the comfortable distance; R_v is a penalty for the robot to approach the pedestrian in the speed direction.
4. The crowd sensing navigation method based on the high-order graph convolutional neural network as claimed in claim 1, wherein two layers of high-order graph convolutional neural networks are adopted to extract the interactive features between the agents from the robot and pedestrian relationship directed graph.
5. A crowd-sensing navigation system based on a high-order graph convolution neural network is characterized by comprising:
the social relation directed graph building module is used for obtaining real-time observable state information of the robot and the pedestrians and building a relation directed graph between the robot and the pedestrians, wherein the pedestrians within a preset distance near the robot are regarded as a first-order neighborhood, and other pedestrians are regarded as a second-order neighborhood;
the interactive feature extraction module is used for calculating the attention value of a corresponding pedestrian in a first-order neighborhood and a second-order neighborhood related to the robot through a social attention mechanism, and extracting interactive features among all agents from a robot and pedestrian relation directed graph based on the attention value and a high-order graph convolutional neural network; wherein the agent comprises a robot and a pedestrian;
and the deep reinforcement learning module is used for searching the optimal action strategy of the robot based on the target of the robot, the extracted interactive characteristics among the agents and the deep reinforcement learning network based on the Dueling DQN framework so as to ensure that the expected reward of the robot is maximized when the robot reaches the target point.
6. The high-order graph convolutional neural network-based crowd-sensing navigation system of claim 5, wherein in the social relationship directed graph building module, the observable state information comprises position, velocity and radius information.
7. The crowd-sensing navigation system based on the high-order graph convolution neural network as claimed in claim 5, wherein in the deep reinforcement learning module, in the process of finding the optimal action strategy of the robot, the expression of the reward function R is as follows:
R = R_g + R_c + R_d + R_v
wherein R_g is a reward for the robot to continuously approach the target point; R_c is a reward for a collision between the robot and the pedestrian; R_d is a reward for the distance between the robot and the pedestrian being less than the comfortable distance; R_v is a penalty for the robot to approach the pedestrian in the speed direction.
8. The crowd-sensing navigation system based on the high-order graph convolutional neural network of claim 5, wherein in the interactive feature extraction module, two layers of high-order graph convolutional neural networks are adopted to extract interactive features between agents from a robot and pedestrian relationship directed graph.
9. A computer readable storage medium, on which a computer program is stored, which when executed by a processor performs the steps of the method for crowd-sensing navigation based on higher order graph convolution neural networks according to any of claims 1-4.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps in the crowd-sensing navigation method based on the high-order graph convolution neural network of any one of claims 1-4.
CN202210278766.0A 2022-03-21 2022-03-21 Crowd sensing navigation method and system based on high-order graph convolution neural network Pending CN114964247A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210278766.0A CN114964247A (en) 2022-03-21 2022-03-21 Crowd sensing navigation method and system based on high-order graph convolution neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210278766.0A CN114964247A (en) 2022-03-21 2022-03-21 Crowd sensing navigation method and system based on high-order graph convolution neural network

Publications (1)

Publication Number Publication Date
CN114964247A true CN114964247A (en) 2022-08-30

Family

ID=82972169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210278766.0A Pending CN114964247A (en) 2022-03-21 2022-03-21 Crowd sensing navigation method and system based on high-order graph convolution neural network

Country Status (1)

Country Link
CN (1) CN114964247A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117191046A (en) * 2023-11-03 2023-12-08 齐鲁工业大学(山东省科学院) Crowd navigation method and system based on deep reinforcement learning and graph neural network
CN117191046B (en) * 2023-11-03 2024-01-26 齐鲁工业大学(山东省科学院) Crowd navigation method and system based on deep reinforcement learning and graph neural network

Similar Documents

Publication Publication Date Title
Jesus et al. Deep deterministic policy gradient for navigation of mobile robots in simulated environments
Cao et al. Target search control of AUV in underwater environment with deep reinforcement learning
JP7448683B2 (en) Learning options for action selection using meta-gradient in multi-task reinforcement learning
EP2363251A1 (en) Robot with Behavioral Sequences on the basis of learned Petri Net Representations
Jeffril et al. The integration of fuzzy logic and artificial neural network methods for mobile robot obstacle avoidance in a static environment
CN114964247A (en) Crowd sensing navigation method and system based on high-order graph convolution neural network
Wu et al. Vision-language navigation: a survey and taxonomy
CN109740192B (en) Crowd evacuation simulation method and system based on Arnold emotion model
De Jesus et al. Deep deterministic policy gradient for navigation of mobile robots
CN114485673A (en) Service robot crowd perception navigation method and system based on deep reinforcement learning
CN113515131A (en) Mobile robot obstacle avoidance method and system based on condition variation automatic encoder
Oh et al. Scan: Socially-aware navigation using monte carlo tree search
CN117518907A (en) Control method, device, equipment and storage medium of intelligent agent
CN116562332A (en) Robot social movement planning method in man-machine co-fusion environment
CN114326826B (en) Multi-unmanned aerial vehicle formation transformation method and system
CN116202526A (en) Crowd navigation method combining double convolution network and cyclic neural network under limited view field
Keong et al. Reinforcement learning for autonomous aircraft avoidance
CN115730630A (en) Control method and device of intelligent agent, electronic equipment and storage medium
CN111723941B (en) Rule generation method and device, electronic equipment and storage medium
Guo et al. Object goal visual navigation using Semantic Spatial Relationships
CN117191046B (en) Crowd navigation method and system based on deep reinforcement learning and graph neural network
Godoy et al. Online learning for multi-agent local navigation
Li et al. Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation
Roth et al. MSVIPER
US20240160888A1 (en) Realistic, controllable agent simulation using guided trajectories and diffusion models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination