CN115268467B

CN115268467B - Navigation control system and control method of luggage van

Info

Publication number: CN115268467B
Application number: CN202211170194.0A
Authority: CN
Inventors: 马列; 马海兵; 沈亮; 马琼
Original assignee: Jiangsu Tianyi Aviation Industry Co Ltd
Current assignee: Jiangsu Tianyi Aviation Industry Co Ltd
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2023-01-10
Anticipated expiration: 2042-09-26
Also published as: CN115268467A

Abstract

The invention discloses a navigation control system and a control method of a luggage van, wherein the system comprises a man-machine interaction unit, a sensing unit, a path planning unit and a control unit; the path planning unit carries out path planning according to the obtained target position information, the environment information and the state information of the luggage van, and when the luggage van cannot detect the dynamic barrier, a global path plan is generated; when a dynamic obstacle is detected; generating a local path plan according to a machine learning algorithm; and sending the path plan to a control unit and a man-machine interaction unit; the control unit controls the luggage van to carry luggage according to the path planning result, and a user can check the path planning information through the man-machine interaction unit and move to a target position along with the luggage van, so that automatic obstacle avoidance control of the luggage van is realized. The invention can keep a constant distance between the luggage van and the mobile terminal carried by the user, generate a smooth, safe and efficient path track and realize effective dynamic obstacle avoidance.

Description

Navigation control system and control method of luggage van

Technical Field

The invention belongs to the technical field of automation, and particularly relates to a navigation control system and a control method for a luggage van.

Background

The luggage van can help people to carry various luggage articles in a trip or work, reduces the carrying burden of passengers, and frees both hands. However, the existing luggage barrow mostly needs to be pushed by hand, so that more manpower is needed to be paid out, and people are difficult to do other things at the same time because both hands are occupied when the barrow goes forward. The automatic luggage van also has the defects of poor ability of following the user, obstacle avoidance and path planning, easy falling into local optimum, and the conditions of collision or detour and the like.

Disclosure of Invention

The invention aims to provide a navigation control system and a control method of a luggage van. Can realize that the luggage van follows automatically, keeps away the barrier automatically.

The technical scheme of the invention is as follows: a navigation control system for a luggage cart, the system comprising: the system comprises a human-computer interaction unit, a sensing unit, a path planning unit and a control unit;

the man-machine interaction unit comprises a display screen and a voice input module, wherein the display screen is used for displaying an environment map and path planning information; the voice input module is used for voice interaction with a user, so that the starting and stopping of the luggage van are realized, and a voice consultation function is realized;

the sensing unit comprises a laser radar, a vision sensor and a vehicle-mounted state sensor, and obtains environment information, obstacle information and state information of the luggage van around the luggage van through the laser radar, the vision sensor and the vehicle-mounted state sensor, and constructs an environment map;

the path planning unit is used for planning paths according to the obtained environment information around the luggage van, the obtained obstacle information and the obtained state information of the luggage van, the path planning comprises global path planning and local path planning, and when the luggage van cannot detect the dynamic obstacle, the luggage van drives according to the global path planning; when the dynamic barrier is detected, planning to drive according to the local path; the local path planning comprises the steps of predicting dynamic obstacles according to an environment map and detected dynamic obstacle information, generating a sampling space moving to a plurality of track states in a state space based on the environment map, an obstacle prediction result, a current point and positions of a target point, and generating a plurality of control actions corresponding to the plurality of track states; obtaining an expected reward of each control action in the plurality of control actions based on a machine learning algorithm, scoring the path through an evaluation function, and obtaining a track with the highest score as a local optimal path; and sending the path plan to a control unit and a man-machine interaction module;

the control unit controls the luggage van to transport luggage according to the path planning result, and a user can check path planning information through the display screen and move to a target position along with the luggage van, so that automatic obstacle avoidance control of the luggage van is achieved.

Further, the machine learning algorithm comprises an Actor network and a Critic network, wherein the Actor network is used for determining a control action corresponding to a path state to form a new motion state; the Critic network is used for determining the reward of the control action based on the given path state; the Actor network observes the state according to the current particlesAnd an objectgSelecting an appropriate control actionaObtaining an expected reward by calculating a reward functionrThen, the state is fromsIs transferred tos′Will bes，g，a，r，s′Combined into one tuple X =: (s, g,a,r,s′) And store it in the experience playback pool; the expected reward for each action is accumulated to calculate a merit function,

wherein E is the mathematical expectation,

as a cost factor; iterating according to a Bellman equation until strategy parameters are converged to be optimal; the bellman equation is described as follows:

，

for the observed state of the luggage van at time t,

in state for control strategy

A reward is issued;

is the state transition probability;

to make a state

The strategy that gets the highest reward.

Further, the system further comprises: the sensing unit senses the obstacle information in the environment and controls the speed of the luggage van according to the obstacle information and the distance between the luggage van and the terminal equipment carried by the user.

The invention provides a navigation control method of a luggage van, which is characterized by comprising the following steps:

step 1: obtaining environmental information, obstacle information and state information of the luggage van around the luggage van through a laser radar, a vision sensor and a vehicle-mounted state sensor in a sensing unit, and constructing an environmental map;

step 2: acquiring a target position set by a human-computer interaction unit; the path planning unit plans a path according to the obtained target position information, the environment information, the obstacle information and the state information of the luggage van, wherein the path planning comprises global path planning and local path planning, and when the luggage van cannot detect the dynamic obstacle, the luggage van drives according to the global path planning; when the dynamic barrier is detected, planning to drive according to the local path; the local path planning comprises the steps of predicting a dynamic barrier according to an environment map and detected dynamic barrier information, generating a sampling space moving to a plurality of track states in a state space based on the environment map, a barrier prediction result, a current point and the position of a target point, and generating a plurality of control actions corresponding to the plurality of track states; obtaining an expected reward of each control action in the plurality of control actions based on a machine learning algorithm, scoring the path through an evaluation function, and obtaining a track with the highest score as a local optimal path; and sending the path plan to the control unit and the man-machine interaction module;

and step 3: the control unit controls the luggage van to carry luggage according to the path planning result, and a user can check path planning information through a display screen of the man-machine interaction module and move to a target position along with the luggage van, so that automatic obstacle avoidance control of the luggage van is realized.

Further, the machine learning algorithm comprises an Actor network and a Critic network, wherein the Actor network is used for determining a control action corresponding to a path state to form a new motion state; the Critic network is used for determining the reward of the control action based on the given path state; the Actor network observes the state according to the current particlesAnd an objectgSelecting an appropriate control actionaObtaining an expected reward by calculating a reward functionrThen, the state is fromsIs transferred tos′Will bes，g，a，r，s′The combination is one tuple X =: (s, g,a,r,s′) And store it in the experience playback pool; the expected reward for each action is accumulated to calculate a merit function,

wherein E is the mathematical expectation,

is a cost factor; iterating according to a Bellman equation until strategy parameters are converged to be optimal; the bellman equation is described as follows:

，

for the observed state of the luggage van at time t,

is in state for control strategy

A reward is issued;

is the state transition probability;

to make a state of

The strategy that gets the highest reward.

Preferably, the method comprises: the sensing unit senses the obstacle information in the environment and controls the speed of the luggage van according to the obstacle information and the distance between the luggage van and the terminal equipment carried by the user.

Preferably, the global path plan is generated based on an improved group optimization algorithm, and the specific steps include: step 2.1, rasterizing the map; each grid is in a running state or an obstacle state; initializing a particle population, wherein the particle population comprises a population scale, an initial position, an initial speed and iteration times; generating a plurality of groups of initial path point sets, namely a plurality of particles, according to the starting point and the target point; step 2.2, calculating the particle fitness by using a fitness function; step 2.3, updating the position and the speed of the particles; step 2.4, obtaining an individual optimal value and a global optimal value according to the fitness function; step 2.5 repeating steps 2.2 to 2.4 until the maximum number of iterations is reached; step 2.6, outputting a global optimal solution; step 2.7, taking the output optimal solution as a path point; the path point interpolation is processed using a cubic spline to generate a smooth path.

Further, the fitness function is:

(ii) a WhereinX _k ={

,

,…,

}，X _k Is as followskParticles, each particle being a path,

is as followskA first particle ofiRoute point，

The constant number is a constant number,

，

in order to be a function of avoiding obstacles,

(ii) a Wherein

The distance of the neighboring waypoint vector to the center of the obstacle,

is shown asjThe radius of the individual obstacles is,

a swelling factor that is an obstacle;

as a function of the distance of the path,

wherein

As the distance from the starting point to the end point,

is the path length;

as a function of the smoothness of the path,

，

，i=1,2,…,nwherein

For angular variation of adjacent paths, in the range 0, pi]；

In order to be a function of the energy consumption,

；

；

(ii) a Wherein the content of the first and second substances,

in order to be a point of the path,

respectively, the coordinates of the path points are represented,

is the terrain height of the waypoint;

>1>

。

the luggage van navigation control system and the control method provided by the invention have the following beneficial effects:

1. the luggage van can maintain a constant distance from the mobile terminal carried by the user. 2. The global optimal path is obtained by adopting an improved group optimization algorithm, the fitness function of the particle swarm optimization algorithm is improved, and the global path which is safer, smoother and more efficient is generated by setting the fitness function and considering the energy consumption, obstacle avoidance capability, path smoothness, path length and path blocking degree of the path. 3. The local path planning adopts an improved machine learning algorithm to realize the obstacle avoidance of the dynamic obstacle and the local path planning, and the state potential energy is introduced by setting a reward and punishment function to realize effective dynamic obstacle avoidance.

Drawings

FIG. 1 is a block diagram of the system of the present invention;

FIG. 2 is a flow chart of the method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.

The first embodiment: as shown in fig. 1, the present embodiment provides a navigation control system for a luggage cart, which is characterized in that the system includes: the system comprises a human-computer interaction unit, a sensing unit, a path planning unit and a control unit;

the path planning unit is used for planning a path according to the obtained environmental information around the luggage van, the obtained obstacle information and the obtained state information of the luggage van, wherein the path planning comprises global path planning and local path planning, and when the luggage van cannot detect a dynamic obstacle, the luggage van is driven according to the global path planning; when the dynamic barrier is detected, planning to drive according to the local path; the local path planning comprises the steps of predicting dynamic obstacles according to an environment map and detected dynamic obstacle information, generating a sampling space moving to a plurality of track states in a state space based on the environment map, an obstacle prediction result, a current point and positions of a target point, and generating a plurality of control actions corresponding to the plurality of track states; obtaining an expected reward of each control action in the plurality of control actions based on a machine learning algorithm, scoring the path through an evaluation function, and obtaining a track with the highest score as a local optimal path; and sending the path plan to the control unit and the man-machine interaction module;

Preferably, to avoid the algorithm falling into local optimality, a reward of 0.8 times the last state is added to the current reward, resulting in a reward:

；

；

；

；

wherein the content of the first and second substances,

indicating a state when the luggage van reaches the terminal;

indicating the luggage carrier and the obstacleA state in which the distance of the object is less than a set threshold value;

representing that the state potential value of the current moment is less than the state potential value of the last moment;

the potential energy value state which can be larger than the previous moment state at the current moment state is represented;

indicating other states;

representing a distance to the target;

is a penalty factor;

a prize value for the current time;

is the observed state of the vehicle at time t;

is in a state

The control strategy of (2);

the potential energy value of the state at the current moment represents the potential energy between the current moment and the target point; the potential energy value of the current moment state is smaller than that of the previous moment state, so that the luggage van arrives at the position close to the target from the position far away from the target, and a reward value is added to the current point; the potential energy value which can be more than the previous time state in the current time state indicates that the luggage van reaches the position far away from the target from the position close to the target, the reward value is reduced for the current point,and state potential energy is introduced, when the luggage van approaches a target point, a certain reward can be obtained, otherwise, the reward is reduced, and therefore a better convergence effect is achieved.

Preferably, the machine learning algorithm comprises an Actor network and a Critic network, wherein the Actor network is used for determining a control action corresponding to a path state to form a new motion state; the Critic network is used for determining the reward of the control action based on the given path state; the Actor network observes the state according to the current particlesAnd an objectgSelecting an appropriate control actionaObtaining an expected reward by calculating a reward functionrThen, the state is fromsIs transferred tos′Will bes，g，a，r，s′Combined into one tuple X =: (s,g, a,r,s′) And store it in the experience playback pool; the expected reward for each action is accumulated to calculate a merit function,

wherein E is the mathematical expectation,

，

for the observed state of the luggage van at time t,

is in state for control strategy

A reward is issued;

is the state transition probability;

to make a state

The strategy that gets the highest reward.

Preferably, the system comprises: the sensing unit senses the obstacle information in the environment and controls the speed of the luggage van according to the obstacle information and the distance between the luggage van and the terminal equipment carried by the user.

The second embodiment: as shown in fig. 2, an embodiment of the present invention provides an automatic navigation control method for a baggage car, including the following steps:

and 3, step 3: the control unit controls the luggage van to carry luggage according to the path planning result, and a user can check path planning information through a display screen of the man-machine interaction module and move to a target position along with the luggage van, so that automatic obstacle avoidance control of the luggage van is realized.

Preferably, the global path planning in step 2 is generated based on an improved group optimization algorithm, and the specific steps include: step 2.1, rasterizing the map; each grid is in a running state or an obstacle state; initializing a particle population, wherein the particle population comprises a population scale, an initial position, an initial speed and iteration times Maxn; from the starting and target points, sets of initial path points, i.e., particles, are generatedX _k ，k=1,2, \ 8230, maxn, one path per particle comprisingnA path point; step 2.2, calculating the particle fitness by using a fitness function; step 2.3, updating the position and the speed of the particles; step 2.4, obtaining an individual optimal value and a global optimal value according to the fitness function; step 2.5 repeating steps 2.2 to 2.4 until the maximum number of iterations is reached; step 2.6, outputting a global optimal solution; step 2.7, taking the output optimal solution as a path point; the path point interpolation is processed using cubic splines to generate a smooth path.

Further, the fitness function is:

(ii) a WhereinX _k ={

,

,…,

}，X _k Is as followskParticles, each particle being a path representing a set of discrete path points from a start point to an end point,

is as followskA first particle ofiThe point of the path is the point of the path,

the constant number is a constant number,

，

in order to be a function of avoiding obstacles,

(ii) a Wherein

The distance of the neighboring waypoint vector to the center of the obstacle,

，

denotes the firstjThe radius of the individual obstacles is,

a swelling factor that is an obstacle; when in use

When the number is 1, the path is safe, and the obstacle can be avoided; when in use

A value of 0 indicates the presence of an obstacle.

As a function of the distance of the path,

wherein

As is the distance from the starting point to the end point,

is the path length; the shorter the length of the path is the shorter,

the larger.

As a function of the smoothness of the path,

，

，i=1,2,…,nwherein

，

，

For angular variation of adjacent paths, in the range [0, π]The smaller the angular variation of the adjacent paths,

the larger the value of (c), the smoother the path.

In order to be a function of the energy consumption,

；

；

(ii) a Wherein, the first and the second end of the pipe are connected with each other,

in order to be a point of the path,

respectively, the coordinates of the path points are represented,

is the terrain height of the waypoint;

>1>

(ii) a Wherein

Is the energy consumption coefficient when climbing an uphill,

the energy consumption coefficient when the slope is downhill; coefficient of through energy consumption

Distinguishing different energy consumption of the luggage van when going up a slope and when going down the slope; when the luggage van goes up a slope, the energy consumption is increased, and when the luggage van goes down a slope, the energy consumption is reduced.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims

1. A navigation control system for a luggage cart, the system comprising: the system comprises a human-computer interaction unit, a sensing unit, a path planning unit and a control unit;

the sensing unit comprises a laser radar, a vision sensor and a vehicle-mounted state sensor, and is used for acquiring environmental information, obstacle information and state information of the luggage van around the luggage van through the laser radar, the vision sensor and the vehicle-mounted state sensor and constructing an environmental map;

the path planning unit is used for planning a path according to the obtained environmental information around the luggage van, the obtained obstacle information and the obtained state information of the luggage van, wherein the path planning comprises global path planning and local path planning, and when the luggage van cannot detect a dynamic obstacle, the luggage van is driven according to the global path planning; when the dynamic barrier is detected, planning to drive according to the local path; the local path planning comprises the steps of predicting a dynamic barrier according to an environment map and detected dynamic barrier information, generating a sampling space moving to a plurality of track states in a state space based on the environment map, a barrier prediction result, a current point and the position of a target point, and generating a plurality of control actions corresponding to the plurality of track states; obtaining an expected reward of each control action in the plurality of control actions based on a machine learning algorithm, scoring the path through an evaluation function, and obtaining a track with the highest score as a local optimal path; and sending the path plan to a control unit and a man-machine interaction unit;

the control unit controls the luggage van to carry luggage according to the path planning result, and a user can check path planning information through the display screen and move to a target position along with the luggage van, so that automatic obstacle avoidance control of the luggage van is realized;

the path planning unit generates a global path plan based on an improved group optimization algorithm, and the specific steps include: 1) Rasterizing a map; each grid is in a drivable state or an obstacle state; initializing a particle population, wherein the particle population comprises a population scale, an initial position, an initial speed and iteration times; generating a plurality of groups of initial path point sets, namely a plurality of particles, according to the starting point and the target point; 2) Calculating particle fitness by using a fitness function; 3) Updating the particle position and velocity; 4) Obtaining an individual optimal value and a global optimal value according to the fitness function; 5) Repeating the steps 2 to 4 until the maximum iteration number is reached; 6) Outputting a global optimal solution; 7) Taking the output optimal solution as a path point; processing the path point interpolation using a cubic spline to generate a smooth path;

the fitness function is:

(ii) a WhereinX _k ={

,

,…,

}，X _k Is as followskParticles, each particle being a path,

the constant number is a constant number,

，

in order to be a function of avoiding obstacles,

(ii) a Wherein

The distance of the neighboring waypoint vector to the center of the obstacle,

is shown asjThe radius of the individual obstacles,

a swelling factor that is an obstacle;

as a function of the distance of the path,

wherein

As is the distance from the starting point to the end point,

is the path length;

as a function of the smoothness of the path,

，

，i=1,2,…,nwherein

For angular variation of adjacent paths, in the range 0, pi]；

In order to be a function of the energy consumption,

；

；

(ii) a Wherein

In order to have a high energy consumption coefficient,

in order to be a point of the path,

the coordinates of the path points are respectively represented,

is the terrain height of the waypoint;

>1>

(ii) a Wherein

Is the energy consumption coefficient when climbing an uphill,

the energy consumption coefficient when the slope is downhill;

the reward is:

；

；

；

wherein the content of the first and second substances,

indicating a state when the luggage van reaches the terminal;

indicating a state when the distance between the luggage van and the obstacle is less than a set threshold value;

the potential energy value state which can be larger than the previous time state at the current time state is represented;

represents other states;

representing a distance to the target;

is a penalty factor;

a prize value for the current time;

is the observed state of the vehicle at time t;

is in a state

The control strategy of (2);

the potential energy value of the state at the current moment represents the potential energy between the current moment and the target point.

2. The control system of claim 1, wherein the machine learning algorithm comprises an Actor network and a Critic network, the Actor network is used for determining a control action corresponding to a path state to form a new motion state; the Critic network is used for determining the reward of the control action based on the given path state; the Actor network observes the state according to the current particlesAnd an objectgSelecting an appropriate control actionaObtaining an expected reward by computing a reward functionrThen, the state is fromsIs transferred tos′Will bes，g，a，r，s′The combination is one tuple X =: (s,g,a,r,s′) And store it in the experience playback pool; the expected reward for each action is accumulated to calculate a merit function,

in whichEIn order to be the mathematical expectation,

in order to be a cost factor, the cost factor,

is in a state

The control strategy of (2); iterating according to a Bellman equation until strategy parameters converge to be optimal; the bellman equation is described as follows:

，

as an observation of the baggage car at time tIn the state of the mobile communication device,

is in state for control strategy

A reward is issued;

is the state transition probability;

to make a state

The strategy that gets the highest reward.

3. The control system of claim 1, wherein the system comprises: the sensing unit senses the obstacle information in the environment and controls the speed of the luggage van according to the obstacle information and the distance between the luggage van and the terminal equipment carried by the user.

4. A control method of a navigation control system of a luggage van according to any one of claims 1 to 3, characterized in that the method comprises the steps of:

step 2: acquiring a target position set by a human-computer interaction unit; the path planning unit plans a path according to the obtained target position information, the environment information, the obstacle information and the state information of the luggage van, wherein the path planning comprises global path planning and local path planning, and when the luggage van cannot detect the dynamic obstacle, the luggage van drives according to the global path planning; when the dynamic barrier is detected, planning to drive according to the local path; the local path planning comprises the steps of predicting a dynamic barrier according to an environment map and detected dynamic barrier information, generating a sampling space moving to a plurality of track states in a state space based on the environment map, a barrier prediction result, a current point and the position of a target point, and generating a plurality of control actions corresponding to the plurality of track states; obtaining an expected reward of each control action in the plurality of control actions based on a machine learning algorithm, scoring the path through an evaluation function, and obtaining a track with the highest score as a local optimal path; and sending the path plan to a control unit and a man-machine interaction unit;

and 3, step 3: the control unit controls the luggage van to carry luggage according to the path planning result, and a user can check path planning information through a display screen of the man-machine interaction unit and move to a target position along with the luggage van, so that automatic obstacle avoidance control of the luggage van is realized.

5. The control method according to claim 4, wherein the machine learning algorithm comprises an Actor network and a Critic network, and the Actor network is used for determining the control action corresponding to the path state to form a new motion state; the Critic network is used for determining the reward of the control action based on the given path state; the Actor network observes the state according to the current particlesAnd objectsgSelecting an appropriate control actionaObtaining an expected reward by calculating a reward functionrThen, the state is fromsIs transferred tos′Will bes，g，a，r，s′Combined into one tuple X =: (s,g,a,r,s′) And store it in the experience playback pool; the expected reward for each action is accumulated to compute a merit function,

whereinEIn order to be the mathematical expectation,

in order to be a cost factor, the cost factor,

is a state

The control strategy of (2); iterating according to a Bellman equation until strategy parameters are converged to be optimal; the bellman equation is described as follows:

，

for the observed state of the luggage van at time t,

in state for control strategy

A reward is issued;

is the state transition probability;

to make a state

The strategy that gets the highest prize.

6. The control method according to claim 4, characterized in that the method comprises: the sensing unit senses the obstacle information in the environment and controls the speed of the luggage van according to the obstacle information and the distance between the luggage van and the terminal equipment carried by the user.