CN110032189A - A kind of intelligent storage method for planning path for mobile robot not depending on map - Google Patents
- Publication number
- CN110032189A, CN201910323366.5A, CN201910323366A
- Authority
- CN
- China
- Prior art keywords
- mobile robot
- data
- target point
- target
- angle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0214—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
- G05D1/0223—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0238—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
- G05D1/024—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
- G05D1/0276—Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
Abstract
The invention claims a map-independent path-planning method for an intelligent-warehouse mobile robot, comprising the steps: S1, the robot is first trained in a simulated environment; S2, for navigation in the actual environment, the robot selects actions with the deep deterministic policy gradient (DDPG) network whose parameters were saved in S1. The method effectively solves the path-planning problem in unknown environments, and the simulated training effectively improves obstacle-avoidance performance in unknown environments.
Description
Technical field
The invention belongs to the technical field of robot path planning, and relates to a map-independent path-planning method for an intelligent-warehouse mobile robot that uses a laser sensor.
Background technique
Path planning is a key capability of an autonomous mobile robot: the robot should reach its destination as quickly and accurately as possible while safely and effectively avoiding the obstacles in the environment. When the environment map is completely known, mature solutions already exist for avoiding obstacles safely and reaching the destination accurately. When the map is unknown, however, and the robot must rely solely on the sparse readings of a laser sensor, the real-time and accuracy requirements on the obstacle-avoidance algorithm during navigation are much higher; directly reusing a known-map method in an unknown environment is likely to make obstacle avoidance, and therefore the whole navigation task, fail.
Research on dynamic obstacle avoidance for mobile robots focuses mainly on detecting obstacles effectively and on designing and optimizing the collision-avoidance control algorithm, so that the robot can complete its navigation task quickly and accurately. Obstacle detection relies on the robot's on-board measurement sensors, which measure the distance and position of an obstacle and judge its motion state. Commonly used sensors of this kind include sonar, infrared, laser, and vision sensors. Each sensor has its weaknesses: sonar readings degrade badly on sound-absorbing materials, causing errors, and vision sensors suffer large errors in poor lighting.
Among dynamic obstacle-avoidance algorithms, common choices are the artificial potential field method, VFH-type algorithms, neural-network methods, genetic algorithms, fuzzy logic, and the rolling-window method. Each has its own advantages and disadvantages; the artificial potential field method, for example, is computationally cheap and runs in real time, but tends to get trapped in local minima.
Summary of the invention
The invention aims to solve the above problems of the prior art by proposing a map-independent path-planning method for an intelligent-warehouse mobile robot. Compared with conventional methods it has two advantages: 1. the laser sensor uses fewer beams, yet reliable real-time path planning is still achieved, lowering the robot's sensor cost; 2. path planning is possible without building a map of the physical environment. The technical scheme of the invention is as follows:
A map-independent path-planning method for an intelligent-warehouse mobile robot comprises the following steps.
S1: train first in a simulated environment.
a1: Before the robot moves, randomly initialize the target-point coordinates (xt, yt) and the target radius Rm; xt and yt are the X and Y coordinates of the target centre in the static map, and Rm denotes a square region of side length dmin centred on (xt, yt), within which the robot is considered to have reached the destination. Set the robot's current pose (x, y, θr), where x, y are its current position coordinates and θr is the angle between the robot's instantaneous direction of motion and the X axis. Navigation is planned from the target's position (θ, d) in the robot's polar coordinates, where θ is the target's bearing in the robot's polar coordinates and d is the target's distance from the robot centre, and the robot advances at a fixed speed.
a2: During navigation, the environment data Li detected by the robot's laser sensor and the target-position data Di are preprocessed and converted into features, which are then fused into the environment state Si.
a3: Using the deep deterministic policy gradient (DDPG) method, obtain the next action a, and after a is executed, update the weights and biases of the neurons in the policy sub-network through the reward feedback; a ∈ W means the robot's deflection angle when executing the action lies within the range W.
a4: Judge whether the robot has reached the target point (xt, yt); if not, return to a2 and continue navigating, otherwise end the navigation.
a5: After the navigation ends, update the evaluation-network parameters of the DDPG method according to the reward value; once the training success rate reaches the target success rate, save the policy sub-network and evaluation-network parameters of the DDPG method.
S2: For navigation of the actual mobile robot (whose environment may differ from the simulated one), actions are selected with the DDPG network whose parameters were saved in S1.
Further, in step a2 the laser data Li and target-position data Di are preprocessed, converted into features, and fused into the environment state Si, specifically: the laser readings Li (i = 1, 2, ..., 10) are preprocessed and converted into the environment feature parameters Lfi (i = 1, 2, ..., 10). The target-position data are first partitioned, yielding the region distance data Di (i = 11, 12, 13), where D11 is the current robot heading relative to the X axis, D12 is the distance to the target point, and D13 is the target's bearing relative to the robot's own forward direction; these are then converted into the distance feature parameters Dfi (i = 11, 12, 13). Given a defined maximum distance dm, the laser ranges are normalized into feature values Lfi = Li ÷ dm (i = 1, 2, ..., 10), and the target data into Df11 = D11 ÷ π, Df12 = D12 ÷ dm, Df13 = D13 ÷ π. The laser feature values and the target-position feature values are then fused, by concatenating the ten laser features with the three target features, into the current environment feature data Sf1 to Sf13.
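The preprocessing above can be sketched in a few lines of Python. This is a minimal sketch under two assumptions: fusion is simple concatenation of the ten laser features and three target features (the patent's fusion expression appears only as a figure), and the numeric value of dm is illustrative, since the patent does not give one.

```python
import math

D_MAX = 5.0  # assumed laser maximum range dm, in metres (illustrative; not given in the patent)

def fuse_features(laser, heading, goal_dist, goal_angle, d_max=D_MAX):
    """Normalise 10 laser ranges and 3 target values into the 13-dim state Sf1..Sf13.

    laser      : list of 10 raw ranges L1..L10
    heading    : robot heading vs. the X axis (D11), in radians
    goal_dist  : distance to the target point (D12)
    goal_angle : target bearing vs. the robot's forward axis (D13), in radians
    """
    lf = [li / d_max for li in laser]                               # Lfi = Li / dm
    df = [heading / math.pi, goal_dist / d_max, goal_angle / math.pi]  # Dfi = D11/pi, D12/dm, D13/pi
    return lf + df                                                   # concatenate -> Sf1..Sf13
```

With dm = 5, a reading at full range maps to 1.0 and a target dead ahead at half range maps to (0, 0.5, 0) in the last three slots, so every component of the fused state lies in [-1, 1].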
Further, the target-position data are first partitioned; the purpose of the partition is to obtain the best angle towards the target, yielding after processing the distance datum D13, which is the target's bearing relative to the robot's own forward direction. Specifically: taking the direction straight ahead of the robot as the reference origin, a clockwise angle is negative and a counter-clockwise angle is positive, which yields the optimal angle towards the target position; the absolute value of the angle is less than or equal to 180°.
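One way to realise this angle convention in Python; the computation via atan2 is an assumption consistent with the description (forward axis as the zero reference, counter-clockwise positive, magnitude bounded by 180°), not the patent's own formula.

```python
import math

def target_bearing(x, y, theta_r, xt, yt):
    """Signed angle D13 from the robot's forward axis to the target, in degrees.

    The forward axis is the zero reference; counter-clockwise angles are positive,
    clockwise angles negative, and |angle| <= 180 degrees.
    theta_r is the robot heading relative to the X axis, in radians.
    """
    bearing = math.atan2(yt - y, xt - x) - theta_r  # world-frame goal direction minus heading
    # wrap into (-pi, pi] so the magnitude never exceeds 180 degrees
    while bearing <= -math.pi:
        bearing += 2 * math.pi
    while bearing > math.pi:
        bearing -= 2 * math.pi
    return math.degrees(bearing)
```

For example, a target straight ahead gives 0°, one directly to the robot's left gives +90°, and one directly to its right gives -90°.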
Further, the DDPG method in step a3 is as follows. The action is the output of the policy sub-network with an added disturbance Nt, expressed as
a = A(s | μA) + Nt
where s is the state, μA the policy sub-network parameters, Nt the disturbance, and A the DDPG action policy. When the robot needs to avoid obstacles dynamically, the fused data of the current moment are fed to the DDPG network as input, which then outputs the next action a. After a is executed in the environment, the DDPG network parameters are updated according to the reward value; in the evaluation network:
Q(s, a) = Q(s, a) + α(r + γQ(s', a') − Q(s, a))
where Q is the value function, (s, a) the state-action pair at time t, r the reward of the behaviour at time t, Q(s', a') the Q value computed in the new state for the action taken at time t + 1, α the learning rate, and γ the discount factor.
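The two update rules above can be sketched with scalar stand-ins for the actor and critic. This is a minimal illustration only: the patent's policy and evaluation networks are neural sub-networks, and the `noise_scale`, `alpha`, and `gamma` values here are assumptions, not taken from the patent.

```python
import random

def select_action(policy, state, noise_scale=0.1):
    """Action selection a = A(s | muA) + Nt: deterministic policy plus Gaussian exploration noise."""
    return policy(state) + random.gauss(0.0, noise_scale)

def td_update(q, r, q_next, alpha=0.01, gamma=0.99):
    """One evaluation-network step: Q(s,a) <- Q(s,a) + alpha * (r + gamma*Q(s',a') - Q(s,a))."""
    return q + alpha * (r + gamma * q_next - q)
```

With the noise scale set to zero the selection is purely deterministic, which matches how the saved policy is reused at deployment time in step S2.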
Further, the action a is designed to be selected within a fixed continuous interval.
Further, the reward R is designed as follows. To define the reward function, the state S of the mobile robot is first classified:
1) safe state SS: the set of states in which the robot collides with no obstacle in the environment;
2) non-secure state NS: the set of states in which the robot collides with some obstacle in the environment;
3) winning state WS: the state in which the robot reaches the target.
The reward function is then defined according to the robot's state.
Further, step a4 is as follows: from the robot's current coordinates (x, y), judge whether it has reached the target point (xt, yt). If the robot's distance to (xt, yt) is within the target range Rm, the robot has arrived in the target region; if min{L1, L2, ..., L10} ≤ C, where the Li are the obstacle distances returned by the laser sensor and C is the robot's length, the robot has collided with an obstacle. In either case, WS or NS, the current navigation episode ends. Otherwise the target has not yet been reached and navigation must continue: return to step a2 and repeat until the target point is reached.
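The step-a4 check can be sketched as a small classifier over the three states. Two assumptions are made explicit here: the arrival test uses the Euclidean distance to (xt, yt) against Rm (the patent's exact condition appears only as a figure), and a minimum laser range at or below the robot length C is taken to signal a collision.

```python
import math

def episode_status(x, y, xt, yt, laser, r_m, c):
    """Classify the step outcome: 'WS' goal reached, 'NS' collision, 'SS' keep navigating.

    laser : the ten range readings L1..L10
    r_m   : target radius Rm
    c     : robot length C
    """
    if math.hypot(x - xt, y - yt) <= r_m:
        return "WS"   # winning state: inside the target region, navigation ends
    if min(laser) <= c:
        return "NS"   # non-secure state: an obstacle is closer than the robot length
    return "SS"       # safe state: continue from step a2
```

The SS branch is the one that loops back to step a2, so a navigation episode is a sequence of SS steps ending in exactly one WS or NS.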
Advantages and beneficial effects of the invention:
The invention provides a map-independent path-planning method for an intelligent-warehouse mobile robot. Through deep learning, the method effectively solves the path-planning problem in unknown environments, and the simulated training effectively improves obstacle avoidance in the real environment.
Detailed description of the invention
Fig. 1 is the mobile robot's target-point perception model of the preferred embodiment of the invention;
Fig. 2 is the mobile robot's laser-sensor obstacle perception model;
Fig. 3 is the overall flow chart of step S1;
Fig. 4 is the overall flow chart of step S2.
Specific embodiment
The technical solutions in the embodiments of the invention are described below clearly and in detail with reference to the drawings; the described embodiments are only a part of the embodiments of the invention.
The technical solution with which the invention solves the above technical problem is:
As shown in Figs. 3 and 4, a map-independent path-planning method for an intelligent-warehouse mobile robot comprises the following steps:
S1: train first in a simulated environment;
a1: set the target of the robot's motion: randomly initialize the target-point coordinates (xt, yt) and the target radius Rm; xt, yt are the X and Y coordinates of the target centre in the static map, and Rm denotes a square region of side length dmin centred on (xt, yt), within which the robot is considered to have arrived; set the robot's current pose (x, y, θr), where x, y are its current position coordinates and θr is the angle between the robot's instantaneous direction of motion and the X axis; plan the path from the target's position (θ, d) in the robot's polar coordinates and advance at a fixed speed, where θ is the target's bearing in the robot's polar coordinates and d is the target's distance from the robot centre;
a2: during navigation, the environment data Li detected by the robot's laser sensor and the target-position data Di are preprocessed, converted into features, and fused into the environment state Si;
a3: obtain the next action a with the DDPG method; a ∈ W means the robot's deflection angle when executing the action lies within the range W;
a4: judge whether the robot has reached the target point (xt, yt) or collided; if neither, return to a2 and continue navigating, and if the target point has been reached, end the navigation;
a5: after the navigation ends, update the policy sub-network and evaluation-network parameters of the DDPG method according to the reward value; once the training success rate reaches the target success rate, save the DDPG network parameters.
S2: for navigation of the mobile robot in the actual environment (which may differ from the simulated one), actions are selected with the DDPG network whose parameters were saved in S1.
Further, in step a2 the laser data Li and target-position data Di are preprocessed, converted into features, and fused into the environment state Si, specifically: the laser readings Li (i = 1, 2, ..., 10) are preprocessed and converted into the environment feature parameters Lfi (i = 1, 2, ..., 10); the target-position data are first partitioned, yielding the region distance data Di (i = 11, 12, 13), where D11 is the current robot heading relative to the X axis, D12 is the distance to the target point (i.e. d), and D13 is the target's bearing relative to the robot's own forward direction (i.e. θ); these are then converted into the distance feature parameters Dfi (i = 11, 12, 13). Given the defined maximum distance dm, the laser range values are normalized into the feature values Lfi = Li ÷ dm (i = 1, 2, ..., 10), and the target data into Df11 = D11 ÷ π, Df12 = D12 ÷ dm, Df13 = D13 ÷ π; the laser feature values and the target-position feature values are then fused, by concatenating the ten laser features with the three target features, into the current environment feature data Sf1 to Sf13.
Further, the target-position data are first partitioned; the purpose of the partition is to obtain the best angle towards the target, yielding after processing the distance datum D13, the target's bearing relative to the robot's own forward direction, specifically: taking the direction straight ahead of the robot as the reference origin, a clockwise angle is negative and a counter-clockwise angle is positive, which yields the optimal angle towards the target position; the absolute value of the angle is less than or equal to 180°.
Further, the DDPG method in step a3 is as follows: the action is produced by the policy sub-network; the current state is fed to the policy sub-network as input, and its output, with an added disturbance Nt, is the action a, expressed as
a = A(s | μA) + Nt (2)
where s is the state, μA the policy sub-network parameters, and A the DDPG action policy. When the robot needs path planning, the fused data of the current moment are fed to the DDPG network as input; after the DDPG decision, the next action a is output. After a is executed, the DDPG network parameters are updated according to the reward value; in the evaluation network:
Q(s, a) = Q(s, a) + α(r + γQ(s', a') − Q(s, a)) (3)
where Q is the value function, (s, a) the state-action pair at time t, r the reward of the behaviour at time t, Q(s', a') the Q value computed in the new state for the action taken at time t + 1, α the learning rate, and γ the discount factor.
Further, the action a is designed to be selected within a fixed continuous interval.
Further, in step a5 the reward R is designed as follows. To define the reward function, the state S of the mobile robot is first classified:
1) safe state SS: the set of states in which the robot collides with no obstacle in the environment;
2) non-secure state NS: the set of states in which the robot collides with some obstacle in the environment;
3) winning state WS: the state in which the robot reaches the target.
The reward function is defined according to the robot's state as follows: when the robot reaches the target and the state is the winning state WS, R = 10; when the robot collides with an obstacle and the state is the non-secure state NS, R = −5; when the robot has neither collided nor reached the terminal, the state is the safe state SS and R = (di − di+1)/dm, where di is the distance to the target point at the current moment and di+1 the distance at the next moment.
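The three-way reward definition above maps directly to code; a small Python sketch, with the state labels and parameter names chosen here for illustration:

```python
def reward(state, d_now, d_next, d_max):
    """Reward R from the patent's three-way state classification.

    state  : 'WS' goal reached, 'NS' collision, 'SS' otherwise
    d_now  : distance to the target at the current moment (di)
    d_next : distance to the target at the next moment (di+1)
    d_max  : normalising maximum distance dm
    """
    if state == "WS":
        return 10.0                      # reached the target
    if state == "NS":
        return -5.0                      # collided with an obstacle
    return (d_now - d_next) / d_max      # SS: positive when the robot moved closer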
Further, step a4 is as follows: from the robot's current coordinates (x, y), judge whether it has reached the target point (xt, yt). If the robot's distance to (xt, yt) is within the target range Rm, the robot has arrived in the target region; if min{L1, L2, ..., L10} ≤ C, where the Li are the obstacle distances returned by the laser sensor and C is the robot's length, the robot has collided with an obstacle. In either case, WS or NS, the current navigation episode ends. Otherwise the target has not yet been reached and navigation must continue: return to step a2 and repeat until the target point is reached.
Further, step S2 is as follows: during navigation of the physical mobile robot, the robot inherits the network parameters from step S1 and selects the action of the current moment with the DDPG method, until it reaches the target region.
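The S2 deployment loop can be sketched under assumed interfaces for the trained policy, the sensing, and the actuation; none of these names come from the patent, and the step budget is an illustrative safeguard.

```python
def navigate(policy, sense, act, at_goal, max_steps=1000):
    """Deployment loop for step S2: the trained policy is reused without retraining.

    policy   : trained actor, fused state -> steering action (assumed interface)
    sense    : () -> fused 13-dim state
    act      : action -> None, executes the motion at fixed forward speed
    at_goal  : () -> bool, True once the robot is inside the target region
    Returns True if the target region was reached within max_steps.
    """
    for _ in range(max_steps):
        if at_goal():
            return True               # target region reached
        act(policy(sense()))          # no exploration noise at deployment time
    return False                      # step budget exhausted without arriving
```

Note that, unlike training, the loop adds no disturbance Nt to the policy output and never updates the network parameters.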
The above embodiments should be understood as merely illustrating, not limiting, the invention. After reading the content recorded herein, a skilled person may make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope of the claims of the invention.
Claims (7)
1. A map-independent path-planning method for an intelligent-warehouse mobile robot, characterized by comprising the following steps:
S1: the mobile robot is first trained in a simulated environment;
a1: before the robot moves, randomly initialize the target-point coordinates (xt, yt) and the target radius Rm; xt, yt are the X and Y coordinates of the target centre in the static map, and Rm denotes a square region of side length dmin centred on (xt, yt), within which the robot is considered to have arrived; set the robot's current pose (x, y, θr), where x, y are its current position coordinates and θr is the angle between the robot's instantaneous direction of motion and the X axis; plan the path from the target's position (θ, d) in the robot's polar coordinates and advance at a fixed speed, where θ is the target's bearing in the robot's polar coordinates and d is the target's distance from the robot centre;
a2: during navigation, the environment data Li detected by the robot's laser sensor and the target-position data Di are preprocessed, converted into features, and fused into the environment state Si;
a3: using the deep deterministic policy gradient (DDPG) method, obtain the next action a; a ∈ W means the robot's deflection angle when executing the action lies within the range W;
a4: judge whether the robot has reached the target point (xt, yt); if not, return to a2 and continue navigating, and if it has arrived, end the navigation;
a5: after the navigation ends, update the policy sub-network and evaluation-network parameters of the DDPG method according to the reward value; once the training success rate reaches the target success rate, save the DDPG network parameters;
S2: during navigation in the actual environment, the robot's actions are selected with the DDPG network whose parameters were saved in S1.
2. The map-independent path-planning method for an intelligent-warehouse mobile robot according to claim 1, characterized in that in step a2 the laser data Li and target-position data Di are preprocessed, converted into features, and fused into the environment state Si, specifically:
the laser readings Li (i = 1, 2, ..., 10) are preprocessed and converted into the environment feature parameters Lfi (i = 1, 2, ..., 10); the target-position data are first partitioned, yielding the region distance data Di (i = 11, 12, 13), where D11 is the current robot heading relative to the X axis, D12 is the distance to the target point (i.e. d), and D13 is the target's bearing relative to the robot's own forward direction (i.e. θ); the Di are then converted into the distance feature parameters Dfi (i = 11, 12, 13); given the defined maximum distance dm, the laser ranges are normalized into the feature values Lfi = Li ÷ dm (i = 1, 2, ..., 10), and the target data into Df11 = D11 ÷ π, Df12 = D12 ÷ dm, Df13 = D13 ÷ π; the laser feature values and the target-position feature values are then fused, by concatenating the ten laser features with the three target features, into the current environment feature data Sf1 to Sf13.
3. The map-independent path-planning method for an intelligent-warehouse mobile robot according to claim 2, characterized in that the target-position data are first partitioned, after which the datum D13 is obtained; D13 is the target's bearing relative to the robot's own forward direction, specifically: taking the direction straight ahead of the robot as the reference origin, a clockwise angle is negative and a counter-clockwise angle is positive, which yields the optimal angle towards the target position; the absolute value of the angle is less than or equal to 180°.
4. The map-independent path-planning method for an intelligent-warehouse mobile robot according to claim 1, characterized in that the DDPG method in step a3 specifically comprises: the action is the output of the policy sub-network with an added disturbance:
a = A(s | μA) + Nt
where s is the state, μA the policy sub-network parameters, Nt the disturbance, and A the DDPG action policy; when the robot needs to avoid obstacles dynamically, the fused data of the current moment are fed to the DDPG network as input, which then outputs the next action a; after a is executed in the environment, the DDPG network parameters are updated according to the reward value; in the evaluation network:
Q(s, a) = Q(s, a) + α(r + γQ(s', a') − Q(s, a))
where Q is the value function, (s, a) the state-action pair at time t, r the reward of the behaviour at time t, Q(s', a') the Q value computed in the new state for the action taken at time t + 1, α the learning rate, and γ the discount factor.
5. The map-independent path-planning method for an intelligent-warehouse mobile robot according to claim 4, characterized in that the action a is designed to be selected within a fixed continuous interval.
6. The map-independent path-planning method for an intelligent-warehouse mobile robot according to claim 4, characterized in that the reward R is designed as follows: to define the reward function, the state S of the mobile robot is first classified:
1) safe state SS: the set of states in which the robot collides with no obstacle in the environment;
2) non-secure state NS: the set of states in which the robot collides with some obstacle in the environment;
3) winning state WS: the state in which the robot reaches the target;
the reward function is defined according to the robot's state.
7. The map-independent path-planning method for an intelligent-warehouse mobile robot according to claim 1, characterized in that step a4 specifically comprises:
judging from the robot's current coordinates (x, y) whether it has reached the target point (xt, yt); if the robot's distance to (xt, yt) is within the target range Rm, the robot has arrived in the target region; if min{L1, L2, ..., L10} ≤ C, where the Li are the obstacle distances returned by the laser sensor and C is the robot's length, the robot has collided with an obstacle; in either case, WS or NS, the current navigation ends; otherwise the target point has not yet been reached and navigation must continue: return to step a2 and repeat until the target point is reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910323366.5A CN110032189A (en) | 2019-04-22 | 2019-04-22 | A kind of intelligent storage method for planning path for mobile robot not depending on map |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110032189A true CN110032189A (en) | 2019-07-19 |
Family
ID=67239486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910323366.5A Pending CN110032189A (en) | 2019-04-22 | 2019-04-22 | A kind of intelligent storage method for planning path for mobile robot not depending on map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110032189A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106548486A (en) * | 2016-11-01 | 2017-03-29 | 浙江大学 | A kind of unmanned vehicle location tracking method based on sparse visual signature map |
CN108255182A (en) * | 2018-01-30 | 2018-07-06 | 上海交通大学 | A kind of service robot pedestrian based on deeply study perceives barrier-avoiding method |
CN109445440A (en) * | 2018-12-13 | 2019-03-08 | 重庆邮电大学 | The dynamic obstacle avoidance method with improvement Q learning algorithm is merged based on sensor |
Non-Patent Citations (1)
Title |
---|
SONG Yu et al.: "Mobile robot path planning based on improved SARSA(λ)", Journal of Changchun University of Technology (《长春工业大学学报》) *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113062601A (en) * | 2021-03-17 | 2021-07-02 | 同济大学 | Q learning-based concrete distributing robot trajectory planning method |
CN113062601B (en) * | 2021-03-17 | 2022-05-13 | 同济大学 | Q learning-based concrete distributing robot trajectory planning method |
CN113140104A (en) * | 2021-04-14 | 2021-07-20 | 武汉理工大学 | Vehicle queue tracking control method and device and computer readable storage medium |
CN113848974A (en) * | 2021-09-28 | 2021-12-28 | 西北工业大学 | Aircraft trajectory planning method and system based on deep reinforcement learning |
CN113848974B (en) * | 2021-09-28 | 2023-08-15 | 西安因诺航空科技有限公司 | Aircraft trajectory planning method and system based on deep reinforcement learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108762264B (en) | Dynamic obstacle avoidance method of robot based on artificial potential field and rolling window | |
CN107063280A (en) | A kind of intelligent vehicle path planning system and method based on control sampling | |
CN109445440B (en) | Dynamic obstacle avoidance method based on sensor fusion and improved Q learning algorithm | |
CN111780777A (en) | Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning | |
CN108762281A (en) | It is a kind of that intelligent robot decision-making technique under the embedded Real-time Water of intensified learning is associated with based on memory | |
CN110032189A (en) | A kind of intelligent storage method for planning path for mobile robot not depending on map | |
CN107894773A (en) | A kind of air navigation aid of mobile robot, system and relevant apparatus | |
CN109784201B (en) | AUV dynamic obstacle avoidance method based on four-dimensional risk assessment | |
CN109597404A (en) | Road roller and its controller, control method and system | |
JP7469850B2 (en) | Path determination device, robot, and path determination method | |
CN110174118A (en) | Robot multiple-objective search-path layout method and apparatus based on intensified learning | |
WO2020136978A1 (en) | Path determination method | |
CN110850880A (en) | Automatic driving system and method based on visual sensing | |
Almasri et al. | Development of efficient obstacle avoidance and line following mobile robot with the integration of fuzzy logic system in static and dynamic environments | |
CN113291318A (en) | Unmanned vehicle blind area turning planning method based on partially observable Markov model | |
Chen et al. | Automatic overtaking on two-way roads with vehicle interactions based on proximal policy optimization | |
Jaafra et al. | Robust reinforcement learning for autonomous driving | |
CN113341999A (en) | Forklift path planning method and device based on optimized D-x algorithm | |
Yu et al. | Road-following with continuous learning | |
Lin et al. | Robust unmanned surface vehicle navigation with distributional reinforcement learning | |
Lee et al. | Autonomous lane keeping based on approximate Q-learning | |
Li et al. | An efficient deep reinforcement learning algorithm for Mapless navigation with gap-guided switching strategy | |
JP2020149095A (en) | Inverted pendulum robot | |
CN111413974B (en) | Automobile automatic driving motion planning method and system based on learning sampling type | |
Hacene et al. | Toward safety navigation in cluttered dynamic environment: A robot neural-based hybrid autonomous navigation and obstacle avoidance with moving target tracking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||