CN110362070B - Path following method, system, electronic device and storage medium - Google Patents


Info

Publication number
CN110362070B
CN110362070B (application CN201811191784.5A)
Authority
CN
China
Prior art keywords
grid
path
action
current
cost value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811191784.5A
Other languages
Chinese (zh)
Other versions
CN110362070A (en)
Inventor
高萌
李雨倩
刘懿
李浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811191784.5A
Publication of CN110362070A
Application granted
Publication of CN110362070B
Active legal status
Anticipated expiration legal status

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02: Control of position or course in two dimensions
    • G05D1/021: Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221: Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0231: Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0238: Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
    • G05D1/024: Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
    • G05D1/0257: Control of position or course in two dimensions specially adapted to land vehicles using a radar
    • G05D1/0268: Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means
    • G05D1/0274: Control of position or course in two dimensions specially adapted to land vehicles using internal positioning means using mapping information stored in a memory device

Abstract

The invention provides a path following method, a system, an electronic device and a storage medium. The method comprises the following steps: according to a grid map, obtaining a planned path from a starting grid to an end grid and a path area obtained by expanding the planned path, and obtaining an initial obstacle-avoidance value for each grid in the path area; traversing from the starting grid to obtain the cost value of each grid in the path area based on each action a_i in the action set, including cyclically obtaining the cost value Q(s_n, a_i) of each current grid s_n based on each action a_i; and executing, at the current grid s_n, the action corresponding to the minimum cost value among the cost values Q(s_n, a_i). According to the invention, the path area is established within the allowable error range of the planned path, and the Q-Learning algorithm is combined to update and train the cost value of each grid in the path area based on each action, so that after training ends the action with the minimum cost value can be selected at each grid to perform following control along the planned path.

Description

Path following method, system, electronic device and storage medium
Technical Field
The present invention relates to the field of unmanned driving technologies, and in particular, to a path following method, a system, an electronic device, and a storage medium.
Background
In the field of unmanned driving, such as unmanned vehicle driving, the accuracy of path following determines whether the unmanned vehicle can drive safely and accurately. No matter how accurately the global or local path planner plans the path, the unmanned vehicle cannot be controlled accurately in real time if it cannot follow the planned path closely.
Among traditional path following methods, the most classical is Pure Pursuit control, which analyzes the vehicle's current position and orientation and its target position and orientation to establish a mathematical model of the vehicle following the path, thereby controlling path following.
With the development of machine learning, reinforcement learning has gradually been applied to path following. An existing reinforcement-learning path following method establishes, through reinforcement learning, a mapping between the average curvature of a path and the vehicle's motion mode to realize path following for an unmanned vehicle. Besides reinforcement learning, deep learning also realizes lane-line following to a certain extent by identifying the current lane line with a camera.
However, conventional path following methods mainly establish a mathematical model of the vehicle, which imposes strict requirements on parameter tuning: the quality of the parameters directly affects the following performance. Moreover, changes in the path strongly influence the parameters; straight paths, curved paths, and paths with different degrees of curvature each require different parameters to achieve a good following effect. Traditional control therefore depends too heavily on parameter tuning, and its adaptability and universality are poor.
Furthermore, deep learning can follow lane lines, but its performance on paths without lane lines is not ideal and cannot meet practical requirements. The reinforcement-learning method that relates the average curvature of the path to the vehicle's action mode meets the requirements in most situations, but cannot adapt to cases such as an average curvature of zero while two closely spaced curves actually exist. In such cases the vehicle cannot follow the path well, may deviate substantially, and may even collide.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present invention and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
Therefore, the invention provides a path following method, a system, an electronic device and a storage medium, which solve the prior-art problems that path following depends on manual experience for parameter tuning and that complex paths cannot be followed accurately.
According to an aspect of the present invention, there is provided a path following method comprising the steps of: according to a grid map, obtaining a planned path from a starting grid to an end grid and a path area obtained by expanding the planned path, and obtaining an initial obstacle-avoidance value for each grid in the path area; traversing from the starting grid to obtain the cost value of each grid in the path area based on each action a_i in the action set, comprising: obtaining the initial value of the next grid s_{n+1} reachable from the current grid s_n by executing each action a_i, as the current profit R_n of the current grid s_n based on each action a_i; obtaining the maximum value max(Q(s_{n+1}, a_i)) among the initial values of the next grids s_{n+2} reachable from each next grid s_{n+1} by executing each action a_i again; and obtaining, according to the current profit R_n and the maximum value max(Q(s_{n+1}, a_i)), the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i.
Preferably, the path following method further includes the step of: executing, at the current grid s_n, the action corresponding to the minimum cost value among the cost values Q(s_n, a_i).
Preferably, in the path following method described above, in the step of obtaining the cost value Q(s_n, a_i), the cost value Q(s_n, a_i) is obtained according to the current profit R_n and the future profit W_n of the current grid s_n based on each action a_i, where the future profit W_n = γ·max(Q(s_{n+1}, a_i)), and γ is a discount factor representing the degree to which current profit is sacrificed in exchange for future profit, 0 < γ < 1.
Preferably, in the path following method described above, the formula for obtaining the cost value Q(s_n, a_i) is: Q(s_n, a_i) = R_n + γ·max(Q(s_{n+1}, a_i)).
Preferably, in the path following method described above, the step of obtaining the cost value Q(s_n, a_i) further comprises: detecting whether the current grid s_n is located in the path area; if so, obtaining the current profit R_n; if not, traversing the grids again from the starting grid.
Preferably, in the path following method, the step of traversing to obtain the cost value of each grid in the path area based on each action a_i further comprises: detecting the positions of the traversed grids in real time, and restarting the traversal from the starting grid when a traversed grid position exceeds the path area.
Preferably, in the above path following method, the step of traversing from the starting grid to obtain the cost value of each grid in the path area based on each action a_i comprises: executing any action in the action set from the starting grid to reach the current grid s_n; obtaining the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i; and taking the current grid s_n as the starting grid and cycling, so as to traverse and obtain the cost value of each grid in the path area based on each action a_i.
Preferably, in the path following method, each action a_i in the action set is executed at a preset frequency.
Preferably, in the above path following method, the action set includes j × k actions, with 1 ≤ i ≤ j × k, where j is the number of different speeds in the interval from the minimum speed to the maximum speed, and k is the number of different angles in the interval from the minimum angle to the maximum angle.
Preferably, in the path following method, the step of obtaining the path area includes: obtaining the free movement area of the grid map according to the cost map; expanding the planned path by a preset distance toward both sides within the free movement area, generating two boundaries located on the two sides of the planned path and parallel to it; and taking the two boundaries as obstacles and generating the path area between them, wherein each grid in the path area has an initial obstacle-avoidance value.
According to another aspect of the present invention, there is provided a path following system comprising: an initialization module, which acquires, according to a grid map, a planned path from a starting grid to an end grid and a path area obtained by expanding the planned path, and acquires an initial obstacle-avoidance value for each grid in the path area; and a training module, which traverses from the starting grid to obtain the cost value of each grid in the path area based on each action a_i in the action set, including: obtaining the initial value of the next grid s_{n+1} reachable from the current grid s_n by executing each action a_i, as the current profit R_n of the current grid s_n based on each action a_i; obtaining the maximum value max(Q(s_{n+1}, a_i)) among the initial values of the next grids s_{n+2} reachable from each next grid s_{n+1} by executing each action a_i; and obtaining, according to the current profit R_n and the maximum value max(Q(s_{n+1}, a_i)), the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i.
Preferably, the path following system further includes: a following control module, which executes, at the current grid s_n, the action corresponding to the minimum cost value among the cost values Q(s_n, a_i).
According to an aspect of the invention, there is provided an electronic device comprising a processor and a memory for storing executable instructions, the processor being configured to perform the steps of the path following method described above via execution of the executable instructions.
According to an aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the path following method described above.
By adopting the technical scheme, compared with the prior art, the invention has the beneficial effects that:
establishing a path area within the allowable error range of the planned path and training on the basis of that path area; and updating and training the cost value of each grid in the path area through the Q-Learning algorithm, so that after training ends the action with the minimum cost value can be selected for following control.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram illustrating steps of a path following method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the steps of obtaining a cost value of a current grid based on each action in an embodiment;
FIG. 3 is a diagram illustrating steps of traversing to obtain a cost value of each grid based on each action in an embodiment;
FIG. 4 shows a schematic diagram of a planned path and path region in an embodiment of the invention;
FIG. 5 is a detailed flow chart of a path following method in an embodiment of the invention;
FIG. 6 shows a block diagram of a path following system in an embodiment of the invention;
FIG. 7 shows a schematic diagram of an electronic device in an embodiment of the invention;
FIG. 8 shows a schematic diagram of a computer-readable storage medium in an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus their repetitive description will be omitted.
According to the method, a Q-Learning algorithm from reinforcement learning is used: the grids in the path region generated from a cost map (Costmap) are combined with a Q table for autonomous learning and updating. This solves the problem that the traditional control mode depends on manual experience for parameter tuning, and at the same time optimizes path following that takes average curvature as its reference, solving the problem that complex paths cannot be followed accurately.
Referring to fig. 1, in one embodiment of the present invention, a path following method includes:
s10, obtaining a planned path from a starting point grid to an end point grid and a path area obtained based on planned path expansion according to the grid map, and obtaining an initial value of each grid in the path area for avoiding the obstacle; s20 traversing from the starting grid to obtain each grid in the path area based on each action a in the action set i The cost value of (c).
Based on the path area obtained by expanding the planned path, the cost value of each grid in the path area based on each action a_i is obtained through traversal and is used to select the optimal action when the path is followed.
In one embodiment, path following is controlled by S30: executing, at the current grid s_n, the action corresponding to the minimum cost value among the cost values Q(s_n, a_i). Specifically, path following starts from the starting grid; at each current grid s_n, the action corresponding to the minimum cost value among Q(s_n, a_i), i.e. the optimal action, is executed, and by repeating this cycle the vehicle follows the planned path within the path area and reaches the end grid at minimum cost.
Referring to fig. 2, in a preferred embodiment, step S20 includes: S221: obtaining the initial value of the next grid s_{n+1} reachable from the current grid s_n by executing each action a_i, as the current profit R_n of the current grid s_n based on each action a_i; S222: obtaining the maximum value max(Q(s_{n+1}, a_i)) among the initial values of the next grids s_{n+2} reachable from each next grid s_{n+1} by executing each action a_i again; S223: obtaining, according to the current profit R_n and the maximum value max(Q(s_{n+1}, a_i)), the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i. The current grid s_n may be any grid within the path area. The next grid s_{n+1} is not necessarily adjacent to the current grid s_n; it is the grid reachable from the current grid s_n by executing a certain action in the action set. Executing different actions in the action set from the current grid s_n reaches different next grids s_{n+1}. In the same way, the next grid s_{n+2} is not necessarily adjacent to the next grid s_{n+1}.
Further, in S223, obtaining the cost value Q(s_n, a_i) according to the current profit R_n and the maximum value max(Q(s_{n+1}, a_i)) comprises: obtaining the cost value Q(s_n, a_i) according to the current profit R_n of the current grid s_n based on each action a_i and the future profit W_n of the current grid s_n based on each action a_i. The future profit W_n = γ·max(Q(s_{n+1}, a_i)), where γ is a discount factor representing the degree to which current profit is sacrificed in exchange for future profit, 0 < γ < 1. In one embodiment, γ is 0.8. The formula for obtaining the cost value Q(s_n, a_i) is: Q(s_n, a_i) = R_n + γ·max(Q(s_{n+1}, a_i)). The current grid s_n is identified by its coordinate values, and the action a_i is taken from the action set.
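In Python, the cost-value update described here can be sketched as follows; this is an illustrative reconstruction, not code from the patent, and the dict-based Q table and all names are assumptions:

```python
def update_cost(q_table, s_n, a_i, reward, s_next, actions, gamma=0.8):
    """Q(s_n, a_i) = R_n + gamma * max over actions of Q(s_{n+1}, a).

    q_table maps (grid, action) pairs to cost values; unseen pairs fall
    back to 0.0 here (the patent instead initializes each grid's entries
    to its initial obstacle-avoidance value).
    """
    # Future profit: best value reachable from the next grid, discounted.
    future = max(q_table.get((s_next, a), 0.0) for a in actions)
    q_table[(s_n, a_i)] = reward + gamma * future
    return q_table[(s_n, a_i)]
```

With γ = 0.8 as in the embodiment, a current profit of 1 and a best future value of 10 gives a cost value of 1 + 0.8 × 10 = 9.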
In a preferred embodiment, the action set comprises j × k actions, with 1 ≤ i ≤ j × k, where j is the number of different speeds in the interval from the minimum speed to the maximum speed and k is the number of different angles in the interval from the minimum angle to the maximum angle. The minimum speed, maximum speed, minimum angle, and maximum angle are all limits within the allowable range. The angle may be a steering wheel angle, i.e. the angle of the steering wheel relative to its home position. For accurate control, the interval from the minimum speed to the maximum speed may be divided into j equal parts, giving j different speeds in arithmetic progression; likewise, the interval from the minimum angle to the maximum angle may be divided into k equal parts, giving k different angles in arithmetic progression. Their combinations form the j × k actions.
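The j × k action set can be sketched as below; the function name and the spacing convention (j evenly spaced speeds including both endpoints, likewise k angles) are assumptions for illustration:

```python
def build_action_set(v_min, v_max, j, ang_min, ang_max, k):
    """Divide the speed and angle intervals evenly and combine them
    into j * k (speed, steering_angle) actions."""
    speeds = [v_min + (v_max - v_min) * i / (j - 1) for i in range(j)]
    angles = [ang_min + (ang_max - ang_min) * i / (k - 1) for i in range(k)]
    return [(v, a) for v in speeds for a in angles]
```

For the embodiment's 10 speeds in [0, 2] m/s and 10 steering wheel angles in [0°, 180°], this yields 100 actions.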
Further, each action a_i in the action set is executed at a preset frequency, which facilitates accurate control. The preset frequency is, for example, 80 Hz.
In any of the embodiments described above, taking grid s_1 as the current grid s_n as an example, obtaining the cost value Q(s_1, a_1) of grid s_1 based on a certain action a_1 in the action set comprises: first, obtaining the initial value of the grid s_2 reached from grid s_1 by executing action a_1, as the current profit R_1 of grid s_1 based on action a_1; then obtaining the maximum value max(Q(s_2, a_i)) among the initial values of the grids s_3 reachable from grid s_2 by executing each action a_i in the action set; and finally obtaining the cost value Q(s_1, a_1) of grid s_1 based on action a_1 according to the current profit R_1 and max(Q(s_2, a_i)). Similarly, the cost value Q(s_1, a_i) of grid s_1 based on each action a_i may be obtained. Likewise, starting from the starting grid, the cost value Q(s_n, a_i) of every current grid s_n in the path area based on each action a_i can be obtained by traversing these steps.
Referring to fig. 3, a specific cycle in the traversal process is as follows: S21: executing an action in the action set from the starting grid to reach the current grid s_n. S22: obtaining the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i, for example in the manner of S221 to S223 in the above embodiment. S23: taking the current grid s_n as the starting grid and cycling, so as to traverse and obtain the cost value of each grid in the path area based on each action a_i. For example, after the cost value Q(s_11, a_i) of a certain grid (e.g. s_11) based on each action a_i is obtained, an action in the action set is randomly executed from grid s_11 to reach another grid (e.g. s_22), and the cost value Q(s_22, a_i) of grid s_22 based on each action a_i is obtained; then an action in the action set is randomly executed from grid s_22 to reach grid s_33, and so on, until the cost value of each grid in the path area based on each action a_i has been obtained by traversal.
In the process of obtaining cost values through traversal, the method may further comprise detecting whether an obstacle is touched; if an obstacle is touched, the action is ended and the traversal is restarted, avoiding errors in the obtained cost values. In a first way, when the current grid s_n is reached, before obtaining the current profit R_n of the current grid s_n based on each action a_i, it is detected whether the current grid s_n is located in the path area, which can be judged from the coordinate values of the current grid s_n. If the current grid s_n is located in the path area, the action is valid, so the current profit R_n continues to be obtained; if the current grid s_n exceeds the range of the path area, the traversal is invalid (an obstacle has been hit), so a new traversal is started from the starting grid, i.e. the process returns to the starting grid for a new round of traversal. In a second detection way, the traversed grid positions are detected in real time during the traversal; the "traversed grid positions" referred to here are not limited to each current grid s_n reached by executing actions at the preset frequency, but include every grid passed through while an action is executed, so that the detection is in real time rather than at the preset frequency as described above. If a traversed grid position is in the path area, the whole process continues; if it exceeds the path area, the process returns to the starting grid and a new round of traversal is started. Similarly, other detection methods may also be employed and are not described here.
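The traverse-update-restart cycle can be sketched as a small training loop; everything here (the dict Q table, the `step` transition callback, `init_value` supplying both the reward and the default for unseen entries, and the episode count) is an illustrative assumption rather than the patent's implementation:

```python
import random

def train(region, start, goal, actions, step, init_value,
          episodes=200, gamma=0.8, seed=0):
    """Traverse from the starting grid, updating Q(s, a); an episode ends
    and restarts from the start whenever the next grid leaves the region."""
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        s = start
        while s != goal:
            a = rng.choice(actions)
            s_next = step(s, a)
            if s_next not in region:      # boundary hit: invalid, restart
                break
            reward = init_value(s_next)   # next grid's initial value as R_n
            future = max(q.get((s_next, b), init_value(s_next))
                         for b in actions)
            q[(s, a)] = reward + gamma * future
            s = s_next
    return q
```

On a toy one-dimensional region {0, 1, 2, 3} with the single action "move right", repeated episodes fill in Q values for every grid on the way to the goal, while any step outside the region aborts the episode.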
Any of the detection methods listed here can be combined with the path following method of any of the above embodiments, so that in the process of traversing to obtain the cost value of each grid in the path area based on each action a_i, every cost-value-obtaining step is guaranteed to be valid.
Further, after step S10 obtains the planned path from the starting grid to the end grid according to the grid map, the step of obtaining the path area by expanding the planned path includes: obtaining the free movement area of the grid map according to the cost map; expanding the planned path by a preset distance toward both sides within the free movement area, generating two boundaries located on the two sides of the planned path and parallel to it; and taking the two boundaries as obstacles and generating the path area between them, wherein each grid in the path area has an initial obstacle-avoidance value.
Specifically, a cost map (Costmap) based on the grid map is established from the point cloud data of the laser radar, and according to the different distributions of the point cloud data the whole grid map is divided into three parts: an obstacle area, a free movement area, and an unknown area. The planned path from the starting grid to the end grid derived by the path planning algorithm (e.g. the A* algorithm) lies entirely within the free movement area of the grid map. Referring to fig. 4, for example, a planned path 10 from a starting grid S_1 to an end grid S_100 is obtained. Then, in the free movement area (the planned path 10 and the subsequently obtained path area are both located in the free movement area, whose boundary is not marked in the figure), point cloud data (Point Cloud) is issued at a distance δ from the planned path 10 on both of its sides, forming two parallel paths 11 parallel to the planned path 10. After cost map processing, the two parallel paths 11 composed of point cloud data form obstacles, and a path area between the two parallel paths 11 is generated in the free movement area, with the two parallel paths 11 treated as the boundary of the path area. The path area is the area between the two parallel paths 11 shown by the bold black lines in the figure, in which each grid has an initial obstacle-avoidance value. The closer a grid is to an obstacle, i.e. to a parallel path 11, the larger its initial value; the closer it is to the planned path 10, the smaller its initial value.
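As a rough sketch, the corridor and its initial obstacle-avoidance values might be generated as below, using Chebyshev grid distance as a stand-in for the patent's point-cloud-based boundary construction; all names and the linear value profile are assumptions:

```python
def build_path_region(path, delta):
    """Corridor of grids within Chebyshev distance delta of the planned
    path; grids at exactly delta form the boundary 'obstacle' (a stand-in
    for the patent's point-cloud parallel paths)."""
    region, boundary = set(), set()
    for (px, py) in path:
        for dx in range(-delta, delta + 1):
            for dy in range(-delta, delta + 1):
                cell = (px + dx, py + dy)
                if max(abs(dx), abs(dy)) < delta:
                    region.add(cell)
                else:
                    boundary.add(cell)
    boundary -= region  # a cell inside the corridor for any path point is free
    return region, boundary

def initial_value(cell, path, delta, v_max=10.0):
    """Initial obstacle-avoidance value: larger near the boundary, smaller
    near the planned path (linear in the distance to the nearest path grid)."""
    d = min(max(abs(cell[0] - px), abs(cell[1] - py)) for (px, py) in path)
    return v_max * min(d, delta) / delta
```

A grid on the planned path gets value 0, a grid on the boundary gets v_max, matching the "smaller near the path, larger near the obstacle" gradient described above.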
The subsequent process of obtaining each grid's cost value based on each action a_i takes the path area as the valid area. Path following must move along the planned path 10 with an allowed following error δ; when a deviation from the planned path 10 touches a parallel path 11, treated as an obstacle, the allowable following error δ is considered exceeded and the vehicle has left the valid path area. The following error δ can be set as desired in different embodiments, for example to 1 cm.
The following describes a specific flow of the path following method with reference to an embodiment. Referring to fig. 5, the path following method in this embodiment includes three stages: an initialization stage S10, a cost value updating stage S20, and a following control stage S30. These correspond respectively to steps S10, S20, and S30 in the foregoing embodiments, and the specific steps and the arrangement of their sub-steps may be selected as needed.
In the initialization stage S10, map information is mainly processed. First, the grid map, the point cloud data acquired by the laser radar, and the planned path obtained by the path planning algorithm are loaded. Then, cost value processing is performed on the grid map through the cost map, dividing it into an obstacle area, a free movement area, and an unknown area. Finally, point cloud data is issued on both sides of the planned path within the free movement area to generate a new free movement area, i.e. the path area, which constrains the tolerable deviation of the unmanned vehicle during path following. Each grid in the path area has an initial obstacle-avoidance value.
In the cost value updating stage S20, the cost value of each grid is updated in combination with the Q-Learning algorithm. The coordinates of each grid of the path area formed in the initialization stage S10 are used as the state value s of the Q-Learning algorithm. The unmanned vehicle's motion speed is divided into 10 equal parts in the interval from 0 m/s to 2 m/s, and the steering wheel angle into 10 equal parts in the interval from 0° to 180°; the motion speed and steering wheel angle are combined, and the resulting actions are used as the action value a of the Q-Learning algorithm. During updating, the initial value of each grid is used as the reward/punishment value of the Q-Learning algorithm, and whether an update round ends is judged according to the boundary of the path area.
The specific updating process is as follows. First, the Q value of each grid in the path area is initialized to the grid's initial value. An action is then randomly selected and executed, and the current grid reached after executing the action and its initial value are obtained. It is then judged whether the current grid is within the range of the path area; if so, the cost value of the current grid based on each action is updated, and after the update, action selection and the whole updating process continue from the current grid. If not, i.e. the boundary of the path area is detected, the process returns to the starting grid, which is equivalent to returning to the initial step of the cost value updating stage S20, and enters the next round of training. In this way, the Q values of different actions are updated for every grid that is visited.
The formula for updating the cost value of the current grid based on each action is: Q(s_n, a_i) = R_n + γ(max(Q(s_{n+1}, a_i))). Here s represents a state, i.e., the coordinate value of a grid; s_n is the coordinate value of the current grid. a represents an action taken from the current state, i.e., an action selected from the action set; a_i denotes each action that the current grid s_n may perform. Q(s_n, a_i) is the cost value of the current grid s_n based on each action a_i. R_n is the reward of this action: taking action a_1 as an example, the initial value of the next grid s_{n+1} that can be reached by performing action a_1 from the current grid s_n serves as the reward of this action, i.e., the current profit R_n of the current grid s_n based on action a_1. Q(s_{n+1}, a_i) refers to the initial values of each next grid s_{n+2} reachable by performing each action a_i from the grid s_{n+1} that was reached by performing action a_1 from the current grid s_n; the largest of these initial values is selected and multiplied by the discount factor γ, and the product serves as the future profit of the current grid s_n based on action a_1. The current profit R_n of the current grid s_n based on action a_1 and the future profit γ(max(Q(s_{n+1}, a_i))) are added together as the cost value Q(s_n, a_1) of the current grid s_n based on action a_1.
Similarly, the cost value of each grid based on each action in the action set is updated through the Q-Learning algorithm, and the cost map of the updated path area is finally output. Each grid in the updated cost map has a cost value for avoiding obstacles based on each action.
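Combining the update rule and the round structure above, the cost value updating stage can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper name `train_cost_map`, the `step` transition function, the episode count, and γ = 0.9 are assumptions; only the update Q(s_n, a_i) = R_n + γ(max(Q(s_{n+1}, a_i))) and the restart-at-boundary behavior follow the description.

```python
import random

def train_cost_map(path_area, initial_value, actions, step,
                   start, gamma=0.9, episodes=200, seed=0):
    """Update Q(s_n, a_i) = R_n + gamma * max(Q(s_{n+1}, a_i)) over the path
    area, restarting from the start grid whenever the boundary is reached.
    `initial_value[s]` is the obstacle-avoidance initial value of grid s;
    `step(s, a)` returns the grid reached by performing action a at grid s."""
    rng = random.Random(seed)
    # Q values start from each grid's initial value, as in stage S20.
    q = {(s, a): initial_value[s] for s in path_area for a in actions}
    for _ in range(episodes):
        s = start
        while True:
            a = rng.choice(actions)        # randomly select and execute an action
            s_next = step(s, a)
            if s_next not in path_area:    # boundary detected: end this round
                break
            # Reward R_n: the initial value of the grid reached by the action.
            r = initial_value[s_next]
            future = max(q[(s_next, b)] for b in actions)
            q[(s, a)] = r + gamma * future
            s = s_next
    return q
```

A usage sketch on a one-dimensional toy path area (grids 0–3, actions "move left" / "move right") shows the table keeps one cost value per (grid, action) pair, bounded by the discounted-reward sum.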
In the follow control stage S30, according to the updated cost map of the path area, the unmanned vehicle can select an optimal action to follow the planned path. Specifically, starting from the starting grid, executing at each current grid s_n the action corresponding to the minimum cost value among the cost values Q(s_n, a_i) achieves accurate following of the planned path.
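The greedy selection of stage S30 can be sketched as below, assuming a Q table produced as above. The helper name, the `step` transition function, and the step cap are assumptions; picking the action with the minimum cost value at each grid follows the description.

```python
def follow_path(q, actions, step, start, goal, path_area, max_steps=100):
    """Follow-control stage S30: starting from the start grid, at each
    current grid execute the action whose cost value Q(s_n, a_i) is minimal,
    stopping at the goal grid or on leaving the path area."""
    s, trajectory = start, [start]
    for _ in range(max_steps):
        if s == goal:
            break
        a = min(actions, key=lambda act: q[(s, act)])  # minimum-cost action
        s = step(s, a)
        if s not in path_area:
            break
        trajectory.append(s)
    return trajectory
```

On a one-dimensional toy path area where "move right" always has the lower cost value, the controller walks straight from the starting grid to the goal grid.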
The above-described embodiments mitigate the problem that conventional control methods rely on parameter tuning, as well as the problem that reinforcement learning using mean curvature as the state input cannot cover all paths. On the basis of the path area generated from the cost map, the cost value of each grid in the path area is updated based on each action through the autonomous learning process of reinforcement learning, so that path following can select the action with the minimum cost value at any time.
An embodiment of the present invention further provides a path following system. As shown in fig. 6, the path following system 60 includes: an initialization module 601, configured to obtain, according to a grid map, a planned path from a starting grid to an end grid and a path area obtained by expanding the planned path, and to obtain an initial value of each grid in the path area for avoiding obstacles; and a cost value updating module 602, configured to traverse from the starting grid to obtain the cost value of each grid in the path area based on each action a_i in the action set, including: obtaining the initial value of the next grid s_{n+1} that can be reached by the current grid s_n performing each action a_i, as the current profit R_n of the current grid s_n based on each action a_i; obtaining the maximum value max(Q(s_{n+1}, a_i)) among the initial values of the next grids s_{n+2} reachable by each next grid s_{n+1} performing each action a_i; and obtaining the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i according to the current profit R_n and the maximum value max(Q(s_{n+1}, a_i)).
Further, the path following system 60 further includes: a follow control module 603, configured to execute, at the current grid s_n, the action corresponding to the minimum cost value among the cost values Q(s_n, a_i).
The initialization module 601, the cost value updating module 602, and the follow control module 603 may be respectively configured to execute step/stage S10, step/stage S20, and step/stage S30 of the foregoing embodiments, so as to establish the path area as a constraint within the allowable error range of the planned path, update the grid cost values in the path area by the Q-Learning algorithm, and perform follow control on the planned path by selecting the action with the minimum cost value at each grid after the update.
Embodiments of the present invention further provide an electronic device, including a processor and a memory, where the memory stores executable instructions, and the processor is configured to execute the steps of the path following method in the foregoing embodiments by executing the executable instructions.
As described above, the electronic device of the present invention can establish a path region within the allowable error range of the planned path, and perform training based on the path region; and updating and training the cost value of each grid in the path area based on each action based on a Q-Learning algorithm, so that the action with the minimum cost value can be selected for each grid after the training is finished to carry out follow control on the planned path.
Fig. 7 is a schematic structural diagram of an electronic device in an embodiment of the present invention. It should be understood that fig. 7 only schematically illustrates the various modules; these modules may be virtual software modules or actual hardware modules, and combinations, splits, and additions of these modules all fall within the protection scope of the present invention.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Accordingly, various aspects of the present invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "platform."
As shown in fig. 7, electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: at least one processing unit 710, at least one memory unit 720, a bus 730 connecting the different platform components (including memory unit 720 and processing unit 710), a display unit 740, etc.
Wherein the storage unit stores program code which can be executed by the processing unit 710 such that the processing unit 710 performs the steps according to various exemplary embodiments of the present invention as described in the path following method section above in this description. For example, the processing unit 710 may perform the steps shown in fig. 1-3 and fig. 5, respectively.
The storage unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 7201 and/or a cache memory unit 7202, and may further include a read-only memory unit (ROM) 7203.
The storage unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 730 may be any representation of one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 700 may also communicate with one or more external devices 800 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 700 to communicate with one or more other computing devices. Such communication may be through input/output (I/O) interface 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 760. The network adapter 760 may communicate with other modules of the electronic device 700 via the bus 730. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
An embodiment of the present invention further provides a computer-readable storage medium, which is used for storing a program, and when the program is executed, the steps of the path following method of the above-mentioned embodiment are implemented. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps of the various exemplary embodiments described in the above-mentioned path following method section of the present description, when the program product is run on the terminal device.
As described above, the computer-readable storage medium of the present invention can establish a path region within a planned path allowable error range, and perform training based on the path region; and updating and training the cost value of each grid in the path area based on each action based on a Q-Learning algorithm, so that the action with the minimum cost value can be selected for each grid after the training is finished to carry out follow control on the planned path.
Fig. 8 is a schematic structural diagram of a computer-readable storage medium of the present invention. Referring to fig. 8, a program product 900 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this respect, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable storage medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable storage medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In situations involving remote computing devices, the remote computing devices may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to external computing devices (e.g., through the internet using an internet service provider).
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, numerous simple deductions or substitutions may be made without departing from the spirit of the invention, which shall be deemed to belong to the scope of the invention.

Claims (13)

1. A path following method, comprising the steps of:
according to the grid map, obtaining a planned path from a starting grid to an end grid and a path area obtained based on planned path expansion, and obtaining an initial value of each grid for avoiding obstacles in the path area;
the step of obtaining the path area based on the planned path expansion comprises the following steps: obtaining a free movement area of the grid map according to the cost map, expanding preset distances from the planning path to two sides in the free movement area, and generating two boundaries which are positioned at two sides of the planning path and are parallel to the planning path; generating a path area between the two boundaries by taking the two boundaries as obstacles, wherein each grid in the path area has an initial value for avoiding the obstacles;
traversing from the starting grid to obtain the cost value of each grid in the path area based on each action a_i in the action set, including:
obtaining the initial value of the next grid s_{n+1} that can be reached by the current grid s_n performing each action a_i, as the current profit R_n of the current grid s_n based on each action a_i;
obtaining the maximum value max(Q(s_{n+1}, a_i)) among the initial values of the next grids s_{n+2} reachable by each next grid s_{n+1} performing each action a_i;
obtaining the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i according to the current profit R_n and the maximum value max(Q(s_{n+1}, a_i)).
2. The path following method according to claim 1, further comprising the steps of:
executing, at the current grid s_n, the action corresponding to the minimum cost value among the cost values Q(s_n, a_i).
3. The path following method according to claim 1, wherein, in obtaining the cost value Q(s_n, a_i), the cost value Q(s_n, a_i) is obtained according to the current profit R_n and the future profit W_n of the current grid s_n based on each action a_i, where the future profit W_n = γ(max(Q(s_{n+1}, a_i))), and γ is a discount factor representing the degree to which current profit is sacrificed in exchange for future profit, 0 < γ < 1.
4. The path following method according to claim 3, wherein the formula for obtaining the cost value Q(s_n, a_i) is: Q(s_n, a_i) = R_n + γ(max(Q(s_{n+1}, a_i))).
5. The path following method according to claim 1, wherein obtaining the cost value Q(s_n, a_i) further comprises the steps of:
detecting whether the current grid s_n is within the path area; if so, obtaining the current profit R_n; if not, traversing the grids again from the starting point.
6. The path following method according to claim 1, wherein the step of traversing to obtain the cost value of each grid within the path area based on each action a_i further comprises:
detecting the position of each traversed grid in real time, and traversing the grids again from the starting point when the position of a traversed grid exceeds the path area.
7. The path following method according to claim 1, wherein the step of traversing from the starting grid to obtain the cost value of each grid in the path area based on each action a_i comprises:
executing any action in the action set from the starting grid to reach the current grid s_n;
obtaining the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i; and
taking the current grid s_n as the starting grid, and traversing to obtain the cost value of each grid in the path area based on each action a_i.
8. The path following method according to claim 1, wherein each action a_i in the action set is performed at a preset frequency.
9. The path following method according to claim 1, wherein the action set includes j × k actions, 1 ≦ i ≦ j × k, j being the number of different speeds in a range from a minimum speed to a maximum speed, and k being the number of different angles in a range from a minimum angle to a maximum angle.
10. A path following system, comprising:
the initialization module obtains a planned path from a starting point grid to an end point grid and a path area obtained based on planned path expansion according to a grid map, and obtains an initial value of each grid in the path area for avoiding obstacles;
the initialization module obtains a path area based on planned path expansion, and the method comprises the following steps: obtaining a free movement area of the grid map according to the cost map, expanding preset distances from the planning path to two sides in the free movement area, and generating two boundaries which are positioned at two sides of the planning path and are parallel to the planning path; generating a path area between the two boundaries by taking the two boundaries as obstacles, wherein each grid in the path area has an initial value for avoiding the obstacles;
a training module, which traverses from the starting grid to obtain the cost value of each grid in the path area based on each action a_i in the action set, including:
obtaining the initial value of the next grid s_{n+1} that can be reached by the current grid s_n performing each action a_i, as the current profit R_n of the current grid s_n based on each action a_i;
obtaining the maximum value max(Q(s_{n+1}, a_i)) among the initial values of the next grids s_{n+2} reachable by each next grid s_{n+1} performing each action a_i;
obtaining the cost value Q(s_n, a_i) of the current grid s_n based on each action a_i according to the current profit R_n and the maximum value max(Q(s_{n+1}, a_i)).
11. The path following system of claim 10, further comprising:
a follow control module, configured to execute, at the current grid s_n, the action corresponding to the minimum cost value among the cost values Q(s_n, a_i).
12. An electronic device comprising a processor and a memory, wherein the memory is configured to store executable instructions, and wherein the processor is configured to perform the steps of the path following method of any of claims 1 to 9 via execution of the executable instructions.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the path following method according to any one of claims 1 to 9.
CN201811191784.5A 2018-10-12 2018-10-12 Path following method, system, electronic device and storage medium Active CN110362070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811191784.5A CN110362070B (en) 2018-10-12 2018-10-12 Path following method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811191784.5A CN110362070B (en) 2018-10-12 2018-10-12 Path following method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN110362070A CN110362070A (en) 2019-10-22
CN110362070B true CN110362070B (en) 2022-09-06

Family

ID=68214767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811191784.5A Active CN110362070B (en) 2018-10-12 2018-10-12 Path following method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110362070B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113124849B (en) * 2019-12-30 2023-11-14 广东博智林机器人有限公司 Indoor path planning method and device, electronic equipment and storage medium
CN111442777A (en) * 2020-04-02 2020-07-24 东软睿驰汽车技术(沈阳)有限公司 Path planning method and device, electronic equipment and storage medium
CN111781925A (en) * 2020-06-22 2020-10-16 北京海益同展信息科技有限公司 Path planning method and device, robot and storage medium
CN113607161B (en) * 2021-09-30 2022-03-18 深圳市普渡科技有限公司 Robot navigation path width acquisition system, method, robot and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8224516B2 (en) * 2009-12-17 2012-07-17 Deere & Company System and method for area coverage using sector decomposition
CN106295164B (en) * 2016-08-05 2018-12-04 中国兵器科学研究院 A kind of paths planning method and electronic equipment
CN107480638B (en) * 2017-08-16 2020-06-30 北京京东尚科信息技术有限公司 Vehicle obstacle avoidance method, controller, device and vehicle
CN108549378B (en) * 2018-05-02 2021-04-20 长沙学院 Mixed path planning method and system based on grid map
CN108549388A (en) * 2018-05-24 2018-09-18 苏州智伟达机器人科技有限公司 A kind of method for planning path for mobile robot based on improvement A star strategies

Also Published As

Publication number Publication date
CN110362070A (en) 2019-10-22

Similar Documents

Publication Publication Date Title
CN110362070B (en) Path following method, system, electronic device and storage medium
US20220107647A1 (en) Speed planning method and apparatus, electronic device and storage medium
CN110260867B (en) Method, equipment and device for determining and correcting neutral position in robot navigation
US10816977B2 (en) Path and speed optimization fallback mechanism for autonomous vehicles
Mercy et al. Spline-based motion planning for autonomous guided vehicles in a dynamic environment
US10823575B2 (en) Reference line smoothing method using piecewise spiral curves with weighted geometry costs
US10429849B2 (en) Non-linear reference line optimization method using piecewise quintic polynomial spiral paths for operating autonomous driving vehicles
US10053091B2 (en) Spring system-based change lane approach for autonomous vehicles
EP3327464A1 (en) Algorithm and infrastructure for robust and efficient vehicle localization
JP2019202764A (en) Method for fixing traffic lane changing locus of automatic driving vehicle
US20180253647A1 (en) Offline combination of convolutional/deconvolutional and batch-norm layers of convolutional neural network models for autonomous driving vehicles
US20180364657A1 (en) Method and system for determining optimal coefficients of controllers for autonomous driving vehicles
WO2021169993A1 (en) Method for constructing self-driving map and related device
JP2023546810A (en) Vehicle trajectory planning method, vehicle trajectory planning device, electronic device, and computer program
CN109974699B (en) Robot and map autonomous exploration method and device thereof
JP2019135145A (en) Driving trajectory generating method, system and machine-readable medium for autonomous driving vehicle
CN110647151A (en) Coordinate conversion method and device, computer readable storage medium and electronic equipment
CN114506343A (en) Trajectory planning method, device, equipment, storage medium and automatic driving vehicle
CN114829225A (en) Conditional behavior prediction for autonomous vehicles
CN113848893A (en) Robot navigation method, device, equipment and storage medium
CN113848899A (en) Vehicle transverse control method, device, equipment and storage medium
JP2022081613A (en) Method, apparatus, equipment, medium and computer program for identifying characteristic of automatic operation
CN114741790A (en) Vehicle path tracking control method and device, storage medium and electronic equipment
CN114571460A (en) Robot control method, device and storage medium
CN113619604A (en) Integrated decision and control method and device for automatic driving automobile and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant