CN109405843A - Path planning method and apparatus, and mobile device - Google Patents

Path planning method and apparatus, and mobile device

Info

Publication number
CN109405843A
CN109405843A (application CN201811105686.5A)
Authority
CN
China
Prior art keywords
value
sample point
state sample
path planning
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811105686.5A
Other languages
Chinese (zh)
Other versions
CN109405843B (en)
Inventor
钱德恒
任冬淳
丁曙光
付圣
韩勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201811105686.5A priority Critical patent/CN109405843B/en
Publication of CN109405843A publication Critical patent/CN109405843A/en
Application granted granted Critical
Publication of CN109405843B publication Critical patent/CN109405843B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00, specially adapted for navigation in a road network
    • G01C21/34 - Route searching; Route guidance
    • G01C21/3446 - Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags, using precalculated routes

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present application provides a path planning method and apparatus, a mobile device, and a computer-readable storage medium. The path planning method includes: sampling the current environment according to an initial sampling strategy to obtain a plurality of state sample points; obtaining a first value of each state sample point based on a first path planning algorithm; obtaining a second value of each state sample point based on a second path planning algorithm; performing a weighted summation of the first value and the second value to obtain the value of each state sample point; and determining a driving path plan based on the current value of each state sample point. The embodiments combine two path planning algorithms to determine the current driving path, which both adapts to complex driving environments, narrowing the gap with the manipulation behavior of human operators, and reduces the amount of operation data that needs to be recorded, so that the determined driving path is more reasonable.

Description

Path planning method and apparatus, and mobile device
Technical field
This application relates to path planning technology, and in particular to a path planning method and apparatus, a mobile device, and a computer-readable storage medium.
Background technique
With the development of computer technology and artificial intelligence, the unmanned vehicle has become an important research direction and hotspot in the field of robotics. The path planning and control strategy of an unmanned vehicle refers to the strategy by which the vehicle selects its own actions in various states. The actions of an unmanned vehicle include accelerating, decelerating, steering, honking, switching lights on or off, and so on. At present there are two main classes of methods for unmanned-vehicle path planning and control: methods based on heuristic rules, and methods based on expert demonstration.
Methods based on heuristic rules constrain the path planning and control of the unmanned vehicle through manually formulated rules, which engineers derive from common sense and intuition. For example, one rule may keep the unmanned vehicle as close to the center of the lane as possible, and another may keep it as far from obstacles as possible.
Methods based on expert demonstration record a large amount of path planning and control data produced by human operators while driving, and then let a computer learn from these data to imitate the planning and control operations made by humans, so that the computer finally learns to plan paths for and control the unmanned vehicle.
However, methods based on heuristic rules sometimes struggle to adapt to complex driving environments, and the gap between the rule-derived path planning and control strategy and the manipulation behavior of human operators is large. Methods based on expert demonstration, in turn, require recording large amounts of operation data, which consumes substantial funds, time, and other resources.
Summary of the invention
In view of this, the present application provides a path planning method and apparatus, a mobile device, and a computer-readable storage medium.
Specifically, the present application is achieved through the following technical solutions:
According to a first aspect of the embodiments of the present disclosure, a path planning method is provided, the method comprising:
sampling the current environment according to an initial sampling strategy to obtain a plurality of state sample points;
obtaining a first value of each state sample point based on a first path planning algorithm;
obtaining a second value of each state sample point based on a second path planning algorithm;
performing a weighted summation of the first value and the second value to obtain the value of each state sample point;
determining a current driving path plan based on the current value of each state sample point.
In one embodiment, determining the driving path plan based on the current value of each state sample point includes:
if the current sampling strategy does not satisfy a convergence condition, updating the sampling strategy, sampling according to the updated sampling strategy, and continuing to perform the operations of obtaining the first value of each state sample point based on the first path planning algorithm and obtaining the second value of each state sample point based on the second path planning algorithm, until the current sampling strategy converges;
if the current sampling strategy satisfies the convergence condition, determining the maximum-value path in the current environment according to the current value of each state sample point, and taking the maximum-value path as the current driving path.
In one embodiment, the convergence condition means that the sampling density of the current sampling strategy is proportional to the value estimate of each state sample point.
In one embodiment, updating the sampling strategy includes:
updating the current sampling density of each state sample point according to a Gaussian mixture model.
In one embodiment, before obtaining the first value of each state sample point based on the first path planning algorithm, the method further includes:
training a state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm.
In one embodiment, training the state value function corresponding to the first path planning algorithm by the inverse reinforcement learning algorithm includes:
training the state value function corresponding to the first path planning algorithm by the inverse reinforcement learning algorithm, according to first path planning data corresponding to the first path planning algorithm and second path planning data determined based on the second path planning algorithm.
In one embodiment, after determining the maximum-value path in the current environment according to the current value of each state sample point, the method further includes:
adding the maximum-value path in the current environment to the first path planning data, for updating the state value function.
According to a second aspect of the embodiments of the present disclosure, a path planning apparatus is provided, the apparatus comprising:
a sampling module, configured to sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points;
a first acquisition module, configured to obtain, based on a first path planning algorithm, the first value of each state sample point obtained by the sampling module;
a second acquisition module, configured to obtain, based on a second path planning algorithm, the second value of each state sample point obtained by the sampling module;
a weighted summation module, configured to perform a weighted summation of the first value and the second value obtained by the acquisition modules, to obtain the value of each state sample point;
a determination module, configured to determine a driving path plan based on the current value of each state sample point obtained by the weighted summation module.
In one embodiment, the determination module includes:
a processing submodule, configured to, if the current sampling strategy does not satisfy a convergence condition, update the sampling strategy, sample according to the updated sampling strategy, and continue to perform the operations of obtaining the first value of each state sample point based on the first path planning algorithm and obtaining the second value of each state sample point based on the second path planning algorithm, until the current sampling strategy converges;
a determination submodule, configured to, if the current sampling strategy satisfies the convergence condition, determine the maximum-value path in the current environment according to the current value of each state sample point, and take the maximum-value path as the current driving path.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, the storage medium storing a computer program for executing the above path planning method.
According to a fourth aspect of the embodiments of the present disclosure, a mobile device is provided, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above path planning method when executing the computer program.
In the embodiments of the present application, the current environment is sampled according to an initial sampling strategy to obtain a plurality of state sample points; the first value and the second value of each state sample point are obtained by a first path planning algorithm and a second path planning algorithm respectively; a weighted summation of the first value and the second value yields the value of each state sample point; and a driving path plan is then determined according to the current value of each state sample point. Combining two path planning algorithms to determine the current driving path both adapts to complex driving environments, narrowing the gap with the manipulation behavior of human operators, and reduces the amount of operation data that needs to be recorded, so that the determined driving path is more reasonable.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Detailed description of the invention
The drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a path planning method according to an exemplary embodiment of the present application;
Fig. 2 is a flowchart of another path planning method according to an exemplary embodiment of the present application;
Fig. 3 is a flowchart of another path planning method according to an exemplary embodiment of the present application;
Fig. 4 is a flowchart of another path planning method according to an exemplary embodiment of the present application;
Fig. 5 is a hardware structure diagram of a mobile device in which the path planning apparatus of the present application is located;
Fig. 6 is a block diagram of a path planning apparatus according to an exemplary embodiment of the present application;
Fig. 7 is a block diagram of another path planning apparatus according to an exemplary embodiment of the present application.
Specific embodiment
Exemplary embodiments will be described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with some aspects of the present application, as detailed in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms "a", "said", and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the present application to describe various information, the information should not be limited by these terms; they are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Fig. 1 is a flowchart of a path planning method according to an exemplary embodiment of the present application. The method can be applied to a mobile device, which may include, but is not limited to, an unmanned vehicle. As shown in Fig. 1, the method includes:
Step S100: sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
The initial sampling strategy may be uniform sampling. For example, the current road environment may be sampled uniformly to obtain a plurality of state sample points.
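The uniform initial sampling described above can be sketched as follows. This is a minimal illustration only: the rectangular road bounds, the sample count, and a 2-D position-only state are assumptions, since the patent does not fix the state representation.

```python
import random

def uniform_sample(x_range, y_range, n_points, seed=None):
    """Uniformly sample n_points state sample points from a rectangular region."""
    rng = random.Random(seed)
    return [(rng.uniform(*x_range), rng.uniform(*y_range)) for _ in range(n_points)]

# Hypothetical 50 m x 8 m road segment, 100 state sample points.
points = uniform_sample((0.0, 50.0), (-4.0, 4.0), 100, seed=0)
```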
Step S101 obtains corresponding first value of each state sample point based on first path planning algorithm.
Step S102 obtains corresponding second value of each state sample point based on the second path planning algorithm.
Wherein, first path planning algorithm can be but be not limited to expert's exemplary algorithm, and the second path planning algorithm can Think but is not limited to heuristic rule algorithm.
In this embodiment it is possible to based on the state value function corresponding with first path planning algorithm trained in advance Obtain corresponding first value of each state sample point.Likewise it is possible to be calculated based on what is trained in advance with the second path planning The corresponding state value function of method obtains corresponding second value of each state sample point.
It should be noted that not stringent successive of above-mentioned steps S101 and step S102 executes sequence, it can first hold Row step S101, it is rear to execute step S102, step S102 can also be first carried out, it is rear to execute step S101.
Step S103: perform a weighted summation of the first value and the second value to obtain the value of each state sample point.
Assuming the first value of each state sample point is denoted V_irl and the second value is denoted V_obj, the value of each state sample point can be calculated by the following formula:
Vs = 1/Z * (V_irl + lambda * V_obj)
where Z is a normalization constant and lambda is a weight that balances the expert demonstration algorithm and the heuristic rule algorithm.
It should be noted that lambda can change dynamically. For example, at the beginning, the second weight corresponding to the second value may be larger; over time, as more and more driver training data accumulates, reliance on these data can be increased and reliance on the rules reduced, so that the first weight corresponding to the first value grows. It follows that, as the accumulated data increases, lambda can gradually decrease.
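The weighted summation and the dynamically shrinking lambda can be sketched as below. The `decayed_lambda` schedule is a hypothetical choice for illustration; the patent only says that lambda decreases as demonstration data accumulates, without fixing a formula, and Z is treated here as a given normalization constant.

```python
def combined_value(v_irl, v_obj, lam, z):
    """Vs = 1/Z * (V_irl + lambda * V_obj) for each state sample point."""
    return [(vi + lam * vo) / z for vi, vo in zip(v_irl, v_obj)]

def decayed_lambda(lam0, n_demonstrations, decay=0.01):
    """Hypothetical schedule: lambda shrinks as demonstration data accumulates."""
    return lam0 / (1.0 + decay * n_demonstrations)

v_irl = [2.0, 1.0, 3.0]   # first values (expert demonstration / IRL)
v_obj = [1.0, 4.0, 0.5]   # second values (heuristic rules)
vs = combined_value(v_irl, v_obj, lam=0.5, z=1.0)
```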
In this embodiment, by performing a weighted summation of the first value and the second value, the value of each state sample point is obtained, thereby achieving a comprehensive valuation of each state sample point.
Step S104: determine a driving path plan based on the current value of each state sample point.
In this embodiment, the purpose of path planning is to find a good path in the current environment, and the quality of a path depends on the values of the state sample points along it. Once the value of each state sample point has been determined, the path with the highest overall value can be found, which is exactly the path along which the unmanned vehicle is controlled to travel. Therefore, in this embodiment, the maximum-value path in the current environment is determined according to the current value of each state sample point, and the maximum-value path is taken as the current driving path.
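Selecting the maximum-value path can be sketched as follows, assuming a set of candidate paths is already available, each given as a list of state-sample-point identifiers. How candidates are enumerated is not specified by the patent, so the candidate list here is illustrative.

```python
def path_value(path, point_values):
    """Total value of a path, summed over the state sample points it visits."""
    return sum(point_values[p] for p in path)

def max_value_path(candidate_paths, point_values):
    """Pick the candidate path whose state sample points have the highest total value."""
    return max(candidate_paths, key=lambda path: path_value(path, point_values))

point_values = {"a": 1.0, "b": 3.0, "c": 0.5, "d": 2.0}
candidates = [["a", "b"], ["a", "c", "d"], ["c", "d"]]
best = max_value_path(candidates, point_values)  # ["a", "b"], total value 4.0
```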
In the above embodiment, the current environment is sampled according to an initial sampling strategy to obtain a plurality of state sample points; the first value and the second value of each state sample point are obtained by the first path planning algorithm and the second path planning algorithm respectively; a weighted summation of the first value and the second value yields the value of each state sample point; and a driving path plan is then determined based on the current value of each state sample point. Combining two path planning algorithms to determine the current driving path both adapts to complex driving environments, narrowing the gap with the manipulation behavior of human operators, and reduces the amount of operation data that needs to be recorded, so that the determined driving path is more reasonable.
Fig. 2 is a flowchart of another path planning method according to an exemplary embodiment of the present application. As shown in Fig. 2, the method includes:
Step S201: sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
Step S202: obtain the first value of each state sample point based on a first path planning algorithm, and obtain the second value of each state sample point based on a second path planning algorithm.
Step S203: perform a weighted summation of the first value and the second value to obtain the value of each state sample point.
Step S204: judge whether the current sampling strategy satisfies a convergence condition; if so, execute step S205; if not, execute step S206.
The convergence condition may mean that the sampling density of the current sampling strategy is proportional to the value estimate of each state sample point.
For example, if the value estimate of state sample point 1 is 10, the sampling density around state sample point 1 is 10; if the value estimate of state sample point 2 is 5, the sampling density around state sample point 2 is 5.
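The value-proportional convergence condition can be expressed as a small check: normalize the value estimates into a target density and compare it with the current sampling density. The tolerance is an illustrative choice.

```python
def target_density(values):
    """Sampling density proportional to the value estimates (normalized to sum to 1)."""
    total = sum(values)
    return [v / total for v in values]

def is_converged(density, values, tol=1e-6):
    """Convergence condition: the current density matches the value-proportional target."""
    return all(abs(d - t) < tol for d, t in zip(density, target_density(values)))

values = [10.0, 5.0]               # value estimates of sample points 1 and 2
density = target_density(values)   # twice the density near point 1 as near point 2
```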
Step S205: determine the maximum-value path in the current environment according to the current value of each state sample point, take the maximum-value path as the current driving path, and end the operation.
Step S206: update the sampling strategy, sample according to the updated sampling strategy, and continue to execute step S202.
Here, updating the sampling strategy means updating the sampling density around each state sample point.
In this embodiment, the current sampling density of each state sample point may be updated according to a Gaussian mixture model (GMM): the value estimates of the state sample points are fitted with a GMM, which correspondingly yields a new sampling density.
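A simplified sketch of the GMM-based density update: one fixed-width Gaussian component is placed at each state sample point and weighted by its normalized value estimate, and new sample points are drawn from the resulting mixture. This is a 1-D illustration with an assumed fixed sigma; the patent only states that the value estimates are fitted with a GMM, without fixing the number of components or their parameters.

```python
import math
import random

def gmm_density(x, centers, weights, sigma=1.0):
    """Density of a 1-D Gaussian mixture with one component per state sample point."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return sum(w * math.exp(-0.5 * ((x - c) / sigma) ** 2) / norm
               for c, w in zip(centers, weights))

def resample(centers, values, n, sigma=1.0, seed=0):
    """Draw n new sample points: choose a component with probability proportional
    to its value estimate, then perturb it with Gaussian noise."""
    rng = random.Random(seed)
    total = sum(values)
    weights = [v / total for v in values]
    return [rng.gauss(rng.choices(centers, weights=weights)[0], sigma)
            for _ in range(n)]

centers = [0.0, 10.0]   # current state sample points (1-D for illustration)
values = [10.0, 5.0]    # their value estimates
new_points = resample(centers, values, 300)  # denser near the higher-value point
```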
In the above embodiment, when the current sampling strategy does not satisfy the convergence condition, the sampling strategy is updated until it does. State sample points are thereby sampled according to their value, which increases the sampling density near high-value state sample points and achieves a fine adjustment of the path planning.
Fig. 3 is a flowchart of another path planning method according to an exemplary embodiment of the present application. As shown in Fig. 3, the method includes:
Step S300: train a state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm.
In order to reduce the amount of first path planning data (such as expert demonstration data) that must be collected, in this embodiment second path planning data determined based on the second path planning algorithm (such as the heuristic rule algorithm) may be added to the existing first path planning data, and the state value function corresponding to the first path planning algorithm is then trained by the inverse reinforcement learning algorithm.
The state value function is a mapping from a state to the first value of that state: its input is a state, and its output is the first value of that state. The state value function is used to output the first value of a given state.
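As a sketch, the state value function can be any model that maps a state (here, a feature vector) to a scalar first value. The linear form and the two hypothetical features below are illustrative assumptions; the patent does not fix the model class or the features.

```python
def make_value_function(weights):
    """Build a state value function: maps a state (feature vector) to its first value."""
    def v(state):
        return sum(w * f for w, f in zip(weights, state))
    return v

# Hypothetical features: [distance to lane center, clearance to nearest obstacle].
v_irl = make_value_function(weights=[-1.0, 0.5])
score = v_irl([0.2, 4.0])  # -1.0*0.2 + 0.5*4.0 = 1.8
```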
Step S301: sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
Steps S300 and S301 have no strict execution order: step S300 may be executed first and step S301 second, or step S301 first and step S300 second.
Step S302: obtain the first value of each state sample point based on the first path planning algorithm, and obtain the second value of each state sample point based on the second path planning algorithm.
Here, the first value may be obtained by the state value function corresponding to the first path planning algorithm trained in step S300, and the second value of each state sample point is obtained by the second path planning algorithm.
Step S303: perform a weighted summation of the first value and the second value to obtain the value of each state sample point.
Step S304: determine the maximum-value path in the current environment according to the current value of each state sample point, and take the maximum-value path as the current driving path.
The above embodiment trains the state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm, providing the basis for subsequently obtaining the first value.
Fig. 4 is a flowchart of another path planning method according to an exemplary embodiment of the present application. As shown in Fig. 4, the method includes:
Step S400: train a state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm, according to first path planning data corresponding to the first path planning algorithm and second path planning data determined based on the second path planning algorithm.
Step S401: sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
Step S402: obtain the first value of each state sample point based on the first path planning algorithm, and obtain the second value of each state sample point based on the second path planning algorithm.
Step S403: perform a weighted summation of the first value and the second value to obtain the value of each state sample point.
Step S404: determine the maximum-value path in the current environment according to the current value of each state sample point, and take the maximum-value path as the current driving path.
Step S405: add the maximum-value path in the current environment to the first path planning data, for updating the state value function.
In this embodiment, after the maximum-value path in the current environment has been found, it can be added to the first path planning data for subsequent updating of the state value function, so that the values determined by the updated state value function are more accurate.
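The data flow of step S405 can be sketched as below. The `naive_retrain` function is only a stand-in that scores states by visit frequency; the actual update of the state value function uses the inverse reinforcement learning algorithm described above.

```python
from collections import Counter

def update_demonstrations(first_path_data, new_max_value_path):
    """Append the newly found maximum-value path to the first path planning data."""
    return first_path_data + [new_max_value_path]

def naive_retrain(first_path_data):
    """Stand-in for retraining: score each state by how often demonstrations visit it."""
    counts = Counter(s for path in first_path_data for s in path)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

demos = [["s0", "s1"], ["s0", "s2"]]           # existing first path planning data
demos = update_demonstrations(demos, ["s0", "s3"])
state_values = naive_retrain(demos)            # "s0" is visited most, scored highest
```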
In the above embodiment, the maximum-value path in the current environment is added to the first path planning data for updating the state value function, so that the values determined by the updated state value function are more accurate.
Corresponding to the above embodiments of the path planning method, the present application also provides embodiments of a path planning apparatus.
The embodiments of the path planning apparatus of the present application can be applied to a mobile device, which may be an unmanned vehicle. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Fig. 5 shows a hardware structure diagram of a mobile device 500 in which a path planning apparatus of the present application is located. The mobile device includes a processor 510, a memory 520, and a computer program stored on the memory 520 and executable on the processor 510; the processor 510 implements the above path planning method when executing the computer program. In addition to the processor 510 and memory 520 shown in Fig. 5, the mobile device in which the apparatus is located may also include other hardware according to the actual function of the path planning, which will not be described here.
Fig. 6 is a block diagram of a path planning apparatus according to an exemplary embodiment of the present application. As shown in Fig. 6, the path planning apparatus includes a sampling module 60, a first acquisition module 61, a second acquisition module 62, a weighted summation module 63, and a determination module 64.
The sampling module 60 is configured to sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
The initial sampling strategy may be uniform sampling. For example, the current road environment may be sampled uniformly to obtain a plurality of state sample points.
The first acquisition module 61 is configured to obtain, based on a first path planning algorithm, the first value of each state sample point obtained by the sampling module 60.
The second acquisition module 62 is configured to obtain, based on a second path planning algorithm, the second value of each state sample point obtained by the sampling module 60.
The first path planning algorithm may be, but is not limited to, an expert demonstration algorithm; the second path planning algorithm may be, but is not limited to, a heuristic rule algorithm.
In this embodiment, the first value of each state sample point may be obtained based on a pre-trained state value function corresponding to the first path planning algorithm. Likewise, the second value of each state sample point may be obtained based on a pre-trained state value function corresponding to the second path planning algorithm.
The weighted summation module 63 is configured to perform a weighted summation of the first value obtained by the first acquisition module 61 and the second value obtained by the second acquisition module 62, to obtain the value of each state sample point.
Assuming the first value of each state sample point is denoted V_irl and the second value is denoted V_obj, the value of each state sample point can be calculated by the following formula:
Vs = 1/Z * (V_irl + lambda * V_obj)
where Z is a normalization constant and lambda is a weight that balances the expert demonstration algorithm and the heuristic rule algorithm.
It should be noted that lambda can change dynamically. For example, at the beginning, the second weight corresponding to the second value may be larger; over time, as more and more driver training data accumulates, reliance on these data can be increased and reliance on the rules reduced, so that the first weight corresponding to the first value grows. It follows that, as the accumulated data increases, lambda can gradually decrease.
In this embodiment, by performing a weighted summation of the first value and the second value, the value of each state sample point is obtained, thereby achieving a comprehensive valuation of each state sample point.
The determination module 64 is configured to determine a driving path plan based on the current value of each state sample point obtained by the weighted summation module 63.
In this embodiment, the purpose of path planning is to find a good path in the current environment, and the quality of a path depends on the values of the state sample points along it. Once the value of each state sample point has been determined, the path with the highest overall value can be found, which is exactly the path along which the unmanned vehicle is controlled to travel. Therefore, in this embodiment, the maximum-value path in the current environment is determined according to the current value of each state sample point, and the maximum-value path is taken as the current driving path.
In the above embodiment, the current environment is sampled according to an initial sampling strategy to obtain a plurality of state sample points; the first value and the second value of each state sample point are obtained by the first path planning algorithm and the second path planning algorithm respectively; a weighted summation of the first value and the second value yields the value of each state sample point; and a driving path plan is then determined according to the current value of each state sample point. Combining two path planning algorithms to determine the current driving path both adapts to complex driving environments, narrowing the gap with the manipulation behavior of human operators, and reduces the amount of operation data that needs to be recorded, so that the determined driving path is more reasonable.
Fig. 7 is a block diagram of another path planning apparatus shown in an exemplary embodiment of the present application. As shown in Fig. 7, on the basis of the embodiment shown in Fig. 6, the determining module 64 includes a processing submodule 641 and a determining submodule 642.
The processing submodule 641 is configured to, if the current sampling policy does not satisfy the convergence condition, update the sampling policy, sample according to the updated sampling policy, and continue to perform the operations of obtaining the first value corresponding to each state sample point based on the first path planning algorithm and obtaining the second value corresponding to each state sample point based on the second path planning algorithm, until the current sampling policy converges.
Here, the convergence condition may be that the sampling density corresponding to the current sampling policy is proportional to the valuation corresponding to each state sample point.
For example, if the valuation corresponding to state sample point 1 is 10, the sampling density around state sample point 1 is 10; if the valuation corresponding to state sample point 2 is 5, the sampling density around state sample point 2 is 5.
Here, updating the sampling policy means updating the sampling density around each state sample point.
In this embodiment, the sampling density corresponding to each current state sample point may be updated according to a Gaussian mixture model (GMM): the valuations of the state sample points are fitted with a GMM, and the fitted model accordingly yields the new sampling density.
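The idea of making the sampling density proportional to the valuation can be illustrated with a simplified stand-in for the GMM update (not the patent's exact procedure): draw new state sample points from a mixture of Gaussians centered on the current points, with mixture weights proportional to each point's valuation, so high-value regions are sampled more densely. All names below are hypothetical:

```python
import random

def resample(points, valuations, n_new, sigma=0.5, rng=None):
    """Draw n_new points from a valuation-weighted mixture of Gaussians.

    points: list of (x, y) state sample points.
    valuations: matching list of valuations; higher valuation means a
    proportionally higher chance of sampling near that point.
    """
    rng = rng or random.Random(0)
    total = sum(valuations)
    weights = [v / total for v in valuations]
    new_points = []
    for _ in range(n_new):
        # pick a mixture component with probability proportional to valuation
        cx, cy = rng.choices(points, weights=weights, k=1)[0]
        # jitter around the chosen point (Gaussian component, std = sigma)
        new_points.append((cx + rng.gauss(0, sigma), cy + rng.gauss(0, sigma)))
    return new_points

pts = [(0.0, 0.0), (5.0, 5.0)]
vals = [10.0, 5.0]  # density near point 1 should be about twice point 2's
samples = resample(pts, vals, n_new=3000)
near_first = sum(1 for x, y in samples if abs(x) < 2.5 and abs(y) < 2.5)
```

A full implementation would instead fit the mixture parameters to the weighted samples (e.g. with an EM-based GMM library) and sample from the fitted model; the weighted-resampling shortcut above keeps the sketch dependency-free.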
The determining submodule 642 is configured to, if the current sampling policy satisfies the convergence condition, determine the maximum-value path in the current environment according to the value corresponding to each current state sample point, and take the maximum-value path as the current driving path.
In the above embodiment, when the current sampling policy does not satisfy the convergence condition, the sampling policy is updated until the updated sampling policy satisfies the convergence condition. State sample points are thereby sampled according to their value, which increases the sampling density near high-value state sample points and realizes fine adjustment of the path plan.
For the apparatus above, the implementation of the functions and effects of each unit is detailed in the implementation of the corresponding steps in the above method, and is not repeated here.
In an exemplary embodiment, a computer-readable storage medium is further provided. The storage medium stores a computer program, and the computer program is used to execute the above path planning method, wherein the path planning method includes:
sampling the current environment according to an initial sampling policy to obtain a plurality of state sample points;
obtaining a first value corresponding to each state sample point based on a first path planning algorithm;
obtaining a second value corresponding to each state sample point based on a second path planning algorithm;
performing a weighted summation of the first value and the second value to obtain a value corresponding to each state sample point; and
determining a driving path plan based on the value corresponding to each current state sample point.
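The five steps above can be sketched end to end as follows. The two valuation functions are toy stand-ins with hypothetical names, not the patent's actual algorithms, and the final selection keeps the highest-valued points as a toy planning criterion:

```python
import random

def first_value(point):
    # stand-in for the learned (inverse-RL-based) valuation
    x, y = point
    return max(0.0, 10.0 - abs(x - 5.0) - abs(y - 5.0))

def second_value(point):
    # stand-in for the rule-based valuation
    x, y = point
    return max(0.0, 10.0 - abs(x) - abs(y))

def plan(n_samples=200, lam=0.5, rng=None):
    rng = rng or random.Random(0)
    # 1. sample state points from the current environment
    points = [(rng.uniform(0, 10), rng.uniform(0, 10))
              for _ in range(n_samples)]
    # 2-4. combined value = weighted sum of the two valuations
    value = {p: (1 - lam) * first_value(p) + lam * second_value(p)
             for p in points}
    # 5. keep the ten highest-value points as the planned path (toy criterion)
    return sorted(points, key=value.get, reverse=True)[:10]

path = plan()
```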
The above computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
As for the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.
Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the application being indicated by the claims.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (10)

1. A path planning method, characterized in that the method comprises:
sampling a current environment according to an initial sampling policy to obtain a plurality of state sample points;
obtaining a first value corresponding to each state sample point based on a first path planning algorithm;
obtaining a second value corresponding to each state sample point based on a second path planning algorithm;
performing a weighted summation of the first value and the second value to obtain a value corresponding to each state sample point; and
determining a driving path plan based on the value corresponding to each current state sample point.
2. The method according to claim 1, characterized in that determining the driving path plan based on the value corresponding to each current state sample point comprises:
if a current sampling policy does not satisfy a convergence condition, updating the sampling policy, sampling according to the updated sampling policy, and continuing to perform the operations of obtaining the first value corresponding to each state sample point based on the first path planning algorithm and obtaining the second value corresponding to each state sample point based on the second path planning algorithm, until the current sampling policy converges; and
if the current sampling policy satisfies the convergence condition, determining a maximum-value path in the current environment according to the value corresponding to each current state sample point, and taking the maximum-value path as a current driving path.
3. The method according to claim 2, characterized in that the convergence condition means that the sampling density corresponding to the current sampling policy is proportional to the valuation corresponding to each state sample point.
4. The method according to claim 2, characterized in that updating the sampling policy comprises:
updating the sampling density corresponding to each current state sample point according to a Gaussian mixture model.
5. The method according to claim 1 or 2, characterized in that before obtaining the first value corresponding to each state sample point based on the first path planning algorithm, the method further comprises:
training a state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm.
6. The method according to claim 5, characterized in that training the state value function corresponding to the first path planning algorithm by the inverse reinforcement learning algorithm comprises:
training the state value function corresponding to the first path planning algorithm by the inverse reinforcement learning algorithm according to first path planning data corresponding to the first path planning algorithm and second path planning data determined based on the second path planning algorithm.
7. The method according to claim 6, characterized in that after determining the maximum-value path in the current environment according to the value corresponding to each current state sample point, the method further comprises:
adding the maximum-value path in the current environment to the first path planning data for updating the state value function.
8. A path planning apparatus, characterized in that the apparatus comprises:
a sampling module, configured to sample a current environment according to an initial sampling policy to obtain a plurality of state sample points;
a first obtaining module, configured to obtain, based on a first path planning algorithm, a first value corresponding to each state sample point obtained by the sampling module;
a second obtaining module, configured to obtain, based on a second path planning algorithm, a second value corresponding to each state sample point obtained by the sampling module;
a weighted summation module, configured to perform a weighted summation of the first value and the second value obtained by the obtaining modules, to obtain a value corresponding to each state sample point; and
a determining module, configured to determine a driving path plan based on the value corresponding to each current state sample point obtained by the weighted summation module.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program, and the computer program is used to execute the path planning method according to any one of claims 1-7.
10. A mobile device, characterized in that it comprises a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements the path planning method according to any one of claims 1-7 when executing the computer program.
CN201811105686.5A 2018-09-21 2018-09-21 Path planning method and device and mobile device Active CN109405843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811105686.5A CN109405843B (en) 2018-09-21 2018-09-21 Path planning method and device and mobile device


Publications (2)

Publication Number Publication Date
CN109405843A true CN109405843A (en) 2019-03-01
CN109405843B CN109405843B (en) 2020-01-03

Family

ID=65466076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811105686.5A Active CN109405843B (en) 2018-09-21 2018-09-21 Path planning method and device and mobile device

Country Status (1)

Country Link
CN (1) CN109405843B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4698635A (en) * 1986-03-02 1987-10-06 The United States Of America As Represented By The Secretary Of The Navy Radar guidance system
CN103245347A (en) * 2012-02-13 2013-08-14 腾讯科技(深圳)有限公司 Intelligent navigation method and system based on road condition prediction
CN103310120A (en) * 2013-07-10 2013-09-18 东南大学 Transport service level based method for determining section congestion charge rates
WO2015105287A1 (en) * 2014-01-10 2015-07-16 에스케이플래닛 주식회사 Traffic information collecting method, apparatus and system therefor
CN106774327A (en) * 2016-12-23 2017-05-31 中新智擎有限公司 Robot path planning method and device
CN107862346A (en) * 2017-12-01 2018-03-30 驭势科技(北京)有限公司 Method and apparatus for driving strategy model training
CN108469827A (en) * 2018-05-16 2018-08-31 江苏华章物流科技股份有限公司 Automatic guided vehicle global path planning method suitable for a logistics storage system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Wu Weining et al., "Research progress on active learning algorithms based on sampling strategies", Journal of Computer Research and Development *
Gao Yuanyuan et al., "Map building and path planning for mobile robots based on DGSOM_A*", Journal of Beijing University of Technology *
Ma Bo, "Research on speed tracking control strategies in autonomous vehicle driving", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112567399A (en) * 2019-09-23 2021-03-26 阿里巴巴集团控股有限公司 System and method for route optimization
CN110955239A (en) * 2019-11-12 2020-04-03 中国地质大学(武汉) Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning
CN110909859A (en) * 2019-11-29 2020-03-24 中国科学院自动化研究所 Bionic robot fish motion control method and system based on antagonistic structured control
CN113111296A (en) * 2019-12-24 2021-07-13 浙江吉利汽车研究院有限公司 Vehicle path planning method and device, electronic equipment and storage medium
CN111230875A (en) * 2020-02-06 2020-06-05 北京凡川智能机器人科技有限公司 Double-arm robot humanoid operation planning method based on deep learning
CN111230875B (en) * 2020-02-06 2023-05-12 北京凡川智能机器人科技有限公司 Double-arm robot humanoid operation planning method based on deep learning
CN115494833A (en) * 2021-06-18 2022-12-20 广州视源电子科技股份有限公司 Robot control method and device
CN113701771A (en) * 2021-07-29 2021-11-26 东风悦享科技有限公司 Parking path planning method and device, electronic equipment and storage medium
CN113701771B (en) * 2021-07-29 2023-08-01 东风悦享科技有限公司 Parking path planning method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109405843B (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN109405843A (en) A kind of paths planning method and device and mobile device
CN111142522B (en) Method for controlling agent of hierarchical reinforcement learning
WO2022052406A1 (en) Automatic driving training method, apparatus and device, and medium
US11132211B1 (en) Neural finite state machines
WO2021169588A1 (en) Automatic driving simulation method and apparatus, and electronic device and storage medium
CN111898211A (en) Intelligent vehicle speed decision method based on deep reinforcement learning and simulation method thereof
KR20200095378A (en) Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning
CN112937564A (en) Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN110210058B (en) Reference line generation method, system, terminal and medium conforming to vehicle dynamics
CN109933068A (en) Driving path planing method, device, equipment and storage medium
CN113561986A (en) Decision-making method and device for automatically driving automobile
CN110327624A (en) A kind of game follower method and system based on course intensified learning
CN113642243A (en) Multi-robot deep reinforcement learning system, training method, device and medium
CN113625753B (en) Method for guiding neural network to learn unmanned aerial vehicle maneuver flight by expert rules
CN116009542A (en) Dynamic multi-agent coverage path planning method, device, equipment and storage medium
CN114117944B (en) Model updating method, device, equipment and readable storage medium
CN115454082A (en) Vehicle obstacle avoidance method and system, computer readable storage medium and electronic device
Arroyo et al. Adaptive fuzzy knowledge‐based systems for control metabots' mobility on virtual environments
CN113052252B (en) Super-parameter determination method, device, deep reinforcement learning framework, medium and equipment
CN115743168A (en) Model training method for lane change decision, target lane determination method and device
CN108776668A (en) Path evaluation method, system, equipment and storage medium based on road-net node
Elallid et al. Vehicles control: Collision avoidance using federated deep reinforcement learning
Bono et al. SULFR: Simulation of Urban Logistic For Reinforcement
Xiao et al. MACNS: A generic graph neural network integrated deep reinforcement learning based multi-agent collaborative navigation system for dynamic trajectory planning
Gao et al. Hybrid path planning algorithm of the mobile agent based on Q-learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant