CN109405843A - Path planning method and apparatus, and mobile device - Google Patents

Path planning method and apparatus, and mobile device

Info

Publication number
CN109405843A
CN109405843A (application CN201811105686.5A)
Authority
CN
China
Prior art keywords
value
sample point
state sample
path planning
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811105686.5A
Other languages
Chinese (zh)
Other versions
CN109405843B (en)
Inventor
钱德恒
任冬淳
丁曙光
付圣
韩勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201811105686.5A priority Critical patent/CN109405843B/en
Publication of CN109405843A publication Critical patent/CN109405843A/en
Application granted granted Critical
Publication of CN109405843B publication Critical patent/CN109405843B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26 - Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00, specially adapted for navigation in a road network
    • G01C21/34 - Route searching; Route guidance
    • G01C21/3446 - Details of route searching algorithms, e.g. Dijkstra, A*, arc-flags, using precalculated routes

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present application provides a path planning method and apparatus, a mobile device, and a computer-readable storage medium. The path planning method includes: sampling the current environment according to an initial sampling strategy to obtain a plurality of state sample points; obtaining a first value of each state sample point based on a first path planning algorithm; obtaining a second value of each state sample point based on a second path planning algorithm; performing a weighted summation of the first value and the second value to obtain the value of each state sample point; and determining a driving path plan based on the current value of each state sample point. The embodiments combine two path planning algorithms to determine the current driving path, which both adapts to complex driving environments, narrowing the gap with the manipulation behavior of human operators, and reduces the amount of operation data that needs to be recorded, so that the determined driving path is more reasonable.

Description

Path planning method and apparatus, and mobile device
Technical field
This application relates to path planning technology, and in particular to a path planning method and apparatus, a mobile device, and a computer-readable storage medium.
Background technique
With the development of computer technology and artificial intelligence, the unmanned vehicle has become an important research direction and hotspot in the field of robotics. The path planning and control strategy of an unmanned vehicle refers to the strategy by which the vehicle selects its own actions in various states. The actions of an unmanned vehicle include accelerating, decelerating, steering, honking, switching lights on or off, and so on. At present there are two main classes of methods for unmanned-vehicle path planning and control: methods based on heuristic rules, and methods based on expert demonstration.
Methods based on heuristic rules constrain the path planning and control of the unmanned vehicle through manually formulated rules, which engineers derive from common sense and intuition. For example, one rule may keep the unmanned vehicle as close to the center of the lane as possible, and another may keep it as far from obstacles as possible.
Methods based on expert demonstration record a large amount of path planning and control data produced by human operators while driving, and then let a computer learn from these data to imitate the planning and control operations made by humans, so that the computer finally learns to plan paths for and control the unmanned vehicle.
However, methods based on heuristic rules sometimes struggle to adapt to complex driving environments, and the gap between the rule-derived path planning and control strategy and the manipulation behavior of human operators is large. Methods based on expert demonstration, in turn, require recording large amounts of operation data, which consumes substantial funds, time, and other resources.
Summary of the invention
In view of this, the present application provides a path planning method and apparatus, a mobile device, and a computer-readable storage medium.
Specifically, the present application is achieved through the following technical solutions:
According to a first aspect of the embodiments of the present disclosure, a path planning method is provided, the method comprising:
sampling the current environment according to an initial sampling strategy to obtain a plurality of state sample points;
obtaining a first value of each state sample point based on a first path planning algorithm;
obtaining a second value of each state sample point based on a second path planning algorithm;
performing a weighted summation of the first value and the second value to obtain the value of each state sample point;
determining a current driving path plan based on the current value of each state sample point.
In one embodiment, determining the driving path plan based on the current value of each state sample point includes:
if the current sampling strategy does not satisfy a convergence condition, updating the sampling strategy, sampling according to the updated sampling strategy, and continuing to perform the operations of obtaining the first value of each state sample point based on the first path planning algorithm and obtaining the second value of each state sample point based on the second path planning algorithm, until the current sampling strategy converges;
if the current sampling strategy satisfies the convergence condition, determining the maximum-value path in the current environment according to the current value of each state sample point, and taking the maximum-value path as the current driving path.
In one embodiment, the convergence condition means that the sampling density of the current sampling strategy is proportional to the value estimate of each state sample point.
In one embodiment, updating the sampling strategy includes:
updating the current sampling density of each state sample point according to a Gaussian mixture model.
In one embodiment, before obtaining the first value of each state sample point based on the first path planning algorithm, the method further includes:
training a state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm.
In one embodiment, training the state value function corresponding to the first path planning algorithm by the inverse reinforcement learning algorithm includes:
training the state value function corresponding to the first path planning algorithm by the inverse reinforcement learning algorithm, according to first path planning data corresponding to the first path planning algorithm and second path planning data determined based on the second path planning algorithm.
In one embodiment, after determining the maximum-value path in the current environment according to the current value of each state sample point, the method further includes:
adding the maximum-value path in the current environment to the first path planning data, for updating the state value function.
According to a second aspect of the embodiments of the present disclosure, a path planning apparatus is provided, the apparatus comprising:
a sampling module, configured to sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points;
a first acquisition module, configured to obtain, based on a first path planning algorithm, the first value of each state sample point obtained by the sampling module;
a second acquisition module, configured to obtain, based on a second path planning algorithm, the second value of each state sample point obtained by the sampling module;
a weighted summation module, configured to perform a weighted summation of the first value and the second value obtained by the acquisition modules, to obtain the value of each state sample point;
a determination module, configured to determine a driving path plan based on the current value of each state sample point obtained by the weighted summation module.
In one embodiment, the determination module includes:
a processing submodule, configured to, if the current sampling strategy does not satisfy a convergence condition, update the sampling strategy, sample according to the updated sampling strategy, and continue to perform the operations of obtaining the first value of each state sample point based on the first path planning algorithm and obtaining the second value of each state sample point based on the second path planning algorithm, until the current sampling strategy converges;
a determination submodule, configured to, if the current sampling strategy satisfies the convergence condition, determine the maximum-value path in the current environment according to the current value of each state sample point, and take the maximum-value path as the current driving path.
According to a third aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, the storage medium storing a computer program for executing the above path planning method.
According to a fourth aspect of the embodiments of the present disclosure, a mobile device is provided, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above path planning method when executing the computer program.
In the embodiments of the present application, the current environment is sampled according to an initial sampling strategy to obtain a plurality of state sample points; the first value and the second value of each state sample point are obtained by a first path planning algorithm and a second path planning algorithm respectively; a weighted summation of the first value and the second value yields the value of each state sample point; and a driving path plan is then determined according to the current value of each state sample point. Combining two path planning algorithms to determine the current driving path both adapts to complex driving environments, narrowing the gap with the manipulation behavior of human operators, and reduces the amount of operation data that needs to be recorded, so that the determined driving path is more reasonable.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Detailed description of the invention
The drawings herein are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a path planning method according to an exemplary embodiment of the present application;
Fig. 2 is a flowchart of another path planning method according to an exemplary embodiment of the present application;
Fig. 3 is a flowchart of another path planning method according to an exemplary embodiment of the present application;
Fig. 4 is a flowchart of another path planning method according to an exemplary embodiment of the present application;
Fig. 5 is a hardware structure diagram of a mobile device in which the path planning apparatus of the present application is located;
Fig. 6 is a block diagram of a path planning apparatus according to an exemplary embodiment of the present application;
Fig. 7 is a block diagram of another path planning apparatus according to an exemplary embodiment of the present application.
Specific embodiment
Exemplary embodiments will be described in detail here, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of devices and methods consistent with some aspects of the present application, as detailed in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. The singular forms "a", "said", and "the" used in the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in the present application to describe various information, the information should not be limited by these terms; they are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be called second information, and similarly second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
Fig. 1 is a flowchart of a path planning method according to an exemplary embodiment of the present application. The method can be applied to a mobile device, which may include, but is not limited to, an unmanned vehicle. As shown in Fig. 1, the method includes:
Step S100: sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
The initial sampling strategy may be uniform sampling. For example, the current road environment may be sampled uniformly to obtain a plurality of state sample points.
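The uniform initial sampling described above can be sketched as follows. This is a minimal illustration only: the rectangular road bounds, the sample count, and a 2-D position-only state are assumptions, since the patent does not fix the state representation.

```python
import random

def uniform_sample(x_range, y_range, n_points, seed=None):
    """Uniformly sample n_points state sample points from a rectangular region."""
    rng = random.Random(seed)
    return [(rng.uniform(*x_range), rng.uniform(*y_range)) for _ in range(n_points)]

# Hypothetical 50 m x 8 m road segment, 100 state sample points.
points = uniform_sample((0.0, 50.0), (-4.0, 4.0), 100, seed=0)
```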
Step S101 obtains corresponding first value of each state sample point based on first path planning algorithm.
Step S102 obtains corresponding second value of each state sample point based on the second path planning algorithm.
Wherein, first path planning algorithm can be but be not limited to expert's exemplary algorithm, and the second path planning algorithm can Think but is not limited to heuristic rule algorithm.
In this embodiment it is possible to based on the state value function corresponding with first path planning algorithm trained in advance Obtain corresponding first value of each state sample point.Likewise it is possible to be calculated based on what is trained in advance with the second path planning The corresponding state value function of method obtains corresponding second value of each state sample point.
It should be noted that not stringent successive of above-mentioned steps S101 and step S102 executes sequence, it can first hold Row step S101, it is rear to execute step S102, step S102 can also be first carried out, it is rear to execute step S101.
Step S103: perform a weighted summation of the first value and the second value to obtain the value of each state sample point.
Assuming the first value of each state sample point is denoted V_irl and the second value is denoted V_obj, the value of each state sample point can be calculated by the following formula:
Vs = 1/Z * (V_irl + lambda * V_obj)
where Z is a normalization constant and lambda is a weight that balances the expert demonstration algorithm and the heuristic rule algorithm.
It should be noted that lambda can change dynamically. For example, at the beginning, the second weight corresponding to the second value may be larger; over time, as more and more driver training data accumulates, reliance on these data can be increased and reliance on the rules reduced, so that the first weight corresponding to the first value grows. It follows that, as the accumulated data increases, lambda can gradually decrease.
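The weighted summation and the dynamically shrinking lambda can be sketched as below. The `decayed_lambda` schedule is a hypothetical choice for illustration; the patent only says that lambda decreases as demonstration data accumulates, without fixing a formula, and Z is treated here as a given normalization constant.

```python
def combined_value(v_irl, v_obj, lam, z):
    """Vs = 1/Z * (V_irl + lambda * V_obj) for each state sample point."""
    return [(vi + lam * vo) / z for vi, vo in zip(v_irl, v_obj)]

def decayed_lambda(lam0, n_demonstrations, decay=0.01):
    """Hypothetical schedule: lambda shrinks as demonstration data accumulates."""
    return lam0 / (1.0 + decay * n_demonstrations)

v_irl = [2.0, 1.0, 3.0]   # first values (expert demonstration / IRL)
v_obj = [1.0, 4.0, 0.5]   # second values (heuristic rules)
vs = combined_value(v_irl, v_obj, lam=0.5, z=1.0)
```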
In this embodiment, by performing a weighted summation of the first value and the second value, the value of each state sample point is obtained, thereby achieving a comprehensive valuation of each state sample point.
Step S104: determine a driving path plan based on the current value of each state sample point.
In this embodiment, the purpose of path planning is to find a good path in the current environment, and the quality of a path depends on the values of the state sample points along it. Once the value of each state sample point has been determined, the path with the highest overall value can be found, which is exactly the path along which the unmanned vehicle is controlled to travel. Therefore, in this embodiment, the maximum-value path in the current environment is determined according to the current value of each state sample point, and the maximum-value path is taken as the current driving path.
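Selecting the maximum-value path can be sketched as follows, assuming a set of candidate paths is already available, each given as a list of state-sample-point identifiers. How candidates are enumerated is not specified by the patent, so the candidate list here is illustrative.

```python
def path_value(path, point_values):
    """Total value of a path, summed over the state sample points it visits."""
    return sum(point_values[p] for p in path)

def max_value_path(candidate_paths, point_values):
    """Pick the candidate path whose state sample points have the highest total value."""
    return max(candidate_paths, key=lambda path: path_value(path, point_values))

point_values = {"a": 1.0, "b": 3.0, "c": 0.5, "d": 2.0}
candidates = [["a", "b"], ["a", "c", "d"], ["c", "d"]]
best = max_value_path(candidates, point_values)  # ["a", "b"], total value 4.0
```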
In the above embodiment, the current environment is sampled according to an initial sampling strategy to obtain a plurality of state sample points; the first value and the second value of each state sample point are obtained by the first path planning algorithm and the second path planning algorithm respectively; a weighted summation of the first value and the second value yields the value of each state sample point; and a driving path plan is then determined based on the current value of each state sample point. Combining two path planning algorithms to determine the current driving path both adapts to complex driving environments, narrowing the gap with the manipulation behavior of human operators, and reduces the amount of operation data that needs to be recorded, so that the determined driving path is more reasonable.
Fig. 2 is a flowchart of another path planning method according to an exemplary embodiment of the present application. As shown in Fig. 2, the method includes:
Step S201: sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
Step S202: obtain the first value of each state sample point based on a first path planning algorithm, and obtain the second value of each state sample point based on a second path planning algorithm.
Step S203: perform a weighted summation of the first value and the second value to obtain the value of each state sample point.
Step S204: judge whether the current sampling strategy satisfies a convergence condition; if so, execute step S205; if not, execute step S206.
The convergence condition may mean that the sampling density of the current sampling strategy is proportional to the value estimate of each state sample point.
For example, if the value estimate of state sample point 1 is 10, the sampling density around state sample point 1 is 10; if the value estimate of state sample point 2 is 5, the sampling density around state sample point 2 is 5.
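The value-proportional convergence condition can be expressed as a small check: normalize the value estimates into a target density and compare it with the current sampling density. The tolerance is an illustrative choice.

```python
def target_density(values):
    """Sampling density proportional to the value estimates (normalized to sum to 1)."""
    total = sum(values)
    return [v / total for v in values]

def is_converged(density, values, tol=1e-6):
    """Convergence condition: the current density matches the value-proportional target."""
    return all(abs(d - t) < tol for d, t in zip(density, target_density(values)))

values = [10.0, 5.0]               # value estimates of sample points 1 and 2
density = target_density(values)   # twice the density near point 1 as near point 2
```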
Step S205: determine the maximum-value path in the current environment according to the current value of each state sample point, take the maximum-value path as the current driving path, and end the operation.
Step S206: update the sampling strategy, sample according to the updated sampling strategy, and continue to execute step S202.
Here, updating the sampling strategy means updating the sampling density around each state sample point.
In this embodiment, the current sampling density of each state sample point may be updated according to a Gaussian mixture model (GMM): the value estimates of the state sample points are fitted with a GMM, which correspondingly yields a new sampling density.
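A simplified sketch of the GMM-based density update: one fixed-width Gaussian component is placed at each state sample point and weighted by its normalized value estimate, and new sample points are drawn from the resulting mixture. This is a 1-D illustration with an assumed fixed sigma; the patent only states that the value estimates are fitted with a GMM, without fixing the number of components or their parameters.

```python
import math
import random

def gmm_density(x, centers, weights, sigma=1.0):
    """Density of a 1-D Gaussian mixture with one component per state sample point."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return sum(w * math.exp(-0.5 * ((x - c) / sigma) ** 2) / norm
               for c, w in zip(centers, weights))

def resample(centers, values, n, sigma=1.0, seed=0):
    """Draw n new sample points: choose a component with probability proportional
    to its value estimate, then perturb it with Gaussian noise."""
    rng = random.Random(seed)
    total = sum(values)
    weights = [v / total for v in values]
    return [rng.gauss(rng.choices(centers, weights=weights)[0], sigma)
            for _ in range(n)]

centers = [0.0, 10.0]   # current state sample points (1-D for illustration)
values = [10.0, 5.0]    # their value estimates
new_points = resample(centers, values, 300)  # denser near the higher-value point
```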
In the above embodiment, when the current sampling strategy does not satisfy the convergence condition, the sampling strategy is updated until it does. State sample points are thereby sampled according to their value, which increases the sampling density near high-value state sample points and achieves a fine adjustment of the path planning.
Fig. 3 is a flowchart of another path planning method according to an exemplary embodiment of the present application. As shown in Fig. 3, the method includes:
Step S300: train a state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm.
In order to reduce the amount of first path planning data (such as expert demonstration data) that must be collected, in this embodiment second path planning data determined based on the second path planning algorithm (such as the heuristic rule algorithm) may be added to the existing first path planning data, and the state value function corresponding to the first path planning algorithm is then trained by the inverse reinforcement learning algorithm.
The state value function is a mapping from a state to the first value of that state: its input is a state, and its output is the first value of that state. The state value function is used to output the first value of a given state.
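As a sketch, the state value function can be any model that maps a state (here, a feature vector) to a scalar first value. The linear form and the two hypothetical features below are illustrative assumptions; the patent does not fix the model class or the features.

```python
def make_value_function(weights):
    """Build a state value function: maps a state (feature vector) to its first value."""
    def v(state):
        return sum(w * f for w, f in zip(weights, state))
    return v

# Hypothetical features: [distance to lane center, clearance to nearest obstacle].
v_irl = make_value_function(weights=[-1.0, 0.5])
score = v_irl([0.2, 4.0])  # -1.0*0.2 + 0.5*4.0 = 1.8
```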
Step S301: sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
Steps S300 and S301 have no strict execution order: step S300 may be executed first and step S301 second, or step S301 first and step S300 second.
Step S302: obtain the first value of each state sample point based on the first path planning algorithm, and obtain the second value of each state sample point based on the second path planning algorithm.
Here, the first value may be obtained by the state value function corresponding to the first path planning algorithm trained in step S300, and the second value of each state sample point is obtained by the second path planning algorithm.
Step S303: perform a weighted summation of the first value and the second value to obtain the value of each state sample point.
Step S304: determine the maximum-value path in the current environment according to the current value of each state sample point, and take the maximum-value path as the current driving path.
The above embodiment trains the state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm, providing the basis for subsequently obtaining the first value.
Fig. 4 is a flowchart of another path planning method according to an exemplary embodiment of the present application. As shown in Fig. 4, the method includes:
Step S400: train a state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm, according to first path planning data corresponding to the first path planning algorithm and second path planning data determined based on the second path planning algorithm.
Step S401: sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
Step S402: obtain the first value of each state sample point based on the first path planning algorithm, and obtain the second value of each state sample point based on the second path planning algorithm.
Step S403: perform a weighted summation of the first value and the second value to obtain the value of each state sample point.
Step S404: determine the maximum-value path in the current environment according to the current value of each state sample point, and take the maximum-value path as the current driving path.
Step S405: add the maximum-value path in the current environment to the first path planning data, for updating the state value function.
In this embodiment, after the maximum-value path in the current environment has been found, it can be added to the first path planning data for subsequent updating of the state value function, so that the values determined by the updated state value function are more accurate.
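The data flow of step S405 can be sketched as below. The `naive_retrain` function is only a stand-in that scores states by visit frequency; the actual update of the state value function uses the inverse reinforcement learning algorithm described above.

```python
from collections import Counter

def update_demonstrations(first_path_data, new_max_value_path):
    """Append the newly found maximum-value path to the first path planning data."""
    return first_path_data + [new_max_value_path]

def naive_retrain(first_path_data):
    """Stand-in for retraining: score each state by how often demonstrations visit it."""
    counts = Counter(s for path in first_path_data for s in path)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

demos = [["s0", "s1"], ["s0", "s2"]]           # existing first path planning data
demos = update_demonstrations(demos, ["s0", "s3"])
state_values = naive_retrain(demos)            # "s0" is visited most, scored highest
```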
In the above embodiment, the maximum-value path in the current environment is added to the first path planning data for updating the state value function, so that the values determined by the updated state value function are more accurate.
Corresponding to the above embodiments of the path planning method, the present application also provides embodiments of a path planning apparatus.
The embodiments of the path planning apparatus of the present application can be applied to a mobile device, which may be an unmanned vehicle. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Fig. 5 shows a hardware structure diagram of a mobile device 500 in which a path planning apparatus of the present application is located. The mobile device includes a processor 510, a memory 520, and a computer program stored on the memory 520 and executable on the processor 510; the processor 510 implements the above path planning method when executing the computer program. In addition to the processor 510 and memory 520 shown in Fig. 5, the mobile device in which the apparatus is located may also include other hardware according to the actual function of the path planning, which will not be described here.
Fig. 6 is a block diagram of a path planning apparatus according to an exemplary embodiment of the present application. As shown in Fig. 6, the path planning apparatus includes a sampling module 60, a first acquisition module 61, a second acquisition module 62, a weighted summation module 63, and a determination module 64.
The sampling module 60 is configured to sample the current environment according to an initial sampling strategy to obtain a plurality of state sample points.
The initial sampling strategy may be uniform sampling. For example, the current road environment may be sampled uniformly to obtain a plurality of state sample points.
The first acquisition module 61 is configured to obtain, based on a first path planning algorithm, the first value of each state sample point obtained by the sampling module 60.
The second acquisition module 62 is configured to obtain, based on a second path planning algorithm, the second value of each state sample point obtained by the sampling module 60.
The first path planning algorithm may be, but is not limited to, an expert demonstration algorithm; the second path planning algorithm may be, but is not limited to, a heuristic rule algorithm.
In this embodiment, the first value of each state sample point may be obtained based on a pre-trained state value function corresponding to the first path planning algorithm. Likewise, the second value of each state sample point may be obtained based on a pre-trained state value function corresponding to the second path planning algorithm.
The weighted summation module 63 is configured to perform a weighted summation of the first value obtained by the first acquisition module 61 and the second value obtained by the second acquisition module 62, to obtain the value of each state sample point.
Assuming the first value of each state sample point is denoted V_irl and the second value is denoted V_obj, the value of each state sample point can be calculated by the following formula:
Vs = 1/Z * (V_irl + lambda * V_obj)
where Z is a normalization constant and lambda is a weight that balances the expert demonstration algorithm and the heuristic rule algorithm.
It should be noted that lambda can change dynamically. For example, at the beginning, the second weight corresponding to the second value may be larger; over time, as more and more driver training data accumulates, reliance on these data can be increased and reliance on the rules reduced, so that the first weight corresponding to the first value grows. It follows that, as the accumulated data increases, lambda can gradually decrease.
In this embodiment, by performing a weighted summation of the first value and the second value, the value of each state sample point is obtained, thereby achieving a comprehensive valuation of each state sample point.
The determination module 64 is configured to determine a driving path plan based on the current value of each state sample point obtained by the weighted summation module 63.
In this embodiment, the purpose of path planning is to find a good path in the current environment, and the quality of a path depends on the values of the state sample points along it. Once the value of each state sample point has been determined, the path with the highest overall value can be found, which is exactly the path along which the unmanned vehicle is controlled to travel. Therefore, in this embodiment, the maximum-value path in the current environment is determined according to the current value of each state sample point, and the maximum-value path is taken as the current driving path.
In the above embodiment, the current environment is sampled according to an initial sampling strategy to obtain a plurality of state sample points; the first value and the second value of each state sample point are obtained by the first path planning algorithm and the second path planning algorithm respectively; a weighted summation of the first value and the second value yields the value of each state sample point; and a driving path plan is then determined according to the current value of each state sample point. Combining two path planning algorithms to determine the current driving path both adapts to complex driving environments, narrowing the gap with the manipulation behavior of human operators, and reduces the amount of operation data that needs to be recorded, so that the determined driving path is more reasonable.
Fig. 7 is a block diagram of another path planning apparatus shown in an exemplary embodiment of the present application. As shown in Fig. 7, on the basis of the embodiment shown in Fig. 6, the determining module 64 includes a processing submodule 641 and a determining submodule 642.
The processing submodule 641 is configured to, if the current sampling policy does not satisfy the convergence condition, update the sampling policy, sample according to the updated sampling policy, and continue to perform the operations of obtaining the first value corresponding to each state sample point based on the first path planning algorithm and obtaining the second value corresponding to each state sample point based on the second path planning algorithm, until the current sampling policy converges.
Here, the convergence condition may be that the sampling density corresponding to the current sampling policy is proportional to the valuation corresponding to each state sample point.
For example, if the valuation corresponding to state sample point 1 is 10, the sampling density around state sample point 1 is 10; if the valuation corresponding to state sample point 2 is 5, the sampling density around state sample point 2 is 5.
Here, updating the sampling policy means updating the sampling density around each state sample point.
In this embodiment, the sampling density corresponding to each current state sample point may be updated according to a Gaussian mixture model (GMM): the valuations of the state sample points are fitted with a GMM, and the fitted model accordingly yields the new sampling density.
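The idea of making the sampling density proportional to the valuation can be illustrated with a simplified stand-in for the GMM update (not the patent's exact procedure): draw new state sample points from a mixture of Gaussians centered on the current points, with mixture weights proportional to each point's valuation, so high-value regions are sampled more densely. All names below are hypothetical:

```python
import random

def resample(points, valuations, n_new, sigma=0.5, rng=None):
    """Draw n_new points from a valuation-weighted mixture of Gaussians.

    points: list of (x, y) state sample points.
    valuations: matching list of valuations; higher valuation means a
    proportionally higher chance of sampling near that point.
    """
    rng = rng or random.Random(0)
    total = sum(valuations)
    weights = [v / total for v in valuations]
    new_points = []
    for _ in range(n_new):
        # pick a mixture component with probability proportional to valuation
        cx, cy = rng.choices(points, weights=weights, k=1)[0]
        # jitter around the chosen point (Gaussian component, std = sigma)
        new_points.append((cx + rng.gauss(0, sigma), cy + rng.gauss(0, sigma)))
    return new_points

pts = [(0.0, 0.0), (5.0, 5.0)]
vals = [10.0, 5.0]  # density near point 1 should be about twice point 2's
samples = resample(pts, vals, n_new=3000)
near_first = sum(1 for x, y in samples if abs(x) < 2.5 and abs(y) < 2.5)
```

A full implementation would instead fit the mixture parameters to the weighted samples (e.g. with an EM-based GMM library) and sample from the fitted model; the weighted-resampling shortcut above keeps the sketch dependency-free.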
The determining submodule 642 is configured to, if the current sampling policy satisfies the convergence condition, determine the maximum-value path in the current environment according to the value corresponding to each current state sample point, and take the maximum-value path as the current driving path.
In the above embodiment, when the current sampling policy does not satisfy the convergence condition, the sampling policy is updated until the updated sampling policy satisfies the convergence condition. State sample points are thereby sampled according to their value, which increases the sampling density near high-value state sample points and realizes fine adjustment of the path plan.
For the apparatus above, the implementation of the functions and effects of each unit is detailed in the implementation of the corresponding steps in the above method, and is not repeated here.
In an exemplary embodiment, a computer-readable storage medium is further provided. The storage medium stores a computer program, and the computer program is used to execute the above path planning method, wherein the path planning method includes:
sampling the current environment according to an initial sampling policy to obtain a plurality of state sample points;
obtaining a first value corresponding to each state sample point based on a first path planning algorithm;
obtaining a second value corresponding to each state sample point based on a second path planning algorithm;
performing a weighted summation of the first value and the second value to obtain a value corresponding to each state sample point; and
determining a driving path plan based on the value corresponding to each current state sample point.
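The five steps above can be sketched end to end as follows. The two valuation functions are toy stand-ins with hypothetical names, not the patent's actual algorithms, and the final selection keeps the highest-valued points as a toy planning criterion:

```python
import random

def first_value(point):
    # stand-in for the learned (inverse-RL-based) valuation
    x, y = point
    return max(0.0, 10.0 - abs(x - 5.0) - abs(y - 5.0))

def second_value(point):
    # stand-in for the rule-based valuation
    x, y = point
    return max(0.0, 10.0 - abs(x) - abs(y))

def plan(n_samples=200, lam=0.5, rng=None):
    rng = rng or random.Random(0)
    # 1. sample state points from the current environment
    points = [(rng.uniform(0, 10), rng.uniform(0, 10))
              for _ in range(n_samples)]
    # 2-4. combined value = weighted sum of the two valuations
    value = {p: (1 - lam) * first_value(p) + lam * second_value(p)
             for p in points}
    # 5. keep the ten highest-value points as the planned path (toy criterion)
    return sorted(points, key=value.get, reverse=True)[:10]

path = plan()
```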
The above computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
As for the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for relevant details. The apparatus embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.
Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the application that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the application being indicated by the claims.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (10)

1. A path planning method, characterized in that the method comprises:
sampling a current environment according to an initial sampling policy to obtain a plurality of state sample points;
obtaining a first value corresponding to each state sample point based on a first path planning algorithm;
obtaining a second value corresponding to each state sample point based on a second path planning algorithm;
performing a weighted summation of the first value and the second value to obtain a value corresponding to each state sample point; and
determining a driving path plan based on the value corresponding to each current state sample point.
2. The method according to claim 1, characterized in that determining the driving path plan based on the value corresponding to each current state sample point comprises:
if a current sampling policy does not satisfy a convergence condition, updating the sampling policy, sampling according to the updated sampling policy, and continuing to perform the operations of obtaining the first value corresponding to each state sample point based on the first path planning algorithm and obtaining the second value corresponding to each state sample point based on the second path planning algorithm, until the current sampling policy converges; and
if the current sampling policy satisfies the convergence condition, determining a maximum-value path in the current environment according to the value corresponding to each current state sample point, and taking the maximum-value path as a current driving path.
3. The method according to claim 2, characterized in that the convergence condition means that the sampling density corresponding to the current sampling policy is proportional to the valuation corresponding to each state sample point.
4. The method according to claim 2, characterized in that updating the sampling policy comprises:
updating the sampling density corresponding to each current state sample point according to a Gaussian mixture model.
5. The method according to claim 1 or 2, characterized in that before obtaining the first value corresponding to each state sample point based on the first path planning algorithm, the method further comprises:
training a state value function corresponding to the first path planning algorithm by an inverse reinforcement learning algorithm.
6. The method according to claim 5, characterized in that training the state value function corresponding to the first path planning algorithm by the inverse reinforcement learning algorithm comprises:
training the state value function corresponding to the first path planning algorithm by the inverse reinforcement learning algorithm according to first path planning data corresponding to the first path planning algorithm and second path planning data determined based on the second path planning algorithm.
7. The method according to claim 6, characterized in that after determining the maximum-value path in the current environment according to the value corresponding to each current state sample point, the method further comprises:
adding the maximum-value path in the current environment to the first path planning data for updating the state value function.
8. A path planning apparatus, characterized in that the apparatus comprises:
a sampling module, configured to sample a current environment according to an initial sampling policy to obtain a plurality of state sample points;
a first obtaining module, configured to obtain, based on a first path planning algorithm, a first value corresponding to each state sample point obtained by the sampling module;
a second obtaining module, configured to obtain, based on a second path planning algorithm, a second value corresponding to each state sample point obtained by the sampling module;
a weighted summation module, configured to perform a weighted summation of the first value and the second value obtained by the obtaining modules, to obtain a value corresponding to each state sample point; and
a determining module, configured to determine a driving path plan based on the value corresponding to each current state sample point obtained by the weighted summation module.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program, and the computer program is used to execute the path planning method according to any one of claims 1-7.
10. A mobile device, characterized in that it comprises a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the processor implements the path planning method according to any one of claims 1-7 when executing the computer program.
CN201811105686.5A 2018-09-21 2018-09-21 Path planning method and device and mobile device Active CN109405843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811105686.5A CN109405843B (en) 2018-09-21 2018-09-21 Path planning method and device and mobile device


Publications (2)

Publication Number Publication Date
CN109405843A true CN109405843A (en) 2019-03-01
CN109405843B CN109405843B (en) 2020-01-03

Family

ID=65466076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811105686.5A Active CN109405843B (en) 2018-09-21 2018-09-21 Path planning method and device and mobile device

Country Status (1)

Country Link
CN (1) CN109405843B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4698635A (en) * 1986-03-02 1987-10-06 The United States Of America As Represented By The Secretary Of The Navy Radar guidance system
CN103245347A (en) * 2012-02-13 2013-08-14 腾讯科技(深圳)有限公司 Intelligent navigation method and system based on road condition prediction
CN103310120A (en) * 2013-07-10 2013-09-18 东南大学 Transport service level based method for determining section congestion charge rates
WO2015105287A1 (en) * 2014-01-10 2015-07-16 에스케이플래닛 주식회사 Traffic information collecting method, apparatus and system therefor
CN106774327A (en) * 2016-12-23 2017-05-31 中新智擎有限公司 Robot path planning method and device
CN107862346A (en) * 2017-12-01 2018-03-30 驭势科技(北京)有限公司 Method and apparatus for driving strategy model training
CN108469827A (en) * 2018-05-16 2018-08-31 江苏华章物流科技股份有限公司 Automatic guided vehicle global path planning method suitable for a logistics storage system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Wu Weining et al., "Research progress on active learning algorithms based on sampling strategies", Journal of Computer Research and Development *
Gao Yuanyuan et al., "Map building and path planning for mobile robots based on DGSOM_A*", Journal of Beijing University of Technology *
Ma Bo, "Research on speed tracking control strategies in autonomous vehicle driving", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112567399A (en) * 2019-09-23 2021-03-26 阿里巴巴集团控股有限公司 System and method for route optimization
CN110955239A (en) * 2019-11-12 2020-04-03 中国地质大学(武汉) Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning
CN110909859A (en) * 2019-11-29 2020-03-24 中国科学院自动化研究所 Bionic robot fish motion control method and system based on antagonistic structured control
CN113111296A (en) * 2019-12-24 2021-07-13 浙江吉利汽车研究院有限公司 Vehicle path planning method and device, electronic equipment and storage medium
CN111230875A (en) * 2020-02-06 2020-06-05 北京凡川智能机器人科技有限公司 Double-arm robot humanoid operation planning method based on deep learning
CN111230875B (en) * 2020-02-06 2023-05-12 北京凡川智能机器人科技有限公司 Double-arm robot humanoid operation planning method based on deep learning
CN115494833A (en) * 2021-06-18 2022-12-20 广州视源电子科技股份有限公司 Robot control method and device
CN113701771A (en) * 2021-07-29 2021-11-26 东风悦享科技有限公司 Parking path planning method and device, electronic equipment and storage medium
CN113701771B (en) * 2021-07-29 2023-08-01 东风悦享科技有限公司 Parking path planning method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109405843B (en) 2020-01-03

Similar Documents

Publication Publication Date Title
CN109405843A (en) A kind of paths planning method and device and mobile device
CN111142522B (en) Method for controlling agent of hierarchical reinforcement learning
WO2022052406A1 (en) Automatic driving training method, apparatus and device, and medium
US11132211B1 (en) Neural finite state machines
WO2021169588A1 (en) Automatic driving simulation method and apparatus, and electronic device and storage medium
CN111898211A (en) Intelligent vehicle speed decision method based on deep reinforcement learning and simulation method thereof
KR20200095378A (en) Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning
CN112937564A (en) Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN110210058B (en) Reference line generation method, system, terminal and medium conforming to vehicle dynamics
CN109933068A (en) Driving path planing method, device, equipment and storage medium
CN113561986A (en) Decision-making method and device for automatically driving automobile
CN110327624A (en) A kind of game follower method and system based on course intensified learning
CN113642243A (en) Multi-robot deep reinforcement learning system, training method, device and medium
CN113625753B (en) Method for guiding neural network to learn unmanned aerial vehicle maneuver flight by expert rules
CN116009542A (en) Dynamic multi-agent coverage path planning method, device, equipment and storage medium
CN114117944B (en) Model updating method, device, equipment and readable storage medium
CN115454082A (en) Vehicle obstacle avoidance method and system, computer readable storage medium and electronic device
Arroyo et al. Adaptive fuzzy knowledge‐based systems for control metabots' mobility on virtual environments
CN113052252B (en) Super-parameter determination method, device, deep reinforcement learning framework, medium and equipment
CN115743168A (en) Model training method for lane change decision, target lane determination method and device
CN108776668A (en) Path evaluation method, system, equipment and storage medium based on road-net node
Elallid et al. Vehicles control: Collision avoidance using federated deep reinforcement learning
Bono et al. SULFR: Simulation of Urban Logistic For Reinforcement
Xiao et al. MACNS: A generic graph neural network integrated deep reinforcement learning based multi-agent collaborative navigation system for dynamic trajectory planning
Gao et al. Hybrid path planning algorithm of the mobile agent based on Q-learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant