CN108268031A

CN108268031A - Paths planning method, device and robot

Info

Publication number: CN108268031A
Application number: CN201611264059.7A
Authority: CN
Inventors: 不公告发明人
Original assignee: Kuang Chi Innovative Technology Ltd; Shenzhen Guangqi Hezhong Technology Co Ltd
Current assignee: Hangzhou Institute Of Artificial Intelligence
Priority date: 2016-12-30
Filing date: 2016-12-30
Publication date: 2018-07-10
Also published as: WO2018120739A1

Abstract

The invention discloses a kind of paths planning method, device and robots.Wherein, this method includes：Obtain the current location of target robot and the position of candidate target point, wherein, current location and the position of candidate target point are in default path planning, and in default path planning, the position of candidate target point is located at after current location, the position candidate of next position that will be reached by current location as target robot；According to current location and the position of candidate target point, whether the position for determining candidate target point is next position that target robot will reach；If so, using the position of candidate target point as next position of current location in physical planning path；If it is not, the step of then returning to the position of the current location for obtaining target robot and candidate target point, and redefine until the position of candidate target point is next position that target robot will reach.The present invention solves the technical issues of path planning efficiency is low.

Description

Paths planning method, device and robot

Technical field

The present invention relates to intelligence machine technical field, in particular to a kind of paths planning method, device and machine People.

Background technology

With the development of robot technology and deepening continuously for artificial intelligence study, intelligent robot is played the part of in human lives Drill more and more important role.Wherein, robot needs have detection barrier, and can distinguish simultaneously cognitive disorders object, simultaneously The ability of feasible path can be planned in real time.It includes global path planning and local paths planning.Global path planning is usual Be static and environment it is known that common method has Visual Graph method, genetic algorithm, particle cluster algorithm etc., and local paths planning is State and environment is unknown, is the difficult point of path planning, when environment complexity, it is difficult to the route programming result got well.Existing In technology, conventional path planning, such as fuzzy programming, ant group algorithm, neural network etc., it tends not to meet dynamic environment simultaneously With the requirement of real-time.Path planning has Artificial Potential Field Method etc. at present.

For the above-mentioned low problem of path planning efficiency, currently no effective solution has been proposed.

Invention content

An embodiment of the present invention provides a kind of paths planning method, device and robot, at least to solve path planning effect The technical issues of rate is low.

One side according to embodiments of the present invention provides a kind of paths planning method, including：Obtain target robot Current location and candidate target point position, wherein, the position of the current location and candidate target point is default In path planning, and in the default path planning, the position of the candidate target point is located at after the current location, makees The position candidate of next position that will be reached by the current location for the target robot；According to described current Position and the position of candidate target point, whether the position for determining the candidate target point is that the target robot will arrive The next position reached；If so, using the position of the candidate target point as current location described in physical planning path Next position；If it is not, then return it is described obtain target robot current location and candidate target point position the step of, and It redefines until the position of the candidate target point is next position that the target robot will reach.

Further, according to the current location and the position of candidate target point, the candidate target point is determined Whether position is that next position that the target robot will reach includes：Obtain the current location and the candidate mesh The information of barrier between the position of punctuate, wherein, the information of the barrier includes the marginal position of the barrier, institute State position of the marginal position for the edge of the default path planning close in the barrier；Edge based on the barrier Position determines the environmental state information of the target robot and the barrier；According to the environmental state information, institute is determined Whether the position for stating candidate target point is next position that the target robot will reach.

Further, the marginal position based on the barrier determines the ring of the target robot and the barrier Border status information includes：Determine the distance between the current location and the marginal position of the barrier；It is if described apart from small In the first preset distance, the environmental state information for determining the target robot and the barrier is status of fail；It is if described Distance determines the ring of the target robot and the barrier not less than the first preset distance and less than the second preset distance Border status information is precarious position；If the distance determines described not less than the second preset distance and less than third preset distance Target robot and the environmental state information of the barrier are safe condition；If the distance not less than third preset distance, And less than the 4th preset distance, the environmental state information for determining the target robot and the barrier is success status；Its In, first preset distance is less than the second preset distance, and second preset distance is less than third preset distance, the third Preset distance is less than the 4th preset distance.

Further, according to the environmental state information, whether the position for determining the candidate target point is the target Next position that robot will reach includes：It determines from the position that the current location reaches each candidate target point, The candidate actions performed needed for the target robot；The corresponding candidate actions of each candidate target point and ambient condition are believed Breath determines the evaluation of estimate of each candidate target point；Obtain the maximum value in the corresponding evaluation of estimate of multiple candidate target points；According to institute The corresponding environmental state information in current location and current action and the maximum value are stated, determines the position of the candidate target point Whether put is next position that the target robot will reach.

Further, it is next position that the target robot will reach in the position for determining the candidate target point After putting, the method further includes：The current location is recorded to the routing information of next position, wherein, the road Diameter information includes range information of the current location to next position, the current location to next position Between obstacle information, the current location to the distance between barrier information.

Another aspect according to embodiments of the present invention additionally provides a kind of path planning apparatus, including：First obtains list Member, for obtaining the position of the current location of target robot and candidate target point, wherein, the current location and the candidate The position of target point is in default path planning, and in the default path planning, the position position of the candidate target point After the current location, the next position that will be reached by the current location as the target robot Position candidate；First determination unit for the position according to the current location and candidate target point, determines the candidate Whether the position of target point is next position that the target robot will reach；Second determination unit, for described In the case that the position of candidate target point is next position that the target robot will reach, by the candidate target point Next position of the position as current location described in physical planning path；And it is not in the position of the candidate target point In the case of next position that the target robot will reach, the first acquisition unit is returned, and is redefined straight Position to the candidate target point is next position that the target robot will reach.

Further, first determination unit includes：First acquisition module, for obtaining the current location and described The information of barrier between the position of candidate target point, wherein, the information of the barrier includes the edge of the barrier Position, the marginal position are close to the position at the edge of the default path planning in the barrier；First determining module, For the marginal position based on the barrier, the environmental state information of the target robot and the barrier is determined；The Two determining modules, for according to the environmental state information, whether the position for determining the candidate target point to be the target machine Next position that device people will reach.

Further, first determining module includes：First determination sub-module, for determining the current location and institute State the distance between marginal position of barrier；Second determination sub-module, if being less than the first preset distance for the distance, really The fixed target robot and the environmental state information of the barrier are status of fail；Third determination sub-module, if for institute Distance is stated not less than the first preset distance and less than the second preset distance, determines the target robot and the barrier Environmental state information is precarious position；4th determination sub-module, if not less than the second preset distance and being less than for the distance Third preset distance, the environmental state information for determining the target robot and the barrier is safe condition；5th determines Submodule, if determining the target robot not less than third preset distance and less than the 4th preset distance for the distance Environmental state information with the barrier is success status；Wherein, first preset distance is less than the second preset distance, institute The second preset distance is stated less than third preset distance, the third preset distance is less than the 4th preset distance.

Further, second determining module includes：6th determination sub-module, for determine from the current location to Up in the position of each candidate target point, the candidate actions of execution needed for the target robot；7th determination sub-module, is used for By the corresponding candidate actions of each candidate target point and environmental state information, the evaluation of estimate of each candidate target point is determined；Second Acquisition module, for obtaining the maximum value in the corresponding evaluation of estimate of multiple candidate target points；8th determination sub-module, for basis The corresponding environmental state information in the current location and current action and the maximum value, determine the candidate target point Whether position is next position that the target robot will reach.

Further, described device further includes：Recording unit, for being described in the position for determining the candidate target point After next position that target robot will reach, the path for recording the current location to next position is believed Breath, wherein, range information of the routing information including the current location to next position, the current location are arrived Obstacle information, the current location between next position are to the distance between barrier information.

Another aspect according to embodiments of the present invention additionally provides a kind of robot, including：It is any in the above embodiment Path planning apparatus described in.

In embodiments of the present invention, target robot can obtain the current location of target robot and candidate target point Position, wherein, the position of current location and candidate target point in default path planning, is waited in default path planning The position of target point is selected to be located at after current location, the next position that will be reached by current location as target robot The position candidate put, and according to current location and the position of candidate target point, whether the position for determining candidate target point is target Next position that robot will reach, if so, using the position of candidate target point as present bit in physical planning path The next position put, if it is not, may return to the current location for obtaining target robot and the position of candidate target point Step, and whether the position for redefining candidate target point is next position that target robot will reach, until candidate The position of target point is next position that target robot will reach.According to the above embodiment, target robot can be with The continuous position for obtaining current location and candidate target point, makes Path selection, thus according to the routing information got, Rational path planning is made, solves the technical issues of path planning efficiency is low.

Description of the drawings

Attached drawing described herein is used to provide further understanding of the present invention, and forms the part of the application, this hair Bright illustrative embodiments and their description do not constitute improper limitations of the present invention for explaining the present invention.In the accompanying drawings：

Fig. 1 is a kind of flow chart of optional paths planning method according to embodiments of the present invention；

Fig. 2 is the flow chart of the optional paths planning method of another kind according to embodiments of the present invention；

Fig. 3 is the structure chart of the optional path planning apparatus of another kind according to embodiments of the present invention.

Specific embodiment

In order to which those skilled in the art is made to more fully understand the present invention program, below in conjunction in the embodiment of the present invention The technical solution in the embodiment of the present invention is clearly and completely described in attached drawing, it is clear that described embodiment is only The embodiment of a part of the invention, instead of all the embodiments.Based on the embodiments of the present invention, ordinary skill people Member's all other embodiments obtained without making creative work should all belong to the model that the present invention protects It encloses.

It should be noted that term " first " in description and claims of this specification and above-mentioned attached drawing, " Two " etc. be the object for distinguishing similar, and specific sequence or precedence are described without being used for.It should be appreciated that it uses in this way Data can be interchanged in the appropriate case, so as to the embodiment of the present invention described herein can in addition to illustrating herein or Sequence other than those of description is implemented.In addition, term " comprising " and " having " and their any deformation, it is intended that cover Cover it is non-exclusive include, be not necessarily limited to for example, containing the process of series of steps or unit, method, system, product or equipment Those steps or unit clearly listed, but may include not listing clearly or for these processes, method, product Or the intrinsic other steps of equipment or unit.

First, the part noun or term occurred during the embodiment of the present application is described is suitable for following solution It releases：

The method of exhaustion：The approximate range of answer is determined according to the partial condition of topic, and within this range to all possible Situation is verified one by one, until the whole circumstances verify.If some situation authenticator synthesis purpose full terms, for this problem One solution；If not complying with the full terms of topic after whole circumstances verification, this subject is without solution.The method of exhaustion is also referred to as enumerated Method.

MDP：Markovian decision process (Markov Decision Process, MDP), MDP considers action, that is, is Next state of uniting is not only related with current state, also related with the action currently taken.

Boltzman distribution mechanisms：That is ANALOGY OF BOLTZMANN DISTRIBUTION, is the selection of typical random neural network, state with probability come It determines.

Q-learning algorithms：Enhance learning algorithm, be based on finite state and the learning algorithm for having line model, be used for Machine learning learns a circumstances not known by unsupervised training (unsupervised training).

It is extensive：After a certain reaction is contacted with certain stimulation formation condition, this reaction also can be with other similar stimulations Conditioned connection to a certain degree is formed, this process is referred to as extensive.

Artificial Potential Field Method：It is the movement by robot in ambient enviroment, is designed to a kind of be abstracted in artificial gravitational field Movement, target point to mobile robot generate " gravitation ", barrier to mobile robot generate " repulsion ", finally by ask resultant force To control the movement of mobile robot.

Fuzzy logic：It establishes on the basis of multi valued logic, fuzziness thinking, language is studied with the method for fuzzy set The science of form and its rule.

According to embodiments of the present invention, a kind of embodiment of the method for path planning is provided, it should be noted that in attached drawing The step of flow illustrates can perform in the computer system of such as a group of computer-executable instructions, although also, Logical order is shown in flow chart, but in some cases, it can perform shown with the sequence being different from herein or retouch The step of stating.

Fig. 1 is a kind of flow chart of optional paths planning method according to embodiments of the present invention, as shown in Figure 1, the party Method includes the following steps：

Step S102 obtains the current location of target robot and the position of candidate target point, wherein, current location and time Selecting the position of target point, the position of candidate target point is located at currently in default path planning, and in default path planning After position, the position candidate of next position that will be reached by current location as target robot；

Step S104, according to current location and the position of candidate target point, whether the position for determining candidate target point is mesh Next position that scalar robot will reach, if so, step S106 is performed, if it is not, return to step S102；

Step S106, using the position of candidate target point as next position of current location in physical planning path.

By the above embodiment, target robot can obtain the current location of target robot and candidate target point Position, wherein, the position of current location and candidate target point in default path planning, is waited in default path planning The position of target point is selected to be located at after current location, the next position that will be reached by current location as target robot The position candidate put, and according to current location and the position of candidate target point, whether the position for determining candidate target point is target Next position that robot will reach, if so, using the position of candidate target point as present bit in physical planning path The next position put, if it is not, may return to the current location for obtaining target robot and the position of candidate target point Step, and whether the position for redefining candidate target point is next position that target robot will reach, until candidate The position of target point is next position that target robot will reach.According to the above embodiment, target robot can be with The continuous position for obtaining current location and candidate target point, makes Path selection, thus according to the routing information got, Rational path planning is made, solves the technical issues of path planning efficiency is low.

For the above embodiment, it can be applied in various robots, which includes intelligent robot, military Robot and service robot etc..Robot can make Path selection according to the environmental information detected.

Optionally, target robot can select the intelligent robot in path, can be set in the target robot Multiple sensors are put, the environmental information around target robot can be got by the sensor in real time, wherein, environment letter Breath can include it is a variety of, as obstacle position information and candidate target point location information and target robot to candidate mesh Routing information between punctuate.Wherein, which can include target robot to the distance and barrier of barrier With the range information of candidate target point etc..

Optionally, the sensor in the above embodiment can include a variety of, such as：Sonar sensor, infrared ray sensing Device, velocity sensor etc., by each sensor can detect target robot between barrier, candidate target point away from From can also detect the size of barrier and the edge of barrier by the sensor in target robot.

For above-described embodiment, target robot can first determine that target robot is current before path planning is carried out Location information and the end point location information finally to be reached, and rational path is made according to current location and final position and is advised It draws.

Another optional embodiment, in the technical solution of step S102 offers, obtains the current of target robot Position and the position of candidate target point, wherein, the position of current location and candidate target point all in default path planning, and In default path planning, the position of candidate target point is located at after current location, as target robot by current location The position candidate of next position that will be reached.Wherein, candidate target point can be that target robot can select next time Traveling target point, by obtaining the current location of target robot and the location determination of candidate target point, so that it is determined that target The direction that robot can walk in next step.Optionally, the direction that target robot can walk can include it is multiple, for example, Front, left front and right front etc..

Optionally, above-mentioned candidate target point can be multiple, and candidate target point is relative to target robot current location There can be different directions, target robot is when selecting candidate target point, it may be determined that candidate target point to target robot Distance, candidate target point can also be determined the distance between to the terminal finally to be reached.

Optionally, it in the technical solution provided in step S104, according to current location and the position of candidate target point, determines Whether the position of candidate target point is next position that target robot will reach.Wherein, candidate target point is being determined When whether position is next position that target robot will reach, target robot can be obtained to each candidate target point The distance between, at the same need to get target robot between candidate target point obstacle information (such as barrier it is big Small, barrier edge), and according to the information got, whether the position for determining candidate target point is that target robot will The next position to be reached.

Optionally, it is behind next position that target robot will reach in the position for determining candidate target point, holds Row step S106 provide technical solution, using the position of candidate target point as in physical planning path current location it is next Position.If it is determined that next position that the position of candidate target point, which will not be target robot, to be reached, then return to step S102, whether the position for redefining other candidate target points is next position that target robot will reach, you can with According to current location and the position of candidate target point, whether the position for judging other candidate target points again is that target robot will The next position to be reached.Until determining that target robot thinks next position to be achieved.

Another optional embodiment, according to current location and the position of candidate target point, determines candidate target point Whether position is that next position that target robot will reach includes：Obtain current location and candidate target point position it Between barrier information, wherein, the information of barrier includes the marginal position of barrier, and marginal position is close in barrier The position at the edge of default path planning；Marginal position based on barrier determines the environment shape of target robot and barrier State information；According to environmental state information, determine that whether to be that target robot will reach next for the position of candidate target point Position.

Optionally, the barrier in the above embodiment can include static-obstacle thing and dynamic barrier, wherein, it is static Barrier can be immotile barrier, not moved relative to target robot, position is relatively-stationary, example Such as, stone；Can be the real-time barrier for changing position, for example, mobile trolley for dynamic barrier.Target machine Sensor in people can get the obstacle information in preset range in real time, which can be target robot Around a range, for example, using the radius in the artificial center of circle of target machine as 5 meters of border circular areas.Target robot can not The environmental information of surrounding is got disconnectedly, and the information of acquisition is stored.

Wherein, target robot can establish an environmental model according to the environmental information got, wherein, at this In environmental model, it is stored with the routing information and obstacle information passed by, and the environment detected recently according to sensor Information updates environmental model.Optionally, target robot needs timely to update to the information of dynamic barrier.

Wherein, when obtaining obstacle information, the size of barrier and the location information of barrier can be obtained, it can also The information of the marginal position of barrier is got, the marginal position of the barrier can be proximate to the margin location of candidate target point It puts.

Another optional embodiment after obstacle information is got, can be based on obstacle information, determine target The environmental state information of robot and barrier, wherein, the environmental state information of target robot and barrier includes at least down At least one state：Success status；Safe condition；Precarious position；Status of fail.And according to the environmental state information got, really Whether the position for determining candidate target point is next position that target robot will reach.

Optionally, the marginal position based on barrier determines that the environmental state information of target robot and barrier includes： Determine the distance between marginal position of current location and barrier；If distance is less than the first preset distance, target machine is determined The environmental state information of people and barrier is status of fail；If distance is not less than the first preset distance and less than the second pre- spacing From the environmental state information for determining target robot and barrier is precarious position；If distance not less than the second preset distance and Less than third preset distance, the environmental state information for determining target robot and barrier is safe condition；If distance is not less than Third preset distance and less than the 4th preset distance, the environmental state information for determining target robot and barrier are successful shape State；Wherein, the first preset distance is less than the second preset distance, and the second preset distance is less than third preset distance, the pre- spacing of third From less than the 4th preset distance.

Wherein, the first preset distance can be very short, such as 0.1 meter, i.e., the barrier may be in target robot to time It selects on the path between target point so that target robot can not reach the candidate target point by the path, if at this point, detection It is less than the first preset distance to distance between target robot and the marginal position of barrier, then can be determined that target robot not The candidate target point, i.e. status of fail can be reached.Optionally, the second preset distance can be more than the first preset distance and be less than The numerical value of third preset distance, for example, 0.6 meter, i.e., the edge of the barrier may target robot to candidate target point it Between path on, if target robot is wanted to reach the candidate target point, need to get around the edge of the barrier, at this point, target Robot is precarious position to reach candidate target point.

Another optional embodiment, third preset distance can be pre-set threshold value, the third preset distance It may be greater than the second preset distance and be less than the numerical value of the 4th preset distance, for example, 1.2 meters, that is, the barrier detected The path that the Edge Distance target robot reaches candidate target point is nearer, and still, which does not interfere target robot to arrive Up to the candidate target point, at this point it is possible to which it is safe condition to determine that target robot reaches candidate target point.Optionally, the 4th is pre- Set a distance can be pre-set numerical value, can be more than third preset distance, for example, 2 meters, i.e. the obstacle distance target Robot reaches the path of candidate target point farther out, at this point, barrier does not influence the Path selection of target robot, it can To determine that target robot reaches candidate target point as success status at this time.

Wherein, target robot select it is next will reach position when, can be to reaching each candidate target point Path analyzed.

For the above embodiment, can sliding-model control be carried out to ambient condition by fuzzy logic, it optionally, can be with The distance between target robot and barrier edge according to size are obscured and are：Low coverage, middle-range and long distance.Wherein, low coverage Represent that target robot is nearer apart from the edge of barrier, the range that low coverage represents can be predetermined to second in the first preset distance Between distance；Middle-range represents that target robot is more than low coverage but the range represented less than long distance, middle-range apart from the edge of barrier It can be in the second preset distance between the 4th preset distance；During long distance represents that target robot is more than apart from the edge of barrier Away from the range that long distance represents may be greater than the 4th preset distance.

Optionally, robot can judge that the target robot reaches candidate mesh by the fuzzy distance that sensor returns Whether punctuate can walk, for example, sensor return be long distance information when, it can be determined that go out the Edge Distance mesh of barrier Scalar robot farther out, will not interfere robot to advance, if the low coverage information that sensor returns, it can be determined that go out the edge of barrier Distance objective robot is nearer, if robot moves on, may bump against with barrier, which may interfere with target Robot advances.

Optionally, the marginal position based on barrier determines that the environmental state information of target robot and barrier may be used also To include：Marginal position based on current location and barrier determines to reach the of the marginal position of barrier from current location One vector, wherein, the length of primary vector is the first length, and the direction of primary vector is first direction；According to current location and The position of candidate target point determines the secondary vector from the position of current location arrival candidate target point, wherein, secondary vector Length is the second length, and the direction of secondary vector is second direction；In the case where the first length is more than the second length, mesh is determined The environmental state information of scalar robot and barrier is success status；In the first length no more than in the case of the second length, examine Survey first direction and the angle of second direction；If angle is more than the first predetermined angle, it is determined that target robot and barrier Environmental state information is safe condition；If angle is more than the second predetermined angle and less than or equal to the first predetermined angle, it is determined that mesh The environmental state information of scalar robot and barrier is precarious position；If angle is less than or equal to the first predetermined angle, it is determined that mesh The environmental state information of scalar robot and barrier is status of fail, wherein, the second predetermined angle is less than the first predetermined angle.

For the above embodiment, be by detect target robot current location relative to barrier edge first to Amount and target robot current location determine that target robot arrives relative to the angle between the secondary vector of candidate target point Ambient condition up between candidate target point.In this embodiment, if the first representative length of primary vector is more than the The second length represented by two vectors, then can obtain barrier relative to candidate target point position reach target robot it Between it is distant, barrier will not hamper completely target robot reach candidate target point, at this point it is possible to determine target machine Device people reaches the candidate target point as success status.Optionally, in the case where the first length is not more than the second length, Ke Yijian The angle of first direction and second direction is surveyed, if angle is more than the first predetermined angle, it is determined that target robot and barrier Environmental state information is safe condition, wherein, which can be larger numerical value, such as 30 degree, first direction It is more than the first predetermined angle with the angle of second direction, then can determines that barrier will not hamper target robot and reach time Select target point.

Wherein, the second predetermined angle is less than the first predetermined angle, if angle is more than the second predetermined angle and less than or equal to the One predetermined angle, it is determined that target robot and the environmental state information of barrier are precarious position, and the second predetermined angle can be with It is smaller angle numerical value, for example, 10 degree, it is more than the second predetermined angle in the angle of primary vector and secondary vector and is less than During the first predetermined angle, at this point, the edge of barrier may hamper in target robot to the path of candidate target point Target robot reaches the candidate target point at this point it is possible to determine that it is precarious position.Another optional embodiment, is being pressed from both sides Angle is less than or equal to the first predetermined angle, it is determined that target robot and the environmental state information of barrier are status of fail, at this time Target robot cannot reach candidate target point.

Another optional embodiment, it is next in the position for determining candidate target point to be that target robot will reach After a position, current location can also be recorded to the routing information of next position, wherein, routing information includes current location Range information, current location to next position between next position obstacle information, current location to barrier The distance between information.Target robot can update above-mentioned environmental model according to the routing information of record.

Optionally, when target machine records routing information of the current location to next position, it may be determined that outbound path Evaluation of estimate, which can be obtained according to above-mentioned environmental state information, i.e., target robot reaches next position Afterwards, the path passed by can be fed back accordingly, such as：- 1,0,1 etc., which can be above-mentioned ambient condition Information is corresponding, for example, in status of fail, evaluation of estimate is -1；During safe condition, evaluation of estimate 1；During precarious position, evaluation Be worth is 0；During success status, evaluation of estimate 2.

Optionally, according to environmental state information, whether the position for determining candidate target point is that target robot will reach Next position include：It determines from the position that current location reaches each candidate target point, is performed needed for target robot Candidate actions；By the corresponding candidate actions of each candidate target point and environmental state information, each candidate target point is determined Evaluation of estimate；Obtain the maximum value in the corresponding evaluation of estimate of multiple candidate target points；According to the corresponding ambient condition letter in current location Breath and current action and maximum value determine that whether to be that target robot will reach next for the position of candidate target point Position.

Wherein, the candidate actions performed needed for target robot, can be target robot and each candidate target point Action selection, for example, advance, turn left and turn right etc..It is (i.e. successful according to the candidate actions and above-mentioned environmental state information State, safe condition, precarious position or status of fail), the evaluation of estimate of each candidate target is obtained, by obtaining multiple candidates Maximum value in the corresponding evaluation of estimate of target point, according to the corresponding environmental state information in current location and current action and most Big value, whether the position for determining candidate target point is next position that target robot will reach, you can to select to evaluate Maximum value in value, and action selection is made in combining target robot current location, to reach next position.

By the above embodiment, target robot can pass through evaluation of estimate, the corresponding environmental state information in current location Whether the position that candidate target point is determined with current action is next position that target robot will reach, so as in real time Update routing information, then, by the routing information passed by, constantly path optimizing selects, and finally, can reach home.

Optionally, target robot needs to select multiple candidate target points, no when proceeding by path planning Disconnectedly according to the path passed by, Path selection is made.In this embodiment, target robot is by candidate target point After next position of the position as current location in physical planning path, the next of selection can be moved to from current location A position, and current location is recorded to the routing information of next position.I.e. target robot can constantly learn to having walked Outbound path action selection is done in the path crossed according to the routing information learnt.

Optionally, according to the corresponding environmental state information in current location and current action and maximum value, candidate's mesh is determined Whether the position of punctuate is that next position that target robot will reach includes：Candidate target point is determined by equation below Position whether be next position that target robot will reach：

Q (state, action)=R (state, action)+Gamma*MaxQ (next state, all actions),

Wherein, state is target robot current location, and action is optional action, and Gamma is constant, R (state, Action it is) to be selected according to the action that current location is made, positions of the next state for candidate target point, all actions It is selected for all actions, MaxQ (next state, all actions) is position of the target robot according to candidate target point Maximum value in the evaluation of estimate of the everything of selection, Q (state, action) are according to the corresponding ambient condition in current location Information and current action and maximum value, it is next position to determine optimal candidate target point.

Target robot can with by the above-mentioned means, constantly learn, robot will from a target point to another Target point constantly detects, until it reaches the terminal in path.Target robot, can be according to commenting when path is selected The size of value makes optimum path planning, selects the larger represented path of evaluation of estimate, constantly updates the evaluation of estimate in path, most The optimal path between starting position to final position can be determined eventually, which can be effective avoiding obstacles Path.

Wherein, when above-mentioned formula is used to determine next position, a matrix can be established, in the matrix, and is built Corresponding path planning parameter is found, for example, target point A can arrive target point B, then it is 1 that can determine its parameter, and target point A is not Target point C can be arrived, then it is -1 that can determine its parameter, wherein, the specific implementation step of the matrix can include：Setting is current Target point is initial target point；Since current target point, the action for having highest evaluation of estimate is found；Under setting current target point is One target point (the candidate target point target point i.e. corresponding to highest evaluation of estimate)；The above embodiment is repeated, until current mesh Punctuate is terminal.

Optionally, obtain between current location and barrier apart from when, can target machine be detected according to sensor The distance between device people current location and barrier determine distance, and target robot is detected by sensor in target robot Behind the distance between current location and barrier, when can be handled in the processing end in feeding back to target robot, it can obscure Its specifically distance, feeds back corresponding range information to processing end.Wherein, processing end can analyze current goal robot and week Enclose ambient condition.

By the above embodiment, target robot can carry out path planning, Ke Yi during continuous study In circumstances not known, exploration is made to environment and knowledge is obtained from evaluation, improves action strategy to accomplish the end in view.

Here is according to a particular embodiment of the invention.

Embodiment of the present invention is to be based on intensified learning in circumstances not known into walking along the street for robot path planning Diameter plans that the advantage of intensified learning is, by with environmental interaction trial and error, constantly to learn.The embodiment of the present invention proposes will be strong Chemistry is practised and fuzzy logic be combined with each other, and sliding-model control is carried out to ambient condition using fuzzy logic, to ensure state demarcation Reasonability, and complete using intensified learning selection strategy obstacle-avoiding route planning in the circumstances not known of robot.

Optionally, when carrying out sliding-model control to ambient condition using fuzzy logic, robot and barrier can first be determined Hinder the distance of object, wherein, the distance between robot and barrier are divided into low coverage, middle-range and long distance.

Optionally, intensified learning can include such as lower part：

Environmental model：The information that robot can be returned according to sensor establishes the model of a currently known environment, In the model, it can be detected with recorder people current location information, directional information, the gait of march of current robot, sensor The target point to be reached to obstacle information and robot.Wherein, robot current location information can for robot Manage position；Distance and the barrier of the obstacle information that sensor detects including barrier and robot are relative to machine The direction of people.In this embodiment, barrier can include static-obstacle thing and dynamic barrier, for static-obstacle thing, Robot can determine how avoidance, for dynamic barrier, robot according to the obstacle position information once detected The obstacle position information that can be real-time detected according to sensor updates environmental model, so as to preferably carry out path planning.

Penalty：Since Q-learning algorithms are the MDP models based on finite state and limited action, input or defeated It must be finite discrete state to go out, and in embodiments of the present invention, extensive processing can be carried out to enhancing study, by robot State is set as s, and action is set as a.Optionally, the distance of robot and target point can be set as d, when may be set in d=1, The distance of robot and terminal is successful distance, and d=2 is conflict distance, and d=3 is risk distance, and d=4 is failure distance, ring Border state can include success status, safe condition, precarious position and status of fail, can return to numerical value in success status 2, return to numerical value 2 during safe condition, when precarious position returns to numerical value 0, and when status of fail returns to numerical value -1.

Wherein it is possible to according to the distance between robot and target point relationship, robot current location is obtained relative to end The state of point position, and according to the ambient condition between the robot and target point detected, select next target point.

By the above embodiment, can constantly be determined to obtain the optimal value in path according to penalty.

Q value function:The selection in the path can be realized by following formula：

Q (state, action)=R (state, action)+Gamma*MaxQ (next state, all actions)

Wherein, state is robotary, and action is action, and Q (state, action) is represented by Q value function One element of matrix value, R (state, action) is the value of respective element in R matrixes, and Gamma is Variable Learning, MaxQ (next state, all actions) is the maximum value in the evaluation of estimate of robot NextState everything.Robot can be with By the above-mentioned means, constantly learning, robot will constantly be explored from a state to another state, until it is arrived Up to dbjective state.Our such each time explore is referred to as a scene.Each scene is exactly target robot from initial state Reach the process of dbjective state.Each target robot reaches dbjective state, which can enter next field Jing Zhong.State s execution acts the evaluation function of a, and hereafter presses enhanced signal discount when optimal action sequence performs, only The status switch from initial state to dbjective state is tracked to use matrix Q.This algorithm is found in matrix Q for current state The next step for having highest evaluation of estimate acts.

Optionally, above-mentioned Q matrixes occupation mode can include：First, robotary is initialized, it then can be by machine The current state of people is set as init state；Since current state, the action for having highest q value is found；It can be by highest q value Represented state is expressed as NextState, and it is NextState to set current state；Judge whether current state is target-like State, if so, without finding the state of robot again, if it is not, continuing to execute above-mentioned since current state, searching highest q value Action, until current state be dbjective state.By the above embodiment, can return from initial state to dbjective state Status switch.

Act selection strategy：In order to select more actions acted while consider Q value function maximum as far as possible, the present invention is real Apply the action progress robot obstacle-avoiding that mode chooses Q value maximums using Boltzmann distribution mechanisms.Wherein, action selection includes Advance, turn left, turn right.

By the above embodiment, sliding-model control can be carried out to ambient condition using fuzzy logic, ensure that state What is divided is reasonable, then can complete robot in circumstances not known using intensified learning according to the environmental model of real-time update Obstacle-avoiding route planning.

Fig. 2 is the flowchart 2 of the optional paths planning method of another kind according to embodiments of the present invention, as shown in Fig. 2, This method comprises the following steps：

Step S201, robot initial current state.

Step S203 judges whether current state is dbjective state.

If it is not, step S205 is performed, if so, performing step S209.

Step S205 inquires highest evaluation of estimate from current state, and finds the corresponding action of highest evaluation of estimate.

Step S207, setting current state are NextState.

Step S209, robot reach the corresponding target point of NextState.

The embodiment of the present invention, robot can first initialize current state when carrying out path planning, which can To be robot current location and the ambient condition, then, robot may determine that whether current state is dbjective state, If to judge current state be not dbjective state, highest evaluation of estimate is inquired from current state, and find the most higher assessment Corresponding action is worth, it is NextState then to set current state, if judging current state for dbjective state, robot The corresponding target point of NextState can be reached.I.e. robot can when selecting path, according to current state and evaluation of estimate come Determine the next target point that will be reached.

In the embodiment, using the paths planning method of enhancing study, as robot constantly moves, constantly obtain Environmental information around robot, and pass through and constantly choose state and update routing information, current state can be calculated Under optimal path, ensure that robot can carry out effective avoidance in circumstances not known, and path optimizing selects, robot can be with In continuous processing state space, optimal path is searched out.

Fig. 3 is the structure chart of the optional path planning apparatus of another kind according to embodiments of the present invention, as shown in figure 3, should Device includes：First acquisition unit 31, for obtaining the position of the current location of target robot and candidate target point, wherein, Current location and the position of candidate target point are in default path planning, and in default path planning, candidate target point Position is located at after current location, the candidate bit of next position that will be reached by current location as target robot It puts；First determination unit 33, for the position according to current location and candidate target point, determine candidate target point position whether It is next position that target robot will reach；Second determination unit 35, the second determination unit, in candidate target point Position be next position that target robot will reach in the case of, using the position of candidate target point as physical planning Next position of current location in path；And in the position of candidate target point not to be that target robot will reach next In the case of position, first acquisition unit is returned, and is redefined until the position of candidate target point is that target robot will The next position reached.

By the above embodiment, target robot can obtain the current of target robot by first acquisition unit 31 Position and the position of candidate target point, wherein, the position of current location and candidate target point in default path planning, and In default path planning, the position of candidate target point is located at after current location, as target robot by current location The position candidate of next position that will be reached, and pass through the first determination unit 33 according to current location and candidate target point Position, whether the position for determining candidate target point is next position that target robot will reach, and passes through second and determine Unit 35 is in the case of next position that the position of candidate target point is that target robot will reach, by candidate target point Next position of the position as current location in physical planning path；And in the position of candidate target point it is not target machine In the case of next position that people will reach, first acquisition unit is returned, and redefines the position until candidate target point It is next position that target robot will reach to put.According to the above embodiment, target robot can be obtained constantly Current location and the position of candidate target point, make Path selection, so as to according to the routing information got, make rational Path planning solves the technical issues of path planning efficiency is low.

Optionally, the first determination unit includes：First acquisition module, for obtaining the position of current location and candidate target point The information of barrier between putting, wherein, the information of barrier includes the marginal position of barrier, and marginal position is in barrier Close to the position at the edge of default path planning；First determining module for the marginal position based on barrier, determines target machine Device people and the environmental state information of barrier；Second determining module, for according to environmental state information, determining candidate target point Whether position is next position that target robot will reach.

Another optional embodiment, the first determining module include：First determination sub-module, for determining current location The distance between marginal position of barrier；Second determination sub-module if being less than the first preset distance for distance, determines mesh The environmental state information of scalar robot and barrier is status of fail；Third determination sub-module, if being not less than first for distance Preset distance and less than the second preset distance, the environmental state information for determining target robot and barrier is precarious position；The Four determination sub-modules, if for distance not less than the second preset distance and less than third preset distance, determine target robot with The environmental state information of barrier is safe condition；5th determination sub-module, if for distance not less than third preset distance and Less than the 4th preset distance, the environmental state information for determining target robot and barrier is success status；Wherein, first is predetermined Distance is less than the second preset distance, and the second preset distance is less than third preset distance, and third preset distance is less than the 4th pre- spacing From.

Wherein, the second determining module includes：6th determination sub-module, for determining to reach each candidate mesh from current location In the position of punctuate, the candidate actions of execution needed for target robot；7th determination sub-module, for by each candidate target point Corresponding candidate actions and environmental state information determine the evaluation of estimate of each candidate target point；Second acquisition module, for obtaining Maximum value in the corresponding evaluation of estimate of multiple candidate target points；8th determination sub-module, for according to the corresponding ring in current location Border status information and current action and maximum value, whether the position for determining candidate target point is that target robot will reach Next position.

Optionally, device further includes：Recording unit, for whether being target robot in the position for determining candidate target point After the next position that will be reached, the routing information of record current location to next position, wherein, routing information includes Current location to the range information of next position, current location between next position obstacle information, current location To the distance between barrier information.

Another aspect according to embodiments of the present invention additionally provides a kind of robot, which is characterized in that including：Above-mentioned reality Apply the path planning apparatus of any one of mode.

The embodiment of the present invention, as robot constantly moves, is constantly obtained using the paths planning method of enhancing study Environmental information around robot, and pass through and constantly choose candidate target point and update routing information, it can be calculated and work as Optimal path under preceding state ensure that robot can carry out effective avoidance in circumstances not known, and path optimizing selects, machine People can search out optimal path in continuous processing state space.

The embodiments of the present invention are for illustration only, do not represent the quality of embodiment.

In the above embodiment of the present invention, all emphasize particularly on different fields to the description of each embodiment, do not have in some embodiment The part of detailed description may refer to the associated description of other embodiment.

In several embodiments provided herein, it should be understood that disclosed technology contents can pass through others Mode is realized.Wherein, the apparatus embodiments described above are merely exemplary, such as the division of the unit, Ke Yiwei A kind of division of logic function, can there is an other dividing mode in actual implementation, for example, multiple units or component can combine or Person is desirably integrated into another system or some features can be ignored or does not perform.Another point, shown or discussed is mutual Between coupling, direct-coupling or communication connection can be INDIRECT COUPLING or communication link by some interfaces, unit or module It connects, can be electrical or other forms.

The unit illustrated as separating component may or may not be physically separate, be shown as unit The component shown may or may not be physical unit, you can be located at a place or can also be distributed to multiple On unit.Some or all of unit therein can be selected according to the actual needs to realize the purpose of this embodiment scheme.

In addition, each functional unit in each embodiment of the present invention can be integrated in a processing unit, it can also That each unit is individually physically present, can also two or more units integrate in a unit.Above-mentioned integrated list The form that hardware had both may be used in member is realized, can also be realized in the form of SFU software functional unit.

If the integrated unit is realized in the form of SFU software functional unit and is independent product sale or uses When, it can be stored in a computer read/write memory medium.Based on such understanding, technical scheme of the present invention is substantially The part to contribute in other words to the prior art or all or part of the technical solution can be in the form of software products It embodies, which is stored in a storage medium, is used including some instructions so that a computer Equipment (can be personal computer, server or network equipment etc.) perform each embodiment the method for the present invention whole or Part steps.And aforementioned storage medium includes：USB flash disk, read-only memory (ROM, Read-Only Memory), arbitrary access are deposited Reservoir (RAM, Random Access Memory), mobile hard disk, magnetic disc or CD etc. are various can to store program code Medium.

The above is only the preferred embodiment of the present invention, it is noted that for the ordinary skill people of the art For member, various improvements and modifications may be made without departing from the principle of the present invention, these improvements and modifications also should It is considered as protection scope of the present invention.

Claims

1. a kind of paths planning method, which is characterized in that including：

The current location of target robot and the position of candidate target point are obtained, wherein, the current location and the candidate mesh The position of punctuate is in default path planning, and in the default path planning, the position of the candidate target point is located at After the current location, the time of next position that will be reached by the current location as the target robot Bit selecting is put；

According to the current location and the position of candidate target point, whether the position for determining the candidate target point is described Next position that target robot will reach；

If so, using the position of the candidate target point as next position of current location described in physical planning path； If it is not, the step of then returning to the position of the current location for obtaining target robot and candidate target point, and redefine straight Position to the candidate target point is next position that the target robot will reach.

2. according to the method described in claim 1, it is characterized in that, according to the current location and the position of candidate target point It puts, whether the position for determining the candidate target point is that next position that the target robot will reach includes：

The information of the barrier between the current location and the position of candidate target point is obtained, wherein, the barrier Information include the marginal position of the barrier, the marginal position is close to the default path planning in the barrier Edge position；

Based on the marginal position of the barrier, the environmental state information of the target robot and the barrier is determined；

According to the environmental state information, whether the position for determining the candidate target point is that the target robot will reach Next position.

3. according to the method described in claim 2, it is characterized in that, the marginal position based on the barrier, determines the mesh The environmental state information of scalar robot and the barrier includes：

Determine the distance between the current location and the marginal position of the barrier；

If the distance is less than the first preset distance, determine that the target robot and the environmental state information of the barrier are Status of fail；

If the distance not less than the first preset distance and less than the second preset distance, determine the target robot with it is described The environmental state information of barrier is precarious position；

If the distance not less than the second preset distance and less than third preset distance, determine the target robot with it is described The environmental state information of barrier is safe condition；

If the distance not less than third preset distance and less than the 4th preset distance, determine the target robot with it is described The environmental state information of barrier is success status；

Wherein, first preset distance is less than the second preset distance, and second preset distance is less than third preset distance, institute Third preset distance is stated less than the 4th preset distance.

4. according to the method described in claim 2, it is characterized in that, according to the environmental state information, the candidate mesh is determined Whether the position of punctuate is that next position that the target robot will reach includes：

It determines from the position that the current location reaches each candidate target point, the candidate performed needed for the target robot Action；

By the corresponding candidate actions of each candidate target point and environmental state information, the evaluation of estimate of each candidate target point is determined；

Obtain the maximum value in the corresponding evaluation of estimate of multiple candidate target points；

According to the corresponding environmental state information in the current location and current action and the maximum value, the candidate is determined Whether the position of target point is next position that the target robot will reach.

5. according to the method described in claim 4, it is characterized in that, it is the target in the position for determining the candidate target point After next position that robot will reach, the method further includes：

The current location is recorded to the routing information of next position, wherein, the routing information includes described current Position to the range information of next position, the current location between next position obstacle information, The current location is to the distance between barrier information.

6. a kind of path planning apparatus, which is characterized in that including：

First acquisition unit, for obtaining the position of the current location of target robot and candidate target point, wherein,

The current location and the position of candidate target point are in default path planning, and in the default path planning In, the position of the candidate target point is located at after the current location, as the target robot by the current location It sets out the position candidate of next position that will be reached；

First determination unit for the position according to the current location and candidate target point, determines the candidate target Whether the position of point is next position that the target robot will reach；

Second determination unit, for being next position that the target robot will reach in the position of the candidate target point In the case of putting, using the position of the candidate target point as next position of current location described in physical planning path； And in the case of in the position of the candidate target point not being next position that the target robot will reach, institute is returned First acquisition unit is stated, and is redefined until the position of the candidate target point is that the target robot will reach down One position.

7. device according to claim 6, which is characterized in that first determination unit includes：

First acquisition module, for obtaining the letter of the barrier between the current location and the position of candidate target point Breath, wherein, the information of the barrier includes the marginal position of the barrier, and the marginal position is to be leaned in the barrier The position at the edge of the nearly default path planning；

First determining module for the marginal position based on the barrier, determines the target robot and the barrier Environmental state information；

Second determining module, for according to the environmental state information, whether the position for determining the candidate target point to be described Next position that target robot will reach.

8. device according to claim 7, which is characterized in that first determining module includes：

First determination sub-module, for determining the distance between the current location and the marginal position of the barrier；

Second determination sub-module if being less than the first preset distance for the distance, determines the target robot and the barrier The environmental state information for hindering object is status of fail；

Third determination sub-module, if determining institute not less than the first preset distance and less than the second preset distance for the distance The environmental state information for stating target robot and the barrier is precarious position；

4th determination sub-module, if determining institute not less than the second preset distance and less than third preset distance for the distance The environmental state information for stating target robot and the barrier is safe condition；

5th determination sub-module, if determining institute not less than third preset distance and less than the 4th preset distance for the distance The environmental state information for stating target robot and the barrier is success status；

9. device according to claim 7, which is characterized in that second determining module includes：

6th determination sub-module, for determining from the position that the current location reaches each candidate target point, the target The candidate actions performed needed for robot；

7th determination sub-module, it is each for by the corresponding candidate actions of each candidate target point and environmental state information, determining The evaluation of estimate of candidate target point；

Second acquisition module, for obtaining the maximum value in the corresponding evaluation of estimate of multiple candidate target points；

8th determination sub-module, for according to the corresponding environmental state information in the current location and current action and described Maximum value, whether the position for determining the candidate target point is next position that the target robot will reach.

10. device according to claim 9, which is characterized in that described device further includes：

Recording unit, for being next position that the target robot will reach in the position for determining the candidate target point After putting, the current location is recorded to the routing information of next position, wherein, the routing information includes described work as Front position to the range information of next position, the current location to the barrier letter between next position Breath, the current location to the distance between barrier information.

11. a kind of robot, which is characterized in that including：Path planning apparatus as described in any one of claim 6 to 10.