CN107479547B - Decision tree behavior decision algorithm based on teaching learning - Google Patents
Classifications
- G—PHYSICS; G05—CONTROLLING; REGULATING; G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/0214 — Control of position or course in two dimensions, specially adapted to land vehicles, with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
- G05D1/0276 — Control of position or course in two dimensions, specially adapted to land vehicles, using signals provided by a source external to the vehicle
Abstract
The invention discloses a decision tree behavior decision algorithm based on teaching learning, which mainly solves the problem that existing decision algorithms cannot simultaneously handle comprehensive complex scenes and guarantee stability. The algorithm comprises the following steps: store the state transition rules of the teaching trajectory; obtain a state transition frequency matrix and a state transition probability matrix; construct a reward; have the decision tree evaluate the action to be generated; update the transition frequency matrix and the state transition probability matrix; repeat the above procedure until the evaluation passes. Through this scheme, the invention achieves maximum rationality and safety of unmanned-driving behavior decisions.
Description
Technical Field
The invention relates to the field of unmanned driving, in particular to a decision tree behavior decision algorithm based on teaching learning.
Background
An unmanned vehicle is a high-level form of mobile robot with autonomous driving capability. Its intelligent computing system realizes three functions: environment perception, decision planning, and motion control. Compared with other small mobile robots, the system is structurally complex. Beyond basic driving capability, it fuses data in real time from sensors such as radar and cameras, together with a dedicated high-precision map, to localize the vehicle and to perceive and understand the current environment. Based on the road and moving-obstacle information extracted from the sensors, the vehicle then uses a decision planning algorithm to generate a reasonable, feasible expected trajectory, which the control module executes as the final vehicle motion. The complete intelligent computing system involves key technologies such as lane-line detection, obstacle recognition, high-precision maps, high-precision positioning, decision planning algorithms, and controller design; it draws on many disciplines and has great value for both theoretical research and engineering practice.
Research on unmanned vehicles covers three directions: environment perception, behavior decision, and planning control. Behavior decision, as the hub connecting environment perception and planning control, occupies a very important position and has become a key topic and difficulty in unmanned-driving research. A behavior decision is the process of selecting, from the feasible options available in the current environment, the option that best serves the vehicle's own behavioral objective. In this process, a specific decision algorithm is typically needed to predict and evaluate the state that would result from each action, and the best action is selected under a unified judgment criterion. For an unmanned vehicle, the behavior-decision module must perceive and understand the external environment from data fused from sensors such as radar and cameras, reasonably predict the next behavior the vehicle should execute, and pass the selected behavior, as physical values produced by the decision algorithm, to the planning and control system, which then realizes the expected behavior of the decision module and thereby the vehicle's autonomous driving.
Behavior-decision theory first appeared in psychology, management, and economics, and was later gradually extended to other fields. At present, behavior decision for vehicles mainly relies on traditional empirical methods such as finite state machines, decision trees, and multi-attribute decision making, as well as learning-based prediction methods. Experience-based design methods cannot be extended to comprehensive complex scenes; learning-based prediction methods, although their behavioral stability and safety are harder to guarantee, adapt to scenes far better than experience-based design. Given that the development of unmanned driving must inevitably face complex and variable scenes, learning-based prediction has become the best option for realizing vehicle behavior decisions. Teaching learning (learning from demonstration), as a learning-based prediction method, effectively addresses scene extensibility and is an efficient behavior-decision solution.
In practical applications, however, teaching learning alone cannot solve the unmanned-driving behavior-decision problem. A driverless behavior decision should ensure maximum rationality of the behavior, whereas ordinary teaching learning only models the behavior decision probabilistically and, in practice, cannot avoid unreasonable behavior to the greatest extent. In addition, the teaching data do not fully cover the global state space, so they provide only limited prior decision knowledge. For the unmanned behavior-decision problem, the decision system must therefore be able to continually reinforce and update its strategy on the basis of this prior knowledge.
Disclosure of Invention
The invention aims to provide a decision tree behavior decision algorithm based on teaching learning, so as to solve the problem that existing decision algorithms cannot, in practice, avoid unreasonable behavior to the greatest extent.
In order to solve the above problems, the present invention provides the following technical solutions:
a decision tree behavior decision algorithm based on teaching learning comprises the following steps:
(a) describing a teaching rule in teaching learning by using a state transition frequency matrix and a state transition probability matrix of the behavior, and storing the state transition rule of a teaching track;
(b) obtaining a state transition frequency matrix and a state transition probability matrix according to the step (a);
(c) constructing a reward according to the state transition frequency;
(d) when the state transition probability matrix outputs the action to be performed, the decision tree evaluates, according to step (b), the action to be generated by the state transition probability matrix; if the evaluation passes, the state transition is executed; if not, step (e) is executed;
(e) updating a transition frequency matrix and a state transition probability matrix through an Actor-Critic algorithm according to the steps (b) and (c);
(f) repeating steps (d) and (e) until the evaluation is passed.
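The overall loop of steps (a)-(f) can be sketched as follows. This is an illustrative skeleton, not code from the patent; the helper callables and the dictionary representation of the probability matrix are assumptions.

```python
# Skeleton of the decision loop: pick the most probable action, let the
# decision tree evaluate it, and reinforce the matrices until one passes.
def decide(frequency, probability, evaluate, reinforce, max_iters=100):
    """frequency/probability stand in for the two matrices; evaluate is
    the decision-tree check of step (d); reinforce is the update of (e)."""
    for _ in range(max_iters):
        action = max(probability, key=probability.get)  # most likely action
        if evaluate(action):            # step (d): decision-tree evaluation
            return action               # passed: execute the state transition
        reinforce(frequency, probability, action)  # step (e): update matrices
    return None                         # no acceptable action found
```

The loop terminates as soon as an action passes the evaluation, matching step (f).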
Specifically, the specific process of step (a) is as follows: firstly, rasterize the length of the predicted road surface; design a state transition table to record the transition relations; and fill the frequencies of the transition table into a matrix, each frequency being the number of times the teaching transitions from the current state to a successor state, with the state transition probability obtained by applying a softmax function to the visit frequencies of the n possible successor states of the current state.
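The softmax step, turning visit frequencies into transition probabilities, can be sketched as below. This is an illustrative reconstruction, not code from the patent; the function name and example counts are assumptions.

```python
import math

def softmax_probabilities(visit_counts):
    """Turn the visit frequencies of the n possible successor states
    into state transition probabilities via a softmax (sketch)."""
    m = max(visit_counts)                         # subtract max for stability
    exps = [math.exp(c - m) for c in visit_counts]
    total = sum(exps)
    return [e / total for e in exps]

# Example: the current state was followed 3 times by one successor,
# once by another, and never by the remaining three.
probs = softmax_probabilities([3, 1, 0, 0, 0])
```

The most frequently visited successor receives the highest probability, while unvisited successors still keep a small nonzero probability, which is a property of the softmax.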
Specifically, the specific process of step (b) is as follows: the state transition frequency is the number of times each successor state has been visited from the current state, and the state transition probability is the transition probability value computed from these counts; the teaching state-transition trajectory is discretized and sampled to construct the state transition frequency matrix, and the state transition probability is obtained by applying a softmax function to the visit frequencies of the n possible successor states of the current state.
Specifically, the specific process of step (c) is as follows: compare the state action to be performed with the expected state action; if the result meets the expectation, add to the reward, otherwise apply a negative reward as a penalty; if, among the other unselected actions in the current state, there is an action closer to the expected action than the selected one, add to its reward; finally, fit the discrete state points to obtain a planning curve. The change in reward is designed as:

Δr = +1 if a = a_u; Δr = -1 if a ≠ a_u

that is, when the action is as expected, Δr may be set to +1; conversely, when it is not, Δr may be set to -1, where a_u is the expected action and a is the action to be performed.
Specifically, the specific process of step (d) is as follows: the decision tree judges the rationality and safety of the action transition from two aspects; the evaluation passes only if both criteria are satisfied, otherwise it fails.

Firstly, the rationality of the state transition is judged, to confirm that the vehicle can realize the transition within its own physical limits; the evaluation criterion is

s_i → s_j, ||i - j|| = 1

where s_i denotes the i-th state; the formula states that at each move the vehicle may only transition to a state adjacent to the current one.

Secondly, after the trajectory points are fitted, the trajectory is dilated, and no other obstacles may lie in the drivable area of the trajectory:

||x_{s_i} - x_obstacle|| > x_width, ||y_{s_i} - y_obstacle|| > y_length

where x_{s_i} and y_{s_i} are the horizontal and vertical coordinates of state s_i relative to the vehicle, x_obstacle and y_obstacle are the horizontal and vertical coordinates of an obstacle in the adjacent area, and x_width and y_length are 1/2 of the vehicle's width and length, respectively.
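The two decision-tree checks can be sketched as below; the function names and coordinate convention are illustrative assumptions, not from the patent.

```python
def transition_reasonable(i, j):
    """Rationality check: the vehicle may only move to an adjacent
    state, i.e. ||i - j|| = 1 (sketch)."""
    return abs(i - j) == 1

def clear_of_obstacle(x_s, y_s, x_obs, y_obs, x_width, y_length):
    """Safety check: the state's position must lie outside the obstacle's
    expanded box; x_width and y_length are half the vehicle's width and
    length, as in the text (sketch)."""
    return abs(x_s - x_obs) > x_width and abs(y_s - y_obs) > y_length
```

An evaluation passes only when both functions return True for the candidate transition.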
Specifically, the specific process of step (e) is as follows: the reinforcement update is

δ_t = r_t + γV(s_{t+1}) - V(s_t), p(s_t, a_t) := p(s_t, a_t) + βδ_t

where r_t is the immediate reward; V(s_t) is the predicted cumulative reward of the current state, V(s_{t+1}) is the predicted cumulative reward from the next state, β is the update step, γ is the confidence in the reward after the current prediction, and p(s_t, a_t) is the probability of performing action a_t in state s_t, updated on the basis of the transition probabilities obtained from the transition frequencies learned by teaching.
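The Actor-Critic update can be sketched as one step below. The critic's learning rate alpha is an assumption, since the text only specifies the TD error and the actor update; all names are illustrative.

```python
def actor_critic_update(V, p, s, a, s_next, r, beta=0.1, gamma=0.9, alpha=0.5):
    """One reinforcement step: compute the TD error δ_t and nudge the
    state-action preference p(s, a) by β·δ_t (sketch)."""
    delta = r + gamma * V[s_next] - V[s]   # δ_t = r_t + γV(s_{t+1}) - V(s_t)
    V[s] += alpha * delta                  # critic update (alpha is assumed)
    p[(s, a)] = p.get((s, a), 0.0) + beta * delta  # actor preference update
    return delta
```

Here V is a dict mapping states to predicted cumulative rewards and p maps (state, action) pairs to preferences, standing in for the matrices of the patent.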
Compared with the prior art, the invention has the following beneficial effects: the decision tree algorithm sits in the middle of the framework, carrying the state transition rules upward and reinforcing or correcting them downward. The teaching rules of a human driver are described by two matrices: the state transition frequency and the state transition probability. The state transition frequency is the number of times each successor state has been visited from the current state, and the state transition probability is the transition probability computed from these counts. When the transition probabilities output the action to be performed, the decision tree algorithm checks and evaluates the rationality and safety of that action. After the decision tree evaluation, the algorithm corrects the current state transition frequency matrix, increasing the frequency of reasonable actions and decreasing the frequency of unreasonable ones. From the corrected frequency matrix the corresponding transition probabilities are recomputed, and the cycle of reinforcement repeats; this ensures maximum rationality and safety of the unmanned behavior decision.
Drawings
FIG. 1 is a diagram of the expert teaching lane access in the present invention.
FIG. 2 is a graph of the recovery results of the present invention.
FIG. 3 is a first recovery-fit graph of partial experimental data.
FIG. 4 is a second recovery-fit graph of partial experimental data.
FIG. 5 is a third recovery-fit graph of partial experimental data.
FIG. 6 is a fourth recovery-fit graph of partial experimental data.
Detailed Description
The present invention is further illustrated by the following figures and examples, which include, but are not limited to, the following examples.
In the whole algorithm framework, the decision tree algorithm sits in the middle, carrying the state transition rules upward and reinforcing or correcting them downward. The teaching rules of a human driver are described by two matrices: the state transition frequency and the state transition probability. The state transition frequency is the number of times each successor state has been visited from the current state, and the state transition probability is the transition probability computed from these counts. When the transition probabilities output the action to be performed, the decision tree algorithm checks and evaluates the rationality and safety of that action. After the decision tree evaluation, the algorithm corrects the current state transition frequency matrix, increasing the frequency of reasonable actions and decreasing the frequency of unreasonable ones. From the corrected frequency matrix the corresponding transition probabilities are recomputed, and the cycle of reinforcement repeats. The specific process is as follows:
the decision tree behavior decision algorithm based on teaching learning comprises the following steps:
(a) describing a teaching rule in teaching learning by using a state transition frequency matrix and a state transition probability matrix of the behavior, and storing the state transition rule of a teaching track;
firstly, rasterize the length of the predicted road surface; design a state transition table to record the transition relations; and fill the frequencies of the transition table into a matrix, each frequency being the number of times the teaching transitions from the current state to a successor state, with the state transition probability obtained by applying a softmax function to the visit frequencies of the n possible successor states of the current state;
(b) obtaining a state transition frequency matrix and a state transition probability matrix according to the step (a);
the state transition frequency is the number of times each successor state has been visited from the current state, and the state transition probability is the transition probability value computed from these counts; the teaching state-transition trajectory is discretized and sampled to construct the state transition frequency matrix, and the state transition probability is obtained by applying a softmax function to the visit frequencies of the n possible successor states of the current state.
(c) Constructing a reward according to the state transition frequency;
compare the state action to be performed with the expected state action; if the result meets the expectation, add to the reward, otherwise apply a negative reward as a penalty; if, among the other unselected actions in the current state, there is an action closer to the expected action than the selected one, add to its reward; finally, fit the discrete state points to obtain a planning curve. The change in reward is designed as:

Δr = +1 if a = a_u; Δr = -1 if a ≠ a_u

that is, when the action is as expected, Δr may be set to +1; conversely, when it is not, Δr may be set to -1, where a_u is the expected action and a is the action to be performed.
(d) When the state transition probability matrix outputs the action to be performed, the decision tree evaluates, according to step (b), the action to be generated by the state transition probability matrix; if the evaluation passes, the state transition is executed; if not, step (e) is executed;
the decision tree judges the rationality and safety of the action transition from two aspects; the evaluation passes only if both criteria are satisfied, otherwise it fails.

Firstly, the rationality of the state transition is judged, to confirm that the vehicle can realize the transition within its own physical limits; the evaluation criterion is

s_i → s_j, ||i - j|| = 1

where s_i denotes the i-th state; the formula states that at each move the vehicle may only transition to a state adjacent to the current one.

Secondly, after the trajectory points are fitted, the trajectory is dilated, and no other obstacles may lie in the drivable area of the trajectory:

||x_{s_i} - x_obstacle|| > x_width, ||y_{s_i} - y_obstacle|| > y_length

where x_{s_i} and y_{s_i} are the horizontal and vertical coordinates of state s_i relative to the vehicle, x_obstacle and y_obstacle are the horizontal and vertical coordinates of an obstacle in the adjacent area, and x_width and y_length are 1/2 of the vehicle's width and length, respectively.
(e) Updating a transition frequency matrix and a state transition probability matrix through an Actor-Critic algorithm according to the steps (b) and (c);
the reinforcement update is as follows:

δ_t = r_t + γV(s_{t+1}) - V(s_t), p(s_t, a_t) := p(s_t, a_t) + βδ_t

where r_t is the immediate reward; V(s_t) is the predicted cumulative reward of the current state, V(s_{t+1}) is the predicted cumulative reward from the next state, β is the update step, γ is the confidence in the reward after the current prediction, and p(s_t, a_t) is the probability of performing action a_t in state s_t, updated on the basis of the transition probabilities obtained from the transition frequencies learned by teaching.
(f) Repeating steps (d) and (e) until the evaluation is passed.
The strategy-update part of the invention adopts the Actor-Critic algorithm. Actor-Critic is a model-free algorithm: it can be used both when no model is available and when a model exists. Model-free solution algorithms are a major breakthrough in solving Markov decision processes, carrying theoretically powerful mathematical tools into application scenarios that better match real, concrete problems. Model-based algorithms share the property that the solution strategy must rely on an existing prior transition model and reward structure; a model-free solution strategy requires neither. In general, it is very difficult to model real-world problems ideally: the complete Markov process model is hidden in everyday life and rarely admits an obvious formulation. From this point of view, model-free algorithms, by simplifying the constraints placed on the problem model, are better suited to solving concrete problems. In a model-free setting, the agent can interact with the environment to sample the transition probabilities and other model-defined variables, acquire prior knowledge of the environment from a statistical viewpoint, and estimate the required reward function; alternatively, the agent can use a model-free algorithm to approximate the optimization objective by solving the reward function in a fuzzy manner. These are genuinely two different directions. In the first, after learning the transition probabilities and reward model through interaction with the environment, the optimal strategy can be obtained with a model-based solution method. In the second, direct-approximation approach, the optimal strategy is solved by approximate means, with no requirement on the form of the model at all.
Among the many solution methods there are also algorithms that combine both: an approximate model is used to accelerate reward learning while the reward function is estimated, and the two are updated iteratively. It should be noted that, among model-free algorithms, the second, direct-approximation approach receives the most attention and has the widest range of application. The simulation design and experimental design of the specific experiments are as follows.
simulation design
In this simulation, the states are discretized into 27, so the final state transition matrix is 27 × 27. The transition probability matrix is calculated from the visit frequencies of the 5 possible successor states of the current state.
The decision tree framework here detects the feasibility of the transition state. In this simulation, the decision tree will be detected as follows:
1. Detect the lane number of the state jump. If the difference between lane numbers is greater than 2, the vehicle would be jumping directly from the current leftmost lane to the rightmost lane. The algorithm sets the visit frequency of that successor state to 0 and then selects the transferable state with the highest probability from the updated frequency matrix.
2. After the lane-number check passes, the algorithm detects whether the left or right turn would hit another obstacle. If it would, the visit frequency of that successor state is halved, and the algorithm again selects the transferable state with the highest probability from the updated frequency matrix.
3. After the above checks, the algorithm performs the state transition.
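The three detection rules can be sketched as one update routine; the data structures, names, and obstacle predicate are illustrative assumptions.

```python
def apply_detection_rules(freq, s, s_next, lane, hits_obstacle):
    """Decision-tree checks from the simulation: zero the frequency of an
    illegal lane jump (rule 1), halve it on a predicted collision (rule 2),
    otherwise allow the transition (rule 3). Sketch only."""
    if abs(lane[s] - lane[s_next]) > 2:      # rule 1: leftmost -> rightmost jump
        freq[s][s_next] = 0.0
    elif hits_obstacle(s, s_next):           # rule 2: turn would hit an obstacle
        freq[s][s_next] /= 2.0               # halve the visit frequency
    return freq                              # rule 3: transition may proceed
```

After this routine runs, the caller would recompute the probabilities and pick the highest-probability transferable state, as the text describes.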
FIGS. 1 and 2 show the simulation results.
Design of experiments
In this experiment, the teaching data are obtained by sampling the vehicle's driving trajectory. While the driver drives, the vehicle normally travels in the right lane; when it meets an obstacle, it changes lanes within a certain distance to avoid it. There are 5 sampling states between the vehicle and the obstacle at the moment of lane change. For such a sampling process, the decision tree and reinforcement procedure are as follows:
1. Construct a transition matrix from the discretized sampling states, and fill it with the sampled data.
2. Calculate the transition probability matrix from the obtained transition frequency matrix.
3. Check the rationality of the state jump with the decision tree: the vehicle is not allowed to jump directly from the right lane to the left lane, or from the left lane to the right lane.
4. Detect whether a left or right turn from the state would hit an obstacle.
5. For the state jump, calculate the distance between the current state and the obstacle; if it is greater, by one state's distance, than the distance at which the vehicle deviated from its track in the teaching data, add 1 to the visit frequency of the adjacent successor states in the other lane.
6. Update the frequency matrix and select the state with the highest probability for the transition.
7. Perform interpolation fitting on the discrete states.
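The interpolation-fitting step can be sketched with a simple piecewise-linear interpolation; the patent does not name the fitting method, and the sample points below are illustrative.

```python
def interpolate(points, x):
    """Piecewise-linear interpolation through discrete state points
    (sketch; the actual fitting method is not specified in the text)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x outside the sampled range")

# Illustrative discrete states: (longitudinal grid cell, lateral offset)
pts = [(0, 0.0), (1, 0.0), (2, 0.5), (3, 1.0), (4, 1.0)]
```

A smoother curve (e.g. a spline or polynomial fit) could equally serve as the planning curve; the choice is an assumption here.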
FIGS. 3 to 6 show the experimental results.
The invention is well implemented in accordance with the above embodiments. It should be noted that, based on the above design, even if insubstantial modifications or embellishments are made to the invention in order to solve the same technical problem, the technical solution adopted remains essentially the same as that of the invention and therefore also falls within its protection scope.
Claims (6)
1. A decision tree behavior decision algorithm based on teaching learning is characterized by comprising the following steps:
(a) describing a teaching rule in teaching learning by using a state transition frequency matrix and a state transition probability matrix of the behavior, and storing the state transition rule of a teaching track;
(b) obtaining a state transition frequency matrix and a state transition probability matrix according to the step (a);
(c) constructing a reward according to the state transition frequency;
(d) when the state transition probability matrix outputs the action to be performed, the decision tree evaluates, according to step (b), the action to be generated by the state transition probability matrix; if the evaluation passes, the state transition is executed; if not, step (e) is executed;
(e) updating a state transition frequency matrix and a state transition probability matrix through an Actor-Critic algorithm according to the steps (b) and (c);
(f) repeating steps (d) and (e) until the evaluation is passed.
2. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (a) is as follows: firstly, rasterize the length of the predicted road surface; design a state transition table to record the transition relations; and fill the frequencies of the state transition table into a matrix, each frequency being the number of times the teaching transitions from the current state to a successor state, with the state transition probability obtained by applying a softmax function to the visit frequencies of the n possible successor states of the current state.
3. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (b) is as follows: the state transition frequency is the number of times each successor state has been visited from the current state, and the state transition probability is the transition probability value computed from these counts; the teaching state-transition trajectory is discretized and sampled to construct the state transition frequency matrix, and the state transition probability is obtained by applying a softmax function to the visit frequencies of the n possible successor states of the current state.
4. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (c) is as follows: compare the state action to be performed with the expected state action; if the result meets the expectation, add to the reward, otherwise apply a negative reward as a penalty; if, among the other unselected actions in the current state, there is an action closer to the expected action than the selected one, add to its reward; finally, fit the discrete state points to obtain a planning curve. The change in reward is designed as:

Δr = +1 if a = a_u; Δr = -1 if a ≠ a_u

that is, when the action is as expected, Δr is set to +1; conversely, when it is not, Δr is set to -1, where a_u is the expected action, a is the action to be performed, and Δr denotes the reward increment after the action is executed.
5. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (d) is as follows: the decision tree judges the reasonability and safety of action transfer through two aspects; if all the evaluation is satisfied, the evaluation is passed, otherwise, the evaluation is not passed;
firstly, the reasonableness of the state transition is judged, to confirm that the vehicle can realize the transition within the limits of its own physical condition; the evaluation condition is s_i → s_j, |i − j| = 1;
In the above formula, s_i represents the i-th state; the condition indicates that at each move the vehicle selects its transition state in the vicinity of the current state, where s_i and s_j respectively represent the states before and after a certain action is executed, and |i − j| = 1 is the constraint on the states; the values of i and j both range over the natural numbers;
secondly, after the trajectory points are fitted, the trajectory is dilated, and it is verified that no other obstacles lie within the drivable area of the trajectory.
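For illustration only (not part of the claim), the two-aspect evaluation of step (d) can be sketched on a 1-D rasterized road (an assumed simplification; the dilation margin and obstacle representation are hypothetical):

```python
def transition_reasonable(i, j):
    # Aspect 1: the successor state must be adjacent to the current state,
    # i.e. s_i -> s_j with |i - j| = 1.
    return abs(i - j) == 1

def trajectory_safe(cells, obstacle_cells, margin=1):
    # Aspect 2 (hypothetical 1-D grid sketch): dilate each fitted trajectory
    # cell by `margin` and require no obstacle inside the dilated corridor.
    return all(abs(c - o) > margin for c in cells for o in obstacle_cells)

def evaluation_passes(i, j, cells, obstacle_cells):
    # The evaluation passes only when both aspects are satisfied.
    return transition_reasonable(i, j) and trajectory_safe(cells, obstacle_cells)
```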
6. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (e) is as follows; the reinforcement method is:

δ_t = r_t + γ·V(s_{t+1}) − V(s_t),  p(s_t, a_t) = p(s_t, a_t) + β·δ_t

where V(s_t) is the predicted cumulative reward of the current state, V(s_{t+1}) is the predicted cumulative reward from the next state, β is the update rate, γ is the confidence in the reward predicted after the current step, and p(s_t, a_t) is the probability of performing action a_t in state s_t; the formula performs its update on the transition probabilities obtained from the transition frequencies learned through teaching; δ_t is the TD error of the transition from state s_t to s_{t+1}, and r_t is the immediate reward of the transition from s_t to s_{t+1}.
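For illustration only (not part of the claim), one reinforcement step of step (e) can be sketched as follows; the γ and β values are assumed defaults, and no renormalization of the probabilities is performed, since the claim does not describe one:

```python
def td_update(V, p, s, a, s_next, r, gamma=0.9, beta=0.1):
    # delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    delta = r + gamma * V[s_next] - V[s]
    # p(s_t, a_t) <- p(s_t, a_t) + beta * delta_t
    # (nudges the teach-learned transition probability by the TD error)
    p[(s, a)] = p.get((s, a), 0.0) + beta * delta
    return delta
```

A positive TD error increases the probability of repeating the action in that state; a negative one suppresses it.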
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710687194.0A CN107479547B (en) | 2017-08-11 | 2017-08-11 | Decision tree behavior decision algorithm based on teaching learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107479547A CN107479547A (en) | 2017-12-15 |
CN107479547B true CN107479547B (en) | 2020-11-24 |
Family
ID=60600126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710687194.0A Expired - Fee Related CN107479547B (en) | 2017-08-11 | 2017-08-11 | Decision tree behavior decision algorithm based on teaching learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107479547B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229730B (en) * | 2017-12-19 | 2021-07-20 | Tongji University | Unmanned vehicle track generation method based on fuzzy reward |
CN108363393B (en) * | 2018-02-05 | 2019-09-27 | Tencent Technology (Shenzhen) Co., Ltd. | Smart mobile device, navigation method therefor, and storage medium |
CN108446727B (en) * | 2018-03-09 | 2021-09-21 | Shanghai Anting Horizon Intelligent Transportation Technology Co., Ltd. | Driving behavior decision method and system and electronic equipment |
CN110738221B (en) * | 2018-07-18 | 2024-04-26 | Huawei Technologies Co., Ltd. | Computing system and method |
CN110084539B (en) * | 2018-11-30 | 2021-10-22 | Wuhan University | Irrigation decision learning method, device, server and storage medium |
CN109461342B (en) * | 2018-12-19 | 2023-06-27 | Changjia Fengxing (Suzhou) Intelligent Technology Co., Ltd. | Teaching system for unmanned motor vehicle and teaching method thereof |
CN110568848B (en) * | 2019-09-10 | 2022-09-23 | Dongfeng Commercial Vehicle Co., Ltd. | Teaching automatic driving operation system of sweeper |
JP7211375B2 (en) * | 2020-01-09 | 2023-01-24 | Toyota Motor Corporation | vehicle controller |
CN112141098B (en) * | 2020-09-30 | 2022-01-25 | SAIC Motor Corporation Limited | Obstacle avoidance decision method and device for intelligent driving automobile |
US11577732B2 (en) * | 2020-10-28 | 2023-02-14 | Argo AI, LLC | Methods and systems for tracking a mover's lane over time |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6162905A (en) * | 1984-09-04 | 1986-03-31 | Komatsu Ltd | Automatic operating method of unmanned vehicle |
JPH01106113A (en) * | 1987-10-19 | 1989-04-24 | Toshiba Corp | Cleaning robot device |
JPH08101712A (en) * | 1994-09-30 | 1996-04-16 | Mitsubishi Heavy Ind Ltd | On-line teaching device for unmanned carriage passing path |
CN102929281A (en) * | 2012-11-05 | 2013-02-13 | Southwest University of Science and Technology | Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment |
CN103792846A (en) * | 2014-02-18 | 2014-05-14 | Beijing University of Technology | Robot obstacle avoidance guiding method based on the Skinner operant conditioning principle |
CN104570738A (en) * | 2014-12-30 | 2015-04-29 | Beijing University of Technology | Robot track tracing method based on Skinner operant conditioning automata |
CN105094124A (en) * | 2014-05-21 | 2015-11-25 | Institute of Disaster Prevention | Method and model for performing independent path exploration based on operant conditioning |
CN105487537A (en) * | 2015-11-06 | 2016-04-13 | Fuzhou Huaying Heavy Industry Machinery Co., Ltd. | Vehicle motion planning method and unmanned vehicle |
CN105700526A (en) * | 2016-01-13 | 2016-06-22 | North China University of Science and Technology | Online sequential extreme learning machine method with autonomous learning capability |
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | Northwestern Polytechnical University | Real-time online path planning method based on deep reinforcement learning |
Non-Patent Citations (4)
Title |
---|
"Adaptive output feedback control for uncertain nonholonomic chained systems"; YUAN Zhan-ping et al.; Journal of Central South University (English Edition); Dec. 31, 2010; No. 3; pp. 572-579 *
"Imitation Learning of Car Driving Skills with Decision Trees and Random Forests"; PAWEŁ CICHOSZ et al.; International Journal of Applied Mathematics & Computer Science; Sep. 30, 2014; Vol. 24, No. 3; pp. 579-597 *
"Imitation Learning: A Survey of Learning Methods"; Ahmed Hussein et al.; ACM Computing Surveys; Apr. 30, 2017; Vol. 50, No. 2; pp. 21:1-21:35 *
"Self-reproduction for articulated behaviors with dual humanoid robots using on-line decision tree classification"; Jane Brooks Zurn et al.; Robotica; Jun. 24, 2011; Vol. 30; pp. 315-332 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107479547B (en) | Decision tree behavior decision algorithm based on teaching learning | |
CN110834644B (en) | Vehicle control method and device, vehicle to be controlled and storage medium | |
CN111780777B (en) | Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning | |
CN111098852B (en) | Parking path planning method based on reinforcement learning | |
CN112356830B (en) | Intelligent parking method based on model reinforcement learning | |
WO2021103834A1 (en) | Method for generating lane changing decision model, lane changing decision method for driverless vehicle, and device | |
Zhao et al. | A novel direct trajectory planning approach based on generative adversarial networks and rapidly-exploring random tree | |
Wang et al. | A survey of learning‐based robot motion planning | |
CN108229730B (en) | Unmanned vehicle track generation method based on fuzzy reward | |
Micheli et al. | NMPC trajectory planner for urban autonomous driving | |
CN111781922A (en) | Multi-robot collaborative navigation method based on deep reinforcement learning and suitable for complex dynamic scene | |
Tian et al. | Personalized lane change planning and control by imitation learning from drivers | |
Lodhi et al. | Autonomous vehicular overtaking maneuver: A survey and taxonomy | |
Yu et al. | Hierarchical reinforcement learning combined with motion primitives for automated overtaking | |
Moghadam et al. | A deep reinforcement learning approach for long-term short-term planning on frenet frame | |
Tang et al. | Actively learning Gaussian process dynamical systems through global and local explorations | |
CN114912693A (en) | Multi-mode prediction-based automatic driving automobile motion planning method | |
Ferreira et al. | Full neural predictors, with fixed time horizon, for a Truck-Trailer-Trailer prototype of a multi-articulated robot, in backward movements-singular conditions and critical angles | |
Yang et al. | Deep Reinforcement Learning Lane-Changing Decision Algorithm for Intelligent Vehicles Combining LSTM Trajectory Prediction | |
Wang et al. | An Enabling Decision-Making Scheme by Considering Trajectory Prediction and Motion Uncertainty | |
Zeng et al. | Risk-aware deep reinforcement learning for decision-making and planning of autonomous vehicles | |
Fan et al. | A hierarchical control strategy for reliable lane changes considering optimal path and lane‐changing time point | |
Zhang et al. | Maximum entropy inverse reinforcement learning-based trajectory planning for autonomous driving | |
Khalajzadeh et al. | A review on applicability of expert system in designing and control of autonomous cars | |
CN117789502A (en) | System and method for distributed awareness target prediction for modular autonomous vehicle control |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20201124 |