CN107479547B - Decision tree behavior decision algorithm based on teaching learning - Google Patents

Decision tree behavior decision algorithm based on teaching learning

Info

Publication number
CN107479547B
CN107479547B, CN201710687194.0A, CN201710687194A
Authority
CN
China
Prior art keywords
state
state transition
action
teaching
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710687194.0A
Other languages
Chinese (zh)
Other versions
CN107479547A (en)
Inventor
王祝萍
邢文治
张皓
陈启军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN201710687194.0A
Publication of CN107479547A
Application granted
Publication of CN107479547B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a decision tree behavior decision algorithm based on teaching learning, which mainly solves the problem that existing decision algorithms cannot simultaneously handle comprehensive complex scenes and guarantee stability. The decision tree behavior decision algorithm based on teaching learning comprises the following steps: storing the state transition rule of the teaching track; obtaining a state transition frequency matrix and a state transition probability matrix; constructing a reward; having the decision tree evaluate the action to be generated; updating the transition frequency matrix and the state transition probability matrix; and repeating the above procedure until the evaluation passes. Through this scheme, the invention achieves maximum reasonability and safety of unmanned-driving behavior decisions.

Description

Decision tree behavior decision algorithm based on teaching learning
Technical Field
The invention relates to the field of unmanned driving, in particular to a decision tree behavior decision algorithm based on teaching learning.
Background
An unmanned vehicle is a high-level form of mobile robot with autonomous driving capability. Its intelligent computing system realizes three functions: environment perception, decision planning and motion control. Compared with other small mobile robots, the system is structurally complex. Besides basic driving capability, the system performs real-time data fusion and positioning using various sensors such as radar and cameras in cooperation with a dedicated high-precision map, thereby perceiving and understanding the current environment. Meanwhile, according to the road and moving-obstacle information understood from the sensors, the vehicle uses a decision planning algorithm to work out a reasonable and feasible expected trajectory, and the control module finally implements the vehicle's motion behavior. The whole intelligent computing system involves key technologies such as lane line detection, obstacle identification, high-precision maps, high-precision positioning, decision planning algorithms and controller design, draws on knowledge from numerous disciplines, and has great theoretical research significance and engineering practice value.
Unmanned vehicle research covers three directions: environment perception, behavior decision and planning control. Behavior decision, as the hub connecting environment perception and planning control, occupies a very important position and has become a key point and difficulty of research in the unmanned driving field. Behavior decision is the process of selecting, from the several feasible schemes available in the current environment, the best scheme that serves the vehicle's own behavioral goal. In this process, a specific decision algorithm is usually needed to predict and evaluate the resulting state after an action is taken, and the best action is selected under a unified judgment standard. For an unmanned vehicle, behavior decision needs to perceive and understand the external environment from the data fused by sensors such as the radar and camera, reasonably predict the next behavior the vehicle should execute, transmit the selected behavior to the planning and control system in the form of physical values according to the decision algorithm, and thereby realize the behavior expected by the decision module, achieving autonomous unmanned driving of the vehicle.
Behavior decision theory first appeared in psychology, management and economics, and was later gradually extended to other directions. Currently, behavior decision for vehicles mainly relies on traditional empirical methods such as finite state machines, decision trees and multi-attribute decision making, as well as learning-based prediction methods. Experience-based design methods cannot be extended to comprehensive complex scenes; learning-and-prediction-based methods, although the stability and safety of their behaviors are difficult to determine, adapt to scenes far better than experience-based design methods. As unmanned driving develops, the complexity and variability of scenes must be faced, and learning-based prediction becomes the best option for realizing vehicle behavior decision. Teaching learning, as a learning-based prediction method, effectively addresses scene extensibility and is an efficient behavior decision solution.
In practical applications, however, teaching learning alone cannot solve the problem of unmanned behavior decision. Unmanned behavior decision should ensure the maximum rationality of the behavior. Ordinary teaching learning models the unmanned behavior decision probabilistically in theory, and in practice it is difficult for it to avoid unreasonable behavior to the greatest extent. In addition, the teaching data does not completely cover the global state space, so the prior decision knowledge it provides is somewhat limited. For the unmanned behavior decision problem, the decision system must be able to continuously reinforce and update its strategy on the basis of this prior knowledge.
Disclosure of Invention
The invention aims to provide a decision tree behavior decision algorithm based on teaching learning, so as to solve the problem that existing decision algorithms, in practice, find it difficult to avoid unreasonable behavior to the greatest extent.
In order to solve the above problems, the present invention provides the following technical solutions:
a decision tree behavior decision algorithm based on teaching learning comprises the following steps:
(a) describing a teaching rule in teaching learning by using a state transition frequency matrix and a state transition probability matrix of the behavior, and storing the state transition rule of a teaching track;
(b) obtaining a state transition frequency matrix and a state transition probability matrix according to the step (a);
(c) constructing a reward according to the state transition frequency;
(d) when the state transition probability matrix outputs the action to be performed, the decision tree evaluates the action generated by the state transition probability matrix of step (b); if the evaluation passes, the state transition is executed, and if the evaluation does not pass, step (e) is executed;
(e) updating the state transition frequency matrix and the state transition probability matrix through an Actor-Critic algorithm according to steps (b) and (c);
(f) repeating steps (d) and (e) until the evaluation is passed.
Specifically, the specific process of step (a) is as follows: firstly, the length of the predicted road surface is rasterized; a state transition table is designed to record the conversion relation; and the table is filled, in matrix form, with frequencies, each entry being the number of times the current state transitions to a given subsequent state in the teaching data. The state transition probability is obtained by passing the access frequencies of the n possible subsequent states of the current state through a softmax function.
Specifically, the specific process of step (b) is as follows: the state transition frequency is the number of times each subsequent state is accessed from the current state, and the state transition probability is the transition probability value calculated from these counts; the state transition trajectory of the teaching learning is discretized and sampled to construct the state transition frequency matrix, and the state transition probability is obtained by passing the access frequencies of the n possible subsequent states of the current state through a softmax function.
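By way of illustration only, a minimal sketch of steps (a) and (b) is given below; the function names, the trajectory encoding and the choice of candidate successor states are assumptions of this sketch and are not taken from the patent. The frequency matrix is accumulated from discretized teaching trajectories and converted into transition probabilities with a softmax over each state's candidate successors:

```python
import numpy as np

def build_frequency_matrix(trajectories, num_states):
    """Count how often each (current state -> subsequent state) pair occurs in the teaching data."""
    freq = np.zeros((num_states, num_states))
    for traj in trajectories:                    # each trajectory is a list of discretized state indices
        for s, s_next in zip(traj[:-1], traj[1:]):
            freq[s, s_next] += 1
    return freq

def transition_probabilities(freq_row, candidates):
    """Softmax over the access frequencies of the n candidate successor states of one state."""
    f = freq_row[candidates]
    e = np.exp(f - f.max())                      # shift for numerical stability
    return e / e.sum()

# hypothetical usage with 27 discretized states, as in the simulation section
freq = build_frequency_matrix([[0, 1, 4, 7, 10], [0, 3, 4, 7, 10]], num_states=27)
probs = transition_probabilities(freq[0], candidates=[1, 2, 3, 4, 5])
```

One consequence of using a softmax rather than simple normalization is that successor states never visited in the teaching data still receive a small nonzero probability and remain selectable.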
Specifically, the specific process of step (c) is as follows: the state action to be performed is compared with the expected state action; if the result meets the expectation, a reward is added, otherwise a negative reward (penalty) is applied; if, among the other unselected actions in the current state, an action appears that is closer to the expected action than the selected action, extra reward points are added; finally, the discrete state points are fitted to obtain a planning curve. The variation of the reward is designed as:

Δr = +1, if a = a_u;  Δr = -1, otherwise

The above expression indicates that when the action meets expectations, Δr is set to +1; conversely, when the action does not meet expectations, Δr is set to -1, where a_u is the expected action and a is the action to be performed.
Specifically, the specific process of step (d) is: the decision tree judges the reasonability and safety of the action transition from two aspects; the evaluation passes only if both are satisfied, otherwise it does not pass;
firstly, the reasonability of the state transition is judged to confirm that the vehicle can realize the transition under its own physical limitations; the evaluation condition is s_i → s_j, ||i - j|| = 1;
in the above formula, s_i represents the i-th state; the formula indicates that each time the vehicle moves it can only select a transition state adjacent to the current state;
secondly, after the track points are fitted, expansion is carried out, and no other obstacles exist in the track travelable area:
|x_si - x_obstacle| > x_width  or  |y_si - y_obstacle| > y_length

where x_si, y_si are the horizontal and vertical coordinates of state s_i relative to the vehicle, x_obstacle, y_obstacle are the horizontal and vertical coordinates of an obstacle in the adjacent area, and x_width, y_length are 1/2 of the width and length of the vehicle, respectively.
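A sketch of the two decision-tree checks is given below, assuming states are indexed by integers and that the obstacle test uses the inflated-footprint inequality reconstructed above; both the state encoding and the exact geometric test are assumptions of the sketch:

```python
def transition_is_reasonable(i, j):
    """Reasonability check: the vehicle may only move to a state adjacent to the current one, ||i - j|| = 1."""
    return abs(i - j) == 1

def transition_is_safe(state_xy, obstacles_xy, half_width, half_length):
    """Safety check: no obstacle may lie inside the vehicle footprint inflated around the fitted track point."""
    sx, sy = state_xy
    for ox, oy in obstacles_xy:
        if abs(sx - ox) <= half_width and abs(sy - oy) <= half_length:
            return False                         # obstacle falls inside the inflated area
    return True

def decision_tree_passes(i, j, state_xy, obstacles_xy, half_width, half_length):
    """The evaluation passes only if both the reasonability and the safety checks are satisfied."""
    return transition_is_reasonable(i, j) and transition_is_safe(
        state_xy, obstacles_xy, half_width, half_length)
```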
Specifically, the specific process of step (e) is: the reinforcement method is as follows:

δ_t = r_t + γV(s_{t+1}) - V(s_t),   p(s_t, a_t) := p(s_t, a_t) + β·δ_t

where r_t is the immediate reward; V(s_t) is the predicted cumulative reward of the current state, V(s_{t+1}) is the predicted cumulative reward starting from the next state, β is the update rate, γ is the confidence placed on the predicted future reward, and p(s_t, a_t) is the probability of performing action a_t in state s_t. The formula is updated on the basis of the transition probabilities obtained from the transition frequencies learned through teaching.
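A minimal sketch of this Actor-Critic step is given below, assuming a tabular value function; the critic update on V, the row re-normalization and the learning-rate values are assumptions that the patent text does not spell out:

```python
import numpy as np

def actor_critic_update(V, p, s, a, s_next, r, gamma=0.9, beta=0.1, alpha=0.1):
    """One TD update: delta = r + gamma*V(s') - V(s); the actor table p follows the patent's
    additive formula, while the critic update on V and the re-normalization are assumptions."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta                        # critic update (assumed, with learning rate alpha)
    p[s, a] += beta * delta                      # actor update: p(s,a) := p(s,a) + beta * delta
    p[s] = np.clip(p[s], 1e-6, None)
    p[s] /= p[s].sum()                           # keep the row a probability distribution (assumed)
    return delta

# hypothetical usage with 27 states
V = np.zeros(27)
p = np.full((27, 27), 1.0 / 27)
actor_critic_update(V, p, s=0, a=1, s_next=1, r=1.0)
```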
Compared with the prior art, the invention has the following beneficial effects: the decision tree algorithm sits in the middle of the framework, carrying the state transition rule upward and, connecting downward, strengthening or correcting that rule. For the teaching rule of a human driver, the method defines two matrices, the state transition frequency and the state transition probability, for its description. The state transition frequency is the number of times each subsequent state is accessed from the current state, and the state transition probability is the transition probability value calculated from these counts. When the transition probability outputs the action to be performed, the decision tree algorithm checks and evaluates the rationality and safety of the current action. After the decision tree evaluation, the algorithm corrects the current state transition frequency matrix, increasing the frequency of reasonable actions and reducing the frequency of unreasonable actions. The corrected state transition frequency matrix is then used to recompute the corresponding transition probabilities, so that the cycle of reinforcement continues; the maximum reasonability and safety of unmanned behavior decision are thereby ensured.
Drawings
Fig. 1 is a diagram of expert teaching lane access in the present invention.
FIG. 2 is a graph of the recovery results of the present invention.
FIG. 3 is a recovery-fit graph of a first portion of the experimental data.
FIG. 4 is a recovery-fit graph of a second portion of the experimental data.
Fig. 5 is a recovery-fit graph of a third portion of the experimental data.
Fig. 6 is a recovery-fit graph of a further portion of the experimental data.
Detailed Description
The present invention is further illustrated below with reference to the figures and embodiments; the embodiments of the invention include, but are not limited to, the following examples.
In the whole algorithm framework, the decision tree algorithm sits in the middle, carrying the state transition rule upward and, connecting downward, strengthening or correcting that rule. For the teaching rule of a human driver, the method defines two matrices, the state transition frequency and the state transition probability, for its description. The state transition frequency is the number of times each subsequent state is accessed from the current state, and the state transition probability is the transition probability value calculated from these counts. When the transition probability outputs the action to be performed, the decision tree algorithm checks and evaluates the rationality and safety of the current action. After the decision tree evaluation, the algorithm corrects the current state transition frequency matrix, increasing the frequency of reasonable actions and reducing the frequency of unreasonable actions. The corrected state transition frequency matrix is then used to recompute the corresponding transition probabilities, so that the cycle of reinforcement continues. The specific process is as follows:
the decision tree behavior decision algorithm based on teaching learning comprises the following steps:
(a) describing a teaching rule in teaching learning by using a state transition frequency matrix and a state transition probability matrix of the behavior, and storing the state transition rule of a teaching track;
firstly, the length of the predicted road surface is rasterized; a state transition table is designed to record the conversion relation; the table is filled, in matrix form, with frequencies, each entry being the number of times the current state transitions to a given subsequent state in the teaching data, and the state transition probability is obtained by passing the access frequencies of the n possible subsequent states of the current state through a softmax function;
(b) obtaining a state transition frequency matrix and a state transition probability matrix according to the step (a);
the state transition frequency is the number of times of the state to be accessed in the current state, and the state transition probability is the transition probability value obtained by calculating the number of times; discretizing and sampling the state transition track of the teaching learning to construct a state transition frequency matrix, wherein the state transition probability is obtained by calculating the access frequency of the subsequent n possible states of the current state through a softmax function.
(c) Constructing a reward according to the state transition frequency;
comparing the state action to be performed with the expected state action; if the result meets the expectation, the reward is added, otherwise, the negative reward punishment is carried out; if the behavior which is closer to the expected action than the selected action appears in other unselected actions in the current state, carrying out reward point adding; finally, fitting the discrete state points to obtain a planning curve; wherein the variation expression of the reward is designed as follows:
Figure BDA0001377039320000071
the above equation indicates that when the action is as desired, Δ r may be set to + 1; conversely, when the action is not desired, Δ r may be set to-1, where auIs the desired action, and a is the action to be performed.
(d) When the state transition probability matrix outputs the action to be performed, the decision tree evaluates the action generated by the state transition probability matrix of step (b); if the evaluation passes, the state transition is executed, and if the evaluation does not pass, step (e) is executed;
the decision tree judges the reasonability and safety of action transfer through two aspects; if all the evaluation is satisfied, the evaluation is passed, otherwise, the evaluation is not passed;
firstly, judging the reasonability of state transition to confirm that the vehicle can realize the transition under the condition of self physical condition limitation; the evaluation procedure is si→sj,||i-j||=1;
In the above formula siRepresents the ith state; the formula shows that the vehicle can select a transition state in the adjacent state of the current state every time the vehicle moves;
secondly, after the track points are fitted, expansion is carried out, and no other obstacles exist in the track travelable area:
|x_si - x_obstacle| > x_width  or  |y_si - y_obstacle| > y_length

where x_si, y_si are the horizontal and vertical coordinates of state s_i relative to the vehicle, x_obstacle, y_obstacle are the horizontal and vertical coordinates of an obstacle in the adjacent area, and x_width, y_length are 1/2 of the width and length of the vehicle, respectively.
(e) Updating the state transition frequency matrix and the state transition probability matrix through an Actor-Critic algorithm according to steps (b) and (c);
the strengthening method comprises the following steps:
t=rt+γV(st+1)-V(st),p(st,at):=p(st,at)+βt
wherein r istAwarding immediately; v(s)t) Is the predicted jackpot for the current state, V(s)t+1) Is the jackpot after prediction from the next state, β is the update degree, γ is the reward confidence after the current prediction, p(s)t,at) Is in a state stPerforming action atThe formula (2) is updated based on transition probabilities obtained by teaching learned transition frequencies.
(f) Repeating steps (d) and (e) until the evaluation is passed; an end-to-end sketch of this loop is given below.
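As an illustration only, the sketch below ties steps (b) through (f) together under the same assumptions as the earlier sketches: candidate successor states stand in for actions, the value table V and the update magnitudes are illustrative, and an iteration cap replaces the unspecified stopping behaviour.

```python
import numpy as np

def softmax(f):
    e = np.exp(f - np.max(f))
    return e / e.sum()

def decide(state, freq, p, V, candidates, evaluate, reward_of,
           beta=0.1, gamma=0.9, max_iters=100):
    """Propose the most probable successor, check it with the decision tree, and on failure
    reinforce the frequency and probability matrices, repeating until a proposal passes."""
    for _ in range(max_iters):
        nxt = candidates[int(np.argmax(p[state, candidates]))]   # proposal from the probability matrix
        if evaluate(state, nxt):                                 # decision-tree evaluation (step d)
            return nxt
        r = reward_of(state, nxt)                                # constructed reward (step c)
        delta = r + gamma * V[nxt] - V[state]                    # TD error of the Actor-Critic step (e)
        freq[state, nxt] = max(freq[state, nxt] + beta * delta, 0.0)
        p[state, candidates] = softmax(freq[state, candidates])  # refreshed transition probabilities (step b)
    raise RuntimeError("no candidate passed the decision-tree evaluation")
```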
The strategy updating part of the invention adopts the Actor-Critic algorithm. Actor-Critic is a model-free algorithm that can be used both when no model is available and when a model is present. Model-free solution algorithms are a major breakthrough in Markov decision methods: they bring mathematical tools of strong theoretical power to application scenarios that better match actual, concrete problems. Within the category of model-based algorithms, a common property is that the solution strategy must rely on an existing prior transition model and reward structure. In contrast, a model-free solution strategy requires neither. In general, it is very difficult to model real-life problems ideally; the complete Markov process model underlying real life is hidden and hard to state explicitly. From this point of view, model-free algorithms, by relaxing the constraints that the original theory places on the problem model, are better suited to solving specific problems. In a model-free algorithm, the agent can either sample, through interaction with the environment, the transition probabilities and other relevant variables defined by the model, acquire prior knowledge of the environment from a statistical viewpoint, and estimate the required reward function; or the agent can use the model-free algorithm to approximate the optimization objective by solving the reward function in a fuzzy manner. This is a choice between two directions. In the first approach, after the transition probabilities and the reward model have been learned through interaction with the environment, the optimal strategy can be obtained with a model-based solution method. The second, direct approximation approach solves for the optimal strategy by fuzzy means and places no requirement on the form of the model. Among the many solution methods there are also algorithms that combine both: an approximate model is used to accelerate reward learning while the reward function is estimated, and the two are updated iteratively. It should be noted that, among these model-free algorithms, the second, direct approximation approach receives the most attention and has the widest range of application. The simulation design and experimental design of the specific experiments are given below.
simulation design
In this simulation, the states are discretized into 27 states, and the size of the final state transition matrix is determined accordingly. The transition probability matrix is calculated from the access frequencies of the 5 possible next states of the current state.
The decision tree framework here checks the feasibility of the transition state. In this simulation, the decision tree performs the following detections:
1. Detect the lane number of the state jump. If the difference between the lane numbers is greater than 2, the vehicle would jump directly from the current leftmost lane to the rightmost lane. The algorithm sets the access frequency of that subsequent state to 0 and continues to select the transferable state with the highest probability from the updated frequency matrix.
2. After the lane number detection has passed, the algorithm detects whether the left or right turn would hit other obstacles. If an obstacle would be hit, the access frequency of that subsequent state is halved, and the algorithm continues to select the transferable state with the highest probability from the updated frequency matrix (see the sketch after this list).
3. After the above detections, the algorithm performs the state transition.
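A hypothetical sketch of these two frequency corrections follows; the lane numbering, the collision predicate and the state encoding are assumptions used only for illustration:

```python
import numpy as np

def apply_simulation_checks(freq, current, candidate, lane_of, hits_obstacle):
    """Correct the access frequency of a candidate successor: zero it for an illegal lane jump,
    halve it if the left/right turn would hit an obstacle."""
    if abs(lane_of(candidate) - lane_of(current)) > 2:       # e.g. leftmost lane straight to rightmost
        freq[current, candidate] = 0.0
    elif hits_obstacle(current, candidate):
        freq[current, candidate] *= 0.5
    return freq

def select_next_state(freq, current, candidates):
    """After the corrections, pick the transferable candidate with the highest frequency."""
    return candidates[int(np.argmax(freq[current, candidates]))]
```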
FIGS. 1 and 2 show the simulation results.
Design of experiments
In this experiment, the teaching data is obtained by sampling the vehicle's travel trajectory. While the driver drives, the vehicle generally travels in the right lane. When meeting an obstacle, the vehicle changes lanes to avoid it at a certain distance from the obstacle. There are 5 sampling states between the vehicle and the obstacle at the moment of the lane change. For this sampling process, the detection tree and the reinforcement process are as follows:
A transfer matrix is constructed using the discretized sampling states, and the transfer matrix is filled according to the sampled data;
1. The transition probability matrix is calculated from the obtained transition frequency matrix;
2. The rationality of the state jump is checked by the detection tree: the vehicle state is not allowed to jump directly from the right lane to the left lane or from the left lane to the right lane.
3. The detection tree also checks whether a left or right turn from the state would touch an obstacle;
4. For a state jump, the distance between the current state and the obstacle is calculated. If this distance is larger, by one state distance, than the distance at which the vehicle deflected its track away from the obstacle in the teaching data, the access frequency of the adjacent subsequent state in the other lane is increased by 1.
5. The frequency matrix is updated, and the state with the highest probability is selected for the transition.
6. Interpolation fitting is carried out on the discrete states, as sketched below.
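For step 6, a minimal sketch of fitting the discrete states into a smooth planning curve is given below; the choice of a cubic polynomial fit and the sample coordinates are assumptions, since the patent only states that the discrete states are interpolated and fitted:

```python
import numpy as np

def fit_planning_curve(xs, ys, degree=3, samples=100):
    """Fit the discrete state points (xs, ys) with a polynomial and resample it densely."""
    coeffs = np.polyfit(xs, ys, degree)
    dense_x = np.linspace(min(xs), max(xs), samples)
    return dense_x, np.polyval(coeffs, dense_x)

# hypothetical usage: five sampled states of a lane change in front of the obstacle
x_pts = [0.0, 5.0, 10.0, 15.0, 20.0]
y_pts = [0.0, 0.2, 1.0, 1.8, 2.0]
curve_x, curve_y = fit_planning_curve(x_pts, y_pts)
```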
Fig. 3 to 6 are experimental results.
The invention is well implemented in accordance with the above-described embodiments. It should be noted that, based on the above design, even if insubstantial modifications or embellishments are made to the present invention in order to solve the same technical problems, the adopted technical solution remains in essence the same as that of the present invention and therefore falls within its protection scope.

Claims (6)

1. A decision tree behavior decision algorithm based on teaching learning is characterized by comprising the following steps:
(a) describing a teaching rule in teaching learning by using a state transition frequency matrix and a state transition probability matrix of the behavior, and storing the state transition rule of a teaching track;
(b) obtaining a state transition frequency matrix and a state transition probability matrix according to the step (a);
(c) constructing a reward according to the state transition frequency;
(d) when the state transition probability matrix outputs the action to be performed, the decision tree evaluates the action generated by the state transition probability matrix according to step (b); if the evaluation passes, the state transition is executed, and if the evaluation does not pass, step (e) is executed;
(e) updating the state transition frequency matrix and the state transition probability matrix through an Actor-Critic algorithm according to steps (b) and (c);
(f) repeating steps (d) and (e) until the evaluation is passed.
2. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (a) is as follows: firstly, the length of the predicted road surface is rasterized; a state transition table is designed to record the conversion relation; and the state transition table is filled, in matrix form, with frequencies, each entry being the number of times the current state transitions to a given subsequent state in the teaching data, and the state transition probability is obtained by passing the access frequencies of the n possible subsequent states of the current state through a softmax function.
3. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (b) is as follows: the state transition frequency is the number of times each subsequent state is accessed from the current state, and the state transition probability is the transition probability value calculated from these counts; the state transition trajectory of the teaching learning is discretized and sampled to construct the state transition frequency matrix, and the state transition probability is obtained by passing the access frequencies of the n possible subsequent states of the current state through a softmax function.
4. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (c) is as follows: the state action to be performed is compared with the expected state action; if the result meets the expectation, a reward is added, otherwise a negative reward (penalty) is applied; if, among the other unselected actions in the current state, an action appears that is closer to the expected action than the selected action, extra reward points are added; finally, the discrete state points are fitted to obtain a planning curve; the variation of the reward is designed as follows:
Δr = +1, if a = a_u;  Δr = -1, otherwise

The above expression indicates that when the action meets expectations, Δr is set to +1; conversely, when the action does not meet expectations, Δr is set to -1, where a_u is the expected action and a is the action to be performed; Δr denotes the reward value after action a_u is executed.
5. The decision tree behavior decision algorithm based on teaching learning of claim 1, wherein the specific process of step (d) is as follows: the decision tree judges the reasonability and safety of action transfer through two aspects; if all the evaluation is satisfied, the evaluation is passed, otherwise, the evaluation is not passed;
firstly, judging the reasonability of the state transition to confirm that the vehicle can realize the transition under the limitation of its own physical conditions; the evaluation condition is s_i → s_j, ||i - j|| = 1;
in the above formula s_i represents the i-th state; this formula indicates that each time the vehicle moves, the transition state is selected from the neighbourhood of the current state, where s_i and s_j respectively represent the states before and after a certain action is executed, ||i - j|| = 1 is a constraint condition on the states, and the values of i and j are both natural numbers;
secondly, after the track points are fitted, expansion is carried out, and no other obstacles exist in the track travelable area:
|x_si - x_obstacle| > x_width  or  |y_si - y_obstacle| > y_length

where x_si, y_si are the longitudinal and transverse coordinates of state s_i with respect to the vehicle, x_obstacle, y_obstacle are the horizontal and vertical coordinates of an obstacle in the adjacent area, and x_width, y_length are 1/2 of the width and length of the vehicle, respectively.
6. The teaching-learning-based decision tree behavior decision algorithm according to claim 1, wherein the specific process of step (e) is: the reinforcement method is as follows:

δ_t = r_t + γV(s_{t+1}) - V(s_t),   p(s_t, a_t) := p(s_t, a_t) + β·δ_t

where V(s_t) is the predicted cumulative reward of the current state, V(s_{t+1}) is the predicted cumulative reward from the next state, β is the update rate, γ is the confidence placed on the predicted reward, and p(s_t, a_t) is the probability of performing action a_t in state s_t; the formula is updated on the basis of the transition probabilities obtained from the transition frequencies learned through teaching; δ_t is the TD error from state s_t to s_{t+1}, and r_t is the immediate reward from state s_t to s_{t+1}.
CN201710687194.0A 2017-08-11 2017-08-11 Decision tree behavior decision algorithm based on teaching learning Expired - Fee Related CN107479547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710687194.0A CN107479547B (en) 2017-08-11 2017-08-11 Decision tree behavior decision algorithm based on teaching learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710687194.0A CN107479547B (en) 2017-08-11 2017-08-11 Decision tree behavior decision algorithm based on teaching learning

Publications (2)

Publication Number Publication Date
CN107479547A CN107479547A (en) 2017-12-15
CN107479547B true CN107479547B (en) 2020-11-24

Family

ID=60600126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710687194.0A Expired - Fee Related CN107479547B (en) 2017-08-11 2017-08-11 Decision tree behavior decision algorithm based on teaching learning

Country Status (1)

Country Link
CN (1) CN107479547B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229730B (en) * 2017-12-19 2021-07-20 同济大学 Unmanned vehicle track generation method based on fuzzy reward
CN108363393B (en) * 2018-02-05 2019-09-27 腾讯科技(深圳)有限公司 A kind of smart motion equipment and its air navigation aid and storage medium
CN108446727B (en) * 2018-03-09 2021-09-21 上海安亭地平线智能交通技术有限公司 Driving behavior decision method and system and electronic equipment
CN110738221B (en) * 2018-07-18 2024-04-26 华为技术有限公司 Computing system and method
CN110084539B (en) * 2018-11-30 2021-10-22 武汉大学 Irrigation decision learning method, device, server and storage medium
CN109461342B (en) * 2018-12-19 2023-06-27 畅加风行(苏州)智能科技有限公司 Teaching system for unmanned motor vehicle and teaching method thereof
CN110568848B (en) * 2019-09-10 2022-09-23 东风商用车有限公司 Teaching automatic driving operation system of sweeper
JP7211375B2 (en) * 2020-01-09 2023-01-24 トヨタ自動車株式会社 vehicle controller
CN112141098B (en) * 2020-09-30 2022-01-25 上海汽车集团股份有限公司 Obstacle avoidance decision method and device for intelligent driving automobile
US11577732B2 (en) * 2020-10-28 2023-02-14 Argo AI, LLC Methods and systems for tracking a mover's lane over time

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6162905A (en) * 1984-09-04 1986-03-31 Komatsu Ltd Automatic operating method of unmanned vehicle
JPH01106113A (en) * 1987-10-19 1989-04-24 Toshiba Corp Cleaning robot device
JPH08101712A (en) * 1994-09-30 1996-04-16 Mitsubishi Heavy Ind Ltd On-line teaching device for unmanned carriage passing path
CN102929281A (en) * 2012-11-05 2013-02-13 西南科技大学 Robot k-nearest-neighbor (kNN) path planning method under incomplete perception environment
CN103792846A (en) * 2014-02-18 2014-05-14 北京工业大学 Robot obstacle avoidance guiding method based on Skinner operating condition reflection principle
CN105094124A (en) * 2014-05-21 2015-11-25 防灾科技学院 Method and model for performing independent path exploration based on operant conditioning
CN104570738A (en) * 2014-12-30 2015-04-29 北京工业大学 Robot track tracing method based on Skinner operant conditioning automata
CN105487537A (en) * 2015-11-06 2016-04-13 福州华鹰重工机械有限公司 Vehicle motion planning method and unmanned vehicle
CN105700526A (en) * 2016-01-13 2016-06-22 华北理工大学 On-line sequence limit learning machine method possessing autonomous learning capability
CN106970615A (en) * 2017-03-21 2017-07-21 西北工业大学 A kind of real-time online paths planning method of deeply study

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Adaptive output feedback control for uncertain nonholonomic chained systems; YUAN Zhan-ping et al.; 中南大学学报(英文版); 2010-12-31; No. 3; pp. 572-579 *
IMITATION LEARNING OF CAR DRIVING SKILLS WITH DECISION TREES AND RANDOM FORESTS; PAWEŁ CICHOSZ et al.; International Journal of Applied Mathematics & Computer Science; 2014-09-30; Vol. 24, No. 3; pp. 579-597 *
Imitation Learning: A Survey of Learning Methods; Ahmed Hussein et al.; ACM Computing Surveys; 2017-04-30; Vol. 50, No. 2; pp. 21:1-21:35 *
Self-reproduction for articulated behaviors with dual humanoid robots using on-line decision tree classification; Jane Brooks Zurn et al.; Robotica; 2011-06-24; Vol. 30; pp. 315-332 *

Also Published As

Publication number Publication date
CN107479547A (en) 2017-12-15

Similar Documents

Publication Publication Date Title
CN107479547B (en) Decision tree behavior decision algorithm based on teaching learning
CN110834644B (en) Vehicle control method and device, vehicle to be controlled and storage medium
CN111780777B (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN111098852B (en) Parking path planning method based on reinforcement learning
CN112356830B (en) Intelligent parking method based on model reinforcement learning
WO2021103834A1 (en) Method for generating lane changing decision model, lane changing decision method for driverless vehicle, and device
Zhao et al. A novel direct trajectory planning approach based on generative adversarial networks and rapidly-exploring random tree
Wang et al. A survey of learning‐based robot motion planning
CN108229730B (en) Unmanned vehicle track generation method based on fuzzy reward
Micheli et al. NMPC trajectory planner for urban autonomous driving
CN111781922A (en) Multi-robot collaborative navigation method based on deep reinforcement learning and suitable for complex dynamic scene
Tian et al. Personalized lane change planning and control by imitation learning from drivers
Lodhi et al. Autonomous vehicular overtaking maneuver: A survey and taxonomy
Yu et al. Hierarchical reinforcement learning combined with motion primitives for automated overtaking
Moghadam et al. A deep reinforcement learning approach for long-term short-term planning on frenet frame
Tang et al. Actively learning Gaussian process dynamical systems through global and local explorations
CN114912693A (en) Multi-mode prediction-based automatic driving automobile motion planning method
Ferreira et al. Full neural predictors, with fixed time horizon, for a Truck-Trailer-Trailer prototype of a multi-articulated robot, in backward movements-singular conditions and critical angles
Yang et al. Deep Reinforcement Learning Lane-Changing Decision Algorithm for Intelligent Vehicles Combining LSTM Trajectory Prediction
Wang et al. An Enabling Decision-Making Scheme by Considering Trajectory Prediction and Motion Uncertainty
Zeng et al. Risk-aware deep reinforcement learning for decision-making and planning of autonomous vehicles
Fan et al. A hierarchical control strategy for reliable lane changes considering optimal path and lane‐changing time point
Zhang et al. Maximum entropy inverse reinforcement learning-based trajectory planning for autonomous driving
Khalajzadeh et al. A review on applicability of expert system in designing and control of autonomous cars
CN117789502A (en) System and method for distributed awareness target prediction for modular autonomous vehicle control

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201124

CF01 Termination of patent right due to non-payment of annual fee