CN108791302B - Driver behavior modeling system - Google Patents

Driver behavior modeling system

Info

Publication number
CN108791302B
CN108791302B (application CN201810662040.0A)
Authority
CN
China
Prior art keywords
driving
state
neural network
strategy
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810662040.0A
Other languages
Chinese (zh)
Other versions
CN108791302A (en)
Inventor
邹启杰
李昊宇
裴腾达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University
Original Assignee
Dalian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University
Priority to CN201810662040.0A
Publication of CN108791302A
Application granted
Publication of CN108791302B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0019 Control system elements or transfer functions
    • B60W2050/0028 Mathematical models, e.g. for simulation
    • B60W2050/0029 Mathematical model of the driver

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Mechanical Engineering (AREA)
  • Evolutionary Computation (AREA)
  • Transportation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a driver behavior modeling system comprising a feature extractor, a reward function generator, a driving strategy acquirer and a judger. The feature extractor extracts and constructs the features of the reward function; the reward function generator obtains the reward function required to construct the driving strategy; the driving strategy acquirer completes the construction of the driving strategy; and the judger checks whether the optimal driving strategy constructed by the acquirer meets the judgment criterion. If it does not, the reward function is reconstructed and the optimal driving strategy is constructed again, iterating until the criterion is met; finally, a driving strategy describing the real driving demonstration is obtained. The system can be applied to new, previously unseen scene states to obtain the corresponding actions, so the generalization capability of the resulting driver behavior model is greatly improved, the applicable scenes are wider, and the robustness is stronger.

Description

Driver behavior modeling system
Technical Field
The invention relates to a modeling method, in particular to a driver behavior modeling system.
Background
Autonomous driving is an important part of the intelligent transportation field. Given the current state of the technology, autonomous vehicles still require an intelligent driving system (intelligent driver-assistance system) and a human driver to cooperate in completing the driving task. In this process, driver modeling is an essential step, whether the goal is to better quantify driver information for the decision making of the intelligent system or to distinguish between drivers so as to provide personalized services.
Among current approaches to driver modeling, reinforcement learning handles well the complex sequential decision problems that arise when a driver operates a vehicle, with their large continuous state spaces and multiple optimization objectives, and is therefore an effective method for modeling driver behavior. As an MDP-based solution method, reinforcement learning requires interaction with the environment: actions are taken to obtain an evaluative feedback signal from the environment, i.e. a reward, and the long-term reward is maximized.
A survey of the existing literature shows that reward functions in driver behavior modeling are currently set in two main ways: the conventional approach, in which researchers set them manually for different scene states, and approaches based on inverse reinforcement learning. The conventional approach is highly subjective, and the quality of the reward function depends on the ability and experience of the researcher. Moreover, during driving a large number of decision variables must be balanced to set the reward function correctly; these variables are often incommensurable or even contradictory, and researchers frequently cannot design a reward function that balances all of the requirements.
Inverse reinforcement learning assigns appropriate weights to the various driving features with the help of driving demonstration data and can automatically learn the required reward function, overcoming the drawbacks of manual design. However, traditional inverse reinforcement learning can only learn from the scene states that already appear in the demonstration data, whereas in real driving the actual scenes often fall outside the demonstration range because of factors such as weather and scenery. Inverse reinforcement learning therefore only fits the relationship between scenes and decision actions present in the demonstration data and shows insufficient generalization capability.
Existing driver behavior modeling methods based on reinforcement learning follow two main ideas. In the first, a traditional reinforcement learning method is used and the reward function is set by hand: the researcher analyzes, organizes, screens and summarizes the scenes to obtain a series of features related to driving decisions, such as the distance to the vehicle ahead, whether the vehicle is near the curb, whether pedestrians are present, whether the speed is reasonable, and the lane-change frequency; a series of experiments is then designed according to the requirements of the driving scene to obtain the weight of each feature in the reward function for the corresponding environment, and the overall reward function is finally assembled as the model describing the driver's behavior. In the second, a probabilistic modeling approach, maximum entropy inverse reinforcement learning is used to solve for the driving behavior feature function. It is first assumed that some specific underlying probability distribution generates the demonstration trajectories; a distribution that fits the driving demonstration must then be found, and finding it can be cast as the nonlinear program
max_p −∑_τ p(τ) log p(τ)
s.t. ∑_τ p(τ) f(τ) = f̃_E (matching the demonstration feature expectation)
∑_τ p(τ) = 1
where p is the probability distribution over the demonstration trajectories. Solving this program yields the distribution and its parameters θ, i.e. the reward function r = θ^T f(s_t).
Traditional driver behavior models analyze, describe and reason about driving behavior from known driving data. The collected data, however, cannot cover the essentially unlimited variety of driving behavior, and the corresponding action cannot be obtained for every state. In a real driving scene the weather, scenery and surrounding objects differ, so the driving state has countless possibilities and it is impossible to traverse all states. Traditional driver behavior models therefore generalize poorly, rely on many assumptions, and have poor robustness.
Furthermore, in real driving problems, setting the reward function purely by hand requires balancing too many requirements across the various features; it depends entirely on the researcher's experience, requires repeated manual adjustment, is time-consuming and labor-intensive, and, most critically, is excessively subjective. Across different scenes and environments the researcher faces too many scene states, and even for a single scene state the relevant driving behavior features change as the requirements change; to describe the driving task accurately, a whole series of weights must be assigned to these factors. Among existing methods, inverse reinforcement learning based on a probabilistic model starts from the available demonstration data, seeks the distribution of that data, and derives the action selection for each state from that distribution. However, the distribution of the known data does not represent the distribution of all data; to obtain the distribution correctly, all states would have to be observed.
Disclosure of Invention
To solve the problem of weak generalization in driver modeling, namely the technical problem in the prior art that no corresponding reward function can be established for driver behavior modeling when a driving scene does not appear in the demonstration data, the present application provides a driver behavior modeling system that can be applied to new scene states to obtain the corresponding actions, so that the generalization capability of the resulting driver behavior model is greatly improved, the applicable scenes are wider, and the robustness is stronger.
To achieve this purpose, the technical scheme of the invention is as follows. The driver behavior modeling system specifically comprises:
a feature extractor, which extracts and constructs the features of the reward function;
a reward function generator, which obtains the reward function required to construct the driving strategy;
a driving strategy acquirer, which completes the construction of the driving strategy; and
a judger, which checks whether the optimal driving strategy constructed by the acquirer meets the judgment criterion; if it does not, the reward function is reconstructed and the optimal driving strategy is constructed again, iterating until the criterion is met; finally, a driving strategy describing the real driving demonstration is obtained.
Further, the feature extractor extracts and constructs the reward function features as follows:
S11, during driving, the video from a camera mounted behind the windshield of the vehicle is sampled to obtain N groups of pictures of road conditions in different driving environments; together with the corresponding driving operation data, namely the steering angles in those road environments, these pictures jointly constitute the training data;
S12, the collected pictures are translated, cropped and brightness-adjusted to simulate scenes with different illumination and weather;
S13, a convolutional neural network is constructed, the processed pictures are taken as input and the operation data of the corresponding pictures as label values, and the network is trained; the mean-squared-error loss is minimized with a Nadam-based optimizer to optimize the weight parameters of the neural network;
S14, the network structure and weights of the trained convolutional neural network are saved and used to establish a new convolutional neural network, completing the state feature extractor.
Further, the convolutional neural network established in step S13 comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers; the input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and finally to the first, second, third and fourth fully connected layers in sequence.
Further, the trained convolutional neural network saved in step S14 does not include the output layer.
Further, the reward function generator obtains the reward function used to construct the driving strategy as follows:
S21, driving demonstration data of an expert are acquired: the driving demonstration data are obtained by sampling the demonstration driving video, a continuous segment of driving video being sampled at a certain frequency to give one trajectory demonstration; one set of expert demonstration data contains several trajectories and is denoted as a whole by
D_E = {(s_1, a_1), (s_2, a_2), ..., (s_M, a_M)}, with M = ∑_{i=1}^{N_T} L_i,
where D_E is the driving demonstration data as a whole, (s_j, a_j) is the data pair formed by a state j and the decision command corresponding to that state, M is the total number of driving demonstration data pairs, N_T is the number of driving demonstration trajectories, and L_i is the number of state-decision pairs (s_j, a_j) contained in the i-th driving demonstration trajectory;
S22, the feature expectation of the driving demonstration is obtained;
first, each state s_t describing a driving environment condition in the driving demonstration data D_E is input into the state feature extractor to obtain the corresponding feature vector f(s_t, a_t), i.e. the set of features describing s_t; the feature expectation of the driving demonstration is then calculated as
μ_E = (1/N_T) ∑_{i=1}^{N_T} ∑_t γ^t f(s_t, a_t),
where γ is a discount factor set according to the problem at hand;
S23, the state-action set under the greedy strategy is obtained;
S24, the weights of the reward function are obtained.
Further, the specific steps for obtaining the state-action set under the greedy strategy are as follows. The reward function generator and the driving strategy acquirer are two parts of one loop. First, the neural network in the driving strategy acquirer is obtained; the state features f(s_t, a_t) describing the environmental conditions, extracted from the driving demonstration data D_E, are input into this neural network to obtain the output g_w(s_t). g_w(s_t) is the set of Q values for the described state s_t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T, where the state-action value Q(s_t, a_i) describes how good it is to select the decision driving action a_i in the current driving state s_t and is obtained from the formula Q(s, a) = θ · μ(s, a), in which θ is the weight of the current reward function and μ(s, a) is the feature expectation value.
Then, based on an ε-greedy strategy, the driving decision action a_t corresponding to the described driving scene state s_t is selected: with probability ε, the decision action a_t = argmax_{a_i} Q(s_t, a_i) that maximizes the Q value in the Q-value set of the current scene s_t is selected; otherwise a_t is selected at random. After a_t has been selected, the value Q(s_t, a_t) is recorded. In this way, by inputting the state feature f(s_t, a_t) of every state in the driving demonstration D_E into the neural network, M state-action pairs (s_t, a_t) are obtained, each describing the driving decision action a_t selected in the driving scene state s_t at time t; at the same time, based on the actions selected, the Q values of the M corresponding state-action pairs are recorded and denoted Q.
Further, the specific steps for obtaining the weights of the reward function are as follows.
First, an objective function is constructed of the form
J(θ) = (1/M) ∑_{t=1}^{M} [ Q(s_t, a_t) + L(s_t, a_t) ] − θ^T μ_E + (λ/2)‖θ‖²,
where L(s_t, a_t) is a loss term that is 0 if the current state-action pair exists in the driving demonstration and 1 otherwise; Q(s_t, a_t) are the corresponding state-action values recorded above; θ^T μ_E is the product of the driving demonstration feature expectation obtained in step S22 and the reward function weight θ; and (λ/2)‖θ‖² is a regular term.
The objective function is then minimized by gradient descent, i.e. t = min_θ J(θ), giving the variable θ that minimizes the objective; this θ is the weight of the required reward function.
Further, the process by which the reward function generator obtains the reward function further comprises: S25, based on the obtained reward function weight θ, the reward function generator is constructed according to the formula r(s, a) = θ^T f(s, a).
As a further step, the driving strategy acquirer completes the construction of the driving strategy as follows:
S31, constructing the training data of the driving strategy acquirer
Training data are acquired, each item comprising two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor; the other is a target value ŷ_t obtained from a formula involving r_θ(s_t, a_t), Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}), where r_θ(s_t, a_t) is the reward generated by the reward function generator from the driving demonstration data, and Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}) are selected from the Q values recorded in S23 as the values describing the driving scene s_t at time t and the driving scene s_{t+1} at time t+1, respectively;
s32, establishing a neural network
The neural network comprises three layers, wherein the first layer is used as an input layer, the number of neurons and the output feature types of the feature extractor are the same and are k, and the neural network is used for inputting the features f(s) of the driving scenet,at) The number of the hidden layers of the second layer is 10, and the third layerThe number of neurons in the layer is the same as the number n of driving actions for decision making in the action space; the activation functions of the input layer and the hidden layer are sigmoid functions, i.e.
Figure BDA0001706988370000058
Namely, the method comprises the following steps:
z=w(1)x=w(1)[1,ft]T
h=sigmoid(z)
gw(st)=sigmoid(w(2)[1,h]T)
wherein w(1)The weight value of the hidden layer; f. oftFor the state s of the driving scene at time ttI.e. the input of the neural network; z is the network layer output when the hidden layer sigmoid activation function is not passed; h is hidden layer output after the sigmoid activation function; w is a(2)Is the weight of the output layer;
g of network outputw(st) Is the driving scene state s at time ttQ set of (1), i.e., [ Q(s) ]t,a1),...,Q(st,an)]TQ in S31π(st,at) That is, the state stInput neural network, selecting a in the outputtThe term is obtained;
s33, optimizing the neural network
For the optimization of the neural network, the established loss function is a cross entropy cost function, and the formula is as follows:
Figure BDA0001706988370000061
wherein N represents the number of training data; qπ(st,at) Will describe the driving scene state s at time ttInputting the neural network, selecting the corresponding driving decision action a in the outputtThe value obtained by the term;
Figure BDA0001706988370000062
the numerical value obtained in S31;
Figure BDA0001706988370000063
is a regular term where W ═ W(1),w(2)The weight in the neural network is represented by the symbol;
inputting the training data obtained in the S31 into the neural network optimization cost function; and (4) finishing the minimization of the cross entropy cost function by means of a gradient descent method to obtain an optimized neural network, and further obtaining a driving strategy acquirer.
As a further step, the judger is implemented as follows:
the current reward function generator and driving strategy acquirer are considered as a whole, and the current value of t (obtained in S24) is checked against t < ε, where ε is a threshold for judging whether the objective function, i.e. the reward function currently used to obtain the driving strategy, meets the requirement; its value is set according to the specific needs;
when the value of t does not satisfy this inequality, the reward function generator has to be reconstructed, and the neural network used in S23 is replaced by the new neural network optimized in S33, i.e. the network used to produce the values Q(s_t, a_i) describing how good the selected decision driving action a_i is in the driving scene state s_t is replaced by the new network structure optimized by gradient descent in S33; the reward function generator is then reconstructed, the driving strategy acquirer is obtained again, and the value of t is checked again;
when the inequality is satisfied, the current θ is the weight of the required reward function, the reward function generator meets the requirement, and the driving strategy acquirer also meets the requirement. Then the driving data of the driver for whom a model is to be established are collected, namely the environmental scene images and the corresponding operation data during driving; the driving environment scene images are input into the driving environment feature extractor to obtain the decision features of the current scene; the extracted features are input into the reward function generator to obtain the reward function corresponding to the scene state; and the collected decision features together with the computed reward function are input into the driving strategy acquirer to obtain the driving strategy corresponding to that driver.
Compared with the prior art, the invention has the following beneficial effects. In this method of describing driver decisions and establishing a driver behavior model, the strategy is represented by a neural network; once the network parameters are determined, states are mapped to actions, so the possible state-action pairs are not limited to those in the demonstration trajectories. In real driving, the state space produced by weather, scenery and other factors is very large, but thanks to the ability of a neural network to approximate arbitrary functions, the strategy representation can be treated as a black box: the feature values of a state are input, the corresponding state-action values are output, and an action is selected according to those outputs. The applicability of driver behavior modeling based on inverse reinforcement learning is thereby greatly enhanced. Traditional methods try to fit the demonstration trajectories with some probability distribution, so the resulting optimal strategy remains limited to the states that already exist in the demonstration trajectories, whereas the present method can be applied to new scene states and obtain the corresponding actions, so the generalization capability of the established driver behavior model is greatly improved, the applicable scenes are wider, and the robustness is stronger.
Drawings
FIG. 1 is a new deep convolutional neural network;
FIG. 2 is a driving video sampling diagram;
FIG. 3 is a block diagram of the system workflow;
FIG. 4 is a diagram illustrating the neural network structure established in step S32.
Detailed Description
The invention will be further explained with reference to the drawings attached to the specification. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The present embodiment provides a driver behavior modeling system including:
1. the feature extractor extracts and constructs return function features, and the specific mode is as follows:
s11, sampling a driving video obtained by a camera placed behind a windshield of a vehicle in the driving process of the vehicle, wherein a sampling graph is shown in figure 2.
N groups of pictures of road conditions of different vehicle driving road environments and corresponding steering angle conditions are obtained. The training data are jointly constructed by corresponding driving operation data, wherein the training data comprise N1 straight roads and N2 curved roads, the values of N1 and N2 can be N1> -300 and N2> -3000.
S12, the collected pictures are translated, cropped, brightness-adjusted and similarly processed to simulate scenes with different illumination and weather.
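As an illustration of the augmentation step just described, the following minimal Python sketch applies random translation, cropping and brightness changes to a picture, assuming each picture is stored as an H×W×3 numpy array; the shift ranges and brightness factors are illustrative assumptions, not values specified in this embodiment.

```python
import numpy as np

def augment(img, rng):
    """Randomly translate, crop and re-brighten one image (H x W x 3, uint8)."""
    h, w, _ = img.shape
    # horizontal/vertical translation by up to ~5% of the image size
    dx = rng.integers(-w // 20, w // 20 + 1)
    dy = rng.integers(-h // 20, h // 20 + 1)
    shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    # random crop back to a fixed region to simulate small viewpoint changes
    top, left = rng.integers(0, h // 10 + 1), rng.integers(0, w // 10 + 1)
    cropped = shifted[top:top + int(0.9 * h), left:left + int(0.9 * w)]
    # brightness change to mimic different illumination / weather
    factor = rng.uniform(0.6, 1.4)
    return np.clip(cropped.astype(np.float32) * factor, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)
augmented = augment(sample, rng)
```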
S13, a convolutional neural network is constructed, the processed pictures are taken as input and the operation data of the corresponding pictures as label values, and the network is trained; the mean-squared-error loss is minimized with a Nadam-based optimizer to optimize the weight parameters of the neural network.
The convolutional neural network comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers. The input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and then to the first, second, third and fourth fully connected layers in sequence.
S14, the network structure and the weights of the trained convolutional neural network, except for the final output layer, are saved and used to establish a new convolutional neural network, completing the state feature extractor.
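A possible sketch, in PyTorch, of the network and training procedure described in S13-S14 is given below. Only the layer counts (3 convolution + 3 pooling, 4 fully connected), the mean-squared-error loss and the Nadam optimizer come from this embodiment; the channel counts, kernel sizes, ReLU activations and the 66×200 input resolution are assumptions made for the sake of a runnable example.

```python
import torch
import torch.nn as nn

class SteeringCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(                    # 3 conv + 3 pooling layers
            nn.Conv2d(3, 24, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(24, 36, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(36, 48, 3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )
        self.fc = nn.Sequential(                          # 4 fully connected layers
            nn.LazyLinear(100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),                             # steering-angle output (dropped later)
        )

    def forward(self, x):
        return self.fc(self.features(x))

model = SteeringCNN()
_ = model(torch.zeros(1, 3, 66, 200))        # dry run to materialize the LazyLinear layer
optimizer = torch.optim.NAdam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

images = torch.randn(8, 3, 66, 200)          # stand-in for the augmented road pictures
angles = torch.randn(8, 1)                   # stand-in for the recorded steering angles
optimizer.zero_grad()
loss = loss_fn(model(images), angles)
loss.backward()
optimizer.step()

# S14: keep everything except the final output layer as the state feature extractor
feature_extractor = nn.Sequential(model.features, model.fc[:-1])
state_features = feature_extractor(images)   # one feature vector per driving scene
```

Dropping the final fully connected layer leaves a network whose output serves as the state feature f(s_t) used in the later steps.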
2. The reward function generator obtains the reward function used to construct the driving strategy in the following way.
The reward function serves as the criterion for action selection in reinforcement learning, and its quality is decisive for the process of acquiring the driving strategy: it directly determines the quality of the acquired driving strategy and whether that strategy matches the strategy underlying the real driving demonstration data. The reward function has the form reward = θ^T f(s_t, a_t), where f(s_t, a_t) denotes the feature values that influence the driving decision in the state s_t at time t of the scene corresponding to the driving environment, i.e. the surroundings of the vehicle, and is used to describe the scene around the vehicle. θ denotes the set of weights of these decision-influencing features; its values indicate the proportion, and hence the importance, of the corresponding environmental features in the reward function. On the basis of the state feature extractor, the weight θ must be solved for in order to construct the reward function that shapes the driving strategy.
S21, acquiring driving demonstration data of an expert
The driving demonstration data come from sampling the demonstration driving video data (distinct from the data used by the driving environment feature extractor above); a continuous segment of driving video may be sampled at a frequency of 10 Hz, giving one trajectory demonstration. One expert demonstration should contain several trajectories and is denoted as a whole by
D_E = {(s_1, a_1), (s_2, a_2), ..., (s_M, a_M)}, with M = ∑_{i=1}^{N_T} L_i,
where D_E is the driving demonstration data as a whole, (s_j, a_j) is the data pair formed by a state j (the video picture of the driving environment at sampling time j) and the decision command corresponding to that state (e.g. the steering angle of a steering command), M is the total number of driving demonstration data pairs, N_T is the number of driving demonstration trajectories, and L_i is the number of state-decision pairs (s_j, a_j) contained in the i-th driving demonstration trajectory.
S22, obtaining the feature expectation of the driving demonstration
First, each state s_t describing a driving environment condition in the driving demonstration data D_E is input into the state feature extractor to obtain the corresponding feature vector f(s_t, a_t), i.e. the set of features describing s_t; the feature expectation of the driving demonstration is then calculated as
μ_E = (1/N_T) ∑_{i=1}^{N_T} ∑_t γ^t f(s_t, a_t),
where γ is a discount factor whose reference value may be set to 0.65, depending on the problem.
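The feature expectation μ_E of S22 is a discounted, per-trajectory-averaged sum of the extracted features; a small numpy sketch, assuming D_E is stored as a list of trajectories whose entries are the feature vectors f(s_t, a_t) already produced by the state feature extractor, and using the reference value γ = 0.65:

```python
import numpy as np

def demo_feature_expectation(trajectories, gamma=0.65):
    """mu_E: discounted sum of per-step features, averaged over the N_T demonstration trajectories."""
    n_t = len(trajectories)
    k = len(trajectories[0][0])
    mu = np.zeros(k)
    for traj in trajectories:                       # each traj is [f(s_0,a_0), f(s_1,a_1), ...]
        for t, feat in enumerate(traj):
            mu += (gamma ** t) * np.asarray(feat)
    return mu / n_t

# toy demonstration: 2 trajectories, 10-dimensional state features
rng = np.random.default_rng(1)
demo = [[rng.random(10) for _ in range(5)] for _ in range(2)]
mu_E = demo_feature_expectation(demo)
```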
S23, obtaining a state-action set under a greedy strategy
First, the neural network in the driving strategy acquirer of S32 is obtained. (Since the reward function generator and the driving strategy acquirer are two parts of one loop, at the beginning this is simply the freshly initialized neural network of S32. As the loop proceeds, each pass constructs a reward function that influences the driving decisions, acquires the corresponding optimal driving strategy in the driving strategy acquirer based on the current reward function, and checks whether the criterion for ending the loop is met; if it is not, the neural network most recently optimized in S33 is used in the reconstruction of the reward function.)
The state features f(s_t, a_t) describing the environmental conditions, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t). g_w(s_t) is the set of Q values for the described state s_t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T, where the state-action value Q(s_t, a_i) describes how good it is to select the decision driving action a_i in the current driving state s_t and can be obtained from the formula Q(s, a) = θ · μ(s, a), in which θ is the weight of the current reward function and μ(s, a) is the feature expectation.
Then, based on an ε-greedy strategy with ε set to 0.5, the driving decision action a_t corresponding to the described driving scene state s_t is selected: with a fifty percent likelihood, the decision action a_t = argmax_{a_i} Q(s_t, a_i) that maximizes the Q value in the Q-value set of the current driving scene s_t is picked; otherwise a_t is selected at random. After a_t has been selected, the value Q(s_t, a_t) is recorded. In this way, by inputting the state feature f(s_t, a_t) of every state in the driving demonstration D_E into the neural network, M state-action pairs (s_t, a_t) are obtained, each describing the driving decision action a_t selected in the driving scene state s_t at time t. At the same time, based on the actions selected, the Q values of the M corresponding state-action pairs are recorded and denoted Q.
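The action selection of S23 can be illustrated as follows; the sketch assumes that the per-action Q values of a state are computed as Q(s, a_i) = θ · μ(s, a_i) from per-action feature expectations (random stand-ins here) and uses ε = 0.5 as in this embodiment.

```python
import numpy as np

def epsilon_greedy(theta, mu_per_action, epsilon=0.5, rng=None):
    """Return (chosen action index, its Q value) for one driving scene state."""
    rng = rng or np.random.default_rng()
    q_values = mu_per_action @ theta              # Q(s, a_i) = theta . mu(s, a_i) for each action
    if rng.random() < epsilon:
        action = int(np.argmax(q_values))         # pick the action maximizing the Q value
    else:
        action = int(rng.integers(len(q_values))) # otherwise pick an action at random
    return action, q_values[action]

rng = np.random.default_rng(2)
theta = rng.random(10)                            # current reward-function weights
mu_per_action = rng.random((4, 10))               # feature expectations for n = 4 candidate actions
a_t, q_t = epsilon_greedy(theta, mu_per_action, rng=rng)   # recorded pair (a_t, Q(s_t, a_t))
```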
S24, obtaining the weights of the reward function
First, an objective function is constructed of the form
J(θ) = (1/M) ∑_{t=1}^{M} [ Q(s_t, a_t) + L(s_t, a_t) ] − θ^T μ_E + (λ/2)‖θ‖²,
where L(s_t, a_t) is a loss term based on whether the current state-action pair exists in the driving demonstration: 0 if it exists and 1 if it does not; Q(s_t, a_t) are the corresponding state-action values recorded above; θ^T μ_E is the product of the driving demonstration feature expectation obtained in S22 and the reward function weight θ; and (λ/2)‖θ‖² is a regular term introduced to prevent overfitting, whose coefficient may be set to 0.9.
The objective function is minimized by gradient descent, i.e. t = min_θ J(θ), giving the variable θ that minimizes the objective; this θ is the weight of the required reward function.
S25, based on the obtained reward function weight θ, the reward function generator is constructed according to the formula r(s, a) = θ^T f(s, a).
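With θ in hand, the reward function generator of S25 is simply the linear form r(s, a) = θ^T f(s, a); a minimal sketch:

```python
import numpy as np

def make_reward_function(theta):
    """Build r(s, a) = theta^T f(s, a) from the learned reward weights."""
    theta = np.asarray(theta)
    def reward(f_sa):                  # f_sa: feature vector f(s_t, a_t) from the state feature extractor
        return float(theta @ np.asarray(f_sa))
    return reward

theta = np.array([0.3, -0.1, 0.5])
r = make_reward_function(theta)
print(r([1.0, 2.0, 0.5]))              # reward for one state-action feature vector
```

Because the reward is linear in the extracted features, replacing θ after each pass of the loop immediately updates the whole generator.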
3. The driving strategy acquirer completes the construction of the driving strategy in the following way.
S31, constructing the training data of the driving strategy acquirer
Training data are acquired. They come from sampling the earlier demonstration data, but are processed into a set of new-type data items, N in total. Each item comprises two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor; the other is a target value ŷ_t obtained from a formula involving r_θ(s_t, a_t), Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}), where r_θ(s_t, a_t) is the reward generated by the reward function generator from the driving demonstration data, and Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}) are selected from the set Q of Q values recorded in S23 as the values describing the driving scene s_t at time t and the driving scene s_{t+1} at time t+1, respectively.
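The exact target formula of S31 appears only as a drawing in the original document. Purely for illustration, the sketch below assumes a temporal-difference-style target r_θ(s_t, a_t) + γ·Q^π(s_{t+1}, a_{t+1}) built from the quantities the text says enter the formula, and reuses γ = 0.65 from S22; this assumed form is not asserted to be the patent's exact expression.

```python
import numpy as np

def build_training_pairs(features, rewards, q_next, gamma=0.65):
    """Pair each decision feature f(s_t) with an assumed TD-style target r_theta + gamma * Q(s_{t+1}, a_{t+1}).

    Note: the patent's formula also involves Q(s_t, a_t); that dependence is omitted in this assumed form."""
    targets = np.asarray(rewards) + gamma * np.asarray(q_next)
    return list(zip(features, targets))

rng = np.random.default_rng(3)
feats = [rng.random(10) for _ in range(4)]   # driving decision features f(s_t)
r_vals = rng.random(4)                       # r_theta(s_t, a_t) from the reward function generator
q_next = rng.random(4)                       # recorded Q_pi(s_{t+1}, a_{t+1}) values
training_data = build_training_pairs(feats, r_vals, q_next)
```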
S32, establishing a neural network
The neural network comprises three layers. The first layer is the input layer; its number of neurons equals the number k of feature types output by the feature extractor, and it receives the driving scene features f(s_t, a_t). The second, hidden layer has 10 neurons, and the number of neurons in the third layer equals the number n of decision driving actions in the action space. The activation functions of the input layer and the hidden layer are sigmoid functions, i.e. sigmoid(x) = 1/(1 + e^(−x)), so that
z = w^(1) x = w^(1) [1, f_t]^T
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]^T)
where w^(1) denotes the weight of the hidden layer; f_t denotes the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z denotes the output of the network layer before the hidden-layer sigmoid activation; h denotes the hidden-layer output after the sigmoid activation; and w^(2) denotes the weight of the output layer. The network structure is shown in FIG. 4.
The network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T; Q^π(s_t, a_t) in S31 is obtained by inputting the state s_t into the neural network and selecting the a_t entry of the output.
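The forward pass of this three-layer network follows directly from the equations above, with the leading 1 in [1, f_t] acting as a bias term; a numpy sketch with k = 10 input features, 10 hidden neurons and n = 4 actions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(f_t, w1, w2):
    """g_w(s_t): the vector [Q(s_t, a_1), ..., Q(s_t, a_n)] for one driving scene state."""
    x = np.concatenate(([1.0], f_t))                   # input with bias term: [1, f_t]
    z = w1 @ x                                         # hidden pre-activation: z = w^(1) [1, f_t]^T
    h = sigmoid(z)                                     # hidden output
    return sigmoid(w2 @ np.concatenate(([1.0], h)))    # g_w(s_t) = sigmoid(w^(2) [1, h]^T)

k, hidden, n = 10, 10, 4
rng = np.random.default_rng(4)
w1 = rng.normal(scale=0.1, size=(hidden, k + 1))
w2 = rng.normal(scale=0.1, size=(n, hidden + 1))
q_set = forward(rng.random(k), w1, w2)                 # Q values for the n candidate driving actions
```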
S33, optimizing the neural network
The loss established for optimizing the neural network is a cross-entropy cost function of the form
L(W) = −(1/N) ∑_{t=1}^{N} [ ŷ_t log Q^π(s_t, a_t) + (1 − ŷ_t) log(1 − Q^π(s_t, a_t)) ] + (λ/2)‖W‖²,
where N denotes the number of training data; Q^π(s_t, a_t) is the value obtained by inputting the driving scene state s_t at time t into the neural network and selecting the corresponding driving decision action a_t in the output; ŷ_t is the target value obtained in S31; and (λ/2)‖W‖² is a regular term, again set to prevent overfitting, whose coefficient may be 0.9, with W = {w^(1), w^(2)} denoting the weights in the neural network.
The training data obtained in S31 are input into the neural network to optimize this cost function; the cross-entropy cost is minimized by gradient descent, giving the optimized neural network and thus the driving strategy acquirer.
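Since the original cost formula is again a drawing, the sketch below assumes the standard binary cross-entropy form between the network outputs Q^π(s_t, a_t) and the S31 targets, plus a quadratic regular term with coefficient λ = 0.9 as suggested above; it only evaluates the cost, the gradient-descent update of w^(1) and w^(2) being left to any autodiff tool or hand-derived gradients.

```python
import numpy as np

def cross_entropy_cost(q_pred, y_target, weights, lam=0.9):
    """Assumed form: binary cross entropy between predictions and targets plus (lam/2)*||W||^2."""
    q_pred = np.clip(np.asarray(q_pred), 1e-7, 1 - 1e-7)   # keep log() finite
    y = np.clip(np.asarray(y_target), 0.0, 1.0)
    ce = -np.mean(y * np.log(q_pred) + (1 - y) * np.log(1 - q_pred))
    reg = 0.5 * lam * sum(np.sum(w ** 2) for w in weights)
    return ce + reg

q_pred = np.array([0.6, 0.2, 0.8])       # Q_pi(s_t, a_t) read off the network output
y_target = np.array([0.7, 0.1, 0.9])     # target values built in S31
weights = [np.ones((10, 11)) * 0.01, np.ones((4, 11)) * 0.01]   # w^(1), w^(2)
print(cross_entropy_cost(q_pred, y_target, weights))
```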
4. The judger checks whether the optimal driving strategy constructed by the acquirer meets the judgment criterion; if it does not, the reward function is reconstructed and the optimal driving strategy is constructed again, iterating until the criterion is met; finally, a driving strategy describing the real driving demonstration is obtained.
The current reward function generator and driving strategy acquirer are considered as a whole, and the current value of t (obtained in S24) is checked against t < ε, where ε is a threshold for judging whether the objective function, i.e. the reward function currently used to obtain the driving strategy, meets the requirement; its value is set according to the specific needs.
When the value of t does not satisfy this inequality, the reward function generator has to be reconstructed, and the neural network used in S23 is replaced by the new neural network optimized in S33, i.e. the network used to produce the values Q(s_t, a_i) describing how good the selected decision driving action a_i is in the driving scene state s_t is replaced by the new network structure optimized by gradient descent in S33. The reward function generator is then reconstructed, the driving strategy acquirer is obtained again, and the value of t is checked again.
When the inequality is satisfied, the current θ is the weight of the required reward function; the reward function generator meets the requirement and so does the driving strategy acquirer. The driving data of the particular driver for whom a model is to be established can then be collected, namely the environmental scene images and the corresponding operation data during driving, such as the steering angle. The images are input into the driving environment feature extractor to obtain the decision features of the current scene; the extracted features are input into the reward function generator to obtain the reward function corresponding to the scene state; and the collected decision features together with the computed reward function are input into the driving strategy acquirer to obtain the driving strategy corresponding to that driver.
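The alternation enforced by the judger can be summarized as the following loop skeleton; the function names are placeholders for the components described above, not an API defined by this embodiment.

```python
def run_modeling_loop(build_reward_generator, fit_strategy_network,
                      epsilon_threshold, max_iterations=50):
    """Alternate reward-function construction (S21-S25) and strategy optimization (S31-S33)
    until the judger's criterion t < epsilon is met."""
    strategy_net, theta, t = None, None, float("inf")
    for _ in range(max_iterations):
        theta, t = build_reward_generator(strategy_net)   # returns reward weights and objective value t
        strategy_net = fit_strategy_network(theta)        # optimized policy network for this reward
        if t < epsilon_threshold:                         # judger: current reward function is acceptable
            break
    return theta, strategy_net

# toy stand-ins just to show the control flow
theta, net = run_modeling_loop(
    build_reward_generator=lambda net: ([0.1, 0.2], 0.01),
    fit_strategy_network=lambda theta: "policy-net",
    epsilon_threshold=0.05,
)
```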
In a Markov decision process, a strategy must connect each state to its corresponding action. When the state space is large, however, it is difficult to describe the strategy for regions that have not been traversed. Traditional methods simply omit such a description: they characterize the probability model of the whole trajectory distribution from the demonstration trajectories and give no concrete strategy representation for a new state, i.e. no concrete way to assess the probability of taking a given action in that state. In the present invention the strategy is described by a neural network, which can approximate any function to arbitrary accuracy and therefore has excellent generalization capability. Through the state-feature representation, states that are not contained in the demonstration trajectories can still be represented; by inputting the corresponding state features into the neural network, the corresponding action values are obtained and the action is selected according to the strategy. This solves the problem that traditional methods cannot generalize from the driving demonstration data to driving scene states that have not been traversed.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any person skilled in the art could make substitutions or changes to the technical solution and the inventive concept of the present invention within the technical scope disclosed herein.

Claims (8)

1. A driver behavior modeling system, characterized in that it specifically comprises:
a feature extractor, which extracts and constructs the features of the reward function;
a reward function generator, which obtains the reward function required to construct the driving strategy;
a driving strategy acquirer, which completes the construction of the driving strategy; and
a judger, which checks whether the optimal driving strategy constructed by the acquirer meets the judgment criterion; if it does not, the reward function is reconstructed and the optimal driving strategy is constructed again, iterating until the criterion is met;
wherein the feature extractor extracts and constructs the reward function features as follows:
S11, during driving, the video from a camera mounted behind the windshield of the vehicle is sampled to obtain N groups of pictures of road conditions in different driving environments and the corresponding steering angles; together with the corresponding driving operation data, the training data are jointly constructed;
S12, the collected pictures are translated, cropped and brightness-adjusted to simulate scenes with different illumination and weather;
S13, a convolutional neural network is constructed, the processed pictures are taken as input and the operation data of the corresponding pictures as label values, and the network is trained; the mean-squared-error loss is minimized with a Nadam-based optimizer to optimize the weight parameters of the neural network;
S14, the network structure and weights of the trained convolutional neural network are saved and used to establish a new convolutional neural network, completing the state feature extractor;
wherein the reward function generator obtains the reward function used to construct the driving strategy as follows:
S21, driving demonstration data of an expert are acquired: the driving demonstration data are obtained by sampling the demonstration driving video, a continuous segment of driving video being sampled at a certain frequency to give one trajectory demonstration; one set of expert demonstration data contains several trajectories and is denoted as a whole by
D_E = {(s_1, a_1), (s_2, a_2), ..., (s_M, a_M)}, with M = ∑_{i=1}^{N_T} L_i,
where D_E is the driving demonstration data as a whole, (s_j, a_j) is the data pair formed by a state j and the decision command corresponding to that state, M is the total number of driving demonstration data pairs, N_T is the number of driving demonstration trajectories, and L_i is the number of state-decision pairs (s_j, a_j) contained in the i-th driving demonstration trajectory;
S22, the feature expectation of the driving demonstration is obtained;
first, each state s_t describing a driving environment condition in the driving demonstration data D_E is input into the state feature extractor to obtain the corresponding feature vector f(s_t, a_t), i.e. the set of features describing s_t; the feature expectation of the driving demonstration is then calculated as
μ_E = (1/N_T) ∑_{i=1}^{N_T} ∑_t γ^t f(s_t, a_t),
where γ is a discount factor set according to the problem at hand;
S23, the state-action set under the greedy strategy is obtained;
S24, the weights of the reward function are obtained.
2. The driver behavior modeling system of claim 1, wherein the convolutional neural network established in step S13 comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers; the input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and finally to the first, second, third and fourth fully connected layers in sequence.
3. The driver behavior modeling system of claim 1, wherein the trained convolutional neural network of step S14 does not include an output layer.
4. The driver behavior modeling system of claim 1, wherein the specific steps for obtaining the state-action set under the greedy strategy are: the reward function generator and the driving strategy acquirer are two parts of one loop; first, the neural network in the driving strategy acquirer is obtained; the state features f(s_t, a_t) describing the environmental conditions, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t); g_w(s_t) is the set of Q values for the described state s_t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T, where the state-action value Q(s_t, a_i) describes how good it is to select the decision driving action a_i in the current driving state s_t and is obtained from the formula Q(s, a) = θ · μ(s, a), in which θ is the weight of the current reward function and μ(s, a) is the feature expectation value;
then, based on an ε-greedy strategy, the driving decision action a_t corresponding to the described driving scene state s_t is selected: with probability ε, the decision action a_t = argmax_{a_i} Q(s_t, a_i) that maximizes the Q value in the Q-value set of the current scene s_t is selected, and otherwise a_t is selected at random; after a_t has been selected, the value Q(s_t, a_t) is recorded; in this way, by inputting the state feature f(s_t, a_t) of every state in the driving demonstration D_E into the neural network, M state-action pairs (s_t, a_t) are obtained, each describing the driving decision action a_t selected in the driving scene state s_t at time t; at the same time, based on the actions selected, the Q values of the M corresponding state-action pairs are recorded and denoted Q.
5. The driver behavior modeling system of claim 1, wherein the specific steps for obtaining the weights of the reward function are:
first, an objective function is constructed of the form
J(θ) = (1/M) ∑_{t=1}^{M} [ Q(s_t, a_t) + L(s_t, a_t) ] − θ^T μ_E + (λ/2)‖θ‖²,
where L(s_t, a_t) is a loss term that is 0 if the current state-action pair exists in the driving demonstration and 1 otherwise; Q(s_t, a_t) are the corresponding state-action values recorded above; θ^T μ_E is the product of the driving demonstration feature expectation obtained in step S22 and the reward function weight θ; and (λ/2)‖θ‖² is a regular term;
the objective function is minimized by gradient descent, i.e. t = min_θ J(θ), giving the variable θ that minimizes the objective; this θ is the weight of the required reward function.
6. The driver behavior modeling system of claim 1, wherein the process by which the reward function generator obtains the reward function further comprises: S25, based on the obtained reward function weight θ, the reward function generator is constructed according to the formula r(s, a) = θ^T f(s, a).
7. The driver behavior modeling system of claim 1, wherein the driving strategy acquirer completes the construction of the driving strategy as follows:
S31, constructing the training data of the driving strategy acquirer
training data are acquired, each item comprising two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor; the other is a target value ŷ_t obtained from a formula involving r_θ(s_t, a_t), Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}), where r_θ(s_t, a_t) is the reward generated by the reward function generator from the driving demonstration data, and Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}) are selected from the Q values recorded in S23 as the values describing the driving scene s_t at time t and the driving scene s_{t+1} at time t+1, respectively;
S32, establishing a neural network
the neural network comprises three layers; the first layer is the input layer, whose number of neurons equals the number k of feature types output by the feature extractor and which receives the driving scene features f(s_t, a_t); the second, hidden layer has 10 neurons, and the number of neurons in the third layer equals the number n of decision driving actions in the action space; the activation functions of the input layer and the hidden layer are sigmoid functions, i.e. sigmoid(x) = 1/(1 + e^(−x)), so that
z = w^(1) x = w^(1) [1, f_t]^T
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]^T)
where w^(1) is the weight of the hidden layer; f_t is the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z is the output of the network layer before the hidden-layer sigmoid activation; h is the hidden-layer output after the sigmoid activation; and w^(2) is the weight of the output layer;
the network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T; Q^π(s_t, a_t) in S31 is obtained by inputting the state s_t into the neural network and selecting the a_t entry of the output;
S33, optimizing the neural network
the loss established for optimizing the neural network is a cross-entropy cost function of the form
L(W) = −(1/N) ∑_{t=1}^{N} [ ŷ_t log Q^π(s_t, a_t) + (1 − ŷ_t) log(1 − Q^π(s_t, a_t)) ] + (λ/2)‖W‖²,
where N is the number of training data; Q^π(s_t, a_t) is the value obtained by inputting the driving scene state s_t at time t into the neural network and selecting the corresponding driving decision action a_t in the output; ŷ_t is the target value obtained in S31; and (λ/2)‖W‖² is a regular term, with W = {w^(1), w^(2)} denoting the weights in the neural network;
the training data obtained in S31 are input into the neural network to optimize this cost function; the cross-entropy cost is minimized by gradient descent, giving the optimized neural network and thus the driving strategy acquirer.
8. The driver behavior modeling system of claim 5, wherein the judger is implemented as follows:
the current reward function generator and driving strategy acquirer are considered as a whole, and the current value of t is checked against t < ε, where ε is a threshold for judging whether the objective function, i.e. the reward function currently used to obtain the driving strategy, meets the requirement, its value being set according to the specific needs;
when the value of t does not satisfy this inequality, the reward function generator has to be reconstructed, and the neural network used in S23 is replaced by the new neural network optimized in S33, i.e. the network used to produce the values Q(s_t, a_i) describing how good the selected decision driving action a_i is in the driving scene state s_t is replaced by the new network structure optimized by gradient descent in S33; the reward function generator is then reconstructed, the driving strategy acquirer is obtained again, and the value of t is checked again;
when the inequality is satisfied, the current θ is the weight of the required reward function; the reward function generator meets the requirement and so does the driving strategy acquirer; then the driving data of the particular driver for whom a model is to be established are collected, namely the environmental scene images and the corresponding operation data during driving; the driving environment scene images are input into the driving environment feature extractor to obtain the decision features of the current scene; the extracted features are input into the reward function generator to obtain the reward function corresponding to the scene state; and the collected decision features together with the computed reward function are input into the driving strategy acquirer to obtain the driving strategy corresponding to that driver.
CN201810662040.0A 2018-06-25 2018-06-25 Driver behavior modeling system Active CN108791302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810662040.0A CN108791302B (en) 2018-06-25 2018-06-25 Driver behavior modeling system

Publications (2)

Publication Number Publication Date
CN108791302A CN108791302A (en) 2018-11-13
CN108791302B true CN108791302B (en) 2020-05-19

Family

ID=64070795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810662040.0A Active CN108791302B (en) 2018-06-25 2018-06-25 Driver behavior modeling system

Country Status (1)

Country Link
CN (1) CN108791302B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200363800A1 (en) * 2019-05-13 2020-11-19 Great Wall Motor Company Limited Decision Making Methods and Systems for Automated Vehicle
CN110481561B (en) * 2019-08-06 2021-04-27 北京三快在线科技有限公司 Method and device for generating automatic control signal of unmanned vehicle
CN111079533B (en) * 2019-11-14 2023-04-07 深圳大学 Unmanned vehicle driving decision method, unmanned vehicle driving decision device and unmanned vehicle
CN112052776B (en) * 2020-09-01 2021-09-10 中国人民解放军国防科技大学 Unmanned vehicle autonomous driving behavior optimization method and device and computer equipment
CN112373482B (en) * 2020-11-23 2021-11-05 浙江天行健智能科技有限公司 Driving habit modeling method based on driving simulator
WO2022221979A1 (en) * 2021-04-19 2022-10-27 华为技术有限公司 Automated driving scenario generation method, apparatus, and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103381826A (en) * 2013-07-31 2013-11-06 中国人民解放军国防科学技术大学 Adaptive cruise control method based on approximate policy iteration
CN105955930A (en) * 2016-05-06 2016-09-21 天津科技大学 Guidance-type policy search reinforcement learning algorithm
CN107168303A (en) * 2017-03-16 2017-09-15 中国科学院深圳先进技术研究院 A kind of automatic Pilot method and device of automobile
CN107229973A (en) * 2017-05-12 2017-10-03 中国科学院深圳先进技术研究院 The generation method and device of a kind of tactful network model for Vehicular automatic driving
CN107203134A (en) * 2017-06-02 2017-09-26 浙江零跑科技有限公司 A kind of front truck follower method based on depth convolutional neural networks
CN107480726A (en) * 2017-08-25 2017-12-15 电子科技大学 A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon
CN107679557A (en) * 2017-09-19 2018-02-09 平安科技(深圳)有限公司 Driving model training method, driver's recognition methods, device, equipment and medium
CN108108657A (en) * 2017-11-16 2018-06-01 浙江工业大学 A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Autonomous navigation performance evaluation method based on trajectory analysis; 王勇鑫, 钱徽, 金卓军, 朱淼良; Computer Engineering (《计算机工程》); 2011-03-20; p. 142 *

Also Published As

Publication number Publication date
CN108791302A (en) 2018-11-13

Similar Documents

Publication Publication Date Title
CN108819948B (en) Driver behavior modeling method based on reverse reinforcement learning
CN108791302B (en) Driver behavior modeling system
CN108920805B (en) Driver behavior modeling system with state feature extraction function
US11062617B2 (en) Training system for autonomous driving control policy
CN110874578B (en) Unmanned aerial vehicle visual angle vehicle recognition tracking method based on reinforcement learning
CN109131348B (en) Intelligent vehicle driving decision method based on generative countermeasure network
CN110991027A (en) Robot simulation learning method based on virtual scene training
CN108891421B (en) Method for constructing driving strategy
CN110281949B (en) Unified hierarchical decision-making method for automatic driving
CN112550314B (en) Embedded optimization type control method suitable for unmanned driving, driving control module and automatic driving control system thereof
CN108944940B (en) Driver behavior modeling method based on neural network
CN114162146B (en) Driving strategy model training method and automatic driving control method
Farag Cloning safe driving behavior for self-driving cars using convolutional neural networks
Babiker et al. Convolutional neural network for a self-driving car in a virtual environment
CN113869170B (en) Pedestrian track prediction method based on graph division convolutional neural network
Farag Safe-driving cloning by deep learning for autonomous cars
CN115376103A (en) Pedestrian trajectory prediction method based on space-time diagram attention network
CN117406762A (en) Unmanned aerial vehicle remote control algorithm based on sectional reinforcement learning
CN117709602B (en) Urban intelligent vehicle personification decision-making method based on social value orientation
Zhong et al. Behavior prediction for unmanned driving based on dual fusions of feature and decision
Meftah et al. A virtual simulation environment using deep learning for autonomous vehicles obstacle avoidance
CN110222822A (en) The construction method of black box prediction model internal feature cause-and-effect diagram
CN117078923B (en) Automatic driving environment-oriented semantic segmentation automation method, system and medium
CN108791308B (en) System for constructing driving strategy based on driving environment
Oinar et al. Self-driving car steering angle prediction: Let transformer be a car again

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
OL01 Intention to license declared