CN108791302B - Driver behavior modeling system - Google Patents
- Publication number
- CN108791302B (application CN201810662040.0A)
- Authority
- CN
- China
- Prior art keywords
- driving
- state
- neural network
- strategy
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- B60W40/08—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems, e.g. by using mathematical models, related to drivers or passengers
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- G06F18/214—Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology; G06N3/048—Activation functions
- B60W2050/0029—Mathematical model of the driver
Abstract
The invention discloses a driver behavior modeling system comprising a feature extractor, a return function generator, a driving strategy acquirer, and a judger. The feature extractor extracts and constructs the features of the return function; the return function generator acquires the return function required to construct the driving strategy; the driving strategy acquirer completes construction of the driving strategy; and the judger checks whether the optimal driving strategy constructed by the acquirer meets the judgment criterion. If the driving strategy does not meet the criterion, the return function is reconstructed and the optimal driving strategy is rebuilt, iterating until the criterion is met; finally, a driving strategy describing the real driving demonstration is obtained. The system can be applied to new state scenes to obtain the corresponding actions, so the generalization capability of the resulting driver behavior model is greatly improved, the applicable scenes are wider, and the robustness is stronger.
Description
Technical Field
The invention relates to a modeling method, in particular to a driver behavior modeling system.
Background
Autonomous driving is an important part of the intelligent transportation field. Owing to the limits of current technology, autonomous vehicles still require intelligent driving systems (intelligent driver assistance systems) and human drivers to cooperate in completing driving tasks. In this process, driver modeling is an essential step, whether to better quantify driver information for the decision making of an intelligent system or to distinguish different drivers in order to provide personalized services.
Among current approaches to driver modeling, reinforcement learning handles well the complex sequential decision problems of large-scale continuous spaces and multiple optimization targets that arise when drivers operate vehicles, and it is therefore an effective method for modeling driver behavior. Reinforcement learning, as an MDP-based problem-solving method, requires interaction with the environment: the agent takes actions, obtains evaluative feedback signals (rewards) from the environment, and maximizes the long-term reward.
A search of the existing literature shows that current methods for setting the reward function in driver behavior modeling fall into two groups: the conventional method, in which researchers set it manually for different scene states, and methods based on inverse reinforcement learning. The conventional method is highly subjective; the quality of the reward function depends on the abilities and experience of the researcher. Moreover, to set the reward function correctly during driving, a large number of decision variables must be balanced; these variables are often incoherent or even contradictory, and researchers frequently cannot design a reward function that balances all requirements.
Inverse reinforcement learning assigns appropriate weights to the driving features by means of driving demonstration data and can automatically learn the required reward function, overcoming the shortcomings of manual design. However, traditional inverse reinforcement learning can only learn the scene states present in the demonstration data, whereas in real driving, factors such as weather and scenery often push the actual driving scene beyond the demonstrated range. Consequently, inverse reinforcement learning shows insufficient generalization beyond the scene-action relationships observed in the demonstration data.
Existing driver behavior modeling methods based on reinforcement learning follow two main ideas. In the first, traditional reinforcement learning is used: the setting of the reward function depends on the researcher's analysis, arrangement, screening, and induction of scenes, from which a series of driving-decision features is obtained, such as the distance from the front of the vehicle to the curb, the presence of pedestrians, a reasonable speed, or the lane-change frequency; a series of experiments is then designed according to the driving-scene requirements to obtain the weight of each feature in the reward function under the corresponding scene environment, and the overall design of the reward function is completed as a model describing the driver's driving behavior. In the second, a probabilistic modeling method, maximum-entropy inverse reinforcement learning is used to solve for the driving behavior characteristic function. First, it is assumed that some underlying probability distribution generates the demonstrated driving trajectories; a distribution that fits the driving demonstration must then be found, which can be translated into a nonlinear programming problem, namely:

max −Σ p log p
s.t. Σ p = 1

where p is the probability distribution over demonstration trajectories. After solving this program for the distribution, the relevant parameters are obtained, i.e. the return function r = θ^T f(s_t).
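The maximum-entropy principle above can be illustrated with a minimal pure-Python sketch: among candidate distributions over three hypothetical demonstration trajectories (each already satisfying Σ p = 1), the uniform one attains the largest entropy. The candidate values are illustrative, not from the patent.

```python
import math

def entropy(p):
    """Shannon entropy -sum(p log p), the objective of the program above."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Hypothetical candidate distributions over three demonstration trajectories,
# each satisfying the constraint sum(p) = 1.
candidates = [
    [0.8, 0.1, 0.1],
    [0.5, 0.3, 0.2],
    [1/3, 1/3, 1/3],   # uniform
]

best = max(candidates, key=entropy)   # the uniform distribution wins
```

Of the feasible candidates, the uniform distribution maximizes the entropy, which is why maximum-entropy inverse reinforcement learning commits to no structure beyond what the demonstration constraints impose.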
Traditional driver behavior models use known driving data to analyze, describe, and reason about driving behavior. However, the collected driving data cannot fully cover the infinite variety of driving behavior, so the corresponding action for every state cannot be obtained. In an actual driving scene, weather, scenery, and objects vary so much that the driving state has countless possibilities, and traversing all states is impossible. Traditional driver behavior models therefore generalize weakly, rest on many assumed conditions, and have poor robustness.
Secondly, in real driving problems, having the researcher set the reward function by hand requires balancing too many feature requirements; it depends entirely on the researcher's experience, involves repeated manual adjustment, consumes time and labor, and is fatally subjective. Across different scenes and environments, researchers face too many scene states; even for a single scene state, differing requirements change the driving behavior features, and accurately describing the driving task demands a series of precisely assigned weights. Among existing methods, inverse reinforcement learning based on a probability model starts from existing demonstration data, seeks the distribution of that data, and derives the action selection in each state from the distribution. However, the distribution of the known data does not reveal the distribution of all data; to acquire the distribution correctly, all states would have to be acquired.
Disclosure of Invention
In order to solve the weak generalization of driver modeling, namely the technical problem that in the prior art no corresponding return function can be established for driver behavior modeling when the driving scene is absent from the demonstration data, this application provides a driver behavior modeling system that can be applied to new state scenes to obtain the corresponding actions, so that the generalization capability of the resulting driver behavior model is greatly improved, the applicable scenes are wider, and the robustness is stronger.
In order to achieve the purpose, the technical points of the scheme of the invention are as follows: the driver behavior modeling system specifically comprises:
the characteristic extractor is used for extracting and constructing return function characteristics;
a return function generator, used for acquiring the return function required to construct the driving strategy;
the driving strategy acquirer completes construction of a driving strategy;
the judger, which judges whether the optimal driving strategy constructed by the acquirer meets the judgment criterion; if the driving strategy does not meet the criterion, the return function is reconstructed and the optimal driving strategy is rebuilt, iterating until the criterion is met; finally, a driving strategy describing the real driving demonstration is obtained.
Further, the specific implementation process of extracting and constructing the return function features by the feature extractor is as follows:
S11, during the driving of the vehicle, a camera placed behind the windshield samples the driving video to obtain N groups of pictures of road conditions in different driving environments; together with the corresponding driving operation data, namely the steering angle under each road environment, these jointly constitute the training data;
S12, the collected pictures are translated, cropped, and brightness-adjusted to simulate scenes with different illumination and weather;
S13, a convolutional neural network is constructed, taking the processed pictures as input and the operation data of the corresponding picture as the label value for training; the optimal solution of the mean-square-error loss is found with a Nadam-based optimization method to optimize the weight parameters of the network;
s14, storing the network structure and the weight of the trained convolutional neural network to establish a new convolutional neural network to complete the state feature extractor.
Further, the convolutional neural network established in step S13 comprises 1 input layer, 3 convolutional layers, 3 pooling layers, and 4 fully-connected layers; the input layer is connected in sequence to the first convolutional and pooling layers, then the second convolutional and pooling layers, then the third convolutional and pooling layers, and finally to the first, second, third, and fourth fully-connected layers.
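The patent fixes only the layer counts (3 convolution, 3 pooling, 4 fully-connected), not the input resolution or kernel sizes. As a sketch under assumed values (a 66×200 input and 5/5/3 kernels with 2×2 pooling, all hypothetical), the feature-map sizes through the convolution–pooling stack can be traced as follows:

```python
def conv2d_out(h, w, k, stride=1, pad=0):
    """Output height/width of a convolution with a square k x k kernel."""
    return ((h + 2*pad - k)//stride + 1, (w + 2*pad - k)//stride + 1)

def pool_out(h, w, k=2, stride=2):
    """Output height/width of a pooling layer."""
    return ((h - k)//stride + 1, (w - k)//stride + 1)

# Hypothetical input resolution and kernel sizes -- illustrative only.
h, w = 66, 200
for k in (5, 5, 3):            # three convolutional layers...
    h, w = conv2d_out(h, w, k)
    h, w = pool_out(h, w)      # ...each followed by a pooling layer
# (h, w) is now the feature-map size fed into the fully-connected layers
```

Tracing shapes this way is a quick sanity check that the stack leaves a positive-sized feature map before flattening into the four fully-connected layers.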
Further, the convolutional neural network after the training in step S14 is completed does not include an output layer.
Further, the concrete implementation process of the reward function generator for obtaining the driving strategy is as follows:
S21, acquiring expert driving demonstration data: the driving demonstration data are obtained by sampling the demonstration driving video; a continuous driving video sampled at a fixed frequency yields one trajectory demonstration, and one set of expert demonstration data contains multiple trajectories, collectively denoted as:

D_E = {τ_1, ..., τ_{N_T}}, τ_i = {(s_1, a_1), ..., (s_{L_i}, a_{L_i})}

where D_E denotes the driving demonstration data as a whole, (s_j, a_j) is the data pair formed by a state j and the decision command corresponding to that state, M is the total number of driving demonstration data pairs, N_T is the number of demonstration trajectories, and L_i is the number of state-decision pairs (s_j, a_j) contained in the i-th trajectory;
s22, obtaining a characteristic expected value of a driving demonstration;
first driving demonstration data DEEach description in (1)State s of driving environmenttInput into a state feature extractor to obtain a corresponding state stCharacteristic case of f(s)t,at),f(st,at) Means a set of correspondences stThen, the characteristic expectation value of the driving demonstration is calculated based on the following formula:
wherein gamma is a discount factor and is correspondingly set according to different problems;
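The discounted feature expectation can be sketched in a few lines of pure Python; the two demonstration trajectories of 2-dimensional features below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def feature_expectation(trajectories, gamma=0.65):
    """mu_E = (1/N_T) * sum over trajectories of sum_t gamma^t * f(s_t, a_t)."""
    n_feat = len(trajectories[0][0])
    mu = [0.0] * n_feat
    for traj in trajectories:
        for t, f in enumerate(traj):          # t indexes time within a trajectory
            for j in range(n_feat):
                mu[j] += (gamma ** t) * f[j]  # discounted feature accumulation
    return [m / len(trajectories) for m in mu]

# Two hypothetical demonstration trajectories of 2-dimensional state features.
demo = [
    [[1.0, 0.0], [0.0, 1.0]],   # trajectory 1: two timesteps
    [[1.0, 1.0]],               # trajectory 2: one timestep
]
mu = feature_expectation(demo, gamma=0.65)
```

With γ = 0.65 the second timestep of trajectory 1 contributes 0.65·[0, 1], so the averaged expectation is [1.0, 0.825].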
s23, solving a state-action set under a greedy strategy;
and S24, solving the weight of the return function.
Furthermore, the specific steps of solving the state-action set under the greedy strategy are as follows. The return function generator and the driving strategy acquirer are two parts of one loop. First, the neural network in the driving strategy acquirer is obtained: the state features f(s_t, a_t) describing the environmental conditions, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t). g_w(s_t) is the set of Q values for the state s_t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T, where Q(s_t, a_i) is the state-action value measuring the quality of choosing decision action a_i in the driving scene state s_t, obtained from the formula Q(s, a) = θ^T μ(s, a), with θ the weight in the current return function and μ(s, a) the feature expectation value.

Then, based on an ε-greedy strategy, the driving decision action corresponding to the driving scene state s_t is selected: with probability 1 − ε, the decision action a_t maximizing the Q value in the Q-value set of the current driving scene s_t is chosen; otherwise, a_t is selected at random. After each selection, the Q value at that time is recorded.

In this way, the state features f(s_t, a_t) of each state in the driving demonstration D_E are input into the network, yielding M state-action pairs (s_t, a_t) that describe which driving decision action a_t is selected in the driving scene state s_t at time t; meanwhile, from the action selections, the Q values of the M corresponding state-action pairs are obtained and recorded as Q.
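The ε-greedy selection rule above can be sketched directly; the Q-value set below is a hypothetical example for one scene state.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With prob. 1-epsilon pick argmax_a Q(s,a); otherwise a random action index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # exploratory random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # greedy action

# Hypothetical Q-value set [Q(s_t,a_1), ..., Q(s_t,a_n)] for one scene state.
q = [0.2, 0.9, 0.4]
action = epsilon_greedy(q, epsilon=0.0)   # epsilon=0: purely greedy
```

With ε = 0 the rule always returns the index of the largest Q value (here action 1); a nonzero ε occasionally explores other actions, which is what lets the loop discover state-action pairs outside the greedy path.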
Furthermore, the specific steps of solving the weight of the return function are as follows.

First, an objective function is constructed:

J(θ) = Σ_t [ Q(s_t, a_t) + ℓ(s_t, a_t) ] − θ^T μ_E + λ‖θ‖²

where ℓ(s_t, a_t) is a loss term, equal to 0 if the current state-action pair exists in the driving demonstration and 1 otherwise; Q(s_t, a_t) are the corresponding state-action values recorded above; θ^T μ_E multiplies the driving demonstration feature expectation obtained in step S22 by the return function weight θ; and λ‖θ‖² is a regularization term.

The objective function is minimized by means of gradient descent, i.e. t = min_θ J(θ), obtaining the θ that minimizes the objective; this θ is the weight of the desired return function.
Further, the process of acquiring the driving strategy by the return function generator also comprises: S25, based on the obtained return function weight θ, constructing the return function generator according to the formula r(s, a) = θ^T f(s, a).
As a further step, the specific implementation process of the driving strategy construction completed by the driving strategy acquirer is as follows:
S31, constructing the training data of the driving strategy acquirer

Training data are acquired, each datum comprising two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor; the other is the label value obtained from

Q^π(s_t, a_t) = r_θ(s_t, a_t) + γ Q^π(s_{t+1}, a_{t+1})

where r_θ(s_t, a_t) is the reward generated from the driving demonstration data by means of the return function generator, and Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}) are taken from the Q values recorded in S23, namely the Q value describing the driving scene s_t at time t and the Q value describing the driving scene s_{t+1} at time t+1;
s32, establishing a neural network
The neural network comprises three layers. The first layer serves as the input layer; its number of neurons equals the number k of feature types output by the feature extractor, and it receives the driving scene features f(s_t, a_t). The second, hidden layer has 10 neurons. The number of neurons in the third layer equals the number n of driving decision actions in the action space. The activation function of the input and hidden layers is the sigmoid function, sigmoid(x) = 1/(1 + e^(−x)); that is:

z = w^(1) x = w^(1) [1, f_t]^T
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]^T)

where w^(1) is the weight of the hidden layer; f_t is the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z is the hidden-layer output before the sigmoid activation; h is the hidden-layer output after the sigmoid activation; and w^(2) is the weight of the output layer.

The network output g_w(s_t) is the set of Q values for the driving scene state s_t at time t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T; the Q^π(s_t, a_t) in S31 is obtained by inputting the state s_t into the network and selecting the a_t term of the output;
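The forward pass z = w^(1)[1, f_t]^T, h = sigmoid(z), g_w(s_t) = sigmoid(w^(2)[1, h]^T) can be written out directly. The dimensions below are shrunk for readability (k = 2 features, 3 hidden neurons instead of the patent's 10, n = 2 actions) and the weight values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w1, w2, f_t):
    """g_w(s_t): bias-augmented input -> sigmoid hidden layer -> sigmoid output."""
    x = [1.0] + f_t                                   # [1, f_t]^T with bias term
    z = [sum(wi*xi for wi, xi in zip(row, x)) for row in w1]
    h = [sigmoid(zi) for zi in z]                     # hidden activations
    hb = [1.0] + h                                    # [1, h]^T with bias term
    return [sigmoid(sum(wi*hi for wi, hi in zip(row, hb))) for row in w2]

# Hypothetical weights: 3 hidden neurons x (1 bias + 2 features),
# 2 output actions x (1 bias + 3 hidden).
w1 = [[0.1, 0.4, -0.2], [0.0, 0.3, 0.5], [-0.1, 0.2, 0.2]]
w2 = [[0.2, 0.5, -0.3, 0.1], [0.0, -0.4, 0.6, 0.2]]
q = forward(w1, w2, [0.7, 0.3])   # [Q(s_t,a_1), Q(s_t,a_2)]
```

Because the output layer is also a sigmoid, every Q value lies in (0, 1), which is what makes the cross-entropy cost in S33 applicable.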
s33, optimizing the neural network
For the optimization of the neural network, the established loss function is a cross-entropy cost:

J(W) = −(1/N) Σ_t [ y_t log Q^π(s_t, a_t) + (1 − y_t) log(1 − Q^π(s_t, a_t)) ] + λ‖W‖²

where N is the number of training data; Q^π(s_t, a_t) is the value obtained by inputting the driving scene state s_t at time t into the neural network and selecting the term of the output corresponding to the driving decision action a_t; y_t is the label value obtained in S31; and λ‖W‖² is a regularization term, with W = {w^(1), w^(2)} the weights of the neural network.

The training data obtained in S31 are input into the network to optimize the cost function; the cross-entropy cost is minimized by means of gradient descent, yielding the optimized neural network and hence the driving strategy acquirer.
As a further step, the specific implementation process of the arbiter comprises:
The current return function generator and the driving strategy acquirer are considered as a whole, and the t value currently obtained in S22 is checked against t < ε, where ε is the threshold for judging whether the objective function, that is, the return function currently used to acquire the driving strategy, meets the requirements; its value is set according to the specific requirements.

When the value of t does not satisfy the criterion, the return function generator must be reconstructed: the neural network used in S23, namely the network that produces the values Q(s_t, a_i) measuring the quality of the decision action a_i selected in the driving scene state s_t, is replaced by the new network structure optimized by gradient descent in S33; the return function generator is then rebuilt, the driving strategy acquirer is obtained again, and the value of t is re-checked.

When the criterion is satisfied, the current θ is the weight of the required return function; the return function generator meets the requirements, and so does the driving strategy acquirer. The driving data of the driver to be modeled, namely the environmental scene images and the corresponding operation data collected during driving, are then input into the driving environment feature extractor to obtain the decision features of the current scene; the extracted features are input into the return function generator to obtain the return function for that scene state; finally, the collected decision features and the computed return function are input into the driving strategy acquirer to obtain the driving strategy corresponding to that driver.
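The generator-acquirer-judger loop can be sketched as a small driver routine. The two callbacks and their behavior here are hypothetical stand-ins (each round halves the objective value t) used only to show the control flow, not the patent's actual optimizers.

```python
def build_driving_policy(optimize_reward, optimize_policy, eps=1e-3, max_iter=50):
    """Alternate reward construction and policy optimization until the
    objective value t falls below the threshold eps (the judger's criterion)."""
    network = None                            # freshly initialized policy network
    for _ in range(max_iter):
        theta, t = optimize_reward(network)   # S21-S24: fit reward weights
        network = optimize_policy(theta)      # S31-S33: fit policy network
        if t < eps:                           # judger: requirement satisfied
            return theta, network
    return theta, network                     # give up after max_iter rounds

# Hypothetical stand-ins that halve the objective each round.
state = {"t": 1.0}
def fake_reward(net):
    state["t"] /= 2
    return [0.6, 0.6], state["t"]
def fake_policy(theta):
    return "network"

theta, net = build_driving_policy(fake_reward, fake_policy, eps=0.01)
```

Bounding the loop with `max_iter` is a defensive choice: the patent's criterion assumes t eventually drops below ε, and a cap prevents a non-converging run from looping forever.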
Compared with the prior art, the invention has the following beneficial effects. In this method of describing driver decisions and establishing a driver behavior model, a neural network describes the strategy; once the network parameters are determined, states and actions correspond one to one, so the possible state-action pairs are not limited to those in the demonstration trajectories. In actual driving, weather, scenery, and other factors produce a large state space; thanks to the neural network's excellent capability to approximate any function, the strategy expression can be treated as a black box: the feature value of a state is input, the corresponding state-action values are output, and an action is selected according to the output values. The applicability of driver behavior modeling by means of inverse reinforcement learning is thereby greatly strengthened. Traditional methods try to fit the demonstration trajectories with some probability distribution, so the resulting optimal strategy remains limited to the states present in the demonstration; the present method can be applied to new state scenes and obtain their corresponding actions, so the generalization capability of the resulting driver behavior model is greatly improved, the applicable scenes are wider, and the robustness is stronger.
Drawings
FIG. 1 is a new deep convolutional neural network;
FIG. 2 is a driving video sampling diagram;
FIG. 3 is a block diagram of the system workflow;
fig. 4 is a diagram illustrating the neural network structure established in step S32.
Detailed Description
The invention will be further explained with reference to the drawings attached to the specification. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The present embodiment provides a driver behavior modeling system including:
1. the feature extractor extracts and constructs return function features, and the specific mode is as follows:
S11, during the driving of the vehicle, the driving video obtained by a camera placed behind the windshield is sampled; the sampling is shown in figure 2.

N groups of pictures of road conditions in different driving road environments, together with the corresponding steering angles, are obtained. The corresponding driving operation data jointly constitute the training data, comprising N1 straight-road and N2 curved-road samples, where the values may be taken as N1 ≥ 300 and N2 ≥ 3000.
S12, the collected images are translated, cropped, brightness-adjusted, and similarly processed to simulate scenes with different illumination and weather.
S13, constructing a convolutional neural network, taking the processed picture as input, taking the operation data of the corresponding picture as a tag value, and training; and (3) optimizing weight parameters of the neural network by solving the optimal solution of the mean square error loss by adopting an optimization method based on a Nadam optimizer.
The convolutional neural network comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 full-connection layers. The input layer is sequentially connected with the first convolution layer and the first pooling layer, then connected with the second convolution layer and the second pooling layer, then connected with the third convolution layer and the third pooling layer, and then sequentially connected with the first full-connection layer, the second full-connection layer, the third full-connection layer and the fourth full-connection layer.
And S14, storing the network structure and the weight of the trained convolutional neural network except the final output layer to establish a new convolutional neural network to complete the state feature extractor.
2. The return function generator acquires a driving strategy, and the specific mode is as follows:
the return function is used as a standard for action selection in the reinforcement learning method, and the quality of the return function plays a decisive role in the acquisition process of the driving strategy, so that the quality of the acquired driving strategy is directly determined, and whether the acquired strategy is the same as the strategy corresponding to the real driving demonstration data or not is directly determined. The formula of the return function is reward ═ thetaTf(st,at),f(st,at) Indicates a state s at time t in a scene corresponding to a driving environment "surrounding environment of a vehicletThe characteristic values influencing the driving decision result are used for describing the scene condition of the environment around the vehicle. And theta represents a group of weights corresponding to the characteristics influencing the driving decision, and the numerical value of the weights shows the proportion of the corresponding environmental characteristics in the return function, so that the importance is embodied. On the basis of the state feature extractor, the weight value theta needs to be solved, so that a return function influencing the driving strategy is constructed.
S21, obtaining driving demonstration data of experts
The driving demonstration data are derived by sample extraction from the demonstration driving video data (distinct from the data used by the driving environment feature extractor above). A continuous segment of driving video may be sampled at a frequency of 10 Hz to obtain one set of trajectory demonstrations; one expert demonstration comprises multiple trajectories. The overall notation is D_E = {(s_1, a_1), (s_2, a_2), ..., (s_M, a_M)}, where D_E represents the driving demonstration data as a whole, (s_j, a_j) represents a data pair formed by a state j (the video picture of the driving environment at sampled time j) and the decision command corresponding to that state (e.g. the steering angle in a steering command), M represents the total number of driving demonstration data pairs, N_T represents the number of driving demonstration trajectories, and L_i represents the number of state-decision instruction pairs (s_j, a_j) contained in the i-th driving demonstration trajectory.
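The 10 Hz sampling step can be sketched as follows, assuming (hypothetically) a 30 fps source video; the text does not specify the source frame rate.

```python
# Sketch of extracting a trajectory demonstration by resampling video frames at 10 Hz.
# The 30 fps source frame rate and the frame count are illustrative assumptions.

def sample_indices(n_frames, video_fps=30, sample_hz=10):
    """Indices of the frames kept when resampling the video to sample_hz."""
    step = video_fps // sample_hz
    return list(range(0, n_frames, step))

idx = sample_indices(12)   # 12 frames of 30 fps video -> keep every 3rd frame
```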
S22, obtaining characteristic expectation of driving demonstration
First, each state s_t describing a driving environment condition in the driving demonstration data D_E is input into the state feature extractor to obtain the corresponding feature vector f(s_t, a_t), a set of feature values corresponding to s_t. The feature expectation of the driving demonstration is then calculated based on the following formula:

μ_E = (1/N_T) Σ_{i=1}^{N_T} Σ_{j=1}^{L_i} γ^(j−1) f(s_j, a_j)

where γ is a discount factor; a reference value of 0.65 may be used, adjusted according to the problem at hand.
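The discounted feature expectation above can be sketched as follows; the two toy trajectories of two-dimensional features are illustrative assumptions.

```python
# Sketch of the demonstration feature expectation: a per-trajectory discounted sum
# of feature vectors, averaged over the N_T trajectories.
# The two toy trajectories of 2-dimensional features are illustrative assumptions.

def feature_expectation(trajectories, gamma=0.65):
    """Average discounted feature sum over the demonstration trajectories."""
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for j, feats in enumerate(traj):       # j = 0, 1, ... within a trajectory
            for d in range(dim):
                mu[d] += (gamma ** j) * feats[d]
    return [m / len(trajectories) for m in mu]

demo = [[[1.0, 0.0], [1.0, 0.0]],   # trajectory 1: two state-action steps
        [[0.0, 1.0]]]               # trajectory 2: one step
mu_E = feature_expectation(demo)
```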
S23, obtaining a state-action set under a greedy strategy
First, the neural network in the driving strategy acquirer of S32 is obtained. (The return function generator and the driving strategy acquirer form the two halves of a loop, so at the start this is simply the freshly initialized network of S32. As the loop proceeds, each iteration first completes one construction of a return function influencing the driving decision, then acquires the corresponding optimal driving strategy in the driving strategy acquirer based on the current return function, and then checks whether the criterion for ending the loop is met; if not, the network optimized in the current S33 is fed into the reconstruction of the return function.)
The state features f(s_t, a_t) describing the environmental condition, extracted from the driving demonstration data D_E, are input to the neural network to obtain the output g_w(s_t). g_w(s_t) is a set of Q values for the state s_t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T, where Q(s_t, a_i) is a state-action value measuring the quality of selecting the decision driving action a_i in the state s_t of the current driving situation; it can be obtained based on the formula Q(s, a) = θ · μ(s, a), where θ denotes the weights in the current return function and μ(s, a) denotes the feature expectation.
Then, based on an ε-greedy strategy with ε set to 0.5, the driving decision action a_t corresponding to the driving scene state s_t is selected: with fifty percent likelihood the decision action maximizing the Q value in the Q-value set for the current driving scene s_t is picked; otherwise an action is selected at random. After a_t has been selected, the corresponding Q(s_t, a_t) at that time is recorded.

Thus, for each state in the driving demonstration D_E, the state feature f(s_t, a_t) is input into the neural network, yielding M state-action pairs (s_t, a_t), each describing the driving decision action a_t selected in the driving scene state s_t at time t. At the same time, the Q values of the M corresponding state-action pairs are obtained from the action-selection conditions and recorded as Q.
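The ε-greedy selection of S23 can be sketched as follows; the Q-value set and the use of a seeded random generator are illustrative assumptions.

```python
# Sketch of epsilon-greedy action selection: with probability 1 - epsilon pick the
# action maximizing Q(s_t, a), otherwise pick uniformly at random.
# The Q values and the seeded generator are illustrative assumptions.
import random

def epsilon_greedy(q_values, epsilon=0.5, rng=None):
    """Return the index of the chosen action for one state's Q-value set."""
    rng = rng or random.Random()
    if rng.random() < epsilon:                      # explore: random action
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit: argmax Q

rng = random.Random(0)
q = [0.1, 0.7, 0.2]                                 # toy Q-value set for one state
choices = [epsilon_greedy(q, 0.5, rng) for _ in range(200)]
```

With ε = 0.5 the greedy action (index 1 here) should be chosen well over half the time.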
S24, weight value of the return function is obtained
Firstly, an objective function is constructed based on the following formula:

J(θ) = (1/M) Σ_{t=1}^{M} [Q(s_t, a_t) + l(s_t, a_t)] − θ^T μ_E + (γ/2)‖θ‖²

where l(s_t, a_t) represents a penalty function based on whether the current state-action pair exists in the driving demonstration: 0 if it exists and 1 otherwise; Q(s_t, a_t) are the corresponding state-action values recorded above; θ^T μ_E is the product of the driving demonstration feature expectation obtained in S22 and the return function weight θ; and (γ/2)‖θ‖² is a regularization term to prevent the overfitting problem, where this γ may be 0.9.
The objective function is minimized by means of a gradient descent method, i.e. t = min_θ J(θ), obtaining the variable θ that minimizes the objective function; this θ is the weight of the desired return function.
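The gradient-descent solution can be sketched as follows, under the objective reconstructed above with Q(s, a) = θ · μ(s, a); note that the penalty term l is constant with respect to θ and therefore drops out of the gradient. All feature values are toy assumptions.

```python
# Sketch of solving the return-function weights by gradient descent on
# J(theta) = (1/M) * sum_t [Q(s_t,a_t) + l_t] - theta.mu_E + (gamma/2)*||theta||^2,
# with Q(s,a) = theta . mu(s,a).  Since l_t does not depend on theta, the gradient
# is mean(mu) - mu_E + gamma*theta.  The feature vectors are toy assumptions.

def grad_J(theta, mus, mu_E, gamma=0.9):
    """Gradient of J: average policy feature minus demo feature plus gamma*theta."""
    dim = len(theta)
    g = [0.0] * dim
    for mu in mus:                       # mu(s_t, a_t) of the greedy-policy pairs
        for d in range(dim):
            g[d] += mu[d] / len(mus)
    return [g[d] - mu_E[d] + gamma * theta[d] for d in range(dim)]

def fit_theta(mus, mu_E, lr=0.1, steps=500, gamma=0.9):
    """Plain gradient descent from theta = 0."""
    theta = [0.0] * len(mu_E)
    for _ in range(steps):
        g = grad_J(theta, mus, mu_E, gamma)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta

mus = [[0.2, 0.8], [0.4, 0.6]]     # toy features of the greedy-policy pairs
mu_E = [0.9, 0.1]                  # toy demonstration feature expectation
theta = fit_theta(mus, mu_E)
```

The fixed point satisfies γθ = μ_E − mean(μ), so features over-represented in the demonstration receive positive weight.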
S25, based on the obtained return function weight θ, the return function generator is constructed according to the formula r(s, a) = θ^T f(s, a).
3. The driving strategy acquirer completes construction of a driving strategy, and the specific mode is as follows:
S31, construction of the training data of the driving strategy acquirer
Training data are acquired. The data come from sampling the earlier demonstration data, but must be processed to obtain a set of new-type data items, N in total. Each item comprises two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor; the other is a label value y_t obtained based on the following formula:

y_t = r_θ(s_t, a_t) + γ Q^π(s_{t+1}, a_{t+1})

The formula includes the parameter r_θ(s_t, a_t), the return function generated by the return function generator based on the driving demonstration data. Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}) are selected from the Q-value set Q recorded in S23: the Q value describing the driving scene s_t at time t and the Q value describing the driving scene s_{t+1} at time t+1, respectively.
S32, establishing a neural network
The neural network comprises three layers. The first layer serves as the input layer; its number of neurons equals the number k of feature types output by the feature extractor, and it receives the driving scene features f(s_t, a_t). The second, hidden layer has 10 neurons, and the third layer has the same number of neurons as the number n of decision driving actions in the action space. The activation functions of the input layer and the hidden layer are sigmoid functions, i.e. sigmoid(z) = 1/(1 + e^(−z)). Namely:
z = w^(1) x = w^(1) [1, f_t]^T

h = sigmoid(z)

g_w(s_t) = sigmoid(w^(2) [1, h]^T)
where w^(1) denotes the weights of the hidden layer; f_t denotes the state s_t of the driving scene at time t, i.e. the input of the neural network; z denotes the network-layer output before the hidden layer's sigmoid activation function; h denotes the hidden-layer output after the sigmoid activation function; and w^(2) denotes the weights of the output layer. The network structure is as shown in FIG. 3.
The network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T. The Q^π(s_t, a_t) in S31 is obtained by inputting the state s_t into the neural network and selecting the a_t term of the output.
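The forward pass written above can be sketched directly; the weight matrices and feature values are illustrative assumptions (k = 2 features, a hidden layer of 2 units instead of 10, and n = 1 action, for brevity).

```python
# Sketch of the three-layer network's forward pass as written above:
# z = w1 @ [1, f_t], h = sigmoid(z), g_w(s_t) = sigmoid(w2 @ [1, h]).
# Weights and feature values are illustrative assumptions.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(w, x):
    """Matrix-vector product over plain lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def forward(w1, w2, f_t):
    """Q-value set g_w(s_t) for one driving scene feature vector f_t."""
    x = [1.0] + list(f_t)                         # prepend bias term: [1, f_t]
    h = [sigmoid(z) for z in matvec(w1, x)]       # hidden layer
    return [sigmoid(z) for z in matvec(w2, [1.0] + h)]  # output layer

w1 = [[0.0, 1.0, -1.0], [0.5, 0.0, 0.0]]   # 2 hidden units, k = 2 features (+ bias)
w2 = [[0.0, 1.0, 1.0]]                     # n = 1 output action (+ bias)
q = forward(w1, w2, [2.0, 1.0])
```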
S33, optimizing the neural network
For the optimization of the neural network, the established loss function is a cross-entropy cost function, and the formula is as follows:

J(W) = −(1/N) Σ_{t=1}^{N} [y_t log Q^π(s_t, a_t) + (1 − y_t) log(1 − Q^π(s_t, a_t))] + (γ/2)‖W‖²

where N denotes the number of training data; Q^π(s_t, a_t) is the value obtained by inputting the driving scene state s_t at time t into the neural network and selecting the term for the corresponding driving decision action a_t in the output; y_t is the label value obtained in S31; and (γ/2)‖W‖² is a regularization term set to prevent overfitting. This γ may be 0.9. Here W = {w^(1), w^(2)} refers to the weights in the neural network.
The training data obtained in S31 are input into the neural network to optimize the cost function. The minimization of the cross-entropy cost function is completed by means of a gradient descent method, yielding the optimized neural network and thus the driving strategy acquirer.
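The cross-entropy cost of S33 with the L2 penalty can be sketched as follows; the predictions, labels, and weights are toy assumptions.

```python
# Sketch of the cross-entropy cost with an L2 weight penalty:
# J(W) = -(1/N) * sum_t [y_t*log(q_t) + (1-y_t)*log(1-q_t)] + (gamma/2)*||W||^2.
# The predictions, labels, and weights are toy assumptions.
import math

def cross_entropy_cost(q_pred, y_label, weights, gamma=0.9):
    """Cross-entropy between network Q outputs and the S31 labels, plus L2 term."""
    n = len(q_pred)
    ce = -sum(y * math.log(q) + (1 - y) * math.log(1 - q)
              for q, y in zip(q_pred, y_label)) / n
    l2 = 0.5 * gamma * sum(w * w for w in weights)
    return ce + l2

cost = cross_entropy_cost([0.9, 0.2], [1.0, 0.0], [0.1, -0.2])
```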
4. The judger judges whether the optimal driving strategy constructed by the acquirer meets judgment standards or not; if the driving strategy does not meet the judgment standard, reconstructing a return function, repeatedly constructing an optimal driving strategy, and repeatedly iterating until the judgment standard is met; finally, a driving strategy describing a real driving demonstration is obtained.
The current return function generator and the driving strategy acquirer are considered as a whole. The t value obtained in S24 is checked against the criterion t < ε, where ε is a threshold for judging whether the objective function meets the requirement, i.e. whether the return function currently used to acquire the driving strategy meets the requirement. The value is set differently according to specific needs.
When the value of t does not satisfy the criterion, the return function generator must be reconstructed: the neural network required in S23 is replaced by the network newly optimized in S33, i.e. the network that generates the value Q(s_t, a_i) measuring the quality of the decision driving action a_i selected in the driving scene state s_t is replaced with the new network structure optimized by the gradient descent method in S33. The return function generator is then reconstructed, the driving strategy acquirer is obtained anew, and the value of t is checked against the requirement again.
When the criterion is satisfied, the current θ is the weight of the desired return function; the return function generator meets the requirement, and so does the driving strategy acquirer. The system can then be used as follows: collect the driving data of a driver for whom a driver model is to be established, i.e. environmental scene images and the corresponding operation data during driving, such as the steering angle. These are input into the driving environment feature extractor to obtain the decision features of the current scene. The extracted features are then input into the return function generator to obtain the return function corresponding to the scene state. Finally, the collected decision features and the computed return function are input into the driving strategy acquirer to obtain the driving strategy corresponding to that driver.
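The overall alternation between the return function generator, the driving strategy acquirer, and the judger can be sketched as a loop skeleton; the sequence of objective values standing in for successive S24 optimizations is a toy assumption, and the inner reconstruction steps are stubbed out.

```python
# Sketch of the outer loop described above: rebuild the return function,
# re-acquire the strategy, and stop once the objective value t falls below the
# threshold epsilon.  The inner updates (S21-S33) are stubbed out; the sequence
# of objective values is a toy assumption.

def train_driver_model(objective_values, epsilon=0.05):
    """Iterate generator/acquirer until t < epsilon; return (iterations, final t).

    objective_values stands in for the sequence of t = min_theta J(theta)
    produced by successive S24 optimizations.
    """
    for iteration, t in enumerate(objective_values, start=1):
        if t < epsilon:        # judger: the return function is good enough
            return iteration, t
        # otherwise: S23's network would be replaced by the S33-optimized one
        # and the return function generator reconstructed (stubbed out here)
    raise RuntimeError("did not converge within the provided iterations")

iters, t_final = train_driver_model([0.4, 0.2, 0.08, 0.03, 0.01])
```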
In a Markov decision process, a strategy must connect each state to its corresponding action. When the state space is large, however, it is difficult to describe the strategy for regions that have not been traversed. Conventional methods omit such a description: they model the probability distribution of whole trajectories from the demonstration trajectories and give no concrete strategy representation for a new state, i.e. no concrete way to obtain the probability of taking a given action in that state. In the present invention, the strategy is represented by a neural network, which has excellent generalization capability because it can approximate any function to arbitrary accuracy. Through the representation of state features, states not contained in the demonstration trajectories can still be represented; moreover, by inputting the corresponding state features into the neural network, the corresponding action values can be obtained, so that actions are still selected according to the strategy. This solves the problem that conventional methods cannot generalize the driving demonstration data to driving scene states that were never traversed.
The above description discloses only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto; any substitution or change that a person skilled in the art could make to the technical solution and the inventive concept within the technical scope of the present invention falls within that scope.
Claims (8)
1. The driver behavior modeling system is characterized by specifically comprising:
the characteristic extractor is used for extracting and constructing return function characteristics;
a return function generator for obtaining a driving strategy;
the driving strategy acquirer completes construction of a driving strategy;
the judger judges whether the optimal driving strategy constructed by the acquirer meets judgment standards or not; if the driving strategy does not meet the judgment standard, reconstructing a return function, repeatedly constructing an optimal driving strategy, and repeatedly iterating until the judgment standard is met;
the specific implementation process of extracting and constructing the return function features by the feature extractor is as follows:
s11, in the driving process of the vehicle, a camera placed behind a windshield of the vehicle is used for sampling a driving video to obtain N groups of pictures of road conditions of different vehicle driving environments and corresponding steering angle conditions; simultaneously, corresponding to driving operation data, training data are jointly constructed;
s12, translating, cutting and changing the brightness of the collected pictures to simulate scenes with different illumination and weather;
s13, constructing a convolutional neural network, taking the processed picture as input, taking the operation data of the corresponding picture as a tag value, training, and solving an optimal solution for mean square error loss by adopting an optimization method based on a Nadam optimizer to optimize weight parameters of the neural network;
s14, storing the network structure and the weight of the trained convolutional neural network to establish a new convolutional neural network and complete the state feature extractor;
the concrete implementation process of the return function generator for obtaining the driving strategy is as follows:
s21, acquiring driving demonstration data of an expert: the driving demonstration data is obtained by sampling and extracting demonstration driving video data, and a section of continuous driving video is sampled according to a certain frequency to obtain a group of track demonstration; an expert demonstration data includes a plurality of traces, collectively denoted as:
D_E = {(s_1, a_1), (s_2, a_2), ..., (s_M, a_M)}, wherein D_E represents the driving demonstration data as a whole, (s_j, a_j) represents a data pair formed by a corresponding state j and the decision command corresponding to that state, M represents the total number of driving demonstration data pairs, N_T represents the number of driving demonstration trajectories, and L_i represents the number of state-decision instruction pairs (s_j, a_j) contained in the i-th driving demonstration trajectory;
s22, obtaining a characteristic expected value of a driving demonstration;
first driving demonstration data DEEach state s describing a driving environment conditiontInput into a state feature extractor to obtain a corresponding state stCharacteristic case of f(s)t,at),f(st,at) Means a set of correspondences stThen, the characteristic expectation value of the driving demonstration is calculated based on the following formula:
wherein gamma is a discount factor and is correspondingly set according to different problems;
s23, solving a state-action set under a greedy strategy;
and S24, solving the weight of the return function.
2. The driver behavior modeling system of claim 1, wherein the convolutional neural network established in step S13 comprises 1 input layer, 3 convolutional layers, 3 pooling layers, 4 fully-connected layers; the input layer is sequentially connected with the first convolution layer and the first pooling layer, then connected with the second convolution layer and the second pooling layer, then connected with the third convolution layer and the third pooling layer, and finally sequentially connected with the first full-connection layer, the second full-connection layer, the third full-connection layer and the fourth full-connection layer.
3. The driver behavior modeling system of claim 1, wherein the trained convolutional neural network of step S14 does not include an output layer.
4. The driver behavior modeling system of claim 1, wherein the specific step of finding the state-action set under a greedy strategy is: the return function generator and the driving strategy acquirer are two parts of a cycle; first, the neural network in the driving strategy acquirer is acquired: the state features f(s_t, a_t) describing the environmental condition, extracted from the driving demonstration data D_E, are input to the neural network to obtain the output g_w(s_t); g_w(s_t) is a set of Q values about the state s_t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T, and Q(s_t, a_i) represents a state-action value describing the quality of selecting the decision driving action a_i in the state s_t of the current driving situation, obtained based on the formula Q(s, a) = θ · μ(s, a), wherein θ refers to the weights in the current return function and μ(s, a) refers to the expected feature value;
then, based on an ε-greedy strategy, the driving decision action a_t corresponding to the driving scene state s_t is selected: with probability ε the decision action maximizing the Q value in the Q-value set for the current driving scene s_t is picked; otherwise an action is selected at random; after a_t has been selected, the corresponding Q(s_t, a_t) at that time is recorded;
thus, for each state in the driving demonstration D_E, the state feature f(s_t, a_t) is input into the neural network, yielding M state-action pairs (s_t, a_t), each describing the driving decision action a_t selected in the driving scene state s_t at time t; and simultaneously, the Q values of the M corresponding state-action pairs are obtained from the action-selection conditions and recorded as Q.
5. The driver behavior modeling system of claim 1, wherein the specific step of weighting the reward function is:
firstly, an objective function is constructed based on the following formula:

J(θ) = (1/M) Σ_{t=1}^{M} [Q(s_t, a_t) + l(s_t, a_t)] − θ^T μ_E + (γ/2)‖θ‖²

wherein l(s_t, a_t) represents a loss function based on whether the current state-action pair exists in the driving demonstration: 0 if it exists, otherwise 1; Q(s_t, a_t) are the corresponding state-action values recorded above; θ^T μ_E is the driving demonstration feature expectation obtained in step S22 multiplied by the return function weight θ; and (γ/2)‖θ‖² is a regularization term;
the objective function is minimized by means of a gradient descent method, i.e. t = min_θ J(θ), obtaining the variable θ that minimizes the objective function, wherein θ is the weight of the required return function.
6. The driver behavior modeling system of claim 1, wherein the reward function generator obtaining the driving strategy concrete implementation further comprises: s25, based on the obtained corresponding return function weight value theta, according to a formula r (s, a) ═ thetaTf (s, a) constructing a reward function generator.
7. The driver behavior modeling system of claim 1, wherein the driving strategy acquirer implements the driving strategy construction by:
s31 construction of training data of driving strategy acquirer
training data are acquired, each data item comprising two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor; the other is a label value y_t obtained based on the following formula:

y_t = r_θ(s_t, a_t) + γ Q^π(s_{t+1}, a_{t+1})

wherein r_θ(s_t, a_t) is the return function generated by the return function generator based on the driving demonstration data; Q^π(s_t, a_t) and Q^π(s_{t+1}, a_{t+1}) are selected from the Q values recorded in S23: the Q value describing the driving scene s_t at time t and the Q value describing the driving scene s_{t+1} at time t+1, respectively;
s32, establishing a neural network
the neural network comprises three layers: the first layer serves as the input layer, with a number of neurons equal to the number k of feature types output by the feature extractor, and receives the driving scene features f(s_t, a_t); the second, hidden layer has 10 neurons; the third layer has the same number of neurons as the number n of decision driving actions in the action space; the activation functions of the input layer and the hidden layer are sigmoid functions, i.e. sigmoid(z) = 1/(1 + e^(−z)); namely:
z = w^(1) x = w^(1) [1, f_t]^T

h = sigmoid(z)

g_w(s_t) = sigmoid(w^(2) [1, h]^T)
wherein w^(1) is the weight of the hidden layer; f_t is the state s_t of the driving scene at time t, i.e. the input of the neural network; z is the network-layer output before the hidden layer's sigmoid activation function; h is the hidden-layer output after the sigmoid activation function; and w^(2) is the weight of the output layer;
the network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t, a_1), ..., Q(s_t, a_n)]^T; the Q^π(s_t, a_t) in S31 is obtained by inputting the state s_t into the neural network and selecting the a_t term of the output;
s33, optimizing the neural network
for the optimization of the neural network, the established loss function is a cross-entropy cost function, the formula of which is as follows:

J(W) = −(1/N) Σ_{t=1}^{N} [y_t log Q^π(s_t, a_t) + (1 − y_t) log(1 − Q^π(s_t, a_t))] + (γ/2)‖W‖²

wherein N represents the number of training data; Q^π(s_t, a_t) is the value obtained by inputting the driving scene state s_t at time t into the neural network and selecting the term for the corresponding driving decision action a_t in the output; y_t is the label value obtained in S31; (γ/2)‖W‖² is a regularization term, wherein W = {w^(1), w^(2)} represents the weights in the neural network;
the training data obtained in S31 are input into the neural network to optimize the cost function; the minimization of the cross-entropy cost function is completed by means of a gradient descent method to obtain the optimized neural network, and thus the driving strategy acquirer.
8. The driver behavior modeling system of claim 5, wherein the determiner implementation comprises:
the current return function generator and the driving strategy acquirer are considered as a whole; whether the value of t satisfies t < ε is checked, wherein ε is a threshold for judging whether the objective function meets the requirement, i.e. whether the return function currently used to acquire the driving strategy meets the requirement; the value is set differently according to specific needs;
when the value of t does not satisfy the criterion, the return function generator needs to be reconstructed: the neural network required in S23 is replaced by the network newly optimized in S33, i.e. the network generating the value Q(s_t, a_i) that measures the quality of the decision driving action a_i selected in the driving scene state s_t is replaced with the new network structure optimized by the gradient descent method in S33; then the return function generator is reconstructed, the driving strategy acquirer is obtained, and whether the value of t meets the requirement is judged again;
when the criterion is satisfied, the current θ is the weight of the required return function; the return function generator meets the requirement, and the driving strategy acquirer also meets the requirement; then, the driving data of a driver for whom a driver model is to be established, i.e. environmental scene images and the corresponding operation data during driving, are collected and input into the driving environment feature extractor to obtain the decision features of the current scene; the extracted features are then input into the return function generator to obtain the return function corresponding to the scene state; and the collected decision features and the calculated return function are input into the driving strategy acquirer to obtain the driving strategy corresponding to that driver.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810662040.0A CN108791302B (en) | 2018-06-25 | 2018-06-25 | Driver behavior modeling system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810662040.0A CN108791302B (en) | 2018-06-25 | 2018-06-25 | Driver behavior modeling system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108791302A CN108791302A (en) | 2018-11-13 |
CN108791302B true CN108791302B (en) | 2020-05-19 |
Family
ID=64070795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810662040.0A Active CN108791302B (en) | 2018-06-25 | 2018-06-25 | Driver behavior modeling system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108791302B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200363800A1 (en) * | 2019-05-13 | 2020-11-19 | Great Wall Motor Company Limited | Decision Making Methods and Systems for Automated Vehicle |
CN110481561B (en) * | 2019-08-06 | 2021-04-27 | 北京三快在线科技有限公司 | Method and device for generating automatic control signal of unmanned vehicle |
CN111079533B (en) * | 2019-11-14 | 2023-04-07 | 深圳大学 | Unmanned vehicle driving decision method, unmanned vehicle driving decision device and unmanned vehicle |
CN112052776B (en) * | 2020-09-01 | 2021-09-10 | 中国人民解放军国防科技大学 | Unmanned vehicle autonomous driving behavior optimization method and device and computer equipment |
CN112373482B (en) * | 2020-11-23 | 2021-11-05 | 浙江天行健智能科技有限公司 | Driving habit modeling method based on driving simulator |
WO2022221979A1 (en) * | 2021-04-19 | 2022-10-27 | 华为技术有限公司 | Automated driving scenario generation method, apparatus, and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103381826A (en) * | 2013-07-31 | 2013-11-06 | 中国人民解放军国防科学技术大学 | Adaptive cruise control method based on approximate policy iteration |
CN105955930A (en) * | 2016-05-06 | 2016-09-21 | 天津科技大学 | Guidance-type policy search reinforcement learning algorithm |
CN107168303A (en) * | 2017-03-16 | 2017-09-15 | 中国科学院深圳先进技术研究院 | A kind of automatic Pilot method and device of automobile |
CN107203134A (en) * | 2017-06-02 | 2017-09-26 | 浙江零跑科技有限公司 | A kind of front truck follower method based on depth convolutional neural networks |
CN107229973A (en) * | 2017-05-12 | 2017-10-03 | 中国科学院深圳先进技术研究院 | The generation method and device of a kind of tactful network model for Vehicular automatic driving |
CN107480726A (en) * | 2017-08-25 | 2017-12-15 | 电子科技大学 | A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon |
CN107679557A (en) * | 2017-09-19 | 2018-02-09 | 平安科技(深圳)有限公司 | Driving model training method, driver's recognition methods, device, equipment and medium |
CN108108657A (en) * | 2017-11-16 | 2018-06-01 | 浙江工业大学 | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning |
- 2018-06-25: CN application CN201810662040.0A, patent CN108791302B (en), status Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103381826A (en) * | 2013-07-31 | 2013-11-06 | 中国人民解放军国防科学技术大学 | Adaptive cruise control method based on approximate policy iteration |
CN105955930A (en) * | 2016-05-06 | 2016-09-21 | 天津科技大学 | Guidance-type policy search reinforcement learning algorithm |
CN107168303A (en) * | 2017-03-16 | 2017-09-15 | 中国科学院深圳先进技术研究院 | A kind of automatic Pilot method and device of automobile |
CN107229973A (en) * | 2017-05-12 | 2017-10-03 | 中国科学院深圳先进技术研究院 | The generation method and device of a kind of tactful network model for Vehicular automatic driving |
CN107203134A (en) * | 2017-06-02 | 2017-09-26 | 浙江零跑科技有限公司 | A kind of front truck follower method based on depth convolutional neural networks |
CN107480726A (en) * | 2017-08-25 | 2017-12-15 | 电子科技大学 | A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon |
CN107679557A (en) * | 2017-09-19 | 2018-02-09 | 平安科技(深圳)有限公司 | Driving model training method, driver's recognition methods, device, equipment and medium |
CN108108657A (en) * | 2017-11-16 | 2018-06-01 | 浙江工业大学 | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning |
Non-Patent Citations (1)
Title |
---|
Autonomous navigation performance evaluation method based on trajectory analysis (基于轨迹分析的自主导航性能评估方法); Wang Yongxin, Qian Hui, Jin Zhuojun, Zhu Miaoliang; Computer Engineering; 2011-03-20; p. 142 *
Also Published As
Publication number | Publication date |
---|---|
CN108791302A (en) | 2018-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108819948B (en) | Driver behavior modeling method based on reverse reinforcement learning | |
CN108791302B (en) | Driver behavior modeling system | |
CN108920805B (en) | Driver behavior modeling system with state feature extraction function | |
US11062617B2 (en) | Training system for autonomous driving control policy | |
CN110874578B (en) | Unmanned aerial vehicle visual angle vehicle recognition tracking method based on reinforcement learning | |
CN109131348B (en) | Intelligent vehicle driving decision method based on generative countermeasure network | |
CN110991027A (en) | Robot simulation learning method based on virtual scene training | |
CN108891421B (en) | Method for constructing driving strategy | |
CN110281949B (en) | Unified hierarchical decision-making method for automatic driving | |
CN112550314B (en) | Embedded optimization type control method suitable for unmanned driving, driving control module and automatic driving control system thereof | |
CN108944940B (en) | Driver behavior modeling method based on neural network | |
CN114162146B (en) | Driving strategy model training method and automatic driving control method | |
Farag | Cloning safe driving behavior for self-driving cars using convolutional neural networks | |
Babiker et al. | Convolutional neural network for a self-driving car in a virtual environment | |
CN113869170B (en) | Pedestrian track prediction method based on graph division convolutional neural network | |
Farag | Safe-driving cloning by deep learning for autonomous cars | |
CN115376103A (en) | Pedestrian trajectory prediction method based on space-time diagram attention network | |
CN117406762A (en) | Unmanned aerial vehicle remote control algorithm based on sectional reinforcement learning | |
CN117709602B (en) | Urban intelligent vehicle personification decision-making method based on social value orientation | |
Zhong et al. | Behavior prediction for unmanned driving based on dual fusions of feature and decision | |
Meftah et al. | A virtual simulation environment using deep learning for autonomous vehicles obstacle avoidance | |
CN110222822A (en) | The construction method of black box prediction model internal feature cause-and-effect diagram | |
CN117078923B (en) | Automatic driving environment-oriented semantic segmentation automation method, system and medium | |
CN108791308B (en) | System for constructing driving strategy based on driving environment | |
Oinar et al. | Self-driving car steering angle prediction: Let transformer be a car again |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
OL01 | Intention to license declared | ||