CN108791302A - Driving behavior modeling - Google Patents
Driving behavior modeling
- Publication number
- CN108791302A (Application CN201810662040.0A)
- Authority
- CN
- China
- Prior art keywords
- driving
- reward function
- state
- feature
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W40/00—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
- B60W40/08—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W2050/0001—Details of the control system
- B60W2050/0019—Control system elements or transfer functions
- B60W2050/0028—Mathematical models, e.g. for simulation
- B60W2050/0029—Mathematical model of the driver
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Mechanical Engineering (AREA)
- Evolutionary Computation (AREA)
- Transportation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Evolutionary Biology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Feedback Control In General (AREA)
Abstract
The invention discloses a driving behavior modeling system, which specifically comprises: a feature extractor, which extracts the features used to construct the reward function; a reward function generator, which obtains the reward function needed to construct the driving strategy; a driving strategy getter, which completes the construction of the driving strategy; and a judging device, which judges whether the optimal driving strategy constructed by the getter meets the judgment criterion. If not, the reward function is rebuilt and the optimal driving strategy is constructed again, iterating until the judgment criterion is met, finally obtaining the driving strategy that describes the real driving demonstrations. The application can be applied to new state scenes to obtain their corresponding actions, which greatly improves the generalization ability of the established driver behavior model; the applicable scenes are wider and the robustness is stronger.
Description
Technical field
The present invention relates to a modeling method, and in particular to a driving behavior modeling system.
Background technology
Autonomous driving is an important part of the intelligent transportation field. Owing to the current state of the technology and other reasons, autonomous vehicles still require the intelligent driving system (intelligent driver assistance system) and the human driver to cooperate in completing the driving task. In this process, whether to better quantify driver information for the decisions of the intelligent system, or to distinguish individual drivers in order to provide personalized services, driver modeling is an essential step.
Among current driver modeling methods, reinforcement learning handles complex sequential decision problems such as driving a vehicle, with large continuous spaces and multiple optimization objectives, rather well, and is therefore also an effective way to model driving behavior. Reinforcement learning, as a solution to problems formulated as Markov decision processes (MDPs), requires interaction with the environment: actions are taken to obtain evaluative feedback signals (rewards) from the environment, and the long-term return is maximized.
A search of the existing literature shows that, in existing driving behavior modeling work, the reward function is set mainly in two ways: the traditional approach, in which researchers manually configure the reward for different scene states, and the approach in which the reward is obtained by inverse reinforcement learning. The traditional approach depends heavily on the subjectivity of the researcher; the quality of the reward function is determined by the researcher's skill and experience. Moreover, during vehicle driving, correctly setting the reward function requires balancing a large number of decision variables, many of which are incommensurable or even contradictory, and researchers are often unable to design a reward function that balances every demand.
Inverse reinforcement learning, by contrast, assigns suitable weights to the various driving features from the driving demonstration data; it can automatically learn the required reward function and thereby remedies the shortcomings of manual design. However, traditional inverse reinforcement learning methods can only learn the scene states already present in the driving demonstration data, whereas in actual driving, because of differences in weather, scenery and other factors, real driving scenes often go beyond the range of the demonstrations. The inverse reinforcement learning approach, which expresses the relationship between the scenes in the demonstration data and the decision actions, therefore suffers from insufficient generalization ability.
There are two main lines of thought in existing driving behavior modeling methods based on reinforcement learning theory. The first uses traditional reinforcement learning: the setting of the reward function relies on the researcher's analysis, organization, screening and summarizing of the scene, from which a series of features related to driving decisions is collected, such as the distance to the vehicle ahead, whether the vehicle keeps away from the curb, whether it keeps away from pedestrians, a reasonable speed, the lane-change frequency, and so on; according to the demands of the driving scene, a series of experiments is then designed to determine the weight of each feature in the reward function under the corresponding scene and environment, finally completing the overall design of the reward function, which serves as the model describing the driver's driving behavior. The second is a probabilistic modeling approach, which solves for the driving behavior function using maximum-entropy inverse reinforcement learning. It first assumes that there exists a specific underlying probability distribution that generated the driving demonstration trajectories; one then needs to find a probability distribution that fits the driving demonstrations, and the problem of finding this distribution can be converted into the nonlinear program
max −Σ p log p
s.t. Σ p = 1
where p denotes the probability distribution over the demonstration trajectories. After the distribution is obtained by solving the program above and the relevant parameters are estimated, the reward function r = θ^T f(s_t) can be obtained.
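As a hedged illustration of the maximum-entropy formulation above (a sketch of the general technique, not of this patent's method): maximizing −Σ p log p subject to Σ p = 1 while matching the demonstrated feature expectations yields the exponential-family form p(τ) ∝ exp(θ^T f(τ)), and θ can then be fit by gradient ascent on the demonstration likelihood over an enumerable set of candidate trajectories:

```python
import numpy as np

def maxent_irl_weights(traj_features, demo_idx, lr=0.1, iters=200):
    """traj_features: (T, d) array, one feature vector f(tau) per candidate trajectory;
    demo_idx: indices of the demonstrated trajectories.
    Fits theta so that p(tau) proportional to exp(theta^T f(tau)) matches the demonstrations."""
    f = np.asarray(traj_features, dtype=float)
    f_demo = f[demo_idx].mean(axis=0)      # empirical feature expectation of the demonstrations
    theta = np.zeros(f.shape[1])
    for _ in range(iters):
        p = np.exp(f @ theta)
        p /= p.sum()                       # maximum-entropy distribution over candidate trajectories
        grad = f_demo - p @ f              # likelihood gradient: demo features minus expected features
        theta += lr * grad
    return theta
```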
Traditional driver behavior models use the analysis of known driving data to describe and reason about driving behavior. However, the collected driving data cannot completely cover the inexhaustible variety of driving behavior, and it is impossible to obtain the actions corresponding to all states. In real driving scenes, because of differences in weather, scenery and objects, the driving states have countless possibilities, and traversing all states is impossible. Traditional driver behavior models therefore have weak generalization ability, many modeling assumptions and poor robustness.
Second, in actual driving problems, setting the reward function only by the researcher requires balancing too many demands on the various features; it depends entirely on the researcher's experience, requires repeated manual tuning, is time-consuming and labor-intensive, and, worse still, is overly subjective. Under different scenes and environments the researcher has to face too many scene states; meanwhile, even for a given scene state, different demands also lead to variations in driving behavior, and describing the driving task accurately requires assigning a series of weights to these factors. Among existing methods, probabilistic inverse reinforcement learning starts mainly from the existing demonstration data, treats it as the available data, solves for the distribution of the current data, and chooses actions under the corresponding states on that basis. But the distribution of the known data cannot represent the distribution of all data; obtaining the correct distribution would require the actions corresponding to all states.
Invention content
To solve the problem in the prior art that driver modeling generalizes poorly, i.e. the technical problem that no corresponding reward function can be established for driving scenes that have no demonstrations, so that driving behavior modeling cannot be carried out, the present application provides a driving behavior modeling system. It can be applied to new state scenes to obtain their corresponding actions, which greatly improves the generalization ability of the established driver behavior model; the applicable scenes are wider and the robustness is stronger.
To achieve the above goal, the technical scheme of the present invention is a driving behavior modeling system, which specifically includes:
a feature extractor, which extracts the features used to construct the reward function;
a reward function generator, which obtains the reward function needed for the driving strategy;
a driving strategy getter, which completes the construction of the driving strategy;
a judging device, which judges whether the optimal driving strategy constructed by the getter meets the judgment criterion; if not, the reward function is rebuilt and the optimal driving strategy is constructed again, iterating until the judgment criterion is met, finally obtaining the driving strategy that describes the real driving demonstrations.
Further, the specific implementation process by which the feature extractor extracts the features for constructing the reward function is:
S11. during vehicle driving, the driving video is sampled using a camera placed behind the vehicle windshield, obtaining N groups of pictures of different vehicle driving environments and road conditions; at the same time the corresponding driving operation data, i.e. the steering angle under that road environment, is recorded, and the training data is constructed jointly from them;
S12. the collected pictures are translated, cropped and brightness-adjusted to simulate scenes with different illumination and weather;
S13. a convolutional neural network is built; the processed pictures serve as the input and the operation data of the corresponding picture serves as the label value; the network is trained, and an optimization method based on the Nadam optimizer is used to seek the optimal solution of the mean squared error loss and optimize the weight parameters of the neural network;
S14. the network structure and weights of the trained convolutional neural network are saved, a new convolutional neural network is established from them, and the state feature extractor is completed.
Further, the convolutional neural network established in step S13 comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers; the input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and finally to the first, second, third and fourth fully connected layers in sequence.
Further, the trained convolutional neural network saved in step S14 does not include the output layer.
Further, the specific implementation process by which the reward function generator obtains the reward function is:
S21. obtain the expert driving demonstration data: the demonstration data is extracted by sampling the demonstration driving video; a continuous segment of driving video is sampled at a certain frequency to obtain one demonstration trajectory; one set of expert demonstration data contains multiple trajectories, denoted as a whole by D_E = {(s_1,a_1),(s_2,a_2),...,(s_M,a_M)}, where D_E denotes all of the driving demonstration data, (s_j,a_j) denotes the data pair formed by state j and the decision instruction corresponding to that state, M is the total number of driving demonstration data, N_T is the number of demonstration trajectories, and L_i is the number of state–decision-instruction pairs (s_j,a_j) contained in the i-th demonstration trajectory;
S22. compute the feature expectation of the driving demonstrations;
each state s_t describing the driving environment in the demonstration data D_E is first input into the state feature extractor to obtain the feature vector f(s_t,a_t) of state s_t, where f(s_t,a_t) denotes the set of driving-environment scene feature values that influence the driving decision for s_t; the feature expectation of the driving demonstrations is then computed as the discounted accumulation of these features, where γ is the discount factor, set according to the problem;
S23. compute the state–action set under the greedy strategy;
S24. compute the weights of the reward function.
Further, the concrete steps of computing the state–action set under the greedy strategy are: since the reward function generator and the driving strategy getter are the two parts of a loop, first obtain the neural network in the driving strategy getter: the state features f(s_t,a_t) describing the environment, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t); g_w(s_t) is the set of Q values describing state s_t, i.e. [Q(s_t,a_1),...,Q(s_t,a_n)]^T, and Q(s_t,a_i) is the state–action value describing how good it is to choose the decision driving operation a_i in the current driving scene state s_t; it is obtained from the formula Q(s,a) = θ μ(s,a), where θ denotes the weights in the current reward function and μ(s,a) denotes the feature expectation.
Then, based on the ε-greedy strategy, the driving decision action corresponding to the driving scene state s_t is chosen: the decision action with the maximum Q value in the Q-value set for the current driving scene s_t is chosen, or otherwise an action is chosen at random; after the action has been chosen, the corresponding state–action value is recorded.
Thus, for the state feature f(s_t,a_t) of each state in the demonstration data D_E, inputting it into the neural network yields a total of M state–action pairs (s_t,a_t), each describing the driving decision action a_t chosen in the driving scene state s_t at time t; at the same time, based on the chosen actions, the Q values of the M corresponding state–action pairs are obtained and recorded as Q.
Further, the concrete steps of computing the weights of the reward function are:
first, construct the objective function J(θ) from the following components: a loss term that, for each current state–action pair, is 0 if the pair appears in the driving demonstrations and 1 otherwise; the corresponding state–action values recorded above; the product of the driving demonstration feature expectation computed in S22 and the reward function weights θ; and a regularization term;
the objective function is minimized by gradient descent, i.e. t = min_θ J(θ); the variable θ that minimizes the objective function is obtained, and this θ is the required weight vector of the reward function.
Further, the specific implementation process by which the reward function generator obtains the reward function further includes: S25. based on the obtained reward function weights θ, the reward function generator is built according to the formula r(s,a) = θ^T f(s,a).
As a further aspect, the specific implementation process by which the driving strategy getter completes the construction of the driving strategy is:
S31. construct the training data of the driving strategy getter:
the training data is obtained; each datum comprises two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor, and the other is computed from a formula involving r_θ(s_t,a_t), the reward produced by the reward function generator from the driving demonstration data, and Q^π(s_t,a_t) and Q^π(s_{t+1},a_{t+1}), which come from the Q values recorded in S23 and are respectively the Q value describing the driving scene s_t at time t and the Q value describing the driving scene s_{t+1} at time t+1;
S32. establish the neural network:
the neural network comprises three layers; the first layer is the input layer, whose number of neurons equals the number k of output feature types of the feature extractor and which receives the driving scene feature f(s_t,a_t); the second (hidden) layer has 10 neurons; the number of neurons of the third layer equals the number n of driving operations among which the decision is made in the action space; the activation function of the input layer and the hidden layer is the sigmoid function, sigmoid(x) = 1/(1+e^(-x)), and:
z = w^(1) x = w^(1) [1, f_t]^T
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]^T)
where w^(1) are the weights of the hidden layer; f_t is the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z is the output of the network layer before the sigmoid activation of the hidden layer; h is the hidden-layer output after the sigmoid activation; and w^(2) are the weights of the output layer;
the network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t,a_1),...,Q(s_t,a_n)]^T; the Q^π(s_t,a_t) in S31 is obtained by inputting state s_t into the neural network and selecting the entry for a_t in the output;
S33. optimize the neural network:
the loss function established for optimizing this neural network is the cross-entropy cost function, where N denotes the number of training data; Q^π(s_t,a_t) is the value obtained by inputting the driving scene state s_t at time t into the neural network and selecting the entry of the output corresponding to the driving decision action a_t; the target value is the one computed in S31; a regularization term is added, in which W = {w^(1), w^(2)} denotes the weights of the above neural network;
the training data obtained in S31 is input into the neural network to optimize this cost function; the minimization of the cross-entropy cost function is completed by gradient descent, and the optimized neural network obtained yields the driving strategy getter.
As a further aspect, the implementation process of the judging device includes:
the current reward function generator and driving strategy getter are regarded as a whole, and it is checked whether the t value in the current S22 satisfies t < ε, where ε is the threshold that judges whether the objective function meets the demand, i.e. whether the reward function currently used to obtain the driving strategy meets the requirements; its value is set differently according to specific needs;
when the value of t does not satisfy this inequality, the reward function generator needs to be rebuilt; at this time the neural network needed in the current S23 is replaced with the new neural network that has been optimized in S33, i.e. the network used to generate the values Q(s_t,a_i) describing how good the chosen decision driving operation a_i is in driving scene state s_t is replaced with the new network structure optimized by gradient descent in S33; the reward function generator is then rebuilt, the driving strategy getter is obtained, and whether the value of t meets the demand is judged again;
when the inequality is satisfied, the current θ is the weight vector of the required reward function; the reward function generator then meets the requirements, and the driving strategy getter also meets the requirements; then the driving data of a driver for whom a driver model is to be established is collected, i.e. the environment scene images during driving and the corresponding operation data, and input into the driving environment feature extractor to obtain the decision features for the current scene; the obtained features are then input into the reward function generator to obtain the reward function of the corresponding scene state; finally the obtained decision features and the computed reward function are input into the driving strategy getter to obtain the driving strategy corresponding to that driver.
Compared with the prior art, the advantageous effects of the present invention are as follows. In the method of establishing a driver behavior model for describing driver decisions in the present invention, the strategy is described by a neural network; once the neural network parameters are determined, states and actions correspond one to one, so the possible state–action pairs are no longer limited to the demonstration trajectories. In actual driving, where the variety of driving scenes caused by weather, scenery and other factors corresponds to a large state space, this representation of the strategy can, by means of the neural network's outstanding ability to approximately express arbitrary functions, be treated approximately as a black box: the feature values of the input state are mapped to the corresponding state–action values, and the action is then chosen according to the output values, so that the corresponding action is obtained. This greatly improves the applicability of modeling driving behavior through inverse reinforcement learning. Conventional methods, because they try to fit the demonstration trajectories with some probability distribution, obtain an optimal policy that remains limited to the states present in the demonstration trajectories, whereas the present invention can be applied to new state scenes to obtain the corresponding actions, which greatly improves the generalization ability of the established driver behavior model; the applicable scenes are wider and the robustness is stronger.
Description of the drawings
Fig. 1 is the new deep convolutional neural network;
Fig. 2 is a sample image from the driving video;
Fig. 3 is the workflow block diagram of this system;
Fig. 4 is the structure of the neural network established in step S32.
Specific implementation mode
The invention will be further described below in conjunction with the accompanying drawings. The following embodiments are only used to clearly illustrate the technical scheme of the present invention and are not intended to limit the protection scope of the present invention.
The present embodiment provides a driving behavior modeling system, including:
1. The feature extractor extracts the features for constructing the reward function, in the following way:
S11. during vehicle driving, the driving video obtained by a camera placed behind the windshield of the vehicle is sampled; a sample image is shown in Fig. 2.
N groups of pictures of different vehicle driving road environments and road conditions, together with the corresponding steering angles, are obtained, comprising N1 straight-road samples and N2 curve samples; the values of N1 and N2 can be N1 ≥ 300 and N2 ≥ 3000; the corresponding driving operation data is collected at the same time, and the training data is constructed jointly from them.
S12. the collected images are translated, cropped, brightness-adjusted and otherwise transformed to simulate scenes with different illumination and weather.
S13. a convolutional neural network is built; the processed pictures serve as the input and the operation data of the corresponding picture serves as the label value, and the network is trained; an optimization method based on the Nadam optimizer is used to seek the optimal solution of the mean squared error loss and optimize the weight parameters of the neural network.
The convolutional neural network comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers. The input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and finally to the first, second, third and fourth fully connected layers in sequence.
S14. the network structure and weights of the trained convolutional neural network, except for the last output layer, are saved, a new convolutional neural network is established from them, and the state feature extractor is completed.
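A minimal PyTorch sketch of steps S11–S14 may help make the pipeline concrete; it is an illustration under assumptions (layer sizes, channel counts, the 66×120 input resolution and the short training loop are all choices of this sketch, not taken from the patent), using a mean-squared-error loss with PyTorch's NAdam optimizer and keeping everything except the last layer as the state feature extractor:

```python
import torch
import torch.nn as nn

class DrivingCNN(nn.Module):
    """3 conv/pool stages + 4 fully connected layers (sizes are assumptions)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5), nn.ReLU(), nn.MaxPool2d(2),   # conv1 + pool1
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),  # conv2 + pool2
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),  # conv3 + pool3
        )
        self.fc = nn.Sequential(                               # 4 fully connected layers
            nn.Flatten(),
            nn.Linear(64 * 5 * 12, 256), nn.ReLU(),            # 64*5*12 for 3x66x120 inputs
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 16), nn.ReLU(),
            nn.Linear(16, 1),                                  # steering angle (label value)
        )

    def forward(self, x):
        return self.fc(self.features(x))

def train_extractor(images, angles, epochs=10):
    """images: (N, 3, 66, 120) float tensor; angles: (N, 1) float tensor of steering angles."""
    net = DrivingCNN()
    opt = torch.optim.NAdam(net.parameters(), lr=1e-3)   # Nadam-based optimization (S13)
    loss_fn = nn.MSELoss()                               # mean squared error loss (S13)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(images), angles)
        loss.backward()
        opt.step()
    # Keep everything except the last output layer as the state feature extractor (S14).
    extractor = nn.Sequential(net.features, *list(net.fc.children())[:-1])
    return extractor
```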
2. The reward function generator obtains the reward function, in the following way:
In the reinforcement learning method, the reward function serves as the standard for action selection during the acquisition of the driving strategy; its quality is decisive, directly determining the quality of the obtained driving strategy and whether the obtained strategy is identical to the strategy corresponding to the real driving demonstration data. The formula of the reward function is reward = θ^T f(s_t,a_t), where f(s_t,a_t) denotes the set of feature values that influence the driving decision for the state s_t of the driving environment scene (the vehicle's surroundings) at time t and describes the situation of the vehicle's surroundings, and θ denotes the set of weights of the features that influence the driving decision; the magnitude of a weight indicates the proportion of the corresponding environmental feature in the reward function and embodies its importance. On the basis of the state feature extractor, this weight vector θ must be solved in order to construct the reward function that influences the driving strategy.
S21. Obtain the expert driving demonstration data.
The demonstration data is extracted by sampling the demonstration driving video (different from the data used for the driving environment feature extractor above); a continuous segment of driving video can be sampled at a frequency of 10 Hz to obtain one demonstration trajectory. One set of expert demonstrations should contain multiple trajectories, denoted as a whole by D_E = {(s_1,a_1),(s_2,a_2),...,(s_M,a_M)}, where D_E denotes all of the driving demonstration data, (s_j,a_j) denotes the data pair formed by the state j at sampling time j (the video picture of the driving environment) and the decision instruction corresponding to that state (e.g. the steering angle in the steering instruction), M is the total number of driving demonstration data, N_T is the number of demonstration trajectories, and L_i is the number of state–decision-instruction pairs (s_j,a_j) contained in the i-th demonstration trajectory.
S22. Compute the feature expectation of the driving demonstrations.
Each state s_t describing the driving environment in the demonstration data D_E is first input into the state feature extractor to obtain the feature vector f(s_t,a_t) of state s_t, where f(s_t,a_t) denotes the set of driving-environment scene feature values that influence the driving decision for s_t; the feature expectation of the driving demonstrations is then computed as the discounted accumulation of these features, where γ is the discount factor, set according to the problem; a reference value is 0.65.
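For step S22, a small NumPy sketch, assuming the standard inverse-reinforcement-learning form in which the feature expectation is the discounted sum of per-step features averaged over the demonstration trajectories (the exact formula is given only as an image in the original publication):

```python
import numpy as np

def feature_expectation(trajectories, extract, gamma=0.65):
    """trajectories: list of lists of (state, action) pairs;
    extract(s, a) -> feature vector f(s_t, a_t) from the state feature extractor.
    Returns the discounted feature expectation averaged over trajectories (assumed form)."""
    mu = None
    for traj in trajectories:
        acc = None
        for t, (s, a) in enumerate(traj):
            f = np.asarray(extract(s, a), dtype=float)
            step = (gamma ** t) * f                 # discounted per-step feature
            acc = step if acc is None else acc + step
        mu = acc if mu is None else mu + acc
    return mu / len(trajectories)
```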
S23. Compute the state–action set under the greedy strategy.
First, obtain the neural network in the driving strategy getter of S32. (Because the reward function generator and the driving strategy getter are the two parts of one loop, at the very beginning this neural network is the newly initialized network of S32; as the loop proceeds, each step of the loop is: after one construction of the reward function that influences the driving decision is completed, the corresponding optimal driving strategy is obtained in the driving strategy getter based on the current reward function, and it is judged whether the criterion for ending the loop is met; if not, the neural network optimized by the process in the current S33 is put into the reconstruction of the reward function.)
The state features f(s_t,a_t) describing the environment, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t); g_w(s_t) is the set of Q values describing state s_t, i.e. [Q(s_t,a_1),...,Q(s_t,a_n)]^T, and Q(s_t,a_i) is the state–action value describing how good it is to choose the decision driving operation a_i in the current driving scene state s_t; it can be obtained from the formula Q(s,a) = θ μ(s,a), where θ denotes the weights in the current reward function and μ(s,a) denotes the feature expectation.
Then, based on the ε-greedy strategy with ε set to 0.5, the driving decision action corresponding to the driving scene state s_t is chosen: with a probability of fifty percent, the decision action with the maximum Q value in the Q-value set for the current driving scene s_t is chosen; otherwise an action is chosen at random; after the action has been chosen, the corresponding state–action value is recorded.
Thus, for the state feature f(s_t,a_t) of each state in the demonstration data D_E, inputting it into the neural network yields a total of M state–action pairs (s_t,a_t), each describing the driving decision action a_t chosen in the driving scene state s_t at time t; at the same time, based on the chosen actions, the Q values of the M corresponding state–action pairs are obtained and recorded as Q.
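The ε-greedy selection of S23 can be sketched as follows, assuming q_network returns the Q-value vector g_w(s_t) = [Q(s_t,a_1),...,Q(s_t,a_n)] for a state feature; with ε = 0.5 the maximizing action is kept half of the time and a random action is taken otherwise, and the Q value of each chosen pair is recorded:

```python
import numpy as np

def epsilon_greedy_pairs(states, features, q_network, eps=0.5, rng=None):
    """states: demonstration states; features[t] = f(s_t, a_t);
    q_network(f) -> array [Q(s_t,a_1), ..., Q(s_t,a_n)].
    Returns the chosen state-action pairs and the recorded Q values (the set Q of S23)."""
    rng = rng or np.random.default_rng(0)
    pairs, q_record = [], []
    for s, f in zip(states, features):
        q = np.asarray(q_network(f))
        if rng.random() < 1.0 - eps:           # exploit: action with the maximum Q value
            a = int(np.argmax(q))
        else:                                  # explore: random driving decision action
            a = int(rng.integers(len(q)))
        pairs.append((s, a))
        q_record.append(q[a])                  # record the chosen state-action value
    return pairs, np.asarray(q_record)
```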
S24. Compute the weights of the reward function.
First, construct the objective function J(θ) from the following components: a loss term that, for each current state–action pair, is 0 if the pair appears in the driving demonstrations and 1 otherwise; the corresponding state–action values recorded above; the product of the driving demonstration feature expectation computed in S22 and the reward function weights θ; and a regularization term added to prevent overfitting, whose γ can be 0.9.
The objective function is minimized by gradient descent, i.e. t = min_θ J(θ); the variable θ that minimizes the objective function is obtained, and this θ is the required weight vector of the reward function.
S25. Based on the obtained reward function weights θ, the reward function generator is built according to the formula r(s,a) = θ^T f(s,a).
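Once θ is known, the reward function generator of S25 reduces to the inner product r(s,a) = θ^T f(s,a); a short sketch, where extract stands for the state feature extractor described above:

```python
import numpy as np

def make_reward(theta, extract):
    """Builds the reward function generator r(s, a) = theta^T f(s, a) of step S25."""
    theta = np.asarray(theta, dtype=float)
    return lambda s, a: float(theta @ np.asarray(extract(s, a), dtype=float))
```

Usage would simply be reward_fn = make_reward(theta, extractor) followed by reward_fn(s_t, a_t) for any state–action pair.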
3. The driving strategy getter completes the construction of the driving strategy, in the following way:
S31. Construct the training data of the driving strategy getter.
The training data is obtained. It comes from the sampling of the demonstration data above, but is processed into a new kind of data, N in total. Each datum comprises two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor; the other is computed from a formula involving r_θ(s_t,a_t), the reward produced by the reward function generator from the driving demonstration data, and Q^π(s_t,a_t) and Q^π(s_{t+1},a_{t+1}), which come from the group of Q values recorded in S23 and are respectively the Q value describing the driving scene s_t at time t and the Q value describing the driving scene s_{t+1} at time t+1.
S32. Establish the neural network.
The neural network comprises three layers. The first layer is the input layer; its number of neurons equals the number k of output feature types of the feature extractor, and it receives the driving scene feature f(s_t,a_t). The second (hidden) layer has 10 neurons. The number of neurons of the third layer equals the number n of driving operations among which the decision is made in the action space. The activation function of the input layer and the hidden layer is the sigmoid function, sigmoid(x) = 1/(1+e^(-x)), and:
z = w^(1) x = w^(1) [1, f_t]^T
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]^T)
where w^(1) denotes the weights of the hidden layer; f_t denotes the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z denotes the output of the network layer before the sigmoid activation of the hidden layer; h denotes the hidden-layer output after the sigmoid activation; and w^(2) denotes the weights of the output layer. The network structure is shown in Fig. 4.
The network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t,a_1),...,Q(s_t,a_n)]^T; the Q^π(s_t,a_t) in S31 is obtained by inputting state s_t into the neural network and selecting the entry for a_t in the output.
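The forward pass of the S32 network follows directly from the formulas above; a NumPy sketch with k input features, 10 hidden neurons and n output Q values (the random weight initialization is an assumption of the sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class PolicyNet:
    """g_w(s_t) = sigmoid(w2 [1, h]^T), h = sigmoid(w1 [1, f_t]^T), per step S32."""
    def __init__(self, k, n, hidden=10, rng=None):
        rng = rng or np.random.default_rng(0)
        self.w1 = rng.normal(scale=0.1, size=(hidden, k + 1))   # hidden-layer weights, bias folded in
        self.w2 = rng.normal(scale=0.1, size=(n, hidden + 1))   # output-layer weights, bias folded in

    def forward(self, f_t):
        x = np.concatenate(([1.0], np.asarray(f_t, dtype=float)))   # [1, f_t]
        z = self.w1 @ x                                             # pre-activation of the hidden layer
        h = sigmoid(z)
        return sigmoid(self.w2 @ np.concatenate(([1.0], h)))        # [Q(s_t,a_1), ..., Q(s_t,a_n)]
```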
S33. Optimize the neural network.
The loss function established for optimizing this neural network is the cross-entropy cost function, where N denotes the number of training data; Q^π(s_t,a_t) is the value obtained by inputting the driving scene state s_t at time t into the neural network and selecting the entry of the output corresponding to the driving decision action a_t; the target value is the one computed in S31; a regularization term is likewise added to prevent overfitting, and its γ may also be 0.9; W = {w^(1), w^(2)} in it denotes the weights of the above neural network.
The training data obtained in S31 is input into the neural network to optimize this cost function. The minimization of the cross-entropy cost function is completed by gradient descent, and the optimized neural network obtained yields the driving strategy getter.
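Since the exact cross-entropy cost of S33 appears only as an image in the original publication, the following PyTorch sketch assumes it is the usual binary cross entropy between the targets y_t built in S31 and the selected network output Q^π(s_t,a_t), plus an L2 penalty on the weights W = {w^(1), w^(2)}, minimized by plain gradient descent; the regularization coefficient here is a choice of the sketch:

```python
import torch
import torch.nn as nn

def train_policy_net(features, actions, targets, k, n, lam=0.01, lr=0.1, steps=500):
    """features: (N, k) float tensor of driving-scene features;
    actions: (N,) long tensor of chosen action indices a_t;
    targets: (N,) float tensor of the values y_t built in S31, assumed to lie in (0, 1)."""
    net = nn.Sequential(nn.Linear(k, 10), nn.Sigmoid(),    # hidden layer, 10 neurons
                        nn.Linear(10, n), nn.Sigmoid())    # output layer: n Q values
    opt = torch.optim.SGD(net.parameters(), lr=lr)         # plain gradient descent
    bce = nn.BCELoss()                                     # assumed cross-entropy form
    for _ in range(steps):
        opt.zero_grad()
        q_all = net(features)                                      # g_w(s_t) for every sample
        q_sel = q_all.gather(1, actions.view(-1, 1)).squeeze(1)    # Q^pi(s_t, a_t)
        reg = sum((w ** 2).sum() for w in net.parameters())        # L2 penalty on W (assumed form)
        loss = bce(q_sel, targets) + lam * reg
        loss.backward()
        opt.step()
    return net
```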
4. The judging device judges whether the optimal driving strategy constructed by the getter meets the judgment criterion; if not, the reward function is rebuilt and the optimal driving strategy is constructed again, iterating until the judgment criterion is met, finally obtaining the driving strategy that describes the real driving demonstrations.
The current reward function generator and driving strategy getter are regarded as a whole, and it is checked whether the t value in the current S22 satisfies t < ε, where ε is the threshold that judges whether the objective function meets the demand, i.e. whether the reward function currently used to obtain the driving strategy meets the requirements; its value is set differently according to specific needs.
When the value of t does not satisfy this inequality, the reward function generator needs to be rebuilt. At this time the neural network needed in the current S23 is replaced with the new neural network that has been optimized in S33, i.e. the network used to generate the values Q(s_t,a_i) describing how good the chosen decision driving operation a_i is in driving scene state s_t is replaced with the new network structure optimized by gradient descent in S33. The reward function generator is then rebuilt, the driving strategy getter is obtained, and whether the value of t meets the demand is judged again.
When the inequality is satisfied, the current θ is the weight vector of the required reward function; the reward function generator then meets the requirements, and the driving strategy getter also meets the requirements. One can then collect the driving data of a driver for whom a driver model is to be established, i.e. the environment scene images during driving and the corresponding operation data, such as the driving steering angle, and input them into the driving environment feature extractor to obtain the decision features for the current scene. The obtained features are then input into the reward function generator to obtain the reward function of the corresponding scene state. Finally, the obtained decision features and the computed reward function are input into the driving strategy getter to obtain the driving strategy corresponding to that driver.
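The overall iteration driven by the judging device, i.e. build the reward function, obtain the driving strategy, check t < ε, otherwise swap the optimized policy network back in and repeat, can be sketched as follows; init_policy_net, fit_reward_weights and fit_policy are placeholder names for the components described in S21–S33, and make_reward is the helper sketched after S25:

```python
def build_driver_model(demos, extract, eps=1e-2, max_iters=50):
    """Outer loop of the system: alternate reward-function construction (S21-S25) and
    driving-strategy construction (S31-S33) until the objective value t drops below eps."""
    policy_net = init_policy_net()                                   # network first used in S23 (S32 init)
    for _ in range(max_iters):
        theta, t = fit_reward_weights(demos, extract, policy_net)    # S21-S24: weights and objective value
        reward_fn = make_reward(theta, extract)                      # S25: reward function generator
        policy_net = fit_policy(demos, extract, reward_fn)           # S31-S33: driving strategy getter
        if t < eps:                                                  # judging device: criterion met
            return reward_fn, policy_net, theta
    return reward_fn, policy_net, theta
```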
In a Markov decision process, a strategy needs to connect states to their corresponding actions. When the state space is large, however, it is difficult to describe and state a definite strategy for the regions that have not been traversed; conventional methods also ignore this part, relying only on the demonstration trajectories to state a probabilistic model of the distribution of whole trajectories without giving a concrete strategy representation for new states, i.e. without giving a concrete way of determining the probability of taking an action in a new state. In the present invention the strategy is described by a neural network, which can approximate an arbitrary function to any accuracy and has outstanding generalization ability. Through the representation of state features, on the one hand, the states not contained in the demonstration trajectories can be represented; on the other hand, by inputting the corresponding state features into the neural network, the corresponding action values can be computed and the appropriate action chosen according to the strategy. In this way the problem that conventional methods cannot generalize to driving scene states not traversed in the driving demonstration data is addressed.
The above is only a preferred embodiment of the invention, but the protection scope of the invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art within the technical scope disclosed by the invention, according to the technical solution of the invention and its inventive concept, shall be covered by the protection scope of the invention.
Claims (10)
1. A driving behavior modeling system, characterized in that it specifically comprises:
a feature extractor, which extracts the features used to construct the reward function;
a reward function generator, which obtains the reward function needed for the driving strategy;
a driving strategy getter, which completes the construction of the driving strategy;
a judging device, which judges whether the optimal driving strategy constructed by the getter meets the judgment criterion; if not, the reward function is rebuilt and the optimal driving strategy is constructed again, iterating until the judgment criterion is met.
2. The driving behavior modeling system according to claim 1, characterized in that the specific implementation process by which the feature extractor extracts the features for constructing the reward function is:
S11. during vehicle driving, the driving video is sampled using a camera placed behind the vehicle windshield, obtaining N groups of pictures of different vehicle driving environments and road conditions together with the corresponding steering angles; the corresponding driving operation data is collected at the same time, and the training data is constructed jointly from them;
S12. the collected pictures are translated, cropped and brightness-adjusted to simulate scenes with different illumination and weather;
S13. a convolutional neural network is built; the processed pictures serve as the input and the operation data of the corresponding picture serves as the label value; the network is trained, and an optimization method based on the Nadam optimizer is used to seek the optimal solution of the mean squared error loss and optimize the weight parameters of the neural network;
S14. the network structure and weights of the trained convolutional neural network are saved, a new convolutional neural network is established from them, and the state feature extractor is completed.
3. The driving behavior modeling system according to claim 2, characterized in that the convolutional neural network established in step S13 comprises 1 input layer, 3 convolutional layers, 3 pooling layers and 4 fully connected layers; the input layer is connected in sequence to the first convolutional layer and the first pooling layer, then to the second convolutional layer and the second pooling layer, then to the third convolutional layer and the third pooling layer, and finally to the first, second, third and fourth fully connected layers in sequence.
4. The driving behavior modeling system according to claim 2, characterized in that the trained convolutional neural network in step S14 does not include the output layer.
5. The driving behavior modeling system according to claim 1, characterized in that the specific implementation process by which the reward function generator obtains the reward function is:
S21. obtain the expert driving demonstration data: the demonstration data is extracted by sampling the demonstration driving video; a continuous segment of driving video is sampled at a certain frequency to obtain one demonstration trajectory; one set of expert demonstration data contains multiple trajectories, denoted as a whole by
D_E = {(s_1,a_1),(s_2,a_2),...,(s_M,a_M)}
where D_E denotes all of the driving demonstration data, (s_j,a_j) denotes the data pair formed by state j and the decision instruction corresponding to that state, M is the total number of driving demonstration data, N_T is the number of demonstration trajectories, and L_i is the number of state–decision-instruction pairs (s_j,a_j) contained in the i-th demonstration trajectory;
S22. compute the feature expectation of the driving demonstrations:
each state s_t describing the driving environment in the demonstration data D_E is first input into the state feature extractor to obtain the feature vector f(s_t,a_t) of state s_t, where f(s_t,a_t) denotes the set of driving-environment scene feature values that influence the driving decision for s_t; the feature expectation of the driving demonstrations is then computed as the discounted accumulation of these features, where γ is the discount factor, set according to the problem;
S23. compute the state–action set under the greedy strategy;
S24. compute the weights of the reward function.
6. The driving behavior modeling system according to claim 5, characterized in that the concrete steps of computing the state–action set under the greedy strategy are: since the reward function generator and the driving strategy getter are the two parts of a loop, first obtain the neural network in the driving strategy getter: the state features f(s_t,a_t) describing the environment, extracted from the driving demonstration data D_E, are input into the neural network to obtain the output g_w(s_t); g_w(s_t) is the set of Q values describing state s_t, i.e. [Q(s_t,a_1),...,Q(s_t,a_n)]^T, and Q(s_t,a_i) is the state–action value describing how good it is to choose the decision driving operation a_i in the current driving scene state s_t; it is obtained from the formula Q(s,a) = θ μ(s,a), where θ denotes the weights in the current reward function and μ(s,a) denotes the feature expectation;
then, based on the ε-greedy strategy, the driving decision action corresponding to the driving scene state s_t is chosen: the decision action with the maximum Q value in the Q-value set for the current driving scene s_t is chosen, or otherwise an action is chosen at random; after the action has been chosen, the corresponding state–action value is recorded;
thus, for the state feature f(s_t,a_t) of each state in the demonstration data D_E, inputting it into the neural network yields a total of M state–action pairs (s_t,a_t), each describing the driving decision action a_t chosen in the driving scene state s_t at time t; at the same time, based on the chosen actions, the Q values of the M corresponding state–action pairs are obtained and recorded as Q.
7. The driving behavior modeling system according to claim 5, characterized in that the concrete steps of computing the weights of the reward function are: first, construct the objective function J(θ) from the following components: a loss term that, for each current state–action pair, is 0 if the pair appears in the driving demonstrations and 1 otherwise; the corresponding state–action values recorded above; the product of the driving demonstration feature expectation computed in S22 and the reward function weights θ; and a regularization term;
the objective function is minimized by gradient descent, i.e. t = min_θ J(θ); the variable θ that minimizes the objective function is obtained, and this θ is the required weight vector of the reward function.
8. The driving behavior modeling system according to claim 5, characterized in that the specific implementation process by which the reward function generator obtains the reward function further includes: S25. based on the obtained reward function weights θ, the reward function generator is built according to the formula r(s,a) = θ^T f(s,a).
9. The driving behavior modeling system according to claim 1, characterized in that the specific implementation process by which the driving strategy getter completes the construction of the driving strategy is:
S31. construct the training data of the driving strategy getter:
the training data is obtained; each datum comprises two parts: one is the driving decision feature f(s_t) obtained by inputting the driving scene state at time t into the driving state extractor, and the other is computed from a formula involving r_θ(s_t,a_t), the reward produced by the reward function generator from the driving demonstration data, and Q^π(s_t,a_t) and Q^π(s_{t+1},a_{t+1}), which come from the Q values recorded in S23 and are respectively the Q value describing the driving scene s_t at time t and the Q value describing the driving scene s_{t+1} at time t+1;
S32. establish the neural network:
the neural network comprises three layers; the first layer is the input layer, whose number of neurons equals the number k of output feature types of the feature extractor and which receives the driving scene feature f(s_t,a_t); the second (hidden) layer has 10 neurons; the number of neurons of the third layer equals the number n of driving operations among which the decision is made in the action space; the activation function of the input layer and the hidden layer is the sigmoid function, sigmoid(x) = 1/(1+e^(-x)), and:
z = w^(1) x = w^(1) [1, f_t]^T
h = sigmoid(z)
g_w(s_t) = sigmoid(w^(2) [1, h]^T)
where w^(1) are the weights of the hidden layer; f_t is the feature of the driving scene state s_t at time t, i.e. the input of the neural network; z is the output of the network layer before the sigmoid activation of the hidden layer; h is the hidden-layer output after the sigmoid activation; and w^(2) are the weights of the output layer;
the network output g_w(s_t) is the Q set of the driving scene state s_t at time t, i.e. [Q(s_t,a_1),...,Q(s_t,a_n)]^T; the Q^π(s_t,a_t) in S31 is obtained by inputting state s_t into the neural network and selecting the entry for a_t in the output;
S33. optimize the neural network:
the loss function established for optimizing this neural network is the cross-entropy cost function, where N denotes the number of training data; Q^π(s_t,a_t) is the value obtained by inputting the driving scene state s_t at time t into the neural network and selecting the entry of the output corresponding to the driving decision action a_t; the target value is the one computed in S31; a regularization term is added, in which W = {w^(1), w^(2)} denotes the weights of the above neural network;
the training data obtained in S31 is input into the neural network to optimize this cost function; the minimization of the cross-entropy cost function is completed by gradient descent, and the optimized neural network obtained yields the driving strategy getter.
10. The driving behavior modeling system according to claim 1, characterized in that the implementation process of the judging device includes:
the current reward function generator and driving strategy getter are regarded as a whole, and it is checked whether the t value in the current S22 satisfies t < ε, where ε is the threshold that judges whether the objective function meets the demand, i.e. whether the reward function currently used to obtain the driving strategy meets the requirements; its value is set differently according to specific needs;
when the value of t does not satisfy this inequality, the reward function generator needs to be rebuilt; at this time the neural network needed in the current S23 is replaced with the new neural network that has been optimized in S33, i.e. the network used to generate the values Q(s_t,a_i) describing how good the chosen decision driving operation a_i is in driving scene state s_t is replaced with the new network structure optimized by gradient descent in S33; the reward function generator is then rebuilt, the driving strategy getter is obtained, and whether the value of t meets the demand is judged again;
when the inequality is satisfied, the current θ is the weight vector of the required reward function; the reward function generator then meets the requirements, and the driving strategy getter also meets the requirements; then the driving data of a driver for whom a driver model is to be established is collected, i.e. the environment scene images during driving and the corresponding operation data, and input into the driving environment feature extractor to obtain the decision features for the current scene; the obtained features are then input into the reward function generator to obtain the reward function of the corresponding scene state; finally the obtained decision features and the computed reward function are input into the driving strategy getter to obtain the driving strategy corresponding to that driver.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810662040.0A CN108791302B (en) | 2018-06-25 | 2018-06-25 | Driver behavior modeling system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810662040.0A CN108791302B (en) | 2018-06-25 | 2018-06-25 | Driver behavior modeling system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108791302A true CN108791302A (en) | 2018-11-13 |
CN108791302B CN108791302B (en) | 2020-05-19 |
Family
ID=64070795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810662040.0A Active CN108791302B (en) | 2018-06-25 | 2018-06-25 | Driver behavior modeling system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108791302B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110481561A (en) * | 2019-08-06 | 2019-11-22 | 北京三快在线科技有限公司 | Automatic driving vehicle automatic control signal generation method and device |
CN111923928A (en) * | 2019-05-13 | 2020-11-13 | 长城汽车股份有限公司 | Decision making method and system for automatic vehicle |
CN112052776A (en) * | 2020-09-01 | 2020-12-08 | 中国人民解放军国防科技大学 | Unmanned vehicle autonomous driving behavior optimization method and device and computer equipment |
CN112373482A (en) * | 2020-11-23 | 2021-02-19 | 浙江天行健智能科技有限公司 | Driving habit modeling method based on driving simulator |
WO2021093011A1 (en) * | 2019-11-14 | 2021-05-20 | 深圳大学 | Unmanned vehicle driving decision-making method, unmanned vehicle driving decision-making device, and unmanned vehicle |
CN112997128A (en) * | 2021-04-19 | 2021-06-18 | 华为技术有限公司 | Method, device and system for generating automatic driving scene |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103381826A (en) * | 2013-07-31 | 2013-11-06 | 中国人民解放军国防科学技术大学 | Adaptive cruise control method based on approximate policy iteration |
CN105955930A (en) * | 2016-05-06 | 2016-09-21 | 天津科技大学 | Guidance-type policy search reinforcement learning algorithm |
CN107168303A (en) * | 2017-03-16 | 2017-09-15 | 中国科学院深圳先进技术研究院 | A kind of automatic Pilot method and device of automobile |
CN107203134A (en) * | 2017-06-02 | 2017-09-26 | 浙江零跑科技有限公司 | A kind of front truck follower method based on depth convolutional neural networks |
CN107229973A (en) * | 2017-05-12 | 2017-10-03 | 中国科学院深圳先进技术研究院 | The generation method and device of a kind of tactful network model for Vehicular automatic driving |
CN107480726A (en) * | 2017-08-25 | 2017-12-15 | 电子科技大学 | A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon |
CN107679557A (en) * | 2017-09-19 | 2018-02-09 | 平安科技(深圳)有限公司 | Driving model training method, driver's recognition methods, device, equipment and medium |
CN108108657A (en) * | 2017-11-16 | 2018-06-01 | 浙江工业大学 | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning |
-
2018
- 2018-06-25 CN CN201810662040.0A patent/CN108791302B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103381826A (en) * | 2013-07-31 | 2013-11-06 | 中国人民解放军国防科学技术大学 | Adaptive cruise control method based on approximate policy iteration |
CN105955930A (en) * | 2016-05-06 | 2016-09-21 | 天津科技大学 | Guidance-type policy search reinforcement learning algorithm |
CN107168303A (en) * | 2017-03-16 | 2017-09-15 | 中国科学院深圳先进技术研究院 | A kind of automatic Pilot method and device of automobile |
CN107229973A (en) * | 2017-05-12 | 2017-10-03 | 中国科学院深圳先进技术研究院 | The generation method and device of a kind of tactful network model for Vehicular automatic driving |
CN107203134A (en) * | 2017-06-02 | 2017-09-26 | 浙江零跑科技有限公司 | A kind of front truck follower method based on depth convolutional neural networks |
CN107480726A (en) * | 2017-08-25 | 2017-12-15 | 电子科技大学 | A kind of Scene Semantics dividing method based on full convolution and shot and long term mnemon |
CN107679557A (en) * | 2017-09-19 | 2018-02-09 | 平安科技(深圳)有限公司 | Driving model training method, driver's recognition methods, device, equipment and medium |
CN108108657A (en) * | 2017-11-16 | 2018-06-01 | 浙江工业大学 | A kind of amendment local sensitivity Hash vehicle retrieval method based on multitask deep learning |
Non-Patent Citations (1)
Title |
---|
王勇鑫, 钱徽, 金卓军, 朱淼良: "Autonomous navigation performance evaluation method based on trajectory analysis" (基于轨迹分析的自主导航性能评估方法), 《计算机工程》 (Computer Engineering) *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111923928A (en) * | 2019-05-13 | 2020-11-13 | 长城汽车股份有限公司 | Decision making method and system for automatic vehicle |
CN110481561A (en) * | 2019-08-06 | 2019-11-22 | 北京三快在线科技有限公司 | Automatic driving vehicle automatic control signal generation method and device |
WO2021093011A1 (en) * | 2019-11-14 | 2021-05-20 | 深圳大学 | Unmanned vehicle driving decision-making method, unmanned vehicle driving decision-making device, and unmanned vehicle |
CN112052776A (en) * | 2020-09-01 | 2020-12-08 | 中国人民解放军国防科技大学 | Unmanned vehicle autonomous driving behavior optimization method and device and computer equipment |
CN112373482A (en) * | 2020-11-23 | 2021-02-19 | 浙江天行健智能科技有限公司 | Driving habit modeling method based on driving simulator |
CN112373482B (en) * | 2020-11-23 | 2021-11-05 | 浙江天行健智能科技有限公司 | Driving habit modeling method based on driving simulator |
CN112997128A (en) * | 2021-04-19 | 2021-06-18 | 华为技术有限公司 | Method, device and system for generating automatic driving scene |
CN112997128B (en) * | 2021-04-19 | 2022-08-26 | 华为技术有限公司 | Method, device and system for generating automatic driving scene |
Also Published As
Publication number | Publication date |
---|---|
CN108791302B (en) | 2020-05-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108819948A (en) | Driving behavior modeling method based on reverse intensified learning | |
CN108791302A (en) | Driving behavior modeling | |
CN108920805A (en) | Driving behavior modeling with state feature extraction functions | |
CN111079561B (en) | Robot intelligent grabbing method based on virtual training | |
CN112232490B (en) | Visual-based depth simulation reinforcement learning driving strategy training method | |
CN108891421A (en) | A method of building driving strategy | |
CN111136659A (en) | Mechanical arm action learning method and system based on third person scale imitation learning | |
CN109949187A (en) | A kind of novel Internet of Things teleeducation system and control method | |
Li et al. | Facial feedback for reinforcement learning: a case study and offline analysis using the TAMER framework | |
CN110525428A (en) | A kind of automatic parking method based on the study of fuzzy deeply | |
CN109726676A (en) | The planing method of automated driving system | |
CN107351080A (en) | A kind of hybrid intelligent research system and control method based on array of camera units | |
CN108944940A (en) | Driving behavior modeling method neural network based | |
CN113779289A (en) | Drawing step reduction system based on artificial intelligence | |
CN103793054B (en) | A kind of action identification method simulating declarative memory process | |
CN110990589A (en) | Knowledge graph automatic generation method based on deep reinforcement learning | |
CN116353623A (en) | Driving control method based on self-supervision imitation learning | |
Hafez et al. | Improving robot dual-system motor learning with intrinsically motivated meta-control and latent-space experience imagination | |
CN108875555A (en) | Video interest neural network based region and well-marked target extraction and positioning system | |
CN112329498A (en) | Street space quality quantification method based on machine learning | |
CN110222822A (en) | The construction method of black box prediction model internal feature cause-and-effect diagram | |
CN108791308A (en) | The system for building driving strategy based on driving environment | |
CN117078923B (en) | Automatic driving environment-oriented semantic segmentation automation method, system and medium | |
CN111126441B (en) | Construction method of classification detection network model | |
CN114910071A (en) | Object navigation method based on object bias correction and directed attention map |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
OL01 | Intention to license declared | ||
OL01 | Intention to license declared |