CN107544516A - Automatic driving system and method based on relative-entropy deep inverse reinforcement learning - Google Patents
- Publication number
- CN107544516A (application CN201710940590.XA)
- Authority
- CN
- China
- Prior art keywords
- driving
- road information
- strategy
- relative entropy
- client
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/02—Control of position or course in two dimensions
Abstract
The present invention relates to an automatic driving system based on relative-entropy deep inverse reinforcement learning, comprising: (1) a client, which displays driving strategies; (2) a driving basic-data acquisition subsystem, which collects road information; and (3) a storage module, which is connected to the client and the driving basic-data acquisition subsystem and stores the road information collected by the acquisition subsystem. The acquisition subsystem collects road information and transmits it to the client and the storage module; the storage module receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate driving strategies, and transmits these strategies to the client for the user to choose from; the client receives the road information and implements automatic driving according to the user's selection. The system realizes model-free automatic driving using a relative-entropy deep inverse reinforcement learning algorithm.
Description
Technical field
The present invention relates to an automatic driving system and method based on relative-entropy deep inverse reinforcement learning, and belongs to the technical field of automatic driving.
Background art
With the growth of car ownership in China, road traffic congestion has become increasingly severe and the annual number of traffic accidents continues to rise. To better address this problem, the development of automatic vehicle driving systems is necessary. Moreover, as people pursue a higher quality of life, they wish to be freed from tiring driving activity, and automatic driving technology has emerged in response.
One existing automatic vehicle driving system identifies the driving environment with a cabin-mounted camera and an image-recognition system, then navigates the vehicle using an on-board main control computer, a GPS positioning system, and path-planning software based on pre-stored information such as road maps: a reasonable driving path is planned between the vehicle's current position and the destination, and vehicle guidance leads the vehicle to the destination.
In such a system, because the road map is pre-stored in the vehicle, updating its data depends on manual operation by the driver, so the update frequency cannot be guaranteed; even if the driver updates promptly, the latest road information obtainable from existing resources may be limited, so the data may not reflect current road conditions. The result is unreasonable routing and low navigation accuracy, which inconveniences driving. Furthermore, most automatic vehicle driving systems in the field today still require manual intervention and cannot achieve fully automatic driving.
Summary of the invention
An object of the present invention is to provide an automatic driving system and method based on relative-entropy deep inverse reinforcement learning that, using a deep neural network structure and the historical driving-trajectory information of users as input, obtains a variety of driving strategies representing individual driving habits, and uses these strategies to carry out personalized, intelligent automatic driving.
To achieve the above object, the present invention provides the following technical scheme: an automatic driving system based on relative-entropy deep inverse reinforcement learning, the system comprising:

A client: displays driving strategies;

A driving basic-data acquisition subsystem: collects road information;

A storage module: connected to the client and the driving basic-data acquisition subsystem, and storing the road information collected by the driving basic-data acquisition subsystem;

wherein the driving basic-data acquisition subsystem collects road information and transmits it to the client and the storage module; the storage module receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate driving strategies, and transmits the driving strategies to the client for the user to select; the client receives the road information and implements automatic driving according to the driving strategy selected by the user.
Further, the storage module includes a driving-trajectory library for storing historical driving trajectories, a trajectory-information processing subsystem that computes and simulates driving strategies from driving trajectories and driving habits, and a driving-strategy library that stores the driving strategies. The driving-trajectory library transmits trajectory data to the trajectory-information processing subsystem; the subsystem analyzes the data to compute and simulate driving strategies and transmits them to the driving-strategy library, which receives and stores them.
Further, the trajectory-information processing subsystem computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm.
Further, the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning within an EM (expectation-maximization) algorithm framework to compute the parameters of multiple reward functions.
Further, the driving basic-data acquisition subsystem includes sensors for collecting road information.
The present invention also provides a method of automatic driving based on relative-entropy deep inverse reinforcement learning, the method comprising the following steps:

S1: collecting road information and transmitting it to a client and a storage module;

S2: the storage module receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate a variety of driving strategies, and passes the driving strategies to the client;

S3: the client receives the road information and driving strategies, and implements automatic driving according to the road information and the personalized driving strategy selected by the user.
Further, the storage module includes a driving-trajectory library for storing historical driving trajectories, a trajectory-information processing subsystem that computes and simulates driving strategies from driving plans and driving habits, and a driving-strategy library that stores the driving strategies. The driving-trajectory library transmits trajectory data to the trajectory-information processing subsystem; the subsystem analyzes the data to compute and simulate driving strategies and transmits them to the driving-strategy library, which receives and stores them.

Further, the trajectory-information processing subsystem computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm.

Further, the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning within an EM algorithm framework to compute the parameters of multiple reward functions.
The beneficial effects of the present invention are: by providing a driving basic-data acquisition subsystem in the system, road information is collected in real time and passed to the storage module; the storage module receives the road information, stores a continuous segment of it as a historical trajectory, and simulates driving strategies from the historical driving trajectories, thereby realizing personalized, intelligent automatic driving.
The above is only an overview of the technical scheme of the present invention. In order that the technical means of the present invention may be better understood and practiced according to the contents of the specification, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flow chart of the automatic driving system and method based on relative-entropy deep inverse reinforcement learning of the present invention.
Fig. 2 is a schematic diagram of a Markov decision process (MDP).
Detailed description of embodiments
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present invention, not to limit its scope.
Referring to Fig. 1, the automatic driving system based on relative-entropy deep inverse reinforcement learning of a preferred embodiment of the present invention includes:

Client 1: displays driving strategies;

Driving basic-data acquisition subsystem 2: collects road information;

Storage module 3: connected to the client 1 and the driving basic-data acquisition subsystem 2, and storing the road information collected by the driving basic-data acquisition subsystem 2;

wherein the driving basic-data acquisition subsystem 2 collects road information and transmits it to the client 1 and the storage module 3; the storage module 3 receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate driving strategies, and transmits the driving strategies to the client 1 for the user to select; the client 1 receives the road information and implements automatic driving according to the personalized driving strategy selected by the user. In the present embodiment, the storage module 3 is in the cloud.
The most important function of the client 1 is to complete the human-machine interaction with the user, providing selection of, and access to, a variety of personalized, intelligent driving strategies. According to the user's choice of driving strategy, the client 1 downloads the corresponding strategy from the driving-strategy library 33 in the cloud 3, then makes real-time driving decisions based on the strategy and the basic data, realizing real-time unmanned driving control.
The driving basic-data acquisition subsystem 2 collects road information through sensors (not shown). The collected information serves two purposes: it is passed to the client 1 to provide basic data for current driving decisions, and it is sent to the driving-trajectory library 31 in the cloud 3, where it is stored as the user's historical driving-trajectory data.
The cloud 3 includes a driving-trajectory library 31 that stores historical driving trajectories, a trajectory-information processing subsystem 32 that computes and simulates driving strategies from driving plans and driving habits, and a driving-strategy library 33 that stores the driving strategies. The driving-trajectory library 31 transmits trajectory data to the trajectory-information processing subsystem 32; the subsystem 32 analyzes the data to compute and simulate driving strategies and transmits them to the driving-strategy library 33, which receives and stores them. The trajectory-information processing subsystem 32 computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm. In the present embodiment, the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning within an EM algorithm framework to compute the parameters of multiple reward functions. The historical driving trajectories include expert historical trajectories and the user's own historical trajectories.
Inverse reinforcement learning (IRL) refers to the problem in which the environment of a Markov decision process (MDP) is known but the reward function R is unknown. In an ordinary reinforcement learning (RL) problem, the known environment, a given reward function R, and the Markov property are used to estimate the value Q of a state-action pair (s, a) (also called the accumulated action reward); the converged values Q(s, a) are then used to derive a policy π with which the agent makes decisions. In practice, the reward function R is often very difficult to know, while some expert trajectories T_N are easier to obtain. In a Markov decision process with unknown reward, MDP/R, the problem of recovering the reward function R from expert trajectories T_N is called the inverse reinforcement learning problem (IRL).
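As a concrete illustration of the forward problem that IRL inverts, the following sketch runs value iteration on a toy two-state MDP with a known reward; the MDP, rewards, and discount factor are illustrative assumptions, not data from the patent.

```python
import numpy as np

# Forward RL step that IRL inverts: given a known reward R and
# transition model T, value iteration recovers Q(s, a) and a greedy
# policy pi. IRL starts instead from expert trajectories and recovers R.

def value_iteration(T, R, gamma=0.9, iters=200):
    """T: (S, A, S) transition probabilities; R: (S, A) rewards."""
    S, A, _ = T.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)           # greedy state value
        Q = R + gamma * T @ V       # Bellman backup
    return Q

# Toy 2-state, 2-action MDP: action a moves deterministically to state a
T = np.zeros((2, 2, 2))
T[0, 0, 0] = T[0, 1, 1] = 1.0
T[1, 0, 0] = T[1, 1, 1] = 1.0
R = np.array([[0.0, 1.0], [0.0, 1.0]])   # action 1 is always rewarded
Q = value_iteration(T, R)
pi = Q.argmax(axis=1)                     # greedy policy
print(pi)                                 # action 1 in both states
```

An IRL algorithm would be handed only trajectories generated by such a policy π and asked to reconstruct a reward consistent with them.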
In the present embodiment, the known historical driving-trajectory data of users in the driving-trajectory library 31 are used to perform relative-entropy deep inverse reinforcement learning, recovering reward functions R for a variety of user personalities and then simulating the corresponding driving strategies π. The relative-entropy deep inverse reinforcement learning algorithm is model-free: it does not require the state-transition function T(s, a, s′) of the environment model to be known, since it uses importance sampling to avoid T(s, a, s′) in its computation.
In the present embodiment, the automatic driving decision process of a car is a Markov decision process without a reward function, MDP/R, expressed as the set {state space S, action space A, state-transition probability T} defined by the environment (the requirement of a known transition probability T is dropped). The value function (accumulated reward) of the car agent can be expressed as V(s) = E[Σ_t γ^t R_θ(s_t, a_t)], and its state-action value function as Q(s, a) = R_θ(s, a) + γ E_{T(s,a,s′)}[V(s′)]. To handle more complex real driving problems, the reward function is no longer assumed to be a simple linear combination, but a deep neural network R(s, a, θ) = g_1(g_2(…(g_n(f(s, a), θ_n), …), θ_2), θ_1), where f(s, a) represents the road-segment feature information of driving at (s, a), and θ_i represents the parameters of the i-th layer of the deep neural network.
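A minimal sketch of such a nested reward network follows; the layer sizes, the tanh nonlinearity, and the placeholder feature extractor f(s, a) are all illustrative assumptions, since the patent does not fix them.

```python
import numpy as np

# Sketch of the nested reward R(s, a, theta) = g1(g2(...gn(f(s,a), th_n)...)):
# each layer g_i applies its weights theta_i to the previous layer's output,
# with f(s, a) standing in for the road-feature vector at (s, a).

rng = np.random.default_rng(0)

def f(s, a):
    # placeholder feature extractor; real features come from road sensors
    return np.array([s, a, s * a, 1.0], dtype=float)

theta = [rng.normal(size=(4, 8)), rng.normal(size=(8, 1))]  # two layers

def reward(s, a, theta):
    x = f(s, a)
    for th in theta[:-1]:
        x = np.tanh(x @ th)        # hidden layers g_n ... g_2
    return float(x @ theta[-1])    # linear output layer g_1

r = reward(0.5, 1.0, theta)        # scalar reward for one (s, a) pair
```

Because every g_i is differentiable, the gradient ∂R/∂θ_i needed by the later back-propagation update is well defined.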
Meanwhile, to accommodate more personalized, more intelligent real driving scenes, it is assumed that multiple reward functions R (targets) exist simultaneously, representing the different driving habits of users. Suppose there are G reward functions; let their prior probability distribution be ρ_1, …, ρ_G and their reward weights be θ_1, …, θ_G, and let Θ = (ρ_1, …, ρ_G, θ_1, …, θ_G) denote the parameter set of these G reward functions.
Referring to Fig. 2, under the condition that an assumed reward function is known (obtained by initialization or by iteration), the problem can now be described as a complete Markov decision process (MDP). Under this complete MDP, according to reinforcement learning, the reward function R(s, a, θ) = g_1(g_2(…(g_n(f, θ_n), …), θ_2), θ_1) can be used to evaluate the V values and Q values. For the evaluation step of reinforcement learning, a soft maximization operator, MellowMax, is used to estimate the expected V value. The MellowMax operator is defined as mm_ω(x) = log((1/n) Σ_{i=1}^n e^{ω x_i}) / ω. MellowMax is a better-behaved operator: it guarantees that the estimate of the V value converges to a unique point. At the same time, MellowMax provides a principled probability-assignment mechanism and expectation-estimation method. In the present embodiment, a reinforcement learning algorithm combined with MellowMax is more reasonable in its exploration and exploitation of the environment during automatic driving: it ensures that, when the reinforcement learning process converges, the automatic driving system has learned the various scenes sufficiently and can produce a relatively sound assessment of the current state.
In the present embodiment, reinforcement learning combined with the soft maximization operator MellowMax yields a more scientific evaluation of the expected feature values of states. Using MellowMax, the probability distribution for action selection can be taken as π(a|s) = e^{ω Q(s,a)} / Σ_{a′} e^{ω Q(s,a′)}. Under this soft-maximized action-selection rule, the iterative process of reinforcement learning yields the expected feature value μ of the reward function formed by the current deep neural network parameters θ; μ can be understood as the accumulated expectation of the features.
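The MellowMax operator and one soft action-selection rule can be sketched as follows; the Boltzmann form of the rule and the temperature ω = 5 are assumptions for illustration, since the exact rule in the original figures is not reproduced here.

```python
import numpy as np

# MellowMax: mm_w(x) = log(mean(exp(w * x))) / w. Unlike a hard max it is
# differentiable, and its V-value estimate has a unique fixed point, as
# the text notes. The Boltzmann distribution below is one simple
# probability-assignment rule consistent with the text (an assumption).

def mellowmax(q, w=5.0):
    q = np.asarray(q, dtype=float)
    m = q.max()                                    # stabilise the exponent
    return m + np.log(np.mean(np.exp(w * (q - m)))) / w

def action_probs(q, w=5.0):
    z = np.exp(w * (q - mellowmax(q, w)))
    return z / z.sum()

q = np.array([1.0, 2.0, 3.0])
v = mellowmax(q)        # lies between mean(q) = 2.0 and max(q) = 3.0
p = action_probs(q)     # soft preference for the highest-Q action
```

As ω → ∞ the operator approaches a hard max, and as ω → 0 it approaches the mean, which is what makes it useful for balancing exploration and exploitation.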
In the present embodiment, the above multi-objective inverse reinforcement learning problem with hidden variables is solved using the EM algorithm. The EM algorithm consists of an E step and an M step; by iterating the E and M steps, it approaches the maximum of the likelihood estimate.
E step: first compute z_ij = ρ_j Pr(τ_i | θ_j) / Z, where Z is the normalizing term. z_ij represents the probability that the i-th driving trajectory belongs to driving habit (reward function) j.

Let y_i = j denote that the i-th driving trajectory belongs to driving habit j, and let y = (y_1, …, y_N) denote the membership set of the N driving trajectories.
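The E-step responsibilities z_ij can be sketched as follows, computed in log space for numerical stability; the trajectory log-likelihoods are illustrative placeholders, where in the system they would come from the relative-entropy IRL trajectory model.

```python
import numpy as np

# E step: z_ij = rho_j * Pr(tau_i | theta_j) / Z — the posterior
# probability that trajectory i was generated under driving habit
# (reward function) j.

def e_step(loglik, rho):
    """loglik: (N, G) log Pr(tau_i | theta_j); rho: (G,) priors."""
    logpost = loglik + np.log(rho)                    # unnormalised posterior
    logpost -= logpost.max(axis=1, keepdims=True)     # stabilise the exp
    z = np.exp(logpost)
    return z / z.sum(axis=1, keepdims=True)           # rows sum to 1

loglik = np.array([[-1.0, -5.0],                      # 2 trajectories,
                   [-4.0, -1.5]])                     # 2 driving habits
rho = np.array([0.5, 0.5])
z = e_step(loglik, rho)
```

The matching M-step update for the priors is then simply ρ_j = mean over i of z_ij, which feeds into the maximization described below.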
Compute the likelihood estimate Q(Θ, Θ_t) = Σ_y L(Θ | D, y) Pr(y | D, Θ_t) (the Q function Q(Θ, Θ_t) here is the update objective of the EM algorithm, to be distinguished from the state-action value function Q in reinforcement learning); after derivation, the likelihood estimate Q(Θ, Θ_t) = Σ_i Σ_j z_ij (log ρ_j + log Pr(τ_i | θ_j)) is obtained.
M step: choose a suitable multi-driving-habit parameter set Θ (ρ_l and θ_l) that maximizes the likelihood estimate Q(Θ, Θ_t) from the E step. Since ρ_l and θ_l are mutually independent, their maximizations can be carried out separately; maximizing over the priors gives ρ_l = (1/N) Σ_i z_il.

The update target for the latter half of Q(Θ, Θ_t) is max_{θ_l} Σ_i z_il log Pr(τ_i | θ_l), which can be understood as a maximum-likelihood equation for the observed trajectory set under the condition that the parameter of the l-th cluster target is θ_l. This maximum-likelihood equation can be solved using relative-entropy deep inverse reinforcement learning: its solution formula, while satisfying the maximum-likelihood update target, naturally applies to the back-propagation update of the deep neural network parameters. Let the maximization objective of the deep neural network be L(θ) = log P(D, θ | r); according to the decomposition formula of the joint likelihood function, L(θ) = log P(D, θ | r) = log P(D | r) + log P(θ).
Taking the partial derivative of the joint likelihood objective gives ∂L/∂θ = ∂log P(D | r)/∂θ + ∂log P(θ)/∂θ. The first half of this derivative can be further decomposed as ∂log P(D | r)/∂θ = (∂log P(D | r)/∂r)(∂r/∂θ). According to relative-entropy inverse reinforcement learning, ∂log P(D | r)/∂r is the difference between the expert feature expectation and the feature expectation under the current reward function, μ_D − μ. Using importance sampling, μ is estimated from trajectories sampled according to a given policy π, where each trajectory τ = s_1 a_1, …, s_H a_H is weighted in proportion to e^{R_θ(τ)} relative to its sampling probability, which avoids the unknown transition function T(s, a, s′). Further, ∂r/∂θ is the gradient computed by the back-propagation algorithm when updating the hidden-layer parameters of the deep neural network.
The completion of this gradient update marks the completion of one iteration of relative-entropy deep inverse reinforcement learning. The deep-network reward function with the newly updated parameters produces a new policy π, and a new iteration begins.
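The following sketch performs this update for the simplest, linear case R_θ(τ) = θ·f(τ): the gradient is the expert feature expectation minus an importance-weighted feature expectation over trajectories sampled from a base policy. All trajectories here are random placeholders, and the deep version would backpropagate the same difference μ_D − μ through the reward network.

```python
import numpy as np

# One relative-entropy IRL gradient ascent loop for a linear reward.
# Importance weights are proportional to exp(R_theta(tau)); with a fixed
# base-policy sample set, the sampler's probability cancels out here
# (a simplifying assumption for illustration).

rng = np.random.default_rng(1)

def traj_features(traj):
    # cumulative feature counts along one trajectory (rows = steps)
    return traj.sum(axis=0)

def reirl_gradient(theta, expert_trajs, sampled_trajs):
    mu_expert = np.mean([traj_features(t) for t in expert_trajs], axis=0)
    feats = np.array([traj_features(t) for t in sampled_trajs])
    logw = feats @ theta                 # log importance weights ~ R_theta
    w = np.exp(logw - logw.max())        # stabilised
    w /= w.sum()
    mu_sampled = w @ feats               # importance-weighted expectation
    return mu_expert - mu_sampled        # ascent direction: mu_D - mu

expert = [rng.normal(size=(5, 3)) for _ in range(4)]    # placeholder data
sampled = [rng.normal(size=(5, 3)) for _ in range(20)]  # base-policy samples
theta = np.zeros(3)
for _ in range(30):
    theta += 0.05 * reirl_gradient(theta, expert, sampled)
```

Each pass of this loop corresponds to one "gradient update" iteration in the text; in the nested EM scheme, the per-trajectory terms would additionally be weighted by the responsibilities z_il of cluster l.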
The E step and M step are iterated continuously until the likelihood estimate Q(Θ, Θ_t) converges to its maximum. The parameter set Θ = (ρ_1, …, ρ_G, θ_1, …, θ_G) obtained at this point is exactly the prior distribution and the weights of the reward functions representing multiple driving habits that we want to solve for.
In the present embodiment, from this parameter set Θ, the driving strategy π of each driving habit R is obtained through reinforcement learning (RL) computation. The multiple driving strategies are exported and saved in the driving-strategy library in the cloud, and the user can select a personalized, intelligent driving strategy in the client.
The present invention also provides a method of automatic driving based on relative-entropy deep inverse reinforcement learning, the method comprising the following steps:

S1: collecting road information and transmitting it to a client and a storage module;

S2: the storage module receives the road information, analyzes it, computes and simulates a variety of driving strategies, and passes the driving strategies to the client;

S3: the client receives the road information and driving strategies, and implements automatic driving according to the road information and the personalized driving strategy selected by the user.
In summary: by providing a driving basic-data acquisition subsystem 2 in the system, road information is collected in real time and passed to the storage module 3 and the client 1; the storage module 3 receives the road information and simulates driving strategies from historical driving trajectories, realizing personalized, intelligent automatic driving.

In automatic driving based on this method, all driving-strategy computation is carried out in the cloud 3 rather than in the client 1. By the time the user needs automatic driving, all driving strategies have already been computed in the cloud 3. The user only needs to select and download the desired driving strategy, and the vehicle can drive automatically in real time according to the selected strategy and the real-time road information. Meanwhile, after any completed drive, large amounts of road information are uploaded to the cloud 3 and stored as historical driving trajectories. The stored historical-trajectory big data are then used to update the driving-strategy library; using this trajectory big data, the system realizes automatic driving that is ever closer to the user's needs.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as the combinations of these technical features are not contradictory, they should be considered within the scope of this specification.

The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that, for those of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (9)
1. An automatic driving system based on relative-entropy deep inverse reinforcement learning, characterized in that the system comprises:
a client, which displays driving strategies;
a driving basic-data acquisition subsystem, which collects road information; and
a storage module, which is connected to the client and the driving basic-data acquisition subsystem and stores the road information collected by the driving basic-data acquisition subsystem;
wherein the driving basic-data acquisition subsystem collects road information and transmits it to the client and the storage module; the storage module receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate driving strategies, and transmits the driving strategies to the client for the user to select; and the client receives the road information and implements automatic driving according to the driving strategy selected by the user.
2. The automatic driving system based on relative-entropy deep inverse reinforcement learning of claim 1, characterized in that the storage module comprises a driving-trajectory library for storing historical driving trajectories, a trajectory-information processing subsystem that computes and simulates driving strategies from driving trajectories and driving habits, and a driving-strategy library that stores the driving strategies; the driving-trajectory library transmits trajectory data to the trajectory-information processing subsystem, the trajectory-information processing subsystem analyzes the trajectory data to compute and simulate driving strategies and transmits them to the driving-strategy library, and the driving-strategy library receives and stores the driving strategies.
3. The automatic driving system based on relative-entropy deep inverse reinforcement learning of claim 2, characterized in that the trajectory-information processing subsystem computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm.
4. The automatic driving system based on relative-entropy deep inverse reinforcement learning of claim 3, characterized in that the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning within an EM algorithm framework to compute the parameters of multiple reward functions.
5. The automatic driving system based on relative-entropy deep inverse reinforcement learning of claim 1, characterized in that the driving basic-data acquisition subsystem comprises sensors for collecting road information.
6. A method of automatic driving based on relative-entropy deep inverse reinforcement learning, characterized in that the method comprises the following steps:
S1: collecting road information and transmitting it to a client and a storage module;
S2: the storage module receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate a variety of driving strategies, and passes the driving strategies to the client;
S3: the client receives the road information and driving strategies, and implements automatic driving according to the road information and the personalized driving strategy selected by the user.
7. The method of automatic driving based on relative-entropy deep inverse reinforcement learning of claim 6, characterized in that the storage module comprises a driving-trajectory library for storing historical driving trajectories, a trajectory-information processing subsystem that computes and simulates driving strategies from driving plans and driving habits, and a driving-strategy library that stores the driving strategies; the driving-trajectory library transmits trajectory data to the trajectory-information processing subsystem, the trajectory-information processing subsystem analyzes the trajectory data to compute and simulate driving strategies and transmits them to the driving-strategy library, and the driving-strategy library receives and stores the driving strategies.
8. The method of automatic driving based on relative-entropy deep inverse reinforcement learning of claim 7, characterized in that the trajectory-information processing subsystem computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm.
9. The method of automatic driving based on relative-entropy deep inverse reinforcement learning of claim 8, characterized in that the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning within an EM algorithm framework to compute the parameters of multiple reward functions.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710940590.XA CN107544516A (en) | 2017-10-11 | 2017-10-11 | Automatic driving system and method based on relative-entropy deep inverse reinforcement learning |
PCT/CN2018/078740 WO2019071909A1 (en) | 2017-10-11 | 2018-03-12 | Automatic driving system and method based on relative-entropy deep inverse reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710940590.XA CN107544516A (en) | 2017-10-11 | 2017-10-11 | Automatic driving system and method based on relative-entropy deep inverse reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107544516A true CN107544516A (en) | 2018-01-05 |
Family
ID=60967749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710940590.XA Pending CN107544516A (en) | 2017-10-11 | 2017-10-11 | Automatic driving system and method based on relative-entropy deep inverse reinforcement learning |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107544516A (en) |
WO (1) | WO2019071909A1 (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109460015A (en) * | 2017-09-06 | 2019-03-12 | 通用汽车环球科技运作有限责任公司 | Unsupervised learning agents for autonomous driving applications |
CN109636432A (en) * | 2018-09-28 | 2019-04-16 | 阿里巴巴集团控股有限公司 | The project selection method and device that computer executes |
WO2019071909A1 (en) * | 2017-10-11 | 2019-04-18 | 苏州大学张家港工业技术研究院 | Automatic driving system and method based on relative-entropy deep inverse reinforcement learning |
CN110238855A (en) * | 2019-06-24 | 2019-09-17 | 浙江大学 | Robot random-order workpiece grasping method based on deep inverse reinforcement learning |
CN110321811A (en) * | 2019-06-17 | 2019-10-11 | 中国工程物理研究院电子工程研究所 | Object detection method for UAV video based on deep inverse reinforcement learning |
WO2019237474A1 (en) * | 2018-06-11 | 2019-12-19 | 苏州大学 | Partially-observable automatic driving decision-making method and system based on constraint online planning |
WO2020000192A1 (en) * | 2018-06-26 | 2020-01-02 | Psa Automobiles Sa | Method for providing vehicle trajectory prediction |
CN110654372A (en) * | 2018-06-29 | 2020-01-07 | 比亚迪股份有限公司 | Vehicle driving control method and device, vehicle and storage medium |
CN110837258A (en) * | 2019-11-29 | 2020-02-25 | 商汤集团有限公司 | Automatic driving control method, device, system, electronic device and storage medium |
CN110850861A (en) * | 2018-07-27 | 2020-02-28 | 通用汽车环球科技运作有限责任公司 | Attention-based hierarchical lane change depth reinforcement learning |
CN110955239A (en) * | 2019-11-12 | 2020-04-03 | 中国地质大学(武汉) | Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning |
CN111026127A (en) * | 2019-12-27 | 2020-04-17 | 南京大学 | Automatic driving decision method and system based on partially observable transfer reinforcement learning |
CN111159832A (en) * | 2018-10-19 | 2020-05-15 | 百度在线网络技术(北京)有限公司 | Construction method and device of traffic information flow |
CN114194211A (en) * | 2021-11-30 | 2022-03-18 | 浪潮(北京)电子信息产业有限公司 | Automatic driving method and device, electronic equipment and storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110673602B (en) * | 2019-10-24 | 2022-11-25 | 驭势科技(北京)有限公司 | Reinforcement learning model, vehicle automatic driving decision-making method, and on-board equipment |
TWI737437B (en) * | 2020-08-07 | 2021-08-21 | 財團法人車輛研究測試中心 | Trajectory determination method |
US20230143937A1 (en) * | 2021-11-10 | 2023-05-11 | International Business Machines Corporation | Reinforcement learning with inductive logic programming |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140278052A1 (en) * | 2013-03-15 | 2014-09-18 | Caliper Corporation | Lane-level vehicle navigation for vehicle routing and traffic management |
CN106842925A (en) * | 2017-01-20 | 2017-06-13 | 清华大学 | Intelligent locomotive driving method and system based on deep reinforcement learning |
CN107074178A (en) * | 2014-09-16 | 2017-08-18 | 本田技研工业株式会社 | Drive assistance device |
CN107084735A (en) * | 2017-04-26 | 2017-08-22 | 电子科技大学 | Navigation path architecture for reducing redundant navigation |
CN107169567A (en) * | 2017-03-30 | 2017-09-15 | 深圳先进技术研究院 | Method and device for generating a decision network model for vehicle automatic driving |
CN107200017A (en) * | 2017-05-22 | 2017-09-26 | 北京联合大学 | Autonomous vehicle control system based on deep learning |
CN107229973A (en) * | 2017-05-12 | 2017-10-03 | 中国科学院深圳先进技术研究院 | Method and device for generating a policy network model for vehicle automatic driving |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103699717A (en) * | 2013-12-03 | 2014-04-02 | 重庆交通大学 | Vehicle travel trajectory prediction method for complex roads based on preview cross-section point selection |
CN105718750B (en) * | 2016-01-29 | 2018-08-17 | 长沙理工大学 | Vehicle driving trajectory prediction method and system |
CN107544516A (en) * | 2017-10-11 | 2018-01-05 | 苏州大学 | Automatic driving system and method based on relative-entropy deep inverse reinforcement learning |
- 2017
  - 2017-10-11: CN CN201710940590.XA patent/CN107544516A/en active Pending
- 2018
  - 2018-03-12: WO PCT/CN2018/078740 patent/WO2019071909A1/en active Application Filing
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140278052A1 (en) * | 2013-03-15 | 2014-09-18 | Caliper Corporation | Lane-level vehicle navigation for vehicle routing and traffic management |
CN107074178A (en) * | 2014-09-16 | 2017-08-18 | 本田技研工业株式会社 | Drive assistance device |
CN106842925A (en) * | 2017-01-20 | 2017-06-13 | 清华大学 | Intelligent locomotive driving method and system based on deep reinforcement learning |
CN107169567A (en) * | 2017-03-30 | 2017-09-15 | 深圳先进技术研究院 | Method and device for generating a decision network model for vehicle automatic driving |
CN107084735A (en) * | 2017-04-26 | 2017-08-22 | 电子科技大学 | Navigation path architecture for reducing redundant navigation |
CN107229973A (en) * | 2017-05-12 | 2017-10-03 | 中国科学院深圳先进技术研究院 | Method and device for generating a policy network model for vehicle automatic driving |
CN107200017A (en) * | 2017-05-22 | 2017-09-26 | 北京联合大学 | Autonomous vehicle control system based on deep learning |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109460015B (en) * | 2017-09-06 | 2022-04-15 | 通用汽车环球科技运作有限责任公司 | Unsupervised learning agent for autonomous driving applications |
CN109460015A (en) * | 2017-09-06 | 2019-03-12 | 通用汽车环球科技运作有限责任公司 | Unsupervised learning agent for autonomous driving applications |
WO2019071909A1 (en) * | 2017-10-11 | 2019-04-18 | 苏州大学张家港工业技术研究院 | Automatic driving system and method based on relative-entropy deep inverse reinforcement learning |
WO2019237474A1 (en) * | 2018-06-11 | 2019-12-19 | 苏州大学 | Partially-observable automatic driving decision-making method and system based on constraint online planning |
WO2020000192A1 (en) * | 2018-06-26 | 2020-01-02 | Psa Automobiles Sa | Method for providing vehicle trajectory prediction |
CN110654372B (en) * | 2018-06-29 | 2021-09-03 | 比亚迪股份有限公司 | Vehicle driving control method and device, vehicle and storage medium |
CN110654372A (en) * | 2018-06-29 | 2020-01-07 | 比亚迪股份有限公司 | Vehicle driving control method and device, vehicle and storage medium |
CN110850861B (en) * | 2018-07-27 | 2023-05-23 | 通用汽车环球科技运作有限责任公司 | Attention-based hierarchical lane-changing depth reinforcement learning |
CN110850861A (en) * | 2018-07-27 | 2020-02-28 | 通用汽车环球科技运作有限责任公司 | Attention-based hierarchical lane change depth reinforcement learning |
CN109636432B (en) * | 2018-09-28 | 2023-05-30 | 创新先进技术有限公司 | Computer-implemented item selection method and apparatus |
CN109636432A (en) * | 2018-09-28 | 2019-04-16 | 阿里巴巴集团控股有限公司 | Computer-implemented item selection method and apparatus |
CN111159832A (en) * | 2018-10-19 | 2020-05-15 | 百度在线网络技术(北京)有限公司 | Construction method and device of traffic information flow |
CN111159832B (en) * | 2018-10-19 | 2024-04-02 | 百度在线网络技术(北京)有限公司 | Traffic information stream construction method and device |
CN110321811A (en) * | 2019-06-17 | 2019-10-11 | 中国工程物理研究院电子工程研究所 | Target detection method in unmanned aerial vehicle video based on deep inverse reinforcement learning |
CN110321811B (en) * | 2019-06-17 | 2023-05-02 | 中国工程物理研究院电子工程研究所 | Target detection method in unmanned aerial vehicle aerial video based on deep inverse reinforcement learning |
CN110238855A (en) * | 2019-06-24 | 2019-09-17 | 浙江大学 | Robot grasping method for randomly stacked workpieces based on deep inverse reinforcement learning |
CN110955239A (en) * | 2019-11-12 | 2020-04-03 | 中国地质大学(武汉) | Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning |
CN110837258B (en) * | 2019-11-29 | 2024-03-08 | 商汤集团有限公司 | Automatic driving control method, device, system, electronic equipment and storage medium |
CN110837258A (en) * | 2019-11-29 | 2020-02-25 | 商汤集团有限公司 | Automatic driving control method, device, system, electronic device and storage medium |
CN111026127B (en) * | 2019-12-27 | 2021-09-28 | 南京大学 | Automatic driving decision method and system based on partially observable transfer reinforcement learning |
CN111026127A (en) * | 2019-12-27 | 2020-04-17 | 南京大学 | Automatic driving decision method and system based on partially observable transfer reinforcement learning |
CN114194211B (en) * | 2021-11-30 | 2023-04-25 | 浪潮(北京)电子信息产业有限公司 | Automatic driving method and device, electronic equipment and storage medium |
CN114194211A (en) * | 2021-11-30 | 2022-03-18 | 浪潮(北京)电子信息产业有限公司 | Automatic driving method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019071909A1 (en) | 2019-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107544516A (en) | Automatic driving system and method based on relative-entropy deep inverse reinforcement learning | |
Zhu et al. | Human-like autonomous car-following model with deep reinforcement learning | |
JP7287707B2 (en) | Driverless vehicle lane change decision method and system based on adversarial imitation learning | |
CN112162555B (en) | Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet | |
EP3719603B1 (en) | Action control method and apparatus | |
US11734828B2 (en) | High quality instance segmentation | |
DE102019113856A1 (en) | SYSTEMS, METHODS AND CONTROLS FOR AN AUTONOMOUS VEHICLE THAT IMPLEMENT AUTONOMOUS DRIVING AGENTS AND GUIDANCE LEARNERS TO CREATE AND IMPROVE GUIDELINES BASED ON THE COLLECTIVE DRIVING EXPERIENCES OF THE AUTONOMOUS DRIVING AGENTS | |
US20210004006A1 (en) | Method and system for predictive control of vehicle using digital images | |
US11580851B2 (en) | Systems and methods for simulating traffic scenes | |
US11891087B2 (en) | Systems and methods for generating behavioral predictions in reaction to autonomous vehicle movement | |
CN112580801B (en) | Reinforced learning training method and decision-making method based on reinforced learning | |
CN103336863A (en) | Radar flight path observation data-based flight intention recognition method | |
Zhao et al. | Personalized car following for autonomous driving with inverse reinforcement learning | |
CN112230675B (en) | Unmanned aerial vehicle task allocation method considering operation environment and performance in collaborative search and rescue | |
US20220153298A1 (en) | Generating Motion Scenarios for Self-Driving Vehicles | |
CN112071062B (en) | Driving time estimation method based on graph convolution network and graph attention network | |
CN109727490A (en) | Adaptive correction prediction method for nearby vehicle behavior based on a driving prediction field | |
CN115494879B (en) | Rotor unmanned aerial vehicle obstacle avoidance method, device and equipment based on reinforcement learning SAC | |
CN110320932A (en) | Flight formation reconfiguration method based on differential evolution algorithm | |
DE102021114724A1 (en) | IMPROVED VEHICLE OPERATION | |
CN108985488A (en) | Method for predicting individual trip purpose | |
CN111580526A (en) | Cooperative driving method for fixed vehicle formation scene | |
CN111310919A (en) | Driving control strategy training method based on scene segmentation and local path planning | |
CN114167898B (en) | Global path planning method and system for collecting data of unmanned aerial vehicle | |
CN107767036A (en) | Real-time traffic state estimation method based on conditional random fields |
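The relative-entropy inverse reinforcement learning named in this application's title (and in several of the similar documents above) can be illustrated with a minimal sketch. This is not code from the patent: it assumes a reward that is linear in handcrafted trajectory features and a uniform baseline sampling policy, following the general model-free relative-entropy IRL scheme, where reward weights are fit by matching expert feature expectations against importance-weighted expectations over sampled trajectories. The function name and parameters are illustrative only.

```python
import numpy as np

def relative_entropy_irl(expert_feats, sampled_feats, n_iters=200, lr=0.1):
    """Toy model-free relative-entropy IRL sketch (assumptions noted above).

    expert_feats : (n_expert, d) feature counts of demonstrated trajectories
    sampled_feats: (n_samples, d) feature counts of trajectories drawn from a
                   baseline sampling policy (assumed uniform here)
    Returns learned reward weights theta, with reward(traj) = theta @ features.
    """
    d = expert_feats.shape[1]
    theta = np.zeros(d)
    f_expert = expert_feats.mean(axis=0)  # expert feature expectation
    for _ in range(n_iters):
        # Importance weights: proportional to exp(theta . f) under a uniform baseline.
        w = np.exp(sampled_feats @ theta)
        w /= w.sum()
        f_model = w @ sampled_feats       # importance-weighted model expectation
        # Subgradient ascent on the dual: push model features toward expert features.
        theta += lr * (f_expert - f_model)
    return theta
```

On toy data where demonstrations concentrate on one feature, the learned weight for that feature ends up larger than the others, which is the qualitative behavior an IRL-based driving-strategy module relies on when recovering a reward from stored historical trajectories.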
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | ||
Effective date of registration: 2020-12-28
Address after: Building C4, Hongfeng Science Park, Nanjing Economic and Technological Development Zone, Jiangsu Province, 210034
Applicant after: NANQI XIANCE (NANJING) TECHNOLOGY Co.,Ltd.
Address before: No. 8, Jixue Road, Xiangcheng District, Suzhou City, Jiangsu Province, 215006
Applicant before: Suzhou University
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180105 |