CN107544516A - Autonomous driving system and method based on relative-entropy deep inverse reinforcement learning - Google Patents

Autonomous driving system and method based on relative-entropy deep inverse reinforcement learning

Info

Publication number
CN107544516A
CN107544516A (application CN201710940590.XA)
Authority
CN
China
Prior art keywords
driving
road information
strategy
relative entropy
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710940590.XA
Other languages
Chinese (zh)
Inventor
林嘉豪
章宗长
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NANQI XIANCE (NANJING) TECHNOLOGY Co.,Ltd.
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201710940590.XA priority Critical patent/CN107544516A/en
Publication of CN107544516A publication Critical patent/CN107544516A/en
Priority to PCT/CN2018/078740 priority patent/WO2019071909A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 - Control of position or course in two dimensions

Abstract

The present invention relates to an autonomous driving system based on relative-entropy deep inverse reinforcement learning, comprising: (1) a client, which presents driving strategies; (2) a driving basic-data acquisition subsystem, which collects road information; and (3) a storage module, which is connected to the client and the driving basic-data acquisition subsystem and stores the road information collected by that subsystem. The driving basic-data acquisition subsystem collects road information and transmits it to the client and the storage module; the storage module receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectories to compute and simulate driving strategies, and transmits those strategies to the client for the user to choose from; the client receives the road information and carries out automatic driving according to the user's selection. The system uses a relative-entropy deep inverse reinforcement learning algorithm to achieve model-free automatic driving.

Description

Autonomous driving system and method based on relative-entropy deep inverse reinforcement learning
Technical field
The present invention relates to an autonomous driving system and method based on relative-entropy deep inverse reinforcement learning, and belongs to the technical field of autonomous driving.
Background technology
With the growing number of automobiles in China, road traffic congestion has become increasingly severe and the annual number of traffic accidents keeps rising. To better address this problem, it is necessary to develop automatic vehicle control systems. Moreover, as people pursue a higher quality of life and wish to be freed from tiring driving activity, autonomous driving technology has emerged.
One existing automatic vehicle control system recognizes the driving environment through a camera mounted in the cab and an image-recognition system; an on-board main control computer, a GPS positioning system, and path-planning software then navigate the vehicle using information such as pre-stored road maps, planning a reasonable driving path between the vehicle's current position and the destination and guiding the vehicle to that destination.
In the above automatic vehicle control system, because the road map is pre-stored in the vehicle, updating its data depends on manual operation by the driver, so the update frequency cannot be guaranteed. Even if the driver updates it promptly, the latest road information available from existing resources may still fail to reflect current road conditions, ultimately leading to unreasonable routes and low navigation accuracy, which inconveniences driving. Furthermore, most automatic vehicle control systems in the field of autonomous driving still require manual intervention and cannot achieve fully automatic driving.
Summary of the invention
An object of the present invention is to provide an autonomous driving system and method based on relative-entropy deep inverse reinforcement learning which, using a deep neural network and the historical driving-trajectory data of a user, derives a variety of driving strategies representing personal driving habits and uses these strategies to achieve personalized, intelligent automatic driving.
To achieve the above object, the present invention provides the following technical solution: an autonomous driving system based on relative-entropy deep inverse reinforcement learning, the system comprising:
a client, which presents driving strategies;
a driving basic-data acquisition subsystem, which collects road information;
a storage module, which is connected to the client and the driving basic-data acquisition subsystem and stores the road information collected by the driving basic-data acquisition subsystem;
wherein the driving basic-data acquisition subsystem collects road information and transmits it to the client and the storage module; the storage module receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate driving strategies, and transmits the driving strategies to the client for the user to choose from; and the client receives the road information and implements automatic driving according to the driving strategy selected to match the user's preferences.
Further, the storage module comprises a driving-trajectory library for storing historical driving trajectories, a trajectory-information processing subsystem that computes and simulates driving strategies from driving trajectories and driving habits, and a driving-strategy library that stores the driving strategies; the driving-trajectory library transfers trajectory data to the trajectory-information processing subsystem, which analyzes the data to compute and simulate driving strategies and transfers them to the driving-strategy library, which receives and stores them.
Further, the trajectory-information processing subsystem computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm.
Further, the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning inside an expectation-maximization (EM) framework to compute the parameters of multiple reward functions.
Further, the driving basic-data acquisition subsystem comprises sensors for collecting road information.
The present invention also provides a method of automatic driving based on relative-entropy deep inverse reinforcement learning, the method comprising the following steps:
S1: collecting road information and transmitting it to a client and a storage module;
S2: the storage module receiving the road information, storing a continuous segment of it as a historical trajectory, analyzing the historical trajectory to compute and simulate a variety of driving strategies, and passing the driving strategies to the client;
S3: the client receiving the road information and the driving strategies, and implementing automatic driving according to the road information and the personalized driving strategy selected by the user.
Further, the storage module comprises a driving-trajectory library for storing historical driving trajectories, a trajectory-information processing subsystem that computes and simulates driving strategies from driving plans and driving habits, and a driving-strategy library that stores the driving strategies; the driving-trajectory library transfers trajectory data to the trajectory-information processing subsystem, which analyzes the data to compute and simulate driving strategies and transfers them to the driving-strategy library, which receives and stores them.
Further, the trajectory-information processing subsystem computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm.
Further, the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning inside an EM framework to compute the parameters of multiple reward functions.
The beneficial effects of the present invention are: by providing a driving basic-data acquisition subsystem in the system, road information is collected in real time and passed to the storage module; after receiving the road information, the storage module stores a continuous segment of it as a historical trajectory and simulates driving strategies from the historical driving trajectories, achieving personalized, intelligent automatic driving.
The above is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be better understood and practiced according to the contents of the specification, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flow chart of the autonomous driving system and method based on relative-entropy deep inverse reinforcement learning of the present invention.
Fig. 2 is a schematic diagram of a Markov decision process (MDP).
Detailed description of the embodiments
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present invention but not to limit its scope.
Referring to Fig. 1, the autonomous driving system based on relative-entropy deep inverse reinforcement learning of a preferred embodiment of the present invention comprises:
a client 1, which presents driving strategies;
a driving basic-data acquisition subsystem 2, which collects road information;
a storage module 3, which is connected to the client 1 and the driving basic-data acquisition subsystem 2 and stores the road information collected by the driving basic-data acquisition subsystem 2;
wherein the driving basic-data acquisition subsystem 2 collects road information and transmits it to the client 1 and the storage module 3; the storage module 3 receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate driving strategies, and transmits the driving strategies to the client 1 for the user to choose from; and the client 1 receives the road information and implements automatic driving according to the personalized driving strategy selected by the user. In this embodiment, the storage module 3 is a cloud.
The most important function of the client 1 is to carry out the human-machine interaction with the user, offering a choice among a variety of personalized, intelligent driving strategies and services. According to the user's personal preference, the client 1 downloads the corresponding driving strategy from the driving-strategy library 33 of the cloud 3, then makes real-time driving decisions according to the driving strategy and the basic data, achieving real-time unmanned driving control.
The driving basic-data acquisition subsystem 2 collects road information through sensors (not shown). The collected information serves two purposes: it is passed to the client 1 to provide basic data for current driving decisions, and it is communicated to the driving-trajectory library 31 of the cloud 3, where it is stored as the user's historical driving-trajectory data.
The cloud 3 comprises a driving-trajectory library 31 storing historical driving trajectories, a trajectory-information processing subsystem 32 that computes and simulates driving strategies from driving plans and driving habits, and a driving-strategy library 33 that stores the driving strategies. The driving-trajectory library 31 transfers trajectory data to the trajectory-information processing subsystem 32, which analyzes the data to compute and simulate driving strategies and transfers them to the driving-strategy library 33, which receives and stores them. The trajectory-information processing subsystem 32 computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm. In this embodiment, the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning inside an EM framework to compute the parameters of multiple reward functions. The historical driving trajectories include expert historical driving trajectories and the user's own historical trajectories.
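The cloud-side data flow described above can be sketched as follows. This is a minimal illustration under our own naming (the `Cloud` class, its method names, and the placeholder strategy string are all ours, not the patent's); the `process_trajectories` step merely stands in for the multi-objective relative-entropy deep inverse reinforcement learning computation of subsystem 32.

```python
from dataclasses import dataclass, field

@dataclass
class Cloud:
    trajectory_library: list = field(default_factory=list)   # element 31
    strategy_library: dict = field(default_factory=dict)     # element 33

    def upload_trajectory(self, user_id, trajectory):
        # Road information uploaded after each drive is stored as history.
        self.trajectory_library.append((user_id, trajectory))

    def process_trajectories(self):
        # Element 32: placeholder for multi-objective relative-entropy deep IRL;
        # here each user's trajectories simply map to a named strategy.
        for user_id, _ in self.trajectory_library:
            self.strategy_library.setdefault(user_id, f"strategy-for-{user_id}")

    def download_strategy(self, user_id):
        # The client downloads a precomputed strategy; no computation runs here.
        return self.strategy_library.get(user_id)

cloud = Cloud()
cloud.upload_trajectory("alice", [("s0", "a0"), ("s1", "a1")])
cloud.process_trajectories()
```

Note that all strategy computation happens inside the cloud object; the client only selects and downloads, mirroring the division of labour the embodiment describes.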
Inverse reinforcement learning (IRL) refers to the problem, in a Markov decision process (MDP) with a known environment, of the reward function R being unknown. In an ordinary reinforcement learning (RL) problem, the known environment, a given reward function R, and the Markov property are used to estimate the value Q of a state-action pair (s, a) (also called the action's cumulative reward value); the converged state-action values Q(s, a) are then used to derive a policy π, with which the agent makes decisions. In practice, the reward function R is often extremely difficult to know, while a set of expert trajectories T_N is comparatively easy to obtain. In an MDP whose reward function is unknown, denoted MDP/R, the problem of recovering the reward function R from the expert trajectories T_N is called the inverse reinforcement learning problem (IRL).
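The forward RL problem that IRL inverts can be illustrated with a toy value iteration: given a known reward R and transition model T, converge Q(s, a) and read off the greedy policy π. The MDP sizes and random values below are illustrative assumptions only, not anything specified by the patent.

```python
import numpy as np

# Toy forward-RL step that IRL inverts: with R and T known, value iteration
# yields Q(s, a) and a greedy policy π.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # R(s, a)

Q = np.zeros((n_states, n_actions))
for _ in range(500):
    V = Q.max(axis=1)              # V(s) = max_a Q(s, a)
    Q = R + gamma * T @ V          # Bellman optimality backup
policy = Q.argmax(axis=1)          # greedy π from the converged Q(s, a)
```

IRL runs this in reverse: the trajectories that such a policy would generate are observed, and R is the unknown to be recovered.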
In this embodiment, the known user historical driving-trajectory data in the driving-trajectory library 31 is used to perform relative-entropy deep inverse reinforcement learning, recovering the reward functions R of a variety of user preferences and then simulating the corresponding driving strategies π. The relative-entropy deep inverse reinforcement learning algorithm is model-free: it does not require the state-transition function T(s, a, s′) of the environment model to be known, because relative-entropy inverse reinforcement learning can use importance sampling to avoid T(s, a, s′) in its computation.
In this embodiment, the automatic-driving decision process of a car is a Markov decision process without a reward function, MDP/R, expressed as the set {state space S, action space A, state transition probability T} defining the environment (the requirement of knowing the transition probability T is dropped). The value function (cumulative reward) of the car agent can be expressed as V^π(s) = E[Σ_{t=0}^∞ γ^t R_θ(s_t, a_t) | s_0 = s], and its state-action value function as Q(s, a) = R_θ(s, a) + γ E_{T(s,a,s′)}[V(s′)]. To handle more complex real driving problems, the reward function is no longer assumed to be a simple linear combination, but is modeled as a deep neural network R(s, a, θ) = g_1(g_2(…(g_n(f(s, a), θ_n), …), θ_2), θ_1), where f(s, a) represents the road feature information of the driving situation at (s, a) and θ_i denotes the parameters of the i-th layer of the deep neural network.
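A minimal sketch of the nested reward network R(s, a, θ) = g_1(g_2(…g_n(f(s, a), θ_n)…)) follows. The feature dimension, layer widths, and tanh activations are our assumptions; the patent specifies only the nested-composition form.

```python
import numpy as np

rng = np.random.default_rng(1)
layer_sizes = [6, 16, 16, 1]   # f(s, a) has 6 features; output is a scalar reward
theta = [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
         for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def reward(features, theta):
    """R(s, a, θ): forward pass g_1∘g_2∘…∘g_n applied to the features f(s, a)."""
    h = features
    for W, b in theta[:-1]:
        h = np.tanh(h @ W + b)        # hidden layers g_n, …, g_2
    W, b = theta[-1]
    return (h @ W + b).squeeze(-1)    # linear output layer g_1

f_sa = rng.normal(size=(4, 6))        # f(s, a) for 4 state-action pairs (toy data)
r = reward(f_sa, theta)               # one scalar reward per pair
```

The parameters θ_i here are exactly what the EM/relative-entropy updates described later would adjust by back-propagation.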
Meanwhile, to accommodate more personalized and more intelligent real driving scenarios, multiple reward functions R (targets) are assumed to exist simultaneously, representing the user's different driving habits. Suppose G reward functions exist; let the prior probability distribution over these G reward functions be ρ_1, …, ρ_G and their reward weights be θ_1, …, θ_G, and let Θ = (ρ_1, …, ρ_G, θ_1, …, θ_G) denote the parameter set of the G reward functions.
Referring to Fig. 2, under the condition that hypothesized reward functions are known (by initialization or obtained through iteration), the problem can now be described as a complete Markov decision process (MDP). Under this complete MDP, according to reinforcement learning, the reward function R(s, a, θ) = g_1(g_2(…(g_n(f, θ_n), …), θ_2), θ_1) can be used to evaluate the V-values and Q-values. For the evaluation algorithm of reinforcement learning, a new soft maximization operator (MellowMax) is used to estimate the expectation of the V-values. The MellowMax operator is defined as mm_ω(x) = log((1/n) Σ_{i=1}^n e^{ω x_i}) / ω. MellowMax is a better-behaved operator: it guarantees that the estimate of the V-values converges to a unique point. MellowMax also offers a principled probability-assignment mechanism and expectation-estimation method. In this embodiment, a reinforcement learning algorithm combined with MellowMax is more reasonable in its exploration and exploitation of the environment during automatic driving. It ensures that, by the time the reinforcement learning process converges, the automated driving system has learned the various scenarios sufficiently and can produce a relatively sound assessment of the current state.
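The MellowMax operator above can be implemented directly; the shift by the maximum is a standard numerical-stability trick, and the temperature ω = 5.0 below is an arbitrary illustrative choice.

```python
import numpy as np

def mellowmax(x, omega=5.0):
    # mm_ω(x) = log((1/n)·Σ exp(ω·x_i)) / ω: a soft maximum that, unlike the
    # Boltzmann operator, is a non-expansion, so value estimates converge to
    # a unique fixed point.
    x = np.asarray(x, dtype=float)
    c = x.max()                                       # shift for stability
    return c + np.log(np.exp(omega * (x - c)).mean()) / omega

q = [1.0, 2.0, 3.0]
soft_v = mellowmax(q)          # lies strictly between mean(q) and max(q)
```

As ω grows, mm_ω approaches the hard maximum; as ω shrinks toward 0, it approaches the mean, which is what makes it useful for smoothed value estimation.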
In this embodiment, combining the soft maximization operator MellowMax yields a more principled evaluation of the expected feature values of states. Using MellowMax, the action-selection probability distribution can be taken as π(a|s) = e^{ω Q(s,a)} / Σ_{a′} e^{ω Q(s,a′)}. Under this soft-maximized action-selection rule, the iterative process of reinforcement learning yields the feature expectation μ obtainable under the reward function formed by the current deep-network parameters θ. μ can be understood as the cumulative expectation of the features.
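The soft action-selection rule can be sketched as a Boltzmann distribution over Q-values. Note this is a simplification: the exact maximum-entropy MellowMax policy solves a per-state root-finding problem for its temperature, whereas here ω is reused directly as the temperature.

```python
import numpy as np

def soft_policy(q_values, omega=5.0):
    # π(a|s) ∝ exp(ω·Q(s, a)); the shift by max(q) avoids overflow and
    # leaves the normalised probabilities unchanged.
    q = np.asarray(q_values, dtype=float)
    z = np.exp(omega * (q - q.max()))
    return z / z.sum()

p = soft_policy([1.0, 2.0, 3.0])   # probabilities increase with Q(s, a)
```

Feature expectations μ would then be accumulated by rolling out trajectories under this stochastic policy and averaging the features f(s, a) encountered.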
In this embodiment, the above multi-objective inverse reinforcement learning problem with hidden variables is solved with the EM algorithm. The EM algorithm consists of an E step and an M step; by iterating the E and M steps continuously, it approaches the maximum of the likelihood estimate.
E step: first compute z_ij = ρ_j Pr(τ_i | θ_j) / Z, where Z is the normalizing term. z_ij denotes the probability that the i-th driving trajectory belongs to driving habit (reward function) j.
Let y_i = j denote that the i-th driving trajectory belongs to driving habit j, and let the set y = (y_1, …, y_N) denote the memberships of the N driving trajectories.
Compute the likelihood estimate Q(Θ, Θ^t) = Σ_y L(Θ | D, y) Pr(y | D, Θ^t) (the Q function Q(Θ, Θ^t) referred to here is the objective function updated by the EM algorithm; note that it is distinct from the state-action value function Q of reinforcement learning). Working this out gives the likelihood estimate Q(Θ, Θ^t) = Σ_{i=1}^N Σ_{j=1}^G z_ij (log ρ_j + log Pr(τ_i | θ_j)).
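The E-step computation of the responsibilities z_ij can be sketched as follows, working in log space for numerical stability. The trajectory log-likelihoods log Pr(τ_i | θ_j) are assumed given; in the full system they would come from the relative-entropy trajectory model, which is not reproduced here.

```python
import numpy as np

def e_step(log_lik, priors):
    # z_ij = ρ_j · Pr(τ_i | θ_j) / Z, with Z normalising each row.
    log_post = np.log(priors)[None, :] + log_lik       # log ρ_j + log Pr(τ_i|θ_j)
    log_post -= log_post.max(axis=1, keepdims=True)    # stabilise before exp
    z = np.exp(log_post)
    return z / z.sum(axis=1, keepdims=True)            # each row sums to 1

log_lik = np.log(np.array([[0.9, 0.1],
                           [0.2, 0.8]]))               # 2 trajectories, G = 2 habits
z = e_step(log_lik, priors=np.array([0.5, 0.5]))
```

With equal priors, each row of z is just the normalised per-habit likelihood of that trajectory, which matches the intuition that z_ij is a soft cluster assignment.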
M step: choose a suitable multi-habit parameter set Θ (the ρ_l and θ_l) that maximizes the likelihood estimate Q(Θ, Θ^t) from the E step. Since ρ_l and θ_l are mutually independent, their maximizations can be carried out separately. The first part yields the prior update ρ_l = (1/N) Σ_{i=1}^N z_il.
For the latter part of maximizing Q(Θ, Θ^t), the update target is max_{θ_l} Σ_{i=1}^N z_il log Pr(τ_i | θ_l). This can be understood as the maximum-likelihood equation of the observed trajectory set under the condition that the parameter of the l-th cluster target is θ_l. This maximum-likelihood equation can be solved using the machinery of relative-entropy deep inverse reinforcement learning: the relative-entropy solution formula, while satisfying the maximum-likelihood update target, can be applied naturally to the back-propagation update of the deep neural network parameters. Let the maximization objective of the deep neural network be L(θ) = log P(D, θ | r); by the decomposition of the joint likelihood, L(θ) = log P(D, θ | r) = log P(D | r) + log P(θ). Taking the partial derivative of this joint likelihood objective gives ∂L/∂θ = ∂log P(D | r)/∂θ + ∂log P(θ)/∂θ. The first half of this derivative can be further decomposed and expressed as ∂log P(D | r)/∂θ = (∂log P(D | r)/∂r) · (∂r/∂θ).
According to relative-entropy inverse reinforcement learning, the first factor evaluates, under the current reward function, to the difference between the expert feature expectation and the policy feature expectation, ∂log P(D | r)/∂r = μ_D − μ. Using importance sampling, μ ≈ Σ_k w_k f(τ_k) / Σ_k w_k with weights w_k = e^{R_θ(τ_k)} / π(τ_k), where π is a given sampling policy from which a number of trajectories τ = s_1 a_1, …, s_H a_H are drawn. Further, the second factor ∂r/∂θ is the gradient with respect to the hidden-layer parameters of the deep neural network, computed by the back-propagation algorithm.
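The importance-sampled gradient μ_D − μ can be sketched as below; all the input arrays are illustrative stand-ins for quantities the full pipeline would supply (expert feature expectations, sampled trajectory features f(τ_k), trajectory rewards R_θ(τ_k), and sampling-policy log-probabilities log π(τ_k)).

```python
import numpy as np

def feature_gradient(mu_expert, sample_features, sample_rewards, sample_logpi):
    # Self-normalised importance sampling: w_k ∝ exp(R_θ(τ_k)) / π(τ_k),
    # computed in log space and shifted by the max for stability.
    log_w = sample_rewards - sample_logpi
    log_w -= log_w.max()
    w = np.exp(log_w)
    w /= w.sum()
    mu = w @ sample_features          # μ ≈ Σ_k w_k · f(τ_k)
    return mu_expert - mu             # μ_D − μ, chained with ∂r/∂θ by backprop

rng = np.random.default_rng(2)
grad = feature_gradient(
    mu_expert=np.array([1.0, 0.5]),
    sample_features=rng.normal(size=(10, 2)),   # f(τ_k) for 10 sampled trajectories
    sample_rewards=rng.normal(size=10),         # R_θ(τ_k)
    sample_logpi=rng.normal(size=10),           # log π(τ_k)
)
```

Because the weights depend only on rewards and sampling probabilities along the trajectory, no transition model T(s, a, s′) is needed, which is the model-free property the embodiment emphasizes.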
The completion of this gradient update marks the completion of one iteration of relative-entropy deep inverse reinforcement learning. The deep-network reward function with the newly updated parameters produces a new policy π, and a new iteration begins.
The E step and M step are iterated continuously until the likelihood estimate Q(Θ, Θ^t) converges to its maximum. The parameter set Θ = (ρ_1, …, ρ_G, θ_1, …, θ_G) obtained at that point is exactly the prior distribution and the weights of the reward functions representing the multiple driving habits that we set out to solve for.
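The outer EM loop can be illustrated end to end with a toy likelihood model. Here a one-dimensional Gaussian per habit stands in for the deep-IRL trajectory likelihood Pr(τ | θ), which keeps the sketch self-contained while preserving the E-step/M-step alternation and the convergence test on the likelihood; every numeric choice below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy "trajectories": scalar observations drawn from two driving habits.
data = np.concatenate([rng.normal(-2, 0.5, 50), rng.normal(2, 0.5, 50)])
G = 2
rho = np.full(G, 1.0 / G)        # priors ρ_1..ρ_G
theta = np.array([-1.0, 1.0])    # one parameter per habit (here: a cluster mean)

prev_ll = -np.inf
for _ in range(100):
    # E step: z_ij ∝ ρ_j · Pr(x_i | θ_j) with a unit-variance Gaussian likelihood.
    lik = np.exp(-0.5 * (data[:, None] - theta[None, :]) ** 2)
    joint = rho[None, :] * lik
    z = joint / joint.sum(axis=1, keepdims=True)
    # M step: ρ_j = (1/N)·Σ_i z_ij; θ_j = responsibility-weighted mean.
    rho = z.mean(axis=0)
    theta = (z * data[:, None]).sum(axis=0) / z.sum(axis=0)
    # Stop once the log-likelihood Q(Θ, Θ^t) stops improving.
    ll = np.log(joint.sum(axis=1)).sum()
    if ll - prev_ll < 1e-8:
        break
    prev_ll = ll
```

In the patent's setting, the weighted mean in the M step would be replaced by the relative-entropy gradient update of the deep-network parameters θ_l; the alternation and the convergence criterion are unchanged.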
In this embodiment, from this parameter set Θ, reinforcement learning (RL) computations yield the driving strategy π for each driving habit R. The multiple driving strategies are output and saved in the driving-strategy library in the cloud. The user can select a personalized, intelligent driving strategy on the client.
The present invention also provides a method of automatic driving based on relative-entropy deep inverse reinforcement learning, the method comprising the following steps:
S1: collecting road information and transmitting it to a client and a storage module;
S2: the storage module receiving the road information, analyzing it to compute and simulate a variety of driving strategies, and passing the driving strategies to the client;
S3: the client receiving the road information and the driving strategies, and implementing automatic driving according to the road information and the personalized driving strategy selected by the user.
In summary: by providing a driving basic-data acquisition subsystem 2 in the system, road information is collected in real time and passed to the storage module 3 and the client 1; after receiving the road information, the storage module 3 simulates driving strategies from the historical driving trajectories, achieving personalized, intelligent automatic driving.
In automatic driving based on this method, the driving strategies are all computed in the cloud 3 rather than running the computation on the client 1. By the time the user needs automatic driving, all driving strategies have already been completed in the cloud 3. The user only needs to choose and download the required driving strategy, and the vehicle can then carry out real-time automatic driving according to the selected driving strategy and real-time road information. Meanwhile, after any drive is completed, the large amount of road information uploaded to the cloud 3 is stored as historical driving trajectories. Using the stored big data of historical driving trajectories, the driving-strategy library is then updated. Using the trajectory-information big data, the system will achieve automatic driving that comes ever closer to the user's needs.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims (9)

1. An autonomous driving system based on relative-entropy deep inverse reinforcement learning, characterized in that the system comprises:
a client, which presents driving strategies;
a driving basic-data acquisition subsystem, which collects road information;
a storage module, which is connected to the client and the driving basic-data acquisition subsystem and stores the road information collected by the driving basic-data acquisition subsystem;
wherein the driving basic-data acquisition subsystem collects road information and transmits it to the client and the storage module; the storage module receives the road information, stores a continuous segment of it as a historical trajectory, analyzes the historical trajectory to compute and simulate driving strategies, and transmits the driving strategies to the client for the user to choose from; and the client receives the road information and implements automatic driving according to the driving strategy selected to match the user's preferences.
2. The autonomous driving system based on relative-entropy deep inverse reinforcement learning according to claim 1, characterized in that the storage module comprises a driving-trajectory library for storing historical driving trajectories, a trajectory-information processing subsystem that computes and simulates driving strategies from driving trajectories and driving habits, and a driving-strategy library that stores driving strategies; the driving-trajectory library transfers trajectory data to the trajectory-information processing subsystem, which analyzes the data to compute and simulate driving strategies and transfers them to the driving-strategy library, which receives and stores them.
3. The autonomous driving system based on relative-entropy deep inverse reinforcement learning according to claim 2, characterized in that the trajectory-information processing subsystem computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm.
4. The autonomous driving system based on relative-entropy deep inverse reinforcement learning according to claim 3, characterized in that the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning inside an EM framework to compute the parameters of multiple reward functions.
5. The autonomous driving system based on relative-entropy deep inverse reinforcement learning according to claim 1, characterized in that the driving basic-data acquisition subsystem comprises sensors for collecting road information.
6. A method of automatic driving based on relative-entropy deep inverse reinforcement learning, characterized in that the method comprises the following steps:
S1: collecting road information and transmitting it to a client and a storage module;
S2: the storage module receiving the road information, storing a continuous segment of it as a historical trajectory, analyzing the historical trajectory to compute and simulate a variety of driving strategies, and passing the driving strategies to the client;
S3: the client receiving the road information and the driving strategies, and implementing automatic driving according to the road information and the personalized driving strategy selected by the user.
7. The method of automatic driving based on relative-entropy deep inverse reinforcement learning according to claim 6, characterized in that the storage module comprises a driving-trajectory library for storing historical driving trajectories, a trajectory-information processing subsystem that computes and simulates driving strategies from driving plans and driving habits, and a driving-strategy library that stores driving strategies; the driving-trajectory library transfers trajectory data to the trajectory-information processing subsystem, which analyzes the data to compute and simulate driving strategies and transfers them to the driving-strategy library, which receives and stores them.
8. The method of automatic driving based on relative-entropy deep inverse reinforcement learning according to claim 7, characterized in that the trajectory-information processing subsystem computes and simulates driving strategies using a multi-objective relative-entropy deep inverse reinforcement learning algorithm.
9. The method of automatic driving based on relative-entropy deep inverse reinforcement learning according to claim 8, characterized in that the multi-objective inverse reinforcement learning algorithm nests relative-entropy deep inverse reinforcement learning inside an EM framework to compute the parameters of multiple reward functions.
CN201710940590.XA 2017-10-11 2017-10-11 Autonomous driving system and method based on relative-entropy deep inverse reinforcement learning Pending CN107544516A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710940590.XA CN107544516A (en) 2017-10-11 2017-10-11 Autonomous driving system and method based on relative-entropy deep inverse reinforcement learning
PCT/CN2018/078740 WO2019071909A1 (en) 2017-10-11 2018-03-12 Automatic driving system and method based on relative-entropy deep inverse reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710940590.XA CN107544516A (en) 2017-10-11 2017-10-11 Autonomous driving system and method based on relative-entropy deep inverse reinforcement learning

Publications (1)

Publication Number Publication Date
CN107544516A true CN107544516A (en) 2018-01-05

Family

ID=60967749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710940590.XA Pending CN107544516A (en) Automatic driving system and method based on relative-entropy deep inverse reinforcement learning

Country Status (2)

Country Link
CN (1) CN107544516A (en)
WO (1) WO2019071909A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110673602B (en) * 2019-10-24 2022-11-25 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
TWI737437B (en) * 2020-08-07 2021-08-21 財團法人車輛研究測試中心 Trajectory determination method
US20230143937A1 (en) * 2021-11-10 2023-05-11 International Business Machines Corporation Reinforcement learning with inductive logic programming

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140278052A1 (en) * 2013-03-15 2014-09-18 Caliper Corporation Lane-level vehicle navigation for vehicle routing and traffic management
CN106842925A (en) * 2017-01-20 2017-06-13 清华大学 A kind of locomotive smart steering method and system based on deeply study
CN107074178A (en) * 2014-09-16 2017-08-18 本田技研工业株式会社 Drive assistance device
CN107084735A (en) * 2017-04-26 2017-08-22 电子科技大学 Guidance path framework suitable for reducing redundancy navigation
CN107169567A (en) * 2017-03-30 2017-09-15 深圳先进技术研究院 The generation method and device of a kind of decision networks model for Vehicular automatic driving
CN107200017A (en) * 2017-05-22 2017-09-26 北京联合大学 A kind of automatic driving vehicle control system based on deep learning
CN107229973A (en) * 2017-05-12 2017-10-03 中国科学院深圳先进技术研究院 The generation method and device of a kind of tactful network model for Vehicular automatic driving

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699717A (en) * 2013-12-03 2014-04-02 重庆交通大学 Complex road automobile traveling track predication method based on foresight cross section point selection
CN105718750B (en) * 2016-01-29 2018-08-17 长沙理工大学 A kind of prediction technique and system of vehicle driving trace
CN107544516A (en) * 2017-10-11 2018-01-05 苏州大学 Automated driving system and method based on relative entropy depth against intensified learning

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460015B (en) * 2017-09-06 2022-04-15 通用汽车环球科技运作有限责任公司 Unsupervised learning agent for autonomous driving applications
CN109460015A (en) * 2017-09-06 2019-03-12 通用汽车环球科技运作有限责任公司 Unsupervised learning agency for autonomous driving application
WO2019071909A1 (en) * 2017-10-11 2019-04-18 苏州大学张家港工业技术研究院 Automatic driving system and method based on relative-entropy deep inverse reinforcement learning
WO2019237474A1 (en) * 2018-06-11 2019-12-19 苏州大学 Partially-observable automatic driving decision-making method and system based on constraint online planning
WO2020000192A1 (en) * 2018-06-26 2020-01-02 Psa Automobiles Sa Method for providing vehicle trajectory prediction
CN110654372B (en) * 2018-06-29 2021-09-03 比亚迪股份有限公司 Vehicle driving control method and device, vehicle and storage medium
CN110654372A (en) * 2018-06-29 2020-01-07 比亚迪股份有限公司 Vehicle driving control method and device, vehicle and storage medium
CN110850861B (en) * 2018-07-27 2023-05-23 通用汽车环球科技运作有限责任公司 Attention-based hierarchical lane-changing depth reinforcement learning
CN110850861A (en) * 2018-07-27 2020-02-28 通用汽车环球科技运作有限责任公司 Attention-based hierarchical lane change depth reinforcement learning
CN109636432B (en) * 2018-09-28 2023-05-30 创新先进技术有限公司 Computer-implemented item selection method and apparatus
CN109636432A (en) * 2018-09-28 2019-04-16 阿里巴巴集团控股有限公司 The project selection method and device that computer executes
CN111159832A (en) * 2018-10-19 2020-05-15 百度在线网络技术(北京)有限公司 Construction method and device of traffic information flow
CN111159832B (en) * 2018-10-19 2024-04-02 百度在线网络技术(北京)有限公司 Traffic information stream construction method and device
CN110321811A (en) * 2019-06-17 2019-10-11 中国工程物理研究院电子工程研究所 Depth is against the object detection method in the unmanned plane video of intensified learning
CN110321811B (en) * 2019-06-17 2023-05-02 中国工程物理研究院电子工程研究所 Target detection method in unmanned aerial vehicle aerial video for deep reverse reinforcement learning
CN110238855A (en) * 2019-06-24 2019-09-17 浙江大学 A kind of robot random ordering workpiece grabbing method based on the reverse intensified learning of depth
CN110955239A (en) * 2019-11-12 2020-04-03 中国地质大学(武汉) Unmanned ship multi-target trajectory planning method and system based on inverse reinforcement learning
CN110837258B (en) * 2019-11-29 2024-03-08 商汤集团有限公司 Automatic driving control method, device, system, electronic equipment and storage medium
CN110837258A (en) * 2019-11-29 2020-02-25 商汤集团有限公司 Automatic driving control method, device, system, electronic device and storage medium
CN111026127B (en) * 2019-12-27 2021-09-28 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
CN111026127A (en) * 2019-12-27 2020-04-17 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
CN114194211B (en) * 2021-11-30 2023-04-25 浪潮(北京)电子信息产业有限公司 Automatic driving method and device, electronic equipment and storage medium
CN114194211A (en) * 2021-11-30 2022-03-18 浪潮(北京)电子信息产业有限公司 Automatic driving method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2019071909A1 (en) 2019-04-18

Similar Documents

Publication Publication Date Title
CN107544516A Automatic driving system and method based on relative-entropy deep inverse reinforcement learning
Zhu et al. Human-like autonomous car-following model with deep reinforcement learning
JP7287707B2 (en) Driverless vehicle lane change decision method and system based on adversarial imitation learning
CN112162555B (en) Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
EP3719603B1 (en) Action control method and apparatus
US11734828B2 (en) High quality instance segmentation
DE102019113856A1 (en) SYSTEMS, METHODS AND CONTROLS FOR AN AUTONOMOUS VEHICLE THAT IMPLEMENT AUTONOMOUS DRIVING AGENTS AND GUIDANCE LEARNERS TO CREATE AND IMPROVE GUIDELINES BASED ON THE COLLECTIVE DRIVING EXPERIENCES OF THE AUTONOMOUS DRIVING AGENTS
US20210004006A1 (en) Method and system for predictive control of vehicle using digital images
US11580851B2 (en) Systems and methods for simulating traffic scenes
US11891087B2 (en) Systems and methods for generating behavioral predictions in reaction to autonomous vehicle movement
CN112580801B (en) Reinforced learning training method and decision-making method based on reinforced learning
CN103336863A (en) Radar flight path observation data-based flight intention recognition method
Zhao et al. Personalized car following for autonomous driving with inverse reinforcement learning
CN112230675B (en) Unmanned aerial vehicle task allocation method considering operation environment and performance in collaborative search and rescue
US20220153298A1 (en) Generating Motion Scenarios for Self-Driving Vehicles
CN112071062B (en) Driving time estimation method based on graph convolution network and graph attention network
CN109727490A Adaptive correction prediction method for nearby-vehicle behavior based on a driving prediction field
CN115494879B (en) Rotor unmanned aerial vehicle obstacle avoidance method, device and equipment based on reinforcement learning SAC
CN110320932A Formation reconfiguration method based on a differential evolution algorithm
DE102021114724A1 (en) IMPROVED VEHICLE OPERATION
CN108985488A Method for predicting individual trip purposes
CN111580526A (en) Cooperative driving method for fixed vehicle formation scene
CN111310919A (en) Driving control strategy training method based on scene segmentation and local path planning
CN114167898B (en) Global path planning method and system for collecting data of unmanned aerial vehicle
CN107767036A Real-time traffic state estimation method based on conditional random fields

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20201228
Address after: 210034 building C4, Hongfeng Science Park, Nanjing Economic and Technological Development Zone, Jiangsu Province
Applicant after: NANQI XIANCE (NANJING) TECHNOLOGY Co.,Ltd.
Address before: 215006 No.8, Jixue Road, Xiangcheng District, Suzhou City, Jiangsu Province
Applicant before: Suzhou University
RJ01 Rejection of invention patent application after publication
Application publication date: 20180105