CN108009587A - Method and device for determining a driving strategy based on reinforcement learning and rules - Google Patents

Method and device for determining a driving strategy based on reinforcement learning and rules

Info

Publication number
CN108009587A
Authority
CN
China
Prior art keywords
information
vehicle
driving strategy
driving
rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711257834.0A
Other languages
Chinese (zh)
Other versions
CN108009587B (en
Inventor
许稼轩
周小成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Uisee Technologies Beijing Co Ltd
Original Assignee
Uisee Technologies Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Uisee Technologies Beijing Co Ltd
Priority to CN201711257834.0A
Publication of CN108009587A
Application granted
Publication of CN108009587B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The purpose of the application is to provide a method and a device for determining a driving strategy by fusing reinforcement learning with rules. First driving strategy information of a vehicle is determined by a reinforcement learning algorithm based on driving parameter information of the vehicle; a rationality check is performed on the first driving strategy information based on the driving parameter information and driving rule information of the vehicle; and target driving strategy information of the vehicle is determined based on the result of the rationality check. Compared with the prior art, the application uses rules to constrain the first driving strategy information computed by the reinforcement learning algorithm, so that the strategy determination is more intelligent than existing methods that control the vehicle with a rule-based algorithm or with a reinforcement learning algorithm, and the rationality and stability of the finally determined driving strategy are improved.

Description

Method and device for determining a driving strategy based on reinforcement learning and rules
Technical field
This application relates to the field of autonomous driving, and in particular to a technique for determining a driving strategy based on reinforcement learning and rules.
Background
In existing practice, vehicles, and autonomous vehicles in particular, are controlled mainly in the following ways. Rule-based autonomous driving uses a rule-based algorithm that computes the control output directly from the state inputs through logical formulas. Such an algorithm is simple to implement, needs no training, and produces predictable and relatively stable output, but it lacks intelligence: in complex real driving scenes other road users easily take its right of way, so it cannot cope effectively with such scenes. Control by a reinforcement learning algorithm can make the driving strategy more intelligent, but training a reinforcement learning model is time-consuming, which prevents its use in actual autonomous driving, and the output of the algorithm is unpredictable. Existing algorithms that fuse rules and reinforcement learning can only linearly add the results determined by the rule-based algorithm and by the reinforcement learning algorithm; the time cost of model training remains high and continuous trial and error is required, so they too cannot be applied to actual autonomous driving.
Summary of the invention
The purpose of the application is to provide a method and a device for determining a driving strategy based on reinforcement learning and rules.
According to one aspect of the application, a method for determining a driving strategy based on reinforcement learning and rules is provided, comprising:
determining first driving strategy information of a vehicle by a reinforcement learning algorithm, based on driving parameter information of the vehicle;
performing a rationality check on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle;
determining target driving strategy information of the vehicle, based on the result of the rationality check.
According to another aspect of the application, a device for determining a driving strategy based on reinforcement learning and rules is provided, comprising:
a first driving strategy information determining means, for determining first driving strategy information of a vehicle by a reinforcement learning algorithm, based on driving parameter information of the vehicle;
a detection means, for performing a rationality check on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle;
a target driving strategy information determining means, for determining target driving strategy information of the vehicle, based on the result of the rationality check.
According to another aspect of the application, a device for determining a driving strategy based on reinforcement learning and rules is also provided, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the following operations:
determining first driving strategy information of a vehicle by a reinforcement learning algorithm, based on driving parameter information of the vehicle;
performing a rationality check on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle;
determining target driving strategy information of the vehicle, based on the result of the rationality check.
According to another aspect of the application, a computer-readable storage medium is also provided, on which a computer program is stored, the computer program being executable by a processor to perform the following operations:
determining first driving strategy information of a vehicle by a reinforcement learning algorithm, based on driving parameter information of the vehicle;
performing a rationality check on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle;
determining target driving strategy information of the vehicle, based on the result of the rationality check.
Compared with the prior art, the application performs a rationality check, based on the driving parameter information and the driving rule information of the vehicle, on the first driving strategy information determined by the reinforcement learning algorithm, and determines the target driving strategy information of the vehicle based on the result of that check, so as to control the vehicle, in particular an autonomous vehicle or an intelligent driving vehicle. The application thereby fuses rule-based vehicle control and reinforcement-learning-based vehicle control at a deeper level: the first driving strategy information computed by the reinforcement learning algorithm is constrained by rules. With this fusion technique, the strategy determination of the application is more intelligent than existing methods that control the vehicle with a rule-based algorithm or with a reinforcement learning algorithm, and the rationality and stability of the finally determined driving strategy are improved.
Brief description of the drawings
Other features, objects and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a flow chart of a method for determining a driving strategy based on reinforcement learning and rules according to one aspect of the application;
Fig. 2 shows a schematic diagram of a device for determining a driving strategy based on reinforcement learning and rules according to one aspect of the application;
Fig. 3 shows an exemplary system that can be used to implement the embodiments described herein.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
The application is described in further detail below with reference to the accompanying drawings.
In a typical configuration of the application, the terminal, the device of the service network and the computing device each include one or more processors (CPU), input/output interfaces, network interfaces and memory.
The memory may include computer-readable media in the form of volatile memory, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
The devices referred to in the application include, but are not limited to, user equipment, network equipment, or a device formed by integrating user equipment and network equipment over a network. The user equipment includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (for example through a touch pad), such as a smartphone or a tablet computer; the mobile electronic product may run any operating system, such as Android or iOS. The network equipment includes an electronic device that can automatically perform numerical computation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device and the like. The network equipment includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. Here, the cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a form of distributed computing: a virtual supercomputer made up of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network and the like.
Fig. 1 shows a flow chart of a method for determining a driving strategy based on reinforcement learning and rules according to one aspect of the application. The method includes step S11, step S12 and step S13. In one implementation of the application, the method is executed on a device for determining a driving strategy based on reinforcement learning and rules.
In step S11, first driving strategy information of the vehicle may be determined by a reinforcement learning algorithm, based on driving parameter information of the vehicle. Then, in step S12, a rationality check may be performed on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle. Then, in step S13, target driving strategy information of the vehicle may be determined based on the result of the rationality check.
In the application, the vehicle may include, but is not limited to, a vehicle travelling in any mode, such as fully human driving, assisted driving, partially automated driving, conditionally automated driving, highly automated driving or fully automated driving. In a preferred embodiment, the vehicle may include an autonomous vehicle or an intelligent driving vehicle. In one implementation, the autonomous vehicle may include a vehicle travelling in fully automated driving mode, and the intelligent driving vehicle may include a vehicle travelling in assisted driving, partially automated driving, conditionally automated driving, highly automated driving or a similar mode.
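The three steps can be sketched as a single function. This is only an illustrative outline, not the patent's implementation: the names `determine_target_strategy`, `rl_policy`, `is_reasonable` and `rule_fallback` are hypothetical, and falling back to a rule-derived strategy when the check fails is an assumption that corresponds to the embodiments described later in the text.

```python
from typing import Callable, Dict

State = Dict[str, float]      # driving parameter information (keys are hypothetical)
Strategy = Dict[str, float]   # driving strategy information (keys are hypothetical)


def determine_target_strategy(
    state: State,
    rl_policy: Callable[[State], Strategy],
    is_reasonable: Callable[[State, Strategy], bool],
    rule_fallback: Callable[[State], Strategy],
) -> Strategy:
    """Run S11, S12 and S13 in order and return the target strategy."""
    # S11: the reinforcement learning policy proposes the first strategy.
    first = rl_policy(state)
    # S12: the rationality check compares the proposal against the rules.
    passed = is_reasonable(state, first)
    # S13: keep the proposal if it passed, otherwise use the rule-derived
    # strategy (assumed fallback behaviour).
    return first if passed else rule_fallback(state)
```

Passing the policy, the check and the fallback as callables keeps the sketch independent of any particular learning library or rule set.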
Specifically, in step S11, first driving strategy information of the vehicle may be determined by a reinforcement learning algorithm, based on driving parameter information of the vehicle. In one implementation of the application, a driving strategy model for vehicle control may first be trained by the reinforcement learning algorithm; the driving parameter information of the vehicle is then fed into the driving strategy model, which outputs the first driving strategy information.
Here, the driving parameter information may include all kinds of vehicle driving information that reflect the vehicle's driving environment and driving state. In one implementation, the driving parameter information includes, but is not limited to, at least any one of the following: speed information of the vehicle; information on the vehicle's heading deviation from the lane direction; the distance between the vehicle and the lane centerline; the distance between the vehicle and the lane edge; obstacle perception information, such as the relative position and size of an obstacle ahead; and traffic sign perception information, such as traffic lights, direction signs and turn signs. In one implementation, the driving parameter information may be vehicle driving information collected from various sensors, for example real-time vehicle driving information; in another implementation, the driving parameter information may also be obtained from other computing devices, such as an emulator or a simulator, for example the TORCS simulator.
Here, the driving strategy information in the application, such as the first driving strategy information of the vehicle, may include control information for various vehicle driving behaviors, for example steering wheel angle control, brake control and throttle control of the vehicle. Those skilled in the art should understand that the driving strategy information listed above is only an example; other driving strategy information, existing now or appearing in the future, that is applicable to the application should also fall within the scope of protection of the application and is incorporated herein by reference.
Here, reinforcement learning refers to a learning process in which a goal is reached through appropriate multi-step decisions across a series of scenarios; it is a sequential multi-step decision problem. The goal of reinforcement learning is to find a policy that yields the maximum cumulative reward. In the vehicle control scenario of the application, one possible way of training a driving strategy model for vehicle control by a reinforcement learning algorithm is as follows. In the current environment and state, the vehicle performs the driving operation corresponding to the driving strategy information, which changes its environment and state, and it receives a reward, that is, a reward function value is determined. The reward function value reflects the change of state that results from the vehicle adopting the driving strategy information. In one implementation, it may be set so that if the state improves, the reward function value is positive, and the larger the value, the better the state; conversely, if the state deteriorates, the reward function value is negative. By setting the reward function and controlling the loop in which the vehicle interacts with its environment, the driving strategy information of the vehicle is adjusted, so that the driving strategy model for vehicle control is trained and improved step by step. In the application, the reinforcement learning algorithm may also include a deep reinforcement learning algorithm, which combines deep learning with reinforcement learning; accordingly, the driving strategy model trained by the reinforcement learning algorithm may include a reinforcement learning neural network model. Here, the deep reinforcement learning algorithm may include, but is not limited to, Deep Q-Learning, Double Q-Network, Deep Deterministic Policy Gradient (DDPG) and the like.
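To make the reward-driven training loop concrete, the following minimal sketch shows a single tabular Q-learning update. It is a deliberately simplified stand-in for the deep reinforcement learning algorithms named above, not the model used in the application; the state and action labels, the learning rate `alpha` and the discount factor `gamma` are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Best value the agent currently expects from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical discrete actions and an empty value table.
actions = ["steer_left", "keep", "steer_right"]
q = defaultdict(float)

# A positive reward (state improved) raises the value of the chosen action;
# a negative reward (state deteriorated) lowers it.
q_update(q, "off_center", "steer_left", 1.0, "centered", actions)
q_update(q, "off_center", "steer_right", -1.0, "off_road", actions)
```

After these two updates the action that improved the state has a higher value than the one that made it worse, which is the mechanism by which repeated interaction adjusts the driving strategy.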
The first driving strategy information of the vehicle can then be determined by the reinforcement learning algorithm, based on the driving parameter information of the vehicle. For example, if the driving parameter information includes the speed of the vehicle, the vehicle's heading deviation from the lane direction, the distance between the vehicle and the lane centerline, and the distance between the vehicle and the lane edge, this information is fed into the reinforcement learning neural network, which outputs first driving strategy information such as steering wheel angle control, brake control and throttle control of the vehicle.
Then, in step S12, a rationality check may be performed on the first driving strategy information, based on the driving parameter information and the driving rule information of the vehicle. Then, in step S13, the target driving strategy information of the vehicle is determined based on the result of the rationality check.
In the application, the driving rule information includes the process of deriving certain driving strategy information from the input driving parameter information or historical driving parameter information by means of predetermined logical formulas. Here, the driving rule information may include all kinds of rules covering existing driving scenarios, known driving experience and predefined output control strategies. In one implementation, the driving rule information may include, but is not limited to, one or more of obstacle avoidance rules, path planning rules, preview (pre-aiming) rules and other such rules. For example, if the driving rule information includes a preview rule and the input historical driving parameter information includes obstacle perception information, such as the relative position and size of an obstacle ahead, the current vehicle speed and the vehicle's heading deviation from the lane direction, then the corresponding driving strategy information, such as steering wheel angle control, brake control or throttle control, is computed by the rule-based formula. For instance, if the current heading deviates from the lane direction by Θ, the rule computes that the steering wheel should be turned by 2Θ in the opposite direction.
In one embodiment of the application, in step S12, second driving strategy information of the vehicle may be determined based on the driving parameter information and the driving rule information of the vehicle. In one implementation, the driving rule information may correspond to a specific rule-based formula whose input is the driving parameter information and whose computed output is the second driving strategy information. For example, if the input driving parameter information includes obstacle perception information, such as the relative position and size of an obstacle ahead, the current vehicle speed and the vehicle's heading deviation from the lane direction, then second driving strategy information such as steering wheel angle control, brake control or throttle control is computed by the rule-based formula. For instance, if the current heading deviates from the lane direction by Θ, the rule computes that the steering wheel should be turned by 2Θ in the opposite direction.
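A rule-based second strategy of this kind might be sketched as below. Only the "turn 2Θ in the opposite direction" rule comes from the text; the function name, the dictionary keys, the braking distance `brake_gap` and the cruise throttle value are hypothetical choices made for illustration.

```python
def rule_based_strategy(heading_offset, obstacle_gap, brake_gap=10.0):
    """Compute a second driving strategy from fixed rules.

    heading_offset: deviation of the heading from the lane direction (Θ).
    obstacle_gap:   distance to the obstacle ahead (hypothetical unit: metres).
    """
    # Rule from the text: steer by 2Θ in the opposite direction.
    steering = -2.0 * heading_offset
    # Hypothetical obstacle rule: brake fully once the obstacle is close.
    braking = obstacle_gap < brake_gap
    return {
        "steering": steering,
        "throttle": 0.0 if braking else 0.3,
        "brake": 1.0 if braking else 0.0,
    }
```

Because the output is a pure function of the inputs, it is predictable and needs no training, which is the property the text attributes to rule-based algorithms.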
Then, a rationality check is performed on the first driving strategy information based on the second driving strategy information. In one implementation, this rationality check may include a similarity check between the second driving strategy information and the first driving strategy information. For example, suppose the first driving strategy information includes steering wheel angle Θ, throttle degree η and brake degree γ, and the second driving strategy information determined from the driving rule information includes steering wheel angle Θ′, throttle degree η′ and brake degree γ′. The similarity between the two can then be determined by comparing the specific strategy parameters, namely steering wheel angle, throttle degree and brake degree, for example by computing R = (Θ − Θ′) + (η − η′) + (γ − γ′); the smaller the value of R, the greater the similarity between the second driving strategy information and the first driving strategy information. In one implementation, a predetermined threshold may be set, and the decision rule of the rationality check can be configured flexibly by comparing R with that threshold. For example, if the distance between the second driving strategy information and the first driving strategy information is greater than or equal to the predetermined threshold, the first driving strategy information is judged unreasonable; otherwise, if the distance is less than the predetermined threshold, the first driving strategy information is judged reasonable.
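The similarity check can be sketched as follows. Note one deliberate deviation from the formula in the text: the sketch sums absolute differences, so that deviations of opposite sign cannot cancel each other out. That choice, the dictionary keys and the threshold values are assumptions rather than part of the application.

```python
STRATEGY_KEYS = ("steering", "throttle", "brake")  # Θ, η, γ (hypothetical key names)


def strategy_distance(first, second):
    """Distance R between two strategies (sum of absolute differences, assumed)."""
    return sum(abs(first[k] - second[k]) for k in STRATEGY_KEYS)


def is_reasonable(first, second, threshold):
    """The first strategy is reasonable when its distance R is below the threshold."""
    return strategy_distance(first, second) < threshold


first = {"steering": 0.1, "throttle": 0.5, "brake": 0.0}     # from reinforcement learning
second = {"steering": -0.1, "throttle": 0.4, "brake": 0.0}   # from the rules
reasonable = is_reasonable(first, second, threshold=0.5)
```

With these sample values R is roughly 0.3, so the first strategy passes a threshold of 0.5 and fails a threshold of 0.2.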
In one embodiment, in step S13, if the result of the rationality check indicates that the distance between the second driving strategy information and the first driving strategy information is greater than or equal to the predetermined threshold, the second driving strategy information is determined as the target driving strategy information of the vehicle. If the result indicates that the distance is less than the predetermined threshold, the first driving strategy information is determined as the target driving strategy information of the vehicle. The output of reinforcement learning is hard to predict, and in a real driverless or intelligent driving scenario even a small error may be fatal. In the application, the output of reinforcement learning, namely the first driving strategy information, is therefore checked for rationality against the driving rule information. Suppose the first driving strategy information is clearly unreasonable: for example, the vehicle has already crossed the lane line and the strategy still steers further away, or there is clearly an obstacle a short distance ahead and the strategy still chooses to accelerate. The constraint imposed by the driving rule information then exposes the error and prevents the first driving strategy information from actually being executed, and the automatic driving operation is carried out with the second driving strategy information determined from the driving rule information instead. Here, the driving rule information includes the process of deriving certain driving strategy information from the input driving parameter information or historical driving parameter information by means of predetermined logical formulas. The driving rule information may include, but is not limited to, one or more specific rules such as obstacle avoidance rules, path planning rules and preview (pre-aiming) rules. In one implementation, the rationality check may include checking each specific rule contained in the driving rule information in turn. It may be set so that if any one of the specific rules is not satisfied, the first driving strategy information is judged unreasonable, and the second driving strategy information is then determined as the target driving strategy information of the vehicle.
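Checking the specific rules one after another could look like the sketch below. The two rule functions mirror the two examples in the text (accelerating toward a nearby obstacle, steering further away after crossing the lane line), but their names, the state keys, the distance limit and the sign convention for steering are hypothetical.

```python
def no_acceleration_near_obstacle(state, strategy, min_gap=10.0):
    """Violated when the strategy accelerates although an obstacle is close."""
    return not (state["obstacle_gap"] < min_gap and strategy["throttle"] > 0.0)


def no_steering_further_off_lane(state, strategy):
    """Violated when the vehicle is past the lane line and steers further out.

    Assumed convention: positive steering moves toward positive lane_offset.
    """
    if abs(state["lane_offset"]) <= state["lane_half_width"]:
        return True
    return state["lane_offset"] * strategy["steering"] <= 0.0


def select_target(state, first, second, rules):
    """Return the first strategy only if it satisfies every rule in turn."""
    for rule in rules:
        if not rule(state, first):
            return second  # any single violation makes the first strategy unreasonable
    return first


RULES = [no_acceleration_near_obstacle, no_steering_further_off_lane]
```

New rules can be appended to `RULES` without touching `select_target`, which matches the idea of checking each specific rule in sequence.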
In one embodiment of the application, in step S12, vehicle state information of the vehicle after virtually executing the first driving strategy information may be determined based on the driving parameter information. In the application, another possible way of performing the rationality check on the first driving strategy information, based on the driving parameter information and the driving rule information of the vehicle, is as follows. In a virtual environment, starting from the current environment and state of the vehicle, the first driving strategy information determined from the driving parameter information is executed, which yields the vehicle state information at the next moment. The content of the vehicle state information may coincide with the driving parameter information at the next moment, that is, it may include all kinds of vehicle driving information that reflect the vehicle's driving environment and driving state. A rationality check is then performed on the vehicle state information, based on the driving rule information of the vehicle. Here, the vehicle state information is the direct result of the vehicle executing the first driving strategy information, so whether the vehicle state information is reasonable directly reflects whether the first driving strategy information is reasonable. Executing the first driving strategy information and generating the vehicle state information in the virtual environment also avoids unnecessary risks, such as vehicle damage, that would arise in an actual driving scenario. Here, the virtual environment may be built with an emulator or a simulator, such as the TORCS simulator.
Then, in one embodiment, in step S13, if the result of the rationality check indicates that the vehicle state information is within the vehicle safety range, the first driving strategy information is taken as the target driving strategy information of the vehicle; otherwise, the target driving strategy information of the vehicle is generated based on the driving parameter information and the driving rule information of the vehicle. For example, if the vehicle state information obtained by executing the first driving strategy information is that the vehicle collides with an obstacle, the state is outside the vehicle safety range and violates the obstacle avoidance rule, so the result of the rationality check is unreasonable, and the target driving strategy information of the vehicle is generated based on the driving parameter information and the driving rule information of the vehicle.
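The virtual-execution check can be illustrated with a toy one-step model. This is not a simulator such as TORCS: the straight-line kinematics, the constant `ACCEL`, the time step `dt`, the state keys and the minimum safe gap are all hypothetical stand-ins chosen to keep the sketch self-contained.

```python
ACCEL = 5.0  # hypothetical speed change per second at full throttle or full brake


def simulate_step(state, strategy, dt=0.1):
    """Virtually execute a strategy for one time step and return the next state."""
    speed = state["speed"] + (strategy["throttle"] - strategy["brake"]) * ACCEL * dt
    speed = max(0.0, speed)  # the vehicle does not reverse in this toy model
    gap = state["obstacle_gap"] - speed * dt
    return {"speed": speed, "obstacle_gap": gap}


def in_safe_range(next_state, min_gap=1.0):
    """The predicted state is safe while the gap to the obstacle stays large enough."""
    return next_state["obstacle_gap"] >= min_gap


def choose_target(state, first, rule_strategy):
    """Keep the first strategy only if its virtual outcome is within the safe range."""
    predicted = simulate_step(state, first)
    return first if in_safe_range(predicted) else rule_strategy
```

The rejected strategy is never applied to the real vehicle; only its predicted outcome is inspected, which is the risk-avoidance argument made in the text.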
In one embodiment of the application, the method further includes step S14 (not shown), in which an automatic driving operation may be performed based on the target driving strategy information. Here, based on the target driving strategy information determined through the rationality check, the corresponding driving operation can be performed on a real vehicle, for example an automatic driving operation on an autonomous vehicle or an intelligent driving vehicle.
In one embodiment of the application, the method further includes step S15 (not shown), in which the reward function value of the reinforcement learning algorithm may be updated based on the result of the rationality check.
Here, in the current environment and state, the vehicle performs the driving operation corresponding to the driving strategy information, which changes its environment and state, and it receives a reward, that is, a reward function value is determined. The reward function value reflects the change of state that results from the vehicle adopting the driving strategy information. In one implementation, it may be set so that if the state improves, the reward function value is positive, and the larger the value, the better the state; conversely, if the state deteriorates, the reward function value is negative. By setting the reward function and controlling the loop in which the vehicle interacts with its environment, the driving strategy information of the vehicle is adjusted, so that the driving strategy model for vehicle control is trained and improved step by step.
Therefore, if the target driving strategy information of the vehicle determined from the result of the rationality check does not include the first driving strategy information, that is, the result is unreasonable, the reward function value of the reinforcement learning algorithm is set to a negative number. For example, the current reward function value is set to -100; after each decision, the reinforcement learning neural network updates its parameters based on the reward function value. The smaller the reward function value, the less likely a similar decision is to be made next time, which helps avoid a repeat of the same unreasonable situation. Conversely, if the result of the rationality check is reasonable, that is, the target driving strategy corresponds to the first driving strategy information, the reward function value is set to a positive number.
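The reward feedback described here reduces to a small mapping from the check result to a reward value. The value -100 comes from the example in the text; the positive value of 1.0 and the function name are hypothetical, since the text only states that the reward is positive.

```python
def reward_from_check(check_passed, penalty=-100.0, bonus=1.0):
    """Map the rationality check result to a reward function value.

    penalty: value used when the first strategy was rejected (-100 in the text).
    bonus:   value used when the first strategy was accepted (assumed magnitude).
    """
    return bonus if check_passed else penalty


# The reward would then be fed to the learner's parameter update after each decision.
rejected_reward = reward_from_check(False)
accepted_reward = reward_from_check(True)
```

Choosing a penalty much larger in magnitude than the bonus reflects the stated intent of making a rejected decision unlikely to recur.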
Based on the driving parameter information and the driving rule information of the vehicle, the present application performs a rationality check on the first driving strategy information of the vehicle determined by the reinforcement learning algorithm and, based on the detection result of the rationality check, determines the target driving strategy information of the vehicle, so as to control the vehicle, in particular an automatic driving vehicle or an intelligent driving vehicle. Here, the present application achieves a deeper fusion of vehicle control using rule-based algorithms and vehicle control using reinforcement learning algorithms: the first driving strategy information computed by the reinforcement learning algorithm is constrained by rules. With this new fusion technique, the driving strategy determination method of the present application is more intelligent than existing methods that control the vehicle using rule-based algorithms or using reinforcement learning algorithms, and it improves the reasonableness and stability of the finally determined driving strategy.
Fig. 2 shows a schematic diagram of a device 1 for determining a driving strategy based on reinforcement learning and rules according to one aspect of the present application, wherein the device 1 includes a first driving strategy information determining apparatus 21, a detection apparatus 22, and a target driving strategy information determining apparatus 23.
The first driving strategy information determining apparatus 21 can determine the first driving strategy information of the vehicle by a reinforcement learning algorithm based on the driving parameter information of the vehicle; the detection apparatus 22 can perform a rationality check on the first driving strategy information based on the driving parameter information and the driving rule information of the vehicle; and the target driving strategy information determining apparatus 23 can determine the target driving strategy information of the vehicle based on the detection result of the rationality check.
In this application, the vehicle can include, but is not limited to, a vehicle travelling in any mode, such as a fully human driving mode, an assisted driving mode, a partially automated driving mode, a conditionally automated driving mode, a highly automated driving mode, or a fully automated driving mode. In a preferred embodiment, the vehicle can include an automatic driving vehicle or an intelligent driving vehicle, wherein, in one implementation, the automatic driving vehicle can include a vehicle travelling in the fully automated driving mode, and the intelligent driving vehicle can include a vehicle travelling in a mode such as the assisted driving mode, the partially automated driving mode, the conditionally automated driving mode, or the highly automated driving mode.
Specifically, the first driving strategy information determining apparatus 21 can determine the first driving strategy information of the vehicle by a reinforcement learning algorithm based on the driving parameter information of the vehicle. In one implementation of the application, a driving strategy model corresponding to vehicle control can first be trained by the reinforcement learning algorithm; the driving parameter information of the vehicle is then input into the driving strategy model, which outputs the first driving strategy information.
Here, the driving parameter information can include various kinds of vehicle driving information that reflect the vehicle's driving environment and driving state. In one implementation, the driving parameter information includes, but is not limited to, at least any one of the following: speed information of the vehicle; lane departure direction information of the vehicle; distance information between the vehicle and the lane centerline; distance information between the vehicle and the lane edges; obstacle perception information, such as the relative position and size of an obstacle ahead; and traffic sign perception information, for example, traffic light indicators, direction signs, and turn signs. In one implementation, the driving parameter information can be vehicle driving information collected from various sensors, for example, real-time vehicle driving information; in another implementation, the driving parameter information can also be obtained from other computing devices, such as an emulator or simulator, for example the TORCS simulator.
Here, the driving strategy information in the application, such as the first driving strategy information of the vehicle, can include control information for the various driving behaviors of the vehicle, for example, steering wheel angle control, brake control, and throttle control of the vehicle. Those skilled in the art should understand that the above driving strategy information is only an example; other driving strategy information, existing now or arising in the future, that is applicable to the present application should also fall within the scope of protection of the present application and is incorporated herein by reference.
Here, reinforcement learning refers to a learning process in which a target is reached through multiple steps of appropriate decisions across a series of scenarios; it is a sequential multi-step decision problem. The goal of reinforcement learning is to find a strategy that yields the maximum cumulative reward. In the vehicle control scenario of the present application, in one possible implementation, the method of training a driving strategy model corresponding to vehicle control by a reinforcement learning algorithm includes the following: under the current environment and state, the vehicle performs the corresponding driving operation based on the driving strategy information, thereby changing its own environment and state, and obtains a reward; that is, a reward function value is determined. The reward function value reflects the change of state that occurs after the vehicle adopts the driving strategy information. In one implementation, it can be set that if the state improves, the reward function value is positive, and the larger the reward function value, the better the state; conversely, if the state deteriorates, the reward function value is negative. By setting the reward function, the cyclic process in which the vehicle interacts with its environment is controlled and the driving strategy information of the vehicle is adjusted, so that the driving strategy model corresponding to vehicle control is progressively trained and improved. In this application, the reinforcement learning algorithm can also include a deep reinforcement learning algorithm that combines deep learning with reinforcement learning; accordingly, the driving strategy model corresponding to vehicle control trained by the reinforcement learning algorithm can include a reinforcement learning neural network model. Here, the deep reinforcement learning algorithm can include, but is not limited to, Deep Q-Learning, Double Q-Network, Deep Deterministic Policy Gradient (DDPG), and the like.
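The interaction loop described above (act, observe the state change, receive a positive or negative reward, adjust the strategy) can be illustrated with a small sketch. The sketch below substitutes plain tabular Q-learning on a toy five-position lane for the deep algorithms named in the text; the environment, the reward values of +1, 0, and -1, and all names are assumptions chosen only to keep the example self-contained.

```python
import random

POSITIONS = range(5)    # hypothetical lateral positions; index 2 is the lane centre
CENTRE = 2
ACTIONS = (-1, 0, 1)    # move left, hold, move right (illustrative actions)


def step(position, action):
    """Apply an action and return (next_position, reward)."""
    nxt = min(max(position + action, 0), 4)
    before, after = abs(position - CENTRE), abs(nxt - CENTRE)
    if after < before:
        return nxt, 1.0     # state improved: positive reward
    if after > before:
        return nxt, -1.0    # state deteriorated: negative reward
    return nxt, 0.0


def greedy_action(q, state):
    """Pick the action with the largest estimated value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(episodes=2000, steps=10, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in POSITIONS for a in ACTIONS}
    for _ in range(episodes):
        state = rng.randrange(5)
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, b)] for b in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


q_table = train()
```

After training, the greedy strategy moves toward the lane centre from either edge and holds position at the centre, which is the behaviour the reward function was set up to encourage.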
The first driving strategy information of the vehicle can then be determined by the reinforcement learning algorithm based on the driving parameter information of the vehicle. For example, if the driving parameter information includes the speed information of the vehicle, the lane departure direction information of the vehicle, the distance information between the vehicle and the lane centerline, and the distance information between the vehicle and the lane edges, then the reinforcement learning algorithm is applied, for example by inputting the above driving parameter information into the reinforcement learning neural network, and first driving strategy information such as the steering wheel angle control, brake control, and throttle control of the vehicle is output.
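To make the input and output of this step concrete, the following sketch maps four driving parameters to three control values with a very small feed-forward network written in plain Python. The network size, the random (untrained) weights, the output ranges, and every name here are assumptions for illustration; the application does not specify a network architecture.

```python
import math
import random


def make_policy_network(n_inputs=4, n_hidden=8, seed=0):
    """Build an untrained two-layer network; weights are random placeholders."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(3)]

    def forward(features):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, features))) for row in w1]
        raw = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
        return {
            "steering": math.tanh(raw[0]),                # assumed range [-1, 1]
            "throttle": 1.0 / (1.0 + math.exp(-raw[1])),  # assumed range (0, 1)
            "brake": 1.0 / (1.0 + math.exp(-raw[2])),     # assumed range (0, 1)
        }

    return forward


policy = make_policy_network()
# Inputs: speed, lane departure angle, distance to lane centerline, distance to lane edge.
first_strategy = policy([12.0, 0.05, 0.3, 1.4])
```

In a trained model the weights would come from the reinforcement learning process rather than from a random generator.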
Here, the detection apparatus 22 can perform a rationality check on the first driving strategy information based on the driving parameter information and the driving rule information of the vehicle. The target driving strategy information determining apparatus 23 can then determine the target driving strategy information of the vehicle based on the detection result of the rationality check.
In this application, the driving rule information includes a process of deriving certain driving strategy information from input driving parameter information or historical driving parameter information through predetermined logical formulas. Here, the driving rule information can include various rules that set an output control strategy based on existing driving scenarios and known driving experience. In one implementation, the driving rule information can include, but is not limited to, one or more of various rules such as an obstacle avoidance rule, a path planning rule, and a preview (pre-aiming) rule. For example, if the driving rule information includes the preview rule and the input historical driving parameter information includes obstacle perception information, such as the relative position and size of an obstacle ahead, the current speed information of the vehicle, and the lane departure direction information of the vehicle, then the corresponding driving strategy information, such as the steering wheel angle control, brake control, or throttle control of the vehicle, is calculated by the rule-based algorithm formula. For example, if the current direction deviates from the lane direction by Θ, the rule calculates that the steering wheel should be turned by 2Θ in the opposite direction.
In one embodiment of the application, the detection apparatus 22 can include a first unit (not shown) and a second unit (not shown). The first unit can determine second driving strategy information of the vehicle based on the driving parameter information and the driving rule information of the vehicle. In one implementation, the driving rule information can correspond to a specific rule-based algorithm formula whose input is the driving parameter information and whose calculated output is the second driving strategy information. For example, if the input driving parameter information includes obstacle perception information, such as the relative position and size of an obstacle ahead, the current speed information of the vehicle, and the lane departure direction information of the vehicle, then second driving strategy information such as the steering wheel angle control, brake control, or throttle control of the vehicle is calculated by the rule-based algorithm formula. For example, if the current direction deviates from the lane direction by Θ, the rule calculates that the steering wheel should be turned by 2Θ in the opposite direction.
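The example rule above (turn the steering wheel by 2Θ against the departure direction) can be written as a one-line function. The function name, the sign convention, and the configurable gain are assumptions for this sketch; only the factor of 2 comes from the text.

```python
def preview_rule_steering(departure_angle, gain=2.0):
    """Rule-based steering: turn by gain * angle in the opposite direction.

    departure_angle: signed angle between the vehicle heading and the lane
    direction (sign convention assumed: positive means deviating to the left).
    gain=2.0 reproduces the "turn 2Θ in the opposite direction" example.
    """
    return -gain * departure_angle


# A vehicle deviating by 0.1 rad to the left is steered 0.2 rad to the right.
second_steering = preview_rule_steering(0.1)
```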
Here, the second unit can perform a rationality check on the first driving strategy information based on the second driving strategy information. In one implementation, this rationality check can include a similarity check between the second driving strategy information and the first driving strategy information. For example, assume that the first driving strategy information includes a steering wheel angle Θ, a throttle degree η, and a brake degree γ, and that the second driving strategy information determined based on the driving rule information includes a steering wheel angle Θ', a throttle degree η', and a brake degree γ'. The similarity between the second driving strategy information and the first driving strategy information can then be determined by comparing the specific strategy parameters, that is, the steering wheel angle, the throttle degree, and the brake degree. For example, the similarity is calculated as R = (Θ - Θ') + (η - η') + (γ - γ'); the smaller the value of R, the greater the similarity between the second driving strategy information and the first driving strategy information. In one implementation, a predetermined strategy threshold can be set, and the judgment rule of the rationality check can be flexibly configured by comparing the value of R with the predetermined threshold. For example, if the distance between the second driving strategy information and the first driving strategy information is greater than or equal to the predetermined threshold, the first driving strategy information is judged to be unreasonable; otherwise, if the distance between the second driving strategy information and the first driving strategy information is less than the predetermined threshold, the first driving strategy information is judged to be reasonable.
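A minimal sketch of this similarity check follows. One deliberate deviation is labelled here and in the code: the sketch sums absolute differences, whereas the formula in the text sums signed differences; the absolute value is an assumption made so that deviations of opposite sign do not cancel each other. The class name, field names, and threshold value are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    steering: float   # steering wheel angle Θ
    throttle: float   # throttle degree η
    brake: float      # brake degree γ


def strategy_distance(first, second):
    """Distance R between two strategies.

    Assumption: absolute differences are used instead of the signed
    differences written in the text, so that errors cannot cancel out.
    """
    return (abs(first.steering - second.steering)
            + abs(first.throttle - second.throttle)
            + abs(first.brake - second.brake))


def is_reasonable(first, second, threshold):
    """The first strategy is reasonable only if R is below the threshold."""
    return strategy_distance(first, second) < threshold


rl_output = Strategy(steering=0.5, throttle=0.5, brake=0.0)
rule_output = Strategy(steering=-0.5, throttle=0.0, brake=1.0)
distance = strategy_distance(rl_output, rule_output)
```

With the values shown, the two strategies differ in all three controls, so the distance is large and the first strategy would be rejected under a small threshold.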
Then, if the detection result of the rationality check includes that the distance between the second driving strategy information and the first driving strategy information is greater than or equal to the predetermined threshold, the target driving strategy information determining apparatus 23 can determine the second driving strategy information as the target driving strategy information of the vehicle. If the detection result of the rationality check includes that the distance between the second driving strategy information and the first driving strategy information is less than the predetermined threshold, the device 1 can determine the first driving strategy information as the target driving strategy information of the vehicle.
The output of reinforcement learning is difficult to predict, and in a real driverless or intelligent driving scenario even a small error may well be fatal. Therefore, in the present application, the output of reinforcement learning, that is, the first driving strategy information, is subjected to a rationality check against the driving rule information. If the first driving strategy information is obviously unreasonable, for example the vehicle has already departed from the lane line yet continues in the direction of departure, or an obstacle is clearly present a short distance ahead yet the first driving strategy information still chooses to accelerate, then the constraint of the driving rule information makes it possible to find the error and prevent the actual execution of the first driving strategy information, and the automatic driving operation is instead carried out using the second driving strategy information determined based on the driving rule information. Here, the driving rule information includes a process of deriving certain driving strategy information from input driving parameter information or historical driving parameter information through predetermined logical formulas. The driving rule information can include, but is not limited to, one or more of various specific rules such as an obstacle avoidance rule, a path planning rule, and a preview rule. In one implementation, the rationality check can include checking in turn each of the specific rules included in the driving rule information. It can be set that if any one specific rule is not satisfied, the result is judged to be unreasonable, and the second driving strategy information is then determined as the target driving strategy information of the vehicle.
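The rule-by-rule check described above can be sketched as a loop over rule functions that stops at the first violation. The two rules below mirror the two examples in the text (accelerating toward a nearby obstacle, and steering further out after leaving the lane); the dictionary keys, the numeric limits of 10.0 and 1.5, and the sign convention are hypothetical.

```python
from typing import Callable, Dict, Sequence

State = Dict[str, float]
Strategy = Dict[str, float]
Rule = Callable[[State, Strategy], bool]   # returns True when the rule is satisfied


def obstacle_avoidance_rule(state, strategy):
    """Violated when an obstacle is close ahead and the strategy still accelerates."""
    obstacle_is_near = state["obstacle_distance"] < 10.0   # assumed limit
    return not (obstacle_is_near and strategy["throttle"] > 0.0)


def lane_keeping_rule(state, strategy):
    """Violated when the vehicle has left the lane and steers further outward."""
    has_departed = abs(state["lane_offset"]) > 1.5          # assumed limit
    steers_outward = state["lane_offset"] * strategy["steering"] > 0.0
    return not (has_departed and steers_outward)


def select_target_strategy(state, first, second, rules: Sequence[Rule]):
    """Check the rules in turn; fall back to the rule-based strategy on any violation."""
    for rule in rules:
        if not rule(state, first):
            return second
    return first


rules = [obstacle_avoidance_rule, lane_keeping_rule]
first = {"steering": 0.0, "throttle": 0.8, "brake": 0.0}
second = {"steering": 0.0, "throttle": 0.0, "brake": 1.0}
blocked = {"obstacle_distance": 5.0, "lane_offset": 0.0}
clear = {"obstacle_distance": 50.0, "lane_offset": 0.0}
```

With an obstacle 5 units ahead, the accelerating first strategy violates the obstacle avoidance rule and the second strategy is selected; on a clear road the first strategy passes every rule and is kept.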
In one implementation, if the detection result of the rationality check is unreasonable, the second driving strategy information needs to be determined as the target driving strategy information of the vehicle, and several different kinds of driving rule information may be present at this time, for example an obstacle avoidance rule, a path planning rule, and a preview rule at the same time. One possible way of determining the target driving strategy information is to superimpose the multiple pieces of second driving strategy information determined by the different driving rule information, or to calculate the intersection of the pieces of second driving strategy information, so that the finally determined target driving strategy information satisfies all of the driving rules involved.
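The text does not say how the intersection is computed. One possible reading, offered purely as an assumption, is that each rule yields an admissible interval for a control value (for example, a range of steering angles) and the target value is chosen from the overlap of all intervals. The function name and interval representation below are hypothetical.

```python
def intersect_ranges(ranges):
    """Intersect closed intervals (low, high); return None if they do not overlap.

    Assumption: each driving rule contributes one admissible interval for the
    same control value, for example the steering wheel angle.
    """
    low = max(lo for lo, _ in ranges)
    high = min(hi for _, hi in ranges)
    return (low, high) if low <= high else None


# Hypothetical admissible steering ranges from two different rules.
avoidance_range = (-0.5, 0.25)
preview_range = (-0.25, 0.5)
shared_range = intersect_ranges([avoidance_range, preview_range])
```

Any steering value inside the shared range satisfies both rules; an empty overlap would mean the rules conflict and some priority between them would have to be defined, which the text leaves open.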
In one embodiment of the application, the detection apparatus 22 further includes a third unit (not shown) and a fourth unit (not shown). The third unit can determine, based on the driving parameter information, the vehicle state information after the vehicle virtually executes the first driving strategy information. In this application, another possible method of performing a rationality check on the first driving strategy information based on the driving parameter information and the driving rule information of the vehicle is as follows: in a virtual environment, under the vehicle's current environment and state, the first driving strategy information determined from the driving parameter information is executed, thereby obtaining the vehicle state information at the next moment. The content of the vehicle state information can coincide with the driving parameter information at the next moment, that is, it can include various kinds of vehicle driving information that reflect the vehicle's driving environment and driving state. The fourth unit can then perform a rationality check on the vehicle state information based on the driving rule information of the vehicle. Here, the vehicle state information is the direct result of the vehicle executing the first driving strategy information; therefore, whether the vehicle state information is reasonable directly reflects whether the first driving strategy information is reasonable, and executing the first driving strategy information and generating the vehicle state information in the virtual environment avoids risks, such as unnecessary vehicle damage, that would arise in an actual driving scenario. Here, the virtual environment can be built with an emulator or simulator, such as the TORCS simulator.
Then, in one embodiment, if the detection result of the rationality check includes that the vehicle state information falls within the vehicle safety range, the target driving strategy information determining apparatus 23 can use the first driving strategy information as the target driving strategy information of the vehicle; otherwise, the target driving strategy information of the vehicle is generated based on the driving parameter information and the driving rule information of the vehicle. For example, if the vehicle state information obtained by executing the first driving strategy information is that the vehicle collides with an obstacle, the state is outside the vehicle safety range and violates the obstacle avoidance rule; the result of the rationality check is therefore unreasonable, and the target driving strategy information of the vehicle is generated based on the driving parameter information and the driving rule information of the vehicle.
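The virtual-execution check can be sketched with a toy one-step vehicle model in place of a full simulator such as TORCS. The kinematic update, the treatment of the steering command as a yaw rate, the safety limits, and all names below are simplifying assumptions for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    speed: float            # m/s
    lateral_offset: float   # m from the lane centerline
    heading: float          # rad relative to the lane direction
    obstacle_gap: float     # m to the obstacle ahead


def simulate_step(state, steering, accel, dt=0.1):
    """Virtually execute one control step (toy model: steering acts as a yaw rate)."""
    speed = max(0.0, state.speed + accel * dt)
    heading = state.heading + steering * dt
    lateral = state.lateral_offset + speed * math.sin(heading) * dt
    gap = state.obstacle_gap - speed * math.cos(heading) * dt
    return VehicleState(speed, lateral, heading, gap)


def is_safe(state, half_lane_width=1.75, min_gap=0.0):
    """Assumed safety range: inside the lane and not in contact with the obstacle."""
    return abs(state.lateral_offset) <= half_lane_width and state.obstacle_gap > min_gap


def choose_target(state, first, rule_based):
    """Keep the first strategy only if its simulated outcome is within the safety range."""
    predicted = simulate_step(state, *first)
    return first if is_safe(predicted) else rule_based


first = (0.0, 2.0)        # (steering, acceleration) proposed by reinforcement learning
rule_based = (0.0, -5.0)  # braking strategy generated from the driving rules
near = VehicleState(speed=10.0, lateral_offset=0.0, heading=0.0, obstacle_gap=0.5)
far = VehicleState(speed=10.0, lateral_offset=0.0, heading=0.0, obstacle_gap=50.0)
```

With the obstacle only 0.5 m ahead, the simulated next state has a negative gap (a collision), so the rule-based strategy is chosen; with 50 m of clearance the first strategy is kept.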
In one embodiment of the application, the device 1 further includes an execution apparatus (not shown), which can perform the automatic driving operation based on the target driving strategy information. Here, based on the target driving strategy information determined through the rationality check, the corresponding driving operation can be performed on a real vehicle, for example, the automatic driving operation is performed on an automatic driving vehicle or an intelligent driving vehicle.
In one embodiment of the application, the device 1 further includes an updating apparatus (not shown), which can update the reward function value of the reinforcement learning algorithm based on the detection result of the rationality check.
Here, under the current environment and state, the vehicle performs the corresponding driving operation based on the driving strategy information, thereby changing its own environment and state, and obtains a reward; that is, a reward function value is determined. The reward function value reflects the change of state that occurs after the vehicle adopts the driving strategy information. In one implementation, it can be set that if the state improves, the reward function value is positive, and the larger the reward function value, the better the state; conversely, if the state deteriorates, the reward function value is negative. By setting the reward function, the cyclic process in which the vehicle interacts with its environment is controlled and the driving strategy information of the vehicle is adjusted, so that the driving strategy model corresponding to vehicle control is progressively trained and improved.
Therefore, if the target driving strategy information of the vehicle determined based on the detection result of the rationality check does not include the first driving strategy information, that is, the detection result is unreasonable, the reward function value of the reinforcement learning algorithm is set to a negative number. For example, the current reward function value is set to -100; after each decision, the reinforcement learning neural network can update its parameters based on the reward function value. The smaller the reward function value, the less likely it is that a similar decision will be made next time, which avoids the recurrence of similar, for example unreasonable, situations. Conversely, if the result of the rationality check is reasonable, that is, the target driving strategy corresponds to the first driving strategy information, the reward function value is set to a positive number.
Based on the driving parameter information and the driving rule information of the vehicle, the present application performs a rationality check on the first driving strategy information of the vehicle determined by the reinforcement learning algorithm and, based on the detection result of the rationality check, determines the target driving strategy information of the vehicle, so as to control the vehicle, in particular an automatic driving vehicle or an intelligent driving vehicle. Here, the present application achieves a deeper fusion of vehicle control using rule-based algorithms and vehicle control using reinforcement learning algorithms: the first driving strategy information computed by the reinforcement learning algorithm is constrained by rules. With this new fusion technique, the driving strategy determination method of the present application is more intelligent than existing methods that control the vehicle using rule-based algorithms or using reinforcement learning algorithms, and it improves the reasonableness and stability of the finally determined driving strategy.
The present application also provides a device for determining a driving strategy based on reinforcement learning and rules, including:
One or more processors;
A memory; and
One or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the following operations:
Determining first driving strategy information of a vehicle by a reinforcement learning algorithm based on driving parameter information of the vehicle;
Performing a rationality check on the first driving strategy information based on the driving parameter information and driving rule information of the vehicle;
Determining target driving strategy information of the vehicle based on the detection result of the rationality check.
Further, the programs of the device can also be used to perform the corresponding operations of other related embodiments based on the above operations.
The present application also provides a computer-readable storage medium on which a computer program is stored, the computer program being executable by a processor to perform the following operations:
Determining first driving strategy information of a vehicle by a reinforcement learning algorithm based on driving parameter information of the vehicle;
Performing a rationality check on the first driving strategy information based on the driving parameter information and driving rule information of the vehicle;
Determining target driving strategy information of the vehicle based on the detection result of the rationality check.
Further, the computer program can also be executed by a processor to perform the corresponding operations of other related embodiments based on the above operations.
Fig. 3 shows an exemplary system that can be used to implement the embodiments described herein.
As shown in Fig. 3, in some embodiments, system 300 can serve as any one of the devices 1 for determining a driving strategy based on reinforcement learning and rules in the embodiments shown in Fig. 1 and Fig. 2 or in the other described embodiments. In some embodiments, system 300 may include one or more computer-readable media having instructions (for example, system memory or NVM/storage device 320) and one or more processors (for example, processor(s) 305) coupled to the one or more computer-readable media and configured to execute the instructions to implement modules that perform the actions described in the present application.
For one embodiment, system control module 310 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 305 and/or to any suitable device or component in communication with system control module 310.
System control module 310 may include a memory controller module 330 to provide an interface to system memory 315. Memory controller module 330 can be a hardware module, a software module, and/or a firmware module.
System memory 315 can be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, for example, suitable DRAM. In some embodiments, system memory 315 may include double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide interfaces to NVM/storage device 320 and communication interface(s) 325.
For example, NVM/storage device 320 can be used to store data and/or instructions. NVM/storage device 320 may include any suitable non-volatile memory (for example, flash memory) and/or any suitable non-volatile storage device(s) (for example, one or more hard disk drives (HDD), one or more compact disc (CD) drives, and/or one or more digital versatile disc (DVD) drives).
NVM/storage device 320 may include a storage resource that is physically part of the device on which system 300 is installed, or it may be accessible by that device without being part of it. For example, NVM/storage device 320 can be accessed over a network via communication interface(s) 325.
Communication interface(s) 325 can provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 can communicate wirelessly with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 can be packaged together with the logic of one or more controllers of system control module 310 (for example, memory controller module 330). For one embodiment, at least one of the processor(s) 305 can be packaged together with the logic of one or more controllers of system control module 310 to form a system in package (SiP). For one embodiment, at least one of the processor(s) 305 can be integrated on the same die with the logic of one or more controllers of system control module 310. For one embodiment, at least one of the processor(s) 305 can be integrated on the same die with the logic of one or more controllers of system control module 310 to form a system on chip (SoC).
In various embodiments, system 300 can be, but is not limited to, a server, a workstation, a desktop computing device, or a mobile computing device (for example, a laptop computing device, a handheld computing device, a tablet computer, a netbook, etc.). In various embodiments, system 300 can have more or fewer components and/or a different architecture. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a liquid crystal display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an application-specific integrated circuit (ASIC), and speakers.
Obviously, those skilled in the art can make various modifications and variations to the present application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.
It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present invention can be executed by a processor to implement the steps or functions described above. Similarly, the software program of the present invention (including related data structures) can be stored in a computer-readable recording medium, for example, RAM memory, a magnetic or optical drive, a floppy disk, or similar devices. In addition, some steps or functions of the present invention can be implemented in hardware, for example, as circuitry that cooperates with a processor to perform each step or function.
In addition, part of the present invention can be applied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present invention through the operation of the computer. The program instructions that invoke the method of the present invention may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in broadcast or other signal-bearing media, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present invention includes an apparatus that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present invention.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, from every point of view, the embodiments are to be regarded as illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be included in the present invention. No reference numeral in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in an apparatus claim can also be implemented by a single unit or apparatus through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.
Various aspects of the embodiments are defined in the claims. These and other aspects of the embodiments are defined in the following numbered clauses:
1. A method for determining a driving strategy based on reinforcement learning and rules, wherein the method comprises:
determining first driving strategy information of a vehicle by a reinforcement learning algorithm, based on driving parameter information of the vehicle;
performing a reasonableness check on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle;
determining target driving strategy information of the vehicle based on the result of the reasonableness check.
2. The method according to clause 1, wherein performing the reasonableness check on the first driving strategy information, based on the driving parameter information and the driving rule information of the vehicle, comprises:
determining second driving strategy information of the vehicle based on the driving parameter information and the driving rule information of the vehicle;
performing the reasonableness check on the first driving strategy information based on the second driving strategy information.
3. The method according to clause 2, wherein determining the target driving strategy information of the vehicle based on the result of the reasonableness check comprises:
if the result of the reasonableness check includes that the distance between the second driving strategy information and the first driving strategy information is greater than or equal to a predetermined threshold, determining the second driving strategy information as the target driving strategy information of the vehicle.
4. The method according to clause 1, wherein the driving parameter information comprises at least any one of the following:
speed information of the vehicle;
information on the direction in which the vehicle deviates from the lane;
information on the distance between the vehicle and the lane centerline;
information on the distance between the vehicle and the lane edge;
obstacle perception information;
traffic sign perception information.
5. The method according to clause 1, wherein the method further comprises:
performing an automatic driving operation based on the target driving strategy information.
6. The method according to clause 1, wherein the method further comprises:
updating a feedback function value corresponding to the reinforcement learning algorithm based on the result of the reasonableness check.
7. The method according to clause 6, wherein updating the feedback function value corresponding to the reinforcement learning algorithm based on the result of the reasonableness check comprises:
if the target driving strategy information of the vehicle, as determined based on the result of the reasonableness check, does not include the first driving strategy information, setting the feedback function value corresponding to the reinforcement learning algorithm to a negative value.
8. A device for determining a driving strategy based on reinforcement learning and rules, wherein the device comprises:
first driving strategy information determining means for determining first driving strategy information of a vehicle by a reinforcement learning algorithm, based on driving parameter information of the vehicle;
detection means for performing a reasonableness check on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle;
target driving strategy information determining means for determining target driving strategy information of the vehicle based on the result of the reasonableness check.
9. The device according to clause 8, wherein the detection means comprises:
a first unit for determining second driving strategy information of the vehicle based on the driving parameter information and the driving rule information of the vehicle;
a second unit for performing the reasonableness check on the first driving strategy information based on the second driving strategy information.
10. The device according to clause 9, wherein the target driving strategy information determining means is configured to:
if the result of the reasonableness check includes that the distance between the second driving strategy information and the first driving strategy information is greater than or equal to a predetermined threshold, determine the second driving strategy information as the target driving strategy information of the vehicle.
11. The device according to clause 8, wherein the driving parameter information comprises at least any one of the following:
speed information of the vehicle;
information on the direction in which the vehicle deviates from the lane;
information on the distance between the vehicle and the lane centerline;
information on the distance between the vehicle and the lane edge;
obstacle perception information;
traffic sign perception information.
12. The device according to clause 8, wherein the device further comprises:
execution means for performing an automatic driving operation based on the target driving strategy information.
13. The device according to clause 8, wherein the device further comprises:
updating means for updating a feedback function value corresponding to the reinforcement learning algorithm based on the result of the reasonableness check.
14. The device according to clause 13, wherein the updating means is configured to:
if the target driving strategy information of the vehicle, as determined based on the result of the reasonableness check, does not include the first driving strategy information, set the feedback function value corresponding to the reinforcement learning algorithm to a negative value.
15. A device for determining a driving strategy based on reinforcement learning and rules, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the method according to any one of clauses 1-7.
16. A computer-readable storage medium having a computer program stored thereon, the computer program being executable by a processor to perform the method according to any one of clauses 1-7.
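As an illustration of clauses 1 to 3, the selection between the learned strategy and the rule-based strategy can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the `DrivingStrategy` fields (`steering`, `throttle`), the Euclidean distance in `strategy_distance`, and the function name `select_target_strategy` are all hypothetical, since the clauses do not fix how a strategy is represented or how the distance is measured. The clauses also state only the case in which the distance reaches the threshold; keeping the first strategy otherwise is an assumption of this sketch.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivingStrategy:
    """Hypothetical strategy representation; the clauses do not define one."""
    steering: float  # assumed: normalized steering command
    throttle: float  # assumed: normalized throttle command


def strategy_distance(a: DrivingStrategy, b: DrivingStrategy) -> float:
    """Assumed metric: Euclidean distance between the two commands."""
    return math.hypot(a.steering - b.steering, a.throttle - b.throttle)


def select_target_strategy(
    first: DrivingStrategy,   # from the reinforcement learning algorithm
    second: DrivingStrategy,  # from the driving rule information
    threshold: float,         # the predetermined threshold of clause 3
) -> tuple[DrivingStrategy, bool]:
    """Return the target strategy and whether the first strategy passed the check."""
    distance = strategy_distance(first, second)
    # Clause 3: a distance greater than or equal to the threshold means the
    # rule-based (second) strategy becomes the target strategy.
    first_is_reasonable = distance < threshold
    # Assumption: otherwise the learned (first) strategy is kept.
    target = first if first_is_reasonable else second
    return target, first_is_reasonable


if __name__ == "__main__":
    learned = DrivingStrategy(steering=0.5, throttle=0.2)
    rule_based = DrivingStrategy(steering=0.0, throttle=0.2)
    print(select_target_strategy(learned, rule_based, threshold=0.3))
    print(select_target_strategy(learned, rule_based, threshold=0.6))
```

With a learned strategy of (0.5, 0.2) and a rule-based strategy of (0.0, 0.2), the distance is 0.5, so a threshold of 0.3 selects the rule-based strategy, while a threshold of 0.6 keeps the learned one.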

Claims (10)

1. A method for determining a driving strategy based on reinforcement learning and rules, wherein the method comprises:
determining first driving strategy information of a vehicle by a reinforcement learning algorithm, based on driving parameter information of the vehicle;
performing a reasonableness check on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle;
determining target driving strategy information of the vehicle based on the result of the reasonableness check.
2. The method according to claim 1, wherein performing the reasonableness check on the first driving strategy information, based on the driving parameter information and the driving rule information of the vehicle, comprises:
determining second driving strategy information of the vehicle based on the driving parameter information and the driving rule information of the vehicle;
performing the reasonableness check on the first driving strategy information based on the second driving strategy information.
3. The method according to claim 2, wherein determining the target driving strategy information of the vehicle based on the result of the reasonableness check comprises:
if the result of the reasonableness check includes that the distance between the second driving strategy information and the first driving strategy information is greater than or equal to a predetermined threshold, determining the second driving strategy information as the target driving strategy information of the vehicle.
4. The method according to claim 1, wherein the driving parameter information comprises at least any one of the following:
speed information of the vehicle;
information on the direction in which the vehicle deviates from the lane;
information on the distance between the vehicle and the lane centerline;
information on the distance between the vehicle and the lane edge;
obstacle perception information;
traffic sign perception information.
5. The method according to claim 1, wherein the method further comprises:
performing an automatic driving operation based on the target driving strategy information.
6. The method according to claim 1, wherein the method further comprises:
updating a feedback function value corresponding to the reinforcement learning algorithm based on the result of the reasonableness check.
7. The method according to claim 6, wherein updating the feedback function value corresponding to the reinforcement learning algorithm based on the result of the reasonableness check comprises:
if the target driving strategy information of the vehicle, as determined based on the result of the reasonableness check, does not include the first driving strategy information, setting the feedback function value corresponding to the reinforcement learning algorithm to a negative value.
8. A device for determining a driving strategy based on reinforcement learning and rules, wherein the device comprises:
first driving strategy information determining means for determining first driving strategy information of a vehicle by a reinforcement learning algorithm, based on driving parameter information of the vehicle;
detection means for performing a reasonableness check on the first driving strategy information, based on the driving parameter information and driving rule information of the vehicle;
target driving strategy information determining means for determining target driving strategy information of the vehicle based on the result of the reasonableness check.
9. A device for determining a driving strategy based on reinforcement learning and rules, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing the method according to any one of claims 1-7.
10. A computer-readable storage medium having a computer program stored thereon, the computer program being executable by a processor to perform the method according to any one of claims 1-7.
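Claims 6 and 7 tie the feedback value of the reinforcement learning algorithm to the outcome of the reasonableness check. A minimal Python sketch of that update follows. The names `feedback_value`, `base_feedback` and `penalty`, and the default penalty of -1.0, are hypothetical, because the claims require only that the value be set negative when the target driving strategy information does not include the first driving strategy information. Leaving the feedback value unchanged otherwise is likewise an assumption of this sketch.

```python
def feedback_value(
    base_feedback: float,
    first_strategy_is_target: bool,
    penalty: float = -1.0,  # hypothetical magnitude; the claims only require "negative"
) -> float:
    """Return the feedback value used to update the reinforcement learning algorithm.

    base_feedback: feedback the algorithm would otherwise receive (assumed input).
    first_strategy_is_target: True when the target driving strategy information
        includes the first (learned) driving strategy information.
    """
    if penalty >= 0:
        raise ValueError("penalty must be negative")
    if first_strategy_is_target:
        # Assumption: the feedback is left unchanged when the learned
        # strategy was accepted as the target strategy.
        return base_feedback
    # Claim 7: the learned strategy was not used, so the feedback is negative.
    return penalty


if __name__ == "__main__":
    print(feedback_value(0.7, first_strategy_is_target=True))
    print(feedback_value(0.7, first_strategy_is_target=False))
```

Under these assumptions, a learned strategy that is replaced by the rule-based one always yields a negative feedback value, which discourages the algorithm from proposing strategies that fail the check.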
CN201711257834.0A 2017-12-01 2017-12-01 Method and equipment for determining driving strategy based on reinforcement learning and rules Active CN108009587B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711257834.0A CN108009587B (en) 2017-12-01 2017-12-01 Method and equipment for determining driving strategy based on reinforcement learning and rules

Publications (2)

Publication Number Publication Date
CN108009587A true CN108009587A (en) 2018-05-08
CN108009587B CN108009587B (en) 2021-04-16

Family

ID=62056248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711257834.0A Active CN108009587B (en) 2017-12-01 2017-12-01 Method and equipment for determining driving strategy based on reinforcement learning and rules

Country Status (1)

Country Link
CN (1) CN108009587B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008127465A1 (en) * 2007-04-11 2008-10-23 Nec Laboratories America, Inc. Real-time driving danger level prediction
CN105059287A (en) * 2015-07-31 2015-11-18 奇瑞汽车股份有限公司 Lane keeping method and device
CN106289797A (en) * 2016-07-19 2017-01-04 百度在线网络技术(北京)有限公司 For the method and apparatus testing automatic driving vehicle
CN106347359A (en) * 2016-09-14 2017-01-25 北京百度网讯科技有限公司 Method and device for operating autonomous vehicle
CN107168303A (en) * 2017-03-16 2017-09-15 中国科学院深圳先进技术研究院 A kind of automatic Pilot method and device of automobile
CN107229973A (en) * 2017-05-12 2017-10-03 中国科学院深圳先进技术研究院 The generation method and device of a kind of tactful network model for Vehicular automatic driving

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
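Xia Wei et al.: "Autonomous driving policy learning method based on deep reinforcement learning" (基于深度强化学习的自动驾驶策略学习方法), Journal of Integration Technology (《集成技术》) *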

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108803604A (en) * 2018-06-06 2018-11-13 深圳市易成自动驾驶技术有限公司 Vehicular automatic driving method, apparatus and computer readable storage medium
CN110850854A (en) * 2018-07-27 2020-02-28 通用汽车环球科技运作有限责任公司 Autonomous driver agent and policy server for providing policies to autonomous driver agents
CN109407662A (en) * 2018-08-31 2019-03-01 百度在线网络技术(北京)有限公司 Automatic driving vehicle control method and device
CN109407662B (en) * 2018-08-31 2022-10-14 百度在线网络技术(北京)有限公司 Unmanned vehicle control method and device
CN109255442A (en) * 2018-09-27 2019-01-22 北京百度网讯科技有限公司 Training method, equipment and the readable medium of control decision module based on artificial intelligence
CN113015981A (en) * 2018-11-16 2021-06-22 华为技术有限公司 System and method for efficient, continuous and safe learning using first principles and constraints
TWI703065B (en) * 2018-12-18 2020-09-01 大陸商北京航跡科技有限公司 Systems and methods for determining driving action in autonomous driving
US11155264B2 (en) 2018-12-18 2021-10-26 Beijing Voyager Technology Co., Ltd. Systems and methods for determining driving action in autonomous driving
CN109492835A (en) * 2018-12-28 2019-03-19 东软睿驰汽车技术(沈阳)有限公司 Determination method, model training method and the relevant apparatus of vehicle control information
CN109727470A (en) * 2019-01-08 2019-05-07 清华大学 A kind of current decision-making technique of Distributed Intelligent Network connection automobile intersection complex scene
CN109808704B (en) * 2019-01-15 2023-03-14 北京百度网讯科技有限公司 Driving strategy management method, device and equipment
CN109808704A (en) * 2019-01-15 2019-05-28 北京百度网讯科技有限公司 Driving strategy management method, device and equipment
CN109839937A (en) * 2019-03-12 2019-06-04 百度在线网络技术(北京)有限公司 Determine method, apparatus, the computer equipment of Vehicular automatic driving planning strategy
CN109991987A (en) * 2019-04-29 2019-07-09 北京智行者科技有限公司 Automatic Pilot decision-making technique and device
CN110304045A (en) * 2019-06-25 2019-10-08 中国科学院自动化研究所 Intelligent driving transverse direction lane-change decision-making technique, system and device
CN110673602B (en) * 2019-10-24 2022-11-25 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN110673602A (en) * 2019-10-24 2020-01-10 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN110703766B (en) * 2019-11-07 2022-01-11 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN110703766A (en) * 2019-11-07 2020-01-17 南京航空航天大学 Unmanned aerial vehicle path planning method based on transfer learning strategy deep Q network
CN111026127B (en) * 2019-12-27 2021-09-28 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
CN111026127A (en) * 2019-12-27 2020-04-17 南京大学 Automatic driving decision method and system based on partially observable transfer reinforcement learning
US20210229687A1 (en) * 2020-01-29 2021-07-29 Toyota Jidosha Kabushiki Kaisha Vehicle controller, vehicle control system, vehicle control method, and vehicle control system control method
CN111338227A (en) * 2020-05-18 2020-06-26 南京三满互联网络科技有限公司 Electronic appliance control method and control device based on reinforcement learning and storage medium
CN111679660A (en) * 2020-06-16 2020-09-18 中国科学院深圳先进技术研究院 Unmanned deep reinforcement learning method integrating human-like driving behaviors
CN111679660B (en) * 2020-06-16 2022-08-05 中国科学院深圳先进技术研究院 Unmanned deep reinforcement learning method integrating human-like driving behaviors
CN112198794A (en) * 2020-09-18 2021-01-08 哈尔滨理工大学 Unmanned driving method based on human-like driving rule and improved depth certainty strategy gradient
CN112295237A (en) * 2020-10-19 2021-02-02 深圳大学 Deep reinforcement learning-based decision-making method
CN114527737A (en) * 2020-11-06 2022-05-24 百度在线网络技术(北京)有限公司 Speed planning method, device, equipment, medium and vehicle for automatic driving
CN113044064B (en) * 2021-04-01 2022-07-29 南京大学 Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning
CN113044064A (en) * 2021-04-01 2021-06-29 南京大学 Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning
CN113449823A (en) * 2021-08-31 2021-09-28 成都深蓝思维信息技术有限公司 Automatic driving model training method and data processing equipment
CN114261400A (en) * 2022-01-07 2022-04-01 京东鲲鹏(江苏)科技有限公司 Automatic driving decision-making method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN108009587B (en) 2021-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant