CN115071758A - Man-machine common driving control right switching method based on reinforcement learning - Google Patents
- Publication number
- CN115071758A (application CN202210758672.3A)
- Authority
- CN
- China
- Prior art keywords
- driving
- driver
- vehicle
- current
- road
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/005—Handover processes
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/005—Handover processes
- B60W60/0059—Estimation of the risk associated with autonomous or manual driving, e.g. situation too complex, sensor failure or driver incapacity
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Human Computer Interaction (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Traffic Control Systems (AREA)
Abstract
The application discloses a reinforcement learning-based human-machine shared driving control right switching method, suitable for a reinforcement learning-based control right switching system that allocates driving weight between a driver and a driving system. The method comprises the following steps: calculating a driving operation action prediction index according to driver information and vehicle-road prediction information; and inputting the driving operation action prediction index and the comprehensive driving operation action index into the control right switching system to calculate the driving weight between the driver and the driving system. Through the technical scheme of the application, the combined longitudinal and lateral risk of the vehicle is effectively addressed, the influence of driver-induced uncertainty is weakened, and the driver is comprehensively considered from different angles, thereby reducing judgment errors about the driver.
Description
Technical Field
The application relates to the technical field of intelligent driving, and in particular to a human-machine shared driving control right switching method based on reinforcement learning.
Background
In conventional automatic driving technology, a control right switching mechanism is generally adopted to correct the driving behavior of the driver and thereby improve the driving safety of the vehicle.
For example, in patent CN 109795486 A, the shared driving coefficient (ranging from 0 to 1) is dynamically adjusted according to the driver's input torque Td and the time to lane crossing TLC of the left and right wheels, realizing a gradual transition from the driver to the assistance control system; the shared driving coefficient is determined through fuzzy control. However, while this approach addresses the risk of lateral deviation, it does not take into account longitudinal risks during driving.
For another example, patent CN 108469806 A constructs key factors from the current driving environment and the states of the vehicle and the driver, performs situational assessment on these factors, and synchronously assesses the driving abilities of the automatic driving system and the driver to determine whether the driving right can be transferred. Although this scheme considers many factors that may affect driving safety, the assessment of driving ability during the switching process is overly complex, involves considerable subjectivity and randomness, and relies on too much data, resulting in poor real-time performance and stability.
Similarly, the thesis "Human-machine co-driving model based on a driver risk response mechanism" quantifies environmental risk, obtains a safety risk response strategy by fitting the environmental risk against the driver's driving acceleration, and flexibly switches the human-machine co-driving control right through strategy deviation. This method addresses the coupling between driver state and environmental safety, but the safety strategy is built on a large number of driving segments that cannot fully cover all safe operations, and it only solves the switching problem for car-following and overtaking on highways. Moreover, this switching mode considers only safety at the current moment, without considering traffic hazards that may arise in future time periods.
Therefore, the safety and stability of control right switching schemes in existing automatic driving still need to be improved.
Disclosure of Invention
The purpose of this application is to effectively address the combined longitudinal and lateral risk of the vehicle and to reduce judgment errors about the driver, so as to improve the accuracy and safety of driving right switching.
The technical scheme of the application is as follows: a reinforcement learning-based human-machine shared driving control right switching method is provided, suitable for a reinforcement learning-based control right switching system that allocates driving weight between a driver and a driving system, and comprising the following steps: calculating a driving operation action prediction index according to driver information and vehicle-road prediction information; and inputting the driving operation action prediction index and the comprehensive driving operation action index into the control right switching system to calculate the driving weight between the driver and the driving system.
In any of the above technical solutions, further, the driver information at least includes the driver state, driver intention, driver style and the driver's subconscious driving influence deviation, and the vehicle-road prediction information at least includes the predicted vehicle-road hazard degree and the predicted vehicle-road hazard threshold.
The calculation formula of the driving operation action prediction index is as follows:
In the formula, the result is the driving operation action prediction index; Z_t is the delay of the driver's state response; σ is the driver's subconscious driving influence deviation; δ is the driver's intention; S is the driver's style; v_risk is the predicted vehicle-road hazard degree; and A_arisk is the predicted vehicle-road hazard threshold.
In any of the above technical solutions, further, the calculation formula of the driver's subconscious driving influence deviation σ is:
R_d = |d − q_ki|
where σ is the driver's subconscious driving influence deviation; sum is the number of collected traffic scenes; D_i is a series of subconscious driving strengths within one traffic-scene time period; ρ′, τ and ω are undetermined parameters; α is the subconscious side weight; β is the driver's personal safety tendency weight; d is the current lateral position of the vehicle; q_ki is the fitted lateral position of the vehicle under this scene (label); a is the vehicle acceleration; and R_d is a position parameter.
In any of the above technical solutions, further, the driver information at least includes the driver state, driver intention and driver style, and the calculation process of the comprehensive driving operation action index specifically includes:
determining the current vehicle-road information according to the position of the current vehicle on the road, the current vehicle-road information at least including the current vehicle-road hazard degree and the current vehicle-road hazard threshold;
determining the comprehensive driving operation action index according to the driver information and the current vehicle-road information, in combination with an environmental response factor and a piecewise function, wherein the calculation formula of the comprehensive driving operation action index is as follows:
In the formula, the result is the comprehensive driving operation action index; z_1 is the driver state; γ is the environmental response factor; H_x,y is the current vehicle-road hazard degree; σ is a road correction parameter; a_pre is the real-time operation quantization parameter; and risk is the current vehicle-road hazard threshold.
In any of the above technical solutions, further, determining the current vehicle-road information according to the position of the current vehicle on the road specifically includes:
determining the position of the current vehicle on the road, the position at least including the distance between the current vehicle and the preceding vehicle and the lateral position of the current vehicle;
determining the longitudinal vehicle-road hazard value according to the distance between the current vehicle and the preceding vehicle;
determining the lateral vehicle-road hazard value according to the lateral position of the current vehicle;
calculating the current vehicle-road hazard degree according to the longitudinal and lateral vehicle-road hazard values, with the corresponding calculation formula as follows:
In the formula, H_x,y is the current vehicle-road hazard degree; the hazard distance influence factor of different road sections takes values in the range [1, 10]; y_1 is the longitudinal vehicle-road hazard value; and y_2 is the lateral vehicle-road hazard value;
and calculating the current vehicle-road hazard thresholds of different scenes according to the current vehicle-road hazard degree, and recording the current vehicle-road hazard threshold and the current vehicle-road hazard degree as the current vehicle-road information.
In any of the above technical solutions, further, the calculation formula of the environmental response factor γ is:
where M is the vehicle mass; m is a vehicle type and purpose correction parameter; k_1 is a dynamics correction parameter, applied to the term representing the desired speed and speed direction of the vehicle; v_limleast(t) is the minimum speed value; k_2 is a traffic scene correction parameter, applied to the vehicle interaction force parameter; k_3 is a correction parameter for the degree to which pedestrians comply with traffic regulations, applied to the pedestrian interaction force parameter; k_4 is a correction parameter for the complexity of the surrounding physical environment, applied to the environmental interaction force parameter; and k_5 is a correction parameter for the degree of influence of traffic regulations, applied to the rule parameter.
In any of the above technical solutions, further, calculating the driving weight between the driver and the driving system specifically includes: Step 9.1, using the Z-score standardization formula, normalize the driving operation action prediction index and the comprehensive driving operation action index at the current moment, and calculate the mean and standard deviation of both indices from the start of driving to the current moment; Step 9.2, input the Z-score standardized driving operation action prediction index and comprehensive driving operation action index, together with their current means and standard deviations, as input parameters into the reinforcement learning-based human-machine shared driving control right switching system to judge whether the weight assignment condition is satisfied; if so, execute step 9.3, and if not, re-acquire the driver information and vehicle-road prediction information; Step 9.3, based on the Q-learning algorithm, adjust the learning state in the Q-learning algorithm using the input parameters, and assign the driver's driving weight according to the action with the maximum value in the next state of the Q-learning algorithm, the driving weight of the driving system being the difference between 1 and the driver's driving weight.
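As an illustrative sketch of the Q-learning weight assignment in step 9.3; the state encoding, learning rate, discount factor and candidate weight set here are assumptions for illustration, not values taken from this application:

```python
def q_update(Q, state, action, reward, next_state, alpha=0.1, discount=0.9):
    # Tabular Q-learning update:
    # Q(s, a) <- Q(s, a) + alpha * (r + discount * max_a' Q(s', a') - Q(s, a))
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + discount * best_next - Q[state][action])

def driver_weight(Q, state, weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
    # The action with the maximum value in the given state selects the
    # driver's driving weight; the driving system receives 1 - weight.
    best_action = max(Q[state], key=Q[state].get)
    return weights[best_action]
```

Here each discretized pair of standardized indices would be a state, and each candidate driver weight an action; the reward design is left open, as the patent does not spell it out.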
In any of the above technical solutions, further, the weight assignment condition specifically includes: the first parameter and the second parameter are both less than or equal to a first trigger threshold for 5 consecutive times; or the second parameter is less than or equal to a second trigger threshold for 3 consecutive times; or the first parameter is less than or equal to the second trigger threshold for 3 consecutive times; wherein the first parameter is the number of standard deviations separating the currently input driving operation action prediction index from the mean of all driving operation action prediction indices input from the start of the driving behavior to the current moment, and the second parameter is the number of standard deviations separating the currently input comprehensive driving operation action index from the mean of all comprehensive driving operation action indices input from the start of the driving behavior to the current moment.
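The Z-score standardization and the trigger conditions above can be sketched as follows; the threshold values passed in are illustrative assumptions, not values taken from this application:

```python
from statistics import mean, pstdev

def z_score(value, history):
    # Number of standard deviations between the current value and the mean
    # of all values observed since driving began (step 9.1).
    mu = mean(history)
    sd = pstdev(history)
    return 0.0 if sd == 0 else abs(value - mu) / sd

def weight_assignment_triggered(pred_scores, comp_scores, t1=1.0, t2=2.0):
    # pred_scores / comp_scores: recent z-scores of the prediction index
    # and the comprehensive index (most recent last).  t1 / t2 are the
    # first and second trigger thresholds (illustrative values).
    # Condition 1: both parameters <= first threshold 5 consecutive times.
    if (len(pred_scores) >= 5 and len(comp_scores) >= 5
            and all(z <= t1 for z in pred_scores[-5:])
            and all(z <= t1 for z in comp_scores[-5:])):
        return True
    # Condition 2: second parameter <= second threshold 3 consecutive times.
    if len(comp_scores) >= 3 and all(z <= t2 for z in comp_scores[-3:]):
        return True
    # Condition 3: first parameter <= second threshold 3 consecutive times.
    if len(pred_scores) >= 3 and all(z <= t2 for z in pred_scores[-3:]):
        return True
    return False
```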
The beneficial effect of this application is:
according to the technical scheme, the risk of longitudinal and transverse integration of the vehicle is effectively solved, the influence of uncertainty caused by a driver is weakened, the driver is comprehensively considered from different angles, so that the judgment error of the driver is reduced, the method is suitable for multiple traffic scenes, traffic dangers possibly caused in future time periods are comprehensively considered, the accuracy and the safety of driving right switching are further improved, finally, all factors are integrated into two index input switching systems, the data volume is small and accurate, and the real-time performance is higher.
In preferred implementations of the application, the influence of the driver's experience and subconscious on driving is considered, the judgment burden of the switching system is reduced, and the real-time performance is better. Moreover, the risk that other vehicles may pose to the ego vehicle can be predicted in advance, avoiding rear-end and other collisions during driving.
Drawings
The advantages of the above and/or additional aspects of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flow chart of a reinforcement learning-based human-machine co-driving control right switching method according to an embodiment of the present application;
FIG. 2 is a diagram of relative positions of roads and relative safe positions according to one embodiment of the present application;
FIG. 3 is a schematic diagram of a model-free reinforcement learning process according to an embodiment of the present application;
fig. 4 is a schematic diagram illustrating an overall structure of a reinforcement learning-based human-machine co-driving control right switching mechanism according to an embodiment of the present application;
FIG. 5 is a diagram illustrating Q-tables in a Q-learning algorithm in reinforcement learning according to an embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the present application can be more clearly understood, the present application will be described in further detail with reference to the accompanying drawings and detailed description. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced in other ways than those described herein, and therefore the scope of the present application is not limited by the specific embodiments disclosed below.
As shown in fig. 1, the present embodiment provides a reinforcement learning-based human-machine shared driving control right switching method, including:
Further, step 1 is realized by:
Step 1.1, the simulator hardware needs to include a camera for acquiring images of the driver and a driving operation environment simulating a real vehicle;
Step 1.2, constructing a large number of typical traffic environments that may be encountered in the real world, including car-following scenes on all road types, overtaking scenes on all road types, road intersection scenes, congested road section scenes, and the like;
Step 1.3, inserting a certain number of dangerous traffic scenes and accident simulation scenes into the different typical traffic environment scenes.
Further, step 2 may include the following processes:
Step 2.1, the driver completes a full driving process in each of the different scenes;
Step 2.2, without intervention from the control right switching system, the driver drives normally in a certain number of different driving scenes; the driver's operations and the road conditions during driving are collected and recorded, the driver's style is obtained through statistical analysis, and the driver's subconscious driving influence deviation is calculated (the influence of the driver's subconscious driving operations is described by the acceleration/deceleration and the changes in lateral road position produced by the experience the driver has accumulated in different driving scenes):
The calculation formula of the driver's subconscious driving influence deviation is:
R_d = |d − q_ki|
where σ is the driver's subconscious driving influence deviation; sum is the number of collected traffic scenes; D_i is a series of subconscious driving strengths within one traffic-scene time period; ρ′, τ and ω are undetermined parameters; α is the subconscious side weight; β is the driver's personal safety tendency weight; d is the current lateral position of the vehicle; q_ki is the fitted lateral position of the vehicle under this scene (label); a is the vehicle acceleration; and R_d is a position parameter.
Specifically, the driver's subconscious driving influence deviation does not consider the influence of other traffic participants and is considered only from the perspective of personal safety. Based on the maximum entropy principle, a maximum entropy method related to the driver's subconscious is established.
First, an entropy function is constructed:
H(x) = −C Σ_k p_k log2(p_k)
where H(x) is the entropy, a measure of the uncertainty of a thing; p_k is a probability distribution; and C is a constant that depends on the measure of entropy, taken here to be 1.
What is actually desired from the entropy function is the driver's subconscious driving influence deviation, that is, the degree to which the subconscious influences behavior in the current environment. However, because the probability distribution p_k is a decimal between 0 and 1, log2(p_k) is negative, so this embodiment introduces a non-negative integer q_i to substitute for the probability distribution p_k in the entropy function.
The parameter q_i is defined as the relative safe position in different road scenes. The relative positions are shown in fig. 2: a lateral coordinate axis is established with the left side of the road as the origin, half the width of a single lane is taken as one driving position, the road is divided into eight areas, and the relative safe position is the position where more than half of vehicles are located during normal driving.
Roads in different scenes differ greatly, and a specific position that completely characterizes the road cannot be obtained accurately; therefore ln is used in place of the base-2 logarithm, and since q_i mainly takes integer values greater than one, the negative sign of the original entropy function must be removed. The difference can then be expressed by the following corrected entropy:
Second, the constraint conditions of the corrected entropy are established. First, the road-condition constraint: every driver chooses the side with better road conditions. Second, the traffic-regulation constraint: drivers tend to drive as specified by traffic regulations. Third, the traffic-demand constraint: whether the driver needs to overtake, follow, or go straight in the road scene. The constraints are as follows:
Constraint 3: B(q_i) ∈ S
In the formula, A_min and A_max are the lower and upper limits of the road traffic capacity score; an interference coefficient accounts for the degree of unfamiliarity with different road sections; b is the traffic demand influence weight; B is the maximum boundary of the traffic rule; B(q_i) is the determination of the traffic demand, i.e., knowing whether the demand is overtaking, following or going straight, q_i is estimated from that demand; and S is the traffic demand set, containing the position results of all normal driving behaviors.
With the three constraint conditions and the different road scenes set, the corrected entropy is used for calculation, and the relative safe position q_i of each road scene is obtained where the value of the corrected entropy E is maximal. The relative safe positions q_i are clustered, each class is given a label (such as overtaking, following, going straight), and the relative safe positions q_i are then fitted to obtain the fitted lateral position q_ki, which is the safe position this driver most tends toward under the different labels.
In summary, the fitted lateral position q_ki is derived from the relative safe positions q_i at which the corrected entropy E is maximal under the constraint conditions.
The calculation formula of D_i, a series of subconscious driving strengths within one traffic-scene time period, is:
R_d = |d − q_ki|
where d is the current lateral position of the vehicle; q_ki is the fitted lateral position of the vehicle under this scene (label); a is the vehicle acceleration; α is the subconscious side weight; β is the driver's personal safety tendency weight; and ρ′, τ and ω are undetermined parameters whose values satisfy the trend of subconscious driving strength in different traffic scenes, as follows:
When R_d ≥ Z (Z is a safety value, a set value that differs for different roads), the value of the subconscious driving strength D_i is large, and the values of the undetermined parameters ρ′, τ and ω increase with R_d and |a|; that is, the situation becomes increasingly unsafe, and the strength of the subconscious driving operation action is greater.
When R_d < Z and the value of D_i is small, the values of the undetermined parameters ρ′, τ and ω decrease with R_d and |a|; that is, the situation becomes increasingly safe, and the strength of the subconscious driving operation action is smaller.
sum is the number of collected traffic scenes, and the result σ of averaging the strengths is the driver's subconscious driving influence deviation.
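The position parameter R_d and the averaging that yields σ can be sketched as follows; because the exact closed form of the strength D_i (with its undetermined parameters ρ′, τ, ω) is not given here, the per-scene strengths are taken as already-computed inputs:

```python
def position_deviation(d, q_ki):
    # R_d = |d - q_ki|: deviation of the vehicle's current lateral
    # position d from the fitted safe lateral position q_ki.
    return abs(d - q_ki)

def subconscious_deviation(intensities):
    # sigma: average of the subconscious driving strengths D_i collected
    # over the traffic scenes ("sum" scenes in the text).  The exact form
    # of D_i is not reproduced, so the values are supplied directly.
    return sum(intensities) / len(intensities)
```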
Step 2.3, the driver simulates conditions that may be encountered during real driving, such as the dangerous states of fatigue, emotional excitement and distraction, as well as normal driving;
Step 2.4, data is collected to obtain the speeds and distances of surrounding vehicles and road surface information, the brake, accelerator and steering wheel data of the ego vehicle, the driving weight distribution in the driving system, and the driver's intention and operation data; the driver's state and intention information are obtained through statistical processing of the data.
Further, step 3 is realized by:
Step 3.1, the environmental response factor γ is the interaction force under the influence of the interaction between the vehicle and the vehicle-road environment, responding specifically to the different units. The environmental response factor γ is calculated using the following formula:
v_limleast(t) is the minimum of the speed limit and the vehicle speed in the current time-period scene;
M is the mass of the vehicle;
m is a vehicle type and purpose correction parameter;
the first term represents the desired speed and speed direction of the vehicle, derived from Newton's second law and kinematic formulas;
k_1 is a dynamics correction parameter;
k_2 is a traffic scene correction parameter (for example, highway sections, congested sections); the vehicle interaction force parameter is the interaction force with other vehicles, where:
θ_1l is the angle between the direction of travel of the vehicle and that of the other vehicle; Δv_1l/Δμ_1l is the ratio of the speed difference to the distance difference; u is the safe distance; and ρ is the distance to the other vehicle. The expression indicates that a distance greater than the safe distance produces an attractive force, which becomes smaller as the distance approaches the safe distance; when the distance is smaller than the safe distance, the force becomes repulsive, and the closer the other vehicle, the larger the repulsion. Vehicles in laterally parallel positions travelling in parallel exert no interaction force, and the absolute value of the interaction force is largest in the same longitudinal lane.
k_3 is a correction parameter for the degree to which pedestrians comply with traffic regulations; the pedestrian interaction force parameter is the interaction force with pedestrians, where:
v is the current speed of the vehicle; θ_1j is the angle between the center of the vehicle front and the pedestrian; r_1j is the distance difference; and t_1j is the estimated meeting time. The formula shows that when the vehicle speed is 0 there is no interaction force between vehicle and pedestrian; the closer the vehicle and pedestrian, the smaller the angle difference and the shorter the estimated meeting time, and the higher the vehicle speed, the larger the repulsive force.
k_4 is a correction parameter for the complexity of the surrounding physical environment; the environmental interaction force parameter is the interaction force with the surrounding physical environment, i.e., non-moving objects such as buildings, where:
t is the volume of the non-moving object: the larger the volume, the larger the repulsive force. When the volume is smaller than or equal to the size the vehicle can pass, the interaction force is attractive; when the volume is larger than the size the vehicle can pass, the smaller the collision time T_1R, the greater the repulsion. The repulsive force is also greater for larger vehicle mass and higher vehicle speed, and at a speed of 0 there is no interaction force.
k_5 is a correction parameter for the degree of influence of traffic regulations, reflecting the vehicle's degree of attention to them; the rule parameter acts as a resistance from traffic regulations, where:
v_lim is the maximum speed limited by traffic regulations and traffic signs: the lower the speed limit, the larger the resistance, and when the regulations or signs require stopping, as at a red light, the resistance is infinite.
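A minimal sketch of how the environmental response factor γ could combine the terms listed in step 3.1; the exact formula is not reproduced in this text, so a simple weighted sum over the correction parameters k_1 to k_5 is assumed here for illustration:

```python
def environmental_response_factor(drive_term, vehicle_forces, pedestrian_forces,
                                  environment_forces, rule_resistance,
                                  k=(1.0, 1.0, 1.0, 1.0, 1.0)):
    # Assumed combination: gamma = k1 * desired-speed term
    #                            + k2 * sum of vehicle interaction forces
    #                            + k3 * sum of pedestrian interaction forces
    #                            + k4 * sum of environment interaction forces
    #                            + k5 * traffic-rule resistance.
    k1, k2, k3, k4, k5 = k
    return (k1 * drive_term
            + k2 * sum(vehicle_forces)
            + k3 * sum(pedestrian_forces)
            + k4 * sum(environment_forces)
            + k5 * rule_resistance)
```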
Further, step 4 is implemented by:
Step 4.1, extracting brake force, accelerator force and steering wheel angle data through sensors;
Step 4.2, normalizing the three data streams using min-max standardization; brake, accelerator and steering wheel angle are each mapped by (value − min)/(max − min), where value is the current value, min the minimum value and max the maximum value.
From the operation specification it is known that accelerator and brake are mutually exclusive operations, so their normalization results are combined as follows:
longitudinal operation interval: [−1, 1];
lateral operation interval: [−1, 1];
Step 4.3, for the longitudinal and lateral operation intervals [−1, 1], a bijection from [−1, 1] × [−1, 1] to [−1, 1] is constructed: with the longitudinal value written as 0.a1a2a3a4… and the lateral value as 0.b1b2b3b4…, a crossover method is constructed that segments the two decimals after each non-zero digit and cross-recombines the segments to obtain the one-dimensional real-time operation quantization parameter a_pre.
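The normalization and digit-crossover construction of steps 4.2 and 4.3 can be sketched as follows; the sign handling and the fixed digit count are illustrative assumptions:

```python
def min_max(value, lo, hi):
    # Min-max standardization: maps value from [lo, hi] to [0, 1].
    return (value - lo) / (hi - lo)

def interleave(longitudinal, lateral, digits=6):
    # Combine two normalized values into the one-dimensional real-time
    # operation quantization parameter a_pre by interleaving their
    # fractional digits 0.a1b1a2b2... (the "crossover" of step 4.3).
    a = f"{abs(longitudinal):.{digits}f}".split(".")[1]
    b = f"{abs(lateral):.{digits}f}".split(".")[1]
    mixed = "".join(x + y for x, y in zip(a, b))
    return float("0." + mixed)
```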
Step 5, determining the current vehicle-road information according to the position of the current vehicle on the road, the current vehicle-road information at least including the current vehicle-road hazard degree and the current vehicle-road hazard threshold.
Further, step 5 is implemented by:
Step 5.1, determining the position of the current vehicle on the road, at least including the distance between the current vehicle and the preceding vehicle and the lateral position of the current vehicle, and determining the longitudinal vehicle-road hazard value from the distance to the preceding vehicle.
The risk of the longitudinal position is inversely proportional to the distance from the tail of the preceding vehicle: the closer the distance, the greater the risk. A longitudinal vehicle-road hazard function is established with the tail of the preceding vehicle as the origin of a coordinate axis; the specified normal safe distance is ζ_1, and the minimum safe distance η_1 is set as the distance at which maximum-deceleration braking just avoids collision with the preceding vehicle.
y_1 is the longitudinal vehicle-road hazard value, and x_1 is the distance from the preceding vehicle;
Step 5.2, determining the lateral vehicle-road hazard value according to the lateral position of the current vehicle.
A lateral vehicle-road hazard function is established with the center point of the vehicle front as the origin:
y_2 = 0.5cos[(π/T)x_2] − 0.5, −T ≤ x_2 ≤ T
where y_2 is the lateral vehicle-road hazard value, x_2 is the current lateral position, and T is the distance from the lane center line to the lane edge line;
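The lateral hazard function of step 5.2 can be written directly; the function and parameter names are illustrative, and the domain check is an added convenience:

```python
import math

def lateral_hazard(x2, T):
    # y2 = 0.5 * cos((pi / T) * x2) - 0.5 for -T <= x2 <= T:
    # 0 at the lane center line (x2 = 0) and -1 at the lane edges (x2 = +/-T).
    if not -T <= x2 <= T:
        raise ValueError("x2 must lie within the lane half-width T")
    return 0.5 * math.cos((math.pi / T) * x2) - 0.5
```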
step 5.3, calculating to obtain the current vehicle road danger degree H x,y :
The risk distance influence factors of different road sections have the value range of [1,10 ]]When the value is 1, the current road section and the driving state are standard traffic road sections and driving environments under the regulation of the intersection standard. When the value is 10, the conditions that the current driving environment is severe, the road traffic capacity is extremely poor and rear-end accidents happen frequently around the road, such as a heavy fog and frozen road section, are indicated.
Step 5.4: calculate the current vehicle road danger threshold for different scenes:
risk = ωγH_{x,y}
where:
ω is the scene impact parameter;
γ is the environmental response factor;
z1 is the driver state (different driver states correspond to different degrees of environmental response);
δ is the driver intention, representing the degree to which the current operation coincides with the recognized driver intention;
H_{x,y} is the current vehicle road danger degree;
σ is the road correction parameter;
a_pre is the real-time operation quantization parameter;
risk is the current vehicle road danger threshold.
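Once H_{x,y} is available from step 5.3, the scene-dependent threshold of step 5.4 is a direct product of the three quantities. A minimal sketch, assuming the function name and treating ω, γ and H_{x,y} as precomputed inputs:

```python
def road_danger_threshold(omega: float, gamma: float, h_xy: float) -> float:
    """Current vehicle road danger threshold, risk = omega * gamma * H_{x,y}.

    omega: scene impact parameter
    gamma: environmental response factor
    h_xy:  current vehicle road danger degree from step 5.3
    """
    return omega * gamma * h_xy
```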
Specifically, the interaction force in step 3 is inversely related to distance; the faster the interaction force grows, the more likely danger is to occur, which yields:
A_arisk = ρ·a_risk
where v_f is the growth rate of the interaction force of a single unit, v_risk is the predicted vehicle road danger degree, a_f is the acceleration of the interaction-force growth of a single unit, a_risk is the sum of the interaction-force growth accelerations over all peripheral units, A_arisk is the predicted vehicle road danger threshold, and ρ is the vehicle road danger influence factor, determined by the complexity of the current road, with value range [0,1].
where σ is the driver subconscious driving influence deviation, obtained as the historical driver's deviation in the most similar scene found by comparing traffic scenes;
S is the driver style, a quantitative [0,10] evaluation obtained through a driver style test; a value below 1 indicates an extremely unsuitable driver style, which delays the reaction of the driver intention and driving-state operations;
Z_t is the driver state operation reaction delay, a set value; the larger the delay, the smaller the driving operation action prediction index;
δ is the driver intention; different intended driver paths have a large influence on the driver subconscious driving influence deviation.
Step 9: input the driving operation action prediction index and the comprehensive driving operation action index into the reinforcement-learning-based human-machine co-driving control right switching system, and calculate and adjust the driving weights required by the driver and by the driving system.
Specifically, as shown in fig. 3 and fig. 4, the driving operation action prediction index represents how the risk factors of a future time period affect operational safety, emphasizing the influence on other units after the vehicle operates; the comprehensive driving operation action index represents how each risk factor affects operational safety at the current time, emphasizing whether the current position is safe and whether effective driving is possible in the current state.
Step 9.1: using the Z-score standardization formula, standardize the current driving operation action prediction index and comprehensive driving operation action index, and calculate the mean and standard deviation of the two indexes over all inputs from the start of driving to the current operation.
Step 9.2: input the Z-score-standardized driving operation action prediction index and comprehensive driving operation action index, together with their current corresponding means and standard deviations, as input parameters into the reinforcement-learning-based human-machine co-driving control right switching system to judge whether a weight distribution condition is met; if so, execute step 9.3; if not, re-acquire the driver information and vehicle road prediction information.
Specifically, the Z-score expresses by how many standard deviations a sampled value differs from the data mean. Taking the driving operation action prediction index as an example: the first parameter is the difference between the currently input prediction index (the sampled value) and the mean of all prediction indexes input from the start of the driving behavior to the current time, divided by the standard deviation of those inputs. The second parameter is defined analogously for the comprehensive driving operation action index.
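Step 9.1 amounts to keeping a running mean and standard deviation of each index from the start of the drive and expressing each new sample as a Z-score against them. A sketch using Welford's online algorithm (the class name and the choice of update method are assumptions; the specification does not fix them):

```python
import math

class RunningZScore:
    """Z-score of the latest sample against the running mean and standard
    deviation of all samples seen since the start of the drive."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, x: float) -> float:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        std = math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0
        return 0.0 if std == 0.0 else (x - self.mean) / std
```

One instance each would be kept for the prediction index and the comprehensive index; their outputs are the first and second parameters fed to the switching system.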
In this embodiment, the weight distribution conditions in the control right switching system include three types:
(1) For 5 consecutive inputs, the first parameter and the second parameter are both less than or equal to a first trigger threshold.
Specifically, it is judged whether the first parameter and the second parameter are both less than or equal to the first trigger threshold, whose value may be −3, i.e. whether the input driving operation action prediction index and comprehensive driving operation action index both lie at least 3 standard deviations below their current corresponding means. Such a situation indicates that the current state does not satisfy the safe state and will not satisfy it in the future, so the control right switching system is triggered to start working. Under this condition, when 5 consecutive pairs of input indexes both satisfy the threshold, the driving weights required by the driver and the driving system are adjusted.
(2) Specifically, the value of the second trigger threshold may be −4. When the input comprehensive driving operation action index lies at least 4 standard deviations below its current mean, i.e. the second parameter is less than or equal to the second trigger threshold, the current state does not satisfy the safe state and the driving system must intervene urgently; the control right switching system is triggered to start working. Under this condition, when 3 consecutive input indexes satisfy the second-parameter condition, the driving weights required by the driver and the driving system are adjusted.
(3) Specifically, when the input driving operation action prediction index lies at least 4 standard deviations below its current mean, i.e. the first parameter is less than or equal to the second trigger threshold, the future state does not satisfy the safe state and cannot be corrected by the system itself without driver intervention; the control right switching system is triggered to start working. Under this condition, when 3 consecutive input indexes satisfy the first-parameter condition, the driving weights required by the driver and the driving system are adjusted.
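The three trigger conditions can be checked over a short history of the two standardized parameters. A sketch under the thresholds named above (−3 and −4); the class and method names are assumptions:

```python
from collections import deque

FIRST_TRIGGER = -3.0   # threshold for condition (1)
SECOND_TRIGGER = -4.0  # threshold for conditions (2) and (3)

class WeightSwitchTrigger:
    """Checks the three weight-distribution conditions on the z-scored
    prediction index (z1) and comprehensive index (z2)."""

    def __init__(self) -> None:
        self.history = deque(maxlen=5)  # most recent (z1, z2) pairs

    def update(self, z1: float, z2: float) -> bool:
        self.history.append((z1, z2))
        last5 = list(self.history)
        last3 = last5[-3:]
        # (1) 5 consecutive inputs with both parameters <= -3
        cond1 = len(last5) == 5 and all(
            a <= FIRST_TRIGGER and b <= FIRST_TRIGGER for a, b in last5)
        # (2) 3 consecutive inputs with the second parameter <= -4
        cond2 = len(last3) == 3 and all(b <= SECOND_TRIGGER for _, b in last3)
        # (3) 3 consecutive inputs with the first parameter <= -4
        cond3 = len(last3) == 3 and all(a <= SECOND_TRIGGER for a, _ in last3)
        return cond1 or cond2 or cond3
```

When `update` returns True, the switching system would proceed to the weight adjustment of step 9.3.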
Step 9.3: based on the Q-learning algorithm, adjust the learning state in the Q-learning algorithm using the input parameters, and assign the driver's driving weight according to the action that achieves the maximum value of the next state in the Q-learning algorithm; the driving system's driving weight is the difference between 1 and the driver's driving weight.
In this embodiment, the algorithm for the driving weights required by the driver and the driving system in the control right switching system is the Q-learning algorithm, and the training process is as follows:
(1) The transition rule of Q-learning is:
Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
Gamma is the discount factor: the larger the discount factor, the greater the role MaxQ plays. R can be understood as the immediate value before one's eyes, and MaxQ as the value in memory: the maximum value over the actions of the next state stored in memory.
(2) A "matrix Q" is added as the reinforcement-learning agent, i.e. the brain of the control right switching system: what has been learned from experience. The rows of the "matrix Q" represent the current state of the control right switching system and the columns represent the possible actions of the next state (the links between nodes). The "matrix Q" is initialized to zero, and it may start with a single element; whenever a new state is found, the "matrix Q" is updated. This is referred to as unsupervised learning.
(3) The driver's driving weight is power(driver) and the driving system's driving weight is power(system) = 1 − power(driver). The control right switching system adjusts the driver's driving weight with the reinforcement-learning Q-learning algorithm: the Q-learning action is assigned directly to the driver's driving weight, whose value range is [0,1] with a step length of 0.05.
(4) The Q-learning states are set to 0, 1, 2, 3, 4 and 5.
(5) After the control right switching system is triggered to start working, one of the initial states 0, 1 or 2 is obtained. When the action of (3) (adjusting the weight) leaves the state in 0, 1 or 2, the reward is −1; the matrix Q is updated and the element corresponding to that state and action is assigned −1.
When the action of (3) brings the state to 3 or 4, the reward is 1; the matrix Q is updated and the corresponding element is assigned 1.
When the action of (3) brings the state to 5, the reward is 100; the matrix Q is updated and the corresponding element is assigned 100. State 5 is the target state, and the resulting Q-table is shown in fig. 5 (elements not yet assigned).
(6) A road environment is selected and (1) to (5) are applied; the Q-table of the initial "matrix Q" is obtained and used as the source of MaxQ(next state, all actions) in Q(state, action) = R(state, action) + Gamma * MaxQ(next state, all actions), where Gamma is selected in [0,1] according to the degree of road similarity and R(state, action) is the value of the state reached in the current road environment: the reward is −1 for states 0, 1 and 2, 1 for states 3 and 4, and 100 for state 5.
When the control right switching system calculates the driving weight, the adjustable weights and the rewards of the reachable states are calculated in advance from the Q-table of the similar road section; this reward is the MaxQ(next state, all actions) term in the formula. Q(state, action) is therefore the sum of the R(state, action) value in the current road environment and MaxQ(next state, all actions).
When Q(state, action) is maximal, the action of the next state in MaxQ(next state, all actions) is the weight that needs to be adjusted, recorded as the driver driving weight power(driver).
(7) The Q-table is updated according to the calculated Q(state, action) value.
(8) The switching of weights stops when state 5 is reached.
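Steps (1) to (8) can be sketched as a small Q-learning loop. The states (0 to 5, target 5), actions (driver weights in [0,1] with step 0.05), rewards (−1, 1, 100) and the update rule follow the text above; the state transition function is a hypothetical stand-in, since in the real system the next state comes from the vehicle and driver response:

```python
import random

N_STATES = 6                                       # states 0..5; 5 is the target
ACTIONS = [round(0.05 * i, 2) for i in range(21)]  # driver weights 0.00..1.00
GAMMA = 0.8                                        # discount factor, chosen here

def reward(state):
    # -1 in states 0-2, 1 in states 3-4, 100 in the target state 5
    return -1.0 if state <= 2 else (1.0 if state <= 4 else 100.0)

def toy_transition(state, weight):
    # Stand-in for the real vehicle/driver response (hypothetical dynamics):
    # a larger driver weight tends to move the system toward safer states.
    return min(state + (1 if random.random() < weight else 0), N_STATES - 1)

def train(episodes=200, seed=0):
    random.seed(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]  # "matrix Q" = 0
    for _ in range(episodes):
        s = random.randint(0, 2)                   # initial state is 0, 1 or 2
        while s != N_STATES - 1:                   # stop when state 5 is reached
            a = random.randrange(len(ACTIONS))
            s_next = toy_transition(s, ACTIONS[a])
            # Q(state, action) = R(state, action) + Gamma * MaxQ(next state)
            q[s][a] = reward(s_next) + GAMMA * max(q[s_next])
            s = s_next
    return q

def driver_weight(q, state):
    # power(driver): the action maximizing Q in the current state
    best = max(range(len(ACTIONS)), key=lambda a: q[state][a])
    return ACTIONS[best]
```

`driver_weight` then returns power(driver), and power(system) = 1 − power(driver), as in step (3).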
The technical scheme of the application has been explained in detail with reference to the accompanying drawings. The application provides a reinforcement-learning-based human-machine co-driving control right switching method, applicable to a reinforcement-learning-based control right switching system that distributes driving weights between a driver and a driving system, and comprising: calculating a driving operation action prediction index from the driver information and the vehicle road prediction information; and inputting the driving operation action prediction index and the comprehensive driving operation action index into the control right switching system to calculate the driving weights of the driver and the driving system. This scheme effectively addresses the combined longitudinal and lateral risk of the vehicle, weakens the influence of driver-induced uncertainty, and considers the driver comprehensively from different angles, thereby reducing driver judgment error.
The steps in the present application may be reordered, combined, or removed according to actual requirements.
The units in the device may be merged, divided, or deleted according to actual requirements.
Although the present application has been disclosed in detail with reference to the accompanying drawings, such description is merely illustrative and is not intended to limit its application. The scope of the present application is defined by the appended claims and may include various modifications, adaptations, and equivalents of the invention without departing from its scope and spirit.
Claims (8)
1. A reinforcement learning-based man-machine driving control right switching method is applicable to distribution of driving weights between a driver and a driving system by a reinforcement learning-based man-machine driving control right switching system, and comprises the following steps:
calculating a driving operation action prediction index according to the driver information and the vehicle road prediction information;
and inputting the driving operation action prediction index and the comprehensive driving operation action index into the control weight switching system, and calculating the driving weight between the driver and the driving system.
2. The reinforcement learning-based human-machine co-driving control right switching method according to claim 1, wherein the driver information at least includes a driver state, a driver intention, a driver style, and a driver subconscious driving influence deviation, the vehicle path prediction information at least includes a predicted vehicle path risk and a predicted vehicle path risk threshold,
the calculation formula of the driving operation action prediction index is as follows:
where the left-hand side is the driving operation action prediction index, Z_r is the driver state operation reaction delay, σ is the driver subconscious driving influence deviation, δ is the driver intention, S is the driver style, v_risk is the predicted vehicle road danger degree, and A_arisk is the predicted vehicle road danger threshold.
3. The reinforcement learning-based man-machine driving sharing control right switching method according to claim 2, wherein the calculation formula of the driver subconscious driving influence deviation σ is as follows:
R d =|d-q ki |
wherein σ is the driver subconscious driving influence deviation, sum is the number of collected traffic scenes, D_i is the series of subconscious driving strengths within a traffic-scene time period, ρ′, τ and ω are undetermined parameters, α is the subconscious side weight, β is the driver's personal safety tendency weight, d is the current lateral position of the vehicle, q_ki is the fitted lateral position of the vehicle in this scene (label), a is the vehicle acceleration, and R_d is the position parameter.
4. The reinforcement learning-based human-computer co-driving control right switching method as claimed in claim 1, wherein the driver information at least includes a driver state, a driver intention, and a driver style, and the calculation process of the comprehensive driving operation action index specifically includes:
determining current vehicle path information according to the position of a current vehicle in a road, wherein the current vehicle path information at least comprises a current vehicle path danger degree and a current vehicle path danger threshold;
determining the comprehensive driving operation action index by combining an environmental response factor and adopting a piecewise function mode according to the driver information and the current vehicle path information, wherein the calculation formula of the comprehensive driving operation action index is as follows:
where the left-hand side is the comprehensive driving operation action index, z1 is the driver state, γ is the environmental response factor, H_{x,y} is the current vehicle road danger degree, σ is the road correction parameter, a_pre is the real-time operation quantization parameter, and risk is the current vehicle road danger threshold.
5. The reinforcement learning-based man-machine co-driving control right switching method as claimed in claim 4, wherein the determining current vehicle path information according to the current vehicle position in the road specifically comprises:
determining a position of a current vehicle in a road, including at least a distance to a preceding vehicle of the current vehicle and a lateral position of the current vehicle;
determining a longitudinal vehicle road danger value according to the distance between the current vehicle and the front vehicle;
determining a transverse vehicle road danger value according to the transverse position of the current vehicle;
calculating the current vehicle road risk degree according to the longitudinal vehicle road risk value and the transverse vehicle road risk value, wherein the corresponding calculation formula is as follows:
wherein H_{x,y} is the current vehicle road danger degree, the danger-distance influence factor of the road section has value range [1,10], y1 is the longitudinal vehicle road danger value, and y2 is the lateral vehicle road danger value;
and calculating current vehicle road danger thresholds of different scenes according to the current vehicle road danger degrees, and recording the current vehicle road danger thresholds and the current vehicle road danger degrees as the current vehicle road information.
6. The reinforcement learning-based man-machine co-driving control right switching method as claimed in claim 4, wherein the environmental response factor γ is calculated by the formula:
wherein M is the vehicle mass, m is the vehicle-type and target correction parameter, k1 is the dynamics correction parameter for the term representing the desired speed and speed direction of the vehicle, v_limleast(t) is the minimum speed value, k2 is the traffic-scene correction parameter for the vehicle interaction force parameter, k3 is the correction parameter, applied to the pedestrian interaction force parameter, for the degree to which pedestrians comply with the traffic regulations, k4 is the correction parameter, applied to the environmental interaction force parameter, for the complexity of the surrounding physical environment, and k5 is the correction parameter, applied to the rule parameter, for the degree of influence of the traffic regulations.
7. The reinforcement learning-based human-computer co-driving control right switching method according to any one of claims 1 to 6, wherein the calculating the driving weight between the driver and the driving system specifically includes:
step 9.1: using the Z-score standardization formula, standardize the current driving operation action prediction index and comprehensive driving operation action index, and calculate the mean and standard deviation of the two indexes over all inputs from the start of the current drive to the current operation;
step 9.2: input the Z-score-standardized driving operation action prediction index and comprehensive driving operation action index, together with their current corresponding means and standard deviations, as input parameters into the reinforcement-learning-based human-machine co-driving control right switching system to judge whether a weight distribution condition is met; if so, execute step 9.3; if not, re-acquire the driver information and vehicle road prediction information;
and 9.3, based on the Q learning algorithm, adjusting the learning state in the Q learning algorithm by using the input parameters, and assigning a driving weight of the driver according to the action in the value maximum value of the next state in the Q learning algorithm, wherein the driving weight of the driving system is the difference between 1 and the driving weight of the driver.
8. The reinforcement learning-based human-computer co-driving control weight switching method as claimed in claim 7, wherein the weight distribution condition specifically includes:
for 5 consecutive inputs, the first parameter and the second parameter are both less than or equal to a first trigger threshold; or,
for 3 consecutive inputs, the second parameter is less than or equal to a second trigger threshold; or,
for 3 consecutive inputs, the first parameter is less than or equal to the second trigger threshold, wherein
the first parameter is the number of standard deviations by which the currently input driving operation action prediction index differs from the mean of all driving operation action prediction indexes input from the start of the driving behavior to the current time,
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210758672.3A CN115071758B (en) | 2022-06-29 | 2022-06-29 | Man-machine common driving control right switching method based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115071758A true CN115071758A (en) | 2022-09-20 |
CN115071758B CN115071758B (en) | 2023-03-21 |
Family
ID=83254772
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549367A (en) * | 2018-04-09 | 2018-09-18 | 吉林大学 | A kind of man-machine method for handover control based on prediction safety |
US20190118832A1 (en) * | 2016-04-18 | 2019-04-25 | Honda Motor Co., Ltd. | Vehicle control system, vehicle control method, and vehicle control program |
US20200192359A1 (en) * | 2018-12-12 | 2020-06-18 | Allstate Insurance Company | Safe Hand-Off Between Human Driver and Autonomous Driving System |
US20210039638A1 (en) * | 2019-08-08 | 2021-02-11 | Honda Motor Co., Ltd. | Driving support apparatus, control method of vehicle, and non-transitory computer-readable storage medium |
CN113335291A (en) * | 2021-07-27 | 2021-09-03 | 燕山大学 | Man-machine driving sharing control right decision method based on man-vehicle risk state |
CN113341730A (en) * | 2021-06-28 | 2021-09-03 | 上海交通大学 | Vehicle steering control method under remote man-machine cooperation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||