CN114385113A - Test scene generation method based on self-adaptive driving style dynamic switching model - Google Patents

Info

Publication number
CN114385113A
Authority
CN
China
Prior art keywords
mood
driver
longitudinal
visibility
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111558858.6A
Other languages
Chinese (zh)
Inventor
马依宁
陈君毅
吴建峰
吴靖宇
熊璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202111558858.6A
Publication of CN114385113A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/10: Requirements analysis; Specification techniques
    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01M: TESTING STATIC OR DYNAMIC BALANCE OF MACHINES OR STRUCTURES; TESTING OF STRUCTURES OR APPARATUS, NOT OTHERWISE PROVIDED FOR
    • G01M17/00: Testing of vehicles
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/20: Software design

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a test scenario generation method based on an adaptive driving style dynamic switching model, comprising the following steps: defining a mood value and a mental state value of the driver, where the mood value quantifies the influence of the current traffic conditions and environmental state on the driver's mood, and the mental state value represents the driver's prediction of future traffic conditions; constructing a driving style conversion system based on the mood value and the mental state value, the system comprising a mood value calculation model and a mental state change model; training the mental state change model by reinforcement learning; combining the driving style conversion system with two single-style driver models to obtain the adaptive driving style dynamic switching model; and generating automated driving test scenarios with that model. Compared with the prior art, the method switches the driving style automatically as the environment changes, which effectively improves the realism and complexity of automated driving test scenarios.

Description

Test scene generation method based on self-adaptive driving style dynamic switching model
Technical Field
The invention relates to the technical field of automatic driving tests, in particular to a test scene generation method based on a self-adaptive driving style dynamic switching model.
Background
Automated vehicle testing currently consists mainly of real-vehicle tests and virtual simulation tests. Real-vehicle testing requires a great deal of manpower and time, and its limitations become more evident as the level of driving automation rises. Virtual simulation testing offers flexible scenario configuration, high efficiency, strong repeatability, a safe test process, and low cost; it supports automated and accelerated testing and saves considerable manpower and material resources. However, existing virtual simulation technology still has some shortcomings.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a test scenario generation method based on an adaptive driving style dynamic switching model. By constructing this model, the method generates dynamic automated driving test scenarios that are consistent with vehicle behavior in real traffic, thereby improving the reliability of virtual automated driving test results.
This object is achieved by the following technical solution: a test scenario generation method based on an adaptive driving style dynamic switching model, comprising the following steps:
S1, defining a mood value and a mental state value of the driver, where the mood value quantifies the influence of the current traffic conditions and environmental state on the driver's mood, and the mental state value represents the driver's prediction of future traffic conditions;
S2, constructing a driving style conversion system based on the mood value and the mental state value, the system comprising a mood value calculation model and a mental state change model;
S3, training the mental state change model by reinforcement learning;
S4, combining the driving style conversion system with two single-style driver models to construct the adaptive driving style dynamic switching model;
S5, generating automated driving test scenarios using the adaptive driving style dynamic switching model.
Further, the calculation formula of the mood value calculation model in step S2 is specifically:
$m = \omega_{traffic}\,t + \omega_{speed}\,v + \omega_{ego\_behavior}\,e + \omega_{environment}\,w$
Figure BDA0003419950170000021
Figure BDA0003419950170000022
$w = \omega_{visibility}\,w_{visibility} + \omega_{road\_conditions}\,w_{road\_conditions}$
where $m$ is the mood value obtained by the driver at the current time step; $\omega_{traffic}$ is the weight of the traffic congestion parameter; $t$ is the traffic congestion parameter, determined by the total number $n$ of vehicles in the ego lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range; $\omega_{speed}$ is the weight of the speed parameter; $v$ is the speed parameter, measured by the speed of the ego vehicle; $\omega_{ego\_behavior}$ is the weight of the ego-vehicle behavior parameter; and $e$ is the ego-vehicle behavior parameter. The behavior of the ego vehicle is discretized into combinations of lateral and longitudinal actions, giving ten actions: longitudinal uniform speed, longitudinal deceleration, longitudinal acceleration, longitudinal rapid deceleration, longitudinal rapid acceleration, lateral uniform speed with longitudinal uniform speed, lateral uniform speed with longitudinal acceleration, lateral uniform speed with longitudinal rapid acceleration, lateral uniform speed with longitudinal deceleration, and lateral uniform speed with longitudinal rapid deceleration. The value of $e$ depends on the action taken:
when the action is "longitudinal uniform speed", $e = 0$;
when the action is "longitudinal acceleration", "longitudinal rapid acceleration", "lateral uniform speed with longitudinal acceleration", or "lateral uniform speed with longitudinal rapid acceleration", $e = 1$;
when the action is "longitudinal deceleration", "lateral uniform speed with longitudinal deceleration", or "lateral uniform speed with longitudinal uniform speed", $e = -1$;
when the action is "longitudinal rapid deceleration" or "lateral uniform speed with longitudinal rapid deceleration", $e = -5$.
Two behaviors of other vehicles mainly affect the mood of the ego-vehicle driver: a vehicle ahead in the same lane decelerating sharply, and a vehicle ahead in an adjacent lane suddenly cutting into the ego lane. Both force the ego vehicle to decelerate or change lanes, so their effect is already included in the ego-vehicle behavior parameter $e$.
$\omega_{environment}$ is the weight of the natural environment factor, and $w$ is the natural environment factor, which covers visibility and road conditions. The driver's mood differs under different natural environments: when visibility and road conditions are good, the driver attaches more importance to the driving experience and mood; when visibility and road conditions are poor, driving becomes more difficult and the driver pays more attention to driving safety than to the driving experience.
$\omega_{visibility}$ is the weight of the visibility factor and $w_{visibility}$ is the visibility factor. Visibility is graded semi-quantitatively as "good", "normal", or "poor":
when visibility is "good", $w_{visibility} = 0$;
when visibility is "normal", $w_{visibility} = -0.5$;
when visibility is "poor", $w_{visibility} = -1$.
$\omega_{road\_conditions}$ is the weight of the road condition factor and $w_{road\_conditions}$ is the road condition factor. The road condition is graded semi-quantitatively as "good", "normal", or "poor":
when the road condition is "good", $w_{road\_conditions} = 0$;
when the road condition is "normal", $w_{road\_conditions} = -0.5$;
when the road condition is "poor", $w_{road\_conditions} = -1$.
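The two lookup rules above (the ego-vehicle behavior parameter $e$ and the natural environment factor $w$) can be sketched in Python as follows. This is only an illustrative sketch: the action labels, function names, and the example weights of 1.0 are assumptions introduced here and do not come from the patent.

```python
# Hypothetical labels for the ten discretized ego-vehicle actions.
# "lon" = longitudinal, "lat_uniform" = lateral uniform speed.
EGO_BEHAVIOR_PARAM = {
    "lon_uniform": 0,
    "lon_accel": 1,
    "lon_rapid_accel": 1,
    "lat_uniform+lon_accel": 1,
    "lat_uniform+lon_rapid_accel": 1,
    "lon_decel": -1,
    "lat_uniform+lon_decel": -1,
    "lat_uniform+lon_uniform": -1,
    "lon_rapid_decel": -5,
    "lat_uniform+lon_rapid_decel": -5,
}

# Semi-quantitative grades shared by visibility and road conditions.
LEVEL_SCORE = {"good": 0.0, "normal": -0.5, "poor": -1.0}


def ego_behavior_param(action: str) -> int:
    """Return e for one of the ten discretized actions."""
    return EGO_BEHAVIOR_PARAM[action]


def environment_factor(visibility: str, road_conditions: str,
                       omega_visibility: float,
                       omega_road_conditions: float) -> float:
    """w = omega_visibility * w_visibility + omega_road_conditions * w_road_conditions."""
    return (omega_visibility * LEVEL_SCORE[visibility]
            + omega_road_conditions * LEVEL_SCORE[road_conditions])


# Example with placeholder weights of 1.0 (assumed, not from the patent).
e = ego_behavior_param("lon_rapid_decel")
w = environment_factor("poor", "normal", 1.0, 1.0)
```

With these placeholder weights, a rapid deceleration in poor visibility on a "normal" road gives $e = -5$ and $w = -1.5$.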
Further, in step S2 the driving style conversion system is constructed based on either a threshold or a proportion.
Further, the threshold-based driving style conversion system works as follows: the driving style is switched when the driver's cumulative mood value exceeds the mental state value, where the cumulative mood value is calculated as:
Figure BDA0003419950170000031
where $m_v$ is the driver's cumulative mood value, $m_t$ is the mood value obtained by the driver at time step $t$, and $h$ is the current time step.
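The threshold rule can be illustrated with a short Python sketch. The patent's cumulative formula appears only as a figure, so the sketch assumes a plain running sum of per-step mood values; the style labels, the reset of the accumulator after a switch, and the literal "greater than" comparison are likewise assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class ThresholdStyleSwitcher:
    """Toggle between two single-style driver models on a mood threshold."""
    mental_state_value: float      # threshold compared against the cumulative mood value
    style: str = "style_1"         # hypothetical label of the active driver model
    cumulative_mood: float = 0.0   # m_v, assumed to be a running sum of m_t

    def step(self, mood: float) -> str:
        """Add this time step's mood value and switch style if the threshold is exceeded."""
        self.cumulative_mood += mood
        if self.cumulative_mood > self.mental_state_value:
            self.style = "style_2" if self.style == "style_1" else "style_1"
            self.cumulative_mood = 0.0  # assumption: accumulation restarts after a switch
        return self.style


switcher = ThresholdStyleSwitcher(mental_state_value=2.0)
history = [switcher.step(m) for m in [1.0, 0.5, 1.0, 0.2]]
# history == ["style_1", "style_1", "style_2", "style_2"]
```

In this run the running sum reaches 2.5 at the third step, which exceeds the threshold of 2.0 and triggers the switch.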
Further, the proportion-based driving style conversion system works as follows: at a given time step, the driver executes single-style driver model $i$ with probability $P_i$ ($0 < i \le 2$, $P_1 + P_2 = 1$), where $P_i$ is calculated as:
Figure BDA0003419950170000032
where $m_{max}$ is the maximum of the cumulative mood value, $m_{min}$ is the minimum of the cumulative mood value, and $m_{now}$ is the current mental state value.
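A proportion-based selection could look like the Python sketch below. The patent's formula for $P_i$ is available only as a figure, so the min-max normalization used here is an assumption chosen because it satisfies $P_1 + P_2 = 1$; the function names are hypothetical.

```python
import random


def style_probabilities(m_now: float, m_min: float, m_max: float) -> tuple[float, float]:
    """Return (P_1, P_2) using an ASSUMED min-max normalization of m_now."""
    if m_max <= m_min:
        raise ValueError("m_max must be greater than m_min")
    p1 = (m_now - m_min) / (m_max - m_min)
    p1 = min(1.0, max(0.0, p1))  # clamp so both values remain valid probabilities
    return p1, 1.0 - p1


def sample_style(m_now: float, m_min: float, m_max: float, rng: random.Random) -> int:
    """Draw the index (1 or 2) of the single-style driver model to execute."""
    p1, _ = style_probabilities(m_now, m_min, m_max)
    return 1 if rng.random() < p1 else 2


p1, p2 = style_probabilities(0.0, -10.0, 10.0)   # (0.5, 0.5)
chosen = sample_style(10.0, -10.0, 10.0, random.Random(0))  # P_1 = 1, so always model 1
```

Unlike the threshold rule, this variant never commits to one style: the mental state value only shifts how likely each single-style driver model is at each time step.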
Further, the mental state change model is trained in step S3 as follows. First, the state set used for training is defined:
traffic congestion is graded semi-quantitatively as "crowded", "normal", or "open" according to the total number $n$ of vehicles in the ego lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range;
the ego-vehicle speed is graded semi-quantitatively as "fast", "normal", or "slow";
the relative distance between the ego vehicle and the vehicle ahead is graded semi-quantitatively as "near", "proper", or "far";
the natural environment covers visibility and road conditions, each graded semi-quantitatively as "good", "normal", or "poor".
Next, the action set for mental state fluctuation is defined, comprising "unchanged", "tend negative", "tend positive", "tend negative quickly", and "tend positive quickly"; "tend negative" and "tend negative quickly" both mean that the mental state value decreases, but it decreases faster under "tend negative quickly".
Finally, a reward function is designed, and the mental state change model is trained using the two single-style driver models.
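The state and action sets described above are small enough to enumerate directly. The following Python sketch does so with `Enum` classes; the class and member names are hypothetical, and only the grades themselves come from the text.

```python
from enum import Enum
from itertools import product


class Congestion(Enum):
    CROWDED = "crowded"
    NORMAL = "normal"
    OPEN = "open"


class Speed(Enum):
    FAST = "fast"
    NORMAL = "normal"
    SLOW = "slow"


class Gap(Enum):
    NEAR = "near"
    PROPER = "proper"
    FAR = "far"


class Level(Enum):  # used for both visibility and road conditions
    GOOD = "good"
    NORMAL = "normal"
    POOR = "poor"


class MentalStateAction(Enum):
    UNCHANGED = "unchanged"
    TEND_NEGATIVE = "tend negative"
    TEND_POSITIVE = "tend positive"
    TEND_NEGATIVE_QUICKLY = "tend negative quickly"
    TEND_POSITIVE_QUICKLY = "tend positive quickly"


# Each state is (congestion, speed, gap, visibility, road_conditions).
STATES = list(product(Congestion, Speed, Gap, Level, Level))
ACTIONS = list(MentalStateAction)

num_states = len(STATES)                    # 3**5 = 243
num_state_action_pairs = num_states * len(ACTIONS)  # 243 * 5 = 1215
```

Five three-level features give 243 discrete states and, with five actions, 1215 state-action pairs, which is small enough for a tabular reinforcement learning method.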
Further, the reward function is designed based on the driver's mental state fluctuation, the degree of driving risk, and the influence of the mood value.
Further, the reward function is specifically:
$R = \omega_f R_{features} + \omega_c R_{crash} + \omega_m R_m$
where $\omega_f$ is the weight of the driver's mental state fluctuation, $\omega_c$ is the weight of the driving risk, $\omega_m$ is the weight of the mood value, $R_{features}$ is the mental state fluctuation reward function, $R_{crash}$ is the driving risk reward function, and $R_m$ is the reward function for the influence of the mood value on the mental state.
Further, the two single-style driver models include a first-style driver model and a second-style driver model.
Further, the mental state change model is trained with the two single-style driver models as follows: first, a safe traffic flow environment is built with the first-style driver model and the mental state change model is trained preliminarily; then a challenging traffic flow environment is built with the second-style driver model and the mental state change model is trained further.
Compared with the prior art, the invention defines a driver mood value and provides a mood value calculation method that accounts for traffic congestion, ego-vehicle behavior, the natural environment, and the behavior of other vehicles. It also defines a driver mental state value and combines the mood value and the mental state value to construct a driving style conversion system. A mental state change model is then trained by reinforcement learning, and the driving style conversion system is combined with two existing single-style driver models to obtain the adaptive driving style dynamic switching model. Test scenarios generated with this model switch the driving style automatically as the environment changes, which effectively improves the realism and complexity of the test scenarios and ensures the reliability of virtual automated driving test results.
When the mental state change model is trained, the reward function is designed from three basic facts: the driver's mental state fluctuation, the degree of driving risk, and the influence of the mood value. The model is trained in stages with the two single-style driver models. This ensures the accuracy of the mental state change model, so that the constructed driving style dynamic switching model adapts its driving style to changes in the driving environment and generates the corresponding test scenarios.
Drawings
FIG. 1 is a schematic flow diagram of the method of the invention;
FIG. 2 is a schematic diagram of the application process in the embodiment;
FIG. 3 is a training flowchart of the threshold-based adaptive driving style dynamic switching model in the embodiment;
FIG. 4 is a method flowchart of the threshold-based driving style conversion system in the embodiment;
FIG. 5 is a training flowchart of the proportion-based adaptive driving style dynamic switching model in the embodiment;
FIG. 6 is a method flowchart of the proportion-based driving style conversion system in the embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments.
Examples
As shown in fig. 1, a test scenario generation method based on an adaptive driving style dynamic switching model includes the following steps:
S1, defining a mood value and a mental state value of the driver, where the mood value quantifies the influence of the current traffic conditions and environmental state on the driver's mood, and the mental state value represents the driver's prediction of future traffic conditions;
S2, constructing a driving style conversion system based on the mood value and the mental state value, the system comprising a mood value calculation model and a mental state change model, where the mood value calculation model is:
$m = \omega_{traffic}\,t + \omega_{speed}\,v + \omega_{ego\_behavior}\,e + \omega_{environment}\,w$
Figure BDA0003419950170000051
Figure BDA0003419950170000052
$w = \omega_{visibility}\,w_{visibility} + \omega_{road\_conditions}\,w_{road\_conditions}$
where $m$ is the mood value obtained by the driver at the current time step; $\omega_{traffic}$ is the weight of the traffic congestion parameter; $t$ is the traffic congestion parameter, determined by the total number $n$ of vehicles in the ego lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range; $\omega_{speed}$ is the weight of the speed parameter; $v$ is the speed parameter, measured by the speed of the ego vehicle; $\omega_{ego\_behavior}$ is the weight of the ego-vehicle behavior parameter; and $e$ is the ego-vehicle behavior parameter. The behavior of the ego vehicle is discretized into combinations of lateral and longitudinal actions, giving ten actions: longitudinal uniform speed, longitudinal deceleration, longitudinal acceleration, longitudinal rapid deceleration, longitudinal rapid acceleration, lateral uniform speed with longitudinal uniform speed, lateral uniform speed with longitudinal acceleration, lateral uniform speed with longitudinal rapid acceleration, lateral uniform speed with longitudinal deceleration, and lateral uniform speed with longitudinal rapid deceleration. The value of $e$ depends on the action taken:
when the action is "longitudinal uniform speed", $e = 0$;
when the action is "longitudinal acceleration", "longitudinal rapid acceleration", "lateral uniform speed with longitudinal acceleration", or "lateral uniform speed with longitudinal rapid acceleration", $e = 1$;
when the action is "longitudinal deceleration", "lateral uniform speed with longitudinal deceleration", or "lateral uniform speed with longitudinal uniform speed", $e = -1$;
when the action is "longitudinal rapid deceleration" or "lateral uniform speed with longitudinal rapid deceleration", $e = -5$.
Two behaviors of other vehicles mainly affect the mood of the ego-vehicle driver: a vehicle ahead in the same lane decelerating sharply, and a vehicle ahead in an adjacent lane suddenly cutting into the ego lane. Both force the ego vehicle to decelerate or change lanes, so their effect is already included in the ego-vehicle behavior parameter $e$.
$\omega_{environment}$ is the weight of the natural environment factor, and $w$ is the natural environment factor, which covers visibility and road conditions. The driver's mood differs under different natural environments: when visibility and road conditions are good, the driver attaches more importance to the driving experience and mood; when visibility and road conditions are poor, driving becomes more difficult and the driver pays more attention to driving safety than to the driving experience.
$\omega_{visibility}$ is the weight of the visibility factor and $w_{visibility}$ is the visibility factor. Visibility is graded semi-quantitatively as "good", "normal", or "poor":
when visibility is "good", $w_{visibility} = 0$;
when visibility is "normal", $w_{visibility} = -0.5$;
when visibility is "poor", $w_{visibility} = -1$.
$\omega_{road\_conditions}$ is the weight of the road condition factor and $w_{road\_conditions}$ is the road condition factor. The road condition is graded semi-quantitatively as "good", "normal", or "poor":
when the road condition is "good", $w_{road\_conditions} = 0$;
when the road condition is "normal", $w_{road\_conditions} = -0.5$;
when the road condition is "poor", $w_{road\_conditions} = -1$.
In a specific application, the driving style conversion system can be constructed based on either a threshold or a proportion.
In the threshold-based driving style conversion system, the driving style is switched when the driver's cumulative mood value exceeds the mental state value, where the cumulative mood value is calculated as:
Figure BDA0003419950170000071
where $m_v$ is the driver's cumulative mood value, $m_t$ is the mood value obtained by the driver at time step $t$, and $h$ is the current time step;
in the proportion-based driving style conversion system, at a given time step the driver executes single-style driver model $i$ with probability $P_i$ ($0 < i \le 2$, $P_1 + P_2 = 1$), where $P_i$ is calculated as:
Figure BDA0003419950170000072
where $m_{max}$ is the maximum of the cumulative mood value, $m_{min}$ is the minimum of the cumulative mood value, and $m_{now}$ is the current mental state value;
S3, training the mental state change model by reinforcement learning. During training, the state set is defined first:
traffic congestion is graded semi-quantitatively as "crowded", "normal", or "open" according to the total number $n$ of vehicles in the ego lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range;
the ego-vehicle speed is graded semi-quantitatively as "fast", "normal", or "slow";
the relative distance between the ego vehicle and the vehicle ahead is graded semi-quantitatively as "near", "proper", or "far";
the natural environment covers visibility and road conditions, each graded semi-quantitatively as "good", "normal", or "poor".
Next, the action set for mental state fluctuation is defined, comprising "unchanged", "tend negative", "tend positive", "tend negative quickly", and "tend positive quickly"; "tend negative" and "tend negative quickly" both mean that the mental state value decreases, but it decreases faster under "tend negative quickly".
Finally, a reward function is designed, and the mental state change model is trained using the two single-style driver models.
In this technical solution, the reward function is designed based on the driver's mental state fluctuation, the degree of driving risk, and the influence of the mood value:
$R = \omega_f R_{features} + \omega_c R_{crash} + \omega_m R_m$
where $\omega_f$ is the weight of the driver's mental state fluctuation, $\omega_c$ is the weight of the driving risk, $\omega_m$ is the weight of the mood value, $R_{features}$ is the mental state fluctuation reward function, $R_{crash}$ is the driving risk reward function, and $R_m$ is the reward function for the influence of the mood value on the mental state.
In addition, the two single-style driver models may be an existing first-style driver model and an existing second-style driver model: first, a safe traffic flow environment is built with the first-style driver model and the mental state change model is trained preliminarily; then a challenging traffic flow environment is built with the second-style driver model and the mental state change model is trained further;
S4, combining the driving style conversion system with the two single-style driver models to construct the adaptive driving style dynamic switching model;
S5, generating automated driving test scenarios using the adaptive driving style dynamic switching model.
This embodiment applies the above technical solution, as shown in FIG. 2:
First, construct the driving style conversion system.
The mood value is calculated first, taking into account traffic flow, ego-vehicle behavior, the natural environment, and the behavior of other vehicles:
$m = \omega_{traffic}\,t + \omega_{speed}\,v + \omega_{ego\_behavior}\,e + \omega_{environment}\,w$
where $m$ represents the mood value obtained by the driver at the current time step; $\omega_{traffic}$ represents the weight of the traffic congestion parameter; and $t$ represents the traffic congestion parameter, determined by the total number $n$ of vehicles in the ego lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within 42 m. The value of $t$ is calculated as:
Figure BDA0003419950170000081
Figure BDA0003419950170000082
$\omega_{speed}$ represents the weight of the speed parameter; in this embodiment $\omega_{speed} = 0.8$. $v$ represents the speed parameter, measured by the speed of the ego vehicle. $\omega_{ego\_behavior}$ represents the weight of the ego-vehicle behavior parameter; in this embodiment $\omega_{ego\_behavior} = 0.5$. $e$ represents the ego-vehicle behavior parameter. The behavior of the ego vehicle is discretized into combinations of lateral and longitudinal actions, giving ten actions: longitudinal uniform speed, longitudinal deceleration, longitudinal acceleration, longitudinal rapid deceleration, longitudinal rapid acceleration, lateral uniform speed with longitudinal uniform speed, lateral uniform speed with longitudinal acceleration, lateral uniform speed with longitudinal rapid acceleration, lateral uniform speed with longitudinal deceleration, and lateral uniform speed with longitudinal rapid deceleration. When the action is "longitudinal uniform speed", $e = 0$; when the action is "longitudinal acceleration", "longitudinal rapid acceleration", "lateral uniform speed with longitudinal acceleration", or "lateral uniform speed with longitudinal rapid acceleration", $e = 1$; when the action is "longitudinal deceleration", "lateral uniform speed with longitudinal deceleration", or "lateral uniform speed with longitudinal uniform speed", $e = -1$; when the action is "longitudinal rapid deceleration" or "lateral uniform speed with longitudinal rapid deceleration", $e = -5$. Two behaviors of other vehicles mainly affect the mood of the ego-vehicle driver: a vehicle ahead in the same lane decelerating sharply, and a vehicle ahead in an adjacent lane suddenly cutting into the ego lane. Both force the ego vehicle to decelerate or change lanes, so their effect is already included in the ego-vehicle behavior parameter $e$.
$\omega_{environment}$ represents the weight of the natural environment factor; in this embodiment $\omega_{environment} = 1$. $w$ represents the natural environment factor, which covers visibility and road conditions. The driver's mood differs under different natural environments: when visibility and road conditions are good, the driver attaches more importance to the driving experience and the mood is better; when visibility and road conditions are poor, driving becomes more difficult, the driver pays more attention to driving safety than to the driving experience, and the mood worsens. The calculation formula is:
$w = \omega_{visibility}\,w_{visibility} + \omega_{road\_conditions}\,w_{road\_conditions}$
where $\omega_{visibility}$ represents the weight of the visibility factor; in this embodiment $\omega_{visibility} = 0.9$. $w_{visibility}$ represents the visibility factor; visibility is graded semi-quantitatively as "good", "normal", or "poor", with $w_{visibility} = 0$ when visibility is "good", $w_{visibility} = -0.5$ when it is "normal", and $w_{visibility} = -1$ when it is "poor". $\omega_{road\_conditions}$ represents the weight of the road condition factor; in this embodiment $\omega_{road\_conditions} = 0.7$. $w_{road\_conditions}$ represents the road condition factor; the road condition is graded semi-quantitatively as "good", "normal", or "poor", with $w_{road\_conditions} = 0$ when the road condition is "good", $w_{road\_conditions} = -0.5$ when it is "normal", and $w_{road\_conditions} = -1$ when it is "poor".
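As a worked illustration of the embodiment's weights, the following Python sketch evaluates the mood value for one time step. The weights 0.8, 0.5, 1, 0.9, and 0.7 are those stated in the embodiment. The traffic weight `omega_traffic` and the input values of `t` and `v` are hypothetical placeholders, because the embodiment gives the formula for $t$ only as a figure and does not state $\omega_{traffic}$ in the text.

```python
# Weights stated in the embodiment.
OMEGA_SPEED = 0.8
OMEGA_EGO_BEHAVIOR = 0.5
OMEGA_ENVIRONMENT = 1.0
OMEGA_VISIBILITY = 0.9
OMEGA_ROAD_CONDITIONS = 0.7


def mood_value(t: float, v: float, e: float,
               w_visibility: float, w_road_conditions: float,
               omega_traffic: float) -> float:
    """m = omega_traffic*t + omega_speed*v + omega_ego_behavior*e + omega_environment*w."""
    w = OMEGA_VISIBILITY * w_visibility + OMEGA_ROAD_CONDITIONS * w_road_conditions
    return (omega_traffic * t
            + OMEGA_SPEED * v
            + OMEGA_EGO_BEHAVIOR * e
            + OMEGA_ENVIRONMENT * w)


# Hypothetical time step: t = -1.0 and v = 0.5 are placeholder inputs,
# omega_traffic = 1.0 is an assumed weight; the ego vehicle decelerates (e = -1)
# in poor visibility (-1) on a "normal" road (-0.5).
m = mood_value(t=-1.0, v=0.5, e=-1, w_visibility=-1.0,
               w_road_conditions=-0.5, omega_traffic=1.0)
# w = 0.9*(-1) + 0.7*(-0.5) = -1.25, so m = -1.0 + 0.4 - 0.5 - 1.25 = -2.35
```

The environment term contributes the largest share of the negative mood here, which matches the stated intent that poor visibility and road conditions worsen the driver's mood.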
Next, the driving style conversion system is constructed from the mood value and the mental state value. For a threshold-based driving style conversion system, as shown in FIG. 4, the cumulative mood value is calculated and the mental state value serves as its threshold: when the cumulative mood value exceeds the mental state value, the driver model is switched; otherwise it is not. The cumulative mood value is calculated as:
Figure BDA0003419950170000091
where $m_t$ represents the mood value obtained by the driver at time step $t$, and $h$ represents the current time step.
For a proportion-based driving style conversion system, as shown in FIG. 6, the cumulative mood value is calculated and combined with the mental state value to obtain the probability $P_i$ ($0 < i \le 2$, $P_1 + P_2 = 1$) that the driver selects single-style driver model $i$ at a given time step, where $P_i$ is calculated as:
Figure BDA0003419950170000092
where $m_{max}$ is the maximum of the cumulative mood value, $m_{min}$ is the minimum of the cumulative mood value, and $m_{now}$ is the current mental state value.
Second, train the mental state change model.
First, the state set used for training and the action set for mental state fluctuation are defined. Traffic congestion is graded semi-quantitatively as "crowded", "normal", or "open" according to the total number $n$ of vehicles in the ego lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within 42 m. The ego-vehicle speed is graded semi-quantitatively as "fast", "normal", or "slow". The relative distance between the ego vehicle and the vehicle ahead is graded semi-quantitatively as "near", "proper", or "far". The natural environment covers visibility and road conditions, each graded semi-quantitatively as "good", "normal", or "poor". For example, visibility is poor in snowy weather and road conditions are poor when the road is icy; both visibility and road conditions are poor when driving on a country road on a hazy day; and both are good when driving on a highway on a sunny day. The action set for mental state fluctuation comprises "unchanged", "tend negative", "tend positive", "tend negative quickly", and "tend positive quickly"; "tend negative" and "tend negative quickly" both mean that the mental state value decreases, but it decreases faster under "tend negative quickly".
Next, the reward function is designed from the following basic facts:
(1) Mental state fluctuation of the human driver: the change in a human driver's mental state is a cumulative process, i.e., a person's mental state does not usually change drastically within a short time. The reward function designed for mental state fluctuation is $R_{features}$.
(2) Driving risk: human drivers always tend to avoid traffic accidents. The corresponding reward function is $R_{crash}$.
(3) Influence of the mood value: the mental state of the human driver is influenced by the current mood. The corresponding reward function is $R_m$.
The resulting reward function is:
$R = \omega_f R_{features} + \omega_c R_{crash} + \omega_m R_m$
where $\omega_f$ represents the weight of the human driver's mental state fluctuation, $\omega_c$ represents the weight of the driving risk, and $\omega_m$ represents the weight of the mood value.
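The weighted reward can be sketched in Python as below. The patent names the three components but does not give their functional forms, so the component functions here are purely hypothetical shapes chosen to reflect the three stated facts: abrupt mental state changes are penalized, collisions are penalized, and changes that agree in sign with the current mood are rewarded. The weights in the example are placeholders as well.

```python
def r_features(delta_mental_state: float) -> float:
    """Hypothetical: penalize large one-step changes in the mental state value."""
    return -abs(delta_mental_state)


def r_crash(crashed: bool) -> float:
    """Hypothetical: penalize a collision, otherwise neutral."""
    return -1.0 if crashed else 0.0


def r_mood(delta_mental_state: float, mood: float) -> float:
    """Hypothetical: reward mental state changes that follow the sign of the mood value."""
    if delta_mental_state == 0 or mood == 0:
        return 0.0
    return 1.0 if (delta_mental_state > 0) == (mood > 0) else -1.0


def total_reward(delta_mental_state: float, mood: float, crashed: bool,
                 omega_f: float, omega_c: float, omega_m: float) -> float:
    """R = omega_f * R_features + omega_c * R_crash + omega_m * R_m."""
    return (omega_f * r_features(delta_mental_state)
            + omega_c * r_crash(crashed)
            + omega_m * r_mood(delta_mental_state, mood))


# Placeholder weights; a negative mood with a quickly falling mental state and no crash.
R = total_reward(delta_mental_state=-2.0, mood=-1.0, crashed=False,
                 omega_f=0.2, omega_c=1.0, omega_m=0.5)
# 0.2 * (-2.0) + 1.0 * 0.0 + 0.5 * 1.0 = 0.1
```

Keeping the three terms as separate functions makes it straightforward to change one component's shape or weight without touching the others.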
Finally, training uses the existing conservative driver model and aggressive driver model: a safe traffic flow environment is first built with the conservative driver model to preliminarily train the mental state change model, and a challenging traffic flow environment is then built with the aggressive driver model to further train the mental state change model.
If the mental state change model is trained in combination with a threshold-based driving style conversion system, conversion between the aggressive and conservative driving styles is realized as follows. As shown in fig. 3, during training, the state quantity $s_t$ of the mental state change model at the current time t is obtained, and the most valuable action $a_t$ corresponding to the current state quantity $s_t$ is selected:
Figure BDA0003419950170000101
After $a_t$ is executed, the mental state value changes and is input into the driving style conversion system. At the same time, the driving style conversion system calculates the cumulative mood value according to the current state $s_t$ and determines whether to switch the driver model according to whether the cumulative mood value exceeds the mental state value. The system then executes the single-style driver model, which performs the corresponding vehicle action according to the current state $s_t$; the vehicle then obtains a reward $r_{t+1}$ and enters the next state $s_{t+1}$. $Q(s_t, a_t)$ is updated according to the value corresponding to the next state
Figure BDA0003419950170000102
and $r_{t+1}$, and the update is iterated continuously to finally obtain the mental state change model.
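The training loop described above resembles tabular Q-learning combined with a threshold test. The sketch below assumes the standard Q-learning update, since the two formula images are not recoverable from the text; the learning rate, discount factor, style names and the toggle behaviour of the switch are likewise assumptions.

```python
from collections import defaultdict

ACTIONS = (
    "unchanged",
    "tend to be negative",
    "tend to be positive",
    "tend to be negative quickly",
    "tend to be positive quickly",
)

ALPHA = 0.1  # hypothetical learning rate
GAMMA = 0.9  # hypothetical discount factor


def best_action(q, state):
    """Pick the action with the highest Q-value in the given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, s, a, r_next, s_next, alpha=ALPHA, gamma=GAMMA):
    """Standard Q-learning update of Q(s, a) from r_{t+1} and the best next-state value."""
    target = r_next + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])


def threshold_switch(current_style, cumulative_mood, mental_state_value):
    """Toggle the driver model when the cumulative mood value exceeds the mental state value.

    Toggling between the two styles is an assumption; the text only says
    that the comparison decides whether to switch.
    """
    if cumulative_mood > mental_state_value:
        return "conservative" if current_style == "aggressive" else "aggressive"
    return current_style
```

A `defaultdict(float)` is enough as the Q-table here because the semi-quantified state space is small and discrete.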
If the mental state change model is trained in combination with a proportion-based driving style conversion system, conversion between the aggressive and conservative driving styles is realized as follows. As shown in fig. 5, during training, the state quantity $s_t$ of the mental state change model at the current time t is obtained, and the most valuable action $a_t$ corresponding to the current state quantity $s_t$ is selected:
Figure BDA0003419950170000111
After $a_t$ is executed, the mental state value changes and is input into the driving style conversion system. At the same time, the driving style conversion system calculates the cumulative mood value according to the current state $s_t$ and calculates the probability of selecting each single-style driver model from the cumulative mood value and the mental state value. The system then selects a single-style driver model according to this probability, and the selected model performs the corresponding vehicle action according to the current state $s_t$; the vehicle then obtains a reward $r_{t+1}$ and enters the next state $s_{t+1}$. $Q(s_t, a_t)$ is updated according to the value corresponding to the next state
Figure BDA0003419950170000112
and $r_{t+1}$, and the update is iterated continuously to finally obtain the mental state change model.
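For the proportion-based variant, the selection step can be sketched as a random draw between the two single-style models. The probability formula itself is an image that is not recoverable from the text, so the linear normalisation between a minimum and a maximum value used below is purely an assumption, as are the function names and the mapping of the first probability to the aggressive style.

```python
import random


def style_probabilities(m_now, m_min, m_max):
    """Return (p1, p2) with p1 + p2 = 1, assuming a linear mapping of m_now onto [0, 1]."""
    if m_max <= m_min:
        raise ValueError("m_max must be greater than m_min")
    p1 = (m_now - m_min) / (m_max - m_min)
    p1 = min(1.0, max(0.0, p1))  # clamp in case m_now falls outside the range
    return p1, 1.0 - p1


def choose_style(m_now, m_min, m_max, rng, styles=("aggressive", "conservative")):
    """Draw one single-style driver model according to the probabilities."""
    p1, _ = style_probabilities(m_now, m_min, m_max)
    return styles[0] if rng.random() < p1 else styles[1]
```

Passing an explicit `random.Random` instance keeps scenario generation reproducible when a seed is fixed.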
Thirdly, generating a test scene
The driving style conversion system, the trained mental state change model and two single-style driver models (aggressive and conservative) are combined to construct the self-adaptive driving style dynamic switching model, which is used to generate automatic driving test scenes. Specifically, a straight three-lane road model is constructed, and $n_{car}$ background vehicles ($n_{car}$ is the number of background vehicles, selected at random with equal probability within [20, 40]) are placed at random positions on the road. Fifty percent of the background vehicles start with the driving strategy of the aggressive driver model and the other fifty percent with the driving strategy of the conservative driver model; all background vehicles then execute driving tasks according to the self-adaptive driving style dynamic switching model.
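The initial scene layout described above can be sketched as follows. The road length, the `BackgroundVehicle` container and the function name are hypothetical, and the sketch does not check for overlapping vehicles, which a real generator would need to do.

```python
import random
from dataclasses import dataclass

ROAD_LENGTH_M = 1000.0  # hypothetical length of the straight road
LANES = 3


@dataclass
class BackgroundVehicle:
    lane: int          # 0, 1 or 2
    position_m: float  # longitudinal position along the road
    style: str         # initial single-style driver model


def generate_scene(rng):
    """Place n_car background vehicles, half starting aggressive and half conservative."""
    n_car = rng.randint(20, 40)  # uniform over the integers 20..40 inclusive
    n_aggressive = n_car // 2    # an odd n_car leaves one extra conservative vehicle
    styles = ["aggressive"] * n_aggressive + ["conservative"] * (n_car - n_aggressive)
    rng.shuffle(styles)
    return [
        BackgroundVehicle(rng.randrange(LANES), rng.uniform(0.0, ROAD_LENGTH_M), style)
        for style in styles
    ]
```

Calling `generate_scene(random.Random(seed))` with different seeds yields different vehicle counts and placements, after which each vehicle would be driven by the dynamic switching model.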
During testing, the tested automatic driving decision-making system is put into a test scene, and the tested automatic driving decision-making system is evaluated by testing the passing efficiency, comfort and safety of the system in the scene. Specifically, the running efficiency of the tested automatic driving decision system is reflected by testing the passing time and the average speed of the tested automatic driving decision system in a scene; the comfort of the tested automatic driving decision system is reflected by measuring the average of the absolute values of the acceleration in the running process of the tested automatic driving decision system; the safety of the tested automatic driving decision system is reflected by measuring the number of times the system collides in the scene and the number of times the minimum safety interval is violated.
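The evaluation indicators listed above can be computed from a logged test run roughly as follows. The log format (parallel lists sampled at the same instants), the minimum safe gap value and the function name are assumptions for illustration; a violation is counted once each time the gap drops below the minimum.

```python
from statistics import fmean


def evaluate_run(times_s, speeds_mps, accels_mps2, gaps_m, collisions, min_safe_gap_m=5.0):
    """Summarise efficiency, comfort and safety for one run of the system under test."""
    violations = 0
    was_unsafe = False
    for gap in gaps_m:
        unsafe = gap < min_safe_gap_m
        if unsafe and not was_unsafe:
            violations += 1  # count each entry into the unsafe region once
        was_unsafe = unsafe
    return {
        "passing_time_s": times_s[-1] - times_s[0],              # efficiency
        "average_speed_mps": fmean(speeds_mps),                  # efficiency
        "mean_abs_accel_mps2": fmean(abs(a) for a in accels_mps2),  # comfort
        "collisions": collisions,                                # safety
        "min_gap_violations": violations,                        # safety
    }
```

Lower passing time and higher average speed indicate better efficiency, while a lower mean absolute acceleration indicates a more comfortable ride.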
In summary, this technical scheme constructs a self-adaptive driving style dynamic switching model by simulating the process in which a change in the driver's mood causes a change in driving style. Specifically, a mood value and a mental state value of the driver are defined: the mood value quantitatively expresses the influence of the driver's current state on the driver's mood, and the mental state value expresses the driver's prediction of future traffic conditions. A driving style conversion system is then constructed by combining the mood value and the mental state value, and the mental state change model is trained by a reinforcement learning method. The driving style conversion system, the mental state change model and two single-style driver models are combined to construct the self-adaptive driving style dynamic switching model, which is used to generate test scenes with self-adaptive driving style switching. In practical application, the tested automatic driving system is placed in the test scene and evaluated by testing its running efficiency, comfort and safety in the scene.
The driver model obtained by training in the technical scheme can change the driving style according to the change of the driving environment, which is similar to that of a real human driver, so that the generated test scene has stronger authenticity; the driver model obtained by training in the technical scheme can change the driving style in a self-adaptive manner, and the test scene is generated by using the driver model, so that the behavior of a background vehicle in the scene has higher uncertainty, and the test effect is further improved.

Claims (10)

1. A test scene generation method based on a self-adaptive driving style dynamic switching model is characterized by comprising the following steps:
S1, defining a mood value and a mental state value of the driver respectively, wherein the mood value is used for quantitatively representing the influence of the current traffic condition and the environmental state on the mood of the driver, and the mental state value is used for representing the driver's prediction of future traffic conditions;
S2, constructing a driving style conversion system based on the mood value and the mental state value, wherein the driving style conversion system comprises a mood value calculation model and a mental state change model;
S3, training the mental state change model by adopting a reinforcement learning method;
S4, combining the driving style conversion system with two single-style driver models to jointly construct a self-adaptive driving style dynamic switching model;
and S5, generating an automatic driving test scene by utilizing the self-adaptive driving style dynamic switching model.
2. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 1, wherein a calculation formula of the mood value calculation model in step S2 is specifically:
$m = \omega_{traffic}\, t + \omega_{speed}\, v + \omega_{ego\_behavior}\, e + \omega_{environment}\, w$
Figure FDA0003419950160000011
Figure FDA0003419950160000012
$w = \omega_{visibility}\, w_{visibility} + \omega_{road\_conditions}\, w_{road\_conditions}$
wherein m is the mood value obtained by the driver at the current time step; $\omega_{traffic}$ is the weight of the traffic congestion degree parameter; t is the traffic congestion degree parameter, determined by the total number n of vehicles in the ego vehicle's lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within the set range; $\omega_{speed}$ is the weight of the speed parameter; v is the speed parameter, measured by the speed of the ego vehicle; $\omega_{ego\_behavior}$ is the weight of the ego vehicle behavior parameter, and e is the ego vehicle behavior parameter; the behavior of the ego vehicle is discretized into combinations of transverse and longitudinal actions, comprising ten actions: "longitudinal uniform speed", "longitudinal deceleration", "longitudinal acceleration", "longitudinal rapid deceleration", "longitudinal rapid acceleration", "transverse uniform speed longitudinal uniform speed", "transverse uniform speed longitudinal acceleration", "transverse uniform speed longitudinal rapid acceleration", "transverse uniform speed longitudinal deceleration" and "transverse uniform speed longitudinal rapid deceleration"; when the action taken by the ego vehicle is "longitudinal uniform speed", e is 0;
when the action taken by the ego vehicle is "longitudinal acceleration", "longitudinal rapid acceleration", "transverse uniform speed longitudinal acceleration" or "transverse uniform speed longitudinal rapid acceleration", e is 1;
when the action taken by the ego vehicle is "longitudinal deceleration", "transverse uniform speed longitudinal deceleration" or "transverse uniform speed longitudinal uniform speed", e is -1;
when the action taken by the ego vehicle is "longitudinal rapid deceleration" or "transverse uniform speed longitudinal rapid deceleration", e is -5;
the behavior of other vehicles that influences the mood of the ego vehicle's driver mainly comprises two behaviors: a vehicle ahead in the same lane suddenly decelerating in front of the ego vehicle, and a vehicle ahead in an adjacent lane suddenly cutting into the ego vehicle's lane; both behaviors force the ego vehicle to decelerate or change lanes and are therefore already included in the ego vehicle behavior parameter e;
$\omega_{environment}$ is the weight of the natural environment factor, and w is the natural environment factor, comprising visibility and road surface conditions: the driver's mood differs under different natural environments; when visibility and road surface conditions are good, the driver pays more attention to the driving experience and mood; when visibility and road surface conditions are poor, driving becomes more difficult and the driver pays more attention to driving safety than to the driving experience;
$\omega_{visibility}$ is the weight of the visibility factor and $w_{visibility}$ is the visibility factor; visibility is semi-quantified as "good", "normal" and "poor"; when visibility is "good", $w_{visibility} = 0$;
when visibility is "normal", $w_{visibility} = -0.5$;
when visibility is "poor", $w_{visibility} = -1$;
$\omega_{road\_conditions}$ is the weight of the road condition factor and $w_{road\_conditions}$ is the road condition factor; the road condition is semi-quantified as "good", "normal" and "poor"; when the road condition is "good", $w_{road\_conditions} = 0$;
when the road condition is "normal", $w_{road\_conditions} = -0.5$;
when the road condition is "poor", $w_{road\_conditions} = -1$.
3. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 2, wherein the step S2 is to construct a driving style conversion system based on a threshold or a ratio.
4. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 3, wherein the threshold-based construction of the driving style conversion system specifically comprises: when the cumulative mood value of the driver exceeds the mental state value, switching the driving style, wherein the calculation formula of the cumulative mood value is as follows:
Figure FDA0003419950160000021
wherein $m_v$ is the cumulative mood value of the driver, $m_t$ is the mood value obtained by the driver at time step t, and h is the current time step.
5. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 3, wherein the proportion-based construction of the driving style conversion system specifically comprises: at a given time step, the driver selects the single-style driver model i with probability $P_i$ ($0 < i \le 2$, $P_1 + P_2 = 1$), wherein the calculation formula of $P_i$ is as follows:
Figure FDA0003419950160000031
wherein $m_{max}$ is the maximum value of the cumulative mood value, $m_{min}$ is the minimum value of the cumulative mood value, and $m_{now}$ is the current mental state value.
6. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 1, wherein the specific process of training the mental state change model in step S3 is as follows: firstly, defining a state set used for training, and semi-quantifying the traffic congestion degree as "crowded", "normal" and "vacant" according to the total number n of vehicles in the ego vehicle's lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range;
according to the speed of the vehicle, the semi-quantitative speed is 'fast', 'normal speed' and 'slow';
according to the relative distance between the self vehicle and the front vehicle, the distance is semi-quantitatively taken as 'near', 'proper' and 'far';
natural environments include both visibility, which can be semi-quantified as "good", "normal", and "poor", and road conditions, which can be semi-quantified as "good", "normal", and "poor";
thereafter defining an action set for mental state fluctuation including "unchanged", "tend to be negative", "tend to be positive", "tend to be negative quickly" and "tend to be positive quickly", wherein "tend to be negative" and "tend to be negative quickly" both indicate that the mental state value decreases, but the mental state value decreases faster under "tend to be negative quickly";
and designing a reward function, and finally training the dynamic change model by using two single-style driver models.
7. The method as claimed in claim 6, wherein the reward function is designed based on the fluctuation of driver's mind, driving risk and the influence of mood value to obtain a corresponding reward function.
8. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 7, wherein the reward function is specifically:
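$R = \omega_f R_{features} + \omega_c R_{crash} + \omega_m R_m$
wherein $\omega_f$ is the weight of the fluctuation of the driver's mental state, $\omega_c$ is the weight of the driving risk, $\omega_m$ is the weight of the mood value, $R_{features}$ is the mental state fluctuation reward function, $R_{crash}$ is the driving risk reward function, and $R_m$ is the reward function for the influence of the mood value on the mental state.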
9. The method as claimed in claim 6, wherein the two single-style driver models include a first-style driver model and a second-style driver model.
10. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 9, wherein the specific process of training the dynamically changing model by using two single style driver models is as follows: firstly, a safe traffic flow environment is built by utilizing a first style model, a mind state change model is preliminarily trained, then a challenging traffic flow environment is built by utilizing a second style driver model, and the mind state change model is further trained.
CN202111558858.6A 2021-12-20 2021-12-20 Test scene generation method based on self-adaptive driving style dynamic switching model Pending CN114385113A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111558858.6A CN114385113A (en) 2021-12-20 2021-12-20 Test scene generation method based on self-adaptive driving style dynamic switching model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111558858.6A CN114385113A (en) 2021-12-20 2021-12-20 Test scene generation method based on self-adaptive driving style dynamic switching model

Publications (1)

Publication Number Publication Date
CN114385113A true CN114385113A (en) 2022-04-22

Family

ID=81197753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111558858.6A Pending CN114385113A (en) 2021-12-20 2021-12-20 Test scene generation method based on self-adaptive driving style dynamic switching model

Country Status (1)

Country Link
CN (1) CN114385113A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2942012A1 (en) * 2014-05-08 2015-11-11 Continental Automotive GmbH Driver assistance system
WO2020205597A1 (en) * 2019-03-29 2020-10-08 Intel Corporation Autonomous vehicle system
CN112519788A (en) * 2019-09-19 2021-03-19 北京新能源汽车股份有限公司 Method and device for determining driving style and automobile
CN112677983A (en) * 2021-01-07 2021-04-20 浙江大学 System for recognizing driving style of driver
CN114862156A (en) * 2022-04-22 2022-08-05 同济大学 Emotion-driven personalized driver model customized test scene generation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
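QING-LONG LU et al.: "Exploring the influence of automated driving styles on network efficiency", TRANSPORTATION RESEARCH PROCEDIA, vol. 52, 3 February 2021 (2021-02-03), pages 380 - 387 *
Hou Haijing et al.: "Influence of driving style on driving behavior", China Journal of Highway and Transport, vol. 31, no. 04, 15 April 2018 (2018-04-15), pages 18 - 27 *
Ma Yining et al.: "Research on self-evolving scenarios for automated driving simulation testing based on driving models of different styles", China Journal of Highway and Transport, vol. 36, no. 2, 27 December 2022 (2022-12-27), pages 216 - 228 *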

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination