CN114385113A

CN114385113A - Test scene generation method based on self-adaptive driving style dynamic switching model

Info

Publication number: CN114385113A
Application number: CN202111558858.6A
Authority: CN
Inventors: 马依宁; 陈君毅; 吴建峰; 吴靖宇; 熊璐
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2021-12-20
Filing date: 2021-12-20
Publication date: 2022-04-22

Abstract

The invention relates to a test scene generation method based on a self-adaptive driving style dynamic switching model, which comprises the following steps: defining a mood value and a mental state value of a driver, wherein the mood value quantitatively represents the influence of the current traffic condition and environmental state on the mood of the driver, and the mental state value represents the driver's prediction of future traffic conditions; constructing a driving style conversion system based on the mood value and the mental state value, wherein the driving style conversion system comprises a mood value calculation model and a mental state change model; training the mental state change model by a reinforcement learning method; combining the driving style conversion system with two single-style driver models to construct a self-adaptive driving style dynamic switching model; and generating an automatic driving test scene by using the self-adaptive driving style dynamic switching model. Compared with the prior art, the method can automatically switch the driving style according to changes in the environment and effectively improves the realism and complexity of the automatic driving test scene.

Description

Test scene generation method based on self-adaptive driving style dynamic switching model

Technical Field

The invention relates to the technical field of automatic driving tests, in particular to a test scene generation method based on a self-adaptive driving style dynamic switching model.

Background

Current automatic driving vehicle testing mainly comprises real vehicle tests and virtual simulation tests. Real vehicle tests require a great deal of manpower and time, and their limitations become more and more obvious as the level of driving automation rises. In virtual simulation tests, the test scene configuration is flexible, test efficiency is high, repeatability is strong, the test process is safe and the cost is low; automated and accelerated testing can be realized, saving a large amount of manpower and material resources. However, existing virtual simulation technology still has some defects.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a test scene generation method based on a self-adaptive driving style dynamic switching model, which generates a dynamic automatic driving test scene consistent with the vehicle behavior in the real traffic environment by constructing the self-adaptive driving style dynamic switching model, thereby improving the reliability of the automatic driving virtual test result.

The purpose of the invention can be realized by the following technical scheme: a test scene generation method based on a self-adaptive driving style dynamic switching model comprises the following steps:

S1, defining a mood value and a mental state value of the driver, wherein the mood value quantitatively represents the influence of the current traffic condition and environmental state on the mood of the driver, and the mental state value represents the driver's prediction of future traffic conditions;

S2, constructing a driving style conversion system based on the mood value and the mental state value, wherein the driving style conversion system comprises a mood value calculation model and a mental state change model;

S3, training the mental state change model by a reinforcement learning method;

S4, combining the driving style conversion system with two single-style driver models to construct a self-adaptive driving style dynamic switching model;

S5, generating an automatic driving test scene by using the self-adaptive driving style dynamic switching model.

Further, the calculation formula of the mood value calculation model in step S2 is specifically:

$$m = \omega_{traffic}\,t + \omega_{speed}\,v + \omega_{ego\_behavior}\,e + \omega_{environment}\,w$$

$$w = \omega_{visibility}\,w_{visibility} + \omega_{road\_conditions}\,w_{road\_conditions}$$

wherein m is the mood value obtained by the driver at the current time step; ω_traffic is the weight of the traffic congestion parameter; t is the traffic congestion parameter, determined by the total number n of vehicles in the ego vehicle's lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range; ω_speed is the weight of the speed parameter; v is the speed parameter, measured by the speed of the ego vehicle; ω_{ego_behavior} is the weight of the ego vehicle behavior parameter; and e is the ego vehicle behavior parameter. The behavior of the ego vehicle is discretized into combinations of lateral and longitudinal actions, giving ten actions: longitudinal uniform speed, longitudinal deceleration, longitudinal acceleration, longitudinal rapid deceleration, longitudinal rapid acceleration, lateral uniform speed with longitudinal uniform speed, lateral uniform speed with longitudinal acceleration, lateral uniform speed with longitudinal rapid acceleration, lateral uniform speed with longitudinal deceleration, and lateral uniform speed with longitudinal rapid deceleration. When the action taken by the ego vehicle is "longitudinal uniform speed", e = 0;

when the action taken by the ego vehicle is "longitudinal acceleration", "longitudinal rapid acceleration", "lateral uniform speed with longitudinal acceleration" or "lateral uniform speed with longitudinal rapid acceleration", e = 1;

when the action taken by the ego vehicle is "longitudinal deceleration", "lateral uniform speed with longitudinal deceleration" or "lateral uniform speed with longitudinal uniform speed", e = -1;

when the action taken by the ego vehicle is "longitudinal rapid deceleration" or "lateral uniform speed with longitudinal rapid deceleration", e = -5;

the behaviors of other vehicles that influence the mood of the ego vehicle's driver are mainly of two kinds: a vehicle ahead in the same lane suddenly decelerates in front of the ego vehicle, or a vehicle ahead in an adjacent lane suddenly cuts into the ego vehicle's lane. Both behaviors force the ego vehicle to decelerate or change lanes and are thus already covered by the ego vehicle behavior parameter e;

ω_environment is the weight of the natural environment factor, and w is the natural environment factor, which covers visibility and road surface conditions. The driver's mood differs under different natural environments: when visibility and road surface conditions are good, the driver attaches more importance to the driving experience and mood; when visibility and road surface conditions are poor, driving becomes more difficult and the driver pays more attention to driving safety than to driving experience;

ω_visibility is the weight of the visibility factor and w_visibility is the visibility factor. Visibility is semi-quantified as "good", "normal" and "poor". When visibility is "good", w_visibility = 0;

when visibility is "normal", w_visibility = -0.5;

when visibility is "poor", w_visibility = -1;

ω_{road_conditions} is the weight of the road surface condition factor and w_{road_conditions} is the road surface condition factor. The road surface condition is semi-quantified as "good", "normal" and "poor". When the road surface condition is "good", w_{road_conditions} = 0;

when the road surface condition is "normal", w_{road_conditions} = -0.5;

when the road surface condition is "poor", w_{road_conditions} = -1.

Further, in step S2 the driving style conversion system is constructed based on either a threshold or a proportion.

Further, the threshold-based construction of the driving style conversion system specifically includes: switching the driving style when the cumulative mood value of the driver exceeds the mental state value, wherein the calculation formula of the cumulative mood value is as follows:

wherein m_v is the cumulative mood value of the driver, m_t is the mood value obtained by the driver at time step t, and h is the current time step.

Further, the proportion-based construction of the driving style conversion system specifically includes: at a given time step, the driver executes single-style driver model i with probability P_i (0 < i ≤ 2, P_1 + P_2 = 1), wherein the calculation formula of P_i is as follows:

wherein m_max is the maximum of the cumulative mood value, m_min is the minimum of the cumulative mood value, and m_now is the current mental state value.

Further, the specific process of training the mental state change model in step S3 is as follows: first, the state set used for training is defined. The traffic congestion degree is semi-quantified as "crowded", "normal" and "vacant" according to the total number n of vehicles in the ego vehicle's lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range;

the speed is semi-quantified as "fast", "normal speed" and "slow" according to the speed of the ego vehicle;

the distance is semi-quantified as "near", "proper" and "far" according to the relative distance between the ego vehicle and the vehicle ahead;

the natural environment includes visibility, which can be semi-quantified as "good", "normal" and "poor", and road surface conditions, which can likewise be semi-quantified as "good", "normal" and "poor";

the action set for mental state fluctuation is then defined, including "unchanged", "tend to be negative", "tend to be positive", "tend to be negative fast" and "tend to be positive fast", wherein "tend to be negative" and "tend to be negative fast" both indicate that the mental state value decreases, but it decreases faster under "tend to be negative fast";

a reward function is designed, and finally the mental state change model is trained using the two single-style driver models.

Further, the reward function is designed based on the driver's mental state fluctuation, the driving risk degree and the influence of the mood value.

Further, the reward function is specifically:

$$R = \omega_f R_{features} + \omega_c R_{crash} + \omega_m R_m$$

wherein ω_f is the weight of the driver's mental state fluctuation, ω_c is the weight of the driving risk degree, ω_m is the weight of the mood value, R_features is the mental state fluctuation reward function, R_crash is the driving risk degree reward function, and R_m is the reward function for the influence of the mood value on the mental state.

Further, the two single-style driver models include a first-style driver model and a second-style driver model.

Further, the specific process of training the mental state change model using the two single-style driver models is as follows: the first-style driver model is first used to build a safe traffic flow environment in which the mental state change model is preliminarily trained, and the second-style driver model is then used to build a challenging traffic flow environment in which the mental state change model is further trained.

Compared with the prior art, the invention defines the mood value of the driver and provides a mood value calculation method that considers traffic congestion, ego vehicle behavior, the natural environment and the behavior of other vehicles. It also defines the mental state value of the driver and combines the mood value and the mental state value to construct a driving style conversion system. A reinforcement learning method is then used to train the mental state change model, and the driving style conversion system is combined with two existing single-style driver models to construct the self-adaptive driving style dynamic switching model. Test scenes generated with this model switch driving style automatically as the environment changes, which effectively improves the realism and complexity of the test scenes and ensures the reliability of automatic driving virtual test results.

When the mental state change model is trained, the reward function is designed around three basic facts: the fluctuation of the driver's mental state, the driving risk degree and the influence of the mood value. The model is trained in batches with the two single-style driver models, which ensures its accuracy, so that the driving style of the resulting dynamic switching model changes adaptively with the driving environment and a corresponding test scene is generated.

Drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a schematic diagram of an embodiment of an application process;

FIG. 3 is a flowchart illustrating the training of an adaptive threshold-based driving style dynamic switching model according to an exemplary embodiment;

FIG. 4 is a flowchart of a method of a threshold-based driving style conversion system according to an embodiment;

FIG. 5 is a flowchart of training a proportion-based adaptive driving style dynamic switching model according to an embodiment;

FIG. 6 is a flowchart of a method of a proportion-based driving style conversion system according to an embodiment.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments.

Examples

As shown in fig. 1, a test scenario generation method based on an adaptive driving style dynamic switching model includes the following steps:

S2, constructing a driving style conversion system based on the mood value and the mental state value, wherein the driving style conversion system comprises a mood value calculation model and a mental state change model, and the calculation formula of the mood value calculation model is as follows:

$$m = \omega_{traffic}\,t + \omega_{speed}\,v + \omega_{ego\_behavior}\,e + \omega_{environment}\,w$$

$$w = \omega_{visibility}\,w_{visibility} + \omega_{road\_conditions}\,w_{road\_conditions}$$

wherein m is the mood value obtained by the driver at the current time step; ω_traffic is the weight of the traffic congestion parameter; t is the traffic congestion parameter, determined by the total number n of vehicles in the ego vehicle's lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range; ω_speed is the weight of the speed parameter; v is the speed parameter, measured by the speed of the ego vehicle; ω_{ego_behavior} is the weight of the ego vehicle behavior parameter; and e is the ego vehicle behavior parameter. The behavior of the ego vehicle is discretized into combinations of lateral and longitudinal actions, giving ten actions: longitudinal uniform speed, longitudinal deceleration, longitudinal acceleration, longitudinal rapid deceleration, longitudinal rapid acceleration, lateral uniform speed with longitudinal uniform speed, lateral uniform speed with longitudinal acceleration, lateral uniform speed with longitudinal rapid acceleration, lateral uniform speed with longitudinal deceleration, and lateral uniform speed with longitudinal rapid deceleration. When the action taken by the ego vehicle is "longitudinal uniform speed", e = 0;

ω_visibility is the weight of the visibility factor and w_visibility is the visibility factor. Visibility is semi-quantified as "good", "normal" and "poor". When visibility is "good", w_visibility = 0;

when visibility is "normal", w_visibility = -0.5;

when visibility is "poor", w_visibility = -1;

ω_{road_conditions} is the weight of the road surface condition factor and w_{road_conditions} is the road surface condition factor. The road surface condition is semi-quantified as "good", "normal" and "poor". When the road surface condition is "good", w_{road_conditions} = 0;

when the road surface condition is "normal", w_{road_conditions} = -0.5;

when the road surface condition is "poor", w_{road_conditions} = -1;

In addition, in a specific application the driving style conversion system can be constructed based on either a threshold or a proportion:

threshold-based construction of the driving style conversion system means that the driving style is switched once the cumulative mood value of the driver exceeds the mental state value, and the calculation formula of the cumulative mood value is as follows:

in the formula, m_v is the cumulative mood value of the driver, m_t is the mood value obtained by the driver at time step t, and h is the current time step;

proportion-based construction of the driving style conversion system means that at a given time step the driver executes single-style driver model i with probability P_i (0 < i ≤ 2, P_1 + P_2 = 1), and the calculation formula of P_i is as follows:

in the formula, m_max is the maximum of the cumulative mood value, m_min is the minimum of the cumulative mood value, and m_now is the current mental state value;

S3, training the mental state change model by a reinforcement learning method. During training, the state set used for training is first defined, and the traffic congestion degree is semi-quantified as "crowded", "normal" and "vacant" according to the total number n of vehicles in the ego vehicle's lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range;

a reward function is designed, and finally the mental state change model is trained using the two single-style driver models;

in this technical scheme, the reward function is designed based on the driver's mental state fluctuation, the driving risk degree and the influence of the mood value:

$$R = \omega_f R_{features} + \omega_c R_{crash} + \omega_m R_m$$

in the formula, ω_f is the weight of the driver's mental state fluctuation, ω_c is the weight of the driving risk degree, ω_m is the weight of the mood value, R_features is the mental state fluctuation reward function, R_crash is the driving risk degree reward function, and R_m is the reward function for the influence of the mood value on the mental state;

in addition, existing first-style and second-style driver models can be selected as the two single-style driver models: the first-style driver model is first used to construct a safe traffic flow environment in which the mental state change model is preliminarily trained, and the second-style driver model is then used to construct a challenging traffic flow environment in which the mental state change model is further trained.

The embodiment applies the above technical solution, as shown in fig. 2:

First, constructing the driving style conversion system

First, a mood value is calculated in consideration of the influence of traffic flow, own vehicle behavior, natural environment, and other vehicle behavior, and a calculation formula of the mood value is as follows:

$$m = \omega_{traffic}\,t + \omega_{speed}\,v + \omega_{ego\_behavior}\,e + \omega_{environment}\,w$$

wherein m represents the mood value obtained by the driver at the current time step; ω_traffic represents the weight of the traffic congestion parameter; t represents the traffic congestion parameter, determined by the total number n of vehicles in the ego vehicle's lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within 42 m, and the specific calculation formula of t is as follows:

ω_speed represents the weight of the speed parameter, and in this embodiment ω_speed = 0.8; v represents the speed parameter, measured by the speed of the ego vehicle; ω_{ego_behavior} represents the weight of the ego vehicle behavior parameter, and in this embodiment ω_{ego_behavior} = 0.5; e represents the ego vehicle behavior parameter. The behavior of the ego vehicle is discretized into combinations of lateral and longitudinal actions, giving ten actions: longitudinal uniform speed, longitudinal deceleration, longitudinal acceleration, longitudinal rapid deceleration, longitudinal rapid acceleration, lateral uniform speed with longitudinal uniform speed, lateral uniform speed with longitudinal acceleration, lateral uniform speed with longitudinal rapid acceleration, lateral uniform speed with longitudinal deceleration, and lateral uniform speed with longitudinal rapid deceleration. When the action taken by the ego vehicle is "longitudinal uniform speed", e = 0; when it is "longitudinal acceleration", "longitudinal rapid acceleration", "lateral uniform speed with longitudinal acceleration" or "lateral uniform speed with longitudinal rapid acceleration", e = 1; when it is "longitudinal deceleration", "lateral uniform speed with longitudinal deceleration" or "lateral uniform speed with longitudinal uniform speed", e = -1; when it is "longitudinal rapid deceleration" or "lateral uniform speed with longitudinal rapid deceleration", e = -5. The behaviors of other vehicles that influence the mood of the ego vehicle's driver are mainly of two kinds: a vehicle ahead in the same lane decelerates rapidly in front of the ego vehicle, or a vehicle ahead in an adjacent lane suddenly cuts into the ego vehicle's lane. Both behaviors force the ego vehicle to decelerate or change lanes and are thus already covered by the ego vehicle behavior parameter e.

ω_environment represents the weight of the natural environment factor, and in this embodiment ω_environment = 1; w represents the natural environment factor, which covers visibility and road surface conditions. The driver's mood differs under different natural environments: when visibility and road surface conditions are good, the driver attaches more importance to the driving experience and the mood is better; when visibility and road surface conditions are poor, driving becomes more difficult, the driver pays more attention to driving safety than to driving experience, and the mood becomes worse. The specific calculation formula is as follows:

$$w = \omega_{visibility}\,w_{visibility} + \omega_{road\_conditions}\,w_{road\_conditions}$$

wherein ω_visibility represents the weight of the visibility factor, and in this embodiment ω_visibility = 0.9; w_visibility represents the visibility factor. Visibility is semi-quantified as "good", "normal" and "poor": when visibility is "good", w_visibility = 0; when visibility is "normal", w_visibility = -0.5; when visibility is "poor", w_visibility = -1. ω_{road_conditions} represents the weight of the road surface condition factor, and in this embodiment ω_{road_conditions} = 0.7; w_{road_conditions} represents the road surface condition factor. The road surface condition is semi-quantified as "good", "normal" and "poor": when the road surface condition is "good", w_{road_conditions} = 0; when it is "normal", w_{road_conditions} = -0.5; when it is "poor", w_{road_conditions} = -1.
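The mood value calculation above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the weights 0.8, 0.5, 1, 0.9 and 0.7 and the level scores come from this embodiment, while the function and key names are hypothetical. The value of ω_traffic and the formula for t are not reproduced in this text, so both are passed in by the caller, and v is assumed to be already scaled to a comparable range.

```python
# Weights quoted in this embodiment.
OMEGA_SPEED = 0.8
OMEGA_EGO_BEHAVIOR = 0.5
OMEGA_ENVIRONMENT = 1.0
OMEGA_VISIBILITY = 0.9
OMEGA_ROAD_CONDITIONS = 0.7

# Semi-quantified levels shared by visibility and road surface conditions.
LEVEL_SCORE = {"good": 0.0, "normal": -0.5, "poor": -1.0}

# Ego behavior parameter e per longitudinal action (key names are hypothetical).
LONGITUDINAL_E = {
    "uniform": 0.0,
    "acceleration": 1.0,
    "rapid_acceleration": 1.0,
    "deceleration": -1.0,
    "rapid_deceleration": -5.0,
}


def ego_behavior_parameter(longitudinal: str, lateral_uniform: bool = False) -> float:
    """Return e for one of the ten discretized ego actions."""
    # The text assigns e = -1 to "lateral uniform speed with longitudinal uniform speed".
    if lateral_uniform and longitudinal == "uniform":
        return -1.0
    return LONGITUDINAL_E[longitudinal]


def environment_factor(visibility: str, road_conditions: str) -> float:
    """w = omega_visibility * w_visibility + omega_road_conditions * w_road_conditions."""
    return (OMEGA_VISIBILITY * LEVEL_SCORE[visibility]
            + OMEGA_ROAD_CONDITIONS * LEVEL_SCORE[road_conditions])


def mood_value(t: float, v: float, e: float, w: float, omega_traffic: float) -> float:
    """m = omega_traffic*t + omega_speed*v + omega_ego_behavior*e + omega_environment*w.

    omega_traffic, t and the scaling of v are assumptions supplied by the caller.
    """
    return (omega_traffic * t + OMEGA_SPEED * v
            + OMEGA_EGO_BEHAVIOR * e + OMEGA_ENVIRONMENT * w)
```

For instance, a rapid deceleration on a poor-visibility, normal-surface road yields e = -5 and w = 0.9 × (-1) + 0.7 × (-0.5) = -1.25, so these two terms alone contribute 0.5 × (-5) + 1 × (-1.25) = -3.75 to m.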

Secondly, a driving style conversion system is constructed using the mood value and the mental state value. If a threshold-based driving style conversion system is constructed, as shown in FIG. 4, the cumulative mood value is calculated and the mental state value serves as its threshold: when the cumulative mood value exceeds the mental state value, the driver model is switched; otherwise it is not. The calculation formula of the cumulative mood value is as follows:

where m_t represents the mood value obtained by the driver at time step t, and h represents the current time step.

If a proportion-based driving style conversion system is constructed, as shown in FIG. 6, the cumulative mood value is calculated and combined with the mental state value to obtain the probability P_i (0 < i ≤ 2, P_1 + P_2 = 1) that the driver selects single-style driver model i at a given time step, wherein P_i is calculated as follows:
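The threshold rule can be illustrated with a short Python sketch. The cumulative mood formula is not reproduced in this text, so the sketch assumes the simplest reading, a plain sum of the per-step mood values m_t up to the current time step h. That assumption, the function names and the style labels are illustrative only.

```python
from typing import Sequence


def cumulative_mood(mood_values: Sequence[float]) -> float:
    """Assumed form: plain sum of m_t over all time steps up to the current step h."""
    return float(sum(mood_values))


def next_style(current_style: str,
               mood_values: Sequence[float],
               mental_state_value: float) -> str:
    """Switch the single-style driver model when the cumulative mood value
    exceeds the mental state value, which acts as the threshold."""
    if cumulative_mood(mood_values) > mental_state_value:
        return "conservative" if current_style == "aggressive" else "aggressive"
    return current_style
```

Whether the accumulation is reset after a switch is not stated in the text, so the sketch leaves that decision to the caller.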
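The formula for P_i is not reproduced in this text, so the following Python sketch does not attempt to reconstruct it. It takes P_1 as an input, uses P_2 = 1 - P_1, and illustrates only the sampling step of the proportion-based system. The function name and the use of a seeded `random.Random` instance are assumptions made for the illustration.

```python
import random


def select_driver_model(p_first: float, rng: random.Random) -> int:
    """Return 1 or 2, choosing single-style driver model 1 with probability p_first.

    p_first stands for P_1 and must come from the P_i formula of the method;
    P_2 is implied by P_1 + P_2 = 1.
    """
    if not 0.0 <= p_first <= 1.0:
        raise ValueError("p_first must lie in [0, 1]")
    # rng.random() is uniform on [0, 1), so model 1 is chosen with probability p_first.
    return 1 if rng.random() < p_first else 2


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [select_driver_model(0.3, rng) for _ in range(1000)]
    print("share of model 1:", picks.count(1) / len(picks))
```

Unlike the threshold rule, this selection is stochastic, so two background vehicles with the same mental state can still behave differently at the same time step.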

Second, training the mental state change model

First, the state set used for training and the action set for mental state fluctuation are defined. The traffic congestion degree is semi-quantified as "crowded", "normal" and "vacant" according to the total number n of vehicles in the ego vehicle's lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within 42 m; the speed is semi-quantified as "fast", "normal speed" and "slow" according to the speed of the ego vehicle; the distance is semi-quantified as "near", "proper" and "far" according to the relative distance between the ego vehicle and the vehicle ahead; the natural environment includes visibility and road surface conditions, where visibility can be semi-quantified as "good", "normal" and "poor", and road surface conditions can likewise be semi-quantified as "good", "normal" and "poor". For example, visibility is poor in snowy weather and the road surface is poor because of icing; both visibility and road surface conditions are poor when driving on a country road on a hazy day; both are good when driving on a highway on a sunny day. The action set for mental state fluctuation includes "unchanged", "tend to be negative", "tend to be positive", "tend to be negative fast" and "tend to be positive fast", where "tend to be negative" and "tend to be negative fast" both mean that the mental state value drops, but it drops faster under "tend to be negative fast".
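As a rough illustration of the size of this discrete problem, the Python sketch below enumerates the state and action sets. It assumes that a state is the Cartesian product of the five semi-quantified features, which the text does not state explicitly, and the label strings are hypothetical identifiers.

```python
from itertools import product

# Semi-quantified state features described above (labels are illustrative).
TRAFFIC = ("crowded", "normal", "vacant")
SPEED = ("fast", "normal", "slow")
DISTANCE = ("near", "proper", "far")
VISIBILITY = ("good", "normal", "poor")
ROAD_CONDITIONS = ("good", "normal", "poor")

# Action set for mental state fluctuation.
ACTIONS = (
    "unchanged",
    "tend_negative",
    "tend_positive",
    "tend_negative_fast",
    "tend_positive_fast",
)

# Assumption: one state per combination of the five features.
STATES = list(product(TRAFFIC, SPEED, DISTANCE, VISIBILITY, ROAD_CONDITIONS))

# Map each state tuple to a row index, e.g. for a tabular value function.
STATE_INDEX = {state: i for i, state in enumerate(STATES)}

if __name__ == "__main__":
    print(len(STATES), "states x", len(ACTIONS), "actions")
```

Under this assumption there are 3^5 = 243 states and 5 actions, small enough for a tabular value function.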

Secondly, the reward function is designed according to several basic facts:

(1) Human driver mental state fluctuation: the change of a human driver's mental state is a cumulative process, that is, a person's mental state generally does not change too violently within a short time. The reward function designed for mental state fluctuation is R_features;

(2) Driving risk degree: human drivers always tend to avoid traffic accidents. The corresponding reward function is R_crash;

(3) Influence of the mood value: the mental state of a human driver is influenced by the current mood. The corresponding reward function is R_m;

The resulting reward function is:

$$R = \omega_f R_{features} + \omega_c R_{crash} + \omega_m R_m$$

wherein ω_f represents the weight of the human driver's mental state fluctuation, ω_c represents the weight of the driving risk degree, and ω_m represents the weight of the mood value.
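The weighted combination can be written as a one-line Python function. The text gives neither the component reward functions nor the weight values, so everything in the sketch below apart from the weighted sum itself is a placeholder supplied by the caller.

```python
def total_reward(r_features: float, r_crash: float, r_m: float,
                 omega_f: float, omega_c: float, omega_m: float) -> float:
    """R = omega_f * R_features + omega_c * R_crash + omega_m * R_m.

    The three component rewards and the three weights are inputs; the text
    does not specify how they are computed or what values they take.
    """
    return omega_f * r_features + omega_c * r_crash + omega_m * r_m
```

Keeping the three terms separate makes it easy to log each contribution during training and to see which of the three basic facts dominates the learned behavior.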

Finally, training is performed with an existing conservative driver model and an existing aggressive driver model: the conservative driver model is first used to construct a safe traffic flow environment in which the mental state change model is preliminarily trained, and the aggressive driver model is then used to construct a challenging traffic flow environment in which the mental state change model is further trained;

If the mental state change model is trained in combination with the threshold-based driving style conversion system, switching between the aggressive and conservative driving styles is realized. As shown in FIG. 3, during training the state s_t of the mental state change model at the current time t is obtained, and the most valuable action a_t corresponding to the current state s_t is selected.

After a_t is executed, the mental state value changes and is input into the driving style conversion system. The driving style conversion system also calculates the cumulative mood value from the current state s_t and decides whether to switch the driver model according to whether the cumulative mood value exceeds the mental state value. The system then executes the single-style driver model, which performs the corresponding vehicle action for the current state s_t; the vehicle obtains a reward r_{t+1} and enters the next state s_{t+1}. Q(s_t, a_t) is then updated according to the value corresponding to the next state and r_{t+1}, and the update is iterated until the mental state change model is finally obtained.
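The loop described above has the shape of a tabular value update on Q(s_t, a_t). The exact update formula is not reproduced in this text, so the Python sketch below assumes the standard Q-learning rule. The learning rate `alpha`, the discount factor `gamma` and the epsilon-greedy exploration are likewise assumptions, and all names are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = (
    "unchanged",
    "tend_negative",
    "tend_positive",
    "tend_negative_fast",
    "tend_positive_fast",
)


def choose_action(q, state, epsilon: float, rng: random.Random) -> str:
    """Pick the most valuable action for the state, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward: float, next_state,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Assumed standard Q-learning update:
    Q(s_t, a_t) += alpha * (r_{t+1} + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t)).
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    a_t = choose_action(q, "s0", epsilon=0.1, rng=rng)
    q_update(q, "s0", a_t, reward=1.0, next_state="s1")
    print(a_t, q[("s0", a_t)])
```

In a full training loop, the reward passed to `q_update` would be the combined reward R, and the next state would come from the traffic flow environment built with the single-style driver models.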

If the mental state change model is trained in combination with the proportion-based driving style conversion system, switching between the aggressive and conservative driving styles is likewise realized. As shown in FIG. 5, during training the state s_t of the mental state change model at the current time t is obtained, and the most valuable action a_t corresponding to the current state s_t is selected.

After a_t is executed, the mental state value changes and is input into the driving style conversion system. The driving style conversion system also calculates the cumulative mood value from the current state s_t and calculates the probability of selecting each single-style driver model from the cumulative mood value and the mental state value. The system then selects a single-style driver model according to this probability, and the selected model performs the corresponding vehicle action for the current state s_t; the vehicle obtains a reward r_{t+1} and enters the next state s_{t+1}. Q(s_t, a_t) is then updated according to the value corresponding to the next state and r_{t+1}, in the same way as in the threshold-based case.

Thirdly, generating a test scene

The driving style conversion system, the trained mental state change model and the two single-style driver models (aggressive and conservative) are combined to construct the self-adaptive driving style dynamic switching model, which is used to generate automatic driving test scenes. Specifically, a straight three-lane road model is constructed and n_car background vehicles are placed at random positions on the road, where n_car is the number of background vehicles, chosen with equal probability from [20, 40]. Fifty percent of the background vehicles start with the driving strategy of the aggressive driver model and the other fifty percent with that of the conservative driver model; all background vehicles then perform their driving tasks according to the self-adaptive driving style dynamic switching model.
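The scene initialization can be sketched in Python as follows. The three lanes, the range [20, 40] and the fifty-fifty split of initial styles come from the text, while the road length, the data structure and all names are assumptions. The sketch does not check for overlapping vehicles, and when n_car is odd the split cannot be exact, so the extra vehicle is made conservative.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class BackgroundVehicle:
    lane: int          # 0, 1 or 2 on the straight three-lane road
    position_m: float  # longitudinal position along the road
    style: str         # initial driving strategy


def generate_background_vehicles(rng: random.Random,
                                 road_length_m: float = 1000.0,  # assumed length
                                 lanes: int = 3) -> List[BackgroundVehicle]:
    """Place n_car vehicles at random, half aggressive and half conservative."""
    n_car = rng.randint(20, 40)  # inclusive bounds, equal probability
    n_aggressive = n_car // 2
    styles = ["aggressive"] * n_aggressive + ["conservative"] * (n_car - n_aggressive)
    rng.shuffle(styles)
    return [
        BackgroundVehicle(lane=rng.randrange(lanes),
                          position_m=rng.uniform(0.0, road_length_m),
                          style=style)
        for style in styles
    ]


if __name__ == "__main__":
    vehicles = generate_background_vehicles(random.Random(42))
    print(len(vehicles), "background vehicles")
```

Passing a seeded `random.Random` keeps a generated scene reproducible, which matters when the same scene is replayed against different systems under test.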

During testing, the automatic driving decision system under test is placed in the test scene and evaluated by testing its passing efficiency, comfort and safety in the scene. Specifically, passing efficiency is reflected by the travel time and average speed of the system under test in the scene; comfort is reflected by the average of the absolute values of acceleration during driving; safety is reflected by the number of collisions in the scene and the number of times the minimum safety interval is violated.
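These evaluation indicators can be computed from a logged run as in the Python sketch below. The input format (per-step speed and acceleration samples with a fixed step `dt_s`, plus event counts supplied by the simulator) and all names are assumptions, as the text does not define a log format.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RunMetrics:
    travel_time_s: float          # passing efficiency
    average_speed: float          # passing efficiency
    mean_abs_acceleration: float  # comfort (lower is smoother)
    collisions: int               # safety
    min_gap_violations: int       # safety


def evaluate_run(speeds: Sequence[float],
                 accelerations: Sequence[float],
                 collisions: int,
                 min_gap_violations: int,
                 dt_s: float) -> RunMetrics:
    """Summarize one run of the system under test from per-step samples."""
    if not speeds or not accelerations:
        raise ValueError("at least one speed and one acceleration sample is required")
    return RunMetrics(
        travel_time_s=len(speeds) * dt_s,
        average_speed=sum(speeds) / len(speeds),
        mean_abs_acceleration=sum(abs(a) for a in accelerations) / len(accelerations),
        collisions=collisions,
        min_gap_violations=min_gap_violations,
    )
```

For example, two samples at 10 and 20 (speed) and 1 and -3 (acceleration) with a 0.5 s step give a travel time of 1.0 s, an average speed of 15 and a mean absolute acceleration of 2.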

In summary, this technical scheme constructs a self-adaptive driving style dynamic switching model by simulating the process in which a change in the driver's mood leads to a change in driving style. Specifically, a mood value and a mental state value of the driver are defined: the mood value quantitatively expresses the influence of the driver's current situation on the driver's mood, and the mental state value expresses the driver's prediction of future traffic conditions. A driving style conversion system is then constructed from the mood value and the mental state value, a reinforcement learning method is used to train the mental state change model, and the driving style conversion system, the mental state change model and the two single-style driver models are combined into the self-adaptive driving style dynamic switching model, which is used to generate test scenes with adaptive driving style switching. In practical application, the automatic driving system under test is placed in the test scene and evaluated by testing its passing efficiency, comfort and safety in the scene.

The driver model trained under this technical scheme can change its driving style as the driving environment changes, much as a real human driver does, so the generated test scene is more realistic. Because the driver model changes driving style adaptively, generating the test scene with it gives the behavior of the background vehicles in the scene higher uncertainty, which further improves the test effect.

Claims

1. A test scene generation method based on a self-adaptive driving style dynamic switching model is characterized by comprising the following steps:

2. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 1, wherein a calculation formula of the mood value calculation model in step S2 is specifically:

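$$m = \omega_{\text{traffic}}\, t + \omega_{\text{speed}}\, v + \omega_{\text{ego\_behavior}}\, e + \omega_{\text{environment}}\, w$$

$$w = \omega_{\text{visibility}}\, w_{\text{visibility}} + \omega_{\text{road\_conditions}}\, w_{\text{road\_conditions}}$$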

wherein $m$ is the mood value obtained by the driver at the current time step; $\omega_{\text{traffic}}$ is the weight of the traffic congestion parameter; $t$ is the traffic congestion parameter, determined by the total number $n$ of vehicles in the ego lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range; $\omega_{\text{speed}}$ is the weight of the speed parameter; $v$ is the speed parameter, measured by the speed of the ego vehicle; $\omega_{\text{ego\_behavior}}$ is the weight of the ego-vehicle behavior parameter $e$. The behavior of the ego vehicle is discretized into combinations of transverse and longitudinal actions, comprising ten actions: longitudinal uniform speed, longitudinal deceleration, longitudinal acceleration, longitudinal rapid deceleration, longitudinal rapid acceleration, longitudinal uniform speed with transverse uniform speed, longitudinal acceleration with transverse uniform speed, longitudinal rapid acceleration with transverse uniform speed, longitudinal deceleration with transverse uniform speed, and longitudinal rapid deceleration with transverse uniform speed; when the action taken by the ego vehicle comprises "longitudinal uniform speed", $e = 0$;

when visibility is "general", $w_{\text{visibility}} = -0.5$;

when visibility is "poor", $w_{\text{visibility}} = -1$;

$\omega_{\text{road\_conditions}}$ is the weight of the road condition factor and $w_{\text{road\_conditions}}$ is the road condition factor; the road condition is semi-quantitatively defined as "good", "normal" or "poor"; when the road condition is "good", $w_{\text{road\_conditions}} = 0$;

when the road condition is "normal", $w_{\text{road\_conditions}} = -0.5$;

when the road condition is "poor", $w_{\text{road\_conditions}} = -1$.

3. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 2, wherein in step S2 the driving style conversion system is constructed based on either a threshold or a proportion.

4. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 3, wherein constructing the driving style conversion system based on a threshold specifically comprises: switching the driving style when the accumulated mood value of the driver exceeds the mental state value, wherein the accumulated mood value is calculated as follows:

5. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 3, wherein constructing the driving style conversion system based on a proportion specifically comprises: at a given time step, the driver has a probability $P_i$ ($0 < i \le 2$, $P_1 + P_2 = 1$) of adopting the single-style driver model $i$, where $P_i$ is calculated as follows:

6. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 1, wherein the specific process of training the mental state change model in step S3 is as follows: first, a state set used for training is defined, and the traffic congestion degree is semi-quantitatively classified as "crowded", "normal" or "vacant" according to the total number n of vehicles in the ego lane and the adjacent lanes whose longitudinal distance from the ego vehicle is within a set range;

7. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 6, wherein the reward function is designed based on the fluctuation of the driver's mental state, the driving risk and the influence of the mood value.

8. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 7, wherein the reward function is specifically:

$$R = \omega_f R_{\text{features}} + \omega_c R_{\text{crash}} + \omega_m R_m$$

9. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 6, wherein the two single-style driver models include a first-style driver model and a second-style driver model.

10. The method for generating a test scenario based on an adaptive driving style dynamic switching model according to claim 9, wherein the specific process of training the dynamically changing model by using the two single-style driver models is as follows: first, a safe traffic flow environment is built using the first-style driver model and the mental state change model is preliminarily trained; then a challenging traffic flow environment is built using the second-style driver model and the mental state change model is further trained.
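For illustration only, the weighted sums in claims 2 and 8 can be sketched as below. This is a minimal sketch under stated assumptions and forms no part of the claims: the `MoodWeights` structure and all function names are hypothetical; the parameters `t`, `v` and `e` are taken as already-scaled inputs because their mappings are not reproduced in this text; and the value 0 for "good" visibility is assumed by analogy with the road condition factor.

```python
from dataclasses import dataclass

# Road condition factor values as stated in claim 2.
ROAD_CONDITION = {"good": 0.0, "normal": -0.5, "poor": -1.0}

# "general" and "poor" follow claim 2; the "good" entry is an ASSUMPTION
# made by analogy with the road condition factor.
VISIBILITY = {"good": 0.0, "general": -0.5, "poor": -1.0}


@dataclass
class MoodWeights:
    """Hypothetical container for the weights of the mood value formula."""
    traffic: float
    speed: float
    ego_behavior: float
    environment: float
    visibility: float
    road_conditions: float


def environment_factor(visibility: str, road_condition: str, w: MoodWeights) -> float:
    """w = w_visibility_weight * w_visibility + w_road_weight * w_road_conditions."""
    return (w.visibility * VISIBILITY[visibility]
            + w.road_conditions * ROAD_CONDITION[road_condition])


def mood_value(t: float, v: float, e: float,
               visibility: str, road_condition: str, w: MoodWeights) -> float:
    """Mood value m for one time step, as the weighted sum of the four terms."""
    env = environment_factor(visibility, road_condition, w)
    return (w.traffic * t + w.speed * v
            + w.ego_behavior * e + w.environment * env)


def reward(r_features: float, r_crash: float, r_m: float,
           w_f: float, w_c: float, w_m: float) -> float:
    """Reward R as the weighted sum of the three reward components."""
    return w_f * r_features + w_c * r_crash + w_m * r_m
```

With purely illustrative weights such as `MoodWeights(1.0, 1.0, 1.0, 2.0, 0.5, 0.5)`, a call like `mood_value(-0.5, 0.25, 0.0, "general", "poor", weights)` combines the congestion, speed, behavior and environment terms into a single mood value for the current time step.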