CN113753049A

CN113753049A - Social preference-based automatic driving overtaking decision determination method and system

Info

Publication number: CN113753049A
Application number: CN202111322969.7A
Authority: CN
Inventors: 吕超; 王昊阳; 鲁洪良; 于洋; 龚建伟; 臧政
Original assignee: Beili Huidong Beijing Technology Co ltd; Beijing Institute of Technology BIT
Current assignee: Beili Huidong Beijing Technology Co ltd; Beijing Institute of Technology BIT
Priority date: 2021-11-10
Filing date: 2021-11-10
Publication date: 2021-12-07
Anticipated expiration: 2041-11-10
Also published as: CN113753049B

Abstract

The invention discloses a social preference-based automatic driving overtaking decision determination method and system, applied to the parallel driving stage of an overtaking process. The method comprises the following steps: inputting the acquired target road information of the current stage into a social preference prediction model to determine the social preference of the overtaken vehicle at the current stage; determining a state transition model of the overtaken vehicle at the current stage based on that social preference; and inputting the target road information, the social preference of the overtaken vehicle and the state transition model of the overtaken vehicle at the current stage into an overtaking decision model to determine the overtaking decision of the host vehicle at the current stage. The overtaking decision is one of lane keeping, lane change execution and overtaking abandonment. The algorithm applied by the overtaking decision model is a semi-model-based improved Q-learning algorithm. The invention can output an accurate overtaking decision and improve overtaking efficiency and safety.

Description

Social preference-based automatic driving overtaking decision determination method and system

Technical Field

The invention relates to the technical field of automatic driving, in particular to a social preference-based automatic driving overtaking decision determining method and system.

Background

With the continuous growth of car ownership and the continuous progress of automatic driving technology, intelligent driving systems are gradually entering public view, and autonomous overtaking systems are attracting increasing attention from researchers at home and abroad. At present, research on autonomous overtaking systems, particularly on the stage in which the two vehicles drive in parallel during overtaking, still has certain shortcomings.

Among typical driving scenarios, overtaking is one of the riskiest and most challenging maneuvers. Existing autonomous overtaking systems for intelligent vehicles rarely consider the longitudinal driving behavior of the overtaken vehicle and how it changes during overtaking, so the host vehicle cannot respond in real time to the longitudinal driving behavior of the overtaken vehicle while overtaking it.

Disclosure of Invention

The invention aims to provide a method and a system for determining an automatic driving overtaking decision based on social preference so as to achieve the purpose of outputting an accurate overtaking decision and further improve the overtaking efficiency and the overtaking safety.

In order to achieve the purpose, the invention provides the following scheme:

an automatic driving overtaking decision determining method based on social preference, which is applied to a parallel driving stage in an overtaking process, comprises the following steps:

acquiring target road information at the current stage; the target road information comprises host vehicle position information, host vehicle speed information, overtaken vehicle position information and overtaken vehicle speed information; the host vehicle and the overtaken vehicle both run on the target road;

inputting the current stage target road information into a social preference prediction model to determine the social preference of the overtaken vehicle at the current stage;

determining a state transition model of the overtaken vehicle at the current stage based on the social preference of the overtaken vehicle at the current stage;

inputting the current stage target road information, the social preference of the overtaken vehicle at the current stage and the state transition model of the overtaken vehicle at the current stage into an overtaking decision model to determine an overtaking decision of the host vehicle at the current stage; the overtaking decision comprises lane keeping, lane change execution and overtaking abandonment;

the algorithm applied by the overtaking decision model is a semi-model-based improved Q-learning algorithm.

Optionally, the determining process of the social preference prediction model is as follows:

constructing a sample database; the sample database comprises three classes of data, wherein the first class of overtaken vehicle driving data comprises first overtaken vehicle driving data and a corresponding first label, the second class of overtaken vehicle driving data comprises second overtaken vehicle driving data and a corresponding second label, and the third class of overtaken vehicle driving data comprises third overtaken vehicle driving data and a corresponding third label; the first label is the egoistic type, the second label is the reciprocal type, and the third label is the altruistic type;

and determining a social preference prediction model based on the sample database, a support vector machine model with a linear kernel and a maximum entropy model based on logistic regression.

Optionally, the determining a state transition model of the overtaken vehicle at the current stage based on the social preference of the overtaken vehicle at the current stage specifically includes:

performing a statistical operation on the data in the sample database to obtain the state transition probability of the overtaken vehicle at each position under each label;

summarizing the state transition probabilities of the overtaken vehicle at all positions under the same label to construct a state transition model of the overtaken vehicle under each label;

and screening out the state transition models which accord with the social preference of the overtaken vehicle at the current stage from the state transition models of the overtaken vehicle under each label.

Optionally, the constructing a sample database specifically includes:

clustering the sample traffic flow data, taking the average running speed of the overtaken vehicle after entering the parallel driving stage of the overtaking process as the feature and the social preferences of the overtaking process as the cluster categories, to obtain first overtaken vehicle driving data, second overtaken vehicle driving data and third overtaken vehicle driving data;

constructing a sample database based on the first class of overtaken vehicle driving data, the second class of overtaken vehicle driving data and the third class of overtaken vehicle driving data;

the sample traffic flow data includes host vehicle information and overtaken vehicle information; the host vehicle information comprises position information and speed information of the host vehicle in the parallel driving stage of the overtaking process; the overtaken vehicle information comprises position information and speed information of the overtaken vehicle in the parallel driving stage of the overtaking process;

the social preferences of the overtaking process include an egoistic type, a reciprocal type and an altruistic type.

Optionally, the semi-model-based improved Q-learning algorithm is designed based on a reinforcement learning method.

An autonomous driving overtaking decision making system based on social preferences, the autonomous driving overtaking decision making system being applied in a parallel driving phase in an overtaking process, the autonomous driving overtaking decision making system comprising:

the data acquisition module is used for acquiring the target road information at the current stage; the target road information comprises host vehicle position information, host vehicle speed information, overtaken vehicle position information and overtaken vehicle speed information; the host vehicle and the overtaken vehicle both run on the target road;

the social preference determination module is used for inputting the current stage target road information into a social preference prediction model so as to determine the social preference of the overtaken vehicle at the current stage;

the state transition model determining module is used for determining a state transition model of the overtaken vehicle at the current stage based on the social preference of the overtaken vehicle at the current stage;

the overtaking decision output module is used for inputting the current stage target road information, the social preference of the overtaken vehicle in the current stage and the state transition model of the overtaken vehicle in the current stage into an overtaking decision model so as to determine an overtaking decision of a host vehicle in the current stage; the overtaking decision comprises lane keeping, lane changing execution and overtaking abandoning;

Optionally, the system further comprises a social preference prediction model determining module; the social preference prediction model determining module specifically comprises:

the sample database construction unit is used for constructing a sample database; the sample database comprises three classes of data, wherein the first class of overtaken vehicle driving data comprises first overtaken vehicle driving data and a corresponding first label, the second class of overtaken vehicle driving data comprises second overtaken vehicle driving data and a corresponding second label, and the third class of overtaken vehicle driving data comprises third overtaken vehicle driving data and a corresponding third label; the first label is the egoistic type, the second label is the reciprocal type, and the third label is the altruistic type;

and the social preference prediction model determining unit is used for determining a social preference prediction model based on the sample database, the support vector machine model with the linear kernel and the maximum entropy model based on logistic regression.

Optionally, the state transition model determining module specifically includes:

the state transition probability calculation unit is used for performing a statistical operation on the data in the sample database to obtain the state transition probability of the overtaken vehicle at each position under each label;

the state transition model building unit is used for summarizing the state transition probabilities of the overtaken vehicle at all positions under the same label so as to build a state transition model of the overtaken vehicle under each label;

and the state transition model determining unit is used for screening out the state transition models which accord with the social preference of the overtaken vehicles at the current stage from the state transition models of the overtaken vehicles under each label.

Optionally, the sample database constructing unit specifically includes:

According to the specific embodiment provided by the invention, the invention discloses the following technical effects:

According to the method and system for determining an automatic driving overtaking decision based on social preference, the longitudinal driving behavior and the social preference of the overtaken vehicle can be monitored in real time while the host vehicle overtakes it. An accurate overtaking decision is then output by jointly considering the target road information, the social preference of the overtaken vehicle and the state transition probability of the overtaken vehicle at the current stage. This solves the problem of interaction between the host vehicle and the overtaken vehicle during overtaking and improves overtaking efficiency and safety.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flow chart of a method for determining an autonomous driving overtaking decision based on social preferences in accordance with the present invention;

FIG. 2 is a schematic flow chart of a design method of an autonomous overtaking decision system based on social preferences according to the present invention;

FIG. 3 is a schematic diagram of the state lattice space for autonomous overtaking according to the present invention;

FIG. 4 is a diagram of a staged overtaking process of the present invention;

FIG. 5 is a schematic structural diagram of an automatic driving overtaking decision-making system based on social preference according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to solve the problems identified in the background and to follow the trend toward intelligent automatic driving technology, the invention models social preference based on a Markov decision process and a statistical machine learning method, and develops a semi-model-based improved Q-learning algorithm based on the social preference attribute of the overtaken vehicle. This enables the intelligent vehicle to make effective decisions in the parallel driving stage of overtaking, solves the interaction problem between the host vehicle and the overtaken vehicle during overtaking, and improves overtaking efficiency and safety.

The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.

Example one

As shown in fig. 1, the present embodiment provides an automatic driving overtaking decision determining method based on social preferences, which is applied to a parallel driving stage in an overtaking process, and the automatic driving overtaking decision determining method includes:

Step 101: acquiring target road information at the current stage; the target road information comprises host vehicle position information, host vehicle speed information, overtaken vehicle position information and overtaken vehicle speed information; both the host vehicle and the overtaken vehicle travel on the target road.

Step 102: inputting the current stage target road information into a social preference prediction model to determine the social preference of the overtaken vehicle at the current stage.

Step 103: determining a state transition model of the overtaken vehicle at the current stage based on the social preference of the overtaken vehicle at the current stage.

Step 104: inputting the current stage target road information, the social preference of the current stage overtaken vehicle and the state transition model of the current stage overtaken vehicle into an overtaking decision model to determine an overtaking decision of the current stage host vehicle; the overtaking decision includes lane keeping, lane changing and overtaking abandoning.

The algorithm applied by the overtaking decision model is a semi-model-based improved Q-learning algorithm. The semi-model-based improved Q-learning algorithm is designed based on a reinforcement learning method.

The determination process of the social preference prediction model comprises the following steps:

Constructing a sample database; the sample database comprises three classes of data, wherein the first class of overtaken vehicle driving data comprises first overtaken vehicle driving data and a corresponding first label, the second class of overtaken vehicle driving data comprises second overtaken vehicle driving data and a corresponding second label, and the third class of overtaken vehicle driving data comprises third overtaken vehicle driving data and a corresponding third label; the first label is the egoistic type, the second label is the reciprocal type, and the third label is the altruistic type.

The constructing of the sample database specifically includes:

Clustering the sample traffic flow data, taking the average running speed of the overtaken vehicle after entering the parallel driving stage of the overtaking process as the feature and the social preferences of the overtaking process as the cluster categories, to obtain first overtaken vehicle driving data, second overtaken vehicle driving data and third overtaken vehicle driving data.

Constructing a sample database based on the first class of overtaken vehicle driving data, the second class of overtaken vehicle driving data and the third class of overtaken vehicle driving data.

The sample traffic flow data includes host vehicle information and overtaken vehicle information; the host vehicle information comprises position information and speed information of the host vehicle in the parallel driving stage of the overtaking process; the overtaken vehicle information comprises position information and speed information of the overtaken vehicle in the parallel driving stage of the overtaking process.

Step 103 specifically comprises:

Performing a statistical operation on the data in the sample database to obtain the state transition probability of the overtaken vehicle at each position under each label.

Summarizing the state transition probabilities of the overtaken vehicle at all positions under the same label to construct a state transition model of the overtaken vehicle under each label.

Example two

The invention aims to provide a design method for an autonomous overtaking decision system based on social preference, in which the driving experience and driving behavior of human drivers are integrated into the overtaking decision model (also called the overtaking decision module) of the autonomous overtaking system, and a semi-model-based improved Q-learning algorithm is developed based on a Markov decision process to solve the problem of vehicle interaction during overtaking.

Referring to fig. 2, an embodiment of the present invention provides a method for designing an autonomous overtaking decision system based on social preferences, including:

Step 1, defining a state lattice space. The embodiment of the invention addresses autonomous overtaking on a straight road. To simplify calculation and modeling of the overtaking process, a state lattice space as shown in fig. 3 is established on the straight road under study, so that the position of a vehicle can be represented by the coordinates of lattice vertices.

The overall state lattice space can be expressed as:

（1）；

where $S_{sl}$ is the position matrix, $x_m$ and $y_n$ are the vectors of abscissa and ordinate values, respectively, the subscripts $m$ and $n$ are position indices, and $R$ is the set of real numbers.

The position coordinates in the state lattice space can be used for determining and representing the position of the vehicle, so that subsequent modeling and calculation are facilitated.
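As a minimal sketch of such a lattice, the following Python snippet builds the abscissa and ordinate vectors and the resulting lattice vertices for a rectangular stretch of straight road. The function name `build_state_lattice`, the road dimensions and the cell sizes are illustrative assumptions and do not come from the patent text.

```python
from itertools import product


def build_state_lattice(road_length, road_width, cell_length, cell_width):
    """Return (xs, ys, vertices) for a rectangular lattice over a straight road.

    All four arguments are hypothetical parameters in metres.
    """
    m = int(road_length // cell_length) + 1  # number of abscissa values
    n = int(road_width // cell_width) + 1    # number of ordinate values
    xs = [i * cell_length for i in range(m)]
    ys = [j * cell_width for j in range(n)]
    # Every lattice vertex is one (x_m, y_n) pair.
    vertices = list(product(xs, ys))
    return xs, ys, vertices


# Assumed example: 20 m of road, two 3.5 m lanes, 5 m cells along the road.
xs, ys, vertices = build_state_lattice(20.0, 7.0, 5.0, 3.5)
print(len(xs), len(ys), len(vertices))
```

A vehicle position can then be stored as an index pair into `xs` and `ys` rather than as continuous coordinates.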

Step 2, defining the overtaking process. The autonomous overtaking decision system provided by the embodiment of the invention mainly addresses overtaking in an urban road environment and targets the parallel driving stage of overtaking, so the overtaking process needs to be defined specifically. As shown in fig. 4, the overtaking scene mainly comprises two objects: a host vehicle and an overtaken vehicle. A typical overtaking process includes three stages: (1) the overtaking initiation stage, in which the host vehicle changes lane into the overtaking lane; (2) the parallel driving stage, in which the host vehicle and the overtaken vehicle drive in parallel and the host vehicle carries out the overtaking; and (3) the overtaking termination stage, in which the host vehicle returns to its original lane.

The overtaking mode adopted by the embodiment of the invention is accelerated overtaking, and the host vehicle keeps running at a constant speed during the parallel driving stage of the overtaking process.

Steps 1 and 2 define the lattice space, the overtaking process and the parallel driving stage, and lay the foundation for the subsequent design.

Step 3, defining the social preference of the overtaking process. Intelligent vehicles deployed on highways need to understand the intentions of human drivers and adapt to their driving styles. The embodiment of the invention integrates social preference, a concept from social psychology, into the autonomous overtaking decision so as to quantify and predict the social behavior of other drivers and enable the intelligent vehicle to drive autonomously in a manner consistent with social norms.

In social psychology, social preference refers to the way people's inherent sociality is expressed at the level of preference. Drawing on this definition and its classification method, the embodiment of the invention defines three types of social preference for the overtaking scene, namely the egoistic type, the reciprocal type and the altruistic type, so as to distinguish the longitudinal driving modes of drivers with different styles.

The three social preferences defined in this step will be the basis for the cluster analysis in the next step.

Step 4, clustering the overtaking data. After the social preferences of the overtaking process are defined, the overtaking data are analyzed with a clustering method. For autonomous overtaking, the average running speed of the overtaken vehicle within five seconds after entering the parallel driving stage can be used as the feature, with the social preferences of the overtaking process as the cluster categories. Clustering the overtaking data in this way yields three groups of overtaken vehicle driving data corresponding to the three social preferences defined in step 3. The driving data subjected to cluster analysis mainly comprise the position and speed information of the host vehicle and the overtaken vehicle in the parallel driving stage, and can be drawn from a public data set such as NGSIM, a United States highway traffic flow data set.

Common clustering algorithms currently fall into five main categories. K-means, a partition-based method, is a classic clustering algorithm; compared with other clustering algorithms, its principle is simpler and its clustering results are good. Although K-means has disadvantages, such as requiring the number of clusters to be determined in advance, the overtaking data to be clustered are low-dimensional and small in volume, so the simple K-means algorithm meets the requirement.

The clustered overtaken vehicle driving data are then used to compute the average state transition probabilities for each class of overtaking data, which yields the overtaken vehicle state transition model.
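The clustering step can be illustrated with a small one-dimensional K-means written in plain Python. This is only a sketch under stated assumptions: the speed values are made up, the deterministic centroid initialisation is a convenience chosen here, and the text does not specify which cluster corresponds to which social preference, so the clusters are left as numeric ids.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Lloyd's algorithm on scalar features; returns (centroids, assignments)."""
    ordered = sorted(values)
    # Deterministic initialisation (an assumption): spread centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    assignments = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each sample joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignments


# Hypothetical mean speeds (m/s) of overtaken vehicles during the first five
# seconds of the parallel driving stage.
mean_speeds = [6.1, 6.4, 5.9, 9.8, 10.1, 10.3, 13.9, 14.2, 14.0]
centroids, assignments = kmeans_1d(mean_speeds, k=3)
print(centroids, assignments)
```

Each cluster id would then be tagged with one of the three social preference labels before the data are written to the sample database.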

Step 5, computing the state transition probability of the overtaken vehicle. Clustering yields overtaken vehicle driving data for the parallel driving stage under the three social preferences, including the initial position of the overtaken vehicle relative to the host vehicle in the parallel driving stage. Using this information as input, the state transition probabilities of the overtaken vehicle are computed by counting.

From step 1, the overtaking process is defined within the state lattice space. For the data of the overtaken vehicle under a certain social preference, the following data statistical operation is required to obtain the state transition probability of the overtaken vehicle under the social preference.

The following formula is first applied to convert the relative position data of the overtaken vehicle under a given social preference into grid state information:

（2）；

where $S$ is the state matrix, each row of which stores the grid position information of the overtaken vehicle in the overtaking data; $P_k$ is the relative position data matrix, each row of which stores the initial position of the overtaken vehicle relative to the host vehicle in the parallel driving stage of one overtaking process; $g$ is the grid width; and $\mathrm{floor}(\cdot)$ is the floor function.
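Equation (2) itself is not reproduced in this text. Based only on the symbol definitions above, one plausible reading is that each relative position is divided by the grid width and rounded down. The sketch below implements that reading; the division form, the function name `to_grid_states`, the row layout (one short position sequence per overtaking process) and the numbers are all assumptions.

```python
import math


def to_grid_states(relative_positions, grid_width):
    """Map each relative longitudinal position (m) to an integer grid index.

    Assumed reading of Equation (2): S = floor(P_k / g), applied element-wise.
    """
    return [[math.floor(p / grid_width) for p in row] for row in relative_positions]


# Hypothetical relative positions (m) of the overtaken vehicle with respect to
# the host vehicle; each row belongs to one overtaking process.
P_k = [[-7.5, -3.0, 1.2], [-4.9, 0.0, 5.1]]
S = to_grid_states(P_k, grid_width=2.5)
print(S)
```

Using `math.floor` rather than `int()` matters for negative relative positions, because `int()` truncates toward zero instead of rounding down.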

After the state matrix is obtained, the state transition number can be counted, and then the state transition probability of the overtaken vehicle under each social preference is calculated according to the following formula:

（3）

where the condition of the probability in Equation (3) is that the grid position of the overtaken vehicle at time $t$ is $k$, and the outcome is that its grid position at time $t+1$ is $k+n$. The probability that the overtaken vehicle is in grid $k$ at time $t$ and in grid $k+n$ at time $t+1$ is the state transition probability; $\mathrm{count}(\cdot)$ counts the number of occurrences of a given state transition event; and $G$ is the grid position matrix, which contains all grid positions.

By counting the state transition probability of the overtaken vehicle at each grid position under each social preference, the state transition model of the overtaken vehicle is obtained. Given the social preference of the overtaken vehicle and its grid position at a certain time step, the model predicts its position at the next time step. In the overtaking decision module, the model is used to estimate the position of the overtaken vehicle, assist model training and support the optimal decision.
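The counting procedure can be sketched as follows: for every pair of consecutive grid states in the data of one social preference, count the transition, then divide by the number of times the origin grid was observed. This is an illustrative frequency estimate consistent with the description above, not the patent's exact Equation (3); the function names, the label key and the sequences are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transition_model(state_sequences):
    """Estimate P(next grid | current grid) from consecutive grid states."""
    pair_counts = Counter()
    origin_counts = Counter()
    for seq in state_sequences:
        for current, nxt in zip(seq, seq[1:]):
            pair_counts[(current, nxt)] += 1
            origin_counts[current] += 1
    model = defaultdict(dict)
    for (current, nxt), c in pair_counts.items():
        model[current][nxt] = c / origin_counts[current]
    return dict(model)


def most_likely_next(model, current):
    """Predict the most probable grid position at the next time step."""
    return max(model[current], key=model[current].get)


# Hypothetical grid-state sequences of overtaken vehicles under one label.
sequences_by_label = {"egoistic": [[0, 1, 2], [0, 1, 1], [0, 0, 1]]}
models = {
    label: estimate_transition_model(seqs)
    for label, seqs in sequences_by_label.items()
}
print(models["egoistic"])
```

Selecting the model for the current stage then reduces to a dictionary lookup with the predicted social preference as the key.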

Step 6, establishing the social preference prediction model. The social preference prediction model is also built on the clustered overtaking data obtained in step 4; its purpose is to determine the social preference of the overtaken vehicle online and in real time after learning from the three classes of labeled overtaking data. In essence, the social preference of the overtaken vehicle is classified or predicted by a classifier or probabilistic model.

Many data-driven classification methods exist. Because the amount of overtaking data that can be extracted in the embodiment of the invention is moderate, a support vector machine is suitable: it is interpretable and sparse, that is, it can achieve good classification with a small number of samples. In addition, the maximum entropy statistical model, a classical classification model, offers high accuracy and allows constraints to be set flexibly, so that the model's generalization to unseen data and its fit to known data can be balanced through the number of constraints. Therefore, the embodiment of the invention establishes a support vector machine model with a linear kernel and a maximum entropy model based on logistic regression to predict the social preference of the overtaken vehicle in real time.

The input of the social preference prediction model is real-time road information, mainly the speed information of the overtaken vehicle. Its output is the social preference of the overtaken vehicle, which is passed to the overtaking decision module to help it make the optimal decision.
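To make the classifier concrete, the sketch below trains a small multinomial logistic regression (the usual form of a logistic-regression-based maximum entropy classifier) on a single speed feature with batch gradient descent. It covers only that half of the described model; the linear-kernel support vector machine is not shown. The training speeds, cluster ids, learning rate and epoch count are assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_softmax_1d(xs, labels, n_classes=3, lr=0.5, epochs=2000):
    """Fit one weight and one bias per class on a standardised scalar feature."""
    mean = sum(xs) / len(xs)
    std = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    zs = [(x - mean) / std for x in xs]
    w = [0.0] * n_classes
    b = [0.0] * n_classes
    for _ in range(epochs):
        gw = [0.0] * n_classes
        gb = [0.0] * n_classes
        for z, y in zip(zs, labels):
            probs = softmax([w[c] * z + b[c] for c in range(n_classes)])
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                gw[c] += err * z
                gb[c] += err
        w = [w[c] - lr * gw[c] / len(zs) for c in range(n_classes)]
        b = [b[c] - lr * gb[c] / len(zs) for c in range(n_classes)]

    def predict(x):
        z = (x - mean) / std
        probs = softmax([w[c] * z + b[c] for c in range(n_classes)])
        return max(range(n_classes), key=lambda c: probs[c])

    return predict


# Hypothetical labelled samples: mean speed (m/s) and cluster id from step 4.
speeds = [6.1, 6.4, 5.9, 9.8, 10.1, 10.3, 13.9, 14.2, 14.0]
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2, 2]
predict = train_softmax_1d(speeds, cluster_ids)
print(predict(6.0), predict(10.0), predict(14.0))
```

In an online setting, `predict` would be called at each time step with the latest observed speed of the overtaken vehicle, and the resulting class id would be mapped to its social preference label.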

Step 7, designing the overtaking decision module. Based on a reinforcement learning method, the module uses a semi-model-based improved Q-learning algorithm to decide, during the parallel driving stage of overtaking, whether to continue overtaking, whether to change lanes and when to change lanes, so as to optimize overtaking efficiency and safety.

Specifically, the overtaking decision model takes as input the overtaken vehicle state transition model obtained in step 5, the real-time social preference prediction for the overtaken vehicle obtained in step 6, and the real-time data of the host vehicle and the overtaken vehicle in the environment. At each time step it analyzes the speed and position information of the host vehicle and the overtaken vehicle, the social preference of the overtaken vehicle and the prediction of its future position, and outputs one of three decisions: lane keeping, lane change execution or overtaking abandonment.

The model starts when the host vehicle has completed the lane change and entered the parallel driving stage, and ends when the host vehicle either decides to execute the lane change and enters the overtaking termination stage, or decides to abandon overtaking. The evaluation indexes comprise lane change efficiency, the position difference from the optimal lane change point, and whether overtaking is abandoned in time when the social preference of the overtaken vehicle is the egoistic type. The reward function of the reinforcement learning algorithm is designed on the same basis.

And 8, designing a semi-model-based improved Q-learning algorithm. The step introduces an algorithm related to the overtaking decision module. The state and motion space discretization process is relatively simple due to the definition of the cut-in problem within the state lattice space and the discretization of the decision motion. Therefore, in order to better adapt to uncertainty of the social preference of the overtaken vehicle, for the overtaking decision module based on the Markov decision process and the social preference, the embodiment of the invention adopts an improved Q-Learning algorithm with discretized state and action space for training. The autonomous overtaking problem during the parallel driving phase in the embodiment of the invention is based on the above-mentioned overtaken vehicle state transition model, so that the improvement on the iterative formula of the model-free Q-Learning algorithm is needed. The improved Q-Learning algorithm is referred to as being semi-model based, in that the state of the overridden vehicle is model based and the state transition of the host vehicle is simulation platform based. The semi-model-based improved Q-Learning algorithm is well adapted to uncertainties that are surpassed by the social preferences of the vehicle.

When the overtaken vehicle state transition model is taken into account, the Q function must also consider the state of the overtaken vehicle, so the theoretical formula of the Q-learning algorithm is rewritten as:

（4）；

wherein the superscripts HV and OV denote the host vehicle and the overtaken vehicle, respectively, and the state and action terms with subscript t denote the vehicle state and behavior at time step t, respectively.
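For reference, one plausible form of formula (4) is sketched here, assuming the standard Q-learning update with the Q function extended by the overtaken vehicle state. The symbols s (state), a (action), r (reward), α (learning rate) and γ (discount factor) are assumptions for illustration and are not taken from the original formula.

```latex
Q\left(s_t^{HV}, s_t^{OV}, a_t\right) \leftarrow
Q\left(s_t^{HV}, s_t^{OV}, a_t\right)
+ \alpha \left[ r_t
+ \gamma \max_{a} Q\left(s_{t+1}^{HV}, s_{t+1}^{OV}, a\right)
- Q\left(s_t^{HV}, s_t^{OV}, a_t\right) \right]
```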

The expected Q value needs to be rewritten as:

（5）；

wherein the probability term is the state transition probability of the overtaken vehicle from its state at time t to its state at time t+1.

Formula (5) means that, when the state transition probability of the overtaken vehicle is known, the expected Q value at each time step is adjusted according to all possible states of the overtaken vehicle at the next time step.
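The idea of weighting the next-step Q value by the transition probabilities of the overtaken vehicle can be illustrated with a minimal sketch. It assumes a tabular Q function keyed by (host state, overtaken vehicle state, action), a dictionary of transition probabilities, and the usual learning rate alpha and discount factor gamma; all names and parameter values are hypothetical and not taken from the original formulas.

```python
from collections import defaultdict

N_ACTIONS = 3  # lane keeping, lane change execution, overtaking abandonment


def expected_next_q(q, s_hv_next, ov_transition_probs):
    """Expected best Q value over all possible next overtaken-vehicle states.

    q: mapping (s_hv, s_ov, action) -> Q value
    s_hv_next: next host-vehicle state (observed, e.g. from a simulator)
    ov_transition_probs: mapping next overtaken-vehicle state -> probability
    """
    total = 0.0
    for s_ov_next, prob in ov_transition_probs.items():
        best = max(q[(s_hv_next, s_ov_next, a)] for a in range(N_ACTIONS))
        total += prob * best
    return total


def semi_model_q_update(q, s_hv, s_ov, action, reward, s_hv_next,
                        ov_transition_probs, alpha=0.1, gamma=0.9):
    """One assumed semi-model-based update: the host-vehicle transition is
    sampled, while the overtaken-vehicle transition is averaged over its model."""
    target = reward + gamma * expected_next_q(q, s_hv_next, ov_transition_probs)
    key = (s_hv, s_ov, action)
    q[key] += alpha * (target - q[key])
    return q[key]


# Hypothetical usage: Q table with default value 0.0.
q_table = defaultdict(float)
semi_model_q_update(q_table, s_hv=0, s_ov=0, action=0, reward=-0.1,
                    s_hv_next=1, ov_transition_probs={0: 0.7, 1: 0.3})
```

Averaging over the modeled transitions of the overtaken vehicle, rather than sampling a single next state, is what distinguishes this sketch from a purely model-free update.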

Step 9, defining the reinforcement learning model elements. The basic elements of the reinforcement learning model comprise a state space, an action space and a reward function. For the overtaking scene in the invention, the following definitions are made:

1) State space

For the lane change point decision problem, the relative positions and speeds of the host vehicle and the overtaken vehicle determine the optimal lane change position. Furthermore, for overtaken vehicles with different social preferences, the host vehicle should adjust the lane change position according to the aggressiveness of the overtaken vehicle. Thus, the factors considered by the state space should include:

（6）;

where G is the matrix of grid positions at which the vehicles are located, V is the matrix of vehicle speeds, and the third element is the social preference matrix of the overtaken vehicle, whose values -1, 0 and 1 represent the altruistic, reciprocal and egoistic social preference types, respectively.
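A discretized state of this kind can be sketched as follows. The field names, the grid length of 5 m and the speed step of 2 m/s are assumptions chosen only for illustration; only the three groups of factors (grid positions, speeds, social preference in {-1, 0, 1}) come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes the state hashable, so it can key a Q table
class OvertakingState:
    g_hv: int        # grid index of the host vehicle
    g_ov: int        # grid index of the overtaken vehicle
    v_hv: int        # discretized speed bin of the host vehicle
    v_ov: int        # discretized speed bin of the overtaken vehicle
    preference: int  # social preference of the overtaken vehicle: -1, 0 or 1


def discretize(value, step):
    """Map a continuous value to the index of the bin that contains it."""
    return int(value // step)


def make_state(x_hv, x_ov, speed_hv, speed_ov, preference,
               grid_length=5.0, speed_step=2.0):
    """Build a discrete state from positions (m) and speeds (m/s)."""
    if preference not in (-1, 0, 1):
        raise ValueError("social preference must be -1, 0 or 1")
    return OvertakingState(
        g_hv=discretize(x_hv, grid_length),
        g_ov=discretize(x_ov, grid_length),
        v_hv=discretize(speed_hv, speed_step),
        v_ov=discretize(speed_ov, speed_step),
        preference=preference,
    )


# Hypothetical usage: host at 12 m and 10 m/s, overtaken vehicle at 4.9 m and 7.9 m/s.
state = make_state(12.0, 4.9, 10.0, 7.9, preference=1)
```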

2) Action space

The decision on the autonomous overtaking lane change point of the intelligent vehicle is a behavior decision. When the intelligent vehicle is not in a lane change state, it should keep its lane and continue driving; when the intelligent vehicle is at the optimal lane change point, it should make the lane change decision promptly; when the overtaken vehicle is too egoistic, abandoning the overtaking should be considered. Thus, three optional actions are defined in the overtaking decision module, namely lane keeping, lane change execution and overtaking abandonment:

a = { "lane keeping", "lane change execution", "overtaking abandonment" } (7).
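The three actions can be represented as a small enumeration, as in the sketch below. The epsilon-greedy selection is an assumption (a common exploration strategy for Q-learning that the description does not specify), and the names Action, select_action and is_terminal are hypothetical. Treating lane change execution and overtaking abandonment as terminal follows the end conditions of the decision model stated earlier.

```python
import random
from enum import IntEnum


class Action(IntEnum):
    LANE_KEEPING = 0
    LANE_CHANGE = 1
    ABANDON_OVERTAKING = 2


def is_terminal(action):
    """The episode ends once the host vehicle changes lane or abandons overtaking."""
    return action in (Action.LANE_CHANGE, Action.ABANDON_OVERTAKING)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over the Q values of the three actions (assumed strategy)."""
    if rng.random() < epsilon:
        return Action(rng.randrange(len(Action)))
    best_index = max(range(len(Action)), key=lambda i: q_values[i])
    return Action(best_index)


# Hypothetical usage: purely greedy choice picks the action with the largest Q value.
chosen = select_action([0.1, 0.7, 0.2], epsilon=0.0)
```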

3) Reward function

For the decision problem of the autonomous overtaking lane change point, different rewards or penalties are given according to the decision action. Specifically, for the lane keeping behavior, a small penalty should be given to prevent the vehicle from keeping its lane indefinitely without changing lanes; for the lane change behavior, a penalty is given according to the difference between the decided lane change position and the optimal lane change position; for the overtaking abandonment behavior, a large penalty should be given if this action is chosen at an inappropriate time. Thus, the reward function is defined as follows:

（8）；

wherein K_l is a constant parameter; g_HV and g_OV respectively represent the grid positions of the host vehicle and the overtaken vehicle at the current moment; and the remaining term is the expected position error, which is related to the three-second rule and the social preference of the overtaken vehicle and is calculated by:

（9）；

wherein g denotes the unit length of a grid; T_r is the travel time of 3 seconds in the three-second rule; and the remaining symbol is a constant parameter.
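A reward of the kind described above can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce formulas (8) and (9): the expected gap is assumed to be the three-second travel distance of the overtaken vehicle expressed in grid cells plus a preference-dependent offset, and the penalty values, the offset weight k_pref and the flag abandon_is_appropriate are hypothetical.

```python
T_R = 3.0  # seconds, travel time of the three-second rule


def expected_gap_cells(v_follow, preference, grid_length, k_pref=1.0):
    """Assumed expected gap in grid cells: three-second distance of the
    following vehicle plus an offset that grows with the preference value."""
    return v_follow * T_R / grid_length + k_pref * preference


def reward(action, g_hv, g_ov, v_ov, preference, grid_length=5.0, k_l=1.0,
           keep_penalty=-0.1, abandon_penalty=-10.0,
           abandon_is_appropriate=False):
    """Assumed reward shape: small penalty for lane keeping, error-proportional
    penalty for a lane change, large penalty for an ill-timed abandonment."""
    if action == "lane keeping":
        return keep_penalty
    if action == "lane change":
        gap = g_hv - g_ov
        error = abs(gap - expected_gap_cells(v_ov, preference, grid_length))
        return -k_l * error
    if action == "abandon":
        return 0.0 if abandon_is_appropriate else abandon_penalty
    raise ValueError(f"unknown action: {action}")


# Hypothetical usage: lane change two cells short of the assumed expected gap.
r = reward("lane change", g_hv=8, g_ov=4, v_ov=10.0, preference=0)
```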

EXAMPLE III

Referring to FIG. 5, the social preference-based automatic driving overtaking decision determining system provided by this embodiment is applied to the parallel driving stage in an overtaking process, and includes:

A data obtaining module 501, configured to obtain target road information of the current stage; the target road information comprises host vehicle position information, host vehicle speed information, overtaken vehicle position information and overtaken vehicle speed information; both the host vehicle and the overtaken vehicle travel on the target road.

A social preference determination module 502, configured to input the target road information of the current stage into a social preference prediction model to determine the social preference of the overtaken vehicle in the current stage.

A state transition model determination module 503, configured to determine a state transition model of the overtaken vehicle in the current stage based on the social preference of the overtaken vehicle in the current stage.

The overtaking decision output module 504 is configured to input the current stage target road information, the social preference of the overtaken vehicle in the current stage, and a state transition model of the overtaken vehicle in the current stage into an overtaking decision model, so as to determine an overtaking decision of the host vehicle in the current stage; the overtaking decision comprises lane keeping, lane changing execution and overtaking abandoning; the algorithm applied by the overtaking decision model is a semi-model-based improved Q-learning algorithm. The semi-model-based improved Q-learning algorithm is designed based on a reinforcement learning method.
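The data flow between modules 501 to 504 can be sketched as a simple pipeline. The class RoadInfo, the function decide_overtaking and the three callables are hypothetical stand-ins for the modules; their internals (prediction model, transition model, trained decision model) are not specified here.

```python
from dataclasses import dataclass


@dataclass
class RoadInfo:
    """Target road information of the current stage (output of module 501)."""
    hv_position: float
    hv_speed: float
    ov_position: float
    ov_speed: float


def decide_overtaking(road_info, predict_preference, build_transition_model,
                      decision_model):
    """Chain the modules: preference (502) -> transition model (503) -> decision (504)."""
    preference = predict_preference(road_info)
    transition_model = build_transition_model(preference)
    return decision_model(road_info, preference, transition_model)


# Hypothetical usage with stub callables standing in for the trained models.
info = RoadInfo(hv_position=30.0, hv_speed=15.0, ov_position=10.0, ov_speed=12.0)
decision = decide_overtaking(
    info,
    predict_preference=lambda road: 0,
    build_transition_model=lambda pref: {"preference": pref},
    decision_model=lambda road, pref, model: "lane keeping",
)
```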

Further, the system of this embodiment further includes a social preference prediction model determining module.

The social preference prediction model determining module specifically comprises:

a sample database construction unit, configured to construct a sample database; the sample data comprises three types of overtaken vehicle driving data, wherein the first type comprises first overtaken vehicle driving data and a first label corresponding to the first overtaken vehicle driving data, the second type comprises second overtaken vehicle driving data and a second label corresponding to the second overtaken vehicle driving data, and the third type comprises third overtaken vehicle driving data and a third label corresponding to the third overtaken vehicle driving data; the first label is the egoistic type, the second label is the reciprocal type, and the third label is the altruistic type.

The sample database construction unit specifically includes:

The state transition model determining module 503 specifically includes:

Compared with the prior art, the overtaking decision system for the parallel driving stage, which is based on the Markov decision process and social preference, fully considers the longitudinal driving behavior pattern of the overtaken vehicle and its changes during the overtaking process, so that the host vehicle is controlled to adjust its overtaking behavior in real time in response to the longitudinal driving behavior of the overtaken vehicle. Specifically, when the overtaken vehicle is of the reciprocal type or the altruistic type, the host vehicle controlled by the overtaking decision system can make an accurate lane change point decision; when the social preference of the overtaken vehicle is of the egoistic type, the host vehicle can also smoothly decide to abandon overtaking the vehicle ahead. The hierarchical reinforcement learning autonomous overtaking system is able to handle the interaction between the host vehicle and the overtaken vehicle during autonomous overtaking, can achieve safe and efficient autonomous overtaking by applying this framework, and has a certain application feasibility in actual overtaking scenes.

The embodiments in this description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims

1. An automatic driving overtaking decision determining method based on social preference is characterized in that the automatic driving overtaking decision determining method is applied to a parallel driving stage in an overtaking process, and the automatic driving overtaking decision determining method comprises the following steps:

2. The method of claim 1, wherein the social preference prediction model is determined by:

3. The method for determining a social preference-based automatic driving overtaking decision as claimed in claim 2, wherein the determining a state transition model of the current stage overtaken vehicle based on the social preference of the current stage overtaken vehicle comprises:

4. The method according to claim 2, wherein the constructing a sample database comprises:

5. The social preference-based automatic driving overtaking decision determining method as claimed in claim 1, wherein the semi-model-based improved Q-learning algorithm is designed based on a reinforcement learning method.

6. An autonomous driving overtaking decision making system based on social preferences, the autonomous driving overtaking decision making system being applied in a parallel driving phase in an overtaking process, the autonomous driving overtaking decision making system comprising:

7. A social preference based autonomous driving overtaking decision making system as claimed in claim 6 further comprising a social preference prediction model determination module; the social preference prediction model determining module specifically comprises:

8. The system of claim 7, wherein the state transition model determination module specifically comprises:

9. The system according to claim 7, wherein the sample database construction unit specifically comprises:

10. The social preference-based automatic driving overtaking decision determining system as claimed in claim 6, wherein the semi-model-based improved Q-learning algorithm is designed based on a reinforcement learning method.