CN116572993A - Intelligent vehicle risk sensitive sequential behavior decision method, device and equipment - Google Patents

Intelligent vehicle risk sensitive sequential behavior decision method, device and equipment

Info

Publication number
CN116572993A
CN116572993A CN202310788233.1A CN202310788233A CN116572993A CN 116572993 A CN116572993 A CN 116572993A CN 202310788233 A CN202310788233 A CN 202310788233A CN 116572993 A CN116572993 A CN 116572993A
Authority
CN
China
Prior art keywords
vehicle
decision
dynamic
action
driving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310788233.1A
Other languages
Chinese (zh)
Inventor
黄荷叶
王建强
崔明阳
刘艺璁
韩泽宇
许庆
李克强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202310788233.1A
Publication of CN116572993A
Legal status: Pending

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0027 Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • B60W60/00272 Planning or execution of driving tasks using trajectory prediction for other traffic participants relying on extrapolation of current movement
    • B60W60/00276 Planning or execution of driving tasks using trajectory prediction for other traffic participants for two or more other traffic participants
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0002 Automatic control, details of type of controller or control system architecture
    • B60W2050/0004 In digital systems, e.g. discrete-time systems involving sampling
    • B60W2050/0005 Processor details or data handling, e.g. memory registers or chip architecture
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application relates to an intelligent vehicle risk-sensitive sequential behavior decision method, device and equipment. The method comprises: acquiring running state information of traffic participants in a preset traffic environment; constructing a dynamic objective function; determining a single-step behavior decision strategy of the vehicle based on the dynamic objective function and the longitudinal and transverse dynamic safety margins in the decision process; identifying the driving intention of surrounding vehicles based on the single-step behavior decision strategy; calculating the cost values of the vehicle adopting different behavior decision strategies so as to match the optimal strategy of the vehicle; repeating these steps until the risk-sensitive sequential decision strategy is consistent with the action of the vehicle at the current moment; and outputting an optimal trajectory according to the risk-sensitive sequential decision strategy. This addresses the problems that existing intelligent vehicle decision methods place certain requirements on the size and quality of training data samples and are difficult to apply to real complex dynamic scenes, and it achieves dynamic multi-objective coordination and multi-stage stable decision-making for intelligent vehicles in complex scenes.

Description

Intelligent vehicle risk sensitive sequential behavior decision method, device and equipment
Technical Field
The application relates to the technical field of intelligent automobile application, in particular to a method, a device and equipment for deciding risk-sensitive sequential behaviors of an intelligent vehicle.
Background
In a complex scenario, the intelligent vehicle decision system needs to output a stable, continuous and reasonable decision strategy to meet actual driving requirements. However, while a vehicle is actually running, its performance requirements are coupled in many ways and the multi-stage, multi-scene decision process is not coherent, which makes research on multi-objective coordination and guaranteed multi-stage decision performance for intelligent vehicles difficult. The key challenge is how to complete the sequential behavior decision of the vehicle in a dynamic environment and plan a feasible trajectory that meets performance targets such as safety, efficiency and reliability. In a human-vehicle-road system, the driver takes on multiple roles such as decision maker and experimenter, and the driver's perception-decision-control characteristics directly influence vehicle control stability and safety. Drivers share a common cognitive mechanism and control law in responding to potential risks in the traffic environment, but different types of risk source affect the driver differently, and the driver's risk response affects scene understanding and the decision strategy that is output, and therefore driving safety.
In the related art, research on advanced driver assistance systems and automated driving aims to raise the intelligence level of vehicles so that the most suitable driving behavior is selected to adapt to complex dynamic traffic environments. A human driver, however, can effectively handle the complex, time-varying traffic environment produced by the coupling of people, vehicles, roads and other elements, and follows a unified active decision principle of coordinated "perception-evaluation-decision" in any scene rather than one tailored to a single dangerous scene. It is therefore necessary to draw inspiration from the active decision mode that drivers adopt to cope with complex and changeable traffic environments, so that an automated driving system can adapt to open and dynamic traffic scenes and achieve safe, reliable, fast and smooth autonomous driving in real traffic.
In addition, a complex dynamic interaction environment contains various uncertainty factors, such as the randomness of traffic participant behavior (including randomness of intention and of trajectory interaction) and the dynamics of the traffic environment (static/dynamic blind areas, sensor errors and the like). Research on the intelligent vehicle decision system is therefore critical: the decision system must output a good and stable decision strategy while the vehicle is driving so as to meet actual driving requirements. A great deal of research on intelligent vehicle decision-making has been carried out in China and abroad.
In the related art, the centralized decision framework is mainly based on an integrated design idea: starting from the environmental information received by the sensors, it learns from driving behavior data or explores by trial and error through an end-to-end method (such as deep learning or reinforcement learning), and it directly outputs low-level vehicle control commands from the sensor input.
In the related art, the hierarchical decision framework decomposes the whole decision process into a series of sub-functional modules, each of which can be designed independently; usually, trajectory planning is performed after the behavior decision. The overall decision process under the hierarchical decision framework can be divided into single-stage (single-step) behavior decisions and multi-stage sequential (multi-step) behavior decisions. Single-step behavior decision methods include traditional rule-based/optimization-based decision methods, decision methods based on probabilistic statistical reasoning, decision methods based on behavior interaction, and the like. Multi-stage or multi-step sequential decision (or trajectory planning) methods mainly include search, interpolation, sampling, artificial potential fields, and the like.
In the related art, data-driven centralized decision methods do not rely on a limited set of expert rules; the policy network generates test cases directly from real driving data, which gives better decision integrity and real-time performance. However, centralized decision frameworks (for example, decision methods based on reinforcement learning or supervised learning) in essence learn by imitating naturalistic driving data, and they place certain requirements on the size and quality of the training data samples at the input or on the number and quality of test scenes. In real complex dynamic scenes, it is more important to consider how traffic rules and road topography influence and constrain the continuous sequential decisions of vehicles, which greatly simplifies the analysis of complex interaction problems and supports reasonable multi-stage decisions. By comparison, the traditional hierarchical decision framework has simple logic and can output good, stable behavior decisions, but its knowledge is difficult to acquire and migrating its decisions to different scenes remains challenging. End-to-end methods can handle a variety of driving scenarios but have poor interpretability.
In summary, a risk-sensitive sequential behavior decision method for intelligent vehicles is currently lacking, and this problem needs to be solved.
Disclosure of Invention
The application provides an intelligent vehicle risk-sensitive sequential behavior decision method, device and equipment, which address the problems in the related art that intelligent vehicle decision methods place certain requirements on the size and quality of training data samples and are difficult to apply to real complex dynamic scenes. The application achieves dynamic multi-objective coordination and multi-stage stable decision-making for intelligent vehicles in complex scenes, and promotes the practical deployment, development and upgrading of intelligent vehicles.
An embodiment of a first aspect of the present application provides an intelligent vehicle risk-sensitive sequential behavior decision method, including the following steps:
acquiring running state information of a traffic participant in a preset traffic environment;
based on the driving state information, constructing a dynamic objective function according to the risk sensitivity of the driver;
determining a vehicle single-step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin and the transverse dynamic safety margin;
identifying the driving intention of surrounding vehicles based on the single-step behavior decision strategy, calculating the cost value of the current vehicle adopting different behavior decision strategies according to the driving intention, and matching the optimal strategy of the current vehicle according to the cost value; and
determining the action of the current vehicle at the current moment according to the optimal strategy of the current vehicle; judging, based on rolling time domain optimization, whether the action quantity of the single-step behavior decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment; when the two are inconsistent, re-acquiring the running state information of the traffic participants in the preset traffic environment and repeating the above steps until the action quantity of the risk-sensitive sequential decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment; and outputting an optimal trajectory according to the risk-sensitive sequential decision strategy.
Optionally, in some embodiments, the constructing a dynamic objective function according to the driver risk sensitivity based on the driving state information includes:
constructing the minimum action quantity in a real physical system, and determining a unified driving target in the driving decision process of the driver based on the driving state information;
based on the unified driving objective, a dynamic objective function based on the minimum amount of action is output,
wherein the dynamic objective function is:
$$S_{Risk} = \int_{t_0}^{t_f} L_i \, dt = \int_{t_0}^{t_f} (T_i - U_i) \, dt$$
wherein $i$ is the intelligent vehicle, $S_{Risk}$ is the dynamic objective function of intelligent vehicle $i$ in the decision planning process, $t_0$ is the initial time, $t_f$ is the end time, $L_i$ is the Lagrangian of the two-vehicle system, $T_i$ is the kinetic energy of the vehicle, and $U_i$ is the potential energy of the system.
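As an illustration of how such an objective could be evaluated numerically, the sketch below integrates a Lagrangian over sampled time points with the trapezoidal rule. It assumes the standard least-action form, in which the Lagrangian is kinetic energy minus potential energy, as the symbol definitions suggest. The function name `action_integral` and the sample values (a 1500 kg vehicle at 20 m/s with a constant potential of 50 kJ) are hypothetical and are not taken from the application.

```python
from typing import Sequence


def action_integral(
    times: Sequence[float],
    kinetic: Sequence[float],
    potential: Sequence[float],
) -> float:
    """Trapezoidal estimate of S = integral of (T - U) dt over sampled points.

    Assumes the Lagrangian is L = T - U (standard least-action form).
    """
    if not (len(times) == len(kinetic) == len(potential)) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    total = 0.0
    for k in range(len(times) - 1):
        lag_now = kinetic[k] - potential[k]
        lag_next = kinetic[k + 1] - potential[k + 1]
        total += 0.5 * (lag_now + lag_next) * (times[k + 1] - times[k])
    return total


# Hypothetical sample: constant speed, constant potential, two seconds of driving.
mass, speed = 1500.0, 20.0
t_samples = [0.0, 1.0, 2.0]
T_samples = [0.5 * mass * speed ** 2] * 3   # 300000 J at each sample
U_samples = [50000.0] * 3                   # assumed constant potential energy
S = action_integral(t_samples, T_samples, U_samples)
```

With a constant Lagrangian of 250000 J over 2 s, the integral evaluates to 500000 J·s. Comparing this value across candidate maneuvers is one plausible way to rank them by action quantity.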
Optionally, in some embodiments, before determining the vehicle single step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin, and the lateral dynamic safety margin, further comprising:
dividing the preset traffic environment into a plurality of two-vehicle systems formed by combining two vehicles according to the interaction between the vehicles based on the interaction relationship between the vehicles and the traffic participants;
and determining the Lagrange equation of the two-vehicle system, and determining the longitudinal dynamic safety margin and the transverse dynamic safety margin in the decision process according to the Lagrange equation of the two-vehicle system.
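The decomposition into two-vehicle systems can be pictured as pairing the ego vehicle with each participant it interacts with. The sketch below uses a plain distance threshold as a stand-in for the "interaction relationship". That criterion, the 50 m default, and the names `two_vehicle_systems` and `interaction_range` are assumptions for illustration only; the application does not specify how interaction is detected.

```python
from math import hypot
from typing import Dict, List, Tuple

Position = Tuple[float, float]  # (longitudinal x, lateral y) in metres


def two_vehicle_systems(
    ego: Position,
    others: Dict[str, Position],
    interaction_range: float = 50.0,  # assumed interaction criterion
) -> List[Tuple[str, str]]:
    """Pair the ego vehicle with every participant within interaction_range."""
    pairs: List[Tuple[str, str]] = []
    for participant_id, (x, y) in others.items():
        if hypot(x - ego[0], y - ego[1]) <= interaction_range:
            pairs.append(("ego", participant_id))
    return pairs


# Hypothetical scene: one vehicle ahead, one far ahead, one behind in the next lane.
pairs = two_vehicle_systems(
    ego=(0.0, 0.0),
    others={"lead": (10.0, 0.0), "far": (100.0, 3.5), "rear_right": (-30.0, -3.5)},
)
```

Each resulting pair would then get its own Lagrangian and its own longitudinal and transverse safety margins.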
Optionally, in some embodiments, the Lagrangian equation for the two-vehicle system is:
the lateral dynamic safety margin is:
$$r_y = r_{ij,y} + \epsilon;$$
the longitudinal dynamic safety margin is:
$$r_x = r_{ij,x} + \Psi(v_{ix}, (v_{ix} - v_{jx}))\,\Delta t;$$
wherein $i$ represents the ego vehicle, $T_i$ is the kinetic energy of the vehicle, $U_i$ is the potential energy of the system, $m_i$ is the mass of vehicle $i$, $v_i$ is the speed of vehicle $i$, $v_j$ is the speed of vehicle $j$, $t_0$ is the initial time, $t_f$ is the end time, $R_i$ is the longitudinal constraint resistance that traffic rules impose on the driver, $G_i$ is the virtual driving force with which the driver's driving target propels the intelligent vehicle, $G_{i,x}$ is the driver's longitudinal target driving force, $G_{i,y}$ is the driver's lateral target driving force, $v_{ix}$ is the longitudinal speed of vehicle $i$, $v_{iy}$ is the lateral speed of vehicle $i$, the two $F_{li}$ terms represent the lateral restraining forces generated by the two lane lines of the lane in which the target vehicle travels, and $F_{ji}$ represents the interaction risk force exerted by vehicle $j$ on vehicle $i$; $r_{ij,x}$ is the following distance between vehicles $i$ and $j$ in the longitudinal direction, $r_{ij,y}$ is the following distance between vehicles $i$ and $j$ in the lateral direction, $\epsilon$ is the lateral safety margin, $\Psi$ is a positive correlation function, and $\Delta t$ represents the step-back time.
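The two margin formulas are simple enough to evaluate directly. In the sketch below, the lateral margin adds a fixed offset to the lateral following distance. The longitudinal margin adds a speed-dependent term multiplied by the time step. The text only says that Ψ is a positive correlation function, so the linear form used here (`psi`, with weights `a` and `b`, counting only the closing speed) is an assumption for illustration, as are all numeric values.

```python
def psi(v_ix: float, dv_x: float, a: float = 1.0, b: float = 1.0) -> float:
    """Hypothetical positive-correlation function, in m/s.

    Grows with the ego longitudinal speed and with the closing speed;
    the linear form and the weights a, b are assumptions.
    """
    return a * v_ix + b * max(dv_x, 0.0)


def longitudinal_margin(
    r_ij_x: float, v_ix: float, v_jx: float, dt: float, psi_fn=psi
) -> float:
    """r_x = r_ij,x + Psi(v_ix, v_ix - v_jx) * dt."""
    return r_ij_x + psi_fn(v_ix, v_ix - v_jx) * dt


def lateral_margin(r_ij_y: float, eps: float) -> float:
    """r_y = r_ij,y + eps."""
    return r_ij_y + eps


# Hypothetical values: ego at 20 m/s closing on a lead vehicle at 15 m/s.
r_x = longitudinal_margin(r_ij_x=10.0, v_ix=20.0, v_jx=15.0, dt=1.0)
r_y = lateral_margin(r_ij_y=0.5, eps=0.3)
```

With these inputs the longitudinal margin is 10 + (20 + 5) × 1 = 35 m. If the ego vehicle is slower than the lead vehicle, the assumed Ψ ignores the negative speed difference, so the margin depends only on the ego speed.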
Optionally, in some embodiments, the risk sensitive sequential decision strategy comprises:
adjusting the vehicle driving strategy by rolling optimization within a time window to obtain the expression of the optimal dynamic objective function in the rolling time domain;
and solving the functional extremum based on a preset variational method, and obtaining the risk sensitive sequential decision strategy according to the solving result.
Optionally, in some embodiments, the optimal dynamic objective function is expressed in the rolling time domain as:
wherein $S$ is the actual action quantity, $k$ denotes the time instant, $J(\cdot)$ is the cost function, $u(k)$ is the input vector, $x(k)$ is the state vector, and $\Phi$ is the target set; $S^*$ is the ideal action quantity, $H_w$ denotes the rolling horizon, $\tau$ is the time increment, $u(k+\tau|k)$ is the control input value from time $k$ to the future time $(k+\tau)$, $x(k+\tau|k)$ is the predicted value from time $k$ to the future time $(k+\tau)$, and $Z(\cdot)$ is the terminal penalty term.
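The rolling time domain idea can be sketched generically: at each instant, evaluate a stage cost summed over a finite horizon plus a terminal penalty, apply only the first control input, then shift the window. The sketch below is a minimal illustration and not the application's formulation. It replaces the variational solution with an exhaustive search over a small discrete action set, and the one-dimensional speed model, the costs and all names (`rolling_horizon_step`, `stage_cost`, `terminal_cost`, the 20 m/s target) are assumptions.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def rolling_horizon_step(
    x: float,
    actions: Sequence[float],
    horizon: int,
    step: Callable[[float, float], float],
    stage_cost: Callable[[float, float], float],
    terminal_cost: Callable[[float], float],
) -> Tuple[float, float]:
    """Return (first action of the best sequence, its total cost).

    Exhaustive search over all action sequences of length `horizon`;
    this stands in for the variational solution described in the text.
    """
    best_first, best_cost = actions[0], float("inf")
    for seq in product(actions, repeat=horizon):
        xk, cost = x, 0.0
        for u in seq:
            cost += stage_cost(xk, u)   # plays the role of J over the window
            xk = step(xk, u)            # predicted state at the next instant
        cost += terminal_cost(xk)       # plays the role of the terminal penalty Z
        if cost < best_cost:
            best_first, best_cost = seq[0], cost
    return best_first, best_cost


# Hypothetical 1-D example: state is speed (m/s), input is acceleration (m/s^2).
TARGET = 20.0
step_fn = lambda v, a: v + a * 1.0
stage = lambda v, a: (v - TARGET) ** 2 + 0.1 * a ** 2
terminal = lambda v: (v - TARGET) ** 2

first_u, first_cost = rolling_horizon_step(18.0, [-1.0, 0.0, 1.0], 3, step_fn, stage, terminal)

# Roll the window forward: apply only the first input each time.
v = 18.0
for _ in range(5):
    u, _cost = rolling_horizon_step(v, [-1.0, 0.0, 1.0], 3, step_fn, stage, terminal)
    v = step_fn(v, u)
```

Starting from 18 m/s, the search selects an acceleration of +1 m/s² twice and then holds speed, so the state settles at the 20 m/s target. Exhaustive search scales exponentially with the horizon, which is why a continuous formulation solved by a variational method would be preferable in practice.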
An embodiment of a second aspect of the present application provides an intelligent vehicle risk-sensitive sequential behavior decision apparatus, including:
the traffic participant driving information acquisition module is used for acquiring driving state information of the traffic participant in a preset traffic environment;
the dynamic objective function construction module is used for constructing a dynamic objective function according to the risk sensitivity of the driver based on the driving state information;
the intelligent vehicle single-step behavior decision module is used for determining the vehicle single-step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin and the transverse dynamic safety margin;
the behavior decision cost calculation and strategy selection module is used for identifying the driving intention of surrounding vehicles based on the single-step behavior decision strategy, calculating the cost value of the current vehicle adopting different behavior decision strategies according to the driving intention, and matching the optimal strategy of the current vehicle according to the cost value; and
the construction and optimization module is used for determining the action of the current vehicle at the current moment according to the optimal strategy of the current vehicle; judging, based on rolling time domain optimization, whether the action quantity of the single-step behavior decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment; when the two are inconsistent, re-acquiring the running state information of the traffic participants in the preset traffic environment until the action quantity of the risk-sensitive sequential decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment; and outputting an optimal trajectory according to the risk-sensitive sequential decision strategy.
Optionally, in some embodiments, the dynamic objective function construction module is specifically configured to:
constructing the minimum action quantity in a real physical system, and determining a unified driving target in the driving decision process of the driver based on the driving state information;
based on the unified driving objective, a dynamic objective function based on the minimum amount of action is output,
wherein the dynamic objective function is:
$$S_{Risk} = \int_{t_0}^{t_f} L_i \, dt = \int_{t_0}^{t_f} (T_i - U_i) \, dt$$
wherein $i$ is the intelligent vehicle, $S_{Risk}$ is the dynamic objective function of intelligent vehicle $i$ in the decision planning process, $t_0$ is the initial time, $t_f$ is the end time, $L_i$ is the Lagrangian of the two-vehicle system, $T_i$ is the kinetic energy of the vehicle, and $U_i$ is the potential energy of the system.
Optionally, in some embodiments, before determining a vehicle single step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin, and the lateral dynamic safety margin, the intelligent vehicle single step behavior decision module is further configured to:
dividing the preset traffic environment into a plurality of two-vehicle systems formed by combining two vehicles according to the interaction between the vehicles based on the interaction relationship between the vehicles and the traffic participants;
and determining the Lagrange equation of the two-vehicle system, and determining the longitudinal dynamic safety margin and the transverse dynamic safety margin in the decision process according to the Lagrange equation of the two-vehicle system.
Optionally, in some embodiments, the Lagrangian equation for the two-vehicle system is:
the lateral dynamic safety margin is:
$$r_y = r_{ij,y} + \epsilon;$$
the longitudinal dynamic safety margin is:
$$r_x = r_{ij,x} + \Psi(v_{ix}, (v_{ix} - v_{jx}))\,\Delta t;$$
wherein $i$ represents the ego vehicle, $T_i$ is the kinetic energy of the vehicle, $U_i$ is the potential energy of the system, $m_i$ is the mass of vehicle $i$, $v_i$ is the speed of vehicle $i$, $v_j$ is the speed of vehicle $j$, $t_0$ is the initial time, $t_f$ is the end time, $R_i$ is the longitudinal constraint resistance that traffic rules impose on the driver, $G_i$ is the virtual driving force with which the driver's driving target propels the intelligent vehicle, $G_{i,x}$ is the driver's longitudinal target driving force, $G_{i,y}$ is the driver's lateral target driving force, $v_{ix}$ is the longitudinal speed of vehicle $i$, $v_{iy}$ is the lateral speed of vehicle $i$, the two $F_{li}$ terms represent the lateral restraining forces generated by the two lane lines of the lane in which the target vehicle travels, and $F_{ji}$ represents the interaction risk force exerted by vehicle $j$ on vehicle $i$; $r_{ij,x}$ is the following distance between vehicles $i$ and $j$ in the longitudinal direction, $r_{ij,y}$ is the following distance between vehicles $i$ and $j$ in the lateral direction, $\epsilon$ is the lateral safety margin, $\Psi$ is a positive correlation function, and $\Delta t$ represents the step-back time.
Optionally, in some embodiments, the risk sensitive sequential decision strategy comprises:
adjusting the vehicle driving strategy by rolling optimization within a time window to obtain the expression of the optimal dynamic objective function in the rolling time domain;
and solving the functional extremum based on a preset variational method, and obtaining the risk sensitive sequential decision strategy according to the solving result.
Optionally, in some embodiments, the optimal dynamic objective function is expressed in the rolling time domain as:
wherein $S$ is the actual action quantity, $k$ denotes the time instant, $J(\cdot)$ is the cost function, $u(k)$ is the input vector, $x(k)$ is the state vector, and $\Phi$ is the target set; $S^*$ is the ideal action quantity, $H_w$ denotes the rolling horizon, $\tau$ is the time increment, $u(k+\tau|k)$ is the control input value from time $k$ to the future time $(k+\tau)$, $x(k+\tau|k)$ is the predicted value from time $k$ to the future time $(k+\tau)$, and $Z(\cdot)$ is the terminal penalty term.
An embodiment of a third aspect of the present application provides an electronic device, including: the system comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the intelligent vehicle risk sensitive sequential behavior decision method according to the embodiment.
An embodiment of a fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the intelligent vehicle risk sensitive sequential behavior decision method described in the above embodiments.
According to the application, running state information of traffic participants in a preset traffic environment is acquired and a dynamic objective function is constructed; a single-step behavior decision strategy of the vehicle is determined based on the dynamic objective function and the longitudinal and transverse dynamic safety margins in the decision process; the driving intention of surrounding vehicles is identified based on the single-step behavior decision strategy; the cost values of the vehicle adopting different behavior decision strategies are calculated to match the optimal strategy of the vehicle; these steps are repeated until the risk-sensitive sequential decision strategy is consistent with the action of the vehicle at the current moment; and the optimal trajectory is output according to the risk-sensitive sequential decision strategy. This addresses the problems in the related art that intelligent vehicle decision methods place certain requirements on the size and quality of training data samples and are difficult to apply to real complex dynamic scenes. It achieves dynamic multi-objective coordination and multi-stage stable decision-making for intelligent vehicles in complex scenes, and promotes the practical deployment, development and upgrading of intelligent vehicles.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a risk-sensitive sequential behavior decision method for an intelligent vehicle according to an embodiment of the present application;
FIG. 2 is a schematic diagram of single step behavior decision strategy calculation and selection for an intelligent vehicle according to one embodiment of the present application;
FIG. 3 is a schematic diagram of intelligent vehicle interactive multi-step behavioral decision scroll execution, provided in accordance with one embodiment of the present application;
FIG. 4 is a schematic diagram of an intelligent vehicle roll horizon optimization strategy provided in accordance with one embodiment of the present application;
FIG. 5 is a schematic diagram of a risk-sensitive sequential behavior decision method for an intelligent vehicle according to one embodiment of the present application;
FIG. 6 is a schematic diagram of a flow of a risk-sensitive sequential behavior decision method for an intelligent vehicle according to an embodiment of the present application;
FIG. 7 is a schematic diagram of single step behavior decision, driver and classical decision method concept comparison in any scenario provided according to one embodiment of the present application;
FIG. 8 is a schematic diagram of an intelligent vehicle dynamic multi-objective behavior decision process provided in accordance with one embodiment of the present application;
FIG. 9 is a schematic diagram of an intelligent vehicle risk-sensitive sequential behavior decision apparatus provided according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The following describes an intelligent vehicle risk-sensitive sequential behavior decision method, device and equipment according to embodiments of the application with reference to the accompanying drawings. To address the problems noted in the Background, namely that intelligent vehicle decision methods place certain requirements on the size and quality of training data samples and are difficult to apply to real complex dynamic scenes, the application provides an intelligent vehicle risk-sensitive sequential behavior decision method. The method acquires the running state information of traffic participants in a preset traffic environment; constructs a dynamic objective function according to the risk sensitivity of the driver based on the driving state information; determines a single-step behavior decision strategy of the vehicle based on the dynamic objective function, the longitudinal dynamic safety margin and the transverse dynamic safety margin; identifies the driving intention of surrounding vehicles based on the single-step behavior decision strategy, calculates the cost values of the current vehicle adopting different behavior decision strategies according to the driving intention, and matches the optimal strategy of the current vehicle according to the cost values; and determines the action of the current vehicle at the current moment according to the optimal strategy of the current vehicle. Based on rolling time domain optimization, the method then judges whether the action quantity of the single-step behavior decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment. When the two are inconsistent, it re-acquires the running state information of the traffic participants in the preset traffic environment until the action quantity of the risk-sensitive sequential decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment, and it outputs an optimal trajectory according to the risk-sensitive sequential decision strategy. This addresses the problems in the related art that intelligent vehicle decision methods place certain requirements on the size and quality of training data samples and are difficult to apply to real complex dynamic scenes. It achieves dynamic multi-objective coordination and multi-stage stable decision-making for intelligent vehicles in complex scenes, and promotes the practical deployment, development and upgrading of intelligent vehicles.
Specifically, fig. 1 is a schematic flow chart of a risk-sensitive sequential behavior decision method for an intelligent vehicle according to an embodiment of the present application.
As shown in fig. 1, the intelligent vehicle risk-sensitive sequential behavior decision method includes the following steps:
In step S101, driving state information of traffic participants in a preset traffic environment is acquired.
The preset traffic environment refers to a complex traffic environment, or indeed any scene. Because a real traffic environment contains various vehicles, pedestrians and so on, the traffic participants in the embodiments of the application include the own vehicle, other vehicles such as following vehicles, and surrounding obstacles. Because the intelligent vehicle risk-sensitive sequential behavior decision method considers the complexity of driving scenes, the unpredictability of traffic participants' behavior, and the driver's dynamic demands for driving safety, the embodiments of the application need to acquire the driving information of the traffic participants in the complex traffic environment so that risk-sensitive sequential behavior can be decided from that information.
In step S102, a dynamic objective function is constructed according to the driver risk sensitivity based on the driving state information.
After the driving state information of the traffic participant is acquired, the application further considers the sensitivity of the driver risk to construct a safe, efficient and unified dynamic objective function.
Specifically, the application constructs a mathematical expression of the least action quantity in a real physical system, uses it to determine the unified driving target of safety and efficiency that the driver pursues during driving decisions, and then determines and outputs an objective function, based on the least action quantity, that accounts for the driver's risk sensitivity.
Optionally, in some embodiments, constructing the dynamic objective function according to the driver risk sensitivity based on the driving state information includes: constructing the least action quantity in a real physical system, and determining a unified driving target in the driver's driving decision process based on the driving state information; and, based on the unified driving target, outputting a dynamic objective function based on the least action quantity, wherein the dynamic objective function is:
$S_{Risk} = \int_{t_0}^{t_f} L_i \, dt = \int_{t_0}^{t_f} (T_i - U_i) \, dt$
wherein i denotes the intelligent vehicle, $S_{Risk}$ is the dynamic objective function of intelligent vehicle i in the decision planning process, $t_0$ is the initial time, $t_f$ is the end time, $L_i$ is the Lagrangian of the two-vehicle system, $T_i$ is the kinetic energy of the vehicle, and $U_i$ is the potential energy of the system.
Specifically, in a system of free particles, the application considers, based on the principle of least action, that the true motion of the particles from space point 1 to space point 2 is such that the integral $S = \int L \, dt$ takes a minimum value. To describe the interaction between particles, the application denotes the function generated by their interaction as $-U$. By the additivity of the Lagrangian:
$L = \sum_a \frac{1}{2} m_a v_a^2 - U(r_1, r_2, \ldots)$
wherein $m_a$ and $v_a$ are the mass and velocity of the a-th particle, and $r_a$ is the radius vector of the a-th particle.
It will be appreciated that in a real physical system it is theoretically possible to find a path from the start point to the end point that has the least action quantity and satisfies Newton's laws. The application can therefore establish an objective function pursuing safety and efficiency for the intelligent vehicle and convert it into the problem of solving for the minimum action quantity $S_{min}$. The action quantity S is defined as the integral of the Lagrangian L between the two endpoints:
$S = \int_{t_0}^{t_f} L \, dt = \int_{t_0}^{t_f} (T - U) \, dt$
wherein T represents the vehicle kinetic energy, U represents the system potential energy, $t_0$ is the initial time, and $t_f$ is the end time.
When determining the unified driving target of safety and efficiency that the driver pursues in the driving decision process, the application mines and identifies the principles and rules of dynamic interaction between the driver and the intelligent vehicle in complex dynamic traffic scenes. To make the driving of the intelligent vehicle human-like, the main targets pursued by the driver while driving and interacting with the environment are extracted from the driving behavior decision process, with the focus on the two basic targets: safety and efficiency. Safety is the basic guarantee of the driving process and depends on vehicle attributes (such as vehicle speed and mass) and on the characteristics of interaction with other vehicles (including relative distance and relative speed). Efficiency mainly concerns the pursuit of driving speed.
Further, for the driving targets of safety and efficiency pursued by the driver, the application establishes a Lagrangian that reflects the safety of the vehicle, while efficiency can be expressed as the time spent over the whole drive. According to the principle of least action, the cost of generating a feasible path can be defined as $S_{Risk}$, and the dynamic objective function of intelligent vehicle i in the decision planning process can be characterized as:
$S_{Risk} = \int_{t_0}^{t_f} L_i \, dt = \int_{t_0}^{t_f} (T_i - U_i) \, dt$
wherein i denotes the intelligent vehicle, $S_{Risk}$ is the dynamic objective function of intelligent vehicle i in the decision planning process, $t_0$ is the initial time, $t_f$ is the end time, $L_i$ is the Lagrangian of the two-vehicle system, $T_i$ is the kinetic energy of the vehicle, and $U_i$ is the potential energy of the system.
Therefore, the application draws on the principle of least action in physics, simulates the driver's cognitive decision mechanism, takes cooperative optimization of driving safety and traffic efficiency as the goal, and provides an intelligent vehicle decision method with dynamic multi-performance-target cooperation. This overcomes drawbacks such as the difficulty of weighting multiple single targets and of determining their dimensions, and achieves decision optimization for intelligent vehicles in complex road traffic environments.
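As a rough numerical illustration of the objective function described above, the action quantity can be approximated by sampling the Lagrangian (kinetic energy minus potential energy) along a candidate trajectory and integrating with the trapezoidal rule. The sketch below is not taken from the application: the function name `action`, the constant potential value and the sample trajectory are assumptions chosen only to show the arithmetic.

```python
def action(times, kinetic, potential):
    """Approximate S = integral of (T - U) dt with the trapezoidal rule.

    times, kinetic and potential are equal-length sequences sampled along
    one candidate trajectory (hypothetical inputs, not defined in the source).
    """
    if not (len(times) == len(kinetic) == len(potential)):
        raise ValueError("all sequences must have the same length")
    total = 0.0
    for k in range(len(times) - 1):
        dt = times[k + 1] - times[k]
        l_now = kinetic[k] - potential[k]
        l_next = kinetic[k + 1] - potential[k + 1]
        total += 0.5 * (l_now + l_next) * dt
    return total


# Example: a 1500 kg vehicle at a constant 10 m/s for 2 s, with an assumed
# constant potential energy of 25 kJ standing in for the risk term.
m, v = 1500.0, 10.0
ts = [0.0, 1.0, 2.0]
T = [0.5 * m * v * v] * 3   # 75 kJ at every sample
U = [25000.0] * 3           # assumed value, for illustration only
s_risk = action(ts, T, U)   # (75000 - 25000) J * 2 s
```

Comparing `s_risk` across several candidate trajectories is then one way to rank them by cost, under the assumption that a potential-energy model for the risk term is available.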
In step S103, a vehicle single step behavior decision strategy is determined based on the dynamic objective function, the longitudinal dynamic safety margin and the lateral dynamic safety margin.
It should be noted that the dynamic objective function is obtained in step S102, and the longitudinal and lateral dynamic safety margins can be obtained by calculation, so the application can determine the single-step behavior decision strategy of the vehicle from the dynamic objective function, the longitudinal dynamic safety margin and the lateral dynamic safety margin.
Specifically, fig. 2 is a schematic diagram of the calculation and selection of a single-step behavior decision strategy of an intelligent vehicle according to an embodiment of the present application. As shown in fig. 2, when the single-step behavior decision strategy is selected based on the unified objective function, traffic participants interact with the intelligent vehicle while it is driving. Given the uncertainty of the interacting vehicles' behavior and of the dynamic and static environment, dynamic and static traffic participants (such as surrounding vehicles) differ in their driving targets and dynamic demands, which leads the own vehicle to perform driving behaviors such as overtaking and lane keeping. Therefore, during two-vehicle interaction the own vehicle first judges the driving intention of the surrounding vehicle and accounts for potential risk and driver expectation through the decision objective function. The application then calculates, based on the principle of least action, the cost value of each behavior decision strategy the own vehicle could adopt, and in theory this calculation yields better and more stable behavior decisions.
Optionally, in some embodiments, before determining the vehicle single step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin, and the lateral dynamic safety margin, further comprising: dividing a preset traffic environment into a plurality of two-vehicle systems formed by combining two vehicles according to the interaction between the vehicles based on the interaction relationship between the vehicles and traffic participants; and determining a Lagrange equation of the two-vehicle system, and determining a longitudinal dynamic safety margin and a transverse dynamic safety margin in a decision process according to the Lagrange equation of the two-vehicle system.
Optionally, in some embodiments, the lagrangian equation for a two-vehicle system is:
the lateral dynamic safety margin is:
$r_y = r_{ij,y} + \epsilon$; (6)
the longitudinal dynamic safety margin is:
$r_x = r_{ij,x} + \Psi(v_{ix}, (v_{ix} - v_{jx})) \Delta t$; (7)
wherein i represents the own vehicle, $T_i$ is the kinetic energy of the vehicle, $U_i$ is the potential energy of the system, $m_i$ is the mass of vehicle i, $v_i$ is the speed of vehicle i, $v_j$ is the speed of vehicle j, $t_0$ is the initial time, $t_f$ is the end time, $R_i$ is the longitudinal constraint resistance that traffic rules impose on the driver, $G_i$ is the virtual driving force generated by the driver's driving target to propel the intelligent vehicle, $G_{i,x}$ is the driver's longitudinal target driving force, $G_{i,y}$ is the driver's lateral target driving force, $v_{ix}$ is the longitudinal speed of vehicle i, $v_{iy}$ is the lateral speed of vehicle i, the two $F_{li}$ terms respectively represent the lateral constraint forces generated by the two lane lines of the lane in which the target vehicle travels, and $F_{ji}$ represents the interaction risk force that vehicle j exerts on vehicle i; $r_{ij,x}$ is the following distance between vehicles i and j in the longitudinal direction, $r_{ij,y}$ is the following distance between vehicles i and j in the lateral direction, $\epsilon$ is the lateral safety margin, $\Psi$ is a positive correlation function, and $\Delta t$ represents the backoff time step.
Specifically, based on the interaction relationships between vehicles and traffic participants, the application divides the complex traffic environment into a number of two-vehicle systems, each formed by a pair of interacting vehicles; simple car-following systems, cut-in systems and the like can likewise be treated as two-vehicle interactions.
The Lagrangian equation of the two-vehicle system is as follows:
wherein i represents the own vehicle; $m_i$ is the mass of vehicle i; and $v_i$ and $v_j$ are the speeds of vehicles i and j, respectively. $G_i$ is the virtual driving force generated by the driver's driving target to propel the intelligent vehicle; this target driving force moves the vehicle from the start position to the end position. When there is no lane change, the driver's target driving force has only the longitudinal component $G_{i,x}$; when a lane change is selected, the lateral movement gives the target driving force a lateral component $G_{i,y}$ as well. During vehicle movement, the longitudinal constraint that the road speed limit places on the vehicle and the lateral constraint that road markings place on the vehicle must both be considered. $R_i$ is the longitudinal constraint resistance that traffic rules impose on the driver; the two $F_{li}$ terms respectively represent the lateral constraint forces generated by the two lane lines of the lane in which the target vehicle travels; and $F_{ji}$ represents the interaction risk force that vehicle j exerts on vehicle i.
Further, during vehicle movement, the speed of the vehicle and its speed relative to surrounding vehicles directly affect the potential collision risk. Generally, the higher the speed of the vehicle, the greater the collision risk, and the greater the relative speed between the own vehicle and surrounding vehicles, the greater the traffic disturbance and the potential impact on the leading and trailing vehicles. The longitudinal safety margin is therefore positively correlated with the own vehicle's speed and the relative speed. In the lateral direction, the driver is limited by an elliptical field of view and by the distribution of visual attention while driving, so the lateral safety margin is related to the driver's risk sensitivity; because lateral speed variation is limited, this margin can be defined as a variable that depends only on the dynamic relative distance. The application thus defines the lateral and longitudinal dynamic safety margins respectively as:
$r_y = r_{ij,y} + \epsilon$; (6)
$r_x = r_{ij,x} + \Psi(v_{ix}, (v_{ix} - v_{jx})) \Delta t$; (7)
wherein $\Psi$ is a positive correlation function; $\Delta t$ represents the backoff time step; $\epsilon$ is the lateral safety margin; and $r_{ij,x}$ and $r_{ij,y}$ are the following distances between vehicles i and j in the longitudinal and lateral directions, respectively.
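Equations (6) and (7) can be evaluated directly once a concrete positive correlation function is chosen. The source does not specify the form of Ψ, so the linear function `psi_linear` below, its coefficients and all numeric inputs are assumptions made purely for illustration.

```python
def lateral_margin(r_ij_y, epsilon):
    """Equation (6): r_y = r_ij,y + epsilon."""
    return r_ij_y + epsilon


def psi_linear(v_ix, dv_x, a=0.5, b=1.0):
    """Hypothetical positive correlation function (not from the source).

    Grows with the own-vehicle longitudinal speed and with the closing
    speed; the coefficients a and b are assumed values.
    """
    return a * v_ix + b * max(dv_x, 0.0)


def longitudinal_margin(r_ij_x, v_ix, v_jx, dt, psi=psi_linear):
    """Equation (7): r_x = r_ij,x + psi(v_ix, v_ix - v_jx) * dt."""
    return r_ij_x + psi(v_ix, v_ix - v_jx) * dt


# Own vehicle at 20 m/s, lead vehicle at 15 m/s, 30 m longitudinal gap,
# 3.5 m lateral gap, backoff time step 0.1 s, lateral margin 0.5 m.
r_y = lateral_margin(3.5, 0.5)
r_x = longitudinal_margin(30.0, 20.0, 15.0, 0.1)
```

With these assumed values the longitudinal margin grows when either the own speed or the closing speed increases, which is the qualitative behavior the text requires of Ψ.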
In step S104, based on the single-step behavior decision strategy, the driving intention of surrounding vehicles is identified, and the cost value of the current vehicle adopting different behavior decision strategies is calculated according to the driving intention, and the optimal strategy of the current vehicle is matched according to the cost value.
It should be noted that the application first determines a single-step behavior decision model expression that accounts for efficiency (embodied by kinetic energy) and safety (guaranteed by potential energy), then calculates the cost value of each behavior decision strategy the own vehicle could adopt, and selects a suitable strategy according to the calculated cost values.
Specifically, for any driving scenario, assume there is a vehicle i in the traffic system whose Lagrangian $L_i$ can be described as:
$L_i = T_i - U_i$
In the single-step behavior decision process that considers driver risk sensitivity, the single-step behavior decision cost can be characterized as $S_{Risk}$:
$S_{Risk} = \int_{t_0}^{t_f} L_i \, dt$
wherein $t_0$ is the initial time and $t_f$ is the end time.
Further, while the intelligent vehicle is driving, traffic participants interact with it. Dynamic and static traffic participants (such as surrounding vehicles) differ in their driving targets and dynamic demands, which leads the own vehicle to perform operations such as overtaking and lane keeping. Therefore, during two-vehicle interaction the own vehicle of this embodiment first judges the driving intention of the surrounding vehicle, calculates the cost value of each behavior decision strategy it could adopt, and selects a suitable strategy according to the calculated cost values.
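The selection step described above reduces to picking the lowest-cost strategy once the surrounding vehicle's intention has been recognized. The following minimal sketch assumes the cost values have already been computed; the intention labels, strategy names and numbers in `COST_TABLE` are invented for illustration and do not come from the application.

```python
# Hypothetical cost table: cost of each own-vehicle strategy given the
# recognized intention of the surrounding vehicle. All values are made up.
COST_TABLE = {
    "cut_in": {"lane_keep": 150.0, "decelerate": 90.0, "lane_change": 110.0},
    "keep_lane": {"lane_keep": 80.0, "decelerate": 120.0, "lane_change": 100.0},
}


def select_strategy(intention, cost_table=COST_TABLE):
    """Return (strategy, cost) with the lowest cost for the given intention."""
    costs = cost_table[intention]
    best = min(costs, key=costs.get)
    return best, costs[best]


choice_cut_in = select_strategy("cut_in")
choice_keep = select_strategy("keep_lane")
```

In a fuller implementation each table entry would be the action quantity of the corresponding candidate maneuver rather than a fixed number.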
In step S105, the action of the current vehicle at the current moment is determined according to the optimal strategy of the current vehicle. Based on rolling time domain optimization, it is judged whether the action quantity of the single-step behavior decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment. When the two are inconsistent, the driving state information of the traffic participants in the preset traffic environment is acquired again, until the action quantity of the risk-sensitive sequential decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment, and the optimal trajectory is then output according to the risk-sensitive sequential decision strategy.
Those skilled in the art will appreciate that the solution output by a single-step decision reasoning process may not be the optimal solution, or a solution may not exist at all, which leads to decision conflicts in complex scenarios. Therefore, building on the single-step behavior decision of the intelligent vehicle, the application treats the multi-step sequential behavior decision process as a continuous multi-stage process: using the structural characteristics of the traffic environment, a finite set of feasible candidate trajectory curves is generated, and the optimal path is then selected from them by optimization.
In some embodiments, the method outputs and executes the action of the own vehicle at time t based on the single-step decision, constructs a risk-sensitive sequential decision method based on rolling time domain optimization at time t+1, and repeats these steps, so that multi-step sequential behavior decision is realized through rolling execution until a trajectory that reaches the destination safely and efficiently is found.
Optionally, in some embodiments, the risk-sensitive sequential decision strategy comprises: adjusting the vehicle driving strategy by rolling optimization within a time window to obtain the expression of the optimal dynamic objective function in the rolling time domain; and solving the functional extremum based on a preset variational method, and obtaining the risk-sensitive sequential decision strategy from the solution.
Specifically, fig. 3 is a schematic diagram of the rolling execution of an intelligent vehicle interactive multi-step behavior decision. As shown in fig. 3, multi-step reasoning enlarges the solution space, yields a better decision after iterative optimization and rolling execution, ensures stability and continuity in the time domain, and achieves a locally or even globally optimal decision. Under the optimal cost value, the own vehicle completes a lane change and then follows the vehicle ahead at a safe distance; it then recalculates the cost of the behavior decision strategies for the next stage and selects lane changing or car following for that stage.
Furthermore, building on the single-step behavior decision of the intelligent vehicle, the multi-step sequential behavior decision process is treated as a continuous multi-stage process: using the structural characteristics of the traffic environment, a finite set of feasible candidate trajectory curves is generated, and the optimal path is then selected from them by optimization. Fig. 4 is a schematic diagram of an intelligent vehicle rolling time domain optimization strategy according to an embodiment of the present application. As shown in fig. 4, the rolling time domain optimization method can achieve optimal control of a specific constrained system and, based on the state and constraints output by the system at each sampling time, can effectively solve the optimal control problem in real time. Dividing the decision process into multiple stages allows the multi-variable, constrained optimization problem to be solved online.
In addition, it should be noted that the whole optimization process of the application outputs the current best state based on the existing state. Rolling execution finally yields an optimal trajectory composed of a series of trajectory segments that are each locally optimal in time.
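The rolling execution described above (plan over a short window, execute only the first action, then re-plan from the new state) can be sketched as follows. This is a generic rolling-horizon loop, not the application's solver: the point-mass model, the candidate accelerations and the speed-tracking stage cost are assumptions that stand in for the action-quantity cost.

```python
from itertools import product

DT = 0.05                    # sampling time in seconds (50 ms)
ACTIONS = (-2.0, 0.0, 2.0)   # assumed candidate accelerations in m/s^2


def step(state, accel, dt=DT):
    """Advance a (position, speed) state by one sample."""
    pos, speed = state
    return (pos + speed * dt + 0.5 * accel * dt * dt, speed + accel * dt)


def sequence_cost(state, accels, v_target):
    """Stand-in stage cost: squared speed error plus a small effort term."""
    cost = 0.0
    for a in accels:
        state = step(state, a)
        cost += (state[1] - v_target) ** 2 + 0.001 * a * a
    return cost


def rolling_decision(state, v_target, horizon=3, n_steps=40):
    """Re-plan at every sample and execute only the first planned action."""
    history = [state]
    for _ in range(n_steps):
        best = min(product(ACTIONS, repeat=horizon),
                   key=lambda seq: sequence_cost(state, seq, v_target))
        state = step(state, best[0])   # apply the first action only
        history.append(state)
    return history


trajectory = rolling_decision((0.0, 10.0), v_target=12.0)
```

The executed trajectory is the concatenation of the first segments of successive plans, which mirrors the statement that the final trajectory consists of segments that are each locally optimal in time.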
Optionally, in some embodiments, the optimal dynamic objective function is expressed in the rolling time domain as:
wherein S is the actual action quantity, k represents the time instant, $J(\cdot)$ is the cost function, $u(k)$ is the input vector, $X(k)$ is the state vector, and $\Phi$ is the target set; $S^*$ is the ideal action quantity, $H_w$ represents the rolling horizon, $\tau$ is the time increment, $u(k+\tau|k)$ is the control input for the future time $(k+\tau)$ computed at time k, $X(k+\tau|k)$ is the state predicted at time k for the future time $(k+\tau)$, and $Z(\cdot)$ is the terminal penalty term.
In some embodiments, the sampling time is set to 50 ms and the rolling horizon $H_w$ is set to 15. With a larger rolling horizon (e.g., 20) the output trajectory performs better but the computation takes longer; with a smaller rolling horizon (e.g., 10) the dynamic prediction range is insufficient and the result may be poor.
Specifically, the main idea of rolling time domain optimization as used in the application is to solve iteratively under the corresponding objective function and constraint conditions, finally obtaining the optimal input over a finite time period starting at the current moment.
The intelligent vehicle can reach the target area with minimum cost under the conditions of avoiding obstacles and meeting corresponding constraint conditions. The dynamics of its linear time-invariant system are described as:
X(k+1)=AX(k)+Bu(k); (12)
where $X(k)$ is the state vector and $u(k)$ is the input vector. The state vector and the input vector must satisfy the constraint given as equation (13). That is, equation (12) is the state space model of the vehicle, and equation (13) is the constraint that the state and input need to satisfy.
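To make the state space model of equation (12) concrete, the sketch below rolls a linear time-invariant model forward over one rolling horizon using the 50 ms sampling time and horizon of 15 mentioned in the text. The double-integrator matrices A and B, the scalar input and the speed bound used as a stand-in constraint are assumptions; the application does not give their values.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def lti_rollout(A, B, x0, inputs):
    """Iterate X(k+1) = A X(k) + B u(k) for a scalar input sequence."""
    states = [list(x0)]
    for u in inputs:
        ax = mat_vec(A, states[-1])
        states.append([ax_i + b_i * u for ax_i, b_i in zip(ax, B)])
    return states


def within_speed_limit(states, v_max):
    """Assumed stand-in constraint: speed (second state) never exceeds v_max."""
    return all(x[1] <= v_max for x in states)


DT = 0.05    # sampling time: 50 ms
H_W = 15     # rolling horizon in samples, i.e. a 0.75 s look-ahead
A = [[1.0, DT], [0.0, 1.0]]     # assumed double-integrator model
B = [0.5 * DT * DT, DT]

# Predict position and speed for a constant 1 m/s^2 input from 10 m/s.
predicted = lti_rollout(A, B, [0.0, 10.0], [1.0] * H_W)
feasible = within_speed_limit(predicted, v_max=33.3)
# Final state is approximately position 7.78125 m, speed 10.75 m/s.
```

Doubling `H_W` doubles the number of predicted states per plan, which is one reason a longer horizon costs more computation.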
In calculating the actual cost, the application uses a variational method to solve for the minimum value of $S_{Risk}$ along the path, which gives the optimal trajectory of the multi-stage process; that is, it solves the functional extremum.
Specifically, the action quantity $S_i$ is the functional used to quantify the driver's pursuit of multiple driving targets, and the extremum of $S_i$ is the extremum that the driver pursues across those targets. Thus, by calculating the minimum value of $S_{Risk}$ along the path, the application obtains the optimal trajectory of the multi-stage process. The variational method is commonly used to solve functional extrema numerically, namely:
$S_{min} = \min S_{Risk} = \min \int_{t_0}^{t_f} L_i \, dt$
wherein $S_{min}$ is the theoretical minimum action quantity, that is, the extremum of the actual action quantity $S_{Risk}$ of each trajectory.
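The optimization problem is solved by applying the Euler-Lagrange equation, which is expressed as:
$\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) - \frac{\partial L}{\partial q} = 0$
wherein q is a generalized coordinate and $\dot{q}$ is its time derivative.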
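As a worked illustration of how the Euler-Lagrange equation turns the least action condition into an equation of motion, consider a single coordinate x with a Lagrangian of the kinetic-minus-potential form used throughout this description. The one-dimensional setting and the generic potential U(x) are simplifying assumptions and are not the application's two-vehicle Lagrangian.

```latex
% Assumed one-dimensional Lagrangian: kinetic energy minus potential energy
L(x, \dot{x}) = \tfrac{1}{2} m \dot{x}^2 - U(x)

% Euler-Lagrange equation for the coordinate x
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) - \frac{\partial L}{\partial x} = 0

% Partial derivatives
\frac{\partial L}{\partial \dot{x}} = m \dot{x}, \qquad
\frac{\partial L}{\partial x} = -\frac{dU}{dx}

% Substituting gives Newton's second law with force -dU/dx
m \ddot{x} = -\frac{dU}{dx}
```

This recovers the earlier remark that the path of least action also satisfies Newton's laws: the gradient of the potential plays the role of the force acting on the vehicle.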
Based on the above embodiments, and with reference to fig. 5, which is a schematic diagram of an intelligent vehicle risk-sensitive sequential behavior decision method according to an embodiment of the present application, the entire sequential behavior decision process of the application is implemented as follows:
1. Generate candidate trajectories and calculate the actual action quantity of each trajectory;
2. Screen for risk-sensitive, safe, collision-free feasible trajectories;
3. Judge whether the actual action quantity of a trajectory equals or approaches the theoretical minimum;
4. Output an optimal trajectory through rolling calculation and repeated iteration.
Finally, the optimal decision for the current state is output over a continuous time sequence.
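The screening and comparison steps in the list above can be sketched as a small selection routine. The sketch assumes that the candidate trajectories and their action quantities have already been generated; the trajectory labels, the collision check and the 5% relative tolerance are hypothetical choices, not values from the application.

```python
def choose_trajectory(candidates, s_min, is_collision_free, rel_tol=0.05):
    """Screen candidates and pick the one whose action is closest to s_min.

    candidates: list of (trajectory_id, action_quantity) pairs.
    Returns (trajectory_id, action_quantity, converged), or None when no
    candidate is collision free.
    """
    feasible = [(tid, s) for tid, s in candidates if is_collision_free(tid)]
    if not feasible:
        return None
    tid, s = min(feasible, key=lambda item: abs(item[1] - s_min))
    converged = abs(s - s_min) <= rel_tol * abs(s_min)
    return tid, s, converged


# Hypothetical candidates with precomputed action quantities.
candidates = [("keep", 130.0), ("change_left", 102.0), ("change_right", 99.0)]
blocked = {"change_right"}   # assumed to collide with a surrounding vehicle

result = choose_trajectory(candidates, s_min=100.0,
                           is_collision_free=lambda tid: tid not in blocked)
```

When `converged` is false, the rolling procedure would generate new candidates from the updated state and repeat the comparison.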
In addition, in the sequential behavior decision problem defined on the basis of rolling time domain optimization, the objective function seeks a balanced optimum between safety and efficiency. The main constraints on the intelligent vehicle while driving are soft traffic-rule constraints (including lane line constraints and statutory speed limits), hard constraints from surrounding traffic participants, and hard vehicle dynamics constraints; these are embodied in the potential field function and the constraint conditions and are taken into account in the solution. Finally, an optimal trajectory is output through rolling calculation and multiple iterations.
Therefore, based on a unified objective function, the intelligent vehicle risk-sensitive sequential behavior decision method provided by the application constructs a single-step behavior decision method and a multi-step sequential decision method based on rolling time domain optimization, ensures stability and continuity in the time domain, and outputs the optimal decision over a continuous time sequence through rolling optimization on the current vehicle state.
In order for those skilled in the art to further understand the intelligent vehicle risk-sensitive sequential behavior decision method of the present application, the following enumerated examples schematically illustrate the steps of the method in conjunction with the accompanying drawings.
Specifically, fig. 6 is a schematic diagram of a flow of an intelligent vehicle risk-sensitive sequential behavior decision method according to an embodiment of the present application, as shown in fig. 6, the method includes the following steps:
step S601, sensing and acquiring driving state information of surrounding traffic participants in a complex traffic environment;
step S602, a safe and efficient uniform dynamic objective function is constructed in consideration of the risk sensitivity of a driver;
step S603, based on the objective function driving in step S602, constructing a single-step behavior decision method of the intelligent vehicle, and determining longitudinal and transverse dynamic safety margins in the decision process;
step S604, judging the driving intention of surrounding vehicles according to a single-step behavior decision method, calculating cost values of different behavior decision strategies adopted by the own vehicle, and selecting a proper strategy according to the calculated output cost values;
step S605, outputting the action of the own vehicle at the time t based on the single-step decision and executing the action;
step S606, at time t+1, constructing a risk sensitive sequential decision method based on rolling time domain optimization;
step S607, repeating the above steps so that multi-step sequential behavior decision is realized through rolling execution, until a trajectory that reaches the destination safely and efficiently is found.
Therefore, the intelligent vehicle risk sensitive sequential behavior decision method can describe the action mechanism among traffic elements and the driving expected target of a driver by considering the complexity of road scenes, the individual randomness of traffic participants and the interactive game among individuals, and meets the dynamic multi-target requirement, so that the intelligent vehicle dynamic multi-target cooperation and multi-stage stable decision can be realized under the complex scenes.
In some embodiments, fig. 7 is a schematic diagram comparing the reasoning of the single-step behavior decision method, a driver, and a classical decision method in an arbitrary scenario according to an embodiment of the present application. To verify that the decision logic of the proposed method is reasonable, the solution approaches of the behavior decision method, the driver and the classical decision method are compared in an arbitrary scenario. In vehicle decision making, the classical method is usually rule-based and tries to determine in advance how each obstacle should be handled. Its difficulty is that the threshold parameters are fixed across different scenes, so it is hard to transfer to arbitrary scenes and the probability of failure in application increases. A driver obtains the driving strategy for the current vehicle from experience, but limitations of driving experience and style can lead to traffic accidents. The behavior decision method provided by the application is driven by the principle of least action and divides the planning process into two stages: feasible behavior decision generation and decision optimization evaluation. It learns the operating mechanism of excellent drivers and establishes a comprehensive trajectory quality evaluation function that accounts for both safety and efficiency, giving an objective expression of the trajectory quality of the intelligent vehicle in different scenes, and it can be used to solve trajectory planning problems in complex environments. For driving safety evaluation of the intelligent vehicle, the collision-free condition of the vehicle can therefore be guaranteed during the optimization solution.
Of course, in other embodiments, fig. 8 is a schematic diagram of a dynamic multi-objective behavior decision process of an intelligent vehicle according to an embodiment of the present application. To cope with the complexity of driving scenarios, the unpredictability of traffic participants' behavior, and the driver's dynamic demands for driving safety, efficiency and comfort, the intelligent vehicle must balance multiple performance targets to obtain optimal performance when planning trajectories in different scenes. An excellent driver can cope effectively with a complex, uncertain environment, and an intelligent vehicle decision system can imitate the driver's brain. A key challenge in decision algorithm design is therefore how to think like a person: simulating the driver's unified decision logic, raising the vehicle's level of intelligence by learning from drivers, and making the behavior as human-like as possible. The behavior decision method of the application extracts behavior characteristics from a large amount of naturalistic driving data and combines them with the attributes of natural physical systems, so that the driver's bounded rationality and cognitive biases are avoided and the intelligent vehicle's dynamic pursuit of different targets in different scenes is balanced.
Therefore, by constructing the intelligent vehicle risk-sensitive sequential behavior decision method, stable, reliable and continuous multi-scene, multi-stage decision making is achieved for the intelligent vehicle, and the practical deployment, development and upgrading of intelligent vehicles are promoted.
According to the intelligent vehicle risk-sensitive sequential behavior decision method provided by the embodiments of the application, the driving state information of traffic participants in a preset traffic environment is acquired and a dynamic objective function is constructed; a single-step behavior decision strategy of the vehicle is determined based on the dynamic objective function together with the longitudinal and lateral dynamic safety margins of the decision process; the driving intention of surrounding vehicles is identified based on the single-step behavior decision strategy, and the cost value of each behavior decision strategy the vehicle could adopt is calculated to match the vehicle's optimal strategy; and these steps are repeated until the risk-sensitive sequential decision strategy is consistent with the action of the vehicle at the current moment, whereupon the optimal trajectory is output according to the risk-sensitive sequential decision strategy. This solves the problems in the related art that intelligent vehicle decision methods place certain requirements on the volume and quality of training data samples and are difficult to apply to real, complex dynamic scenes; it achieves dynamic multi-objective cooperation and multi-stage stable decision making for intelligent vehicles in complex scenes, and promotes the practical deployment, development and upgrading of intelligent vehicles.
Next, an intelligent vehicle risk-sensitive sequential behavior decision device according to an embodiment of the present application is described with reference to the accompanying drawings.
Fig. 9 is a block schematic diagram of an intelligent vehicle risk-sensitive sequential behavior decision apparatus according to an embodiment of the present application.
As shown in Fig. 9, the intelligent vehicle risk-sensitive sequential behavior decision apparatus 10 includes: a traffic participant driving information acquisition module 100, a dynamic objective function construction module 200, an intelligent vehicle single-step behavior decision module 300, a behavior decision cost calculation and strategy selection module 400, and a risk-sensitive sequential decision model construction and optimization module 500.
The traffic participant driving information acquisition module 100 is configured to acquire driving state information of traffic participants in a preset traffic environment; the dynamic objective function construction module 200 is configured to construct a dynamic objective function according to the risk sensitivity of the driver based on the driving state information; the intelligent vehicle single-step behavior decision module 300 is configured to determine a vehicle single-step behavior decision strategy based on the dynamic objective function, a longitudinal dynamic safety margin, and a lateral dynamic safety margin; the behavior decision cost calculation and strategy selection module 400 is configured to identify the driving intentions of surrounding vehicles based on the single-step behavior decision strategy, calculate the cost values of the different behavior decision strategies available to the current vehicle according to those driving intentions, and match the optimal strategy of the current vehicle according to the cost values; and the risk-sensitive sequential decision model construction and optimization module 500 is configured to determine the action of the current vehicle at the current moment according to the optimal strategy of the current vehicle, determine, based on rolling time domain optimization, whether the action quantity of the single-step behavior decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment, reacquire the driving state information of the traffic participants in the preset traffic environment when the two are inconsistent, until the action quantity of the risk-sensitive sequential decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment, and output an optimal trajectory according to the risk-sensitive sequential decision strategy.
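The interplay of the five modules can be illustrated with a minimal Python sketch. It is only an outline under stated assumptions: the function name `sequential_decision`, the callables `observe`, `cost`, and `executed_action`, and the `max_rounds` limit are hypothetical and do not appear in the application; the dynamic objective function, safety margins, and intention recognition are collapsed into a single `cost` callable.

```python
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")
Action = TypeVar("Action")


def sequential_decision(
    observe: Callable[[], State],
    candidate_actions: Sequence[Action],
    cost: Callable[[State, Action], float],
    executed_action: Callable[[State, Action], Action],
    max_rounds: int = 20,
) -> List[Action]:
    """Repeat observe -> decide -> compare until plan and execution agree.

    Returns the actions actually executed in each round (hypothetical output
    format; the application outputs an optimal trajectory instead).
    """
    executed: List[Action] = []
    for _ in range(max_rounds):
        # Module 100: acquire the current running state of the participants.
        state = observe()
        # Modules 200-400 (collapsed): score every candidate strategy and
        # keep the one with the lowest cost value.
        planned = min(candidate_actions, key=lambda a: cost(state, a))
        # Module 500: compare the planned action with what the vehicle does.
        actual = executed_action(state, planned)
        executed.append(actual)
        if actual == planned:
            break  # consistent: stop re-acquiring state
    return executed
```

The loop stops as soon as the planned and executed actions coincide; otherwise it re-acquires the state, mirroring the re-acquisition step described for module 500.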
Optionally, in some embodiments, the dynamic objective function construction module 200 is specifically configured to: construct the minimum action quantity in a real physical system, and determine a unified driving objective in the driver's driving decision process based on the driving state information; and, based on the unified driving objective, output a dynamic objective function based on the minimum action quantity, wherein the dynamic objective function is as follows:
wherein i denotes the intelligent vehicle, S_Risk is the dynamic objective function of intelligent vehicle i in the decision planning process, t_0 is the initial time, t_f is the end time, L_i is the Lagrangian of the two-vehicle system, T_i is the kinetic energy of the vehicle, and U_i is the potential energy of the system.
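The formula itself is not reproduced in this text. Given the symbols defined above, the standard least-action form would read as follows; this is a reconstruction from the symbol list under the usual definition of an action integral, and the expression in the application may contain additional terms.

```latex
S_{Risk} = \int_{t_0}^{t_f} L_i \, dt = \int_{t_0}^{t_f} \left( T_i - U_i \right) dt
```

Under this reading, a lower value of S_Risk corresponds to a trajectory closer to the least-action path of the two-vehicle system.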
Optionally, in some embodiments, before determining the vehicle single-step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin, and the lateral dynamic safety margin, the intelligent vehicle single-step behavior decision module 300 is further configured to: divide the preset traffic environment, based on the interaction relationships between the vehicle and the other traffic participants, into a plurality of two-vehicle systems, each formed by a pair of interacting vehicles; and determine the Lagrangian equation of each two-vehicle system, and determine the longitudinal dynamic safety margin and the lateral dynamic safety margin in the decision process according to that Lagrangian equation.
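The division into two-vehicle systems can be sketched as follows. The application does not state how interaction is detected, so the Euclidean distance threshold `interaction_radius`, the `Vehicle` type, and the function name are purely illustrative assumptions.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Vehicle:
    name: str
    x: float  # longitudinal position in metres
    y: float  # lateral position in metres


def two_vehicle_systems(
    ego: Vehicle,
    others: Sequence[Vehicle],
    interaction_radius: float = 50.0,  # assumed threshold, not from the source
) -> List[Tuple[Vehicle, Vehicle]]:
    """Pair the ego vehicle with every participant close enough to interact."""
    return [
        (ego, other)
        for other in others
        if hypot(other.x - ego.x, other.y - ego.y) <= interaction_radius
    ]
```

Each returned pair would then be treated as one two-vehicle system with its own Lagrangian.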
Optionally, in some embodiments, the Lagrangian equation of the two-vehicle system is:
the lateral dynamic safety margin is:
r_y = r_{ij,y} + ε;
the longitudinal dynamic safety margin is:
r_x = r_{ij,x} + Ψ(v_{ix}, (v_{ix} - v_{jx})) Δt;
wherein i denotes the ego vehicle, T_i is the kinetic energy of the vehicle, U_i is the potential energy of the system, m_i is the mass of vehicle i, v_i is the speed of vehicle i, v_j is the speed of vehicle j, t_0 is the initial time, t_f is the end time, R_i is the longitudinal constraint resistance that traffic rules impose on the driver, G_i is the virtual driving force with which the driver's driving objective propels the intelligent vehicle, G_{i,x} is the driver's longitudinal target driving force, G_{i,y} is the driver's lateral target driving force, v_{ix} is the longitudinal speed of vehicle i, v_{iy} is the lateral speed of vehicle i, F_{li,1} and F_{li,2} are the lateral restraining forces generated by the two lane lines of the lane in which the target vehicle travels, and F_{ji} is the interaction risk force exerted by vehicle j on vehicle i; r_{ij,x} is the following distance between vehicles i and j in the longitudinal direction, r_{ij,y} is the distance between vehicles i and j in the lateral direction, ε is the lateral safety margin, Ψ is a positive correlation function, and Δt is the step-back time.
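The two margin formulas can be evaluated directly once Ψ is chosen. The application only states that Ψ is a positive correlation function, so the linear form `psi_linear` and its weights `a` and `b` below are hypothetical stand-ins; all function names are likewise illustrative.

```python
from typing import Callable


def psi_linear(v_ix: float, dv_x: float, a: float = 0.5, b: float = 1.0) -> float:
    """Assumed Psi: grows with ego speed and with closing speed (m/s)."""
    return a * v_ix + b * max(dv_x, 0.0)


def lateral_margin(r_ij_y: float, eps: float) -> float:
    """r_y = r_ij,y + eps."""
    return r_ij_y + eps


def longitudinal_margin(
    r_ij_x: float,
    v_ix: float,
    v_jx: float,
    dt: float,
    psi: Callable[[float, float], float] = psi_linear,
) -> float:
    """r_x = r_ij,x + Psi(v_ix, v_ix - v_jx) * dt."""
    return r_ij_x + psi(v_ix, v_ix - v_jx) * dt
```

With this choice of Ψ, the longitudinal margin widens when the ego vehicle is faster or is closing on the vehicle ahead, which matches the positive correlation the text describes.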
Optionally, in some embodiments, the risk-sensitive sequential decision strategy comprises: adjusting the vehicle driving strategy by rolling optimization within a time window to obtain the expression of the optimal dynamic objective function in the rolling time domain; and solving for the extremum of the functional based on a preset variational method, and obtaining the risk-sensitive sequential decision strategy from the solution.
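For reference, the extremum of an action functional found by the calculus of variations satisfies the Euler-Lagrange equation. The generalized coordinate q is notation introduced here for illustration; the application's own derivation may add generalized non-conservative forces on the right-hand side.

```latex
\delta S = \delta \int_{t_0}^{t_f} L_i(q, \dot{q}, t)\, dt = 0
\quad \Longrightarrow \quad
\frac{d}{dt}\frac{\partial L_i}{\partial \dot{q}} - \frac{\partial L_i}{\partial q} = 0
```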
Optionally, in some embodiments, the optimal dynamic objective function is expressed in the rolling time domain as:
wherein S is the actual action quantity, k denotes the time instant, J(·) is the cost function, u(k) is the input vector, x(k) is the state vector, and Φ is the target set; S* is the ideal action quantity, H_w is the rolling horizon, τ is the time increment, u(k+τ|k) is the control input for the future time (k+τ) computed at time k, x(k+τ|k) is the state predicted at time k for the future time (k+τ), and Z(·) is the terminal penalty term.
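The rolling-horizon idea can be illustrated with a small, self-contained Python sketch. It does not implement the application's objective: the quadratic speed-tracking cost, the terminal weight standing in for Z(·), the discrete acceleration set, and all function names are assumptions chosen only to show the pattern of optimizing over H_w steps and applying the first input.

```python
from itertools import product
from typing import List, Sequence, Tuple


def rollout(x: float, v: float, inputs: Sequence[float], dt: float) -> List[Tuple[float, float]]:
    """Predict (position, speed) under a sequence of accelerations."""
    states = []
    for u in inputs:
        v = v + u * dt
        x = x + v * dt
        states.append((x, v))
    return states


def plan_step(
    x: float,
    v: float,
    v_target: float,
    horizon: int = 3,          # plays the role of H_w
    dt: float = 0.5,
    accels: Sequence[float] = (-2.0, 0.0, 2.0),
) -> float:
    """Return the first input of the lowest-cost input sequence."""

    def cost(seq: Sequence[float]) -> float:
        states = rollout(x, v, seq, dt)
        stage = sum((vk - v_target) ** 2 + 0.1 * u ** 2
                    for (_, vk), u in zip(states, seq))
        terminal = 10.0 * (states[-1][1] - v_target) ** 2  # stand-in for Z(.)
        return stage + terminal

    best = min(product(accels, repeat=horizon), key=cost)
    return best[0]


def receding_horizon(x: float, v: float, v_target: float,
                     steps: int = 10, dt: float = 0.5) -> Tuple[float, float]:
    """Re-plan at every step and apply only the first planned input."""
    for _ in range(steps):
        u = plan_step(x, v, v_target, dt=dt)
        v += u * dt
        x += v * dt
    return x, v
```

Enumerating every input sequence is only workable for tiny horizons; a practical planner would replace the exhaustive search with a numerical optimizer.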
It should be noted that the foregoing explanation of the embodiment of the risk-sensitive sequential behavior decision method for an intelligent vehicle is also applicable to the risk-sensitive sequential behavior decision device for an intelligent vehicle of this embodiment, and will not be repeated here.
According to the intelligent vehicle risk-sensitive sequential behavior decision device provided by the embodiment of the application, the running state information of traffic participants in a preset traffic environment is acquired and a dynamic objective function is constructed; a single-step behavior decision strategy of the vehicle is determined based on the dynamic objective function and on the longitudinal and lateral dynamic safety margins of the decision process; the driving intentions of surrounding vehicles are identified based on the single-step behavior decision strategy; the cost values of the different behavior decision strategies available to the vehicle are calculated to match the optimal strategy of the vehicle; these steps are repeated until the risk-sensitive sequential decision strategy is consistent with the action of the vehicle at the current moment; and the optimal trajectory is output according to the risk-sensitive sequential decision strategy. This addresses the problems in the related art that intelligent vehicle decision methods place demands on the volume and quality of training data samples and are difficult to apply to real, complex, dynamic scenarios; it achieves dynamic multi-objective coordination and stable multi-stage decision-making for intelligent vehicles in complex scenarios, and promotes the practical deployment, development, and upgrading of intelligent vehicles.
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 1001, processor 1002, and a computer program stored on memory 1001 and executable on processor 1002.
When executing the program, the processor 1002 implements the intelligent vehicle risk-sensitive sequential behavior decision method provided in the above embodiments.
Further, the electronic device further includes:
a communication interface 1003 for communication between the memory 1001 and the processor 1002.
The memory 1001 is configured to store computer programs that can run on the processor 1002.
The memory 1001 may include high-speed RAM (Random Access Memory) and may also include nonvolatile memory, such as at least one disk memory.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, they may be connected to one another through a bus and communicate with one another over it. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is shown in Fig. 10, but this does not mean that there is only one bus or only one type of bus.
Alternatively, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on a chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through internal interfaces.
The processor 1002 may be a CPU (Central Processing Unit), an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement embodiments of the present application.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when being executed by a processor, implements the intelligent vehicle risk-sensitive sequential behavior decision method as above.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those embodiments or examples, provided they do not contradict one another.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable gate arrays, field programmable gate arrays, and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (10)

1. An intelligent vehicle risk-sensitive sequential behavior decision method, characterized by comprising the following steps:
acquiring running state information of a traffic participant in a preset traffic environment;
based on the driving state information, constructing a dynamic objective function according to the risk sensitivity of the driver;
determining a vehicle single-step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin and the transverse dynamic safety margin;
identifying the driving intention of surrounding vehicles based on the single-step behavior decision strategy, calculating the cost value of the current vehicle adopting different behavior decision strategies according to the driving intention, and matching the optimal strategy of the current vehicle according to the cost value; and
determining the action of the current vehicle at the current moment according to the optimal strategy of the current vehicle; judging, based on rolling time domain optimization, whether the action quantity of the single-step behavior decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment; when they are inconsistent, acquiring the running state information of the traffic participant in the preset traffic environment again, until the action quantity of the risk-sensitive sequential decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment; and outputting an optimal trajectory according to the risk-sensitive sequential decision strategy.
2. The method of claim 1, wherein constructing a dynamic objective function from driver risk sensitivity based on the driving state information comprises:
constructing the minimum action quantity in a real physical system, and determining a unified driving target in the driving decision process of the driver based on the driving state information;
based on the unified driving objective, a dynamic objective function based on the minimum amount of action is output,
wherein the dynamic objective function is:
wherein i denotes the intelligent vehicle, S_Risk is the dynamic objective function of intelligent vehicle i in the decision planning process, t_0 is the initial time, t_f is the end time, L_i is the Lagrangian of the two-vehicle system, T_i is the kinetic energy of the vehicle, and U_i is the potential energy of the system.
3. The method of claim 1, further comprising, prior to determining a vehicle single step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin, and the lateral dynamic safety margin:
dividing the preset traffic environment into a plurality of two-vehicle systems formed by combining two vehicles according to the interaction between the vehicles based on the interaction relationship between the vehicles and the traffic participants;
and determining the Lagrange equation of the two-vehicle system, and determining the longitudinal dynamic safety margin and the transverse dynamic safety margin in the decision process according to the Lagrange equation of the two-vehicle system.
4. A method according to claim 3, wherein the Lagrangian equation of the two-vehicle system is:
the lateral dynamic safety margin is:
r_y = r_{ij,y} + ε;
the longitudinal dynamic safety margin is:
r_x = r_{ij,x} + Ψ(v_{ix}, (v_{ix} - v_{jx})) Δt;
wherein i denotes the ego vehicle, T_i is the kinetic energy of the vehicle, U_i is the potential energy of the system, m_i is the mass of vehicle i, v_i is the speed of vehicle i, v_j is the speed of vehicle j, t_0 is the initial time, t_f is the end time, R_i is the longitudinal constraint resistance that traffic rules impose on the driver, G_i is the virtual driving force with which the driver's driving objective propels the intelligent vehicle, G_{i,x} is the driver's longitudinal target driving force, G_{i,y} is the driver's lateral target driving force, v_{ix} is the longitudinal speed of vehicle i, v_{iy} is the lateral speed of vehicle i, F_{li,1} and F_{li,2} are the lateral restraining forces generated by the two lane lines of the lane in which the target vehicle travels, and F_{ji} is the interaction risk force exerted by vehicle j on vehicle i; r_{ij,x} is the following distance between vehicles i and j in the longitudinal direction, r_{ij,y} is the distance between vehicles i and j in the lateral direction, ε is the lateral safety margin, Ψ is a positive correlation function, and Δt is the step-back time.
5. The method of claim 1, wherein the risk-sensitive sequential decision strategy comprises:
adjusting the vehicle driving strategy by rolling optimization within a time window to obtain the expression of the optimal dynamic objective function in the rolling time domain;
and solving for the extremum of the functional based on a preset variational method, and obtaining the risk-sensitive sequential decision strategy from the solution.
6. The method of claim 5, wherein the optimal dynamic objective function is expressed in the rolling time domain as:
wherein S is the actual action quantity, k denotes the time instant, J(·) is the cost function, u(k) is the input vector, x(k) is the state vector, and Φ is the target set; S* is the ideal action quantity, H_w is the rolling horizon, τ is the time increment, u(k+τ|k) is the control input for the future time (k+τ) computed at time k, x(k+τ|k) is the state predicted at time k for the future time (k+τ), and Z(·) is the terminal penalty term.
7. An intelligent vehicle risk-sensitive sequential behavior decision device, comprising:
the traffic participant driving information acquisition module is used for acquiring driving state information of the traffic participant in a preset traffic environment;
the dynamic objective function construction module is used for constructing a dynamic objective function according to the risk sensitivity of the driver based on the driving state information;
the intelligent vehicle single-step behavior decision module is used for determining the vehicle single-step behavior decision strategy based on the dynamic objective function, the longitudinal dynamic safety margin and the transverse dynamic safety margin;
the behavior decision cost calculation and strategy selection module is used for identifying the driving intention of surrounding vehicles based on the single-step behavior decision strategy, calculating the cost value of the current vehicle adopting different behavior decision strategies according to the driving intention, and matching the optimal strategy of the current vehicle according to the cost value; and
the risk-sensitive sequential decision model construction and optimization module is used for determining the action of the current vehicle at the current moment according to the optimal strategy of the current vehicle; judging, based on rolling time domain optimization, whether the action quantity of the single-step behavior decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment; when they are inconsistent, re-acquiring the running state information of the traffic participant in the preset traffic environment, until the action quantity of the risk-sensitive sequential decision strategy is consistent with the actual action quantity of the action output by the current vehicle at the current moment; and outputting an optimal trajectory according to the risk-sensitive sequential decision strategy.
8. The device according to claim 7, wherein the dynamic objective function construction module is specifically configured to:
constructing the minimum action quantity in a real physical system, and determining a unified driving target in the driving decision process of the driver based on the driving state information;
based on the unified driving objective, a dynamic objective function based on the minimum amount of action is output,
wherein the dynamic objective function is:
wherein i denotes the intelligent vehicle, S_Risk is the dynamic objective function of intelligent vehicle i in the decision planning process, t_0 is the initial time, t_f is the end time, L_i is the Lagrangian of the two-vehicle system, T_i is the kinetic energy of the vehicle, and U_i is the potential energy of the system.
9. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the intelligent vehicle risk-sensitive sequential behavior decision method of any one of claims 1-6.
10. A computer readable storage medium having stored thereon a computer program, the program being executed by a processor for implementing the intelligent vehicle risk sensitive sequential behavior decision method of any of claims 1-6.
CN202310788233.1A 2023-06-29 2023-06-29 Intelligent vehicle risk sensitive sequential behavior decision method, device and equipment Pending CN116572993A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310788233.1A CN116572993A (en) 2023-06-29 2023-06-29 Intelligent vehicle risk sensitive sequential behavior decision method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310788233.1A CN116572993A (en) 2023-06-29 2023-06-29 Intelligent vehicle risk sensitive sequential behavior decision method, device and equipment

Publications (1)

Publication Number Publication Date
CN116572993A true CN116572993A (en) 2023-08-11

Family

ID=87545501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310788233.1A Pending CN116572993A (en) 2023-06-29 2023-06-29 Intelligent vehicle risk sensitive sequential behavior decision method, device and equipment

Country Status (1)

Country Link
CN (1) CN116572993A (en)

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination