WO2022227827A1 - Decision-making method, apparatus, and vehicle - Google Patents

Decision-making method, apparatus, and vehicle

Info

Publication number
WO2022227827A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
game
strategy
self
cost
Prior art date
Application number
PCT/CN2022/077480
Other languages
English (en)
French (fr)
Inventor
程思源
郝东浩
杨绍宇
王新宇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP22794307.3A priority Critical patent/EP4321406A1/en
Publication of WO2022227827A1 publication Critical patent/WO2022227827A1/zh
Priority to US18/495,071 priority patent/US20240051572A1/en

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08 Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W30/095 Predicting travel path or likelihood of collision
    • B60W30/0956 Predicting travel path or likelihood of collision the prediction being responsive to traffic or environmental parameters
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08 Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08 Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W30/09 Taking automatic action to avoid collision, e.g. braking and steering
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/02 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to ambient conditions
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0011 Planning or execution of driving tasks involving control alternatives for a single driving scenario, e.g. planning several paths to avoid obstacles
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0027 Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0027 Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • B60W60/00272 Planning or execution of driving tasks using trajectory prediction for other traffic participants relying on extrapolation of current movement
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0027 Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • B60W60/00276 Planning or execution of driving tasks using trajectory prediction for other traffic participants for two or more other traffic participants
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0019 Control system elements or transfer functions
    • B60W2050/0022 Gains, weighting coefficients or weighting functions
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0043 Signal treatments, identification of variables or parameters, parameter estimation or state estimation
    • B60W2050/005 Sampling
    • B60W2552/00 Input parameters relating to infrastructure
    • B60W2552/50 Barriers
    • B60W2554/00 Input parameters relating to objects
    • B60W2554/40 Dynamic objects, e.g. animals, windblown objects
    • B60W2554/00 Input parameters relating to objects
    • B60W2554/40 Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404 Characteristics
    • B60W2554/4045 Intention, e.g. lane change or imminent movement
    • B60W2554/00 Input parameters relating to objects
    • B60W2554/80 Spatial relation or speed relative to objects

Definitions

  • the present invention relates to the technical field of intelligent driving, and in particular, to a decision-making method, a device and a vehicle.
  • the intelligent driving system can be divided into four key functional modules: positioning, environment perception, path planning and decision control.
  • For the decision control module, each manufacturer proposes decision planning methods for different scenarios. These are mainly divided into high-level semantic decision-making (such as lane-change and lane-keeping decisions) and obstacle decision-making (such as avoidance, car-following, rush, and yield decisions).
  • the embodiments of the present application provide a decision-making method, device and vehicle.
  • The present application provides a decision-making method, including: obtaining the predicted motion trajectories of the ego vehicle and of each obstacle around the ego vehicle; determining a game object, where the game object is an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than a set threshold; constructing a sampling game space for each game object according to the vehicle information of the ego vehicle collected by the sensor system, the obstacle information of the game object, and the road condition information, where each sampling game space includes at least one game strategy; calculating the strategy cost of each game strategy, where the strategy cost is the value obtained by weighting each factor of the strategy cost; and determining the decision result of the ego vehicle, where the decision result is the game strategy with the smallest strategy cost in a shared sampling game space, the shared sampling game space includes at least one game strategy, and each sampling game space includes the game strategies in the shared sampling game space.
  • Determining the decision result of the ego vehicle includes: constructing a feasible region for each sampling game space, where the feasible region of a sampling game space is the at least one game strategy whose strategy cost meets the set requirements; and, within the intersection of the feasible regions of all sampling game spaces, determining the game strategy with the smallest strategy cost among the game strategies common to them.
  • The present application constructs a feasible region between the ego vehicle and each obstacle from the game strategies that meet the requirements, and uses these feasible regions to resolve conflicts among multiple game objects, so that the output game result is more reasonable.
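The feasible-region intersection described above can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the dictionary representation (ego acceleration mapped to strategy cost), the requirement `cost <= max_cost`, and the choice to combine costs from different game objects by summation are all assumptions, and the names `feasible_region` and `decide` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical representation: one dict per game object, mapping an ego-vehicle
# acceleration (the part of a game strategy shared across spaces) to its strategy cost.
SamplingGameSpace = Dict[float, float]


def feasible_region(space: SamplingGameSpace, max_cost: float) -> Dict[float, float]:
    """Keep only strategies whose cost meets the assumed requirement cost <= max_cost."""
    return {accel: cost for accel, cost in space.items() if cost <= max_cost}


def decide(spaces: List[SamplingGameSpace], max_cost: float) -> Optional[float]:
    """Return the ego acceleration with the smallest summed cost in the common feasible region."""
    regions = [feasible_region(space, max_cost) for space in spaces]
    if not regions:
        return None
    common = set(regions[0])
    for region in regions[1:]:
        common &= set(region)
    if not common:
        return None  # no shared strategy; a caller could fall back to yielding
    # Assumption: costs from different game objects are combined by summation.
    return min(common, key=lambda accel: sum(region[accel] for region in regions))


spaces = [
    {-1.0: 300.0, 0.0: 900.0, 1.0: 200.0},   # costs against game object A
    {-1.0: 250.0, 0.0: 400.0, 1.0: 1500.0},  # costs against game object B
]
result = decide(spaces, max_cost=1000.0)
```

In this example the strategy with acceleration 1.0 is cheapest against object A but infeasible against object B, so it drops out of the intersection before the minimum is taken.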
  • The method further includes: determining a non-game object, where a non-game object is an obstacle around the ego vehicle whose predicted motion trajectory does not intersect that of the ego vehicle or whose distance from the ego vehicle is not less than the set threshold; constructing the feasible region of the ego vehicle according to the vehicle information of the ego vehicle collected by the sensor system, the obstacle information of the non-game object, and the road condition information, where the feasible region of the ego vehicle is at least one strategy of different decisions that the ego vehicle can adopt without colliding with the non-game object; and outputting the decision result of the ego vehicle when it is detected that the decision result lies within the feasible region of the ego vehicle.
  • The intersection of the feasible region between the ego vehicle and each game object with the feasible region between the ego vehicle and the non-game object is obtained, and the game strategy with the minimum game cost in that intersection is selected as the decision result, which ensures that the selected decision result applies to scenarios containing both game objects and non-game objects.
  • Constructing a sampling game space for each game object includes: determining the upper and lower decision limits of the ego vehicle and of each obstacle in the game object according to the vehicle information of the ego vehicle, the obstacle information of the game object, and the road condition information; obtaining, according to set rules, the decision-making strategies of the ego vehicle and of each obstacle in the game object from within their upper and lower decision limits; and combining the decision-making strategy of the ego vehicle with that of each obstacle in the game object to obtain at least one game strategy between the ego vehicle and each obstacle in the game object.
  • The game strategies of the ego vehicle and of each game object are obtained first, and the strategies of the ego vehicle are then combined with those of each game object to obtain the set of game strategies between the ego vehicle and each game object, which ensures that the game strategies in each sampling game space are reasonable.
  • The method further includes: determining the behavior label of each game strategy according to the distances of the ego vehicle and the game object from the conflict point and the at least one game strategy between the ego vehicle and each obstacle in the game object, where the conflict point is the position at which the predicted motion trajectories of the ego vehicle and the obstacle intersect, or the position at which the distance between the ego vehicle and the obstacle is less than the set threshold, and the behavior label includes at least one of: the ego vehicle yields, the ego vehicle rushes, and both the ego vehicle and the obstacle yield.
  • Each game strategy is marked with a label, so that once the game result has been selected, the label of the game strategy can be sent directly to the execution unit of the next layer. There is then no need to analyze, from the game moves adopted by the two parties in the game strategy, whether the ego vehicle should yield, the ego vehicle should rush, or both the ego vehicle and the obstacle should yield, which greatly reduces decision-making time and improves the user experience.
  • Calculating the strategy cost of each game strategy includes: determining the factors of the strategy cost, where the factors include at least one of safety, comfort, passing efficiency, right of way, prior probability of obstacles, and historical decision correlation; calculating the factor cost of each factor in each strategy cost; and weighting the factor costs of the factors in each strategy cost to obtain the strategy cost of each game strategy.
  • When calculating the strategy cost of each game strategy, the cost of each factor is calculated first and the factor costs are then weighted to obtain the cost of each game strategy, which is used to judge how reasonable each game strategy is.
  • The method further includes: checking whether each factor in the strategy cost is within its set range; and deleting any game strategy whose strategy cost includes a factor that is not within its set range.
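The weighting and range-check steps above can be illustrated with a short sketch. The six factor names follow the factors listed in the text; the weights, the ranges, and the function name `strategy_cost` are hypothetical values chosen only for illustration and are not taken from the patent.

```python
from typing import Dict, Optional, Tuple

# Factor names follow the six factors listed in the text.
FACTORS = ("safety", "comfort", "passability", "right_of_way", "prior_probability", "history")


def strategy_cost(
    factor_costs: Dict[str, float],
    weights: Dict[str, float],
    limits: Dict[str, Tuple[float, float]],
) -> Optional[float]:
    """Return the weighted sum of factor costs, or None if any factor is out of range.

    A None result means the game strategy is deleted from the sampling game space.
    """
    for name in FACTORS:
        low, high = limits[name]
        if not low <= factor_costs[name] <= high:
            return None
    return sum(weights[name] * factor_costs[name] for name in FACTORS)


# Hypothetical weights, ranges, and factor costs.
weights = {name: 1.0 for name in FACTORS}
weights["safety"] = 2.0
limits = {name: (0.0, 1.0) for name in FACTORS}
costs = {
    "safety": 0.5, "comfort": 0.25, "passability": 0.5,
    "right_of_way": 0.0, "prior_probability": 0.25, "history": 0.0,
}
total = strategy_cost(costs, weights, limits)
rejected = strategy_cost({**costs, "safety": 1.5}, weights, limits)
```

Running the range check before the weighted sum means an unacceptable value in one factor cannot be averaged away by low costs in the others.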
  • The method further includes: when it is detected that the decision result of the ego vehicle is not within the feasible region of the ego vehicle, outputting "ego vehicle yields" as the decision result.
  • If the decision result is not within the feasible region of the ego vehicle, the decision result does not meet the conditions, and the ego vehicle would output no decision result. This is equivalent to the ego vehicle not having carried out the game process at all, which is a serious problem. Therefore, when no decision result can be determined, "ego vehicle yields" is chosen as the decision result on the principle of safety, so that the chosen decision result keeps the ego vehicle safe while driving.
  • The present application provides a decision-making device, including: a transceiver unit configured to acquire the predicted motion trajectories of the ego vehicle and of the obstacles around the ego vehicle; and a processing unit configured to: determine a game object, where the game object is an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than the set threshold; construct a sampling game space for each game object according to the vehicle information of the ego vehicle collected by the sensor system, the obstacle information of the game object, and the road condition information, where each sampling game space includes at least one game strategy; calculate the strategy cost of each game strategy, where the strategy cost is the value obtained by weighting each factor of the strategy cost; and determine the decision result of the ego vehicle, where the decision result is the game strategy with the smallest strategy cost in the shared sampling game space, the shared sampling game space includes at least one game strategy, and each sampling game space includes the game strategies in the shared sampling game space.
  • The processing unit is specifically configured to: construct a feasible region for each sampling game space, where the feasible region of a sampling game space is the at least one game strategy whose strategy cost meets the set requirements; and, within the intersection of the feasible regions of all sampling game spaces, determine the game strategy with the smallest strategy cost among the game strategies common to them.
  • The processing unit is further configured to: determine a non-game object, where a non-game object is an obstacle around the ego vehicle whose predicted motion trajectory does not intersect that of the ego vehicle or whose distance from the ego vehicle is not less than the set threshold; construct the feasible region of the ego vehicle according to the vehicle information of the ego vehicle collected by the sensor system, the obstacle information of the non-game object, and the road condition information, where the feasible region of the ego vehicle is at least one strategy of different decisions that the ego vehicle can adopt without colliding with the non-game object; and output the decision result of the ego vehicle when it detects that the decision result lies within the feasible region of the ego vehicle.
  • The processing unit is specifically configured to: determine the upper and lower decision limits of the ego vehicle and of each obstacle in the game object according to the vehicle information of the ego vehicle, the obstacle information of the game object, and the road condition information; obtain, according to set rules, the decision-making strategies of the ego vehicle and of each obstacle in the game object from within their upper and lower decision limits; and combine the decision-making strategy of the ego vehicle with that of each obstacle in the game object to obtain the at least one game strategy between the ego vehicle and each obstacle in the game object.
  • The processing unit is further configured to determine the behavior label of each game strategy according to the distances of the ego vehicle and the game object from the conflict point and the at least one game strategy between the ego vehicle and each obstacle in the game object, where the conflict point is the position at which the predicted motion trajectories of the ego vehicle and the obstacle intersect, or the position at which the distance between the ego vehicle and the obstacle is less than the set threshold, and the behavior label includes at least one of: the ego vehicle yields, the ego vehicle rushes, and both the ego vehicle and the obstacle yield.
  • The processing unit is specifically configured to: determine the factors of the strategy cost, where the factors include at least one of safety, comfort, passing efficiency, right of way, prior probability of obstacles, and historical decision correlation; calculate the factor cost of each factor in each strategy cost; and weight the factor costs of the factors in each strategy cost to obtain the strategy cost of each game strategy.
  • the processing unit is further configured to compare whether each factor in the strategy cost is within the set range; delete the game strategy corresponding to the strategy cost including any factor that is not within the set range.
  • The processing unit is further configured to output "ego vehicle yields" as the decision result when it detects that the decision result of the ego vehicle is not within the feasible region of the ego vehicle.
  • the present application provides an intelligent driving system, including at least one processor, where the processor is configured to execute instructions stored in a memory, and execute various possible implementations of the first aspect.
  • the present application provides a vehicle, comprising at least one processor configured to execute various possible implementations of the first aspect.
  • the present application provides an intelligent driving system, including a sensor system and a processor, where the processor is configured to execute each possible implementation of the first aspect.
  • The present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, it causes the computer to execute the various possible implementations of the first aspect.
  • The present application provides a computing device, comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, each possible implementation of the first aspect is implemented.
  • FIG. 1 is a schematic structural diagram of an intelligent driving system according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a decision module provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a scene between a self-vehicle and a non-game object provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a scenario of trajectory conflict between an ego vehicle and a game object provided in Embodiment 1 of the present application;
  • FIG. 6 is a schematic diagram of the functional relationship between the time domain security cost and the absolute value of TDTC provided by Embodiment 1 of the present application;
  • FIG. 7 is a schematic diagram of the functional relationship between the safety cost in the space domain and the minimum distance between two vehicles according to Embodiment 1 of the present application;
  • FIG. 8 is a schematic diagram of a functional relationship between comfort cost and acceleration variation provided by Embodiment 1 of the present application.
  • FIG. 9 is a schematic diagram of the functional relationship between the passability cost and the time of passing the collision point according to Embodiment 1 of the present application.
  • FIG. 10 is a schematic diagram of a functional relationship between a game object’s preemptive prior probability cost and a game object’s probability of letting its own vehicle pass, according to Embodiment 1 of the present application;
  • FIG. 11 is a schematic diagram of the functional relationship between the right-of-way ratio and the distance from the social vehicle to the conflict point provided by Embodiment 1 of the present application;
  • FIG. 12 is a schematic diagram of the functional relationship between the historical decision result correlation cost and the rush cost or yield cost corresponding to each frame of image provided in Embodiment 1 of the present application;
  • FIG. 13 is a diagram of the occupation relationship between longitudinal distance and time for longitudinal planning performed by the motion planning module provided in Embodiment 1 of the present application;
  • The planning module 40 receives the decision result output by the decision module 30 and, according to the behavior labels of the obstacles, determines the action to take for each obstacle (yield/rush, avoid/follow, and so on), such as which lane the ego vehicle chooses, whether to change lanes, whether to follow a vehicle, whether to detour, and whether to stop.
  • Object decision-making means that, during autonomous navigation, the autonomous vehicle must make a decision about each obstacle in the environment and attach a behavior label to it, such as detour, follow, or rush.
  • FIG. 2 is a schematic structural diagram of a decision-making module provided by an embodiment of the present application.
  • The decision module 30 includes a game object screening unit 301, a game decision unit 302, a rule decision unit 303, and a conflict processing unit 304.
  • After obtaining the vehicle information of the ego vehicle, the obstacle information, and the road condition information, the game object screening unit 301 determines whether the driving trajectories of the ego vehicle and each obstacle intersect, or uses the driving trajectories, speeds, accelerations, and other data of the ego vehicle and each obstacle to determine whether an obstacle will be at the same position as the ego vehicle (or whether the distance between their positions will be less than the set threshold).
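The screening test above (trajectories intersect, or the two positions come closer than a set threshold) might be sketched as below. This is a simplified illustration under stated assumptions: trajectories are given as 2-D points sampled at the same timestamps, intersection is tested geometrically on the polylines, and the proximity test compares time-aligned points. The function names are hypothetical and do not come from the patent.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segments p1-p2 and q1-q2 properly cross (collinear overlaps are ignored)."""
    d1, d2 = _cross(q1, q2, p1), _cross(q1, q2, p2)
    d3, d4 = _cross(p1, p2, q1), _cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def is_game_object(ego: List[Point], obstacle: List[Point], threshold: float) -> bool:
    """Screen one obstacle: its path crosses the ego path, or it comes within `threshold`."""
    crosses = any(
        _segments_cross(ego[i], ego[i + 1], obstacle[j], obstacle[j + 1])
        for i in range(len(ego) - 1)
        for j in range(len(obstacle) - 1)
    )
    # Assumption: both trajectories are sampled at the same timestamps.
    too_close = any(math.dist(p, q) < threshold for p, q in zip(ego, obstacle))
    return crosses or too_close


ego_path = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
crossing = [(12.0, -10.0), (12.0, -2.0), (12.0, 6.0)]   # cuts across the ego path
far_lane = [(0.0, 10.0), (10.0, 10.0), (20.0, 10.0)]    # parallel, 10 m away
near_lane = [(0.0, 2.0), (10.0, 2.0), (20.0, 2.0)]      # parallel, 2 m away
```

With a 3 m threshold, `crossing` and `near_lane` would be screened as game objects and `far_lane` as a non-game object.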
  • Scenario 2: Single-obstacle decision-making (the trajectories do not intersect, but there is a potential conflict): for example, the ego vehicle goes straight while the obstacle merges into the adjacent lane or into the same lane ahead of the ego vehicle;
  • Scenario 4: Multi-obstacle decision-making (with both game objects and non-game objects): for example, the ego vehicle goes straight and follows the vehicle in front (a non-game object) while there are obstacles to its side.
  • The game decision unit 302 receives data such as the distances of the ego vehicle and the game object from the theoretical collision point, the maximum and minimum acceleration values of the ego vehicle, the speeds of the ego vehicle and the game object, and the maximum speed limit of the road. From these it determines the different game strategy types that the ego vehicle and the game object can obtain by changing their acceleration values, and takes the set of these game strategies as the game strategy range. Then, using a set sampling method, n acceleration values of the ego vehicle and m acceleration values of the game object are selected, yielding a game strategy space of n × m possible combinations for the two parties.
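The construction of the n × m strategy space can be sketched as follows. The text only says that a set sampling method is used, so the uniform sampling here is an assumption, as are the acceleration limits and the function names.

```python
from itertools import product
from typing import List, Tuple


def sample_accelerations(lower: float, upper: float, count: int) -> List[float]:
    """Evenly sample `count` acceleration values in [lower, upper] (assumed sampling rule)."""
    if count < 2:
        return [lower]
    step = (upper - lower) / (count - 1)
    return [lower + i * step for i in range(count)]


def build_strategy_space(
    ego_limits: Tuple[float, float],
    obj_limits: Tuple[float, float],
    n: int,
    m: int,
) -> List[Tuple[float, float]]:
    """Combine n ego accelerations with m game-object accelerations into n * m strategy pairs."""
    ego_samples = sample_accelerations(ego_limits[0], ego_limits[1], n)
    obj_samples = sample_accelerations(obj_limits[0], obj_limits[1], m)
    return list(product(ego_samples, obj_samples))


# Hypothetical limits in m/s^2; each pair is (ego acceleration, game-object acceleration).
space = build_strategy_space((-4.0, 2.0), (-4.0, 2.0), n=7, m=7)
```

Each element of `space` is one candidate game strategy whose cost would then be evaluated.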
  • the strategic cost of each game strategy is quantitatively described by the cost of each design factor.
  • the costs mentioned in this application include safety cost, comfort cost, passability cost, prior probability cost of game objects, right-of-way cost and historical decision result correlation cost, and the total cost is the weighted sum of six costs.
  • For each decision-policy pair in the policy space calculate its corresponding total return.
  • The strategy pair [1.0, -1.45] (the sampled acceleration of the ego vehicle is 1.0 m/s², and the sampled acceleration of the social vehicle is -1.45 m/s²) is used as an example for a detailed description.
  • Spatial-domain safety is evaluated from the sampled accelerations corresponding to the game strategy of the two vehicles: the future motion of both parties is propagated step by step along the planned path of the ego vehicle and the predicted path of the obstacle, and the minimum distance between the two vehicles over the next 10 s is obtained (with 0.2 s as the propagation step).
  • The larger the minimum distance, the safer the strategy and the smaller the spatial-domain safety cost.
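The step-by-step propagation with a 10 s horizon and a 0.2 s step can be illustrated with a minimal sketch. The geometry is an assumption made only to keep the example short: both paths are straight and perpendicular, meeting at the conflict point at the origin. Speeds are clamped at zero so that a decelerating vehicle stops rather than reverses. The function names are hypothetical.

```python
import math
from typing import Tuple


def advance(s: float, v: float, a: float, dt: float) -> Tuple[float, float]:
    """Advance arc length s and speed v by dt under acceleration a, never reversing."""
    v_next = max(0.0, v + a * dt)
    s_next = s + 0.5 * (v + v_next) * dt  # trapezoidal update of the travelled distance
    return s_next, v_next


def min_distance(
    d_ego: float, v_ego: float, a_ego: float,
    d_obj: float, v_obj: float, a_obj: float,
    horizon: float = 10.0, dt: float = 0.2,
) -> float:
    """Minimum distance between the two vehicles over the horizon.

    Assumed geometry: the ego vehicle drives along the x axis and the obstacle
    along the y axis, both starting d_ego / d_obj metres before the origin.
    """
    s_ego, s_obj = -d_ego, -d_obj
    best = math.hypot(s_ego, s_obj)
    for _ in range(round(horizon / dt)):
        s_ego, v_ego = advance(s_ego, v_ego, a_ego, dt)
        s_obj, v_obj = advance(s_obj, v_obj, a_obj, dt)
        best = min(best, math.hypot(s_ego, s_obj))
    return best


# Both vehicles 20 m from the conflict point at 10 m/s.
same_pace = min_distance(20.0, 10.0, 0.0, 20.0, 10.0, 0.0)     # neither changes speed
ego_rushes = min_distance(20.0, 10.0, 1.0, 20.0, 10.0, -1.45)  # strategy pair [1.0, -1.45]
```

When neither vehicle changes speed they reach the conflict point together, whereas the pair [1.0, -1.45] separates their arrival times and leaves a clear gap.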
  • The passability cost is related to the time needed to pass the collision point. If the vehicle decelerates, the time to pass the collision point becomes longer and the passability cost increases; conversely, if the vehicle accelerates, the time to pass the collision point becomes shorter and the passability cost decreases.
  • The quantitative relationship is shown in Figure 9. The horizontal axis in Figure 9 is deltaPassTime, the difference between samplePassTime, the time to pass the collision point calculated with the sampled acceleration in the strategy space, and realPassTime, the time to pass the collision point with the currently observed speed and acceleration; the vertical axis is the passability cost.
  • 100 is the social vehicle passability cost weight.
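The quantity deltaPassTime on the horizontal axis of Figure 9 can be computed with constant-acceleration kinematics, as in the sketch below. The sign convention (samplePassTime minus realPassTime) and the function names are assumptions; the mapping from deltaPassTime to the passability cost is given by Figure 9 and is not reproduced here.

```python
import math


def time_to_reach(distance: float, speed: float, accel: float) -> float:
    """Earliest t >= 0 with speed*t + 0.5*accel*t**2 == distance, or inf if never reached."""
    if distance <= 0.0:
        return 0.0
    if accel == 0.0:
        return distance / speed if speed > 0.0 else math.inf
    disc = speed * speed + 2.0 * accel * distance
    if disc < 0.0:
        return math.inf  # the vehicle stops before the collision point
    t = (-speed + math.sqrt(disc)) / accel
    return t if t >= 0.0 else math.inf


def delta_pass_time(distance: float, speed: float, observed_accel: float, sampled_accel: float) -> float:
    """samplePassTime - realPassTime (assumed sign); positive means the sampled strategy is slower."""
    sample_pass_time = time_to_reach(distance, speed, sampled_accel)
    real_pass_time = time_to_reach(distance, speed, observed_accel)
    return sample_pass_time - real_pass_time


# A vehicle 20 m from the collision point at 10 m/s, currently not accelerating.
faster = delta_pass_time(20.0, 10.0, 0.0, 1.0)    # sampled acceleration shortens the pass time
slower = delta_pass_time(20.0, 10.0, 0.0, -1.45)  # sampled deceleration lengthens it
```

A negative value corresponds to the lower-cost accelerating case and a positive value to the higher-cost decelerating case described in the text.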
  • the prior probability cost of the game object is related to the probability that the game object allows the vehicle to pass. The greater the probability of giving way, the smaller the prior probability cost of the game object.
  • the quantitative relationship is shown in Figure 10.
  • The right-of-way ratio of the scene = f(distance from the social vehicle to the conflict point), a nonlinear function with range [0.0, 1.0].
  • The right-of-way ratio of the scene describes the extent to which the social vehicle holds the right of way, and it depends on the distance between the social vehicle and the conflict point. The specific quantitative relationship is shown in Figure 11. If the distance between the social vehicle and the conflict point is less than 5 m, the right-of-way ratio of the scene is 1; if the distance is greater than 5 m, the ratio decreases as the distance increases; if the distance is greater than 100 m, the right-of-way ratio of the scene is 0.
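The piecewise behaviour described above can be sketched as a function. The 5 m and 100 m breakpoints come from the text; the shape of the curve between them is defined by Figure 11 and is not reproduced here, so the cosine ramp below is purely a placeholder assumption for a smooth, nonlinear, decreasing segment. The function name is hypothetical.

```python
import math


def right_of_way_ratio(distance_m: float, near: float = 5.0, far: float = 100.0) -> float:
    """Right-of-way ratio in [0.0, 1.0] as a function of distance to the conflict point."""
    if distance_m <= near:
        return 1.0
    if distance_m >= far:
        return 0.0
    # Placeholder shape (assumption): cosine ramp from 1 at `near` down to 0 at `far`.
    x = (distance_m - near) / (far - near)
    return 0.5 * (1.0 + math.cos(math.pi * x))


close = right_of_way_ratio(3.0)
mid_range = right_of_way_ratio(50.0)
distant = right_of_way_ratio(150.0)
```

Any other monotonically decreasing curve between the two breakpoints would satisfy the description equally well.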
  • the historical-decision association cost is introduced to prevent the decision from flipping between two consecutive frames. If the previous frame's decision was to rush, the rushing cost of the current frame is reduced, making a rushing decision easier to output; if the previous frame's decision was to yield, the yielding cost of the current frame is reduced, making a yielding decision easier to output. The quantitative relationship between the historical-decision association cost and the rushing or yielding cost of each frame is shown in Figure 12. If frame K was decided as rushing, then in frame K+1 the association cost is reduced for a strategy evaluated as rushing and increased for a strategy evaluated as yielding.
  • each item in the formula is the time domain safety cost, the road right cost, the space domain safety cost, the self-vehicle comfort cost, the game car comfort cost, the inter-frame correlation cost, the prior probability cost and the passability cost.
  • the cost values in Table 2 are set in normal, bold and italic fonts. When the ego vehicle reaches the conflict point before the game vehicle, the ego vehicle's behavior is rushing (values in normal font are the strategy costs of rushing strategies); when the ego vehicle reaches the conflict point after the game vehicle, the ego vehicle's behavior is yielding (values in italic font are the strategy costs of yielding strategies); when both the ego vehicle and the game vehicle stop before reaching the conflict point, both are yielding (values in bold font are the strategy costs of strategies in which both yield).
  • step 2 generates all action combinations in the effective action space of the ego vehicle and the game vehicle, together with the total cost of each combination. Each sub-cost of these combinations is then evaluated and screened, and all reasonable candidate combinations are selected to form the feasible region of the ego vehicle with respect to the game object. Sub-costs such as comfort and right of way are considered valid when they fall within the set interval.
  • “deletion-1” marks an action invalidated by the time-domain safety cost
  • “deletion-2” marks an action invalidated by the space-domain safety cost
  • “deletion-3” marks an action invalidated by the passability cost.
  • the feasible region of the ego vehicle with respect to the game object is the set of valid acceleration combinations in the table
  • within the feasible region, the game strategy with the smallest total cost (Total) is selected as the final decision result, that is, the strategy with the smallest Total gives the greatest global benefit
  • the game decision module sends the optimal acceleration that the game object will take to the downstream motion planning module, and the motion planning module plans according to the acceleration value.
  • the ego vehicle has no right of way (traffic rules stipulate that turning vehicles yield to vehicles going straight), and the obstacle is labeled "yield" throughout the process.
  • the acceleration at which the game object will rush is calculated and sent to the motion planner, and the ego vehicle's acceleration is selected from the corresponding acceleration combination in the feasible region, so that the yielding action is planned accurately.
  • when the ego vehicle and the game object have just entered the intersection, the ego vehicle has a large feasible region with respect to the game object: the interaction can be completed either by rushing with an acceleration in the range [0, 2] m/s² or by yielding with an acceleration in the range (-4, 0) m/s².
  • when the acceleration strategy pair (-3, 0.45) m/s² is selected, the game pair formed by the ego vehicle and the game object has an optimal solution.
  • the main advantage of the solution is that it follows the right-of-way description while maintaining adequate safety and comfort.
  • the result is that the obstacle will rush ahead at an acceleration of 0.45 m/s²; this acceleration value is sent to the motion planning layer, and the motion planning module corrects the predicted trajectory of the obstacle accordingly.
  • speed planning is then carried out on the station-time (ST) occupancy diagram of Fig. 13, which plots the obstacle's longitudinal position on the ego vehicle's path against time, to achieve a safe yield.
  • the above process mainly introduces how the game is performed in each frame.
  • the ego car gives way to the game object.
  • the safety cost of the ego vehicle will continue to increase. Therefore, in the process of selecting the optimal game result, the ego vehicle will inevitably keep yielding until the game object has passed the trajectory intersection point, at which point the game is over.
  • the preemption/yield decision scheme for obstacles in the first embodiment of the present application does not depend on the specific obstacle interaction form and trajectory intersection features, but uses the sensor system to obtain traffic scene information, and reasonably abstracts the traffic scene.
  • the application scenarios are thereby generalized. At the same time, the scheme obtains the acceleration at which the ego vehicle should rush/yield and the acceleration at which the obstacle will rush/yield, and uses these values to influence motion planning so that decision instructions are executed correctly.
  • the ego vehicle turns left at an unprotected intersection; social vehicle A and social vehicle B travel straight in the opposite lane, and social vehicle C travels in the same direction ahead of the ego vehicle on its driving path.
  • the ego vehicle follows social vehicle C, and forms a game relationship with social vehicles A and B.
  • the speed of the ego vehicle is 8km/h
  • the speed of the vehicle A is 14km/h
  • the speed of the vehicle B is 14km/h
  • the distance from the vehicle to the intersection with the vehicle A is 21m
  • the speed of the vehicle to the vehicle B is 21m.
  • the distance from the intersection point of the vehicle is 25m
  • the distance from the vehicle A to the intersection point with the vehicle is 35m
  • the distance from vehicle B to the intersection point with the vehicle is 24m
  • the feedforward acceleration of the vehicle is 0.22m/s 2
  • the observed acceleration of the vehicle is 0.0m/ s 2
  • the observed acceleration of the social car is 0.0m/s 2
  • the static speed limit of the road is 40km/h
  • the speed limit of the path curvature is 25km/h.
  • the speed of the social vehicle C is 10 km/h
  • the acceleration is 0.0 m/s 2
  • the distance from the rear of the vehicle to the front of the vehicle is 15 m.
  • the allowable acceleration sampling interval for self-vehicle is [-4,2]m/s 2
  • the allowable acceleration sampling interval for social vehicle A and social vehicle B is [-3.55,3.0]m/s 2 .
  • the acceleration interval is set as 1m/s 2 .
  • a single-vehicle game is played against social vehicle A and against social vehicle B respectively, with the cost function design, weight assignment and feasible-region selection method consistent with those in the first embodiment.
  • the feasible regions corresponding to social vehicles A and B can thus be obtained respectively.
  • the feasible domains of self-vehicle and social vehicle A are shown in Table 4.
  • the cost values in Table 2 are divided into normal fonts, bold fonts and oblique fonts. The value is expressed as both the ego car and the game car stop before the point of conflict.
  • the feasible regions [0.45,-1] and [-1.55,-2] of the self-vehicle and the social vehicle A are the optimal costs in the full set of preemptive or yielding costs.
  • the cost values in Table 2 are divided into normal fonts, bold fonts and oblique fonts. The value is expressed as both the ego car and the game car stop before the point of conflict.
  • the feasible regions [0.45,-1] and [-3.55,-2] of the self-vehicle and the social vehicle A are the optimal costs in the full set of preemptive or yielding costs.
  • the decision of the ego vehicle must not create a risk with respect to social vehicle C. The feasible acceleration region of the ego vehicle therefore needs to be estimated from the speed and acceleration of the ego vehicle, the speed and acceleration of social vehicle C, and the distance from the ego vehicle to social vehicle C.
  • this part is realized by the longitudinal planning module, and its calculation model is:
  • accUpLimit is the upper limit of the acceleration decision
  • objV is the obstacle speed
  • egoV is the ego vehicle speed
  • speedGain and distTimeGain are adjustable parameters.
  • the upper bound of the acceleration of the ego vehicle is 0.8m/s 2
  • the feasible region is [-4,0.8]m/s 2 .
  • the final comprehensive optimal solution for the two vehicles is: yield to social vehicle A (whose expected acceleration is 0.45 m/s²) and yield to social vehicle B (whose expected acceleration is 0.45 m/s²), with an optimal expected acceleration of -1.0 m/s² for the ego vehicle.
  • the comprehensive decision-making of this conflict resolution can obtain the greatest overall benefit and ensure that the decision-making result is feasible for each obstacle.
  • the ego vehicle will make a comprehensive decision based on all the obstacles considered, and obtain the optimal game result that can satisfy multiple obstacles at the same time.
  • the vehicle on the planned path of the self-vehicle constitutes a virtual wall constraint on the self-vehicle, that is, a feasible region corresponding to an acceleration range of [-4.0, 0.8].
  • the self-vehicle makes game decisions on the social vehicle A and the social vehicle B respectively, and the self-vehicle feasible domain for each vehicle can be obtained. The intersection of these three feasible domains is obtained, and the feasible domain that satisfies all obstacles in the scene is obtained.
  • the optimal solution of the ego vehicle for all game obstacles is obtained, namely a yielding result with respect to each of social vehicles A and B.
  • the second embodiment mainly solves the multi-objective scene game problem.
  • the corresponding feasible region is estimated based on the game type.
  • the feasible region is directly obtained, and the optimal solution for all game objects is solved in the feasible region to achieve the consistency of multi-objective decision-making.
  • the feasible region of the sampling space is first obtained for each game vehicle, then the feasible region of the ego vehicle with respect to the non-game vehicles is estimated, and finally the common feasible region of the two is taken and the optimal solution is computed within it, which gives the global optimal solution for the ego vehicle and multiple social vehicles.
  • FIG. 15 is a schematic flowchart of a decision-making method provided by an embodiment of the present application. As shown in FIG. 15 , an embodiment of the present application provides a decision-making method, and the specific implementation process is as follows:
  • in step S1501, the predicted motion trajectories of the ego vehicle and of each obstacle around the ego vehicle are obtained.
  • the predicted motion trajectories can be obtained from the data collected by the GPS unit, INS unit, odometer, camera, radar and other sensors in the sensor system: information such as the position of the vehicle, the environment around the vehicle and the state of the vehicle is acquired and then processed, so that the paths the ego vehicle and each surrounding obstacle will travel over a future period of time can be predicted.
  • Step S1503 determine the game object.
  • the game object is an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than the set threshold.
  • the present application determines whether the predicted motion trajectories of the ego vehicle and each obstacle intersect, or determines from the predicted motion trajectory, driving trajectory, speed, acceleration and other data of the ego vehicle and each obstacle whether the distance between the position of the obstacle and the position of the ego vehicle is less than the set threshold. If the predicted motion trajectory of an obstacle intersects that of the ego vehicle, or the distance between the two is less than the set threshold, the obstacle is classified as a game object; all other obstacles are classified as non-game objects.
  • Step S1505 construct a sampling game space for each game object respectively according to the vehicle information of the own vehicle, the obstacle information and the road condition information of the game object collected by the sensor system.
  • each sampling game space is a set of different game strategies adopted between the ego vehicle and an obstacle in the game object.
  • This application determines the game strategy range of the ego vehicle and each game object, such as the acceleration range and speed range of the ego vehicle, according to factors such as the predefined game mode, the road condition information, and the motion capability of the ego vehicle and each obstacle. Within this range, feasible game strategies of the ego vehicle and of each game object are sampled, giving the number of feasible strategies on each side; combining the ego vehicle's feasible strategies with those of each game object yields a game strategy space of multiple different combinations.
  • taking the game mode of changing acceleration as an example: based on received data such as the distances of the ego vehicle and a game object from the location of the theoretical collision, the maximum and minimum accelerations of the vehicles, the speed of the ego vehicle and the maximum road speed limit, the different game strategies that the ego vehicle and the game object can obtain by changing their acceleration values are determined, and this set of strategies is used as the game strategy range. Then, with the set sampling method, n acceleration values of the ego vehicle and m acceleration values of the game object are selected, giving a game strategy space of n × m possible combinations.
  • the factors that affect the cost of the strategy include safety, comfort, passing efficiency, right of way, probability of letting obstacles pass, historical decision-making methods, and so on. Therefore, when calculating the strategy cost of each game strategy, the cost of each game strategy can be obtained by calculating the cost of each factor and then weighting the cost of each factor.
  • the present application determines which of the ego car and each obstacle arrives at the conflict point first.
  • if, under the decision strategies of the ego vehicle and the obstacle in a game strategy, the ego vehicle reaches the conflict point before the obstacle, the ego vehicle's behavior is rushing and the game strategy is labeled "ego vehicle rushing";
  • if the ego vehicle reaches the conflict point after the obstacle, the ego vehicle's behavior is yielding and the game strategy is labeled "ego vehicle yielding";
  • if both the ego vehicle and the obstacle stop before the conflict point, both are yielding and the game strategy is labeled "ego vehicle and obstacle both yielding".
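The labeling rule can be sketched by comparing the times at which each side reaches the conflict point. The sketch assumes constant acceleration for each side and treats a vehicle that brakes to a stop before the conflict point as never arriving; `time_to_conflict`, `behavior_label` and the label strings are hypothetical names chosen for illustration.

```python
import math


def time_to_conflict(dist, v, a):
    """Time (s) to cover dist (m) from speed v (m/s) at constant acceleration a (m/s^2).

    Returns math.inf when the vehicle stops before reaching the conflict point.
    """
    if dist <= 0.0:
        return 0.0
    if abs(a) < 1e-9:
        return dist / v if v > 0.0 else math.inf
    disc = v * v + 2.0 * a * dist  # discriminant of dist = v*t + 0.5*a*t^2
    if disc < 0.0:
        return math.inf  # braking vehicle stops short of the conflict point
    return (-v + math.sqrt(disc)) / a  # earliest arrival time


def behavior_label(ego, obj):
    """ego and obj are (distance_to_conflict, speed, sampled_acceleration) tuples."""
    t_ego = time_to_conflict(*ego)
    t_obj = time_to_conflict(*obj)
    if math.isinf(t_ego) and math.isinf(t_obj):
        return "both yield"
    # A tie is resolved conservatively as the ego vehicle yielding (an assumption).
    return "ego rushes" if t_ego < t_obj else "ego yields"
```

With the speeds and distances of the trajectory-conflict example in the first embodiment (17 km/h and 20.11 m for the ego vehicle, 15 km/h and 35.92 m for the game vehicle), `behavior_label((20.11, 17 / 3.6, -3.0), (35.92, 15 / 3.6, 0.45))` evaluates to `"ego yields"` under these assumptions.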
  • step S1509 the decision result of the self-vehicle is determined.
  • the decision result is the game strategy with the smallest strategy cost among the same game strategies in each sampling game space.
  • the factors weighted into the cost of each game strategy are evaluated for rationality and screened, and game strategies that include an unreasonable factor are deleted, so that the remaining reasonable strategies form the feasible region of the ego vehicle and the game object.
  • after the feasible regions of the ego vehicle and each game object are obtained, their intersection is computed, giving the public feasible region that fits the current scenario in which the ego vehicle faces multiple game objects; the game strategy with the smallest game cost in the public feasible region is then selected as the decision result.
  • the feasible region of the ego vehicle with respect to the constraint region formed by the non-game objects should be estimated.
  • for longitudinal action games (such as rushing/yielding), a virtual wall is created in front of the ego vehicle as an upper-limit constraint on acceleration; for lateral action games, the ego vehicle uses the maximum lateral offset range formed by the non-game objects as a constraint, so as to construct a feasible region between the ego vehicle and the non-game objects.
  • the intersection of the public feasible region between the ego vehicle and each game object and the feasible region between the ego vehicle and the non-game objects is computed, and the game strategy with the smallest game cost in the intersection is selected as the decision result; if the intersection contains no game strategy, the "ego vehicle yields" decision result is selected in accordance with the "safety" principle.
  • the game object is determined by obtaining the predicted motion trajectories of the ego vehicle and each obstacle around it and judging whether the predicted trajectories intersect or whether the distance between the two vehicles is less than a set threshold; the sampling game space between the ego vehicle and each obstacle is then constructed, and the strategy cost of each game strategy in each sampling game space is calculated; by finding the game strategies common to all sampling game spaces, the common strategy with the smallest strategy cost is selected as the game result.
  • because the scheme does not depend on the scene, it can be adapted to all scenes.
  • when facing multiple game objects, the ego vehicle can play games with them at the same time.
  • FIG. 16 is a schematic structural diagram of a decision-making apparatus provided by an embodiment of the present application.
  • the apparatus 1600 shown in FIG. 16 includes a transceiver unit 1601 and a processing unit 1602 . Specifically perform the following functions:
  • the transceiver unit 1601 is configured to obtain the predicted motion trajectories of the ego vehicle and of each obstacle around the ego vehicle; the processing unit 1602 is configured to determine game objects, a game object being an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than the set threshold; and to construct a sampling game space for each game object according to the vehicle information of the ego vehicle collected by the sensor system, the obstacle information of the game objects and the road condition information.
  • each sampling game space is a set of the different game strategies adopted between the ego vehicle and one obstacle among the game objects; the processing unit 1602 is further configured to calculate the strategy cost of each game strategy, the strategy cost being a value obtained by weighting the factors that affect it, and to determine the decision result of the ego vehicle, the decision result being the game strategy with the smallest strategy cost in the common sampling game space, where the common sampling game space includes at least one game strategy and every sampling game space includes the game strategies of the common sampling game space.
  • the processing unit 1602 is specifically configured to construct a feasible region of each sampled game space, and the feasible region of each sampled game space is at least one game strategy corresponding to a strategy cost that meets the set requirements; From the intersection of feasible domains of sampling game space, determine the game strategy with the smallest strategy cost in the same game strategy.
  • the processing unit 1602 is specifically configured to determine a decision upper limit and a decision lower limit for the ego vehicle and for each obstacle among the game objects according to the vehicle information of the ego vehicle, the obstacle information of the game objects and the road condition information; to obtain, according to set rules, the decision strategies of the ego vehicle and of each obstacle among the game objects from between their decision upper and lower limits; and to combine the decision strategies of the ego vehicle with those of each obstacle among the game objects to obtain the at least one game strategy of the ego vehicle and each obstacle among the game objects.
  • the processing unit 1602 is further configured to determine the behavior label of each game strategy according to the distances of the ego vehicle and the game object from the conflict point and the at least one game strategy of the ego vehicle and each obstacle among the game objects, where the conflict point is the position at which the predicted motion trajectories of the ego vehicle and the obstacle intersect, or the position at which the distance between the ego vehicle and the obstacle is less than the set threshold, and the behavior label includes at least one of: ego vehicle yields, ego vehicle rushes, and both the ego vehicle and the obstacle yield.
  • the present invention provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed in a computer, the computer is made to execute any one of the above methods.
  • various storage media described herein can represent one or more devices and/or other machine-readable media for storing information.
  • the term "machine-readable medium” may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium can be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media.
  • the available media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media (eg, solid state disks (SSDs)), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Traffic Control Systems (AREA)

Abstract

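A decision-making method, comprising: obtaining predicted motion trajectories of an ego vehicle and of each obstacle around the ego vehicle; determining game objects, a game object being an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than a set threshold; constructing a sampling game space for each game object according to vehicle information of the ego vehicle collected by a sensor system, obstacle information of the game objects and road condition information, each sampling game space including at least one game strategy; calculating a strategy cost of each game strategy, the strategy cost being a value obtained by weighting the factors of the strategy cost; and determining a decision result of the ego vehicle, the decision result being the game strategy with the smallest strategy cost in a common sampling game space, where the common sampling game space includes at least one game strategy and every sampling game space includes the game strategies of the common sampling game space. A decision-making apparatus, a vehicle, an intelligent driving system, a computer-readable storage medium, a computing device and a computer program product are also disclosed.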

Description

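A decision-making method, apparatus and vehicle
This application claims priority to Chinese patent application No. 202110454337.X, entitled "A decision-making method, apparatus and vehicle", filed with the China National Intellectual Property Administration on April 26, 2021, which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of intelligent driving technology, and in particular to a decision-making method, apparatus and vehicle.
Background
With the development and spread of intelligent technology, intelligent driving of vehicles has become a popular research direction. According to functional requirements, an intelligent driving system can be divided into four key functional modules: positioning, environment perception, path planning, and decision control. For the decision control module, manufacturers have proposed decision and planning methods for different scenarios. These mainly fall into high-level semantic decisions (such as lane-change decisions and lane-keeping decisions) and object-level obstacle decisions (such as avoidance, following, rushing and yielding decisions).
In object-level obstacle decision-making, existing approaches that plan a driving route for the vehicle by detecting obstacle types can only handle specific scenarios. These approaches usually describe the target scenario quantitatively and then extract key information about key obstacles to make the decision. As a result, they generalize poorly across traffic scenarios and cannot handle the obstacle environment in other scenarios.
Summary
To solve the above problems, embodiments of this application provide a decision-making method, apparatus and vehicle.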
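In a first aspect, this application provides a decision-making method, comprising: obtaining predicted motion trajectories of an ego vehicle and of each obstacle around the ego vehicle; determining game objects, a game object being an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than a set threshold; constructing a sampling game space for each game object according to vehicle information of the ego vehicle collected by a sensor system, obstacle information of the game objects and road condition information, each sampling game space including at least one game strategy; calculating a strategy cost of each game strategy, the strategy cost being a value obtained by weighting the factors of the strategy cost; and determining a decision result of the ego vehicle, the decision result being the game strategy with the smallest strategy cost in a common sampling game space, where the common sampling game space includes at least one game strategy and every sampling game space includes the game strategies of the common sampling game space.
In this implementation, game objects are determined by obtaining the predicted motion trajectories of the ego vehicle and of each surrounding obstacle and judging whether the predicted trajectories intersect or whether the distance between the two vehicles is less than the set threshold. A sampling game space is then constructed between the ego vehicle and each obstacle, and the strategy cost of every game strategy in every sampling game space is calculated. The game strategies common to all sampling game spaces are found, and the common strategy with the smallest strategy cost is selected as the game result. Because the scheme does not depend on the scenario, it is applicable to all scenarios. In addition, when facing multiple game objects, finding the strategies common to all sampling game spaces allows the ego vehicle to play against multiple game objects simultaneously.
In one implementation, determining the decision result of the ego vehicle comprises: constructing a feasible region of each sampling game space, the feasible region of a sampling game space being the at least one game strategy whose strategy cost meets set requirements; and determining, in the intersection of the feasible regions of all sampling game spaces, the common game strategy with the smallest strategy cost.
In this implementation, unlike the prior art, which directly outputs the optimal game strategy, this application outputs every game strategy that meets the requirements and constructs a feasible region between the ego vehicle and each obstacle, so that conflicts among multiple game objects can be resolved on the basis of the feasible regions and the output game result is more reasonable.
In one implementation, the method further comprises: determining non-game objects, a non-game object being an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory does not intersect that of the ego vehicle or whose distance from the ego vehicle is not less than the set threshold; constructing a feasible region of the ego vehicle according to the vehicle information of the ego vehicle collected by the sensor system, obstacle information of the non-game objects and road condition information, the feasible region of the ego vehicle being at least one strategy among the different decisions the ego vehicle can take without colliding with the non-game objects; and, upon detecting that the decision result of the ego vehicle is within the feasible region of the ego vehicle, outputting the decision result of the ego vehicle.
In this implementation, a feasible region between the ego vehicle and the non-game objects is constructed and intersected with the feasible regions between the ego vehicle and each game object, and the game strategy with the smallest game cost is selected from the intersection as the decision result, which ensures that the selected decision result is applicable to scenarios containing both game objects and non-game objects.
In one implementation, constructing a sampling game space for each game object according to the vehicle information of the ego vehicle collected by the sensor system, the obstacle information of the game objects and the road condition information comprises: determining a decision upper limit and a decision lower limit for the ego vehicle and for each obstacle among the game objects according to the vehicle information of the ego vehicle, the obstacle information of the game objects and the road condition information; obtaining, according to set rules, decision strategies of the ego vehicle and of each obstacle among the game objects from between their decision upper limits and decision lower limits; and combining the decision strategies of the ego vehicle with those of each obstacle among the game objects to obtain at least one game strategy of the ego vehicle and each obstacle among the game objects.
In this implementation, the selection range and selection method of the game strategies of the ego vehicle and each game object are determined to obtain their game strategies, and the game strategies of the ego vehicle are then combined with those of each game object to obtain the set of game strategies of the ego vehicle and each game object, which ensures that the game strategies in each sampling game space are reasonable.
In one implementation, the method further comprises: determining a behavior label of each game strategy according to the distances of the ego vehicle and the game object from a conflict point and the at least one game strategy of the ego vehicle and each obstacle among the game objects, where the conflict point is the position at which the predicted motion trajectories of the ego vehicle and the obstacle intersect or the position at which the distance between the ego vehicle and the obstacle is less than the set threshold, and the behavior label includes at least one of: ego vehicle yields, ego vehicle rushes, and both the ego vehicle and the obstacle yield.
In this implementation, each game strategy is labeled so that, once the game result has been selected, the label of the game strategy can be sent directly to the execution unit in the next layer. There is no need to analyze the actions taken by both sides in the strategy to work out whether the ego vehicle should yield, rush, or yield together with the obstacle, which greatly reduces decision time and improves user experience.
In one implementation, calculating the strategy cost of each game strategy comprises: determining the factors of the strategy cost, the factors including at least one of safety, comfort, passing efficiency, right of way, prior probability of the obstacle, and historical-decision association; calculating a factor cost of each factor of each strategy cost; and weighting the factor costs of each strategy cost to obtain the strategy cost of each game strategy.
In this implementation, the strategy cost of each game strategy can be obtained by calculating the cost of each factor and then computing a weighted sum of the factor costs, so that the reasonableness of each game strategy can be determined.
In one implementation, after calculating the strategy cost of each game strategy, the method further comprises: checking whether each factor of a strategy cost is within a set range; and deleting the game strategy corresponding to any strategy cost that includes a factor outside the set range.
In this implementation, unreasonable game strategies are deleted so that an unreasonable strategy cannot later be selected as the result, which would make the decision result unexecutable or wrong and reduce the reliability of the decision-making method.
In one implementation, the method further comprises: upon detecting that the decision result of the ego vehicle is not within the feasible region of the ego vehicle, outputting a decision result of the ego vehicle yielding.
In this implementation, if the decision result is not within the feasible region of the ego vehicle, none of the decision results meets the conditions and the ego vehicle would output no decision result. This would be equivalent to the ego vehicle not having played the game at all, which is a serious defect. Therefore, when no decision result can be determined, "ego vehicle yields" is selected as the decision result in accordance with the "safety" principle, which ensures that the selected decision result keeps the ego vehicle safe while driving.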
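In a second aspect, this application provides a decision-making apparatus, comprising: a transceiver unit configured to obtain predicted motion trajectories of an ego vehicle and of each obstacle around the ego vehicle; and a processing unit configured to: determine game objects, a game object being an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than a set threshold; construct a sampling game space for each game object according to vehicle information of the ego vehicle collected by a sensor system, obstacle information of the game objects and road condition information, each sampling game space including at least one game strategy; calculate a strategy cost of each game strategy, the strategy cost being a value obtained by weighting the factors of the strategy cost; and determine a decision result of the ego vehicle, the decision result being the game strategy with the smallest strategy cost in a common sampling game space, where the common sampling game space includes at least one game strategy and every sampling game space includes the game strategies of the common sampling game space.
In one implementation, the processing unit is specifically configured to construct a feasible region of each sampling game space, the feasible region of a sampling game space being the at least one game strategy whose strategy cost meets set requirements; and to determine, in the intersection of the feasible regions of all sampling game spaces, the common game strategy with the smallest strategy cost.
In one implementation, the processing unit is further configured to determine non-game objects, a non-game object being an obstacle, among the obstacles around the ego vehicle, whose predicted motion trajectory does not intersect that of the ego vehicle or whose distance from the ego vehicle is not less than the set threshold; to construct a feasible region of the ego vehicle according to the vehicle information of the ego vehicle collected by the sensor system, obstacle information of the non-game objects and road condition information, the feasible region of the ego vehicle being at least one strategy among the different decisions the ego vehicle can take without colliding with the non-game objects; and, upon detecting that the decision result of the ego vehicle is within the feasible region of the ego vehicle, to output the decision result of the ego vehicle.
In one implementation, the processing unit is specifically configured to determine a decision upper limit and a decision lower limit for the ego vehicle and for each obstacle among the game objects according to the vehicle information of the ego vehicle, the obstacle information of the game objects and the road condition information; to obtain, according to set rules, decision strategies of the ego vehicle and of each obstacle among the game objects from between their decision upper limits and decision lower limits; and to combine the decision strategies of the ego vehicle with those of each obstacle among the game objects to obtain the at least one game strategy of the ego vehicle and each obstacle among the game objects.
In one implementation, the processing unit is further configured to determine a behavior label of each game strategy according to the distances of the ego vehicle and the game object from a conflict point and the at least one game strategy of the ego vehicle and each obstacle among the game objects, where the conflict point is the position at which the predicted motion trajectories of the ego vehicle and the obstacle intersect or the position at which the distance between the ego vehicle and the obstacle is less than the set threshold, and the behavior label includes at least one of: ego vehicle yields, ego vehicle rushes, and both the ego vehicle and the obstacle yield.
In one implementation, the processing unit is specifically configured to determine the factors of the strategy cost, the factors including at least one of safety, comfort, passing efficiency, right of way, prior probability of the obstacle, and historical-decision association; to calculate a factor cost of each factor of each strategy cost; and to weight the factor costs of each strategy cost to obtain the strategy cost of each game strategy.
In one implementation, the processing unit is further configured to check whether each factor of a strategy cost is within a set range, and to delete the game strategy corresponding to any strategy cost that includes a factor outside the set range.
In one implementation, the processing unit is further configured, upon detecting that the decision result of the ego vehicle is not within the feasible region of the ego vehicle, to output a decision result of the ego vehicle yielding.
In a third aspect, this application provides an intelligent driving system, comprising at least one processor configured to execute instructions stored in a memory to perform any possible implementation of the first aspect.
In a fourth aspect, this application provides a vehicle, comprising at least one processor configured to perform any possible implementation of the first aspect.
In a fifth aspect, this application provides an intelligent driving system, comprising a sensor system and a processor, the processor being configured to perform any possible implementation of the first aspect.
In a sixth aspect, this application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform any possible implementation of the first aspect.
In a seventh aspect, this application provides a computing device, comprising a memory and a processor, wherein the memory stores executable code and the processor, when executing the executable code, implements any possible implementation of the first aspect.
In an eighth aspect, this application provides a computer program product storing instructions which, when executed by a computer, cause the computer to implement any possible implementation of the first aspect.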
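Brief Description of the Drawings
The drawings used in the description of the embodiments or of the prior art are briefly introduced below.
FIG. 1 is a schematic architecture diagram of an intelligent driving system provided by an embodiment of this application;
FIG. 2 is a schematic architecture diagram of a decision module provided by an embodiment of this application;
FIG. 3 is a schematic diagram of four common scenarios between the ego vehicle and obstacles provided by an embodiment of this application;
FIG. 4 is a schematic diagram of a scenario between the ego vehicle and non-game objects provided by an embodiment of this application;
FIG. 5 is a schematic diagram of a trajectory-conflict scenario between the ego vehicle and a game object provided by Embodiment 1 of this application;
FIG. 6 is a schematic diagram of the functional relationship between the time-domain safety cost and the absolute value of TDTC provided by Embodiment 1 of this application;
FIG. 7 is a schematic diagram of the functional relationship between the space-domain safety cost and the minimum distance between the two vehicles provided by Embodiment 1 of this application;
FIG. 8 is a schematic diagram of the functional relationship between the comfort cost and the change in acceleration provided by Embodiment 1 of this application;
FIG. 9 is a schematic diagram of the functional relationship between the passability cost and the time to pass the collision point provided by Embodiment 1 of this application;
FIG. 10 is a schematic diagram of the functional relationship between the prior-probability cost of the game object rushing and the probability that the game object lets the ego vehicle pass, provided by Embodiment 1 of this application;
FIG. 11 is a schematic diagram of the right-of-way ratio as a function of the distance from the social vehicle to the conflict point, provided by Embodiment 1 of this application;
FIG. 12 is a schematic diagram of the functional relationship between the historical-decision association cost and the rushing cost or yielding cost of each frame, provided by Embodiment 1 of this application;
FIG. 13 is the occupancy diagram of longitudinal distance versus time used by the motion planning module for longitudinal planning, provided by Embodiment 1 of this application;
FIG. 14 is a schematic diagram of multi-vehicle conflict resolution provided by Embodiment 2 of this application;
FIG. 15 is a schematic flowchart of a decision-making method provided by an embodiment of this application;
FIG. 16 is a structural block diagram of a decision-making apparatus provided by an embodiment of this application.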
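Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the drawings in the embodiments.
An intelligent driving system uses sensors to detect the surrounding environment and the vehicle's own state, such as navigation and positioning information, road information, information about obstacles such as other vehicles and pedestrians, and the vehicle's own pose and motion state; after a decision and planning algorithm, it precisely controls the vehicle's speed and steering to realize automated driving. As shown in FIG. 1, according to the functional requirements of the intelligent driving system 100, the system 100 can be divided into a prediction module 10, a navigation module 20, a decision module 30, a planning module 40 and a control module 50.
The prediction module 10 obtains information such as the position of the vehicle, the environment around the vehicle and the vehicle state from data collected by sensors in the sensor system, such as a global positioning system (GPS) unit, an inertial navigation system (INS) unit, an odometer, cameras and radar, and predicts the paths that the ego vehicle and each obstacle around it will travel over a future period of time.
The navigation module 20 may be an in-vehicle navigation system, a navigation application (APP) on an external terminal, or the like, and provides the ego vehicle's navigation route and road condition information such as lane lines, traffic lights and junctions along the route.
The decision module 30 receives the paths predicted by the prediction module 10 for the ego vehicle and the other vehicles around it over a future period of time, together with the navigation route and the road condition information (lane lines, traffic lights, junctions and so on) provided by the navigation module 20, and judges whether the ego vehicle will conflict with an obstacle when driving along the predicted route (or the navigation route). If the ego vehicle does not conflict with an obstacle, no game is played between them and the motion mode and trajectory are determined by set rules; if the ego vehicle conflicts with obstacles, the game result between the ego vehicle and each obstacle is calculated from the input data, and each obstacle is given a behavior label such as yield/rush or avoid/follow.
The planning module 40 receives the decision result output by the decision module 30 and, according to the behavior labels of the obstacles, determines the action to take for each obstacle (yield/rush, avoid/follow), for example which lane the ego vehicle selects, whether to change lanes, whether to follow a vehicle, whether to go around, and whether to stop.
The control module 50 tracks the planning result issued by the planning module 40 and controls the ego vehicle to reach the desired speed and steering angle.
The technical solution of this application is described below taking the decision module 30 as an example. Object decision-making means that, during autonomous navigation, an automated vehicle must make a decision about each obstacle in the environment and attach a behavior label to it; for example, if it decides to go around, follow or rush past an obstacle, the obstacle is labeled accordingly.
FIG. 2 is a schematic architecture diagram of a decision module provided by an embodiment of this application. As shown in FIG. 2, the decision module 30 includes a game object screening unit 301, a game decision unit 302, a rule decision unit 303 and a conflict handling unit 304.
The game object screening unit 301 judges, from the vehicle information of the ego vehicle, the obstacle information and the road condition information supplied by upstream modules such as the sensor system, the positioning module and the environment perception module, whether the ego vehicle will conflict with an obstacle when driving along the reference path, and divides obstacles into game objects and non-game objects accordingly. A game object is an obstacle that may conflict with the ego vehicle; a non-game object is an obstacle that cannot conflict with the ego vehicle.
In this application, the vehicle information of the ego vehicle includes the navigation route provided by the ego vehicle's navigation module 20 or by a navigation device on an external terminal, and data such as the ego vehicle's speed, acceleration, heading angle and position detected by in-vehicle sensors. The obstacle information includes the position of each obstacle, the distances between obstacles, the distance between each obstacle and the ego vehicle, the type and state of each obstacle, the historical trajectories of the ego vehicle and each obstacle, the predicted trajectories and motion states over a future period of time, and data such as the speed, acceleration and heading angle of each obstacle. The road condition information includes traffic light information, the indications of road signs, and so on.
Exemplarily, after obtaining the vehicle information, obstacle information and road condition information, the game object screening unit 301 determines whether the driving trajectories of the ego vehicle and each obstacle intersect, or determines from the driving trajectories, speeds, accelerations and other data of the ego vehicle and each obstacle whether any obstacle will occupy the same position as the ego vehicle (or a position closer to it than the set threshold). When an obstacle's motion trajectory intersects that of the ego vehicle, or its position coincides with that of the ego vehicle, the obstacle is classified as a game object and its obstacle information is sent to the game decision unit 302; all other obstacles are classified as non-game objects and their obstacle information is sent to the rule decision unit 303.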
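The screening rule of the game object screening unit 301 can be sketched as follows. The sketch assumes that each trajectory is a list of (x, y) points sampled at the same timestamps for every vehicle, checks geometric crossing with a standard orientation test, and uses a hypothetical threshold of 2.0 m; none of the function names come from the application.

```python
import math


def _orient(p, q, r):
    """Signed area of the triangle p, q, r (positive if counter-clockwise)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _segments_cross(a, b, c, d):
    """True if segment ab properly crosses segment cd (touching ends are ignored)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def paths_intersect(path_a, path_b):
    """True if any pair of consecutive-point segments of the two paths cross."""
    return any(_segments_cross(a0, a1, b0, b1)
               for a0, a1 in zip(path_a, path_a[1:])
               for b0, b1 in zip(path_b, path_b[1:]))


def min_same_time_gap(path_a, path_b):
    """Smallest distance between the two vehicles at matching timestamps."""
    return min(math.hypot(ax - bx, ay - by)
               for (ax, ay), (bx, by) in zip(path_a, path_b))


def split_obstacles(ego_path, obstacle_paths, threshold=2.0):
    """Return (game_objects, non_game_objects) as lists of obstacle names."""
    game, non_game = [], []
    for name, path in obstacle_paths.items():
        if paths_intersect(ego_path, path) or min_same_time_gap(ego_path, path) < threshold:
            game.append(name)      # would be sent to the game decision unit 302
        else:
            non_game.append(name)  # would be sent to the rule decision unit 303
    return game, non_game
```

In this sketch a lead vehicle on the same straight path is not flagged as crossing (collinear segments do not count as a proper crossing), so it lands in the non-game list unless it is closer than the threshold.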
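In general, the scenarios formed between the ego vehicle and game objects fall roughly into four types, as shown in FIG. 3 (taking the ego vehicle going straight as an example):
Scenario 1: single-obstacle decision (trajectories intersect), for example the ego vehicle goes straight and the obstacle crosses its path;
Scenario 2: single-obstacle decision (trajectories do not intersect but there is a potential conflict), for example the ego vehicle goes straight and the obstacle merges into an adjacent lane, or into the same lane ahead of the ego vehicle;
Scenario 3: multi-obstacle decision (multiple game objects), for example the ego vehicle goes straight and several obstacles cross its planned path;
Scenario 4: multi-obstacle decision (game objects and non-game objects), for example the ego vehicle goes straight while following a lead vehicle (a non-game object) and an obstacle crosses from the side.
In this application, when handling a multi-obstacle decision, the game decision unit 302 splits it into multiple single-obstacle decisions, determines the feasible region between the ego vehicle and each obstacle, and then extracts the game strategies that are common to every feasible region, that is, the intersection of the feasible regions. If the intersection exists, the optimal game strategy for each game object within the intersection is calculated; if it does not exist, the most conservative decision result for the current situation is output, for example the ego vehicle outputs a "yield" strategy. It is therefore sufficient here to describe how the game decision unit 302 handles the decision between the ego vehicle and a single game object. The implementation process is as follows:
1. Sampling strategy space generation: the game decision unit 302 determines the decision upper limit and decision lower limit of the game strategies of both sides according to the predefined game mode, the road condition information, and the motion capabilities of the ego vehicle and the obstacle, which gives the reasonable game decision range of both sides. Within this range, feasible game strategies are sampled for the ego vehicle and for the obstacle, giving the number of feasible strategies on each side; the feasible strategies of both sides are then combined to obtain a game strategy space of multiple different combinations.
If both the ego vehicle and the game object are vehicles, the operations corresponding to a game strategy can be turning the steering wheel and pressing or releasing the accelerator. Turning the steering wheel changes the vehicle's steering angle, so that the vehicle rushes, avoids and so on by changing its lateral displacement; pressing or releasing the accelerator changes the vehicle's acceleration and speed, so that the vehicle rushes, avoids and so on by changing its longitudinal displacement.
Exemplarily, taking a game strategy of changing acceleration as an example, the game decision unit 302 determines, from received data such as the distances of the ego vehicle and the game object from the location of the theoretical collision, the maximum and minimum accelerations of the vehicles, the speed of the ego vehicle and the maximum road speed limit, the different game strategies that the ego vehicle and the game object can obtain by changing their acceleration values, and takes this set of strategies as the game strategy range. Then, with a set sampling method, n acceleration values of the ego vehicle and m acceleration values of the game object are selected, giving a game strategy space of n × m possible combinations.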
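2. Strategy cost evaluation: the strategy cost that the game decision unit 302 computes for each game strategy depends on factors such as safety, comfort, passing efficiency, right of way, the probability that the obstacle lets the ego vehicle pass, and the historical decision. The strategy cost of each game strategy can therefore be obtained by computing the cost of each factor and then taking a weighted sum. This application analyses the strategy cost using six factors: safety cost, comfort cost, passing-efficiency cost, right-of-way cost, obstacle prior-probability cost, and historical-decision correlation cost, as follows:
(1) Safety cost: during the game, both parties should keep a reasonable safety distance. A large safety cost should be produced when the distance falls below the safety threshold or a collision occurs. The safety cost is inversely related to the distance between the two parties.
(2) Comfort cost: during the game, provided there is no collision, both parties tend to keep their current motion state. A large change of motion state (for example in acceleration or lateral acceleration) degrades the passengers' experience and produces a large comfort cost. The comfort cost increases with the degree of change in the motion state.
(3) Passing-efficiency cost: during the game, both parties tend to pass through the current traffic scene as quickly as possible to complete the game. If the parties take a long time to complete the game, a large passing-efficiency cost is produced. The passing-efficiency cost increases with the time needed to complete the game.
(4) Right-of-way cost: during the game, both parties tend to follow the driving order prescribed by the traffic rules. If a game strategy deviates substantially from the driving rules given by the right-of-way information, a large right-of-way cost is produced. The right-of-way cost is proportional to the degree of the violation.
(5) Obstacle prior-probability cost: during the game, the decision result of the obstacle tends toward the observed prior probability of the corresponding behavior. If a game strategy deviates substantially from the prior probability, a large obstacle prior-probability cost is produced. The prior probability depends on the game scene: for a preempt/yield decision it is the prior probability of preempting; for an avoid/do-not-avoid decision it is the prior probability of avoiding.
(6) Historical-decision correlation cost: during the game, both parties tend to keep the decision result obtained in the previous frame. A change of game result produces a large historical-decision correlation cost.
3. Strategy feasible region generation: after weighting the costs of the six factors above according to a set rule to obtain the strategy cost of each game strategy, the game decision unit 302 evaluates the individual factor costs that make up each strategy cost for reasonableness and removes every game strategy whose cost includes an unreasonable factor. The remaining reasonable game strategies, with their strategy costs, form the feasible region between the ego vehicle and the game object.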
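The rule decision unit 303 estimates the feasible region for non-game objects. In this application, to resolve conflicts between the decision results for non-game objects and game objects, the feasible region of the ego vehicle should be estimated from the constraint region formed by the non-game objects. For a longitudinal action game (longitudinal meaning along the direction of the road on which the ego vehicle travels), such as preempt/yield, a virtual wall is created in front of the ego vehicle and serves as an upper acceleration constraint. For a lateral action game (lateral meaning perpendicular to the direction of the road), the ego vehicle takes the maximum lateral offset range formed by the non-game objects as the constraint. In this way the feasible region between the ego vehicle and the non-game objects is constructed. A virtual wall is a longitudinal constraint generated by decision/planning, and usually refers to the speed at which the ego vehicle passes a given position.
For example, in the scene shown in FIG. 4, the ego vehicle (black box) makes a preempt/yield decision with respect to the game objects (A and B). The optimal game strategy for the ego vehicle might be to preempt, but because there is an obstacle ahead of the ego vehicle it cannot carry out the preempting action. The upper acceleration limit of the ego vehicle in this situation should therefore be generated from the target vehicle that the ego vehicle is following, and used as the feasible region between the ego vehicle and the non-game object.
After receiving the feasible regions between the ego vehicle and each game object from the game decision unit 302 and the feasible regions between the ego vehicle and each non-game object from the rule decision unit 303, the conflict processing unit 304 computes the intersection of the received feasible regions. If the intersection exists, it computes the optimal game strategy of the ego vehicle toward each game object within the intersection; if the intersection does not exist, it outputs the most conservative decision for the current situation, for example a decision to yield to every game object.
This application provides an object decision scheme for autonomous vehicles based on a sampled game space. Using data sent by the prediction module, the navigation module, and the sensor system, it constructs a sampled game space between the ego vehicle and each game object. It then computes the cost of each factor that influences the vehicle game, obtains the strategy cost of each game strategy by weighting, and removes the game strategies whose costs include unreasonable factors, which gives the feasible region between the ego vehicle and each game object. Combined with the feasible region between the ego vehicle and the non-game objects, it then computes the optimal game strategy of the ego vehicle toward each game object. Because the scheme does not depend on scene-specific rules, it is intended to apply across scenes. When several game objects are present, intersecting the feasible regions between the ego vehicle and each game object lets the ego vehicle play the game with all of them at the same time.
Two embodiments are used below to describe how the game decision unit 302 determines the feasible region.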
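Embodiment 1
In the trajectory-conflict traffic scene shown in FIG. 5, the planned path of the ego vehicle (black) conflicts with the predicted path of the social vehicle (grey), that is, a collision may occur at the intersection point. The upstream modules provide the planned reference path of the ego vehicle, the predicted trajectory of the social vehicle, the current speeds and accelerations of both vehicles, and the distance from each vehicle to the collision point. The ego vehicle must use this information to play a longitudinal game, that is, to preempt or yield to the obstacle.
I. Sampling strategy space generation
The longitudinal game strategies of the ego vehicle and the game object can be characterized by the magnitude of acceleration or deceleration (preempt/yield). First, the upper and lower decision limits of the game strategy (acceleration) are generated, taking into account the longitudinal dynamics and kinematic constraints of the vehicles and the relative position and speed of the ego vehicle and the game object. In the scene of FIG. 5, at the current moment the ego vehicle's speed is 17 km/h, the oncoming straight-going vehicle's speed is 15 km/h, the ego vehicle is 20.11 m from point X, the oncoming game vehicle is 35.92 m from point X, the ego vehicle's feedforward (planned) acceleration is 0.5 m/s², the ego vehicle's observed acceleration is -0.67 m/s², the social vehicle's observed acceleration is 0.0 m/s², the static road speed limit is 60 km/h, and the path-curvature speed limit is 30 km/h. The allowed acceleration interval is then [-4.0, 2.0] m/s² for the ego vehicle and [-3.55, 3.0] m/s² for the social vehicle. To balance computational complexity against the resolution of the sampled strategy space, the acceleration step is set to 1 m/s², which gives the sampled strategy space shown in Table 1.
Table 1: Game strategy space constructed in Embodiment 1
[Table 1 appears as image PCTCN2022077480-appb-000001 in the published application.]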
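The construction of the sampled strategy space can be sketched as below. Because Table 1 is only available as an image, the exact grid is an assumption: the sketch starts each grid at the lower acceleration limit and steps by 1 m/s², which is consistent with the strategy pairs quoted in the text, such as [1.0, 1.45]. The function name `sample_accels` is hypothetical.

```python
from itertools import product
from math import floor

def sample_accels(lower, upper, step=1.0):
    """Sample accelerations from `lower` up to at most `upper` in fixed steps."""
    count = floor((upper - lower) / step + 1e-9) + 1
    return [round(lower + i * step, 2) for i in range(count)]

ego_accels = sample_accels(-4.0, 2.0)      # ego interval [-4.0, 2.0] m/s^2
other_accels = sample_accels(-3.55, 3.0)   # social-vehicle interval [-3.55, 3.0] m/s^2

# Each element is one game strategy pair (ego acceleration, social-vehicle acceleration).
strategy_space = list(product(ego_accels, other_accels))
```

With these intervals the sketch yields n = 7 ego samples and m = 7 social-vehicle samples, so the combined space has n×m = 49 strategy pairs.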
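II. Strategy cost evaluation of the game strategies
The strategy cost of each game strategy is described quantitatively by the costs of the individual design factors. The costs used in this application are the safety cost, comfort cost, passing-efficiency cost, prior-probability cost of the game object, right-of-way cost, and historical-decision correlation cost; the total cost is the weighted sum of these six. For each strategy pair in the strategy space, the corresponding total cost is computed. The strategy pair [1.0, 1.45] (sampled ego acceleration 1 m/s², sampled social-vehicle acceleration 1.45 m/s²) is used here as a detailed example.
1. The safety cost is divided into a time-domain safety cost and a space-domain safety cost. The time-domain safety cost depends on the difference between the times at which the ego vehicle and the game vehicle pass the collision point (time difference to collision, TDTC): the larger the TDTC, the safer the situation and the smaller the time-domain safety cost. The quantitative relation is shown in FIG. 6. For the strategy pair [1.0, 1.45], the ego vehicle reaches the collision point at eTTC = 3.41 s and the game vehicle at oTTC = 5.02 s, and the application reports |TDTC| = |eTTC – oTTC| = 1.65 s for this pair (the rounded arrival times shown differ by 1.61 s). From the relation in FIG. 6, the time-domain safety cost is 100000 × 1.0 = 100000, where 100000 is the weight of the time-domain safety cost.
The space-domain safety cost is obtained by propagating the future motion of both parties along the ego vehicle's planned path and the obstacle's predicted path, using the sampled accelerations of the game strategy pair, and finding the minimum distance between the two vehicles over the next 10 s (with a propagation step of 0.2 s). The larger the minimum distance, the safer the situation and the smaller the space-domain safety cost. The quantitative relation is shown in FIG. 7. Taking a propagated minimum distance of 0.77 m as an example, the space-domain safety cost is 10000 × 0.1756 = 1756, where 10000 is the weight of the space-domain safety cost.
2. The comfort cost depends on the jerk (rate of change of acceleration) of the ego vehicle and the game object: the smaller the jerk, the better the comfort and the smaller the comfort cost. The quantitative relation is shown in FIG. 8. For the strategy pair [1.0, 1.45], the acceleration change of the ego vehicle is eDeltaAcc = 1 - (-0.67) = 1.67 m/s² and its comfort cost is eComf = 100 × 0.315 = 31.5, where 100 is the weight of the ego vehicle's comfort cost. For the game vehicle, the acceleration change is oDeltaAcc = 1.45 - 0 = 1.45 m/s² and its comfort cost is oComf = 300 × 0.286 ≈ 85.94 (the factor is shown rounded), where 300 is the weight of the social vehicle's comfort cost.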
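3. The passing-efficiency cost depends on the time taken to pass the collision point. Decelerating to yield takes longer to pass the collision point, so the passing-efficiency cost increases; accelerating takes less time, so the cost decreases. The quantitative relation is shown in FIG. 9. The horizontal axis of FIG. 9 is deltaPassTime, the difference between samplePassTime (the time to pass the collision point computed with the acceleration from the strategy space) and realPassTime (the time to pass the collision point computed with the currently observed speed and acceleration); the vertical axis is the passing-efficiency cost.
Suppose that at the current speed and acceleration the ego vehicle passes the collision point at eRealPassTime = 4.47 s and the game vehicle at oRealPassTime = 7.2 s, and take the strategy pair [1.0, 1.45] as the example. For the ego vehicle, the time to pass the collision point is eSamplePassTime = 3.41 s, so the difference from the time computed from the observations is eDeltaPassTime = eSamplePassTime – eRealPassTime = 3.41 - 4.47 = -1.06 s, and the passing-efficiency cost is ePass = 100 × 4.1150 = 411.5, where 100 is the weight of the ego vehicle's passing-efficiency cost.
For the game vehicle, the time to pass the collision point is oSamplePassTime = 5.02 s, so oDeltaPassTime = oSamplePassTime – oRealPassTime = 5.02 - 7.2 = -2.18 s, and the passing-efficiency cost is oPass = 100 × 2.3324 = 233.24, where 100 is the weight of the social vehicle's passing-efficiency cost.
4. The prior-probability cost of the game object depends on the probability that the game object lets the ego vehicle pass: the larger the yield probability, the smaller the prior-probability cost. The quantitative relation is shown in FIG. 10. The yield probability of a game object reflects its individual driving style. It is a dynamic factor that depends on the object's historical speed, acceleration, position, and similar information, and it is an input to the game module. In the scene described above, the yield probability of the game object is 0.2, so the corresponding cost of the ego vehicle preempting is eProb = (1 - 0.2) × 1000 = 800, where 1000 is the weight of the prior-probability cost.
5. The right-of-way cost describes how far the two parties comply with the traffic rules. The vehicle that has the right of way has the advantage in preempting, so its preempting cost should decrease and the ego vehicle's yielding cost should increase. The right-of-way cost depends on the current right-of-way relation between the social vehicle and the ego vehicle and on the actual distance from the social vehicle to the conflict point. It is computed as:
right-of-way cost = dynamic right-of-way ratio of the scene × dynamic right-of-way weight
The right-of-way ratio of the scene = f(distance from the social vehicle to the conflict point), a nonlinear function with range [0.0, 1.0]. The ratio represents the probability that the social vehicle obtains the right of way and depends on the distance between the social vehicle and the conflict point; the quantitative relation is shown in FIG. 11. If the distance is less than 5 m, the ratio is 1; if the distance is greater than 5 m, the ratio decreases as the distance increases; if the distance is greater than 100 m, the ratio is 0.
dynamic right-of-way weight = base right-of-way weight + 1000 × right-of-way value of the scene.
Take the scene in which the game object drives straight and the ego vehicle turns left. The game object has the right of way, the right-of-way value of the scene is 0.4, and the game vehicle is 35.92 m from the conflict point. The right-of-way cost of the ego vehicle preempting therefore increases: eGWRoadRight = f(35.92) × (5000 + 1000 × 0.4) = 0.61 × 5400 ≈ 3305.67 (the value of f is shown rounded), where 5000 is the base right-of-way weight.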
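The right-of-way formula above can be sketched as follows. The shape of the nonlinear function f is only given graphically (FIG. 11), so the sketch takes the ratio as an input instead of guessing f; the function and parameter names are assumptions.

```python
def right_of_way_cost(ratio, base_weight=5000.0, scene_value=0.4, gain=1000.0):
    """Right-of-way cost = scene ratio x dynamic right-of-way weight.

    ratio       -- f(distance to conflict point), in [0.0, 1.0]; read from FIG. 11
    base_weight -- base right-of-way weight (5000 in the worked example)
    scene_value -- right-of-way value of the scene (0.4 in the worked example)
    """
    dynamic_weight = base_weight + gain * scene_value
    return ratio * dynamic_weight

cost_rounded_ratio = right_of_way_cost(0.61)  # uses the rounded ratio printed in the text
```

With the rounded ratio 0.61 this gives about 3294; the value 3305.67 quoted in the text corresponds to a ratio of roughly 0.6122, which suggests that 0.61 is a rounded display of f(35.92).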
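6. The historical-decision correlation cost is introduced to prevent the decision from flipping between two consecutive frames. If the previous frame's decision was to preempt, the preempting cost of the current frame is reduced so that a preempt decision is more likely to be output; if the previous frame's decision was to yield, the yielding cost of the current frame is reduced so that a yield decision is more likely to be output. The quantitative relation between the historical-decision correlation cost and the preempting or yielding cost of each frame is shown in FIG. 12. If frame K indicates preempt, then after frame K+1 is obtained, the correlation cost between frame K and frame K+1 decreases if a preempting cost is computed and increases if a yielding cost is computed.
The hysteresis value for switching the ego vehicle's decision from yield to preempt is 50: if the previous frame was YD (yield), the YD cost of the current frame is reduced by 50. Conversely, the hysteresis value for switching from preempt to yield is 20: if the previous frame was GW (preempt), the GW cost of the current frame is reduced by 20. When the previous frame's decision was for the ego vehicle to yield, the historical-decision correlation cost this time is 50. As shown in the next subsection, the optimal YD cost of this frame is 11087.57, so the final optimal YD cost is 11087.57 – 50 = 11037.57.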
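The frame-to-frame hysteresis described above can be sketched as a small adjustment applied to a strategy cost. The function name and the string tags "YD" and "GW" used as arguments are assumptions made for illustration; the offsets 50 and 20 are the values given in the text.

```python
def apply_hysteresis(cost, tag, previous_tag, yd_offset=50.0, gw_offset=20.0):
    """Lower the cost of a strategy whose tag matches the previous frame's decision.

    tag / previous_tag -- "YD" (ego yields) or "GW" (ego preempts)
    """
    if previous_tag == "YD" and tag == "YD":
        return cost - yd_offset
    if previous_tag == "GW" and tag == "GW":
        return cost - gw_offset
    return cost  # decision would change: no reduction

final_yd_cost = apply_hysteresis(11087.57, "YD", "YD")
```

Applied to the optimal YD cost of 11087.57 with a previous-frame decision of yield, this reproduces the 11037.57 quoted in the text.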
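Summing the cost terms of the six categories above gives the final cost Total for the strategy-space point [1.0, 1.45]:
Total = 100000 + 3305.67 + 1756 + 31.50 + 85.94 - 50 + 800 + 411.5 + 233.24 = 106573.85.
The terms are, in order, the time-domain safety cost, the right-of-way cost, the space-domain safety cost, the ego vehicle's comfort cost, the game vehicle's comfort cost, the inter-frame correlation cost, the prior-probability cost, and the passing-efficiency costs of the ego vehicle and the game vehicle.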
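The sum for the strategy pair [1.0, 1.45] can be checked with a few lines. The values are the already-weighted terms quoted in the worked example; the dictionary keys are descriptive names chosen for this sketch.

```python
# Weighted cost terms for the strategy pair [1.0, 1.45], as quoted in the text.
cost_terms = {
    "time_domain_safety": 100000.0,
    "right_of_way": 3305.67,
    "space_domain_safety": 1756.0,
    "ego_comfort": 31.50,
    "other_comfort": 85.94,
    "inter_frame_correlation": -50.0,
    "prior_probability": 800.0,
    "ego_passing": 411.5,
    "other_passing": 233.24,
}

total_cost = sum(cost_terms.values())
print(f"Total = {total_cost:.2f}")
```

The time-domain safety term dominates the total by two orders of magnitude, which reflects the large safety weight (100000) relative to the comfort and passing-efficiency weights (100 to 300).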
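In addition, the sign of the time difference between the two vehicles reaching the collision point (TDTC = -1.65 s) shows that the ego vehicle reaches the collision point first, so the decision corresponding to this strategy point is that the ego vehicle preempts. The steps above are then repeated for every action pair in Table 1, giving the total cost of every action pair, as shown in Table 2.
Table 2: Cost of each game strategy pair in Embodiment 1
[Table 2 appears as image PCTCN2022077480-appb-000002 in the published application.]
The cost values in Table 2 are set in normal, bold, and italic type. If the ego vehicle reaches the conflict point before the game vehicle, the ego vehicle's behavior is preempt (values in normal type are strategy costs of preempting strategies). If the ego vehicle reaches the conflict point after the game vehicle, the ego vehicle's behavior is yield (values in italic type are strategy costs of yielding strategies). If both the ego vehicle and the game vehicle brake to a stop before reaching the conflict point, the behavior of both is yield (values in bold type are strategy costs of strategies in which both the ego vehicle and the game vehicle yield).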
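The labelling rule behind the three typefaces can be sketched as below. Representing "the vehicle stops before the conflict point" as an arrival time of `None` is an assumption of this sketch, as are the label strings and the handling of the cases where only one vehicle stops or where both arrive simultaneously, which the text does not spell out.

```python
def label_strategy(ego_arrival_s, other_arrival_s):
    """Label a strategy pair by who reaches the conflict point first.

    An arrival time of None means the vehicle brakes to a stop before
    the conflict point (assumed encoding).
    """
    if ego_arrival_s is None and other_arrival_s is None:
        return "both_yield"      # bold entries in Table 2
    if ego_arrival_s is None:
        return "ego_yield"       # assumption: ego stops, other passes
    if other_arrival_s is None:
        return "ego_preempt"     # assumption: other stops, ego passes
    if ego_arrival_s < other_arrival_s:
        return "ego_preempt"     # normal-type entries in Table 2
    return "ego_yield"           # italic entries; ties treated conservatively

example_label = label_strategy(3.41, 5.02)  # arrival times from the worked example
```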
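III. Strategy feasible region generation
After step II, all action pairs in the effective action space between the ego vehicle and the game vehicle are available, together with the cost Total of each pair. For these action pairs, the individual sub-costs are evaluated and screened, and all reasonable candidates are selected to form the feasible region of the ego vehicle with respect to this game object. Sub-costs such as comfort and right of way are considered valid anywhere within their defined ranges.
The safety cost requires a validity check, and unreasonable game strategy pairs are deleted outright. For the time-domain safety cost, if |TDTC| is below a threshold (1 s), the ego vehicle and the social vehicle are considered to be at risk of collision; the action pair cannot be output as a valid action and is deleted. Similarly, for the space-domain safety cost, if the propagated minimum distance is below a threshold (0.01 m), the ego vehicle and the game vehicle are considered to collide, and the action pair is deleted. For the passing-efficiency cost, if both vehicles stop before the conflict point, passing efficiency drops and the action pair is infeasible, so it is deleted. The remaining valid actions form the feasible region of the strategy space, as shown in Table 3.
Table 3: Feasible strategy set (feasible region) and its costs in Embodiment 1
[Table 3 appears as image PCTCN2022077480-appb-000003 in the published application.]
In Table 3, "Deleted-1" marks an action pair invalidated by the time-domain safety cost, "Deleted-2" marks one invalidated by the space-domain safety cost, and "Deleted-3" marks one invalidated by the passing-efficiency cost.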
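The three deletion rules can be sketched as a single check that returns the reason a strategy pair is removed, or `None` if it stays in the feasible region. The thresholds (1 s and 0.01 m) are the ones given in the text; the function name, the argument encoding, and the order in which the checks are applied are assumptions.

```python
def deletion_reason(tdtc_s, min_distance_m, both_stop,
                    tdtc_threshold_s=1.0, distance_threshold_m=0.01):
    """Return 'Deleted-1/2/3' for an infeasible strategy pair, else None.

    tdtc_s may be None when both vehicles stop before the conflict point.
    """
    if both_stop:
        return "Deleted-3"   # passing efficiency: both stop before the conflict point
    if abs(tdtc_s) < tdtc_threshold_s:
        return "Deleted-1"   # time-domain safety: arrival times too close
    if min_distance_m < distance_threshold_m:
        return "Deleted-2"   # space-domain safety: propagated paths collide
    return None

# Worked example pair [1.0, 1.45]: TDTC = -1.65 s, minimum distance 0.77 m.
kept = deletion_reason(-1.65, 0.77, both_stop=False) is None
```

Under these thresholds the worked example pair remains in the feasible region.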
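From the analysis above, the feasible region of the ego vehicle with respect to this game object is the set of valid acceleration pairs in the table. Within this feasible region, the game strategy with the smallest cost Total is selected as the final decision result; the strategy with the smallest cost Total corresponds to the largest global benefit and is taken as the optimal action pair for the ego vehicle and the game vehicle, ensuring sufficient safety, passing efficiency, and comfort at the same time. When this optimal strategy pair is selected, the game decision module sends the optimal acceleration that the game object is expected to adopt to the downstream motion planning module, which plans on the basis of that acceleration value.
In the scene shown in FIG. 5, the ego vehicle does not have the right of way (the traffic rules require turning vehicles to yield to straight-going vehicles), and the obstacle is labelled "yield" throughout the process. As well as judging that the game object will preempt, the module computes the acceleration with which the game object will preempt and sends that value to the motion planner; the corresponding ego acceleration value is selected from the matching acceleration pair in the feasible region so that the yielding action is planned accurately.
When the ego vehicle and the game object have just entered the intersection, the feasible region of the ego vehicle with respect to the game object is large: the interaction can be completed by yielding with an acceleration in (-4, 0) m/s² or by preempting with an acceleration in [0, 2] m/s². However, the weighted sum of the individual costs shows that the game pair formed by the ego vehicle and the game object has its optimal solution at the acceleration strategy pair (-3, 0.45) m/s². The main advantage of this game result over the other solutions is that it follows the right-of-way description while ensuring sufficient safety and comfort. In this optimal game strategy, the obstacle is expected to preempt with an acceleration of 0.45 m/s². This acceleration value is sent to the motion planning layer, and the motion planning module uses it to correct the predicted trajectory of the obstacle. Concretely, the occupied region of the obstacle is shifted along the T axis, and speed planning is performed on the station-time (ST) diagram in FIG. 13, which shows the obstacle's occupancy of longitudinal distance along the ego vehicle's path over time, so that the ego vehicle yields safely.
The process above mainly describes how the game is played in a single frame. For the overall object decision process, as the objects approach each other, the ego vehicle yields to the game object in the first frame of the game. As the game object keeps approaching the trajectory intersection point, the safety cost of the ego vehicle preempting keeps increasing, so the selection of the optimal game result necessarily keeps yielding to the game object until it has passed the trajectory intersection point and the game ends.
The preempt/yield decision scheme for obstacles in Embodiment 1 does not depend on a specific form of obstacle interaction or on trajectory-intersection features. Instead, it uses the sensor system to acquire traffic scene information and abstracts the traffic scene in a reasonable way, which generalizes the application scenarios. It also determines the acceleration with which the ego vehicle should preempt or yield and the acceleration with which the obstacle is expected to preempt or yield, and uses these values to influence motion planning so that the decision instruction is executed correctly.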
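Embodiment 2
Take the scene shown in FIG. 4 as an example. The ego vehicle turns left at an unprotected intersection, social vehicle A and social vehicle B drive straight in the oncoming lanes, and social vehicle C travels in the same direction ahead of the ego vehicle on its driving path. The ego vehicle must follow social vehicle C and is in a game relation with social vehicles A and B.
Assume that at the current moment the ego vehicle's speed is 8 km/h, the oncoming straight-going vehicle A's speed is 14 km/h, and the oncoming straight-going vehicle B's speed is 14 km/h. The ego vehicle is 21 m from its intersection point with vehicle A and 25 m from its intersection point with vehicle B; vehicle A is 35 m from its intersection point with the ego vehicle, and vehicle B is 24 m from its intersection point with the ego vehicle. The ego vehicle's feedforward acceleration is 0.22 m/s², its observed acceleration is 0.0 m/s², the social vehicles' observed acceleration is 0.0 m/s², the static road speed limit is 40 km/h, and the path-curvature speed limit is 25 km/h. Social vehicle C has a speed of 10 km/h and an acceleration of 0.0 m/s², and the distance from its rear to the front of the ego vehicle is 15 m. The allowed acceleration sampling interval is [-4, 2] m/s² for the ego vehicle and [-3.55, 3.0] m/s² for social vehicles A and B. To balance computational complexity against the resolution of the sampled strategy space, the acceleration step is set to 1 m/s².
A single-vehicle game is played separately with social vehicle A and with social vehicle B; the cost function design, weight assignment, and feasible region selection are the same as in Embodiment 1. This gives the feasible regions corresponding to social vehicles A and B. The feasible region between the ego vehicle and social vehicle A is shown in Table 4.
Table 4: Feasible region between the ego vehicle and social vehicle A in Embodiment 2
[Table 4 appears as image PCTCN2022077480-appb-000004 in the published application.]
The cost values in Table 4 are set in normal, bold, and italic type: values in normal type indicate that the ego vehicle preempts, values in italic type indicate that the ego vehicle yields, and values in bold type indicate that both the ego vehicle and the game vehicle brake to a stop before the conflict point. In the feasible region between the ego vehicle and social vehicle A, [0.45, -1] and [-1.55, -2] are the optimal costs over the full set of preempting or yielding costs.
The feasible region between the ego vehicle and social vehicle B is shown in Table 5.
Table 5: Feasible region between the ego vehicle and social vehicle B in Embodiment 2
[Table 5 appears as image PCTCN2022077480-appb-000005 in the published application.]
The cost values in Table 5 are set in normal, bold, and italic type: values in normal type indicate that the ego vehicle preempts, values in italic type indicate that the ego vehicle yields, and values in bold type indicate that both the ego vehicle and the game vehicle brake to a stop before the conflict point. In the feasible region between the ego vehicle and social vehicle B, [0.45, -1] and [-3.55, -2] are the optimal costs over the full set of preempting or yielding costs.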
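For social vehicle C (a non-game object), the ego vehicle's decision must not create a risk with respect to it. The feasible acceleration region of the ego vehicle is estimated from the ego vehicle's speed and acceleration, social vehicle C's speed and acceleration, and the distance from the ego vehicle to social vehicle C. This part is implemented by the longitudinal planning module, whose computation model is:
accUpLimit = speedGain * (objV - egoV) + distTimeGain * (distance to the vehicle ahead - minimum following distance) / egoV.
Here accUpLimit is the upper decision limit of the acceleration, objV is the obstacle's speed, egoV is the ego vehicle's speed, and speedGain and distTimeGain are tunable parameters.
The output value of the model is then used directly. For social vehicle C, the application substitutes the scene parameters into the model as:
0.85 * (10/3.6 - 8/3.6) + 0.014 * (15 - 4.56) / (8/3.6), reported as 0.8 (with the gains as printed, the expression evaluates to about 0.54)
That is, the reported upper bound of the ego vehicle's acceleration is 0.8 m/s², and the feasible region is [-4, 0.8] m/s².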
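The computation model for the upper acceleration limit can be sketched as a direct transcription of the formula. The parameter names follow the symbols in the text where they exist (accUpLimit, objV, egoV, speedGain, distTimeGain); `gap_m` and `min_gap_m` are names chosen here for the distance to the vehicle ahead and the minimum following distance. The sketch assumes egoV > 0.

```python
def acc_up_limit(obj_v, ego_v, gap_m, min_gap_m, speed_gain, dist_time_gain):
    """Upper acceleration limit imposed by a followed (non-game) vehicle.

    Speeds are in m/s and distances in m; ego_v must be non-zero.
    """
    return (speed_gain * (obj_v - ego_v)
            + dist_time_gain * (gap_m - min_gap_m) / ego_v)

# Parameters as printed in the text for social vehicle C.
limit = acc_up_limit(obj_v=10 / 3.6, ego_v=8 / 3.6, gap_m=15.0, min_gap_m=4.56,
                     speed_gain=0.85, dist_time_gain=0.014)
```

With the gains as printed, this evaluates to about 0.54 m/s², whereas the text reports 0.8 m/s² and uses [-4, 0.8] m/s² downstream; the sketch does not attempt to reconcile the two, and one of the printed parameters may differ from the value actually used.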
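When this embodiment is used to play separate games with social vehicles A and B, the result is to preempt A and yield to B, but in practice these two actions cannot be carried out at the same time. Multi-vehicle conflict resolution is therefore required, combining the feasible regions of the three social vehicles; the conflict resolution is illustrated in FIG. 14. In FIG. 14, the boxed data in the "Cost" and "Decision Tag" columns are the costs corresponding to the optimal single-vehicle decisions, and the boxed data in the "Ego acceleration sampling space" column are the feasible region of the ego vehicle that is common to all three social vehicles.
The feasible regions formed by the game social vehicles A and B and by the non-game vehicle (social vehicle C) are first intersected, which gives a common feasible region of [-4.0, -1.0] m/s² for the ego vehicle with respect to these three objects. The optimal strategy cost is then searched for within this common region [-4.0, -1.0] m/s². The sum of the ego vehicle's strategy costs with respect to social vehicles A and B is computed over the common feasible region, and the optimal solution is found at an ego acceleration of -1.0. At this point, for social vehicle A, the optimal cost of the ego vehicle is 11135.31, the expected acceleration of social vehicle A is 0.45, and the corresponding decision is yield. For social vehicle B, the optimal cost of the ego vehicle is 11067.41, the expected acceleration of social vehicle B is 0.45, and the corresponding decision is yield.
The final combined optimal solution for the two vehicles is therefore: yield to social vehicle A (expected acceleration 0.45) and yield to social vehicle B (expected acceleration 0.45), with an optimal expected ego acceleration of -1.0. This combined conflict-resolution decision obtains the largest global benefit and ensures that the decision result is feasible for every obstacle.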
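The conflict-resolution step can be sketched as follows: intersect the ego accelerations that are feasible for every game object, clip them to the interval allowed by the non-game object, and pick the ego acceleration with the smallest summed cost. Tables 4 and 5 are only available as images, so apart from the two costs quoted in the text for an ego acceleration of -1.0 (11135.31 for A and 11067.41 for B), every cost in the example data is a placeholder invented for illustration. The function name and data layout are also assumptions.

```python
def resolve_conflict(best_cost_by_object, ego_interval):
    """Pick the ego acceleration with the lowest summed cost over all game objects.

    best_cost_by_object -- {object: {ego_accel: best feasible cost for that accel}}
    ego_interval        -- (lower, upper) ego acceleration allowed by non-game objects
    Returns None when no common feasible acceleration exists (caller falls back to yield).
    """
    lower, upper = ego_interval
    common = set.intersection(*(set(costs) for costs in best_cost_by_object.values()))
    common = {a for a in common if lower <= a <= upper}
    if not common:
        return None
    return min(common, key=lambda a: sum(costs[a] for costs in best_cost_by_object.values()))

feasible = {
    # Placeholder costs except at -1.0, which uses the values quoted in the text.
    "A": {-4.0: 13000.0, -3.0: 12500.0, -2.0: 12000.0, -1.0: 11135.31, 1.0: 10900.0},
    "B": {-4.0: 13000.0, -3.0: 12500.0, -2.0: 12000.0, -1.0: 11067.41},
}
best_ego_accel = resolve_conflict(feasible, ego_interval=(-4.0, 0.8))
```

In this illustration the single-vehicle optimum for A (ego acceleration 1.0, preempt) is excluded both by B's feasible region and by the virtual-wall limit of 0.8, so the combined optimum falls at -1.0, matching the outcome described in the text.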
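In the scene shown in FIG. 4, the ego vehicle makes a combined decision that takes all considered obstacles into account and obtains an optimal game result that satisfies several obstacles at once. The vehicle on the ego vehicle's planned path imposes a virtual-wall constraint on the ego vehicle, which corresponds to a feasible region with an acceleration range of [-4.0, 0.8]. During the game, the ego vehicle makes a game decision separately for social vehicle A and social vehicle B, which gives the ego vehicle's feasible region with respect to each vehicle. Intersecting these three feasible regions gives the feasible region that satisfies all obstacles in the scene, and within it the optimal solution of the ego vehicle with respect to all game obstacles is found, namely that the ego vehicle must yield to both social vehicle A and social vehicle B.
Embodiment 2 mainly addresses the game problem in multi-target scenes. For non-game vehicles, the corresponding feasible region is estimated on the basis of the game type; for game vehicles, the feasible region is obtained directly. The optimal solution with respect to all game objects is then found within the feasible regions, which makes the multi-target decisions consistent.
In the multi-vehicle game method of Embodiment 2, the feasible region of the sampled space is first obtained for each game vehicle, the feasible region of the ego vehicle is then estimated for the non-game vehicles, and finally the common feasible region of the game and non-game vehicles is taken and the optimal solution within it is computed, giving the globally optimal solution for the ego vehicle and several social vehicles.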
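FIG. 15 is a schematic flowchart of a decision-making method according to an embodiment of this application. As shown in FIG. 15, the method is implemented as follows:
Step S1501: obtain the predicted motion trajectories of the ego vehicle and of each obstacle around the ego vehicle.
The predicted motion trajectories can be obtained from data collected by sensors in the sensor system, such as the GPS unit, INS unit, odometer, camera, and radar. Information such as the vehicle's position, its surroundings, and its state is acquired and processed to predict the paths that the ego vehicle and each surrounding obstacle will travel over a future period.
Step S1503: determine the game objects. A game object is an obstacle around the ego vehicle whose predicted motion trajectory intersects that of the ego vehicle, or whose distance from the ego vehicle is less than a set threshold.
Specifically, after the predicted motion trajectories of the ego vehicle and each obstacle are obtained, it is determined whether the predicted trajectories of the ego vehicle and each obstacle intersect, or the predicted trajectories, driving trajectories, speeds, accelerations, and other data of the ego vehicle and each obstacle are used to determine whether the distance between any obstacle's position and the ego vehicle's position is less than the set threshold. If the predicted trajectory of an obstacle intersects that of the ego vehicle, or the distance between the two is less than the set threshold, the obstacle is classified as a game object; all other obstacles are classified as non-game objects.
Step S1505: based on the vehicle information of the ego vehicle, the obstacle information of the game objects, and the road-condition information collected by the sensor system, construct one sampled game space for each game object. Each sampled game space is the set of different game strategies adopted between the ego vehicle and one obstacle among the game objects.
Based on factors such as the predefined game mode, the road-condition information, and the motion capabilities of the ego vehicle and each obstacle, the game strategy range of the ego vehicle and each game object is determined, for example the ego vehicle's acceleration range and speed range. Feasible game strategies are then sampled within this range for the ego vehicle and each game object, giving a number of feasible strategies for each, and the feasible strategies of the ego vehicle are combined with those of each game object to obtain game strategy spaces of different combinations.
For example, when the game mode is a change of acceleration, the received distances from the ego vehicle and a game object to the theoretical collision point, the maximum and minimum vehicle accelerations, the ego vehicle's speed, the maximum road speed limit, and similar data are used to determine the different game strategies that the ego vehicle and the game object can obtain by changing their acceleration values, and this set of strategies is taken as the game strategy range. Then, using a set sampling method, n acceleration values are selected for the ego vehicle and m for the game object, which gives a game strategy space of n×m possible combinations.
Step S1507: compute the strategy cost of each game strategy. The strategy cost is the value obtained by weighting the factors that influence it.
The factors that influence the strategy cost include safety, comfort, passing efficiency, right of way, the probability that the obstacle lets the ego vehicle pass, and the historical decision. The strategy cost of each game strategy can therefore be obtained by computing the cost of each factor and then taking a weighted sum.
Optionally, the distances from the ego vehicle and the game object to the conflict point, together with the set of game strategies between the ego vehicle and each obstacle among the game objects, are used to determine which of the ego vehicle and each obstacle reaches the conflict point first. If the decision strategies of the ego vehicle and the obstacle in a game strategy mean that the ego vehicle reaches the conflict point before the obstacle, the ego vehicle's behavior is preempt and the game strategy is labelled "ego vehicle preempts". If they mean that the ego vehicle reaches the conflict point after the obstacle, the ego vehicle's behavior is yield and the game strategy is labelled "ego vehicle yields". If they mean that both the ego vehicle and the obstacle stop before the conflict point, the behavior of both is yield and the game strategy is labelled "ego vehicle and obstacle both yield".
Step S1509: determine the decision result of the ego vehicle. The decision result is the game strategy with the lowest strategy cost among the game strategies common to all sampled game spaces.
Specifically, after the costs of the factors are weighted according to a set rule to obtain the cost of each game strategy, the individual factor costs that make up each strategy cost are evaluated for reasonableness, and every game strategy whose cost includes an unreasonable factor is deleted. The remaining reasonable game strategies form the feasible region between the ego vehicle and the game object. Once the feasible regions between the ego vehicle and each game object are obtained, they are intersected to give the common feasible region for the current scene in which the ego vehicle faces several game objects, and the game strategy with the lowest game cost in the common feasible region is selected as the decision result.
Optionally, to resolve conflicts between the decision results for non-game objects and game objects, the feasible region of the ego vehicle should be estimated from the constraint region formed by the non-game objects. For a longitudinal action game (longitudinal meaning along the direction of the road on which the ego vehicle travels), such as preempt/yield, a virtual wall is created in front of the ego vehicle and serves as an upper acceleration constraint. For a lateral action game (lateral meaning perpendicular to the direction of the road), the ego vehicle takes the maximum lateral offset range formed by the non-game objects as the constraint. In this way the feasible region between the ego vehicle and the non-game objects is constructed. The common feasible region between the ego vehicle and the game objects is then intersected with the feasible region between the ego vehicle and the non-game objects, and the game strategy with the lowest game cost in the intersection is selected as the decision result. If the intersection contains no game strategy, the "ego vehicle yields" decision result is selected in accordance with the safety principle.
In this embodiment, the predicted motion trajectories of the ego vehicle and of each surrounding obstacle are obtained, and the game objects are determined by checking whether the predicted trajectories intersect or whether the distance between the two is less than the set threshold. A sampled game space is then constructed between the ego vehicle and each obstacle, and the strategy cost of each game strategy in each sampled game space is computed. The game strategies common to all sampled game spaces are found, and the one with the lowest strategy cost is selected as the game result. Because the scheme does not depend on the scene, it is intended to apply across scenes. When several game objects are present, finding the game strategies common to all sampled game spaces lets the ego vehicle play the game with all of them at the same time.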
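FIG. 16 is a schematic architecture diagram of a decision-making apparatus according to an embodiment of this application. The apparatus 1600 shown in FIG. 16 includes a transceiver unit 1601 and a processing unit 1602, which perform the following functions:
The transceiver unit 1601 is configured to obtain the predicted motion trajectories of the ego vehicle and of each obstacle around the ego vehicle. The processing unit 1602 is configured to: determine the game objects, a game object being an obstacle around the ego vehicle whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than a set threshold; construct one sampled game space for each game object based on the vehicle information of the ego vehicle, the obstacle information of the game object, and the road-condition information collected by the sensor system, each sampled game space being the set of different game strategies adopted between the ego vehicle and one obstacle among the game objects; compute the strategy cost of each game strategy, the strategy cost being the value obtained by weighting the factors that influence it; and determine the decision result of the ego vehicle, the decision result being the game strategy with the lowest strategy cost in the common sampled game space, where the common sampled game space includes at least one game strategy and every sampled game space includes the game strategies in the common sampled game space.
In one implementation, the processing unit 1602 is specifically configured to: construct the feasible region of each sampled game space, the feasible region of a sampled game space being at least one game strategy whose strategy cost meets a set requirement; and determine, in the intersection of the feasible regions of all sampled game spaces, the game strategy with the lowest strategy cost among the common game strategies.
In one implementation, the processing unit 1602 is further configured to: determine the non-game objects, a non-game object being an obstacle around the ego vehicle whose predicted motion trajectory does not intersect that of the ego vehicle or whose distance from the ego vehicle is not less than the set threshold; construct the feasible region of the ego vehicle based on the vehicle information of the ego vehicle, the obstacle information of the non-game objects, and the road-condition information collected by the sensor system, the feasible region of the ego vehicle being at least one strategy of different decisions that the ego vehicle can adopt without colliding with the non-game objects; and, upon detecting that the decision result of the ego vehicle lies within the feasible region of the ego vehicle, output the decision result of the ego vehicle.
In one implementation, the processing unit 1602 is specifically configured to: determine the upper and lower decision limits of the ego vehicle and of each obstacle among the game objects based on the vehicle information of the ego vehicle, the obstacle information of the game objects, and the road-condition information; obtain, according to a set rule, the decision strategies of the ego vehicle and of each obstacle among the game objects from within those upper and lower decision limits; and combine the decision strategies of the ego vehicle with those of each obstacle among the game objects to obtain the at least one game strategy between the ego vehicle and each obstacle among the game objects.
In one implementation, the processing unit 1602 is further configured to determine the behavior label of each game strategy based on the distances from the ego vehicle and the game object to the conflict point and on the at least one game strategy between the ego vehicle and each obstacle among the game objects, where the conflict point is the position at which the predicted motion trajectories of the ego vehicle and the obstacle intersect or at which the distance between the ego vehicle and the obstacle is less than the set threshold, and the behavior label includes at least one of: ego vehicle yields, ego vehicle preempts, and ego vehicle and obstacle both yield.
In one implementation, the processing unit 1602 is specifically configured to: determine the factors of the strategy cost, which include at least one of safety, comfort, passing efficiency, right of way, obstacle prior probability, and historical-decision correlation; compute the factor cost of each factor in each strategy cost; and weight the factor costs of the factors in each strategy cost to obtain the strategy cost of each game strategy.
In one implementation, the processing unit 1602 is further configured to: check whether each factor in a strategy cost lies within a set range; and delete the game strategy corresponding to any strategy cost that includes a factor outside the set range.
In one implementation, the processing unit 1602 is further configured to output an "ego vehicle yields" decision result upon detecting that the decision result of the ego vehicle does not lie within the feasible region of the ego vehicle.
The present invention provides a computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform any one of the methods described above.
The present invention provides a computing device including a memory and a processor, where the memory stores executable code and the processor, when executing the executable code, implements any one of the methods described above.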
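A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the embodiments of this application.
In addition, aspects or features of the embodiments of this application may be implemented as a method, an apparatus, or an article of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" as used in this application covers a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media may include, but are not limited to: magnetic storage devices (for example hard disks, floppy disks, or magnetic tapes), optical discs (for example compact discs (CD) and digital versatile discs (DVD)), smart cards, and flash memory devices (for example erasable programmable read-only memory (EPROM), cards, sticks, or key drives). In addition, the various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
In the above embodiments, the decision-making apparatus 1600 in FIG. 16 may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example a floppy disk, hard disk, or magnetic tape), an optical medium (for example a DVD), or a semiconductor medium (for example a solid state disk (SSD)).
It should be understood that, in the various embodiments of this application, the sequence numbers of the processes above do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic and should not limit the implementation of the embodiments of this application in any way.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; several units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over several network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the embodiments of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, an access network device, or the like) to perform all or some of the steps of the methods of the embodiments of this application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are merely specific implementations of the embodiments of this application, but the scope of protection of the embodiments of this application is not limited to them. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the embodiments of this application shall fall within the scope of protection of the embodiments of this application.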

Claims (21)

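  1. A decision-making method, comprising:
    obtaining predicted motion trajectories of an ego vehicle and of each obstacle around the ego vehicle;
    determining game objects, wherein a game object is an obstacle around the ego vehicle whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than a set threshold;
    constructing one sampled game space for each game object based on vehicle information of the ego vehicle, obstacle information of the game objects, and road-condition information collected by a sensor system, wherein each sampled game space comprises at least one game strategy;
    computing a strategy cost of each game strategy, wherein the strategy cost is a value obtained by weighting the factors of the strategy cost; and
    determining a decision result of the ego vehicle, wherein the decision result is the game strategy with the lowest strategy cost in a common sampled game space, the common sampled game space comprises at least one game strategy, and every sampled game space comprises the game strategies in the common sampled game space.
  2. The method according to claim 1, wherein determining the decision result of the ego vehicle comprises:
    constructing a feasible region of each sampled game space, wherein the feasible region of each sampled game space is at least one game strategy whose strategy cost meets a set requirement; and
    determining, in the intersection of the feasible regions of all sampled game spaces, the game strategy with the lowest strategy cost among the common game strategies.
  3. The method according to claim 1 or 2, further comprising:
    determining non-game objects, wherein a non-game object is an obstacle around the ego vehicle whose predicted motion trajectory does not intersect that of the ego vehicle or whose distance from the ego vehicle is not less than the set threshold;
    constructing a feasible region of the ego vehicle based on the vehicle information of the ego vehicle, obstacle information of the non-game objects, and the road-condition information collected by the sensor system, wherein the feasible region of the ego vehicle is at least one strategy of different decisions adopted by the ego vehicle without colliding with the non-game objects; and
    upon detecting that the decision result of the ego vehicle lies within the feasible region of the ego vehicle, outputting the decision result of the ego vehicle.
  4. The method according to any one of claims 1 to 3, wherein constructing one sampled game space for each game object based on the vehicle information of the ego vehicle, the obstacle information of the game objects, and the road-condition information collected by the sensor system comprises:
    determining upper and lower decision limits of the ego vehicle and of each obstacle among the game objects based on the vehicle information of the ego vehicle, the obstacle information of the game objects, and the road-condition information;
    obtaining, according to a set rule, decision strategies of the ego vehicle and of each obstacle among the game objects from within the upper and lower decision limits of the ego vehicle and of each obstacle among the game objects; and
    combining the decision strategies of the ego vehicle with the decision strategies of each obstacle among the game objects to obtain the at least one game strategy between the ego vehicle and each obstacle among the game objects.
  5. The method according to any one of claims 1 to 4, further comprising:
    determining a behavior label of each game strategy based on the distances from the ego vehicle and the game object to a conflict point and on the at least one game strategy between the ego vehicle and each obstacle among the game objects, wherein the conflict point is the position at which the predicted motion trajectories of the ego vehicle and the obstacle intersect or at which the distance between the ego vehicle and the obstacle is less than the set threshold, and the behavior label comprises at least one of: ego vehicle yields, ego vehicle preempts, and ego vehicle and obstacle both yield.
  6. The method according to any one of claims 1 to 5, wherein computing the strategy cost of each game strategy comprises:
    determining the factors of the strategy cost, wherein the factors of the strategy cost comprise at least one of safety, comfort, passing efficiency, right of way, obstacle prior probability, and historical-decision correlation;
    computing a factor cost of each factor in each strategy cost; and
    weighting the factor costs of the factors in each strategy cost to obtain the strategy cost of each game strategy.
  7. The method according to any one of claims 1 to 6, further comprising, after computing the strategy cost of each game strategy:
    checking whether each factor in a strategy cost lies within a set range; and
    deleting the game strategy corresponding to any strategy cost that comprises a factor outside the set range.
  8. The method according to any one of claims 1 to 7, further comprising:
    upon detecting that the decision result of the ego vehicle does not lie within the feasible region of the ego vehicle, outputting a decision result that the ego vehicle yields.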
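  9. A decision-making apparatus, comprising:
    a transceiver unit, configured to obtain predicted motion trajectories of an ego vehicle and of each obstacle around the ego vehicle; and
    a processing unit, configured to: determine game objects, wherein a game object is an obstacle around the ego vehicle whose predicted motion trajectory intersects that of the ego vehicle or whose distance from the ego vehicle is less than a set threshold;
    construct one sampled game space for each game object based on vehicle information of the ego vehicle, obstacle information of the game objects, and road-condition information collected by a sensor system, wherein each sampled game space comprises at least one game strategy;
    compute a strategy cost of each game strategy, wherein the strategy cost is a value obtained by weighting the factors of the strategy cost; and
    determine a decision result of the ego vehicle, wherein the decision result is the game strategy with the lowest strategy cost in a common sampled game space, the common sampled game space comprises at least one game strategy, and every sampled game space comprises the game strategies in the common sampled game space.
  10. The apparatus according to claim 9, wherein the processing unit is specifically configured to:
    construct a feasible region of each sampled game space, wherein the feasible region of each sampled game space is at least one game strategy whose strategy cost meets a set requirement; and
    determine, in the intersection of the feasible regions of all sampled game spaces, the game strategy with the lowest strategy cost among the common game strategies.
  11. The apparatus according to claim 9 or 10, wherein the processing unit is further configured to:
    determine non-game objects, wherein a non-game object is an obstacle around the ego vehicle whose predicted motion trajectory does not intersect that of the ego vehicle or whose distance from the ego vehicle is not less than the set threshold;
    construct a feasible region of the ego vehicle based on the vehicle information of the ego vehicle, obstacle information of the non-game objects, and the road-condition information collected by the sensor system, wherein the feasible region of the ego vehicle is at least one strategy of different decisions adopted by the ego vehicle without colliding with the non-game objects; and
    upon detecting that the decision result of the ego vehicle lies within the feasible region of the ego vehicle, output the decision result of the ego vehicle.
  12. The apparatus according to any one of claims 9 to 11, wherein the processing unit is specifically configured to:
    determine upper and lower decision limits of the ego vehicle and of each obstacle among the game objects based on the vehicle information of the ego vehicle, the obstacle information of the game objects, and the road-condition information;
    obtain, according to a set rule, decision strategies of the ego vehicle and of each obstacle among the game objects from within the upper and lower decision limits of the ego vehicle and of each obstacle among the game objects; and
    combine the decision strategies of the ego vehicle with the decision strategies of each obstacle among the game objects to obtain the at least one game strategy between the ego vehicle and each obstacle among the game objects.
  13. The apparatus according to any one of claims 9 to 12, wherein the processing unit is further configured to:
    determine a behavior label of each game strategy based on the distances from the ego vehicle and the game object to a conflict point and on the at least one game strategy between the ego vehicle and each obstacle among the game objects, wherein the conflict point is the position at which the predicted motion trajectories of the ego vehicle and the obstacle intersect or at which the distance between the ego vehicle and the obstacle is less than the set threshold, and the behavior label comprises at least one of: ego vehicle yields, ego vehicle preempts, and ego vehicle and obstacle both yield.
  14. The apparatus according to any one of claims 9 to 13, wherein the processing unit is specifically configured to:
    determine the factors of the strategy cost, wherein the factors of the strategy cost comprise at least one of safety, comfort, passing efficiency, right of way, obstacle prior probability, and historical-decision correlation;
    compute a factor cost of each factor in each strategy cost; and
    weight the factor costs of the factors in each strategy cost to obtain the strategy cost of each game strategy.
  15. The apparatus according to any one of claims 9 to 14, wherein the processing unit is further configured to:
    check whether each factor in a strategy cost lies within a set range; and
    delete the game strategy corresponding to any strategy cost that comprises a factor outside the set range.
  16. The apparatus according to any one of claims 9 to 15, wherein the processing unit is further configured to:
    upon detecting that the decision result of the ego vehicle does not lie within the feasible region of the ego vehicle, output a decision result that the ego vehicle yields.
  17. A vehicle, comprising at least one processor, wherein the processor is configured to execute instructions stored in a memory to perform the method according to any one of claims 1 to 8.
  18. An intelligent driving system, comprising a sensor system and a processor, wherein the processor is configured to perform the method according to any one of claims 1 to 8.
  19. A computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method according to any one of claims 1 to 8.
  20. A computing device, comprising a memory and a processor, wherein the memory stores executable code and the processor, when executing the executable code, implements the method according to any one of claims 1 to 8.
  21. A computer program product, wherein the computer program product stores instructions which, when executed by a computer, cause the computer to carry out the method according to any one of claims 1 to 8.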
PCT/CN2022/077480 2021-04-26 2022-02-23 Decision-making method and apparatus, and vehicle WO2022227827A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22794307.3A EP4321406A1 (en) 2021-04-26 2022-02-23 Decision-making method and apparatus and vehicle
US18/495,071 US20240051572A1 (en) 2021-04-26 2023-10-26 Decision making method and apparatus, and vehicle

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110454337.X 2021-04-26
CN202110454337.XA CN115246415A (zh) 2021-04-26 2021-04-26 Decision-making method and apparatus, and vehicle

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/495,071 Continuation US20240051572A1 (en) 2021-04-26 2023-10-26 Decision making method and apparatus, and vehicle

Publications (1)

Publication Number Publication Date
WO2022227827A1 true WO2022227827A1 (zh) 2022-11-03

Family

ID=83696008

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/077480 WO2022227827A1 (zh) 2021-04-26 2022-02-23 Decision-making method and apparatus, and vehicle

Country Status (4)

Country Link
US (1) US20240051572A1 (zh)
EP (1) EP4321406A1 (zh)
CN (1) CN115246415A (zh)
WO (1) WO2022227827A1 (zh)

Also Published As

Publication number Publication date
US20240051572A1 (en) 2024-02-15
CN115246415A (zh) 2022-10-28
EP4321406A1 (en) 2024-02-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22794307

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022794307

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022794307

Country of ref document: EP

Effective date: 20231110

NENP Non-entry into the national phase

Ref country code: DE