CN116279595A

CN116279595A - Behavior decision method and device of vehicle, terminal equipment and storage medium

Info

Publication number: CN116279595A
Application number: CN202310539475.7A
Authority: CN
Inventors: 艾锐; 欧洋佳欣; 唐科; 顾维灏
Original assignee: Haomo Zhixing Technology Co Ltd
Current assignee: Haomo Zhixing Technology Co Ltd
Priority date: 2023-05-12
Filing date: 2023-05-12
Publication date: 2023-06-23

Abstract

The application relates to the technical field of automatic driving and provides a behavior decision method and device for a vehicle, a terminal device and a storage medium. The behavior decision method comprises: acquiring running state information of a first vehicle and a second vehicle at the current moment, wherein the first vehicle and the second vehicle are in the same traffic scene; determining behavior decision information of the first vehicle according to a game model and the running state information of the first vehicle and the second vehicle; and determining the execution behavior corresponding to the current moment according to the behavior decision information, and driving the first vehicle to execute that behavior at the current moment. Because the behavior decision information of the first vehicle is the result of a game computed with the game model, it accounts for both the safety and the efficiency of driving. The execution behavior chosen for the current moment from this information therefore improves the accuracy of the vehicle's behavior decisions, so that safe and efficient behavior decision planning can be completed.

Description

Behavior decision method and device of vehicle, terminal equipment and storage medium

Technical Field

The application belongs to the technical field of automatic driving, and particularly relates to a behavior decision method and device for a vehicle, terminal equipment and a storage medium.

Background

With the rapid development of automotive electronics and advanced driver assistance technology, automatic driving, as the advanced stage of driver assistance, has become an important way to address future travel needs.

At present, automatic driving technology is relatively mature in scenes with few kinds of traffic participants, such as lane-change or merging scenes. In urban-road scenes with complex traffic conditions, such as intersections, however, the many traffic participants and the complex road structure prevent an automatic driving vehicle from completing safe and efficient decision planning of its own behavior.

Disclosure of Invention

The embodiment of the application provides a behavior decision method, a device, a terminal device and a storage medium of a vehicle, which can solve the problem that an automatic driving vehicle cannot complete safe and efficient self behavior decision planning under a complex scene.

A first aspect of an embodiment of the present application provides a behavior decision method of a vehicle, where the behavior decision method of the vehicle includes:

acquiring running state information of a first vehicle and a second vehicle at the current moment, wherein the first vehicle and the second vehicle are in the same traffic scene;

determining behavior decision information of the first vehicle according to a game model and the running state information of the first vehicle and the second vehicle;

determining the execution behavior corresponding to the current moment according to the behavior decision information, and driving the first vehicle to execute the corresponding execution behavior at the current moment.

A second aspect of the embodiments of the present application provides a behavior decision device of a vehicle, including:

the information acquisition module is used for acquiring running state information of a first vehicle and a second vehicle at the current moment, wherein the first vehicle and the second vehicle are in the same traffic scene;

the decision determining module is used for determining behavior decision information of the first vehicle according to the game model and the running state information of the first vehicle and the second vehicle;

and the behavior determination module is used for determining the execution behavior corresponding to the current moment according to the behavior decision information and driving the first vehicle to execute the corresponding execution behavior at the current moment.

A third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the behavior decision method of the vehicle according to the first aspect when executing the computer program.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the behavior decision method of the vehicle described in the first aspect.

A fifth aspect of embodiments of the present application provides a computer program product, which when run on a terminal device, causes the terminal device to perform the vehicle behavior decision method according to the first aspect described above.

Compared with the prior art, the embodiments of the application have the following beneficial effects: first, running state information of a first vehicle and a second vehicle at the current moment is acquired, the first vehicle and the second vehicle being in the same traffic scene, such as an intersection; secondly, behavior decision information of the first vehicle is determined according to the game model and the running state information of the first vehicle and the second vehicle, the behavior decision information including the execution behavior corresponding to each time point; and finally, the execution behavior corresponding to the current moment is determined according to the behavior decision information, and the first vehicle is driven to execute that behavior at the current moment.

Drawings

In order to illustrate the technical solutions of the embodiments of the application more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the application, and a person skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a schematic view of a traffic scene in an embodiment of the present application;

Fig. 2 is a flow chart of a behavior decision method of a vehicle according to an embodiment of the present application;

Fig. 3 is a flow chart of a behavior decision method of a vehicle according to a second embodiment of the present application;

Fig. 4 is a flow chart of a behavior decision method of a vehicle according to a third embodiment of the present application;

Fig. 5 is a flow chart of a behavior decision method of a vehicle according to a fourth embodiment of the present application;

Fig. 6 is a flow chart of a behavior decision method of a vehicle according to a fifth embodiment of the present application;

Fig. 7 is a graph comparing the travel curves of a first vehicle before and after the game model is introduced;

Fig. 8 is a schematic structural diagram of a behavior decision device of a vehicle according to a sixth embodiment of the present application;

Fig. 9 is a schematic structural diagram of a terminal device according to a seventh embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the appended claims, the term "if" may be interpreted as "when" or "once" or "in response to a determination" or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted, depending on the context, as "upon determination" or "in response to determination" or "upon detection of a [described condition or event]" or "in response to detection of a [described condition or event]".

In addition, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.

Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

Behavior decision planning has always been a critical part of automatic driving. At present, behavior decision planning during automatic driving is relatively mature for scenes with few kinds of traffic participants. In scenes with many traffic participants and complex traffic conditions, however, an automatic driving vehicle lacks the corresponding social attributes and cannot infer the intended behavior of other traffic participants from signals such as their gaze or actions. In such scenes the automatic driving vehicle therefore uniformly chooses to decelerate or to stop and give way, and so cannot complete its own behavior decision planning efficiently.

In order to enable an automatic driving vehicle to efficiently complete its own behavior decision planning in a complex traffic scene, the application provides a behavior decision method of a vehicle. In this method, running state information of a first vehicle and a second vehicle at the current moment is first acquired, where the first vehicle is the automatic driving vehicle and the two vehicles are in the same traffic scene, such as an intersection; secondly, behavior decision information of the first vehicle is determined according to the game model and the running state information of the first vehicle and the second vehicle, where the behavior decision information includes the execution behavior corresponding to each time point; finally, the execution behavior corresponding to the current moment is determined according to the behavior decision information, and the first vehicle is driven to execute that behavior at the current moment.

It should be understood that the sequence number of each step in this embodiment does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.

In order to illustrate the technical solution of the present application, the following description is made by specific embodiments.

Referring to fig. 1, a schematic view of a traffic scene in an embodiment of the present application is shown. As shown in fig. 1, in this traffic scenario, a first vehicle may refer to vehicle a, and a second vehicle may refer to vehicle B, where vehicle a is an autonomous vehicle, and vehicle B is an autonomous vehicle or a vehicle driven by a person.

In the traffic scene shown in fig. 1, in order to enable vehicle A to complete safe and efficient behavior decision planning, the running state information of vehicle A and vehicle B at the current moment may first be acquired. The current moment may be any time point at which vehicle A is traveling in the traffic scene, for example the time point at which vehicle A has traveled one meter or the time point at which it has traveled two meters. The running state information of vehicle A and vehicle B at the current moment is then processed according to the game model, so that vehicle A and vehicle B play a game with each other in which each vehicle's behavior decision information is determined from the running state information of the other vehicle; in this way the behavior decision information of vehicle A can be determined.

In a possible implementation, when the traffic light shown in fig. 1 turns green, the running state information of vehicle A and vehicle B at that moment is acquired, the game is played according to the running state information and the game model to determine the behavior decision information of vehicle A, and the execution behavior of vehicle A at the moment the light turns green is determined from that behavior decision information, so that vehicle A is driven to execute it at that moment.

In the embodiment of the application, the first vehicle and the second vehicle play a game through the game model to determine the behavior decision information of the first vehicle, which gives the execution behavior corresponding to each time point within a certain period from the current moment onward in the traffic scene. The first vehicle can then be driven according to the execution behavior corresponding to each time point, and thereby completes safe and efficient behavior decision planning.

Referring to fig. 2, a flow chart of a vehicle behavior decision method according to an embodiment of the present application is shown. As shown in fig. 2, the behavior decision method of the vehicle may include the steps of:

Step 201, obtaining running state information of a first vehicle and a second vehicle at the current moment.

The first vehicle may be an automatic driving vehicle. The second vehicle may be a vehicle detected in the traffic scene where the automatic driving vehicle is located, and may itself be an automatic driving vehicle or a vehicle driven by a person, which is not limited in this application.

In the embodiment of the application, the running state information includes position information, speed information and angle information of the first vehicle and the second vehicle. The position information may be obtained by means of a global positioning system (GPS), real-time kinematic (RTK) positioning, a camera, a lidar and the like. The speed information may be measured by a speed-measuring radar or a sensor mounted on the first vehicle, and may include the real-time speed and the acceleration of the first vehicle and the second vehicle. The angle information, that is, the heading angles of the first vehicle and the second vehicle, may be obtained through GPS or RTK positioning. The heading angle of a vehicle generally refers to the angle between the velocity of the vehicle's center of mass and the horizontal axis in the ground coordinate system. It should be noted that, when the heading angles are obtained, the first vehicle and the second vehicle should be expressed in the same ground coordinate system.

In one possible embodiment, the first vehicle may acquire the running state information of the first vehicle and the second vehicle when it recognizes that it is in a preset scene, where the preset scene may be a scene in which a green light turns on at an intersection. In this preset scene, the second vehicle may be a vehicle in the opposite lane, such as vehicle B in fig. 1, with vehicle A as the first vehicle. When the green light facing vehicle A turns on, vehicle A may turn left while vehicle B goes straight, so there is a risk of collision. Vehicle A can eliminate this risk by accelerating, decelerating or stopping, so the behavior decision of vehicle A needs to be planned. In the prior art, vehicle A generally selects the strategy of decelerating and giving way, which makes its passing efficiency low.
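As a minimal sketch of how the running state described above could be held in memory, the record below groups position, speed, acceleration and heading angle for one vehicle. The class name `DrivingState`, its field names, the units and the sample values are all assumptions made for illustration; the application does not prescribe a data layout.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DrivingState:
    """Running state of one vehicle at one time point (hypothetical layout)."""
    x: float        # position in the shared ground coordinate system, metres
    y: float
    speed: float    # real-time speed, m/s
    accel: float    # acceleration, m/s^2
    heading: float  # heading angle from the horizontal axis, radians

    def velocity(self) -> Tuple[float, float]:
        """Velocity components in the shared ground coordinate system."""
        return (self.speed * math.cos(self.heading),
                self.speed * math.sin(self.heading))


# Both vehicles are expressed in the same ground coordinate system.
# The numbers are made up for the example.
vehicle_a = DrivingState(x=0.0, y=0.0, speed=5.0, accel=0.0, heading=math.pi / 2)
vehicle_b = DrivingState(x=3.5, y=40.0, speed=8.0, accel=0.0, heading=-math.pi / 2)
```

Keeping both vehicles in one frame means their heading angles and velocity components can be compared directly, which is the requirement stated in the paragraph above.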

Step 202, determining behavior decision information of the first vehicle according to the game model and the running state information of the first vehicle and the second vehicle.

In the embodiment of the application, in order to avoid the low passing efficiency that results when an automatic driving vehicle always chooses to yield while interacting with other vehicles in a scene with a complex road structure, the running state information of the first vehicle and the second vehicle is processed according to the game model to determine the behavior decision information of the first vehicle.

In one possible embodiment, determining the behavior decision information of the first vehicle according to the game model and the running state information of the first vehicle and the second vehicle may include:

inputting the running state information into the game model, and calculating, by the game model according to the running state information and a payment function, the utility values respectively corresponding to N pieces of behavior decision information of the first vehicle, wherein N is an integer greater than zero;

and determining the behavior decision information corresponding to the highest utility value as the behavior decision information of the first vehicle.

The payment function (the payoff function in game-theoretic terms) represents the level of utility that a participant obtains from the game. A participant is a decision-making entity that selects actions in the game so as to maximize its own utility; in this application the participants are the first vehicle and the second vehicle. The utility value is the utility a participant obtains after selecting an action.

In the embodiment of the application, the payment function may be a loss function, a reward function or the like. If a loss function is used, the utility value is a loss value, which represents the loss of the first vehicle in the game. After the game model calculates the loss values of the N pieces of behavior decision information of the first vehicle according to the running state information and the loss function, the highest utility value corresponds to the lowest loss value, and the behavior decision information with the lowest loss value may be determined as the behavior decision information of the first vehicle. If a reward function is used, the utility value is a reward value, which represents the reward of the first vehicle in the game; this reward may correspond to the driving passing rate of the first vehicle when it interacts with the second vehicle. After the running state information is input into the game model, the game model calculates the reward values respectively corresponding to the N pieces of behavior decision information of the first vehicle according to the running state information and the reward function, and the behavior decision information with the highest reward value may be determined as the behavior decision information of the first vehicle.
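The selection rule described above can be sketched as follows: with a reward function the candidate with the largest value is chosen, and with a loss function the candidate with the smallest value is chosen, since the lowest loss corresponds to the highest utility. The function name `select_decision` and the numeric values are hypothetical and serve only to illustrate the rule.

```python
from typing import Dict


def select_decision(values: Dict[str, float], kind: str = "reward") -> str:
    """Return the candidate decision with the highest utility.

    kind="reward": the value is a reward, so the maximum is selected.
    kind="loss":   the value is a loss, so the minimum is selected.
    """
    if not values:
        raise ValueError("at least one candidate decision is required (N > 0)")
    if kind == "reward":
        return max(values, key=values.get)
    if kind == "loss":
        return min(values, key=values.get)
    raise ValueError(f"unknown payment function kind: {kind!r}")


# Made-up values for three candidate decisions.
rewards = {"accelerate": 0.72, "decelerate": 0.55, "stop": 0.31}
losses = {"accelerate": 0.28, "decelerate": 0.45, "stop": 0.69}

print(select_decision(rewards, "reward"))
print(select_decision(losses, "loss"))
```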

In one possible implementation, in the case that the payment function is a first reward function, the utility value corresponds to a first reward value calculated by the first reward function, and the calculating, by the game model according to the running state information and the payment function, of the utility values respectively corresponding to N pieces of behavior decision information of the first vehicle includes:

calculating, by the game model according to the running state information and the first reward function, the first reward values respectively corresponding to M execution behaviors of the first vehicle at the current moment, wherein M is an integer greater than 1;

acquiring the M kinds of running state information of the first vehicle at the next time point, after the M execution behaviors are respectively executed, and the running state information of the second vehicle at the next time point;

and iteratively calculating the combinations of all execution behaviors of the first vehicle and the second vehicle from the current moment to time H to generate M to the power H different trajectories, and calculating, by the game model according to the first reward function, the first reward value corresponding to each trajectory, wherein N is equal to M to the power H.

In this embodiment of the application, the behavior decision information includes the execution behaviors of the first vehicle at all time points, N is equal to M to the power H, and H is the number of all time points in the preset time period.

The time points are all the time points during which the first vehicle travels in a traffic scene; for example, one time point may be set every 1 second. Assuming that the traffic scene is an intersection, the time the first vehicle takes to pass through the intersection is the preset time period. If the preset time period is set to 40 seconds, for example, the number of time points is 40.
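To give a sense of scale for N = M^H, the short calculation below takes H = 40 time points from the example above and, as an illustrative assumption, M = 3 execution behaviors per time point. The helper name `count_decisions` is hypothetical.

```python
def count_decisions(num_behaviors: int, num_time_points: int) -> int:
    """N = M ** H: one execution behavior is chosen at each time point."""
    return num_behaviors ** num_time_points


n = count_decisions(3, 40)
print(n)           # exact integer; Python integers do not overflow
print(f"{n:.2e}")  # order of magnitude
```

The result is on the order of 10^19 candidate sequences, which shows how quickly the number of pieces of behavior decision information grows with the number of time points.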

By way of example, assuming that the first vehicle needs at most 40 seconds to pass through an intersection, the game may be played at a total of 40 time points, that is, once every 1 second, with the execution behaviors of the first vehicle at each second being acceleration, deceleration and stopping. The position information, speed information and angle information of the first vehicle and the second vehicle at the 1st second (that is, when the vehicle starts) are acquired first and input into the game model, and the first reward values corresponding to acceleration, deceleration and stopping at the 1st second are calculated. The first reward value may be calculated by the following first reward function:

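The first reward function R for the first vehicle i is defined as:

$$R(\tau_t) = w_1 \cdot \mathrm{danger} + w_2 \cdot \mathrm{comfort} + w_3 \cdot \mathrm{efficiency}$$

$$\mathrm{danger} = e^{-\mathrm{collision\_risk}}$$

$$\mathrm{efficiency} = e^{-\mathrm{distance}}$$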

Here danger is the collision risk term, obtained by judging whether a collision occurs from the path points corresponding to each time point on the reference paths of the first vehicle and the second vehicle. The comfort term evaluates whether the current strategy is comfortable. The efficiency term is the efficiency (that is, the driving passing rate) with which the first vehicle reaches the target position, calculated from the real-time speed in the speed information and a uniform motion model. collision_risk is a collision risk value calculated from the Euclidean distance between the track points of the first vehicle and the second vehicle. jerk is the rate of change of acceleration, and acc is the acceleration. $w_1$, $w_2$ and $w_3$ are the weights of the collision term, the comfort term and the efficiency term respectively. $\tau = 0$ denotes the current time, namely the 1st second, $\tau = 1$ denotes the time point following the current time, and H is the number of all time points.
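A rough sketch of a per-time-point reward with the three weighted terms follows. The text gives danger and efficiency as negative exponentials but does not give the form of collision_risk or of the comfort term, so the inverse-distance risk, the exponential comfort term built from jerk and acc, the weight values and all function names below are assumptions made purely for illustration.

```python
import math


def collision_risk(p_first, p_second, scale: float = 5.0) -> float:
    """Assumed risk measure: grows as the two track points get closer."""
    d = math.dist(p_first, p_second)  # Euclidean distance between track points
    return scale / (d + 1e-6)


def step_reward(p_first, p_second, distance, jerk, acc,
                w1: float = 1.0, w2: float = 0.2, w3: float = 0.5) -> float:
    """w1*danger + w2*comfort + w3*efficiency for one time point."""
    danger = math.exp(-collision_risk(p_first, p_second))
    comfort = math.exp(-(abs(jerk) + abs(acc)))  # assumed form, not from the text
    efficiency = math.exp(-distance)             # distance left to the target
    return w1 * danger + w2 * comfort + w3 * efficiency


far = step_reward((0.0, 0.0), (100.0, 0.0), distance=10.0, jerk=0.0, acc=0.0)
near = step_reward((0.0, 0.0), (1.0, 0.0), distance=10.0, jerk=0.0, acc=0.0)
print(far > near)  # a larger gap between the vehicles scores higher
```

Because danger is defined as the exponential of the negated risk, a higher collision risk lowers the reward, so the term behaves as a safety score.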

Here $R'_i$ is the second reward function of the first vehicle i, corresponding to the adaptive level-k game model, where k is the level index, K is the total number of levels defined, and each level k carries its own weight. The remaining parameters have the same meaning as for $R_i$ above and are not described again here.

After the 3 values of $R_{\tau=0}$, that is, the 3 first reward values, are obtained, the three kinds of running state information of the first vehicle at the 2nd second, corresponding to acceleration, deceleration and stopping respectively, are acquired together with the running state information of the second vehicle at the 2nd second. The running state information of the second vehicle at the 2nd second is a predicted value, obtained from the real-time speed of the second vehicle at the 1st second and a uniform motion model. From the three kinds of running state information of the first vehicle and the running state information of the second vehicle at the 2nd second, the 9 kinds of running state information of the first vehicle at the 3rd second, after acceleration, deceleration and stopping are again performed respectively, are calculated, and the running state information of the second vehicle at the 3rd second is obtained. These steps are executed cyclically until time H is reached. With 3 execution behaviors assumed at each time point, there are $(C_3^1)^n$ combinations in total, where n is the number of time points, that is, $3^{40}$ combinations ($3^{40}$ pieces of behavior decision information). Each combination corresponds to one speed curve of the first vehicle. The first reward value of each combination is calculated in the game according to the preset first reward function, and the behavior decision information with the highest first reward value is determined as the behavior decision information of the first vehicle; it includes the execution behavior (acceleration, deceleration or stopping) corresponding to each time point.
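The enumeration described above can be sketched for a short horizon as follows. This is a toy version under loudly stated assumptions: each vehicle is reduced to a signed distance from the crossing point on its own path, the two paths are taken to be perpendicular, the acceleration values, reward terms and scaling constants are invented, comfort is omitted, and the reward of a sequence is the plain sum of its per-step rewards. The second vehicle is propagated with a uniform motion model, as in the text.

```python
import itertools
import math

ACTIONS = ("accelerate", "decelerate", "stop")   # M = 3 execution behaviors
ACCEL = {"accelerate": 2.0, "decelerate": -2.0}  # m/s^2, assumed values
DT = 1.0                                         # one time point per second


def step_first(s, v, action):
    """Advance the first vehicle by one time point along its own path."""
    v_next = 0.0 if action == "stop" else max(0.0, v + ACCEL[action] * DT)
    return s + v_next * DT, v_next


def step_second(s, v):
    """Uniform motion model: the second vehicle keeps its current speed."""
    return s + v * DT, v


def step_reward(s1, s2, goal, w1=1.0, w3=0.5):
    """Toy per-step reward with a safety term and an efficiency term."""
    gap = math.hypot(s1, s2)                     # paths assumed perpendicular
    danger = math.exp(-5.0 / (gap + 1e-6))       # assumed collision_risk = 5 / gap
    efficiency = math.exp(-max(0.0, goal - s1) / 10.0)  # assumed distance scaling
    return w1 * danger + w3 * efficiency


def best_sequence(first, second, goal, horizon):
    """Score all M**H behavior sequences and return the best one."""
    best_seq, best_total = None, -math.inf
    for seq in itertools.product(ACTIONS, repeat=horizon):
        (s1, v1), (s2, v2), total = first, second, 0.0
        for action in seq:
            s1, v1 = step_first(s1, v1, action)
            s2, v2 = step_second(s2, v2)
            total += step_reward(s1, s2, goal)
        if total > best_total:
            best_seq, best_total = seq, total
    return best_seq, best_total


# Positions are signed distances to the crossing point (negative = not yet reached).
sequence, total = best_sequence(first=(-10.0, 3.0), second=(-30.0, 6.0),
                                goal=10.0, horizon=3)
print(sequence[0])  # behavior to execute at the current moment
```

With horizon 3 the loop scores 27 sequences. At the 40 time points of the example, exhaustive enumeration of this kind would not be practical, so the sketch is meant only to make the structure of the search concrete.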

It should be appreciated that the first reward function of the first vehicle i at the time t considers the reward value of each time point in the preset time period after the time t, so that the obtained behavior decision at the time t is more reasonable.

It should also be appreciated that the first reward function in embodiments of the present application corresponds to the reward function of a level-k game model.

Step 203, determining an execution behavior corresponding to the current time according to the behavior decision information, and driving the first vehicle to execute the corresponding execution behavior at the current time.

In the embodiment of the application, the behavior decision information determined for the first vehicle is the one with the highest utility value, so it can be regarded as the optimal behavior decision information, with which the first vehicle reaches the target position more efficiently. Because the behavior decision information includes the execution behaviors for all time points, the execution behavior corresponding to the current moment can be derived from it, and the first vehicle can be driven to execute that behavior at the current moment.

In the embodiment of the application, because the behavior decision information of the first vehicle is the result of a game computed with the game model, it accounts for both the safety and the efficiency of driving. The execution behavior chosen for the current moment from this information therefore improves the accuracy of the vehicle's behavior decisions, so that safe and efficient behavior decision planning can be completed.

Referring to fig. 3, a flow chart of a vehicle behavior decision method according to a second embodiment of the present application is shown. As shown in fig. 3, the behavior decision method of the vehicle may include the steps of:

step 301, obtaining running state information of a first vehicle and a second vehicle at the current moment.

Step 301 of this embodiment is the same as step 201 of the previous embodiment; reference may be made to that description, which is not repeated here.

Step 302, determining the actual inference level of the second vehicle at the current moment according to the execution behaviors of the second vehicle at each predicted inference level at the current moment and the actual execution behavior of the second vehicle.

In this embodiment of the application, the second vehicle may be an automatic driving vehicle or a vehicle driven by a person. If the second vehicle is an automatic driving vehicle, it makes different behavior decisions at different inference levels for the same running state information. The inference level indicates the rational driving level of the vehicle. For example, when the inference level of an automatic driving vehicle is zero, the vehicle does not infer the behavior of other vehicles while driving; it subjectively assumes that it has the right of way and regards other vehicles as static obstacles.

In the embodiment of the application, since the first vehicle cannot directly determine the inference level of the second vehicle, the actual inference level of the second vehicle at the current moment may be determined from the execution behavior of the second vehicle at each predicted inference level and the actual execution behavior of the second vehicle.

In one possible embodiment, determining the actual inference level of the second vehicle according to the execution behavior of the second vehicle at each predicted inference level and the actual execution behavior of the second vehicle includes:

acquiring the execution behaviors of the second vehicle at the current moment that respectively correspond to the predicted inference levels;

comparing each execution behavior with the actual execution behavior of the second vehicle at the current moment one by one;

and determining the predicted inference level whose execution behavior is identical to the actual execution behavior of the second vehicle as the actual inference level of the second vehicle at the current moment.

In this embodiment of the present application, the execution behaviors of the second vehicle at the current moment corresponding to the respective predicted inference levels may be obtained in different ways depending on the predicted inference level. When the predicted inference level is zero, the execution behavior of the second vehicle is determined according to the vehicle running rule corresponding to the zero inference level, namely that the vehicle does not infer the behavior of other vehicles during driving, subjectively assumes that it owns the right of way, and regards other vehicles as static obstacles. When the predicted inference level is non-zero, the execution behavior of the second vehicle corresponding to each predicted inference level can be calculated based on the game model according to the first preset inference level of the first vehicle, each predicted inference level of the second vehicle and the running state information at the current moment.

In an exemplary embodiment, when the predicted inference level of the second vehicle is two, a first execution behavior corresponding to the first vehicle at inference level one may be obtained first; the first execution behavior and the driving state information are then input into the game model, and the second execution behavior corresponding to the second vehicle at predicted inference level two is obtained by maximizing the utility value of the game model.

It should be understood that according to the above method, the corresponding execution behavior of the second vehicle when the prediction inference level is the non-zero inference level can be obtained, and when the prediction inference level is the zero inference level, the corresponding execution behavior can be obtained according to the corresponding vehicle driving rule, so that the execution behavior of the second vehicle respectively corresponding to each prediction inference level can be obtained.

In this embodiment of the present application, after the execution behaviors of the second vehicle corresponding to the respective predicted inference levels are obtained, they may be compared one by one with the actual execution behavior of the second vehicle at the current moment. The actual execution behavior of the second vehicle may be determined from its real-time speed information at each time point; for example, if the speed increases, the actual execution behavior is determined to be acceleration. After the comparison, the predicted inference level whose execution behavior is identical to the actual execution behavior of the second vehicle is determined to be the actual inference level of the second vehicle at the current moment.

According to the above method, the actual level of reasoning for the second vehicle can be determined.
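The comparison described above can be sketched as follows. This is a minimal illustration and not the applicant's implementation: the behavior labels (`accelerate`, `decelerate`, `keep`), the speed tolerance `eps` and the function names are assumptions introduced for the example.

```python
from typing import Dict, Optional, Sequence


def actual_behavior(speeds: Sequence[float], eps: float = 0.1) -> str:
    """Classify the observed behavior from the last two speed samples (m/s).

    The labels and the tolerance eps are illustrative assumptions.
    """
    delta = speeds[-1] - speeds[-2]
    if delta > eps:
        return "accelerate"
    if delta < -eps:
        return "decelerate"
    return "keep"


def identify_level(predicted: Dict[int, str], observed: str) -> Optional[int]:
    """Return the lowest predicted level whose behavior equals the observation."""
    for level in sorted(predicted):
        if predicted[level] == observed:
            return level
    return None  # no predicted level explains the observation


# Hypothetical predicted behaviors of the second vehicle per inference level.
predicted = {0: "accelerate", 1: "decelerate", 2: "keep"}
observed = actual_behavior([8.0, 8.6])
level = identify_level(predicted, observed)
```

The source does not say what happens when no predicted level matches; returning `None` here simply leaves that case to the caller.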

In one possible implementation, the behavior decision method further includes:

updating the probability of each predicted reasoning level according to the determined actual reasoning level of the second vehicle at the previous moment;

correspondingly, acquiring the execution behaviors of the second vehicle at the current moment, which correspond to the prediction inference levels respectively, comprises the following steps:

and acquiring the execution behaviors of the second vehicle at the current moment, which correspond to the prediction inference levels respectively, according to the probability of the prediction inference levels.

Specifically, the probability of each predicted inference level at the current moment can be updated according to the actual inference level of the second vehicle determined at the previous moment, and the execution behaviors of the second vehicle at the current moment are then obtained one at a time in order of probability. If the execution behavior determined for the predicted inference level with the highest probability is the same as the actual execution behavior of the second vehicle, the execution behaviors corresponding to the other predicted inference levels are no longer obtained, which reduces the amount of calculation and increases the running speed of the system.
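A minimal sketch of this probability-guided search is given below. The source does not specify the update rule, so the exponential update with weight `alpha` is an assumption, as are the function names and the behavior labels.

```python
from typing import Callable, Dict, Optional, Tuple


def update_probabilities(probs: Dict[int, float], last_level: int,
                         alpha: float = 0.5) -> Dict[int, float]:
    """Shift probability mass toward the level identified at the previous moment.

    The exponential update is an assumed rule; it keeps the total equal to 1.
    """
    return {level: (1.0 - alpha) * p + (alpha if level == last_level else 0.0)
            for level, p in probs.items()}


def identify_level_lazily(probs: Dict[int, float],
                          predict: Callable[[int], str],
                          observed: str) -> Tuple[Optional[int], int]:
    """Try levels from most to least probable and stop at the first match.

    Returns the matched level (or None) and how many predictions were computed.
    """
    evaluated = 0
    for level in sorted(probs, key=lambda k: probs[k], reverse=True):
        evaluated += 1
        if predict(level) == observed:
            return level, evaluated
    return None, evaluated


# Hypothetical predicted behaviors per level, standing in for the game model.
table = {0: "accelerate", 1: "decelerate", 2: "keep"}
probs = update_probabilities({0: 1 / 3, 1: 1 / 3, 2: 1 / 3}, last_level=1)
level, evaluated = identify_level_lazily(probs, table.__getitem__, "decelerate")
```

When the most probable level already explains the observation, only one prediction is computed, which is the saving the paragraph above describes.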

Step 303, determining behavior decision information of the first vehicle at the current moment according to the actual reasoning level of the second vehicle at the current moment, the first preset reasoning level of the first vehicle, the game model and the running state information of the first vehicle and the second vehicle at the current moment.

In the embodiment of the application, each vehicle calculates its own execution behavior by maximizing its own reward function according to the running state information of the first vehicle and the second vehicle and the action of the other vehicle at the reasoning level one below its current reasoning level. The behavior decision information of the first vehicle can then be determined from the execution behavior of the first vehicle at the first preset reasoning level at each time point.

The process of calculating the execution behavior of the first vehicle at the first preset reasoning level at any time point may include: inputting the running state information and the first execution behavior, corresponding to the first vehicle when the actual reasoning level is zero, into the game model, and obtaining the second execution behavior, corresponding to the second vehicle when the actual reasoning level is one, by maximizing the utility value of the game model; inputting the second execution behavior and the running state information into the game model, and obtaining the third execution behavior, corresponding to the first vehicle when the actual reasoning level is two, by maximizing the utility value of the game model; and taking the third execution behavior as the first execution behavior and cyclically executing the step of inputting the running state information and the first execution behavior into the game model and the subsequent steps, the actual reasoning level corresponding to each obtained execution behavior increasing by one in turn, until the execution behavior of the first vehicle at the first preset reasoning level is obtained. In this way, the execution behavior of the first vehicle at the first preset reasoning level at any time point can be obtained.

It should be appreciated that when the first preset inference level of the first vehicle is even, the calculation may start from the first execution behavior corresponding to the first vehicle when the actual inference level is zero; when the first preset inference level of the first vehicle is odd, the calculation needs to start from the execution behavior corresponding to the second vehicle when the actual inference level is zero. The calculation principle is the same as above: each vehicle obtains its own execution behavior by maximizing its own reward function according to the running state information of the first vehicle and the second vehicle and the action of the other vehicle at the inference level one below its current inference level.
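The alternating calculation, including the even and odd starting cases, can be sketched as below. The `best_response` callback stands in for "maximizing the utility value of the game model" and is hypothetical, as are the player labels `ego` (first vehicle) and `other` (second vehicle) and the toy response rule.

```python
from typing import Callable

# best_response(player, opponent_action, state) returns the action that
# maximizes that player's reward; its signature is an assumption.
BestResponse = Callable[[str, str, dict], str]


def level_k_action(target_level: int, level0_action: str, state: dict,
                   best_response: BestResponse) -> str:
    """Execution behavior of the first vehicle at target_level.

    The level-0 behavior belongs to the first vehicle when target_level is
    even and to the second vehicle when it is odd, so that the alternation
    always ends on the first vehicle.
    """
    player = "ego" if target_level % 2 == 0 else "other"
    action = level0_action
    for _ in range(target_level):
        # The next level up is played by the other vehicle, responding to
        # the behavior obtained one level below.
        player = "other" if player == "ego" else "ego"
        action = best_response(player, action, state)
    return action


def toy_best_response(player: str, opponent_action: str, state: dict) -> str:
    """Toy rule (assumption): yield to an accelerating opponent, else go."""
    return "decelerate" if opponent_action == "accelerate" else "accelerate"


behaviors = [level_k_action(k, "accelerate", {}, toy_best_response)
             for k in range(4)]
```

With this toy rule the first vehicle alternates between going and yielding as its preset level increases, which illustrates why the preset level changes the outcome of the game.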

In one possible implementation manner, in a case that the second vehicle has K predicted reasoning levels and the payment function is a second reward function, determining the behavior decision information of the first vehicle at the current moment further includes:

calculating, by the game model according to the running state information and the second reward function, second sub-reward values respectively corresponding to the first vehicle after executing each of M execution behaviors when the second vehicle is at the different predicted reasoning levels;

acquiring M pieces of driving state information of the first vehicle at the next time point after M pieces of execution behaviors are executed;

iteratively calculating the combinations of all execution behaviors of the first vehicle and the second vehicle from the current moment to moment H to generate M to the power of H (M^H) different trajectories, and calculating, by the game model, a second reward value corresponding to each trajectory according to the second sub-reward values, wherein N is equal to M^H;

and determining the behavior decision information corresponding to the highest second rewarding value as the behavior decision information of the first vehicle at the current moment.

In the embodiment of the present application, determining the behavior decision information of the first vehicle at the current moment maximizes the second reward value given by the second reward function. The second reward value is composed of the second sub-reward values obtained for the same execution behavior under the different predicted reasoning levels; to maximize the second reward value, the sum of the sub-reward values obtained after executing that same execution behavior therefore needs to be maximized.

It should be appreciated that the second reward function in embodiments of the present application may correspond to the reward function of an adaptive level-k game model.

In this embodiment, the above process of determining the behavior decision information of the first vehicle at the current moment is similar to the corresponding process in step 202 of the first embodiment. The difference is that the maximized second reward value in this embodiment is the sum of the second sub-reward values obtained for all predicted reasoning levels, whereas the maximized first reward value in the first embodiment is obtained for a single reasoning level, so the reward values of all reasoning levels do not need to be calculated there. Apart from this difference, the steps are the same as step 202 of the first embodiment; the two may be referred to each other, and details are not repeated here.
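The difference between scoring against one reasoning level and scoring against all K predicted levels can be illustrated with a small sketch. The sub-reward numbers in `TABLE` are invented purely for illustration, and the plain unweighted sum is one reading of "the sum of the values obtained for all predicted levels"; a probability-weighted sum would be another possible design.

```python
from typing import Callable, Dict, Sequence, Tuple


def pick_action_over_levels(actions: Sequence[str], levels: Sequence[int],
                            sub_reward: Callable[[str, int], float]
                            ) -> Tuple[str, Dict[str, float]]:
    """Pick the behavior whose sub-rewards, summed over the levels, are highest."""
    totals = {a: sum(sub_reward(a, k) for k in levels) for a in actions}
    best = max(totals, key=lambda a: totals[a])
    return best, totals


# Hypothetical table: (behavior of first vehicle, level of second vehicle) -> value.
TABLE = {
    ("accelerate", 0): -5.0, ("accelerate", 1): 2.0, ("accelerate", 2): 1.0,
    ("decelerate", 0): 1.0, ("decelerate", 1): 0.0, ("decelerate", 2): 0.5,
}


def lookup(action: str, level: int) -> float:
    return TABLE[(action, level)]


ACTIONS = ["accelerate", "decelerate"]
best_all, totals = pick_action_over_levels(ACTIONS, [0, 1, 2], lookup)
best_single, _ = pick_action_over_levels(ACTIONS, [1], lookup)
```

In this toy table, scoring against level one alone favors accelerating, while summing over all three levels favors decelerating because of the large penalty at level zero.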

Step 304, determining an execution behavior corresponding to the current time according to the behavior decision information, and driving the first vehicle to execute the corresponding execution behavior at the current time.

Step 304 of this embodiment is the same as step 203 of the foregoing embodiment; the two may be referred to each other, and details are not repeated here.

Compared with the first embodiment, this embodiment introduces the concept of the vehicle reasoning level, which can greatly influence the game process: when the reasoning level of the second vehicle differs, the obtained execution behavior of the second vehicle differs, and so does the corresponding execution behavior of the first vehicle. Because the execution behavior of the first vehicle at each time point changes with the actual reasoning level of the second vehicle, the method better matches real traffic scenes.

Referring to fig. 4, a flow chart of a behavior decision method of a vehicle according to a third embodiment of the present application is shown. As shown in fig. 4, the behavior decision method of the vehicle may include the steps of:

Step 401, acquiring running state information of a first vehicle and a second vehicle at the current moment.

Step 401 of this embodiment is the same as step 201 of the foregoing embodiment; the two may be referred to each other, and details are not repeated here.

Step 402, determining the actual reasoning level of the second vehicle at the initial time point according to the execution behavior of the second vehicle at each predicted reasoning level and the actual execution behavior of the second vehicle at the initial time point.

In the embodiment of the application, when the current moment is the initial time point within a preset time period, the actual reasoning level of the second vehicle at the initial time point is determined according to the execution behavior of the second vehicle at each predicted reasoning level at the initial time point and the actual execution behavior of the second vehicle at the initial time point. The implementation process is the same as that of step 302; the two may be referred to each other, and details are not repeated here.

Step 403, determining behavior decision information of the first vehicle at any time point according to the actual reasoning level of the second vehicle at the initial time point, the first preset reasoning level of the first vehicle, the game model and the running state information of the first vehicle and the second vehicle at any time point.

In the embodiment of the application, in order to reduce the amount of calculation and increase the running speed of the system, the actual reasoning level of the second vehicle at the initial time point can be used as its actual reasoning level throughout the whole game process, so that the actual reasoning level of the second vehicle does not need to be judged in real time.

The specific implementation process is similar to that of step 303, with the actual reasoning level of the second vehicle at the current moment in step 303 replaced by the actual reasoning level of the second vehicle at the initial time point. That is, when the behavior decision information of the first vehicle at each time point is determined, the game is played using the actual reasoning level of the second vehicle at the initial time point.

Step 404, determining an execution behavior corresponding to the current time according to the behavior decision information, and driving the first vehicle to execute the corresponding execution behavior at the current time.

Step 404 of this embodiment is the same as step 203 of the foregoing embodiment; the two may be referred to each other, and details are not repeated here.

Compared with the second embodiment, this embodiment takes the actual reasoning level of the second vehicle at the initial time point as its actual reasoning level throughout the whole game process, so the actual reasoning level of the second vehicle does not need to be judged in real time. Since the reasoning level of a vehicle does not normally change, the amount of calculation of the system can be reduced while the accuracy of the method is maintained.

Referring to fig. 5, a flow chart of a vehicle behavior decision method according to a fourth embodiment of the present application is shown. As shown in fig. 5, the behavior decision method of the vehicle may include the steps of:

Step 501, obtaining running state information of a first vehicle and a second vehicle at the current moment.

Step 501 of this embodiment is the same as step 201 of the foregoing embodiment; the two may be referred to each other, and details are not repeated here.

Step 502, inputting the driving state information and the first execution behavior corresponding to the first vehicle when the actual reasoning level is zero into the game model, and obtaining the second execution behavior corresponding to the second vehicle when the actual reasoning level is one by maximizing the utility value of the game model.

In this embodiment of the present application, the actual inference level of the first vehicle is a first preset inference level, and the actual inference level of the second vehicle is a second preset inference level. When the first preset inference level is even, the calculation may start from the first vehicle at an actual inference level of zero, and the execution behavior of the first vehicle corresponding to the zero actual inference level may be obtained by setting. According to the execution behavior of the first vehicle corresponding to the zero actual inference level, the second execution behavior of the second vehicle corresponding to an actual inference level of one is obtained first.

Step 503, inputting the second execution behavior and the driving state information into the game model, and obtaining a third execution behavior corresponding to the first vehicle when the actual reasoning level is two by maximizing the utility value of the game model.

In the embodiment of the application, each vehicle calculates its own execution behavior by maximizing its own reward function according to the running state information of the first vehicle and the second vehicle and the action of the other vehicle at the reasoning level one below its current reasoning level. Following this principle, the third execution behavior of the first vehicle corresponding to an actual reasoning level of two can be obtained from the second execution behavior of the second vehicle corresponding to an actual reasoning level of one.

It should be understood that if the first preset inference level of the first vehicle is two, proceeding to this step may obtain the execution behavior corresponding to the first vehicle at the current moment. If the first preset inferencing level of the first vehicle is greater than two, the following step 504 may be performed.

Step 504, taking the third execution behavior as the first execution behavior, and cyclically executing the step of inputting the running state information and the first execution behavior into the game model and the subsequent steps, the actual reasoning level corresponding to each obtained execution behavior increasing by one in turn, until the execution behavior corresponding to the first vehicle at the first preset reasoning level is obtained.

Step 505, determining behavior decision information of the first vehicle according to the corresponding execution behaviors of the first vehicle at the first preset reasoning level at each time point.

In the embodiment of the present application, since the behavior decision information includes the execution behavior corresponding to the first vehicle at each time point, the utility value of the game model may be maximized according to the execution behavior corresponding to the first vehicle at the first preset inference level at each time point, and the behavior decision information of the first vehicle may be determined.

Step 506, determining an execution behavior corresponding to the current time according to the behavior decision information, and driving the first vehicle to execute the corresponding execution behavior at the current time.

Step 506 of this embodiment is the same as step 203 of the foregoing embodiment; the two may be referred to each other, and details are not repeated here.

Compared with the first embodiment, in the embodiment of the present application, since the vehicles traveling on the road are basically driven by people and are relatively rational, the actual reasoning level of the second vehicle can be fixed to the second preset reasoning level. The actual reasoning level of the second vehicle is then not predicted, and the running speed of the system can be increased.

Referring to fig. 6, a flow chart of a vehicle behavior decision method provided in a fifth embodiment of the present application is shown. As shown in fig. 6, the behavior decision method of the vehicle may include the steps of:

Step 601, obtaining running state information of a first vehicle and a second vehicle at the current moment.

Step 602, determining behavior decision information of the first vehicle according to the game model and the driving state information of the first vehicle and the second vehicle.

Step 603, determining the execution behavior corresponding to the current time according to the behavior decision information.

Steps 601-603 of this embodiment are the same as steps 201-203 of the foregoing embodiment; they may be referred to each other, and details are not repeated here.

Step 604, obtaining a first reference path of the first vehicle and a second reference path of the second vehicle, and generating a travel chart of the second vehicle traveling on the second reference path.

In this embodiment of the present application, the first reference path and the second reference path may be obtained through a global positioning system (GPS), real-time kinematic (RTK) positioning and a high-precision map. Specifically, whether the vehicle is in a preset scene is judged according to the vehicle positioning data acquired by the global positioning system or real-time kinematic positioning together with the high-precision map. The preset scene may refer to a scene with a complex traffic structure, such as an intersection or an unprotected left-turn scene; the unprotected left-turn scene is taken as the preset scene for illustration.

For example, after the first vehicle and the second vehicle enter the preset scene, the first reference path and the second reference path may be determined according to the lane in which each vehicle is positioned. In fig. 1, the reference path corresponding to the left-turn lane in which vehicle A is located is the first reference path, and the reference path corresponding to the straight-through lane in which vehicle B is located is the second reference path.

After the first reference path and the second reference path are determined, a travel chart of the second vehicle running on the second reference path can be generated according to the first reference path and the second reference path, wherein the travel chart represents a mapping relationship between the running time and the travel of the second vehicle in the traffic scene, such as an ST chart.
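As a rough illustration of such a time-to-travel mapping, the sketch below samples an ST chart for the second vehicle and reads off the time window during which it occupies the stretch of its path that crosses the first reference path. The constant-speed motion, the notion of a conflict zone given as a travel interval, and all names and numbers are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def build_st_chart(s0: float, speed: float, horizon: float,
                   dt: float) -> List[Tuple[float, float]]:
    """Sample (time, travel) pairs for a vehicle moving at constant speed."""
    steps = int(round(horizon / dt))
    return [(i * dt, s0 + speed * i * dt) for i in range(steps + 1)]


def occupancy_window(chart: List[Tuple[float, float]], zone_start: float,
                     zone_end: float) -> Optional[Tuple[float, float]]:
    """First and last sampled times at which the travel lies inside the zone."""
    times = [t for t, s in chart if zone_start <= s <= zone_end]
    return (times[0], times[-1]) if times else None


# Second vehicle starts at s = 0 m and drives at 10 m/s; the two reference
# paths are assumed to cross between s = 20 m and s = 30 m on its path.
chart = build_st_chart(s0=0.0, speed=10.0, horizon=5.0, dt=0.5)
window = occupancy_window(chart, zone_start=20.0, zone_end=30.0)
```

The resulting window is the region of the ST chart that the first vehicle has to avoid when its own speed is planned.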

Step 605, obtaining the execution behaviors of the first vehicle corresponding to each time point on the first reference path, and determining the speed curve of the first vehicle running on the first reference path according to the execution behaviors of the first vehicle corresponding to each time point on the first reference path.

In this embodiment of the present application, before the execution behaviors of the first vehicle corresponding to each time point on the first reference path are obtained, a running speed range of the first vehicle may be determined according to the ST chart of the second vehicle running on the second reference path, where the running speed range is used to ensure that the first vehicle and the second vehicle do not collide. Then, the execution behaviors of the first vehicle corresponding to each time point on the first reference path are obtained by the method in the above embodiments, and a speed curve of the first vehicle within the running speed range is generated according to these execution behaviors.

In one possible implementation, the speed curve may be smoothed using a quadratic programming algorithm, and the smoothed curve is taken as the final output of speed planning, i.e., the final speed curve.
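One way to picture the two stages described above is the following sketch, which first derives a speed range for the case in which the first vehicle yields, and then turns a sequence of execution behaviors into a speed curve clipped to that range. The constant-speed reasoning, the acceleration assigned to each behavior, and all names and numbers are assumptions; the source does not give these details.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def yielding_speed_range(distance_to_zone: float, t_out: float,
                         v_limit: float) -> Tuple[float, float]:
    """Speeds at which the first vehicle reaches the crossing only after the
    second vehicle has left it at time t_out (t_out must be positive)."""
    return 0.0, min(v_limit, distance_to_zone / t_out)


def speed_curve(v0: float, behaviors: Sequence[str], v_min: float,
                v_max: float, dt: float = 0.5,
                accel: Optional[Dict[str, float]] = None) -> List[float]:
    """Integrate per-step behaviors into speeds, clipped to [v_min, v_max]."""
    # Assumed accelerations in m/s^2 for each semantic behavior.
    accel = accel or {"accelerate": 2.0, "keep": 0.0, "decelerate": -2.0}
    curve = [min(max(v0, v_min), v_max)]
    for behavior in behaviors:
        v = curve[-1] + accel[behavior] * dt
        curve.append(min(max(v, v_min), v_max))
    return curve


# The crossing is 15 m ahead and the second vehicle leaves it at t = 3 s.
v_min, v_max = yielding_speed_range(distance_to_zone=15.0, t_out=3.0, v_limit=10.0)
curve = speed_curve(4.0, ["accelerate", "accelerate", "keep", "decelerate"],
                    v_min, v_max)
```

The second `accelerate` step is clipped at the upper bound, which is how the range keeps the generated curve out of the occupied region of the ST chart in this simplified setting.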
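The source does not state the objective or constraints of the quadratic program. As one simple possibility, the sketch below minimizes the squared deviation from the raw speeds plus `lam` times the squared differences between neighboring speeds. Without constraints, this quadratic program reduces to a tridiagonal linear system, solved here with the Thomas algorithm; the function name and the weight `lam` are assumptions.

```python
from typing import List, Sequence


def smooth_speed(ref: Sequence[float], lam: float) -> List[float]:
    """Minimize sum((v[i]-ref[i])**2) + lam * sum((v[i+1]-v[i])**2).

    Setting the gradient to zero gives (I + lam * D^T D) v = ref, where D is
    the first-difference matrix; the system is tridiagonal.
    """
    n = len(ref)
    if n < 2:
        return [float(x) for x in ref]
    diag = [1.0 + 2.0 * lam] * n
    diag[0] = diag[-1] = 1.0 + lam
    off = -lam  # value on both off-diagonals
    # Forward elimination (Thomas algorithm).
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = ref[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (ref[i] - off * d[i - 1]) / denom
    # Back substitution.
    v = [0.0] * n
    v[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        v[i] = d[i] - c[i] * v[i + 1]
    return v


raw = [0.0, 0.0, 10.0, 0.0, 0.0]
smoothed = smooth_speed(raw, lam=1.0)
```

A practical planner would typically add bounds such as the running speed range and acceleration limits, which turns this into a constrained quadratic program that needs a dedicated solver.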

Step 606, driving the first vehicle to travel on the first reference path according to the speed profile.

In an embodiment of the present application, a speed profile is issued to the control module to drive the first vehicle to travel on the first reference path according to the speed profile.

Fig. 7 compares the travel curves of the first vehicle before and after the game model is added. As the comparison shows, in the original behavior decision method the first vehicle cannot interact effectively with the second vehicle when turning left at the intersection, so it selects a strategy of decelerating and yielding, which greatly reduces traffic efficiency. After the game model is added, traffic efficiency is improved by solving the game model and driving according to the planned speed curve.

In the embodiment of the application, a speed planning algorithm based on a travel chart (ST chart) is used. After the execution behavior corresponding to each time point is received, a feasible speed curve is searched for in the ST chart according to the semantic action, and the curve is optimized. Finally, the first reference path and the speed curve are combined and issued to the control module, so that safe and efficient behavior decision planning can be achieved.

Referring to fig. 8, a schematic structural diagram of a behavior decision device of a vehicle provided in a sixth embodiment of the present application is shown, and for convenience of explanation, only a portion related to the embodiment of the present application is shown.

The behavior decision device of the vehicle can specifically comprise the following modules:

an information obtaining module 801, configured to obtain driving state information of a first vehicle and a second vehicle at a current moment, where the first vehicle and the second vehicle are in a same traffic scene;

a decision determining module 802, configured to determine behavior decision information of the first vehicle according to the game model and the driving state information of the first vehicle and the second vehicle;

the behavior determining module 803 is configured to determine an execution behavior corresponding to the current time according to the behavior decision information, and drive the first vehicle to execute the corresponding execution behavior at the current time.
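The cooperation of the three modules can be pictured with the following skeleton. It is only a structural sketch: the class name, the callable-based wiring and the stub functions are assumptions, and the decision is represented as a list of execution behaviors, one per time point, of which the first is executed at the current moment.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class BehaviorDecisionDevice:
    obtain_info: Callable[[], Any]                  # role of module 801
    determine_decision: Callable[[Any], List[str]]  # role of module 802
    determine_behavior: Callable[[List[str]], str]  # role of module 803

    def step(self) -> str:
        """Run one decision cycle and return the behavior to execute now."""
        state = self.obtain_info()
        decision = self.determine_decision(state)
        return self.determine_behavior(decision)


# Stub modules with made-up data, standing in for sensing and the game model.
device = BehaviorDecisionDevice(
    obtain_info=lambda: {"ego_speed": 4.0, "other_speed": 10.0},
    determine_decision=lambda state: ["decelerate", "keep", "accelerate"],
    determine_behavior=lambda decision: decision[0],
)
current_behavior = device.step()
```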

In the embodiment of the present application, the decision determining module 802 may specifically include the following sub-modules:

the calculation sub-module is used for inputting the running state information into the game model, and calculating utility values respectively corresponding to N behavior decision information of the first vehicle according to the running state information and the payment function by the game model, wherein N is an integer greater than zero;

the first information determining sub-module is used for determining the behavior decision information corresponding to the highest utility value as the behavior decision information of the first vehicle.

In the embodiment of the present application, in the case that the payment function is a first reward function, the utility value corresponds to a first reward value calculated by the first reward function, and the calculating submodule may specifically include the following units:

the first score calculating unit is used for calculating, by the game model according to the running state information and the first reward function, first sub-reward values respectively corresponding to M execution behaviors of the first vehicle at the current moment, wherein M is an integer larger than 1;

a first acquiring unit configured to acquire M kinds of running state information of a first vehicle at a next time point and running state information of a second vehicle at the next time point after M kinds of execution behaviors are executed respectively;

the first iterative calculation unit is used for iteratively calculating the combinations of all execution behaviors of the first vehicle and the second vehicle from the current moment to moment H, generating M to the power of H (M^H) different trajectories, and calculating, by the game model, a first reward value corresponding to each trajectory according to the first sub-reward values, wherein N is equal to M^H.
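The enumeration performed by these units can be sketched as follows. Summing the per-step sub-rewards along a trajectory is an assumption, since the source only says the trajectory reward is calculated according to the sub-reward values. The toy state (a single speed), the speed increments and the target speed are likewise invented for illustration, and the response of the second vehicle is taken to be folded into the hypothetical `transition` and `step_reward` callbacks.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def best_trajectory(actions: Sequence[str], horizon: int,
                    step_reward: Callable[[float, str], float],
                    transition: Callable[[float, str], float],
                    state: float) -> Tuple[Optional[Tuple[str, ...]], float, int]:
    """Score all M**H behavior sequences and return the best one.

    Returns (best sequence, its reward, number of trajectories evaluated).
    """
    best_seq, best_value, count = None, float("-inf"), 0
    for seq in product(actions, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            total += step_reward(s, a)  # sub-reward of this step
            s = transition(s, a)        # state at the next time point
        count += 1
        if total > best_value:
            best_seq, best_value = seq, total
    return best_seq, best_value, count


DELTA = {"accelerate": 1.0, "keep": 0.0, "decelerate": -1.0}
TARGET = 6.0  # assumed desired speed in m/s


def transition(speed: float, action: str) -> float:
    return speed + DELTA[action]


def step_reward(speed: float, action: str) -> float:
    return -abs(transition(speed, action) - TARGET)


seq, value, n = best_trajectory(list(DELTA), 3, step_reward, transition, 4.0)
```

With M = 3 behaviors and H = 3 steps, N = 27 trajectories are scored; the count grows exponentially with H, which is why the horizon is kept short.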

In this embodiment of the present application, the actual inference level of the first vehicle is a first preset inference level, where the inference level is used to indicate a rational driving level of the vehicle, and the behavior decision device of the vehicle further includes:

the first actual grade determining module is used for determining the actual reasoning level of the second vehicle at the current moment according to the execution behavior of the second vehicle at each predicted reasoning level at the current moment and the actual execution behavior of the second vehicle;

correspondingly, the decision determining module 802 may specifically include the following sub-modules:

the first decision determining sub-module is used for determining behavior decision information of the first vehicle at the current moment according to the actual reasoning level of the second vehicle at the current moment, the first preset reasoning level of the first vehicle, the game model and the running state information of the first vehicle and the second vehicle at the current moment.

In the embodiment of the present application, the actual grade determining module may specifically include the following sub-modules:

the behavior acquisition sub-module is used for acquiring the execution behaviors of the second vehicle at the current moment, which correspond to the prediction inference levels respectively;

the comparison sub-module is used for comparing each execution behavior with the actual execution behavior of the second vehicle at the current moment one by one;

the grade determining sub-module is used for determining that the predicted reasoning level corresponding to the execution behavior identical to the actual execution behavior of the second vehicle is the actual reasoning level of the second vehicle at the current moment.

In the embodiment of the present application, the actual grade determining module may specifically further include the following sub-modules:

the grade probability updating sub-module is used for updating the probability of each predicted reasoning level according to the determined actual reasoning level of the second vehicle at the previous moment;

correspondingly, the behavior acquisition submodule specifically may include the following units:

the current behavior acquisition unit is used for acquiring the execution behaviors of the second vehicle at the current moment, which correspond to the prediction inference levels respectively, according to the probabilities of the prediction inference levels.

In the embodiment of the present application, in the case that the payment function is a second prize function, the utility value corresponds to a second prize value calculated by the second prize function, the second vehicle has K prediction inference levels, and the first decision determining submodule may specifically include the following units:

the second reward calculating unit is used for calculating, by the game model according to the running state information and the second reward function, the second reward values corresponding to the first vehicle executing each of M execution behaviors when the second vehicle is at each of the different predicted reasoning levels;

the second acquisition unit is used for acquiring the M kinds of running state information of the first vehicle at the next time point after the M execution behaviors are respectively executed;

the second iterative calculation unit is used for iteratively calculating the combinations of all execution behaviors of the first vehicle and the second vehicle from the current moment to the moment H, generating M^H (M to the power of H) different trajectories, and calculating, by the game model, a second reward value corresponding to each trajectory according to the second reward function, wherein N is equal to M^H;

and the second information determining unit is used for determining the behavior decision information corresponding to the highest second reward value as the behavior decision information of the first vehicle at the current moment.
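The enumeration of M^H trajectories and the selection of the highest-reward one can be sketched as follows. This is a simplified illustration, not the game model of the embodiment: the second vehicle is omitted, and the state layout, `toy_step`, `toy_reward` and the acceleration values are all hypothetical.

```python
from itertools import product
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]  # (distance travelled in m, speed in m/s); assumed layout


def best_first_behavior(
    state: State,
    behaviors: Sequence[float],
    step: Callable[[State, float], State],
    reward: Callable[[State, float], float],
    horizon: int,
) -> Tuple[float, float, int]:
    """Enumerate all M**H behavior sequences and return the first behavior of the
    best one, its accumulated reward, and the number of trajectories examined."""
    if horizon < 1 or not behaviors:
        raise ValueError("need at least one behavior and a horizon of at least 1")
    best_seq, best_total, count = None, float("-inf"), 0
    for seq in product(behaviors, repeat=horizon):  # M**H sequences
        current, total = state, 0.0
        for behavior in seq:
            total += reward(current, behavior)
            current = step(current, behavior)
        count += 1
        if total > best_total:
            best_seq, best_total = seq, total
    return best_seq[0], best_total, count


def toy_step(state: State, accel: float, dt: float = 1.0) -> State:
    """Hypothetical kinematics: constant acceleration over one time step."""
    distance, speed = state
    return (distance + speed * dt, max(0.0, speed + accel * dt))


def toy_reward(state: State, accel: float, target_speed: float = 10.0) -> float:
    """Hypothetical reward: track a target speed, penalize acceleration effort."""
    return -abs(state[1] + accel - target_speed) - 0.1 * abs(accel)


behavior, total, n_trajectories = best_first_behavior(
    (0.0, 8.0), (-1.0, 0.0, 1.0), toy_step, toy_reward, horizon=3
)
```

With M = 3 behaviors and H = 3 steps, 27 trajectories are examined. Exhaustive enumeration grows exponentially in H, so a sketch like this is only practical for short horizons.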

In this embodiment of the present application, the current time is an initial time point within a preset time period, the actual reasoning level of the first vehicle is a first preset reasoning level, and the behavior decision device of the vehicle may specifically further include the following modules:

the second actual grade determining module is used for determining the actual reasoning grade of the second vehicle at the initial time point according to the execution behaviors of the second vehicle corresponding to each predicted reasoning grade at the initial time point and the actual execution behavior of the second vehicle at the initial time point;

the second decision determining sub-module is used for determining behavior decision information of the first vehicle at any time point according to the actual reasoning level of the second vehicle at the initial time point, the first preset reasoning level of the first vehicle, the game model and the running state information of the first vehicle and the second vehicle at any time point.

In this embodiment of the present application, the actual inference level of the first vehicle is a first preset inference level, the actual inference level of the second vehicle is a second preset inference level, and the first preset inference level is greater than or equal to two, and the behavior decision device of the vehicle may specifically further include the following modules:

the first level behavior determining module is used for inputting the running state information and the first execution behavior corresponding to the first vehicle when the actual reasoning level is zero into the game model, and obtaining the second execution behavior corresponding to the second vehicle when the actual reasoning level is one by maximizing the utility value of the game model;

the second level behavior determining module is used for inputting the second execution behavior and the driving state information into the game model, and obtaining a third execution behavior corresponding to the first vehicle when the actual reasoning level is two by maximizing the utility value of the game model;

the loop module is used for taking the third execution behavior as the first execution behavior and cyclically executing the step of inputting the running state information and the first execution behavior into the game model and the subsequent steps, with the actual reasoning level corresponding to each obtained execution behavior increasing in turn, until the execution behavior corresponding to the first vehicle at the first preset reasoning level is obtained;

and the third decision determining sub-module is used for determining the behavior decision information of the first vehicle according to the corresponding execution behaviors of the first vehicle at the first preset reasoning level at each time point.
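The alternating best-response loop carried out by these modules can be sketched on a toy two-behavior game. The payoff table, the behavior labels and the function names are hypothetical, and the sketch assumes that the first preset reasoning level is even, since the chain described here starts from the first vehicle at level zero and returns to the first vehicle every two levels.

```python
from typing import Callable, Sequence

# Hypothetical symmetric payoff table: PAYOFF[(own behavior, other's behavior)].
PAYOFF = {
    ("go", "go"): -10.0,      # both enter the conflict area
    ("go", "yield"): 2.0,
    ("yield", "go"): 0.0,
    ("yield", "yield"): -1.0,
}


def utility(own: str, other: str) -> float:
    return PAYOFF[(own, other)]


def best_response(
    behaviors: Sequence[str], util: Callable[[str, str], float], other: str
) -> str:
    """Behavior that maximizes the utility value against a fixed opponent behavior."""
    return max(behaviors, key=lambda b: util(b, other))


def first_vehicle_level_k(
    target_level: int,
    level0_behavior: str,
    behaviors: Sequence[str],
    util_first: Callable[[str, str], float],
    util_second: Callable[[str, str], float],
) -> str:
    """Alternate best responses until the first vehicle's target level is reached."""
    if target_level < 2 or target_level % 2:
        raise ValueError("this sketch assumes an even target level >= 2")
    first_behavior, level = level0_behavior, 0
    while level < target_level:
        # Second vehicle at level + 1 responds to the first vehicle at `level`.
        second_behavior = best_response(behaviors, util_second, first_behavior)
        # First vehicle at level + 2 responds to the second vehicle at level + 1.
        first_behavior = best_response(behaviors, util_first, second_behavior)
        level += 2
    return first_behavior


decision = first_vehicle_level_k(2, "go", ("go", "yield"), utility, utility)
```

In this toy game a level-zero first vehicle that goes leads the level-one second vehicle to yield, so the level-two first vehicle goes.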

In the embodiment of the present application, the behavior decision device of the vehicle may specifically further include the following modules:

the travel map acquisition module is used for acquiring a first reference path of the first vehicle and a second reference path of the second vehicle, and generating a travel map of the second vehicle running on the second reference path, wherein the travel map is expressed as a mapping relation between the running time and the travel distance of the second vehicle in the traffic scene;

the speed curve determining module is used for determining a speed curve of the first vehicle running on the first reference path according to the execution behaviors of the first vehicle, which correspond to each time point on the first reference path;

the driving module is used for driving the first vehicle to run on the first reference path according to the speed curve.
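A minimal sketch of how a travel map and a speed curve might be built from per-time-point execution behaviors is given here. The mapping from behavior labels to accelerations, the time step and the speed limit are assumptions introduced for illustration and are not values from the embodiment.

```python
from typing import List, Sequence, Tuple

# Hypothetical mapping from execution behaviors to accelerations in m/s^2.
BEHAVIOR_ACCEL = {"accelerate": 1.0, "maintain": 0.0, "decelerate": -1.0}


def speed_curve(
    v0: float, behaviors: Sequence[str], dt: float = 1.0, v_max: float = 15.0
) -> List[float]:
    """Speed of the first vehicle at each time point, clamped to [0, v_max]."""
    speeds = [v0]
    for behavior in behaviors:
        accel = BEHAVIOR_ACCEL[behavior]
        speeds.append(min(v_max, max(0.0, speeds[-1] + accel * dt)))
    return speeds


def travel_map(speeds: Sequence[float], dt: float = 1.0) -> List[Tuple[float, float]]:
    """(time, distance) pairs for a vehicle holding each speed for one time step."""
    t, s = 0.0, 0.0
    points = [(t, s)]
    for v in speeds:
        t += dt
        s += v * dt
        points.append((t, s))
    return points


first_vehicle_speeds = speed_curve(8.0, ["accelerate", "accelerate", "maintain"])
second_vehicle_map = travel_map([5.0, 5.0])
```

The travel map of the second vehicle can then be compared against the distance the first vehicle covers under its speed curve to check whether both would occupy the same stretch of road at the same time.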

The behavior decision device of the vehicle provided in the embodiment of the present application may be applied to the foregoing method embodiments; for details, refer to the description of the foregoing method embodiments, which is not repeated here.

Fig. 9 is a schematic structural diagram of a terminal device according to a fourth embodiment of the present application. As shown in fig. 9, the terminal device 900 of this embodiment includes: at least one processor 910 (only one is shown in fig. 9), a memory 920 and a computer program 921 stored in the memory 920 and executable on the at least one processor 910, the processor 910 implementing the steps in the above-described embodiments of the behavior decision method of the vehicle when executing the computer program 921.

The terminal device 900 may be a desktop computer, a notebook computer, a handheld computer, or another computing device. The terminal device may include, but is not limited to, a processor 910 and a memory 920. It will be appreciated by those skilled in the art that fig. 9 is merely an example of the terminal device 900 and does not limit it; the terminal device 900 may include more or fewer components than shown, may combine certain components, or may use different components, and may, for example, also include input-output devices, network access devices, etc.

The processor 910 may be a central processing unit (Central Processing Unit, CPU), or may be another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 920 may in some embodiments be an internal storage unit of the terminal device 900, for example, a hard disk or a memory of the terminal device 900. In other embodiments, the memory 920 may also be an external storage device of the terminal device 900, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like provided on the terminal device 900. Further, the memory 920 may also include both an internal storage unit and an external storage device of the terminal device 900. The memory 920 is used to store an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 920 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, the above division of functional units and modules is given only as an example. In practical applications, the above functions may be allocated to different functional units and modules as needed, i.e. the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other, and are not used for limiting the protection scope of the present application. For the specific working process of the units and modules in the above system, refer to the corresponding process in the foregoing method embodiments, which is not described here again.

In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or illustrated in detail in a particular embodiment, refer to the related descriptions of the other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative: the division of the modules or units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method of the above embodiments may be implemented by a computer program instructing related hardware, where the computer program may be stored in a computer readable storage medium and, when executed by a processor, implements the steps of each method embodiment described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content covered by the computer readable medium may be appropriately added to or reduced according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.

The implementation of all or part of the flow of the method of the above embodiment may also be accomplished by a computer program product, which when run on a terminal device, causes the terminal device to perform the steps of the method embodiments described above.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting. Although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. A behavior decision method of a vehicle, characterized in that the behavior decision method comprises:

2. The behavior decision method of claim 1, wherein the determining behavior decision information of the first vehicle according to a game model and the running state information of the first vehicle and the second vehicle comprises:

inputting the running state information into the game model, and calculating, by the game model according to the running state information and a payment function, utility values respectively corresponding to N kinds of behavior decision information of the first vehicle, wherein N is an integer greater than zero;

3. The behavior decision method according to claim 2, wherein in the case where the payment function is a first reward function, the utility value corresponds to a first reward value calculated by the first reward function, and the calculating, by the game model, utility values respectively corresponding to N kinds of behavior decision information of the first vehicle according to the running state information and the payment function includes:

calculating, by the game model according to the running state information and the first reward function, first reward values respectively corresponding to M execution behaviors of the first vehicle at the current moment, wherein M is an integer larger than 1;

acquiring M kinds of running state information of the first vehicle at the next time point and the running state information of the second vehicle at the next time point after the M execution behaviors are respectively executed;

and iteratively calculating the combinations of all execution behaviors of the first vehicle and the second vehicle from the current moment to the moment H to generate M^H (M to the power of H) different trajectories, and calculating, by the game model, a first reward value corresponding to each trajectory according to the first reward function, wherein N is equal to M^H.

4. The behavior decision method of claim 1, wherein the actual reasoning level of the first vehicle is a first preset reasoning level, the reasoning level being used for indicating the degree of rationality with which a vehicle travels, the behavior decision method further comprising:

determining the actual reasoning grade of the second vehicle at the current moment according to the execution behaviors of the second vehicle at each predicted reasoning grade at the current moment and the actual execution behaviors of the second vehicle;

correspondingly, the determining the behavior decision information of the first vehicle according to the game model and the running state information of the first vehicle and the second vehicle comprises:

and determining behavior decision information of the first vehicle at the current moment according to the actual reasoning grade of the second vehicle at the current moment, the first preset reasoning grade of the first vehicle, the game model and the running state information of the first vehicle and the second vehicle at the current moment.

5. The behavior decision method of claim 4, wherein the determining the actual inference level of the second vehicle at the current time according to the execution behavior of the second vehicle at each predicted inference level and the actual execution behavior of the second vehicle at the current time, respectively, comprises:

and determining the predicted reasoning grade corresponding to the execution behavior identical to the actual execution behavior of the second vehicle as the actual reasoning grade of the second vehicle at the current moment.

6. The behavioral decision method of claim 4 wherein, in the event that the payment function is a second reward function, the utility value corresponds to a second reward value calculated by the second reward function, the second vehicle having K predictive reasoning levels, the determining behavioral decision information for the first vehicle at the current time further comprising:

7. The behavior decision method of claim 1, wherein the current time is an initial point in time within a preset time period, the actual inference level of the first vehicle is a first preset inference level, the behavior decision method further comprising:

determining the actual reasoning level of the second vehicle at the initial time point according to the execution behaviors of the second vehicle corresponding to each predicted reasoning level at the initial time point and the actual execution behavior of the second vehicle at the initial time point;

correspondingly, the determining behavior decision information of the first vehicle according to a game model and the running state information of the first vehicle and the second vehicle comprises:

determining behavior decision information of the first vehicle at any time point according to the actual reasoning grade of the second vehicle at the initial time point, the first preset reasoning grade of the first vehicle, the game model and running state information of the first vehicle and the second vehicle at any time point.

8. The behavioral decision method of claim 1 wherein the actual level of reasoning for the first vehicle is a first preset level of reasoning, the actual level of reasoning for the second vehicle is a second preset level of reasoning, the first preset level of reasoning being greater than or equal to two, the behavioral decision method further comprising:

inputting the running state information and the first execution behavior, which corresponds to the first vehicle when the actual reasoning level is zero, into the game model, and obtaining the second execution behavior, which corresponds to the second vehicle when the actual reasoning level is one, by maximizing the utility value of the game model;

inputting the second execution behavior and the running state information into the game model, and obtaining a third execution behavior corresponding to the first vehicle when the actual reasoning level is two by maximizing the utility value of the game model;

taking the third execution behavior as the first execution behavior, and cyclically executing the step of inputting the running state information and the first execution behavior into the game model and the subsequent steps, with the actual reasoning level corresponding to each obtained execution behavior increasing in turn, until the execution behavior corresponding to the first vehicle at the first preset reasoning level is obtained;

and determining behavior decision information of the first vehicle according to the corresponding execution behaviors of the first vehicle at the first preset reasoning level at each time point.

9. The behavioral decision method of claim 1 further comprising:

acquiring a first reference path of the first vehicle and a second reference path of the second vehicle, and generating a travel chart of the second vehicle traveling on the second reference path, wherein the travel chart is expressed as a mapping relation between the traveling time and the travel distance of the second vehicle in the traffic scene;

determining a speed curve of the first vehicle running on the first reference path according to the execution behaviors of the first vehicle, which correspond to each time point on the first reference path;

and driving the first vehicle to run on the first reference path according to the speed curve.

10. A behavior decision device of a vehicle, characterized in that the behavior decision device comprises:

11. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 9 when executing the computer program.

12. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 9.