CN112382165B - Driving strategy generation method, device, medium, equipment and simulation system

Driving strategy generation method, device, medium, equipment and simulation system

Info

Publication number
CN112382165B
Authority
CN
China
Prior art keywords
simulation
target
vehicle
information
map
Prior art date
Legal status
Active
Application number
CN202011303762.0A
Other languages
Chinese (zh)
Other versions
CN112382165A (en)
Inventor
吴伟
段雄
郎咸朋
Current Assignee
Beijing Co Wheels Technology Co Ltd
Original Assignee
Beijing Co Wheels Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Co Wheels Technology Co Ltd filed Critical Beijing Co Wheels Technology Co Ltd
Priority to CN202011303762.0A
Publication of CN112382165A
Application granted
Publication of CN112382165B

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00 - Simulators for teaching or training purposes
    • G09B9/02 - Simulators for teaching or training purposes for teaching control of vehicles or other craft
    • G09B9/04 - Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of land vehicles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 - Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 - Force analysis or force optimisation, e.g. static or dynamic forces
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Traffic Control Systems (AREA)

Abstract

The disclosure relates to a driving strategy generation method, device, medium, and equipment, and to a simulation system, with the aim of optimizing the simulation effect of the simulation system. The method comprises the following steps: acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time; acquiring target map information corresponding to the target simulation vehicle, wherein the target map information is taken from a high-precision map; inputting the target environment information, the target vehicle information and the target map information into a decision model to obtain a target control strategy output by the decision model, wherein the decision model is trained in a reinforcement learning manner on simulation data generated by the simulation system; and performing simulation control on the target simulation vehicle through the simulation system according to the target control strategy.

Description

Driving strategy generation method, device, medium, equipment and simulation system
Technical Field
The present disclosure relates to the field of simulation, and in particular, to a driving strategy generation method, device, medium, device, and simulation system.
Background
Currently, mainstream simulation platforms (i.e., simulation systems) fall into two major types: one type is generally applied to the simulation of vehicle dynamics models and functions, such as the CANoe simulation from Vector; the other type is scenario-based simulation, such as the VTD simulation. Most simulation platforms provide simulation verification functions, which are generally used to verify a perception algorithm or a rule-based decision planning algorithm. However, a single simulation verification function cannot bring a good simulation effect to a simulation system, and a rule-based decision planning algorithm cannot be applied to all kinds of simulation scenarios, so it has certain limitations.
Disclosure of Invention
The purpose of the present disclosure is to provide a driving strategy generation method, device, medium, and equipment, and a simulation system, so as to optimize the simulation effect of the simulation system.
In order to achieve the above object, according to a first aspect of the present disclosure, there is provided a driving strategy generation method applied to a simulation system, the method including:
acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time;
acquiring target map information corresponding to the target simulation vehicle, wherein the target map information is taken from a high-precision map;
inputting the target environment information, the target vehicle information and the target map information into a decision model to obtain a target control strategy output by the decision model, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by the simulation system;
and performing simulation control on the target simulation vehicle through the simulation system according to the target control strategy.
Optionally, the target map information is acquired by:
determining, in the high-precision map, a map area that has a preset area and includes the target position;
and taking the map information corresponding to the map area as the target map information.
Optionally, the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
and optimizing the initial model according to the reward function value to obtain the decision model.
Optionally, the performing simulation control on the target simulated vehicle through the simulation system according to the target control strategy includes:
determining a simulation result for performing simulation control on the target simulation vehicle through a vehicle dynamics model according to the target control strategy;
and according to the simulation result, generating environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
Optionally, after the step of performing simulation control on the target simulated vehicle by the simulation system according to the target control strategy, the method further includes:
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time from simulation data generated by the simulation system as a second actual vehicle parameter;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time as a second reference vehicle parameter;
and optimizing the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
Optionally, the specified vehicle parameter comprises at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
Optionally, the target environment information includes: information of vehicles around the target simulated vehicle, information of pedestrians around the target simulated vehicle, road information around the target simulated vehicle, and obstacle information around the target simulated vehicle;
the target vehicle information further includes: the attitude of the target simulated vehicle;
the target control strategy comprises a control strategy for at least one of: steering wheel, throttle, brake.
According to a second aspect of the present disclosure, there is provided a driving strategy generation apparatus applied to a simulation system, the apparatus including:
a first acquisition module, used for acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time;
the second acquisition module is used for acquiring target map information corresponding to the target simulation vehicle, and the target map information is taken from a high-precision map;
the decision module is used for inputting the target environment information, the target vehicle information and the target map information into a decision model and obtaining a target control strategy output by the decision model, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by the simulation system;
and the simulation control module is used for performing simulation control on the target simulation vehicle through the simulation system according to the target control strategy.
Optionally, the second obtaining module includes:
the first determining submodule is used for determining, in the high-precision map, a map area that has a preset area and includes the target position;
and the second determining submodule is used for taking the map information corresponding to the map area as the target map information.
Optionally, the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
and optimizing the initial model according to the reward function value to obtain the decision model.
Optionally, the simulation control module includes:
the third determining submodule is used for determining, according to the target control strategy, a simulation result of performing simulation control on the target simulation vehicle through a vehicle dynamics model;
and the generation and storage submodule is used for generating, according to the simulation result, environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
Optionally, the apparatus further comprises:
a third obtaining module, configured to obtain, from simulation data generated by the simulation system, a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time as a second actual vehicle parameter, after the simulation control module performs simulation control on the target simulation vehicle through the simulation system according to the target control strategy;
a fourth obtaining module, configured to obtain an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time, as a second reference vehicle parameter;
and a model optimization module, configured to optimize the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
Optionally, the specified vehicle parameter comprises at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
Optionally, the target environment information includes: information of vehicles around the target simulated vehicle, information of pedestrians around the target simulated vehicle, road information around the target simulated vehicle, and obstacle information around the target simulated vehicle;
the target vehicle information further includes: the attitude of the target simulated vehicle;
the target control strategy comprises a control strategy for at least one of: steering wheel, throttle, brake.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the disclosure.
According to a fifth aspect of the present disclosure, there is provided a simulation system including the driving strategy generating device applied to the simulation system according to the second aspect of the present disclosure.
According to the above technical solution, the target environment information and the target vehicle information of the target simulation vehicle corresponding to the target simulation time are acquired from the simulation data generated by the simulation system, and the target map information corresponding to the target simulation vehicle is acquired. The target environment information, the target vehicle information, and the target map information are input into the decision model to obtain the target control strategy output by the decision model, and the target simulation vehicle is then subjected to simulation control through the simulation system according to the target control strategy. The target map information is taken from a high-precision map, and because the high-precision map contains abundant map information, it helps improve the simulation effect. The decision model is trained in a reinforcement learning manner based on simulation data generated by the simulation system. Reinforcement learning can therefore assist the driving strategy generation of the simulation system: training of the decision model is realized by reinforcement learning on the simulation system's own data, which further improves the simulation effect of the simulation system and expands its usage scenarios.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow diagram of a driving strategy generation method for a simulation system provided in accordance with one embodiment of the present disclosure;
FIG. 2 is an exemplary schematic diagram of the structure of an initial model in a driving strategy generation method for a simulation system provided in accordance with the present disclosure;
FIG. 3 is a block diagram of a driving strategy generation apparatus applied to a simulation system provided according to an embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
The following detailed description of the embodiments of the disclosure refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Fig. 1 is a flowchart of a driving strategy generation method for a simulation system provided according to one embodiment of the present disclosure. As shown in fig. 1, the method may include the steps of:
in step 11, acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system;
in step 12, target map information corresponding to the target simulated vehicle is acquired;
in step 13, target environment information, target vehicle information and target map information are input to a decision model, and a target control strategy output by the decision model is obtained, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by a simulation system;
in step 14, the target simulation vehicle is subjected to simulation control through the simulation system according to the target control strategy.
In brief, while the simulation system executes a simulation task, each simulation object in the simulation system (for example, a vehicle, a pedestrian, etc.) is simulated at each simulation time, and simulation data for that simulation time is generated. The simulation system is thus able to continually generate new simulation data for each simulation object it contains. Therefore, the data required to execute the method, namely the target environment information and the target vehicle information of the target simulation vehicle corresponding to the target simulation time, can be obtained from the simulation data generated by the simulation system.
The target environment information may include, but is not limited to, the following: information of vehicles around the target simulated vehicle, information of pedestrians around the target simulated vehicle, information of roads around the target simulated vehicle, and information of obstacles around the target simulated vehicle. The information of surrounding vehicles may include, for example, their positions, speeds, and traveling directions. The information of surrounding pedestrians may include, for example, each pedestrian's walking state (e.g., whether the pedestrian is moving), traveling direction, and traveling speed. The surrounding road information may include, for example, lane line information, connection information between roads, attribute information of the roads (e.g., whether a road is an intersection), and traffic light information. The surrounding obstacle information may include, for example, the obstacle state (e.g., whether it is moving, and its moving direction) and the obstacle position.
The environment model in automatic driving refers to a description of the environment around the automatic driving vehicle at a certain time. It may include dynamic information, such as the positions, speeds, and driving directions of other vehicles and the states of pedestrians, as well as static information, such as lane line information, road connection relationships, road attributes, and the states of static obstacles (e.g., the state of traffic lights). Thus, for example, the target environment information may be obtained directly from the environment model of the simulation system.
The target vehicle information may include, but is not limited to, the position of the target simulated vehicle at the target simulation time (the target position) and the attitude of the target simulated vehicle at the target simulation time. For example, the target vehicle information may likewise be obtained directly from the environment model.
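To make these inputs concrete, the following is a minimal sketch of how the target environment information and target vehicle information might be organized in code. All class and field names are illustrative assumptions made for this sketch; the patent does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SurroundingVehicle:
    position: Tuple[float, float]   # (x, y) in map coordinates
    speed: float                    # vehicle speed, m/s
    heading: float                  # traveling direction, radians

@dataclass
class TargetEnvironmentInfo:
    vehicles: List[SurroundingVehicle] = field(default_factory=list)
    pedestrians: List[dict] = field(default_factory=list)  # walking state, direction, speed
    roads: List[dict] = field(default_factory=list)        # lane lines, connectivity, attributes, traffic lights
    obstacles: List[dict] = field(default_factory=list)    # state (moving or not), position

@dataclass
class TargetVehicleInfo:
    target_position: Tuple[float, float]  # position at the target simulation time
    attitude: float                       # attitude (pose/heading) of the target simulated vehicle
```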
After the target environment information and the target vehicle information are acquired, step 12 may be executed to acquire the target map information corresponding to the target simulated vehicle. The simulation system can be provided with a high-precision map that contains all map information related to the simulation scene, such as road information and obstacle information. Because the high-precision map carries abundant map information, performing simulation on the basis of the high-precision map yields a better simulation effect.
In one possible embodiment, in step 12, the target map information corresponding to the target simulated vehicle may be the entire high-precision map.
In another possible embodiment, the step 12 of obtaining the target map information corresponding to the target simulated vehicle may include the following steps:
determining, in the high-precision map, a map area that has a preset area and includes the target position;
and taking the map information corresponding to the map area as the target map information.
As previously described, the target vehicle information may include the position of the target simulated vehicle at the target simulation time, i.e., the target position. After the target position is determined in step 11, a map area that has a preset area and includes the target position may be determined in the high-precision map based on the target position, and the map information corresponding to that map area may be used as the target map information. The preset area may be defined as N × M, where N and M represent distances. For example, a map area may be selected around the target position according to the preset area; a 300 m × 300 m map area centered on the target position may be selected, for instance, thereby determining the target map information.
In this way, a partial map area including the target position is selected and the map information corresponding to that partial area is used as the target map information, which effectively reduces the complexity of subsequent data processing and improves data processing efficiency.
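As an illustration of the region selection described above, the sketch below crops an N × M rectangle (for example, 300 m × 300 m) centered on the target position out of a high-precision map. The `hd_map.elements` iterable and the `position` attribute are interfaces assumed for this sketch, not an API defined by the patent.

```python
from typing import List

def crop_map_region(hd_map, target_position, width_m: float = 300.0,
                    height_m: float = 300.0) -> List:
    """Return the map elements lying inside a preset-area rectangle
    (width_m x height_m) centered on the target position."""
    cx, cy = target_position
    half_w, half_h = width_m / 2.0, height_m / 2.0
    return [
        elem for elem in hd_map.elements          # assumed iterable of map elements
        if abs(elem.position[0] - cx) <= half_w   # assumed (x, y) position attribute
        and abs(elem.position[1] - cy) <= half_h
    ]
```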
After the target environment information, the target vehicle information, and the target map information are obtained, step 13 may be executed to input the target environment information, the target vehicle information, and the target map information to the decision model to obtain the target control strategy output by the decision model.
When the target environment information, the target vehicle information, and the target map information are input to the decision model, they may first be preprocessed into a corresponding state vector, and the state vector is then input to the decision model.
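A sketch of this preprocessing step is given below: the three information sources are flattened into a single state vector S. The concrete features, their ordering, and the dictionary keys are assumptions; the patent states only that the inputs may be preprocessed into a state vector.

```python
import numpy as np

def build_state_vector(env_info: dict, vehicle_info: dict, map_features) -> np.ndarray:
    """Flatten target environment, vehicle, and map information into one state vector."""
    features = []
    features.extend(vehicle_info["position"])   # target position (x, y)
    features.append(vehicle_info["attitude"])   # attitude of the target simulated vehicle
    for v in env_info["vehicles"]:              # surrounding vehicles
        features.extend([*v["position"], v["speed"], v["heading"]])
    features.extend(map_features)               # e.g. sampled lane-line points from the cropped map area
    return np.asarray(features, dtype=np.float32)
```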
The decision model generates a corresponding output result, namely a target control strategy, according to the input content. Illustratively, the target control strategy includes a control strategy for at least one of: steering wheel, throttle, brake.
The decision model is trained in a reinforcement learning manner on simulation data generated by the simulation system. Reinforcement Learning (RL) is learning in a "trial and error" manner: behavior is guided by the rewards obtained from interacting with the environment, with the goal of making the agent obtain the maximum reward.
In one possible embodiment, the decision model may be obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into the initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
and optimizing the initial model according to the reward function value to obtain the decision model.
The first environment information, the first vehicle information, and the first map information are obtained in the same manner as the target environment information, the target vehicle information, and the target map information described above; the only difference is the simulation time to which they correspond. Their acquisition is therefore not described again here.
Generating the decision model requires multiple rounds of training; within one round, the initial model is the model used at the start of that round. At the very beginning of training, a model generally needs to be created, and this model is the initial model used for the first round. In subsequent rounds, training proceeds step by step on the basis of this initially created model until the decision model is obtained.
In reinforcement learning training, a decision agent is generally designed using a reinforcement learning algorithm, and subsequent model training is realized on the basis of that agent. The decision agent, i.e., the initial model used in this method, is described in detail below. As shown in Fig. 2, the agent used in this method may comprise two networks: a decision network and an evaluation network. The decision network outputs a control strategy (generally embodied as a decision vector) according to the input content, while the evaluation network performs internal evaluation, scoring the quality of each generated decision vector. In general, a state vector S, a reward function R, and a set of decision vectors (action instructions) A may be defined for the agent.
During reinforcement learning training, the state vector of the vehicle at the current time is input to the agent, and the decision vector output by the agent is obtained. The simulation system performs simulation control according to the decision vector, which changes the state of the vehicle and yields the actual state of the vehicle at the next time. The vehicle also has a corresponding ideal state at the next time, and the deviation between the actual state and the ideal state yields the reward value of the reward function. The network parameters in the agent are optimized and adjusted according to the reward value, completing one round of training; once the training end condition is met after multiple rounds of training, the decision model is obtained. The smaller the deviation between the actual state and the ideal state, the closer the actual state is to the ideal state, and correspondingly the larger the reward value. The actual effect of the decision vector is thus taken into account when optimizing the agent with the reward value, which helps obtain a better decision model.
In this method, as shown in Fig. 2, the input of the agent is connected to the environment model and the high-precision map of the simulation system, so that the input data required at the current time can be obtained in real time from the environment model and the high-precision map and assembled into a state vector that serves as the agent's input. The output of the agent (i.e., of the decision network in the agent) is connected to the controller of the simulation system, so the controller obtains the control strategy (decision vector) output by the agent and can control the simulated vehicle accordingly to perform the next simulation. The input of the evaluation network in the agent is also connected to the controller, so the evaluation network can compare the actual simulation result with the ideal simulation situation and use the comparison result to update the parameters in the agent. Repeating these steps over multiple rounds of training produces a decision model with a good effect.
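The two-network agent described above can be sketched as a small actor-critic pair: the decision network maps the state vector S to a decision vector A, and the evaluation network scores a (state, decision) pair. The choice of PyTorch, the layer sizes, and the activations are assumptions of this sketch rather than details given in the patent.

```python
import torch
import torch.nn as nn

class DecisionNetwork(nn.Module):
    """Decision network (actor): maps state vector S to a decision vector A,
    e.g. steering wheel, throttle, and brake commands."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded control commands
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class EvaluationNetwork(nn.Module):
    """Evaluation network (critic): scores the quality of a decision vector
    produced in a given state."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```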
Therefore, after the first environment information, the first vehicle information, and the first map information corresponding to the first historical simulation time are obtained, they are input into the initial model (i.e., the agent) used in this round of training to obtain the first control strategy output by the initial model. The target simulation vehicle is then subjected to simulation control through the simulation system according to the first control strategy, generating a series of new simulation data for the target simulation vehicle. The specified vehicle parameter of the target simulation vehicle at the second historical simulation time (the next simulation time of the first historical simulation time) is obtained from the simulation data as the first actual vehicle parameter, while the ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time is obtained as the first reference vehicle parameter. The reward function value of this round of training is determined according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter, and the initial model is optimized according to the reward function value to obtain the decision model. Illustratively, the inverse of the deviation may be set as the value of the reward function.
The specified vehicle parameter may include, but is not limited to, at least one of: curvature, position, steering angle, and distance from surrounding vehicles. An evaluation index for each specified vehicle parameter may be set in advance; for example, a correspondence between the reward function value and the deviation between the actual and ideal values of the specified vehicle parameter may be preset, so that the deviation is quantified. For example, the smaller the deviation of the distance from surrounding vehicles from its ideal value, the larger the corresponding reward function value.
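Following the "inverse of the deviation" rule stated above, the reward function might be sketched as follows. Equal weighting of the specified vehicle parameters and the small epsilon guard against division by zero are assumptions of this sketch.

```python
import numpy as np

def reward_value(actual_params, ideal_params, eps: float = 1e-6) -> float:
    """Reward = inverse of the deviation between the actual and ideal specified
    vehicle parameters (e.g. curvature, position, steering angle, distance
    from surrounding vehicles). Smaller deviation -> larger reward."""
    actual = np.asarray(actual_params, dtype=np.float64)
    ideal = np.asarray(ideal_params, dtype=np.float64)
    deviation = float(np.linalg.norm(actual - ideal))
    return 1.0 / (deviation + eps)
```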
Therefore, according to the target environment information, the target vehicle information and the target map information, the corresponding target control strategy can be obtained through the decision model. Thereafter, step 14 may be performed.
Illustratively, step 14 may include the steps of:
determining a simulation result of performing simulation control on the target simulation vehicle through a vehicle dynamics model according to the target control strategy;
and according to the simulation result, generating environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
That is, the simulation system performs the next simulation of the target simulated vehicle according to the target control strategy and generates a series of simulation data, thereby updating the environment model and the vehicle information of the target simulated vehicle at the next simulation time and storing that information in the database of the simulation system.
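One simulation tick of this data flow might look like the sketch below. Every `sim_system` accessor named here (`get_env_info`, `get_vehicle_info`, `crop_hd_map`, `dynamics_model.step`, `store`) is a hypothetical interface invented for illustration, and `build_state_vector` refers to the preprocessing sketch above; the patent describes the data flow, not an API.

```python
def simulation_step(sim_system, decision_model, vehicle_id, t: int) -> None:
    """Run one simulation step for the target simulated vehicle at time t."""
    env_info = sim_system.get_env_info(vehicle_id, t)        # target environment information
    veh_info = sim_system.get_vehicle_info(vehicle_id, t)    # includes the target position
    map_info = sim_system.crop_hd_map(veh_info["position"])  # map area around the target position

    state = build_state_vector(env_info, veh_info, map_info) # see the preprocessing sketch above
    control = decision_model(state)                          # target control strategy

    # The vehicle dynamics model turns the control strategy into a simulation result.
    result = sim_system.dynamics_model.step(vehicle_id, control)

    # Generate and store environment and vehicle information for the next simulation time.
    sim_system.store(vehicle_id, t + 1, result.env_info, result.vehicle_info)
```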
According to the above technical solution, the target environment information and the target vehicle information of the target simulation vehicle corresponding to the target simulation time are acquired from the simulation data generated by the simulation system, the target map information corresponding to the target simulation vehicle is acquired, the target environment information, the target vehicle information, and the target map information are input into the decision model to obtain the target control strategy output by the decision model, and the target simulation vehicle is subjected to simulation control through the simulation system according to the target control strategy. The decision model is trained in a reinforcement learning manner on simulation data generated by the simulation system. Reinforcement learning can therefore assist the driving strategy generation of the simulation system: training of the decision model is realized by reinforcement learning on the simulation system's own data, which further improves the simulation effect of the simulation system and expands its usage scenarios.
On the basis of the scheme shown in fig. 1, after the target simulated vehicle is subjected to simulation control by the simulation system according to the target control strategy in step 14, the method provided by the present disclosure may further include the following steps:
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time from simulation data generated by the simulation system as a second actual vehicle parameter;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time as a second reference vehicle parameter;
and optimizing the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
This optimization is equivalent to further updating the decision model based on the current actual vehicle parameters. Its principle is the same as that of a single round of training of the decision model, so it is not repeated here.
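A schematic sketch of this online optimization step follows: the evaluation network is nudged toward the reward derived from the actual-versus-ideal deviation at the next simulation time. This is a simplified one-step update (a full actor-critic update would also bootstrap from the next state); all `sim_system` accessors are hypothetical, `reward_value` refers to the reward sketch above, and `state` and `action` are assumed to be batched tensors from the step being evaluated.

```python
import torch
import torch.nn.functional as F

def online_update(sim_system, critic, optimizer, state, action, vehicle_id, t: int) -> float:
    """Fine-tune the evaluation network after simulation control at time t."""
    actual = sim_system.get_specified_params(vehicle_id, t + 1)  # second actual vehicle parameter
    ideal = sim_system.get_ideal_params(vehicle_id, t + 1)       # second reference vehicle parameter
    r = reward_value(actual, ideal)                              # see the reward sketch above

    predicted = critic(state, action)                            # evaluation network's score
    target = torch.full_like(predicted, r)                       # one-step reward target
    loss = F.mse_loss(predicted, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```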
Fig. 3 is a block diagram of a driving strategy generation apparatus applied to a simulation system according to an embodiment of the present disclosure, and as shown in fig. 3, the apparatus 30 may include:
a first obtaining module 31, configured to obtain, from simulation data generated by a simulation system, target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time, where the target vehicle information includes a target position of the target simulation vehicle at the target simulation time;
a second obtaining module 32, configured to obtain target map information corresponding to the target simulated vehicle, where the target map information is obtained from a high-precision map;
a decision module 33, configured to input the target environment information, the target vehicle information, and the target map information into a decision model, and obtain a target control strategy output by the decision model, where the decision model is obtained by training in a reinforcement learning manner according to simulation data generated by the simulation system;
and the simulation control module 34 is configured to perform simulation control on the target simulated vehicle through the simulation system according to the target control strategy.
Optionally, the second obtaining module 32 includes:
the first determining submodule is used for determining, in the high-precision map, a map area that has a preset area and includes the target position;
and the second determining submodule is used for taking the map information corresponding to the map area as the target map information.
Optionally, the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
and optimizing the initial model according to the reward function value to obtain the decision model.
Optionally, the simulation control module 34 includes:
the third determining submodule is used for determining, according to the target control strategy, a simulation result of performing simulation control on the target simulation vehicle through a vehicle dynamics model;
and the generation and storage submodule is used for generating, according to the simulation result, environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
Optionally, the apparatus 30 further comprises:
a third obtaining module, configured to obtain, from simulation data generated by the simulation system, a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time as a second actual vehicle parameter, after the simulation control module performs simulation control on the target simulation vehicle through the simulation system according to the target control strategy;
a fourth obtaining module, configured to obtain an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time, as a second reference vehicle parameter;
and a model optimization module, configured to optimize the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
Optionally, the specified vehicle parameter comprises at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
Optionally, the target environment information includes: information of vehicles around the target simulation vehicle, information of pedestrians around the target simulation vehicle, information of roads around the target simulation vehicle, and information of obstacles around the target simulation vehicle;
the target vehicle information further includes: the attitude of the target simulated vehicle;
the target control strategy comprises a control strategy for at least one of: steering wheel, throttle, brake.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 4 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, electronic device 1900 may be provided as a simulation system. Referring to fig. 4, electronic device 1900 includes a processor 1922, which can be one or more in number, and memory 1932 for storing computer programs executable by processor 1922. The computer program stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processor 1922 may be configured to execute the computer program to perform the driving strategy generation method applied to the simulation system described above.
Additionally, the electronic device 1900 may also include a power component 1926 and a communication component 1950. The power component 1926 may be configured to perform power management for the electronic device 1900, and the communication component 1950 may be configured to enable communication, e.g., wired or wireless communication, for the electronic device 1900. In addition, the electronic device 1900 may also include input/output (I/O) interfaces 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, and so on.
In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the driving strategy generation method applied to the simulation system described above. For example, the computer readable storage medium may be the memory 1932 described above including program instructions executable by the processor 1922 of the electronic device 1900 to perform the driving maneuver generation method described above as applied to the simulation system.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described driving strategy generation method applied to a simulation system when executed by the programmable apparatus.
The present disclosure further provides a simulation system, which includes the driving strategy generating device applied to the simulation system according to any embodiment of the present disclosure.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. To avoid unnecessary repetition, the disclosure does not separately describe various possible combinations.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (15)

1. A driving strategy generation method applied to a simulation system is characterized by comprising the following steps:
acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time;
acquiring target map information corresponding to the target simulation vehicle, wherein the target map information is taken from a high-precision map;
inputting the target environment information, the target vehicle information and the target map information into a decision model to obtain a target control strategy output by the decision model, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by the simulation system;
according to the target control strategy, performing simulation control on the target simulation vehicle through the simulation system;
wherein the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
optimizing the initial model according to the reward function value to obtain the decision model.
2. The method according to claim 1, wherein the target map information is acquired by:
determining, in the high-precision map, a map area that has a preset area and includes the target position;
and taking the map information corresponding to the map area as the target map information.
3. The method of claim 1, wherein said simulation controlling the target simulated vehicle by the simulation system according to the target control strategy comprises:
determining a simulation result for performing simulation control on the target simulation vehicle through a vehicle dynamics model according to the target control strategy;
and according to the simulation result, generating environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
4. The method of claim 1, wherein after the step of providing simulated control of the target simulated vehicle by the simulation system in accordance with the target control strategy, the method further comprises:
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time from simulation data generated by the simulation system as a second actual vehicle parameter;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time as a second reference vehicle parameter;
and optimizing the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
5. The method of claim 1 or 4, wherein the specified vehicle parameter comprises at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
6. The method of any of claims 1-4, wherein the target environment information comprises: information of vehicles around the target simulated vehicle, information of pedestrians around the target simulated vehicle, road information around the target simulated vehicle, and obstacle information around the target simulated vehicle;
the target vehicle information further includes: the attitude of the target simulated vehicle;
the target control strategy comprises a control strategy for at least one of: steering wheel, throttle, brake.
7. A driving strategy generation apparatus applied to a simulation system, the apparatus comprising:
a first acquisition module, used for acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time;
the second acquisition module is used for acquiring target map information corresponding to the target simulation vehicle, and the target map information is taken from a high-precision map;
the decision module is used for inputting the target environment information, the target vehicle information and the target map information into a decision model and obtaining a target control strategy output by the decision model, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by the simulation system;
the simulation control module is used for performing simulation control on the target simulation vehicle through the simulation system according to the target control strategy;
wherein the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
optimizing the initial model according to the reward function value to obtain the decision model.
8. The apparatus of claim 7, wherein the second obtaining module comprises:
the first determining submodule is used for determining, in the high-precision map, a map area that has a preset area and includes the target position;
and the second determining submodule is used for taking the map information corresponding to the map area as the target map information.
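A minimal sketch of the map lookup in claim 8, assuming a hypothetical hd_map.query_region API and an illustrative preset size of a 200 m axis-aligned square centered on the target position:

    def extract_target_map_info(hd_map, target_position, preset_size=200.0):
        # Map area of a preset size that includes the target position.
        x, y = target_position
        half = preset_size / 2.0
        region = (x - half, y - half, x + half, y + half)  # (min_x, min_y, max_x, max_y)
        # The map information corresponding to this area is the target map information.
        return hd_map.query_region(region)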
9. The apparatus of claim 7, wherein the simulation control module comprises:
a third determining submodule configured to determine, through a vehicle dynamics model and according to the target control strategy, a simulation result of the simulation control of the target simulation vehicle; and
a generation and storage submodule configured to generate, according to the simulation result, environment information and vehicle information of the target simulation vehicle corresponding to the simulation time immediately following the target simulation time, and to store the environment information and the vehicle information in the simulation system.
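Claim 9 leaves the vehicle dynamics model unspecified. For orientation only, the sketch below substitutes a kinematic bicycle model, treats the steering wheel angle as the road-wheel angle, and reuses the ControlStrategy fields from the earlier sketch; the gains, time step and wheelbase are illustrative assumptions. The returned state would then be used to generate and store the next-time environment and vehicle information.

    import math

    def simulate_step(state, strategy, dt=0.1, wheelbase=2.8):
        # state = (x, y, yaw, speed) of the target simulation vehicle.
        x, y, yaw, v = state

        # Map throttle/brake commands to a longitudinal acceleration (assumed gains).
        accel = 3.0 * strategy.throttle - 6.0 * strategy.brake

        # Kinematic bicycle update, standing in for the vehicle dynamics model.
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += (v / wheelbase) * math.tan(strategy.steering_wheel_angle) * dt
        v = max(0.0, v + accel * dt)

        # Simulation result for the simulation time immediately following.
        return (x, y, yaw, v)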
10. The apparatus of claim 7, further comprising:
a third acquisition module configured to acquire, from simulation data generated by the simulation system, after the simulation control module performs simulation control on the target simulation vehicle through the simulation system according to the target control strategy, a specified vehicle parameter of the target simulation vehicle corresponding to the simulation time immediately following the target simulation time, as a second actual vehicle parameter;
a fourth acquisition module configured to acquire an ideal vehicle parameter of the target simulation vehicle corresponding to the simulation time immediately following the target simulation time, as a second reference vehicle parameter; and
a model optimization module configured to optimize the decision model according to a degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
11. The apparatus of claim 7 or 10, wherein the specified vehicle parameters comprise at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
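Claims 10-11 hinge on a degree of deviation over the specified vehicle parameters (curvature, position, steering angle, distance from surrounding vehicles). One plausible reading, sketched here with illustrative equal weights and dictionary-keyed parameters, is a weighted sum of per-parameter norms:

    import numpy as np

    def deviation_degree(actual, reference, weights=None):
        # Specified vehicle parameters compared between actual and reference values.
        keys = ("curvature", "position", "steering_angle", "distance")
        weights = weights or {k: 1.0 for k in keys}
        total = 0.0
        for k in keys:
            diff = np.atleast_1d(
                np.asarray(actual[k], dtype=float) - np.asarray(reference[k], dtype=float)
            )
            total += weights[k] * float(np.linalg.norm(diff))
        return total

The model optimization module would then adjust the decision model in the direction that reduces this value.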
12. The apparatus of any one of claims 7-10, wherein the target environment information comprises: information of vehicles around the target simulation vehicle, information of pedestrians around the target simulation vehicle, road information around the target simulation vehicle, and obstacle information around the target simulation vehicle;
the target vehicle information further comprises: an attitude of the target simulation vehicle; and
the target control strategy comprises a control strategy for at least one of: a steering wheel, a throttle, and a brake.
13. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
14. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 6.
15. A simulation system, comprising the driving strategy generation apparatus according to any one of claims 7-12.
CN202011303762.0A 2020-11-19 2020-11-19 Driving strategy generation method, device, medium, equipment and simulation system Active CN112382165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011303762.0A CN112382165B (en) 2020-11-19 2020-11-19 Driving strategy generation method, device, medium, equipment and simulation system

Publications (2)

Publication Number Publication Date
CN112382165A (en) 2021-02-19
CN112382165B (en) 2022-10-04

Family

ID=74584552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011303762.0A Active CN112382165B (en) 2020-11-19 2020-11-19 Driving strategy generation method, device, medium, equipment and simulation system

Country Status (1)

Country Link
CN (1) CN112382165B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113386133A (en) * 2021-06-10 2021-09-14 贵州恰到科技有限公司 Control method of reinforcement learning robot
CN113238970B (en) * 2021-07-08 2021-10-22 腾讯科技(深圳)有限公司 Training method, evaluation method, control method and device of automatic driving model
CN115421500B (en) * 2022-11-04 2023-03-24 北自所(北京)科技发展股份有限公司 Automatic loading and unloading vehicle control method and system based on digital twin model
CN116662474B (en) * 2023-07-28 2023-11-10 智道网联科技(北京)有限公司 High-precision map data processing method, device, equipment and medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2451485A (en) * 2007-08-01 2009-02-04 Airmax Group Plc Vehicle monitoring system
JP5671320B2 (en) * 2009-12-18 2015-02-18 キヤノン株式会社 Information processing apparatus, control method therefor, and program
CN105654808A (en) * 2016-02-03 2016-06-08 北京易驾佳信息科技有限公司 Intelligent training system for vehicle driver based on actual vehicle
CN107329466A (en) * 2017-08-28 2017-11-07 北京华清智能科技有限公司 Autonomous-driving compact car
CN108762225B (en) * 2018-04-24 2020-11-10 中国商用飞机有限责任公司北京民用飞机技术研究中心 Under-aircraft equipment decision-making method for fault response time in flight control system
US11042156B2 (en) * 2018-05-14 2021-06-22 Honda Motor Co., Ltd. System and method for learning and executing naturalistic driving behavior
JP7140849B2 (en) * 2018-05-31 2022-09-21 ニッサン ノース アメリカ,インク Probabilistic Object Tracking and Prediction Framework
US11829870B2 (en) * 2018-11-26 2023-11-28 Uber Technologies, Inc. Deep reinforcement learning based models for hard-exploration problems
CN111325230B (en) * 2018-12-17 2023-09-12 上海汽车集团股份有限公司 Online learning method and online learning device for vehicle lane change decision model
DE102019205520A1 (en) * 2019-04-16 2020-10-22 Robert Bosch Gmbh Method for determining driving courses
CN110097799B (en) * 2019-05-23 2020-12-11 重庆大学 Virtual driving system based on real scene modeling
CN110478911A (en) * 2019-08-13 2019-11-22 苏州钛智智能科技有限公司 Machine learning-based unmanned driving method for an intelligent game vehicle, and intelligent vehicle and device
CN110562258B (en) * 2019-09-30 2022-04-29 驭势科技(北京)有限公司 Method for vehicle automatic lane change decision, vehicle-mounted equipment and storage medium
CN111814308B (en) * 2020-06-08 2024-04-26 同济大学 Acceleration test system for automatic driving system
CN111832652B (en) * 2020-07-14 2023-12-19 北京罗克维尔斯科技有限公司 Training method and device for decision model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025369A (en) * 2016-08-03 2017-08-08 北京推想科技有限公司 Method and apparatus for performing transfer learning on medical images
CN111696405A (en) * 2019-03-13 2020-09-22 福特全球技术公司 Driving simulator
CN111275249A (en) * 2020-01-15 2020-06-12 吉利汽车研究院(宁波)有限公司 Driving behavior optimization method based on DQN neural network and high-precision positioning
CN111461056A (en) * 2020-04-15 2020-07-28 北京罗克维尔斯科技有限公司 Sample data acquisition method and device

Also Published As

Publication number Publication date
CN112382165A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
CN112382165B (en) Driving strategy generation method, device, medium, equipment and simulation system
Chen et al. Autonomous vehicle testing and validation platform: Integrated simulation system with hardware in the loop
CN109991987B (en) Automatic driving decision-making method and device
CN111506058B (en) Method and device for planning a short-term path for autopilot by means of information fusion
CN111795832B (en) Intelligent driving vehicle testing method, device and equipment
CN110673602B (en) Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
JP2022506404A (en) Methods and devices for determining vehicle speed
CN111653113A (en) Method, device, terminal and storage medium for determining local path of vehicle
CN115303297B (en) Urban scene end-to-end automatic driving control method and device based on attention mechanism and graph model reinforcement learning
US11107228B1 (en) Realistic image perspective transformation using neural networks
EP4295174A1 (en) Apparatus, system and method for fusing sensor data to do sensor translation
WO2020029580A1 (en) Method and apparatus for training control strategy model for generating automatic driving strategy
US11100372B2 (en) Training deep neural networks with synthetic images
KR20230159308A (en) Method, system and computer program product for calibrating and validating an advanced driver assistance system (adas) and/or an automated driving system (ads)
CN115204455A (en) Long-time-domain driving behavior decision method suitable for high-speed and loop traffic scene
Stević et al. Development of ADAS perception applications in ROS and" Software-In-the-Loop" validation with CARLA simulator
Artunedo et al. Advanced co-simulation framework for cooperative maneuvers among vehicles
Curiel-Ramirez et al. Hardware in the loop framework proposal for a semi-autonomous car architecture in a closed route environment
JP6865365B2 A method and device for calibrating the physics engine of a virtual world simulator used for learning of a deep-learning-based device, a method for learning a real state network therefor, and a learning device using the same.
CN113379654A (en) Block discriminator for dynamic routing
CN114104005B (en) Decision-making method, device and equipment of automatic driving equipment and readable storage medium
US20230192118A1 (en) Automated driving system with desired level of driving aggressiveness
US10977783B1 (en) Quantifying photorealism in simulated data with GANs
CN113867147B (en) Training and control method, device, computing equipment and medium
CN113485300B (en) Automatic driving vehicle collision test method based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant