CN112382165B - Driving strategy generation method, device, medium, equipment and simulation system

Driving strategy generation method, device, medium, equipment and simulation system

Info

Publication number
CN112382165B
Authority
CN
China
Prior art keywords
simulation
target
vehicle
information
map
Prior art date
Legal status
Active
Application number
CN202011303762.0A
Other languages
Chinese (zh)
Other versions
CN112382165A (en)
Inventor
吴伟
段雄
郎咸朋
Current Assignee
Beijing Co Wheels Technology Co Ltd
Original Assignee
Beijing Co Wheels Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Co Wheels Technology Co Ltd filed Critical Beijing Co Wheels Technology Co Ltd
Priority to CN202011303762.0A
Publication of CN112382165A
Application granted
Publication of CN112382165B

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00 - Simulators for teaching or training purposes
    • G09B9/02 - Simulators for teaching or training purposes for teaching control of vehicles or other craft
    • G09B9/04 - Simulators for teaching or training purposes for teaching control of vehicles or other craft for teaching control of land vehicles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 - Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/14 - Force analysis or force optimisation, e.g. static or dynamic forces
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Traffic Control Systems (AREA)

Abstract

The disclosure relates to a driving strategy generation method, device, medium, and equipment, and to a simulation system, with the aim of optimizing the simulation effect of the simulation system. The method comprises the following steps: acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time; acquiring target map information corresponding to the target simulation vehicle, wherein the target map information is taken from a high-precision map; inputting the target environment information, the target vehicle information and the target map information into a decision model to obtain a target control strategy output by the decision model, wherein the decision model is trained in a reinforcement learning manner on simulation data generated by the simulation system; and performing simulation control on the target simulation vehicle through the simulation system according to the target control strategy.

Description

Driving strategy generation method, device, medium, equipment and simulation system
Technical Field
The present disclosure relates to the field of simulation, and in particular, to a driving strategy generation method, device, medium, device, and simulation system.
Background
Currently, mainstream simulation platforms (i.e., simulation systems) fall into two major types: one type is generally applied to the simulation of vehicle dynamics models and functions, such as the CANoe simulation from Vector; the other type is scenario-based simulation, such as the VTD simulation. Most simulation platforms provide simulation verification functions, which are generally used to verify a perception algorithm or a rule-based decision planning algorithm. However, a single simulation verification function cannot bring a good simulation effect to a simulation system, and a rule-based decision planning algorithm cannot be applied to all kinds of simulation scenarios, so it has certain limitations.
Disclosure of Invention
The purpose of the present disclosure is to provide a driving strategy generation method, device, medium, and equipment, and a simulation system, so as to optimize the simulation effect of the simulation system.
In order to achieve the above object, according to a first aspect of the present disclosure, there is provided a driving strategy generation method applied to a simulation system, the method including:
acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time;
acquiring target map information corresponding to the target simulation vehicle, wherein the target map information is taken from a high-precision map;
inputting the target environment information, the target vehicle information and the target map information into a decision model to obtain a target control strategy output by the decision model, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by the simulation system;
and performing simulation control on the target simulation vehicle through the simulation system according to the target control strategy.
Optionally, the target map information is acquired by:
determining, in the high-precision map, a map area that has a preset area and includes the target position;
and taking the map information corresponding to the map area as the target map information.
Optionally, the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
and optimizing the initial model according to the reward function value to obtain the decision model.
Optionally, the performing simulation control on the target simulated vehicle through the simulation system according to the target control strategy includes:
determining a simulation result for performing simulation control on the target simulation vehicle through a vehicle dynamics model according to the target control strategy;
and according to the simulation result, generating environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
Optionally, after the step of performing simulation control on the target simulated vehicle by the simulation system according to the target control strategy, the method further includes:
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time from simulation data generated by the simulation system as a second actual vehicle parameter;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time as a second reference vehicle parameter;
and optimizing the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
Optionally, the specified vehicle parameter comprises at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
Optionally, the target environment information includes: information of vehicles around the target simulated vehicle, information of pedestrians around the target simulated vehicle, road information around the target simulated vehicle, and obstacle information around the target simulated vehicle;
the target vehicle information further includes: the attitude of the target simulated vehicle;
the target control strategy comprises a control strategy for at least one of: steering wheel, throttle, brake.
According to a second aspect of the present disclosure, there is provided a driving strategy generation apparatus applied to a simulation system, the apparatus including:
a first acquisition module, used for acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time;
the second acquisition module is used for acquiring target map information corresponding to the target simulation vehicle, and the target map information is taken from a high-precision map;
the decision module is used for inputting the target environment information, the target vehicle information and the target map information into a decision model and obtaining a target control strategy output by the decision model, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by the simulation system;
and the simulation control module is used for performing simulation control on the target simulation vehicle through the simulation system according to the target control strategy.
Optionally, the second obtaining module includes:
the first determining submodule is used for determining, in the high-precision map, a map area that has a preset area and includes the target position;
and the second determining submodule is used for taking the map information corresponding to the map area as the target map information.
Optionally, the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
and optimizing the initial model according to the reward function value to obtain the decision model.
Optionally, the simulation control module includes:
the third determining submodule is used for determining, according to the target control strategy, a simulation result of performing simulation control on the target simulation vehicle through a vehicle dynamics model;
and the generation and storage submodule is used for generating, according to the simulation result, environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
Optionally, the apparatus further comprises:
a third obtaining module, configured to obtain, from simulation data generated by the simulation system, a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time as a second actual vehicle parameter, after the simulation control module performs simulation control on the target simulation vehicle through the simulation system according to the target control strategy;
a fourth obtaining module, configured to obtain an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time, as a second reference vehicle parameter;
and a model optimization module, configured to optimize the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
Optionally, the specified vehicle parameter comprises at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
Optionally, the target environment information includes: information of vehicles around the target simulated vehicle, information of pedestrians around the target simulated vehicle, road information around the target simulated vehicle, and obstacle information around the target simulated vehicle;
the target vehicle information further includes: the attitude of the target simulated vehicle;
the target control strategy comprises a control strategy for at least one of: steering wheel, throttle, brake.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect of the present disclosure.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of the first aspect of the disclosure.
According to a fifth aspect of the present disclosure, there is provided a simulation system including the driving strategy generating device applied to the simulation system according to the second aspect of the present disclosure.
According to the above technical solution, the target environment information and the target vehicle information of the target simulation vehicle corresponding to the target simulation time are acquired from the simulation data generated by the simulation system, and the target map information corresponding to the target simulation vehicle is acquired. The target environment information, the target vehicle information, and the target map information are input into the decision model to obtain the target control strategy output by the decision model, and the target simulation vehicle is then subjected to simulation control through the simulation system according to the target control strategy. The target map information is taken from a high-precision map, and because the high-precision map contains abundant map information, it helps improve the simulation effect. The decision model is trained in a reinforcement learning manner based on simulation data generated by the simulation system. Reinforcement learning can therefore assist the driving strategy generation of the simulation system: training of the decision model is realized by reinforcement learning on the simulation system's own data, which further improves the simulation effect of the simulation system and expands its usage scenarios.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow diagram of a driving strategy generation method for a simulation system provided in accordance with one embodiment of the present disclosure;
FIG. 2 is an exemplary schematic diagram of the structure of an initial model in a driving strategy generation method for a simulation system provided in accordance with the present disclosure;
FIG. 3 is a block diagram of a driving strategy generation apparatus applied to a simulation system provided according to an embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
The following detailed description of the embodiments of the disclosure refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
Fig. 1 is a flowchart of a driving strategy generation method for a simulation system provided according to one embodiment of the present disclosure. As shown in fig. 1, the method may include the steps of:
in step 11, acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system;
in step 12, target map information corresponding to the target simulated vehicle is acquired;
in step 13, target environment information, target vehicle information and target map information are input to a decision model, and a target control strategy output by the decision model is obtained, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by a simulation system;
in step 14, the target simulation vehicle is subjected to simulation control through the simulation system according to the target control strategy.
In brief, while the simulation system executes a simulation task, each simulation object in the simulation system (for example, a vehicle, a pedestrian, etc.) is simulated at each simulation time, and simulation data for that simulation time is generated. The simulation system is thus able to continually generate new simulation data for each simulation object it contains. Therefore, the data required to execute the method, namely the target environment information and the target vehicle information of the target simulation vehicle corresponding to the target simulation time, can be obtained from the simulation data generated by the simulation system.
The target environment information may include, but is not limited to, the following: information of vehicles around the target simulated vehicle, information of pedestrians around the target simulated vehicle, information of roads around the target simulated vehicle, and information of obstacles around the target simulated vehicle. The information of surrounding vehicles may include, for example, their positions, speeds, and traveling directions. The information of surrounding pedestrians may include, for example, each pedestrian's walking state (e.g., whether the pedestrian is moving), traveling direction, and traveling speed. The surrounding road information may include, for example, lane line information, connection information between roads, attribute information of the roads (e.g., whether a road is an intersection), and traffic light information. The surrounding obstacle information may include, for example, the obstacle state (e.g., whether it is moving, and its moving direction) and the obstacle position.
The environment model in automatic driving refers to a description of the environment around the automatic driving vehicle at a certain time. It may include dynamic information, such as the positions, speeds, and driving directions of other vehicles and the states of pedestrians, as well as static information, such as lane line information, road connection relationships, road attributes, and the states of static obstacles (e.g., the state of traffic lights). Thus, for example, the target environment information may be obtained directly from the environment model of the simulation system.
The target vehicle information may include, but is not limited to, the position of the target simulated vehicle at the target simulation time (the target position) and the attitude of the target simulated vehicle at the target simulation time. For example, the target vehicle information may likewise be obtained directly from the environment model.
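To make these inputs concrete, the following is a minimal sketch of how the target environment information and target vehicle information might be organized in code. All class and field names are illustrative assumptions made for this sketch; the patent does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SurroundingVehicle:
    position: Tuple[float, float]   # (x, y) in map coordinates
    speed: float                    # vehicle speed, m/s
    heading: float                  # traveling direction, radians

@dataclass
class TargetEnvironmentInfo:
    vehicles: List[SurroundingVehicle] = field(default_factory=list)
    pedestrians: List[dict] = field(default_factory=list)  # walking state, direction, speed
    roads: List[dict] = field(default_factory=list)        # lane lines, connectivity, attributes, traffic lights
    obstacles: List[dict] = field(default_factory=list)    # state (moving or not), position

@dataclass
class TargetVehicleInfo:
    target_position: Tuple[float, float]  # position at the target simulation time
    attitude: float                       # attitude (pose/heading) of the target simulated vehicle
```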
After the target environment information and the target vehicle information are acquired, step 12 may be executed to acquire the target map information corresponding to the target simulated vehicle. The simulation system can be provided with a high-precision map that contains all map information related to the simulation scene, such as road information and obstacle information. Because the high-precision map carries abundant map information, performing simulation on the basis of the high-precision map yields a better simulation effect.
In one possible embodiment, in step 12, the target map information corresponding to the target simulated vehicle may be the entire high-precision map.
In another possible embodiment, the step 12 of obtaining the target map information corresponding to the target simulated vehicle may include the following steps:
determining, in the high-precision map, a map area that has a preset area and includes the target position;
and taking the map information corresponding to the map area as the target map information.
As previously described, the target vehicle information may include the position of the target simulated vehicle at the target simulation time, i.e., the target position. After the target position is determined in step 11, a map area that has a preset area and includes the target position may be determined in the high-precision map based on the target position, and the map information corresponding to that map area may be used as the target map information. The preset area may be defined as N × M, where N and M represent distances. For example, a map area may be selected around the target position according to the preset area; a 300 m × 300 m map area centered on the target position may be selected, for instance, thereby determining the target map information.
In this way, a partial map area including the target position is selected and the map information corresponding to that partial area is used as the target map information, which effectively reduces the complexity of subsequent data processing and improves data processing efficiency.
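As an illustration of the region selection described above, the sketch below crops an N × M rectangle (for example, 300 m × 300 m) centered on the target position out of a high-precision map. The `hd_map.elements` iterable and the `position` attribute are interfaces assumed for this sketch, not an API defined by the patent.

```python
from typing import List

def crop_map_region(hd_map, target_position, width_m: float = 300.0,
                    height_m: float = 300.0) -> List:
    """Return the map elements lying inside a preset-area rectangle
    (width_m x height_m) centered on the target position."""
    cx, cy = target_position
    half_w, half_h = width_m / 2.0, height_m / 2.0
    return [
        elem for elem in hd_map.elements          # assumed iterable of map elements
        if abs(elem.position[0] - cx) <= half_w   # assumed (x, y) position attribute
        and abs(elem.position[1] - cy) <= half_h
    ]
```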
After the target environment information, the target vehicle information, and the target map information are obtained, step 13 may be executed to input the target environment information, the target vehicle information, and the target map information to the decision model to obtain the target control strategy output by the decision model.
When the target environment information, the target vehicle information, and the target map information are input to the decision model, they may first be preprocessed into a corresponding state vector, and the state vector is then input to the decision model.
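A sketch of this preprocessing step is given below: the three information sources are flattened into a single state vector S. The concrete features, their ordering, and the dictionary keys are assumptions; the patent states only that the inputs may be preprocessed into a state vector.

```python
import numpy as np

def build_state_vector(env_info: dict, vehicle_info: dict, map_features) -> np.ndarray:
    """Flatten target environment, vehicle, and map information into one state vector."""
    features = []
    features.extend(vehicle_info["position"])   # target position (x, y)
    features.append(vehicle_info["attitude"])   # attitude of the target simulated vehicle
    for v in env_info["vehicles"]:              # surrounding vehicles
        features.extend([*v["position"], v["speed"], v["heading"]])
    features.extend(map_features)               # e.g. sampled lane-line points from the cropped map area
    return np.asarray(features, dtype=np.float32)
```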
The decision model generates a corresponding output result, namely a target control strategy, according to the input content. Illustratively, the target control strategy includes a control strategy for at least one of: steering wheel, throttle, brake.
The decision model is trained in a reinforcement learning manner on simulation data generated by the simulation system. Reinforcement Learning (RL) is learning in a "trial and error" manner: behavior is guided by the rewards obtained from interacting with the environment, with the goal of making the agent obtain the maximum reward.
In one possible embodiment, the decision model may be obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into the initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
and optimizing the initial model according to the reward function value to obtain the decision model.
The first environment information, the first vehicle information, and the first map information are obtained in the same manner as the target environment information, the target vehicle information, and the target map information described above; the only difference is the simulation time to which they correspond. Their acquisition is therefore not described again here.
Generating the decision model requires multiple rounds of training; within one round, the initial model is the model used at the start of that round. At the very beginning of training, a model generally needs to be created, and this model is the initial model used for the first round. In subsequent rounds, training proceeds step by step on the basis of this initially created model until the decision model is obtained.
In reinforcement learning training, a decision agent is generally designed using a reinforcement learning algorithm, and subsequent model training is realized on the basis of that agent. The decision agent, i.e., the initial model used in this method, is described in detail below. As shown in Fig. 2, the agent used in this method may comprise two networks: a decision network and an evaluation network. The decision network outputs a control strategy (generally embodied as a decision vector) according to the input content, while the evaluation network performs internal evaluation, scoring the quality of each generated decision vector. In general, a state vector S, a reward function R, and a set of decision vectors (action instructions) A may be defined for the agent.
During reinforcement learning training, the state vector of the vehicle at the current time is input to the agent, and the decision vector output by the agent is obtained. The simulation system performs simulation control according to the decision vector, which changes the state of the vehicle and yields the actual state of the vehicle at the next time. The vehicle also has a corresponding ideal state at the next time, and the deviation between the actual state and the ideal state yields the reward value of the reward function. The network parameters in the agent are optimized and adjusted according to the reward value, completing one round of training; once the training end condition is met after multiple rounds of training, the decision model is obtained. The smaller the deviation between the actual state and the ideal state, the closer the actual state is to the ideal state, and correspondingly the larger the reward value. The actual effect of the decision vector is thus taken into account when optimizing the agent with the reward value, which helps obtain a better decision model.
In this method, as shown in Fig. 2, the input of the agent is connected to the environment model and the high-precision map of the simulation system, so that the input data required at the current time can be obtained in real time from the environment model and the high-precision map and assembled into a state vector that serves as the agent's input. The output of the agent (i.e., of the decision network in the agent) is connected to the controller of the simulation system, so the controller obtains the control strategy (decision vector) output by the agent and can control the simulated vehicle accordingly to perform the next simulation. The input of the evaluation network in the agent is also connected to the controller, so the evaluation network can compare the actual simulation result with the ideal simulation situation and use the comparison result to update the parameters in the agent. Repeating these steps over multiple rounds of training produces a decision model with a good effect.
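The two-network agent described above can be sketched as a small actor-critic pair: the decision network maps the state vector S to a decision vector A, and the evaluation network scores a (state, decision) pair. The choice of PyTorch, the layer sizes, and the activations are assumptions of this sketch rather than details given in the patent.

```python
import torch
import torch.nn as nn

class DecisionNetwork(nn.Module):
    """Decision network (actor): maps state vector S to a decision vector A,
    e.g. steering wheel, throttle, and brake commands."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # bounded control commands
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

class EvaluationNetwork(nn.Module):
    """Evaluation network (critic): scores the quality of a decision vector
    produced in a given state."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```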
Therefore, after the first environment information, the first vehicle information, and the first map information corresponding to the first historical simulation time are obtained, they are input into the initial model (i.e., the agent) used in this round of training to obtain the first control strategy output by the initial model. The target simulation vehicle is then subjected to simulation control through the simulation system according to the first control strategy, generating a series of new simulation data for the target simulation vehicle. The specified vehicle parameter of the target simulation vehicle at the second historical simulation time (the next simulation time of the first historical simulation time) is obtained from the simulation data as the first actual vehicle parameter, while the ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time is obtained as the first reference vehicle parameter. The reward function value of this round of training is determined according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter, and the initial model is optimized according to the reward function value to obtain the decision model. Illustratively, the inverse of the deviation may be set as the value of the reward function.
The specified vehicle parameter may include, but is not limited to, at least one of: curvature, position, steering angle, and distance from surrounding vehicles. An evaluation index for each specified vehicle parameter may be set in advance; for example, a correspondence between the reward function value and the deviation between the actual and ideal values of the specified vehicle parameter may be preset, so that the deviation is quantified. For example, the smaller the deviation of the distance from surrounding vehicles from its ideal value, the larger the corresponding reward function value.
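Following the "inverse of the deviation" rule stated above, the reward function might be sketched as follows. Equal weighting of the specified vehicle parameters and the small epsilon guard against division by zero are assumptions of this sketch.

```python
import numpy as np

def reward_value(actual_params, ideal_params, eps: float = 1e-6) -> float:
    """Reward = inverse of the deviation between the actual and ideal specified
    vehicle parameters (e.g. curvature, position, steering angle, distance
    from surrounding vehicles). Smaller deviation -> larger reward."""
    actual = np.asarray(actual_params, dtype=np.float64)
    ideal = np.asarray(ideal_params, dtype=np.float64)
    deviation = float(np.linalg.norm(actual - ideal))
    return 1.0 / (deviation + eps)
```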
Therefore, according to the target environment information, the target vehicle information and the target map information, the corresponding target control strategy can be obtained through the decision model. Thereafter, step 14 may be performed.
Illustratively, step 14 may include the steps of:
determining a simulation result of performing simulation control on the target simulation vehicle through a vehicle dynamics model according to the target control strategy;
and according to the simulation result, generating environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
That is, the simulation system performs the next simulation of the target simulated vehicle according to the target control strategy and generates a series of simulation data, thereby updating the environment model and the vehicle information of the target simulated vehicle at the next simulation time and storing that information in the database of the simulation system.
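One simulation tick of this data flow might look like the sketch below. Every `sim_system` accessor named here (`get_env_info`, `get_vehicle_info`, `crop_hd_map`, `dynamics_model.step`, `store`) is a hypothetical interface invented for illustration, and `build_state_vector` refers to the preprocessing sketch above; the patent describes the data flow, not an API.

```python
def simulation_step(sim_system, decision_model, vehicle_id, t: int) -> None:
    """Run one simulation step for the target simulated vehicle at time t."""
    env_info = sim_system.get_env_info(vehicle_id, t)        # target environment information
    veh_info = sim_system.get_vehicle_info(vehicle_id, t)    # includes the target position
    map_info = sim_system.crop_hd_map(veh_info["position"])  # map area around the target position

    state = build_state_vector(env_info, veh_info, map_info) # see the preprocessing sketch above
    control = decision_model(state)                          # target control strategy

    # The vehicle dynamics model turns the control strategy into a simulation result.
    result = sim_system.dynamics_model.step(vehicle_id, control)

    # Generate and store environment and vehicle information for the next simulation time.
    sim_system.store(vehicle_id, t + 1, result.env_info, result.vehicle_info)
```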
According to the above technical solution, the target environment information and the target vehicle information of the target simulation vehicle corresponding to the target simulation time are acquired from the simulation data generated by the simulation system, the target map information corresponding to the target simulation vehicle is acquired, the target environment information, the target vehicle information, and the target map information are input into the decision model to obtain the target control strategy output by the decision model, and the target simulation vehicle is subjected to simulation control through the simulation system according to the target control strategy. The decision model is trained in a reinforcement learning manner on simulation data generated by the simulation system. Reinforcement learning can therefore assist the driving strategy generation of the simulation system: training of the decision model is realized by reinforcement learning on the simulation system's own data, which further improves the simulation effect of the simulation system and expands its usage scenarios.
On the basis of the scheme shown in fig. 1, after the target simulated vehicle is subjected to simulation control by the simulation system according to the target control strategy in step 14, the method provided by the present disclosure may further include the following steps:
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time from simulation data generated by the simulation system as a second actual vehicle parameter;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time as a second reference vehicle parameter;
and optimizing the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
This optimization is equivalent to further updating the decision model based on the current actual vehicle parameters. Its principle is the same as that of a single round of training of the decision model, so it is not repeated here.
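A schematic sketch of this online optimization step follows: the evaluation network is nudged toward the reward derived from the actual-versus-ideal deviation at the next simulation time. This is a simplified one-step update (a full actor-critic update would also bootstrap from the next state); all `sim_system` accessors are hypothetical, `reward_value` refers to the reward sketch above, and `state` and `action` are assumed to be batched tensors from the step being evaluated.

```python
import torch
import torch.nn.functional as F

def online_update(sim_system, critic, optimizer, state, action, vehicle_id, t: int) -> float:
    """Fine-tune the evaluation network after simulation control at time t."""
    actual = sim_system.get_specified_params(vehicle_id, t + 1)  # second actual vehicle parameter
    ideal = sim_system.get_ideal_params(vehicle_id, t + 1)       # second reference vehicle parameter
    r = reward_value(actual, ideal)                              # see the reward sketch above

    predicted = critic(state, action)                            # evaluation network's score
    target = torch.full_like(predicted, r)                       # one-step reward target
    loss = F.mse_loss(predicted, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```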
Fig. 3 is a block diagram of a driving strategy generation apparatus applied to a simulation system according to an embodiment of the present disclosure, and as shown in fig. 3, the apparatus 30 may include:
a first obtaining module 31, configured to obtain, from simulation data generated by a simulation system, target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time, where the target vehicle information includes a target position of the target simulation vehicle at the target simulation time;
a second obtaining module 32, configured to obtain target map information corresponding to the target simulated vehicle, where the target map information is obtained from a high-precision map;
a decision module 33, configured to input the target environment information, the target vehicle information, and the target map information into a decision model, and obtain a target control strategy output by the decision model, where the decision model is obtained by training in a reinforcement learning manner according to simulation data generated by the simulation system;
and the simulation control module 34 is configured to perform simulation control on the target simulated vehicle through the simulation system according to the target control strategy.
Optionally, the second obtaining module 32 includes:
the first determining submodule is used for determining, in the high-precision map, a map area that has a preset area and includes the target position;
and the second determining submodule is used for taking the map information corresponding to the map area as the target map information.
Optionally, the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
and optimizing the initial model according to the reward function value to obtain the decision model.
Optionally, the simulation control module 34 includes:
the third determining submodule is used for determining, according to the target control strategy, a simulation result of performing simulation control on the target simulation vehicle through a vehicle dynamics model;
and the generation and storage submodule is used for generating, according to the simulation result, environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
Optionally, the apparatus 30 further comprises:
a third obtaining module, configured to obtain, from simulation data generated by the simulation system, a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time as a second actual vehicle parameter, after the simulation control module performs simulation control on the target simulation vehicle through the simulation system according to the target control strategy;
a fourth obtaining module, configured to obtain an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time, as a second reference vehicle parameter;
and a model optimization module, configured to optimize the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
Optionally, the specified vehicle parameter comprises at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
Optionally, the target environment information includes: information of vehicles around the target simulation vehicle, information of pedestrians around the target simulation vehicle, information of roads around the target simulation vehicle, and information of obstacles around the target simulation vehicle;
the target vehicle information further includes: the attitude of the target simulated vehicle;
the target control strategy comprises a control strategy for at least one of: steering wheel, throttle, brake.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 4 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, electronic device 1900 may be provided as a simulation system. Referring to fig. 4, electronic device 1900 includes a processor 1922, which can be one or more in number, and memory 1932 for storing computer programs executable by processor 1922. The computer program stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processor 1922 may be configured to execute the computer program to perform the driving strategy generation method applied to the simulation system described above.
Additionally, the electronic device 1900 may also include a power component 1926 and a communication component 1950. The power component 1926 may be configured to perform power management for the electronic device 1900, and the communication component 1950 may be configured to enable communication, e.g., wired or wireless communication, for the electronic device 1900. In addition, the electronic device 1900 may also include input/output (I/O) interfaces 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, and so on.
In another exemplary embodiment, there is also provided a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the driving strategy generation method applied to the simulation system described above. For example, the computer readable storage medium may be the memory 1932 described above including program instructions executable by the processor 1922 of the electronic device 1900 to perform the driving maneuver generation method described above as applied to the simulation system.
In another exemplary embodiment, a computer program product is also provided, which comprises a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described driving strategy generation method applied to a simulation system when executed by the programmable apparatus.
The present disclosure further provides a simulation system, which includes the driving strategy generating device applied to the simulation system according to any embodiment of the present disclosure.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. To avoid unnecessary repetition, the disclosure does not separately describe various possible combinations.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (15)

1. A driving strategy generation method applied to a simulation system is characterized by comprising the following steps:
acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time;
acquiring target map information corresponding to the target simulation vehicle, wherein the target map information is taken from a high-precision map;
inputting the target environment information, the target vehicle information and the target map information into a decision model to obtain a target control strategy output by the decision model, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by the simulation system;
according to the target control strategy, performing simulation control on the target simulation vehicle through the simulation system;
wherein the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
optimizing the initial model according to the reward function value to obtain the decision model.
2. The method according to claim 1, wherein the target map information is acquired by:
determining, in the high-precision map, a map area that has a preset area and includes the target position;
and taking the map information corresponding to the map area as the target map information.
3. The method of claim 1, wherein said simulation controlling the target simulated vehicle by the simulation system according to the target control strategy comprises:
determining a simulation result for performing simulation control on the target simulation vehicle through a vehicle dynamics model according to the target control strategy;
and according to the simulation result, generating environment information and vehicle information corresponding to the target simulation vehicle at the next simulation time of the target simulation time, and storing the environment information and the vehicle information into the simulation system.
4. The method of claim 1, wherein after the step of providing simulated control of the target simulated vehicle by the simulation system in accordance with the target control strategy, the method further comprises:
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at the next simulation time of the target simulation time from simulation data generated by the simulation system as a second actual vehicle parameter;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the next simulation time of the target simulation time as a second reference vehicle parameter;
and optimizing the decision model according to the degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
5. The method of claim 1 or 4, wherein the specified vehicle parameter comprises at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
6. The method of any of claims 1-4, wherein the target environment information comprises: information of vehicles around the target simulated vehicle, information of pedestrians around the target simulated vehicle, road information around the target simulated vehicle, and obstacle information around the target simulated vehicle;
the target vehicle information further includes: the attitude of the target simulated vehicle;
the target control strategy comprises a control strategy for at least one of: steering wheel, throttle, brake.
7. A driving strategy generation apparatus applied to a simulation system, the apparatus comprising:
a first acquisition module, used for acquiring target environment information and target vehicle information of a target simulation vehicle corresponding to a target simulation time from simulation data generated by a simulation system, wherein the target vehicle information comprises a target position of the target simulation vehicle at the target simulation time;
the second acquisition module is used for acquiring target map information corresponding to the target simulation vehicle, and the target map information is taken from a high-precision map;
the decision module is used for inputting the target environment information, the target vehicle information and the target map information into a decision model and obtaining a target control strategy output by the decision model, wherein the decision model is obtained by training in a reinforcement learning mode according to simulation data generated by the simulation system;
the simulation control module is used for performing simulation control on the target simulation vehicle through the simulation system according to the target control strategy;
wherein the decision model is obtained by:
acquiring first environment information, first vehicle information and first map information of the target simulation vehicle corresponding to a first historical simulation time from simulation data generated by the simulation system;
inputting the first environment information, the first vehicle information and the first map information into an initial model used in this round of training to obtain a first control strategy output by the initial model;
performing simulation control on the target simulation vehicle through the simulation system according to the first control strategy;
acquiring a specified vehicle parameter corresponding to the target simulation vehicle at a second historical simulation time from simulation data generated by the simulation system as a first actual vehicle parameter, wherein the second historical simulation time is the next simulation time of the first historical simulation time;
acquiring an ideal vehicle parameter of the target simulation vehicle corresponding to the second historical simulation time as a first reference vehicle parameter;
determining a reward function value of this round of training according to the degree of deviation between the first actual vehicle parameter and the first reference vehicle parameter;
optimizing the initial model according to the reward function value to obtain the decision model.
8. The apparatus of claim 7, wherein the second obtaining module comprises:
the first determining submodule is used for determining, in the high-precision map, a map area that has a preset area and includes the target position;
and the second determining submodule is used for taking the map information corresponding to the map area as the target map information.
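A minimal sketch of the map lookup in claim 8, assuming a hypothetical hd_map.query_region API and an illustrative preset size of a 200 m axis-aligned square centered on the target position:

    def extract_target_map_info(hd_map, target_position, preset_size=200.0):
        # Map area of a preset size that includes the target position.
        x, y = target_position
        half = preset_size / 2.0
        region = (x - half, y - half, x + half, y + half)  # (min_x, min_y, max_x, max_y)
        # The map information corresponding to this area is the target map information.
        return hd_map.query_region(region)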
9. The apparatus of claim 7, wherein the simulation control module comprises:
a third determining submodule configured to determine, through a vehicle dynamics model and according to the target control strategy, a simulation result of the simulation control of the target simulation vehicle; and
a generation and storage submodule configured to generate, according to the simulation result, environment information and vehicle information of the target simulation vehicle corresponding to the simulation time immediately following the target simulation time, and to store the environment information and the vehicle information in the simulation system.
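Claim 9 leaves the vehicle dynamics model unspecified. For orientation only, the sketch below substitutes a kinematic bicycle model, treats the steering wheel angle as the road-wheel angle, and reuses the ControlStrategy fields from the earlier sketch; the gains, time step and wheelbase are illustrative assumptions. The returned state would then be used to generate and store the next-time environment and vehicle information.

    import math

    def simulate_step(state, strategy, dt=0.1, wheelbase=2.8):
        # state = (x, y, yaw, speed) of the target simulation vehicle.
        x, y, yaw, v = state

        # Map throttle/brake commands to a longitudinal acceleration (assumed gains).
        accel = 3.0 * strategy.throttle - 6.0 * strategy.brake

        # Kinematic bicycle update, standing in for the vehicle dynamics model.
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += (v / wheelbase) * math.tan(strategy.steering_wheel_angle) * dt
        v = max(0.0, v + accel * dt)

        # Simulation result for the simulation time immediately following.
        return (x, y, yaw, v)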
10. The apparatus of claim 7, further comprising:
a third acquisition module configured to acquire, from simulation data generated by the simulation system, after the simulation control module performs simulation control on the target simulation vehicle through the simulation system according to the target control strategy, a specified vehicle parameter of the target simulation vehicle corresponding to the simulation time immediately following the target simulation time, as a second actual vehicle parameter;
a fourth acquisition module configured to acquire an ideal vehicle parameter of the target simulation vehicle corresponding to the simulation time immediately following the target simulation time, as a second reference vehicle parameter; and
a model optimization module configured to optimize the decision model according to a degree of deviation between the second actual vehicle parameter and the second reference vehicle parameter to obtain an optimized decision model.
11. The apparatus of claim 7 or 10, wherein the specified vehicle parameters comprise at least one of: curvature, position, steering angle, and distance from surrounding vehicles.
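Claims 10-11 hinge on a degree of deviation over the specified vehicle parameters (curvature, position, steering angle, distance from surrounding vehicles). One plausible reading, sketched here with illustrative equal weights and dictionary-keyed parameters, is a weighted sum of per-parameter norms:

    import numpy as np

    def deviation_degree(actual, reference, weights=None):
        # Specified vehicle parameters compared between actual and reference values.
        keys = ("curvature", "position", "steering_angle", "distance")
        weights = weights or {k: 1.0 for k in keys}
        total = 0.0
        for k in keys:
            diff = np.atleast_1d(
                np.asarray(actual[k], dtype=float) - np.asarray(reference[k], dtype=float)
            )
            total += weights[k] * float(np.linalg.norm(diff))
        return total

The model optimization module would then adjust the decision model in the direction that reduces this value.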
12. The apparatus of any one of claims 7-10, wherein the target environment information comprises: information of vehicles around the target simulation vehicle, information of pedestrians around the target simulation vehicle, road information around the target simulation vehicle, and obstacle information around the target simulation vehicle;
the target vehicle information further comprises: an attitude of the target simulation vehicle; and
the target control strategy comprises a control strategy for at least one of: a steering wheel, a throttle, and a brake.
13. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
14. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 6.
15. A simulation system, comprising the driving strategy generation apparatus according to any one of claims 7-12.
CN202011303762.0A 2020-11-19 2020-11-19 Driving strategy generation method, device, medium, equipment and simulation system Active CN112382165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011303762.0A CN112382165B (en) 2020-11-19 2020-11-19 Driving strategy generation method, device, medium, equipment and simulation system

Publications (2)

Publication Number Publication Date
CN112382165A (en) 2021-02-19
CN112382165B (en) 2022-10-04

Family

ID=74584552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011303762.0A Active CN112382165B (en) 2020-11-19 2020-11-19 Driving strategy generation method, device, medium, equipment and simulation system

Country Status (1)

Country Link
CN (1) CN112382165B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113386133A (en) * 2021-06-10 2021-09-14 贵州恰到科技有限公司 Control method of reinforcement learning robot
CN113238970B (en) * 2021-07-08 2021-10-22 腾讯科技(深圳)有限公司 Training method, evaluation method, control method and device of automatic driving model
CN115421500B (en) * 2022-11-04 2023-03-24 北自所(北京)科技发展股份有限公司 Automatic loading and unloading vehicle control method and system based on digital twin model
CN116662474B (en) * 2023-07-28 2023-11-10 智道网联科技(北京)有限公司 High-precision map data processing method, device, equipment and medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2451485A (en) * 2007-08-01 2009-02-04 Airmax Group Plc Vehicle monitoring system
JP5671320B2 (en) * 2009-12-18 2015-02-18 キヤノン株式会社 Information processing apparatus, control method therefor, and program
CN105654808A (en) * 2016-02-03 2016-06-08 北京易驾佳信息科技有限公司 Intelligent training system for vehicle driver based on actual vehicle
CN107329466A (en) * 2017-08-28 2017-11-07 北京华清智能科技有限公司 Autonomous-driving compact car
CN108762225B (en) * 2018-04-24 2020-11-10 中国商用飞机有限责任公司北京民用飞机技术研究中心 Under-aircraft equipment decision-making method for fault response time in flight control system
US11042156B2 (en) * 2018-05-14 2021-06-22 Honda Motor Co., Ltd. System and method for learning and executing naturalistic driving behavior
JP7140849B2 (en) * 2018-05-31 2022-09-21 ニッサン ノース アメリカ,インク Probabilistic Object Tracking and Prediction Framework
US11829870B2 (en) * 2018-11-26 2023-11-28 Uber Technologies, Inc. Deep reinforcement learning based models for hard-exploration problems
CN111325230B (en) * 2018-12-17 2023-09-12 上海汽车集团股份有限公司 Online learning method and online learning device for vehicle lane change decision model
DE102019205520A1 (en) * 2019-04-16 2020-10-22 Robert Bosch Gmbh Method for determining driving courses
CN110097799B (en) * 2019-05-23 2020-12-11 重庆大学 Virtual driving system based on real scene modeling
CN110478911A (en) * 2019-08-13 2019-11-22 苏州钛智智能科技有限公司 Machine learning-based unmanned driving method for an intelligent game vehicle, and intelligent vehicle and device
CN110562258B (en) * 2019-09-30 2022-04-29 驭势科技(北京)有限公司 Method for vehicle automatic lane change decision, vehicle-mounted equipment and storage medium
CN111814308B (en) * 2020-06-08 2024-04-26 同济大学 Acceleration test system for automatic driving system
CN111832652B (en) * 2020-07-14 2023-12-19 北京罗克维尔斯科技有限公司 Training method and device for decision model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025369A (en) * 2016-08-03 2017-08-08 北京推想科技有限公司 Method and apparatus for performing transfer learning on medical images
CN111696405A (en) * 2019-03-13 2020-09-22 福特全球技术公司 Driving simulator
CN111275249A (en) * 2020-01-15 2020-06-12 吉利汽车研究院(宁波)有限公司 Driving behavior optimization method based on DQN neural network and high-precision positioning
CN111461056A (en) * 2020-04-15 2020-07-28 北京罗克维尔斯科技有限公司 Sample data acquisition method and device

Also Published As

Publication number Publication date
CN112382165A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
CN112382165B (en) Driving strategy generation method, device, medium, equipment and simulation system
Chen et al. Autonomous vehicle testing and validation platform: Integrated simulation system with hardware in the loop
CN109991987B (en) Automatic driving decision-making method and device
CN111506058B (en) Method and device for planning a short-term path for autopilot by means of information fusion
CN111795832B (en) Intelligent driving vehicle testing method, device and equipment
CN110673602B (en) Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
JP2022506404A (en) Methods and devices for determining vehicle speed
CN111653113A (en) Method, device, terminal and storage medium for determining local path of vehicle
CN115303297B (en) Urban scene end-to-end automatic driving control method and device based on attention mechanism and graph model reinforcement learning
US11107228B1 (en) Realistic image perspective transformation using neural networks
EP4295174A1 (en) Apparatus, system and method for fusing sensor data to do sensor translation
WO2020029580A1 (en) Method and apparatus for training control strategy model for generating automatic driving strategy
US11100372B2 (en) Training deep neural networks with synthetic images
KR20230159308A (en) Method, system and computer program product for calibrating and validating an advanced driver assistance system (adas) and/or an automated driving system (ads)
CN115204455A (en) Long-time-domain driving behavior decision method suitable for high-speed and loop traffic scene
Stević et al. Development of ADAS perception applications in ROS and" Software-In-the-Loop" validation with CARLA simulator
Artunedo et al. Advanced co-simulation framework for cooperative maneuvers among vehicles
Curiel-Ramirez et al. Hardware in the loop framework proposal for a semi-autonomous car architecture in a closed route environment
JP6865365B2 A method and device for calibrating the physics engine of a virtual world simulator used for learning of a deep-learning-based device, a method for learning a real state network therefor, and a learning device using the same.
CN113379654A (en) Block discriminator for dynamic routing
CN114104005B (en) Decision-making method, device and equipment of automatic driving equipment and readable storage medium
US20230192118A1 (en) Automated driving system with desired level of driving aggressiveness
US10977783B1 (en) Quantifying photorealism in simulated data with GANs
CN113867147B (en) Training and control method, device, computing equipment and medium
CN113485300B (en) Automatic driving vehicle collision test method based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant