WO2022221979A1 - Automated driving scenario generation method, apparatus, and system - Google Patents
Automated driving scenario generation method, apparatus, and system
- Publication number
- WO2022221979A1 (PCT/CN2021/088037)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- ads
- automatic driving
- action
- feedback information
- network
- Prior art date
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
Definitions
- the present application relates to the technical field of automatic driving simulation, and in particular, to a method, device and system for generating automatic driving scenarios.
- Security issues include information security issues (security) and functional safety issues (safety).
- Security mainly refers to safety problems caused by deliberate human attacks, such as property loss, privacy leakage, the system being taken over, or system failure.
- Safety mainly refers to safety problems caused by non-malicious faults inside the system; information security problems caused by malicious attacks can also lead to functional safety problems such as the system being taken over or system failure.
- The safety of conventional vehicles is assessed through certification, such as certifying whether the vehicle meets safety requirements according to the ISO 26262 standard and the ISO/PAS 21448 SOTIF standard.
- However, smart vehicles have the following two problems: (1) it is difficult to manually check the completeness of the specification, because the vehicle's autonomous driving system is mainly a machine learning system whose perception, prediction, planning and control logic is obtained through training and learning, so the requirement specification is vague; (2) it is difficult to manually check the correctness of the implementation.
- The neural network of a machine learning system consists of an input layer, an output layer, multiple hidden layers and many neurons, and passes high-dimensional vectors through linear or nonlinear activation functions, which is logically incomprehensible. Therefore, the safety of smart vehicles currently cannot be evaluated through certification.
- The present application provides a method, device and system for generating an automatic driving scenario, so as to solve the problems in the prior art that the test driving scenarios are limited and the testers are not as numerous as real users, resulting in long testing time, high cost, and difficulty in covering all driving scenarios.
- an embodiment of the present application provides a method for generating an automatic driving scene, including:
- Acquiring feedback information output by the automatic driving system ADS when testing in a reference automatic driving scenario; wherein the feedback information includes vehicle control instructions and neural network behavior information;
- obtaining safety violation parameters and coverage parameters based on the feedback information; wherein the safety violation parameters are used to indicate the probability of safety violations when the vehicle corresponding to the ADS drives according to the vehicle control instructions in the reference automatic driving scenario;
- the coverage parameter is used to indicate the activated neurons and/or hierarchical correlations in the neural network corresponding to the ADS when testing in the reference automatic driving scenario;
- if the safety violation parameter or the coverage parameter does not meet the preset index, the reference automatic driving scenario is updated based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameters and coverage parameters obtained based on the feedback information output by the ADS when testing in the updated reference automatic driving scenario meet the preset index.
- the feedback information output by the ADS during the test in the reference automatic driving scenario is obtained, and the safety violation parameters and coverage parameters are obtained based on the feedback information.
- If the safety violation parameter or the coverage parameter does not meet the preset index, the reference automatic driving scenario is updated to obtain an updated reference automatic driving scenario, until the safety violation parameters and coverage parameters obtained based on the feedback information output by the ADS when testing in the updated reference automatic driving scenario meet the preset indicators.
- In this way, the driving scenario is guided to be updated in a direction that is likely to lead to safety violations and has not been tested before, that is, the driving scenarios that are most likely to lead to safety violations and have high coverage are generated, which can not only improve the efficiency of testing driving scenarios, but also cover more driving scenarios; in addition, closed-loop automated testing can further reduce time overhead and labor overhead.
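The closed loop described above can be illustrated with a minimal sketch. The function arguments (run_ads_test, compute_parameters, update_scenario) and the preset-index keys are hypothetical placeholders rather than part of the patent or of any real simulator API; the patent only requires that the loop continues until both parameters meet the preset indicators.

```python
def generate_scenario(scenario, run_ads_test, compute_parameters, update_scenario,
                      preset, max_iterations=1000):
    """Iterate until both the safety-violation and coverage parameters meet the preset indicators."""
    for _ in range(max_iterations):
        feedback = run_ads_test(scenario)                  # vehicle control instructions + NN behavior info
        violation, coverage = compute_parameters(feedback) # safety-violation and coverage parameters
        if violation >= preset["violation"] and coverage >= preset["coverage"]:
            break                                          # preset indicators satisfied
        scenario = update_scenario(scenario, feedback)     # e.g. via the reinforcement learning agent
    return scenario
```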
- the autonomous driving scenarios include any one or more of typical autonomous driving scenarios, missing autonomous driving scenarios, and autonomous driving scenarios that are prone to violations.
- the safety violation parameter includes any one or more of: the probability that the parallel distance between the vehicle corresponding to the ADS and another vehicle or pedestrian is less than the safety distance, the probability that the vertical distance between the vehicle corresponding to the ADS and another vehicle or pedestrian is less than the safety distance, the probability that the vehicle corresponding to the ADS violates a traffic light indication, the probability that the vehicle corresponding to the ADS violates a traffic sign indication, the probability that the vehicle corresponding to the ADS violates a traffic police command, and the probability that the vehicle corresponding to the ADS exceeds the speed limit;
- the coverage parameter includes any one or more of: the number of activated neurons in the neural network corresponding to the ADS, and the weight of the influence of input subsets on the prediction result in the heat map of the neural network corresponding to the ADS.
- the updating of the reference automatic driving scenario based on the feedback information includes:
- acquiring the action selected by a reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information; wherein the reinforcement learning agent is an agent that determines a reward based on the feedback information and the current state of the vehicle corresponding to the ADS, and selects the action from the action space based on the reward and the current state of the vehicle corresponding to the ADS;
- the reference driving scenario is updated based on the selected action.
- the reward is the sum of the safety-violation-based reward and the coverage-based reward;
- the safety-violation-based reward is the degree to which the safety violation probability of the vehicle corresponding to the ADS is close to the preset index after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information, and the coverage-based reward is the degree to which the coverage of the reference automatic driving scenario determined based on the neural network behavior information in the feedback information is close to the preset index;
- the state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scenario.
- In the above technical solution, the reinforcement learning agent determines the reward based on the feedback information and the state of the vehicle corresponding to the current ADS, selects the action from the action space based on the reward and the state of the vehicle corresponding to the current ADS, and then updates the reference driving scenario based on the selected action.
- The reward, determined from the vehicle control instructions and the neural network behavior information in the feedback information, guides the behavior: the reinforcement learning agent selects actions from the action space of the reference automatic driving scenario with the goal of obtaining the maximum reward, so as to guide the driving scenario to be updated in a direction that is likely to lead to safety violations and has not been tested before, that is, to generate the driving scenarios that are most likely to lead to safety violations and have high coverage, which can not only improve the efficiency of testing driving scenarios, but also cover more driving scenarios.
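The reward definition above can be read as the sum of two closeness terms. The sketch below uses a simple ratio-to-target as the closeness measure; this measure is an assumption, since the patent only states that each reward reflects how close the corresponding quantity is to its preset index.

```python
def closeness(value, target):
    """How close a quantity is to its preset indicator, clipped to [0, 1] (assumed measure)."""
    if target <= 0:
        return 1.0
    return min(value / target, 1.0)

def reward(violation_probability, coverage, preset_violation, preset_coverage):
    """Total reward = safety-violation-based reward + coverage-based reward."""
    r_violation = closeness(violation_probability, preset_violation)
    r_coverage = closeness(coverage, preset_coverage)
    return r_violation + r_coverage
```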
- the neural network model of the reinforcement learning agent includes a value network and a strategy network; the value network is used to calculate the value of a set action in a set state, and the strategy network is used to obtain the action probability distribution in the set state.
- Before acquiring the action selected by the reinforcement learning agent from the action space of the reference autonomous driving scenario based on the feedback information, the method further comprises: calculating, based on the value network, the first value of the first action in the first state and the second value of the second action in the second state; determining a time difference error based on the first value, the second value and the reward of the first action, wherein the time difference error is the difference between the value predicted by the value network and the actual value; obtaining the gradients of the value network and the strategy network; updating the parameters of the value network based on the time difference error and the gradient of the value network; and updating the parameters of the strategy network based on the time difference error and the gradient of the strategy network.
- the two neural networks, the value network and the strategy network respectively approximate the value function and the strategy function of the reinforcement learning agent.
- the value function refers to the rules by which the agent evaluates the quality of actions and states using the rewards provided by the environment in reinforcement learning.
- In the above technical solution, the reinforcement learning agent is trained so that it can strategically update the reference autonomous driving scenario based on the vehicle control instructions and neural network behavior information in the feedback information, thereby guiding the autonomous driving scenario to be updated in a direction that is prone to safety violations and has not been tested before.
- That is, the driving scenarios that most need to be tested, which are likely to lead to safety violations and have high coverage, are generated; this can not only improve the efficiency of testing driving scenarios, but also cover more driving scenarios.
- The present application further provides a device for generating an automatic driving scenario, the device having the function of implementing the method in the first aspect or any possible design of the first aspect; the function can be realized by hardware, or by hardware executing corresponding software.
- the hardware or software includes one or more modules corresponding to the above functions, such as a first obtaining module, a second obtaining module, and an updating module.
- the first obtaining module is used to obtain feedback information output by the automatic driving system ADS when testing in a reference automatic driving scenario; wherein the feedback information includes vehicle control instructions and neural network behavior information;
- the second obtaining module is configured to obtain safety violation parameters and coverage parameters based on the feedback information; wherein the safety violation parameters are used to indicate the probability of safety violations when the vehicle corresponding to the ADS drives according to the vehicle control instructions in the reference automatic driving scenario, and the coverage parameter is used to indicate the activated neurons and/or hierarchical correlations in the neural network corresponding to the ADS when testing in the reference autonomous driving scenario;
- the updating module is configured to, if the safety violation parameter or the coverage parameter does not meet the preset index, update the reference automatic driving scenario based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained from the feedback information output by the ADS when testing in the updated reference automatic driving scenario meet the preset index.
- the device further includes a creation module, configured to create the reference automatic driving scenario, for example based on a distribution analysis of the initial automatic driving scenario acquired for the ADS.
- the autonomous driving scenarios include any one or more of typical autonomous driving scenarios, missing autonomous driving scenarios, and autonomous driving scenarios that are prone to violations.
- the safety violation parameter includes any one or more of: the probability that the parallel distance between the vehicle corresponding to the ADS and another vehicle or pedestrian is less than the safety distance, the probability that the vertical distance between the vehicle corresponding to the ADS and another vehicle or pedestrian is less than the safety distance, the probability that the vehicle corresponding to the ADS violates a traffic light indication, the probability that the vehicle corresponding to the ADS violates a traffic sign indication, the probability that the vehicle corresponding to the ADS violates a traffic police command, and the probability that the vehicle corresponding to the ADS exceeds the speed limit;
- the coverage parameter includes any one or more of: the number of activated neurons in the neural network corresponding to the ADS, and the weight of the influence of input subsets on the prediction result in the heat map of the neural network corresponding to the ADS.
- the update module is specifically used for:
- acquiring the action selected by a reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information; wherein the reinforcement learning agent is an agent that determines a reward based on the feedback information and the current state of the vehicle corresponding to the ADS, and selects the action from the action space based on the reward and the current state of the vehicle corresponding to the ADS;
- the reference driving scenario is updated based on the selected action.
- the reward is the sum of the safety-violation-based reward and the coverage-based reward;
- the safety-violation-based reward is the degree to which the safety violation probability of the vehicle corresponding to the ADS is close to the preset index after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information, and the coverage-based reward is the degree to which the coverage of the reference automatic driving scenario determined based on the neural network behavior information in the feedback information is close to the preset index;
- the state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scenario.
- the neural network model of the reinforcement learning agent includes a value network and a strategy network; the value network is used to calculate the value of a set action in a set state, and the strategy network is used to obtain the action probability distribution in the set state;
- a time difference error is determined, wherein the time difference error is the difference between the value predicted by the value network and the actual value;
- the present application further provides a system for generating an autonomous driving scenario
- the system for generating an autonomous driving scenario may include: at least one processor; and a memory and a communication interface communicatively connected to the at least one processor;
- the memory stores instructions that can be executed by the at least one processor, and the at least one processor implements the function of the method in the first aspect or any possible design of the first aspect by executing the instructions stored in the memory.
- The present application further provides a computer storage medium, where the computer storage medium includes computer instructions which, when executed on a computer, cause the computer to execute the method in the first aspect or any possible design of the first aspect.
- The present application further provides a computer program product which, when run on a computer, causes the computer to execute the method in the first aspect or any possible design of the first aspect.
- FIG. 1 is a schematic diagram of a training process of reinforcement learning provided by an embodiment of the present application.
- FIG. 2 is a schematic structural diagram of a system for generating an automatic driving scene according to an embodiment of the present application
- FIG. 3 is a schematic flowchart of a method for generating an automatic driving scene according to an embodiment of the present application
- FIG. 4 is a schematic structural diagram of an apparatus for generating an automatic driving scene according to an embodiment of the present application
- FIG. 5 is a schematic structural diagram of another automatic driving scene generation system provided by an embodiment of the present application.
- ADS automatic driving system
- An autonomous driving system is a system for controlling a vehicle, and the vehicle is capable of autonomous driving under the control of the autonomous driving system.
- the autonomous driving system may include a collection device, two main processing devices, an auxiliary processing device, and a vehicle control device.
- the collecting device is used for collecting the initial environment information of the vehicle, and sending the initial environment information to the two main processing devices.
- the main processing device is used to process the received initial environment information to obtain target environment information, and then generate a vehicle control instruction according to the target environment information, and send the vehicle control instruction to the auxiliary processing device.
- the auxiliary processing device is used for sending the vehicle control command sent by one of the main processing devices to the vehicle control device, so that the vehicle control device controls the vehicle (such as forward, reverse or turn, etc.) according to the received vehicle control command.
- the auxiliary processing device can send the vehicle control command sent by the other main processing device to the vehicle control device.
- the automatic driving system of the vehicle is mainly a machine learning system, which is composed of a neural network.
- the neural network includes an input layer, an output layer, multiple hidden layers and many neurons.
- Agent is a very important concept in the field of artificial intelligence. Any independent entity that can think and interact with the environment can be abstracted as an agent.
- an agent can be a computer system or a part of a computer system in a specific environment. The agent can communicate and cooperate with other agents according to its own perception of the environment, according to the existing instructions or through self-learning, and autonomously complete the set goals in the environment where it is located.
- An agent can be software or a combination of software and hardware.
- Reinforcement learning (also known as evaluative learning)
- Reinforcement learning is used to describe and solve the problem in which an agent maximizes rewards or achieves a specific goal by learning a strategy during its interaction with the environment; that is, the agent learns in a "trial and error" way, and the reward obtained by interacting with the environment through actions guides the behavior, with the goal of enabling the agent to obtain the maximum reward.
- Reinforcement learning does not require a training data set.
- the reinforcement signal (i.e., reward) provided by the environment evaluates the quality of the action rather than telling the reinforcement learning system how to generate the correct action. Since the external environment provides little information, the agent must learn from its own experience; in this way, the agent acquires knowledge in an action-evaluation (i.e., reward) environment and improves its action plan to adapt to the environment.
- FIG. 1 is a schematic diagram of a training process of reinforcement learning provided by an embodiment of the application.
- reinforcement learning mainly involves the following elements: agent, environment, state, action and reward.
- the input of the agent is the state
- the output is the action.
- the policy function refers to the rules of adopting behavior used by the agent in reinforcement learning.
- the action can be output according to the state, and the action can be used to explore the environment to update the state.
- the update of the policy function depends on the policy gradient (PG); the policy function is usually a neural network.
- the training process of reinforcement learning is as follows: through multiple interactions between the agent and the environment, the actions, states and rewards of each interaction are obtained, and multiple sets of actions, states and rewards are used as training data to train the agent once; this process is then repeated for the next round of training on the agent until the convergence conditions are met.
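The interaction loop of FIG. 1 can be summarized with the generic sketch below. Here `env` and `agent` are hypothetical objects exposing a gym-style interface (reset/step and select_action/update); the patent does not prescribe any particular API, so this is only an illustration of the loop structure.

```python
def train(agent, env, episodes=100, steps_per_episode=200):
    """Generic reinforcement learning loop: collect (state, action, reward) tuples and train the agent."""
    for _ in range(episodes):
        state = env.reset()
        trajectory = []
        for _ in range(steps_per_episode):
            action = agent.select_action(state)          # agent outputs an action for the current state
            next_state, reward, done = env.step(action)  # action explores the environment; state updates
            trajectory.append((state, action, reward))
            state = next_state
            if done:
                break
        agent.update(trajectory)                          # one training pass on the collected data
    return agent
```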
- the simulator has the ability to simulate driving scenarios by configuring various parameters, such as configuring parameters such as road network, traffic, pedestrians, landscape, weather, etc.
- the main modules in the simulator include the camera image, radar image, lidar image, dynamic model, vehicle position update, and inertial navigation (i.e., global positioning system and inertial sensor, GPS+IMU) modules; the first three capture images of the simulated driving scene, and the latter three are mainly used to dynamically update the position of the vehicle in the driving scene.
- a simulator-based simulation environment generates multiple driving scenarios and tests the generated driving scenarios. There are usually the following two methods:
- The first method is that a simulator-based simulation environment randomly generates as many driving scenarios as possible in a brute-force way for testing and learning.
- the steps are as follows: randomly or by brute force select a real driving scene; model the selected real driving scene, such as its road topology, traffic state, and weather, and configure the model in the simulator; automatically or manually update the weather, landscape and other information in the simulator, so as to generate a simulated driving scene in the simulator; set the ego-vehicle information in the simulated driving scene, such as the position of the ego vehicle and the type and location of its sensors; and use the ADS to test and verify the driving scenes generated in the simulator. Since the driving scenes of this method are imported by brute force or generated randomly, the method is blind, a large number of driving scenes are inevitably repeated, and it is difficult to traverse all driving scenes, so the efficiency is low and the coverage of the tested driving scenes cannot be effectively improved.
- The above two methods have certain shortcomings: the driving scenes generated by brute force are random, and the generated driving scenes contain a large number of repeated driving scenes, which entails huge time and labor costs, is inefficient, and cannot effectively improve the coverage of the tested driving scenarios; the formal method basically generates driving scenarios manually, so the driving scenarios are usually relatively simple, the construction efficiency is low, it is difficult to construct diverse driving scenes, and no safety guarantee can be provided. That is to say, the efficiency of the above two methods for testing driving scenarios is low, it is difficult to provide diverse driving scenarios, and the coverage of the tested driving scenarios cannot be improved quickly.
- In view of this, an embodiment of the present application provides a method for generating an automatic driving scene, which strategically updates the driving scenario through reinforcement learning, thereby improving the diversity of the driving scenarios and, at the same time, guiding the driving scenario to be updated in a direction that is likely to lead to safety violations and has not been tested before, improving the efficiency of testing driving scenarios; in addition, closed-loop automated testing can further reduce time overhead and labor overhead.
- FIG. 2 is a schematic structural diagram of a system for generating an automatic driving scene provided by an embodiment of the present application, as shown in FIG. 2 .
- the generation system of the automatic driving scene includes the vehicle 100 , the ADS 200 , the reinforcement learning agent 300 and the simulator 400 , wherein the ADS 200 can be set on the vehicle 100 , and a closed loop is formed between the ADS 200 , the reinforcement learning agent 300 and the simulator 400 .
- the simulator 400 is used to configure various parameters to simulate an autonomous driving scenario of the vehicle 100 .
- the ADS200 is used to test the automated driving scenario of the simulated vehicle 100 .
- the reinforcement learning agent 300 is used to take the output of the ADS 200 as the environment, and update the automatic driving scene of the vehicle 100 according to the reward of the environment.
- the ADS 200 may include a collection device, a main processing device, an auxiliary processing device, and a vehicle control device that are connected in sequence.
- the acquisition device may include: at least one sensor among a variety of sensors, such as a camera, a radar, a gyroscope, and an accelerometer.
- the processing capability of the main processing device may be stronger than that of the auxiliary processing device, and the main processing device may be a device integrating image processing functions, scalar computing functions, vector computing functions and matrix computing functions.
- the auxiliary processing device may be a microcontroller unit (MCU).
- the ADS may further include a target radar, and the target radar is connected to the auxiliary processing device.
- the radars in the embodiments of the present invention may all be lidars (light detection and ranging, Lidar).
- the radars in the embodiments of the present invention may also be other types of radars, such as millimeter-wave radars or Ultrasonic radar, which is not limited in this embodiment of the present invention.
- first and second in the embodiments of the present application are only used for description purposes, and cannot be interpreted as indicating or implying relative importance or implicitly indicating the number of indicated technical features.
- a feature defined as “first” or “second” may expressly or implicitly include one or more of that feature.
- At least one means one or more
- plural means two or more.
- “And/or”, which describes the association relationship of the associated objects, indicates that there can be three kinds of relationships, for example, A and/or B, which can indicate: the existence of A alone, the existence of A and B at the same time, and the existence of B alone, where A, B can be singular or plural.
- FIG. 3 is a schematic flowchart of a method for generating an automatic driving scene provided by an embodiment of the present application.
- the method for generating an automatic driving scene may be applied to the system for generating an automatic driving scene shown in FIG. 2, or to a system similar in functional structure to that of FIG. 2.
- the specific flow of the method for generating the automatic driving scene is described as follows.
- Before the test, the system for generating the automatic driving scenario needs to acquire the initial automatic driving scenario of the ADS, where the initial automatic driving scenario may include any one or more of road, time, weather, vehicles, pedestrians, traffic lights, traffic signs, traffic police, and landscape. It then performs a distribution analysis on the acquired initial automatic driving scenario to determine the road types in the initial automatic driving scenario (such as straight roads, T-junctions, overpasses, and winding roads) and the probability of safety violations when the vehicle corresponding to the ADS drives in the initial automatic driving scenario (such as the probability of vehicle safety accidents or traffic violations). Finally, based on the results of the distribution analysis, it creates a reference automatic driving scenario, where the reference automatic driving scenario includes any one or more of a typical automatic driving scenario, a missing automatic driving scenario, and a violation-prone automatic driving scenario, which is not limited in this embodiment of the present application.
- the distribution analysis is off-line, that is, the distribution analysis of the initial automatic driving scene is not performed online in the simulator in the automatic driving scene generation system.
- Using offline distribution analysis to determine the road types in the initial autonomous driving scenario and the probability of safety violations when the vehicle corresponding to the ADS drives in the initial autonomous driving scenario reduces the difficulty of the analysis.
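A minimal sketch of such an offline distribution analysis follows. The record field names ("road_type", "violation") are hypothetical; the patent only says that road types and safety-violation probabilities are extracted from the initial scenarios.

```python
from collections import Counter

def analyze_initial_scenarios(scenarios):
    """Offline distribution analysis: road-type frequencies and per-road-type violation rates."""
    road_counts = Counter(s["road_type"] for s in scenarios)            # e.g. straight, T-junction, overpass
    violations = Counter(s["road_type"] for s in scenarios if s["violation"])
    violation_rate = {road: violations[road] / count
                      for road, count in road_counts.items()}
    return road_counts, violation_rate

# Typical scenarios correspond to frequent road types, missing scenarios to absent or rare ones,
# and violation-prone scenarios to road types with a high estimated violation rate.
```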
- the automatic driving scene generation system can simulate the reference automatic driving scenario by configuring various parameters of the simulator in the system, and obtain the feedback information output by the ADS in the system when tested in the simulated reference autonomous driving scenario; the feedback information output by the ADS includes the vehicle control instructions of the ADS in the reference autonomous driving scenario and the neural network behavior information of the ADS in the reference autonomous driving scenario.
- the vehicle control command may be a steering signal, a speed signal or a body control signal, which is not limited in the embodiment of the present application.
- the ADS collection device collects the initial environmental information of the vehicle in the reference automatic driving environment and sends the initial environmental information to the two main processing devices; the main processing device processes the received initial environmental information to obtain the target environment information, and then generates vehicle control instructions according to the target environment information, so that the vehicle control device controls the vehicle, such as moving forward, reversing, or turning, according to the received vehicle control instructions.
- the neural network of the ADS may be a deep neural network (DNN), a convolutional neural network (CNN), or a recurrent neural network (RNN), which is not limited in the embodiments of the present application.
- DNN deep neural networks
- CNN convolutional neural networks
- RNN recurrent neural network
- After obtaining the feedback information, the system for generating the automatic driving scenario may determine the safety violation parameter and the coverage parameter respectively based on the vehicle control instructions and the neural network behavior information in the feedback information.
- the safety violation parameter is used to indicate the probability of a safety violation when the vehicle corresponding to the ADS drives according to the vehicle control instruction in the reference automatic driving scenario.
- For example, the safety violation parameter may be the probability that the vehicle corresponding to the ADS violates the command of the traffic police, or the probability that the vehicle corresponding to the ADS exceeds the speed limit, which is not limited in this embodiment of the present application.
- For example, the vehicle control command may be to control the vehicle corresponding to the ADS to move forward by 50 meters or to reverse by 100 meters.
- the coverage parameter is used to indicate the activated neurons and/or hierarchical correlations in the neural network corresponding to the ADS when testing in the reference autonomous driving scenario, for example, the number of activated neurons in the neural network.
- the neural network is sometimes called a multi-layer perceptron (MLP); its layers are divided according to their positions.
- MLP multi-layer perceptron
- the network layer can be divided into three categories, input layer, hidden layer and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the middle layers are all hidden layers.
- Any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer, so when the ADS is tested in the reference autonomous driving scenario, the activated neurons in the ADS neural network can be determined from the neural network behavior information.
- For example, if the neural network behavior information indicates that 2 neurons are activated in the input layer, 1 neuron is activated in the output layer, and 6 neurons are activated in the hidden layers, it can be determined that 9 neurons are activated in the ADS neural network when the ADS is tested in the reference autonomous driving scenario. If the neural network behavior information is a hierarchical correlation, a subset of the input layer can be found by back-propagating from the output layer according to the hierarchical correlation, and the heat map of the neural network can be determined; the heat map represents the contribution or weight of each input subset to the output result, and the influence of the input subsets on the output result can be seen intuitively from the heat map.
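The neuron-based coverage count can be sketched as below. The activation threshold is an assumed criterion (the patent does not define when a neuron counts as activated), and the activation values are illustrative.

```python
import numpy as np

def count_activated_neurons(layer_activations, threshold=0.0):
    """Count neurons whose activation exceeds a threshold, per layer and in total (assumed criterion)."""
    per_layer = [int(np.sum(a > threshold)) for a in layer_activations]
    return per_layer, int(sum(per_layer))

# Matching the example above: 2 activated in the input layer, 6 in the hidden layers,
# 1 in the output layer, i.e. 9 activated neurons in total.
layers = [
    np.array([0.7, 0.3, 0.0]),                           # input layer: 2 activated
    np.array([0.5, 0.2, 0.9, 0.1, 0.4, 0.6, 0.0, 0.0]),  # hidden layers: 6 activated
    np.array([0.8, 0.0]),                                # output layer: 1 activated
]
per_layer, total = count_activated_neurons(layers)       # per_layer == [2, 6, 1], total == 9
```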
- Further, the automatic driving scene generation system may determine whether the safety violation parameter and the coverage parameter meet the preset indicators. If the safety violation parameter or the coverage parameter does not meet the preset index, the reference automatic driving scenario is updated based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameters and coverage parameters obtained based on the feedback information output by the ADS when testing in the updated reference automatic driving scenario meet the preset indicators.
- For example, if the safety violation parameter includes the probability that the vehicle corresponding to the ADS violates the command of the traffic police and the probability that the vehicle corresponding to the ADS exceeds the speed limit, the probability that the vehicle corresponding to the ADS violates the command of the traffic police is 40%, the probability that the vehicle corresponding to the ADS exceeds the speed limit is 30%, and both probabilities in the preset indicators should be 80%, then the test result does not meet the preset indicators. Similarly, if the coverage parameter includes the number of activated neurons in the neural network, the number of activated neurons in the neural network is 6, and the number of activated neurons in the neural network in the preset index should be 8, then the test result does not meet the preset index.
- In that case, the action selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information is obtained, wherein an action in the action space is a discrete or continuous update of the road topology, road degradation, dynamic time, dynamic weather, dynamic traffic or landscape information in the automatic driving scenario. For example, the road topology is the type of road in the reference autonomous driving scenario, such as a straight road, T-junction, overpass, or winding mountain road, and the road degradation is the degree of road degradation in the reference autonomous driving scenario.
- The reference driving scenario is then updated based on the selected action to obtain the updated reference automatic driving scenario, until the safety violation parameters and coverage parameters obtained based on the feedback information output by the ADS when testing in the updated reference automatic driving scenario meet the preset indicators.
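A sketch of how a selected action might be applied to the reference scenario is shown below. The dictionary keys and action fields are illustrative assumptions; the patent only requires discrete or continuous updates of road topology, road degradation, time, weather, traffic, or landscape information.

```python
def apply_action(scenario, action):
    """Apply a discrete or continuous update to one scenario dimension (illustrative keys)."""
    updated = dict(scenario)
    if action["type"] == "discrete":
        # e.g. switch road topology: straight road -> T-junction -> overpass -> winding mountain road
        updated[action["key"]] = action["value"]
    else:
        # e.g. continuously increase road degradation or rainfall intensity
        updated[action["key"]] = updated.get(action["key"], 0.0) + action["delta"]
    return updated

scenario = {"road_topology": "straight", "road_degradation": 0.1, "weather": "clear"}
scenario = apply_action(scenario, {"type": "discrete", "key": "road_topology", "value": "T-junction"})
scenario = apply_action(scenario, {"type": "continuous", "key": "road_degradation", "delta": 0.2})
```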
- Specifically, the reinforcement learning agent of the automatic driving scene generation system determines the reward and the state of the vehicle corresponding to the current ADS based on the feedback information, and selects the action from the action space of the reference autonomous driving scenario based on the reward and the state of the vehicle corresponding to the current ADS.
- the reward is the sum of the safety violation-based reward and the coverage-based reward, and the state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference autonomous driving scene.
- the safety-violation-based reward is the degree to which the safety violation probability of the vehicle corresponding to the ADS is close to the preset index after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information. For example, the larger the safety-violation-based reward, the closer the safety violation probability of the vehicle corresponding to the ADS is to the preset index after the state of the vehicle is updated based on the vehicle control instruction, and the higher the probability of safety violations when the vehicle drives in the reference autonomous driving scenario; this encourages the reference autonomous driving scenario to evolve towards autonomous driving scenarios that are prone to safety violations;
- the coverage-based reward is the degree to which the coverage of the reference autonomous driving scenario determined based on the neural network behavior information in the feedback information is close to the preset index. For example, the larger the coverage-based reward, the closer the coverage of the reference autonomous driving scenario determined from the neural network behavior information is to the preset index, the more neurons and/or hierarchical correlations are activated in the neural network corresponding to the ADS when the ADS is tested in the reference autonomous driving scenario, and the higher the coverage of the reference autonomous driving scenario; encouraging the reference autonomous driving scenario to evolve towards previously untested autonomous driving scenarios allows more autonomous driving scenarios to be covered.
- In order for the reinforcement learning agent of the automatic driving scene generation system to strategically select actions from the action space of the reference automatic driving scenario based on the feedback information, the reinforcement learning agent needs to be trained.
- the neural network model of the reinforcement learning agent is set as a value network and a strategy network, wherein the value network is used to calculate the value of the set action in the set state, and the strategy network is used to obtain the action probability distribution in the set state.
- Specifically, the first value of the first action in the first state and the second value of the second action in the second state are calculated based on the value network; the time difference error is determined based on the first value, the second value and the reward of the first action, where the time difference error is the difference between the value predicted by the value network and the actual value; the gradients of the value network and the policy network are obtained; the parameters of the value network are updated based on the time difference error and the gradient of the value network; and the parameters of the policy network are updated based on the time difference error and the gradient of the policy network.
- the policy function refers to the rules of adopting behaviors used by the agent in reinforcement learning. For example, during the learning process, an action can be output according to the state, and the environment can be explored with this action to update the state.
- the value function refers to the rule by which the agent uses the reinforcement signal (i.e., reward) provided by the environment in reinforcement learning to evaluate the quality of actions and states; for example, the action value is used to evaluate how good an action is, and the state value is used to evaluate how good the current state is. Setting the neural network model of the reinforcement learning agent as a value network and a policy network means that the two neural networks respectively approximate the policy function and the action value function of the reinforcement learning agent, and thereby approximate the state value function.
- The state value function V(s; θ, ω) (i.e., the value of the current state) is obtained from the policy function π(a|s; θ) (i.e., the probability of the action) and the action value function q(s, a; ω) (i.e., the value of the action), for example V(s; θ, ω) = Σ_a π(a|s; θ) · q(s, a; ω).
- TD error temporal-difference error
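A compact PyTorch-style sketch of the update described above follows. It is an illustrative actor-critic step: the state dimension (4), number of actions (3), network sizes, and discount factor are arbitrary assumptions, and the patent does not fix any of these details.

```python
import torch
import torch.nn as nn

gamma = 0.99  # discount factor (assumed; not specified in the patent)

policy_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 3), nn.Softmax(dim=-1))
value_net = nn.Sequential(nn.Linear(4 + 3, 32), nn.ReLU(), nn.Linear(32, 1))  # q(s, a)
policy_opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
value_opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def one_hot(a, n=3):
    v = torch.zeros(n)
    v[a] = 1.0
    return v

def actor_critic_step(s, a, r, s2, a2):
    """One TD update: q-values of (s, a) and (s2, a2), TD error, then value and policy updates."""
    q_sa = value_net(torch.cat([s, one_hot(a)]))          # first value: q of the first action in the first state
    with torch.no_grad():
        q_s2a2 = value_net(torch.cat([s2, one_hot(a2)]))  # second value: q of the second action in the second state
    td_error = r + gamma * q_s2a2 - q_sa                  # difference between predicted and bootstrapped value

    value_loss = td_error.pow(2).mean()                   # update value-network parameters with the TD error
    value_opt.zero_grad()
    value_loss.backward()
    value_opt.step()

    log_prob = torch.log(policy_net(s)[a])
    policy_loss = (-td_error.detach() * log_prob).mean()  # update policy-network parameters with the TD error
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

# Example call: actor_critic_step(torch.randn(4), 0, 1.0, torch.randn(4), 1)
```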
- In the above technical solution, the action selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information is obtained, wherein the reinforcement learning agent determines the reward based on the feedback information and the state of the vehicle corresponding to the current ADS, and selects the action from the action space based on the reward and the state of the vehicle corresponding to the current ADS; the reference driving scenario is then updated based on the selected action to obtain the updated reference automatic driving scenario, until the safety violation parameters and coverage parameters obtained from the feedback information output by the ADS when testing in the updated reference automatic driving scenario meet the preset indicators.
- Since the reinforcement learning agent performs reinforcement learning in a "trial and error" manner, the reward determined from the vehicle control instructions and neural network behavior information in the feedback information guides the driving scenario to be updated in a direction that is likely to lead to safety violations and has not been tested before, that is, the driving scenarios that most need to be tested, which are likely to lead to safety violations and have high coverage, are generated. This can not only improve the efficiency of testing driving scenarios, but also cover more driving scenarios.
- Closed-loop automated testing can further reduce time and labor costs.
- the methods provided by the embodiments of the present application are introduced from the perspective of an automatic driving scene generation system as an execution subject.
- the generation system for automatic driving scenarios may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a combination of a hardware structure and a software module. Whether one of the above functions is performed in the form of a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
- an embodiment of the present application also provides an apparatus 400 for generating an automatic driving scene.
- the apparatus 400 may be a system for generating an automatic driving scene, or an apparatus in a system for generating an automatic driving scene.
- the apparatus 400 includes modules for executing the method shown in FIG. 3 above.
- the apparatus 400 may include:
- the first obtaining module 401 is used to obtain feedback information output by the automatic driving system ADS when testing in a reference automatic driving scenario; wherein the feedback information includes vehicle control instructions and neural network behavior information;
- the second obtaining module 402 is configured to obtain safety violation parameters and coverage parameters based on the feedback information; wherein the safety violation parameters are used to indicate the probability of safety violations when the vehicle corresponding to the ADS drives according to the vehicle control instructions in the reference automatic driving scenario, and the coverage parameter is used to indicate the activated neurons and/or hierarchical correlations in the neural network corresponding to the ADS when testing in the reference autonomous driving scenario;
- the updating module 403 is configured to, if the safety violation parameter or the coverage parameter does not meet the preset index, update the reference automatic driving scenario based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained from the feedback information output by the ADS when testing in the updated reference automatic driving scenario meet the preset index.
- the device further includes a creation module, configured to create the reference automatic driving scenario, for example based on a distribution analysis of the initial automatic driving scenario acquired for the ADS.
- the autonomous driving scenarios include any one or more of typical autonomous driving scenarios, missing autonomous driving scenarios, and autonomous driving scenarios that are prone to violations.
- the safety violation parameter includes any one or more of: the probability that the parallel distance between the vehicle corresponding to the ADS and another vehicle or pedestrian is less than the safety distance, the probability that the vertical distance between the vehicle corresponding to the ADS and another vehicle or pedestrian is less than the safety distance, the probability that the vehicle corresponding to the ADS violates a traffic light indication, the probability that the vehicle corresponding to the ADS violates a traffic sign indication, the probability that the vehicle corresponding to the ADS violates a traffic police command, and the probability that the vehicle corresponding to the ADS exceeds the speed limit;
- the coverage parameter includes any one or more of: the number of activated neurons in the neural network corresponding to the ADS, and the weight of the influence of input subsets on the prediction result in the heat map of the neural network corresponding to the ADS.
- the update module 403 is specifically used for:
- acquiring the action selected by a reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information; wherein the reinforcement learning agent is an agent that determines a reward based on the feedback information and the current state of the vehicle corresponding to the ADS, and selects the action from the action space based on the reward and the current state of the vehicle corresponding to the ADS;
- the reference driving scenario is updated based on the selected action.
- the reward is the sum of the safety-violation-based reward and the coverage-based reward;
- the safety-violation-based reward is the degree to which the safety violation probability of the vehicle corresponding to the ADS is close to the preset index after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information, and the coverage-based reward is the degree to which the coverage of the reference automatic driving scenario determined based on the neural network behavior information in the feedback information is close to the preset index;
- the state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scenario.
- the neural network model of the reinforcement learning agent includes a value network and a strategy network; the value network is used to calculate the value of a set action in a set state, and the strategy network is used to obtain the action probability distribution in the set state;
- a time difference error is determined, wherein the time difference error is the difference between the value predicted by the value network and the actual value;
- an embodiment of the present application further provides a system 500 for generating an automatic driving scenario, including:
- at least one processor 501; and a memory 502 and a communication interface 503 communicatively connected to the at least one processor 501;
- the at least one processor 501 causes the automatic driving scenario generation system 500 to execute the method shown in FIG. 3 by executing the instructions stored in the memory 502 .
- the memory 502 is located outside the automatic driving scenario generation system 500 .
- the automatic driving scenario generation system 500 includes the memory 502, the memory 502 is connected to the at least one processor 501, and the memory 502 stores instructions that can be executed by the at least one processor 501.
- FIG. 5 shows, with dashed lines, that memory 502 is optional to system 500 for generating autonomous driving scenarios.
- the processor 501 and the memory 502 may be coupled through an interface circuit, or may be integrated together, which is not limited here.
- the specific connection medium between the processor 501 , the memory 502 , and the communication interface 503 is not limited in the embodiments of the present application.
- In FIG. 5, the processor 501, the memory 502, and the communication interface 503 are connected through a bus 504; the bus is represented by a thick line in FIG. 5, and the way other components are connected is only schematically illustrated and is not limited thereto.
- the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in FIG. 5, but it does not mean that there is only one bus or one type of bus.
- the processor mentioned in the embodiments of the present application may be implemented by hardware or software.
- the processor When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like.
- the processor When implemented in software, the processor may be a general-purpose processor implemented by reading software codes stored in memory.
- the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- CPU central processing unit
- DSP digital signal processor
- ASIC application specific integrated circuit
- FPGA field programmable gate array
- a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
- the memory mentioned in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
- the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory.
- Volatile memory may be random access memory (RAM), which acts as an external cache.
- RAM random access memory
- SRAM static random access memory
- DRAM dynamic random access memory
- SDRAM synchronous dynamic random access memory
- DDR SDRAM double data rate synchronous dynamic random access memory
- ESDRAM enhanced synchronous dynamic random access memory
- SLDRAM synchlink dynamic random access memory
- DR RAM direct rambus random access memory
- It should be noted that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) is integrated in the processor.
- The memory described herein is intended to include, but not be limited to, these and any other suitable types of memory.
- an embodiment of the present application further provides a computer storage medium, including computer instructions, when the computer instructions are executed on a computer, the method shown in FIG. 3 is executed.
- an embodiment of the present application further provides a chip, which is coupled to a memory and used to read and execute program instructions stored in the memory, so that the method shown in FIG. 3 is executed.
- an embodiment of the present application also provides a computer program product, which enables the method shown in FIG. 3 to be executed when the computer program product runs on a computer.
- the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Landscapes
- Engineering & Computer Science (AREA)
- Aviation & Aerospace Engineering (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Traffic Control Systems (AREA)
Abstract
An automated driving scenario generation method, apparatus (400), and system (500), which can solve the prior-art problems of a long test time, large costs and overhead, and difficulty in covering all driving scenarios, caused by the limited test driving scenarios and by the test personnel not being as numerous as real users. The method comprises: acquiring feedback information outputted when an automated driving system (ADS) (200) is tested in a reference automated driving scenario (S301); obtaining safety violation parameters and coverage parameters on the basis of the feedback information (S302); and if the safety violation parameters or the coverage parameters do not meet a preset index, obtaining an updated reference automated driving scenario by updating the reference automated driving scenario on the basis of the feedback information, until the safety violation parameters and the coverage parameters obtained on the basis of the feedback information outputted when the ADS (200) is tested in the updated reference automated driving scenario meet the preset index (S303).
Description
The present application relates to the technical field of automatic driving simulation, and in particular, to a method, apparatus and system for generating an automatic driving scenario.
With the rapid development of science and technology, vehicle automatic driving technology is becoming increasingly widespread. Since the safety of the vehicle is the prerequisite for automatic driving, to realize true automatic driving it is necessary to determine the safety problems that may occur while the vehicle drives automatically and to evaluate the safety of the vehicle. Safety problems include information security problems (security) and functional safety problems (safety): security mainly refers to problems caused by malicious human attacks, such as property loss, privacy leakage, system takeover and system failure, whereas safety mainly refers to problems caused by non-human internal system faults. Information security problems caused by malicious human attacks can also trigger functional safety problems such as the system being taken over or failing.
At present, the safety of conventional vehicles is evaluated through certification, for example, by certifying whether a vehicle meets the safety requirements of the ISO-26262 standard and the ISO/PAS 21448 SOTIF standard. However, intelligent vehicles have the following two problems: (1) it is difficult to manually check the completeness of the specification, because the automatic driving system of the vehicle is mainly a machine learning system whose perception, prediction, planning and control logic is obtained through training and learning, so the requirement specification is vague; and (2) it is difficult to manually check the correctness of the implementation, because the neural network of a machine learning system is composed of an input layer, an output layer, multiple hidden layers and numerous neurons, which pass high-dimensional vectors through linear or nonlinear activation functions and are logically difficult to interpret. Therefore, the safety of intelligent vehicles currently cannot be evaluated through certification.
For intelligent vehicles to solve safety problems and gain the trust of stakeholders such as users, regulators, insurers and government departments, the only way is to improve the intelligence level and safety of the vehicle through sufficient exercise, that is, to perform as much training and testing as possible. There are usually two ways to exercise the vehicle: (1) exercise through real user driving, that is, sell the vehicle first and improve its intelligence level and safety by solving the problems users discover during use; this approach clearly conflicts with stakeholder trust, because putting the vehicle on the road before its safety has been established conflicts with the personal safety of users, the risk assessment of insurers, and the regulatory responsibilities of regulators and government departments; and (2) exercise through experimental road testing, that is, have test engineers rather than real users test various driving scenarios in a specific test area; however, because the driving scenarios that can be tested are limited and there are not as many testers as real users, the time required is very long, the cost and overhead are large, and it is difficult to cover all driving scenarios.
It can be seen that when a vehicle is exercised through experimental road testing to improve its intelligence level and safety, the limited driving scenarios that can be tested and the fact that there are not as many testers as real users mean that the time required is very long, the cost and overhead are large, and it is difficult to cover all driving scenarios.
SUMMARY OF THE INVENTION
The present application provides a method, apparatus and system for generating an automatic driving scenario, to solve the problems in the prior art that, because the driving scenarios that can be tested are limited and there are not as many testers as real users, the test time is long, the cost and overhead are large, and it is difficult to cover all driving scenarios.
In a first aspect, an embodiment of the present application provides a method for generating an automatic driving scenario, including:
acquiring feedback information output by an automatic driving system (ADS) when the ADS is tested in a reference automatic driving scenario, where the feedback information includes a vehicle control instruction and neural network behavior information;
obtaining a safety violation parameter and a coverage parameter based on the feedback information, where the safety violation parameter is used to indicate the probability of a safety violation when the vehicle corresponding to the ADS drives according to the vehicle control instruction in the reference automatic driving scenario, and the coverage parameter is used to indicate the activated neurons and/or the layer-wise relevance in the neural network corresponding to the ADS when the test is performed in the reference automatic driving scenario; and
if the safety violation parameter or the coverage parameter does not meet a preset index, updating the reference automatic driving scenario based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained based on the feedback information output when the ADS is tested in the updated reference automatic driving scenario meet the preset index.
Based on the above technical solution, the feedback information output when the ADS is tested in the reference automatic driving scenario is acquired, and the safety violation parameter and the coverage parameter are obtained based on the feedback information. If the safety violation parameter or the coverage parameter does not meet the preset index, the reference automatic driving scenario is updated based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained based on the feedback information output when the ADS is tested in the updated reference automatic driving scenario meet the preset index. The driving scenario is thereby guided to be updated in directions that easily lead to safety violations and that have not been tested before, that is, the driving scenarios most in need of testing, which easily lead to safety violations and have high coverage, are generated. This not only improves the efficiency of testing driving scenarios, but also covers more driving scenarios. In addition, the closed-loop-based automated testing can further reduce the time overhead and labor overhead.
In a possible design, before the acquiring of the feedback information output by the automatic driving system ADS when the ADS is tested in the reference automatic driving scenario, the method further includes:
acquiring an initial automatic driving scenario of the ADS; and
analyzing the road types in the initial automatic driving scenario and the probability of a safety violation when the vehicle corresponding to the ADS drives in the initial automatic driving scenario, and creating the reference automatic driving scenario based on the analysis result, where the reference automatic driving scenario includes any one or more of a typical automatic driving scenario, a missing automatic driving scenario, and a violation-prone automatic driving scenario.
In a possible design, the safety violation parameter includes any one or more of: the probability that the parallel distance between the vehicle corresponding to the ADS and another vehicle or a pedestrian is less than a safe distance, the probability that the vertical distance between the vehicle corresponding to the ADS and another vehicle or a pedestrian is less than a safe distance, the probability that the vehicle corresponding to the ADS violates a traffic light indication, the probability that the vehicle corresponding to the ADS violates a traffic sign indication, the probability that the vehicle corresponding to the ADS violates a traffic police command, and the probability that the vehicle corresponding to the ADS exceeds the speed limit; and
the coverage parameter includes any one or more of the number of activated neurons in the neural network corresponding to the ADS and the weight of the influence of an input subset on the prediction result in the heat map of the neural network corresponding to the ADS.
In a possible design, the updating of the reference automatic driving scenario based on the feedback information includes:
acquiring an action selected, based on the feedback information, by a reinforcement learning agent from the action space of the reference automatic driving scenario, where an action in the action space is a discrete or continuous update of road topology, road degradation, dynamic time, dynamic weather, dynamic traffic or landscape information in the automatic driving scenario; and
updating the reference driving scenario based on the selected action.
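As a purely illustrative sketch of such an action space (not part of the claimed method), the following Python fragment enumerates a few hypothetical discrete and continuous scenario updates; the names ScenarioAction and apply_action, the element keys, and the value ranges are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical discrete choices for a few scenario elements (labels are illustrative).
ROAD_TOPOLOGIES = ["straight", "t_junction", "overpass", "winding_mountain_road"]
WEATHER_STATES = ["clear", "rain", "fog", "snow"]


@dataclass
class ScenarioAction:
    """One update drawn from the action space of the reference driving scenario."""
    kind: str                 # e.g. "road_topology", "road_degradation", "time",
                              # "weather", "traffic_density", "landscape"
    value: Union[str, float]  # discrete label or continuous amount


def apply_action(scenario: dict, action: ScenarioAction) -> dict:
    """Return a copy of the scenario description with one element updated."""
    updated = dict(scenario)
    if action.kind == "road_topology":
        updated["road_topology"] = action.value              # discrete update
    elif action.kind == "road_degradation":
        updated["road_degradation"] = float(action.value)    # continuous update in [0, 1]
    elif action.kind == "time":
        updated["time_of_day_h"] = float(action.value) % 24.0
    elif action.kind == "weather":
        updated["weather"] = action.value
    elif action.kind == "traffic_density":
        updated["traffic_density"] = float(action.value)
    elif action.kind == "landscape":
        updated["landscape"] = action.value
    return updated
```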
In a possible design, the reinforcement learning agent is an agent that determines a reward and the current state of the vehicle corresponding to the ADS based on the feedback information, and that selects an action from the action space based on the reward and the current state of the vehicle corresponding to the ADS;
where the reward is the sum of a safety-violation-based reward and a coverage-based reward, the safety-violation-based reward is how closely the safety violation probability of the vehicle corresponding to the ADS approaches the preset index after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information, the coverage-based reward is how closely the coverage of the reference automatic driving scenario, determined based on the neural network behavior information in the feedback information, approaches the preset index, and the state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scenario.
Based on the above technical solution, the action selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information is first acquired, where the reinforcement learning agent determines the reward and the current state of the vehicle corresponding to the ADS based on the feedback information and selects an action from the action space based on the reward and the current state of the vehicle corresponding to the ADS; the reference driving scenario is then updated based on the selected action. After the reinforcement learning agent performs reinforcement learning in a 'trial and error' manner, the reward determined from the vehicle control instruction and the neural network behavior information in the feedback information guides the behavior, and an action is selected from the action space of the reference automatic driving scenario with the goal of maximizing the reward obtained by the reinforcement learning agent. The driving scenario is thereby guided to be updated in directions that easily lead to safety violations and that have not been tested before, that is, the driving scenarios most in need of testing, which easily lead to safety violations and have high coverage, are generated. This not only improves the efficiency of testing driving scenarios, but also covers more driving scenarios.
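The reward described above is the sum of two closeness terms. A minimal sketch, assuming a simple clipped-ratio measure of closeness (the measure itself and the function names are assumptions), is:

```python
def closeness(value: float, target: float) -> float:
    """How close a measured value is to a preset target, clipped to [0, 1]; an assumed measure."""
    if target <= 0.0:
        return 1.0 if value >= target else 0.0
    return max(0.0, min(1.0, value / target))


def scenario_reward(violation_prob: float, coverage: float,
                    violation_target: float, coverage_target: float) -> float:
    """Reward = safety-violation-based reward + coverage-based reward."""
    safety_reward = closeness(violation_prob, violation_target)  # closeness to the preset violation index
    coverage_reward = closeness(coverage, coverage_target)       # closeness to the preset coverage index
    return safety_reward + coverage_reward
```

For example, scenario_reward(0.3, 0.6, violation_target=0.5, coverage_target=0.8) evaluates to 0.6 + 0.75 = 1.35.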
In a possible design, the neural network model of the reinforcement learning agent includes a value network and a policy network, where the value network is used to calculate the value of a set action in a set state, and the policy network is used to obtain the action probability distribution in a set state; before the acquiring of the action selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information, the method further includes:
acquiring a first state of the vehicle currently corresponding to the ADS, and a first action selected by the reinforcement learning agent from the action space based on the policy network;
updating the reference automatic driving scenario based on the first action, and acquiring feedback information output by the ADS when the ADS is tested in the updated reference automatic driving scenario;
acquiring a reward of the first action determined by the reinforcement learning agent based on the feedback information, a second state of the vehicle currently corresponding to the ADS, and a second action selected from the action space based on the policy network;
calculating, based on the value network, a first value of the first action in the first state and a second value of the second action in the second state;
determining a temporal difference error based on the first value, the second value and the reward of the first action, where the temporal difference error is the difference between the value predicted by the value network and the actual value; and
acquiring gradients of the value network and the policy network, updating the parameters of the value network based on the temporal difference error and the gradient of the value network, and updating the parameters of the policy network based on the temporal difference error and the gradient of the policy network.
Based on the above technical solution, the value network and the policy network are set as the neural network model of the reinforcement learning agent, that is, these two neural networks respectively approximate the value function and the policy function of the reinforcement learning agent, where the value function refers to the rule by which the agent in reinforcement learning uses the reward provided by the environment to evaluate how good an action and a state are, and the policy function refers to the rule by which the agent chooses behaviors in reinforcement learning. The reinforcement learning agent is trained so that it can strategically update the reference automatic driving scenario based on the vehicle control instruction and the neural network behavior information in the feedback information, thereby guiding the automatic driving scenario to be updated in directions that easily lead to safety violations and that have not been tested before, that is, generating the driving scenarios most in need of testing, which easily lead to safety violations and have high coverage. This not only improves the efficiency of testing driving scenarios, but also covers more driving scenarios.
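The training steps described above amount to a one-step actor-critic update driven by a temporal difference error. The PyTorch sketch below is one hypothetical realization under stated assumptions: the state and action dimensions, the network sizes, the discount factor and the optimizers are illustrative, the action is encoded as a one-hot vector for the value network, and the simulator/ADS interaction that produces s1, a1, r1, s2 and a2 is assumed to happen elsewhere.

```python
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, GAMMA = 8, 6, 0.99  # illustrative sizes and discount factor

# Policy network: action probability distribution for a given state.
policy_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS))
# Value network: value of a given action in a given state (action passed as a one-hot vector).
value_net = nn.Sequential(nn.Linear(STATE_DIM + NUM_ACTIONS, 64), nn.ReLU(), nn.Linear(64, 1))
policy_opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
value_opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)


def action_value(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Value of the given action in the given state, computed by the value network."""
    one_hot = torch.nn.functional.one_hot(action, NUM_ACTIONS).float()
    return value_net(torch.cat([state, one_hot], dim=-1)).squeeze(-1)


def td_update(s1, a1, r1, s2, a2):
    """One training step: compute the temporal difference error, then update both networks."""
    log_prob = torch.distributions.Categorical(logits=policy_net(s1)).log_prob(a1)
    q1 = action_value(s1, a1)                 # first value: first action in the first state
    q2 = action_value(s2, a2)                 # second value: second action in the second state
    td_error = r1 + GAMMA * q2.detach() - q1  # bootstrapped target minus predicted value

    value_opt.zero_grad()
    (td_error ** 2).mean().backward()         # update the value network parameters
    value_opt.step()

    policy_opt.zero_grad()
    (-log_prob * td_error.detach()).mean().backward()  # update the policy network parameters
    policy_opt.step()
```

Here s1 and s2 would be the vehicle states read back from the simulator, a1 and a2 the scenario-update actions sampled from the policy network, and r1 the reward computed from the feedback information, as described above.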
In a second aspect, the present application further provides an apparatus for generating an automatic driving scenario, where the apparatus has the function of implementing the method in the first aspect or any possible design of the first aspect. The function may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, for example, a first acquiring module, a second acquiring module, and an updating module.
The first acquiring module is configured to acquire feedback information output by the automatic driving system ADS when the ADS is tested in a reference automatic driving scenario, where the feedback information includes a vehicle control instruction and neural network behavior information.
The second acquiring module is configured to obtain a safety violation parameter and a coverage parameter based on the feedback information, where the safety violation parameter is used to indicate the probability of a safety violation when the vehicle corresponding to the ADS drives according to the vehicle control instruction in the reference automatic driving scenario, and the coverage parameter is used to indicate the activated neurons and/or the layer-wise relevance in the neural network corresponding to the ADS when the test is performed in the reference automatic driving scenario.
The updating module is configured to: if the safety violation parameter or the coverage parameter does not meet a preset index, update the reference automatic driving scenario based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained based on the feedback information output when the ADS is tested in the updated reference automatic driving scenario meet the preset index.
In a possible design, the apparatus further includes a creating module, configured to:
acquire an initial automatic driving scenario of the ADS; and
analyze the road types in the initial automatic driving scenario and the probability of a safety violation when the vehicle corresponding to the ADS drives in the initial automatic driving scenario, and create the reference automatic driving scenario based on the analysis result, where the reference automatic driving scenario includes any one or more of a typical automatic driving scenario, a missing automatic driving scenario, and a violation-prone automatic driving scenario.
In a possible design, the safety violation parameter includes any one or more of: the probability that the parallel distance between the vehicle corresponding to the ADS and another vehicle or a pedestrian is less than a safe distance, the probability that the vertical distance between the vehicle corresponding to the ADS and another vehicle or a pedestrian is less than a safe distance, the probability that the vehicle corresponding to the ADS violates a traffic light indication, the probability that the vehicle corresponding to the ADS violates a traffic sign indication, the probability that the vehicle corresponding to the ADS violates a traffic police command, and the probability that the vehicle corresponding to the ADS exceeds the speed limit; and
the coverage parameter includes any one or more of the number of activated neurons in the neural network corresponding to the ADS and the weight of the influence of an input subset on the prediction result in the heat map of the neural network corresponding to the ADS.
In a possible design, the updating module is specifically configured to:
acquire an action selected, based on the feedback information, by a reinforcement learning agent from the action space of the reference automatic driving scenario, where an action in the action space is a discrete or continuous update of road topology, road degradation, dynamic time, dynamic weather, dynamic traffic or landscape information in the automatic driving scenario; and
update the reference driving scenario based on the selected action.
In a possible design, the reinforcement learning agent is an agent that determines a reward and the current state of the vehicle corresponding to the ADS based on the feedback information, and that selects an action from the action space based on the reward and the current state of the vehicle corresponding to the ADS;
where the reward is the sum of a safety-violation-based reward and a coverage-based reward, the safety-violation-based reward is how closely the safety violation probability of the vehicle corresponding to the ADS approaches the preset index after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information, the coverage-based reward is how closely the coverage of the reference automatic driving scenario, determined based on the neural network behavior information in the feedback information, approaches the preset index, and the state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scenario.
In a possible design, the neural network model of the reinforcement learning agent includes a value network and a policy network, where the value network is used to calculate the value of a set action in a set state, and the policy network is used to obtain the action probability distribution in a set state; the apparatus further includes a training module, configured to:
acquire a first state of the vehicle currently corresponding to the ADS, and a first action selected by the reinforcement learning agent from the action space based on the policy network;
update the reference automatic driving scenario based on the first action, and acquire feedback information output by the ADS when the ADS is tested in the updated reference automatic driving scenario;
acquire a reward of the first action determined by the reinforcement learning agent based on the feedback information, a second state of the vehicle currently corresponding to the ADS, and a second action selected from the action space based on the policy network;
calculate, based on the value network, a first value of the first action in the first state and a second value of the second action in the second state;
determine a temporal difference error based on the first value, the second value and the reward of the first action, where the temporal difference error is the difference between the value predicted by the value network and the actual value; and
acquire gradients of the value network and the policy network, update the parameters of the value network based on the temporal difference error and the gradient of the value network, and update the parameters of the policy network based on the temporal difference error and the gradient of the policy network.
In a third aspect, the present application further provides a system for generating an automatic driving scenario, where the system may include: at least one processor; and a memory and a communication interface communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the at least one processor performs the function of the method in the first aspect or any possible design of the first aspect by executing the instructions stored in the memory.
In a fourth aspect, the present application further provides a computer storage medium, where the computer storage medium includes computer instructions which, when run on a computer, cause the computer to perform the method in the first aspect or any possible design of the first aspect.
In a fifth aspect, the present application further provides a computer program product which, when run on a computer, causes the computer to perform the method in the first aspect or any possible design of the first aspect.
FIG. 1 is a schematic diagram of a training process of reinforcement learning according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a system for generating an automatic driving scenario according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a method for generating an automatic driving scenario according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for generating an automatic driving scenario according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another system for generating an automatic driving scenario according to an embodiment of the present application.
To make the objectives, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application.
To facilitate understanding of the embodiments of the present application, the technical terms involved in the embodiments of the present application are explained first.
1. Automatic driving system (ADS)
An automatic driving system is a system for controlling a vehicle, and the vehicle can drive automatically under the control of the automatic driving system. The automatic driving system may include a collection apparatus, two main processing apparatuses, an auxiliary processing apparatus, and a vehicle control apparatus. The collection apparatus is configured to collect initial environment information of the vehicle and send the initial environment information to the two main processing apparatuses. Each main processing apparatus is configured to process the received initial environment information to obtain target environment information, then generate a vehicle control instruction according to the target environment information, and send the vehicle control instruction to the auxiliary processing apparatus. The auxiliary processing apparatus is configured to send the vehicle control instruction sent by one of the main processing apparatuses to the vehicle control apparatus, so that the vehicle control apparatus controls the vehicle (for example, to move forward, reverse or turn) according to the received vehicle control instruction. When one of the main processing apparatuses fails, the auxiliary processing apparatus may send the vehicle control instruction sent by the other main processing apparatus to the vehicle control apparatus.
When the automatic driving system is used to train and test the automatic driving of a vehicle, the automatic driving system of the vehicle is mainly a machine learning system composed of a neural network, where the neural network includes an input layer, an output layer, multiple hidden layers and numerous neurons. The more neurons are activated and the higher the layer-wise relevance, the higher the coverage of driving scenarios during automatic driving training and testing of the vehicle, that is, the more driving scenarios can be covered when the vehicle is trained and tested for automatic driving.
2. Reinforcement learning agent
An agent is a very important concept in the field of artificial intelligence; any independent entity that can think and interact with its environment can be abstracted as an agent. For example, an agent may be a computer system, or part of a computer system, in a specific environment. Based on its own perception of the environment, an agent can follow existing instructions or learn autonomously, communicate and cooperate with other agents, and autonomously accomplish set goals in the environment in which it is located. An agent may be software, or an entity combining software and hardware.
Reinforcement learning (RL), also called evaluative learning, is one of the paradigms and methodologies of machine learning. It is used to describe and solve the problem of an agent learning a policy during its interaction with the environment so as to maximize its return or achieve a specific goal. That is, in reinforcement learning the agent learns in a 'trial and error' manner, and the reward obtained by interacting with the environment through actions guides the behavior, the goal being for the agent to obtain the maximum reward. Reinforcement learning does not require a training data set. In reinforcement learning, the reinforcement signal (that is, the reward) provided by the environment evaluates how good an action is, rather than telling the reinforcement learning system how to produce the correct action. Since the external environment provides little information, the agent must learn from its own experience; in this way, the agent acquires knowledge in an action-evaluation (that is, reward) environment and improves its action plan to adapt to the environment.
For example, FIG. 1 is a schematic diagram of a training process of reinforcement learning according to an embodiment of the present application. As shown in FIG. 1, reinforcement learning mainly involves an agent and an environment, together with states, actions and rewards. The input of the agent is a state and its output is an action. The policy function refers to the rule that the agent uses to choose behaviors in reinforcement learning; for example, during learning, an action can be output according to the state, and this action is used to explore the environment so as to update the state. The update of the policy function depends on the policy gradient (PG), and the policy function is usually a neural network. In the current technology, the training process of reinforcement learning is as follows: the agent interacts with the environment multiple times to obtain the action, state and reward of each interaction; multiple groups of actions, states and rewards are used as training data to train the agent once; and the above training process is used for the next round of training of the agent, until a convergence condition is met.
The process of obtaining the action, state and reward of one interaction is shown in FIG. 1. The current state s(t) of the environment is input to the agent to obtain the action a(t) output by the agent, and the reward r(t) of this interaction is calculated according to the relevant performance indicators of the environment under the effect of the action a(t); at this point, the action a(t) and the reward r(t) of this interaction have been obtained. The action a(t) and the reward r(t) of this interaction are recorded for later use in training the agent, and the next state s(t+1) of the environment under the effect of the action a(t) is also recorded, so that the next interaction between the agent and the environment can be carried out.
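Written as code, the interaction loop described above might look like the following minimal sketch; select_action and environment_step are placeholders for the agent and the environment in FIG. 1, and the record format is an assumption.

```python
def run_episode(initial_state, select_action, environment_step, num_steps):
    """Collect (state, action, reward) tuples from agent-environment interaction."""
    trajectory = []
    s = initial_state
    for _ in range(num_steps):
        a = select_action(s)                # agent outputs action a(t) for state s(t)
        s_next, r = environment_step(s, a)  # environment yields reward r(t) and next state s(t+1)
        trajectory.append((s, a, r))        # recorded for later training of the agent
        s = s_next                          # the next interaction starts from s(t+1)
    return trajectory
```

The recorded trajectory can then be used as one batch of training data for the agent, as in the training process described above.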
3. Simulator
A simulator can simulate driving scenarios by configuring various parameters, for example, parameters such as the road network, traffic, pedestrians, landscape and weather. The main modules in the simulator include the camera image, radar image and lidar image modules, a dynamics model, vehicle position update, and inertial navigation (that is, a global positioning system and inertial sensors, GPS+IMU). The first three capture images of the simulated driving scenario, and the last three are mainly used to dynamically update the position of the vehicle in the driving scenario.
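As an illustration only, the parameters mentioned above could be grouped into a single configuration object as sketched below; the field names, default values and the render_scenario stub are hypothetical and do not refer to any particular simulator's API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScenarioConfig:
    """Hypothetical bundle of the simulator parameters mentioned in the text."""
    road_network: str = "t_junction"
    traffic_density: float = 0.3          # assumed scale: fraction of lanes occupied
    pedestrians: int = 5
    landscape: str = "urban"
    weather: str = "rain"
    time_of_day_h: float = 18.5
    ego_position_m: List[float] = field(default_factory=lambda: [0.0, 0.0])
    ego_sensors: List[str] = field(default_factory=lambda: ["camera", "radar", "lidar"])


def render_scenario(cfg: ScenarioConfig) -> dict:
    """Stand-in for one simulator step: sensor frames plus the updated ego pose."""
    return {
        "camera_image": None,            # would come from the camera image module
        "radar_image": None,             # radar image module
        "lidar_image": None,             # lidar image module
        "ego_pose": cfg.ego_position_m,  # updated via the dynamics model and GPS+IMU
    }
```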
At present, a simulator-based simulation environment is used to generate multiple driving scenarios and to test the generated driving scenarios, usually in one of the following two ways:
(1) Brute-force learning
Based on the simulation environment of the simulator, as many driving scenarios as possible are randomly generated in a brute-force manner for testing and learning. The steps are as follows: randomly or exhaustively select real driving scenarios, model the selected real driving scenarios (for example, road topology, traffic state and weather), and configure the models in the simulator; automatically or manually update information such as weather and landscape in the simulator so as to generate simulated driving scenarios in the simulator; set the ego-vehicle information in the simulated driving scenario, for example, the position of the ego vehicle and the types and positions of its sensors; and use the ADS to test and verify the driving scenarios generated by the simulator. Because the driving scenarios of this method are imported by brute force or generated randomly, the method is blind: a large number of driving scenarios are inevitably repeated, and it is difficult to traverse all driving scenarios, so the efficiency is low and the coverage of the tested driving scenarios cannot be effectively improved.
(2) Formal methods
Based on the simulation environment of the simulator, safety-related specific driving scenarios are constructed through formal reasoning for testing and learning. The steps of this method are as follows: construct specific driving scenarios through reasoning, model the specific driving scenarios (for example, road topology, traffic state and weather), and configure the models in the simulator; automatically or manually update information such as weather and landscape in the simulator so as to generate simulated driving scenarios in the simulator; set the ego-vehicle information in the simulated driving scenario, for example, the position of the ego vehicle and the types and positions of its sensors; and use the ADS to test and verify the driving scenarios generated by the simulator. Because the driving scenarios of this method rely mainly on manual reasoning and specific driving scenarios are constructed manually, the construction efficiency is low, the scenarios are generally simple, it is difficult to construct diverse driving scenarios, and no safety guarantee can be provided.
It can be seen that both of the above methods have certain shortcomings. Driving scenarios generated by brute-force learning are random, the generated driving scenarios contain a large number of repeated scenarios, and it is difficult to traverse all scenarios when testing the generated driving scenarios, so the time and labor costs are huge, the efficiency is low, and the coverage of the tested driving scenarios cannot be effectively improved. Formal methods basically generate driving scenarios manually, so the driving scenarios are usually relatively simple, the construction efficiency is low, it is difficult to construct diverse driving scenarios, and no safety guarantee can be provided. In other words, both methods test driving scenarios with low efficiency, both have difficulty providing diverse driving scenarios, and neither can quickly improve the coverage of the tested driving scenarios.
In view of this, an embodiment of the present application provides a method for generating an automatic driving scenario, in which the driving scenario is strategically updated through reinforcement learning, thereby increasing the diversity of the driving scenarios while guiding the driving scenario to be updated in directions that easily lead to safety violations and that have not been tested before, which improves the efficiency of testing driving scenarios. In addition, closed-loop-based automated testing can further reduce the time overhead and labor overhead.
It should be understood that the embodiments of the present application may be applied to a system for generating an automatic driving scenario. For example, FIG. 2 is a schematic structural diagram of a system for generating an automatic driving scenario according to an embodiment of the present application. As shown in FIG. 2, the system for generating an automatic driving scenario includes a vehicle 100, an ADS 200, a reinforcement learning agent 300 and a simulator 400, where the ADS 200 may be disposed on the vehicle 100, and a closed loop is formed among the ADS 200, the reinforcement learning agent 300 and the simulator 400. The simulator 400 is configured to configure various parameters to simulate an automatic driving scenario of the vehicle 100. The ADS 200 is configured to test the simulated automatic driving scenario of the vehicle 100. The reinforcement learning agent 300 is configured to treat the output of the ADS 200 as the environment and update the automatic driving scenario of the vehicle 100 according to the reward from the environment.
The ADS 200 may include a collection apparatus, a main processing apparatus, an auxiliary processing apparatus and a vehicle control apparatus that are connected in sequence. Optionally, the collection apparatus may include at least one of a variety of sensors such as a camera, a radar, a gyroscope and an accelerometer. The processing capability of the main processing apparatus may be stronger than that of the auxiliary processing apparatus, and the main processing apparatus may be an apparatus integrating an image processing function, a scalar computing function, a vector computing function and a matrix computing function. The auxiliary processing apparatus may be a microcontroller unit (MCU). Optionally, the ADS may further include a target radar, and the target radar is connected to the auxiliary processing apparatus. It should be noted that the radars in the embodiments of the present invention may all be lidars (light detection and ranging, Lidar); optionally, the radars in the embodiments of the present invention may also be other types of radars, such as millimeter-wave radars or ultrasonic radars, which is not limited in the embodiments of the present invention.
The system for generating an automatic driving scenario provided by the embodiments of the present application has been described above. The method for generating an automatic driving scenario provided by the embodiments of the present application is described next with reference to the accompanying drawings.
It should be understood that the terms 'first' and 'second' in the embodiments of the present application are used for description purposes only, and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature defined with 'first' or 'second' may explicitly or implicitly include one or more of the features. 'At least one' means one or more, and 'multiple' means two or more. 'And/or' describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate the following cases: A exists alone, both A and B exist, and B exists alone, where A and B may be singular or plural. The character '/' generally indicates an 'or' relationship between the associated objects. 'At least one of the following items' or a similar expression refers to any combination of these items, including any combination of a single item or multiple items; for example, at least one of a, b or c may represent: a, b, c, a and b, a and c, b and c, or a and b and c.
FIG. 3 is a schematic flowchart of a method for generating an automatic driving scenario according to an embodiment of the present application. The method may be applied to the system for generating an automatic driving scenario shown in FIG. 2, or to a system with a functional structure similar to that in FIG. 2. As shown in FIG. 3, the specific flow of the method for generating an automatic driving scenario is described as follows.
S301. Acquire feedback information output by the automatic driving system ADS when the ADS is tested in a reference automatic driving scenario.
In some embodiments, before acquiring the feedback information output by the ADS when the ADS is tested in the reference automatic driving scenario, the system for generating an automatic driving scenario needs to acquire an initial automatic driving scenario of the ADS, where the initial automatic driving scenario may include any one or more of roads, time, weather, vehicles, pedestrians, traffic lights, traffic signs, traffic police and landscape. Distribution analysis is then performed on the acquired initial automatic driving scenario to determine the road types in the initial automatic driving scenario, for example, straight roads, T-junctions, overpasses and winding mountain roads, and the probability of a safety violation when the vehicle corresponding to the ADS drives in the initial automatic driving scenario, for example, the probability that the vehicle has a safety accident or commits a traffic violation. Finally, the reference automatic driving scenario is created based on the result of the distribution analysis, where the reference automatic driving scenario includes any one or more of a typical automatic driving scenario, a missing automatic driving scenario and a violation-prone automatic driving scenario, which is not limited in this embodiment of the present application.
It should be noted that, in this embodiment of the present application, the distribution analysis is off-line, that is, the distribution analysis of the initial automatic driving scenario is not performed online in the simulator of the system for generating an automatic driving scenario; instead, off-line distribution analysis is used to determine the road types in the initial automatic driving scenario and the probability of a safety violation when the vehicle corresponding to the ADS drives in the initial automatic driving scenario, thereby reducing the difficulty of the analysis.
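A minimal sketch of such an off-line distribution analysis, assuming the initial scenarios are available as logged records with a road type and a violation flag (the record format is an assumption), is:

```python
from collections import Counter


def analyze_scenarios(records):
    """records: iterable of dicts such as {"road_type": "t_junction", "violation": True}."""
    road_counts = Counter(r["road_type"] for r in records)
    violation_rate = {}
    for road in road_counts:
        subset = [r for r in records if r["road_type"] == road]
        violation_rate[road] = sum(1 for r in subset if r["violation"]) / len(subset)
    return road_counts, violation_rate
```

Road types that appear rarely or that show a high violation rate could then seed the missing and violation-prone reference scenarios, respectively.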
In some other embodiments, after the reference automatic driving scenario is created, the system for generating an automatic driving scenario may simulate the reference automatic driving scenario by configuring the parameters of the simulator in the system, and acquire the feedback information output by the ADS in the system when the ADS is tested in the simulated reference automatic driving scenario, where the feedback information output by the ADS includes the vehicle control instruction of the ADS in the reference automatic driving scenario and the neural network behavior information of the ADS in the reference automatic driving scenario.
It should be noted that, in this embodiment of the present application, the vehicle control instruction may be a steering signal, a speed signal or a body control signal, which is not limited in this embodiment of the present application. For example, when the ADS is tested in the simulated reference automatic driving scenario, the collection apparatus of the ADS collects initial environment information of the vehicle in the reference automatic driving environment and sends the initial environment information to the two main processing apparatuses; the main processing apparatuses process the received initial environment information to obtain target environment information, and then generate a vehicle control instruction according to the target environment information, so that the vehicle control apparatus controls the vehicle, for example, to move forward, reverse or turn, according to the received vehicle control instruction.
It should be noted that, in this embodiment of the present application, the neural network of the ADS may be a deep neural network (DNN), a convolutional neural network (CNN) or a recurrent neural network (RNN), which is not limited in this embodiment of the present application.
S302. Obtain a safety violation parameter and a coverage parameter based on the feedback information.
In some embodiments, after acquiring the feedback information output by the ADS when the ADS is tested in the reference automatic driving scenario, the system for generating an automatic driving scenario may determine the safety violation parameter and the coverage parameter based on the vehicle control instruction and the neural network behavior information in the feedback information, respectively.
It should be noted that, in this embodiment of the present application, the safety violation parameter is used to indicate the probability of a safety violation when the vehicle corresponding to the ADS drives according to the vehicle control instruction in the reference automatic driving scenario, for example, the probability that the parallel distance between the vehicle corresponding to the ADS and another vehicle or a pedestrian is less than a safe distance, the probability that the vertical distance between the vehicle corresponding to the ADS and another vehicle or a pedestrian is less than a safe distance, the probability that the vehicle corresponding to the ADS violates a traffic light indication, the probability that the vehicle corresponding to the ADS violates a traffic sign indication, the probability that the vehicle corresponding to the ADS violates a traffic police command, or the probability that the vehicle corresponding to the ADS exceeds the speed limit, which is not limited in this embodiment of the present application.
For example, if the vehicle control instruction controls the vehicle corresponding to the ADS to move forward 50 meters or backward 100 meters, then for the vehicle corresponding to the ADS in the reference automatic driving scenario it can be determined, according to the vehicle control instruction, the probability that its parallel distance to another vehicle or pedestrian (or a road shoulder, an obstacle, or the like) is less than the safe distance, the probability that its vertical distance to another vehicle or pedestrian (or a road shoulder, an obstacle, or the like) is less than the safe distance, the probability of violating a traffic light indication, the probability of violating a traffic sign indication, the probability of violating a traffic police command, and the probability of the vehicle exceeding the speed limit.
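One simple, purely illustrative way to estimate such a probability from simulated rollouts is sketched below; the per-step distance field and the safe-distance threshold are assumptions.

```python
def parallel_distance_violation_prob(rollouts, safe_distance_m=2.0):
    """Fraction of simulated steps in which the ego vehicle's parallel distance to another
    vehicle or pedestrian falls below the safe distance (threshold assumed for illustration)."""
    steps = [step for rollout in rollouts for step in rollout]
    if not steps:
        return 0.0
    violations = sum(1 for step in steps
                     if step["min_parallel_distance_m"] < safe_distance_m)
    return violations / len(steps)
```

The other probabilities listed above (vertical distance, traffic lights, traffic signs, traffic police, speeding) could be estimated analogously from the corresponding per-step flags.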
需要说明的是,在本申请实施例中,覆盖参数用于指示在参考自动驾驶场景下进行测试时ADS对应的神经网络中激活的神经元和/或层级相关性,例如,神经网络中激活的神经元的数量和神经网络的热力图中输入子集对预测结果的影响,神经网络有时也叫做多层感知机(multi-layer perceptron,MLP),按不同层的位置划分,神经网络内部的神经网络层可以分为三类,输入层,隐藏层和输出层,一般来说第一层是输入层,最后一层是输出层,而中间的层数都是隐藏层,层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连,所以当ADS在参考自动驾驶场景进行测试时,ADS的神经网络中激活的神经元的数量越高,或者ADS的神经网络中层级相关性越高,如热力图中输入子集对预测结果的影响越大,参考自动驾驶场景的覆盖率越高,即能够测试到以前未测试的自动驾驶场景,从而覆盖更多的自动驾驶场景。It should be noted that, in this embodiment of the present application, the coverage parameter is used to indicate the activated neurons and/or the layer-wise relevance in the neural network corresponding to the ADS when the test is performed in the reference automatic driving scenario, for example, the number of activated neurons in the neural network and the influence of input subsets on the prediction result in the heat map of the neural network. A neural network is sometimes called a multi-layer perceptron (MLP). According to the position of the layers, the layers inside a neural network can be divided into three categories: the input layer, the hidden layers and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Therefore, when the ADS is tested in the reference automatic driving scenario, the higher the number of activated neurons in the neural network of the ADS, or the higher the layer-wise relevance in the neural network of the ADS (i.e., the greater the influence of the input subsets on the prediction result in the heat map), the higher the coverage of the reference automatic driving scenario, which means that previously untested automatic driving scenarios can be reached and more automatic driving scenarios can be covered.
示例性地,若神经网络行为信息为输入层激活的神经元的个数为2个,输出层激活的神经元的个数为1个,隐藏层激活的神经元的个数为6个,则可确定ADS在参考自动驾驶场景下进行测试时ADS的神经网络中激活的神经元的数量为8个;若神经网络行为信息为层级相关性,则可根据层级相关性,从输出层反向传播找到输入层的子集,确定神经网络的热力图,其中,热力图用于表示所有输入子集对输出结果的贡献或权重,从热力图可以直观看出输入子集对输出结果的影响。For example, if the neural network behavior information indicates that 2 neurons are activated in the input layer, 1 neuron is activated in the output layer and 6 neurons are activated in the hidden layers, it can be determined that 8 neurons are activated in the neural network of the ADS when the ADS is tested in the reference automatic driving scenario. If the neural network behavior information is the layer-wise relevance, a subset of the input layer can be found by back-propagating from the output layer according to the layer-wise relevance, so as to determine the heat map of the neural network, where the heat map represents the contribution or weight of each input subset to the output result, and the influence of the input subsets on the output result can be seen intuitively from the heat map.
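The neuron-count form of the coverage parameter can be illustrated with a minimal sketch; the ReLU activations, the activation threshold and the toy 2-6-1 network below are assumptions made only for illustration and are not fixed by this embodiment.

```python
# Minimal sketch (illustrative assumptions): counting activated neurons in a
# small fully-connected network as a coverage signal.
import numpy as np

def count_activated_neurons(weights, biases, x, threshold=0.0):
    """Forward-propagate x through an MLP with ReLU activations and count
    neurons whose activation exceeds `threshold` (an assumed criterion)."""
    activated = int(np.sum(np.asarray(x) > threshold))   # input-layer neurons
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        h = np.maximum(0.0, W @ h + b)                    # ReLU layer activations
        activated += int(np.sum(h > threshold))           # hidden / output neurons
    return activated

# Toy 2-6-1 network matching the 2 + 6 + 1 example in the text.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(6, 2)), rng.normal(size=(1, 6))]
biases = [np.zeros(6), np.zeros(1)]
print(count_activated_neurons(weights, biases, np.array([1.0, -0.5])))
```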
S303、若安全违规参数或覆盖参数不满足预设指标,则基于反馈信息对参考自动驾驶场景进行更新得到更新后的参考自动驾驶场景,直至基于ADS在更新后的参考自动驾驶场景下进行测试时输出的反馈信息获取的安全违规参数和覆盖参数满足预设指标。S303. If the safety violation parameter or the coverage parameter does not meet the preset index, update the reference automatic driving scenario based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained based on the feedback information output by the ADS when it is tested in the updated reference automatic driving scenario meet the preset index.
在一些实施例中,自动驾驶场景的生成系统在基于反馈信息中的车辆控制指令和神经网络行为信息分别确定安全违规参数和覆盖参数之后,可以判断安全违规参数和覆盖参数是否满足预设指标,若安全违规参数或覆盖参数不满足预设指标,则基于反馈信息对参考自动驾驶场景进行更新得到更新后的参考自动驾驶场景,直至基于ADS在更新后的参考自动驾驶场景下进行测试时输出的反馈信息获取的安全违规参数和覆盖参数满足预设指标。In some embodiments, after determining the safety violation parameter and the coverage parameter based on the vehicle control command and the neural network behavior information in the feedback information, respectively, the automatic driving scenario generation system may determine whether the safety violation parameter and the coverage parameter meet the preset index. If the safety violation parameter or the coverage parameter does not meet the preset index, the reference automatic driving scenario is updated based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained based on the feedback information output by the ADS when it is tested in the updated reference automatic driving scenario meet the preset index.
示例性地,若安全违规参数包括ADS对应的车辆违反交通警察指挥的概率和ADS对应的车辆超速的概率,其中,ADS对应的车辆违反交通警察指挥的概率为40%,ADS对应的车辆超速的概率为30%,而预设指标中ADS对应的车辆违反交通警察指挥的概率和ADS对应的车辆超速的概率应均为80%,则测试结果不满足预设指标,或者,若覆盖参数包括神经网络中激活的神经元的数量,其中,神经网络中激活的神经元的数量为6个,而预设指标中神经网络中激活的神经元的数量应为8个,则测试结果不满足预设指标。For example, suppose the safety violation parameters include the probability that the vehicle corresponding to the ADS violates the command of a traffic police officer and the probability that the vehicle corresponding to the ADS exceeds the speed limit, where the former is 40% and the latter is 30%, while in the preset index both probabilities should be 80%; then the test result does not meet the preset index. Alternatively, suppose the coverage parameters include the number of activated neurons in the neural network, where the number of activated neurons is 6, while in the preset index the number of activated neurons should be 8; then the test result does not meet the preset index.
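The closed-loop check described in S303 can be sketched as below; the hooks run_test, get_params, select_action and apply_action are hypothetical interfaces standing in for the ADS simulator and the reinforcement learning agent, and are not the names used by this embodiment.

```python
# Minimal sketch (assumed interfaces): keep updating the reference scenario
# until the safety-violation and coverage parameters meet their preset indices.

def meets_preset(params, preset):
    return all(params[k] >= preset[k] for k in preset)

def generate_scenario(scenario, run_test, get_params, select_action, apply_action,
                      preset, max_iters=100):
    for _ in range(max_iters):
        feedback = run_test(scenario)          # ADS tested in the current scenario
        params = get_params(feedback)          # safety-violation + coverage parameters
        if meets_preset(params, preset):
            break
        scenario = apply_action(scenario, select_action(feedback))
    return scenario

# Toy usage: the "scenario" is just an integer the agent increments each round.
preset = {"speeding_prob": 0.8, "neuron_coverage": 0.8}
result = generate_scenario(
    scenario=0,
    run_test=lambda s: {"score": s},
    get_params=lambda fb: {"speeding_prob": fb["score"] / 10,
                           "neuron_coverage": fb["score"] / 10},
    select_action=lambda fb: 1,
    apply_action=lambda s, a: s + a,
    preset=preset,
)
print(result)  # loop stops once both toy parameters reach 0.8
```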
示例性地,当确定安全违规参数或覆盖参数不满足预设指标之后,获取强化学习智能体基于反馈信息从参考自动驾驶场景的动作空间中选择的动作,其中,动作空间中的动作为自动驾驶场景中的道路拓扑、道路降解、动态时间、动态天气、动态交通或景观信息的离散或连续更新,例如,道路拓扑为参考自动驾驶场景中的道路的类型,如直路、丁字路口、立交桥、盘山公路等,道路降解为参考自动驾驶场景中的道路退化的程度。基于选择的动作对参考驾驶场景进行更新得到更新后的参考自动驾驶场景,直至基于ADS在更新后的参考自动驾驶场景下进行测试时输出的反馈信息获取的安全违规参数和覆盖参数满足预设指标。For example, after it is determined that the safety violation parameter or the coverage parameter does not meet the preset index, the action selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information is obtained, where the actions in the action space are discrete or continuous updates of the road topology, road degradation, dynamic time, dynamic weather, dynamic traffic or landscape information of the automatic driving scenario. For example, the road topology is the type of road in the reference automatic driving scenario, such as a straight road, a T-junction, an overpass or a winding mountain road, and the road degradation is the degree of road deterioration in the reference automatic driving scenario. The reference driving scenario is updated based on the selected action to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained based on the feedback information output by the ADS when it is tested in the updated reference automatic driving scenario meet the preset index.
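A discrete action space of scenario updates might look like the following sketch; the particular categories and values are illustrative assumptions, and continuous updates (for example, a rain-intensity value) could be used instead.

```python
# Minimal sketch (illustrative values only): a discrete action space of scenario
# updates over road topology, road degradation, time, weather and traffic.
from itertools import product

ROAD_TOPOLOGY = ["straight", "t_junction", "overpass", "winding_mountain_road"]
ROAD_DEGRADATION = ["none", "mild", "severe"]
TIME_OF_DAY = ["day", "dusk", "night"]
WEATHER = ["clear", "rain", "fog", "snow"]
TRAFFIC = ["sparse", "dense"]

ACTION_SPACE = list(product(ROAD_TOPOLOGY, ROAD_DEGRADATION, TIME_OF_DAY, WEATHER, TRAFFIC))
print(len(ACTION_SPACE))  # 4 * 3 * 3 * 4 * 2 = 288 discrete scenario updates

def apply_action(scenario, action):
    """Overwrite the corresponding fields of a scenario dict with the chosen update."""
    keys = ["road_topology", "road_degradation", "time", "weather", "traffic"]
    return {**scenario, **dict(zip(keys, action))}
```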
需要说明的是,在本申请实施例中,自动驾驶场景的生成系统的强化学习智能体为基于反馈信息确定奖励和当前ADS对应的车辆的状态,并基于奖励和当前ADS对应的车辆的状态从参考自动驾驶场景的动作空间中选择动作的智能体。其中,奖励为基于安全违规的奖励和基于覆盖的奖励之和,ADS对应的车辆的状态用于指示ADS对应的车辆在参考自动驾驶场景中的位置。It should be noted that, in this embodiment of the present application, the reinforcement learning agent of the automatic driving scenario generation system is an agent that determines a reward and the current state of the vehicle corresponding to the ADS based on the feedback information, and selects an action from the action space of the reference automatic driving scenario based on the reward and the current state of the vehicle corresponding to the ADS. The reward is the sum of a safety-violation-based reward and a coverage-based reward, and the state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scenario.
需要说明的是,在本申请实施例中,基于安全违规的奖励为基于反馈信息中的车辆控制指令更新ADS对应的车辆的状态后ADS对应的车辆的安全违规概率接近预设指标的接近程度,例如,如果基于安全违规的奖励越大,即基于反馈信息中的车辆控制指令更新ADS对应的车辆的状态后ADS对应的车辆的安全违规概率接近预设指标的接近程度越高,则ADS对应的车辆在参考自动驾驶场景中行驶时安全违规的概率越高,从而可以鼓励参考自动驾驶场景朝容易安全违规的自动驾驶场景方向演变;It should be noted that, in this embodiment of the present application, the safety-violation-based reward is the degree to which the safety violation probability of the vehicle corresponding to the ADS is close to the preset index after the state of the vehicle corresponding to the ADS is updated based on the vehicle control command in the feedback information. For example, the larger the safety-violation-based reward, that is, the closer the safety violation probability of the vehicle corresponding to the ADS is to the preset index after its state is updated based on the vehicle control command in the feedback information, the higher the probability of a safety violation when the corresponding vehicle drives in the reference automatic driving scenario, which encourages the reference automatic driving scenario to evolve toward automatic driving scenarios that are prone to safety violations;
基于覆盖的奖励为基于反馈信息中的神经网络行为信息确定的参考自动驾驶场景的覆盖率接近预设指标的接近程度,例如,如果基于测试覆盖的奖励越大,即基于反馈信息中的神经网络行为信息确定的参考自动驾驶场景的覆盖率接近预设指标的接近程度越高,则ADS在参考自动驾驶场景进行测试时ADS对应的神经网络中激活的神经元和/或层级相关性越高,参考自动驾驶场景的覆盖率越高,从而可以鼓励参考自动驾驶场景朝之前未测试的自动驾驶场景方向演变,覆盖更多的自动驾驶场景。The coverage-based reward is the degree to which the coverage of the reference automatic driving scenario, determined based on the neural network behavior information in the feedback information, is close to the preset index. For example, the larger the coverage-based reward, that is, the closer the coverage of the reference automatic driving scenario determined based on the neural network behavior information in the feedback information is to the preset index, the more neurons are activated and/or the higher the layer-wise relevance in the neural network corresponding to the ADS when the ADS is tested in the reference automatic driving scenario, and the higher the coverage of the reference automatic driving scenario, which encourages the reference automatic driving scenario to evolve toward previously untested automatic driving scenarios and to cover more automatic driving scenarios.
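One possible way to combine the two reward terms is sketched below; the linear "closeness" measure and the preset values are assumptions, since this embodiment does not fix a particular formula.

```python
# Minimal sketch (assumed definitions): total reward as the sum of a
# safety-violation-based term and a coverage-based term, each measured as the
# closeness of the corresponding quantity to its preset index.

def closeness(value, target):
    """1.0 when `value` reaches `target`; decreases linearly with the gap
    (one possible choice among many)."""
    return max(0.0, 1.0 - abs(target - value) / target) if target > 0 else 0.0

def total_reward(violation_prob, coverage, preset_violation=0.8, preset_coverage=0.8):
    r_safety = closeness(violation_prob, preset_violation)   # safety-violation-based reward
    r_coverage = closeness(coverage, preset_coverage)        # coverage-based reward
    return r_safety + r_coverage

print(total_reward(violation_prob=0.4, coverage=0.75))
```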
需要说明的是,在本申请实施例中,为了自动驾驶场景的生成系统强化学习智能体可以基于反馈信息有策略地从参考自动驾驶场景的动作空间中选择动作,需要对强化学习智能体进行训练。具体的,设置强化学习智能体的神经网络模型为价值网络和策略网络,其中,价值网络用于计算设定状态下的设定动作的价值,策略网络用于获取设定状态下的动作概率分布。获取当前ADS对应的车辆的第一状态,以及强化学习智能体基于策略网络从参考自动驾驶场景的动作空间中选择的第一动作,基于第一动作对参考自动驾驶场景进行更新,并获取ADS在更新后的参考自动驾驶场景下进行测试时输出的反馈信息,获取强化学习智能体基于反馈信息确定的第一动作的奖励和当前ADS对应的车辆的第二状态,以及基于策略网络从参考自动驾驶场景的动作空间中选择的第二动作,基于价值网络计算第一状态下的第一动作的第一价值和第二状态下的第二动作的第二价值,基于第一价值、第二价值以及第一动作的奖励,确定时间差分误差,其中,时间差分误差为价值网络预测的价值和实际的价值之差,获取价值网络和策略网络的梯度,并基于时间差分误差和价值网络的梯度更新价值网络的参数,基于时间差分误差和策略网络的梯度更新策略网络的参数。It should be noted that, in this embodiment of the present application, in order for the reinforcement learning agent of the automatic driving scenario generation system to select actions from the action space of the reference automatic driving scenario strategically based on the feedback information, the reinforcement learning agent needs to be trained. Specifically, the neural network model of the reinforcement learning agent is set as a value network and a policy network, where the value network is used to compute the value of a given action in a given state, and the policy network is used to obtain the action probability distribution in a given state. The first state of the vehicle corresponding to the current ADS is obtained, together with a first action selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the policy network; the reference automatic driving scenario is updated based on the first action, and the feedback information output by the ADS when it is tested in the updated reference automatic driving scenario is obtained; the reward of the first action determined by the reinforcement learning agent based on the feedback information and the second state of the vehicle corresponding to the current ADS are obtained, together with a second action selected from the action space of the reference automatic driving scenario based on the policy network; a first value of the first action in the first state and a second value of the second action in the second state are computed based on the value network; a temporal-difference error is determined based on the first value, the second value and the reward of the first action, where the temporal-difference error is the difference between the value predicted by the value network and the actual value; the gradients of the value network and the policy network are obtained, the parameters of the value network are updated based on the temporal-difference error and the gradient of the value network, and the parameters of the policy network are updated based on the temporal-difference error and the gradient of the policy network.
示例性地,策略函数是指智能体在强化学习中使用的采用行为的规则,例如,在学习过程中,可以根据状态输出动作,并以此动作探索环境,以更新状态。价值函数是指智能体在强化学习中由环境提供的强化信号(即奖励)对动作和状态的好坏作一种评价的规则,例如,动作价值用来评价动作的好坏,状态价值用来评价当前状态的好坏。设置强化学习智能体的神经网络模型为价值网络和策略网络,即将策略网络和价值网络这两个神经网络分别近似强化学习智能体的策略函数和动作价值函数,进而近似状态价值函数,例如,若V(s;θ,ω)是状态价值函数(即当前状态的价值),它是策略函数π(a|s;θ)(即动作的概率)和动作价值函数q(s,a;ω)(即动作的价值)的连加,如V(s;θ,ω)=Σ_a π(a|s;θ)·q(s,a;ω)。Illustratively, the policy function refers to the rule by which the agent selects behaviors in reinforcement learning; for example, during learning, an action can be output according to the current state, and the environment is explored with this action so as to update the state. The value function refers to the rule by which the agent evaluates actions and states using the reinforcement signal (i.e., the reward) provided by the environment: the action value evaluates how good an action is, and the state value evaluates how good the current state is. The neural network model of the reinforcement learning agent is set as a value network and a policy network, i.e., the policy network and the value network approximate the agent's policy function and action value function respectively, and thereby approximate the state value function. For example, if V(s; θ, ω) is the state value function (i.e., the value of the current state), it is obtained by summing, over the actions, the policy function π(a|s; θ) (i.e., the probability of an action) multiplied by the action value function q(s, a; ω) (i.e., the value of an action), e.g., V(s; θ, ω) = Σ_a π(a|s; θ)·q(s, a; ω).
对上述强化学习智能体进行训练的步骤如下:The steps for training the above reinforcement learning agent are as follows:
(1)获取当前ADS对应的车辆的第一状态s_t,以及强化学习智能体基于神经网络模型中的策略网络随机从参考自动驾驶场景的动作空间中选择的第一动作a_t;(1) Obtain the first state s_t of the vehicle corresponding to the current ADS, and the first action a_t randomly selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the policy network in the neural network model;
(2)基于第一动作a_t对参考自动驾驶场景进行更新,并获取ADS在更新后的参考自动驾驶场景下进行测试时输出的反馈信息;(2) Update the reference automatic driving scenario based on the first action a_t, and obtain the feedback information output by the ADS when it is tested in the updated reference automatic driving scenario;
(3)获取强化学习智能体基于反馈信息确定的第一动作a_t的奖励r_1以及当前ADS对应的车辆的第二状态s_{t+1},以及强化学习智能体基于神经网络模型中的策略网络随机从参考自动驾驶场景的动作空间中选择的第二动作a_{t+1};(3) Obtain the reward r_1 of the first action a_t determined by the reinforcement learning agent based on the feedback information, the second state s_{t+1} of the vehicle corresponding to the current ADS, and the second action a_{t+1} randomly selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the policy network in the neural network model;
(4)获取强化学习智能体基于神经网络模型中的价值网络计算的第一状态下的第一动作的第一价值q_t=q(s_t,a_t;ω_t)和第二状态下的第二动作的第二价值q_{t+1}=q(s_{t+1},a_{t+1};ω_{t+1});(4) Obtain the first value q_t = q(s_t, a_t; ω_t) of the first action in the first state and the second value q_{t+1} = q(s_{t+1}, a_{t+1}; ω_{t+1}) of the second action in the second state, calculated by the reinforcement learning agent based on the value network in the neural network model;
(5)基于第一价值q_t、第二价值q_{t+1}以及第一动作a_t的奖励r_1,确定时间差分误差(temporal-difference error,TD error),其中,TD error为价值网络预测的价值和实际的价值之差,如δ_t=q_t-(r_1+γ·q_{t+1});(5) Determine the temporal-difference error (TD error) based on the first value q_t, the second value q_{t+1} and the reward r_1 of the first action a_t, where the TD error is the difference between the value predicted by the value network and the actual value, e.g., δ_t = q_t - (r_1 + γ·q_{t+1});
(7)基于TD error,使用价值网络的梯度以梯度下降的方式更新价值网络的参数,如ω_{t+1}=ω_t-α·δ_t·d_{ω,t};(7) Based on the TD error, update the parameters of the value network by gradient descent using the gradient d_{ω,t} of the value network, e.g., ω_{t+1} = ω_t - α·δ_t·d_{ω,t};
(9)基于TD error,使用策略网络的梯度以梯度上升的方式更新策略网络的参数,如θ_{t+1}=θ_t+β·δ_t·d_{θ,t}。(9) Based on the TD error, update the parameters of the policy network by gradient ascent using the gradient d_{θ,t} of the policy network, e.g., θ_{t+1} = θ_t + β·δ_t·d_{θ,t}.
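For illustration only, the following sketch implements steps (1) to (9) above with linear function approximators standing in for the value network and the policy network; the state featurisation, the learning rates and the toy environment are assumptions, and the sign of the policy update follows the advantage r_1 + γ·q_{t+1} - q_t = -δ_t implied by the TD-error definition in step (5).

```python
# Minimal sketch (not the patent's code): one actor-critic training step with
# linear function approximation, mirroring steps (1)-(9) above.
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS, N_FEAT = 4, 8
theta = np.zeros((N_ACTIONS, N_FEAT))   # policy network parameters (linear softmax)
omega = np.zeros((N_ACTIONS, N_FEAT))   # value network parameters (linear q)
alpha, beta, gamma = 0.05, 0.05, 0.9    # value lr, policy lr, discount factor

def features(state):
    return np.tanh(state)                # assumed featurisation of the vehicle state

def policy_probs(phi):
    z = theta @ phi                      # softmax policy pi(a|s; theta)
    e = np.exp(z - z.max())
    return e / e.sum()

def q_value(phi, a):
    return omega[a] @ phi                # action value q(s, a; omega)

def train_step(s_t, env_step):
    global theta, omega
    phi_t = features(s_t)
    a_t = rng.choice(N_ACTIONS, p=policy_probs(phi_t))      # (1) first state / first action
    s_t1, r_1 = env_step(s_t, a_t)                          # (2)-(3) update scenario, read feedback
    phi_t1 = features(s_t1)
    a_t1 = rng.choice(N_ACTIONS, p=policy_probs(phi_t1))    # (3) second action
    q_t, q_t1 = q_value(phi_t, a_t), q_value(phi_t1, a_t1)  # (4) first and second values
    delta = q_t - (r_1 + gamma * q_t1)                      # (5) TD error as defined above
    omega[a_t] -= alpha * delta * phi_t                     # (7) gradient descent on the value net
    grad_log_pi = np.outer(-policy_probs(phi_t), phi_t)     # d log pi(a_t|s_t) / d theta
    grad_log_pi[a_t] += phi_t
    theta += beta * (-delta) * grad_log_pi                  # (9) gradient ascent on the policy net
    return s_t1

# Toy stand-in for the simulator: reward 1 whenever the agent picks action 2.
def toy_env(state, action):
    return rng.normal(size=N_FEAT), 1.0 if action == 2 else 0.0

s = rng.normal(size=N_FEAT)
for _ in range(500):
    s = train_step(s, toy_env)
print(policy_probs(features(s)))  # probability mass should shift toward action 2
```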
上述技术方案,获取强化学习智能体基于反馈信息从参考自动驾驶场景的动作空间中选择的动作,其中,强化学习智能体为基于反馈信息确定奖励和当前ADS对应的车辆的状态,并基于奖励和当前ADS对应的车辆的状态从动作空间中选择动作的智能体,再基于选择的动作对参考驾驶场景进行更新得到更新后的参考自动驾驶场景,直至基于所述ADS在所述更新后的参考自动驾驶场景下进行测试时输出的反馈信息获取的安全违规参数和覆盖参数满足所述预设指标,通过获取强化学习智能体以“试错”的方式进行强化学习后,基于反馈信息中的车辆控制指令和神经网络行为信息确定的奖励去指导行为,以强化学习智能体获得最大的奖励为目标从参考自动驾驶场景的动作空间中选择的动作,从而引导驾驶场景朝容易导致安全违规和以前未测试的方向更新,即生成最可能需要测试的容易导致安全违规和覆盖率较高的驾驶场景,不仅可以提高测试驾驶场景的效率,同时还可以覆盖更多的驾驶场景,另外,基于闭环的自动化测试可进一步降低时间开销和人力开销。According to the above technical solution, the action selected by the reinforcement learning agent from the action space of the reference automatic driving scenario based on the feedback information is obtained, where the reinforcement learning agent is an agent that determines a reward and the current state of the vehicle corresponding to the ADS based on the feedback information and selects an action from the action space based on the reward and the current state of the vehicle corresponding to the ADS; the reference driving scenario is then updated based on the selected action to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained based on the feedback information output by the ADS when it is tested in the updated reference automatic driving scenario meet the preset index. After the reinforcement learning agent performs reinforcement learning in a "trial-and-error" manner, the reward determined from the vehicle control command and the neural network behavior information in the feedback information is used to guide its behavior, and actions are selected from the action space of the reference automatic driving scenario with the goal of maximizing the reward obtained by the reinforcement learning agent, thereby guiding the driving scenario to be updated toward directions that easily lead to safety violations and that have not been tested before, that is, generating the driving scenarios that most need to be tested, namely those prone to safety violations and with high coverage. This not only improves the efficiency of testing driving scenarios but also covers more driving scenarios; in addition, the closed-loop automated test further reduces time overhead and labor overhead.
上述各个实施例可以单独使用,也可以相互结合使用,以达到不同的技术效果。The above embodiments can be used alone or in combination with each other to achieve different technical effects.
上述本申请提供的实施例中,从自动驾驶场景的生成系统作为执行主体的角度对本申请实施例提供的方法进行了介绍。为了实现上述本申请实施例提供的方法中的各功能,自动驾驶场景的生成系统可以包括硬件结构和/或软件模块,以硬件结构、软件模块、或硬件结构加软件模块的形式来实现上述各功能。上述各功能中的某个功能以硬件结构、软件模块、还是硬件结构加软件模块的方式来执行,取决于技术方案的特定应用和设计约束条件。In the above-mentioned embodiments of the present application, the methods provided by the embodiments of the present application are introduced from the perspective of an automatic driving scene generation system as an execution subject. In order to realize the functions in the methods provided by the above embodiments of the present application, the generation system for automatic driving scenarios may include a hardware structure and/or a software module, and implement the above-mentioned various functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Function. Whether one of the above functions is performed in the form of a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
基于同一技术构思,本申请实施例还提供一种自动驾驶场景的生成装置400,该装置400可以是自动驾驶场景的生成系统,或者是自动驾驶场景的生成系统中的装置,该装置400包括用于执行上述图3所示方法的模块。示例性地,参见图4,该装置400可以包括:Based on the same technical concept, an embodiment of the present application further provides an apparatus 400 for generating an automatic driving scenario. The apparatus 400 may be an automatic driving scenario generation system, or an apparatus in an automatic driving scenario generation system, and the apparatus 400 includes modules for executing the method shown in FIG. 3 above. Exemplarily, referring to FIG. 4, the apparatus 400 may include:
第一获取模块401,用于获取自动驾驶系统ADS在参考自动驾驶场景下进行测试时输出的反馈信息;其中,所述反馈信息包括车辆控制指令和神经网络行为信息;The first obtaining module 401 is used to obtain feedback information output by the automatic driving system ADS when testing in a reference automatic driving scenario; wherein the feedback information includes vehicle control instructions and neural network behavior information;
第二获取模块402,用于基于所述反馈信息获取安全违规参数和覆盖参数;其中,所述安全违规参数用于指示在所述参考自动驾驶场景中所述ADS对应的车辆根据所述车辆控制指令行驶时安全违规的概率,所述覆盖参数用于指示在所述参考自动驾驶场景下进行测试时所述ADS对应的神经网络中激活的神经元和/或层级相关性;The second obtaining module 402 is configured to obtain a safety violation parameter and a coverage parameter based on the feedback information; where the safety violation parameter is used to indicate the probability of a safety violation when the vehicle corresponding to the ADS drives according to the vehicle control command in the reference automatic driving scenario, and the coverage parameter is used to indicate the activated neurons and/or layer-wise relevance in the neural network corresponding to the ADS when the test is performed in the reference automatic driving scenario;
更新模块403,用于若所述安全违规参数或所述覆盖参数不满足预设指标,则基于所述反馈信息对所述参考自动驾驶场景进行更新得到更新后的参考自动驾驶场景,直至基于所述ADS在所述更新后的参考自动驾驶场景下进行测试时输出的反馈信息获取的安全违规参数和覆盖参数满足所述预设指标。The updating module 403 is configured to: if the safety violation parameter or the coverage parameter does not meet the preset index, update the reference automatic driving scenario based on the feedback information to obtain an updated reference automatic driving scenario, until the safety violation parameter and the coverage parameter obtained based on the feedback information output by the ADS when it is tested in the updated reference automatic driving scenario meet the preset index.
一种可能的设计中,所述装置还包括创建模块,用于:In a possible design, the device further includes a creation module for:
获取所述ADS的初始自动驾驶场景;obtaining the initial automatic driving scenario of the ADS;
分析所述初始自动驾驶场景中的道路类型以及所述ADS对应的车辆在所述初始自动驾驶场景中行驶时安全违规的概率,基于分析结果,创建所述参考自动驾驶场景;其中,所述参考自动驾驶场景包括典型自动驾驶场景、缺失自动驾驶场景和易违规自动驾驶场景中的任一种或多种。Analyze the road type in the initial automatic driving scene and the probability of safety violations when the vehicle corresponding to the ADS is driving in the initial automatic driving scene, and create the reference automatic driving scene based on the analysis result; wherein, the reference The autonomous driving scenarios include any one or more of typical autonomous driving scenarios, missing autonomous driving scenarios, and autonomous driving scenarios that are prone to violations.
一种可能的设计中,所述安全违规参数包括所述ADS对应的车辆与其他车辆或行人的平行距离小于安全距离的概率,所述ADS对应的车辆与其他车辆或行人的垂直距离小于安全距离的概率,所述ADS对应的车辆违反交通灯指示的概率,所述ADS对应的车辆违反交通标志指示的概率,所述ADS对应的车辆违反交通警察指挥的概率和所述ADS对应的车辆超速的概率中的任一种或多种;In a possible design, the safety violation parameter includes the probability that the parallel distance between the vehicle corresponding to the ADS and other vehicles or pedestrians is less than the safety distance, and the vertical distance between the vehicle corresponding to the ADS and other vehicles or pedestrians is less than the safety distance. The probability that the vehicle corresponding to the ADS violates the traffic light indication, the probability that the vehicle corresponding to the ADS violates the traffic sign indication, the probability that the vehicle corresponding to the ADS violates the command of the traffic police and the speeding rate of the vehicle corresponding to the ADS any one or more of the probabilities;
所述覆盖参数包括所述ADS对应的神经网络中激活的神经元的数量和所述ADS对应的神经网络的热力图中输入子集对预测结果的影响的权重中的任一种或多种。The coverage parameter includes any one or more of the number of activated neurons in the neural network corresponding to the ADS and the weight of the influence of the input subset on the prediction result in the heatmap of the neural network corresponding to the ADS.
一种可能的设计中,所述更新模块403,具体用于:In a possible design, the update module 403 is specifically used for:
获取强化学习智能体基于所述反馈信息从所述参考自动驾驶场景的动作空间中选择的动作;其中,所述动作空间中的动作为自动驾驶场景中的道路拓扑、道路降解、动态时间、动态天气、动态交通或景观信息的离散或连续更新;Acquiring actions selected by the reinforcement learning agent from the action space of the reference autonomous driving scene based on the feedback information; wherein the actions in the action space are road topology, road degradation, dynamic time, dynamic time in the autonomous driving scene Discrete or continuous updates of weather, dynamic traffic or landscape information;
基于所述选择的动作对所述参考驾驶场景进行更新。The reference driving scenario is updated based on the selected action.
一种可能的设计中,所述强化学习智能体为基于反馈信息确定奖励和当前所述ADS对应的车辆的状态,并基于所述奖励和当前所述ADS对应的车辆的状态从所述动作空间中选择动作的智能体;In a possible design, the reinforcement learning agent determines the reward based on the feedback information and the current state of the vehicle corresponding to the ADS, and based on the reward and the current state of the vehicle corresponding to the ADS, from the action space The agent that selects the action in ;
其中,所述奖励为基于安全违规的奖励和基于覆盖的奖励之和,所述基于安全违规的奖励为基于所述反馈信息中的车辆控制指令更新所述ADS对应的车辆的状态后所述ADS对应的车辆的安全违规概率接近预设指标的接近程度,所述基于覆盖的奖励为基于所述反馈信息中的神经网络行为信息确定的所述参考自动驾驶场景的覆盖率接近预设指标的接近程度,所述ADS对应的车辆的状态用于指示所述ADS对应的车辆在所述参考自动驾驶场景中的位置。Wherein, the reward is the sum of the reward based on safety violation and the reward based on coverage, and the reward based on safety violation is the ADS after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information The degree to which the safety violation probability of the corresponding vehicle is close to the preset index, and the coverage-based reward is the closeness that the coverage rate of the reference automatic driving scene determined based on the neural network behavior information in the feedback information is close to the preset index The state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scene.
一种可能的设计中,所述强化学习智能体的神经网络模型包括价值网络和策略网络,所述价值网络用于计算设定状态下的设定动作的价值,所述策略网络用于获取设定状态下的动作概率分布;所述装置还包括训练模块,用于:In a possible design, the neural network model of the reinforcement learning agent includes a value network and a policy network, where the value network is used to calculate the value of a given action in a given state, and the policy network is used to obtain the action probability distribution in a given state; the apparatus further includes a training module, configured to:
获取当前所述ADS对应的车辆的第一状态,以及所述强化学习智能体基于所述策略网络从所述动作空间中选择的第一动作;obtaining the first state of the vehicle corresponding to the current ADS, and the first action selected by the reinforcement learning agent from the action space based on the policy network;
基于所述第一动作对所述参考自动驾驶场景进行更新,并获取所述ADS在所述更新后的参考自动驾驶场景下进行测试时输出的反馈信息;updating the reference automatic driving scene based on the first action, and acquiring feedback information output by the ADS when testing in the updated reference automatic driving scene;
获取所述强化学习智能体基于所述反馈信息确定的所述第一动作的奖励和当前所述ADS对应的车辆的第二状态,以及基于所述策略网络从所述动作空间中选择的第二动作;obtain the reward of the first action determined by the reinforcement learning agent based on the feedback information, the current second state of the vehicle corresponding to the ADS, and a second action selected from the action space based on the policy network;
基于所述价值网络计算所述第一状态下的第一动作的第一价值和所述第二状态下的第二动作的第二价值;computing a first value for a first action in the first state and a second value for a second action in the second state based on the value network;
基于所述第一价值、所述第二价值以及所述第一动作的奖励,确定时间差分误差;其中,所述时间差分误差为所述价值网络预测的价值和实际的价值之差;Based on the first value, the second value and the reward of the first action, a time difference error is determined; wherein, the time difference error is the difference between the value predicted by the value network and the actual value;
获取所述价值网络和所述策略网络的梯度,并基于所述时间差分误差和所述价值网络的梯度更新所述价值网络的参数,基于所述时间差分误差和所述策略网络的梯度更新所述策略网络的参数。obtain the gradients of the value network and the policy network, update the parameters of the value network based on the temporal-difference error and the gradient of the value network, and update the parameters of the policy network based on the temporal-difference error and the gradient of the policy network.
基于同一技术构思,参见图5,本申请实施例还提供一种自动驾驶场景的生成系统500,包括:Based on the same technical concept, referring to FIG. 5 , an embodiment of the present application further provides a system 500 for generating an automatic driving scenario, including:
至少一个处理器501;以及,与所述至少一个处理器501通信连接的通信接口503;at least one processor 501; and, a communication interface 503 communicatively connected to the at least one processor 501;
其中,所述至少一个处理器501通过执行存储器502存储的指令,使得所述自动驾驶场景的生成系统500执行图3所示的方法。Wherein, the at least one processor 501 causes the automatic driving scenario generation system 500 to execute the method shown in FIG. 3 by executing the instructions stored in the memory 502 .
可选的,所述存储器502位于所述自动驾驶场景的生成系统500之外。Optionally, the memory 502 is located outside the automatic driving scenario generation system 500 .
可选的,所述自动驾驶场景的生成系统500包括所述存储器502,所述存储器502与所述至少一个处理器501相连,所述存储器502存储有可被所述至少一个处理器501执行的指令。附图5用虚线表示存储器502对于自动驾驶场景的生成系统500是可选的。Optionally, the automatic driving scenario generation system 500 includes the memory 502, the memory 502 is connected to the at least one processor 501, and the memory 502 stores instructions executable by the at least one processor 501. FIG. 5 uses a dashed line to indicate that the memory 502 is optional for the automatic driving scenario generation system 500.
其中,所述处理器501和所述存储器502可以通过接口电路耦合,也可以集成在一起,这里不做限制。The processor 501 and the memory 502 may be coupled through an interface circuit, or may be integrated together, which is not limited here.
本申请实施例中不限定上述处理器501、存储器502以及通信接口503之间的具体连接介质。本申请实施例在图5中以处理器501、存储器502以及通信接口503之间通过总线504连接,总线在图5中以粗线表示,其它部件之间的连接方式,仅是进行示意性说明,并不引以为限。所述总线可以分为地址总线、数据总线、控制总线等。为便于表示,图5中仅用一条粗线表示,但并不表示仅有一根总线或一种类型的总线。The specific connection medium between the processor 501, the memory 502 and the communication interface 503 is not limited in the embodiments of the present application. In this embodiment of the present application, the processor 501, the memory 502 and the communication interface 503 in FIG. 5 are connected through a bus 504; the bus is represented by a thick line in FIG. 5, and the connection manner between other components is only schematically illustrated and is not limiting. The bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or only one type of bus.
应理解,本申请实施例中提及的处理器可以通过硬件实现也可以通过软件实现。当通过硬件实现时,该处理器可以是逻辑电路、集成电路等。当通过软件实现时,该处理器可以是一个通用处理器,通过读取存储器中存储的软件代码来实现。It should be understood that the processor mentioned in the embodiments of the present application may be implemented by hardware or software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general-purpose processor implemented by reading software codes stored in memory.
示例性地,处理器可以是中央处理单元(central processing unit,CPU),还可以是其他通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。Exemplarily, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
应理解,本申请实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。It should be understood that the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as a static RAM (SRAM), a dynamic RAM (DRAM), a synchronous DRAM (SDRAM), a double data rate SDRAM (DDR SDRAM), an enhanced SDRAM (ESDRAM), a synchlink DRAM (SLDRAM) and a direct rambus RAM (DR RAM).
需要说明的是,当处理器为通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件时,存储器(存储模块)可以集成在处理器中。It should be noted that when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components, the memory (storage module) can be integrated in the processor.
应注意,本文描述的存储器旨在包括但不限于这些和任意其它适合类型的存储器。It should be noted that the memory described herein is intended to include, but not be limited to, these and any other suitable types of memory.
基于同一技术构思,本申请实施例还提供一种计算机存储介质,包括计算机指令,当所述计算机指令在计算机上运行时,使得如图3所示的方法被执行。Based on the same technical concept, an embodiment of the present application further provides a computer storage medium, including computer instructions, when the computer instructions are executed on a computer, the method shown in FIG. 3 is executed.
基于同一技术构思,本申请实施例还提供一种芯片,所述芯片与存储器耦合,用于读取并执行所述存储器中存储的程序指令,使得如图3所示的方法被执行。Based on the same technical concept, an embodiment of the present application further provides a chip, which is coupled to a memory and used to read and execute program instructions stored in the memory, so that the method shown in FIG. 3 is executed.
基于同一技术构思,本申请实施例还提供一种计算机程序产品,当所述计算机程序产品在计算机上运行时,使得如图3所示的方法被执行。Based on the same technical concept, an embodiment of the present application also provides a computer program product, which enables the method shown in FIG. 3 to be executed when the computer program product runs on a computer.
应理解,上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模块的功能描述,在此不再赘述。It should be understood that, all relevant contents of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules, which will not be repeated here.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。As will be appreciated by those skilled in the art, the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the present application. It will be understood that each process and/or block in the flowchart illustrations and/or block diagrams, and combinations of processes and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process such that The instructions provide steps for implementing the functions specified in the flow or blocks of the flowcharts and/or the block or blocks of the block diagrams.
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的保护范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。Obviously, those skilled in the art can make various changes and modifications to the present application without departing from the protection scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these modifications and variations.
Claims (15)
- 一种自动驾驶场景的生成方法,其特征在于,包括:A method for generating an automatic driving scene, comprising:获取自动驾驶系统ADS在参考自动驾驶场景下进行测试时输出的反馈信息;其中,所述反馈信息包括车辆控制指令和神经网络行为信息;Acquiring feedback information output by the automatic driving system ADS when testing in a reference automatic driving scenario; wherein the feedback information includes vehicle control instructions and neural network behavior information;基于所述反馈信息获取安全违规参数和覆盖参数;其中,所述安全违规参数用于指示在所述参考自动驾驶场景中所述ADS对应的车辆根据所述车辆控制指令行驶时安全违规的概率,所述覆盖参数用于指示在所述参考自动驾驶场景下进行测试时所述ADS对应的神经网络中激活的神经元和/或层级相关性;Obtain safety violation parameters and coverage parameters based on the feedback information; wherein, the safety violation parameters are used to indicate the probability of safety violations when the vehicle corresponding to the ADS in the reference automatic driving scenario is driving according to the vehicle control instruction, The coverage parameter is used to indicate the activated neurons and/or hierarchical correlations in the neural network corresponding to the ADS when testing in the reference automatic driving scenario;若所述安全违规参数或所述覆盖参数不满足预设指标,则基于所述反馈信息对所述参考自动驾驶场景进行更新得到更新后的参考自动驾驶场景,直至基于所述ADS在所述更新后的参考自动驾驶场景下进行测试时输出的反馈信息获取的安全违规参数和覆盖参数满足所述预设指标。If the safety violation parameter or the coverage parameter does not meet the preset index, the reference automatic driving scene is updated based on the feedback information to obtain an updated reference automatic driving scene, until the updated reference automatic driving scene is performed based on the ADS. The safety violation parameters and coverage parameters obtained by referring to the feedback information output during the test in the automatic driving scenario meet the preset indicators.
- 如权利要求1所述的方法,其特征在于,获取自动驾驶系统ADS在参考自动驾驶场景下进行测试时输出的反馈信息之前,还包括:The method according to claim 1, before acquiring the feedback information output by the automatic driving system ADS when testing in the reference automatic driving scenario, further comprising:获取所述ADS的初始自动驾驶场景;obtaining the initial automatic driving scenario of the ADS;分析所述初始自动驾驶场景中的道路类型以及所述ADS对应的车辆在所述初始自动驾驶场景中行驶时安全违规的概率,基于分析结果,创建所述参考自动驾驶场景;其中,所述参考自动驾驶场景包括典型自动驾驶场景、缺失自动驾驶场景和易违规自动驾驶场景中的任一种或多种。Analyze the road type in the initial automatic driving scene and the probability of safety violations when the vehicle corresponding to the ADS is driving in the initial automatic driving scene, and create the reference automatic driving scene based on the analysis result; wherein, the reference The autonomous driving scenarios include any one or more of typical autonomous driving scenarios, missing autonomous driving scenarios, and autonomous driving scenarios that are prone to violations.
- 如权利要求1或2所述的方法,其特征在于,所述安全违规参数包括所述ADS对应的车辆与其他车辆或行人的平行距离小于安全距离的概率,所述ADS对应的车辆与其他车辆或行人的垂直距离小于安全距离的概率,所述ADS对应的车辆违反交通灯指示的概率,所述ADS对应的车辆违反交通标志指示的概率,所述ADS对应的车辆违反交通警察指挥的概率和所述ADS对应的车辆超速的概率中的任一种或多种;The method according to claim 1 or 2, wherein the safety violation parameter includes a probability that the parallel distance between the vehicle corresponding to the ADS and other vehicles or pedestrians is less than a safe distance, and the vehicle corresponding to the ADS is connected to other vehicles. Or the probability that the vertical distance of the pedestrian is less than the safety distance, the probability that the vehicle corresponding to the ADS violates the traffic light indication, the probability that the vehicle corresponding to the ADS violates the traffic sign indication, the probability that the vehicle corresponding to the ADS violates the traffic police command and any one or more of the probabilities of the vehicle speeding corresponding to the ADS;所述覆盖参数包括所述ADS对应的神经网络中激活的神经元的数量和所述ADS对应的神经网络的热力图中输入子集对预测结果的影响的权重中的任一种或多种。The coverage parameter includes any one or more of the number of activated neurons in the neural network corresponding to the ADS and the weight of the influence of the input subset on the prediction result in the heatmap of the neural network corresponding to the ADS.
- 如权利要求1~3任一所述的方法,其特征在于,基于所述反馈信息对所述参考自动驾驶场景进行更新,包括:The method according to any one of claims 1 to 3, wherein updating the reference automatic driving scene based on the feedback information comprises:获取强化学习智能体基于所述反馈信息从所述参考自动驾驶场景的动作空间中选择的动作;其中,所述动作空间中的动作为自动驾驶场景中的道路拓扑、道路降解、动态时间、动态天气、动态交通或景观信息的离散或连续更新;Acquiring actions selected by the reinforcement learning agent from the action space of the reference autonomous driving scene based on the feedback information; wherein the actions in the action space are road topology, road degradation, dynamic time, dynamic time in the autonomous driving scene Discrete or continuous updates of weather, dynamic traffic or landscape information;基于所述选择的动作对所述参考驾驶场景进行更新。The reference driving scenario is updated based on the selected action.
- 如权利要求4所述的方法,其特征在于,所述强化学习智能体为基于反馈信息确定奖励和当前所述ADS对应的车辆的状态,并基于所述奖励和当前所述ADS对应的车辆的状态从所述动作空间中选择动作的智能体;The method of claim 4, wherein the reinforcement learning agent determines the reward based on the feedback information and the state of the vehicle corresponding to the current ADS, and based on the reward and the current state of the vehicle corresponding to the ADS Agents whose states select actions from said action space;其中,所述奖励为基于安全违规的奖励和基于覆盖的奖励之和,所述基于安全违规的奖励为基于所述反馈信息中的车辆控制指令更新所述ADS对应的车辆的状态后所述ADS对应的车辆的安全违规概率接近预设指标的接近程度,所述基于覆盖的奖励为基于所述反馈信息中的神经网络行为信息确定的所述参考自动驾驶场景的覆盖率接近预设指标的接 近程度,所述ADS对应的车辆的状态用于指示所述ADS对应的车辆在所述参考自动驾驶场景中的位置。Wherein, the reward is the sum of the reward based on safety violation and the reward based on coverage, and the reward based on safety violation is the ADS after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information The degree to which the safety violation probability of the corresponding vehicle is close to the preset index, and the coverage-based reward is the closeness that the coverage rate of the reference automatic driving scene determined based on the neural network behavior information in the feedback information is close to the preset index The state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scene.
- 如权利要求5所述的方法,其特征在于,所述强化学习智能体的神经网络模型包括价值网络和策略网络,所述价值网络用于计算设定状态下的设定动作的价值,所述策略网络用于获取设定状态下的动作概率分布;获取强化学习智能体基于所述反馈信息从所述参考自动驾驶场景的动作空间中选择的动作之前,还包括:The method of claim 5, wherein the neural network model of the reinforcement learning agent comprises a value network and a policy network, the value network is used to calculate the value of a set action in a set state, the The policy network is used to obtain the action probability distribution in the set state; before obtaining the action selected by the reinforcement learning agent from the action space of the reference automatic driving scene based on the feedback information, the method further includes:获取当前所述ADS对应的车辆的第一状态,以及所述强化学习智能体基于所述策略网络从所述动作空间中选择的第一动作;obtaining the first state of the vehicle corresponding to the current ADS, and the first action selected by the reinforcement learning agent from the action space based on the policy network;基于所述第一动作对所述参考自动驾驶场景进行更新,并获取所述ADS在所述更新后的参考自动驾驶场景下进行测试时输出的反馈信息;updating the reference automatic driving scene based on the first action, and acquiring feedback information output by the ADS when testing in the updated reference automatic driving scene;获取所述强化学习智能体基于所述反馈信息确定的所述第一动作的奖励和当前所述ADS对应的车辆的第二状态,以及基于所述策略网络从所述动作空间中选择的第二动作;Obtain the reward of the first action determined by the reinforcement learning agent based on the feedback information and the current second state of the vehicle corresponding to the ADS, and the second state selected from the action space based on the policy network. action;基于所述价值网络计算所述第一状态下的第一动作的第一价值和所述第二状态下的第二动作的第二价值;computing a first value for a first action in the first state and a second value for a second action in the second state based on the value network;基于所述第一价值、所述第二价值以及所述第一动作的奖励,确定时间差分误差;其中,所述时间差分误差为所述价值网络预测的价值和实际的价值之差;Based on the first value, the second value and the reward of the first action, a time difference error is determined; wherein, the time difference error is the difference between the value predicted by the value network and the actual value;获取所述价值网络和所述策略网络的梯度,并基于所述时间差分误差和所述价值网络的梯度更新所述价值网络的参数,基于所述时间差分误差和所述策略网络的梯度更新所述策略网络的参数。Obtain the gradients of the value network and the strategy network, and update the parameters of the value network based on the time difference error and the gradient of the value network, and update the parameters based on the time difference error and the gradient of the strategy network. Describe the parameters of the policy network.
- 一种自动驾驶场景的生成装置,其特征在于,包括:A device for generating automatic driving scenarios, comprising:第一获取模块,用于获取自动驾驶系统ADS在参考自动驾驶场景下进行测试时输出的反馈信息;其中,所述反馈信息包括车辆控制指令和神经网络行为信息;a first acquisition module, configured to acquire feedback information output by the automatic driving system ADS when it is tested in a reference automatic driving scenario; wherein the feedback information includes vehicle control instructions and neural network behavior information;第二获取模块,用于基于所述反馈信息获取安全违规参数和覆盖参数;其中,所述安全违规参数用于指示在所述参考自动驾驶场景中所述ADS对应的车辆根据所述车辆控制指令行驶时安全违规的概率,所述覆盖参数用于指示在所述参考自动驾驶场景下进行测试时所述ADS对应的神经网络中激活的神经元和/或层级相关性;The second obtaining module is configured to obtain safety violation parameters and coverage parameters based on the feedback information; wherein the safety violation parameters are used to indicate that the vehicle corresponding to the ADS in the reference automatic driving scenario is based on the vehicle control instruction The probability of safety violation while driving, the coverage parameter is used to indicate the activated neurons and/or hierarchical correlations in the neural network corresponding to the ADS when testing in the reference autonomous driving scenario;更新模块,用于若所述安全违规参数或所述覆盖参数不满足预设指标,则基于所述反馈信息对所述参考自动驾驶场景进行更新得到更新后的参考自动驾驶场景,直至基于所述ADS在所述更新后的参考自动驾驶场景下进行测试时输出的反馈信息获取的安全违规参数和覆盖参数满足所述预设指标。an update module, configured to update the reference automatic driving scene based on the feedback information to obtain an updated reference automatic driving scene if the safety violation parameter or the coverage parameter does not meet the preset index, until the reference automatic driving scene is updated based on the feedback information. The safety violation parameter and the coverage parameter obtained from the feedback information output by the ADS during the test in the updated reference automatic driving scenario satisfy the preset index.
- 如权利要求7所述的装置,其特征在于,所述装置还包括创建模块,用于:The apparatus of claim 7, wherein the apparatus further comprises a creation module for:获取所述ADS的初始自动驾驶场景;obtaining the initial automatic driving scenario of the ADS;分析所述初始自动驾驶场景中的道路类型以及所述ADS对应的车辆在所述初始自动驾驶场景中行驶时安全违规的概率,基于分析结果,创建所述参考自动驾驶场景;其中,所述参考自动驾驶场景包括典型自动驾驶场景、缺失自动驾驶场景和易违规自动驾驶场景中的任一种或多种。Analyze the road type in the initial automatic driving scene and the probability of safety violations when the vehicle corresponding to the ADS is driving in the initial automatic driving scene, and create the reference automatic driving scene based on the analysis result; wherein, the reference The autonomous driving scenarios include any one or more of typical autonomous driving scenarios, missing autonomous driving scenarios, and autonomous driving scenarios that are prone to violations.
- 如权利要求7或8所述的装置,其特征在于,所述安全违规参数包括所述ADS对应的车辆与其他车辆或行人的平行距离小于安全距离的概率,所述ADS对应的车辆与其他车辆或行人的垂直距离小于安全距离的概率,所述ADS对应的车辆违反交通灯指示的概率,所述ADS对应的车辆违反交通标志指示的概率,所述ADS对应的车辆违反交通警察 指挥的概率和所述ADS对应的车辆超速的概率中的任一种或多种;The device according to claim 7 or 8, wherein the safety violation parameter includes a probability that the parallel distance between the vehicle corresponding to the ADS and other vehicles or pedestrians is less than a safe distance, and the vehicle corresponding to the ADS is connected to other vehicles. Or the probability that the vertical distance of the pedestrian is less than the safety distance, the probability that the vehicle corresponding to the ADS violates the traffic light indication, the probability that the vehicle corresponding to the ADS violates the traffic sign indication, the probability that the vehicle corresponding to the ADS violates the traffic police command and any one or more of the probabilities of the vehicle speeding corresponding to the ADS;所述覆盖参数包括所述ADS对应的神经网络中激活的神经元的数量和所述ADS对应的神经网络的热力图中输入子集对预测结果的影响的权重中的任一种或多种。The coverage parameter includes any one or more of the number of activated neurons in the neural network corresponding to the ADS and the weight of the influence of the input subset on the prediction result in the heatmap of the neural network corresponding to the ADS.
- 如权利要求7~9任一所述的装置,其特征在于,所述更新模块,具体用于:The device according to any one of claims 7 to 9, wherein the update module is specifically configured to:获取强化学习智能体基于所述反馈信息从所述参考自动驾驶场景的动作空间中选择的动作;其中,所述动作空间中的动作为自动驾驶场景中的道路拓扑、道路降解、动态时间、动态天气、动态交通或景观信息的离散或连续更新;Acquiring actions selected by the reinforcement learning agent from the action space of the reference autonomous driving scene based on the feedback information; wherein the actions in the action space are road topology, road degradation, dynamic time, dynamic time in the autonomous driving scene Discrete or continuous updates of weather, dynamic traffic or landscape information;基于所述选择的动作对所述参考驾驶场景进行更新。The reference driving scenario is updated based on the selected action.
- 如权利要求10所述的装置,其特征在于,所述强化学习智能体为基于反馈信息确定奖励和当前所述ADS对应的车辆的状态,并基于所述奖励和当前所述ADS对应的车辆的状态从所述动作空间中选择动作的智能体;The device of claim 10, wherein the reinforcement learning agent determines the reward based on the feedback information and the state of the vehicle corresponding to the current ADS, and based on the reward and the current state of the vehicle corresponding to the ADS Agents whose states select actions from said action space;其中,所述奖励为基于安全违规的奖励和基于覆盖的奖励之和,所述基于安全违规的奖励为基于所述反馈信息中的车辆控制指令更新所述ADS对应的车辆的状态后所述ADS对应的车辆的安全违规概率接近预设指标的接近程度,所述基于覆盖的奖励为基于所述反馈信息中的神经网络行为信息确定的所述参考自动驾驶场景的覆盖率接近预设指标的接近程度,所述ADS对应的车辆的状态用于指示所述ADS对应的车辆在所述参考自动驾驶场景中的位置。Wherein, the reward is the sum of the reward based on safety violation and the reward based on coverage, and the reward based on safety violation is the ADS after the state of the vehicle corresponding to the ADS is updated based on the vehicle control instruction in the feedback information The degree to which the safety violation probability of the corresponding vehicle is close to the preset index, and the coverage-based reward is the closeness that the coverage rate of the reference automatic driving scene determined based on the neural network behavior information in the feedback information is close to the preset index The state of the vehicle corresponding to the ADS is used to indicate the position of the vehicle corresponding to the ADS in the reference automatic driving scene.
- 如权利要求11所述的装置,其特征在于,所述强化学习智能体的神经网络模型包括价值网络和策略网络,所述价值网络用于计算设定状态下的设定动作的价值,所述策略网络用于获取设定状态下的动作概率分布;所述装置还包括训练模块,用于:The device according to claim 11, wherein the neural network model of the reinforcement learning agent comprises a value network and a policy network, the value network is used to calculate the value of a set action in a set state, the The strategy network is used to obtain the action probability distribution under the set state; the device further includes a training module for:获取当前所述ADS对应的车辆的第一状态,以及所述强化学习智能体基于所述策略网络从所述动作空间中选择的第一动作;obtaining the first state of the vehicle corresponding to the current ADS, and the first action selected by the reinforcement learning agent from the action space based on the policy network;基于所述第一动作对所述参考自动驾驶场景进行更新,并获取所述ADS在所述更新后的参考自动驾驶场景下进行测试时输出的反馈信息;updating the reference automatic driving scene based on the first action, and acquiring feedback information output by the ADS when testing in the updated reference automatic driving scene;获取所述强化学习智能体基于所述反馈信息确定的所述第一动作的奖励和当前所述ADS对应的车辆的第二状态,以及基于所述策略网络从所述动作空间中选择的第二动作;Obtain the reward of the first action determined by the reinforcement learning agent based on the feedback information and the current second state of the vehicle corresponding to the ADS, and the second state selected from the action space based on the policy network. action;基于所述价值网络计算所述第一状态下的第一动作的第一价值和所述第二状态下的第二动作的第二价值;computing a first value for a first action in the first state and a second value for a second action in the second state based on the value network;基于所述第一价值、所述第二价值以及所述第一动作的奖励,确定时间差分误差;其中,所述时间差分误差为所述价值网络预测的价值和实际的价值之差;Based on the first value, the second value and the reward of the first action, a time difference error is determined; wherein, the time difference error is the difference between the value predicted by the value network and the actual value;获取所述价值网络和所述策略网络的梯度,并基于所述时间差分误差和所述价值网络的梯度更新所述价值网络的参数,基于所述时间差分误差和所述策略网络的梯度更新所述策略网络的参数。Obtain the gradients of the value network and the strategy network, and update the parameters of the value network based on the time difference error and the gradient of the value network, and update the parameters based on the time difference error and the gradient of the strategy network. Describe the parameters of the policy network.
- 一种自动驾驶场景的生成系统,其特征在于,所述系统包括存储器和处理器;所述存储器,用于存储计算机指令;所述处理器,用于调用所述存储器存储的计算机指令,以执行如权利要求1-6中任一项所述的自动驾驶场景的生成方法。A system for generating automatic driving scenarios, characterized in that the system includes a memory and a processor; the memory is used to store computer instructions; the processor is used to call the computer instructions stored in the memory to execute The method for generating an automatic driving scene according to any one of claims 1-6.
- A computer storage medium, comprising computer instructions which, when run on a computer, cause the computer to perform the method for generating an automated driving scenario according to any one of claims 1 to 6.
- A computer program product which, when run on a computer, causes the computer to perform the method for generating an automated driving scenario according to any one of claims 1 to 6.
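The combined reward described in claim 11 can be illustrated with a minimal Python sketch. The `closeness` helper, the variable names `violation_prob` and `coverage`, and the preset targets are assumptions introduced here for exposition only; they are not taken from the patent disclosure.

```python
# Hypothetical sketch of the combined reward described in claim 11.
# The closeness metric, variable names and preset targets are illustrative
# assumptions, not taken from the patent disclosure.

def closeness(value: float, target: float) -> float:
    """Score in (0, 1] that grows as `value` approaches `target`."""
    return 1.0 / (1.0 + abs(target - value))

def combined_reward(violation_prob: float,
                    coverage: float,
                    violation_target: float = 1.0,
                    coverage_target: float = 1.0) -> float:
    """Sum of a safety-violation-based reward and a coverage-based reward.

    violation_prob: safety violation probability of the ADS vehicle after its
        state is updated from the vehicle control instruction in the feedback.
    coverage: scenario coverage derived from the neural network behavior
        information in the feedback (for example, neuron coverage).
    """
    return closeness(violation_prob, violation_target) + \
           closeness(coverage, coverage_target)

# Example: feedback suggesting a 0.3 violation probability and 0.6 coverage
# yields a reward of about 1.30 (0.59 + 0.71).
print(combined_reward(violation_prob=0.3, coverage=0.6))
```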
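The training steps in claim 12 follow a one-step actor-critic pattern: sample an action from the policy network, score state-action pairs with the value network, form a temporal-difference error, and use it to update both networks. The following PyTorch sketch is a minimal illustration under assumed dimensions and a dummy environment interface; the network architectures, the discount factor `GAMMA`, and the example inputs are assumptions, not part of the claimed method.

```python
# Minimal one-step actor-critic sketch of the training module in claim 12.
# Network sizes, GAMMA and the dummy inputs below are illustrative assumptions.
import torch
import torch.nn as nn

STATE_DIM, NUM_ACTIONS, GAMMA = 4, 5, 0.99

# Policy network: maps a state to action logits (an action probability distribution).
policy_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                           nn.Linear(32, NUM_ACTIONS))
# Value network: maps a (state, one-hot action) pair to its value.
value_net = nn.Sequential(nn.Linear(STATE_DIM + NUM_ACTIONS, 32), nn.ReLU(),
                          nn.Linear(32, 1))
policy_opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
value_opt = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def one_hot(action: int) -> torch.Tensor:
    return nn.functional.one_hot(torch.tensor(action), NUM_ACTIONS).float()

def select_action(state: torch.Tensor):
    """Sample an action from the policy network's probability distribution."""
    dist = torch.distributions.Categorical(logits=policy_net(state))
    action = dist.sample()
    return action.item(), dist.log_prob(action)

def train_step(s1, a1, log_p1, reward, s2, a2):
    """One update of both networks from a single transition."""
    q1 = value_net(torch.cat([s1, one_hot(a1)]))            # first value
    q2 = value_net(torch.cat([s2, one_hot(a2)])).detach()   # second value
    td_error = reward + GAMMA * q2 - q1                     # predicted vs. actual value

    value_loss = td_error.pow(2).mean()                     # update the value network
    value_opt.zero_grad(); value_loss.backward(); value_opt.step()

    policy_loss = -(td_error.detach() * log_p1).mean()      # update the policy network
    policy_opt.zero_grad(); policy_loss.backward(); policy_opt.step()

# Dummy usage: in the described system the states and reward would come from
# testing the ADS in the updated reference scenario.
s1, s2 = torch.randn(STATE_DIM), torch.randn(STATE_DIM)
a1, log_p1 = select_action(s1)
a2, _ = select_action(s2)
train_step(s1, a1, log_p1, reward=1.3, s2=s2, a2=a2)
```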
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202180000816.2A CN112997128B (en) | 2021-04-19 | 2021-04-19 | Method, device and system for generating automatic driving scene |
PCT/CN2021/088037 WO2022221979A1 (en) | 2021-04-19 | 2021-04-19 | Automated driving scenario generation method, apparatus, and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/088037 WO2022221979A1 (en) | 2021-04-19 | 2021-04-19 | Automated driving scenario generation method, apparatus, and system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022221979A1 (en) | 2022-10-27 |
Family
ID=76337132
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/088037 WO2022221979A1 (en) | 2021-04-19 | 2021-04-19 | Automated driving scenario generation method, apparatus, and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112997128B (en) |
WO (1) | WO2022221979A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113485300B (en) * | 2021-07-15 | 2022-10-04 | 南京航空航天大学 | Automatic driving vehicle collision test method based on reinforcement learning |
CN113326639B (en) * | 2021-08-03 | 2021-11-02 | 北京赛目科技有限公司 | Method and device for determining automatic driving test scene and electronic equipment |
CN113609784B (en) * | 2021-08-18 | 2024-03-22 | 清华大学 | Traffic limit scene generation method, system, equipment and storage medium |
CN113987751A (en) * | 2021-09-27 | 2022-01-28 | 蜂巢智能转向系统(江苏)有限公司保定分公司 | Scheme screening method and device, electronic equipment and storage medium |
CN114139342A (en) * | 2021-10-20 | 2022-03-04 | 武汉光庭信息技术股份有限公司 | Reinforced learning automatic driving test method and system |
CN113867367B (en) * | 2021-11-30 | 2022-02-22 | 腾讯科技(深圳)有限公司 | Processing method and device for test scene and computer program product |
EP4443257A1 (en) * | 2022-01-13 | 2024-10-09 | Huawei Technologies Co., Ltd. | Test method and apparatus |
WO2023137727A1 (en) * | 2022-01-21 | 2023-07-27 | 华为技术有限公司 | Method and apparatus for controlling intelligent driving function or system |
CN115392438B (en) * | 2022-09-14 | 2023-07-07 | 吉林建筑大学 | Deep reinforcement learning algorithm, equipment and storage medium based on multi-Agent environment |
CN115900725B (en) * | 2023-01-06 | 2023-06-16 | 阿里巴巴达摩院(杭州)科技有限公司 | Path planning device, electronic equipment, storage medium and related method |
CN117718973A (en) * | 2024-02-08 | 2024-03-19 | 国机传感科技有限公司 | Robot discrete control system and method based on axial acceleration |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902899B (en) * | 2017-12-11 | 2020-03-10 | 百度在线网络技术(北京)有限公司 | Information generation method and device |
US20190361454A1 (en) * | 2018-05-24 | 2019-11-28 | GM Global Technology Operations LLC | Control systems, control methods and controllers for an autonomous vehicle |
US10955842B2 (en) * | 2018-05-24 | 2021-03-23 | GM Global Technology Operations LLC | Control systems, control methods and controllers for an autonomous vehicle |
CN108791302B (en) * | 2018-06-25 | 2020-05-19 | 大连大学 | Driver behavior modeling system |
US11036232B2 (en) * | 2018-09-14 | 2021-06-15 | Huawei Technologies Co., Ltd | Iterative generation of adversarial scenarios |
US11157006B2 (en) * | 2019-01-10 | 2021-10-26 | International Business Machines Corporation | Training and testing automated driving models |
US11899748B2 (en) * | 2019-09-06 | 2024-02-13 | Volkswagen Aktiengesellschaft | System, method, and apparatus for a neural network model for a vehicle |
CN111122175B (en) * | 2020-01-02 | 2022-02-25 | 阿波罗智能技术(北京)有限公司 | Method and device for testing automatic driving system |
CN111444604B (en) * | 2020-03-24 | 2023-09-15 | 上海汽车集团股份有限公司 | Virtual test scene detection method and device |
CN111625457A (en) * | 2020-05-27 | 2020-09-04 | 多伦科技股份有限公司 | Virtual automatic driving test optimization method based on improved DQN algorithm |
CN112256590B (en) * | 2020-11-12 | 2022-04-29 | 腾讯科技(深圳)有限公司 | Virtual scene effectiveness judgment method and device and automatic driving system |
CN112784485B (en) * | 2021-01-21 | 2021-09-10 | 中国科学院软件研究所 | Automatic driving key scene generation method based on reinforcement learning |
CN113158560B (en) * | 2021-04-09 | 2024-02-09 | 中国科学院合肥物质科学研究院 | Intelligent driving vehicle autonomous capability test method based on scene opposition |
CN113609016B (en) * | 2021-08-05 | 2024-03-15 | 北京赛目科技股份有限公司 | Method, device, equipment and medium for constructing automatic driving test scene of vehicle |
2021
- 2021-04-19 CN CN202180000816.2A patent/CN112997128B/en active Active
- 2021-04-19 WO PCT/CN2021/088037 patent/WO2022221979A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110378460A (en) * | 2018-04-13 | 2019-10-25 | 北京智行者科技有限公司 | Decision-making technique |
CN111091739A (en) * | 2018-10-24 | 2020-05-01 | 百度在线网络技术(北京)有限公司 | Automatic driving scene generation method and device and storage medium |
CN110597086A (en) * | 2019-08-19 | 2019-12-20 | 深圳元戎启行科技有限公司 | Simulation scene generation method and unmanned system test method |
WO2021032715A1 (en) * | 2019-08-21 | 2021-02-25 | Dspace Digital Signal Processing And Control Engineering Gmbh | Computer implemented method and test unit for approximating a subset of test results |
CN111950726A (en) * | 2020-07-09 | 2020-11-17 | 华为技术有限公司 | Decision method based on multi-task learning, decision model training method and device |
CN112130472A (en) * | 2020-10-14 | 2020-12-25 | 广州小鹏自动驾驶科技有限公司 | Automatic driving simulation test system and method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116824207A (en) * | 2023-04-27 | 2023-09-29 | 国科赛赋河北医药技术有限公司 | Multidimensional pathological image classification and early warning method based on reinforcement learning mode |
CN116824207B (en) * | 2023-04-27 | 2024-04-12 | 国科赛赋河北医药技术有限公司 | Multidimensional pathological image classification and early warning method based on reinforcement learning mode |
CN118574288A (en) * | 2024-07-31 | 2024-08-30 | 国网湖北省电力有限公司电力科学研究院 | Lighting parameter detection method and device in night construction lighting |
Also Published As
Publication number | Publication date |
---|---|
CN112997128B (en) | 2022-08-26 |
CN112997128A (en) | 2021-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022221979A1 (en) | Automated driving scenario generation method, apparatus, and system | |
WO2022052406A1 (en) | Automatic driving training method, apparatus and device, and medium | |
CN112703459B (en) | Iterative generation of confrontational scenarios | |
US10832140B2 (en) | Method and device for providing information for evaluating driving habits of driver by detecting driving scenarios occurring during driving | |
Koren et al. | Efficient autonomy validation in simulation with adaptive stress testing | |
Makantasis et al. | Deep reinforcement‐learning‐based driving policy for autonomous road vehicles | |
Guo et al. | DRL-TP3: A learning and control framework for signalized intersections with mixed connected automated traffic | |
CN110686906B (en) | Automatic driving test method and device for vehicle | |
Rahmati et al. | Helping automated vehicles with left-turn maneuvers: A game theory-based decision framework for conflicting maneuvers at intersections | |
JP7520444B2 (en) | Vehicle-based data processing method, data processing device, computer device, and computer program | |
Crosato et al. | Human-centric autonomous driving in an av-pedestrian interactive environment using svo | |
Johnson et al. | Experimental Evaluation and Formal Analysis of High‐Level Tasks with Dynamic Obstacle Anticipation on a Full‐Sized Autonomous Vehicle | |
Makantasis et al. | A deep reinforcement learning driving policy for autonomous road vehicles | |
Song et al. | Identifying critical test scenarios for lane keeping assistance system using analytic hierarchy process and hierarchical clustering | |
Irshayyid et al. | A Review on Reinforcement Learning-based Highway Autonomous Vehicle Control | |
Youssef et al. | Deep reinforcement learning with external control: Self-driving car application | |
Lu et al. | DeepQTest: Testing Autonomous Driving Systems with Reinforcement Learning and Real-world Weather Data | |
Shoman et al. | Autonomous Vehicle–Pedestrian Interaction Modeling Platform: A Case Study in Four Major Cities | |
Mohammed et al. | Reinforcement learning and deep neural network for autonomous driving | |
Zhang et al. | Situation analysis and adaptive risk assessment for intersection safety systems in advanced assisted driving | |
Caballero et al. | Some statistical challenges in automated driving systems | |
Gunawardena et al. | Advancing Enhancing Autonomous Vehicle Safety: Integrating Functional Safety and Advanced Path Planning Algorithms for Precise Control | |
Menendez et al. | Detecting and Predicting Smart Car Collisions in Hybrid Environments from Sensor Data | |
Gandy | Automotive sensor fusion systems for traffic aware adaptive cruise control | |
Lv et al. | A Lane‐Changing Decision‐Making Model of Bus Entering considering Bus Priority Based on GRU Neural Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21937232; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21937232; Country of ref document: EP; Kind code of ref document: A1 |