WO2022017307A1 - Automatic driving scene generation method, apparatus, and system - Google Patents

Automatic driving scene generation method, apparatus, and system

Info

Publication number: WO2022017307A1
Application number: PCT/CN2021/107014
Authority: WO (WIPO PCT)
Prior art keywords: driving, vehicle, model, speed, data
Other languages: English (en), French (fr)
Inventors: 邵坤, 王滨, 刘武龙, 陈栋
Original assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
    • B60W30/08Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a method, apparatus, and system for generating automatic driving scenes.
  • Artificial intelligence is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that responds in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
  • Autonomous driving is a mainstream application in the field of artificial intelligence.
  • Autonomous driving technology relies on the cooperation of computer vision, radar, monitoring devices, and global positioning systems to allow motor vehicles to drive autonomously without active human operation.
  • Self-driving vehicles use various computing systems to help transport passengers from one location to another. Some autonomous vehicles may require some initial or continuous input from an operator, such as a pilot, driver, or passenger.
  • An autonomous vehicle permits the operator to switch from a manual mode of operation to an autonomous driving mode, or to a mode in between. Since autonomous driving technology does not require a human to drive the motor vehicle, it can in theory effectively avoid human driving errors, reduce traffic accidents, and improve the efficiency of highway transportation. Autonomous driving technology is therefore receiving more and more attention.
  • Autonomous driving policies (also known as autonomous driving algorithms) usually need to be trained before being applied to autonomous vehicles.
  • For example, an autonomous driving policy can be trained in a set automatic driving scene until the policy meets the requirements.
  • Typically, the automatic driving scene is obtained by using a camera or the like to collect images of a real road-test environment and processing the images to reproduce the scene in a simulation environment.
  • Embodiments of the present application provide a method, device, and system for generating automatic driving scenarios, which are used to obtain relatively comprehensive and diverse automatic driving scenarios.
  • the method for generating an automatic driving scene provided in the embodiments of the present application may be executed by an automatic driving scene generating system.
  • the system for automatic driving scene generation includes a first driving model obtaining unit, a second driving model obtaining unit, a sampling unit and an automatic driving scene generating unit.
  • the first driving model obtaining unit is used for obtaining the first driving model
  • the first driving model is used for outputting the driving strategy.
  • the second driving model obtaining unit is used for modifying the hyperparameters of the first driving model according to the performance index in the automatic driving, so as to obtain the second driving model corresponding to the performance index.
  • the sampling unit is used for sampling the driving data of the automatic driving vehicle in the second driving model corresponding to the performance index.
  • the automatic driving scene generation unit is used to assign obstacle vehicles according to the driving data of the automatic driving vehicle, and generate automatic driving scenes in combination with a preset environment model.
  • the automatic driving scene generation system in the embodiment of the present application may be a single device having the automatic driving scene generation function, or a combination of at least two devices that together form a system with the automatic driving scene generation function.
  • when the automatic driving scene generation system is a combination of at least two devices, the devices may communicate with each other via Bluetooth, a wired connection, or wireless transmission.
  • the automatic driving scene generation system in the embodiment of the present application may be installed on a mobile device, such as a vehicle, for the vehicle to generate automatic driving scenes.
  • the automatic driving scene generation system can also be installed on a fixed device, for example, a server, a terminal device, or another device, to generate automatic driving scenes.
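  • As an illustration of the division of labor among these units, the following is a minimal Python skeleton of such a system; all class, method, and field names are illustrative assumptions, not an interface defined by the present application.

```python
# A minimal skeleton of the four units described above. All names here are
# illustrative assumptions, not an API defined by the present application.
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class DrivingData:
    positions: List[tuple] = field(default_factory=list)   # per-step (x, y) of the ego vehicle
    speeds: List[float] = field(default_factory=list)      # per-step speed in m/s
    directions: List[float] = field(default_factory=list)  # per-step heading in radians

class ScenarioGenerationSystem:
    def __init__(self, env_model: Any):
        self.env_model = env_model  # preset environment model (e.g., road layout)

    def obtain_first_model(self, logged_data: Any) -> Any:
        """First driving model obtaining unit: train a model that outputs a driving strategy."""
        ...

    def obtain_second_models(self, first_model: Any, indices: List[str]) -> Dict[str, Any]:
        """Second driving model obtaining unit: modify hyperparameters per performance index."""
        ...

    def sample_driving_data(self, second_model: Any) -> DrivingData:
        """Sampling unit: roll out the second model and record the ego vehicle's driving data."""
        ...

    def generate_scene(self, driving_data: DrivingData) -> Any:
        """Scene generation unit: assign the sampled driving data to obstacle vehicles
        and combine them with the preset environment model."""
        ...
```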
  • an embodiment of the present application provides a method for generating an automatic driving scene, including:
  • acquiring a first driving model, where the first driving model is used to output a driving strategy of at least one autonomous vehicle; sampling a hyperparameter set of the first driving model and using the sampling results to initialize a plurality of hyperparameters of the first driving model; adjusting some of the hyperparameters of the first driving model according to a performance index in automatic driving to obtain a second driving model corresponding to the performance index; sampling the driving data of the autonomous vehicle in the second driving model corresponding to the performance index; and assigning the driving data of the autonomous vehicle to obstacle vehicles and, in combination with a preset environment model, generating the automatic driving scene.
  • in this way, diversified vehicle driving data related to the performance index can be obtained based on the first driving model; assigning the diversified vehicle driving data to obstacle vehicles and combining it with the environment model yields more comprehensive and diverse automatic driving scenes.
  • acquiring the first driving model includes: acquiring the driving-related data of the first vehicle and the driving-related data of the surrounding vehicles of the first vehicle; inputting the driving-related data of the first vehicle and of its surrounding vehicles into a preset model, where the preset model is used to output the driving strategy of the first vehicle; and adjusting the parameters of the preset model until the driving strategy of the first vehicle output by the preset model meets a preset condition, to obtain the first driving model.
  • in this way, the first driving model can be obtained by training based on the driving-related data of the vehicle.
  • the driving-related data includes one or more of the following: position data, speed data or direction data.
  • an accurate first driving model can be obtained by training according to driving-related data such as position data, speed data and/or direction data.
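  • As one concrete illustration of how such driving-related data might be assembled into a model input, the sketch below concatenates ego and surrounding-vehicle features into a single state vector; the dictionary field names and the relative-position encoding are assumptions made for illustration.

```python
# A sketch of assembling driving-related data into a model input; the field
# names and relative-position encoding are illustrative assumptions.
import numpy as np

def build_state(ego: dict, neighbors: list) -> np.ndarray:
    """Concatenate ego and surrounding-vehicle features into one state vector."""
    feats = [*ego["pos"], ego["speed"], ego["heading"]]
    for nb in neighbors:
        # relative positions keep the state translation-invariant
        feats += [nb["pos"][0] - ego["pos"][0],
                  nb["pos"][1] - ego["pos"][1],
                  nb["speed"], nb["heading"]]
    return np.asarray(feats, dtype=np.float32)

# usage: one ego vehicle and one surrounding (obstacle) vehicle 15 m ahead
state = build_state(
    {"pos": (0.0, 0.0), "speed": 8.0, "heading": 0.0},
    [{"pos": (15.0, 0.0), "speed": 6.0, "heading": 0.0}],
)
```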
  • the reward function of the preset model is related to the distance between the first vehicle and the vehicle in front of the first vehicle, the speed of the first vehicle, and the speed of the vehicle in front of the first vehicle.
  • specifically, the reward function of the preset model is negatively correlated with the distance, negatively correlated with the speed of the first vehicle, and positively correlated with the speed of the preceding vehicle of the first vehicle.
  • the reward function of the preset model is based on the time to collision, which satisfies: ttc = d_front / (v - v_front)
  • where d_front is the distance between the first vehicle and the vehicle in front of the first vehicle, v is the speed of the first vehicle, v_front is the speed of the vehicle in front of the first vehicle, and ttc_target is the first value.
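  • A minimal sketch of one reward consistent with the quantities above, assuming the reward penalizes the deviation of ttc from ttc_target (the reward shape and the 4.0 s target value are assumptions, not values from the present application):

```python
# Hedged sketch of a time-to-collision reward. The present application defines
# ttc = d_front / (v - v_front) and a first value ttc_target; the exact reward
# expression and the 4.0 s default are illustrative assumptions.
def ttc_reward(d_front: float, v: float, v_front: float,
               ttc_target: float = 4.0, eps: float = 1e-6) -> float:
    closing_speed = v - v_front
    if closing_speed <= eps:       # not closing in on the front vehicle
        return 0.0
    ttc = d_front / closing_speed
    return -abs(ttc - ttc_target)  # highest reward when ttc equals the target
```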
  • the reward function of the preset model is related to the speed of the first vehicle.
  • when the speed of the first vehicle is less than 2 meters per second, the reward function of the preset model is positively correlated with the speed of the first vehicle; when the speed of the first vehicle is greater than a first constant, the reward function is negatively correlated with the speed of the first vehicle; when the speed of the first vehicle is greater than or equal to 2 meters per second and less than or equal to the first constant, the reward function is positively correlated with the speed of the first vehicle; the first constant is greater than 2 meters per second.
  • the reward function of the preset model satisfies an expression in which v is the speed of the first vehicle and v_target is a constant.
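  • A sketch of one piecewise reward consistent with the correlations stated above, with v_target standing in for the first constant (its 15 m/s value and the slopes are assumptions, chosen so the pieces join continuously):

```python
# Hedged piecewise speed reward matching the stated correlations: rises with
# speed below 2 m/s and between 2 m/s and v_target, falls above v_target.
def speed_reward(v: float, v_target: float = 15.0) -> float:
    if v < 2.0:                                   # below 2 m/s: rises with speed
        return 0.1 * v
    if v <= v_target:                             # 2 m/s .. v_target: still rises
        return 0.2 + 0.05 * (v - 2.0)
    peak = 0.2 + 0.05 * (v_target - 2.0)          # above v_target: falls with speed
    return peak - 0.1 * (v - v_target)
```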
  • the objective function of the preset model is related to the cumulative reward of the first vehicle in a segment of trajectory.
  • for example, the objective function includes maximizing the expected cumulative reward J(θ) = E[R], where R is the cumulative reward of the first vehicle in a segment of trajectory and θ is the model parameter.
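  • The present application does not specify the optimizer; one standard way to ascend such an objective is a REINFORCE-style policy-gradient step, sketched below (the episode representation and function names are assumptions):

```python
# Hedged sketch: ascend J(theta) = E[R] with the REINFORCE estimator.
import numpy as np

def reinforce_update(theta: np.ndarray, episodes: list, lr: float = 1e-3) -> np.ndarray:
    """`episodes` holds (grad_log_prob_sum, R) pairs, one per trajectory,
    where R is the cumulative reward of that trajectory."""
    grad = np.zeros_like(theta)
    for grad_log_prob_sum, R in episodes:
        grad += grad_log_prob_sum * R      # grad_theta log pi_theta(tau) * R(tau)
    return theta + lr * grad / len(episodes)
```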
  • the number of automatic driving scenes may be multiple; the method then further includes: sorting the multiple automatic driving scenes, and training driving strategies in the sorted automatic driving scenes in turn to obtain a target driving model.
  • specifically, for multiple automatic driving scenes sorted from easy to difficult by driving difficulty, the driving strategy trained in the preceding scene is used as the input to the subsequent scene, and a target driving model is obtained by training in turn, as sketched below.
  • training the automatic driving model in order from easy to difficult scenes realizes progressive training, which saves computing resources compared with training directly in the more difficult scenes.
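  • A minimal sketch of this easy-to-difficult (curriculum-style) loop, assuming hypothetical train_fn and difficulty_fn helpers:

```python
# Hedged sketch of the easy-to-difficult training described above. train_fn and
# difficulty_fn are hypothetical helpers, not APIs from the present application.
def curriculum_train(scenes: list, init_policy, train_fn, difficulty_fn):
    policy = init_policy
    for scene in sorted(scenes, key=difficulty_fn):   # easy scenes first
        policy = train_fn(policy, scene)              # previous policy seeds the next stage
    return policy                                     # the target driving model
```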
  • the performance index includes: a speed index, an acceleration index, or a distance-to-preceding-vehicle index.
  • the population model of the performance index includes one or more of the following: a model that maximizes speed, a model that minimizes speed, a model that maximizes the distance from the preceding vehicle, a model that minimizes the distance from the preceding vehicle, a model that maximizes average acceleration, or a model that minimizes average acceleration.
  • the hyperparameters include one or more of the following: learning rate or batch size.
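  • Taken together with the performance indices and population models above, a sketch of sampling a hyperparameter point and deriving one specialized second driving model per performance index might look as follows (the ranges, index names, and finetune_fn helper are assumptions):

```python
# Hedged sketch tying the last three bullets together: sample a hyperparameter
# point (learning rate, batch size), then derive one second driving model per
# performance index. All names and ranges are illustrative assumptions.
import random

PERFORMANCE_INDICES = ["max_speed", "min_speed", "max_front_gap",
                       "min_front_gap", "max_avg_accel", "min_avg_accel"]

def sample_hyperparameters() -> dict:
    return {"learning_rate": 10 ** random.uniform(-5, -3),
            "batch_size": random.choice([32, 64, 128, 256])}

def derive_second_models(first_model, finetune_fn) -> dict:
    # finetune_fn(model, hparams, index) adjusts some hyperparameters so the
    # model specializes toward the given performance index
    return {idx: finetune_fn(first_model, sample_hyperparameters(), idx)
            for idx in PERFORMANCE_INDICES}
```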
  • the driving-related data is collected from real road test data, and/or the driving-related data is generated by the interaction between the vehicle and the environment in the simulator.
  • the method in the embodiment of the present application may be executed locally or in the cloud, which is not specifically limited in the embodiment of the present application.
  • an embodiment of the present application provides an apparatus for generating an automatic driving scenario, where the apparatus can be used to perform the operations in the first aspect and any possible implementation manner of the first aspect.
  • an apparatus may include modules or units for performing various operations in the above-described first aspect or any possible implementation of the first aspect.
  • it includes a transceiver module and a processing module.
  • the processing module is used for: acquiring a first driving model, where the first driving model is used to output a driving strategy of at least one autonomous vehicle; sampling the hyperparameter set of the first driving model and using the sampling results to initialize a plurality of hyperparameters of the first driving model; adjusting some of the hyperparameters of the first driving model according to the performance index in automatic driving to obtain the second driving model corresponding to the performance index; sampling the driving data of the self-driving vehicle in the second driving model corresponding to the performance index; and assigning obstacle vehicles according to the driving data of the self-driving vehicle and, in combination with the preset environment model, generating the automatic driving scene.
  • the processing module is specifically configured to: obtain the driving-related data of the first vehicle and the driving-related data of the surrounding vehicles of the first vehicle; input the driving-related data of the first vehicle and of its surrounding vehicles into a preset model, where the preset model is used to output the driving strategy of the first vehicle; and adjust the parameters of the preset model until the driving strategy of the first vehicle output by the preset model meets a preset condition, to obtain the first driving model.
  • in this way, the first driving model can be obtained by training based on the driving-related data of the vehicle.
  • the driving-related data includes one or more of the following: position data, speed data or direction data.
  • an accurate first driving model can be obtained by training according to driving-related data such as position data, speed data and/or direction data.
  • the reward function of the preset model is related to the distance between the first vehicle and the vehicle in front of the first vehicle, the speed of the first vehicle, and the speed of the vehicle in front of the first vehicle.
  • specifically, the reward function of the preset model is negatively correlated with the distance, negatively correlated with the speed of the first vehicle, and positively correlated with the speed of the preceding vehicle of the first vehicle.
  • the reward function of the preset model is based on the time to collision, which satisfies: ttc = d_front / (v - v_front)
  • where d_front is the distance between the first vehicle and the vehicle in front of the first vehicle, v is the speed of the first vehicle, v_front is the speed of the vehicle in front of the first vehicle, and ttc_target is the first value.
  • the reward function of the preset model is related to the speed of the first vehicle.
  • when the speed of the first vehicle is less than 2 meters per second, the reward function of the preset model is positively correlated with the speed of the first vehicle; when the speed of the first vehicle is greater than a first constant, the reward function is negatively correlated with the speed of the first vehicle; when the speed of the first vehicle is greater than or equal to 2 meters per second and less than or equal to the first constant, the reward function is positively correlated with the speed of the first vehicle; the first constant is greater than 2 meters per second.
  • the reward function of the preset model satisfies an expression in which v is the speed of the first vehicle and v_target is a constant.
  • the objective function of the preset model is related to the cumulative reward of the first vehicle in a segment of trajectory.
  • for example, the objective function includes maximizing the expected cumulative reward J(θ) = E[R], where R is the cumulative reward of the first vehicle in a segment of trajectory and θ is the model parameter.
  • the number of automatic driving scenes may be multiple; the method then further includes: sorting the multiple automatic driving scenes, and training driving strategies in the sorted automatic driving scenes in turn to obtain a target driving model.
  • the processing module is specifically used for: for multiple automatic driving scenes sorted from easy to difficult by driving difficulty, using the driving strategy trained in the preceding scene as the input to the subsequent scene, and obtaining a target driving model by training in turn.
  • the performance index includes: a speed index, an acceleration index, or a distance-to-preceding-vehicle index.
  • the population model of the performance index includes one or more of the following: a model that maximizes speed, a model that minimizes speed, a model that maximizes the distance from the preceding vehicle, a model that minimizes the distance from the preceding vehicle, a model that maximizes average acceleration, or a model that minimizes average acceleration.
  • the hyperparameters include one or more of the following: learning rate or batch size.
  • the driving-related data is collected from real road test data, and/or the driving-related data is generated by the interaction between the vehicle and the environment in the simulator.
  • an embodiment of the present application provides a chip system, including a processor and, optionally, a memory; the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that an automatic driving scene generating apparatus installed with the chip system executes any method in the first aspect or any possible implementation manner of the first aspect.
  • embodiments of the present application provide a vehicle, including at least one camera, at least one memory, at least one transceiver, and at least one processor.
  • a camera for acquiring at least one image
  • a memory for storing one or more programs and data information; wherein the one or more programs include instructions
  • the processor is used to: obtain a first driving model, where the first driving model is used to output a driving strategy of at least one autonomous vehicle; sample the hyperparameter set of the first driving model and use the sampling results to initialize a plurality of hyperparameters of the first driving model; adjust some hyperparameters of the first driving model according to the performance indicators in automatic driving to obtain the second driving model corresponding to the performance indicators; sample the driving data of the self-driving vehicle in the second driving model corresponding to the performance indicators; and assign obstacle vehicles according to the driving data of the self-driving vehicle and, in combination with the preset environment model, generate the automatic driving scene.
  • the processor in this embodiment of the present application may also perform steps corresponding to the processing module in any possible implementation manner of the second aspect. For details, reference may be made to the description of the second aspect, which will not be repeated here.
  • an embodiment of the present application provides a computer program product, the computer program product including computer program code; when the computer program code is run by a communication module, a processing module, a transceiver, or a processor of an automatic driving scene generation apparatus, the automatic driving scene generation apparatus is caused to perform any method in the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a program, and the program enables the automatic driving scene generation apparatus to execute any method in the first aspect or any possible implementation manner of the first aspect.
  • an embodiment of the present application provides an automatic driving system, including a training device and an execution device; the training device is used to execute any method in the first aspect or any possible implementation manner of the first aspect, and the execution device is used to execute the driving strategy trained on the training device.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a functional block diagram of a vehicle 100 provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of the computer system in FIG. 2;
  • FIG. 4 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an operating environment provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a method for generating an automatic driving scene according to an embodiment of the present application
  • FIG. 7 is a schematic diagram of a model training provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a model training provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of generating an automatic driving scene according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a model training provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a model training provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a method for generating an automatic driving scene provided by an embodiment of the present application
  • FIG. 13 is a schematic structural diagram of an apparatus for generating an automatic driving scene according to an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another automatic driving scene generation apparatus provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a vehicle according to an embodiment of the application.
  • the method, device, and system for generating an automatic driving scenario provided by the embodiments of the present application can be applied to scenarios such as driving strategy planning of an automatic driving vehicle.
  • for example, the method, device, and system for generating an automatic driving scene provided by the embodiments of the present application can be applied to scenario A and scenario B, which are briefly introduced below.
  • in scenario A, a model of the driving strategy (which may also be called a driving algorithm, a control strategy, or a control algorithm) can be obtained from the driving-related data of the self-driving car and surrounding vehicles; the automatic driving scene generation method then generates automatic driving scenes, and a robust target driving strategy can be trained based on the generated scenes.
  • in scenario B, the autonomous vehicle can collect the driving-related data of itself and surrounding vehicles and send this data to other devices in communication with the self-driving vehicle; the other device uses the data to obtain a model for outputting a driving strategy, generates automatic driving scenes according to the automatic driving scene generation method of the embodiments of the present application, and trains a robust target driving strategy; the other device can then send the target driving strategy to the self-driving vehicle for driving control of the self-driving vehicle.
  • simulators are very important for policy learning in autonomous driving; they can provide open-source code and protocols for training and validating autonomous driving policies.
  • however, such an implementation not only requires a lot of manual configuration work, but the driving models of the social vehicles (which may also be called obstacles or obstacle vehicles) are also limited, so driving models trained with this kind of simulator often cannot show sufficient generalization and intelligence in real complex scenes.
  • the embodiments of the present application provide an automatic driving scene generation method, which can automatically generate rich automatic driving scenes, and provides a possibility for training a driving model with strong robustness.
  • the methods of the embodiments of the present application may run on a vehicle provided with a computer system, and executable codes for environment perception, data processing, action selection and/or vehicle control may be stored on a storage component of the computer system. Alternatively, the methods of the embodiments of the present application may also be run on the cloud or the like.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system architecture provided by this embodiment of the present application may include: a training device 01 and an execution device 02 .
  • the training device 01 is used to generate automatic driving scenes and/or train driving strategies according to the method provided by the embodiments of the present application.
  • the execution device 02 is used to determine the target action using the driving strategy trained by the training device 01 according to the method provided by the embodiments of the present application.
  • the execution device 02 can also be used to train the driving strategy in real time, or to train the driving strategy every preset time period.
  • the executing subject of the training method for executing the driving strategy may be the above-mentioned training device 01 , or may be a driving strategy training device in the above-mentioned training device 01 .
  • the driving strategy training device provided in the embodiments of the present application may be implemented by software and/or hardware.
  • the execution body for executing the automatic driving scene generation method may be the foregoing execution device 02 , or may be a device in the foregoing execution device 02 .
  • the apparatus in the execution device 02 provided in the embodiment of the present application may be implemented by software and/or hardware.
  • the training device 01 provided in this embodiment of the present application may include, but is not limited to, a model training platform device.
  • the execution device 02 provided in this embodiment of the present application may include, but is not limited to, an autonomous vehicle, or a control device in an autonomous vehicle.
  • FIG. 2 is a functional block diagram of the vehicle 100 provided by the embodiment of the present application.
  • the vehicle 100 is configured in a fully or partially autonomous driving mode.
  • while in the autonomous driving mode, the vehicle 100 can also determine, through human operation, the current state of the vehicle and its surrounding environment, determine the possible behavior of at least one other vehicle in the surrounding environment, determine a confidence level corresponding to the possibility of the other vehicle performing the possible behavior, and control the vehicle 100 based on the determined information.
  • the vehicle 100 may be set to perform driving-related operations automatically without requiring human interaction.
  • Vehicle 100 may include various subsystems, such as travel system 102 , sensor system 104 , control system 106 , one or more peripherals 108 and power supply 110 , computer system 112 , and user interface 116 .
  • vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements. Additionally, each of the subsystems and elements of the vehicle 100 may be interconnected by wire or wirelessly.
  • the travel system 102 may include components that provide powered motion for the vehicle 100 .
  • travel system 102 may include engine 118 , energy source 119 , transmission 120 , and wheels/tires 121 .
  • the engine 118 may be an internal combustion engine, an electric motor, an air compression engine, or a combination of engine types, such as a hybrid of a gasoline engine and an electric motor, or a hybrid of an internal combustion engine and an air compression engine.
  • Engine 118 converts energy source 119 into mechanical energy.
  • Examples of energy sources 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electricity.
  • the energy source 119 may also provide energy to other systems of the vehicle 100 .
  • Transmission 120 may transmit mechanical power from engine 118 to wheels 121 .
  • Transmission 120 may include a gearbox, a differential, and a driveshaft.
  • transmission 120 may also include other devices, such as clutches.
  • the drive shaft may include one or more axles that may be coupled to one or more wheels 121 .
  • the sensor system 104 may include several sensors that sense information about the environment surrounding the vehicle 100 .
  • the sensor system 104 may include a positioning system 122 (which may be a GPS system, a Beidou system or other positioning system), an inertial measurement unit (IMU) 124, a radar 126, a laser rangefinder 128, and camera 130.
  • the sensor system 104 may also include sensors that monitor the internal systems of the vehicle 100 (e.g., an in-vehicle air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, velocity, etc.). Such detection and identification is a critical function for the safe operation of the autonomous vehicle 100.
  • the positioning system 122 may be used to estimate the geographic location of the vehicle 100 .
  • the IMU 124 is used to sense position and orientation changes of the vehicle 100 based on inertial acceleration.
  • IMU 124 may be a combination of an accelerometer and a gyroscope.
  • Radar 126 may utilize radio signals to sense objects within the surrounding environment of vehicle 100 . In some embodiments, in addition to sensing objects, radar 126 may be used to sense the speed and/or heading of objects.
  • the laser rangefinder 128 may utilize laser light to sense objects in the environment in which the vehicle 100 is located.
  • the laser rangefinder 128 may include one or more laser sources, laser scanners, and one or more detectors, among other system components.
  • Camera 130 may be used to capture multiple images of the surrounding environment of vehicle 100 .
  • Camera 130 may be a still camera or a video camera.
  • Control system 106 controls the operation of the vehicle 100 and its components.
  • Control system 106 may include various elements including steering system 132 , throttle 134 , braking unit 136 , sensor fusion algorithms 138 , computer vision system 140 , route control system 142 , and obstacle avoidance system 144 .
  • the steering system 132 is operable to adjust the heading of the vehicle 100 .
  • it may be a steering wheel system.
  • the throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the vehicle 100 .
  • the braking unit 136 is used to control the deceleration of the vehicle 100 .
  • the braking unit 136 may use friction to slow the wheels 121 .
  • the braking unit 136 may convert the kinetic energy of the wheels 121 into electrical current.
  • the braking unit 136 may also take other forms to slow the wheels 121 to control the speed of the vehicle 100 .
  • Computer vision system 140 may be operable to process and analyze images captured by camera 130 in order to identify objects and/or features in the environment surrounding vehicle 100 .
  • the objects and/or features may include traffic signals, road boundaries and obstacles.
  • Computer vision system 140 may use object recognition algorithms, structure from motion (SFM) algorithms, video tracking, and other computer vision techniques.
  • the computer vision system 140 may be used to map the environment, track objects, estimate the speed of objects, and the like.
  • the route control system 142 is used to determine the travel route of the vehicle 100 .
  • route control system 142 may combine data from sensors 138 , global positioning system (GPS) 122 , and one or more predetermined maps to determine a route for vehicle 100 .
  • the obstacle avoidance system 144 is used to identify, evaluate, and avoid or otherwise traverse potential obstacles in the environment of the vehicle 100 .
  • control system 106 may additionally or alternatively include components other than those shown and described. Alternatively, some of the components shown above may be reduced.
  • Peripherals 108 may include a wireless communication system 146 , an onboard computer 148 , a microphone 150 and/or a speaker 152 .
  • peripherals 108 provide a means for a user of vehicle 100 to interact with user interface 116 .
  • the onboard computer 148 may provide information to the user of the vehicle 100 .
  • User interface 116 may also operate on-board computer 148 to receive user input.
  • the onboard computer 148 can be operated via a touch screen.
  • peripheral devices 108 may provide a means for vehicle 100 to communicate with other devices located within the vehicle.
  • microphone 150 may receive audio (eg, voice commands or other audio input) from a user of vehicle 100 .
  • speakers 152 may output audio to a user of vehicle 100 .
  • Wireless communication system 146 may wirelessly communicate with one or more devices, either directly or via a communication network.
  • wireless communication system 146 may use 3G cellular communications, such as code division multiple access (CDMA), EVDO, or global system for mobile communications (GSM)/general packet radio service (GPRS); 4G cellular communications, such as LTE; or 5G cellular communications.
  • the wireless communication system 146 may utilize wireless-fidelity (WiFi) to communicate with a wireless local area network (WLAN).
  • the wireless communication system 146 may communicate directly with the device using an infrared link, Bluetooth, or ZigBee.
  • other wireless protocols may also be used, such as various vehicle communication systems; for example, the wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may include public and/or private data communications between vehicles and/or roadside stations.
  • the power supply 110 may provide power to various components of the vehicle 100 .
  • the power source 110 may be a rechargeable lithium-ion or lead-acid battery.
  • One or more battery packs of such a battery may be configured as a power source to provide power to various components of the vehicle 100 .
  • power source 110 and energy source 119 may be implemented together, such as in some all-electric vehicles.
  • Computer system 112 may include at least one processor 113 that executes instructions 115 stored in a non-transitory computer-readable medium such as data storage device 114 .
  • Computer system 112 may also be multiple computing devices that control individual components or subsystems of vehicle 100 in a distributed fashion.
  • the processor 113 may be any conventional processor, such as a commercially available central processing unit (CPU). Alternatively, the processor may be a special-purpose device such as an application specific integrated circuit (ASIC) or other hardware-based processor for use in a specific application.
  • although FIG. 2 functionally illustrates the processor, memory, and other elements of the computer system 112 in the same block, one of ordinary skill in the art will understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that are not stored within the same physical enclosure.
  • the memory may be a hard drive or other storage medium located within an enclosure other than a computer.
  • reference to a processor or computer will be understood to include reference to a collection of processors or computers or memories that may or may not operate in parallel.
  • some components, such as the steering and deceleration components, may each have their own processor that only performs computations related to component-specific functions.
  • a processor may be located remotely from the vehicle and in wireless communication with the vehicle. In other aspects, some of the processes described herein are performed on a processor disposed within the vehicle while others are performed by a remote processor, including taking steps necessary to perform a single maneuver.
  • data storage 114 may include instructions 115 (eg, program logic) executable by processor 113 to perform various functions of vehicle 100 , including those described above.
  • Data storage 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or send control commands to one or more of the travel system 102, sensor system 104, control system 106, and peripherals 108.
  • the data storage device 114 may store data such as road maps, route information, the vehicle's position, direction, speed, and other such vehicle data, among other information. Such information may be used by the vehicle 100 and the computer system 112 during operation of the vehicle 100 in autonomous, semi-autonomous and/or manual modes.
  • a user interface 116 for providing information to or receiving information from a user of the vehicle 100 .
  • the user interface 116 may include one or more input/output devices within the set of peripheral devices 108 , such as a wireless communication system 146 , an onboard computer 148 , a microphone 150 and a speaker 152 .
  • Computer system 112 may control functions of vehicle 100 based on input received from various subsystems (eg, travel system 102 , sensor system 104 , and control system 106 ) and from user interface 116 .
  • computer system 112 may utilize input from control system 106 to control steering unit 132 to avoid obstacles detected by sensor system 104 and obstacle avoidance system 144.
  • computer system 112 is operable to provide control of various aspects of vehicle 100 and its subsystems.
  • one or more of these components described above may be installed or associated with the vehicle 100 separately.
  • data storage device 114 may exist partially or completely separate from vehicle 100 .
  • the above-described components may be communicatively coupled together in a wired and/or wireless manner.
  • the above component is just an example.
  • components in each of the above modules may be added or deleted according to actual needs, and FIG. 2 should not be construed as a limitation on the embodiments of the present application.
  • a self-driving car traveling on a road can recognize objects in its surroundings to determine its own adjustment to its current speed.
  • the objects may be other vehicles, traffic control equipment, or other types of objects.
  • each identified obstacle may be considered independently, and the respective characteristics of each obstacle, such as its current speed, acceleration, and distance from the vehicle, may be used to determine the speed adjustment to be made by the autonomous vehicle (ego vehicle).
  • the autonomous vehicle 100, or a computing device associated with the autonomous vehicle 100, may predict the behavior of each identified obstacle based on the characteristics of the obstacle and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.).
  • the identified obstacles may also depend on each other's behavior, so all identified obstacles can be considered together to predict the behavior of a single identified obstacle.
  • the vehicle 100 can adjust its speed based on the predicted behavior of the identified obstacle.
  • the self-driving car can determine what state the vehicle will need to adjust to (eg, accelerate, decelerate, or stop) based on the predicted behavior of the obstacle.
  • other factors may also be considered to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 in the road being traveled, the curvature of the road, the proximity of static and dynamic objects, and the like.
  • the computing device may also provide instructions to modify the steering angle of the vehicle 100 so that the self-driving car follows a given trajectory and/or maintains safe lateral and longitudinal distances from obstacles in the vicinity of the self-driving car (for example, vehicles in adjacent lanes on the road).
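  • A rough sketch of this predict-then-adjust logic, assuming a constant-speed gap predictor and illustrative thresholds (none of these values come from the present application):

```python
# Hedged sketch of the speed-adjustment decision in the preceding bullets.
# Thresholds and the constant-speed predictor are illustrative assumptions.
def predict_gap(ego_speed: float, obstacle: dict, horizon: float = 2.0) -> float:
    """Predicted ego-to-obstacle gap after `horizon` seconds, assuming both
    keep their current speeds (a stand-in for a real behavior predictor)."""
    return obstacle["gap"] + (obstacle["speed"] - ego_speed) * horizon

def choose_speed_action(ego_speed: float, obstacles: list,
                        road_curvature: float = 0.0) -> str:
    worst_gap = min(predict_gap(ego_speed, ob) for ob in obstacles)
    if worst_gap < 5.0:                               # dangerously small gap
        return "stop"
    if worst_gap < 15.0 or road_curvature > 0.1:      # tight gap or sharp curve
        return "decelerate"
    return "accelerate"
```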
  • the above-mentioned vehicle 100 can be a car, a truck, a motorcycle, a bus, a boat, an airplane, a helicopter, a lawn mower, a recreational vehicle, a playground vehicle, construction equipment, a tram, a golf cart, a train, a cart, or the like, which is not particularly limited in the embodiments of the present application.
  • FIG. 3 is a schematic structural diagram of the computer system 112 in FIG. 2 .
  • computer system 112 includes processor 113 coupled to system bus 105 .
  • the processor 113 may be one or more processors, each of which may include one or more processor cores.
  • a video adapter 107 which can drive a display 109, is coupled to the system bus 105.
  • the system bus 105 is coupled to an input-output (I/O) bus through a bus bridge 111 .
  • I/O interface 115 is coupled to the I/O bus.
  • I/O interface 115 communicates with various I/O devices, such as an input device 117 (e.g., keyboard, mouse, touch screen), a media tray 121 (e.g., CD-ROM, multimedia interface), a transceiver 123 (which can send and/or receive radio communication signals), a camera 155 (which can capture still and moving digital video images), and an external USB interface 125.
  • the interface connected to the I/O interface 115 may be a universal serial bus (universal serial bus, USB) interface.
  • the processor 113 may be any conventional processor, including a reduced instruction set computing (“RISC”) processor, a complex instruction set computing (“CISC”) processor, or a combination thereof.
  • the processor may be a special purpose device such as an application specific integrated circuit (“ASIC").
  • the processor 113 may be a neural network processor or a combination of a neural network processor and the above-mentioned conventional processors.
  • the computer system may be located remotely from the autonomous vehicle and may communicate wirelessly with the autonomous vehicle.
  • some of the processes described herein are performed on a processor disposed within the autonomous vehicle, others are performed by a remote processor, including taking actions required to perform a single maneuver.
  • Network interface 129 is a hardware network interface, such as a network card.
  • the network 127 may be an external network, such as the Internet, or an internal network, such as an Ethernet network or a virtual private network (VPN).
  • the network 127 may also be a wireless network, such as a WiFi network, a cellular network, and the like.
  • the hard drive interface 131 is coupled to the system bus 105 .
  • the hard disk drive interface 131 is connected to the hard disk drive 133 .
  • System memory 135 is coupled to system bus 105 .
  • Software running in system memory 135 may include an operating system (OS) 137 and application programs 143 of computer system 112 .
  • the operating system includes a Shell 139 and a kernel 141 .
  • Shell 139 is an interface between the user and the kernel of the operating system.
  • the shell is the outermost layer of the operating system. The shell manages the interaction between the user and the operating system: waiting for user input, interpreting the user's input to the operating system, and processing various operating system outputs.
  • Kernel 141 consists of those parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with hardware, the operating system's kernel 141 typically runs processes and provides inter-process communication, providing CPU time slice management, interrupts, memory management, IO management, and the like.
  • Application program 143 includes programs related to controlling the automatic driving of the car, for example, programs that manage the interaction of the self-driving car with obstacles on the road, programs that control the route or speed of the self-driving car, and programs that control the interaction between the self-driving car and other self-driving cars on the road.
  • Application program 143 also exists on the system of the software deploying server 149; the computer system may download the application program 143 from the deploying server 149 when it needs to be executed.
  • Sensor 153 is associated with a computer system. Sensor 153 is used to detect the environment around computer system 112 .
  • the sensor 153 can detect animals, cars, obstacles, pedestrian crossings, and the like, and can further detect the environment around such objects, for example, other animals appearing around an animal, weather conditions, ambient light levels, and so on.
  • the sensors may be cameras, infrared sensors, chemical detectors, microphones, and the like.
  • FIG. 4 is a schematic diagram of a chip hardware structure according to an embodiment of the present application.
  • the chip may include a neural network processor 40 .
  • the chip may be set in the execution device 02 as shown in FIG. 1 to complete the automatic driving scene generation method provided by the embodiment of the application.
  • the chip can also be set in the training device 01 as shown in FIG. 1 to complete the training method of the control strategy provided by the embodiment of the application.
  • the neural network processor 40 may be a neural network processing unit (NPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or any other processor suitable for large-scale computation. Taking the NPU as an example: the NPU can be mounted on the main CPU (host CPU) as a co-processor, and the host CPU assigns tasks to it.
  • the core part of the NPU is the arithmetic circuit 403, which is controlled by the controller 404 to extract the matrix data in the memory (401 and 402) and perform multiplication and addition operations.
  • the arithmetic circuit 403 includes multiple processing units (process engines, PEs). In some implementations, arithmetic circuit 403 is a two-dimensional systolic array. The arithmetic circuit 403 may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, arithmetic circuit 403 is a general-purpose matrix processor.
  • the arithmetic circuit 403 fetches the weight data of the matrix B from the weight memory 402 and buffers it on each PE in the arithmetic circuit 403 .
  • the arithmetic circuit 403 fetches the input data of the matrix A from the input memory 401 , performs matrix operations according to the input data of the matrix A and the weight data of the matrix B, and stores the partial result or the final result of the matrix in the accumulator 408 .
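  • Functionally, this dataflow computes the matrix product A @ B with partial results collected in the accumulator; the numpy sketch below mirrors that behavior (the tiling granularity is an illustrative assumption, and real hardware streams tiles through the PE array rather than calling numpy):

```python
# Hedged numpy illustration of the dataflow above: weight matrix B stays
# resident, tiles of input matrix A are streamed in, and partial sums are
# collected in an accumulator (the role of accumulator 408).
import numpy as np

def npu_matmul(A: np.ndarray, B: np.ndarray, tile: int = 2) -> np.ndarray:
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    acc = np.zeros((M, N))                               # accumulator
    for k0 in range(0, K, tile):                         # stream tiles of A
        acc += A[:, k0:k0 + tile] @ B[k0:k0 + tile, :]   # partial result
    return acc

A = np.arange(6.0).reshape(2, 3)
B = np.ones((3, 2))
assert np.allclose(npu_matmul(A, B), A @ B)
```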
  • Unified memory 406 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 402 through a storage unit access controller (direct memory access controller, DMAC) 405 .
  • Input data is also moved to unified memory 406 via the DMAC.
  • the bus interface unit (BIU) 410 is used for the interaction between the DMAC and the instruction fetch buffer 409; it is also used for the instruction fetch buffer 409 to obtain instructions from the external memory, and for the storage unit access controller 405 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 406 , or the weight data to the weight memory 402 , or the input data to the input memory 401 .
  • the vector calculation unit 407 has multiple operation processing units, and if necessary, further processes the output of the operation circuit 403, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on.
  • the vector calculation unit 407 is mainly used for the calculation of non-convolutional layers or fully connected layers (FC) in the neural network, and can specifically handle: Pooling (pooling), Normalization (normalization) and other calculations.
  • the vector calculation unit 407 may apply a nonlinear function to the output of the arithmetic circuit 403, such as a vector of accumulated values, to generate activation values.
  • vector computation unit 407 generates normalized values, merged values, or both.
  • vector computation unit 407 stores the processed vectors to unified memory 406 .
  • the vectors processed by the vector computation unit 407 can be used as the activation input to the arithmetic circuit 403 .
  • the instruction fetch memory (instruction fetch buffer) 409 connected to the controller 404 is used to store the instructions used by the controller 404;
  • the unified memory 406, the input memory 401, the weight memory 402 and the instruction fetch memory 409 are all On-Chip memories. External memory is independent of the NPU hardware architecture.
  • FIG. 5 is a schematic diagram of an operating environment provided by an embodiment of the present application.
  • the cloud service center may receive information (such as data collected by vehicle sensors or other information) from autonomous vehicles 510 and 512 within its operating environment 500 via a network 502 (eg, a wireless communication network).
  • the cloud service center 520 may receive, from the autonomous vehicle 510 via the network 502, the driving information of the autonomous vehicle 510 at any time (for example, information such as driving speed and/or driving position), as well as the driving information of other vehicles within the perception range of the autonomous vehicle 510, and so on.
  • the cloud service center 520 can run the stored programs related to controlling the automatic driving of the car, so as to realize the control of the automatic driving vehicle 510 and the automatic driving vehicle 512 .
  • Programs related to controlling autonomous driving of cars can be programs that manage the interaction between autonomous vehicles and obstacles on the road, programs that control the route or speed of autonomous vehicles, and programs that control the interaction between autonomous vehicles and other autonomous vehicles on the road.
  • Network 502 provides portions of the map to autonomous vehicles 510 and 512 .
  • multiple cloud service centers may receive, validate, combine, and/or transmit information reports.
  • Information reports and/or sensor data may also be sent between autonomous vehicles in some examples.
  • the cloud service center 520 may send autonomous vehicles suggested solutions based on possible driving situations within the environment (e.g., informing of an obstacle ahead and how to get around it). For example, the cloud service center 520 may assist the vehicle in determining how to proceed when faced with certain obstacles within the environment.
  • the cloud service center 520 may send a response to the autonomous vehicle indicating how the vehicle should travel in a given scenario. For example, the cloud service center can confirm the presence of a temporary stop sign ahead of the road based on the collected sensor data, and also determine that the lane is closed due to the application based on the "lane closed" sign and sensor data of the construction vehicle. .
  • the cloud service center 520 may send a suggested operating mode for the autonomous vehicle to pass the obstacle (eg, instructing the vehicle to change lanes to another road).
  • a suggested operating mode for the autonomous vehicle eg, instructing the vehicle to change lanes to another road.
  • the cloud service center 520 can observe the video stream within its operating environment and has confirmed that the self-driving vehicle can safely and successfully traverse the obstacle, the operating steps used by the self-driving vehicle can be added to the driving information map. Accordingly, this information can be sent to other vehicles in the area that may encounter the same obstacle in order to assist other vehicles not only in recognizing closed lanes but also knowing how to pass.
  • the autonomous vehicles 510 and/or 512 may autonomously control their driving during operation, without requiring control by the cloud service center 520.
  • FIG. 6 is a schematic diagram of an automatic driving scenario generation method according to an embodiment of the present application.
  • driving data of the vehicle (also referred to as driving-related data) may be obtained, and a first driving model may be obtained by training with a common training method.
  • on the basis of the first driving model, performance indicators in automatic driving can be used to evolve diverse driving models (for example, for each performance indicator, a corresponding driving model is evolved).
  • each of the diverse driving models can include an autonomous vehicle (or host vehicle) and obstacles; for any one of the diverse driving models, the driving algorithm of the autonomous vehicle can be sampled from one or more of the other diverse driving models and assigned to the obstacle vehicles of that driving model, and combining the assigned driving model with the data of an environment model can generate diverse scenarios.
  • a training algorithm (such as a curriculum learning method, etc.) can be used to train a robust driving model in a variety of scenarios.
  • the driving-related data of the vehicle involved in the embodiments of the present application may be collected by a sensor device of the vehicle, or may be data generated by the interaction between the vehicle and the environment during reinforcement learning in the simulator.
  • the driving-related data of the vehicle may include data such as position data, speed data, and direction data of the self-driving vehicle, as well as position data, speed data, and direction data of the vehicles around the self-driving vehicle (which may be called obstacle vehicles).
  • the performance indicators in automatic driving involved in the embodiments of the present application include speed indicators, acceleration indicators, and/or distance-to-preceding-vehicle indicators, and other indicators used to describe vehicle-related performance in automatic driving.
  • the second driving model corresponding to a performance indicator involved in the embodiments of the present application may include one or more of the following: a model that maximizes speed, a model that minimizes speed, a model that maximizes the distance to the preceding vehicle, a model that minimizes the distance to the preceding vehicle, a model that maximizes average acceleration, or a model that minimizes average acceleration.
  • the second driving model corresponding to the speed index includes a model that maximizes speed and/or a model that minimizes speed.
  • the second driving model corresponding to the acceleration index includes a model that maximizes the average acceleration and/or a model that minimizes the average acceleration.
  • the second driving model corresponding to the distance-to-preceding-vehicle indicator includes a model that maximizes the distance to the preceding vehicle and/or a model that minimizes the distance to the preceding vehicle.
  • the first driving model involved in the embodiments of the present application may also be referred to as a reference driving model or the like, and the first driving model may be a model obtained by using a model training method.
  • the first driving model may be a model trained using a data-driven method (eg, imitation learning, reinforcement learning, etc.).
  • FIG. 7 shows a schematic flowchart of training a first driving model according to an embodiment of the present application.
  • a set of hyperparameters (such as the learning rate and batch size) can be initialized, a set of performance indicators of the first driving model (such as speed, acceleration, and distance to the preceding vehicle) can be determined, and the structure and parameters of a preset model (such as a neural network model) can be initialized.
  • the preset model may be a two-layer fully connected model, and the number of neurons in each hidden layer may be 128.
  • the network model structure and parameters can be initialized by sampling from a collection of hyperparameters.
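  • as a minimal sketch of this initialization (the observation/action dimensions and the hyperparameter ranges below are assumptions for illustration, not values given in the text), the preset model and hyperparameter sampling could look as follows in PyTorch:

```python
import random
import torch
import torch.nn as nn

# Illustrative hyperparameter set to sample from (ranges are assumptions).
HYPERPARAM_SET = {"learning_rate": [1e-4, 3e-4, 1e-3], "batch_size": [32, 64, 128]}

def sample_hyperparams() -> dict:
    """Sample one configuration from the hyperparameter set."""
    return {name: random.choice(values) for name, values in HYPERPARAM_SET.items()}

class PresetModel(nn.Module):
    """Two fully connected hidden layers with 128 neurons each, as in the text."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, act_dim),  # e.g., the acceleration decision
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

hp = sample_hyperparams()
model = PresetModel(obs_dim=20, act_dim=1)  # assumed dimensions
optimizer = torch.optim.Adam(model.parameters(), lr=hp["learning_rate"])
```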
  • from the information about the vehicles around the ego vehicle (such as position, speed, and direction) collected by the on-board sensor equipment, the information of the N (N is a natural number) vehicles closest to the ego vehicle is extracted and fused with the state information of the ego vehicle as the (partially observable) input o_t, and the decision action a_t (an acceleration) is obtained for each vehicle.
  • the preset model in the simulator receives a_t, outputs each vehicle's reward function (which may also be referred to as a return function) r_t (a dense return that can include intrinsic incentives), and transitions to a new state.
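  • a minimal sketch of this observation fusion, assuming each vehicle state is a dict of position, speed, and heading (the feature layout and the simulator interface are assumptions):

```python
import numpy as np

def build_observation(ego: dict, vehicles: list, n_nearest: int = 4) -> np.ndarray:
    """Fuse the ego state with the N vehicles closest to it into the
    partially observable input o_t."""
    def features(v: dict) -> list:
        return [v["x"], v["y"], v["speed"], v["heading"]]

    nearest = sorted(
        vehicles,
        key=lambda v: (v["x"] - ego["x"]) ** 2 + (v["y"] - ego["y"]) ** 2,
    )[:n_nearest]
    obs = features(ego) + [f for v in nearest for f in features(v)]
    return np.asarray(obs, dtype=np.float32)

# One interaction step (the simulator API below is an assumed placeholder):
#   a_t = policy(o_t)                  # acceleration decision for each vehicle
#   o_next, r_t = simulator.step(a_t)  # dense reward, transition to new state
```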
  • when the estimated time to collision (TTC) between the ego vehicle and the preceding vehicle is within (0, x), the reward function is related to the distance between the ego vehicle and the preceding vehicle, the speed of the ego vehicle, and the speed of the preceding vehicle.
  • specifically, the reward function is: negatively correlated with the distance between the ego vehicle and the preceding vehicle, negatively correlated with the speed of the ego vehicle, and positively correlated with the speed of the preceding vehicle.
  • the reward function can be:

    r_ttc = max(-1/(ttc/ttc_target)^2, -100)

    where ttc = d_front/(v - v_front); d_front is the distance between the ego vehicle and the vehicle in front (the unit of distance can be meters); v is the speed of the ego vehicle (the unit of speed can be meters per second); v_front is the speed of the vehicle in front; ttc_target can be set according to the actual situation, for example, to 2 seconds by default; and x can likewise be set according to the actual situation, for example, to 2 seconds.
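  • this reward can be transcribed directly into Python; the guard against division by zero when v ≤ v_front is an implementation assumption, not part of the formula:

```python
def ttc_reward(d_front: float, v: float, v_front: float,
               ttc_target: float = 2.0) -> float:
    """r_ttc = max(-1/(ttc/ttc_target)^2, -100), with ttc = d_front/(v - v_front)."""
    closing_speed = max(v - v_front, 1e-6)  # assumed guard for v <= v_front
    ttc = d_front / closing_speed
    return max(-1.0 / (ttc / ttc_target) ** 2, -100.0)

# Example: 10 m behind the leader, closing at 5 m/s -> ttc = 2 s -> reward -1.0
assert abs(ttc_reward(10.0, 15.0, 10.0) + 1.0) < 1e-9
```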
  • when the estimated time to collision between the ego vehicle and the preceding vehicle is greater than x, the reward function is related to the speed of the ego vehicle.
  • when the speed of the ego vehicle is less than 2 meters per second, the reward function is positively correlated with the speed of the ego vehicle; when the speed of the ego vehicle is greater than a first constant, the reward function is negatively correlated with the speed of the ego vehicle; when the speed of the ego vehicle is greater than or equal to 2 meters per second and less than or equal to the first constant, the reward function is positively correlated with the speed of the ego vehicle; the first constant is greater than 2 meters per second.
  • the reward function can be a piecewise function of the speed of the ego vehicle (the exact formula appears in the original publication only as an image); v_target can be set according to the actual situation, for example, to 5 meters per second by default.
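  • since the source gives the formula only as an image, the following is merely one plausible piecewise shape consistent with the stated correlations; the value of the first constant and the exact functional form are assumptions:

```python
def speed_reward(v: float, v_target: float = 5.0,
                 first_constant: float = 8.0) -> float:
    """A hypothetical reconstruction: positively correlated with speed up to
    `first_constant`, negatively correlated above it; v_target = 5 m/s follows
    the text, while first_constant (> 2 m/s) is an assumed value."""
    if v <= first_constant:
        return v / v_target                       # positively correlated
    return (2.0 * first_constant - v) / v_target  # negatively correlated
```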
  • decision trajectories τ = <o_t, a_t, r_t, o_t'>, t = 0..T, of each vehicle can be collected based on the shared-parameter reinforcement learning model in the simulator to optimize the preset model; the objective function of the preset model is related to the cumulative reward of the ego vehicle over a trajectory.
  • in the objective function (which appears in the original publication only as an image), R is the cumulative reward of a trajectory and θ is a parameter of the policy model.
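  • a standard policy-gradient (REINFORCE-style) objective matches this description of maximizing the cumulative reward R of a trajectory under policy parameters θ; the formulation below is an assumption, since the patent's exact formula is given only as an image:

```python
import torch

def reinforce_loss(log_probs: list, cumulative_reward: float) -> torch.Tensor:
    """Surrogate loss whose gradient is the REINFORCE estimate of
    grad J(theta) = E[R * sum_t grad log pi_theta(a_t | o_t)].

    `log_probs` holds the per-step log pi_theta(a_t | o_t) tensors of one
    trajectory; the sign is negative because optimizers minimize."""
    return -cumulative_reward * torch.stack(log_probs).sum()
```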
  • when the value output by the preset model satisfies the objective function, the first driving model can be obtained. For example, when the driving strategy output by the first driving model is a strategy for passing through an intersection, training is terminated once the success rate of the vehicle passing through the intersection reaches a certain threshold, and the first driving model is obtained.
  • the second driving model involved in the embodiment of the present application may be obtained by adjusting the hyperparameters of the first driving model on the basis of the first driving model.
  • FIG. 8 shows a schematic flowchart of training a second driving model according to an embodiment of the present application.
  • a set of performance indicators can be sampled, and a certain number of second driving models (which may also be referred to as population models) can be generated for each performance indicator.
  • when obtaining the second driving models, training can be performed on the basis of the first driving model. For example, if M (M is a natural number) second driving models need to be generated for a certain performance indicator, M copies of the first driving model can be made for that indicator, and a second driving model is then generated from each copy. For example, the intrinsic incentives in the reward function of the first driving model can be removed, retaining only the reward for outputting correct results, and multi-agent reinforcement learning can be used for training.
  • hyperparameters may be sampled from the hyperparameter set according to the requirement of the performance indicator (for example, maximizing or minimizing the indicator), some of the hyperparameters initialized for the first driving model are adjusted, and the first driving model is evolved; when the performance of the evolved model reaches a threshold, a second driving model that is strongly correlated with the corresponding performance indicator (which may be called a diversity driving model) is obtained.
  • performing the step of generating a second driving model for multiple performance indicators produces driving models of different styles, each strongly related to its corresponding performance indicator, such as a model that maximizes speed, a model that minimizes speed, a model that maximizes the distance to the preceding vehicle, a model that minimizes the distance to the preceding vehicle, a model that maximizes average acceleration, and a model that minimizes average acceleration.
  • FIG. 9 shows a schematic diagram of generating the second driving model.
  • multiple populations can be sampled from the set of performance indicators (for example, including performance indicators A1-Ai and N1-Ni), and multiple instances can be sampled from the set of hyperparameters (for example, the hyperparameters corresponding to A1-Ai and N1-Ni); the first driving model (also referred to as the benchmark driving model) is loaded for each performance indicator, the model weights of the first driving model are copied, and exploration randomness is added to the hyperparameters of the first driving model until the adjusted model converges to a strong correlation with the performance indicator, yielding a second driving model, as sketched below.
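  • a minimal sketch of this population evolution, assuming `train_step` and `correlation` are callbacks supplied by the training framework (all names are hypothetical):

```python
import copy
import random

def evolve_population(first_model, hyperparams: dict, indicator: str, m: int,
                      train_step, correlation, threshold: float = 0.9) -> list:
    """Evolve M second driving models from the first (benchmark) driving model."""
    population = []
    for _ in range(m):
        model = copy.deepcopy(first_model)  # copy the model weights
        # exploration randomness (illustrative; integer hyperparameters such as
        # the batch size would be rounded in practice)
        hp = {name: value * random.uniform(0.8, 1.2)
              for name, value in hyperparams.items()}
        # train until the model is strongly correlated with the indicator
        while correlation(model, indicator) < threshold:
            train_step(model, hp, indicator)  # e.g., multi-agent RL updates
        population.append(model)
    return population
```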
  • the second driving model may output a driving strategy (also called a driving algorithm or driving model) corresponding to its performance indicator; assigning the driving strategy of the autonomous vehicle in the second driving model to the obstacle vehicles (which may be called social vehicles), together with a parameterized environment model (such as a model containing road conditions, weather, and other aspects of the driving environment), can generate diverse autonomous driving scenarios.
  • in a possible understanding, driving strategies corresponding to different performance indicators can be assigned to the obstacle vehicles, so that the driving strategies of the obstacle vehicles are diversified without relying on manual coding; obstacle vehicles with such diverse driving strategies can generate rich autonomous driving scenarios.
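  • the scenario assembly can then be sketched as sampling obstacle strategies from the second driving models and pairing them with a parameterized environment; the dictionary layout and parameter space below are illustrative assumptions:

```python
import random

def generate_scenarios(second_models: list, env_space: dict,
                       n_scenarios: int) -> list:
    """Assemble diverse scenarios: each obstacle (social) vehicle receives a
    driving strategy sampled from the second driving models, combined with a
    parameterized environment model."""
    scenarios = []
    for _ in range(n_scenarios):
        scenarios.append({
            "obstacle_policies": [random.choice(second_models)
                                  for _ in range(random.randint(2, 6))],
            "environment": {name: random.choice(values)
                            for name, values in env_space.items()},
        })
    return scenarios

# Example of an assumed parameterized environment space:
env_space = {"road": ["intersection", "highway"], "weather": ["sunny", "rain", "fog"]}
```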
  • a robust target driving model can be obtained by training based on the multiple generated automatic driving scenarios.
  • FIG. 10 shows a schematic diagram of a training target driving model.
  • model training can be carried out for each automatic driving scenario to obtain a driving model suited to that scenario; when the number of automatic driving scenarios to which the driving model is applicable reaches a preset value, a robust target driving model can be considered to have been obtained.
  • multiple autonomous driving scenarios can be sorted, and driving strategies are trained in sequence in the sorted autonomous driving scenarios to obtain a target driving model.
  • the multiple generated autonomous driving scenarios can be sorted by driving difficulty from easy to difficult, the prior driving strategy obtained by training in an earlier sorted scenario is used as the input of the subsequent scenario, and a target driving model is obtained by training in sequence.
  • FIG. 11 shows a schematic diagram of generating a target driving model.
  • the diverse autonomous driving scenarios can be sorted by difficulty, and the autonomous driving model can be trained in the scenarios one by one by means of curriculum learning; when the output of the autonomous driving model meets the conditions in the current scenario, training continues in the next scenario, and after training in scenarios of different difficulty a robust target driving model is obtained.
  • the driving difficulty can be set according to experience, rules or standards.
  • training the autonomous driving model in scenarios ordered from easy to difficult achieves progressive training, which can save computing resources compared with training directly in the more difficult scenarios, as the sketch below illustrates.
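  • the curriculum-learning loop can be sketched as follows, where `difficulty` and `train_until_ok` are assumed callbacks and the policy trained in an earlier scenario initializes training in the next:

```python
def curriculum_train(scenarios: list, initial_policy, difficulty, train_until_ok):
    """Train one target driving model across scenarios sorted easy -> hard."""
    policy = initial_policy
    for scenario in sorted(scenarios, key=difficulty):
        # keep training in this scenario until the model's output meets the
        # condition, then carry the resulting policy into the next scenario
        policy = train_until_ok(policy, scenario)
    return policy  # robust target driving model
```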
  • FIG. 12 is a schematic flowchart of a method for generating an automatic driving scene provided by an embodiment of the present application. As shown in FIG. 12 , the method includes:
  • S1201 Acquire a first driving model, where the first driving model is used to output a driving strategy.
  • the first driving model may be obtained by training on information from multiple vehicles, and therefore the first driving model may be used to output a driving strategy for at least one autonomous vehicle.
  • S1202 According to the performance index in automatic driving, modify the hyperparameters of the first driving model to obtain the second driving model corresponding to the performance index.
  • the hyperparameter set of the first driving model may be sampled, the hyperparameters of multiple first driving models may be initialized using the sampling results, and some of those hyperparameters may be adjusted according to the performance indicators in automatic driving to obtain the second driving models corresponding to the performance indicators.
  • S1203 Sample the driving data of the self-driving vehicle in the second driving model corresponding to the performance indicator.
  • S1204 Assign an obstacle vehicle according to the driving data of the automatic driving vehicle, and generate an automatic driving scene in combination with a preset environment model.
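  • putting S1201-S1204 together, the flow can be sketched at a high level as follows (every helper name is an assumed placeholder, not an API from the original):

```python
def generate_driving_scenes(first_model, hyperparam_set, indicators,
                            sample_hp, evolve, sample_driving_data,
                            assign_to_obstacles, make_environment) -> list:
    """End-to-end sketch of S1201-S1204."""
    scenes = []
    for indicator in indicators:
        hp = sample_hp(hyperparam_set)                 # S1202: sample and adjust
        second_model = evolve(first_model, hp, indicator)
        data = sample_driving_data(second_model)       # S1203: sample driving data
        obstacles = assign_to_obstacles(data)          # S1204: assign obstacles
        scenes.append({"obstacles": obstacles,
                       "environment": make_environment()})
    return scenes
```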
  • in order to implement the above functions, the implementing devices include hardware structures and/or software units corresponding to the respective functions.
  • those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
  • an embodiment of the present application provides an apparatus for generating an automatic driving scene; the apparatus includes a processor 1300, a memory 1301, and a transceiver 1302.
  • the processor 1300 is responsible for managing the bus architecture and general processing, and the memory 1301 may store data used by the processor 1300 when performing operations.
  • the transceiver 1302 is used to receive and send data under the control of the processor 1300, and performs data communication with the memory 1301.
  • the bus architecture may include any number of interconnected buses and bridges; specifically, one or more processors represented by the processor 1300 and various memory circuits represented by the memory 1301 are linked together.
  • the bus architecture may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and, therefore, will not be described further herein.
  • the bus interface provides the interface.
  • each step of the flow of automatic driving scene generation may be completed by an integrated logic circuit of hardware in the processor 1300 or instructions in the form of software.
  • the processor 1300 may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other storage media mature in the art.
  • the storage medium is located in the memory 1301, and the processor 1300 reads the information in the memory 1301, and completes the steps of the signal processing flow in combination with its hardware.
  • the processor 1300 is configured to read the program in the memory 1301 and execute the method flow in S1201-S1204 shown in FIG. 12 .
  • an embodiment of the present application provides an apparatus for generating an automatic driving scene.
  • the apparatus includes a transceiver module 1400 and a processing module 1401 .
  • the transceiver module 1400 is configured to support the processing module 1401 to obtain the first driving model.
  • the processing module 1401 is configured to: obtain a first driving model, where the first driving model is used to output a driving strategy for at least one autonomous vehicle; sample the hyperparameter set of the first driving model and initialize the hyperparameters of multiple first driving models using the sampling results; adjust some of the hyperparameters of the first driving models according to the performance indicators in automatic driving to obtain the second driving models corresponding to the performance indicators; sample the driving data of the autonomous vehicle in the second driving model corresponding to the performance indicator; and assign obstacle vehicles according to the driving data of the autonomous vehicle and, in combination with a preset environment model, generate the automatic driving scene.
  • the processing module is specifically configured to: obtain the driving-related data of a first vehicle and the driving-related data of the vehicles surrounding the first vehicle; input the driving-related data of the first vehicle and of its surrounding vehicles into a preset model; use the preset model to output the driving strategy of the first vehicle; and adjust the parameters of the preset model until the driving strategy of the first vehicle output by the preset model meets a preset condition, thereby obtaining the first driving model. In this way, the first driving model can be obtained by training on the driving-related data of the vehicle.
  • the driving-related data includes one or more of the following: position data, speed data or direction data.
  • an accurate first driving model can be obtained by training according to driving-related data such as position data, speed data and/or direction data.
  • when the estimated collision time between the first vehicle and the vehicle in front of the first vehicle is less than a first value, the reward function of the preset model is related to the distance between the first vehicle and the vehicle in front of it, the speed of the first vehicle, and the speed of the vehicle in front of the first vehicle.
  • the reward functions of the preset model are respectively: negatively correlated with the distance, negatively correlated with the speed of the first vehicle, and positively correlated with the speed of the preceding vehicle of the first vehicle.
  • the reward function of the preset model satisfies:

    r_ttc = max(-1/(ttc/ttc_target)^2, -100)

    where ttc = d_front/(v - v_front), d_front is the distance between the first vehicle and the vehicle in front of the first vehicle, v is the speed of the first vehicle, v_front is the speed of the vehicle in front of the first vehicle, and ttc_target is the first value.
  • when the estimated collision time between the first vehicle and the vehicle in front of the first vehicle is greater than or equal to the first value, the reward function of the preset model is related to the speed of the first vehicle.
  • when the speed of the first vehicle is less than 2 meters per second, the reward function of the preset model is positively correlated with the speed of the first vehicle; when the speed of the first vehicle is greater than a first constant, the reward function of the preset model is negatively correlated with the speed of the first vehicle; when the speed of the first vehicle is greater than or equal to 2 meters per second and less than or equal to the first constant, the reward function of the preset model is positively correlated with the speed of the first vehicle; the first constant is greater than 2 meters per second.
  • the reward function of the preset model satisfies a piecewise function of the speed of the first vehicle (which appears in the original publication only as an image), where v is the speed of the first vehicle and v_target is a constant.
  • the objective function of the preset model is related to the cumulative reward of the first vehicle in a trajectory.
  • the objective function (which appears in the original publication only as an image) is a function of R, the cumulative reward of the first vehicle over a segment of trajectory, and of θ, the model parameter.
  • the number of automatic driving scenarios is multiple; the method further includes: sorting the multiple automatic driving scenarios, and sequentially training driving strategies in the sorted automatic driving scenarios to obtain a target driving model.
  • the processing module is specifically used to: for multiple autonomous driving scenarios sorted by driving difficulty from easy to difficult, use the prior driving strategy obtained by training in an earlier scenario as the input of the subsequent scenario, and train in sequence to obtain a target driving model.
  • the performance index includes: a speed index, an acceleration index or a distance index from the preceding vehicle.
  • the population model of the performance indicator includes one or more of the following: a model that maximizes speed, a model that minimizes speed, a model that maximizes the distance to the preceding vehicle, a model that minimizes the distance to the preceding vehicle, a model that maximizes average acceleration, or a model that minimizes average acceleration.
  • the hyperparameters include one or more of the following: learning rate or batch size.
  • the driving-related data is collected from real road test data, and/or the driving-related data is generated by the interaction between the vehicle and the environment in the simulator.
  • the functions of the transceiver module 1400 and the processing module 1401 shown in FIG. 14 may be executed by the processor 1300 running a program in the memory 1301 , or executed by the processor 1300 alone.
  • the present application provides a vehicle; the vehicle includes at least one camera 1501, at least one memory 1502, at least one transceiver 1503, and at least one processor 1504.
  • the camera 1501 is used to acquire at least one image.
  • the memory 1502 is used to store one or more programs and data information; wherein the one or more programs include instructions.
  • the transceiver 1503 is used for data transmission with the communication device in the vehicle and data transmission with the cloud.
  • the processor 1504 is configured to: acquire a first driving model, where the first driving model is used to output a driving strategy for at least one autonomous vehicle; sample the hyperparameter set of the first driving model and initialize the hyperparameters of multiple first driving models using the sampling results; adjust some of the hyperparameters of the first driving models according to the performance indicators in automatic driving to obtain the second driving models corresponding to the performance indicators; sample the driving data of the autonomous vehicle in the second driving model corresponding to the performance indicator; and assign obstacle vehicles according to the driving data of the self-driving vehicle and, in combination with a preset environment model, generate the self-driving scene.
  • various aspects of the method for automatic driving scene generation provided by the embodiments of the present application may also be implemented in the form of a program product, which includes program code; when the program code runs on a computer device, the program code causes the computer device to execute the steps in the method for generating an automatic driving scene according to the various exemplary embodiments of the present application described in this specification.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • the program product for automatic driving scenario generation can adopt a portable compact disk read-only memory (CD-ROM), include program code, and run on a server device.
  • the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by, or in combination with, a communication transmission, apparatus, or device.
  • a readable signal medium may include a propagated data signal in baseband or as part of a carrier wave, carrying readable program code therein. Such propagated data signals may take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a readable signal medium can also be any readable medium, other than a readable storage medium, that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for carrying out the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device.
  • the embodiments of the present application further provide a computing-device-readable storage medium for the method for generating an automatic driving scenario, that is, one whose content is not lost after a power failure.
  • a software program, including program code, is stored in the storage medium; when the program code runs on a computing device and the software program is read and executed by one or more processors, any of the above automatic driving scene generation solutions of the embodiments of the present application can be implemented.
  • the embodiment of the present application also provides an electronic device.
  • the electronic device includes a processing module for supporting the automatic driving scene generation apparatus in performing the steps in the above embodiments, for example, performing the operations of S101 to S102, or other processes of the technology described in the embodiments of this application.
  • the automatic driving scene generating apparatus includes but is not limited to the unit modules listed above.
  • the specific functions that can be implemented by the above functional units also include but are not limited to the functions corresponding to the method steps described in the above examples.
  • for the detailed description of the other units of the electronic device, refer to the detailed description of the corresponding method steps; details are not repeated here in the embodiments of this application.
  • the electronic device involved in the above embodiments may include: a processing module, a storage module and a communication module.
  • the storage module is used to save the program codes and data of the electronic device.
  • the communication module is used to support the communication between the electronic device and other network entities, so as to realize the functions of the electronic device's call, data interaction, Internet access and so on.
  • the processing module is used to control and manage the actions of the electronic device.
  • the processing module may be a processor or a controller.
  • the communication module may be a transceiver, an RF circuit or a communication interface or the like.
  • the storage module may be a memory.
  • the electronic device may further include an input module and a display module.
  • the display module can be a screen or a display.
  • the input module can be a touch screen, a voice input device, or a fingerprint sensor.
  • the present application may also be implemented in hardware and/or software (including firmware, resident software, microcode, etc.). Further, the present application may take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium, for use by or in conjunction with an instruction execution system.
  • a computer-usable or computer-readable medium can be any medium that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device.

Abstract

A method, apparatus, and system for generating an automatic driving scenario, including: acquiring a first driving model, where the first driving model is used to output a driving strategy for at least one autonomous vehicle (100, 510, 512); sampling a hyperparameter set of the first driving model, initializing the hyperparameters of multiple first driving models using the sampling results, and adjusting some of the hyperparameters of the first driving models according to performance indicators in automatic driving to obtain second driving models corresponding to the performance indicators; sampling driving data of the autonomous vehicle (100, 510, 512) in the second driving model corresponding to a performance indicator; and assigning obstacle vehicles according to the driving data of the autonomous vehicle (100, 510, 512) and, in combination with a preset environment model, generating an automatic driving scenario.

Description

向量计算单元407多个运算处理单元,在需要的情况下,对运算电路403的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。向量计算单元407主要用于神经网络中非卷积层,或全连接层(fully connected layers,FC)的计算,具体可以处理:Pooling(池化),Normalization(归一化)等的计算。例如,向量计算单元407可以将非线性函数应用到运算电路403的输出,例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元407生成归一化的值、合并值,或二者均有。
在一些实现中,向量计算单元407将经处理的向量存储到统一存储器406。在一些实现中,经向量计算单元407处理过的向量能够用作运算电路403的激活输入。
控制器404连接的取指存储器(instruction fetch buffer)409,用于存储控制器404使用的指令;
统一存储器406,输入存储器401,权重存储器402以及取指存储器409均为On-Chip存储器。外部存储器独立于该NPU硬件架构。
图5为本申请实施例提供的操作环境示意图。如图5所示,云服务中心可以经网络502(如无线通信网络),从其操作环境500内的自动驾驶车辆510和512接收信息(诸如车辆传感器收集到数据或者其它信息)。
示例性地,云服务中心520可以经网络502(如无线通信网络)从自动驾驶车辆510接收自动驾驶车辆510在任意时刻的行驶信息(例如行驶速度和/或行驶位置等信 息)以及自动驾驶车辆510感知范围内其他车辆的行驶信息等。
云服务中心520根据接收到的信息,可以运行其存储的控制汽车自动驾驶相关的程序,从而实现对自动驾驶车辆510和自动驾驶车辆512的控制。控制汽车自动驾驶相关的程序可以为,管理自动驾驶的汽车和路上障碍物交互的程序,控制自动驾驶汽车路线或者速度的程序,控制自动驾驶汽车和路上其他自动驾驶汽车交互的程序。
网络502将地图的部分提供给自动驾驶车辆510和512。
例如,多个云服务中心可以接收、证实、组合和/或发送信息报告。在一些示例中还可以在自动驾驶车辆之间发送信息报告和/传感器数据。
在一些示例中,云服务中心520可以向自动驾驶车辆(或自动驾驶汽车)发送对于基于环境内可能的驾驶情况所建议的解决方案(如,告知前方障碍物,并告知如何绕开它)。例如,云服务中心520可以辅助车辆确定当面对环境内的特定障碍时如何行进。云服务中心520可以向自动驾驶车辆发送指示该车辆应当在给定场景中如何行进的响应。例如,云服务中心基于收集到的传感器数据,可以确认道路前方具有临时停车标志的存在,并还该车道上基于“车道封闭”标志和施工车辆的传感器数据,确定该车道由于施上而被封闭。相应地,云服务中心520可以发送用于自动驾驶车辆通过障碍的建议操作模式(例如:指示车辆变道另一条道路上)。云服务中心520可以观察其操作环境内的视频流并且已确认自动驾驶车辆能安全并成功地穿过障碍时,对该自动驾驶车辆所使用操作步骤可以被添加到驾驶信息地图中。相应地,这一信息可以发送到该区域内可能遇到相同障碍的其它车辆,以便辅助其它车辆不仅识别出封闭的车道还知道如何通过。
需要说明的是,自动驾驶车辆510和/或512在运行过程中可以自主控制行驶,也可以不需要云服务中心520的控制。
图6为本申请实施例的自动驾驶生成方法的一种示意图。
如图6所示,本申请实施例中可以获取车辆的驾驶数据(或者称为驾驶相关数据或行驶相关数据或者行驶数据等),采用通常的训练方法训练得到第一驾驶模型。在第一驾驶模型的基础上,可以采用自动驾驶中的性能指标,演化得到多样性驾驶模型(例如针对每个性能指标,都演化有对应的驾驶模型)。在多样性驾驶模型中,每个驾驶模型中可以包括自动驾驶车辆(或称为主车)和障碍物,针对多样性驾驶模型中的任一个驾驶模型,可以在多样性驾驶模型中的其他一个或多个驾驶模型中采样自动驾驶车辆的驾驶算法,并将采样的驾驶算法赋值给该任一个驾驶模型的障碍物车辆,将赋值后的任一个驾驶模型结合环境模型的数据,可以生成多样性的场景,进而可以采用训练算法(例如课程学习方法等)在多样性场景中训练得到具有鲁棒性的驾驶模型。
本申请实施例所涉及的车辆的行驶相关数据可以是车辆的传感器设备采集的,也可以是模拟器中强化学习中的车辆与环境交互生成的数据。示例性的,车辆的行驶相关数据可以包括自动驾驶车辆的位置数据、速度数据、方向数据等数据,以及自动驾驶车辆周围的车辆(可能称为障碍物车辆)的位置数据、速度数据、方向数据等数据。
本申请实施例所涉及的自动驾驶中的性能指标包括:速度指标、加速度指标和/或与前车距离指标等用于描述自动驾驶中的车辆相关性能的指标。
本申请实施例所涉及的性能指标对应的第二驾驶模型可以包括下述一种或多种:最大化速度的模型、最小化速度的模型、最大化与前车距离的模型、最小化与前车距离的模型、最大化平均加速度的模型或最小化平均加速度的模型。
可能的实现方式中:性能指标为速度指标时,速度指标对应的第二驾驶模型包括最大化速度的模型和/或最小化速度的模型。性能指标为加速度指标时,加速度指标对应的第二驾驶模型包括最大化平均加速度的模型和/或最小化平均加速度的模型。性能指标为与前车距离指标时,与前车距离指标对应的第二驾驶模型包括最大化与前车距离的模型和/或最小化与前车距离的模型。
本申请实施例中涉及的第一驾驶模型也可能称为基准驾驶模型等,第一驾驶模型可以是采用模型训练方法得到的模型。例如,第一驾驶模型可以是采用数据驱动的方法(如模仿学习、强化学习等)训练的到的模型。
示例性的,图7示出了本申请实施例的一种训练第一驾驶模型的流程示意图。
如图7所示,可以初始化超参数(如学习率,批大小等)集合,确定第一驾驶模型性能指标集合(例如速度,加速度,与前车距离等),初始化预设的模型(例如神经网络模型)的结构和参数。获取车辆的行驶相关数据,进而使用对应的方法训练预设的模型,直到预设的模型的输出值满足一定条件(例如预设的模型的输出值的正确率大于一定阈值),得到第一驾驶模型。
示例性,一种可能的实现方式中,预设的模型可以为两层的全连接模型,每个隐层的神经元个数可以为128。可以从超参数集合中采样,初始化网络模型结构和参数。
根据车载传感器设备采集自车周围车辆的信息(如位置、速度、方向等),提取与自车最近的N(N为自然数)辆车的信息,与自车的状态信息融合作为输入o t(部分可观测),得到每个车辆的决策动作a t(加速度)。模拟器中的预设的模型接收a t,输出每个车辆的奖励函数(也可能称为回报函数)r t(可以包含内在激励的稠密回报),并转移到新的状态。
当自车与前车的预计碰撞时间(time to collision,TTC)在(0,x)之间时,回报函数与自车与前车的距离、自车速度和前车速度相关。
示例性的,奖励函数分别:与自车与前车的距离负相关、与自车的速度负相关、与前车的速度正相关。
例如,回报函数可以为:
r ttc=max(-1/(ttc/ttc target) 2,-100)
其中ttc=d front/(v-v front);d front是自车与前车的距离,距离的单位可以是米;v是自车速度,速度的单位可以是米每秒;v front是前车速度;ttc target可以根据实际情况设置,例如可以默认是2米每秒等;x可以根据实际情况设置,例如可以设置为2米每秒等。
当自车与前车的预计碰撞时间大于x秒时,回报函数与自车的速度相关。
示例性的,在自车的速度小于2米每秒的情况下,回报函数与自车的速度正相关;在自车的速度大于第一常量的情况下,回报函数与自车的速度负相关;在自车的速度大于或等于2米每秒,且小于或等于第一常量的情况下,回报函数与自车的速度正相 关;所述第一常量大于2米每秒。
例如,回报函数可以为:
Figure PCTCN2021107014-appb-000005
其中,v target可以根据实际情况设置,例如可以默认为5米每秒等。
可以基于模拟器中的共享参数强化学习模型收集每个车辆的决策轨迹τ:<o t,a t,r t,o t’>t=0:T,优化预设的模型。
预设的模型的目标函数与一段轨迹中自车的累积回报相关。
例如,目标函数可以为:
Figure PCTCN2021107014-appb-000006
其中,R是一段轨迹的累积回报,θ是策略模型的参数。
在预设的模型输出的值满足目标函数时,可以得到第一驾驶模型。例如,已第一驾驶模型输出的驾驶策略为通过路口的驾驶策略时,当车辆通过路口的成功率达到一定阈值,终止训练,得到第一驾驶模型。
本申请实施例所涉及的第二驾驶模型可以是在第一驾驶模型的基础上,通过调整第一驾驶模型的超参数训练得到的。
示例性的,图8示出了本申请实施例的一种训练第二驾驶模型的流程示意图。
如图8所示,可以从性能指标集合中采样,每一种性能指标可以对应生成一定数目的第二驾驶模型(也可能称为种群模型或种群模型)。
在得到第二驾驶模型时,可以基于第一驾驶模型进行训练。例如,某一性能指标需要生成M(M为自然数)个第二驾驶模型,可以针对该性能指标,复制M个第一驾驶模型,进而基于每个第一驾驶模型生成一个第二驾驶模型。例如,可以去掉第一驾驶模型的回报函数中的内在激励,只保留输出正确结果的回报,使用多智能体强化学习进行训练。
示例性的,可以根据性能指标的需求(例如最大化性能指标或最小化性能指标等)从超参数集合中采样,调整部分第一驾驶模型初始化的超参数,对第一驾驶模型进行演化,在演化后的模型的性能达到阈值,可以得到与对应性能指标强相关的第二驾驶模型(可能称为多样性驾驶模型)。
对多个性能指标均执行生成第二驾驶模型的步骤,则可以生成与对应性能指标强相关的不同风格的驾驶模型,如最大化速度的模型,最小化速度的模型,最大化与前车距离的模型,最小化与前车距离的模型,最大化平均加速度的模型,最小化平均加速度的模型等。
示例性的,图9示出了生成第二驾驶模型的示意图。如图9所示,可以从性能指标集合中采样生成多个种群(例如包括性能指标A1-Ai、N1-Ni等),从超参数集合中采样生成多个示例(例如包括性能指标A1-Ai、N1-Ni对应的超参数),为每个性能指标加载第一驾驶模型(或称为基准驾驶模型),复制第一驾驶模型的模型权重,在第一驾驶模型的超参数中增加探索随机量,直到调整后的模型收敛到与性能指标强相关, 得到第二驾驶模型。
本申请实施例中,第二驾驶模型可以输出对应于性能指标的驾驶策略(或称为驾驶算法或驾驶模型),将第二驾驶模型中的自动驾驶车辆的驾驶策略赋值到障碍物车辆(可能称为社会车),以及从参数化的环境模型(例如包含路况、天气等驾驶环境的模型等),可以生成多样性的自动驾驶场景。可能的理解中,本申请实施例中,可以将与不同性能指标对应的驾驶策略赋值给障碍物车辆,使得障碍物车辆的驾驶策略多样化,且障碍物车辆的驾驶策略不需要依赖人工编码,基于多样性驾驶策略的障碍物车辆,可以生成丰富的自动驾驶场景。
一种可能的实现中,本申请实施例在生成多个自动驾驶场景后,可以基于生成的多个自动驾驶场景,训练得到鲁棒的目标驾驶模型。
示例性的,图10示出了一种训练目标驾驶模型的示意图。如图10所示,可以针对每个自动驾驶场景,分别进行模型训练,得到适应于该自动驾驶场景的驾驶模型,在该驾驶模型能够适用的自动驾驶场景的数量达到预设值的情况下,可以认为得到鲁棒的目标驾驶模型。
一种可能的实现方式中,可以对多个自动驾驶场景排序,依次在排序后的自动驾驶场景中训练驾驶策略,得到目标驾驶模型。
例如,可以将生成的多个自动驾驶场景按照驾驶难度从易到难排序,将排序后的在先自动驾驶场景中训练得到在先驾驶策略作为在后自驾驾驶场景的输入,依次训练得到一个目标驾驶模型。
示例性的,图11示出了一种生成目标驾驶模型的示意图。
如图11所示,可以将多样性的自动驾驶场景按照难度排序,使用课程学习等方式依次在自动驾驶场景中一次训练自动驾驶模型,在当前的自动驾驶场景中自动驾驶模型的输出符合条件时,在后一个自动驾驶场景中继续训练自动驾驶模型,经过在不同难度的自动驾驶场景中的训练可以得到鲁棒性的目标驾驶模型。
可能的实现方式中,驾驶难度可以根据经验、规则或标准等设定。可能的理解中,按照自动驾驶场景从易到难的顺序训练自动驾驶模型,可以实现递进的训练,相较于直接在难度较高的自动驾驶场景中训练,可以节约计算资源。
下面以具体地实施例对本申请的技术方案以及本申请的技术方案如何解决上述技术问题进行详细说明。下面这几个具体的实施例可以独立实现,也可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。
图12为本申请实施例提供的一种自动驾驶场景生成方法的流程示意图,如图12所示,该方法包括:
S1201:获取第一驾驶模型,第一驾驶模型用于输出驾驶策略。
可以理解的是,如上所述第一驾驶模型可以是多辆车辆的信息训练得到的,因此,第一驾驶模型可以用于至少一辆自动驾驶车辆的输出驾驶策略。
S1202:针对自动驾驶中的性能指标,修改第一驾驶模型的超参数,得到性能指标对应的第二驾驶模型。
示例性的,如上所述,可以对第一驾驶模型的超参数集合进行采样,利用采样结果初始化多个第一驾驶模型的超参数,根据自动驾驶中的性能指标,调整部分第一驾 驶模型的超参数,得到性能指标对应的第二驾驶模型。
S1203:在性能指标的模型中采样自动驾驶车辆的驾驶数据。
S1204:根据自动驾驶车辆的驾驶数据赋值障碍物车辆,以及结合预设的环境模型,生成自动驾驶场景。
本申请实施例中,S1201至S1204的具体实现可以参照上述实施例的记载,在此不再赘述。基于S1201至S1204可以生成多样化的自动驾驶场景,进一步的,可以基于多样化的自动驾驶场景训练得到鲁棒性的目标驾驶模型,在此不再赘述。
通过上述对本申请方案的介绍,可以理解的是,上述实现各设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件单元。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
如图13所示,本申请实施例一种自动驾驶场景生成的装置,该自动驾驶场景生成的装置包括处理器1300、存储器1301和收发机1302;
处理器1300负责管理总线架构和通常的处理,存储器1301可以存储处理器1300在执行操作时所使用的数据。收发机1302用于在处理器1300的控制下接收和发送数据与存储器1301进行数据通信。
总线架构可以包括任意数量的互联的总线和桥,具体由处理器1300代表的一个或多个处理器和存储器1301代表的存储器的各种电路链接在一起。总线架构还可以将诸如外围设备、稳压器和功率管理电路等之类的各种其他电路链接在一起,这些都是本领域所公知的,因此,本文不再对其进行进一步描述。总线接口提供接口。处理器1300负责管理总线架构和通常的处理,存储器1301可以存储处理器1300在执行操作时所使用的数据。
本申请实施例揭示的流程,可以应用于处理器1300中,或者由处理器1300实现。在实现过程中,自动驾驶场景生成的流程的各步骤可以通过处理器1300中的硬件的集成逻辑电路或者软件形式的指令完成。处理器1300可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1301,处理器1300读取存储器1301中的信息,结合其硬件完成信号处理流程的步骤。
本申请实施例一种可选的方式,所述处理器1300用于读取存储器1301中的程序并以执行如图12所示的S1201-S1204中的方法流程。
如图14所示,本申请实施例提供一种自动驾驶场景生成的装置,所述装置包括收发模块1400和处理模块1401。
所述收发模块1400,用于支持所述处理模块1401获取第一驾驶模型。
所述处理模块1401,用于获取第一驾驶模型,第一驾驶模型用于至少一辆自动驾驶车辆的输出驾驶策略;对第一驾驶模型的超参数集合进行采样,利用采样结果初始化多个第一驾驶模型的超参数,根据自动驾驶中的性能指标,调整部分第一驾驶模型的超参数,得到性能指标对应的第二驾驶模型;在性能指标对应的第二驾驶的模型中采样自动驾驶车辆的驾驶数据;根据自动驾驶车辆的驾驶数据赋值障碍物车辆,以及结合预设的环境模型,生成自动驾驶场景。
一种可能的实现方式中,处理模块,具体用于:获取第一车辆的行驶相关数据和第一车辆的周围车辆的行驶相关数据;将第一车辆的行驶相关数据和第一车辆的周围车辆的行驶相关数据输入预设的模型;利用预设的模型输出第一车辆的驾驶策略;调整预设的模型的参数,直到预设的模型输出的第一车辆的驾驶策略符合预设条件,得到第一驾驶模型。这样,可以基于车辆的形式相关数据训练得到第一驾驶模型。
一种可能的实现方式中,行驶相关数据包括下述的一种或多种:位置数据、速度数据或方向数据。这样,可以根据位置数据、速度数据和/或方向数据等与行驶相关的数据,训练得到准确的第一驾驶模型。
一种可能的实现方式中,在第一车辆与第一车辆的前车的预计碰撞时间小于第一值的情况下,预设的模型的奖励函数与第一车辆与第一车辆的前车的距离、第一车辆的速度以及第一车辆的前车的速度相关。
一种可能的实现方式中,预设的模型的奖励函数分别:与距离负相关、与第一车辆的速度负相关、与第一车辆的前车的速度正相关。
一种可能的实现方式中,预设的模型的奖励函数满足:
r ttc=max(-1/(ttc/ttc target) 2,-100)
其中,ttc=d front/(v-v front),d front是第一车辆与第一车辆的前车的距离,v是第一车辆的速度,v front是第一车辆的前车的速度,ttc target为第一值。
一种可能的实现方式中,在第一车辆与第一车辆的前车的预计碰撞时间大于或等于第一值的情况下,预设的模型的奖励函数与第一车辆的速度相关。
一种可能的实现方式中,在第一车辆的速度小于2米每秒的情况下,预设的模型的奖励函数与第一车辆的速度正相关;在第一车辆的速度大于第一常量的情况下,预设的模型的奖励函数与第一车辆的速度负相关;在第一车辆的速度大于或等于2米每秒,且小于或等于第一常量的情况下,预设的模型的奖励函数与第一车辆的速度正相关;第一常量大于2米每秒。
In a possible implementation, the reward function of the preset model satisfies:

[Formula image PCTCN2021107014-appb-000007: the piecewise speed reward; the exact expression is not recoverable from the text]

where $v$ is the speed of the first vehicle and $v_{target}$ is a constant.
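Because the exact expression is lost with the image, the Python sketch below is only a hedged reconstruction: the breakpoints (2 m/s and $v_{target}$) and the sign of the correlation in each band follow the description above, while the slopes and magnitudes are assumptions.

```python
def speed_reward(v: float, v_target: float) -> float:
    """Piecewise speed reward; shape per the description, magnitudes assumed."""
    assert v_target > 2.0          # the first constant exceeds 2 m/s
    if v < 2.0:
        return 0.25 * v            # rises with speed in the low-speed band
    if v <= v_target:              # rises from 0.5 toward 1.0 at v_target
        return 0.5 + 0.5 * (v - 2.0) / (v_target - 2.0)
    return 1.0 - (v - v_target)    # falls once v exceeds v_target
```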
In a possible implementation, the objective function of the preset model is related to the cumulative return of the first vehicle over a trajectory.
In a possible implementation, the objective function includes:

[Formula image PCTCN2021107014-appb-000008: the objective function; the exact expression is not recoverable from the text]

where $R$ is the cumulative return of the first vehicle over a trajectory and $\theta$ denotes the model parameters.
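Since the image of the objective is likewise unrecoverable, a plausible form consistent with a cumulative return $R$ over a trajectory and parameters $\theta$ is the standard policy-gradient objective; this is an assumption, not the patent's verbatim formula:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\big[R(\tau)\big], \qquad \nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\big[R(\tau)\,\nabla_\theta \log \pi_\theta(\tau)\big]$$

where $\tau$ is a trajectory sampled from the policy $\pi_\theta$ and the objective is maximized over $\theta$.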
In a possible implementation, the number of autonomous driving scenarios is multiple, and the method further includes: sorting the multiple autonomous driving scenarios, and training a driving policy in the sorted scenarios in sequence to obtain a target driving model. In this way, training a driving policy across the sorted scenarios yields a target driving model that can adapt to multiple autonomous driving scenarios.
In a possible implementation, the processing module is specifically configured to: for multiple autonomous driving scenarios sorted by driving difficulty from easy to hard, use the driving policy trained in an earlier scenario as the input for the subsequent scenario, training in sequence to obtain one target driving model. Training the model in scenarios ordered from easy to hard enables progressive training and saves computing resources compared with training directly in high-difficulty scenarios.
In a possible implementation, the performance metric includes: a speed metric, an acceleration metric, or a distance-to-preceding-vehicle metric.
In a possible implementation, the population of models for the performance metrics includes one or more of the following: a speed-maximizing model, a speed-minimizing model, a model maximizing the distance to the preceding vehicle, a model minimizing the distance to the preceding vehicle, a model maximizing average acceleration, or a model minimizing average acceleration.
In a possible implementation, the hyperparameters include one or more of the following: learning rate or batch size.
In a possible implementation, the driving-related data is collected from real road-test data, and/or the driving-related data is generated by vehicles interacting with an environment in a simulator.
In a possible implementation, the functions of the transceiver module 1400 and the processing module 1401 shown in FIG. 14 can be performed by the processor 1300 running the program in the memory 1301, or by the processor 1300 alone.
As shown in FIG. 15, the present application provides a vehicle, which includes at least one camera 1501, at least one memory 1502, at least one transceiver 1503, and at least one processor 1504.
The camera 1501 is configured to obtain at least one image.
The memory 1502 is configured to store one or more programs and data information, where the one or more programs include instructions.
The transceiver 1503 is configured to perform data transmission with communication devices in the vehicle and with the cloud.
The processor 1504 is configured to: obtain a first driving model, where the first driving model is used to output a driving policy for at least one autonomous vehicle; sample a hyperparameter set of the first driving model, use the sampling results to initialize hyperparameters of multiple copies of the first driving model, and adjust the hyperparameters of some of the first driving models according to a performance metric in autonomous driving, to obtain a second driving model corresponding to the performance metric; sample driving data of the autonomous vehicle in the second driving model corresponding to the performance metric; and assign the driving data of the autonomous vehicle to obstacle vehicles and generate an autonomous driving scenario in combination with a preset environment model.
In some possible implementations, various aspects of the autonomous driving scenario generation method provided by the embodiments of the present application can also be implemented in the form of a program product that includes program code; when the program code runs on a computer device, the program code causes the computer device to execute the steps of the autonomous driving scenario generation method according to the various exemplary embodiments of the present application described in this specification.
The program product can use any combination of one or more readable media. A readable medium can be a readable signal medium or a readable storage medium. A readable storage medium can be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The program product for autonomous driving scenario generation according to an implementation of the present application can use a portable compact disc read-only memory (CD-ROM), include program code, and run on a server device. However, the program product of the present application is not limited thereto; herein, a readable storage medium can be any tangible medium that contains or stores a program, where the program can be used by, or in combination with, an instruction execution system, apparatus, or device.
A readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal can take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium can also be any readable medium other than a readable storage medium, which can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
The program code contained on a readable medium can be transmitted by any appropriate medium, including but not limited to wireless, wired, optical cable, RF, and the like, or any suitable combination of the above.
Program code for performing the operations of the present application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code can execute entirely on the user's computing device, partly on the user's device, as a standalone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In the case of a remote computing device, the remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device.
For the autonomous driving scenario generation method, an embodiment of the present application further provides a computing-device-readable storage medium, that is, one whose content is not lost after power-off. The storage medium stores a software program including program code; when the program code runs on a computing device and is read and executed by one or more processors, the software program can implement any of the autonomous driving scenario generation solutions of the embodiments of the present application described above.
An embodiment of the present application further provides an electronic device. In the case where functional modules are divided according to the respective functions, the electronic device includes: a processing module configured to support the autonomous driving scenario generation apparatus in performing the steps in the above embodiments, for example the operations of S101 to S102, or other processes of the technology described in the embodiments of the present application.
All relevant content of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules, and is not repeated here.
Of course, the autonomous driving scenario generation apparatus includes, but is not limited to, the unit modules listed above. Moreover, the functions that the above functional units can specifically implement include, but are not limited to, the functions corresponding to the method steps described in the above examples; for detailed descriptions of other units of the electronic device, reference may be made to the detailed descriptions of their corresponding method steps, which are not repeated here.
In the case of integrated units, the electronic device involved in the above embodiments can include: a processing module, a storage module, and a communication module. The storage module is configured to store the program code and data of the electronic device. The communication module is configured to support communication between the electronic device and other network entities, so as to implement functions of the electronic device such as calls, data interaction, and Internet access.
The processing module is configured to control and manage the actions of the electronic device. The processing module can be a processor or a controller. The communication module can be a transceiver, an RF circuit, a communication interface, or the like. The storage module can be a memory.
Further, the electronic device can also include an input module and a display module. The display module can be a screen or a display. The input module can be a touchscreen, a voice input apparatus, a fingerprint sensor, or the like.
The present application is described above with reference to block diagrams and/or flowcharts of methods, apparatuses (systems), and/or computer program products according to embodiments of the present application. It should be understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer or special-purpose computer and/or other programmable data processing apparatus to produce a machine, such that the instructions executed via the computer processor and/or other programmable data processing apparatus create a method for implementing the functions/acts specified in the block diagram and/or flowchart block.
Accordingly, the present application can also be implemented in hardware and/or software (including firmware, resident software, microcode, and the like). Furthermore, the present application can take the form of a computer program product on a computer-usable or computer-readable storage medium having computer-usable or computer-readable program code embodied in the medium, for use by or in connection with an instruction execution system. In the context of the present application, a computer-usable or computer-readable medium can be any medium that can contain, store, communicate, transmit, or transfer the program for use by, or in connection with, an instruction execution system, apparatus, or device.
The present application describes multiple embodiments in detail with reference to multiple flowcharts, but it should be understood that these flowcharts and the related descriptions of their corresponding embodiments are merely examples for ease of understanding and should not constitute any limitation on the present application. Each step in each flowchart is not necessarily required to be performed; for example, some steps can be skipped. Moreover, the execution order of the steps is not fixed and is not limited to what is shown in the figures; the execution order of the steps should be determined by their functions and internal logic.
The multiple embodiments described in the present application can be combined arbitrarily, and steps can be executed in an interleaved manner across embodiments; the execution order of the embodiments, and the interleaved execution order of the steps of the embodiments, is not fixed and is not limited to what is shown in the figures, and should be determined by their functions and internal logic.
Although the present application has been described with reference to specific features and embodiments thereof, it is apparent that various modifications and combinations can be made without departing from the spirit and scope of the present application. Accordingly, the specification and drawings are merely exemplary illustrations of the present application as defined by the appended claims, and are deemed to cover any and all modifications, variations, combinations, or equivalents within the scope of the present application. Obviously, those skilled in the art can make various changes and variations to the present application without departing from its scope. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include these changes and variations.

Claims (18)

  1. An autonomous driving scenario generation method, comprising:
    obtaining a first driving model, wherein the first driving model is used to output a driving policy for at least one autonomous vehicle;
    sampling a hyperparameter set of the first driving model, using the sampling results to initialize hyperparameters of multiple copies of the first driving model, and adjusting the hyperparameters of some of the first driving models according to a performance metric in autonomous driving, to obtain a second driving model corresponding to the performance metric;
    sampling driving data of the autonomous vehicle in the second driving model corresponding to the performance metric; and
    assigning the driving data of the autonomous vehicle to obstacle vehicles, and generating an autonomous driving scenario in combination with a preset environment model.
  2. The method according to claim 1, wherein obtaining the first driving model comprises:
    obtaining driving-related data of a first vehicle and driving-related data of vehicles surrounding the first vehicle;
    inputting the driving-related data of the first vehicle and the driving-related data of the vehicles surrounding the first vehicle into a preset model;
    using the preset model to output a driving policy of the first vehicle; and
    adjusting parameters of the preset model until the driving policy of the first vehicle output by the preset model meets a preset condition, to obtain the first driving model.
  3. The method according to claim 2, wherein the driving-related data comprises one or more of the following: position data, speed data, or heading data.
  4. The method according to claim 3, wherein, when a predicted time to collision between the first vehicle and a preceding vehicle of the first vehicle is less than a first value, a reward function of the preset model is related to the distance between the first vehicle and the preceding vehicle of the first vehicle, the speed of the first vehicle, and the speed of the preceding vehicle of the first vehicle.
  5. The method according to claim 4, wherein the reward function of the preset model is, respectively: negatively correlated with the distance, negatively correlated with the speed of the first vehicle, and positively correlated with the speed of the preceding vehicle of the first vehicle.
  6. The method according to claim 3, wherein, when the predicted time to collision between the first vehicle and the preceding vehicle of the first vehicle is greater than or equal to the first value, the reward function of the preset model is related to the speed of the first vehicle.
  7. The method according to claim 6, wherein, when the speed of the first vehicle is less than 2 meters per second, the reward function of the preset model is positively correlated with the speed of the first vehicle; when the speed of the first vehicle is greater than a first constant, the reward function of the preset model is negatively correlated with the speed of the first vehicle; when the speed of the first vehicle is greater than or equal to 2 meters per second and less than or equal to the first constant, the reward function of the preset model is positively correlated with the speed of the first vehicle; and the first constant is greater than 2 meters per second.
  8. The method according to any one of claims 3 to 7, wherein an objective function of the preset model is related to a cumulative return of the first vehicle over a trajectory.
  9. The method according to any one of claims 1 to 8, wherein the number of autonomous driving scenarios is multiple, and the method further comprises:
    sorting the multiple autonomous driving scenarios; and
    training a driving policy in the sorted multiple autonomous driving scenarios in sequence, to obtain a target driving model.
  10. The method according to claim 9, wherein training a driving policy in the sorted multiple autonomous driving scenarios in sequence to obtain the target driving model comprises:
    for the multiple autonomous driving scenarios sorted by driving difficulty from easy to hard, using the driving policy trained in an earlier autonomous driving scenario as an input for a subsequent autonomous driving scenario, and training in sequence to obtain one target driving model.
  11. The method according to any one of claims 1 to 10, wherein the performance metric comprises: a speed metric, an acceleration metric, or a distance-to-preceding-vehicle metric;
    and/or, the population of models for the performance metrics comprises one or more of the following: a speed-maximizing model, a speed-minimizing model, a model maximizing the distance to the preceding vehicle, a model minimizing the distance to the preceding vehicle, a model maximizing average acceleration, or a model minimizing average acceleration.
  12. The method according to any one of claims 1 to 11, wherein the hyperparameters comprise one or more of the following: learning rate or batch size.
  13. The method according to any one of claims 1 to 12, wherein the driving-related data is collected from real road-test data, and/or the driving-related data is generated by vehicles interacting with an environment in a simulator.
  14. An autonomous driving scenario generation apparatus, comprising a processor and an interface circuit, wherein the interface circuit is configured to receive code instructions and transmit them to the processor, and the processor is configured to run the code instructions to perform the method according to any one of claims 1 to 13.
  15. An electronic device, comprising: one or more processors, a transceiver, a memory, and an interface circuit, wherein the one or more processors, the transceiver, the memory, and the interface circuit communicate through one or more communication buses; the interface circuit is configured to communicate with other apparatuses; and one or more computer programs are stored in the memory and configured to be executed by the one or more processors or the transceiver, so that the electronic device performs the method according to any one of claims 1 to 13.
  16. A vehicle, comprising: at least one camera, at least one memory, at least one transceiver, and at least one processor;
    the camera is configured to obtain at least one image;
    the memory is configured to store one or more programs and data information, wherein the one or more programs comprise instructions;
    the transceiver is configured to perform data transmission with communication devices in the vehicle and with the cloud; and
    the processor is configured to: obtain a first driving model, wherein the first driving model is used to output a driving policy for at least one autonomous vehicle; sample a hyperparameter set of the first driving model, use the sampling results to initialize hyperparameters of multiple copies of the first driving model, and adjust the hyperparameters of some of the first driving models according to a performance metric in autonomous driving, to obtain a second driving model corresponding to the performance metric; sample driving data of the autonomous vehicle in the second driving model corresponding to the performance metric; and assign the driving data of the autonomous vehicle to obstacle vehicles and generate an autonomous driving scenario in combination with a preset environment model.
  17. An autonomous driving system, comprising a training device and an execution device, wherein
    the training device is configured to perform the method according to any one of claims 1 to 13; and
    the execution device is configured to execute the driving policy trained by the training device.
  18. A readable computer storage product, wherein the readable computer storage product is configured to store a computer program, and the computer program is used to implement the method according to any one of claims 1 to 13.
PCT/CN2021/107014 2020-07-22 2021-07-19 自动驾驶场景生成方法、装置及系统 WO2022017307A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010711287.4 2020-07-22
CN202010711287.4A CN113968242B (zh) 2020-07-22 2020-07-22 自动驾驶场景生成方法、装置及系统

Publications (1)

Publication Number Publication Date
WO2022017307A1 true WO2022017307A1 (zh) 2022-01-27

Family

ID=79584793

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107014 WO2022017307A1 (zh) 2020-07-22 2021-07-19 自动驾驶场景生成方法、装置及系统

Country Status (2)

Country Link
CN (1) CN113968242B (zh)
WO (1) WO2022017307A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117473879A (zh) * 2023-12-27 2024-01-30 万物镜像(北京)计算机系统有限公司 一种自动驾驶仿真场景的生成方法、装置及设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115876493B (zh) * 2023-01-18 2023-05-23 禾多科技(北京)有限公司 用于自动驾驶的测试场景生成方法、装置、设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862346A (zh) * 2017-12-01 2018-03-30 驭势科技(北京)有限公司 一种进行驾驶策略模型训练的方法与设备
US20180284768A1 (en) * 2017-03-30 2018-10-04 Uber Technologies, Inc. Systems and Methods to Control Autonomous Vehicle Motion
US20190113917A1 (en) * 2017-10-16 2019-04-18 Toyota Research Institute, Inc. System and method for leveraging end-to-end driving models for improving driving task modules
CN109733415A (zh) * 2019-01-08 2019-05-10 同济大学 一种基于深度强化学习的拟人化自动驾驶跟驰模型
CN110322017A (zh) * 2019-08-13 2019-10-11 吉林大学 基于深度强化学习的自动驾驶智能车轨迹跟踪控制策略

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108891421B (zh) * 2018-06-25 2020-05-19 大连大学 一种构建驾驶策略的方法
US10845815B2 (en) * 2018-07-27 2020-11-24 GM Global Technology Operations LLC Systems, methods and controllers for an autonomous vehicle that implement autonomous driver agents and driving policy learners for generating and improving policies based on collective driving experiences of the autonomous driver agents
US11900797B2 (en) * 2018-10-16 2024-02-13 Five AI Limited Autonomous vehicle planning
CN109901574B (zh) * 2019-01-28 2021-08-13 华为技术有限公司 自动驾驶方法及装置
CN109839937B (zh) * 2019-03-12 2023-04-07 百度在线网络技术(北京)有限公司 确定车辆自动驾驶规划策略的方法、装置、计算机设备
CN111123927A (zh) * 2019-12-20 2020-05-08 北京三快在线科技有限公司 轨迹规划方法、装置、自动驾驶设备和存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180284768A1 (en) * 2017-03-30 2018-10-04 Uber Technologies, Inc. Systems and Methods to Control Autonomous Vehicle Motion
US20190113917A1 (en) * 2017-10-16 2019-04-18 Toyota Research Institute, Inc. System and method for leveraging end-to-end driving models for improving driving task modules
CN107862346A (zh) * 2017-12-01 2018-03-30 驭势科技(北京)有限公司 一种进行驾驶策略模型训练的方法与设备
CN109733415A (zh) * 2019-01-08 2019-05-10 同济大学 一种基于深度强化学习的拟人化自动驾驶跟驰模型
CN110322017A (zh) * 2019-08-13 2019-10-11 吉林大学 基于深度强化学习的自动驾驶智能车轨迹跟踪控制策略

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117473879A (zh) * 2023-12-27 2024-01-30 万物镜像(北京)计算机系统有限公司 一种自动驾驶仿真场景的生成方法、装置及设备
CN117473879B (zh) * 2023-12-27 2024-04-02 万物镜像(北京)计算机系统有限公司 一种自动驾驶仿真场景的生成方法、装置及设备

Also Published As

Publication number Publication date
CN113968242B (zh) 2023-10-20
CN113968242A (zh) 2022-01-25

Similar Documents

Publication Publication Date Title
CN109901574B (zh) 自动驾驶方法及装置
CN110379193B (zh) 自动驾驶车辆的行为规划方法及行为规划装置
EP3835908B1 (en) Automatic driving method, training method and related apparatuses
WO2022027304A1 (zh) 一种自动驾驶车辆的测试方法及装置
WO2022001773A1 (zh) 轨迹预测方法及装置
WO2021102955A1 (zh) 车辆的路径规划方法以及车辆的路径规划装置
CN110371132B (zh) 驾驶员接管评估方法及装置
WO2021000800A1 (zh) 道路可行驶区域推理方法及装置
US20220080972A1 (en) Autonomous lane change method and apparatus, and storage medium
WO2021212379A1 (zh) 车道线检测方法及装置
CN110471411A (zh) 自动驾驶方法和自动驾驶装置
WO2021244207A1 (zh) 训练驾驶行为决策模型的方法及装置
CN111950726A (zh) 基于多任务学习的决策方法、决策模型训练方法及装置
WO2022142839A1 (zh) 一种图像处理方法、装置以及智能汽车
WO2022017307A1 (zh) 自动驾驶场景生成方法、装置及系统
WO2022062825A1 (zh) 车辆的控制方法、装置及车辆
US20230048680A1 (en) Method and apparatus for passing through barrier gate crossbar by vehicle
CN113954858A (zh) 一种规划车辆行驶路线的方法以及智能汽车
CN113552867A (zh) 一种运动轨迹的规划方法及轮式移动设备
WO2022178858A1 (zh) 一种车辆行驶意图预测方法、装置、终端及存储介质
US20230107033A1 (en) Method for optimizing decision-making regulation and control, method for controlling traveling of vehicle, and related apparatus
CN113859265A (zh) 一种驾驶过程中的提醒方法及设备
CN113741384A (zh) 检测自动驾驶系统的方法和装置
WO2022001432A1 (zh) 推理车道的方法、训练车道推理模型的方法及装置
CN114556251B (zh) 用于确定车辆可通行空间的方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21845503

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21845503

Country of ref document: EP

Kind code of ref document: A1