WO2021226921A1 - Method and system of data processing for autonomous driving - Google Patents

Method and system of data processing for autonomous driving

Info

Publication number
WO2021226921A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
down view
obstacle object
fusion image
road
Prior art date
Application number
PCT/CN2020/090197
Other languages
French (fr)
Inventor
Guoxia ZHANG
Qingshan Zhang
Kecai WU
Original Assignee
Harman International Industries, Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harman International Industries, Incorporated filed Critical Harman International Industries, Incorporated
Priority to PCT/CN2020/090197 priority Critical patent/WO2021226921A1/en
Publication of WO2021226921A1 publication Critical patent/WO2021226921A1/en

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/10Longitudinal speed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/10Longitudinal speed
    • B60W2520/105Longitudinal acceleration
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/14Yaw
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00Input parameters relating to occupants
    • B60W2540/18Steering angle
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2552/00Input parameters relating to infrastructure
    • B60W2552/53Road markings, e.g. lane marker or crosswalk
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/402Type
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404Characteristics
    • B60W2554/4041Position
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404Characteristics
    • B60W2554/4042Longitudinal speed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404Characteristics
    • B60W2554/4044Direction of movement, e.g. backwards
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2556/00Input parameters relating to data
    • B60W2556/35Data fusion
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2556/00Input parameters relating to data
    • B60W2556/40High definition maps

Abstract

The disclosure describes a method and a system of data processing for an autonomous driving vehicle. The method comprises receiving, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the autonomous driving vehicle; processing the sensing data and outputting environment data; extracting map data for the autonomous driving vehicle; generating a top-down view fusion image based on the environment data and the map data; encoding the top-down view fusion image and outputting low-dimensional latent states; concatenating the low-dimensional latent states with vehicle information and outputting a state vector; and performing a training based on the state vector and determining a driving policy.

Description

METHOD AND SYSTEM OF DATA PROCESSING FOR AUTONOMOUS DRIVING
TECHNICAL FIELD
The present disclosure relates to a data processing method and system for an autonomous driving vehicle, and specifically relates to a method and system for providing an input representation for deep reinforcement learning in an autonomous driving system.
BACKGROUND
Combined with deep learning techniques, reinforcement learning (RL) has brought a series of breakthroughs in recent years. However, there are not many successful applications of deep reinforcement learning in autonomous driving, due to the following defects. Firstly, current solutions do not take into account all the sensors deployed on autonomous vehicles; most of them use only a front-view image as the input and learn an end-to-end driving policy. However, the front-view image does not contain enough information for decision making. For example, the agent cannot accurately make a left lane change decision without information such as the location, speed, and direction of the left rear vehicle in the adjacent lane. Secondly, most current solutions feed a raw image into the reinforcement learning agent, and the raw image contains extremely high-dimensional information such as appearances and textures of the roads and objects, weather conditions, and light conditions. These kinds of extremely complex, high-dimensional visual features dramatically enlarge the sample complexity of learning. In order to obtain good generalization, the dataset must cover enough data for each dimension of the raw sensor information.
Therefore, there is a need to develop an improved method/system to enable the reinforcement learning agent to generate a more accurate and safe driving strategy, with a faster learning process.
SUMMARY
The disclosure designs a low-dimensional representation with enough information for the reinforcement learning agent to determine a driving policy. This low-dimensional representation includes not only environment information but also vehicle information.
According to one aspect of the disclosure, a method of data processing for an autonomous driving vehicle is provided. The method may comprise: receiving, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the autonomous driving vehicle; processing the sensing data and outputting environment data; extracting map data for the autonomous driving vehicle; generating a top-down view fusion image based on the environment data and the map data; encoding the top-down view fusion image and outputting low-dimensional latent states; concatenating the low-dimensional latent states with vehicle information and outputting a state vector; and performing a training based on the state vector and determining a driving policy.
The environment data includes at least one of road data and obstacle object data. Furthermore, processing the sensing data may comprise performing a road perception to output the road data, by identifying a segmented drivable area based on the sensing data; and performing an obstacle object detection to output the obstacle object data, by detecting, classifying and tracking one or more obstacle objects based on the sensing data.
The method further comprises identifying one or more lane marks within the segmented drivable area, and generating an obstacle object list based on the obstacle object data. For example, the obstacle object list may include information regarding a type, a size, a distance, a direction, a velocity and a heading of each obstacle object.
The map data comprises local map data which is extracted from a global map based on a current location of the autonomous vehicle. The map data further comprises intended route data which represents an intended route of the autonomous vehicle.
The method may further map the road data into a first top-down view, map the obstacle object data into a second top-down view, and then fuse the first top-down view, the second top-down view and the map data to generate the top-down view fusion image.
The top-down view fusion image may be expressed as a function of a width of the top-down view fusion image, a height of the top-down view fusion image and a number of channels for representing at least one of the road data, the obstacle object data and the map data.
Furthermore, the method may output a control command to the autonomous driving vehicle based on the driving policy.
Furthermore, the road perception and the obstacle object detection are performed simultaneously.
According to another aspect of the present disclosure, a system of data processing for an autonomous driving vehicle is provided. The system may comprise a perception module, a local map extraction module, a top-down view fusion image generation module, a visual encoding module, a concatenate module and a reinforcement learning (RL) agent. The perception module is configured to receive, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the autonomous driving vehicle, process the sensing data and output environment data. The local map extraction module is configured to extract map data for the autonomous driving vehicle. The top-down view fusion image generation module is configured to generate a top-down view fusion image based on the environment data and the map data. The visual encoding module is configured to encode the top-down view fusion image to output low-dimensional latent states. The concatenate module is configured to concatenate the low-dimensional latent states with vehicle information to output a state vector. The reinforcement learning agent is configured to perform a training based on the state vector and determine a driving policy.
According to another aspect of the present disclosure, a computer readable medium having computer-executable instructions for performing the above-described method is provided.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic block diagram including a vehicle environment in accordance with one or more embodiments of the present disclosure.
FIG. 2A illustrates an example of a multi-task network for obstacle objects detection and road perception.
FIG. 2B illustrates a schematic view which shows an example of a perception result.
FIG. 3 illustrates a schematic view which shows an example of local map for an autonomous driving vehicle.
FIG. 4 illustrates a schematic view which shows an example of an intended route for the autonomous driving vehicle.
FIG. 5 illustrates an example of a drivable area with a top-down view.
FIG. 6 illustrates an example of size of objects with a top-down view.
FIG. 7 illustrates an example of a top-down map fusion image.
FIG. 8 illustrates a flowchart of the method of the present disclosure.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be beneficially utilized on other embodiments without specific recitation. The drawings referred to here should not be understood as being drawn to scale unless specifically noted. Also, the drawings are often simplified and details or components omitted for clarity of presentation and explanation. The drawings and discussion serve to explain principles discussed below, where like designations denote like elements.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Examples will be provided below for illustration. The descriptions of the various examples will be presented for purposes of illustration, but are not intended to  be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
FIG. 1 illustrates a schematic block diagram including an autonomous vehicle in accordance with one or more embodiments of the present disclosure, for explaining how to implement deep reinforcement learning for an autonomous vehicle in an autonomous driving system.
For example, the system may comprise a perception module 100, a local map extraction module 200, a top-down view fusion image generation module 300, a visual encoding module 400, a concatenate module 500 and a reinforcement learning (RL) agent 600. The perception module 100 receives, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the vehicle. The sensors may be on-board sensors, such as a camera, a radar and a LiDAR, etc. These sensors are mounted on and around the vehicle and may sense the surrounding environment information, for example a plurality of images, or radar or LiDAR data of the surrounding environment. The perception module 100 may further process the sensing data and output environment data. The local map extraction module 200 extracts map data for the vehicle. For example, the local map extraction module 200 may extract a local map from the global route map based on GPS data from the vehicle-mounted GPS sensor. The top-down view fusion image generation module 300 generates a top-down view fusion image based on the environment data and the map data received respectively from the perception module 100 and the local map extraction module 200. The top-down view fusion image generated by the top-down view fusion image generation module 300 is then output to the visual encoding module 400 for encoding. The visual encoding module 400 performs encoding on the top-down view fusion image to capture low-dimensional latent states. These states are then concatenated with vehicle information in the concatenate module 500 to generate a final state vector for the reinforcement learning agent 600. In the reinforcement learning agent 600, reinforcement learning algorithms are adopted to train a deep network which takes the final state vector as input, and a driving policy is then determined. The reinforcement learning agent 600 then outputs a control command, such as a speed and a steering angle, for controlling the vehicle.
For example, the perception module 100 may incorporate the capability of using multiple cameras, radars and LiDARs to identify the environment data, such as identifying a drivable area and recognizing obstacles. In particular, the perception module 100 includes a road perception sub-module 1002 and an obstacle object detection sub-module 1004. The road perception sub-module 1002 performs a road perception to obtain road data. For example, in the road perception sub-module 1002, a segmentation deep learning network is used to identify a region of pixels that indicates a segmented drivable area in a given image captured by the camera or LiDAR. Moreover, in this sub-module, lane markers may be segmented and detected from the segmented drivable area. The obstacle object detection sub-module 1004 performs an obstacle object detection to output obstacle object data. In the obstacle object detection sub-module 1004, a deep learning network may be used to detect, classify and track obstacles based on the sensing data and to generate information regarding the obstacles, such as type and position information. This sub-module also fuses each obstacle object detected from the sensing data captured by different sensors, and then generates the obstacle object data indicative of a final fused obstacle object list, which contains more information regarding the objects, such as a type, a size, a distance, a direction, a velocity and a heading of each object. The environment data output from the perception module may include at least one of the road data and the obstacle object data.
The process of road perception and the process of obstacle object detection described above may be performed separately. However, in order to reduce the computational complexity, the road perception and the obstacle object detection may be performed simultaneously by using a single multi-task model. FIG. 2A illustrates an example of a multi-task network architecture for performing the obstacle object detection and the road perception. The multi-task network architecture contains an encoder for feature extraction and two decoder branches, respectively for object detection and road semantic segmentation. Both decoders use multi-level feature maps from the residual-network-based encoder. The decoders also use multiple convolutional layers for feature decoding. FIG. 2B illustrates an example of a perception result of the environment data, including road data and obstacle object data. As shown in FIG. 2B, for example, the left image shows the information of the detected obstacle objects. The types of the obstacle objects are identified, such as car, truck, traffic signs (Tsigns), etc. The sizes of the obstacle objects are identified using length, width, and height. The distance information is also identified in the left image. As those skilled in the art will realize, this is for illustration only, and other information may be included according to different environments and different interests/requirements. The right image in FIG. 2B mainly shows the road information for the same environment, wherein it shows the segmented drivable area on the ground and the segmented lane markers indicated as gray lines.
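The patent does not give the exact layer configuration of this multi-task network. As a rough illustration only, the following PyTorch sketch builds a shared ResNet-style encoder with two heads, one producing a coarse detection grid and one producing a full-resolution segmentation map; the class counts, layer sizes and input resolution are assumptions, and the multi-level skip connections mentioned above are omitted for brevity.

```python
# Minimal sketch (not the patent's actual network): a shared ResNet encoder with two
# decoder heads, one for obstacle-object detection and one for drivable-area segmentation.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiTaskPerceptionNet(nn.Module):
    def __init__(self, num_object_classes=4, num_seg_classes=3):
        super().__init__()
        backbone = resnet18(weights=None)
        # Shared encoder: everything up to (and including) the last residual stage.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # -> (B, 512, H/32, W/32)
        # Detection head: per-cell class scores plus a 4-value box (x, y, w, h).
        self.det_head = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_object_classes + 4, 1),
        )
        # Segmentation head: upsamples back to input resolution for drivable area / lane markers.
        self.seg_head = nn.Sequential(
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(128, num_seg_classes, 1),
        )

    def forward(self, x):
        feats = self.encoder(x)
        return self.det_head(feats), self.seg_head(feats)

# Example: one 3x512x512 camera frame yields a coarse detection grid and a
# full-resolution segmentation map.
net = MultiTaskPerceptionNet()
det_out, seg_out = net(torch.randn(1, 3, 512, 512))
print(det_out.shape, seg_out.shape)  # torch.Size([1, 8, 16, 16]) torch.Size([1, 3, 512, 512])
```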
The local map extraction module 200 uses the vehicle's current location, obtained from the vehicle-mounted GPS module, to extract a local map (local map data) from the global route map. In the local map, the ego vehicle is always at a fixed location. Taking the ego vehicle as the coordinate origin, there are at least two lane widths on each of the left and right sides. The forward and backward length from the ego vehicle should not be less than the ranging capability of the on-board sensors. As the vehicle moves, this local map moves with it, so the vehicle always sees a fixed range. Here, W is used to indicate the width of the local map, and H is used to indicate the height of the local map. The local map may contain information on drivable road geometry. FIG. 3 illustrates an example of the extracted local map, in which the ego vehicle is indicated as a dot. FIG. 4 illustrates an example of an intended route along which the vehicle wishes to drive. This intended route may be generated by a router and provides heuristic navigation information for the vehicle, including a location, a specified lane and a driving direction. Thus, the map data may include the local map data and the intended route data.
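As a hedged illustration of the local map extraction, the sketch below assumes the global route map has been rasterized into a 2D grid and crops a fixed-size window that follows the ego vehicle; the window size, the grid representation and the centering of the ego vehicle are assumptions, not details taken from the patent.

```python
# Minimal sketch: crop a fixed W x H local-map window from a rasterized global map
# so the ego vehicle always "sees" the same range as it moves.
import numpy as np

def extract_local_map(global_map, ego_row, ego_col, W=128, H=128):
    """Crop an H x W window from the rasterized global map around the ego cell.

    global_map: 2D array of drivable road geometry (e.g. 1 = road, 0 = non-road).
    ego_row, ego_col: ego-vehicle position in grid coordinates (from GPS + map projection).
    """
    half_h, half_w = H // 2, W // 2
    local = np.zeros((H, W), dtype=global_map.dtype)
    # Clip the requested window against the global-map borders.
    r0, r1 = max(0, ego_row - half_h), min(global_map.shape[0], ego_row + half_h)
    c0, c1 = max(0, ego_col - half_w), min(global_map.shape[1], ego_col + half_w)
    # Paste the valid region at the corresponding offset so the ego cell stays fixed.
    local[(r0 - (ego_row - half_h)):(r1 - (ego_row - half_h)),
          (c0 - (ego_col - half_w)):(c1 - (ego_col - half_w))] = global_map[r0:r1, c0:c1]
    return local

# Usage: as the vehicle moves, calling this with the updated GPS-derived grid cell
# yields a local map of constant size around the ego vehicle.
global_map = np.zeros((4000, 4000), dtype=np.uint8)
local_map = extract_local_map(global_map, ego_row=1200, ego_col=850)
print(local_map.shape)  # (128, 128)
```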
The top-down view fusion image generation module 300 may include a top-down view image processing sub-module 3002 and a map fusion sub-module 3004. The top-down view image processing sub-module 3002 maps the road data into a first top-down view and maps the obstacle object data into a second top-down view. The map fusion sub-module 3004 fuses the first top-down view, the second top-down view and the map data to output the top-down view fusion image. For example, this top-down view fusion image generation module generates a top-down view fusion image for visual encoding by overlaying the drivable area, the obstacle objects and the local map.
Without loss of generality, the road regions can be assumed to be homographic planes in road detection scenarios. Therefore, Inverse Perspective Mapping can be used to map the segmented drivable area and the detected objects into a top-down view. The Inverse Perspective Mapping can be considered as an image projection process. In this process, the pixels in the original images can be projected to the bird's-eye-view images using a mapping matrix (projection matrix) H. A multiple-point correspondence-based method can be used to estimate the mapping matrix:

p_i = H · p'_i,    i = 1, 2, …, n        (1)

p = Ĥ · p',    for all p' ∈ I'        (2)

where p' and p represent the homogeneous coordinates of pixels in the original image I' (p' ∈ I') and the projected image I (bird's-eye view) (p ∈ I), respectively. The mapping matrix H can be estimated in two steps: first, select two sets of corresponding pixels p'_i and p_i (i = 1, 2, …, n) from I' and I, respectively, and calculate the optimal estimate Ĥ of the mapping matrix H based on Equation (1). Once Ĥ is determined, all pixels of the original image I' can be projected into the bird's-eye-view image I using the estimated Ĥ (Equation (2)). Figure 5 gives an example of a top-down view drivable area.
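The patent does not prescribe a particular implementation of this two-step procedure. The following sketch uses OpenCV's standard homography utilities; the point correspondences, image sizes and output resolution are placeholder assumptions.

```python
# Minimal Inverse Perspective Mapping sketch: estimate the mapping matrix H from
# pixel correspondences, then warp the front-view image into a bird's-eye view.
import cv2
import numpy as np

# Step 1: pixel correspondences p'_i (front-view image) -> p_i (top-down image).
# These four points are placeholders; in practice they come from calibration against
# known positions on the flat road surface.
src_pts = np.float32([[420, 560], [860, 560], [1180, 700], [100, 700]])   # p'_i in I'
dst_pts = np.float32([[300, 100], [500, 100], [500, 700], [300, 700]])    # p_i in I

# With exactly four correspondences getPerspectiveTransform solves H exactly; with
# n > 4 points, cv2.findHomography(src_pts, dst_pts, cv2.RANSAC) gives a robust estimate.
H = cv2.getPerspectiveTransform(src_pts, dst_pts)

# Step 2: project every pixel of the original image into the bird's-eye view.
front_view = np.zeros((720, 1280, 3), dtype=np.uint8)                     # stand-in for a segmented frame
cv2.fillConvexPoly(front_view, src_pts.astype(np.int32), (0, 255, 0))     # fake drivable-area region
top_down = cv2.warpPerspective(front_view, H, (800, 800))
print(top_down.shape)  # (800, 800, 3)
```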
As for the obstacle objects, taking the ego vehicle as the coordinate origin, only those objects whose distance does not exceed the map size are retained. Considering the actual deployment of the on-board sensors, the car cannot perceive the entire shape information of neighboring objects. Here, the width and length of the object that can be seen from the perspective of the ego vehicle are used to represent the size of the obstacle objects. Figure 6 gives an example of such a representation in an image.
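A minimal sketch of this filtering step is given below; it assumes the fused obstacle list is available as simple records in ego-vehicle coordinates, and all field names and numeric values are illustrative rather than taken from the patent.

```python
# Minimal sketch of retaining only obstacles that fall inside the local-map range,
# using the visible width/length from the ego perspective as the object size.
from dataclasses import dataclass

@dataclass
class ObstacleObject:              # field names are illustrative assumptions
    obj_type: str                  # e.g. "car", "truck", "traffic_sign"
    x: float                       # lateral offset from ego vehicle [m]
    y: float                       # longitudinal offset from ego vehicle [m]
    visible_width: float           # width seen from the ego perspective [m]
    visible_length: float          # length seen from the ego perspective [m]
    velocity: float                # [m/s]
    heading: float                 # [rad], relative to ego heading

def filter_obstacles(objects, map_width_m, map_height_m):
    """Keep objects whose position lies within the W x H local-map extent."""
    half_w, half_h = map_width_m / 2.0, map_height_m / 2.0
    return [o for o in objects if abs(o.x) <= half_w and abs(o.y) <= half_h]

nearby = filter_obstacles(
    [ObstacleObject("car", 3.2, 18.0, 1.8, 4.5, 12.0, 0.05),
     ObstacleObject("truck", -4.0, 95.0, 2.5, 9.0, 20.0, 0.0)],
    map_width_m=40.0, map_height_m=120.0)
print([o.obj_type for o in nearby])  # ['car'] -- the truck lies beyond the map range
```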
For example, the above-generated top-down view images are further fused with the extracted local map and the intended route in the map fusion sub-module 3004 to generate a map-fusion image. Figure 7 gives an example of the generated map-fusion image. Inspired by the multi-channel representation of images, multiple channels are used here to represent the generated map-fusion image. Thus, the top-down view fusion image may be expressed as a function of a width of the top-down view fusion image, a height of the top-down view fusion image and a number of channels for representing at least one of the road data, the obstacle object data and the map data. For example, it can be expressed as (W, H, N), in which W is the width of the image and H is the height of the image. N is the total number of channels used to represent the segmented drivable area, the detected obstacle objects and the intended route. For example, the segmented drivable area needs one channel. The detected obstacle objects need three channels to represent the objects' type, velocity, and heading information. The intended route needs one channel to represent the driving direction. Accordingly, N may be five in this example.
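The sketch below assembles such a (W, H, N) fusion image with the five-channel layout of this example; the rasterization of object footprints and the normalization of type, velocity and heading values are assumptions made only for illustration.

```python
# Minimal sketch of the (W, H, N) map-fusion image with N = 5 channels:
# channel 0: segmented drivable area, channels 1-3: obstacle type / velocity / heading,
# channel 4: intended route. The normalized-float encodings are assumptions.
import numpy as np

W, H, N = 128, 128, 5

def build_fusion_image(drivable_mask, objects, route_mask, type_ids, max_speed=30.0):
    """drivable_mask, route_mask: (H, W) binary arrays already in top-down view.
    objects: iterable of (row, col, footprint_rows, footprint_cols, obj_type, velocity, heading)."""
    fusion = np.zeros((H, W, N), dtype=np.float32)
    fusion[:, :, 0] = drivable_mask                          # drivable-area channel
    for row, col, fr, fc, obj_type, velocity, heading in objects:
        r0, r1 = max(0, row - fr // 2), min(H, row + fr // 2 + 1)
        c0, c1 = max(0, col - fc // 2), min(W, col + fc // 2 + 1)
        fusion[r0:r1, c0:c1, 1] = type_ids[obj_type] / max(type_ids.values())
        fusion[r0:r1, c0:c1, 2] = min(velocity / max_speed, 1.0)
        fusion[r0:r1, c0:c1, 3] = (heading % (2 * np.pi)) / (2 * np.pi)
    fusion[:, :, 4] = route_mask                             # intended-route channel
    return fusion

type_ids = {"car": 1, "truck": 2, "pedestrian": 3}
image = build_fusion_image(np.ones((H, W)), [(40, 70, 9, 5, "car", 12.0, 0.1)],
                           np.zeros((H, W)), type_ids)
print(image.shape)  # (128, 128, 5)
```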
The visual encoding module 400 uses a visual encoding algorithm, such as VAE, PCA or incremental PCA, to learn a low-dimensional latent representation, i.e., low-dimensional latent states, from the top-down view fusion image. These states are then concatenated with vehicle information, such as the necessary vehicle kinematic parameters, to generate a final state vector. The vehicle information includes, for example, a vehicle acceleration rate, a vehicle speed, a vehicle heading, a vehicle lateral distance to the road boundary, a vehicle previous steering angle and a vehicle steering torque.
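As one possible realization of this step, the sketch below uses incremental PCA from scikit-learn (one of the algorithms named above) and then concatenates the latent states with the six vehicle parameters; the latent dimension, batch sizes and parameter ordering are assumptions.

```python
# Minimal sketch: encode flattened fusion images with incremental PCA and concatenate
# the latent states with vehicle kinematics to form the final state vector.
import numpy as np
from sklearn.decomposition import IncrementalPCA

LATENT_DIM = 32
encoder = IncrementalPCA(n_components=LATENT_DIM)

# Fit incrementally on batches of flattened (H, W, N) fusion images collected while driving.
for _ in range(10):
    batch = np.random.rand(64, 128 * 128 * 5).astype(np.float32)  # stand-in for real images
    encoder.partial_fit(batch)

def make_state_vector(fusion_image, vehicle_info):
    """fusion_image: (H, W, N) array; vehicle_info: [accel, speed, heading,
    lateral_dist_to_boundary, prev_steering_angle, steering_torque]."""
    latent = encoder.transform(fusion_image.reshape(1, -1))[0]    # (LATENT_DIM,)
    return np.concatenate([latent, np.asarray(vehicle_info, dtype=np.float32)])

state = make_state_vector(np.random.rand(128, 128, 5),
                          [0.3, 12.5, 0.02, 1.4, -0.05, 0.8])
print(state.shape)  # (38,) = 32 latent states + 6 vehicle parameters
```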
Taking the final state vector as input, the reinforcement learning agent 600 trains a deep network to learn the driving policy, and outputs a control command such as a speed and a steering angle to the vehicle. Several state-of-the-art model-free algorithms can be implemented in this framework. For example, an LSTM-based DDPG algorithm may preferably be used.
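The patent names an LSTM-based DDPG agent but gives no network details. The sketch below shows only an illustrative actor network that maps a short sequence of state vectors to a normalized (speed, steering) command; the critic, replay buffer and target networks required for full DDPG training are omitted, and all layer sizes and action ranges are assumptions.

```python
# Minimal sketch of an LSTM-based DDPG actor (training machinery omitted).
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, state_dim=38, hidden_dim=128, action_dim=2):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),    # actions normalized to [-1, 1]
        )

    def forward(self, state_seq, hidden=None):
        out, hidden = self.lstm(state_seq, hidden)   # state_seq: (batch, time, state_dim)
        action = self.head(out[:, -1])               # use the last time step
        return action, hidden

actor = LSTMActor()
states = torch.randn(1, 8, 38)                       # 8 most recent state vectors
action, _ = actor(states)
# Map the normalized outputs to a control command: target speed and steering angle.
speed_cmd = (action[0, 0].item() + 1.0) * 0.5 * 20.0  # assumed 0-20 m/s range
steer_cmd = action[0, 1].item() * 0.5                 # assumed +/-0.5 rad range
print(round(speed_cmd, 2), round(steer_cmd, 3))
```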
FIG. 8 illustrates a flowchart of a method for the autonomous driving vehicle according to one or more embodiments. The method may be suitable for the system described above, and the features described in the above system may be incorporated into the method herein. As shown in FIG. 8, at block 810, the method receives, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the vehicle. At block 820, the sensing data may be processed and environment data may be obtained. The environment data output from the perception module may include at least one of the road data and the obstacle object data. At block 830, map data may be extracted for the vehicle. The map data may include at least one of local map data and intended route data. At block 840, a top-down view fusion image may be generated based on the environment data and the map data. At block 850, the top-down view fusion image may be encoded to output low-dimensional latent states. At block 860, the low-dimensional latent states may be concatenated with vehicle information to output a state vector. The vehicle information may include, for example, a vehicle acceleration rate, a vehicle speed, a vehicle heading, a vehicle lateral distance to the road boundary, a vehicle previous steering angle and a vehicle steering torque. At block 870, a training of a deep network is performed based on the state vector and a driving policy may be determined.
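To summarize the data flow of blocks 810 to 870, the following illustrative outline chains the steps with stub functions; every helper and value here is a placeholder standing in for the modules described above, not an implementation from the patent.

```python
# Illustrative outline of one pass through blocks 810-870; all helpers are stubs.
import numpy as np

def sense():                                          # block 810: raw sensor data
    return {"camera": np.zeros((512, 512, 3)), "lidar": None}

def perceive(sensing):                                # block 820: environment data
    return {"road": np.ones((128, 128)), "objects": []}

def extract_map(gps):                                 # block 830: local map + intended route
    return {"local": np.ones((128, 128)), "route": np.zeros((128, 128))}

def fuse(env, maps):                                  # block 840: (W, H, 5) fusion image
    return np.zeros((128, 128, 5), dtype=np.float32)

def encode(image):                                    # block 850: low-dimensional latent states
    return image.reshape(-1)[:32]

def concatenate(latent, vehicle_info):                # block 860: final state vector
    return np.concatenate([latent, np.asarray(vehicle_info, dtype=np.float32)])

def policy(state):                                    # block 870: trained RL agent (stubbed)
    return np.array([0.4, -0.02])                     # normalized (speed, steering) command

vehicle_info = [0.3, 12.5, 0.02, 1.4, -0.05, 0.8]     # accel, speed, heading, lateral dist, prev steer, torque
env = perceive(sense())
maps = extract_map(gps=(1200, 850))
state = concatenate(encode(fuse(env, maps)), vehicle_info)
speed_cmd, steer_cmd = policy(state)
print(state.shape, speed_cmd, steer_cmd)              # (38,) 0.4 -0.02
```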
The method and system disclosed herein use an input representation which contains enough information, including road, neighboring object and vehicle information, which enables the reinforcement learning agent to generate a more accurate and safer driving policy. The method and system disclosed herein use the structured road geometry information, the map information and the changing environment information, which are uniformly represented as a fused image. The fused image is then further compressed into a fixed-length vector using visual encoding technology, which accordingly makes the learning process of reinforcement learning converge faster.
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
In the preceding, reference is made to embodiments presented in this disclosure. However, the scope of the present disclosure is not limited to specific described embodiments. Instead, any combination of the preceding features and elements, whether related to different embodiments or not, is contemplated to implement and practice contemplated embodiments. Furthermore, although embodiments disclosed herein may achieve advantages over other possible solutions or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the scope of the present disclosure. Thus, the preceding aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s).
Aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system."
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a  series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function (s) . In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (19)

  1. A method of data processing for an autonomous driving vehicle, the method comprising:
    receiving, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of a surrounding environment of the autonomous driving vehicle;
    processing the sensing data and outputting environment data;
    extracting map data for the autonomous driving vehicle;
    generating a top-down view fusion image based on the environment data and the map data;
    encoding the top-down view fusion image and outputting low-dimensional latent states;
    concatenating the low-dimensional latent states with vehicle information and outputting a state vector; and
    performing a training based on the state vector and determining a driving policy.
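For readability, the processing flow recited in claim 1 may be pictured as the following Python sketch. It is an illustration only, not an implementation taken from the disclosure: all names (perceive, extract_local_map, build_top_down_fusion_image, encode, the agent's act method) and the placeholder bodies are assumptions.

import numpy as np

def perceive(sensing_data):
    # Placeholder perception: turn raw sensor data into environment data.
    road_data = {"drivable_area": np.zeros((200, 200), dtype=np.float32)}
    obstacle_data = []  # list of detected, classified and tracked obstacle objects
    return road_data, obstacle_data

def extract_local_map(global_map, current_location):
    # Placeholder extraction of local map data and intended route data.
    return {"lanes": [], "intended_route": []}

def build_top_down_fusion_image(road_data, obstacle_data, map_data, size=(200, 200)):
    # Rasterize road, obstacle and map data into a multi-channel top-down image.
    height, width = size
    image = np.zeros((height, width, 3), dtype=np.float32)
    image[..., 0] = road_data["drivable_area"]  # channel 0: drivable area
    # channels 1 and 2 would be filled from obstacle_data and map_data
    return image

def encode(fusion_image, latent_dim=64):
    # Placeholder visual encoder producing low-dimensional latent states.
    return np.zeros(latent_dim, dtype=np.float32)

def driving_policy_step(sensing_data, global_map, location, vehicle_info, agent):
    road_data, obstacle_data = perceive(sensing_data)
    map_data = extract_local_map(global_map, location)
    fusion_image = build_top_down_fusion_image(road_data, obstacle_data, map_data)
    latent_states = encode(fusion_image)
    state_vector = np.concatenate([latent_states, vehicle_info])
    return agent.act(state_vector)  # driving policy output from the trained agent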
  2. The method according to claim 1, wherein the environment data includes at least one of road data and obstacle object data, and wherein the processing of the sensing data comprises:
    performing a road perception to output the road data, by identifying a segmented drivable area based on the sensing data; and
    performing an obstacle object detection to output the obstacle object data, by detecting, classifying and tracking one or more obstacle objects based on the sensing data.
  3. The method according to claim 2, wherein the performing of the road perception further comprises identifying one or more lane marks within the segmented drivable area.
  4. The method according to claim 2, wherein the performing of the obstacle object detection further comprises generating an obstacle object list for the obstacle object data by fusing the obstacle object data, the obstacle object list including information regarding a type, a size, a distance, a direction, a velocity and a heading of each obstacle object.
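One possible in-memory form of the obstacle object list of claim 4 is sketched below. The dataclass fields mirror the recited attributes; the field names, units and the trivial fusion placeholder are assumptions for illustration.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ObstacleObject:
    obj_type: str                      # e.g. "car", "pedestrian", "cyclist"
    size: Tuple[float, float, float]   # length, width, height in metres
    distance: float                    # range from the ego vehicle in metres
    direction: float                   # bearing relative to the ego heading in degrees
    velocity: float                    # speed in m/s
    heading: float                     # object heading in degrees

def fuse_obstacle_data(per_sensor_detections: List[ObstacleObject]) -> List[ObstacleObject]:
    # Placeholder fusion step: a real system would associate and merge
    # detections of the same physical object seen by different sensors.
    return per_sensor_detections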
  5. The method according to any one of claims 1-4, wherein the map data comprises local map data which is extracted from a global map based on a current location of the autonomous vehicle, and intended route data which represents an intended route of the autonomous vehicle.
  6. The method according to any one of claims 2-5, wherein the generating of the top-down view fusion image further comprises:
    mapping the road data into a first top-down view;
    mapping the obstacle object data into a second top-down view; and
    fusing the first top-down view, the second top-down view and the map data to generate the top-down view fusion image.
  7. The method according to claim 6, wherein the top-down view fusion image is expressed as a function of a width of the top-down view fusion image, a height of the top-down view fusion image and a number of channels for representing at least one of the road data, the obstacle object data and the map data.
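Claims 6 and 7 express the top-down view fusion image as a width x height x channels array. A minimal NumPy sketch of one plausible channel layout follows; the resolution, channel assignment and the simple rasterization helper are assumptions, not details from the disclosure.

import numpy as np

WIDTH, HEIGHT = 200, 200   # illustrative resolution, e.g. 0.25 m per pixel
CHANNELS = 4               # 0: drivable area, 1: lane marks, 2: obstacles, 3: intended route
fusion_image = np.zeros((HEIGHT, WIDTH, CHANNELS), dtype=np.float32)

def mark_cells(channel, cells):
    # Placeholder rasterizer: set the given (x, y) pixel coordinates to 1.
    for x, y in cells:
        channel[y, x] = 1.0

# First top-down view: segmented drivable area from the road data.
mark_cells(fusion_image[..., 0], [(100, 100), (101, 100), (102, 100)])
# Second top-down view: obstacle objects from the obstacle object list.
mark_cells(fusion_image[..., 2], [(120, 95)])
# Map data: lane marks and the intended route from the local map.
mark_cells(fusion_image[..., 1], [(98, 100)])
mark_cells(fusion_image[..., 3], [(100, y) for y in range(90, 110)])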
  8. The method according to any one of claims 1-7, wherein the method further comprises outputting a control command to the autonomous driving vehicle based on the driving policy.
  9. The method according to any one of claims 2-8, wherein the road perception and the obstacle object detection are performed simultaneously.
  10. A system of data processing for an autonomous driving vehicle, the system comprising:
    a perception module configured to receive, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of a surrounding environment of the autonomous driving vehicle, process the sensing data and output environment data;
    a local map extraction module configured to extract map data for the autonomous driving vehicle;
    a top-down view fusion image generation module configured to generate a top-down view fusion image based on the environment data and the map data;
    a visual encoding module configured to encode the top-down view fusion image to output low-dimensional latent states;
    a concatenate module configured to concatenate the low-dimensional latent states with vehicle information to output a state vector; and
    a reinforcement learning agent configured to perform a training based on the state vector and determine a driving policy.
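As a rough sketch of how the visual encoding module and the concatenate module of claim 10 might be realized, the following uses PyTorch; the network architecture, latent size and the choice of vehicle information (speed and acceleration) are assumptions for illustration only.

import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    # Encodes a channels x height x width top-down fusion image into
    # low-dimensional latent states.
    def __init__(self, in_channels=4, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, fusion_image):
        features = self.conv(fusion_image).flatten(1)
        return self.fc(features)

encoder = VisualEncoder()
fusion_image = torch.zeros(1, 4, 200, 200)   # batch of one fusion image
vehicle_info = torch.tensor([[5.0, 0.2]])    # e.g. speed (m/s) and acceleration (m/s^2)

latent_states = encoder(fusion_image)        # output of the visual encoding module
state_vector = torch.cat([latent_states, vehicle_info], dim=1)  # input to the reinforcement learning agent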
  11. The system according to claim 10, wherein the environment data includes at least one of road data and obstacle object data, and wherein the perception module comprises:
    a road perception sub-module configured to perform a road perception to output the road data, by identifying a segmented drivable area based on the sensing data; and
    an obstacle object detection sub-module configured to perform an obstacle object detection to output the obstacle object data, by detecting, classifying and tracking one or more obstacle objects based on the sensing data.
  12. The system according to claim 11, wherein the road perception sub-module is further configured to identify one or more lane marks within the segmented drivable area.
  13. The system according to claim 11, wherein the obstacle object detection sub-module is further configured to generate an obstacle object list for the obstacle object data, the obstacle object list including information regarding a type, a size, a distance, a direction, a velocity and a heading of each obstacle object.
  14. The system according to any one of claims 10-13, wherein the map data comprises local map data which is extracted from a global map based on a current location of the autonomous vehicle, and intended route data which represents an intended route of the autonomous vehicle.
  15. The system according to any one of claims 11-14, wherein the top-down view fusion image generation module is further configured to:
    map the road data into a first top-down view;
    map the obstacle object data into a second top-down view; and
    fuse the first top-down view, the second top-down view and the map data to output the top-down view fusion image.
  16. The system of claim 15, wherein the top-down view fusion image is expressed as a function of a width of the top-down view fusion image, a height of the top-down view fusion image and a number of channels for representing at least one of the road data, the obstacle object data and the map data.
  17. The system of any one of claims 10-16, wherein the reinforcement learning agent is further configured to output a control command to the autonomous driving vehicle based on the driving policy.
  18. The system of any one of claims 11-17, wherein the road perception and the obstacle object detection are performed simultaneously.
  19. A computer readable medium having computer-executable instructions for performing the method according to any one of claims 1-9.
PCT/CN2020/090197 2020-05-14 2020-05-14 Method and system of data processing for autonomous driving WO2021226921A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/090197 WO2021226921A1 (en) 2020-05-14 2020-05-14 Method and system of data processing for autonomous driving

Publications (1)

Publication Number Publication Date
WO2021226921A1 true WO2021226921A1 (en) 2021-11-18

Family

ID=78526202

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/090197 WO2021226921A1 (en) 2020-05-14 2020-05-14 Method and system of data processing for autonomous driving

Country Status (1)

Country Link
WO (1) WO2021226921A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018229552A2 (en) * 2017-06-14 2018-12-20 Mobileye Vision Technologies Ltd. Fusion framework and batch alignment of navigation information for autonomous navigation
US20190243371A1 (en) * 2018-02-02 2019-08-08 Nvidia Corporation Safety procedure analysis for obstacle avoidance in autonomous vehicles
US20190049957A1 (en) * 2018-03-30 2019-02-14 Intel Corporation Emotional adaptive driving policies for automated driving vehicles
US10205457B1 (en) * 2018-06-01 2019-02-12 Yekutiel Josefsberg RADAR target detection system for autonomous vehicles with ultra lowphase noise frequency synthesizer
US20200139973A1 (en) * 2018-11-01 2020-05-07 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
US20200142421A1 (en) * 2018-11-05 2020-05-07 GM Global Technology Operations LLC Method and system for end-to-end learning of control commands for autonomous vehicle

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023102962A1 (en) * 2021-12-06 2023-06-15 深圳先进技术研究院 Method for training end-to-end autonomous driving strategy
CN114030488A (en) * 2022-01-11 2022-02-11 清华大学 Method and device for realizing automatic driving decision, computer storage medium and terminal
US11673562B1 (en) 2022-01-11 2023-06-13 Tsinghua University Method, apparatus, computer storage medium and terminal for implementing autonomous driving decision-making
CN115221260A (en) * 2022-07-18 2022-10-21 小米汽车科技有限公司 Data processing method, device, vehicle and storage medium
CN115221260B (en) * 2022-07-18 2024-02-09 小米汽车科技有限公司 Data processing method, device, vehicle and storage medium
CN116880462A (en) * 2023-03-17 2023-10-13 北京百度网讯科技有限公司 Automatic driving model, training method, automatic driving method and vehicle
CN116881707A (en) * 2023-03-17 2023-10-13 北京百度网讯科技有限公司 Automatic driving model, training method, training device and vehicle
CN116385949A (en) * 2023-03-23 2023-07-04 广州里工实业有限公司 Mobile robot region detection method, system, device and medium
CN116385949B (en) * 2023-03-23 2023-09-08 广州里工实业有限公司 Mobile robot region detection method, system, device and medium
CN116453087A (en) * 2023-03-30 2023-07-18 无锡物联网创新中心有限公司 Automatic driving obstacle detection method of data closed loop
CN116453087B (en) * 2023-03-30 2023-10-20 无锡物联网创新中心有限公司 Automatic driving obstacle detection method of data closed loop

Similar Documents

Publication Publication Date Title
WO2021226921A1 (en) Method and system of data processing for autonomous driving
JP7228652B2 (en) OBJECT DETECTION DEVICE, OBJECT DETECTION METHOD AND PROGRAM
EP3751455A2 (en) Image fusion for autonomous vehicle operation
JP6833630B2 (en) Object detector, object detection method and program
US11386567B2 (en) Systems and methods for weakly supervised training of a model for monocular depth estimation
US10489686B2 (en) Object detection for an autonomous vehicle
US11482014B2 (en) 3D auto-labeling with structural and physical constraints
EP3822852B1 (en) Method, apparatus, computer storage medium and program for training a trajectory planning model
WO2022206942A1 (en) Laser radar point cloud dynamic segmentation and fusion method based on driving safety risk field
US11670087B2 (en) Training data generating method for image processing, image processing method, and devices thereof
US11280630B2 (en) Updating map data
EP3942794B1 (en) Depth-guided video inpainting for autonomous driving
CN109421730B (en) Cross traffic detection using cameras
US20230326168A1 (en) Perception system for autonomous vehicles
EP3859390A1 (en) Method and system for rendering a representation of an environment of a vehicle
Aditya et al. Collision Detection: An Improved Deep Learning Approach Using SENet and ResNext
CN112837209A (en) New method for generating image with distortion for fisheye lens
Sanberg et al. Asteroids: A stixel tracking extrapolation-based relevant obstacle impact detection system
US11544899B2 (en) System and method for generating terrain maps
EP4137845A1 (en) Methods and systems for predicting properties of a plurality of objects in a vicinity of a vehicle
US11461922B2 (en) Depth estimation in images obtained from an autonomous vehicle camera
US20230417885A1 (en) Systems and methods for detecting erroneous lidar data
US20230057509A1 (en) Vision-based machine learning model for autonomous driving with adjustable virtual camera
Kadav Advancing Winter Weather ADAS: Tire Track Identification and Road Snow Coverage Estimation Using Deep Learning and Sensor Integration
Hellinckx Lane Marking Detection Using LiDAR Sensor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20935702; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20935702; Country of ref document: EP; Kind code of ref document: A1)