WO2021226921A1 - Method and system of data processing for autonomous driving - Google Patents

Method and system of data processing for autonomous driving

Info

Publication number
WO2021226921A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
down view
obstacle object
fusion image
road
Prior art date
Application number
PCT/CN2020/090197
Other languages
French (fr)
Inventor
Guoxia ZHANG
Qingshan Zhang
Kecai WU
Original Assignee
Harman International Industries, Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harman International Industries, Incorporated filed Critical Harman International Industries, Incorporated
Priority to PCT/CN2020/090197 priority Critical patent/WO2021226921A1/en
Publication of WO2021226921A1 publication Critical patent/WO2021226921A1/en

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/10Longitudinal speed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/10Longitudinal speed
    • B60W2520/105Longitudinal acceleration
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2520/00Input parameters relating to overall vehicle dynamics
    • B60W2520/14Yaw
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2540/00Input parameters relating to occupants
    • B60W2540/18Steering angle
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2552/00Input parameters relating to infrastructure
    • B60W2552/53Road markings, e.g. lane marker or crosswalk
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/402Type
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404Characteristics
    • B60W2554/4041Position
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404Characteristics
    • B60W2554/4042Longitudinal speed
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/404Characteristics
    • B60W2554/4044Direction of movement, e.g. backwards
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2556/00Input parameters relating to data
    • B60W2556/35Data fusion
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2556/00Input parameters relating to data
    • B60W2556/40High definition maps

Abstract

The disclosure describes a method and a system of data processing for an autonomous driving vehicle. The method comprises receiving, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the autonomous driving vehicle; processing the sensing data and outputting environment data; extracting map data for the autonomous driving vehicle; generating a top-down view fusion image based on the environment data and the map data; encoding the top-down view fusion image and outputting low-dimensional latent states; concatenating the low-dimensional latent states with vehicle information and outputting a state vector; and performing a training based on the state vector and determining a driving policy.

Description

METHOD AND SYSTEM OF DATA PROCESSING FOR AUTONOMOUS DRIVING
TECHNICAL FIELD
The present disclosure relates to a data processing method and system for an autonomous driving vehicle, and specifically relates to a method and system for providing an input representation for deep reinforcement learning in an autonomous driving system.
BACKGROUND
Combined with deep learning techniques, reinforcement learning (RL) has brought a series of breakthroughs in recent years. However, there are not many successful applications of deep reinforcement learning in autonomous driving, due to the following defects. Firstly, current solutions do not take into account all the sensors deployed on autonomous vehicles; most of them use only a front-view image as the input and learn an end-to-end driving policy. However, the front-view image does not contain enough information for decision making. For example, the agent cannot accurately make a left lane change decision without information such as the location, speed, and direction of the left rear vehicle in the adjacent lane. Secondly, most current solutions feed a raw image into the reinforcement learning agent, and the raw image contains extremely high-dimensional information such as appearances and textures of the roads and objects, weather conditions, and light conditions. These kinds of extremely complex, high-dimensional visual features dramatically enlarge the sample complexity of learning. In order to obtain good generalization, the dataset must cover enough data for each dimension of the raw sensor information.
Therefore, there is a need to develop an improved method/system to enable the reinforcement learning agent to generate a more accurate and safe driving strategy, with a faster learning process.
SUMMARY
The disclosure designs a low-dimensional representation with enough information for the reinforcement learning agent to determine a driving policy. This low-dimensional representation includes not only environment information but also vehicle information.
According to one aspect of the disclosure, a method of data processing for an autonomous driving vehicle is provided. The method may comprise: receiving, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the autonomous driving vehicle; processing the sensing data and outputting environment data; extracting map data for the autonomous driving vehicle; generating a top-down view fusion image based on the environment data and the map data; encoding the top-down view fusion image and outputting low-dimensional latent states; concatenating the low-dimensional latent states with vehicle information and outputting a state vector; and performing a training based on the state vector and determining a driving policy.
The environment data includes at least one of road data and obstacle object data. Furthermore, processing the sensing data may comprise performing a road perception to output the road data, by identifying a segmented drivable area based on the sensing data; and performing an obstacle object detection to output the obstacle object data, by detecting, classifying and tracking one or more obstacle objects based on the sensing data.
The method further comprises identifying one or more lane marks within the segmented drivable area, and generating an obstacle object list based on the obstacle object data. For example, the obstacle object list may include information regarding a type, a size, a distance, a direction, a velocity and a heading of each obstacle object.
The map data comprises local map data which is extracted from a global map based on a current location of the autonomous vehicle. The map data further comprises intended route data which represents an intended route of the autonomous vehicle.
The method may further map the road data into a first top-down view, map the obstacle object data into a second top-down view, and then fuse the first top-down view, the second top-down view and the map data to generate the top-down view fusion image.
The top-down view fusion image may be expressed as a function of a width of the top-down view fusion image, a height of the top-down view fusion image and a number of channels for representing at least one of the road data, the obstacle object data and the map data.
Furthermore, the method may output a control command to the autonomous driving vehicle based on the driving policy.
Furthermore, the road perception and the obstacle object detection are performed simultaneously.
According to another aspect of the present disclosure, a system of data processing for an autonomous driving vehicle is provided. The system may comprise a perception module, a local map extraction module, a top-down view fusion image generation module, a visual encoding module, a concatenate module and a reinforcement learning (RL) agent. The perception module is configured to receive, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the autonomous driving vehicle, process the sensing data and output environment data. The local map extraction module is configured to extract map data for the autonomous driving vehicle. The top-down view fusion image generation module is configured to generate a top-down view fusion image based on the environment data and the map data. The visual encoding module is configured to encode the top-down view fusion image to output low-dimensional latent states. The concatenate module is configured to concatenate the low-dimensional latent states with vehicle information to output a state vector. The reinforcement learning agent is configured to perform a training based on the state vector and determine a driving policy.
According to another aspect of the present disclosure, a computer readable medium having computer-executable instructions for performing the above-described method is provided.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic block diagram including a vehicle environment in accordance with one or more embodiments of the present disclosure.
FIG. 2A illustrates an example of a multi-task network for obstacle objects detection and road perception.
FIG. 2B illustrates a schematic view which shows an example of a perception result.
FIG. 3 illustrates a schematic view which shows an example of local map for an autonomous driving vehicle.
FIG. 4 illustrates a schematic view which shows an example of an intended route for the autonomous driving vehicle.
FIG. 5 illustrates an example of a drivable area with a top-down view.
FIG. 6 illustrates an example of size of objects with a top-down view.
FIG. 7 illustrates an example of a top-down map fusion image.
FIG. 8 illustrates a flowchart of the method of the present disclosure.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be beneficially utilized on other embodiments without specific recitation. The drawings referred to here should not be understood as being drawn to scale unless specifically noted. Also, the drawings are often simplified and details or components omitted for clarity of presentation and explanation. The drawings and discussion serve to explain principles discussed below, where like designations denote like elements.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Examples will be provided below for illustration. The descriptions of the various examples will be presented for purposes of illustration, but are not intended to  be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
FIG. 1 illustrates a schematic block diagram including an autonomous vehicle in accordance with one or more embodiments of the present disclosure, for explaining how to implement deep reinforcement learning for an autonomous vehicle in an autonomous driving system.
For example, the system may comprise a perception module 100, a local map extraction module 200, a top-down view fusion image generation module 300, a visual encoding module 400, a concatenate module 500 and a reinforcement learning (RL) agent 600. The perception module 100 receives, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the vehicle. The sensors may be on-board sensors, such as a camera, a radar and a LiDAR, etc. These sensors are mounted on and around the vehicle and may sense the surrounding environment information, for example a plurality of images, or radar or LiDAR data of the surrounding environment. The perception module 100 may further process the sensing data and output environment data. The local map extraction module 200 extracts map data for the vehicle. For example, the local map extraction module 200 may extract a local map from the global route map based on GPS data from the vehicle-mounted GPS sensor. The top-down view fusion image generation module 300 generates a top-down view fusion image based on the environment data and the map data received respectively from the perception module 100 and the local map extraction module 200. The top-down view fusion image generated by the top-down view fusion image generation module 300 is then output to the visual encoding module 400 for encoding. The visual encoding module 400 performs encoding on the top-down view fusion image to capture low-dimensional latent states. These states are then concatenated with vehicle information in the concatenate module 500 to generate a final state vector for the reinforcement learning agent 600. In the reinforcement learning agent 600, reinforcement learning algorithms are adopted to train a deep network which takes the final state vector as input, and a driving policy is then determined. The reinforcement learning agent 600 then outputs a control command, such as a speed and a steering angle, for controlling the vehicle.
For example, the perception module 100 may incorporate the capability of using multiple cameras, radars and LiDARs to identify the environment data, such as identifying a drivable area and recognizing obstacles. In particular, the perception module 100 includes a road perception sub-module 1002 and an obstacle object detection sub-module 1004. The road perception sub-module 1002 performs a road perception to obtain road data. For example, in the road perception sub-module 1002, a segmentation deep learning network is used to identify a region of pixels that indicates a segmented drivable area in a given image captured by the camera or LiDAR. Moreover, in this sub-module, lane markers may be segmented and detected from the segmented drivable area. The obstacle object detection sub-module 1004 performs an obstacle object detection to output obstacle object data. In the obstacle object detection sub-module 1004, a deep learning network may be used to detect, classify and track obstacles based on the sensing data and to generate information regarding the obstacles, such as type and position information. This sub-module also fuses each obstacle object detected from the sensing data captured by different sensors, and then generates the obstacle object data indicative of a final fused obstacle object list, which contains more information regarding the objects, such as a type, a size, a distance, a direction, a velocity and a heading of each object. The environment data output from the perception module may include at least one of the road data and the obstacle object data.
The process of road perception and the process of obstacle object detection described above may be performed separately. However, in order to reduce the computational complexity, the road perception and the obstacle object detection may be performed simultaneously by using a single multi-task model. FIG. 2A illustrates an example of a multi-task network architecture for performing the obstacle object detection and the road perception. The multi-task network architecture contains an encoder for feature extraction and two decoder branches, respectively for object detection and road semantic segmentation. Both decoders use multi-level feature maps from the residual-network-based encoder. The decoders also use multiple convolutional layers for feature decoding. FIG. 2B illustrates an example of a perception result of the environment data, including road data and obstacle object data. As shown in FIG. 2B, for example, the left image shows the information of the detected obstacle objects. The types of the obstacle objects are identified, such as car, truck, traffic signs (Tsigns), etc. The sizes of the obstacle objects are identified using length, width, and height. The distance information is also identified in the left image. As those skilled in the art will realize, this is for illustration only, and other information may be included according to different environments and different interests/requirements. The right image in FIG. 2B mainly shows the road information for the same environment, wherein it shows the segmented drivable area on the ground and the segmented lane markers indicated as gray lines.
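The patent does not give the exact layer configuration of this multi-task network. As a rough illustration only, the following PyTorch sketch builds a shared ResNet-style encoder with two heads, one producing a coarse detection grid and one producing a full-resolution segmentation map; the class counts, layer sizes and input resolution are assumptions, and the multi-level skip connections mentioned above are omitted for brevity.

```python
# Minimal sketch (not the patent's actual network): a shared ResNet encoder with two
# decoder heads, one for obstacle-object detection and one for drivable-area segmentation.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class MultiTaskPerceptionNet(nn.Module):
    def __init__(self, num_object_classes=4, num_seg_classes=3):
        super().__init__()
        backbone = resnet18(weights=None)
        # Shared encoder: everything up to (and including) the last residual stage.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # -> (B, 512, H/32, W/32)
        # Detection head: per-cell class scores plus a 4-value box (x, y, w, h).
        self.det_head = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_object_classes + 4, 1),
        )
        # Segmentation head: upsamples back to input resolution for drivable area / lane markers.
        self.seg_head = nn.Sequential(
            nn.Conv2d(512, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(128, num_seg_classes, 1),
        )

    def forward(self, x):
        feats = self.encoder(x)
        return self.det_head(feats), self.seg_head(feats)

# Example: one 3x512x512 camera frame yields a coarse detection grid and a
# full-resolution segmentation map.
net = MultiTaskPerceptionNet()
det_out, seg_out = net(torch.randn(1, 3, 512, 512))
print(det_out.shape, seg_out.shape)  # torch.Size([1, 8, 16, 16]) torch.Size([1, 3, 512, 512])
```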
The local map extraction module 200 uses the vehicle's current location, obtained from the vehicle-mounted GPS module, to extract a local map (local map data) from the global route map. In the local map, the ego vehicle is always at a fixed location. Taking the ego vehicle as the coordinate origin, there are at least two lane widths on each of the left and right sides. The forward and backward length from the ego vehicle should not be less than the ranging capability of the on-board sensors. As the vehicle moves, this local map moves with it, so the vehicle always sees a fixed range. Here, W is used to indicate the width of the local map, and H is used to indicate the height of the local map. The local map may contain information on drivable road geometry. FIG. 3 illustrates an example of the extracted local map, in which the ego vehicle is indicated as a dot. FIG. 4 illustrates an example of an intended route along which the vehicle wishes to drive. This intended route may be generated by a router and provides heuristic navigation information for the vehicle, including a location, a specified lane and a driving direction. Thus, the map data may include the local map data and the intended route data.
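As a hedged illustration of the local map extraction, the sketch below assumes the global route map has been rasterized into a 2D grid and crops a fixed-size window that follows the ego vehicle; the window size, the grid representation and the centering of the ego vehicle are assumptions, not details taken from the patent.

```python
# Minimal sketch: crop a fixed W x H local-map window from a rasterized global map
# so the ego vehicle always "sees" the same range as it moves.
import numpy as np

def extract_local_map(global_map, ego_row, ego_col, W=128, H=128):
    """Crop an H x W window from the rasterized global map around the ego cell.

    global_map: 2D array of drivable road geometry (e.g. 1 = road, 0 = non-road).
    ego_row, ego_col: ego-vehicle position in grid coordinates (from GPS + map projection).
    """
    half_h, half_w = H // 2, W // 2
    local = np.zeros((H, W), dtype=global_map.dtype)
    # Clip the requested window against the global-map borders.
    r0, r1 = max(0, ego_row - half_h), min(global_map.shape[0], ego_row + half_h)
    c0, c1 = max(0, ego_col - half_w), min(global_map.shape[1], ego_col + half_w)
    # Paste the valid region at the corresponding offset so the ego cell stays fixed.
    local[(r0 - (ego_row - half_h)):(r1 - (ego_row - half_h)),
          (c0 - (ego_col - half_w)):(c1 - (ego_col - half_w))] = global_map[r0:r1, c0:c1]
    return local

# Usage: as the vehicle moves, calling this with the updated GPS-derived grid cell
# yields a local map of constant size around the ego vehicle.
global_map = np.zeros((4000, 4000), dtype=np.uint8)
local_map = extract_local_map(global_map, ego_row=1200, ego_col=850)
print(local_map.shape)  # (128, 128)
```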
The top-down view fusion image generation module 300 may include a top-down view image processing sub-module 3002 and a map fusion sub-module 3004. The top-down view image processing sub-module 3002 maps the road data into a first top-down view and maps the obstacle object data into a second top-down view. The map fusion sub-module 3004 fuses the first top-down view, the second top-down view and the map data to output the top-down view fusion image. For example, this top-down view fusion image generation module generates a top-down view fusion image for visual encoding by overlaying the drivable area, the obstacle objects and the local map.
Without loss of generality, the road regions can be assumed to be homographic planes in road detection scenarios. Therefore, Inverse Perspective Mapping can be used to map the segmented drivable area and the detected objects into a top-down view. The Inverse Perspective Mapping can be considered as an image projection process. In this process, the pixels in the original images can be projected to the bird's-eye-view images using a mapping matrix (projection matrix) H. A multiple-point correspondence-based method can be used to estimate the mapping matrix:

p_i = H · p'_i,    i = 1, 2, …, n        (1)

p = Ĥ · p',    for all p' ∈ I'        (2)

where p' and p represent the homogeneous coordinates of pixels in the original image I' (p' ∈ I') and the projected image I (bird's-eye view) (p ∈ I), respectively. The mapping matrix H can be estimated in two steps: first, select two sets of corresponding pixels p'_i and p_i (i = 1, 2, …, n) from I' and I, respectively, and calculate the optimal estimate Ĥ of the mapping matrix H based on Equation (1). Once Ĥ is determined, all pixels of the original image I' can be projected into the bird's-eye-view image I using the estimated Ĥ (Equation (2)). Figure 5 gives an example of a top-down view drivable area.
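The patent does not prescribe a particular implementation of this two-step procedure. The following sketch uses OpenCV's standard homography utilities; the point correspondences, image sizes and output resolution are placeholder assumptions.

```python
# Minimal Inverse Perspective Mapping sketch: estimate the mapping matrix H from
# pixel correspondences, then warp the front-view image into a bird's-eye view.
import cv2
import numpy as np

# Step 1: pixel correspondences p'_i (front-view image) -> p_i (top-down image).
# These four points are placeholders; in practice they come from calibration against
# known positions on the flat road surface.
src_pts = np.float32([[420, 560], [860, 560], [1180, 700], [100, 700]])   # p'_i in I'
dst_pts = np.float32([[300, 100], [500, 100], [500, 700], [300, 700]])    # p_i in I

# With exactly four correspondences getPerspectiveTransform solves H exactly; with
# n > 4 points, cv2.findHomography(src_pts, dst_pts, cv2.RANSAC) gives a robust estimate.
H = cv2.getPerspectiveTransform(src_pts, dst_pts)

# Step 2: project every pixel of the original image into the bird's-eye view.
front_view = np.zeros((720, 1280, 3), dtype=np.uint8)                     # stand-in for a segmented frame
cv2.fillConvexPoly(front_view, src_pts.astype(np.int32), (0, 255, 0))     # fake drivable-area region
top_down = cv2.warpPerspective(front_view, H, (800, 800))
print(top_down.shape)  # (800, 800, 3)
```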
As for the obstacle objects, taking the ego vehicle as the coordinate origin, only those objects whose distance does not exceed the map size are retained. Considering the actual deployment of the on-board sensors, the car cannot perceive the entire shape information of neighboring objects. Here, the width and length of the object that can be seen from the perspective of the ego vehicle are used to represent the size of the obstacle objects. Figure 6 gives an example of such a representation in an image.
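A minimal sketch of this filtering step is given below; it assumes the fused obstacle list is available as simple records in ego-vehicle coordinates, and all field names and numeric values are illustrative rather than taken from the patent.

```python
# Minimal sketch of retaining only obstacles that fall inside the local-map range,
# using the visible width/length from the ego perspective as the object size.
from dataclasses import dataclass

@dataclass
class ObstacleObject:              # field names are illustrative assumptions
    obj_type: str                  # e.g. "car", "truck", "traffic_sign"
    x: float                       # lateral offset from ego vehicle [m]
    y: float                       # longitudinal offset from ego vehicle [m]
    visible_width: float           # width seen from the ego perspective [m]
    visible_length: float          # length seen from the ego perspective [m]
    velocity: float                # [m/s]
    heading: float                 # [rad], relative to ego heading

def filter_obstacles(objects, map_width_m, map_height_m):
    """Keep objects whose position lies within the W x H local-map extent."""
    half_w, half_h = map_width_m / 2.0, map_height_m / 2.0
    return [o for o in objects if abs(o.x) <= half_w and abs(o.y) <= half_h]

nearby = filter_obstacles(
    [ObstacleObject("car", 3.2, 18.0, 1.8, 4.5, 12.0, 0.05),
     ObstacleObject("truck", -4.0, 95.0, 2.5, 9.0, 20.0, 0.0)],
    map_width_m=40.0, map_height_m=120.0)
print([o.obj_type for o in nearby])  # ['car'] -- the truck lies beyond the map range
```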
For example, the above-generated top-down view images are further fused with the extracted local map and the intended route in the map fusion sub-module 3004 to generate a map-fusion image. Figure 7 gives an example of the generated map-fusion image. Inspired by the multi-channel representation of images, multiple channels are used here to represent the generated map-fusion image. Thus, the top-down view fusion image may be expressed as a function of a width of the top-down view fusion image, a height of the top-down view fusion image and a number of channels for representing at least one of the road data, the obstacle object data and the map data. For example, it can be expressed as (W, H, N), in which W is the width of the image and H is the height of the image. N is the total number of channels used to represent the segmented drivable area, the detected obstacle objects and the intended route. For example, the segmented drivable area needs one channel. The detected obstacle objects need three channels to represent the objects' type, velocity, and heading information. The intended route needs one channel to represent the driving direction. Accordingly, N may be five in this example.
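The sketch below assembles such a (W, H, N) fusion image with the five-channel layout of this example; the rasterization of object footprints and the normalization of type, velocity and heading values are assumptions made only for illustration.

```python
# Minimal sketch of the (W, H, N) map-fusion image with N = 5 channels:
# channel 0: segmented drivable area, channels 1-3: obstacle type / velocity / heading,
# channel 4: intended route. The normalized-float encodings are assumptions.
import numpy as np

W, H, N = 128, 128, 5

def build_fusion_image(drivable_mask, objects, route_mask, type_ids, max_speed=30.0):
    """drivable_mask, route_mask: (H, W) binary arrays already in top-down view.
    objects: iterable of (row, col, footprint_rows, footprint_cols, obj_type, velocity, heading)."""
    fusion = np.zeros((H, W, N), dtype=np.float32)
    fusion[:, :, 0] = drivable_mask                          # drivable-area channel
    for row, col, fr, fc, obj_type, velocity, heading in objects:
        r0, r1 = max(0, row - fr // 2), min(H, row + fr // 2 + 1)
        c0, c1 = max(0, col - fc // 2), min(W, col + fc // 2 + 1)
        fusion[r0:r1, c0:c1, 1] = type_ids[obj_type] / max(type_ids.values())
        fusion[r0:r1, c0:c1, 2] = min(velocity / max_speed, 1.0)
        fusion[r0:r1, c0:c1, 3] = (heading % (2 * np.pi)) / (2 * np.pi)
    fusion[:, :, 4] = route_mask                             # intended-route channel
    return fusion

type_ids = {"car": 1, "truck": 2, "pedestrian": 3}
image = build_fusion_image(np.ones((H, W)), [(40, 70, 9, 5, "car", 12.0, 0.1)],
                           np.zeros((H, W)), type_ids)
print(image.shape)  # (128, 128, 5)
```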
The visual encoding module 400 uses a visual encoding algorithm, such as VAE, PCA or incremental PCA, to learn a low-dimensional latent representation, i.e., low-dimensional latent states, from the top-down view fusion image. These states are then concatenated with vehicle information, such as the necessary vehicle kinematic parameters, to generate a final state vector. The vehicle information includes, for example, a vehicle acceleration rate, a vehicle speed, a vehicle heading, a vehicle lateral distance to the road boundary, a vehicle previous steering angle and a vehicle steering torque.
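As one possible realization of this step, the sketch below uses incremental PCA from scikit-learn (one of the algorithms named above) and then concatenates the latent states with the six vehicle parameters; the latent dimension, batch sizes and parameter ordering are assumptions.

```python
# Minimal sketch: encode flattened fusion images with incremental PCA and concatenate
# the latent states with vehicle kinematics to form the final state vector.
import numpy as np
from sklearn.decomposition import IncrementalPCA

LATENT_DIM = 32
encoder = IncrementalPCA(n_components=LATENT_DIM)

# Fit incrementally on batches of flattened (H, W, N) fusion images collected while driving.
for _ in range(10):
    batch = np.random.rand(64, 128 * 128 * 5).astype(np.float32)  # stand-in for real images
    encoder.partial_fit(batch)

def make_state_vector(fusion_image, vehicle_info):
    """fusion_image: (H, W, N) array; vehicle_info: [accel, speed, heading,
    lateral_dist_to_boundary, prev_steering_angle, steering_torque]."""
    latent = encoder.transform(fusion_image.reshape(1, -1))[0]    # (LATENT_DIM,)
    return np.concatenate([latent, np.asarray(vehicle_info, dtype=np.float32)])

state = make_state_vector(np.random.rand(128, 128, 5),
                          [0.3, 12.5, 0.02, 1.4, -0.05, 0.8])
print(state.shape)  # (38,) = 32 latent states + 6 vehicle parameters
```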
Taking the final state vector as input, the reinforcement learning agent 600 trains a deep network to learn the driving policy, and outputs a control command such as a speed and a steering angle to the vehicle. Several state-of-the-art model-free algorithms can be implemented in this framework. For example, an LSTM-based DDPG algorithm may preferably be used.
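The patent names an LSTM-based DDPG agent but gives no network details. The sketch below shows only an illustrative actor network that maps a short sequence of state vectors to a normalized (speed, steering) command; the critic, replay buffer and target networks required for full DDPG training are omitted, and all layer sizes and action ranges are assumptions.

```python
# Minimal sketch of an LSTM-based DDPG actor (training machinery omitted).
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, state_dim=38, hidden_dim=128, action_dim=2):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),    # actions normalized to [-1, 1]
        )

    def forward(self, state_seq, hidden=None):
        out, hidden = self.lstm(state_seq, hidden)   # state_seq: (batch, time, state_dim)
        action = self.head(out[:, -1])               # use the last time step
        return action, hidden

actor = LSTMActor()
states = torch.randn(1, 8, 38)                       # 8 most recent state vectors
action, _ = actor(states)
# Map the normalized outputs to a control command: target speed and steering angle.
speed_cmd = (action[0, 0].item() + 1.0) * 0.5 * 20.0  # assumed 0-20 m/s range
steer_cmd = action[0, 1].item() * 0.5                 # assumed +/-0.5 rad range
print(round(speed_cmd, 2), round(steer_cmd, 3))
```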
FIG. 8 illustrates a flowchart of a method for the autonomous driving vehicle according to one or more embodiments. The method may be suitable for the system described above, and the features described in the above system may be incorporated into the method herein. As shown in FIG. 8, at block 810, the method receives, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of the surrounding environment of the vehicle. At block 820, the sensing data may be processed and environment data may be obtained. The environment data output from the perception module may include at least one of the road data and the obstacle object data. At block 830, map data may be extracted for the vehicle. The map data may include at least one of local map data and intended route data. At block 840, a top-down view fusion image may be generated based on the environment data and the map data. At block 850, the top-down view fusion image may be encoded to output low-dimensional latent states. At block 860, the low-dimensional latent states may be concatenated with vehicle information to output a state vector. The vehicle information may include, for example, a vehicle acceleration rate, a vehicle speed, a vehicle heading, a vehicle lateral distance to the road boundary, a vehicle previous steering angle and a vehicle steering torque. At block 870, a training of a deep network is performed based on the state vector and a driving policy may be determined.
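To summarize the data flow of blocks 810 to 870, the following illustrative outline chains the steps with stub functions; every helper and value here is a placeholder standing in for the modules described above, not an implementation from the patent.

```python
# Illustrative outline of one pass through blocks 810-870; all helpers are stubs.
import numpy as np

def sense():                                          # block 810: raw sensor data
    return {"camera": np.zeros((512, 512, 3)), "lidar": None}

def perceive(sensing):                                # block 820: environment data
    return {"road": np.ones((128, 128)), "objects": []}

def extract_map(gps):                                 # block 830: local map + intended route
    return {"local": np.ones((128, 128)), "route": np.zeros((128, 128))}

def fuse(env, maps):                                  # block 840: (W, H, 5) fusion image
    return np.zeros((128, 128, 5), dtype=np.float32)

def encode(image):                                    # block 850: low-dimensional latent states
    return image.reshape(-1)[:32]

def concatenate(latent, vehicle_info):                # block 860: final state vector
    return np.concatenate([latent, np.asarray(vehicle_info, dtype=np.float32)])

def policy(state):                                    # block 870: trained RL agent (stubbed)
    return np.array([0.4, -0.02])                     # normalized (speed, steering) command

vehicle_info = [0.3, 12.5, 0.02, 1.4, -0.05, 0.8]     # accel, speed, heading, lateral dist, prev steer, torque
env = perceive(sense())
maps = extract_map(gps=(1200, 850))
state = concatenate(encode(fuse(env, maps)), vehicle_info)
speed_cmd, steer_cmd = policy(state)
print(state.shape, speed_cmd, steer_cmd)              # (38,) 0.4 -0.02
```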
The method and system disclosed herein use an input representation which contains enough information, including road, neighboring object and vehicle information, which enables the reinforcement learning agent to generate a more accurate and safer driving policy. The method and system disclosed herein use the structured road geometry information, the map information and the changing environment information, which are uniformly represented as a fused image. The fused image is then further compressed into a fixed-length vector using visual encoding technology, which accordingly makes the learning process of reinforcement learning converge faster.
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
In the preceding, reference is made to embodiments presented in this disclosure. However, the scope of the present disclosure is not limited to specific described embodiments. Instead, any combination of the preceding features and elements, whether related to different embodiments or not, is contemplated to implement and practice contemplated embodiments. Furthermore, although embodiments disclosed herein may achieve advantages over other possible solutions or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the scope of the present disclosure. Thus, the preceding aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s).
Aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system."
The present disclosure may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present disclosure.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a  series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function (s) . In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (19)

  1. A method of data processing for an autonomous driving vehicle, the method comprising:
    receiving, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of a surrounding environment of the autonomous driving vehicle;
    processing the sensing data and outputting environment data;
    extracting map data for the autonomous driving vehicle;
    generating a top-down view fusion image based on the environment data and the map data;
    encoding the top-down view fusion image and outputting low-dimensional latent states;
    concatenating the low-dimensional latent states with vehicle information and outputting a state vector; and
    performing a training based on the state vector and determining a driving policy.
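For readability, the processing flow recited in claim 1 may be pictured as the following Python sketch. It is an illustration only, not an implementation taken from the disclosure: all names (perceive, extract_local_map, build_top_down_fusion_image, encode, the agent's act method) and the placeholder bodies are assumptions.

import numpy as np

def perceive(sensing_data):
    # Placeholder perception: turn raw sensor data into environment data.
    road_data = {"drivable_area": np.zeros((200, 200), dtype=np.float32)}
    obstacle_data = []  # list of detected, classified and tracked obstacle objects
    return road_data, obstacle_data

def extract_local_map(global_map, current_location):
    # Placeholder extraction of local map data and intended route data.
    return {"lanes": [], "intended_route": []}

def build_top_down_fusion_image(road_data, obstacle_data, map_data, size=(200, 200)):
    # Rasterize road, obstacle and map data into a multi-channel top-down image.
    height, width = size
    image = np.zeros((height, width, 3), dtype=np.float32)
    image[..., 0] = road_data["drivable_area"]  # channel 0: drivable area
    # channels 1 and 2 would be filled from obstacle_data and map_data
    return image

def encode(fusion_image, latent_dim=64):
    # Placeholder visual encoder producing low-dimensional latent states.
    return np.zeros(latent_dim, dtype=np.float32)

def driving_policy_step(sensing_data, global_map, location, vehicle_info, agent):
    road_data, obstacle_data = perceive(sensing_data)
    map_data = extract_local_map(global_map, location)
    fusion_image = build_top_down_fusion_image(road_data, obstacle_data, map_data)
    latent_states = encode(fusion_image)
    state_vector = np.concatenate([latent_states, vehicle_info])
    return agent.act(state_vector)  # driving policy output from the trained agent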
  2. The method according to claim 1, wherein the environment data includes at least one of road data and obstacle object data, and wherein the processing of the sensing data comprises:
    performing a road perception to output the road data, by identifying a segmented drivable area based on the sensing data; and
    performing an obstacle object detection to output the obstacle object data, by detecting, classifying and tracking one or more obstacle objects based on the sensing data.
  3. The method according to claim 2, wherein the performing of the road perception further comprises identifying one or more lane marks within the segmented drivable area.
  4. The method according to claim 2, wherein the performing of the obstacle object detection further comprises generating an obstacle object list for the obstacle object data by fusing the obstacle object data, the obstacle object list including information regarding a type, a size, a distance, a direction, a velocity and a heading of each obstacle object.
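One possible in-memory form of the obstacle object list of claim 4 is sketched below. The dataclass fields mirror the recited attributes; the field names, units and the trivial fusion placeholder are assumptions for illustration.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ObstacleObject:
    obj_type: str                      # e.g. "car", "pedestrian", "cyclist"
    size: Tuple[float, float, float]   # length, width, height in metres
    distance: float                    # range from the ego vehicle in metres
    direction: float                   # bearing relative to the ego heading in degrees
    velocity: float                    # speed in m/s
    heading: float                     # object heading in degrees

def fuse_obstacle_data(per_sensor_detections: List[ObstacleObject]) -> List[ObstacleObject]:
    # Placeholder fusion step: a real system would associate and merge
    # detections of the same physical object seen by different sensors.
    return per_sensor_detections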
  5. The method according to any one of claims 1-4, wherein the map data comprises local map data which is extracted from a global map based on a current location of the autonomous vehicle, and intended route data which represents an intended route of the autonomous vehicle.
  6. The method according to any one of claims 2-5, wherein the generating of the top-down view fusion image further comprises:
    mapping the road data into a first top-down view;
    mapping the obstacle object data into a second top-down view; and
    fusing the first top-down view, the second top-down view and the map data to generate the top-down view fusion image.
  7. The method according to claim 6, wherein the top-down view fusion image is expressed as a function of a width of the top-down view fusion image, a height of the top-down view fusion image and a number of channels for representing at least one of the road data, the obstacle object data and the map data.
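Claims 6 and 7 express the top-down view fusion image as a width x height x channels array. A minimal NumPy sketch of one plausible channel layout follows; the resolution, channel assignment and the simple rasterization helper are assumptions, not details from the disclosure.

import numpy as np

WIDTH, HEIGHT = 200, 200   # illustrative resolution, e.g. 0.25 m per pixel
CHANNELS = 4               # 0: drivable area, 1: lane marks, 2: obstacles, 3: intended route
fusion_image = np.zeros((HEIGHT, WIDTH, CHANNELS), dtype=np.float32)

def mark_cells(channel, cells):
    # Placeholder rasterizer: set the given (x, y) pixel coordinates to 1.
    for x, y in cells:
        channel[y, x] = 1.0

# First top-down view: segmented drivable area from the road data.
mark_cells(fusion_image[..., 0], [(100, 100), (101, 100), (102, 100)])
# Second top-down view: obstacle objects from the obstacle object list.
mark_cells(fusion_image[..., 2], [(120, 95)])
# Map data: lane marks and the intended route from the local map.
mark_cells(fusion_image[..., 1], [(98, 100)])
mark_cells(fusion_image[..., 3], [(100, y) for y in range(90, 110)])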
  8. The method according to any one of claims 1-7, wherein the method further comprises outputting a control command to the autonomous driving vehicle based on the driving policy.
  9. The method according to any one of claims 2-8, wherein the road perception and the obstacle object detection are performed simultaneously.
  10. A system of data processing for an autonomous driving vehicle, the system comprising:
    a perception module configured to receive, from a plurality of sensors mounted on the autonomous driving vehicle, sensing data of a surrounding environment of the autonomous driving vehicle, process the sensing data and output environment data;
    a local map extraction module configured to extract map data for the autonomous driving vehicle;
    a top-down view fusion image generation module configured to generate a top-down view fusion image based on the environment data and the map data;
    a visual encoding module configured to encode the top-down view fusion image to output low-dimensional latent states;
    a concatenate module configured to concatenate the low-dimensional latent states with vehicle information to output a state vector; and
    a reinforcement learning agent configured to perform a training based on the state vector and determine a driving policy.
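As a rough sketch of how the visual encoding module and the concatenate module of claim 10 might be realized, the following uses PyTorch; the network architecture, latent size and the choice of vehicle information (speed and acceleration) are assumptions for illustration only.

import torch
import torch.nn as nn

class VisualEncoder(nn.Module):
    # Encodes a channels x height x width top-down fusion image into
    # low-dimensional latent states.
    def __init__(self, in_channels=4, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, fusion_image):
        features = self.conv(fusion_image).flatten(1)
        return self.fc(features)

encoder = VisualEncoder()
fusion_image = torch.zeros(1, 4, 200, 200)   # batch of one fusion image
vehicle_info = torch.tensor([[5.0, 0.2]])    # e.g. speed (m/s) and acceleration (m/s^2)

latent_states = encoder(fusion_image)        # output of the visual encoding module
state_vector = torch.cat([latent_states, vehicle_info], dim=1)  # input to the reinforcement learning agent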
  11. The system according to claim 10, wherein the environment data includes at least one of road data and obstacle object data, and wherein the perception module comprises:
    a road perception sub-module configured to perform a road perception to output the road data, by identifying a segmented drivable area based on the sensing data; and
    an obstacle object detection sub-module configured to perform an obstacle object detection to output the obstacle object data, by detecting, classifying and tracking one or more obstacle objects based on the sensing data.
  12. The system according to claim 11, wherein the road perception sub-module is further configured to identify one or more lane marks within the segmented drivable area.
  13. The system according to claim 11, wherein the obstacle object detection sub-module is further configured to generate an obstacle object list for the obstacle object data, the obstacle object list including information regarding a type, a size, a distance, a direction, a velocity and a heading of each obstacle object.
  14. The system according to any one of claims 10-13, wherein the map data comprises local map data which is extracted from a global map based on a current location of the autonomous vehicle, and intended route data which represents an intended route of the autonomous vehicle.
  15. The system according to any one of claims 11-14, wherein the top-down view fusion image generation module is further configured to:
    map the road data into a first top-down view;
    map the obstacle object data into a second top-down view; and
    fuse the first top-down view, the second top-down view and the map data to output the top-down view fusion image.
  16. The system of claim 15, wherein the top-down view fusion image is expressed as a function of a width of the top-down view fusion image, a height of the top-down view fusion image and a number of channels for representing at least one of the road data, the obstacle object data and the map data.
  17. The system of any one of claims 10-16, wherein the reinforcement learning agent is further configured to output a control command to the autonomous driving vehicle based on the driving policy.
  18. The system of any one of claims 11-17, wherein the road perception and the obstacle object detection are performed simultaneously.
  19. A computer readable medium having computer-executable instructions for performing the method according to any one of claims 1-9.
PCT/CN2020/090197 2020-05-14 2020-05-14 Method and system of data processing for autonomous driving WO2021226921A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/090197 WO2021226921A1 (en) 2020-05-14 2020-05-14 Method and system of data processing for autonomous driving

Publications (1)

Publication Number Publication Date
WO2021226921A1 true WO2021226921A1 (en) 2021-11-18

Family

ID=78526202

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/090197 WO2021226921A1 (en) 2020-05-14 2020-05-14 Method and system of data processing for autonomous driving

Country Status (1)

Country Link
WO (1) WO2021226921A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018229552A2 (en) * 2017-06-14 2018-12-20 Mobileye Vision Technologies Ltd. Fusion framework and batch alignment of navigation information for autonomous navigation
US20190243371A1 (en) * 2018-02-02 2019-08-08 Nvidia Corporation Safety procedure analysis for obstacle avoidance in autonomous vehicles
US20190049957A1 (en) * 2018-03-30 2019-02-14 Intel Corporation Emotional adaptive driving policies for automated driving vehicles
US10205457B1 (en) * 2018-06-01 2019-02-12 Yekutiel Josefsberg RADAR target detection system for autonomous vehicles with ultra lowphase noise frequency synthesizer
US20200139973A1 (en) * 2018-11-01 2020-05-07 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
US20200142421A1 (en) * 2018-11-05 2020-05-07 GM Global Technology Operations LLC Method and system for end-to-end learning of control commands for autonomous vehicle

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023102962A1 (en) * 2021-12-06 2023-06-15 深圳先进技术研究院 Method for training end-to-end autonomous driving strategy
CN114030488A (en) * 2022-01-11 2022-02-11 清华大学 Method and device for realizing automatic driving decision, computer storage medium and terminal
US11673562B1 (en) 2022-01-11 2023-06-13 Tsinghua University Method, apparatus, computer storage medium and terminal for implementing autonomous driving decision-making
CN115221260A (en) * 2022-07-18 2022-10-21 小米汽车科技有限公司 Data processing method, device, vehicle and storage medium
CN115221260B (en) * 2022-07-18 2024-02-09 小米汽车科技有限公司 Data processing method, device, vehicle and storage medium
CN116880462A (en) * 2023-03-17 2023-10-13 北京百度网讯科技有限公司 Automatic driving model, training method, automatic driving method and vehicle
CN116881707A (en) * 2023-03-17 2023-10-13 北京百度网讯科技有限公司 Automatic driving model, training method, training device and vehicle
CN116385949A (en) * 2023-03-23 2023-07-04 广州里工实业有限公司 Mobile robot region detection method, system, device and medium
CN116385949B (en) * 2023-03-23 2023-09-08 广州里工实业有限公司 Mobile robot region detection method, system, device and medium
CN116453087A (en) * 2023-03-30 2023-07-18 无锡物联网创新中心有限公司 Automatic driving obstacle detection method of data closed loop
CN116453087B (en) * 2023-03-30 2023-10-20 无锡物联网创新中心有限公司 Automatic driving obstacle detection method of data closed loop

Similar Documents

Publication Publication Date Title
WO2021226921A1 (en) Method and system of data processing for autonomous driving
JP7228652B2 (en) OBJECT DETECTION DEVICE, OBJECT DETECTION METHOD AND PROGRAM
EP3751455A2 (en) Image fusion for autonomous vehicle operation
JP6833630B2 (en) Object detector, object detection method and program
US11386567B2 (en) Systems and methods for weakly supervised training of a model for monocular depth estimation
US10489686B2 (en) Object detection for an autonomous vehicle
US11482014B2 (en) 3D auto-labeling with structural and physical constraints
EP3822852B1 (en) Method, apparatus, computer storage medium and program for training a trajectory planning model
WO2022206942A1 (en) Laser radar point cloud dynamic segmentation and fusion method based on driving safety risk field
US11670087B2 (en) Training data generating method for image processing, image processing method, and devices thereof
US11280630B2 (en) Updating map data
EP3942794B1 (en) Depth-guided video inpainting for autonomous driving
CN109421730B (en) Cross traffic detection using cameras
US20230326168A1 (en) Perception system for autonomous vehicles
EP3859390A1 (en) Method and system for rendering a representation of an environment of a vehicle
Aditya et al. Collision Detection: An Improved Deep Learning Approach Using SENet and ResNext
CN112837209A (en) New method for generating image with distortion for fisheye lens
Sanberg et al. Asteroids: A stixel tracking extrapolation-based relevant obstacle impact detection system
US11544899B2 (en) System and method for generating terrain maps
EP4137845A1 (en) Methods and systems for predicting properties of a plurality of objects in a vicinity of a vehicle
US11461922B2 (en) Depth estimation in images obtained from an autonomous vehicle camera
US20230417885A1 (en) Systems and methods for detecting erroneous lidar data
US20230057509A1 (en) Vision-based machine learning model for autonomous driving with adjustable virtual camera
Kadav Advancing Winter Weather ADAS: Tire Track Identification and Road Snow Coverage Estimation Using Deep Learning and Sensor Integration
Hellinckx Lane Marking Detection Using LiDAR Sensor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20935702; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20935702; Country of ref document: EP; Kind code of ref document: A1)