WO2023089635A1 - Method and system to facilitate navigation planning for vehicles - Google Patents


Info

Publication number
WO2023089635A1
Authority
WO
WIPO (PCT)
Prior art keywords
vehicle
neural network
control points
processor
dataset
Application number
PCT/IN2022/051010
Other languages
French (fr)
Other versions
WO2023089635A8 (en)
Inventor
Unnikrishnan R. NAIR
Sarthak Sharma
Midhun S. MENON
Srikanth Vidapanakal
Original Assignee
Ola Electric Mobility Private Limited
Application filed by Ola Electric Mobility Private Limited
Publication of WO2023089635A1
Publication of WO2023089635A8


Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01C - MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 21/00 - Navigation; Navigational instruments not provided for in groups G01C 1/00-G01C 19/00
    • G01C 21/26 - Navigation; Navigational instruments not provided for in groups G01C 1/00-G01C 19/00 specially adapted for navigation in a road network
    • G01C 21/34 - Route searching; Route guidance
    • G01C 21/3407 - Route searching; Route guidance specially adapted for specific applications
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 40/00 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models
    • B60W 40/02 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models related to ambient conditions
    • B60W 40/06 - Road conditions
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 40/00 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models
    • B60W 40/02 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models related to ambient conditions
    • B60W 40/06 - Road conditions
    • B60W 40/072 - Curvature of the road
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 40/00 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models
    • B60W 40/02 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models related to ambient conditions
    • B60W 40/06 - Road conditions
    • B60W 40/076 - Slope angle of the road
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 60/00 - Drive control systems specially adapted for autonomous road vehicles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/0455 - Auto-encoder networks; Encoder-decoder networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/09 - Supervised learning
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 2420/00 - Indexing codes relating to the type of sensors based on the principle of their operation
    • B60W 2420/40 - Photo or light sensitive means, e.g. infrared sensors
    • B60W 2420/403 - Image sensing, e.g. optical camera

Definitions

  • Various embodiments of the disclosure relate generally to vehicles. More specifically, various embodiments of the disclosure relate to methods and systems to facilitate navigation planning for vehicles.
  • a vehicle requires efficient reasoning about the spatio-temporal nature of the semantics of its surrounding scene for autonomous driving.
  • Recent approaches have successfully augmented the traditional modular architecture of an autonomous driving stack comprising perception, prediction, and planning in an end-to-end trainable system.
  • Such a system calls for a shared latent encoding with interpretable intermediate trainable projected representation.
  • One such successfully deployed representation is the Bird’s-Eye View (BEV) representation of the scene in ego-frame.
  • a fundamental assumption of a BEV is coplanarity of the vehicle with other traffic participants. In other words, a BEV assumes a planar surface. This assumption is highly restrictive, as roads can have gradients, making navigation planning inefficient and incorrect.
  • FIG. 1A is a block diagram that illustrates a navigation planning system to facilitate navigation planning for vehicles, in accordance with an exemplary embodiment of the present disclosure.
  • FIG. 1B is a block diagram that illustrates the navigation planning system to facilitate navigation planning for vehicles, in accordance with another exemplary embodiment of the present disclosure.
  • FIG. 2 is a block diagram that illustrates training of a first neural network of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • FIG. 3 is a block diagram that illustrates implementation of a trained neural network of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • FIG. 4 is a diagram that illustrates a non-planar surface to be traversed by a first vehicle of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • FIG. 5 is a diagram that illustrates the trajectory to be traversed by the first vehicle of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • FIGS. 6A and 6B collectively represent a flowchart that illustrates a navigation planning method for the first vehicle of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • FIGS. 7A and 7B collectively represent a flowchart that illustrates a method for training the first neural network of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • Exemplary aspects of the disclosure provide a navigation planning method.
  • the navigation planning method includes receiving, by a processor, an output that indicates semantic information and a set of waypoints associated with a non-planar surface from a trained neural network.
  • the output is based on a first set of images of the non-planar surface to be traversed by the vehicle.
  • the method further includes determining, by the processor, a trajectory to navigate the vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output.
  • a navigation planning system for a vehicle includes a trained neural network and a processor.
  • the processor is configured to receive an output that indicates semantic information and a set of waypoints associated with a non-planar surface from the trained neural network. The output is based on a first set of images of the non-planar surface to be traversed by the vehicle.
  • the processor is further configured to determine a trajectory to navigate the vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output.
  • a navigation planning system for vehicles is disclosed.
  • the navigation planning system includes a first neural network and a processor.
  • the processor is configured to train the first neural network based on a first set of images of a non-planar surface to be traversed by a plurality of vehicles, to obtain a trained neural network.
  • the processor is further configured to receive, from a target vehicle, a second set of images of the non-planar surface to be traversed by the target vehicle. Further, the processor is configured to receive, based on the second set of images, from the trained neural network, an output that indicates semantic information and a set of waypoints associated with the non-planar surface.
  • the processor further determines a trajectory to navigate the target vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output.
  • the semantic information represents a plurality of objects present on the non-planar surface to be traversed by the vehicle.
  • the navigation planning method further includes generating, by the trained neural network, a latent representation based on a summation of feature representations of the first set of images, a first speed dataset, and a first set of positional embeddings that are associated with the vehicle.
  • the navigation planning method further includes generating, by the trained neural network, a latent embedding based on the latent representation and a first location dataset associated with the vehicle.
  • the navigation planning method further includes extracting, by the trained neural network, a first set of control points from the latent embedding and generating, by the trained neural network, the semantic information and the set of waypoints, based on the first set of control points and the first location dataset.
  • the navigation planning method further includes training a first neural network based on a second set of images of the non-planar surface to be traversed by the vehicle, a second speed dataset, a second set of positional embeddings, a second location dataset, and a second set of control points that are associated with the vehicle, to obtain the trained neural network.
  • the second set of control points is based on the non-planar surface captured by a three-dimensional sensor of a plurality of sensors of the vehicle.
  • the three-dimensional sensor is embedded in the vehicle during the training of the first neural network.
  • the training of the first neural network includes generating a latent representation based on a summation of feature representations of the second set of images, the second speed dataset, and the second set of positional embeddings.
  • the training of the first neural network further includes generating a latent embedding based on the latent representation and the second location dataset. An attention map of the latent embedding is updated iteratively.
  • the training further includes extracting a third set of control points from the latent embedding.
  • the training of the first neural network further includes determining a loss value based on a comparison of the third set of control points with the second set of control points, such that the attention map of the latent embedding is updated until the loss value is below a threshold value.
  • the training of the first neural network further includes generating the semantic information and the set of waypoints, based on the third set of control points and the second location dataset.
  • the vehicle is navigated, by a controller of the vehicle, based on the trajectory determined by the processor.
  • the trained neural network is configured to generate a latent representation based on a summation of feature representations of the first set of images, a first speed dataset, and a first set of positional embeddings that are associated with the vehicle.
  • the trained neural network is further configured to generate a latent embedding based on the latent representation and a first location dataset associated with the vehicle.
  • the trained neural network is configured to extract a first set of control points from the latent embedding and generate the semantic information and the set of waypoints, based on the first set of control points and the first location dataset.
  • the navigation planning system further includes a first neural network.
  • the processor is configured to train the first neural network based on a second set of images of the non-planar surface to be traversed by the vehicle, a second speed dataset, a second set of positional embeddings, a second location dataset, and a second set of control points that are associated with the vehicle, to obtain the trained neural network.
  • When the processor trains the first neural network, the first neural network is configured to generate a latent representation based on a summation of feature representations of the second set of images, the second speed dataset, and the second set of positional embeddings.
  • the first neural network is further configured to generate a latent embedding based on the latent representation and the second location dataset.
  • An attention map of the latent embedding is updated iteratively.
  • the first neural network is further configured to extract a third set of control points from the latent embedding.
  • the first neural network is configured to determine a loss value based on a comparison of the third set of control points with the second set of control points, such that the attention map of the latent embedding is updated until the loss value is below a threshold value.
  • the first neural network is further configured to generate the semantic information and the set of waypoints, based on the third set of control points and the second location dataset.
  • the navigation planning system includes a controller that is configured to navigate the vehicle based on the trajectory determined by the processor.
  • the first neural network is trained based on a first speed dataset, a first set of positional embeddings, a first location dataset, and a first set of control points that are associated with the plurality of vehicles.
  • the processor is configured to receive a second speed dataset, a second set of positional embeddings, and a second location dataset that are associated with the target vehicle, from the target vehicle.
  • a navigation planning system includes a processor and a trained neural network.
  • the processor determines a trajectory to be traversed by a vehicle on a non-planar surface from one waypoint to another waypoint, based on an output of the trained neural network.
  • the trained neural network outputs semantic information of the non-planar surface to be traversed by the vehicle and waypoints based on images provided by the processor as an input. The vehicle follows the determined trajectory.
  • FIG. 1A is a block diagram that illustrates a navigation planning system 100 to facilitate navigation planning for vehicles, in accordance with an exemplary embodiment of the present disclosure.
  • the navigation planning system 100 is shown to include a plurality of vehicles 102, a database server 104, and a communication network 106.
  • Examples of the communication network 106 may include, but are not limited to, a wireless fidelity (Wi-Fi) network, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, and a combination thereof.
  • Various entities in the navigation planning system 100 may be communicatively coupled to the communication network 106 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Long Term Evolution (LTE) communication protocols, or any combination thereof.
  • the plurality of vehicles 102 is shown to include a first vehicle 102a. However, in an actual implementation, the plurality of vehicles 102 may include any number of vehicles, without deviating from the scope of the disclosure. For the sake of brevity, only the first vehicle 102a is described herein. Thus, the navigation planning system 100 facilitates navigation planning for the first vehicle 102a.
  • the first vehicle 102a and the database server 104 may be communicatively coupled by way of the communication network 106.
  • the first vehicle 102a is a mode of transport that is used by a user to travel from one place to another.
  • the first vehicle 102a may be owned by the user.
  • the first vehicle 102a may be owned by a vehicle service provider for offering on-demand vehicle or ride services to a passenger (e.g., the user).
  • the first vehicle 102a may be a two-wheeled vehicle, a three-wheeled vehicle, a four-wheeled vehicle, or a vehicle with greater than four wheels.
  • the first vehicle 102a may be an electric vehicle or a hybrid vehicle.
  • the first vehicle 102a may be an autonomous vehicle (e.g., self-driving vehicle) that features partial or fully autonomous driving capabilities.
  • the first vehicle 102a may include an advanced driver assistance system (ADAS) that enables partial or fully autonomous driving of the first vehicle 102a.
  • the ADAS included in the first vehicle 102a may correspond to one of various levels of ADAS.
  • the various levels of ADAS may include Level “0”, Level “1”, Level “2”, Level “2+”, Level “3”, Level “4”, and Level “5”.
  • the various levels of ADAS are assumed to be known to those of ordinary skill in the art, and, therefore, are not explained.
  • The surface traversed by the first vehicle 102a is considered to be non-planar.
  • the first vehicle 102a may include a plurality of sensors 110, a plurality of vehicular systems 112, a first processor 114, a first memory 116, a controller 118, a first network interface 120, and a neural network 122.
  • the plurality of sensors 110, the plurality of vehicular systems 112, the first processor 114, the first memory 116, the controller 118, the first network interface 120, and the neural network 122 may communicate with each other via one or more vehicle communication protocols, such as the Controller Area Network (CAN) bus protocol, the Local Interconnect Network (LIN) bus protocol, the FlexRay protocol, or the like.
  • the plurality of sensors 110 may include various active and passive sensors that facilitate operation (e.g., autonomous driving) of the first vehicle 102a.
  • the plurality of sensors 110 may include a set of image sensors (e.g., cameras), a set of velocity or speed sensors (e.g., inertial measurement unit (IMU) sensors), and a set of location detection sensors (e.g., global positioning system sensors).
  • the set of image sensors may capture surrounding environment of the first vehicle 102a.
  • the set of image sensors includes six image sensors. Each image sensor of the set of image sensors may be embedded on a different side of the first vehicle 102a.
  • Images captured by the set of image sensors may be associated with one of a red, green, and blue (RGB) color model, a black and white color model, a greyscale color model, a hue, saturation, and value (HSV) color model, a cyan, magenta, yellow, and black (CMYK) color model, a hue, saturation, and brightness (HSB) color model, a hue, saturation, and lightness (HSL) color model, a hue, chroma, and value (HCV) color model, or the like.
  • the set of velocity or speed sensors may be configured to obtain speed or velocity of the first vehicle 102a.
  • the plurality of sensors 110 may further include radar sensors, three-dimensional sensors (e.g., light detection and ranging (LiDAR) sensors), sensors associated with global navigation satellite systems (GNSS), ultrasonic sensors, audio sensors (e.g., microphones), or the like. It will be apparent to those of skill in the art that the plurality of sensors 110 is not limited to sensors that facilitate autonomous driving.
  • the plurality of sensors 110 may include any sensor that facilitates operation of the first vehicle 102a without deviating from the scope of the disclosure.
  • the plurality of sensors 110 may include a set of pressure sensors for measuring tire pressure, a set of limit sensors for operation of a sunroof, a set of temperature sensors for measuring a cabin temperature, or the like.
  • the plurality of vehicular systems 112 may include various on-board vehicular systems that facilitate operation (e.g., autonomous driving) of the first vehicle 102a.
  • the plurality of vehicular systems 112 may include, but are not limited to, an object detection and recognition system, a navigation system, a vehicle driving system, a power management system, or the like.
  • the plurality of vehicular systems 112 may further include a battery management system, a traction control system, an anti-lock braking system, an air-conditioning system, or the like.
  • the plurality of vehicular systems 112 may include other systems required for operation of the first vehicle 102a, without deviating from the scope of the disclosure.
  • the first processor 114 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to execute instructions stored in the first memory 116 to perform various operations to facilitate autonomous driving by the first vehicle 102a.
  • the first processor 114 may be configured to perform various operations to facilitate navigation planning for the first vehicle 102a.
  • the first processor 114 may be configured to train the neural network 122 to facilitate navigation planning for the first vehicle 102a.
  • the first processor 114 may be configured to execute the trained neural network 122 to facilitate navigation planning for the first vehicle 102a. Execution of the trained neural network 122 enables the first processor 114 to determine a topology or a road profile (e.g., the manifold representation shown in FIG. 5) of the non-planar surface to be traversed by the first vehicle 102a.
  • the road profile of the non-planar surface may be indicative of various attributes of the non-planar surface such as, but not limited to, gradient of the non-planar surface, undulations (e.g., potholes) present in the non-planar surface, or the like.
  • the execution of the trained neural network 122 further enables the first processor 114 to determine a trajectory to navigate the first vehicle 102a on the non-planar surface.
  • the first processor 114 may be implemented by one or more processors, such as, but not limited to, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, and a field-programmable gate array (FPGA) processor.
  • the first processor 114 may also correspond to central processing units (CPUs), graphics processing units (GPUs), network processing units (NPUs), digital signal processors (DSPs), or the like. It will be apparent to a person of ordinary skill in the art that the first processor 114 may be compatible with multiple operating systems.
  • the first memory 116 may include suitable logic, circuitry, and interfaces that may be configured to store, therein, sensor data from the plurality of sensors 110.
  • the first memory 116 may further be configured to store one or more instructions, which when executed by the first processor 114 causes the first processor 114 to perform various operations for facilitating navigation planning of the first vehicle 102a.
  • Examples of the first memory 116 may include, but are not limited to, a random-access memory (RAM), a read only memory (ROM), a removable storage drive, a hard disk drive (HDD), a flash memory, a solid-state memory, or the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the first memory 116 in the first vehicle 102a, as described herein. In another embodiment, the first memory 116 may be realized in the form of a database or a cloud storage working in conjunction with the first processor 114, without deviating from the scope of the disclosure.
  • the controller 118 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to navigate the first vehicle 102a based on the trajectory determined by the first processor 114.
  • In a non-limiting example, a vehicle control unit (VCU) of the first vehicle 102a is the controller 118. In another non-limiting example, an electronic control unit (ECU) of the first vehicle 102a is the controller 118.
  • the first network interface 120 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to enable the first vehicle 102a (e.g., the first processor 114) to communicate with the database server 104.
  • the first network interface 120 may be implemented as a hardware, software, firmware, or a combination thereof. Examples of the first network interface 120 may include a network interface card, a physical port, a network interface device, an antenna, a radio frequency transceiver, a wireless transceiver, an Ethernet port, a universal serial bus (USB) port, or the like.
  • the neural network 122 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to output semantic information of a non-planar surface to be traversed by the first vehicle 102a and a set of waypoints on the non-planar surface to be traversed by the first vehicle 102a, based on inputs provided by the first processor 114.
  • the neural network 122 is referred to as “the first neural network 122” before and during the training of the neural network 122.
  • Once trained, the neural network 122 is referred to as “the trained neural network 122”.
  • the first neural network 122 may be trained by the first processor 114 based on the sensor inputs from the plurality of sensors 110.
  • the first neural network 122 may be implemented by one or more processors, such as, but not limited to, an ASIC processor, RISC processor, a CISC processor, and a FPGA processor.
  • the one or more processors may also correspond to CPUs, GPUs, NPUs, DSPs, or the like.
  • the semantic information may represent a plurality of objects present on the non-planar surface to be traversed by the first vehicle 102a.
  • the plurality of objects may include a vehicle, a pedestrian, a tree, an animal, a traffic signal, or the like.
  • the semantic information may include, but is not limited to, details of portions of the road currently occupied by the first vehicle 102a, details of portions of the road occupied by other vehicles or traffic participants, details of portions of the road occupied by objects or obstacles, relative positions of occupied portions of the road with respect to a position of the first vehicle 102a, or the like.
  • the set of waypoints may include a first waypoint, a second waypoint, a third waypoint, and so on until an nth waypoint.
  • Each waypoint of the set of waypoints represents a point on the non-planar surface.
  • the first waypoint of the set of waypoints is referred to as a source waypoint and the nth waypoint of the set of waypoints is referred to as a target waypoint.
  • the first processor 114 determines the trajectory to be followed by the first vehicle 102a on the non-planar surface based on a sequence of waypoints of the set of waypoints and the semantic information.
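  • As a rough illustration of this step, the sketch below densifies an ordered sequence of 3D waypoints into a trajectory by simple linear interpolation. It is only a minimal example: the function name, the 0.5-metre sampling step, and the sample waypoints are assumptions for illustration and are not taken from the disclosure.

```python
import numpy as np

def interpolate_trajectory(waypoints, step=0.5):
    """Densify an ordered sequence of 3D waypoints into a trajectory.

    waypoints: (N, 3) array of points on the non-planar surface,
               ordered from the source waypoint to the target waypoint.
    step:      assumed spacing (in metres) between trajectory samples.
    """
    trajectory = []
    for p0, p1 in zip(waypoints[:-1], waypoints[1:]):
        seg_len = np.linalg.norm(p1 - p0)
        n = max(int(seg_len / step), 1)
        for t in np.linspace(0.0, 1.0, n, endpoint=False):
            trajectory.append((1.0 - t) * p0 + t * p1)
    trajectory.append(waypoints[-1])
    return np.asarray(trajectory)

# Hypothetical usage with waypoints assumed to come from the trained network.
waypoints = np.array([[0.0, 0.0, 0.0],
                      [4.0, 1.0, 0.2],
                      [8.0, 3.0, 0.5]])
print(interpolate_trajectory(waypoints).shape)
```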
  • the first neural network 122 is trained by the first processor 114 to obtain a trained first neural network 122.
  • the trained first neural network 122 is hereinafter referred to as “the trained neural network 122”.
  • the trained neural network 122 may be configured to output the semantic information of the non-planar surface to be traversed by the first vehicle 102a and the set of waypoints on the non-planar surface based on inputs provided by the first processor 114. Execution of the trained neural network 122 by the first processor 114 is described in conjunction with FIG. 3.
  • the database server 104 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry.
  • the database server 104 may be configured to store sensor data associated with the plurality of vehicles 102.
  • the database server 104 may perform one or more database operations, such as receiving, storing, processing, and transmitting queries, data, information, messages, or content associated with the navigation planning to be provided to the first vehicle 102a.
  • the database server 104 may be configured to provide sensor data to the first processor 114 for training the first neural network 122.
  • the database server 104 may be implemented as a cloud-based server, a local database, a distributed database, a database management system (DBMS), or the like.
  • In the embodiment of FIG. 1A, the navigation planning system 100 includes the database server 104 and the communication network 106.
  • In another embodiment, the navigation planning system 100 may be implemented inside the first vehicle 102a and thus excludes the database server 104 and the communication network 106, and includes the plurality of sensors 110, the first processor 114, the first memory 116, the controller 118, and the neural network 122.
  • FIG. 1B is a block diagram that illustrates the navigation planning system 100 to facilitate navigation planning for vehicles, in accordance with another exemplary embodiment of the present disclosure.
  • the navigation planning system 100 is shown to include the plurality of vehicles 102, the database server 104, the communication network 106, and an application server 124.
  • the plurality of vehicles 102 is shown to include the first vehicle 102a.
  • the plurality of vehicles 102 may include any number of vehicles, without deviating from the scope of the disclosure.
  • For the sake of brevity, only the first vehicle 102a is described herein.
  • the first vehicle 102a is shown to include the plurality of sensors 110, the plurality of vehicular systems 112, the first processor 114, the first memory 116, the controller 118, and the first network interface 120.
  • The surface traversed by the first vehicle 102a is considered to be non-planar.
  • the first processor 114 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to execute instructions stored in the first memory 116 to perform various operations to facilitate autonomous driving by the first vehicle 102a. In an embodiment, the first processor 114 may be configured to perform various operations to facilitate navigation planning for the first vehicle 102a. The first processor 114 may be implemented as described in conjunction with FIG. 1A.
  • the first memory 116 may include suitable logic, circuitry, and interfaces that may be configured to store, therein, sensor data from the plurality of sensors 110.
  • the first memory 116 may further be configured to store one or more instructions, which when executed by the first processor 114 causes the first processor 114 to perform various operations for facilitating navigation planning of the first vehicle 102a. Examples of the first memory 116 are similar as mentioned in conjunction with FIG. 1A.
  • the controller 118 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to navigate the first vehicle 102a.
  • a vehicle control unit (VCU) of the first vehicle 102a is the controller 118.
  • an electronic control unit (ECU) of the first vehicle 102a is the controller 118.
  • the first network interface 120 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to enable the first vehicle 102a (e.g., the first processor 114) to communicate with the application server 124.
  • the first network interface 120 may be implemented as described in conjunction with FIG. 1A.
  • the application server 124 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to, facilitate navigation planning for the plurality of vehicles 102.
  • the application server 124 is configured to facilitate navigation planning for the first vehicle 102a. Examples of the application server 124 may include, but are not limited to, a personal computer, a laptop, a mini-computer, a mainframe computer, a cloud-based server, a network of computer systems, or a non-transient and tangible machine executing a machine-readable code.
  • the application server 124 includes a second processor 126, a second memory 128, the neural network 122, and a second network interface 130.
  • the second processor 126 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to execute instructions stored in the second memory 128 to perform various operations to facilitate training of the neural network 122, determination of the trajectory to be traversed by the first vehicle 102a, or the like.
  • the second processor 126 may be implemented by one or more processors, such as, but not limited to, an ASIC processor, RISC processor, a CISC processor, and an FPGA processor.
  • the one or more processors may further refer to CPUs, GPUs, NPUs, DSPs, or the like. It will be apparent to a person of ordinary skill in the art that the second processor 126 may be compatible with multiple operating systems.
  • the second memory 128 may include suitable logic, circuitry, and interfaces that may be configured to store, therein, training datasets for training the neural network 122.
  • the second memory 128 may further store one or more instructions, which when executed by the second processor 126 causes the second processor 126 to perform various operations for facilitating training of the neural network 122.
  • Examples of the second memory 128 may include, but are not limited to, a RAM, a ROM, a removable storage drive, an HDD, a flash memory, a solid-state memory, or the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the second memory 128 in the application server 124, as described herein. In another embodiment, the second memory 128 may be realized in the form of a database or a cloud storage working in conjunction with the second processor 126, without deviating from the scope of the disclosure.
  • the neural network 122 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to output semantic information of non-planar surfaces to be traversed by the plurality of vehicles 102 and sets of waypoints on the non-planar surfaces to be traversed by the plurality of vehicles 102, based on inputs provided by the plurality of vehicles 102.
  • the neural network 122 may be configured to output the semantic information of the non-planar surface to be traversed by the first vehicle 102a and the set of waypoints on the non-planar surface to be traversed by the first vehicle 102a based on inputs provided by the second processor 126.
  • the neural network 122 is referred to as “the first neural network 122” before and during the training of the neural network 122. Further, once the neural network 122 is trained, the neural network 122 is referred to as “the trained neural network 122”. The first neural network 122 may be trained by the second processor 126 based on the sensor inputs from the plurality of sensors 110. Training of the first neural network 122 is further described in conjunction with FIG. 3. The first neural network 122 may be implemented as described in conjunction with FIG. 1A.
  • the semantic information may represent a plurality of objects present on the non-planar surface to be traversed by the first vehicle 102a.
  • the plurality of objects may include a vehicle, a pedestrian, a tree, an animal, a traffic signal, or the like.
  • the semantic information may include, but is not limited to, details of portions of the road currently occupied by the first vehicle 102a, details of portions of the road occupied by other vehicles or traffic participants, details of portions of the road occupied by objects or obstacles, relative positions of occupied portions of the road with respect to a position of the first vehicle 102a, or the like.
  • the set of waypoints may include a first waypoint, a second waypoint, a third waypoint, and so on until an nth waypoint.
  • Each waypoint of the set of waypoints represents a point on the non-planar surface.
  • the first waypoint of the set of waypoints is referred to as a source waypoint and the nth waypoint of the set of waypoints is referred to as a target waypoint.
  • the second processor 126 determines the trajectory to be followed by the first vehicle 102a on the non-planar surface based on a sequence of waypoints of the set of waypoints and the semantic information.
  • the first neural network 122 is trained by the second processor 126 to obtain the trained first neural network 122.
  • the trained first neural network 122 is referred to as “the trained neural network 122”.
  • the trained neural network 122 may be configured to output the semantic information of the non-planar surface to be traversed by the first vehicle 102a and the set of waypoints on the non- planar surface based on inputs provided by the second processor 126.
  • the second processor 126 may be configured to determine the trajectory to be traversed by the first vehicle 102a and provide the trajectory to the first vehicle 102a.
  • the controller 118 is configured to navigate the first vehicle 102a based on the trajectory determined by the second processor 126. Execution of the trained neural network 122 by the second processor 126 is described in conjunction with FIG. 3.
  • first neural network 122 is trained at the application server 124, and the first processor 114 runs or executes a local version of the trained neural network 122.
  • the application server 124 provides the trained neural network 122 to the first vehicle 102a.
  • the first processor 114 executes the trained neural network 122 to determine the trajectory to be traversed by the first vehicle 102a.
  • the controller 118 navigates the first vehicle 102a based on the trajectory determined by the first processor 114.
  • the second network interface 130 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to enable the application server 124 to communicate with the first vehicle 102a.
  • the second network interface 130 may be implemented as a hardware, software, firmware, or a combination thereof. Examples of the second network interface 130 may include a network interface card, a physical port, a network interface device, an antenna, a radio frequency transceiver, a wireless transceiver, an Ethernet port, a USB port, or the like.
  • the second processor 126 may be configured to train the first neural network 122 based on sensor inputs received from the plurality of vehicles 102 to facilitate navigation planning for the plurality of vehicles 102.
  • the second processor 126 trains the first neural network 122 to obtain the trained neural network 122 to facilitate navigation planning for the plurality of vehicles 102.
  • the database server 104 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry.
  • the database server 104 may be configured to store sensor data associated with the plurality of vehicles 102.
  • the database server 104 may perform one or more database operations, such as receiving, storing, processing, and transmitting queries, data, information, messages, or content associated with the navigation planning to be provided to each vehicle.
  • the database server 104 may be configured to receive data from each vehicle or user devices associated with each vehicle of the plurality of vehicles 102.
  • the database server 104 may be implemented as described in conjunction with FIG. 1A.
  • FIG. 2 is a block diagram 200 that illustrates training of the first neural network 122 of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • the first neural network 122 may include an encoder section 202, an attention field section 204, and a decoder section 206.
  • the first neural network 122 is present in the first vehicle 102a and is trained by the first processor 114.
  • the first neural network 122 is present in the application server 124 and is trained by the second processor 126.
  • the first processor 114 is configured to receive a first sensor dataset from the plurality of sensors 110.
  • the first sensor dataset may include a first speed dataset 208 obtained from the set of velocity or speed sensors, a first set of images 210 obtained by the set of image sensors, and a first location dataset 214 obtained from the set of location detection sensors.
  • a three-dimensional sensor (for example LiDAR) is embedded in the first vehicle 102a only during the training of the first neural network 122.
  • the three-dimensional sensor is configured to obtain three-dimensional (3D) datapoints of the surroundings of the first vehicle 102a.
  • the 3D datapoints are fitted to a surface, and a first set of control points 216 is extracted from the surface by the first processor 114.
  • the 3D datapoints are fitted to a Bezier surface.
  • the first set of control points 216 may be indicative of a lattice structure of the non-planar surface to be traversed by the first vehicle 102a.
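  • The disclosure does not spell out how the control points are obtained from the fitted surface; the sketch below shows one conventional approach, assuming each LiDAR point has already been assigned (u, v) parameters in [0, 1] (for example by normalising its ground-plane coordinates) so that an E x F grid of Bezier control points can be recovered by linear least squares. All names and sizes here are illustrative assumptions.

```python
import numpy as np
from math import comb

def bernstein(i, n, t):
    """Bernstein basis polynomial B_i^n(t)."""
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))

def fit_bezier_surface(points, uv, E=4, F=4):
    """Least-squares fit of an E x F grid of Bezier control points.

    points: (N, 3) 3D datapoints (e.g., from the three-dimensional sensor).
    uv:     (N, 2) assumed (u, v) parameters in [0, 1] for each point.
    Returns control points of shape (E, F, 3).
    """
    N = points.shape[0]
    # Design matrix: one row per datapoint, one column per control point.
    A = np.zeros((N, E * F))
    for k, (u, v) in enumerate(uv):
        for e in range(E):
            for f in range(F):
                A[k, e * F + f] = bernstein(e, E - 1, u) * bernstein(f, F - 1, v)
    ctrl, *_ = np.linalg.lstsq(A, points, rcond=None)
    return ctrl.reshape(E, F, 3)
```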
  • the encoder section 202 may include a set of networks, a summation function, and a transformer (not shown).
  • the first processor 114 is configured to provide the first set of images 210 to the set of networks of the encoder section 202.
  • the set of networks is configured to extract feature representation of each image of the first set of images 210.
  • the set of networks corresponds to ResNet.
  • the feature representation of each image is representative of various features in a corresponding image.
  • the feature representation of each image may be indicative of one or more agents or traffic participants (e.g., other vehicles, humans, obstacles, or the like) present in a corresponding image.
  • feature representation may include or may be indicative of distance-related information associated with each of the one or more traffic participants.
  • the feature representation of an image may indicate that a human is “5” meters behind the first vehicle 102a and is walking towards the first vehicle 102a.
  • the feature representation of the image may further indicate that a motorist on a motorbike is “12” meters ahead of the first vehicle 102a and is traveling towards the first vehicle 102a.
  • the first processor 114 provides the first speed dataset 208 and a first set of positional embeddings 212 as an input to the encoder section 202.
  • the first set of positional embeddings 212 is initialized to zero.
  • the first set of positional embeddings 212 is learnt during the training of the first neural network 122.
  • the first speed dataset 208, the feature representations of the first set of images 210, and the first set of positional embeddings 212 may correspond to an n-dimensional space.
  • the summation function is configured to generate a summation of the first speed dataset 208, the feature representations of the first set of images 210, and the first set of positional embeddings 212 and provide the summation to the transformer.
  • the transformer integrates features that correspond to speed of the first vehicle 102a, a position of the first vehicle 102a, and images of the surroundings of the first vehicle 102a.
  • the integration enables interactions over large spatial regions and across the first speed dataset 208, the first set of images 210, and the first set of positional embeddings 212.
  • An output of the transformer is a latent representation that is represented as “Z”, where “Z” ∈ R^((S·T·P)×C), “S” is the number of image sensors, “T” corresponds to time, and “P” and “C” correspond to the number of spatial features and the feature dimensionality, respectively.
  • An operation of the encoder section 202 is represented by equation (1) as shown below:
  • Encoder E: R^(S×T×W×H×3) × R → R^((S·T·P)×C) ... (1)
  • where “W” and “H” correspond to the width and height of each image of the first set of images 210, respectively.
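  • A minimal PyTorch sketch of the encoder section described above is given below: per-camera ResNet features, a speed embedding, and learned positional embeddings (initialised to zero) are summed and passed through a transformer to produce the latent representation Z. The backbone choice, feature dimensions, layer counts, and the assumed 256x256 input size are illustrative assumptions, not the disclosed configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class Encoder(nn.Module):
    """Sketch of the encoder section producing Z of shape (B, S*T*P, C)."""

    def __init__(self, n_cameras=6, n_steps=1, n_patches=64, feat_dim=256):
        super().__init__()
        backbone = resnet18()
        # Keep only the convolutional trunk (drop average pooling and fc head).
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(512, feat_dim, kernel_size=1)
        self.speed_embed = nn.Linear(1, feat_dim)
        # Learned positional embeddings, initialised to zero as in the description.
        self.pos_embed = nn.Parameter(torch.zeros(n_cameras * n_steps * n_patches, feat_dim))
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, images, speed):
        # images: (B, S*T, 3, 256, 256); speed: (B, 1)
        b, st, c, h, w = images.shape
        feats = self.proj(self.backbone(images.flatten(0, 1)))  # (B*S*T, C, 8, 8)
        feats = feats.flatten(2).transpose(1, 2)                # (B*S*T, P, C)
        feats = feats.reshape(b, st * feats.shape[1], -1)       # (B, S*T*P, C)
        tokens = feats + self.speed_embed(speed).unsqueeze(1) + self.pos_embed
        return self.transformer(tokens)                         # latent representation Z

# Hypothetical usage with six cameras and 256x256 RGB images.
enc = Encoder()
z = enc(torch.randn(2, 6, 3, 256, 256), torch.randn(2, 1))
print(z.shape)  # torch.Size([2, 384, 256])
```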
  • the latent representation is provided as input to the attention field section 204.
  • the first processor 114 provides the first location dataset 214 as another input to the attention field section 204.
  • the query point “q” corresponds to the source location (e.g., “x1”, “y1”, “z1” in the vehicle coordinate space) and the target location (e.g., “x2”, “y2”, “z2” in the vehicle coordinate space).
  • the query point “q” is passed through an attention vector in the attention field section 204 to obtain an attention map.
  • the attention map is multiplied with the latent representation to obtain a latent embedding.
  • the attention map of the attention field section 204 is updated iteratively. For a first iteration, “Z0” is initialized with the mean of “Z”, indicating that uniform attention is used to start. Weights of the attention field section 204 are shared across the “N” iterations.
  • An operation of the attention field section 204 is represented by equation (2) as shown below:
  • a common attention field is required to capture a correlation between presence of the traffic participants on the road and a road profile of portion of the road on which each traffic participant is present.
  • An output of the attention field section 204 is provided to the decoder section 206 as input.
  • the latent embedding, represented as “Zi” for the i-th iteration, is considered the output of the attention field section 204.
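  • The following sketch illustrates one plausible reading of the attention field section: the query point attends over the latent representation, the resulting attention map is multiplied with the latent representation to give the latent embedding, and the update is repeated for N iterations with shared weights, starting from the mean of Z. Layer sizes and the exact update rule are assumptions for illustration, since equation (2) is not reproduced above.

```python
import torch
import torch.nn as nn

class AttentionField(nn.Module):
    """Sketch of the attention field section (iterative, shared weights)."""

    def __init__(self, feat_dim=256, n_iters=3):
        super().__init__()
        self.n_iters = n_iters
        # Query point (x1, y1, z1, x2, y2, z2) concatenated with a pooled state.
        self.query_proj = nn.Linear(6 + feat_dim, feat_dim)
        self.key_proj = nn.Linear(feat_dim, feat_dim)

    def forward(self, z, q):
        # z: (B, S*T*P, C) latent representation, q: (B, 6) source/target point.
        state = z.mean(dim=1)                          # Z_0: uniform attention to start
        for _ in range(self.n_iters):                  # shared weights across iterations
            query = self.query_proj(torch.cat([q, state], dim=-1))        # (B, C)
            scores = torch.einsum("bc,bpc->bp", query, self.key_proj(z))  # (B, P)
            attn = torch.softmax(scores, dim=-1).unsqueeze(-1)            # attention map
            z_i = attn * z                             # latent embedding Z_i
            state = z_i.sum(dim=1)                     # pooled state for next iteration
        return z_i
```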
  • the decoder section 206 is configured to generate a grid of E×F control points that is referred to as a second set of control points.
  • the control points “{P^c_ef}”, where “e” ∈ “1...E” and “f” ∈ “1...F”, that govern a manifold representation (shown in FIG. 5) of the road and a surrounding environment of the first vehicle 102a, are extracted from a multi-layer perceptron (MLP).
  • Each control point “P^c_ef” ∈ R^3 is extracted based on the output of the attention field section 204.
  • the second set of control points may be indicative of the lattice structure of the non-planar surface to be traversed by the first vehicle 102a.
  • the second set of control points indicate the road profile or the topology (e.g., elevation, potholes, or the like) of the road.
  • Control points are well known to those of skill in the art and therefore are not explained.
  • Each control point of the second set of control points may be indicative of a point on the non-planar surface to be traversed by the first vehicle 102a (e.g., a point on the topology of the road).
  • Each control point of the second set of control points corresponds to a point in a first coordinate space that is different from the “XY” coordinate space.
  • the first coordinate space may be a two-dimensional coordinate space with a first coordinate “u” and a second coordinate “v”, such that 0 ⁇ u ⁇ 1 and 0 ⁇ v ⁇ 1.
  • the first processor 114 provides the first set of control points 216 as an input to the decoder section 206. Further, in the decoder section 206, the first processor 114 determines a first loss value based on a comparison of the second set of control points with the first set of control points 216.
  • the attention map of the attention field section 204 is updated iteratively until the first loss value is less than a threshold value. In a non-limiting example, the threshold value is zero.
  • the first processor 114 further provides the query point q of the first location dataset 214 as another input to the decoder section 206.
  • Here, “q_p = (x1, y1, z1)” is a point in the vehicle coordinate space and “{P^c_ef}” ∈ R^((E·F)×3) is the grid of control points.
  • a mapping of the first coordinate space to the vehicle coordinate space is governed by equation (3) as shown below:
  • q_p(u, v) = Σ_(e=1..E) Σ_(f=1..F) P^c_ef · B_e^E(u) · B_f^F(v) ... (3)
  • where “q_p” ∈ R^3, and “B_e^E(u)” and “B_f^F(v)” are Bernstein basis functions.
  • any smooth basis function may be utilized instead of the Bernstein basis function in equation (3).
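  • Equation (3) is a standard Bezier-surface evaluation; the sketch below evaluates it for a given (u, v) and an E x F grid of control points using Bernstein basis polynomials. The helper names are assumptions for illustration.

```python
import numpy as np
from math import comb

def bernstein(i, n, t):
    """Bernstein basis polynomial B_i^n(t)."""
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))

def surface_point(ctrl, u, v):
    """Evaluate equation (3): map (u, v) in the first coordinate space to a
    point q_p in the vehicle coordinate space using the control point grid.

    ctrl: (E, F, 3) control points P^c_ef.
    """
    E, F, _ = ctrl.shape
    q_p = np.zeros(3)
    for e in range(E):
        for f in range(F):
            q_p += ctrl[e, f] * bernstein(e, E - 1, u) * bernstein(f, F - 1, v)
    return q_p
```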
  • a deep declarative network (DDN) layer of the decoder section 206 obtains the reverse mapping “q_p ∈ R^3 → (u, v) ∈ R^2” by minimizing equation (4).
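  • The reverse mapping can be illustrated by directly minimising the squared distance between q_p and the surface point over (u, v); the sketch below does this by brute-force grid search purely for illustration, whereas the disclosure performs the minimisation with a deep declarative network (DDN) layer. The grid resolution and helper names are assumptions.

```python
import numpy as np
from math import comb

def _bernstein(i, n, t):
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))

def _surface(ctrl, u, v):
    E, F, _ = ctrl.shape
    return sum(ctrl[e, f] * _bernstein(e, E - 1, u) * _bernstein(f, F - 1, v)
               for e in range(E) for f in range(F))

def reverse_map(ctrl, q_p, resolution=101):
    """Approximate the reverse mapping q_p -> (u, v) by minimising the squared
    distance between q_p and the surface point over a dense (u, v) grid."""
    us = np.linspace(0.0, 1.0, resolution)
    best, best_uv = np.inf, (0.0, 0.0)
    for u in us:
        for v in us:
            d = np.sum((_surface(ctrl, u, v) - q_p) ** 2)
            if d < best:
                best, best_uv = d, (u, v)
    return best_uv
```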
  • the decoder section 206 remaps the first coordinate space to an isometric arc-length space “(S_u, S_v)” ∈ R^2 through the MLP to prevent distortion of metric properties. Further, the decoder section 206 predicts or determines the semantic information “S_i” ∈ R^M, where “M” corresponds to a number of semantic classes. Examples of a semantic class may include, but are not limited to, a vehicle class, a pedestrian class, a road class, or the like. The decoder section 206 further predicts or determines a first waypoint of the set of waypoints “o_i” ∈ R^2 at each of the “N” iterations.
  • the first processor 114 may determine the trajectory to be followed by the first vehicle 102a to reach the target waypoint from the source waypoint. Based on the determined trajectory, the first processor 114 may communicate instructions to the controller 118 of the first vehicle 102a to navigate the first vehicle 102a in the determined trajectory. For each of the M semantic classes, K points are sampled in an edge aware manner to accurately capture the semantic information.
  • the first processor 114 may be configured to supervise the output of the first neural network 122 for each iteration to assist with convergence of optimization.
  • the semantic information 218 and the set of waypoints 220 are supervised based on a comparison with an actual semantic information and an actual set of waypoints.
  • the comparison of the semantic information 218 with the actual semantic information determines a second loss value
  • the comparison of the set of waypoints 220 with the actual set of waypoints determines a third loss value.
  • where “L1”, “L2”, and “L3” correspond to the first loss value, the second loss value, and the third loss value, respectively, and are combined with corresponding weights into the overall training objective.
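  • A minimal sketch of how the three loss values can be combined into a single training objective is shown below; the individual loss forms (L1 distance for control points, cross-entropy for semantics, mean squared error for waypoints) and the weights eta1..eta3 are assumptions for illustration, not the disclosed choices.

```python
import torch
import torch.nn.functional as F

def training_loss(pred_ctrl, gt_ctrl, pred_sem, gt_sem, pred_wp, gt_wp,
                  eta1=1.0, eta2=1.0, eta3=1.0):
    """Weighted combination of the three supervised loss values.

    pred_ctrl/gt_ctrl: predicted vs. LiDAR-derived control points.
    pred_sem/gt_sem:   predicted class scores vs. ground-truth class indices.
    pred_wp/gt_wp:     predicted vs. actual waypoints.
    """
    l1 = F.l1_loss(pred_ctrl, gt_ctrl)      # first loss value (control points)
    l2 = F.cross_entropy(pred_sem, gt_sem)  # second loss value (semantic information)
    l3 = F.mse_loss(pred_wp, gt_wp)         # third loss value (waypoints)
    return eta1 * l1 + eta2 * l2 + eta3 * l3
```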
  • When the first neural network 122 is present in the application server 124, the second processor 126 is configured to train the first neural network 122.
  • the training of the first neural network 122 by the second processor 126 is similar to the training of the first neural network 122 by the first processor 114.
  • the second processor 126 trains the first neural network 122 to facilitate navigation planning for the plurality of vehicles 102.
  • FIG. 3 is a block diagram 300 that illustrates implementation of the trained neural network 122 of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • the trained neural network 122 is shown to include the encoder section 202, the attention field section 204, and the decoder section 206.
  • FIG. 3 is explained in conjunction with FIGS. 1A and 1B.
  • a second speed dataset 302, a second set of images 304, and a second location dataset 308 are obtained by the plurality of sensors 110.
  • the first processor 114 is configured to provide the second speed dataset 302, the second set of images 304, and a second set of positional embeddings 306 as input to the encoder section 202.
  • the second set of positional embeddings 306 is learnt during the training of the first neural network 122.
  • the encoder section 202 is configured to generate the latent representation based on the received input.
  • the latent representation and the second location dataset 308 are provided as input to the attention field section 204.
  • Based on the training, the attention field section 204 generates the latent embedding based on the received input. Further, the latent embedding and the second location dataset 308 are provided as input to the decoder section 206. The decoder section 206 generates the semantic information 218 and the set of waypoints 220 based on the training and the received input.
  • the operation of the encoder section 202, the attention field section 204, and the decoder section 206 is similar to the operations as described in conjunction with FIG. 2.
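  • A hypothetical inference-time wiring of the components sketched earlier (the Encoder and AttentionField classes above, plus a placeholder decoder callable) is shown below to make the data flow of FIG. 3 concrete; it is not the disclosed implementation.

```python
import torch

def plan(encoder, attention_field, decoder, images, speed, query_point):
    """Sketch of the inference path: images and speed produce the latent
    representation, the query point produces the latent embedding, and the
    decoder returns semantic information and waypoints."""
    with torch.no_grad():
        z = encoder(images, speed)                        # latent representation
        z_i = attention_field(z, query_point)             # latent embedding
        semantics, waypoints = decoder(z_i, query_point)  # outputs used for planning
    return semantics, waypoints
```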
  • the first processor 114 may be configured to execute the trained neural network 122.
  • the first processor 114 determines the trajectory to be followed by the first vehicle 102a on the non-planar surface based on the semantic information 218 and the set of waypoints 220.
  • the second processor 126 may be configured to execute the trained neural network 122.
  • the second processor 126 determines the trajectory to be followed by the first vehicle 102a on the non-planar surface based on the semantic information 218 and the set of waypoints 220.
  • the controller 118 navigates the first vehicle 102a based on the determined trajectory on the non-planar surface.
  • the first neural network 122 may be trained by the second processor 126 in the application server 124. Further, the trained neural network 122 may be communicated to the first vehicle 102a, and the first processor 114 may be configured to execute (implement) the trained neural network 122 to facilitate navigation planning for the first vehicle 102a.
  • the second processor 126 may be configured to determine the trajectory to be traversed by the first vehicle 102a based on the second speed dataset 302, the second set of images 304, the second set of positional embeddings 306, and the second location dataset 308.
  • the second speed dataset 302, the second set of images 304, the second set of positional embeddings 306, and the second location dataset 308 may be received by the second processor 126 from the plurality of sensors 110.
  • FIG. 4 is a diagram 400 that illustrates the non-planar surface to be traversed by the first vehicle 102a of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • FIG. 4 includes the source waypoint, hereinafter referred to as “the source waypoint 404”, which represents a position of the first vehicle 102a at time-instance “T”.
  • FIG. 4 further includes the target waypoint, hereinafter referred to as “the target waypoint 406”, which represents a final point to be reached by the first vehicle 102a.
  • FIG. 5 is a diagram 500 that illustrates the trajectory to be traversed by the first vehicle 102a of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • the trajectory to be traversed by the first vehicle 102a is hereinafter referred to as “the trajectory 502” as illustrated in FIG. 5.
  • FIG. 5 is further shown to include the manifold representation, hereinafter referred to as “the manifold representation 504”.
  • the manifold representation 504 is an intermediate output of the first neural network 122 that is obtained during the training of the first neural network 122.
  • the manifold representation 504 illustrates the non-planar surface 402 with gradients. In some embodiments, the manifold representation 504 may be utilized for navigation planning for the first vehicle 102a.
  • the trajectory 502 is determined by one of the first processor 114 and the second processor 126 based on the semantic information 218 and the set of waypoints 220.
  • the controller 118 navigates the first vehicle 102a along the trajectory 502 to reach the target waypoint 406 from the source waypoint 404.
  • FIGS. 6A and 6B collectively represent a flowchart 600 that illustrates a navigation planning method for the first vehicle 102a of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • the flowchart 600 is described in conjunction with FIGS. 1-3.
  • the latent representation is generated based on the summation of feature representations of the second set of images 304 of the non-planar surface 402 to be traversed by the first vehicle 102a, the second speed dataset 302, and the second set of positional embeddings 306 that are associated with the first vehicle 102a, by the trained neural network 122.
  • the first processor 114 is configured to receive the second set of images 304 and the second speed dataset 302 from the plurality of sensors 110.
  • the encoder section 202 of the trained neural network 122 generates the latent representation.
  • the latent embedding is generated based on the latent representation and the second location dataset 308 that are associated with the first vehicle 102a, by the trained neural network 122.
  • the second location dataset 308 is received by the first processor 114 from the plurality of sensors 110.
  • the attention field section 204 of the trained neural network 122 generates the latent embedding.
  • the second set of control points are extracted from the latent embedding by the trained neural network 122.
  • the second set of control points is extracted from the MLP.
  • the semantic information 218 and the set of waypoints 220 associated with non-planar surface 402 are generated, based on the second set of control points and the second location dataset 308, by the trained neural network 122.
  • the decoder section 206 generates the semantic information 218 and the set of waypoints 220.
  • the semantic information 218 and the set of waypoints 220 associated with the non-planar surface 402 are received by the first processor 114 from the trained neural network 122.
  • the trajectory 502 to navigate the first vehicle 102a from the source waypoint 404 to the target waypoint 406 on the non-planar surface 402 is determined by the first processor 114, based on the semantic information 218 and the set of waypoints 220.
  • FIGS. 7A and 7B collectively represent a flowchart 700 that illustrates a method for training the first neural network 122 of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
  • the flowchart 700 is described in conjunction with FIGS. 1-2.
  • the latent representation is generated based on the summation of feature representations of the first set of images 210 of the non-planar surface 402 to be traversed by the first vehicle 102a, the first speed dataset 208, and the first set of positional embeddings 212 that are associated with the first vehicle 102a, by the first neural network 122.
  • the first processor 114 is configured to receive the first set of images 210 and the first speed dataset 208 from the plurality of sensors 110.
  • the encoder section 202 of the first neural network 122 generates the latent representation.
  • the latent embedding is generated based on the latent representation and the first location dataset 214 that are associated with the first vehicle 102a, by the first neural network 122.
  • the first location dataset 214 is received by the first processor 114 from the plurality of sensors 110.
  • the first processor 114 provides the first location dataset 214 to the first neural network 122.
  • the attention field section 204 of the first neural network 122 generates the latent embedding.
  • the second set of control points are extracted by the first neural network 122 from the latent embedding.
  • the second set of control points are extracted from the MLP.
  • the first loss value is determined based on the comparison of the second set of control points with the first set of control points 216, by the first neural network 122.
  • the first set of control points 216 is provided as the input by the first processor 114.
  • if the first loss value is less than the threshold value, the process proceeds to step 712; otherwise, the process proceeds to step 704.
  • the first processor 114 may be configured to determine whether the first loss value is less than the threshold value.
  • the semantic information 218 and the set of waypoints 220 associated with the non-planar surface 402 are generated by the first neural network 122, based on the second set of control points and the first location dataset 214.
  • the decoder section 206 generates the semantic information 218 and the set of waypoints 220.
  • the disclosed navigation planning method and the navigation planning system 100 for vehicles encompass numerous advantages.
  • the navigation planning system 100 enables accurate representation of road profiles and topologies, taking into account gradients and undulations (e.g., potholes) present in each road.
  • the navigation planning system 100 considers the surface to be traversed by the vehicles as the non-planar surface 402.
  • the trajectory 502 determined by the disclosed method and system is accurate.
  • Increased accuracy in road representation facilitates improvements in navigation and behavior planning by autonomous vehicles (e.g., the first vehicle 102a), resulting in an enhanced experience for users availing the autonomous vehicles.
  • the three-dimensional sensor (for example, LiDAR) is only required during the training and not during the implementation, which results in low implementation costs.
  • the disclosed system and method generate a well-resolved segmentation map without incurring high computation costs, in comparison to conventional techniques.
  • Techniques consistent with the disclosure provide, among other features, methods and systems to facilitate navigation planning for vehicles. While various exemplary embodiments of the disclosed systems and methods have been described above, it should be understood that they have been presented for purposes of example only, and not limitation. The description is not exhaustive and does not limit the disclosure to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practicing the disclosure, without departing from its breadth or scope.

Abstract

A navigation planning method and a navigation planning system to facilitate navigation planning for vehicles are provided. The navigation planning method includes receiving, by a processor, from a trained neural network, an output that indicates semantic information and a set of waypoints associated with a non-planar surface, based on a set of images of the non-planar surface to be traversed by a vehicle. A trajectory to navigate the vehicle from a source waypoint to a target waypoint is determined by the processor based on the semantic information and the set of waypoints. The vehicle is navigated based on the determined trajectory by a controller of the vehicle.

Description

METHOD AND SYSTEM TO FACILITATE NAVIGATION PLANNING FOR
VEHICLES
BACKGROUND
FIELD OF THE DISCLOSURE
Various embodiments of the disclosure relate generally to vehicles. More specifically, various embodiments of the disclosure relate to methods and systems to facilitate navigation planning for vehicles.
DESCRIPTION OF THE RELATED ART
A vehicle requires efficient reasoning about spatio-temporal nature of semantics of a surrounding scene of the vehicle for autonomous driving. Recent approaches have successfully augmented the traditional modular architecture of an autonomous driving stack comprising perception, prediction, and planning in an end-to-end trainable system. Such a system calls for a shared latent encoding with interpretable intermediate trainable projected representation. One such successfully deployed representation is the Bird’s-Eye View (BEV) representation of the scene in ego-frame. However, a fundamental assumption for a BEV is a coplanarity of the vehicle with other traffic participants. In other words, BEV assumes a planar surface. This assumption is highly restrictive, as roads can have gradients, making navigation planning inefficient and incorrect.
In light of the aforementioned drawbacks, there is a need for a technical solution that improves navigation planning for vehicles.
SUMMARY
Methods and systems to facilitate navigation planning for vehicles are provided substantially as shown in, and described in connection with, at least one of the figures, as set forth more completely in the claims.
These and other features and advantages of the present disclosure may be appreciated from a review of the following detailed description of the present disclosure, along with the accompanying figures in which like reference numerals refer to like parts throughout.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1A is a block diagram that illustrates a navigation planning system to facilitate navigation planning for vehicles, in accordance with an exemplary embodiment of the present disclosure;
FIG. 1B is a block diagram that illustrates the navigation planning system to facilitate navigation planning for vehicles, in accordance with another exemplary embodiment of the present disclosure;
FIG. 2 is a block diagram that illustrates training of a first neural network of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure;
FIG. 3 is a block diagram that illustrates implementation of a trained neural network of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure;
FIG. 4 is a diagram that illustrates a non-planar surface to be traversed by a first vehicle of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure;
FIG. 5 is a diagram that illustrates the trajectory to be traversed by the first vehicle of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure;
FIGS. 6A and 6B collectively represent a flowchart that illustrates a navigation planning method for the first vehicle of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure; and
FIGS. 7A and 7B collectively represent a flowchart that illustrates a method for training the first neural network of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
The present disclosure is best understood with reference to the detailed figures and description set forth herein. Various embodiments are discussed below with reference to the figures. However, those skilled in the art will readily appreciate that the detailed descriptions given herein with respect to the figures are simply for explanatory purposes as the methods and systems may extend beyond the described embodiments. In one example, the teachings presented and the needs of a particular application may yield multiple alternate and suitable approaches to implement the functionality of any detail described herein. Therefore, any approach may extend beyond the particular implementation choices in the following embodiments that are described and shown.
Further, the detailed description of the appended drawings is intended as a description of the currently preferred embodiments of the present disclosure and is not intended to represent the only form in which the present disclosure may be practiced.
References to “an embodiment”, “another embodiment”, “yet another embodiment”, “one example”, “another example”, “yet another example”, “for example”, and so on, indicate that the embodiment(s) or example(s) so described may include a particular feature, structure, characteristic, property, element, or limitation, but that not every embodiment or example necessarily includes that particular feature, structure, characteristic, property, element or limitation. Furthermore, repeated use of the phrase “in an embodiment” does not necessarily refer to the same embodiment.
Exemplary aspects of the disclosure provide a navigation planning method. The navigation planning method includes receiving, by a processor an output that indicates semantic information and a set of waypoints associated with a non-planar surface from a trained neural network. The output is based on a first set of images of the non-planar surface to be traversed by the vehicle. The method further includes determining, by the processor, a trajectory to navigate the vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output.
In another embodiment, a navigation planning system for a vehicle is disclosed. The navigation planning system includes a trained neural network and a processor. The processor is configured to receive, an output that indicates semantic information and a set of waypoints associated with a non-planar surface, from the trained neural network. The output is based on a first set of images of a non-planar surface to be traversed by the vehicle. The processor is further configured to determine, a trajectory to navigate the vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output. In yet another embodiment, a navigation planning system for vehicles is disclosed. The navigation planning system includes a first neural network and a processor. The processor is configured to train the first neural network based on a first set of images of a non-planar surface to be traversed by a plurality of vehicles, to obtain a trained neural network. The processor is further configured to receive, from a target vehicle, a second set of images of the non-planar surface to be traversed by the target vehicle. Further, the processor is configured to receive, based on the second set of images, from the trained neural network, an output that indicates semantic information and a set of waypoints associated with the non-planar surface. The processor further determines a trajectory to navigate the target vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output.
In some embodiments, the semantic information represents a plurality of objects present on the non-planar surface to be traversed by the vehicle.
In some embodiments, the navigation planning method further includes generating, by the trained neural network, a latent representation based on a summation of feature representations of the first set of images, a first speed dataset, and a first set of positional embeddings that are associated with the vehicle. The navigation planning method further includes generating, by the trained neural network, a latent embedding based on the latent representation and a first location dataset associated with the vehicle. The navigation planning method further includes extracting, by the trained neural network, a first set of control points from the latent embedding and generating, by the trained neural network, the semantic information and the set of waypoints, based on the first set of control points and the first location dataset.
In some embodiments, the navigation planning method further includes training a first neural network based on a second set of images of the non-planar surface to be traversed by the vehicle, a second speed dataset, a second set of positional embeddings, a second location dataset, and a second set of control points that are associated with the vehicle, to obtain the trained neural network.
In some embodiments, the second set of control points is based on the non-planar surface captured by a three-dimensional sensor of a plurality of sensors of the vehicle. The three-dimensional sensor is embedded in the vehicle during the training of the first neural network. In some embodiments, the training of the first neural network includes generating, a latent representation based on a summation of feature representations of the second set of images, the second speed dataset, and the second set of positional embeddings. The training of the first neural network further includes generating, a latent embedding based on the latent representation and the second location dataset. An attention map of the latent embedding is updated, iteratively. The training further includes extracting, a third set of control points from the latent embedding. The training of the first neural network further includes determining, a loss value based on comparison of the third set of control points with the second set of control points such that the attention map of the latent embedding is updated until the loss value is below a threshold value. The training of the first neural network further includes generating, the semantic information and the set of waypoints, based on the third set of control points and the second location dataset.
In some embodiments, the vehicle is navigated based on the trajectory determined by the processor, by a controller of the vehicle.
In some embodiments, the trained neural network is configured to generate a latent representation based on a summation of feature representations of the first set of images, a first speed dataset, and a first set of positional embeddings that are associated with the vehicle. The trained neural network is further configured to generate, a latent embedding based on the latent representation and a first location dataset associated with the vehicle. Additionally, the trained neural network is configured to extract, a first set of control points from the latent embedding and generate, the semantic information and the set of waypoints, based on the first set of control points and the first location dataset.
In some embodiments, the navigation planning system further includes a first neural network. The processor is configured to train the first neural network based on a second set of images of the non- planar surface to be traversed by the vehicle, a second speed dataset, a second set of positional embeddings, a second location dataset, and a second set of control points that are associated with the vehicle, to obtain the trained neural network.
In some embodiments, when the processor trains the first neural network, the first neural network is configured to generate, a latent representation based on a summation of feature representations of the second set of images, the second speed dataset, and the second set of positional embeddings. The first neural network is further configured to generate, a latent embedding based on the latent representation and the second location dataset. An attention map of the latent embedding is updated, iteratively. The first neural network is further configured to extract, a third set of control points from the latent embedding. Additionally, the first neural network is configured to determine, a loss value based on comparison of the third set of control points with the second set of control points such that the attention map of the latent embedding is updated until the loss value is below a threshold value. The first neural network is further configured to generate, the semantic information and the set of waypoints, based on the third set of control points and the second location dataset.
In some embodiments, the navigation planning system includes a controller that is configured to navigate the vehicle based on the trajectory determined by the processor.
In some embodiments, the first neural network is trained based on a first speed dataset, a first set of positional embeddings, a first location dataset, and a first set of control points that are associated with the plurality of vehicles.
In some embodiments, the processor is configured to receive a second speed dataset, a second set of positional embeddings, and a second location dataset that are associated with the target vehicle, from the target vehicle.
Various embodiments of the disclosure may be found in disclosed systems and methods for facilitating navigation planning for vehicles. A navigation planning system includes a processor and a trained neural network. The processor determines a trajectory to be traversed by a vehicle on a non-planar surface from one waypoint to another waypoint, based on an output of the trained neural network. The trained neural network outputs semantic information of the non-planar surface to be traversed by the vehicle and waypoints based on images provided by the processor as an input. The vehicle follows the determined trajectory.
The disclosed system and method consider a surface to be traversed by the vehicle as a non-planar surface. Therefore, any vehicles or traffic participants present on the non-planar surface are considered to be non-coplanar. As a result, an accuracy of navigation planning for the vehicles is improved in comparison to conventional navigation planning techniques.
FIG. 1A is a block diagram that illustrates a navigation planning system 100 to facilitate navigation planning for vehicles, in accordance with an exemplary embodiment of the present disclosure. The navigation planning system 100 is shown to include a plurality of vehicles 102, a database server 104, and a communication network 106. Examples of the communication network 106 may include, but are not limited to, a wireless fidelity (Wi-Fi) network, a light fidelity (Li-Fi) network, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a satellite network, the Internet, a fiber optic network, a coaxial cable network, an infrared (IR) network, a radio frequency (RF) network, and a combination thereof. Various entities (such as the plurality of vehicles 102 and the database server 104) in the navigation planning system 100 may be communicatively coupled to the communication network 106 in accordance with various wired and wireless communication protocols, such as Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Long Term Evolution (LTE) communication protocols, or any combination thereof.
The plurality of vehicles 102 is shown to include a first vehicle 102a. However, in an actual implementation, the plurality of vehicles 102 may include any number of vehicles, without deviating from the scope of the disclosure. For the sake of brevity, the plurality of vehicles 102 is shown to include the first vehicle 102a. Thus, the navigation planning system 100 facilitates navigation planning for the first vehicle 102a.
The first vehicle 102a and the database server 104 may be communicatively coupled by way of the communication network 106.
The first vehicle 102a is a mode of transport that is used by a user to travel from one place to another. In an embodiment, the first vehicle 102a may be owned by the user. In another embodiment, the first vehicle 102a may be owned by a vehicle service provider for offering on-demand vehicle or ride services to a passenger (e.g., the user). The first vehicle 102a may be a two-wheeled vehicle, a three-wheeled vehicle, a four-wheeled vehicle, or a vehicle with greater than four wheels. In an example, the first vehicle 102a may be an electric vehicle or a hybrid vehicle. In an embodiment, the first vehicle 102a may be an autonomous vehicle (e.g., self-driving vehicle) that features partial or fully autonomous driving capabilities. In other words, the first vehicle 102a may include an advanced driver assistance system (ADAS) that enables partial or fully autonomous driving of the first vehicle 102a. The ADAS included in the first vehicle 102a may correspond to one of various levels of ADAS. The various levels of ADAS may include Level “0”, Level “1”, Level “2”, Level “2+”, Level “3”, Level “4”, and Level “5”. For the sake of brevity, the various levels of ADAS are assumed to be known to those of ordinary skill in the art, and, therefore, are not explained. In this disclosure, the surface traversed by the first vehicle 102a is considered to be non-planar.
The first vehicle 102a may include a plurality of sensors 110, a plurality of vehicular systems 112, a first processor 114, a first memory 116, a controller 118, a first network interface 120, and a neural network 122. The plurality of sensors 110, the plurality of vehicular systems 112, the first processor 114, the first memory 116, the controller 118, the first network interface 120, and the neural network 122 may communicate with each other via one or more vehicle communication protocols, such as Control Area Network (CAN) bus protocol, Local Interconnect Network (LIN) bus protocol, Flex-Ray protocol, or the like.
The plurality of sensors 110 may include various active and passive sensors that facilitate operation (e.g., autonomous driving) of the first vehicle 102a. The plurality of sensors 110 may include a set of image sensors (e.g., cameras), a set of velocity or speed sensors (e.g., inertial measurement unit (IMU) sensors), and a set of location detection sensors (e.g., global positioning system sensors). The set of image sensors may capture the surrounding environment of the first vehicle 102a. In a non-limiting example, the set of image sensors includes six image sensors. Each image sensor of the set of image sensors may be embedded on different sides of the first vehicle 102a. Images captured by the set of image sensors may be associated with one of a red, green, and blue (RGB) color model, a black and white color model, a greyscale color model, a hue, saturation, and value (HSV) color model, a cyan, magenta, yellow, and black (CMYK) color model, a hue, saturation, and brightness (HSB) color model, a hue, saturation, and lightness (HSL) color model, a hue, chroma, and value (HCV) color model, or the like. In the present disclosure, it is assumed that the images captured by the set of image sensors are associated with the RGB color model.
The set of velocity or speed sensors may be configured to obtain speed or velocity of the first vehicle 102a. For the sake of brevity, the terms “velocity” and “speed” are used interchangeably throughout the disclosure. The plurality of sensors 110 may further include radar sensors, three-dimensional sensors (e.g., light detection and ranging (LiDAR) sensors), sensors associated with global navigation satellite systems (GNSS), ultrasonic sensors, audio sensors (e.g., microphones), or the like. It will be apparent to those of skill in the art that the plurality of sensors 110 are not limited to sensors that facilitate autonomous driving. The plurality of sensors 110 may include any sensor that facilitates operation of the first vehicle 102a without deviating from the scope of the disclosure. For example, the plurality of sensors 110 may include a set of pressure sensors for measuring tire pressure, a set of limit sensors for operation of a sunroof, a set of temperature sensors for measuring a cabin temperature, or the like.
The plurality of vehicular systems 112 may include various on-board vehicular systems that facilitate operation (e.g., autonomous driving) of the first vehicle 102a. The plurality of vehicular systems 112 may include, but are not limited to, an object detection and recognition system, a navigation system, a vehicle driving system, a power management system, or the like. The plurality of vehicular systems 112 may further include a battery management system, a traction control system, an anti-lock braking system, an air-conditioning system, or the like. The plurality of vehicular systems 112 may include other systems required for operation of the first vehicle 102a, without deviating from the scope of the disclosure.
The first processor 114 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to execute instructions stored in the first memory 116 to perform various operations to facilitate autonomous driving by the first vehicle 102a. In an embodiment, the first processor 114 may be configured to perform various operations to facilitate navigation planning for the first vehicle 102a. Further, the first processor 114 may be configured to train the neural network 122 to facilitate navigation planning for the first vehicle 102a. Further, the first processor 114 may be configured to execute the trained neural network 122 to facilitate navigation planning for the first vehicle 102a. Execution of the trained neural network 122 enables the first processor 114 to determine a topology or a road profile (e.g., manifold representation shown in FIG. 5) of the non-planar surface being traversed by the first vehicle 102a. The road profile of the non-planar surface may be indicative of various attributes of the non-planar surface such as, but not limited to, gradient of the non-planar surface, undulations (e.g., potholes) present in the non-planar surface, or the like. The execution of the trained neural network 122 further enables the first processor 114 to determine a trajectory to navigate the first vehicle 102a on the non-planar surface.
The first processor 114 may be implemented by one or more processors, such as, but not limited to, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a complex instruction set computing (CISC) processor, and a field-programmable gate array (FPGA) processor. The first processor 114 may also correspond to central processing units (CPUs), graphics processing units (GPUs), network processing units (NPUs), digital signal processors (DSPs), or the like. It will be apparent to a person of ordinary skill in the art that the first processor 114 may be compatible with multiple operating systems.
The first memory 116 may include suitable logic, circuitry, and interfaces that may be configured to store, therein, sensor data from the plurality of sensors 110. The first memory 116 may further be configured to store one or more instructions, which when executed by the first processor 114 causes the first processor 114 to perform various operations for facilitating navigation planning of the first vehicle 102a.
Examples of the first memory 116 may include, but are not limited to, a random-access memory (RAM), a read only memory (ROM), a removable storage drive, a hard disk drive (HDD), a flash memory, a solid-state memory, or the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the first memory 116 in the first vehicle 102a, as described herein. In another embodiment, the first memory 116 may be realized in the form of a database or a cloud storage working in conjunction with the first processor 114, without deviating from the scope of the disclosure.
The controller 118 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to navigate the first vehicle 102a based on the trajectory determined by the first processor 114. In a non-limiting example, a vehicle control unit (VCU) of the first vehicle 102a is the controller 118. In another non-limiting example, an electronic control unit (ECU) of the first vehicle 102a is the controller 118.
The first network interface 120 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to enable the first vehicle 102a (e.g., the first processor 114) to communicate with the database server 104. The first network interface 120 may be implemented as a hardware, software, firmware, or a combination thereof. Examples of the first network interface 120 may include a network interface card, a physical port, a network interface device, an antenna, a radio frequency transceiver, a wireless transceiver, an Ethernet port, a universal serial bus (USB) port, or the like.
The neural network 122 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to output semantic information of a non-planar surface to be traversed by the first vehicle 102a and a set of waypoints on the non-planar surface to be traversed by the first vehicle 102a based on inputs provided by the first processor 114. Hereinafter, the neural network 122 is referred to as “the first neural network 122” before and during the training of the neural network 122. Further, once the neural network 122 is trained, the neural network 122 is referred to as “the trained neural network 122”. The first neural network 122 may be trained by the first processor 114 based on the sensor inputs from the plurality of sensors 110. Training of the first neural network 122 is further described in conjunction with FIG. 2. The first neural network 122 may be implemented by one or more processors, such as, but not limited to, an ASIC processor, a RISC processor, a CISC processor, and an FPGA processor. The one or more processors may also correspond to CPUs, GPUs, NPUs, DSPs, or the like.
In an embodiment, the semantic information may represent a plurality of objects present on the non-planar surface to be traversed by the first vehicle 102a. The plurality of objects may include a vehicle, a pedestrian, a tree, an animal, a traffic signal, or the like. In another embodiment, the semantic information may include, but is not limited to, details of portions of the road currently occupied by the first vehicle 102a, details of portions of the road occupied by other vehicles or traffic participants, details of portions of the road occupied by objects or obstacles, relative positions of occupied portions of the road with respect to a position of the first vehicle 102a, or the like.
The set of waypoints may include a first waypoint, a second waypoint, a third waypoint, and so on until an nth waypoint. Each waypoint of the set of waypoints represents a point on the non-planar surface. The first waypoint of the set of waypoints is referred to as a source waypoint and the nth waypoint of the set of waypoints is referred to as a target waypoint. The first processor 114 determines the trajectory to be followed by the first vehicle 102a on the non-planar surface based on a sequence of waypoints of the set of waypoints and the semantic information.
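In a non-limiting illustrative example, the determination of the trajectory from the sequence of waypoints may be sketched as follows. The Python snippet below is only a minimal sketch that linearly interpolates between consecutive waypoints; the function name, the interpolation scheme, and the example coordinates are assumptions made for illustration, since the disclosure states only that the trajectory is determined from the sequence of waypoints and the semantic information.

```python
# Minimal sketch (assumption): densify an ordered sequence of waypoints into
# a trajectory polyline by linear interpolation. The disclosure does not
# prescribe this scheme; the semantic information would additionally be used
# to keep the trajectory on traversable portions of the non-planar surface.
import numpy as np

def plan_trajectory(waypoints: np.ndarray, points_per_segment: int = 20) -> np.ndarray:
    """waypoints: (n, 3) array ordered from the source waypoint to the target waypoint."""
    segments = []
    for start, end in zip(waypoints[:-1], waypoints[1:]):
        t = np.linspace(0.0, 1.0, points_per_segment, endpoint=False)[:, None]
        segments.append(start + t * (end - start))   # points along the current segment
    segments.append(waypoints[-1][None, :])           # include the target waypoint itself
    return np.vstack(segments)                        # (m, 3) trajectory on the surface

# Example: three waypoints on a gently sloping (non-planar) road
trajectory = plan_trajectory(np.array([[0.0, 0.0, 0.0],
                                       [5.0, 1.0, 0.2],
                                       [10.0, 1.5, 0.5]]))
```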
The first neural network 122 is trained by the first processor 114 to obtain a trained first neural network 122. The trained first neural network 122 is hereinafter referred to as “the trained neural network 122”. The trained neural network 122 may be configured to output the semantic information of the non-planar surface to be traversed by the first vehicle 102a and the set of waypoints on the non-planar surface based on inputs provided by the first processor 114. Execution of the trained neural network 122 by the first processor 114 is described in conjunction with FIG. 3.
The database server 104 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry. For example, the database server 104 may be configured to store sensor data associated with the plurality of vehicles 102. The database server 104 may perform one or more database operations, such as receiving, storing, processing, and transmitting queries, data, information, messages, or content associated with the navigation planning to be provided to the first vehicle 102a. The database server 104 may be configured to provide sensor data to the first processor 114 for training the first neural network 122. The database server 104 may be implemented as a cloud-based server, a local database, a distributed database, a database management system (DBMS), or the like.
Though in FIG. 1A, it is shown that the navigation planning system 100 includes the database server 104 and the communication network 106, in another embodiment, the navigation planning system 100 may be inside the first vehicle 102a and thus excludes the database server 104 and the communication network 106, and includes the plurality of sensors 110, the first processor 114, the first memory 116, the controller 118, and the neural network 122.
FIG. 1B is a block diagram that illustrates the navigation planning system 100 to facilitate navigation planning for vehicles, in accordance with another exemplary embodiment of the present disclosure. The navigation planning system 100 is shown to include the plurality of vehicles 102, the database server 104, the communication network 106, and an application server 124. The plurality of vehicles 102 is shown to include the first vehicle 102a. However, in an actual implementation, the plurality of vehicles 102 may include any number of vehicles, without deviating from the scope of the disclosure. For the sake of brevity, the plurality of vehicles 102 is shown to include the first vehicle 102a.
The first vehicle 102a is shown to include the plurality of sensors 110, the plurality of vehicular systems 112, the first processor 114, the first memory 116, the controller 118, and the first network interface 120. In the present disclosure, the surface traversed by the first vehicle 102a is considered to be non-planar.
The first processor 114 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to execute instructions stored in the first memory 116 to perform various operations to facilitate autonomous driving by the first vehicle 102a. In an embodiment, the first processor 114 may be configured to perform various operations to facilitate navigation planning for the first vehicle 102a. The first processor 114 may be implemented as described in conjunction with FIG. 1A.
The first memory 116 may include suitable logic, circuitry, and interfaces that may be configured to store, therein, sensor data from the plurality of sensors 110. The first memory 116 may further be configured to store one or more instructions, which when executed by the first processor 114 causes the first processor 114 to perform various operations for facilitating navigation planning of the first vehicle 102a. Examples of the first memory 116 are similar as mentioned in conjunction with FIG. 1A.
The controller 118 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to navigate the first vehicle 102a. In a non-limiting example, a vehicle control unit (VCU) of the first vehicle 102a is the controller 118. In another non-limiting example, an electronic control unit (ECU) of the first vehicle 102a is the controller 118.
The first network interface 120 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to enable the first vehicle 102a (e.g., the first processor 114) to communicate with the application server 124. The first network interface 120 may be implemented as described in conjunction with FIG. 1A.
Operation of the plurality of sensors 110 and the plurality of vehicular systems 112 in FIG. 1B is similar to the operations as described in conjunction with FIG. 1A.
The application server 124 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to facilitate navigation planning for the plurality of vehicles 102. In an embodiment, the application server 124 is configured to facilitate navigation planning for the first vehicle 102a. Examples of the application server 124 may include, but are not limited to, a personal computer, a laptop, a mini-computer, a mainframe computer, a cloud-based server, a network of computer systems, or a non-transient and tangible machine executing a machine-readable code. The application server 124 includes a second processor 126, a second memory 128, the neural network 122, and a second network interface 130.
The second processor 126 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to execute instructions stored in the second memory 128 to perform various operations to facilitate training of the neural network 122, determination of the trajectory to be traversed by the first vehicle 102a, or the like. The second processor 126 may be implemented by one or more processors, such as, but not limited to, an ASIC processor, RISC processor, a CISC processor, and an FPGA processor. The one or more processors may further refer to CPUs, GPUs, NPUs, DSPs, or the like. It will be apparent to a person of ordinary skill in the art that the second processor 126 may be compatible with multiple operating systems.
The second memory 128 may include suitable logic, circuitry, and interfaces that may be configured to store, therein, training datasets for training the neural network 122. The second memory 128 may further store one or more instructions, which when executed by the second processor 126 causes the second processor 126 to perform various operations for facilitating training of the neural network 122.
Examples of the second memory 128 may include, but are not limited to, a RAM, a ROM, a removable storage drive, an HDD, a flash memory, a solid-state memory, or the like. It will be apparent to a person skilled in the art that the scope of the disclosure is not limited to realizing the second memory 128 in the application server 124, as described herein. In another embodiment, the second memory 128 may be realized in the form of a database or a cloud storage working in conjunction with the second processor 126, without deviating from the scope of the disclosure.
The neural network 122 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to output semantic information of non-planar surfaces to be traversed by the plurality of vehicles 102 and sets of waypoints on the non-planar surfaces to be traversed by the plurality of vehicles 102 based on inputs provided by the plurality of vehicles 102. In an embodiment, the neural network 122 may be configured to output the semantic information of the non-planar surface to be traversed by the first vehicle 102a and the set of waypoints on the non-planar surface to be traversed by the first vehicle 102a based on inputs provided by the second processor 126. The neural network 122 is referred to as “the first neural network 122” before and during the training of the neural network 122. Further, once the neural network 122 is trained, the neural network 122 is referred to as “the trained neural network 122”. The first neural network 122 may be trained by the second processor 126 based on the sensor inputs from the plurality of sensors 110. Training of the first neural network 122 is further described in conjunction with FIG. 2. The first neural network 122 may be implemented as described in conjunction with FIG. 1A.
In an embodiment, the semantic information may represent a plurality of objects present on the non-planar surface to be traversed by the first vehicle 102a. The plurality of objects may include a vehicle, a pedestrian, a tree, an animal, a traffic signal, or the like. In another embodiment, the semantic information may include, but is not limited to, details of portions of the road currently occupied by the first vehicle 102a, details of portions of the road occupied by other vehicles or traffic participants, details of portions of the road occupied by objects or obstacles, relative positions of occupied portions of the road with respect to a position of the first vehicle 102a, or the like.
The set of waypoints may include a first waypoint, a second waypoint, a third waypoint, and so on until an nth waypoint. Each waypoint of the set of waypoints represents a point on the non-planar surface. The first waypoint of the set of waypoints is referred to as a source waypoint and the nth waypoint of the set of waypoints is referred to as a target waypoint. The second processor 126 determines the trajectory to be followed by the first vehicle 102a on the non-planar surface based on a sequence of waypoints of the set of waypoints and the semantic information.
The first neural network 122 is trained by the second processor 126 to obtain the trained first neural network 122. The trained first neural network 122 is referred to as “the trained neural network 122”. The trained neural network 122 may be configured to output the semantic information of the non-planar surface to be traversed by the first vehicle 102a and the set of waypoints on the non-planar surface based on inputs provided by the second processor 126. Further, the second processor 126 may be configured to determine the trajectory to be traversed by the first vehicle 102a and provide the trajectory to the first vehicle 102a. The controller 118 is configured to navigate the first vehicle 102a based on the trajectory determined by the second processor 126. Execution of the trained neural network 122 by the second processor 126 is described in conjunction with FIG. 3. In a non-limiting example, it is assumed that the first neural network 122 is trained at the application server 124, and the first processor 114 runs or executes a local version of the trained neural network 122. In another non-limiting example, the application server 124 provides the trained neural network 122 to the first vehicle 102a. Further, the first processor 114 executes the trained neural network 122 to determine the trajectory to be traversed by the first vehicle 102a. Further, the controller 118 navigates the first vehicle 102a based on the trajectory determined by the first processor 114.
The second network interface 130 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry, that may be configured to enable the application server 124 to communicate with the first vehicle 102a. The second network interface 130 may be implemented as a hardware, software, firmware, or a combination thereof. Examples of the second network interface 130 may include a network interface card, a physical port, a network interface device, an antenna, a radio frequency transceiver, a wireless transceiver, an Ethernet port, a USB port, or the like.
In an embodiment, the second processor 126 may be configured to train the first neural network 122 based on sensor inputs received from the plurality of vehicles 102 to facilitate navigation planning for the plurality of vehicles 102. The second processor 126 trains the first neural network 122 to obtain the trained neural network 122 to facilitate navigation planning for the plurality of vehicles 102.
The database server 104 may include suitable logic, circuitry, interfaces, and/or code, executable by the circuitry. For example, the database server 104 may be configured to store sensor data associated with the plurality of vehicles 102. The database server 104 may perform one or more database operations, such as receiving, storing, processing, and transmitting queries, data, information, messages, or content associated with the navigation planning to be provided to each vehicle. The database server 104 may be configured to receive data from each vehicle or user devices associated with each vehicle of the plurality of vehicles 102. The database server 104 may be implemented as described in conjunction with FIG. 1A.
FIG. 2 is a block diagram 200 that illustrates training of the first neural network 122 of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure. FIG. 2 is explained in conjunction with FIGS. 1A and 1B. The first neural network 122 may include an encoder section 202, an attention field section 204, and a decoder section 206. In an embodiment, the first neural network 122 is present in the first vehicle 102a and is trained by the first processor 114. In another embodiment, the first neural network 122 is present in the application server 124 and is trained by the second processor 126.
In operation, when the first neural network 122 is present in the first vehicle 102a, the first processor 114 is configured to receive a first sensor dataset from the plurality of sensors 110. The first sensor dataset may include a first speed dataset 208 obtained from the set of velocity or speed sensors, a first set of images 210 obtained by the set of image sensors, and a first location dataset 214 obtained from the set of location detection sensors. A three-dimensional sensor (for example, LiDAR) is embedded in the first vehicle 102a only during the training of the first neural network 122. The three-dimensional sensor is configured to obtain three-dimensional (3D) datapoints of the surroundings of the first vehicle 102a. Further, the 3D datapoints are fitted to a surface and a first set of control points 216 is extracted from the surface by the first processor 114. In a non-limiting example, the 3D datapoints are fitted to a Bezier surface. The first set of control points 216 may be indicative of a lattice structure of the non-planar surface to be traversed by the first vehicle 102a.
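In a non-limiting illustrative example, the fitting of the 3D datapoints to a Bezier surface and the extraction of the first set of control points 216 may be sketched as follows. The grid size and the (u, v) parameterization by normalized ground-plane coordinates are assumptions made for illustration and are not prescribed by the disclosure.

```python
# Minimal sketch (assumption): least-squares fit of LiDAR 3D datapoints to a
# Bezier surface, returning an E x F lattice of control points.
import numpy as np
from scipy.special import comb

def bernstein(n: int, i: int, t: np.ndarray) -> np.ndarray:
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def fit_bezier_surface(points: np.ndarray, E: int = 4, F: int = 4) -> np.ndarray:
    """points: (N, 3) LiDAR returns; returns control points of shape (E, F, 3)."""
    # Parameterize each point by its normalized ground-plane position (assumption).
    u = (points[:, 0] - points[:, 0].min()) / np.ptp(points[:, 0])
    v = (points[:, 1] - points[:, 1].min()) / np.ptp(points[:, 1])
    # Design matrix: one column per control point, Bernstein basis in u and v.
    A = np.stack([bernstein(E - 1, e, u) * bernstein(F - 1, f, v)
                  for e in range(E) for f in range(F)], axis=1)   # (N, E*F)
    ctrl, *_ = np.linalg.lstsq(A, points, rcond=None)              # (E*F, 3)
    return ctrl.reshape(E, F, 3)
```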
The encoder section 202 may include a set of networks, a summation function, and a transformer (not shown). The first processor 114 is configured to provide the first set of images 210 to the set of networks of the encoder section 202. The set of networks is configured to extract feature representation of each image of the first set of images 210. In a non-limiting example, the set of networks correspond to ResNet. The feature representation of each image is representative of various features in a corresponding image. For example, the feature representation of each image may be indicative of one or more agents or traffic participants (e.g., other vehicles, humans, obstacles, or the like) present in a corresponding image. Further, feature representation may include or may be indicative of distance-related information associated with each of the one or more traffic participants. For example, the feature representation of an image may indicate that a human is “5” meters behind the first vehicle 102a and is walking towards the first vehicle 102a. Similarly, the feature representation of the image may further indicate that a motorist on a motorbike is “12” meters ahead of the first vehicle 102a and is traveling towards the first vehicle 102a.
Further, the first processor 114 provides the first speed dataset 208 and a first set of positional embeddings 212 as an input to the encoder section 202. The first set of positional embeddings 212 is initialized to zero. The first set of positional embeddings 212 is learnt during the training of the first neural network 122. The first speed dataset 208, the feature representations of the first set of images 210, and the first set of positional embeddings 212 may correspond to an n-dimensional space. The summation function is configured to generate a summation of the first speed dataset 208, the feature representations of the first set of images 210, and the first set of positional embeddings 212 and provide the summation to the transformer.
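In a non-limiting illustrative example, the encoder section 202 may be sketched in PyTorch as follows. The small convolutional stand-in for the ResNet backbone, the feature dimension, the number of transformer layers, and the input resolution are assumptions made for illustration.

```python
# Minimal PyTorch sketch (assumptions noted above): per-camera image features
# are summed with a projected speed and learned positional embeddings
# (initialized to zero) and passed through a transformer encoder.
import torch
import torch.nn as nn

class EncoderSection(nn.Module):
    def __init__(self, num_cameras: int = 6, patches: int = 64, dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(                  # stand-in for ResNet features
            nn.Conv2d(3, dim, kernel_size=8, stride=8), nn.ReLU())
        self.speed_proj = nn.Linear(1, dim)             # speed dataset -> feature space
        self.pos_embed = nn.Parameter(                  # positional embeddings, zero-initialized
            torch.zeros(num_cameras * patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, images: torch.Tensor, speed: torch.Tensor) -> torch.Tensor:
        b, s, c, h, w = images.shape                    # (batch, cameras, 3, H, W)
        feats = self.backbone(images.flatten(0, 1))     # (b*s, dim, h', w')
        feats = feats.flatten(2).transpose(1, 2)        # (b*s, patches, dim)
        feats = feats.reshape(b, -1, feats.shape[-1])   # (b, s*patches, dim)
        summed = feats + self.speed_proj(speed)[:, None, :] + self.pos_embed
        return self.transformer(summed)                 # latent representation "Z"

# Example: six RGB cameras at 64 x 64 resolution and a scalar speed
z = EncoderSection()(torch.rand(1, 6, 3, 64, 64), torch.rand(1, 1))
```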
The transformer integrates features that correspond to the speed of the first vehicle 102a, a position of the first vehicle 102a, and images of the surroundings of the first vehicle 102a. The integration enables interactions over large spatial regions and across the first speed dataset 208, the first set of images 210, and the first set of positional embeddings 212. An output of the transformer is a latent representation that is represented as “Z”, where “Z” ∈ R^((S*T*P)×C), “S” is a number of image sensors, “T” corresponds to time, and “P” and “C” correspond to a number of spatial features and feature dimensionality, respectively. An operation of the encoder section 202 is represented by equation (1) as shown below:

Encoder (E): R^(S×T×W×H×3) × R → R^((S*T*P)×C) ... (1)

where,
W and H correspond to the width and the height of each image of the first set of images 210, respectively; and
R corresponds to the set of real numbers.

The latent representation is provided as input to the attention field section 204. Further, the first processor 114 provides the first location dataset 214 as another input to the attention field section 204. The first location dataset 214 includes a query point “q”, where “q” ∈ R^7 = (x1, y1, z1, t, x2, y2, z2). The query point “q” corresponds to the source location (e.g., “x1”, “y1”, “z1” in the vehicle coordinate space) and the target location (e.g., “x2”, “y2”, “z2” in the vehicle coordinate space). The query point “q” is passed through an attention vector in the attention field section 204 to obtain an attention map. The attention map is multiplied with the latent representation to obtain a latent embedding. The attention map of the attention field section 204 is updated iteratively. For a first iteration, “Z_0” is initialized with the mean of “Z”, indicating that uniform attention is used to start. Weights of the attention field section 204 are shared across “N” iterations. An operation of the attention field section 204 is represented by equation (2) as shown below:

A: R^7 × R^C → R^(S*T*P) ... (2)
A common attention field is required to capture a correlation between the presence of the traffic participants on the road and a road profile of the portion of the road on which each traffic participant is present. An output of the attention field section 204 is provided to the decoder section 206 as input. The latent embedding, represented as Z_i for the i-th iteration, is considered the output of the attention field section 204.
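In a non-limiting illustrative example, the attention field section 204 may be sketched in PyTorch as follows. Realizing the attention vector as a small MLP over the query point and the previous latent embedding, and the default number of iterations, are assumptions made for illustration; the disclosure specifies only that the attention map is obtained from the query point, multiplied with the latent representation, and updated iteratively with shared weights.

```python
# Minimal PyTorch sketch (assumptions noted above): iterative attention over
# the latent tokens, starting from the mean of Z and sharing weights across
# the N iterations.
import torch
import torch.nn as nn

class AttentionFieldSection(nn.Module):
    def __init__(self, dim: int = 256, query_dim: int = 7):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(query_dim + dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 1))

    def forward(self, z: torch.Tensor, q: torch.Tensor, iterations: int = 3) -> torch.Tensor:
        # z: (batch, tokens, dim) latent representation; q: (batch, 7) query point.
        z_i = z.mean(dim=1)                              # Z_0: mean of Z (uniform attention)
        for _ in range(iterations):                      # weights shared across N iterations
            scores = self.attn(torch.cat(
                [q[:, None, :].expand(-1, z.shape[1], -1),
                 z_i[:, None, :].expand(-1, z.shape[1], -1)], dim=-1))
            attn_map = torch.softmax(scores, dim=1)      # attention map over the tokens
            z_i = (attn_map * z).sum(dim=1)              # latent embedding Z_i
        return z_i
```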
Based on the latent embedding Z_i, the decoder section 206 is configured to generate a grid of E*F control points that is referred to as a second set of control points {P^c_ef}, where “e” ∈ “1... E” and “f” ∈ “1... F”. The second set of control points governs a manifold representation (shown in FIG. 5) of the road and a surrounding environment of the first vehicle 102a, and is extracted from a multi-layer perceptron (MLP). Each control point {P^c_ef} ∈ R^3 is based on the output of the attention field section 204. The second set of control points may be indicative of the lattice structure of the non-planar surface to be traversed by the first vehicle 102a. In other words, the second set of control points indicates the road profile or the topology (e.g., elevation, potholes, or the like) of the road. Control points are well known to those of skill in the art and therefore are not explained. Each control point of the second set of control points may be indicative of a point on the non-planar surface to be traversed by the first vehicle 102a (e.g., a point on the topology of the road). Each control point of the second set of control points corresponds to a point in a first coordinate space that is different from the “XY” coordinate space. The first coordinate space may be a two-dimensional coordinate space with a first coordinate “u” and a second coordinate “v”, such that 0 < u < 1 and 0 < v < 1. The first processor 114 provides the first set of control points 216 as an input to the decoder section 206. Further, in the decoder section 206, the first processor 114 determines a first loss value based on a comparison of the second set of control points with the first set of control points 216. The attention map of the attention field section 204 is updated iteratively until the first loss value is less than a threshold value. In a non-limiting example, the threshold value is zero.
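In a non-limiting illustrative example, the extraction of the second set of control points from the latent embedding and the computation of the first loss value may be sketched as follows. The use of an L1 distance for the comparison with the first set of control points 216 is an assumption made for illustration; the disclosure states only that the first loss value is based on a comparison of the two sets of control points.

```python
# Minimal PyTorch sketch (assumptions noted above): an MLP head regresses the
# E x F grid of 3-D control points; an L1 loss against the LiDAR-derived
# control points stands in for the first loss value.
import torch
import torch.nn as nn

class ControlPointHead(nn.Module):
    def __init__(self, dim: int = 256, E: int = 4, F: int = 4):
        super().__init__()
        self.E, self.F = E, F
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, E * F * 3))

    def forward(self, latent_embedding: torch.Tensor) -> torch.Tensor:
        return self.mlp(latent_embedding).view(-1, self.E, self.F, 3)

head = ControlPointHead()
predicted_ctrl = head(torch.rand(1, 256))        # second set of control points
lidar_ctrl = torch.rand(1, 4, 4, 3)              # first set of control points 216
first_loss = nn.functional.l1_loss(predicted_ctrl, lidar_ctrl)
```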
The first processor 114 further provides the query point q of the first location dataset 214 as another input to the decoder section 206. Given “qp : (xi, yi, zi)” and “ { Pc ef } 6 R(E*F)X3”, a mapping of the first coordinate space to the vehicle coordinate space is governed by an equation (3) as shown below:
P(u, v) = Σ (e = 1 to E) Σ (f = 1 to F) Be(u) Bf(v) Pc^ef (3)

where P(u, v) ∈ “R^3” and Be(u), Bf(v) are Bernstein basis functions.
It will be apparent to a person of skill in the art that any smooth basis function may be utilized instead of the Bernstein basis function in equation (3). Further, given “qp: (x1, y1, z1)”, a deep declarative network (DDN) layer of the decoder section 206 obtains the reverse mapping “qp ∈ R^3 → (u, v) ∈ R^2” by minimizing the following equation (4):
(u, v) = arg min over (u, v) of ||P(u, v) − qp||^2 (4)
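As a non-limiting illustration only, equations (3) and (4) may be sketched numerically as below. The Bézier-style tensor-product surface, the basis degree implied by the grid size, and the use of a generic bounded optimizer in place of the DDN layer are assumptions of the example.

```python
import numpy as np
from math import comb
from scipy.optimize import minimize

def bernstein(n: int, i: int, t: float) -> float:
    """Bernstein basis polynomial B_{i,n}(t)."""
    return comb(n, i) * (t ** i) * ((1.0 - t) ** (n - i))

def manifold_point(u: float, v: float, ctrl: np.ndarray) -> np.ndarray:
    """Equation (3), sketched: maps (u, v) in the first coordinate space to a point in R^3."""
    E, F, _ = ctrl.shape
    point = np.zeros(3)
    for e in range(E):
        for f in range(F):
            point += bernstein(E - 1, e, u) * bernstein(F - 1, f, v) * ctrl[e, f]
    return point

def reverse_map(q_p: np.ndarray, ctrl: np.ndarray) -> np.ndarray:
    """Equation (4), sketched: recover (u, v) for a vehicle-frame point q_p."""
    def objective(uv):
        return float(np.sum((manifold_point(uv[0], uv[1], ctrl) - q_p) ** 2))
    return minimize(objective, x0=np.array([0.5, 0.5]), bounds=[(0.0, 1.0), (0.0, 1.0)]).x

ctrl_points = np.random.rand(4, 4, 3)  # a small 4 x 4 control-point grid for illustration
uv = reverse_map(manifold_point(0.3, 0.7, ctrl_points), ctrl_points)
```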
Consequently, the decoder section 206 remaps the first coordinate space to an isometric arc length space “(Su, Sv) ∈ R^2” through the MLP to prevent distortion of metric properties. Further, the decoder section 206 predicts or determines the semantic information “Si ∈ R^M”, where “M” corresponds to a number of semantic classes. Examples of a semantic class may include, but are not limited to, a vehicle class, a pedestrian class, a road class, or the like. The decoder section 206 further predicts or determines a first waypoint of the set of waypoints “oi ∈ R^2” at each of the “N” iterations.
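As a non-limiting illustration only, the three decoder outputs described above may be sketched as simple prediction heads. The head structure, the embedding dimension, and the class count are assumptions of the example rather than the disclosed architecture.

```python
import torch
import torch.nn as nn

class DecoderHeads(nn.Module):
    def __init__(self, embed_dim: int = 128, num_classes: int = 3):
        super().__init__()
        self.arc_length = nn.Linear(embed_dim + 2, 2)            # (u, v) -> arc-length space (S_u, S_v)
        self.semantics = nn.Linear(embed_dim + 2, num_classes)   # semantic logits S_i over M classes
        self.waypoint = nn.Linear(embed_dim, 2)                  # waypoint o_i in R^2

    def forward(self, z_i: torch.Tensor, uv: torch.Tensor):
        features = torch.cat([z_i, uv], dim=-1)
        return self.arc_length(features), self.semantics(features), self.waypoint(z_i)

heads = DecoderHeads()
s_uv, semantic_logits, o_i = heads(torch.randn(2, 128), torch.rand(2, 2))
```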
Based on the determined semantic information 218 and the set of waypoints 220, the first processor 114 may determine the trajectory to be followed by the first vehicle 102a to reach the target waypoint from the source waypoint. Based on the determined trajectory, the first processor 114 may communicate instructions to the controller 118 of the first vehicle 102a to navigate the first vehicle 102a along the determined trajectory. For each of the M semantic classes, K points are sampled in an edge-aware manner to accurately capture the semantic information.
During the training of the first neural network 122, the first processor 114 may be configured to supervise the output of the first neural network 122 at each iteration to assist with convergence of the optimization. The semantic information 218 and the set of waypoints 220 are supervised based on a comparison with actual semantic information and an actual set of waypoints. The comparison of the semantic information 218 with the actual semantic information determines a second loss value, and the comparison of the set of waypoints 220 with the actual set of waypoints determines a third loss value. A final loss function is shown below in equation (5):

Ltotal = η1·L1 + η2·L2 + η3·L3 (5)

where L1, L2, and L3 correspond to the first loss value, the second loss value, and the third loss value, respectively; and η1, η2, and η3 are weight control factors that control the relative importance of each loss term.
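As a non-limiting illustration only, the total objective of equation (5) may be sketched as below. The particular loss forms chosen for the semantic and waypoint terms (cross-entropy and L1) and the weight values are assumptions of the example, not disclosed values.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_ctrl, gt_ctrl, semantic_logits, gt_semantics, pred_waypoint, gt_waypoint,
               eta=(1.0, 1.0, 1.0)) -> torch.Tensor:
    l1 = F.mse_loss(pred_ctrl, gt_ctrl)                  # first loss value: control points
    l2 = F.cross_entropy(semantic_logits, gt_semantics)  # second loss value: semantic information
    l3 = F.l1_loss(pred_waypoint, gt_waypoint)           # third loss value: waypoints
    return eta[0] * l1 + eta[1] * l2 + eta[2] * l3

loss = total_loss(torch.randn(2, 8, 8, 3), torch.randn(2, 8, 8, 3),
                  torch.randn(2, 3), torch.randint(0, 3, (2,)),
                  torch.randn(2, 2), torch.randn(2, 2))
```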
In another embodiment, when the first neural network 122 is present in the application server 124, the second processor 126 is configured to train the first neural network 122. The training of the first neural network 122 by the second processor 126 is similar to the training of the first neural network 122 by the first processor 114. In some embodiments, the second processor 126 trains the first neural network 122 to facilitate navigation planning for the plurality of vehicles 102.
FIG. 3 is a block diagram 300 that illustrates an implementation of the trained neural network 122 of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure. The trained neural network 122 is shown to include the encoder section 202, the attention field section 204, and the decoder section 206. FIG. 3 is explained in conjunction with FIGS. 1A and 1B.
When the first vehicle 102a is to traverse on the non-planar surface, a second speed dataset 302, a second set of images 304, and a second location dataset 308 are obtained by the plurality of sensors 110. Further, the first processor 114 is configured to provide the second speed dataset 302, the second set of images 304, and a second set of positional embeddings 306 as input to the encoder section 202. The second set of positional embeddings 306 is learnt during the training of the first neural network 122. Further, the encoder section 202 is configured to generate the latent representation based on the received input. The latent representation and the second location dataset 308 are provided as input to the attention field section 204. Based on the training, the attention field section 204 generates the latent embedding based on the received input. Further, the latent embedding and the second location dataset 308 are provided as input to the decoder section 206. The decoder section 206 generates the semantic information 218 and the set of waypoints 220 based on the training and the received input.
The operations of the encoder section 202, the attention field section 204, and the decoder section 206 are similar to the operations described in conjunction with FIG. 2.
In an embodiment, the first processor 114 may be configured to execute the trained neural network 122. The first processor 114 determines the trajectory to be followed by the first vehicle 102a on the non-planar surface based on the semantic information 218 and the set of waypoints 220. In another embodiment, the second processor 126 may be configured to execute the trained neural network 122. The second processor 126 determines the trajectory to be followed by the first vehicle 102a on the non-planar surface based on the semantic information 218 and the set of waypoints 220. The controller 118 navigates the first vehicle 102a based on the determined trajectory on the non-planar surface.
In another embodiment, the first neural network 122 may be trained by the second processor 126 in the application server 124. Further, the trained neural network 122 may be communicated to the first vehicle 102a, and the first processor 114 may be configured to execute (implement) the trained neural network 122 to facilitate navigation planning for the first vehicle 102a.
In yet another embodiment, the second processor 126 may be configured to determine the trajectory to be traversed by the first vehicle 102a based on the second speed dataset 302, the second set of images 304, the second set of positional embeddings 306, and the second location dataset 308. The second speed dataset 302, the second set of images 304, the second set of positional embeddings 306, and the second location dataset 308 may be received by the second processor 126 from the plurality of sensors 110.

FIG. 4 is a diagram 400 that illustrates the non-planar surface to be traversed by the first vehicle 102a of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure. The non-planar surface to be traversed by the first vehicle 102a is hereinafter referred to as “the non-planar surface 402”. FIG. 4 includes the source waypoint, hereinafter referred to as “the source waypoint 404”, which represents a position of the first vehicle 102a at time-instance “T”. FIG. 4 further includes the target waypoint, hereinafter referred to as “the target waypoint 406”, which represents a finite point to be reached by the first vehicle 102a.
FIG. 5 is a diagram 500 that illustrates the trajectory to be traversed by the first vehicle 102a of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure. The trajectory to be traversed by the first vehicle 102a is hereinafter referred to as “the trajectory 502” as illustrated in FIG. 5. FIG. 5 is further shown to include the manifold representation, hereinafter referred to as “the manifold representation 504”. The manifold representation 504 is an intermediate output of the first neural network 122 that is obtained during the training of the first neural network 122. The manifold representation 504 illustrates the non-planar surface 402 with gradients. In some embodiments, the manifold representation 504 may be utilized for navigation planning for the first vehicle 102a. Once the first neural network 122 is trained, the trajectory 502 is determined by one of the first processor 114 and the second processor 126 based on the semantic information 218 and the set of waypoints 220. The controller 118 navigates the first vehicle 102a along the trajectory 502 to reach the target waypoint 406 from the source waypoint 404.
FIGS. 6A and 6B collectively represent a flowchart 600 that illustrates a navigation planning method for the first vehicle 102a of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure. The flowchart 600 is described in conjunction with FIGS. 1-3.
At step 602, the latent representation is generated based on the summation of feature representations of the second set of images 304 of the non-planar surface 402 to be traversed by the first vehicle 102a, the second speed dataset 302, and the second set of positional embeddings 306 that are associated with the first vehicle 102a, by the trained neural network 122. The first processor 114 is configured to receive the second set of images 304 and the second speed dataset 302 from the plurality of sensors 110. The encoder section 202 of the trained neural network 122 generates the latent representation.
At step 604, the latent embedding is generated based on the latent representation and the second location dataset 308 that are associated with the first vehicle 102a, by the trained neural network 122. The second location dataset 308 is received by the first processor 114 from the plurality of sensors 110. The attention field section 204 of the trained neural network 122 generates the latent embedding.
At step 606, the second set of control points is extracted from the latent embedding by the trained neural network 122. In the decoder section 206, the second set of control points is extracted from the MLP.
At step 608, the semantic information 218 and the set of waypoints 220 associated with the non-planar surface 402 are generated, based on the second set of control points and the second location dataset 308, by the trained neural network 122. The decoder section 206 generates the semantic information 218 and the set of waypoints 220.
At step 610, the semantic information 218 and the set of waypoints 220 associated with the non-planar surface 402 from the trained neural network 122 are received by the first processor 114.
At step 612, the trajectory 502 to navigate the first vehicle 102a from the source waypoint 404 to the target waypoint 406 on the non-planar surface 402 is determined by the first processor 114, based on the semantic information 218 and the set of waypoints 220.
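As a non-limiting illustration only, steps 602 through 612 may be sketched as a single planning call. The callables encoder, attention_field, decoder, and fit_trajectory are hypothetical stand-ins for the encoder section 202, the attention field section 204, the decoder section 206, and the trajectory determination of step 612; they are not part of the disclosure.

```python
from typing import Callable, Tuple

def navigation_planning(encoder: Callable, attention_field: Callable, decoder: Callable,
                        fit_trajectory: Callable, images, speed, pos_embeddings, location) -> Tuple:
    latent = encoder(images, speed, pos_embeddings)   # step 602: latent representation
    z = attention_field(location, latent)             # step 604: latent embedding
    # Steps 606-608: the control points are extracted inside the decoder, which then
    # produces the semantic information 218 and the set of waypoints 220.
    semantics, waypoints = decoder(z, location)
    trajectory = fit_trajectory(semantics, waypoints) # steps 610-612: trajectory 502
    return trajectory, semantics, waypoints
```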
FIGS. 7A and 7B collectively represent a flowchart 700 that illustrates a method for training the first neural network 122 of FIGS. 1A and 1B, in accordance with an exemplary embodiment of the present disclosure. The flowchart 700 is described in conjunction with FIGS. 1-2.
At step 702, the latent representation is generated based on the summation of feature representations of the first set of images 210 of the non-planar surface 402 to be traversed by the first vehicle 102a, the first speed dataset 208, and the first set of positional embeddings 212 that are associated with the first vehicle 102a, by the first neural network 122. The first processor 114 is configured to receive the first set of images 210 and the first speed dataset 208 from the plurality of sensors 110. The encoder section 202 of the first neural network 122 generates the latent representation.
At step 704, the latent embedding is generated based on the latent representation and the first location dataset 214 that are associated with the first vehicle 102a, by the first neural network 122. The first location dataset 214 is received by the first processor 114 from the plurality of sensors 110. The first processor 114 provides the first location dataset 214 to the first neural network 122. The attention field section 204 of the first neural network 122 generates the latent embedding.
At step 706, the second set of control points are extracted by the first neural network 122 from the latent embedding. In the decoder section 206, the second set of control points are extracted from the MLP.
At step 708, the first loss value is determined based on the comparison of the second set of control points with the first set of control points 216, by the first neural network 122. The first set of control points 216 is provided as the input by the first processor 114.
At step 710, if it is determined that the first loss value is less than the threshold value, the process proceeds to step 712, otherwise the process proceeds to step 704. The first processor 114 may be configured to determine whether the first loss value is less than the threshold value.
At step 712, the semantic information 218 and the set of waypoints 220 associated with the non-planar surface 402 are generated by the first neural network 122, based on the second set of control points and the first location dataset 214. The decoder section 206 generates the semantic information 218 and the set of waypoints 220.
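As a non-limiting illustration only, steps 702 through 712 may be sketched as a training iteration that repeats until the first loss value falls below the threshold value. The module interfaces, the optimizer, the hypothetical control_points method, and the recomputation of the latent representation inside the loop are simplifications assumed for the example.

```python
import torch

def train_step(encoder, attention_field, decoder, optimizer, batch,
               threshold: float = 0.0, max_iters: int = 10):
    for _ in range(max_iters):
        # Step 702 (recomputed each pass so the backward graph stays valid in this sketch).
        latent = encoder(batch["images"], batch["speed"], batch["pos_embeddings"])
        z = attention_field(batch["location"], latent)                         # step 704
        pred_ctrl = decoder.control_points(z)                                  # step 706 (hypothetical method)
        loss_1 = torch.mean((pred_ctrl - batch["lidar_control_points"]) ** 2)  # step 708: first loss value
        optimizer.zero_grad()
        loss_1.backward()
        optimizer.step()
        if loss_1.item() < threshold:                                          # step 710
            break
    return decoder(z, batch["location"])                                       # step 712: outputs 218 and 220
```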
The disclosed navigation planning method and navigation planning system 100 for vehicles encompass numerous advantages. The navigation planning system 100 enables accurate representation of road profiles and topologies, taking into account gradients and undulations (e.g., potholes) present in each road. The navigation planning system 100 considers the surface to be traversed by the vehicles as the non-planar surface 402. Thus, the trajectory 502 determined by the disclosed method and system is accurate. Increased accuracy in road representation facilitates improvements in navigation and behavior planning by autonomous vehicles (e.g., the first vehicle 102a), resulting in an enhanced experience for users availing the autonomous vehicles. In the disclosed method and system, the three-dimensional sensor (for example, LiDAR) is required only during the training and not during the implementation, which results in low implementation costs. Further, the disclosed system and method generate a well-resolved segmentation map without incurring a high computation cost, in comparison to conventional techniques.

Techniques consistent with the disclosure provide, among other features, methods and systems to facilitate navigation planning for vehicles. While various exemplary embodiments of the disclosed methods and systems have been described above, it should be understood that they have been presented for purposes of example only, and not limitation. The description is not exhaustive and does not limit the disclosure to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practicing the disclosure, without departing from its breadth or scope.
While various embodiments of the disclosure have been illustrated and described, it will be clear that the disclosure is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions, and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the disclosure, as described in the claims.

Claims

WE CLAIM
1. A navigation planning method comprising: receiving, by a processor, based on a first set of images of a non-planar surface to be traversed by a vehicle, from a trained neural network, an output that indicates semantic information and a set of waypoints associated with the non-planar surface; and determining, by the processor, a trajectory to navigate the vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output.
2. The navigation planning method as claimed in claim 1, wherein the semantic information represents a plurality of objects present on the non-planar surface to be traversed by the vehicle.
3. The navigation planning method as claimed in claim 1, comprising: generating, by the trained neural network, a latent representation based on a summation of feature representations of the first set of images, a first speed dataset, and a first set of positional embeddings that are associated with the vehicle; generating, by the trained neural network, a latent embedding based on the latent representation and a first location dataset associated with the vehicle; extracting, by the trained neural network, a first set of control points from the latent embedding; and generating, by the trained neural network, the semantic information and the set of waypoints, based on the first set of control points and the first location dataset.
4. The navigation planning method as claimed in claim 1, comprising: training a first neural network based on a second set of images of the non-planar surface to be traversed by the vehicle, a second speed dataset, a second set of positional embeddings, a second location dataset, and a second set of control points that are associated with the vehicle, to obtain the trained neural network.
5. The navigation planning method as claimed in claim 4, wherein the second set of control points is based on the non-planar surface captured by a three-dimensional sensor of a plurality of sensors of the vehicle, and wherein the three-dimensional sensor is embedded in the vehicle during the training of the first neural network.
6. The navigation planning method as claimed in claim 4, wherein the training of the first neural network comprises: generating, a latent representation based on a summation of feature representations of the second set of images, the second speed dataset, and the second set of positional embeddings; generating, a latent embedding based on the latent representation and the second location dataset, wherein an attention map of the latent embedding is updated, iteratively; extracting, a third set of control points from the latent embedding; determining, a loss value based on comparison of the third set of control points with the second set of control points, wherein the attention map of the latent embedding is updated until the loss value is below a threshold value; and generating, the semantic information and the set of waypoints, based on the third set of control points and the second location dataset.
7. The navigation planning method as claimed in claim 1, wherein the vehicle is navigated based on the trajectory determined by the processor, by a controller of the vehicle.
8. A navigation planning system for a vehicle, the navigation planning system comprising: a trained neural network; and a processor configured to: receive, based on a first set of images of a non-planar surface to be traversed by the vehicle, from the trained neural network, an output that indicates semantic information and a set of waypoints associated with the non-planar surface; and determine, a trajectory to navigate the vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output.
9. The navigation planning system as claimed in claim 8, wherein the semantic information represents a plurality of objects present on the non-planar surface to be traversed by the vehicle.
10. The navigation planning system as claimed in claim 8, wherein the trained neural network is configured to: generate a latent representation based on a summation of feature representations of the first set of images, a first speed dataset, and a first set of positional embeddings that are associated with the vehicle; generate, a latent embedding based on the latent representation and a first location dataset associated with the vehicle; extract, a first set of control points from the latent embedding; and generate, the semantic information and the set of waypoints, based on the first set of control points and the first location dataset.
11. The navigation planning system as claimed in claim 8, comprising a first neural network, wherein the processor is configured to train the first neural network based on a second set of images of the non-planar surface to be traversed by the vehicle, a second speed dataset, a second set of positional embeddings, a second location dataset, and a second set of control points that are associated with the vehicle, to obtain the trained neural network.
12. The navigation planning system as claimed in claim 11, wherein when the processor trains the first neural network, the first neural network is configured to: generate, a latent representation based on a summation of feature representations of the second set of images, the second speed dataset, and the second set of positional embeddings; generate, a latent embedding based on the latent representation and the second location dataset, wherein an attention map of the latent embedding is updated, iteratively; extract, a third set of control points from the latent embedding; determine, a loss value based on comparison of the third set of control points with the second set of control points, wherein the attention map of the latent embedding is updated until the loss value is below a threshold value; and generate, the semantic information and the set of waypoints, based on the third set of control points and the second location dataset.
13. The navigation planning system as claimed in claim 8, comprising a controller configured to navigate the vehicle based on the trajectory determined by the processor.
14. A navigation planning system for vehicles, comprising: a first neural network; and a processor configured to: train, the first neural network based on a first set of images of a non-planar surface to be traversed by a plurality of vehicles, to obtain a trained neural network; receive, from a target vehicle, a second set of images of the non-planar surface to be traversed by the target vehicle; receive, based on the second set of images, from the trained neural network, an output that indicates semantic information and a set of waypoints associated with the non-planar surface; and determine, a trajectory to navigate the target vehicle from a source waypoint to a target waypoint on the non-planar surface, based on the output.
15. The navigation planning system as claimed in claim 14, wherein the first neural network is trained based on a first speed dataset, a first set of positional embeddings, a first location dataset, and a first set of control points that are associated with the plurality of vehicles.
16. The navigation planning system as claimed in claim 15, wherein the first neural network is configured to: generate, a latent representation based on a summation of feature representations of the first set of images, the first speed dataset, and the first set of positional embeddings; generate, a latent embedding based on the latent representation and the first location dataset, wherein an attention map of the latent embedding is updated, iteratively; extract, a second set of control points from the latent embedding; determine, a loss value based on comparison of the second set of control points with the first set of control points, wherein the attention map of the latent embedding is updated until the loss value is below a threshold value; and generate, the semantic information and the set of waypoints, based on the second set of control points and the first location dataset.
17. The navigation planning system as claimed in claim 14, wherein the processor is configured to receive a second speed dataset, a second set of positional embeddings, and a second location dataset that are associated with the target vehicle, from the target vehicle.
18. The navigation planning system as claimed in claim 17, wherein the trained neural network is configured to: generate a latent representation based on a summation of feature representations of the second set of images, the second speed dataset, and the second set of positional embeddings; generate, a latent embedding based on the latent representation and the second location dataset; extract, a first set of control points from the latent embedding; and generate, the semantic information and the set of waypoints, based on the first set of control points and the second location dataset.
PCT/IN2022/051010 2021-11-17 2022-11-17 Method and system to facilitate navigation planning for vehicles WO2023089635A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202141052888 2021-11-17
IN202141052888 2021-11-17

Publications (2)

Publication Number Publication Date
WO2023089635A1 true WO2023089635A1 (en) 2023-05-25
WO2023089635A8 WO2023089635A8 (en) 2023-12-28

Family

ID=86396341

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2022/051010 WO2023089635A1 (en) 2021-11-17 2022-11-17 Method and system to facilitate navigation planning for vehicles

Country Status (1)

Country Link
WO (1) WO2023089635A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019023628A1 (en) * 2017-07-27 2019-01-31 Waymo Llc Neural networks for vehicle trajectory planning
US20190050648A1 (en) * 2017-08-09 2019-02-14 Ydrive, Inc. Object localization within a semantic domain


Also Published As

Publication number Publication date
WO2023089635A8 (en) 2023-12-28

Similar Documents

Publication Publication Date Title
US11593950B2 (en) System and method for movement detection
CN111442776B (en) Method and equipment for sequential ground scene image projection synthesis and complex scene reconstruction
KR102221695B1 (en) Apparatus and method for updating high definition map for autonomous driving
US9443309B2 (en) System and method for image based mapping, localization, and pose correction of a vehicle with landmark transform estimation
CN113348422B (en) Method and system for generating a predicted occupancy grid map
KR102267562B1 (en) Device and method for recognition of obstacles and parking slots for unmanned autonomous parking
US10983217B2 (en) Method and system for semantic label generation using sparse 3D data
KR20210078439A (en) Camera-to-lidar calibration and validation
JP5926228B2 (en) Depth detection method and system for autonomous vehicles
US11830253B2 (en) Semantically aware keypoint matching
RU2743895C2 (en) Methods and systems for computer to determine presence of objects
CN111860227B (en) Method, apparatus and computer storage medium for training trajectory planning model
CN110930323B (en) Method and device for removing reflection of image
US20170359561A1 (en) Disparity mapping for an autonomous vehicle
WO2022104774A1 (en) Target detection method and apparatus
US11176719B2 (en) Method and apparatus for localization based on images and map data
US20200149896A1 (en) System to derive an autonomous vehicle enabling drivable map
JP7421889B2 (en) Image recognition model training device and method, and image recognition method
CN111986128A (en) Off-center image fusion
US20210407117A1 (en) System and method for self-supervised monocular ground-plane extraction
WO2023089635A1 (en) Method and system to facilitate navigation planning for vehicles
US11669998B2 (en) Method and system for learning a neural network to determine a pose of a vehicle in an environment
US11544899B2 (en) System and method for generating terrain maps
US20220005217A1 (en) Multi-view depth estimation leveraging offline structure-from-motion
US20240096109A1 (en) Automatic lane marking extraction and classification from lidar scans

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22895125

Country of ref document: EP

Kind code of ref document: A1