WO2021006870A1 - Vehicular autonomy-level functions - Google Patents

Vehicular autonomy-level functions

Info

Publication number
WO2021006870A1
Authority: WO (WIPO, PCT)
Prior art keywords: lane, moving vehicle, road, correlations, estimation data
Application number: PCT/US2019/040833
Other languages: French (fr)
Inventors: Ahsan Habib, Gael Kamdem DE TEYOU, Wei Su
Original assignee: Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to CN201980098224.1A (CN114127810A)
Priority to PCT/US2019/040833 (WO2021006870A1)
Publication of WO2021006870A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588: Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133: Distances to prototypes
    • G06F18/24137: Distances to cluster centroïds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451: Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454: Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present disclosure is related to performing automated functions in moving vehicles.
  • V2V: vehicle-to-vehicle
  • V2I: vehicle-to-infrastructure
  • SAE International is a globally active professional association and standards developing organization for engineering professionals in various industries, such as the automotive and aerospace industries.
  • the SAE classifies vehicular autonomy using six levels (Level 0 through Level 5).
  • In Level 0 vehicles, the human driver controls everything independently, including steering, throttle, brakes, etc.
  • In Level 1 vehicles, steering, accelerating, and braking are each controlled automatically, but separately (one function at a time).
  • Level 1 vehicles rely on sensor technologies such as Lane Keep Assist System (LKAS) cameras.
  • LKAS cameras extract lane geometry from the road lane markings and enable Level 1 vehicles to perform lane keep assist functionality such as Lane Departure Warning (LDW), Road Departure Warning (RDW), and Lane Centering (LC).
  • In Level 2 vehicles, several operations are done automatically and at the same time. For example, simultaneous steering and acceleration for a lane change can be performed automatically. As another example, General Motors’ “Super Cruise” hands-free highway driving system operates by utilizing a high precision map, such as a high definition (HD) map, together with high precision Global Positioning System (GPS) technology.
  • In Level 4 vehicles, no vehicular system monitoring is required of the driver.
  • Level 4 vehicles are designed to operate safety-critical functions and monitor road conditions for an entire trip duration.
  • In Level 5 vehicles, drivers do not have to be fit to drive and do not even need to have a license.
  • the Level 5 vehicle performs any and all driving tasks without human intervention.
  • the Level 5 vehicle does not have a driver cockpit and everyone in the vehicle is a passenger.
  • Levels 3-5 vehicles utilize perceptive sensor technology, such as Light Detection and Ranging (LIDAR) along with high precision localization technology, to delegate the driving functionality completely to the vehicle AI.
  • Level 0 vehicles lack the high precision sensor technology (such as LKAS cameras) required to perform the autonomous functions of higher level vehicles.
  • Level 0 vehicles are predicted to remain prevalent for a long time because the smartness capabilities of higher level vehicles come with a heavy price tag due to the use of high precision sensors (e.g., high-precision LIDAR and GPS).
  • conventional localization techniques mostly rely on GPS technology, which cannot meet the accuracy requirements of modern intelligent transport systems.
  • a computer-implemented method for performing autonomy-level functions associated with a moving vehicle includes extracting, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle.
  • a sequence of correlations among the plurality of features is generated using a first fully connected (FC) layer of the convolutional neural network.
  • a dimensionality reduction of the sequence of correlations is performed, using a spatial long short-term memory (LSTM), to generate a modified sequence of correlations.
  • Road parameters estimation data associated with the moving vehicle is generated, using a second FC layer of the convolutional neural network, based on the modified sequence of correlations.
  • a lane keep assist system (LKAS) warning is provided based on the road parameters estimation data.
  • one or more time series data patterns are detected in the road parameters estimation data using a temporal LSTM.
  • the road parameters estimation data is modified, using the temporal LSTM, based on the detected one or more time series data patterns, to generate modified road parameters estimation data.
  • the modified road parameters estimation data includes lane parameters information and lane contextual information.
  • the lane parameters information includes one or more of the following: a lane marker heading angle, a lane marker offset, a lane marker curvature, a lane marker curvature derivative, and a lane marker type.
  • the lane contextual information includes one or more of the following: a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames, a lane number indication of a road lane the moving vehicle is traveling on, and a number of lanes associated with a road the moving vehicle is traveling on.
  • geo-location information of the moving vehicle is extracted, using convolutional layers of a second convolutional neural network, using the plurality of image frames obtained by the camera.
  • one or more spatial constraints are applied to the geo-location information using Bayesian filtering, to generate updated geolocation information of the moving vehicle.
  • the one or more spatial constraints are based on the road parameters estimation data.
  • the updated geolocation information of the moving vehicle is output.
  • a system for performing autonomy-level functions associated with a moving vehicle includes memory storing instructions and one or more processors in communication with the memory.
  • the one or more processors execute the instructions to extract, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle.
  • a sequence of correlations among the plurality of features is generated, using a first fully connected (FC) layer of the convolutional neural network.
  • a dimensionality reduction of the sequence of correlations is performed, using a spatial long short-term memory (LSTM), to generate a modified sequence of correlations.
  • Road parameters estimation data associated with the moving vehicle is generated, using a second FC layer of the convolutional neural network, based on the modified sequence of correlations.
  • a lane keep assist system (LKAS) warning is provided based on the road parameters estimation data.
  • the one or more processors are further configured to execute the instructions to detect, using a temporal LSTM, one or more time series data patterns in the road parameters estimation data.
  • the one or more processors are further configured to execute the instructions to modify, using the temporal LSTM, the road parameters estimation data based on the detected one or more time series data patterns, to generate modified road parameters estimation data.
  • the modified road parameters estimation data includes lane parameters information and lane contextual information.
  • the lane parameters information includes one or more of the following: a lane marker heading angle, a lane marker offset, a lane marker curvature, a lane marker curvature derivative, and a lane marker type.
  • the lane contextual information includes one or more of the following: a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames, a lane number indication of a road lane the moving vehicle is traveling on, and a number of lanes associated with a road the moving vehicle is traveling on.
  • the one or more processors are further configured to extract, using convolutional layers of a second convolutional neural network, geo-location information of the moving vehicle using the plurality of image frames obtained by the camera.
  • the one or more processors are further configured to apply, using Bayesian filtering, one or more spatial constraints to the geo-location information to generate updated geolocation information of the moving vehicle.
  • the one or more spatial constraints are based on the road parameters estimation data.
  • the updated geolocation information of the moving vehicle is output.
  • a non-transitory computer-readable medium storing instructions for performing autonomy-level functions associated with a moving vehicle.
  • when executed by one or more processors of a computing device, the instructions cause the one or more processors to extract, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle.
  • a sequence of correlations among the plurality of features is generated, using a first fully connected (FC) layer of the convolutional neural network.
  • a dimensionality reduction of the sequence of correlations is performed, using a spatial long short-term memory (LSTM), to generate a modified sequence of correlations.
  • Road parameters estimation data associated with the moving vehicle is generated, using a second FC layer of the convolutional neural network, based on the modified sequence of correlations.
  • a lane keep assist system (LKAS) warning is provided based on the road parameters estimation data.
  • the instructions further cause the one or more processors to detect one or more time series data patterns in the road parameters estimation data using a temporal LSTM.
  • the instructions further cause the one or more processors to modify, using the temporal LSTM, the road parameters estimation data based on the detected one or more time series data patterns, to generate modified road parameters estimation data.
  • the modified road parameters estimation data includes lane parameters information and lane contextual information.
  • the lane parameters information includes one or more of the following: a lane marker heading angle, a lane marker offset, a lane marker curvature, a lane marker curvature derivative, and a lane marker type.
  • the lane contextual information includes one or more of the following: a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames, a lane number indication of a road lane the moving vehicle is traveling on, and a number of lanes associated with a road the moving vehicle is traveling on.
  • the instructions further cause the one or more processors to extract, using convolutional layers of a second convolutional neural network, geo-location information of the moving vehicle using the plurality of image frames obtained by the camera.
  • the instructions further cause the one or more processors to apply, using Bayesian filtering, one or more spatial constraints to the geo-location information to generate updated geolocation information of the moving vehicle.
  • the one or more spatial constraints are based on the road parameters estimation data.
  • the updated geolocation information of the moving vehicle is output.
  • a system for performing autonomy-level functions associated with a moving vehicle includes an extracting means for extracting a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle.
  • the system includes a correlating means for generating a sequence of correlations among the plurality of features.
  • the system includes a reduction means for performing a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations.
  • the system includes an estimating means for generating road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations.
  • the system includes a notification means for providing a lane keep assist system (LKAS) warning based on the road parameters estimation data.
  • FIG. 1 is a block diagram illustrating the training of a deep learning (DL) model using a DL architecture (DLA), according to some example embodiments.
  • FIG. 2 is a diagram illustrating generation of a trained DL model using a neural network model trained within a DLA, according to some example embodiments.
  • FIG. 3 illustrates various SAE autonomy levels and using techniques disclosed herein to enable a lower SAE autonomy level vehicle to perform higher SAE autonomy level functions using a computing device, according to some example embodiments.
  • FIG. 4 illustrates various lane parameters information which can be used in connection with some example embodiments.
  • FIG. 5 illustrates lane contextual information which can be used in connection with some example embodiments.
  • FIG. 6 illustrates an LKAS module for estimating lane parameters information and lane contextual information, according to some example embodiments.
  • FIG. 7 illustrates a road parameters estimator sub-network (RPESN) used in the LKAS module of FIG. 6, according to some example embodiments.
  • FIG. 8 illustrates a road parameters estimator network (RPEN) used in the LKAS module of FIG. 6, according to some example embodiments.
  • FIG. 9A - FIG. 9D illustrate multiple image frames used for determining RPEN odometry as part of the lane contextual information, according to some example embodiments.
  • FIG. 10 illustrates examples of continuous pose estimation via position tracking and place recognition, according to some example embodiments.
  • FIG. 11 illustrates an example feature-sparse environment which is used in connection with continuous pose estimation techniques, according to some example embodiments.
  • FIG. 12 illustrates a block diagram of a localization module for high precision localization, according to some example embodiments.
  • FIG. 13 illustrates an example of high precision localization with lane accuracy using the localization module of FIG. 12, according to some example embodiments.
  • FIG. 14 illustrates example suburban and urban environments where the localization module of FIG. 12 can be used, according to some example embodiments.
  • FIG. 15 is a flowchart of a method for performing autonomy-level functions associated with a moving vehicle, according to some example embodiments.
  • FIG. 16 is a block diagram illustrating a representative software architecture, which may be used in conjunction with various device hardware described herein, according to some example embodiments.
  • FIG. 17 is a block diagram illustrating circuitry for a device that implements algorithms and performs methods, according to some example embodiments.
  • the present disclosure is related to enabling lower SAE International autonomy level vehicles (e.g., Level 0 vehicles) to perform functions associated with higher SAE autonomy levels.
  • during model training, computations modify weights based on results from prior iterations (e.g., based on gradients generated at the conclusion of a prior backward computation).
  • a gradient is a measurement of how much the output of a worker machine changes per change to the weights of the model that the worker machine is computing.
  • a gradient measures a change in all weights with regard to the change in error.
  • LKAS refers to a set of assistive systems that help the driver to keep the vehicle in the appropriate lane.
  • Examples of LKAS functions are LDW, RDW, LC, and so forth.
  • LDW is a warning system that conveys a warning to the driver when the vehicle is deviating from its lane without proper signaling.
  • RDW (also known as Road Departure Mitigation, or RDM) is a system that warns the driver when the vehicle is about to leave the roadway.
  • LC is a mechanism designed to keep a moving vehicle centered in the lane, relieving the driver of the task of steering.
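  • By way of illustration only (not part of the disclosed embodiments), a lane centering behavior can be sketched as a simple proportional control law acting on the lateral offset and heading angle estimated from the lane markers; the gains and function name below are hypothetical.
```python
def lane_centering_steer(lateral_offset_m: float, heading_angle_rad: float,
                         k_offset: float = 0.5, k_heading: float = 1.2) -> float:
    """Toy proportional lane-centering law (illustrative sketch only).

    lateral_offset_m : signed distance of the vehicle from the lane center (m)
    heading_angle_rad: signed angle between the vehicle heading and the lane direction
    Returns a steering command (rad) that pushes both errors toward zero.
    """
    return -(k_offset * lateral_offset_m + k_heading * heading_angle_rad)

# Example: vehicle 0.3 m right of center, heading about 2 degrees off the lane axis
steer_cmd = lane_centering_steer(0.3, 0.035)
```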
  • the term “High Definition (HD) map” refers to a category of maps built for self-driving purposes in connection with higher SAE autonomy level vehicles.
  • the HD maps are characterized by extremely high precision such as centimeter-level accuracy.
  • HD maps contain information such as where the lanes are, where the road boundaries are, where the curbs are and how high the curbs are, where the traffic signs and road markers are located, and so forth.
  • lane line map refers to a category of maps including geo-referenced lane lines of the road.
  • a lane line map is a subset of HD maps which includes the lane line information only.
  • the term “Level x vehicle” refers to a vehicle at one of the SAE vehicle autonomy levels.
  • the term “inertial measurement unit (IMU)” refers to a device that has an accelerometer, a gyroscope, and a magnetometer incorporated together.
  • the IMU is a device designed to track the position and orientation of an object and is embedded in most modern computing devices such as smartphones.
  • Techniques disclosed herein can be used to enable the Level 0 vehicles to perform higher-level functionalities (e.g., functionalities of Level 1 or Level 2 vehicles) without incurring the additional cost due to high-precision sensors used by the higher level vehicles. This can be achieved by utilizing the sensing capability of a computing device (e.g., a smartphone device), which is widely available to users.
  • sensors of the computing device extract lane information of the road (e.g., road parameters) using a neural network (NN), thereby enabling LKAS features on the vehicle without the additional cost of high-precision equipment.
  • a training database is used which includes a sequence of images of the road from the driver's perspective recorded while driving along a route, together with the corresponding consumer-grade GPS data, industrial-grade GPS data, and a lane line map of the route.
  • Road parameters ground truth (the true values of the road parameters associated with the images) can be obtained from the lane line map and the high precision geolocation provided by the industrial-grade GPS data.
  • the dataset covers a wide variety of road geometries, viewpoints from the driver's perspective, weather conditions, and times of day.
  • a deep NN, called the Road Parameters Estimator Network (RPEN), is trained using this training database.
  • the trained NN is capable of generalizing, so it learns to detect the lane lines and extract road parameters for roads it has never seen before.
  • This trained NN can later be used in Level 0 vehicles, with a computing device camera providing the input, to generate road parameters estimation data, including lane parameters information and lane contextual information, as output.
  • the lane parameters information can include a lane marker heading angle, a lane marker offset, a lane marker curvature, a lane marker curvature derivative, and a lane marker type.
  • the lane parameters information is used by vehicles to perform LKAS functionalities, such as providing LDW and RDW warning notifications, which can be conveyed to the vehicle driver via a display or other notification means.
  • Techniques disclosed herein can also be used to provide accurate vehicle localization (e.g., lane level localization).
  • the same training database and NN for enabling the LKAS functionalities are used to provide road parameters estimation data, including lane contextual information.
  • a second NN is used (e.g., a visual-based re-localization algorithm), which can predict accurate geo-locations (e.g., vehicle latitude and longitude) using a sequence of images of the road from the driver's perspective and coarse GPS data from a computing device (e.g., a smartphone used by the driver).
  • a sensor fusion technique is used to combine the result of both NNs to provide a lane-level accuracy of the vehicle’s current location.
  • the level 0 vehicle driver can use the sensing capability of a computing device (e.g., a smartphone) to provide accurate localization of the vehicle.
  • Prior art techniques perform higher level vehicular autonomy functions using cost-prohibitive high-precision radars and GPS systems.
  • techniques disclosed herein can use neural networks executing on a consumer device (such as a smartphone), together with the device sensor capabilities (e.g., camera, IMU, GPS) to provide sensory functions for improving the SAE autonomy level of a vehicle, such as by enabling LKAS functions (e.g., LDW and RDW notification) and providing lane-level vehicle localization.
  • Techniques disclosed herein estimate novel lane contextual information such as the lane number a vehicle is located on (also referred to as “egoLaneNo”), the number of lanes associated with the road the vehicle is traveling on (also referred to as “noOfLanes”), and the relative road distance traveled by a vehicle between two consecutive image frames (also referred to as RPEN odometry or “relativeRoadDistance”), which are not estimated by existing navigation systems. Additionally, disclosed techniques are used for improving the geo-location accuracy of the existing positioning system of a vehicle by imposing a spatial constraint on the vehicle's probable location based on the real-time extraction of lane contextual information, such as the egoLaneNo, along with other lane parameters information and lane contextual information.
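  • As a purely illustrative sketch (not language from the disclosure), the estimated quantities named above can be grouped into a simple data structure; the field names mirror the terms used herein (egoLaneNo, noOfLanes, relativeRoadDistance), while the class layout and the mapping of polynomial coefficients to lane parameters are assumptions.
```python
from dataclasses import dataclass
from typing import List

@dataclass
class LaneMarker:
    # Coefficients of the 3rd-degree lane model X(Z) = c3*Z**3 + c2*Z**2 + c1*Z + c0.
    # Roughly: c0 ~ lane marker offset, c1 ~ heading angle, c2/c3 relate to the
    # curvature and its derivative (assumed mapping, for illustration only).
    c0: float
    c1: float
    c2: float
    c3: float
    marker_type: str                 # e.g., "dashed" or "solid"

@dataclass
class RoadParametersEstimate:
    lane_markers: List[LaneMarker]   # lane parameters information
    ego_lane_no: int                 # egoLaneNo: lane the vehicle is traveling on
    no_of_lanes: int                 # noOfLanes: number of lanes on the road
    relative_road_distance: float    # RPEN odometry between two consecutive frames (m)
```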
  • FIG. 1 is a block diagram 100 illustrating the training of a deep learning (DL) model to generate a trained DL model 110 using a DL architecture (DLA), according to some example embodiments.
  • MLPs: machine-learning programs
  • AI: artificial intelligence
  • deep learning model training 108 is performed within the DLA 106 based on training data 102 (which can include features). During the deep learning model training 108, features from the training data 102 can be assessed for purposes of further training of the DL model.
  • the DL model training 108 results in a trained DL model 110.
  • the trained DL model 110 can include one or more classifiers 112 that can be used to provide DL assessments 116 based on new data 114.
  • the training data 102 can include input data 103, such as image data taken from a driver’s perspective, GPS data (e.g., geo-location information associated with the image data), and lane line map data (e.g., lane line map geo-location data associated with the image data).
  • the training data 102 can also include road parameters ground truth data corresponding to the input data 103.
  • the input data 103 and the output data 105 are used during the DL model training 108 to train the DL model 110.
  • the trained DL model 110 receives new data 114 (e.g., images taken from a driver’s perspective using a computing device such as a smartphone), detects the lane lines, and extracts road parameters information (e.g., lane parameters information and lane contextual information) for roads it has never seen before, using the new data 114.
  • the extracted road parameters information (e.g., output as the DL assessments 116) is used to provide LKAS-related functions (such as LDW/RDW notifications and lane centering) as well as lane-level vehicle localization information as discussed herein.
  • Deep learning is part of machine learning, a field of study that gives computers the ability to learn without being explicitly programmed.
  • Machine learning explores the study and construction of algorithms, also referred to herein as tools, that may learn from existing data, may correlate data, and may make predictions about new data.
  • Such machine learning tools operate by building a model from example training data (e.g., the training data 102) in order to make data-driven predictions or decisions expressed as outputs or assessments 116.
  • although example embodiments are presented with respect to a few machine-learning tools (e.g., a deep learning architecture), the principles presented herein may be applied to other machine learning tools.
  • different machine learning tools may be used. For example, Logistic Regression, Naive-Bayes, Random Forest, neural networks, matrix factorization, and Support Vector Machines tools may be used during the deep learning model training 108 (e.g., for correlating the training data 102).
  • classification problems (also referred to as categorization problems) aim at classifying items into one of several discrete categories.
  • regression algorithms aim at quantifying some items (for example, by providing a value that is a real number).
  • the DLA 106 can be configured to use machine learning algorithms that utilize the training data 102 to find correlations among identified features that affect the outcome.
  • the machine learning algorithms utilize features from the training data 102 for analyzing the new data 114 to generate the assessments 116.
  • the features include individual measurable properties of a phenomenon being observed and used for training the machine learning model.
  • the concept of a feature is related to that of an explanatory variable used in statistical techniques such as linear regression. Choosing informative, discriminating, and independent features is important for the effective operation of the MLP in pattern recognition, classification, and regression.
  • Features may be of different types, such as numeric features, strings, and graphs.
  • training data can be of different types, with the features being numeric for use by a computing device.
  • the features used during the DL model training 108 can include the input data 103, the output data 105, as well as one or more of the following: sensor data from a plurality of sensors (e.g., audio, motion, GPS, image sensors); actuator event data from a plurality of actuators (e.g., wireless switches or other actuators); external information from a plurality of external sources; timer data associated with the sensor state data (e.g., the time sensor data is obtained), the actuator event data, or the external information source data; user communications information; user data; user behavior data; and so forth.
  • the machine learning algorithms utilize the training data 102 to find correlations among the identified features that affect the outcome of assessments 116.
  • the training data 102 includes labeled data, which is known data for one or more identified features and one or more outcomes.
  • the DL model is trained using the DL model training 108 within the DLA 106.
  • the result of the training is the trained DL model 110.
  • new data 114 is provided as an input to the trained DL model 110, and the DL model 110 generates the assessments 116 as an output.
  • the DLA 106 can be deployed at a mobile device, and the new data 114 can include low-resolution (LR) images (e.g., frames from an LR video, such as a real-time LR video feed).
  • the DLA 106 performs super-resolution functions (e.g., increasing image resolution while reducing noise, removing blocking artifacts, and boosting image contrast) on the LR images to generate high-resolution (HR) output images in real time.
  • FIG. 2 is a diagram 200 illustrating the generation of a trained DL model 206 using a neural network model 204 trained within a DLA 106, according to some example embodiments.
  • source data 202 is analyzed by a neural network model 204 (or another type of a machine learning algorithm or technique) to generate the trained DL model 206 (which can be the same as the trained DL model 110).
  • the source data 202 includes a training set of data, such as 102 (which includes the input data 103 and the output data 105), including data identified by one or more features.
  • the terms “neural network” and “neural network model” are used interchangeably herein.
  • Machine learning techniques train models to accurately make predictions on data fed into the models (e.g., what was said by a user in a given utterance; whether a noun is a person, place, or thing; what the weather will be like tomorrow).
  • the models are developed against a training dataset of inputs to optimize the models to correctly predict the target output for a given input.
  • the learning phase may be supervised, semi-supervised, or unsupervised, indicating a decreasing level to which the “correct” outputs are provided in correspondence to the training inputs.
  • in a supervised learning phase, all of the target outputs are provided to the model and the model is directed to develop a general rule or algorithm that maps the input to the output.
  • in an unsupervised learning phase, the desired output is not provided for the inputs, so that the model may develop its own rules to discover relationships within the training dataset.
  • in a semi-supervised learning phase, an incompletely labeled training set is provided, with some of the outputs known and some unknown for the training dataset.
  • Models may be run against a training dataset for several epochs, in which the training dataset is repeatedly fed into the model to refine its results (i.e., the entire dataset is processed during an epoch).
  • in each iteration, the model (e.g., a neural network model or another type of machine learning model) can be run against a mini-batch (or a portion) of the entire dataset.
  • a model is developed to predict the target output for a given set of inputs (e.g., source data 202) and is evaluated over several epochs to more reliably provide the output that is specified as corresponding to the given input for the greatest number of inputs for the training dataset.
  • a model is developed to cluster the dataset into n groups and is evaluated over several epochs as to how consistently it places a given input into a given group and how reliably it produces the n desired clusters across each epoch.
  • the term “weights” is used to refer to the parameters used by a machine learning model.
  • a model can output gradients, which can be used for updating weights associated with a forward computation.
  • the evaluations are biased against false negatives, biased against false positives, or evenly biased with respect to the overall accuracy of the model.
  • the values may be adjusted in several ways depending on the machine learning technique used. For example, in a genetic or evolutionary algorithm, the values for the models that are most successful in predicting the desired outputs are used to develop values for models to use during the subsequent epoch, which may include random variation/mutation to provide additional data points.
  • One of ordinary skill in the art will be familiar with several other machine learning algorithms that may be applied with the present disclosure, including linear regression, random forests, decision tree learning, neural networks, deep neural networks, etc.
  • Each model develops a rule or algorithm over several epochs by varying the values of one or more variables affecting the inputs to more closely map to the desired result, but as the training dataset may be varied, and is preferably very large, perfect accuracy and precision may not be achievable.
  • the number of epochs that make up a learning phase, therefore, may be set as a given number of trials or a fixed time/computing budget, or may be terminated before that number/budget is reached when the accuracy of a given model is high enough or low enough or an accuracy plateau has been reached.
  • the learning phase may end early and use the produced model satisfying the end-goal accuracy threshold.
  • the learning phase for that model may be terminated early, although other models in the learning phase may continue training.
  • the learning phase for the given model may terminate before the epoch number/computing budget is reached.
  • models that are finalized are evaluated against testing criteria.
  • a testing dataset that includes known outputs for its inputs is fed into the finalized models to determine an accuracy of the model in handling data that it has not been trained on.
  • a false positive rate or false negative rate may be used to evaluate the models after finalization.
  • a delineation between data clusters in each model is used to select a model that produces the clearest bounds for its clusters of data.
  • the DL model 206 is trained by a neural network model 204 (e.g., deep learning, deep convolutional, or recurrent neural network), which comprises a series of “neurons,” such as Long Short Term Memory (LSTM) nodes, arranged into a network.
  • a neuron is an architectural element used in data processing and artificial intelligence, particularly machine learning, that includes memory that may determine when to “remember” and when to “forget” values held in that memory based on the weights of inputs provided to the given neuron.
  • Each of the neurons used herein is configured to accept a predefined number of inputs from other neurons in the network to provide relational and sub -relational outputs for the content of the frames being analyzed.
  • Individual neurons may be chained together or organized into tree structures in various configurations of neural networks to provide interactions and relationship learning modeling for how each of the frames in an utterance is related to one another.
  • an LSTM serving as a neuron includes several gates to handle input vectors (e.g., phonemes from an utterance), a memory cell, and an output vector (e.g., contextual representation).
  • the input gate and output gate control the information flowing into and out of the memory cell, respectively, whereas forget gates optionally remove information from the memory cell based on the inputs from linked cells earlier in the neural network.
  • Weights and bias vectors for the various gates are adjusted over the course of a training phase, and once the training phase is complete, those weights and biases are finalized for normal operation.
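  • For concreteness, the gate arithmetic described above can be written out as a minimal NumPy sketch (input, forget, and output gates regulating a memory cell); the weight packing, shapes, and random initialization are illustrative assumptions.
```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b pack the input (i), forget (f), output (o)
    and candidate (g) transforms, sliced in that order."""
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b        # pre-activations for all four gates
    i = sigmoid(z[0:n])               # input gate: what flows into the memory cell
    f = sigmoid(z[n:2 * n])           # forget gate: what is removed from the cell
    o = sigmoid(z[2 * n:3 * n])       # output gate: what flows out of the cell
    g = np.tanh(z[3 * n:4 * n])       # candidate memory content
    c = f * c_prev + i * g            # updated memory cell
    h = o * np.tanh(c)                # output vector
    return h, c

# Toy dimensions: 3-dimensional input, 2-dimensional hidden state
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(8, 3)), rng.normal(size=(8, 2)), np.zeros(8)
h, c = lstm_cell_step(rng.normal(size=3), np.zeros(2), np.zeros(2), W, U, b)
```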
  • neurons and neural networks may be constructed programmatically (e.g., via software instructions) or via specialized hardware linking each neuron to form the neural network.
  • Neural networks utilize features for analyzing the data to generate assessments (e.g., recognize units of speech).
  • a feature is an individual measurable property of a phenomenon being observed.
  • the concept of the feature is related to that of an explanatory variable used in statistical techniques such as linear regression.
  • deep features represent the output of nodes in hidden layers of the deep neural network.
  • a neural network (e.g., the neural network model 204) is sometimes referred to as an artificial neural network (ANN).
  • a neural network model is a computing system based on consideration of biological neural networks of animal brains. Such systems progressively improve performance, which is referred to as learning, to perform tasks, typically without task-specific programming.
  • a neural network may be taught to identify images that contain an object by analyzing example images that have been tagged with a name for the object and, having learned the object and name, may use the analytic results to identify the object in untagged images.
  • a neural network is based on a collection of connected units called neurons, where each connection between neurons, called a synapse, can transmit a unidirectional signal with an activating strength that varies with the strength of the connection.
  • the receiving neuron can activate and propagate a signal to downstream neurons connected to it, typically based on whether the combined incoming signals, which are from potentially many transmitting neurons, are of sufficient strength, where strength is a parameter.
  • a deep neural network (DNN), also referred to as a convolutional neural network (CNN), is a stacked neural network composed of multiple convolutional layers.
  • the layers are composed of nodes, which are locations where computation occurs, loosely patterned on a neuron in the human brain, which fires when it encounters sufficient stimuli.
  • a node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, which assigns significance to inputs for the task the algorithm is trying to learn. These input-weight products are summed, and the sum is passed through what is called a node’s activation function, to determine whether and to what extent that signal progresses further through the network to affect the ultimate outcome.
  • a DNN uses a cascade of many layers of non-linear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. Higher-level features are derived from lower-level features to form a hierarchical representation.
  • the layers following the input layer may be convolution layers that produce feature maps that are filtering results of the inputs and are used by the next convolution layer.
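  • A toy PyTorch sketch of the two ideas above, a single node as a weighted sum passed through an activation function and a convolution layer producing feature maps for the next layer; all sizes and values are arbitrary.
```python
import torch
import torch.nn as nn

# A single "node": weighted sum of its inputs plus a bias, passed through an activation
x = torch.tensor([0.2, -1.0, 0.5])           # inputs from the data (or previous layer)
w = torch.tensor([0.7, 0.1, -0.3])           # weights assigning significance to each input
b = torch.tensor(0.05)
node_output = torch.relu(w @ x + b)          # activation decides how far the signal progresses

# A convolution layer producing feature maps that are used by the next layer
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
frame = torch.randn(1, 3, 224, 224)          # one RGB image frame
feature_maps = conv(frame)                   # shape: (1, 16, 224, 224)
```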
  • a regression, which is structured as a set of statistical processes for estimating the relationships among variables, can include minimization of a cost function.
  • the cost function may be implemented as a function to return a number representing how well the neural network performed in mapping training examples to correct output.
  • backpropagation is used, where backpropagation is a common method of training artificial neural networks that is used with an optimization method such as the stochastic gradient descent (SGD) method.
  • Use of backpropagation can include propagation and weight updates.
  • when an input is presented to the neural network, it is propagated forward through the neural network, layer by layer, until it reaches the output layer.
  • the output of the neural network is then compared to the desired output, using the cost function, and an error value is calculated for each of the nodes in the output layer.
  • the error values are propagated backward, starting from the output, until each node has an associated error value that roughly represents its contribution to the original output.
  • Backpropagation can use these error values to calculate the gradient of the cost function with respect to the weights in the neural network.
  • the calculated gradient is fed to the selected optimization method to update the weights to attempt to minimize the cost function.
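  • The propagation and weight-update steps described above can be condensed into a short PyTorch training-loop sketch (forward pass, cost evaluation, backward propagation of error, SGD weight update); the model, data, and hyperparameters are placeholders rather than values from the disclosure.
```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
cost_fn = nn.MSELoss()

inputs = torch.randn(64, 10)           # placeholder training batch
targets = torch.randn(64, 4)           # placeholder desired outputs (ground truth)

for epoch in range(5):
    optimizer.zero_grad()
    outputs = model(inputs)            # forward propagation, layer by layer
    cost = cost_fn(outputs, targets)   # compare output to desired output via the cost function
    cost.backward()                    # propagate error values backward (compute gradients)
    optimizer.step()                   # SGD updates the weights to reduce the cost
```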
  • although the training architecture 106 is referred to as a deep learning architecture using a neural network model (and the model that is trained is referred to as a trained deep learning model, such as the trained DL models 110 and 206), the disclosure is not limited in this regard, and other types of machine learning training architectures may also be used for model training, using the techniques disclosed herein.
  • FIG. 3 illustrates a diagram 300 of various SAE autonomy levels and using techniques disclosed herein to enable a lower SAE autonomy level vehicle to perform higher SAE autonomy level functions using a computing device, according to some example embodiments.
  • diagram 300 illustrates vehicles 302, 304, 306, ..., 308 associated with SAE autonomy levels 0, 1, 2, ..., 5, respectively.
  • Vehicle 304 performs level 1 functions such as providing a lane departure warning 310.
  • Vehicle 306 performs level 2 functions such as lane centering 316.
  • Vehicle 308 performs level 5 functions such as self driving 322.
  • the lane departure warning 310 can be based on road parameters information such as lane line information 314 detected by a high precision LKAS camera 312.
  • the lane centering 316 and the self driving 322 functions use high precision localization 320, which is performed using a high precision GPS 318 and a high precision radar or LIDAR, such as LIDAR 324, that are present on higher-level vehicles.
  • a computing device 330 (such as a smartphone or another consumer device) can be used within the level 0 vehicle 302 to perform technique 326 (e.g., for determining road parameters estimation data) and technique 328 (e.g., for determining lane level vehicle localization) as discussed herein.
  • the level 0 vehicle 302 can be upgraded to a higher level vehicle 332 without the need for costly high precision equipment such as the LKAS camera 312, the high precision GPS 318, and the LIDAR 324.
  • FIG. 4 illustrates various lane parameters information which can be used in connection with some example embodiments.
  • diagram 400 illustrates a vehicle 406 located on a road that includes lanes 402 and 404.
  • a front-facing camera mounted inside the vehicle 406 (e.g., an LKAS camera or a camera of a computing device such as a smartphone) can be used to detect the lane parameters information.
  • C2 and C3 are zero in this example, because the road is straight and has no curvature.
  • the front-facing camera mounted inside the vehicle 406 can use techniques disclosed herein to extract the lane parameters information. More specifically, the camera can detect the lane markers (or lane lines), and each lane marker can be modeled with a 3rd-degree polynomial model that describes a function X(Z), where Z is the physical longitudinal distance from the camera and X is the physical lateral offset from the camera:
  • X(Z) = C3·Z^3 + C2·Z^2 + C1·Z + C0
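  • Purely as an illustration, the polynomial can be evaluated as follows; the coefficient values and function name are hypothetical, and the straight-road case simply sets C2 = C3 = 0.
```python
def lane_marker_lateral_offset(z_m: float, c0: float, c1: float,
                               c2: float = 0.0, c3: float = 0.0) -> float:
    """Lateral offset X (m) of a lane marker at longitudinal distance Z (m)
    ahead of the camera: X(Z) = C3*Z^3 + C2*Z^2 + C1*Z + C0."""
    return c3 * z_m ** 3 + c2 * z_m ** 2 + c1 * z_m + c0

# Straight road as in FIG. 4: no curvature, so C2 = C3 = 0
x_at_20m = lane_marker_lateral_offset(20.0, c0=1.7, c1=0.01)
```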
  • FIG. 5 illustrates lane contextual information which can be used in connection with some example embodiments.
  • diagram 500 illustrates vehicle 506 moving on a road with two lanes (lane 1 502 and lane 2 504).
  • lane contextual information determined in accordance with techniques disclosed herein can include one or more of the following: a relative road distance (Δs) 512 indicating a distance traveled by the moving vehicle 506 between two consecutive image frames as taken by an onboard camera (e.g., the distance traveled between vehicle positions 508 and 510); a lane number indication of a road lane the moving vehicle is traveling on (e.g., the egoLaneNo information indicating lane 1 502 as the lane the vehicle 506 is traveling on); and a number of lanes associated with the road the moving vehicle is traveling on (e.g., the noOfLanes is 2 for the example illustrated in FIG. 5).
  • FIG. 6 illustrates an LKAS module 600 for estimating lane parameters information and lane contextual information, according to some example embodiments.
  • a module can comprise one or both of hardware or software that has been designed to perform a function or functions.
  • the LKAS module 600 can include a DNN 604 which can be trained to generate road parameters estimation data 610 using a plurality of video frames 602 which can be taken by an in-vehicle camera such as a camera on a computing device of a user of the vehicle.
  • the DNN 604 can use a neural network model such as the model 204 of FIG. 2.
  • the training dataset (e.g., data 103 and 105) further includes training images with a wide variety of road geometries as well as different viewpoints from the driver’s perspective, taken during varying weather conditions and times of day.
  • the DNN 604 is also referred to as a road parameters estimator network (RPEN) and includes a road parameters estimator sub-network (RPESN) 702 and at least one LSTM node 806, as illustrated in connection with FIG. 7 and FIG. 8.
  • the trained DNN 604 is capable of generalizing so it learns to see the lane lines and extract the lane information for roads it has never seen before.
  • the LKAS module 600 receives video frames 602, and the DNN 604 uses the RPESN 702 and the LSTM 806 to generate road parameters estimation data 610, which can include lane parameters information 608 and lane contextual information 606.
  • the lane contextual information 606 can include relative road distance (or odometry information, such as relative road distance 512 in FIG. 5), egoLaneNo information, and noOfLanes information as discussed in connection with FIG. 5.
  • the lane parameters information 608 can include lane marker heading angle information, lane marker offset information, lane marker curvature information, lane marker curvature derivative information, and lane marker type, as discussed in connection with FIG. 4.
  • the trained DNN 604 uses images received from a computing device of the level 0 vehicle user as input to produce lane contextual information and lane parameters information as output.
  • the road parameters estimation data 610 can be used for generating LKAS warnings, including LDW or RDW.
  • the consumer device can also convey the warning to the user via its display, in effect providing an end-to-end solution without the assistance from an external component.
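  • A minimal sketch, assuming the C0 (offset) terms of the ego lane's left and right markers are available from the road parameters estimation data 610, of how such an LDW check could be phrased; the sign convention, threshold, and names are hypothetical.
```python
def lane_departure_warning(left_marker_offset_m: float,
                           right_marker_offset_m: float,
                           min_margin_m: float = 0.3) -> bool:
    """Return True when the vehicle is closer than `min_margin_m` to either lane
    boundary, i.e., it is deviating from its lane. The inputs are the C0 terms of
    the left/right lane marker polynomials, signed relative to the camera
    (left positive, right negative in this sketch)."""
    return (left_marker_offset_m < min_margin_m or
            -right_marker_offset_m < min_margin_m)

if lane_departure_warning(left_marker_offset_m=0.25, right_marker_offset_m=-3.2):
    print("LDW: lane departure warning")   # e.g., shown on the computing device display
```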
  • FIG. 7 illustrates a road parameters estimator sub-network (RPESN) 702, according to some example embodiments.
  • the RPESN 702 includes a pre-trained CNN 706, a pooling layer 708, a first fully connected (FC) layer 710, a spatial LSTM 712, and a second FC layer 714.
  • the input to the RPESN 702 is a plurality of video frames 704 captured by the camera of a consumer device (e.g., a computing device such as a smartphone of a driver of a level 0 vehicle), which is first fed to the pre-trained CNN 706.
  • the pre-trained CNN 706 is used as a classification network instead of training a deep neural network from scratch for the following two reasons: (a) training convolutional networks is dependent on very large labeled image datasets which increases implementation costs and processing time; and (b) unlike classification problems, where each output label is covered by at least one training sample, the output space in regression is continuous and infinite in theory.
  • an average pooling layer 708 collects the information of each feature channel for the entire image.
  • the first FC layer 710 learns the correlation among the extracted features.
  • the output of the first FC layer 710 can be regarded as a sequence, which is fed to the spatial LSTM 712, whose memory block performs dimensionality reduction.
  • the spatial LSTM 712 is used to assess the spatial relationship of the features based on the sequence output from the first FC layer 710.
  • the spatial LSTM 712 memory units identify the most useful feature correlations for the task of road parameters estimation.
  • the output of the spatial LSTM 712 is then passed to the second FC layer 714, which generates the road parameters estimation data 716, which includes lane contextual information.
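  • A condensed PyTorch sketch of this RPESN data flow (pre-trained CNN backbone, average pooling, first FC layer, spatial LSTM, second FC layer); the backbone choice, layer sizes, and output dimension are assumptions and not values specified by the disclosure.
```python
import torch
import torch.nn as nn
import torchvision.models as models

class RPESN(nn.Module):
    """Sketch of the road parameters estimator sub-network (RPESN) data flow."""
    def __init__(self, num_road_params: int = 16, hidden: int = 256):
        super().__init__()
        # Stand-in backbone; in practice pre-trained weights would be loaded
        # (e.g., weights="IMAGENET1K_V1"); weights=None keeps the sketch offline-runnable.
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # conv feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)                         # average pooling per channel
        self.fc1 = nn.Linear(512, hidden)                           # learns feature correlations
        self.spatial_lstm = nn.LSTM(input_size=1, hidden_size=64,
                                    batch_first=True)               # dimensionality reduction
        self.fc2 = nn.Linear(64, num_road_params)                   # road parameters estimate

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        feats = self.pool(self.cnn(frames)).flatten(1)   # (B, 512) pooled CNN features
        seq = self.fc1(feats).unsqueeze(-1)              # FC output regarded as a sequence
        _, (h_n, _) = self.spatial_lstm(seq)             # keep the reduced memory state
        return self.fc2(h_n.squeeze(0))                  # (B, num_road_params)

road_params = RPESN()(torch.randn(2, 3, 224, 224))       # a small batch of video frames
```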
  • FIG. 8 illustrates a road parameters estimator network (RPEN) 800 used in the LKAS module of FIG. 6, according to some example embodiments.
  • the RPEN 800 receives as input a plurality of images 802, which are processed by the RPESN 804.
  • the road parameters estimation data generated by the RPESN 804 is further processed by temporal LSTMs 806 to generate modified (or final) road parameters estimation data 808.
  • the RPEN 800 processes a fixed length visual input of size k frames with the RPESN 804 (which is the same as RPESN 702), whose outputs are fed into a stack of recurrent sequence models (temporal LSTMs 806).
  • the temporal LSTMs 806 generate the final road parameters estimation data 808, including lane contextual information (such as the relative road distance, egoLaneNo, and noOfLanes information).
  • the relative road distance information within the lane contextual information is also referred to as RPEN odometry information.
  • temporal LSTMs 806 can use their internal state (memory) to recognize patterns in the time series data (e.g., the sequential image frames 802) and generate the modified (or final) road parameters estimation data 808.
  • the temporal LSTMs 806 form a network with loops where the output from a previous time is fed as the input to the current time, allowing information of the past to persist.
  • the temporal LSTMs 806 are configured to model the temporal dynamics and dependencies of the road parameters estimation data. For example, a temporal constraint binds the current state of the road parameters to its previous state, and significant changes (e.g., being above a pre-determined threshold) of the lane marker offset information (C0) can trigger a change in the egoLaneNo information. Changes in the noOfLanes information are usually accompanied by changes in the lane line type (e.g., dashed to solid) or preceded by a road merge sign.
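  • Continuing the sketch above (and reusing the hypothetical RPESN class), the temporal LSTM stack could wrap the per-frame RPESN outputs over a fixed-length clip of k frames as follows; the number of layers and hidden sizes are assumptions.
```python
import torch
import torch.nn as nn

class RPEN(nn.Module):
    """Sketch: RPESN applied per frame, then stacked temporal LSTMs model the
    temporal dynamics of the road parameters over k consecutive frames."""
    def __init__(self, rpesn: nn.Module, num_road_params: int = 16):
        super().__init__()
        self.rpesn = rpesn
        self.temporal_lstm = nn.LSTM(input_size=num_road_params, hidden_size=128,
                                     num_layers=2, batch_first=True)
        self.head = nn.Linear(128, num_road_params)      # final road parameters estimate

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, k, 3, H, W) -> run the RPESN on every frame of the clip
        b, k = clip.shape[:2]
        per_frame = self.rpesn(clip.flatten(0, 1)).view(b, k, -1)
        out, _ = self.temporal_lstm(per_frame)           # information from past frames persists
        return self.head(out[:, -1])                     # estimate for the most recent frame

clip = torch.randn(1, 4, 3, 224, 224)                    # k = 4 consecutive frames
final_params = RPEN(RPESN())(clip)                       # RPESN as sketched earlier
```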
  • FIG. 9A - FIG. 9D illustrate multiple image frames used for determining RPEN odometry as part of the lane contextual information, according to some example embodiments.
  • FIG. 9A - FIG. 9D illustrate sequential image frames 900A, 900B, 900C, and 900D, which are received for processing by the RPEN 800.
  • the pre-trained CNN 706 within the RPESN 702 is trained to model the shape and contour of the lane lines (e.g., lane lines 906) and how the view of the lane lines changes with the change in distance from them.
  • a high dimensional representation of the dashed lane lines 906 on the road can form a smoothly varying injective (one-to-one) function of Δs 902.
  • Dashed lane lines 906, being a constant presence in the road, are therefore a good candidate for the RPEN 800 to model in order to regress As 902.
  • Dimensionality reduction capability of the spatial LSTM enables the RPEN 800 to identify the dashed lane lines 906 as the most useful feature correlations for the task of regressing the relativeRoadDistance information 902 directly from the images 900A-900D.
  • the temporal LSTMs 806 preserve this information about the previous frame.
  • the RPEN 800 can, therefore, estimate the relativeRoadDistance information 902 by just observing the road.
  • FIG. 10 illustrates examples 1000A and 1000B of continuous pose estimation via position tracking and place recognition, according to some example embodiments.
  • example 1000A illustrates continuous pose estimation via position tracking (also known as visual odometry).
  • a robot 1002 moves from position 1004A to position 1006A in a landscape where the map is unknown, but the initial robot pose at position 1004A is known.
  • the visual odometry localization technique is locally accurate but often results in a drift 1008A over time.
  • example 1000B illustrates continuous pose estimation via place recognition (also known as vision-based relocalization).
  • the robot 1002 moves from position 1004B to position 1006B in a landscape where the map is known but the initial robot pose at position 1004B is unknown.
  • the vision-based relocalization technique is associated with noisy predictions with minimal to no drift 1008B.
  • in visual odometry, the initial pose of the camera is known, and the map of the area of operation does not necessarily need to be known.
  • the current pose is updated by integrating the change in a pose estimate provided by the perceptive sensor over time.
  • visual odometry algorithms can be used to compute the relative pose between consecutive frames based on sparse feature detection. Pose estimation is locally accurate but often drifts over time.
  • Another limitation of visual odometry is that it does not work well in an environment where it is not able to extract features, as illustrated by the example feature-sparse environment 1100 in FIG. 11 (which can be a feature-sparse rural road environment).
  • the RPEN 800, on the other hand, is trained to detect the lane lines. It learns the shape and contour of the lane lines and also learns how their appearance changes with distance from them. In this regard, RPEN-based odometry works where visual odometry fails.
  • two categories of techniques can be used for the place recognition of example 1000B: point feature-based relocalization and machine learning-based relocalization.
  • the first step is to establish a set of 2D-3D matches that link pixel positions in the query image and 3D points in a 3D model of the scene.
  • the camera pose is then estimated by applying an n-point-pose solver that computes the camera pose from n 2D-3D matches inside a random sample consensus (RANSAC) loop to handle outliers.
  • the classical way to build the 3D model is by using structure from motion. The 3D points are reconstructed by triangulating local feature matches, which creates a many-to-one mapping from feature descriptors to 3D points.
  • One of the challenges this approach faces is inefficient scaling with scene size. The computational complexity of matching grows as the size of the scene increases. In addition, the 2D-3D matches become less unique as the chance of finding multiple 3D points with similar local appearances increases.
  • Another challenge is that this method requires a good initial pose estimate, and processing can be difficult when the scene has inconsistent illumination, motion blur, texture-less surfaces, or a lack of overlap between images. This can be concerning in vehicle localization applications, especially when the vehicle travels fast on a highway (which is usually a feature-sparse area).
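As a concrete and deliberately simplified illustration of the point feature-based pipeline described above, the sketch below recovers a camera pose from 2D-3D matches with OpenCV's solvePnPRansac, i.e., an n-point-pose solver inside a RANSAC loop. The 3D points, intrinsics, and the pose used to synthesize the 2D observations are made-up values, not data from the disclosed system.

```python
# Illustrative only: pose from 2D-3D matches via PnP inside RANSAC.
import cv2
import numpy as np

object_points = np.array([                       # 3D points from a reconstructed model (meters)
    [0.0, 0.0, 5.0], [1.0, 0.0, 5.0], [0.0, 1.0, 5.0],
    [1.0, 1.0, 6.0], [-1.0, 0.5, 7.0], [0.5, -1.0, 6.5]])
K = np.array([[400.0, 0.0, 320.0],               # pinhole camera intrinsics (hypothetical)
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])

# synthesize consistent 2D observations by projecting with a known ground-truth pose
rvec_true, tvec_true = np.zeros(3), np.array([0.2, -0.1, 0.0])
image_points, _ = cv2.projectPoints(object_points, rvec_true, tvec_true, K, None)
image_points = image_points.reshape(-1, 2)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    object_points, image_points, K, None,
    reprojectionError=3.0, iterationsCount=100)
print("pose recovered:", ok, "estimated translation:", tvec.ravel())
```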
  • PoseNet uses a deep CNN to directly regress the camera's continuous pose from RGB images (a minimal sketch of such a pose regressor follows below).
  • an improved version of PoseNet is a method called PoseNet17.
  • This deep learning approach tries to learn and infer the weighting from the scene geometry.
  • sensory inputs such as visual odometry and GPS can be fused, in addition to the images, for camera localization.
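A minimal sketch of a PoseNet-style regressor of the kind referenced above is shown below: a CNN backbone with fully connected heads that output a 3D position and a unit quaternion. The backbone choice, layer sizes, and names are assumptions for illustration; this is not the PoseNet or PoseNet17 implementation.

```python
# Illustrative only: direct camera pose regression from an RGB image.
import torch
import torch.nn as nn
from torchvision import models

class PoseRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)   # generic CNN feature extractor
        backbone.fc = nn.Identity()                # drop the classification head -> 512-d features
        self.backbone = backbone
        self.fc_xyz = nn.Linear(512, 3)            # position (x, y, z)
        self.fc_quat = nn.Linear(512, 4)           # orientation as a quaternion

    def forward(self, images):
        feats = self.backbone(images)
        xyz = self.fc_xyz(feats)
        quat = self.fc_quat(feats)
        quat = quat / quat.norm(dim=1, keepdim=True)  # normalize to a unit quaternion
        return xyz, quat

xyz, quat = PoseRegressor()(torch.randn(2, 3, 224, 224))
print(xyz.shape, quat.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```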
  • Lane-level accuracy for vehicle localization is highly desirable since it opens up a number of applications for which the consumer device can provide an end-to-end solution without assistance from an external component. For example, the following can be achieved with lane-level pose awareness and localization: (a) the estimated "Time of Arrival" (TOA) prediction made by an automotive navigation system in the consumer device can be more accurate if the system is aware of the lane the car is currently traveling in; and (b) a warning can be provided beforehand if the lane on which the car is currently traveling is soon going to merge or become congested due to an accident or a work zone.
  • FIG. 12 illustrates a block diagram of a localization module 1200 for high precision localization, according to some example embodiments.
  • the localization module 1200 includes RPEN 1202 (which can be the same as the RPEN 800 of FIG. 8), a neural network 1204, and a sensor fusion module 1206.
  • the sensor fusion module 1206 can receive as inputs RPEN odometry information 1216 and road parameters estimation data 1218 from the RPEN 1202, latitude and longitude information 1220 from the neural network 1204, latitude and longitude information 1222 from the coarse GPS 1212, additional information 1224 such as the distance the vehicle travels between adjacent poses (derived from visual odometry or from IMU input 1208, which can be extracted from computing device IMU sensors), and lane line map information 1214.
  • the neural network 1204 can be an off-the-shelf relocalization algorithm, such as the publicly available Geo-spatial NN, which is deployed along with the RPEN 1202 to provide spatial constraints on probable vehicle locations.
  • the Geo-spatial NN 1204 predicts lane-level accurate geo-locations 1220 (latitude and longitude) using a sequence of images of the road from the driver's perspective and location data 1222 from a coarse GPS 1212 from a computing device (e.g., a consumer device of the driver, such as a smartphone).
  • the RPEN 1202 can predict road parameters using a sequence of images of the road from the driver’s perspective obtained from the computing device (e.g., as discussed in connection with FIGS. 3-9D).
  • the outputs of the RPEN 1202 and the Geo-spatial NN 1204, along with the lane line map information 1214 and the coarse GPS location data 1222, are communicated as inputs to a Bayesian filter algorithm within the sensor fusion module 1206, which predicts an accurate (e.g., lane-based) geo-location as output 1226.
  • the sensor input to the Bayesian filtering based sensor fusion algorithm used by the sensor fusion module 1206 includes spatial constraints imposed by the RPEN output 1216 and 1218 as well as the lane line map information 1214, absolute position information from the coarse GPS 1212 and the off-the-shelf relocalization algorithm such as the Geo-spatial NN 1204, and constraints between adjacent poses from the IMU, visual odometry, and odometry output from the RPEN.
  • the Bayesian filtering based vehicle pose estimation algorithm used by the sensor fusion module 1206 includes the following processing functionalities: (a) predicting the geo-location of the vehicle at time t+1 based on previous knowledge of the vehicle's position (at time t) and kinematic equations; (b) comparing the observations from the sensors with the prediction; and (c) updating the knowledge about the geo-location of the vehicle based on the prediction and the sensor readings communicated as inputs to the sensor fusion module 1206 (see the sketch below).
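A toy version of the predict/compare/update cycle in (a)-(c) is sketched below as a one-dimensional particle filter over the vehicle's lateral position, fusing a coarse GPS-like measurement with a tighter lane-based constraint of the kind the RPEN output and lane line map could impose. All noise levels and values are hypothetical.

```python
# Illustrative only: a 1-D particle filter fusing a coarse absolute measurement
# with a tight lane-based lateral constraint. All values are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
particles = rng.normal(0.0, 5.0, n)              # prior belief over lateral position (m)
true_lateral = 1.8                               # "true" lateral offset (hypothetical)

def likelihood(z, x, sigma):                     # Gaussian measurement model (unnormalized)
    return np.exp(-0.5 * ((z - x) / sigma) ** 2)

for step in range(10):
    # (a) predict: propagate particles with a simple motion/noise model
    particles += rng.normal(0.0, 0.2, n)

    # (b) compare: weight particles against the sensor observations
    z_gps = true_lateral + rng.normal(0.0, 3.0)   # coarse GPS, ~3 m noise
    z_lane = true_lateral + rng.normal(0.0, 0.3)  # lane constraint, ~0.3 m noise
    weights = likelihood(z_gps, particles, 3.0) * likelihood(z_lane, particles, 0.3)
    weights /= weights.sum()

    # (c) update: resample so that high-weight particles dominate the new belief
    particles = particles[rng.choice(n, n, p=weights)]

print(f"estimated lateral offset: {particles.mean():.2f} m (true {true_lateral} m)")
```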
  • FIG. 13 illustrates an example 1300 of high precision localization with lane accuracy using the localization module of FIG. 12, according to some example embodiments.
  • the RPEN 1202 output (e.g., road parameters estimation data 1216 and 1218) provides a lateral constraint on the probability distribution of the vehicle position, as illustrated in FIG. 13.
  • the lane line map can then be used to translate this lateral constraint into the geographic coordinate space (latitude, longitude).
  • coarse GPS accuracy location 1302 is achieved based on the output from the coarse GPS 1212.
  • the Geo-spatial NN 1204 is able to reach nearly lane-level accuracy location 1304 using only images from the driver perspective and the coarse GPS data.
  • the RPEN odometry 1216 (e.g., relative road distance information), together with inputs such as visual odometry and the IMU, generates constraints between adjacent poses.
  • the off-the-shelf relocalization algorithm, the coarse GPS output, and the RPEN road parameters 1218 impose spatial constraints on the location of the vehicle, resulting in RPEN localization accuracy location 1306.
  • the sensor fusion can use a Bayesian filtering algorithm such as a Particle Filter (PF) or an Extended Kalman Filter (EKF).
  • accurate lane-level localization of a vehicle is possible without the visual odometry and IMU input 1224.
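The translation of an ego-lane estimate into a geographic lateral constraint via the lane line map, as described for FIG. 13, can be sketched as follows. The road geometry, lane width, and the egoLaneNo/noOfLanes values are hypothetical stand-ins for a lane line map entry and RPEN outputs, not values from the disclosure.

```python
# Illustrative only: convert an ego-lane estimate plus a geo-referenced road
# centerline into a lane-level (latitude, longitude) constraint.
import math

center_lat, center_lon = 37.4220, -122.0840  # lane line map: a road centerline point (hypothetical)
road_heading_deg = 0.0                        # road runs due north at this point
lane_width_m = 3.7
no_of_lanes = 3
ego_lane_no = 3                               # counted from the left, 1-based (hypothetical RPEN output)

# lateral offset of the ego lane's center from the road centerline, positive to the right of travel
lateral_m = (ego_lane_no - (no_of_lanes + 1) / 2.0) * lane_width_m

# rotate the lateral offset into east/north components, then into degrees
heading = math.radians(road_heading_deg)
east_m = lateral_m * math.cos(heading)
north_m = -lateral_m * math.sin(heading)
lat = center_lat + north_m / 111_320.0                                   # ~111,320 m per degree of latitude
lon = center_lon + east_m / (111_320.0 * math.cos(math.radians(center_lat)))
print(f"lane-constrained position: ({lat:.6f}, {lon:.6f}), lateral offset {lateral_m:+.1f} m")
```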
  • FIG. 14 illustrates an example of a suburban environment 1402 and an example urban environment 1404 where the localization module of FIG. 12 can be used, according to some example embodiments.
  • the Geo-spatial NN 1204 relies on landmark recognition.
  • the Geo-spatial NN 1204 performs well in urban areas (such as 1404) but performs poorly in suburban or rural areas (such as 1402).
  • Urban areas can be problematic areas for a GPS, but a GPS performs fairly well in rural or suburban areas.
  • the RPEN relies on the road and its lane markings, which are usually ignored by vision-based relocalization. Therefore, vision-based relocalization, GPS, and the RPEN complement each other and can be used by the localization module 1200 to provide accurate lane-based localization.
  • FIG. 15 is a flowchart of a method 1500 for performing autonomy-level functions associated with a moving vehicle, according to some example embodiments.
  • the method 1500 includes operations 1502, 1504, 1506, 1508, and 1510.
  • the method 1500 may be performed by the LKAS module 1760, which is configured to execute within a mobile device such as device 1700 illustrated in FIG. 17.
  • a plurality of features are extracted, using convolutional layers of a convolutional neural network, from a plurality of image frames obtained by a camera within a moving vehicle.
  • a pre-trained CNN (e.g., pre-trained CNN 706, which is part of CNN 604) extracts a plurality of features from images obtained by a vehicle-mounted camera (e.g., images 704).
  • a sequence of correlations among the plurality of features is generated, using a first FC layer of a deep neural network.
  • the first FC layer generates a sequence of correlations among the plurality of features extracted by the pre-trained CNN.
  • a dimensionality reduction of the sequence of correlations is performed, using a spatial long-short-term memory (LSTM) model (e.g., LSTM 712), to generate a modified sequence of correlations.
  • road parameters estimation data associated with the moving vehicle is generated, using a second FC layer of the deep neural network, based on the modified sequence of correlations.
  • a second FC layer 714 generates road parameters estimation data 716 associated with a moving vehicle.
  • a lane keep assist system (LKAS) warning is provided based on the road parameters estimation data.
  • an LDW or an RDW notification can be provided by the computing device used in connection with generating the road parameters estimation data 716 (a structural sketch of this pipeline follows below).
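The sketch below mirrors the structure of operations 1502-1510: CNN feature extraction, a first FC layer producing a sequence of feature correlations, a spatial LSTM reducing that sequence, and a second FC layer emitting road parameters estimation data. The backbone choice, layer sizes, and the number of road parameters are assumptions for illustration; this is not the trained RPEN. An LDW or RDW decision could then be made, for example, by thresholding a predicted lane marker offset in the output vector.

```python
# Illustrative only: a structural sketch of the CNN -> FC -> spatial LSTM -> FC pipeline.
import torch
import torch.nn as nn
from torchvision import models

class RoadParamsSketch(nn.Module):
    def __init__(self, corr_len=64, corr_dim=32, n_road_params=8):
        super().__init__()
        cnn = models.mobilenet_v2(weights=None)
        self.features = cnn.features                          # convolutional feature extractor (op. 1502)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc_corr = nn.Linear(1280, corr_len * corr_dim)   # first FC layer: correlation sequence (op. 1504)
        self.spatial_lstm = nn.LSTM(corr_dim, 16, batch_first=True)  # dimensionality reduction (op. 1506)
        self.fc_params = nn.Linear(16, n_road_params)         # second FC layer: road parameters (op. 1508)
        self.corr_len, self.corr_dim = corr_len, corr_dim

    def forward(self, frames):                                # frames: (batch, 3, H, W)
        feats = self.pool(self.features(frames)).flatten(1)   # (batch, 1280)
        corr_seq = self.fc_corr(feats).view(-1, self.corr_len, self.corr_dim)
        reduced, _ = self.spatial_lstm(corr_seq)              # modified sequence of correlations
        return self.fc_params(reduced[:, -1])                 # road parameters estimation data

params = RoadParamsSketch()(torch.randn(2, 3, 224, 224))
print(params.shape)  # torch.Size([2, 8])
```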
  • the aspects disclosed herein use a consumer device (e.g., 1700) to provide the sensory function required to improve the SAE autonomy level of a vehicle.
  • the RPEN discussed herein estimates novel parameters, such as egoLaneNo, noOfLanes, and relativeRoadDistance.
  • the RPEN discussed herein provides a constraint between adjacent poses by observing the change in the appearance of lane lines with distance from them. This constraint can be exploited by a Bayesian filtering based pose estimation algorithm to improve geo-location accuracy and provide lane-based vehicle localization.
  • Techniques disclosed herein improve the geo-location accuracy of a vehicle's existing positioning system by imposing a spatial constraint on the vehicle's probable location, based on the real-time extraction, by a visual sensor, of lane contextual information such as egoLaneNo (the number of the lane on which the ego vehicle is located) along with lane parameters, given the lane line map of the road.
  • FIG. 16 is a block diagram illustrating a representative software architecture 1600, which may be used in conjunction with various device hardware described herein, according to some example embodiments.
  • FIG. 16 is merely a non-limiting example of a software architecture 1602 and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein.
  • the software architecture 1602 may be executing on hardware such as device 1700 of FIG. 17 that includes, among other things, processor 1705, memory 1710, storage 1715 and 1720, and I/O interfaces 1725 and 1730.
  • a representative hardware layer 1604 is illustrated and can represent, for example, the device 1700 of FIG. 17.
  • the representative hardware layer 1604 comprises one or more processing units 1606 having associated executable instructions 1608.
  • Executable instructions 1608 represent the executable instructions of the software architecture 1602, including the implementation of the methods and modules described herein.
  • Hardware layer 1604 also includes memory or storage modules 1610, which also have executable instructions 1608.
  • Hardware layer 1604 may also comprise other hardware 1612, which represents any other hardware of the hardware layer 1604, such as the other hardware illustrated as part of device 1700.
  • the software architecture 1602 may be conceptualized as a stack of layers where each layer provides particular functionality.
  • the software architecture 1602 may include layers such as an operating system 1614, libraries 1616, frameworks/middleware 1618, applications 1620, and a presentation layer 1644.
  • the applications 1620 or other components within the layers may invoke application programming interface (API) calls 1624 through the software stack and receive a response, returned values, and so forth illustrated as messages 1626 in response to the API calls 1624.
  • the layers illustrated in FIG. 16 are representative in nature and not all software architectures 1602 have all layers. For example, some mobile or special purpose operating systems may not provide frameworks/middleware 1618, while others may provide such a layer. Other software architectures may include additional or different layers.
  • the operating system 1614 may manage hardware resources and provide common services.
  • the operating system 1614 may include, for example, a kernel 1628, services 1630, and drivers 1632.
  • the kernel 1628 may act as an abstraction layer between the hardware and the other software layers. For example, kernel 1628 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and so on.
  • the services 1630 may provide other common services for the other software layers.
  • the drivers 1632 may be responsible for controlling or interfacing with the underlying hardware.
  • the drivers 1632 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth, depending on the hardware configuration.
  • the libraries 1616 may provide a common infrastructure that may be utilized by the applications 1620 or other components or layers.
  • the libraries 1616 typically provide functionality that allows other software modules to perform tasks in an easier fashion than to interface directly with the underlying operating system 1614 functionality (e.g., kernel 1628, services 1630, or drivers 1632).
  • the libraries 1616 may include system libraries 1634 (e.g., C standard library) that may provide functions such as memory allocation functions, string manipulation functions, mathematic functions, and the like.
  • libraries 1616 may include API libraries 1636 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D graphic content on a display), database libraries (e.g., SQLite, which may provide various relational database functions), web libraries (e.g., WebKit, which may provide web browsing functionality), and the like.
  • the libraries 1616 may also include a wide variety of other libraries 1638 to provide many other APIs to the applications 1620 and other software components/modules.
  • the frameworks/middleware 1618 may provide a higher-level common infrastructure that may be utilized by the applications 1620 or other software components/modules.
  • the frameworks/middleware 1618 may provide various graphical user interface (GUI) functions, high-level resource management, high-level location services, and so forth.
  • the frameworks/middleware 1618 may provide a broad spectrum of other APIs that may be utilized by the applications 1620 or other software components/modules, some of which may be specific to a particular operating system 1614 or platform.
  • the applications 1620 include built-in applications 1640, third-party applications 1642, an LKAS module 1660, and a localization module 1662.
  • the LKAS module 1660 may comprise suitable circuitry, logic, interfaces, or code and is configured to perform one or more of the LKAS functionalities discussed herein (e.g., in connection with FIGS. 3-9D and FIG. 15).
  • the localization module 1662 may comprise suitable circuitry, logic, interfaces, or code and is configured to perform one or more of the vehicle localization functionalities associated with the localization module 1200 of FIG. 12 and discussed in connection with FIGS. 10-14.
  • Examples of representative built-in applications 1640 may include but are not limited to, a contacts application, a browser application, a book reader application, a location application, a media application, a messaging application, or a game application.
  • Third-party applications 1642 may include any of the built-in applications 1640 as well as a broad assortment of other applications.
  • the third-party application 1642 may be, for example, an application developed using the Android™ or iOS™ software development kit (SDK) by an entity other than the vendor of the particular platform.
  • the third-party application 1642 may invoke the API calls 1624 provided by the mobile operating system such as operating system 1614 to facilitate functionality described herein.
  • the applications 1620 may utilize built-in operating system functions (e.g., kernel 1628, services 1630, and drivers 1632), libraries (e.g., system libraries 1634, API libraries 1636, and other libraries 1638), and frameworks/middleware 1618 to create user interfaces to interact with users of the system.
  • interactions with a user may occur through a presentation layer, such as presentation layer 1644.
  • the application/module "logic" can be separated from the aspects of the application/module that interact with a user.
  • Some software architectures utilize virtual machines.
  • A virtual machine (e.g., virtual machine 1648) creates a software environment where applications/modules can execute as if they were executing on a hardware machine (such as the device 1700 of FIG. 17, for example).
  • a virtual machine 1648 is hosted by a host operating system (e.g., operating system 1614) and typically, although not always, has a virtual machine monitor 1646, which manages the operation of the virtual machine 1648 as well as the interface with the host operating system (i.e., operating system 1614).
  • a software architecture 1602 executes within the virtual machine 1648 and includes an operating system 1650, libraries 1652, frameworks/middleware 1654, applications 1656, or a presentation layer 1658. These layers of software architecture executing within the virtual machine 1648 can be the same as corresponding layers previously described or may be different.
  • FIG. 17 is a block diagram illustrating circuitry for a device that implements algorithms and performs methods, according to some example embodiments. All components need not be used in various embodiments. For example, clients, servers, and cloud-based network devices may each use a different set of components, or in the case of servers, for example, larger storage devices.
  • computing device 1700 may include a processor 1705, memory 1710, removable storage 1715, non-removable storage 1720, input interface 1725, output interface 1730, and communication interface 1735, all connected by a bus 1740.
  • although the example computing device is illustrated and described as the computer 1700, the computing device may take different forms in different embodiments.
  • the memory 1710 may include volatile memory 1745 and non-volatile memory 1750 and may store a program 1755.
  • the computing device 1700 may include - or have access to a computing environment that includes - a variety of computer-readable media, such as the volatile memory 1745, the non-volatile memory 1750, the removable storage 1715, and the non-removable storage 1720.
  • Computer storage includes random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) and electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
  • Computer-readable instructions stored on a computer-readable medium are executable by the processor 1705 of the computing device 1700.
  • a hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device.
  • the terms "computer-readable medium" and "storage device" do not include carrier waves to the extent that carrier waves are deemed too transitory.
  • “Computer-readable non-transitory media” includes all types of computer-readable media, including magnetic storage media, optical storage media, flash media, and solid-state storage media. It should be understood that software can be installed in and sold with a computer.
  • the software can be obtained and loaded into the computer, including obtaining the software through a physical medium or distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator.
  • the software can be stored on a server for distribution over the Internet, for example.
  • the terms "computer-readable medium" and "machine-readable medium" are interchangeable.
  • the program 1755 may utilize a customer preference structure using modules discussed herein, such as the LKAS module 1760 and the localization module 1765, which may be the same as the LKAS module 1660 and the localization module 1662 of FIG. 16 respectively.
  • Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine, an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), or any suitable combination thereof). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
  • a computer 1700 includes an extraction module extracting a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle, a sequence module generating a sequence of correlations among the plurality of features, a reduction module performing a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations, a road parameters module generating road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations, and a warning module providing a lane keep assist system (LKAS) warning based on the road parameters estimation data.
  • the computer 1700 may include other or additional modules for performing any one of or combination of steps described in the embodiments. Further, any of the additional or alternative embodiments or aspects of the method, as shown in any of the figures or recited in any of the claims, are also contemplated to include similar modules.
  • in some embodiments, a computer system 1700 for performing autonomy-level functions associated with a moving vehicle comprises a memory 1710 storing instructions and one or more processors 1705 in communication with the memory 1710.
  • the one or more processors 1705 execute the instructions to extract, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle, generate, using a first fully connected (FC) layer of the convolutional neural network, a sequence of correlations among the plurality of features, perform, using a spatial long-short-term memory (LSTM), a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations, generate, using a second FC layer of the convolutional neural network, road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations, and provide a lane keep assist system (LKAS) warning based on the road parameters estimation data.
  • software including one or more computer-executable instructions that facilitate processing and operations as described above with reference to any one or all of steps of the disclosure can be installed in and sold with one or more computing devices consistent with the disclosure.
  • the software can be obtained and loaded into one or more computing devices, including obtaining the software through physical medium or distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator.
  • the software can be stored on a server for distribution over the Internet, for example.
  • the components of the illustrative devices, systems, and methods employed in accordance with the illustrated embodiments can be implemented, at least in part, in digital electronic circuitry, analog electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. These components can be implemented, for example, as a computer program product such as a computer program, program code or computer instructions tangibly embodied in an information carrier, or in a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus such as a programmable processor, a computer, or multiple computers.
  • a computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other units suitable for use in a computing environment.
  • a computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
  • functional programs, codes, and code segments for accomplishing the techniques described herein can be easily construed as within the scope of the claims by programmers skilled in the art to which the techniques described herein pertain. Method steps associated with the illustrative embodiments can be performed by one or more programmable processors executing a computer program, code, or instructions to perform functions (e.g., by operating on input data or generating an output).
  • Method steps can also be performed by, and apparatus for performing the methods can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit), for example.
  • a general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • a processor may also be implemented as a combination of computing devices, e.g., a combination of a digital signal processor (DSP) and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.
  • a processor will receive instructions and data from a read-only memory or a random-access memory or both.
  • the required elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks.
  • Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example, semiconductor memory devices, e.g., electrically programmable read-only memory or ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory devices, or data storage disks (e.g., magnetic disks, internal hard disks, or removable disks, magneto-optical disks, or CD-ROM/DVD-ROM disks).
  • "machine-readable medium" comprises a device able to store instructions and data temporarily or permanently and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., Electrically Erasable Programmable Read-Only Memory (EEPROM)), or any suitable combination thereof.
  • the term "machine-readable medium" shall also be taken to include any medium, or a combination of multiple media, that is capable of storing instructions for execution by one or more processors, such that the instructions, when executed by one or more processors, cause the one or more processors to perform any one or more of the methodologies described herein. Accordingly, a "machine-readable medium" refers to a single storage apparatus or device, as well as "cloud-based" storage systems or storage networks that include multiple storage apparatus or devices. The term "machine-readable medium" as used herein excludes signals per se.


Abstract

A computer-implemented method for performing autonomy-level functions associated with a moving vehicle is disclosed. The method includes extracting, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle. A sequence of correlations among the plurality of features is generated using a first fully connected (FC) layer of the convolutional neural network. A dimensionality reduction of the sequence of correlations is performed, using a spatial long-short-term memory (LSTM), to generate a modified sequence of correlations. Road parameters estimation data associated with the moving vehicle is generated using a second FC layer of the convolutional neural network and based on the modified sequence of correlations. A lane keep assist system (LKAS) warning is provided based on the road parameters estimation data.

Description

VEHICULAR AUTONOMY-LEVEL FUNCTIONS
Cross-Reference to Related Applications
[0001] N/A
Technical Field
[0002] The present disclosure is related to performing automated functions in moving vehicles.
Background
[0003] With time, vehicles are getting "smarter" by achieving some level of autonomy (e.g., by incorporating perceptive sensor technology and artificial intelligence, or AI) and by cooperation with neighboring vehicles and infrastructure through vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) communication. For example, vehicles on a highway communicate via V2V with each other so every vehicle on the road can be aware of nearby vehicles.
[0004] SAE International is a globally active professional association and standards developing organization for engineering professionals in various industries, such as the automotive and aerospace industries. The SAE classifies vehicular autonomy using six levels (Level 0 through Level 5). In Level 0 vehicles, the human driver controls everything independently, including steering, throttle, brakes, etc. In Level 1 vehicles, steering, accelerating, and braking are each controlled automatically, but separately. At present, there is a significant number of vehicles operating on the road that include sensor technologies such as Lane Keep Assist System (LKAS) cameras. LKAS cameras extract lane geometry from the road lane markings and enable Level 1 vehicles to perform lane keep assist functionality such as Lane Departure Warning (LDW), Road Departure Warning (RDW), and Lane Centering (LC).
[0005] In Level 2 vehicles, several operations are done automatically and at the same time. For example, simultaneous steering and acceleration for a lane change can be performed automatically. As another example, General Motors' "Super Cruise" hands-free highway driving system operates by utilizing a high precision map, such as a high definition (HD) map, and high precision Global Positioning System (GPS) technology.
[0006] In Level 3 vehicles, vehicle drivers will be given more freedom to completely turn their attention away from the road under certain conditions. In other words, the vehicle driver will be able to hand over complete driving control to the vehicle, with the driver still being able to monitor various vehicular systems and intervene when desired. Safety-critical functions, under certain circumstances, can be shifted to the vehicle.
[0007] In Level 4 vehicles, no vehicular system monitoring is required by the driver. Level 4 vehicles are designed to operate safety-critical functions and monitor road conditions for an entire trip duration.
[0008] In Level 5 vehicles, drivers do not have to be fit to drive and do not even need to have a license. The Level 5 vehicle performs any and all driving tasks without human intervention. The Level 5 vehicle does not have a driver cockpit and everyone in the vehicle is a passenger. Levels 3-5 vehicles utilize perceptive sensor technology, such as Light Detection and Ranging (LIDAR) along with high precision localization technology, to delegate the driving functionality completely to the vehicle AI.
[0009] At present, an overwhelming majority of the vehicles operating on the roads are non-smart (e.g., Level 0) vehicles since they do not possess any of the high precision sensor technology required to perform the autonomous functions of higher level vehicles, such as LKAS for providing LDW, RDW, and LC. Even with the advent of smarter vehicles, Level 0 vehicles are predicted to maintain their prevalence for a long period of time in the future because the smartness capabilities of higher level vehicles come with a heavy price tag due to the use of high precision sensors (e.g., high-precision LIDAR radars and GPS). Additionally, conventional localization techniques mostly rely on GPS technology, which cannot meet the accuracy requirements of modern intelligent transport systems.
Summary
[0010] Various examples are now described to introduce a selection of concepts in a simplified form, which are further described below in the detailed description. The Summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
[0011] According to a first aspect of the present disclosure, there is provided a computer-implemented method for performing autonomy-level functions associated with a moving vehicle. The method includes extracting, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle. A sequence of correlations among the plurality of features is generated using a first fully connected (FC) layer of the convolutional neural network. A dimensionality reduction of the sequence of correlations is performed, using a spatial long-short-term memory (LSTM), to generate a modified sequence of correlations. Road parameters estimation data associated with the moving vehicle is generated, using a second FC layer of the convolutional neural network, based on the modified sequence of correlations.
A lane keep assist system (LKAS) warning is provided based on the road parameters estimation data.
[0012] In a first implementation form of the method according to the first aspect as such, one or more time series data patterns are detected in the road parameters estimation data using a temporal LSTM.
[0013] In a second implementation form of the method according to the first aspect as such or any preceding implementation form of the first aspect, the road parameters estimation data is modified, using the temporal LSTM, based on the detected one or more time series data patterns, to generate modified road parameters estimation data.
[0014] In a third implementation form of the method according to the first aspect as such or any preceding implementation form of the first aspect, the modified road parameters estimation data includes lane parameters information and lane contextual information.
[0015] In a fourth implementation form of the method according to the first aspect as such or any preceding implementation form of the first aspect, the lane parameters information includes one or more of the following: a lane marker heading angle, a lane marker offset, a lane marker curvature, a lane marker curvature derivative, and a lane marker type.
[0016] In a fifth implementation form of the method according to the first aspect as such or any preceding implementation form of the first aspect, the lane contextual information includes one or more of the following: a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames, a lane number indication of a road lane the moving vehicle is traveling on, and a number of lanes associated with a road the moving vehicle is traveling on.
[0017] In a sixth implementation form of the method according to the first aspect as such or any preceding implementation form of the first aspect, geo-location information of the moving vehicle is extracted, using convolutional layers of a second convolutional neural network, using the plurality of image frames obtained by the camera.
[0018] In a seventh implementation form of the method according to the first aspect as such or any preceding implementation form of the first aspect, one or more spatial constraints are applied to the geo-location information using Bayesian filtering, to generate updated geolocation information of the moving vehicle. The one or more spatial constraints are based on the road parameters estimation data. The updated geolocation information of the moving vehicle is output.
[0019] According to a second aspect of the present disclosure, there is provided a system for performing autonomy- level functions associated with a moving vehicle. The system includes memory storing instructions and one or more processors in communication with the memory. The one or more processors execute the instructions to extract, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle. A sequence of correlations among the plurality of features is generated, using a first fully connected (FC) layer of the convolutional neural network. A dimensionality reduction of the sequence of correlations is performed, using a spatial long- short-term memory (LSTM), to generate a modified sequence of correlations. Road parameters estimation data associated with the moving vehicle is generated, using a second FC layer of the convolutional neural network, based on the modified sequence of correlations. A lane keep assist system (LKAS) warning is provided based on the road parameters estimation data.
[0020] In a first implementation form of the system according to the second aspect as such, the one or more processors are further configured to execute the instructions to detect, using a temporal LSTM, one or more time series data patterns in the road parameters estimation data.
[0021] In a second implementation form of the system according to the second aspect as such or any preceding implementation form of the second aspect, the one or more processors are further configured to execute the instructions to modify, using the temporal LSTM, the road parameters estimation data based on the detected one or more time series data patterns, to generate modified road parameters estimation data.
[0022] In a third implementation form of the system according to the second aspect as such or any preceding implementation form of the second aspect, where the modified road parameters estimation data includes lane parameters information and lane contextual information.
[0023] In a fourth implementation form of the system according to the second aspect as such or any preceding implementation form of the second aspect, where the lane parameters information includes one or more of the following: a lane marker heading angle, a lane marker offset, a lane marker curvature, a lane marker curvature derivative, and a lane marker type.
[0024] In a fifth implementation form of the system according to the second aspect as such or any preceding implementation form of the second aspect, where the lane contextual information includes one or more of the following: a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames, a lane number indication of a road lane the moving vehicle is traveling on, and a number of lanes associated with a road the moving vehicle is traveling on.
[0025] In a sixth implementation form of the system according to the second aspect as such or any preceding implementation form of the second aspect, the one or more processors are further configured to extract, using convolutional layers of a second convolutional neural network, geo-location information of the moving vehicle using the plurality of image frames obtained by the camera.
[0026] In a seventh implementation form of the system according to the second aspect as such or any preceding implementation form of the second aspect, the one or more processors are further configured to apply, using Bayesian filtering, one or more spatial constraints to the geo-location information to generate updated geolocation information of the moving vehicle. The one or more spatial constraints are based on the road parameters estimation data. The updated geolocation information of the moving vehicle is output.
[0027] According to a third aspect of the present disclosure, there is provided a non-transitory computer-readable medium storing instructions for performing autonomy-level functions associated with a moving vehicle. When executed by one or more processors of a computing device, the instructions cause the one or more processors to extract, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle. A sequence of correlations among the plurality of features is generated, using a first fully connected (FC) layer of the convolutional neural network. A dimensionality reduction of the sequence of correlations is performed, using a spatial long- short-term memory (LSTM), to generate a modified sequence of correlations. Road parameters estimation data associated with the moving vehicle is generated, using a second FC layer of the convolutional neural network, based on the modified sequence of correlations. A lane keep assist system (LKAS) warning is provided based on the road parameters estimation data.
[0028] In a first implementation form of the non-transitory computer- readable medium according to the third aspect as such, the instructions further cause the one or more processors to detect one or more time series data patterns in the road parameters estimation data using a temporal LSTM.
[0029] In a second implementation form of the non-transitory computer- readable medium according to the third aspect as such or any preceding implementation form of the third aspect, the instructions further cause the one or more processors to modify, using the temporal LSTM, the road parameters estimation data based on the detected one or more time series data patterns, to generate modified road parameters estimation data.
[0030] In a third implementation form of the non-transitory computer- readable medium according to the third aspect as such or any preceding implementation form of the third aspect, the modified road parameters estimation data includes lane parameters information and lane contextual information.
[0031] In a fourth implementation form of the non-transitory computer- readable medium according to the third aspect as such or any preceding implementation form of the third aspect, the lane parameters information includes one or more of the following: a lane marker heading angle, a lane marker offset, a lane marker curvature, a lane marker curvature derivative, and a lane marker type.
[0032] In a fifth implementation form of the non-transitory computer- readable medium according to the third aspect as such or any preceding implementation form of the third aspect, the lane contextual information includes one or more of the following: a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames, a lane number indication of a road lane the moving vehicle is traveling on, and a number of lanes associated with a road the moving vehicle is traveling on.
[0033] In a sixth implementation form of the non-transitory computer- readable medium according to the third aspect as such or any preceding implementation form of the third aspect, the instructions further cause the one or more processors to extract, using convolutional layers of a second convolutional neural network, geo-location information of the moving vehicle using the plurality of image frames obtained by the camera.
[0034] In a seventh implementation form of the non-transitory computer-readable medium according to the third aspect as such or any preceding implementation form of the third aspect, the instructions further cause the one or more processors to apply, using Bayesian filtering, one or more spatial constraints to the geo-location information to generate updated geolocation information of the moving vehicle. The one or more spatial constraints are based on the road parameters estimation data. The updated geolocation information of the moving vehicle is output.
[0035] According to a fourth aspect of the present disclosure, there is provided a system for performing autonomy-level functions associated with a moving vehicle. The system includes an extracting means for extracting a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle. The system includes a correlating means for generating a sequence of correlations among the plurality of features. The system includes a reduction means for performing a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations.
The system includes an estimating means for generating road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations. The system includes a notification means for providing a lane keep assist system (LKAS) warning based on the road parameters estimation data.
[0036] Any of the foregoing examples may be combined with any one or more of the other foregoing examples to create a new embodiment within the scope of the present disclosure.
Brief Description of the Drawings
[0037] In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
[0038] FIG. 1 is a block diagram illustrating the training of a deep learning (DL) model using a DL architecture (DLA), according to some example embodiments.
[0039] FIG. 2 is a diagram illustrating generation of a trained DL model using a neural network model trained within a DLA, according to some example embodiments.
[0040] FIG. 3 illustrates various SAE autonomy levels and using techniques disclosed herein to enable a lower SAE autonomy level vehicle to perform higher SAE autonomy level functions using a computing device, according to some example embodiments.
[0041] FIG. 4 illustrates various lane parameters information which can be used in connection with some example embodiments.
[0042] FIG. 5 illustrates lane contextual information which can be used in connection with some example embodiments.
[0043] FIG. 6 illustrates an LKAS module for estimating lane parameters information and lane contextual information, according to some example embodiments.
[0044] FIG. 7 illustrates a road parameters estimator sub-network (RPESN) used in the LKAS module of FIG. 6, according to some example embodiments.
[0045] FIG. 8 illustrates a road parameters estimator network (RPEN) used in the LKAS module of FIG. 6, according to some example embodiments.
[0046] FIG. 9A - FIG. 9D illustrate multiple image frames used for determining RPEN odometry as part of the lane contextual information, according to some example embodiments.
[0047] FIG. 10 illustrates examples of continuous pose estimation via position tracking and place recognition, according to some example embodiments.
[0048] FIG. 11 illustrates an example feature-sparse environment which is used in connection with continuous pose estimation techniques, according to some example embodiments.
[0049] FIG. 12 illustrates a block diagram of a localization module for high precision localization, according to some example embodiments.
[0050] FIG. 13 illustrates an example of high precision localization with lane accuracy using the localization module of FIG. 12, according to some example embodiments.
[0051] FIG. 14 illustrates example suburban and urban environments where the localization module of FIG. 12 can be used, according to some example embodiments.
[0052] FIG. 15 is a flowchart of a method for performing autonomy-level functions associated with a moving vehicle, according to some example embodiments.
[0053] FIG. 16 is a block diagram illustrating a representative software architecture, which may be used in conjunction with various device hardware described herein, according to some example embodiments.
[0054] FIG. 17 is a block diagram illustrating circuitry for a device that implements algorithms and performs methods, according to some example embodiments.
Detailed Description
[0055] It should be understood at the outset that although an illustrative implementation of one or more embodiments is provided below, the disclosed systems and methods described with respect to FIGS. 1-17 may be implemented using any number of techniques, whether currently known or not yet in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
[0056] The present disclosure is related to enabling SAE International
(originally the Society of Automotive Engineers) vehicular autonomy level functions in vehicles.
[0057] In the following description, reference is made to the
accompanying drawings that form a part hereof, and in which are shown, by way of illustration, specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosed subject matter, and it is to be understood that other embodiments may be utilized, and that structural, logical, and electrical changes may be made without departing from the scope of the present disclosure. The following description of example embodiments is, therefore, not to be taken in a limiting sense, and the scope of the present disclosure is defined by the appended claims. [0058] As used herein, the terms“forward computation” and“backward computation” refer to computations performed at a worker machine in connection with the training of a neural network model (or another type of model). The computations performed during forward and backward
computations modify weights based on results from prior iterations (e.g., based on gradients generated at a conclusion of a prior backward computation). A gradient is a measurement of how much the output of a worker machine changes per change to the weights of the model that the worker machine is computing. A gradient measures a change in all weights with regard to the change in error.
The larger the gradient value, the faster a model can learn.
[0059] As used herein, the term LKAS refers to a set of assistive systems that help the driver keep the vehicle in the appropriate lane. Examples of LKAS functions are LDW, RDW, LC, and so forth. LDW is a warning system that conveys a warning to the driver when the vehicle is deviating from a lane without proper signal notification. RDW (also known as Road Departure Mitigation, or RDM) is a warning system that conveys a warning to the driver when the vehicle departs a boundary of the road. LC is a mechanism designed to keep a moving vehicle centered in the lane, relieving the driver of the task of steering.
[0060] As used herein, the term "High Definition map" (or HD map) refers to a category of maps built for self-driving purposes in connection with higher SAE autonomy level vehicles. The HD maps are characterized by extremely high precision such as centimeter-level accuracy. HD maps contain information such as where the lanes are, where the road boundaries are, where the curbs are and how high the curbs are, where the traffic signs and road markers are located, and so forth.
[0061] As used herein, the term "lane line map" refers to a category of maps including geo-referenced lane lines of the road. A lane line map is a subset of HD maps which includes the lane line information only.
[0062] As used herein, the term "Level x vehicle" (where x is an integer between 0 and 5) refers to the SAE's vehicle autonomy levels.
[0063] As used herein, the term "inertial measurement unit" (or IMU) refers to a device that has an accelerometer, a gyroscope, and a magnetometer incorporated together. The IMU is a device designed to track the position and orientation of an object and is embedded in most modern computing devices such as smartphones.
[0064] Techniques disclosed herein can be used to enable the Level 0 vehicles to perform higher-level functionalities (e.g., functionalities of Level 1 or Level 2 vehicles) without incurring the additional cost due to high-precision sensors used by the higher level vehicles. This can be achieved by utilizing the sensing capability of a computing device (e.g., a smartphone device), which is widely available to users.
[0065] In the proposed techniques, sensors of the computing device (e.g., camera, IMU, and GPS) extract lane information of the road (e.g., road parameters) using a neural network (NN), thereby enabling LKAS features on the vehicle without the additional cost of high-precision equipment. A training database is used which includes a sequence of images of the road from the driver's perspective recorded while driving along a route, together with the corresponding consumer-grade GPS data, industrial-grade GPS data, and a lane line map of the route. Road parameters ground truth (the true values of the road parameters that are associated with the images) can be obtained from the lane line map and the high-precision geolocation provided by the industrial-grade GPS data. The dataset covers a wide variety of road geometries, viewpoints from the driver's perspective, weather conditions, and times of day. A deep NN, called the Road Parameters Estimator Network (RPEN), can then be trained with input data (such as the video frames) and corresponding output data (such as the ground truth for the road parameters associated with the video frames). The trained NN is capable of generalizing, so it learns to detect the lane lines and extract road parameters for roads it has never seen before. This trained NN can later be used by Level 0 vehicles, with a computing device camera as input, to generate road parameters estimation data, including lane parameters information and lane contextual information, as output. The lane parameters information can include a lane marker heading angle, a lane marker offset, a lane marker curvature, a lane marker curvature derivative, and a lane marker type.
The lane parameters information is used by vehicles to perform LKAS functionalities, such as providing LDW and RDW warning notifications, which can be conveyed to the vehicle driver via a display or other notification means.
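By way of illustration only, the following Python sketch shows how lane parameters information such as the lane marker offset (C0) and the lane marker heading angle (C1) could be thresholded to produce LDW and RDW notifications. The function name, threshold values, sign conventions, and the optional road-edge offset input are assumptions for exposition and are not specified by this disclosure.

```python
def lkas_warnings(c0_left_m, c0_right_m, heading_angle_rad,
                  road_edge_offset_m=None,
                  ldw_margin_m=0.3, rdw_margin_m=0.5):
    """Return a list of warning strings based on road parameters estimation data.

    c0_left_m / c0_right_m: lane marker offsets (C0) to the left/right lane lines, in meters.
    heading_angle_rad: lane marker heading angle (C1); assumed positive when the
    vehicle is heading toward the left lane line.
    road_edge_offset_m: lateral distance to the road boundary, if available.
    """
    warnings = []

    # Lane Departure Warning: vehicle is close to a lane line and heading toward it.
    drifting_left = c0_left_m < ldw_margin_m and heading_angle_rad > 0
    drifting_right = c0_right_m < ldw_margin_m and heading_angle_rad < 0
    if drifting_left or drifting_right:
        warnings.append("LDW: vehicle is deviating from its lane")

    # Road Departure Warning: vehicle is close to the road boundary.
    if road_edge_offset_m is not None and road_edge_offset_m < rdw_margin_m:
        warnings.append("RDW: vehicle is approaching the road boundary")

    return warnings


print(lkas_warnings(c0_left_m=0.2, c0_right_m=1.6, heading_angle_rad=0.05))
```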
[0066] Techniques disclosed herein can also be used to provide accurate vehicle localization (e.g., lane-level localization). The same training database and NN used for enabling the LKAS functionalities are used to provide road parameters estimation data, including lane contextual information. A second NN is used (e.g., a visual-based re-localization algorithm), which can predict accurate geo-locations (e.g., vehicle latitude and longitude) using a sequence of images of the road from the driver's perspective and coarse GPS data from a computing device (e.g., a smartphone used by the driver). A sensor fusion technique is used to combine the results of both NNs to provide lane-level accuracy of the vehicle's current location. In this regard, the Level 0 vehicle driver can use the sensing capability of a computing device (e.g., a smartphone) to provide accurate localization of the vehicle.
[0067] Prior art techniques perform higher level vehicular autonomy functions using cost-prohibitive high-precision radars and GPS systems. In comparison, techniques disclosed herein can use neural networks executing on a consumer device (such as a smartphone), together with the device sensor capabilities (e.g., camera, IMU, GPS) to provide sensory functions for improving the SAE autonomy level of a vehicle, such as by enabling LKAS functions (e.g., LDW and RDW notification) and providing lane-level vehicle localization.
[0068] Techniques disclosed herein estimate novel lane contextual information such as the lane number a vehicle is located on (also referred to as "egoLaneNo"), a number of lanes associated with the road the vehicle is traveling on (also referred to as "noOfLanes"), and relative road distance traveled by a vehicle between two consecutive image frames (also referred to as RPEN odometry or "relativeRoadDistance"), which are not estimated by existing navigation systems. Additionally, disclosed techniques are used for improving geo-location accuracy of the existing positioning system of a vehicle by imposing a spatial constraint on the vehicle probable location based on the real-time extraction of lane contextual information, such as the egoLaneNo, along with other lane parameters information and lane contextual information.
[0069] FIG. 1 is a block diagram 100 illustrating the training of a deep learning (DL) model to generate a trained DL model 110 using a DL architecture (DLA), according to some example embodiments. In some example embodiments, machine-learning programs (MLPs), including deep learning programs, also collectively referred to as machine-learning algorithms or tools, are utilized to perform operations associated with correlating data or other artificial intelligence (AI)-based functions.
[0070] As illustrated in FIG. 1, deep learning model training 108 is performed within the DLA 106 based on training data 102 (which can include features). During the deep learning model training 108, features from the training data 102 can be assessed for purposes of further training of the DL model. The DL model training 108 results in a trained DL model 110. The trained DL model 110 can include one or more classifiers 112 that can be used to provide DL assessments 116 based on new data 114.
[0071] In some aspects, the training data 102 can include input data 103, such as image data taken from a driver's perspective, GPS data (e.g., geo-location information associated with the image data), and lane line map data (e.g., lane line map geo-location data associated with the image data). The training data 102 can also include road parameters ground truth data corresponding to the input data 103. The input data 103 and the output data 105 are used during the DL model training 108 to train the DL model 110. In this regard, the trained DL model 110 receives new data 114 (e.g., images taken from a driver's perspective using a computing device such as a smartphone), detects the lane lines, and extracts road parameters information (e.g., lane parameters information and lane contextual information) for roads it has never seen before, using the new data 114. The extracted road parameters information (e.g., output as the DL assessments 116) is used to provide LKAS-related functions (such as LDW/RDW notifications and lane centering) as well as lane-level vehicle localization information as discussed herein.
[0072] Deep learning is part of machine learning, a field of study that gives computers the ability to learn without being explicitly programmed.
Machine learning explores the study and construction of algorithms, also referred to herein as tools, that may learn from existing data, may correlate data, and may make predictions about new data. Such machine learning tools operate by building a model from example training data (e.g., the training data 102) in order to make data-driven predictions or decisions expressed as outputs or assessments 116. Although example embodiments are presented with respect to a few machine-learning tools (e.g., a deep learning architecture), the principles presented herein may be applied to other machine learning tools.
[0073] In some example embodiments, different machine learning tools may be used. For example, Logistic Regression, Naive-Bayes, Random Forest, neural networks, matrix factorization, and Support Vector Machines tools may be used during the deep learning model training 108 (e.g., for correlating the training data 102).
[0074] Two common types of problems in machine learning are classification problems and regression problems. Classification problems, also referred to as categorization problems, aim at classifying items into one of several category values (for example, is this object an apple or an orange?). Regression algorithms aim at quantifying some items (for example, by providing a value that is a real number). In some embodiments, the DLA 106 can be configured to use machine learning algorithms that utilize the training data 102 to find correlations among identified features that affect the outcome.
[0075] The machine learning algorithms utilize features from the training data 102 for analyzing the new data 114 to generate the assessments 116. The features include individual measurable properties of a phenomenon being observed and used for training the machine learning model. The concept of a feature is related to that of an explanatory variable used in statistical techniques such as linear regression. Choosing informative, discriminating, and independent features is important for the effective operation of the MLP in pattern recognition, classification, and regression. Features may be of different types, such as numeric features, strings, and graphs. In some aspects, training data can be of different types, with the features being numeric for use by a computing device.
[0076] In some aspects, the features used during the DL model training
108 can include the input data 103, the output data 105, as well as one or more of the following: sensor data from a plurality of sensors (e.g., audio, motion, GPS, image sensors); actuator event data from a plurality of actuators (e.g., wireless switches or other actuators); external information from a plurality of external sources; timer data associated with the sensor state data (e.g., time sensor data is obtained), the actuator event data, or the external information source data; user communications information; user data; user behavior data, and so forth.
[0077] The machine learning algorithms utilize the training data 102 to find correlations among the identified features that affect the outcome of assessments 116. In some example embodiments, the training data 102 includes labeled data, which is known data for one or more identified features and one or more outcomes. With the training data 102 (which can include identified features), the DL model is trained using the DL model training 108 within the DLA 106. The result of the training is the trained DL model 110. When the DL model 110 is used to perform an assessment, new data 114 is provided as an input to the trained DL model 110, and the DL model 110 generates the assessments 116 as an output. For example, the DLA 106 can be deployed at a mobile device and the new data 114 can include LR images (e.g., frames from an LR video such as a real-time LR video feed). The DLA 106 performs UR functions (e.g., increasing image resolution while reducing noise, removing blocking artifacts, and boosting image contrast) on the LR images to generate HR output images in real time.
[0078] FIG. 2 is a diagram 200 illustrating the generation of a trained DL model 206 using a neural network model 204 trained within a DLA 106, according to some example embodiments. Referring to FIG. 2, source data 202 is analyzed by a neural network model 204 (or another type of machine learning algorithm or technique) to generate the trained DL model 206 (which can be the same as the trained DL model 110). The source data 202 includes a training set of data, such as the training data 102 (which includes the input data 103 and the output data 105), including data identified by one or more features. As used herein, the terms "neural network" and "neural network model" are interchangeable.
[0079] Machine learning techniques train models to accurately make predictions on data fed into the models (e.g., what was said by a user in a given utterance; whether a noun is a person, place, or thing; what the weather will be like tomorrow). During a learning phase, the models are developed against a training dataset of inputs to optimize the models to correctly predict the target output for a given input. Generally, the learning phase may be supervised, semi-supervised, or unsupervised; indicating a decreasing level to which the "correct" outputs are provided in correspondence to the training inputs. In a supervised learning phase, all of the target outputs are provided to the model and the model is directed to develop a general rule or algorithm that maps the input to the output. In contrast, in an unsupervised learning phase, the desired output is not provided for the inputs so that the model may develop its own rules to discover relationships within the training dataset. In a semi-supervised learning phase, an incompletely labeled training set is provided, with some of the outputs known and some unknown for the training dataset.
[0080] Models may be run against a training dataset for several epochs, in which the training dataset is repeatedly fed into the model to refine its results (i.e., the entire dataset is processed during an epoch). During an iteration, the model (e.g., a neural network model or another type of machine learning model) is run against a mini-batch (or a portion) of the entire dataset. In a supervised learning phase, a model is developed to predict the target output for a given set of inputs (e.g., source data 202) and is evaluated over several epochs to more reliably provide the output that is specified as corresponding to the given input for the greatest number of inputs for the training dataset. In another example, for an unsupervised learning phase, a model is developed to cluster the dataset into n groups and is evaluated over several epochs as to how consistently it places a given input into a given group and how reliably it produces the n desired clusters across each epoch.
[0081] Once an epoch is run, the models are evaluated, and the values of their variables (e.g., weights, biases, or other parameters) are adjusted to attempt to better refine the model in an iterative fashion. As used herein, the term “weights” is used to refer to the parameters used by a machine learning model. During a backward computation, a model can output gradients, which can be used for updating weights associated with a forward computation.
[0082] In various aspects, the evaluations are biased against false negatives, biased against false positives, or evenly biased with respect to the overall accuracy of the model. The values may be adjusted in several ways depending on the machine learning technique used. For example, in a genetic or evolutionary algorithm, the values for the models that are most successful in predicting the desired outputs are used to develop values for models to use during the subsequent epoch, which may include random variation/mutation to provide additional data points. One of ordinary skill in the art will be familiar with several other machine learning algorithms that may be applied with the present disclosure, including linear regression, random forests, decision tree learning, neural networks, deep neural networks, etc.
[0083] Each model develops a rule or algorithm over several epochs by varying the values of one or more variables affecting the inputs to more closely map to the desired result, but as the training dataset may be varied, and is preferably very large, perfect accuracy and precision may not be achievable. The number of epochs that make up a learning phase, therefore, may be set as a given number of trials or a fixed time/computing budget, or the learning phase may be terminated before that number/budget is reached when the accuracy of a given model is high enough or low enough or an accuracy plateau has been reached. For example, if the training phase is designed to run n epochs and produce a model with at least 95% accuracy, and such a model is produced before the nth epoch, the learning phase may end early and use the produced model satisfying the end-goal accuracy threshold. Similarly, if a given model's accuracy remains near a random chance threshold (e.g., the model is only 55% accurate in determining true/false outputs for given inputs), the learning phase for that model may be terminated early, although other models in the learning phase may continue training. Similarly, when a given model continues to provide similar accuracy or to vacillate in its results across multiple epochs - having reached a performance plateau - the learning phase for the given model may terminate before the epoch number/computing budget is reached.
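As a non-limiting illustration of the early-termination criteria described above, the following Python sketch runs a learning phase for at most a fixed number of epochs and stops early on an end-goal accuracy, a near-random-chance accuracy, or a performance plateau. The function names, thresholds, and plateau window are assumptions, not values taken from the disclosure.

```python
def run_learning_phase(train_one_epoch, evaluate, max_epochs=100,
                       target_acc=0.95, chance_acc=0.55,
                       plateau_epochs=5, plateau_delta=0.001):
    """Train until converged, abandoned, plateaued, or out of budget."""
    history = []
    for epoch in range(max_epochs):
        train_one_epoch()                      # one pass over the training dataset
        acc = evaluate()                       # accuracy after this epoch
        history.append(acc)

        if acc >= target_acc:                  # end-goal accuracy reached early
            return "converged", epoch, acc
        if epoch >= plateau_epochs and acc <= chance_acc:
            return "abandoned", epoch, acc     # barely better than random chance
        recent = history[-plateau_epochs:]     # performance plateau check
        if len(history) > plateau_epochs and max(recent) - min(recent) < plateau_delta:
            return "plateaued", epoch, acc
    return "budget_exhausted", max_epochs - 1, history[-1]
```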
[0084] Once the learning phase is complete, the models are finalized. In some example embodiments, models that are finalized are evaluated against testing criteria. In a first example, a testing dataset that includes known outputs for its inputs is fed into the finalized models to determine an accuracy of the model in handling data that it has not been trained on. In a second example, a false positive rate or false negative rate may be used to evaluate the models after finalization. In a third example, a delineation between data clusters in each model is used to select a model that produces the clearest bounds for its clusters of data.
[0085] In some example embodiments, the DL model 206 is trained by a neural network model 204 (e.g., deep learning, deep convolutional, or recurrent neural network), which comprises a series of "neurons," such as Long Short Term Memory (LSTM) nodes, arranged into a network. A neuron is an architectural element used in data processing and artificial intelligence, particularly machine learning, that includes memory that may determine when to "remember" and when to "forget" values held in that memory based on the weights of inputs provided to the given neuron. Each of the neurons used herein is configured to accept a predefined number of inputs from other neurons in the network to provide relational and sub-relational outputs for the content of the frames being analyzed. Individual neurons may be chained together or organized into tree structures in various configurations of neural networks to provide interactions and relationship learning modeling for how each of the frames in an utterance relates to one another.
[0086] For example, an LSTM serving as a neuron includes several gates to handle input vectors (e.g., phonemes from an utterance), a memory cell, and an output vector (e.g., contextual representation). The input gate and output gate control the information flowing into and out of the memory cell, respectively, whereas forget gates optionally remove information from the memory cell based on the inputs from linked cells earlier in the neural network. Weights and bias vectors for the various gates are adjusted over the course of a training phase, and once the training phase is complete, those weights and biases are finalized for normal operation. One of skill in the art will appreciate that neurons and neural networks may be constructed programmatically (e.g., via software instructions) or via specialized hardware linking each neuron to form the neural network.
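As a rough sketch of the gate mechanics described above (not the specific LSTM nodes of the disclosure), the following Python snippet computes one step of a generic LSTM cell. The dimensions, gate ordering, and random example weights are assumptions chosen for exposition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x: input vector; h_prev: previous output; c_prev: previous memory cell.
    W, U, b: weights/biases for gates 'i' (input), 'f' (forget), 'o' (output)
    and the candidate memory content 'g'.
    """
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])   # input gate: what flows in
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])   # forget gate: what to remove
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])   # output gate: what flows out
    g = np.tanh(W['g'] @ x + U['g'] @ h_prev + b['g'])   # candidate memory content

    c = f * c_prev + i * g          # memory cell keeps or forgets values over time
    h = o * np.tanh(c)              # output vector (e.g., contextual representation)
    return h, c

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
dim_x, dim_h = 4, 3
W = {k: rng.standard_normal((dim_h, dim_x)) for k in "ifog"}
U = {k: rng.standard_normal((dim_h, dim_h)) for k in "ifog"}
b = {k: np.zeros(dim_h) for k in "ifog"}
h, c = lstm_cell_step(rng.standard_normal(dim_x), np.zeros(dim_h), np.zeros(dim_h), W, U, b)
print(h, c)
```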
[0087] Neural networks utilize features for analyzing the data to generate assessments (e.g., recognize units of speech). A feature is an individual measurable property of a phenomenon being observed. The concept of the feature is related to that of an explanatory variable used in statistical techniques such as linear regression. Further, deep features represent the output of nodes in hidden layers of the deep neural network.
[0088] A neural network (e.g., the neural network model 204), sometimes referred to as an artificial neural network (ANN) or a neural network model, is a computing system based on consideration of biological neural networks of animal brains. Such systems progressively improve performance, which is referred to as learning, to perform tasks, typically without task-specific programming. For example, in image recognition, a neural network may be taught to identify images that contain an object by analyzing example images that have been tagged with a name for the object and, having learned the object and name, may use the analytic results to identify the object in untagged images. A neural network is based on a collection of connected units called neurons, where each connection between neurons, called a synapse, can transmit a unidirectional signal with an activating strength that varies with the strength of the connection. The receiving neuron can activate and propagate a signal to downstream neurons connected to it, typically based on whether the combined incoming signals, which are from potentially many transmitting neurons, are of sufficient strength, where strength is a parameter.
[0089] A deep neural network (DNN), also referred to as a convolutional neural network (CNN), is a stacked neural network, which is composed of multiple convolutional layers. The layers are composed of nodes, which are locations where computation occurs, loosely patterned on a neuron in the human brain, which fires when it encounters sufficient stimuli. A node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, which assigns significance to inputs for the task the algorithm is trying to learn. These input-weight products are summed, and the sum is passed through what is called a node’s activation function, to determine whether and to what extent that signal progresses further through the network to affect the ultimate outcome. A DNN uses a cascade of many layers of non-linear processing units for feature extraction and transformation. Each successive layer uses the output from the previous layer as input. Higher-level features are derived from lower-level features to form a hierarchical representation. The layers following the input layer may be convolution layers that produce feature maps that are filtering results of the inputs and are used by the next convolution layer.
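By way of a minimal sketch only, the following Python snippet (using PyTorch) stacks two convolutional layers in which each node computes a weighted sum of its inputs plus a bias and passes it through an activation function, and each layer's feature maps feed the next layer. The layer sizes and input resolution are arbitrary illustrations, not the network described in this disclosure.

```python
import torch
from torch import nn

layers = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level feature maps
    nn.ReLU(),                                    # activation decides what signal progresses
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level features derived from lower ones
    nn.ReLU(),
)

frame = torch.randn(1, 3, 224, 224)               # one RGB image frame
feature_maps = layers(frame)                      # filtering results used by the next layer
print(feature_maps.shape)                         # torch.Size([1, 32, 224, 224])
```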
[0090] In training of a DNN architecture, a regression, which is structured as a set of statistical processes for estimating the relationships among variables, can include minimization of a cost function. The cost function may be implemented as a function to return a number representing how well the neural network performed in mapping training examples to correct output. In training, if the cost function value is not within a predetermined range based on the known training images, backpropagation is used, where backpropagation is a common method of training artificial neural networks that is used with an optimization method such as the stochastic gradient descent (SGD) method.
[0091] Use of backpropagation can include propagation and weight updates. When an input is presented to the neural network, it is propagated forward through the neural network, layer by layer, until it reaches the output layer. The output of the neural network is then compared to the desired output, using the cost function, and an error value is calculated for each of the nodes in the output layer. The error values are propagated backward, starting from the output, until each node has an associated error value that roughly represents its contribution to the original output. Backpropagation can use these error values to calculate the gradient of the cost function with respect to the weights in the neural network. The calculated gradient is fed to the selected optimization method to update the weights to attempt to minimize the cost function.
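The following Python sketch (using PyTorch autograd) illustrates the propagation and weight-update cycle described above: forward pass, cost computation, backward propagation of error to obtain gradients of the cost with respect to the weights, and a weight update by an SGD optimizer. The data, layer sizes, and learning rate are hypothetical.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
cost_fn = nn.MSELoss()                       # how well outputs map to desired outputs
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(32, 8)                  # a mini-batch of training examples
targets = torch.randn(32, 1)                 # corresponding desired outputs

output = model(inputs)                       # propagate forward, layer by layer
cost = cost_fn(output, targets)              # compare output to the desired output

optimizer.zero_grad()
cost.backward()                              # propagate error values backward (gradients)
optimizer.step()                             # update weights to attempt to minimize the cost
```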
[0092] Even though the training architecture 106 is referred to as a deep learning architecture using a neural network model (and the model that is trained is referred to as a trained deep learning model, such as the trained DL models 110 and 206), the disclosure is not limited in this regard and other types of machine learning training architectures may also be used for model training, using the techniques disclosed herein.
[0093] FIG. 3 illustrates a diagram 300 of various SAE autonomy levels and using techniques disclosed herein to enable a lower SAE autonomy level vehicle to perform higher SAE autonomy level functions using a computing device, according to some example embodiments. Referring to FIG. 3, diagram 300 illustrates vehicles 302, 304, 306, ..., 308 associated with SAE autonomy levels 0, 1, 2, ..., 5, respectively. Vehicle 304 performs level 1 functions such as providing a lane departure warning 310. Vehicle 306 performs level 2 functions such as lane centering 316. Vehicle 308 performs level 5 functions such as self-driving 322.
[0094] As illustrated in FIG. 3, the lane departure warning 310 can be based on road parameters information such as lane line information 314 detected by a high precision LKAS camera 312. The lane centering 316 and the self-driving 322 functions use high precision localization 320, which is performed using a high precision GPS 318 and a high precision radar or LIDAR, such as LIDAR 324, that are present on higher-level vehicles.
[0095] In some aspects, a computing device 330 (such as a smartphone or another consumer device) can be used within the level 0 vehicle 302 to perform technique 326 (e.g., for determining road parameters estimation data) and technique 328 (e.g., for determining lane level vehicle localization) as discussed herein. In this regard, by using the computing device 330 to perform techniques 326 and 328, the level 0 vehicle 302 can be upgraded to a higher level vehicle 332 without the need of costly high precision equipment such as the LKAS camera 312, the high precision GPS 318, and the LIDAR 324.
[0096] FIG. 4 illustrates various lane parameters information which can be used in connection with some example embodiments. Referring to FIG. 4, diagram 400 illustrates a vehicle 406 located on a road that includes lanes 402 and 404. In some aspects, a front-facing camera mounted inside the vehicle 406 (e.g., an LKAS camera or a camera of a computing device such as a smartphone) can be used to take multiple images of the road and extract lane parameters information, including one or more of the following: a lane marker heading angle (C1), a lane marker offset (C0), a lane marker curvature (C2), a lane marker curvature derivative (C3), and a lane marker type. In FIG. 4, C2 and C3 are zero, because the road is straight and has no curvature.
[0097] In some aspects, the front-facing camera mounted inside the vehicle 406 can use techniques disclosed herein to extract the lane parameters information. More specifically, the camera can detect lane markers (or lane lines) in each image, and each lane marker can be modeled with a 3rd degree polynomial model that describes a function X(Z), where Z is the physical longitudinal distance from the camera and X is the physical lateral offset from the camera. The following formula can be used to derive the physical lateral offset X from the camera: X = C3·Z³ + C2·Z² + C1·Z + C0, where C0 is the lane marker offset 408 at Z=0 (e.g., the distance, in meters, from the center of the vehicle 406 to the left lane line or to the right lane line at Z=0 while the vehicle is in lane 1 402 or lane 2 404), C1 is the lane marker heading angle at Z=0 (e.g., the angle between the direction of the road 412 and the direction of the vehicle 410), C2 is the lane marker curvature at Z=0, and C3 is the lane marker curvature derivative at Z=0. The lane marker curvature C2 ≈ 0 and the lane marker curvature derivative C3 ≈ 0 for straight lanes.
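The following Python sketch evaluates the 3rd degree polynomial model above at several longitudinal distances. The numeric coefficient values are illustrative only and are not taken from the disclosure.

```python
import numpy as np

def lateral_offset(z, c0, c1, c2, c3):
    """Lateral offset X(Z) of a lane marker at longitudinal distance Z from the camera."""
    return c3 * z**3 + c2 * z**2 + c1 * z + c0

# Illustrative example: a gently curving lane line that starts 1.8 m to the
# left of the camera with a small heading angle.
z = np.linspace(0.0, 50.0, 6)                        # distances ahead, in meters
x = lateral_offset(z, c0=1.8, c1=0.02, c2=1e-4, c3=0.0)
print(np.round(x, 3))

# For a straight lane, C2 ≈ 0 and C3 ≈ 0, so X(Z) reduces to C1·Z + C0.
```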
[0098] FIG. 5 illustrates lane contextual information which can be used in connection with some example embodiments. Referring to FIG. 5, diagram 500 illustrates vehicle 506 moving on a road with two lanes (lane 1 502 and lane 2 504). In some aspects, lane contextual information determined in accordance with techniques disclosed herein can include one or more of the following: a relative road distance (Δs) 512 indicating a distance traveled by the moving vehicle 506 between two consecutive image frames as taken by an onboard camera (e.g., the distance traveled between vehicle positions 508 and 510); a lane number indication of a road lane the moving vehicle is traveling on (e.g., the egoLaneNo information indicating lane 1 502 as the lane the vehicle 506 is traveling on); and a number of lanes associated with the road the moving vehicle is traveling on (e.g., the noOfLanes is 2 for the example illustrated in FIG. 5).
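Purely as an illustrative container (the data structure below is an assumption for exposition, not part of the disclosure), the lane contextual information for the example of FIG. 5 could be represented as follows:

```python
from dataclasses import dataclass

@dataclass
class LaneContextualInfo:
    ego_lane_no: int                 # lane number the vehicle is traveling on (egoLaneNo)
    no_of_lanes: int                 # number of lanes on the road (noOfLanes)
    relative_road_distance_m: float  # Δs between two consecutive image frames, in meters

ctx = LaneContextualInfo(ego_lane_no=1, no_of_lanes=2, relative_road_distance_m=1.4)
print(ctx)
```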
[0099] FIG. 6 illustrates an LKAS module 600 for estimating lane parameters information and lane contextual information, according to some example embodiments. A module can comprise one or both of hardware or software that has been designed to perform a function or functions. Referring to FIG. 6, the LKAS module 600 can include a DNN 604 which can be trained to generate road parameters estimation data 610 using a plurality of video frames 602, which can be taken by an in-vehicle camera such as a camera on a computing device of a user of the vehicle. In some aspects, the DNN 604 can use a neural network model such as the model 204 of FIG. 2, which has been trained using input data 103 and output data 105 so that it can estimate road parameters based on image data from the driver's perspective (e.g., video frames taken from a front-facing camera mounted inside a moving vehicle). During the training, a database is used which includes the input data 103 (e.g., a sequence of images of the road from the driver's perspective recorded while driving and the corresponding consumer-grade GPS data, industrial-grade GPS data, and lane line map of the route) as well as output data 105 (e.g., ground truth for the road parameters, which can be obtained from the lane line map and the high-precision geolocation provided by the industrial-grade GPS data). The training dataset (e.g., data 103 and 105) further includes training images with a wide variety of road geometries as well as different viewpoints from the driver's perspective, taken during varying weather conditions and times of day.
[0100] In some aspects, the DNN 604 is also referred to as a road parameters estimator network (RPEN) and includes a road parameters estimator sub-network (RPESN) 702 and at least one LSTM node 806, as illustrated in connection with FIG. 7 and FIG. 8. The trained DNN 604 is capable of generalizing, so it learns to detect the lane lines and extract the lane information for roads it has never seen before.
[0101] In operation, the LKAS module 600 receives video frames 602
(e.g., via a computing device camera of a level 0 vehicle driver), which are communicated as input to the DNN 604. The DNN 604, using the RPESN 702 and the LSTM 806, generates road parameters estimation data 610, which can include lane parameters information 608 and lane contextual information 606. The lane contextual information 606 can include relative road distance (or odometry information, such as relative road distance 512 in FIG. 5), egoLaneNo information, and noOfLanes information as discussed in connection with FIG. 5. The lane parameters information 608 can include lane marker heading angle information, lane marker offset information, lane marker curvature information, lane marker curvature derivative information, and lane marker type, as discussed in connection with FIG. 4. In this regard, the trained DNN 604 uses images received from a computing device of the level 0 vehicle user as input to produce lane contextual information and lane parameters information as output. The road parameters estimation data 610 can be used for generating LKAS warnings, including LDW or RDW. In addition to providing sensor capabilities for the LKAS functionalities, the consumer device can also convey the warning to the user via its display, in effect providing an end-to-end solution without assistance from an external component.
[0102] FIG. 7 illustrates a road parameters estimator sub-network
(RPESN) used in the LKAS module of FIG. 6, according to some example embodiments. Referring to FIG. 7, the RPESN 702 includes a pre-trained CNN 706, a pooling layer 708, a first fully connected (FC) layer 710, a spatial LSTM 712, and a second FC layer 714.
[0103] The input to the RPESN 702 is a plurality of video frames 704 captured by the camera of a consumer device (e.g., a computing device such as a smartphone of a driver of a level 0 vehicle), which are first fed to the pre-trained CNN 706. The pre-trained CNN 706 is used as a classification network instead of training a deep neural network from scratch for the following two reasons: (a) training convolutional networks is dependent on very large labeled image datasets, which increases implementation costs and processing time; and (b) unlike classification problems, where each output label is covered by at least one training sample, the output space in regression is continuous and infinite in theory. Therefore, training a convolutional neural network from scratch can be omitted and a pre-trained classification network can be used instead, which has been trained on large image datasets. Following the convolutional layers of the pre-trained CNN 706, an average pooling layer 708 collects the information of each feature channel for the entire image. Following the pooling layer 708, the first FC layer 710 learns the correlation among the extracted features. The output of the first FC layer 710 can be regarded as a sequence, which is fed to the spatial LSTM 712, whose memory block performs dimensionality reduction. The spatial LSTM 712 is used to assess the spatial relationship of the features based on the sequence output from the first FC layer 710. In this regard, the spatial LSTM 712 memory units identify the most useful feature correlations for the task of road parameters estimation. The output of the spatial LSTM 712 is then passed to the second FC layer 714, which generates the road parameters estimation data 716, which includes lane contextual information.
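The following Python sketch (using PyTorch) mirrors the RPESN pipeline just described: a pre-trained classification CNN, an average pooling layer, a first FC layer, a spatial LSTM, and a second FC layer. The ResNet-18 backbone, layer sizes, and the number of output road parameters are assumptions for exposition; the disclosure does not specify them.

```python
import torch
from torch import nn
import torchvision

class RPESNSketch(nn.Module):
    """Rough sketch of the FIG. 7 pipeline: pre-trained CNN -> pooling -> FC -> spatial LSTM -> FC."""

    def __init__(self, num_road_params=8, hidden=256):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")  # pre-trained classification CNN
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])        # keep only the convolutional layers
        self.pool = nn.AdaptiveAvgPool2d(1)                              # collect each feature channel
        self.fc1 = nn.Linear(512, hidden)                                # learn correlations among features
        self.spatial_lstm = nn.LSTM(input_size=1, hidden_size=64,
                                    batch_first=True)                    # dimensionality reduction over the sequence
        self.fc2 = nn.Linear(64, num_road_params)                        # road parameters estimation data

    def forward(self, frames):                          # frames: (batch, 3, H, W)
        feats = self.pool(self.cnn(frames)).flatten(1)  # (batch, 512)
        seq = self.fc1(feats).unsqueeze(-1)             # treat the FC output as a sequence
        out, _ = self.spatial_lstm(seq)                 # (batch, hidden, 64)
        return self.fc2(out[:, -1, :])                  # (batch, num_road_params)

print(RPESNSketch()(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 8])
```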
[0104] FIG. 8 illustrates a road parameters estimator network (RPEN)
800 used in the LKAS module of FIG. 6, according to some example embodiments. Referring to FIG. 8, the RPEN 800 receives as input a plurality of images 802, which are processed by the RPESN 804. In some aspects, the road parameters estimation data generated by the RPESN 804 is further processed by temporal LSTMs 806 to generate modified (or final) road parameters estimation data 808.
[0105] For an instant i, the RPEN 800 processes a fixed-length visual input of size k frames with the RPESN 804 (which is the same as the RPESN 702), whose outputs are fed into a stack of recurrent sequence models (temporal LSTMs 806). The temporal LSTMs 806 generate the final road parameters estimation data 808, including lane contextual information (such as relative road distance information) and lane parameters information. In some aspects, the relative road distance information within the lane contextual information is also referred to as RPEN odometry information.
[0106] Unlike a traditional feed-forward neural network, which has no notion of order in time and considers only the current example it has been exposed to, the temporal LSTMs 806 can use their internal state (memory) to recognize patterns in the time series data (e.g., the sequential image frames 802) and generate the modified (or final) road parameters estimation data 808. As illustrated in FIG. 8, the temporal LSTMs 806 form a network with loops where the output from a previous time is fed as the input to the current time, allowing information of the past to persist. The temporal LSTMs 806 are configured to model the temporal dynamics and dependencies of the road parameters estimation data. For example, a temporal constraint binds the current state of the road parameters to its previous state, and significant changes (e.g., above a pre-determined threshold) in the lane marker offset information (C0) can trigger a change in the egoLaneNo information. Changes in the noOfLanes information are usually accompanied by changes in the lane line type (e.g., dashed to solid) or preceded by a road merge sign.
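As a non-limiting sketch of the FIG. 8 arrangement, the following Python snippet (using PyTorch) applies a per-frame RPESN to a window of k frames and feeds the resulting sequence to stacked temporal LSTMs that emit the final road parameters estimation data. The number of LSTM layers, feature sizes, and output dimensions are assumptions for exposition.

```python
import torch
from torch import nn

class RPENSketch(nn.Module):
    """Sketch only: per-frame RPESN features -> stacked temporal LSTMs -> final road parameters."""

    def __init__(self, rpesn, per_frame_dim=8, hidden=128, num_outputs=8):
        super().__init__()
        self.rpesn = rpesn                                   # e.g., the RPESNSketch above
        self.temporal_lstm = nn.LSTM(per_frame_dim, hidden,
                                     num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, num_outputs)

    def forward(self, frames):                               # frames: (batch, k, 3, H, W)
        b, k = frames.shape[:2]
        per_frame = self.rpesn(frames.flatten(0, 1))         # run the RPESN on each frame
        per_frame = per_frame.view(b, k, -1)                 # restore the time dimension
        out, _ = self.temporal_lstm(per_frame)               # information of past frames persists
        return self.head(out[:, -1, :])                      # final road parameters for the latest frame
```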
[0107] FIG. 9A - FIG. 9D illustrate multiple image frames used for determining RPEN odometry as part of the lane contextual information, according to some example embodiments. Referring to FIG. 9A - FIG. 9D, there are illustrated sequential image frames 900A, 900B, 900C, and 900D, which are received for processing by the RPEN 800. More specifically, the pre-trained CNN 706 within the RPESN 702 is trained to model the shape and contour of the lane lines (e.g., lane lines 906) and how the view of the lane lines changes with the change in distance from them. These high-level features can be reused to estimate the RPEN odometry information (e.g., relativeRoadDistance information or Δs 902) associated with the moving vehicle 904. High dimensional representation of the dashed lane lines 906 on the road can form a smoothly varying injective (one-to-one) function of Δs 902. Dashed lane lines 906, being a constant presence in the road, are therefore a good candidate for the RPEN 800 to model in order to regress Δs 902. Dimensionality reduction capability of the spatial LSTM enables the RPEN 800 to identify the dashed lane lines 906 as the most useful feature correlations for the task of regressing the relativeRoadDistance information 902 directly from the images 900A-900D. Since the estimation of the relativeRoadDistance information 902 depends on features from the previous frame, the temporal LSTMs 806 preserve this information about the previous frame. The RPEN 800 can, therefore, estimate the relativeRoadDistance information 902 by just observing the road.
[0108] Vision-based relocalization using monocular images (e.g., continuously estimating the pose (3D position and orientation) of a moving camera) is a fundamental problem in computer vision and robotics. With the advent of autonomous vehicles, localizing a vehicle on the road using an inexpensive perceptive sensor such as a monocular camera has recently become a very active research topic. There are generally two approaches to solving continuous pose estimation of a moving camera: (1) through position tracking (visual odometry) and (2) through scene recognition (vision-based relocalization).
[0109] FIG. 10 illustrates examples 1000A and 1000B of continuous pose estimation via position tracking and place recognition, according to some example embodiments. Referring to FIG. 10, example 1000A illustrates continuous pose estimation via position tracking (also known as visual odometry). In the example 1000A, a robot 1002 moves from position 1004A to position 1006A in a landscape where the map is unknown, but the initial robot pose at position 1004A is known. As illustrated in FIG. 10, the visual odometry localization technique is locally accurate but often results in a drift 1008A over time.
[0110] Referring to FIG. 10, example 1000B illustrates continuous pose estimation via place recognition (also known as vision-based relocalization). In example 1000B, the robot 1002 moves from position 1004B to position 1006B in a landscape where the map is known but the initial robot pose at position 1004B is unknown. As illustrated in FIG. 10, the vision-based relocalization technique is associated with noisy predictions with minimal to no drift 1008B.
[0111] As mentioned above, in the case of pose estimation via position tracking (example 1000A), the initial pose of the camera is known and the map of the area of operation is not necessarily needed to be known. In example 1000A, the current pose is updated by integrating the change in a pose estimate provided by the perceptive sensor over time. For example, visual odometry algorithms can be used to compute the relative pose between consecutive frames based on sparse feature detection. Pose estimation is locally accurate but often drifts over time. Another limitation of visual odometry is that it does not work well in an environment where it is not able to extract features, as illustrated by the example feature-sparse environment 1100 in FIG. 11 (which can be a feature-sparse rural road environment). The RPEN 800, on the other hand, is trained to detect the lane lines. It learns the shape and contour of the lane lines and also learns how their appearance changes with distance from them. In this regard, the RPEN-based odometry works where visual odometry fails.
[0112] In the case of pose estimation via scene recognition, which is popularly known as vision-based relocalization, the initial pose of the camera is unknown, whereas the map of the area of operation is known. Based on the observed surrounding scene, a coarse estimate of the camera pose can be made by classifying the scene among a limited number of discrete locations. Here, the pose prediction is noisy, but it is drift-free. Modern vehicle localization systems can exploit the complementary, coarse but drift-free prediction from a vision-based relocalization algorithm and the locally smooth but drift-prone pose estimation from a visual odometry algorithm by fusing them.
[0113] There are generally two approaches to vision-based relocalization
(example 1000B): point feature-based relocalization and machine learning based relocalization. In the point feature-based approach, given a 3D model and a query image, the first step is to establish a set of 2D-3D matches that link pixel positions in the query image and 3D points in this 3D model. Next, the camera pose is estimated by applying an n-point-pose solver that computes the camera pose from n 2D-3D matches inside a random sample consensus (RANSAC) loop to handle outliers. The classical way to build the 3D model is by using structure from motion. The points are essentially reconstructed by triangulating local feature matches, which creates a many-to-one mapping from feature descriptors to 3D points. One of the challenges this approach faces is inefficient scaling with scene size. The computational complexity of matching grows as the size of the scene increases. In addition, the 2D-3D matches become less unique as the chances of finding multiple 3D points with similar local appearances increase. Another challenge is that this method requires a good initial pose estimate, and processing can be challenging when the scene has inconsistent illumination, motion blur, texture-less surfaces, and lack of overlap between images. This can be concerning in vehicle localization applications, especially when the vehicle travels fast on a highway (which is usually a featureless area).
[0114] Machine learning based localization (especially deep learning) has recently shown great promise in vision-based localization using monocular images. Some localization techniques (e.g., PoseNet) use a deep CNN to regress the camera's continuous pose directly from RGB images. However, the accuracy of PoseNet is far from what is required in practical applications, and there is also a difficulty in jointly learning position and orientation in the same model. Some improvements to PoseNet include a method called PoseNet17.
This deep learning approach tries to learn and infer the weighting from the scene geometry. In other approaches, such as MapNet, sensory inputs like visual odometry and GPS are fused in addition to images for camera localization.
Other approaches introduce a CNN and LSTM architecture to regress geo-location directly (e.g., via a geospatial deep NN), using monocular camera images and coarse GPS data from a phone to predict geo-location with nearly lane-level accuracy. There are several advantages of the machine learning approach over the traditional approach which make it suitable for real-time applications such as vehicle relocalization: (a) the deep learning approach does not need an initial pose estimate; (b) the machine learning approach is fairly robust to environmental challenges such as lighting, weather, dynamic objects, and texture-less scenes (this is because it can learn features which are richer than what might be obtained from point features); (c) deep learning approaches have a faster inference capability; and (d) the machine learning approach to localization scales efficiently with scene size since it is able to learn a compact representation of the map with a deep learning model. However, the accuracy of the machine learning approach is not as good as the traditional feature-based method, particularly for indoor environments. Vision-based relocalization relies on landmark recognition. Therefore, it performs well in urban areas but performs poorly in suburban or rural areas. Unfortunately, none of the known localization techniques can provide lane-level localization even in urban areas.
[0115] Lane-level accuracy for vehicle localization is highly desirable since it opens up a number of applications for which the consumer device can provide an end-to-end solution without assistance from an external component. For example, the following can be achieved with lane-level pose awareness and localization: (a) the estimated "Time of Arrival" (TOA) prediction made by an automotive navigation system in the consumer device can be more accurate if it is aware of the lane the car is currently traveling in; and (b) a warning can be provided beforehand if the lane on which the car is currently traveling is soon going to merge or become congested due to an accident or a work zone.
[0116] FIG. 12 illustrates a block diagram of a localization module 1200 for high precision localization, according to some example embodiments.
Referring to FIG. 12, the localization module 1200 includes RPEN 1202 (which can be the same as the RPEN 800 of FIG. 8), a neural network 1204, and a sensor fusion module 1206.
[0117] The sensor fusion module 1206 can receive as inputs RPEN odometry information 1216 and road parameters estimation data 1218 from the RPEN 1202, latitude and longitude information 1220 from the neural network 1204, latitude and longitude information 1222 from the coarse GPS 1212, additional information 1224 such as the distance the vehicle travels between adjacent poses from visual odometry or the IMU input 1208 (which can be extracted from computing device IMU sensors), and lane line map information 1214.
[0118] The neural network 1204 can be an off-the-shelf relocalization algorithm, such as the publicly available Geo-spatial NN, which is deployed along with the RPEN 1202 to provide the spatial constraints of probable vehicle locations. The Geo-spatial NN 1204 predicts lane-level accurate geo-locations 1220 (latitude and longitude) using a sequence of images of the road from the driver's perspective and location data 1222 from a coarse GPS 1212 of a computing device (e.g., a consumer device of the driver, such as a smartphone).
[0119] The RPEN 1202 can predict road parameters using a sequence of images of the road from the driver's perspective obtained from the computing device (e.g., as discussed in connection with FIGS. 3-9D). The outputs of the RPEN 1202 and the Geo-spatial NN 1204, along with the lane line map information 1214 and the coarse GPS location data 1222, are communicated as inputs to a Bayesian filter algorithm within the sensor fusion module 1206, which predicts an accurate (e.g., lane-based) geo-location as output 1226. The sensor input to the Bayesian filtering based sensor fusion algorithm used by the sensor fusion module 1206 includes spatial constraints imposed by the RPEN output 1216 and 1218 as well as the lane line map information 1214, absolute position information from the coarse GPS 1212 and the off-the-shelf relocalization algorithm such as the Geo-spatial NN 1204, and constraints between adjacent poses from the IMU, visual odometry, and the odometry output from the RPEN.
[0120] In some aspects, the Bayesian filtering based vehicle pose estimation algorithm used by the sensor fusion module 1206 includes the following processing functionalities: (a) predicting the geo-location of the vehicle at time t+1 based on previous knowledge of the vehicle's position (at time t) and kinematic equations; (b) given the observations from the sensors, comparing these observations with the prediction; and (c) updating the knowledge about the geolocation of the vehicle based on the predictions and the sensor readings communicated as inputs to the sensor fusion module 1206.
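The following Python sketch illustrates the predict/compare/update cycle described above with a one-dimensional Kalman-style filter. The real module fuses several sensors (RPEN, Geo-spatial NN, coarse GPS, IMU/visual odometry) with a PF or EKF; the 1-D simplification, noise values, and measurement numbers here are assumptions chosen only to show the cycle.

```python
def predict(x, p, delta_s, q=0.5):
    """Predict the position at t+1 from the position at t and a motion input
    (e.g., the RPEN relative road distance Δs); q models process noise."""
    return x + delta_s, p + q

def update(x, p, z, r):
    """Compare a sensor observation z (with variance r) against the prediction
    and update the position estimate and its uncertainty."""
    k = p / (p + r)                 # Kalman gain: how much to trust the observation
    return x + k * (z - x), (1 - k) * p

x, p = 0.0, 4.0                     # initial position estimate and variance
x, p = predict(x, p, delta_s=1.4)   # kinematic prediction using Δs
x, p = update(x, p, z=1.9, r=9.0)   # coarse GPS observation (noisy)
x, p = update(x, p, z=1.5, r=1.0)   # relocalization NN observation (tighter)
print(round(x, 3), round(p, 3))
```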
[0121] FIG. 13 illustrates an example 1300 of high precision localization with lane accuracy using the localization module of FIG. 12, according to some example embodiments.
[0122] Given the lane line map information, the road parameters estimation data (e.g., 1216 and 1218) generated by the RPEN 1202 has a higher degree of accuracy in the lateral direction than in the longitudinal direction of the road. Thus, the RPEN 1202 output provides a lateral constraint on the probability distribution of the vehicle position, as illustrated in FIG. 13. The lane line map can then be used to translate this lateral constraint into the geographic coordinate space (latitude, longitude).
[0123] As illustrated in FIG. 13, coarse GPS accuracy location 1302 is achieved based on the output from the coarse GPS 1212. The Geo-spatial NN 1204 is able to reach nearly lane-level accuracy location 1304 using only images from the driver perspective and the coarse GPS data. The RPEN odometry 1216 (e.g., relative road distance information) along with inputs such as visual odometry and IMU generate constraints between adjacent poses, whereas the off-the-shelf relocalization algorithm, the coarse GPS output, and the RPEN road parameters 1218 impose spatial constraints on the location of the vehicle, resulting in RPEN localization accuracy location 1306. These constraints are exploited by the Bayesian filtering algorithm (such as Particle Filter (PF) or Extended Kalman Filter (EKF)) employed by the sensor fusion module 1206 to accurately localize the vehicle and achieve lane-level accuracy location 1308 by fusing the coarse GPS accuracy location 1302, the Geo-spatial NN accuracy location 1304, and the RPEN accuracy location 1306. In some aspects, accurate lane-level localization of a vehicle is possible without the visual odometry and IMU input 1224.
[0124] FIG. 14 illustrates an example of a suburban environment 1402 and an example urban environment 1404 where the localization module of FIG. 12 can be used, according to some example embodiments.
[0125] Off-the-shelf vision-based relocalization algorithms, such as the Geo-spatial NN 1204, rely on landmark recognition. In this regard, the Geo-spatial NN 1204 performs well in urban areas (such as 1404) but performs poorly in suburban or rural areas (such as 1402). Urban areas can be problematic areas for a GPS, but a GPS performs fairly well in rural or suburban areas. The RPEN relies on the road and its lane markings, which are usually ignored by vision-based relocalization. Therefore, vision-based relocalization, GPS, and the RPEN complement each other and can be used by the localization module 1200 to provide accurate lane-based localization.
[0126] FIG. 15 is a flowchart of a method 1500 for performing autonomy-level functions associated with a moving vehicle, according to some example embodiments. The method 1500 includes operations 1502, 1504, 1506, 1508, and 1510. By way of example and not limitation, the method 1500 may be performed by the LKAS module 1760, which is configured to execute within a mobile device such as device 1700 illustrated in FIG. 17.
[0127] Referring to FIG. 15, at operation 1502, a plurality of features are extracted, using convolutional layers of a convolutional neural network, from a plurality of image frames obtained by a camera within a moving vehicle. For example, a pre-trained CNN (e.g., the pre-trained CNN 706, which is part of the DNN 604) extracts a plurality of features from images obtained by a vehicle-mounted camera (e.g., images 704). At operation 1504, a sequence of correlations among the plurality of features is generated, using a first FC layer of a deep neural network. For example, the first FC layer 710 generates a sequence of correlations among the plurality of features extracted by the pre-trained CNN.
[0128] At operation 1506, a dimensionality reduction of the sequence of correlations is performed, using a spatial long-short-term memory (LSTM) model, to generate a modified sequence of correlations. For example, a spatial LSTM (e.g., LSTM 712) within a road parameter estimator sub-network (RPESN) (e.g., RPESN 702) performs a dimensionality reduction of the sequence of correlations. At operation 1508, road parameters estimation data associated with the moving vehicle is generated, using a second FC layer of the deep neural network, based on the modified sequence of correlations. For example, a second FC layer 714 generates road parameters estimation data 716 associated with a moving vehicle. At operation 1510, a lane keep assist system (LKAS) warning is provided based on the road parameters estimation data. For example, an LDW or an RDW notification can be provided by the computing device used in connection with generating the road parameters estimation data 716.
[0129] The aspects disclosed herein use a consumer device (e.g., 1700) to provide the sensory function required to improve the SAE autonomy level of a vehicle. The RPEN discussed herein estimates novel parameters, such as egoLaneNo, noOfLanes, and relativeRoadDistance.
[0130] Additionally, the RPEN discussed herein provides a constraint between adjacent poses by observing the change in the appearance of lane lines with distance from them. This constraint can be exploited by a Bayesian filtering based pose estimation algorithm to improve the geo-location accuracy and provide lane-based vehicle localization. Techniques disclosed herein are used for improving the geo-location accuracy of the existing positioning system of a vehicle by imposing a spatial constraint on the vehicle's probable location based on the real-time extraction of lane contextual information such as egoLaneNo (the lane number on which the ego vehicle is located), along with lane parameters extracted by a visual sensor, given the lane line map of the road.
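Purely as an illustrative sketch of how an egoLaneNo estimate and a lane line map could impose a lateral spatial constraint on a position estimate, the following Python snippet clamps a lateral estimate to the extent of the detected lane. The map representation (lane lines as lateral offsets from a road reference line) and the clamping approach are assumptions for exposition, not the specific constraint formulation of the disclosure.

```python
def constrain_lateral_position(lateral_estimate_m, ego_lane_no, lane_boundaries_m):
    """Clamp a lateral position estimate to the extent of the lane given by egoLaneNo.

    lane_boundaries_m: offsets of the lane lines from the road reference line,
    ordered from lane 1 outward, e.g. [0.0, 3.7, 7.4] for a two-lane road.
    """
    left = lane_boundaries_m[ego_lane_no - 1]
    right = lane_boundaries_m[ego_lane_no]
    return min(max(lateral_estimate_m, left), right)

# A coarse estimate of 4.1 m would place the vehicle in lane 2; if the RPEN
# indicates the vehicle is in lane 1 (egoLaneNo = 1), the constraint pulls it back.
print(constrain_lateral_position(4.1, ego_lane_no=1, lane_boundaries_m=[0.0, 3.7, 7.4]))
```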
[0131] FIG. 16 is a block diagram illustrating a representative software architecture 1600, which may be used in conjunction with various device hardware described herein, according to some example embodiments. FIG. 16 is merely a non-limiting example of a software architecture 1602 and it will be appreciated that many other architectures may be implemented to facilitate the functionality described herein. The software architecture 1602 may be executing on hardware such as device 1700 of FIG. 17 that includes, among other things, processor 1705, memory 1710, storage 1715 and 1720, and I/O interfaces 1725 and 1730.
[0132] A representative hardware layer 1604 is illustrated and can represent, for example, the device 1700 of FIG. 17. The representative hardware layer 1604 comprises one or more processing units 1606 having associated executable instructions 1608. Executable instructions 1608 represent the executable instructions of the software architecture 1602, including
implementation of the methods, modules and so forth of FIGS. 1-15. Hardware layer 1604 also includes memory or storage modules 1610, which also have executable instructions 1608. Hardware layer 1604 may also comprise other hardware 1612, which represents any other hardware of the hardware layer 1604, such as the other hardware illustrated as part of device 1700.
[0133] In the example architecture of FIG. 16, the software architecture
1602 may be conceptualized as a stack of layers where each layer provides particular functionality. For example, the software architecture 1602 may include layers such as an operating system 1614, libraries 1616,
frameworks/middleware 1618, applications 1620, and presentation layer 1644. Operationally, the applications 1620 or other components within the layers may invoke application programming interface (API) calls 1624 through the software stack and receive a response, returned values, and so forth illustrated as messages 1626 in response to the API calls 1624. The layers illustrated in FIG. 16 are representative in nature and not all software architectures 1602 have all layers. For example, some mobile or special purpose operating systems may not provide frameworks/middleware 1618, while others may provide such a layer. Other software architectures may include additional or different layers.
[0134] The operating system 1614 may manage hardware resources and provide common services. The operating system 1614 may include, for example, a kernel 1628, services 1630, and drivers 1632. The kernel 1628 may act as an abstraction layer between the hardware and the other software layers. For example, kernel 1628 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and so on. The services 1630 may provide other common services for the other software layers. The drivers 1632 may be responsible for controlling or interfacing with the underlying hardware. For instance, the drivers 1632 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and so forth, depending on the hardware configuration.
[0135] The libraries 1616 may provide a common infrastructure that may be utilized by the applications 1620 or other components or layers. The libraries 1616 typically provide functionality that allows other software modules to perform tasks more easily than by interfacing directly with the underlying operating system 1614 functionality (e.g., kernel 1628, services 1630, or drivers 1632). The libraries 1616 may include system libraries 1634 (e.g., a C standard library) that may provide functions such as memory allocation functions, string manipulation functions, mathematical functions, and the like. In addition, the libraries 1616 may include API libraries 1636 such as media libraries (e.g., libraries to support presentation and manipulation of various media formats such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D graphic content on a display), database libraries (e.g., SQLite, which may provide various relational database functions), web libraries (e.g., WebKit, which may provide web browsing functionality), and the like. The libraries 1616 may also include a wide variety of other libraries 1638 to provide many other APIs to the applications 1620 and other software components/modules.
[0136] The frameworks/middleware 1618 (also sometimes referred to as middleware) may provide a higher-level common infrastructure that may be utilized by the applications 1620 or other software components/modules. For example, the frameworks/middleware 1618 may provide various graphical user interface (GUI) functions, high-level resource management, high-level location services, and so forth. The frameworks/middleware 1618 may provide a broad spectrum of other APIs that may be utilized by the applications 1620 or other software components/modules, some of which may be specific to a particular operating system 1614 or platform.
[0137] The applications 1620 include built-in applications 1640, third-party applications 1642, an LKAS module 1660, and a localization module 1662. In some aspects, the LKAS module 1660 may comprise suitable circuitry, logic, interfaces, or code and is configured to perform one or more of the
functionalities associated with the LKAS module 600 of FIG. 6 and discussed in connection with FIGS. 1-15. The localization module 1662 may comprise suitable circuitry, logic, interfaces, or code and is configured to perform one or more of the vehicle localization functionalities associated with the localization module 1200 of FIG. 12 and discussed in connection with FIGS. 10-14.
[0138] Examples of representative built-in applications 1640 may include, but are not limited to, a contacts application, a browser application, a book reader application, a location application, a media application, a messaging application, or a game application. Third-party applications 1642 may include any of the built-in applications 1640 as well as a broad assortment of other applications. In a specific example, the third-party application 1642 (e.g., an application developed using the Android™ or iOS™ software development kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™,
Android™, Windows® Phone, or other mobile operating systems. In this example, the third-party application 1642 may invoke the API calls 1624 provided by the mobile operating system such as operating system 1614 to facilitate functionality described herein.
[0139] The applications 1620 may utilize built-in operating system functions (e.g., kernel 1628, services 1630, and drivers 1632), libraries (e.g., system libraries 1634, API libraries 1636, and other libraries 1638), and frameworks/middleware 1618 to create user interfaces to interact with users of the system. Alternatively, or additionally, in some systems, interactions with a user may occur through a presentation layer, such as presentation layer 1644. In these systems, the application/module "logic" can be separated from the aspects of the application/module that interact with a user.
[0140] Some software architectures utilize virtual machines. In the example of FIG. 16, this is illustrated by a virtual machine 1648. A virtual machine creates a software environment where applications/modules can execute as if they were executing on a hardware machine (such as the device 1700 of FIG. 17, for example). A virtual machine 1648 is hosted by a host operating system (e.g., operating system 1614) and typically, although not always, has a virtual machine monitor 1646, which manages the operation of the virtual machine 1648 as well as the interface with the host operating system (i.e., operating system 1614). A software architecture executes within the virtual machine 1648, such as an operating system 1650, libraries 1652, frameworks/middleware 1654, applications 1656, or a presentation layer 1658. These layers of the software architecture executing within the virtual machine 1648 can be the same as the corresponding layers previously described or may be different.
[0141] FIG. 17 is a block diagram illustrating circuitry for a device that implements algorithms and performs methods, according to some example embodiments. Not all components need be used in various embodiments. For example, clients, servers, and cloud-based network devices may each use a different set of components, or, in the case of servers, for example, larger storage devices.
[0142] One example computing device in the form of a computer 1700 (also referred to as computing device 1700 or computer system 1700) may include a processor 1705, memory 1710, removable storage 1715, non-removable storage 1720, an input interface 1725, an output interface 1730, and a communication interface 1735, all connected by a bus 1740. Although the example computing device is illustrated and described as the computer 1700, the computing device may be in different forms in different embodiments.
[0143] The memory 1710 may include volatile memory 1745 and non-volatile memory 1750 and may store a program 1755. The computing device 1700 may include - or have access to a computing environment that includes - a variety of computer-readable media, such as the volatile memory 1745, the non-volatile memory 1750, the removable storage 1715, and the non-removable storage 1720. Computer storage includes random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) and electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
[0144] Computer-readable instructions stored on a computer-readable medium (e.g., the program 1755 stored in the memory 1710) are executable by the processor 1705 of the computing device 1700. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device. The terms “computer-readable medium” and “storage device” do not include carrier waves to the extent that carrier waves are deemed too transitory. “Computer-readable non-transitory media” includes all types of computer-readable media, including magnetic storage media, optical storage media, flash media, and solid-state storage media. It should be understood that software can be installed in and sold with a computer. Alternatively, the software can be obtained and loaded into the computer, including obtaining the software through a physical medium or distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator. The software can be stored on a server for distribution over the Internet, for example. As used herein, the terms “computer-readable medium” and “machine-readable medium” are interchangeable.
[0145] The program 1755 may utilize a customer preference structure using modules discussed herein, such as the LKAS module 1760 and the localization module 1765, which may be the same as the LKAS module 1660 and the localization module 1662 of FIG. 16, respectively.
[0146] Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or any suitable combination thereof). Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
[0147] In some aspects, the LKAS module 1760, the localization module
1765, as well as one or more other modules that are part of the program 1755, can be integrated as a single module, performing the corresponding functions of the integrated modules.
[0148] Although a few embodiments have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.
[0149] In an example embodiment, a computer 1700 includes an extraction module extracting a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle, a sequence module generating a sequence of correlations among the plurality of features, a reduction module performing a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations, a road parameters module generating road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations, and a warning module providing a lane keep assist system (LKAS) warning based on the road parameters estimation data. In some embodiments, the computer 1700 may include other or additional modules for performing any one of or combination of steps described in the embodiments. Further, any of the additional or alternative embodiments or aspects of the method, as shown in any of the figures or recited in any of the claims, are also contemplated to include similar modules.
[0150] In an example embodiment, a computer system 1700 is provided for performing autonomy-level functions associated with a moving vehicle. The computer system 1700 comprises a memory 1710 storing instructions, and one or more processors 1705 in communication with the memory 1710. The one or more processors 1705 execute the instructions to extract, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle, generate, using a first fully connected (FC) layer of the convolutional neural network, a sequence of correlations among the plurality of features, perform, using a spatial long-short-term memory (LSTM), a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations, generate, using a second FC layer of the convolutional neural network, road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations, and provide a lane keep assist system (LKAS) warning based on the road parameters estimation data.
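For illustration only, the following PyTorch sketch mirrors the processing chain described above under assumed layer sizes: convolutional layers extract per-frame features, a first FC layer produces the sequence of correlations, an LSTM reduces its dimensionality, and a second FC layer emits road parameter estimates, from which a simple threshold test stands in for the LKAS warning. The channel counts, the number of road parameters, and the 0.5 m threshold are assumptions, not values from this disclosure.

```python
# A minimal, hedged sketch of the described processing chain. Layer sizes and
# the warning threshold are illustrative assumptions.
import torch
import torch.nn as nn

class RoadParameterEstimator(nn.Module):
    def __init__(self, num_road_params=5, corr_dim=128, reduced_dim=32):
        super().__init__()
        # Convolutional layers extract per-frame features from the camera images.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        # First FC layer turns frame features into one element of the correlation sequence.
        self.fc1 = nn.Linear(32 * 8 * 8, corr_dim)
        # "Spatial" LSTM reduces the dimensionality of the correlation sequence.
        self.spatial_lstm = nn.LSTM(corr_dim, reduced_dim, batch_first=True)
        # Second FC layer maps the reduced sequence to road parameter estimates.
        self.fc2 = nn.Linear(reduced_dim, num_road_params)

    def forward(self, frames):                     # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = self.features(frames.flatten(0, 1))    # (B*T, 32, 8, 8)
        corr = torch.relu(self.fc1(x.flatten(1)))  # (B*T, corr_dim)
        corr = corr.view(b, t, -1)                 # sequence of correlations
        reduced, _ = self.spatial_lstm(corr)       # modified (reduced) sequence
        return self.fc2(reduced[:, -1])            # road parameters for the latest frame

model = RoadParameterEstimator()
params = model(torch.randn(2, 8, 3, 128, 128))     # two clips of eight frames
# If, say, the first output channel is interpreted as a lane-marker offset, a
# simple LKAS warning could compare it against an assumed 0.5 m threshold.
warning = params[:, 0].abs() > 0.5
print(params.shape, warning)
```

A temporal LSTM operating on successive road parameter estimates, as recited in the dependent claims, could be appended to this sketch in the same manner as the spatial LSTM, consuming the per-frame outputs rather than the correlation sequence.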
[0151] It should be further understood that software including one or more computer-executable instructions that facilitate processing and operations as described above with reference to any one or all of the steps of the disclosure can be installed in and sold with one or more computing devices consistent with the disclosure. Alternatively, the software can be obtained and loaded into one or more computing devices, including obtaining the software through a physical medium or distribution system, including, for example, from a server owned by the software creator or from a server not owned but used by the software creator. The software can be stored on a server for distribution over the Internet, for example.
[0152] Also, it will be understood by one skilled in the art that this disclosure is not limited in its application to the details of construction and the arrangement of components set forth in the description or illustrated in the drawings. The embodiments herein are capable of other embodiments and of being practiced or carried out in various ways. Also, it will be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless limited otherwise, the terms “connected,” “coupled,” and “mounted,” and variations thereof herein are used broadly and encompass direct and indirect connections, couplings, and mountings. In addition, the terms “connected” and “coupled,” and variations thereof, are not restricted to physical or mechanical connections or couplings. Further, terms such as up, down, bottom, and top are relative and are employed to aid illustration, but are not limiting.
[0153] The components of the illustrative devices, systems, and methods employed in accordance with the illustrated embodiments can be implemented, at least in part, in digital electronic circuitry, analog electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. These components can be implemented, for example, as a computer program product such as a computer program, program code or computer instructions tangibly embodied in an information carrier, or in a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus such as a programmable processor, a computer, or multiple computers.
[0154] A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed on one computer or on multiple computers at one site, or distributed across multiple sites and interconnected by a communication network. Also, functional programs, codes, and code segments for accomplishing the techniques described herein can be easily construed as within the scope of the claims by programmers skilled in the art to which the techniques described herein pertain. Method steps associated with the illustrative embodiments can be performed by one or more programmable processors executing a computer program, code, or instructions to perform functions (e.g., by operating on input data or generating an output). Method steps can also be performed by, and apparatus for performing the methods can be implemented as, special purpose logic circuitry, e.g., an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).
[0155] The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
[0156] Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random-access memory or both. The required elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic disks, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including, by way of example, semiconductor memory devices, e.g., erasable programmable read-only memory (EPROM), electrically erasable programmable ROM (EEPROM), flash memory devices, or data storage disks (e.g., magnetic disks, internal hard disks, or removable disks, magneto-optical disks, or CD-ROM/DVD-ROM disks). The processor and the memory can be supplemented by or incorporated in special purpose logic circuitry.
[0157] Those of skill in the art understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
[0158] As used herein, “machine-readable medium” (or “computer-readable medium”) comprises a device able to store instructions and data temporarily or permanently and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., electrically erasable programmable read-only memory (EEPROM)), or any suitable combination thereof. The term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store processor instructions. The term “machine-readable medium” shall also be taken to include any medium, or a combination of multiple media, that is capable of storing instructions for execution by one or more processors, such that the instructions, when executed by one or more processors, cause the one or more processors to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” as used herein excludes signals per se.
[0159] In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the scope disclosed herein.
[0160] Although the present disclosure has been described with reference to specific features and embodiments thereof, it is evident that various modifications and combinations can be made thereto without departing from the scope of the disclosure. For example, other components may be added to, or removed from, the described systems. The specification and drawings are, accordingly, to be regarded simply as an illustration of the disclosure as defined by the appended claims, and are contemplated to cover any and all
modifications, variations, combinations, or equivalents that fall within the scope of the present disclosure. Other aspects may be within the scope of the following claims. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.

Claims

What is claimed is:
1. A computer-implemented method for performing autonomy-level functions associated with a moving vehicle, the method comprising:
extracting, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle;
generating, using a first fully connected (FC) layer of the convolutional neural network, a sequence of correlations among the plurality of features;
performing, using a spatial long-short-term memory (LSTM), a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations;
generating, using a second FC layer of the convolutional neural network, road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations; and
providing a lane keep assist system (LKAS) warning based on the road parameters estimation data.
2. The computer-implemented method of claim 1, further comprising:
detecting one or more time series data patterns in the road parameters estimation data using a temporal LSTM.
3. The computer-implemented method of claim 2, further comprising:
modifying, using the temporal LSTM, the road parameters estimation data based on the detected one or more time series data patterns to generate modified road parameters estimation data.
4. The computer-implemented method of claim 3, wherein the modified road parameters estimation data includes lane parameters information and lane contextual information.
5. The computer-implemented method of claim 4, wherein the lane parameters information includes one or more of the following:
a lane marker heading angle;
a lane marker offset;
a lane marker curvature;
a lane marker curvature derivative; and
a lane marker type.
6. The computer-implemented method of claim 4, wherein the lane contextual information includes one or more of the following:
a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames;
a lane number indication of a road lane the moving vehicle is traveling on; and
a number of lanes associated with a road the moving vehicle is traveling on.
7. The computer-implemented method of any of claims 1 to 6, further comprising:
extracting, using convolutional layers of a second convolutional neural network, geo-location information of the moving vehicle using the plurality of image frames obtained by the camera.
8. The computer-implemented method of claim 7, further comprising:
applying, using Bayesian filtering, one or more spatial constraints to the geo-location information to generate updated geolocation information of the moving vehicle, the one or more spatial constraints based on the road parameters estimation data; and
outputting the updated geolocation information of the moving vehicle.
9. A system for performing autonomy-level functions associated with a moving vehicle, the system comprising:
a memory storing instructions; and
one or more processors in communication with the memory, the one or more processors executing the instructions to:
extract, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle;
generate, using a first fully connected (FC) layer of the convolutional neural network, a sequence of correlations among the plurality of features;
perform, using a spatial long-short-term memory (LSTM), a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations;
generate, using a second FC layer of the convolutional neural network, road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations; and
provide a lane keep assist system (LKAS) warning based on the road parameters estimation data.
10. The system of claim 9, wherein the one or more processors further execute the instructions to:
detect, using a temporal LSTM, one or more time series data patterns in the road parameters estimation data.
11. The system of claim 10, wherein the one or more processors further execute the instructions to:
modify, using the temporal LSTM, the road parameters estimation data based on the detected one or more time series data patterns to generate modified road parameters estimation data.
12. The system of claim 11, wherein the modified road parameters estimation data includes lane parameters information and lane contextual information.
13. The system of claim 12, wherein the lane parameters information includes one or more of the following:
a lane marker heading angle;
a lane marker offset;
a lane marker curvature;
a lane marker curvature derivative; and
a lane marker type.
14. The system of claim 12, wherein the lane contextual information includes one or more of the following:
a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames;
a lane number indication of a road lane the moving vehicle is traveling on; and
a number of lanes associated with a road the moving vehicle is traveling on.
15. The system of any of claims 9 to 14, wherein the one or more processors further execute the instructions to:
extract, using convolutional layers of a second convolutional neural network, geo-location information of the moving vehicle using the plurality of image frames obtained by the camera.
16. The system of claim 15, wherein the one or more processors further execute the instructions to:
apply, using Bayesian filtering, one or more spatial constraints to the geo-location information to generate updated geolocation information of the moving vehicle, the one or more spatial constraints based on the road parameters estimation data; and
output the updated geolocation information of the moving vehicle.
17. A computer-readable medium storing computer instructions for performing autonomy-level functions associated with a moving vehicle, wherein the instructions, when executed by one or more processors of a computing device, cause the one or more processors to perform operations comprising:
extract, using convolutional layers of a convolutional neural network, a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle;
generate, using a first fully connected (FC) layer of the convolutional neural network, a sequence of correlations among the plurality of features;
perform, using a spatial long-short-term memory (LSTM), a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations;
generate, using a second FC layer of the convolutional neural network, road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations; and
provide a lane keep assist system (LKAS) warning based on the road parameters estimation data.
18. The computer-readable medium of claim 17, wherein the instructions further cause the one or more processors to:
detect one or more time series data patterns in the road parameters estimation data using a temporal LSTM.
19. The computer-readable medium of claim 18, wherein the instructions further cause the one or more processors to:
modify, using the temporal LSTM, the road parameters estimation data based on the detected one or more time series data patterns to generate modified road parameters estimation data.
20. The computer-readable medium of claim 19, wherein the modified road parameters estimation data includes lane parameters information and lane contextual information.
21. The computer-readable medium of claim 20, wherein the lane parameters information includes one or more of the following:
a lane marker heading angle;
a lane marker offset;
a lane marker curvature;
a lane marker curvature derivative; and
a lane marker type.
22. The computer-readable medium of claim 20, wherein the lane contextual information includes one or more of the following:
a relative road distance indicating a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames;
a lane number indication of a road lane the moving vehicle is traveling on; and
a number of lanes associated with a road the moving vehicle is traveling on.
23. The computer-readable medium of any of claims 17 to 22, wherein the instructions further cause the one or more processors to:
extract, using convolutional layers of a second convolutional neural network, geo-location information of the moving vehicle using the plurality of image frames obtained by the camera.
24. The computer-readable medium of claim 23, wherein the instructions further cause the one or more processors to:
apply, using Bayesian filtering, one or more spatial constraints to the geo-location information to generate updated geolocation information of the moving vehicle, the one or more spatial constraints based on the road parameters estimation data; and
output the updated geolocation information of the moving vehicle.
25. A system for performing autonomy-level functions associated with a moving vehicle, the system comprising:
an extracting means for performing extracting of a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle;
a correlating means for generating a sequence of correlations among the plurality of features;
a reduction means for performing a dimensionality reduction of the sequence of correlations to generate a modified sequence of correlations;
an estimating means for generating road parameters estimation data associated with the moving vehicle, based on the modified sequence of correlations; and
a notification means for providing a lane keep assist system (LKAS) warning based on the road parameters estimation data.
PCT/US2019/040833 2019-07-08 2019-07-08 Vehicular autonomy-level functions WO2021006870A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980098224.1A CN114127810A (en) 2019-07-08 2019-07-08 Vehicle autonomous level function
PCT/US2019/040833 WO2021006870A1 (en) 2019-07-08 2019-07-08 Vehicular autonomy-level functions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2019/040833 WO2021006870A1 (en) 2019-07-08 2019-07-08 Vehicular autonomy-level functions

Publications (1)

Publication Number Publication Date
WO2021006870A1 true WO2021006870A1 (en) 2021-01-14

Family

ID=67441719

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/040833 WO2021006870A1 (en) 2019-07-08 2019-07-08 Vehicular autonomy-level functions

Country Status (2)

Country Link
CN (1) CN114127810A (en)
WO (1) WO2021006870A1 (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3495993A1 (en) * 2017-12-11 2019-06-12 Continental Automotive GmbH Road marking determining apparatus for automated driving

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FERNANDO THARINDU ET AL: "Going Deeper: Autonomous Steering with Neural Memory Networks", 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 22 October 2017 (2017-10-22), IEEE, Los Alamitos, CA, USA, pages 214 - 221, XP033303460, DOI: 10.1109/ICCVW.2017.34 *
SAINATH TARA N ET AL: "Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks", 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 19 April 2015 (2015-04-19), IEEE, Piscataway, NJ, USA, pages 4580 - 4584, XP033187628, DOI: 10.1109/ICASSP.2015.7178838 *
SHAOHUI SUN ET AL: "Accurate Deep Direct Geo-Localization from Ground Imagery and Phone-Grade GPS", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA,, 20 April 2018 (2018-04-20), NY 14853, XP080872689 *
VACA-CASTANO G ET AL: "City scale geo-spatial trajectory estimation of a moving camera", COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2012 IEEE CONFERENCE ON, IEEE, 16 June 2012 (2012-06-16), pages 1186 - 1193, XP032232199, ISBN: 978-1-4673-1226-4, DOI: 10.1109/CVPR.2012.6247800 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113264043A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned driving layered motion decision control method based on deep reinforcement learning
CN115240150A (en) * 2022-06-21 2022-10-25 佛山仙湖实验室 Lane departure warning method, system, device and medium based on monocular camera
CN115203457A (en) * 2022-07-15 2022-10-18 小米汽车科技有限公司 Image retrieval method, image retrieval device, vehicle, storage medium and chip
CN115203457B (en) * 2022-07-15 2023-11-14 小米汽车科技有限公司 Image retrieval method, device, vehicle, storage medium and chip
DE102022121670A1 (en) 2022-08-26 2024-02-29 Connaught Electronics Ltd. Lane recognition and driving a vehicle

Also Published As

Publication number Publication date
CN114127810A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
US10579058B2 (en) Apparatus and method for generating training data to train neural network determining information associated with road included in image
US11860636B2 (en) Providing actionable uncertainties in autonomous vehicles
Miglani et al. Deep learning models for traffic flow prediction in autonomous vehicles: A review, solutions, and challenges
Hou et al. Interactive trajectory prediction of surrounding road users for autonomous driving using structural-LSTM network
CN111670468B (en) Moving body behavior prediction device and moving body behavior prediction method
EP3441909B1 (en) Lane detection method and apparatus
WO2020243162A1 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
CN112015847B (en) Obstacle trajectory prediction method and device, storage medium and electronic equipment
WO2021006870A1 (en) Vehicular autonomy-level functions
AU2019251362A1 (en) Techniques for considering uncertainty in use of artificial intelligence models
Masmoudi et al. A reinforcement learning framework for video frame-based autonomous car-following
Niranjan et al. Deep learning based object detection model for autonomous driving research using carla simulator
US20240149906A1 (en) Agent trajectory prediction using target locations
Wang et al. Imitation learning based decision-making for autonomous vehicle control at traffic roundabouts
Zhang et al. A learning-based method for predicting heterogeneous traffic agent trajectories: Implications for transfer learning
Wang et al. Deep understanding of big geospatial data for self-driving: Data, technologies, and systems
Villagra et al. Motion prediction and risk assessment
US12026954B2 (en) Static occupancy tracking
US20220326714A1 (en) Unmapped u-turn behavior prediction using machine learning
Ramtoula et al. Msl-raptor: A 6dof relative pose tracker for onboard robotic perception
Dey et al. Machine learning based perception architecture design for semi-autonomous vehicles
Khosroshahi Learning, classification and prediction of maneuvers of surround vehicles at intersections using lstms
Charroud et al. Enhanced autoencoder-based LiDAR localization in self-driving vehicles
US12024203B2 (en) Method for learning an explainable trajectory generator using an automaton generative network
Dong et al. Deep Learning for Autonomous Vehicles and Systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19745480

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 19745480

Country of ref document: EP

Kind code of ref document: A1