CN114127810A - Vehicle autonomous level function - Google Patents

Vehicle autonomous level function

Info

Publication number
CN114127810A
Authority
CN
China
Prior art keywords
lane
road
parameter estimation
moving vehicle
estimation data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201980098224.1A
Other languages
Chinese (zh)
Inventor
阿桑·哈比卜
盖尔·卡姆德·德·特尤
苏巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN114127810A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

A computer-implemented method for performing autonomous level functions associated with a moving vehicle is disclosed. The method includes extracting a plurality of features from a plurality of image frames obtained from a camera within the moving vehicle using convolutional layers of a convolutional neural network. A correlation sequence between the plurality of features is generated using a first Fully Connected (FC) layer of the convolutional neural network. The correlation sequence is dimensionality reduced using a spatial long short-term memory (LSTM) to generate a modified correlation sequence. Road parameter estimation data associated with the moving vehicle is generated from the modified correlation sequence using a second FC layer of the convolutional neural network. A Lane Keep Assist System (LKAS) warning is provided according to the road parameter estimation data.

Description

Vehicle autonomous level function
Cross reference to related applications
N/A
Technical Field
The present invention relates to performing autonomous functions in a moving vehicle.
Background
Over time, vehicles have become increasingly "intelligent" by implementing a degree of autonomy (e.g., by fusing perception sensor technology and Artificial Intelligence (AI)) and by cooperating with neighboring vehicles and infrastructure through vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) communication. For example, vehicles on a highway communicate with each other through V2V so that each vehicle on the road can learn about nearby vehicles.
SAE International (formerly the Society of Automotive Engineers) is a globally active professional association and standards-developing organization serving engineering professionals in industries such as the automotive and aerospace industries. SAE uses six levels (level 0 to level 5) to classify vehicle autonomy. In a class 0 vehicle, the human driver independently controls everything, including steering, throttle, and braking. In a class 1 vehicle, a single function such as steering, acceleration, or braking is automated at any one time. Currently, a large number of vehicles running on the road include sensor technology, such as Lane Keep Assist System (LKAS) cameras. An LKAS camera extracts lane geometry from the road lane markings and enables a class 1 vehicle to perform lane keeping assist functions such as Lane Departure Warning (LDW), Road Departure Warning (RDW), and Lane Centering (LC).
In a class 2 vehicle, multiple operations are performed automatically and simultaneously. For example, steering and acceleration may be performed automatically at the same time to effect a lane change. In another example, General Motors' "Super Cruise" hands-free highway driving system operates by using high-precision maps (e.g., High Definition (HD) maps) and high-precision Global Positioning System (GPS) technology.
In a class 3 vehicle, the driver has more freedom to turn his or her attention away from the road completely under certain conditions. In other words, the driver can fully transfer driving control to the vehicle, but must still be able to monitor the various vehicle systems and intervene when needed. In certain situations, safety-critical functions may be transferred to the vehicle.
In a class 4 vehicle, the driver does not need to perform vehicle system monitoring. Class 4 vehicles are intended to run safety critical functions and monitor road conditions throughout the trip.
In a class 5 vehicle, the driver does not need to be fit to drive or even hold a driving license. A class 5 vehicle can perform any and all driving tasks without human intervention. A class 5 vehicle may have no driver's cockpit, and everyone in the vehicle is a passenger. Class 3-5 vehicles fully delegate driving functions to the vehicle AI using perception sensor technology, such as light detection and ranging (LIDAR), and high-precision positioning technology.
Currently, most vehicles operating on roadways are non-intelligent (e.g., class 0) vehicles because they lack the high-precision sensor technology required to perform the autonomous functions of higher-level vehicles (e.g., an LKAS for providing LDW, RDW, and LC). Even with the advent of more intelligent vehicles, class 0 vehicles are expected to remain prevalent for a long time to come, because higher-class vehicles are expensive due to their use of high-precision sensors (e.g., high-precision LIDAR and GPS). In addition, traditional positioning technology depends mostly on GPS, which cannot meet the accuracy requirements of modern intelligent transportation systems.
Disclosure of Invention
Various examples are now described, briefly introducing a series of concepts, which are further described in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
According to a first aspect of the invention, a computer-implemented method for performing autonomous level functions associated with a moving vehicle is provided. The method includes extracting a plurality of features from a plurality of image frames obtained from a camera within the moving vehicle using convolutional layers of a convolutional neural network. A correlation sequence between the plurality of features is generated using a first Fully Connected (FC) layer of the convolutional neural network. The correlation sequence is dimensionality reduced using a spatial long short-term memory (LSTM) to generate a modified correlation sequence. Road parameter estimation data associated with the moving vehicle is generated from the modified correlation sequence using a second FC layer of the convolutional neural network. A Lane Keep Assist System (LKAS) warning is provided according to the road parameter estimation data.
In a first implementation form of the method according to the first aspect, one or more time series data patterns in the road parameter estimation data are detected using a temporal LSTM.
In a second implementation form of the method according to the first aspect as such or according to any of the preceding implementation forms of the first aspect, the road parameter estimation data is modified according to the detected one or more time series data patterns using the temporal LSTM to generate modified road parameter estimation data.
In a third implementation form of the method according to the first aspect as such or any of the above implementation forms of the first aspect, the modified road parameter estimation data comprises lane parameter information and lane context information.
In a fourth implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the lane parameter information comprises one or more of: lane marker heading angle, lane marker offset, lane marker curvature derivative, and lane marker type.
In a fifth implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the lane context information comprises one or more of: a relative road distance representing a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames; a lane number indication of the road lane in which the moving vehicle is traveling; and a number of lanes associated with the road on which the moving vehicle is traveling.
In a sixth implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, geographic location information of the moving vehicle is extracted from the plurality of image frames obtained by the camera using convolutional layers of a second convolutional neural network.
In a seventh implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, one or more spatial constraints are applied to the geographic location information using Bayesian filtering to generate updated geographic location information of the moving vehicle. The one or more spatial constraints are based on the road parameter estimation data. The updated geographic location information of the moving vehicle is output.
According to a second aspect of the invention, a system for performing autonomous level functions associated with a moving vehicle is provided. The system includes a memory storing instructions and one or more processors in communication with the memory. The one or more processors execute the instructions to extract a plurality of features from a plurality of image frames obtained from a camera within the moving vehicle using convolutional layers of a convolutional neural network. A correlation sequence between the plurality of features is generated using a first Fully Connected (FC) layer of the convolutional neural network. The correlation sequence is dimensionality reduced using a spatial long short-term memory (LSTM) to generate a modified correlation sequence. Road parameter estimation data associated with the moving vehicle is generated from the modified correlation sequence using a second FC layer of the convolutional neural network. A Lane Keep Assist System (LKAS) warning is provided according to the road parameter estimation data.
In a first implementation form of the system of the second aspect, the one or more processors are further configured to execute the instructions to detect one or more time series data patterns in the road parameter estimation data using time LSTM.
In a second implementation of the system according to the second aspect as such or any of the preceding implementations of the second aspect, the one or more processors are further configured to execute the instructions to modify the road parameter estimation data according to the detected one or more time series data patterns using the time LSTM to generate modified road parameter estimation data.
In a third implementation form of the system according to the second aspect as such or any of the preceding implementation forms of the second aspect, the modified road parameter estimation data comprises lane parameter information and lane context information.
In a fourth implementation form of the system according to the second aspect as such or any of the preceding implementation forms of the second aspect, the lane parameter information comprises one or more of: lane marker heading angle, lane marker offset, lane marker curvature derivative, and lane marker type.
In a fifth implementation form of the system according to the second aspect as such or any of the preceding implementation forms of the second aspect, the lane context information comprises one or more of: a relative road distance representing a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames; a lane number indication of the road lane in which the moving vehicle is traveling; and a number of lanes associated with the road on which the moving vehicle is traveling.
In a sixth implementation of the system according to the second aspect as such or any of the preceding implementations of the second aspect, the one or more processors are further configured to extract geographic location information of the moving vehicle from the plurality of image frames obtained by the camera using convolutional layers of a second convolutional neural network.
In a seventh implementation of the system according to the second aspect as such or any of the preceding implementations of the second aspect, the one or more processors are further configured to apply one or more spatial constraints to the geographic location information using Bayesian filtering to generate updated geographic location information for the moving vehicle. The one or more spatial constraints are based on the road parameter estimation data. The updated geographic location information of the moving vehicle is output.
According to a third aspect of the invention, a non-transitory computer-readable medium is provided that stores instructions for performing autonomous level functions associated with a moving vehicle. The instructions, when executed by one or more processors of a computing device, cause the one or more processors to extract a plurality of features from a plurality of image frames obtained by a camera within the moving vehicle using convolutional layers of a convolutional neural network. A correlation sequence between the plurality of features is generated using a first Fully Connected (FC) layer of the convolutional neural network. The correlation sequence is dimensionality reduced using a spatial long short-term memory (LSTM) to generate a modified correlation sequence. Road parameter estimation data associated with the moving vehicle is generated from the modified correlation sequence using a second FC layer of the convolutional neural network. A Lane Keep Assist System (LKAS) warning is provided according to the road parameter estimation data.
In a first implementation of the non-transitory computer readable medium of the third aspect, the instructions further cause the one or more processors to detect one or more time series data patterns in the road parameter estimation data using time LSTM.
In a second implementation of the non-transitory computer readable medium according to the third aspect as such or according to any of the preceding implementations of the third aspect, the instructions further cause the one or more processors to modify the road parameter estimation data according to the detected one or more time series data patterns using the time LSTM to generate modified road parameter estimation data.
In a third implementation form of the non-transitory computer readable medium according to the third aspect as such or according to any of the preceding implementation forms of the third aspect, the modified road parameter estimation data comprises lane parameter information and lane context information.
In a fourth implementation form of the non-transitory computer readable medium according to the third aspect as such or any of the preceding implementation forms of the third aspect, the lane parameter information comprises one or more of: lane marker heading angle, lane marker offset, lane marker curvature derivative, and lane marker type.
In a fifth implementation of the non-transitory computer readable medium according to the third aspect as such or any of the above implementations of the third aspect, the lane context information comprises one or more of: a relative road distance representing a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames; a lane number indication of the road lane in which the moving vehicle is traveling; and a number of lanes associated with the road on which the moving vehicle is traveling.
In a sixth implementation of the non-transitory computer readable medium according to the third aspect as such or any of the preceding implementations of the third aspect, the instructions further cause the one or more processors to extract geographic location information of the moving vehicle from the plurality of image frames obtained by the camera using convolutional layers of a second convolutional neural network.
In a seventh implementation of the non-transitory computer readable medium according to the third aspect or any of the previous implementations of the third aspect, the instructions further cause the one or more processors to apply one or more spatial constraints to the geographic location information using Bayesian filtering to generate updated geographic location information for the moving vehicle. The one or more spatial constraints are based on the road parameter estimation data. The updated geographic location information of the moving vehicle is output.
According to a fourth aspect of the present invention, a system for performing autonomous level functions associated with a mobile vehicle is provided. The system includes an extraction module to extract a plurality of features from a plurality of image frames obtained from a camera within the moving vehicle.
The system includes a correlation module for generating a correlation sequence between the plurality of features. The system includes a dimension reduction module to perform a dimension reduction of the correlation sequence to generate a modified correlation sequence. The system includes an estimation module to generate road parameter estimation data associated with the moving vehicle from the modified correlation sequence. The system comprises a notification module for providing Lane Keep Assist System (LKAS) warning according to the road parameter estimation data.
Any of the above examples may be combined with any one or more of the other examples described above to create new embodiments within the scope of the present invention.
Drawings
In the drawings, which are not necessarily drawn to scale, like numerals may describe similar components in different views. The drawings illustrate generally, by way of example, and not by way of limitation, various embodiments described herein.
Fig. 1 is a block diagram of a Deep Learning (DL) model trained using DL architecture (DLA) provided by some example embodiments.
Fig. 2 is a diagram of generation of a trained DL model using a neural network model trained within a DLA provided by some example embodiments.
FIG. 3 illustrates various SAE autonomy levels provided by some example embodiments, and the use of a computing device to enable a lower SAE autonomy level vehicle to perform higher SAE autonomy level functions using techniques disclosed herein.
Fig. 4 illustrates various lane parameter information that may be used in connection with some example embodiments.
Fig. 5 illustrates lane context information that may be used in connection with some example embodiments.
Fig. 6 illustrates an LKAS module for estimating lane parameter information and lane context information provided by some example embodiments.
Fig. 7 illustrates a road parameter estimator sub-network (RPESN) for the LKAS module of fig. 6, as provided by some example embodiments.
Fig. 8 illustrates a Road Parameter Estimator Network (RPEN) for the LKAS module of fig. 6 provided by some example embodiments.
Fig. 9A-9D illustrate a plurality of image frames provided by some example embodiments for determining an RPEN odometer as part of lane context information.
Fig. 10 illustrates an example of continuous pose estimation by position tracking and position identification provided by some example embodiments.
FIG. 11 illustrates an example feature sparse environment provided by some example embodiments for use in connection with continuous pose estimation techniques.
Fig. 12 illustrates a block diagram of a positioning module for high precision positioning provided by some example embodiments.
Fig. 13 illustrates an example of high-precision positioning with lane accuracy using the positioning module of fig. 12 provided by some example embodiments.
Fig. 14 illustrates example suburban and urban environments that some example embodiments provide that may use the location module of fig. 12.
Fig. 15 is a flow diagram of a method for performing autonomous level functions associated with a mobile vehicle provided by some example embodiments.
Fig. 16 is a block diagram of a representative software architecture provided by some example embodiments, which may be used in connection with the various device hardware described herein.
Fig. 17 is a block diagram of circuitry of an apparatus implementing an algorithm and performing a method provided by some example embodiments.
Detailed Description
It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and methods described in connection with fig. 1-17 may be implemented using any number of techniques, whether currently known or not yet existing. The present invention should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
The present invention relates to enabling SAE International (originally the Society of Automotive Engineers) vehicle autonomy level functions in vehicles.
The following detailed description is to be read in connection with the accompanying drawings, which are a part of the description and which show, by way of illustration, specific embodiments which can be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosed subject matter, and it is to be understood that other embodiments may be utilized and that structural, logical and electrical changes may be made without departing from the scope of the present invention. The following description of example embodiments is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims.
The terms "forward calculation" and "backward calculation" as used herein refer to calculations performed in a work machine in connection with the training of a neural network model (or another model). The calculations performed during the forward and backward calculations modify the weights according to the results of the previous iteration (e.g., according to the gradient generated at the end of the previous backward calculation). Gradient refers to a measure of the change in output of a work machine as the model weight being calculated by the work machine changes. The gradient measures the change of all weights related to the error change. The larger the gradient value, the faster the model learns.
The term LKAS as used herein refers to a set of assistance systems that assist the driver in keeping the vehicle in the proper lane. Examples of LKAS functions are LDW, RDW, LC, etc. LDW is a warning system that delivers warnings to the driver when the vehicle deviates from the lane without proper signaling. RDW (also known as Road Deviation Mitigation (RDM)) is a warning system that delivers warnings to the driver when the vehicle leaves a road boundary. LC is a mechanism intended to keep the moving vehicle in the center of the lane, mitigating the driver's steering task.
The term "high definition map" (or HD map) as used herein refers to a category of maps built for autonomous driving purposes in relation to higher SAE autonomous level vehicles. HD maps are characterized by extremely high accuracy, e.g., centimeter-level accuracy. The HD map includes information of lane positions, road boundary positions, positions and heights of curbs, positions of traffic signs and road markers, and the like.
The term "lane line graph" as used herein refers to a category of maps that includes a geographical reference lane line for a road. The lane line map is a subset of the HD map, including only lane line information.
The term "class x vehicle" (where x is an integer between 0 and 5) as used herein refers to the vehicle autonomy level of SAE.
The term "inertial measurement unit (or IMU)" as used herein refers to a device that combines an accelerometer, a gyroscope, and a magnetometer together. An IMU is a device intended to track the position and orientation of an object, embedded in most modern computing devices such as smartphones.
The techniques disclosed herein may be used to enable a class 0 vehicle to perform higher level functions (e.g., those of a class 1 or 2 vehicle) without the additional cost incurred by the higher level vehicle's use of high precision sensors. This may be accomplished by using the sensing capabilities of computing devices (e.g., smartphone devices) that are widely available to users.
In the proposed technology, the sensors of a computing device (e.g., camera, IMU, and GPS) are used together with a Neural Network (NN) to extract lane information (e.g., road parameters) of a road, thereby enabling LKAS functions on a vehicle without the cost of additional high-precision equipment. A training database is used that includes sequences of road images recorded from the driver's perspective while traveling along a route, along with corresponding consumer-grade GPS data, industrial-grade GPS data, and lane line maps of the route. Ground truth values for the road parameters (the true values of the road parameters associated with an image) can be obtained from the lane line map and the high-precision geographic location provided by the industrial-grade GPS data. The data sets cover various road geometries, driver viewpoints, weather conditions, and different times of day. The input data (e.g., video frames) and corresponding output data (e.g., ground truth values of the road parameters associated with the video frames) may then be used to train a deep NN for a Road Parameter Estimator Network (RPEN). The trained NN generalizes: it learns to detect lane lines and extract road parameters for roads it has never seen. The trained NN may then be used by a class 0 vehicle, with the computing device camera as input, to generate road parameter estimation data including lane parameter information and lane context information as output. The lane parameter information may include a lane marker heading angle, a lane marker offset, a lane marker curvature derivative, and a lane marker type. The lane parameter information is used by the vehicle to perform LKAS functions, such as providing LDW and RDW warning notifications that may be communicated to the vehicle driver via a display or other notification device.
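A minimal sketch of how one training sample could be assembled from such a database is shown below; the field names and the ground-truth lookup helper are assumptions for illustration, not the actual implementation.

```python
# Hypothetical assembly of one RPEN training sample: driver-perspective frames
# paired with road-parameter ground truth derived from industrial-grade GPS and
# a geo-referenced lane line map. All names here are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrainingSample:
    frames: List[bytes]                  # encoded driver-perspective video frames
    consumer_gps: Tuple[float, float]    # coarse (lat, lon) from a phone-grade receiver
    road_params_gt: dict                 # ground truth: C0..C3, egoLaneNo, noOfLanes, ...

def build_sample(frames, consumer_gps, industrial_gps, lane_line_map):
    # The precise industrial-grade GPS fix is projected onto the lane line map
    # to recover the true road parameters for the frames (hypothetical helper).
    ground_truth = lane_line_map.road_parameters_at(industrial_gps)
    return TrainingSample(frames=frames,
                          consumer_gps=consumer_gps,
                          road_params_gt=ground_truth)
```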
The techniques disclosed herein may also be used to provide accurate vehicle positioning (e.g., lane-level positioning). The same training database and NN used to enable the LKAS functions provide road parameter estimation data including lane context information. A second NN (e.g., a vision-based relocalization algorithm) may predict an accurate geographic location (e.g., vehicle latitude and longitude) using a sequence of driver-perspective road images and coarse GPS data from a computing device (e.g., the smartphone used by the driver). Sensor fusion techniques combine the results of the two NNs to provide lane-level accuracy for the current position of the vehicle. In this regard, a class 0 vehicle driver may use the sensing capabilities of a computing device (e.g., a smartphone) to obtain accurate positioning of the vehicle.
The prior art uses costly high precision radar and GPS systems to perform higher levels of vehicle autonomous functions. In contrast, the techniques disclosed herein may use a neural network running on a consumer device (e.g., smartphone), as well as device sensor capabilities (e.g., camera, IMU, GPS), to provide sensing functionality for improving the SAE autonomy level of the vehicle by enabling LKAS functionality (e.g., LDW and RDW notifications) and providing lane-level vehicle positioning, among other ways.
The techniques disclosed herein estimate new lane context information that cannot be estimated by existing navigation systems, such as the lane number in which the vehicle is located (also referred to as "egoLaneNo"), the number of lanes of the road on which the vehicle is traveling (also referred to as "noOfLanes"), and the relative road distance traveled by the vehicle between two consecutive image frames (also referred to as the RPEN odometer or "relative road distance"). Further, the disclosed techniques improve the geographic position accuracy of existing vehicle positioning systems by imposing spatial constraints on the possible positions of the vehicle based on the lane context information (e.g., egoLaneNo) and the other lane parameter information and lane context information extracted in real time.
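As an illustration of such a spatial constraint, a coarse GPS fix can be snapped to the geo-referenced centerline of the estimated ego lane and then shifted laterally according to the camera-derived lane marker offset. The sketch below is an assumption about one possible realization; the map helpers it calls are hypothetical.

```python
# Illustrative lane-level spatial constraint (not the patent's actual algorithm).
# coarse_position: (lat, lon) from a consumer-grade GPS receiver.
# ego_lane_no, c0_left: lane context and lane marker offset estimated by the RPEN.
# lane_line_map: a geo-referenced lane line map with hypothetical helper methods.
def constrain_position(coarse_position, lane_line_map, ego_lane_no, c0_left):
    # Centerline of the lane the vehicle is estimated to occupy (hypothetical helper).
    lane = lane_line_map.lane_centerline(near=coarse_position, lane_no=ego_lane_no)
    # Project the coarse fix onto the lane centerline.
    on_lane = lane.project(coarse_position)
    # Shift laterally so the distance to the left lane line matches the
    # camera-derived offset C0 (half_width is the lane half-width in meters).
    return lane.offset_laterally(on_lane, c0_left - lane.half_width)
```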
Fig. 1 is a block diagram 100 of training a Deep Learning (DL) model using a DL architecture (DLA) to generate a trained DL model 110, provided by some example embodiments. In some example embodiments, machine-learning programs (MLPs), including deep learning programs, also commonly referred to as machine learning algorithms or tools, are used to perform operations associated with correlating data or other Artificial Intelligence (AI)-based functions.
As shown in FIG. 1, deep learning model training 108 is performed within DLA 106 based on training data 102 (which may include features). During deep learning model training 108, features from training data 102 may be evaluated in order to further train the DL model. DL model training 108 produces a trained DL model 110. The trained DL model 110 may include one or more classifiers 112, which classifiers 112 may be used to provide DL evaluation results 116 based on new data 114.
In some aspects, the training data 102 may include input data 103, such as image data obtained from a driver perspective, GPS data (e.g., geographic location information associated with the image data), and lane line map data (e.g., lane line map geographic location data associated with the image data). Training data 102 may also include road parameter ground truth data corresponding to input data 103. Input data 103 and output data 105 are used during DL model training 108 to train DL model 110. In this regard, the trained DL model 110 receives new data 114 (e.g., taking images from the perspective of the driver using a computing device such as a smartphone), detects lane lines, and extracts road parameter information (e.g., lane parameter information and lane context information) for roads never seen before using the new data 114. The extracted road parameter information (e.g., output as DL evaluation results 116) is used to provide the LKAS-related functions described herein (e.g., LDW/RDW notification and lane centering) as well as lane-level vehicle positioning information.
Deep learning is part of machine learning, the field of research in which computers are enabled to learn without explicit programming. Machine learning explores the study and construction of algorithms (also referred to herein as tools) that can learn from existing data, can correlate data, and can predict new data. Such machine learning tools operate by building models from example training data (e.g., training data 102) to make data-driven predictions or decisions represented in output or evaluation results 116. Although example embodiments are presented for some machine learning tools (e.g., deep learning architectures), the principles presented herein may also be applied to other machine learning tools.
In some example embodiments, different machine learning tools may be used. For example, logistic regression, Naive Bayes (Naive-Bayes), random forests, neural networks, matrix factorization, and support vector machine tools may be used during deep learning model training 108 (e.g., to correlate training data 102).
Two common types of problems in machine learning are classification problems and regression problems. A classification problem aims to classify an item into one of several class values (e.g., is the object an apple or an orange?). Regression algorithms aim to quantify certain items (e.g., by providing a real value). In some embodiments, the DLA 106 may use machine learning algorithms that use the training data 102 to find correlations between identified features that affect the results.
The machine learning algorithm analyzes the new data 114 using features in the training data 102 to generate an evaluation result 116. These features include various measurable attributes of observed phenomena used to train the machine learning model. The concept of a feature is related to the concept of an explanatory variable used in statistical techniques such as linear regression. The selection of informative, discriminative and independent features is important for the efficient operation of MLPs in pattern recognition, classification and regression. The features may be of different types, for example, numeric features, character strings, and graphics. In some aspects, the training data may be of different types, and the features are numbers, for use by the computing device.
In some aspects, the features used during DL model training 108 may include input data 103, output data 105, one or more of the following: sensor data from a plurality of sensors (e.g., audio sensor, motion sensor, GPS sensor, image sensor); brake event data from a plurality of actuators (e.g., wireless switches or other actuators); external information from a plurality of external sources; timer data associated with sensor status data (e.g., obtaining time sensor data), actuator event data, or external information source data; user communication information; user data; user behavior data, and the like.
The machine learning algorithm uses the training data 102 to find correlations between the identified features that affect the assessment results 116. In some example embodiments, the training data 102 includes label data that is known data of the one or more identifying features and the one or more outcomes. The DL model training 108 within the DLA 106 trains the DL model using the training data 102 (which may include the recognition features). The training result is a trained DL model 110. When evaluating using DL model 110, new data 114 is provided as input to the trained DL model 110, and DL model 110 generates evaluation result 116 as output. For example, the DLA 106 may be deployed on a mobile device, and the new data 114 may include LR images (e.g., frames from LR video, such as a real-time LR video feed). The DLA 106 performs UR functions on the LR image (e.g., increasing image resolution while reducing noise, removing blocking artifacts, and increasing image contrast) to generate an HR output image in real-time.
Fig. 2 is a diagram 200 of generating a trained DL model 206 using a neural network model 204 trained within DLA 106 provided by some example embodiments. Referring to fig. 2, source data 202 is analyzed by a neural network model 204 (or another type of machine learning algorithm or technique) to generate a trained DL model 206 (which may be the same as trained DL model 110). The source data 202 includes a training data set, e.g., 102 (including input data 103 and output data 105), that includes data identified by one or more features. As used herein, the terms "neural network" and "neural network model" are interchangeable.
Machine learning techniques train models to accurately make predictions on data fed into the models (e.g., what a user said in a given utterance, whether a noun is a person, place, or thing, what the weather will be like tomorrow). During a learning phase, a model is developed against a training data set of inputs to optimize the model to correctly predict the target output for a given input. Generally, the learning phase may be supervised, semi-supervised, or unsupervised, indicating a decreasing level to which the "correct" outputs are provided in correspondence with the training inputs. In the supervised learning phase, all target outputs are provided to the model, and the model is guided to develop general rules or algorithms that map the inputs to the outputs. In contrast, in the unsupervised learning phase, the desired outputs are not provided for the inputs, so that the model may develop its own rules to discover relationships within the training data set. In the semi-supervised learning phase, an incompletely labeled training set is provided, with some of the outputs of the training data set known and some unknown.
The model may be run for several cycles against one training data set, where the training data set is repeatedly entered into the model to refine its results (i.e., the entire data set is processed within one cycle). During an iteration, a model (e.g., a neural network model or another machine learning model) is run for a small batch (or portion) of the entire data set. During the supervised learning phase, a model is developed to predict a target output for a given set of inputs (e.g., source data 202) and evaluated over several cycles to more reliably provide an output that is designated as corresponding to the given input for the maximum number of inputs of the training data set. In another example, for the unsupervised learning phase, a model is developed to cluster the data sets into n groups and evaluate the consistency of the model to place a given input into a given group over several cycles, and the reliability of the model to produce n required clusters per cycle.
The models are evaluated after a cycle of operation and the values of the variables (e.g., weights, biases, or other parameters) of the models are adjusted in an attempt to better refine the models through iteration. The term "weight" as used herein is used to refer to a parameter used by the machine learning model. During the backward calculation, the model may output gradients, which may be used to update the weights associated with the forward calculation.
In various aspects, the evaluation may be biased against false negatives, biased against false positives, or evenly biased with respect to the overall accuracy of the model. The values may be adjusted in a number of ways depending on the machine learning technique used. For example, in a genetic or evolutionary algorithm, the values of the models that are most successful in predicting the desired outputs are used to develop the values that the models use in the subsequent cycle, which may include random variation/mutation to provide additional data points. Those of ordinary skill in the art are familiar with several other machine learning algorithms that may be applied to the present invention, including linear regression, random forests, decision tree learning, neural networks, deep neural networks, and the like.
Each model develops a rule or algorithm over several cycles by changing the values of one or more variables that affect the input to more closely map to the desired result, but perfect accuracy and precision may not be achieved because the training data set may vary and is preferably very large. Thus, the number of cycles that make up the learning phase may be set to a given number of trials or a fixed time/computational budget, or may terminate before the number/budget is reached when the accuracy of a given model is sufficiently high or low or a stable phase of accuracy is reached. For example, if the training phase is designed to run n cycles and generate a model with at least 95% accuracy, and such a model is generated before the nth cycle, the learning phase may end early and use the generated model that meets the final target accuracy threshold. Similarly, if the accuracy of a given model is not sufficient to meet the random probability threshold (e.g., the model has only 55% accuracy in determining true/false outputs for a given input), the learning phase of the model may terminate prematurely, but other models of the learning phase may continue to be trained. Similarly, when a given model continues to provide similar accuracy or fluctuation in results over multiple cycles (a performance stabilization phase has been reached), the learning phase of the given model may terminate before the number of cycles/computational budget is reached.
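A bare-bones sketch of this termination logic (fixed trial budget, early exit on a target accuracy, and exit on an accuracy plateau) might look as follows; the model and data methods are hypothetical placeholders.

```python
# Illustrative cycle-termination loop: stop when the trial budget is exhausted,
# a target accuracy is reached early, or accuracy plateaus. The fit/evaluate
# methods are hypothetical placeholders, not part of any specific framework.
def train_until_converged(model, data, max_cycles=100, target_acc=0.95, patience=5):
    best_acc, stale_cycles = 0.0, 0
    for cycle in range(max_cycles):
        model.fit_one_cycle(data.training)          # one pass over the training set
        acc = model.evaluate(data.validation)       # accuracy on held-out data
        if acc >= target_acc:
            break                                    # final target accuracy reached early
        stale_cycles = stale_cycles + 1 if acc <= best_acc else 0
        best_acc = max(best_acc, acc)
        if stale_cycles >= patience:
            break                                    # performance stabilization phase reached
    return model
```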
After the learning phase is complete, the model is finalized. In some example embodiments, the finalized model is evaluated according to test criteria. In a first example, a test data set comprising known outputs of inputs is input into the finalized model to determine the accuracy of the model in processing data that has not yet been trained. In a second example, the false positive or false negative rate may be used to evaluate the final determined model. In a third example, the delimitation between data clusters in each model is used to select the model for which the data clusters produce the sharpest boundaries.
In some example embodiments, the DL model 206 is trained by a neural network model 204 (e.g., a deep learning network, a deep convolutional network, or a recurrent neural network), the neural network model 204 comprising a series of "neurons" (e.g., Long Short Term Memory (LSTM) nodes) arranged in a network. Neurons are architectural elements used for data processing and artificial intelligence, particularly machine learning, and include a memory that can determine when to "remember" and when to "forget" values stored in the memory based on the weights of inputs provided to a given neuron. As used herein, each neuron is operable to receive a predefined number of inputs from other neurons in the network to provide relational and sub-relational outputs for the content of the analyzed frames. The individual neurons may be linked together in various configurations of neural networks or organized into tree structures to provide interactive and relational learning modeling to determine how each frame in a conversational sentence correlates with each other.
For example, an LSTM as a neuron includes several gates for processing input vectors (e.g., phonemes from a conversational sentence), storage units, and output vectors (e.g., context representations). An input gate and an output gate control information flow into and out of the memory cell, respectively, and a forgetting gate optionally deletes information from the memory cell based on an input from an earlier link cell in the neural network. The weights and bias vectors for the various gates are adjusted during the training phase and will ultimately be determined for normal operation once the training phase is complete. Those skilled in the art will appreciate that the neurons and neural networks may be constructed programmatically (e.g., by software instructions) or by dedicated hardware that connects each neuron to form a neural network.
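The standard LSTM gate computations behind this description can be written compactly as follows; the weight matrices are placeholders, and this is a generic formulation rather than a configuration taken from the embodiments.

```python
# One step of a standard LSTM cell in NumPy, showing the input (i), forget (f),
# and output (o) gates described above. W, U, b stack the parameters for the
# four gate pre-activations [i, f, o, g]; their values here are placeholders.
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c_prev + i * g        # forget gate deletes, input gate writes to the memory cell
    h = o * np.tanh(c)            # output gate controls what leaves the memory cell
    return h, c
```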
The neural network analyzes the data using the features to generate an evaluation result (e.g., recognizing speech units). The features are individual measurable properties of the observed phenomenon. The concept of a feature is related to the concept of an explanatory variable used in statistical techniques such as linear regression. Further, the depth features represent node outputs in a hidden layer of the deep neural network.
Neural networks (e.g., neural network model 204), sometimes referred to as Artificial Neural Networks (ANN) or neural network models, are computing systems based on biological neural networks of an animal brain. These systems gradually improve performance (known as learning) to perform tasks, typically without task-specific programming. For example, in image recognition, a neural network may be taught to recognize an image containing an object by analyzing an example image that has been tagged with an object name, and after learning the object and name, the neural network may use the analysis results to recognize the object in the untagged image. Neural networks are based on a collection of connected units called neurons, where each connection between neurons (called synapse) can transmit a unidirectional signal, the activation strength of which varies with the connection strength. A receiving neuron may activate and propagate a signal to a downstream neuron connected thereto, typically based on whether the combined input signal from potentially many transmitting neurons has sufficient strength, where strength is one parameter.
A Deep Neural Network (DNN), for example a Convolutional Neural Network (CNN), is a stacked neural network composed of multiple layers (for a CNN, convolutional layers). The layers are made up of nodes, which are locations where computation occurs and which are loosely modeled on the neurons in the human brain that fire when they encounter sufficient stimuli. A node combines input data with a set of coefficients, or weights, that amplify or suppress the input, thereby assigning importance to the inputs for the task the algorithm is trying to learn. These input-weight products are summed, and the sum is passed through the node's activation function to determine whether, and to what extent, the signal passes further through the network to affect the final result. A DNN uses a cascade of multiple layers of nonlinear processing units for feature extraction and transformation. Each successive layer uses the output of the previous layer as its input. Higher-level features are derived from lower-level features to form a hierarchical representation. The layers following the input layer may be convolutional layers that produce feature maps, which are the result of filtering the input and are used by the next convolutional layer.
In the training of the DNN architecture, regression may include minimization of a cost function, where the regression is constructed as a set of statistical processes for estimating the relationships between the variables. The cost function may be implemented as a function that returns a value representing the behavior of the neural network in the mapping training example to correct the output. In training, if the cost function value is not within a predetermined range, back propagation is used according to a known training image, wherein the back propagation is a common method for training an artificial neural network, and is used together with an optimization method such as a Stochastic Gradient Descent (SGD) method.
The use of back propagation may include propagation and weight updating. As an input is presented to the neural network, it is propagated forward through the neural network layer by layer until it reaches the output layer. The output of the neural network is then compared to the desired output using a cost function, and an error value is calculated for each node in the output layer. The error values are propagated backwards from the output until each node has an associated error value that approximately represents its contribution to the original output. Back propagation may use these error values to calculate the gradient of the cost function with respect to the weights in the neural network. The calculated gradients are fed into the selected optimization method to update the weights in an attempt to minimize the cost function.
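As a minimal illustration of the final update step (generic SGD, not specific to the embodiments), each weight is moved a small step against its gradient:

```python
# Bare-bones SGD weight update: each weight moves opposite its gradient,
# scaled by the learning rate. Values below are made up for illustration.
def sgd_step(weights, gradients, learning_rate=0.01):
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

print(sgd_step([0.5, -1.2], [0.1, -0.3]))  # -> [0.499, -1.197]
```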
Even though training architecture 106 is referred to as a deep learning architecture using neural network models (and trained models are referred to as trained deep learning models, e.g., trained DL models 110 and 206), the invention is not limited in this respect and other types of machine learning training architectures may use the techniques disclosed herein for model training.
FIG. 3 illustrates various SAE autonomy levels provided by some example embodiments, and a diagram 300 of enabling a lower SAE autonomy level vehicle to perform higher SAE autonomy level functions using a computing device and the techniques disclosed herein. Referring to FIG. 3, diagram 300 shows vehicles 302, 304, 306, ..., 308 associated with SAE autonomy levels 0, 1, 2, ..., 5, respectively. The vehicle 304 performs a level 1 function, such as providing a lane departure warning 310. The vehicle 306 performs a level 2 function, such as lane centering 316. The vehicle 308 performs a level 5 function, such as autopilot 322.
As shown in fig. 3, the lane departure warning 310 may be based on road parameter information, such as lane line information 314 detected by the high-precision LKAS camera 312. The lane centering 316 and autopilot 322 functions use a high accuracy position fix 320, which high accuracy position fix 320 is performed using a high accuracy GPS 318 and a high accuracy radar or LIDAR (e.g., LIDAR 324) present on higher level vehicles.
In some aspects, a computing device 330 (e.g., a smartphone or other consumer device) may be used within the level 0 vehicle 302 to perform the techniques 326 (e.g., for determining road parameter estimation data) and 328 (e.g., for determining lane level vehicle positioning) described herein. In this regard, by using the computing device 330 to perform the techniques 326 and 328, the class 0 vehicle 302 may be upgraded to a higher-class vehicle 332 without requiring expensive high-precision equipment, such as the LKAS camera 312, the high-precision GPS 318, and the LIDAR 324.
FIG. 4 shows various lane parameter information that may be used in connection with some example embodiments. Referring to FIG. 4, a diagram 400 shows a vehicle 406 positioned on a roadway including lanes 402 and 404. In some aspects, a front-facing camera mounted within the vehicle 406 (e.g., an LKAS camera or a camera of a computing device such as a smartphone) may be used to take multiple images of the road and extract lane parameter information including one or more of: lane marker heading angle (C1), lane marker offset (C0), lane marker curvature (C2), lane marker curvature derivative (C3), and lane marker type. In FIG. 4, C2 and C3 are zero because the road is straight and has no curvature.
In some aspects, a front-facing camera mounted within the vehicle 406 may extract lane parameter information using the techniques disclosed herein. More specifically, the camera may detect lane markers (or lane lines), each of which may be modeled with a 3rd-order polynomial model describing a function X(Z), where Z is the physical longitudinal distance from the camera and X is the physical lateral offset from the camera. The following formula can be used to derive the physical lateral offset X from the camera: X(Z) = C3·Z^3 + C2·Z^2 + C1·Z + C0, where C0 is the lane marker offset 408 at Z = 0 (e.g., the distance in meters from the center of the vehicle 406 to the left lane line or the right lane line when the vehicle is in lane 1 (402) or lane 2 (404)), C1 is the lane marker heading angle at Z = 0 (e.g., the angle between the direction of the road 412 and the direction of the vehicle 410), C2 is the lane marker curvature at Z = 0, and C3 is the lane marker curvature derivative at Z = 0. For straight lanes, the lane marker curvature C2 ≈ 0 and the lane marker curvature derivative C3 ≈ 0.
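A small worked example of evaluating this lane marker model is given below; the coefficient values in the example call are made up purely for illustration.

```python
# Evaluate the 3rd-order lane marker model X(Z) = C3*Z^3 + C2*Z^2 + C1*Z + C0
# described above. The coefficients used in the example call are illustrative.
def lateral_offset(z, c0, c1, c2, c3):
    """Lateral offset X (meters) of the lane marker at longitudinal distance z (meters)."""
    return c3 * z**3 + c2 * z**2 + c1 * z + c0

# Straight road: the curvature terms vanish, so X grows linearly with the heading angle C1.
print(lateral_offset(10.0, c0=1.8, c1=0.02, c2=0.0, c3=0.0))  # -> 2.0
```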
Fig. 5 illustrates lane context information that may be used in connection with some example embodiments. Referring to FIG. 5, diagram 500 shows a vehicle 506 moving on a road having two lanes (lane 1 (502) and lane 2 (504)). In some aspects, lane context information determined according to the techniques disclosed herein may include one or more of: a relative road distance (Δs) 512 representing the distance traveled by the moving vehicle 506 between two consecutive image frames captured by the onboard camera (e.g., the distance traveled between vehicle positions 508 and 510); a lane number indication of the road lane in which the moving vehicle is traveling (e.g., egoLaneNo information indicating that lane 1 (502) is the lane in which the vehicle 506 is traveling); and the number of lanes associated with the road on which the moving vehicle is traveling (e.g., noOfLanes is 2 for the example shown in FIG. 5).
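One possible in-memory representation of the road parameter estimation data, combining the lane parameter information of FIG. 4 with the lane context information of FIG. 5, is sketched below; this layout is an assumption, not the data structure of the embodiments.

```python
# Hypothetical container for road parameter estimation data: per-lane-marker
# parameters (FIG. 4) plus lane context information (FIG. 5).
from dataclasses import dataclass

@dataclass
class LaneMarkerParameters:
    heading_angle: float          # C1 at Z = 0 (radians)
    offset: float                 # C0 at Z = 0 (meters)
    curvature: float              # C2 at Z = 0
    curvature_derivative: float   # C3 at Z = 0
    marker_type: str              # e.g., "solid" or "dashed"

@dataclass
class LaneContext:
    relative_road_distance: float  # Δs between two consecutive frames (meters)
    ego_lane_no: int               # lane the vehicle is traveling in
    no_of_lanes: int               # number of lanes on the road

@dataclass
class RoadParameterEstimate:
    left_marker: LaneMarkerParameters
    right_marker: LaneMarkerParameters
    context: LaneContext
```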
Fig. 6 illustrates an LKAS module 600 for estimating lane parameter information and lane context information provided by some example embodiments. A module may include hardware and/or software designed to perform one or more functions. Referring to FIG. 6, the LKAS module 600 may include a DNN 604, which may be trained using a plurality of video frames 602 to generate road parameter estimation data 610, where the plurality of video frames 602 may be captured by an onboard camera (e.g., a camera on a computing device of a user of the vehicle). In some aspects, DNN 604 may use a neural network model (e.g., model 204 of FIG. 2) that has been trained using the input data 103 and the output data 105 so that road parameters may be estimated from driver-perspective image data (e.g., video frames taken from a front-facing camera mounted within a moving vehicle). During training, a database is used that includes the input data 103 (e.g., sequences of road images recorded from the driver's perspective while driving, and the corresponding consumer-grade GPS data, industrial-grade GPS data, and lane line maps of the route) and the output data 105 (e.g., ground truth values for the road parameters, which can be obtained from the lane line maps and the high-precision geographic locations provided by the industrial-grade GPS). The training data sets (e.g., data 103 and 105) also include training images taken on different days and at different times of day, with various road geometries and different driver-perspective viewpoints.
In some aspects, DNN 604 is also referred to as a Road Parameter Estimator Network (RPEN) and includes a road parameter estimator sub-network (RPESN) 702 and at least one LSTM node 806, as shown in conjunction with fig. 7 and 8. The trained DNN 604 generalizes: it learns to observe lane lines and can extract lane information for roads it has never seen.
In operation, the LKAS module 600 receives a video frame 602 (e.g., via a computing device camera of a class 0 vehicle driver), and the video frame 602 is transmitted as input to the DNN 604. DNN 604 uses RPESN 702 and LSTM 806 to generate road parameter estimation data 610, which may include lane parameter information 608 and lane context information 606. The lane context information 606 may include relative road distance (or odometry information, such as relative road distance 512 in fig. 5), egoLaneNo information, and noOfLanes information, as described in connection with fig. 5. The lane parameter information 608 may include lane marker heading angle information, lane marker offset information, lane marker curvature information, lane marker curvature derivative information, and lane marker type, as described in connection with fig. 4. In this regard, the trained DNN 604 uses images received from a class 0 vehicle user's computing device as input to produce lane context information and lane parameter information as output. The road parameter estimation data 610 may be used to generate an LKAS alert including an LDW or RDW. In addition to providing the sensing functionality for the LKAS function, the consumer device may also communicate alerts to the user through its display, in effect providing an end-to-end solution without the assistance of external components.
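A minimal sketch of how an LDW alert could be derived from the estimated lane marker offset C0 is shown below; the 0.3 m threshold and the per-side convention are illustrative assumptions, not values taken from the disclosure:

```python
def lane_departure_warning(c0_left, c0_right, threshold=0.3):
    """Return True when the vehicle is closer than `threshold` meters
    to either lane line (C0 magnitudes per side, assumed convention)."""
    return min(abs(c0_left), abs(c0_right)) < threshold

if lane_departure_warning(c0_left=-0.25, c0_right=1.60):
    print("LDW: vehicle approaching lane boundary")
```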
Fig. 7 illustrates a road parameter estimator sub-network (RPESN) for the LKAS module of fig. 6, as provided by some example embodiments. Referring to fig. 7, the RPESN 702 includes a pre-trained CNN 706, a pooling layer 708, a first Fully Connected (FC) layer 710, a spatial LSTM 712, and a second FC layer 714.
The input to the RPESN 702 is a plurality of video frames 704 captured by a camera of a consumer device (e.g., a computing device, such as a smartphone of a class 0 vehicle driver); the plurality of video frames 704 are first fed to the pre-trained CNN 706. The pre-trained CNN 706 is used as a classification network, instead of training a deep neural network from scratch, for two reasons: (a) training convolutional networks relies on very large sets of labeled image data, which increases implementation cost and processing time; and (b) unlike a classification problem, where each output label is covered by at least one training sample, the output space in regression is theoretically continuous and infinite. Thus, training the convolutional neural network from scratch may be omitted, and a pre-trained classification network that has already been trained on a large image dataset may be used instead. After the convolutional layers of the pre-trained CNN 706, the average pooling layer 708 collects information for each feature channel of the entire image. After the pooling layer 708, the first FC layer 710 learns the correlations between the extracted features. The output of the first FC layer 710 may be viewed as a sequence that is fed to a spatial LSTM 712, whose memory blocks perform dimensionality reduction. The spatial LSTM 712 is used to evaluate the spatial relationship of features from the sequence output by the first FC layer 710. In this regard, the memory cells of the spatial LSTM 712 identify the feature correlations most useful for the road parameter estimation task. The output of the spatial LSTM 712 is then passed to a second FC layer 714, which generates road parameter estimation data 716 including lane context information.
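The following PyTorch sketch mirrors the RPESN layout described above (pre-trained CNN, average pooling, first FC layer, spatial LSTM, second FC layer). The ResNet-18 backbone, layer sizes, and output dimension are assumptions made for illustration; the disclosure does not name a specific classification network or fix these dimensions:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class RPESNSketch(nn.Module):
    def __init__(self, hidden_dim=128, n_outputs=8):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")                # pre-trained classification CNN
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])   # keep convolutional layers only
        self.pool = nn.AdaptiveAvgPool2d(1)                         # pool each feature channel
        self.fc1 = nn.Linear(512, 512)                              # learns correlations between features
        self.spatial_lstm = nn.LSTM(1, hidden_dim, batch_first=True)  # dimensionality reduction
        self.fc2 = nn.Linear(hidden_dim, n_outputs)                 # road parameter estimates

    def forward(self, frame):                                       # frame: (B, 3, H, W)
        feats = self.pool(self.cnn(frame)).flatten(1)               # (B, 512)
        seq = self.fc1(feats).unsqueeze(-1)                         # FC output treated as a sequence
        _, (h, _) = self.spatial_lstm(seq)
        return self.fc2(h[-1])                                      # (B, n_outputs)

out = RPESNSketch()(torch.randn(2, 3, 224, 224))                    # usage: a batch of two frames
```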
Fig. 8 illustrates a Road Parameter Estimator Network (RPEN) 800 for the LKAS module of fig. 6, as provided by some example embodiments. Referring to fig. 8, RPEN 800 receives as input a plurality of image frames 802, which are processed by RPESN 804. In some aspects, the road parameter estimation data generated by RPESN 804 is further processed by temporal LSTM 806 to generate modified (or final) road parameter estimation data 808.
For time instant i, RPEN 800 processes a fixed-length visual input of k frames using RPESN 804 (the same as RPESN 702), whose output is fed into a stacked recurrent sequence model (temporal LSTM 806). Temporal LSTM 806 generates final road parameter estimation data 808 that includes lane context information (e.g., relative road distance information) and lane parameter information. In some aspects, the relative road distance information in the lane context information is also referred to as RPEN odometry information.
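Under the same assumptions as the previous sketch, the temporal stage could be sketched as a wrapper that runs the per-frame sub-network over a window of k frames and feeds the resulting sequence to a temporal LSTM; the window size and dimensions are illustrative:

```python
import torch
import torch.nn as nn

class RPENSketch(nn.Module):
    def __init__(self, rpesn, rpesn_dim=8, hidden_dim=64, n_outputs=8):
        super().__init__()
        self.rpesn = rpesn                                     # per-frame sub-network (see sketch above)
        self.temporal_lstm = nn.LSTM(rpesn_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_outputs)

    def forward(self, frames):                                 # frames: (B, k, 3, H, W)
        b, k = frames.shape[:2]
        per_frame = self.rpesn(frames.flatten(0, 1)).view(b, k, -1)
        out, _ = self.temporal_lstm(per_frame)                 # past frames persist in the LSTM state
        return self.head(out[:, -1])                           # final road parameter estimates
```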
A conventional feed-forward neural network has no notion of temporal order; the only input it considers is the current example to which it is exposed. Unlike conventional feed-forward neural networks, temporal LSTM 806 may use its internal state (memory) to identify patterns in time-series data (e.g., the sequence of image frames 802) and generate modified (or final) road parameter estimation data 808. As shown in fig. 8, temporal LSTM 806 forms a network with a loop in which the output from a previous time step is fed as an input to the current time step, thereby allowing past information to persist. Temporal LSTM 806 is used to model the temporal dynamics and dependencies of the road parameter estimation data. For example, a temporal constraint binds the current state of a road parameter to its previous state, and a significant change (e.g., above a predetermined threshold) in the lane marker offset information (C0) may trigger a change in the egoLaneNo information. Changes in the noOfLanes information are typically accompanied by changes in lane line type (e.g., a dashed line changing to a solid line, or ahead of road merge signs).
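As an illustration of the temporal constraint just described, a jump in C0 larger than some threshold could be interpreted as a lane change that updates egoLaneNo; the threshold and sign convention below are assumptions, not values from the disclosure:

```python
def update_ego_lane(prev_c0, curr_c0, ego_lane_no, jump_threshold=2.5):
    """Bump the lane index when the lane marker offset jumps by roughly a lane width."""
    delta = curr_c0 - prev_c0
    if abs(delta) > jump_threshold:
        ego_lane_no += 1 if delta < 0 else -1  # direction mapping is an assumed convention
    return ego_lane_no
```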
Fig. 9A-9D illustrate a plurality of image frames used for determining RPEN odometry as part of the lane context information, as provided by some example embodiments. Referring to fig. 9A-9D, sequential image frames 900A, 900B, 900C, and 900D received for processing by RPEN 800 are shown. More specifically, the pre-trained CNN 706 within the RPESN 702 is trained to model the shape and contour of lane lines (e.g., lane lines 906) and how the view of the lane lines changes as the distance to these lane lines changes. These high-level features may be reused to estimate RPEN odometry information (e.g., relativeRoadDistance information, or Δs 902) associated with the moving vehicle 904. The high-dimensional appearance of the dashed lane lines 906 on the road represents an injective (one-to-one) function of Δs 902, which changes smoothly. The dashed lane line 906 is consistently present on the road and is therefore a good candidate for RPEN 800 to model in order to regress Δs 902. The dimensionality-reduction capability of the spatial LSTM enables RPEN 800 to identify the dashed lane lines 906 as the feature correlations most useful for the task of regressing the relativeRoadDistance information 902 directly from the images 900A-900D. Since the estimate of the relativeRoadDistance information 902 depends on features from the previous frame, the temporal LSTM 806 retains this information about the previous frame. Therefore, RPEN 800 can estimate the relativeRoadDistance information 902 by observing only the road.
Vision-based repositioning using monocular images, e.g., continuously estimating the pose (3D position and orientation) of a moving camera, is a fundamental problem in computer vision and robotics. With the advent of autonomous vehicles, locating vehicles on the road using inexpensive sensors (e.g., monocular cameras) has recently become a very popular research topic. There are generally two approaches to continuous pose estimation for a moving camera: (1) by position tracking (visual odometry) and (2) by scene recognition (vision-based repositioning).
Fig. 10 illustrates examples 1000A and 1000B of continuous pose estimation by position tracking and by position identification, as provided by some example embodiments. Referring to fig. 10, example 1000A illustrates continuous pose estimation by position tracking (also known as visual odometry). In example 1000A, robot 1002 moves from location 1004A to location 1006A in a landscape where the map is unknown but the initial robot pose at location 1004A is known. As shown in fig. 10, the visual odometry technique is locally accurate but typically results in drift 1008A over time.
Referring to fig. 10, example 1000B illustrates continuous pose estimation by position identification (also referred to as vision-based repositioning). In example 1000B, robot 1002 moves from location 1004B to location 1006B in a landscape where the map is known but the initial robot pose at location 1004B is unknown. As shown in fig. 10, the vision-based repositioning technique is associated with noisy predictions but minimal or no drift 1008B.
As described above, in the case of pose estimation by position tracking (e.g., 1000A), the initial pose of the camera is known, and it is not necessary to know the map of the operating area. In example 1000A, the current pose is updated by integrating, over time, the changes in the pose estimate provided by the perception sensor. For example, visual odometry computation can be used to compute the relative pose between successive frames from sparse feature detection. The pose estimates are locally accurate but typically drift over time. Another limitation of visual odometry is that it does not perform well in environments where features cannot be extracted, as shown by the example feature-sparse environment 1100 in fig. 11 (which may be a feature-sparse rural road environment). RPEN 800, on the other hand, is trained to detect lane lines. RPEN 800 knows the shape and contour of the lane lines and also knows how the appearance of the lane lines changes with distance. In this regard, RPEN-based odometry works in cases where visual odometry fails.
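The drift behavior of position tracking can be illustrated with a toy example (not from the disclosure): chaining many slightly noisy relative-motion estimates accumulates error over time, which is the effect visual odometry suffers from:

```python
import numpy as np

rng = np.random.default_rng(0)
true_step, noise_std, n_frames = 1.0, 0.02, 1000       # assumed per-frame motion and noise
position = 0.0
for _ in range(n_frames):
    position += true_step + rng.normal(0.0, noise_std)  # integrate noisy relative motion
drift = position - true_step * n_frames
print(f"accumulated drift after {n_frames} frames: {drift:.2f} m")
```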
In the case of pose estimation by scene recognition (often referred to as vision-based repositioning), the initial pose of the camera is unknown, while the map of the operating area is known. From the observed surrounding scene, the scene can be classified into a limited number of discrete positions, allowing a coarse estimation of the camera pose. Here, the pose prediction is noisy but drift-free. Modern vehicle positioning systems can take advantage of this by fusing the complementary, coarse but drift-free predictions of vision-based repositioning algorithms with the locally smooth but drifting pose estimates of visual odometry computations.
There are generally two approaches to vision-based repositioning (example 1000B): point-feature-based repositioning and machine-learning-based repositioning. In the point-feature-based approach, given a 3D model and a query image, the first step is to establish a set of 2D-3D matches that relate pixel locations in the query image to 3D points in the 3D model. Next, the camera pose is estimated by applying an n-point pose solver that calculates the camera pose based on n 2D-3D matches within a random sample consensus (RANSAC) loop to handle outliers. The classical approach to building the 3D model is structure from motion, in which the 3D points are reconstructed by triangulating local feature matches, creating a many-to-one mapping from feature descriptors to 3D points. One of the challenges faced by this approach is inefficient scaling with scene size: the computational complexity of the matching increases as the scene size increases. Furthermore, as the probability of finding multiple 3D points with similar local appearance increases, the uniqueness of the 2D-3D matches decreases. Another challenge is that this approach requires a good initial pose estimate, which can be difficult to obtain when the scene has inconsistent illumination, motion blur, non-textured surfaces, or a lack of overlap between images. This can be a concern in vehicle positioning applications, particularly when the vehicle is traveling quickly on a highway (typically an area where this approach does not function well).
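A hedged sketch of the pose-solving step in point-feature-based repositioning is shown below using OpenCV's PnP-with-RANSAC solver; the 2D-3D matches and camera intrinsics are synthetic placeholders, not data from the disclosure:

```python
import numpy as np
import cv2

object_points = (np.random.rand(20, 3) * 10.0).astype(np.float32)   # 3D points from the model
image_points = (np.random.rand(20, 2) * 640.0).astype(np.float32)   # matched pixel locations
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])

# Estimate the camera pose from n 2D-3D matches inside a RANSAC loop.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(object_points, image_points,
                                             camera_matrix, None)
if ok:
    print("rotation vector:", rvec.ravel(), "translation:", tvec.ravel())
```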
Machine-learning-based localization, in particular deep learning, has recently shown great promise in using monocular images for vision-based localization. Some localization techniques (e.g., PoseNet) use deep CNNs to directly regress the camera's continuous pose from RGB images. However, the accuracy of PoseNet is far from the requirements of practical applications, and learning both position and orientation in the same model is difficult. Some improvements to PoseNet include a method known as PoseNet17. Such deep learning approaches attempt to learn and infer weights based on scene geometry. In other approaches, such as MapNet, sensory inputs such as visual odometry and GPS are fused in addition to images for camera positioning. Other approaches introduce CNN and LSTM architectures to directly regress the geographic location (e.g., via a geospatial deep NN) using monocular camera images and the coarse GPS in cell phones, predicting the geographic location with near lane-level accuracy. Compared to conventional methods, machine learning methods have several advantages that make them suitable for real-time applications such as vehicle repositioning: (a) deep learning methods do not need an initial pose estimate; (b) machine learning approaches are more robust to environmental challenges such as lighting, weather, dynamic objects, and non-textured scenes (as they can learn richer features than those obtained from point features); (c) deep learning methods have faster inference; and (d) machine learning methods for localization can scale efficiently with scene size because they are able to learn a compact representation of a map using a deep learning model. However, the accuracy of machine learning methods is not as good as that of traditional feature-based methods, especially for indoor environments. Vision-based repositioning relies on landmark recognition; thus, it performs well in urban areas but poorly in suburban or rural areas. Unfortunately, even in urban areas, no known positioning technology can provide lane-level localization.
Lane-level accuracy of vehicle positioning is highly desirable because it opens up many applications in which consumer devices can provide an end-to-end solution without the aid of external components. For example, with lane-level attitude sensing and positioning, the following goals may be achieved: (a) if the car navigation system in the consumer device knows the lane in which the car is currently traveling, its estimated time of arrival (TOA) prediction may be more accurate; and (b) if the lane in which the vehicle is currently traveling will soon merge or become congested due to an accident or construction zone, a warning may be provided in advance.
Fig. 12 illustrates a block diagram of a positioning module 1200 for high precision positioning provided by some example embodiments. Referring to fig. 12, localization module 1200 includes RPEN 1202 (which may be the same as RPEN 800 of fig. 8), neural network 1204, and sensor fusion module 1206.
The sensor fusion module 1206 may receive as inputs: RPEN odometry information 1216 and road parameter estimation data 1218 from RPEN 1202, latitude and longitude information 1220 from neural network 1204, latitude and longitude information 1222 from coarse GPS 1212, additional information 1224, such as the distance traveled by the vehicle between adjacent poses from visual odometry or IMU input 1208 (extractable from computing device IMU sensors), and lane line map information 1214.
The neural network 1204 may be an off-the-shelf relocation algorithm, such as a publicly available geospatial NN deployed with RPEN 1202 to provide spatial constraints of possible vehicle locations. The geospatial NN 1204 predicts a geographic location 1220 (latitude and longitude) of lane-level accuracy using a sequence of images of roads from the driver's perspective and location data 1222 from a coarse GPS 1212 of a computing device (e.g., a driver's consumer device, such as a smartphone).
RPEN 1202 may predict the road parameters using a sequence of images of the road from the driver's perspective obtained from a computing device (e.g., as described in connection with figs. 3-9D). The outputs of RPEN 1202 and geospatial NN 1204, together with the lane line map information 1214 and coarse GPS location data 1222, are communicated as inputs to a Bayesian filtering algorithm within sensor fusion module 1206, which predicts an accurate (e.g., lane-based) geographic location as output 1226. The sensor inputs of the Bayesian-filtering-based sensor fusion algorithm used by the sensor fusion module 1206 include: the spatial constraints imposed by the RPEN outputs 1216 and 1218 together with the lane line map information 1214, absolute position information from the coarse GPS 1212 and an off-the-shelf repositioning algorithm (e.g., geospatial NN 1204), and constraints between adjacent poses from the IMU, visual odometry, and the odometry output from RPEN.
In some aspects, the Bayesian-filter-based vehicle pose estimation algorithm used by the sensor fusion module 1206 includes the following processing functions: (a) predicting the geographic position of the vehicle at time t+1 based on the previous knowledge of the vehicle position (at time t) and a kinematic equation; (b) given the observations of the sensors, comparing these observations to the predictions; and (c) updating the knowledge of the vehicle's geographic location based on the predictions and the sensor readings communicated as inputs to the sensor fusion module 1206.
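A minimal one-dimensional sketch of this predict/compare/update cycle is given below; the constant-velocity motion model and noise values are simplifying assumptions and do not represent the actual filter used by the sensor fusion module:

```python
def predict(x, p, velocity, dt, process_var):
    """Kinematic prediction: propagate the position estimate and grow its variance."""
    return x + velocity * dt, p + process_var

def update(x, p, measurement, measurement_var):
    """Compare the prediction to a sensor observation and fuse them."""
    k = p / (p + measurement_var)                 # gain: how much to trust the measurement
    return x + k * (measurement - x), (1.0 - k) * p

x, p = 0.0, 1.0                                   # prior position estimate and variance
x, p = predict(x, p, velocity=15.0, dt=0.1, process_var=0.05)
x, p = update(x, p, measurement=1.6, measurement_var=0.4)
print(f"fused position estimate: {x:.2f} m (variance {p:.3f})")
```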
Fig. 13 illustrates an example 1300 of high-precision positioning with lane accuracy using the positioning module of fig. 12 provided by some example embodiments.
Given the lane line map information, the road parameter estimation data (e.g., 1216 and 1218) of the RPEN 1202 has higher accuracy laterally than longitudinally along the road. Thus, as shown in fig. 13, the RPEN 1202 output provides a lateral constraint on the probability distribution of the vehicle position. The lane line map may then be used to convert this lateral constraint into the geographic coordinate space (latitude, longitude).
As shown in fig. 13, a coarse-GPS-accuracy position 1302 is obtained from the output of the coarse GPS 1212. The geospatial NN 1204 can nearly reach lane-level accuracy (position 1304) using only images from the driver's perspective and coarse GPS data. The RPEN odometry 1216 (e.g., relative road distance information) and inputs such as visual odometry and IMU create constraints between adjacent poses, while the off-the-shelf repositioning algorithm, the coarse GPS output, and the RPEN road parameters 1218 impose spatial constraints on the vehicle's position, resulting in an RPEN-accuracy position 1306. A Bayesian filtering algorithm (e.g., a Particle Filter (PF) or an Extended Kalman Filter (EKF)) used by the sensor fusion module 1206 utilizes these constraints to accurately locate the vehicle and achieve the lane-level-accuracy position 1308 by fusing the coarse-GPS-accuracy position 1302, the geospatial NN accuracy position 1304, and the RPEN accuracy position 1306. In some aspects, accurate lane-level positioning of the vehicle is possible even without the visual odometry and IMU inputs 1224.
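One way such lateral constraints could enter a particle filter is by re-weighting position hypotheses according to how well their lane-line offset matches the RPEN estimate; the sketch below is an assumption-laden illustration, and lateral_offset_fn stands in for a hypothetical lookup of a particle's offset in the lane line map:

```python
import numpy as np

def reweight_particles(particles, weights, measured_offset, lateral_offset_fn, sigma=0.2):
    """Down-weight particles whose lane-line offset disagrees with the RPEN estimate."""
    offsets = np.array([lateral_offset_fn(p) for p in particles])
    likelihood = np.exp(-0.5 * ((offsets - measured_offset) / sigma) ** 2)
    new_weights = weights * likelihood
    return new_weights / new_weights.sum()
```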
Fig. 14 illustrates an example suburban environment 1402 and an example urban environment 1404 in which the positioning module of fig. 12 may be used, as provided by some example embodiments.
Off-the-shelf vision-based repositioning algorithms (e.g., geospatial NN 1204) rely on landmark recognition. In this regard, the geospatial NN 1204 performs well in urban areas (e.g., 1404) but performs poorly in suburban or rural areas (e.g., 1402). Urban areas may be problematic areas for GPS, but GPS performs quite well in rural or suburban areas. RPEN relies on the road and its lane markings, which are usually ignored by vision-based repositioning. Thus, vision-based repositioning, GPS, and RPEN complement each other and can be used by the positioning module 1200 to provide accurate lane-based positioning.
Fig. 15 is a flow diagram of a method 1500 for performing autonomous level functions associated with a mobile vehicle provided by some example embodiments. Method 1500 includes operations 1502, 1504, 1506, 1508, and 1510. By way of example and not limitation, method 1500 may be performed by LKAS module 1760, which LKAS module 1760 is configured for execution within a mobile device, such as device 1700 shown in fig. 17.
Referring to fig. 15, in operation 1502, a plurality of features are extracted from a plurality of image frames obtained from a camera within a moving vehicle using a convolutional layer of a convolutional neural network. For example, a pre-trained CNN (e.g., pre-trained CNN 706, which is part of DNN 604) extracts a plurality of features from images (e.g., images 704) obtained by an onboard camera. In operation 1504, a correlation sequence between the plurality of features is generated using a first FC layer of the deep neural network. For example, the first FC layer generates a correlation sequence between the plurality of features extracted by the pre-trained CNN.
In operation 1506, the correlation sequence is dimensionality reduced using a long-short-term memory (LSTM) model to generate a modified correlation sequence. For example, a spatial LSTM (e.g., LSTM 712) within a road parameter estimator sub-network (RPESN) (e.g., RPESN 702) performs dimensionality reduction of the correlation sequence. In operation 1508, road parameter estimation data associated with the moving vehicle is generated from the modified correlation sequence using a second FC layer of the deep neural network. For example, the second FC layer 714 generates road parameter estimation data 716 associated with the moving vehicle. In operation 1510, a Lane Keep Assist System (LKAS) warning is provided based on the road parameter estimation data. For example, the LDW or RDW notification may be provided by a computing device used to generate the road parameter estimation data 716.
Aspects disclosed herein use a consumer device (e.g., 1700) to provide the sensing functionality needed to increase the SAE autonomy level of a vehicle. The RPEN described herein estimates new parameters such as egoLaneNo, noOfLanes, and relativeRoadDistance.
Further, the RPEN described herein provides constraints between adjacent poses by observing lane line appearance as a function of distance from the lane line. A pose estimation algorithm based on Bayesian filtering can take advantage of this constraint to improve geographic location accuracy and provide lane-based vehicle positioning. The technology disclosed herein improves the geographic position accuracy of existing vehicle positioning systems by imposing, given the lane line map of a road, spatial constraints on the possible positions of the vehicle based on lane context information, such as egoLaneNo (the number of the lane in which the ego vehicle is located), and real-time extraction of lane parameters provided by the vision sensors.
Fig. 16 is a block diagram of a representative software architecture 1600 provided by some example embodiments, which software architecture 1600 may be used in conjunction with the various device hardware described herein. FIG. 16 is only a non-limiting example of a software architecture 1602, and it is to be understood that many other architectures can be implemented that facilitate achieving the functionality described herein. Software architecture 1602 may be implemented in hardware such as device 1700 of FIG. 17, which includes a processor 1705, a memory 1710, storage 1715 and 1720, and I/O interfaces 1725 and 1730, among others.
A representative hardware layer 1604 is shown, the hardware layer 1604 may represent the device 1700 of fig. 17, or the like. The representative hardware layer 1604 includes one or more processing units 1606 with associated executable instructions 1608. Executable instructions 1608 represent executable instructions of the software architecture 1602, including implementing the methods, modules, etc. of fig. 1-15. The hardware layer 1604 also includes a memory or storage module 1610, which memory or storage module 1610 also has executable instructions 1608. The hardware layer 1604 may also include other hardware 1612, which represents any other hardware of the hardware layer 1604, such as other hardware illustrated as part of the device 1700.
In the example architecture of FIG. 16, the software architecture 1602 may be conceptualized as a stack of layers, where each layer has a particular function. For example, software architecture 1602 may include layers of operating system 1614, libraries 1616, framework/middleware 1618, applications 1620, and presentation layers 1644. In operation, application 1620, or other components within each layer, may call an Application Programming Interface (API) call 1624 through a software stack and receive a response, return value, etc., illustrated with message 1626 in response to API call 1624. The layers shown in FIG. 16 are representative in nature and not all software architectures 1602 have all layers. For example, some mobile or special-purpose operating systems may not provide framework/middleware 1618, while other operating systems may provide such a layer. Other software architectures may include other layers or different layers.
The operating system 1614 may manage hardware resources and provide common services. The operating system 1614 may include a kernel 1628, services 1630, and drivers 1632, among others. The kernel 1628 may act as an abstraction layer between hardware and other software layers. For example, the kernel 1628 may be responsible for memory management, processor management (e.g., scheduling), component management, networking, security settings, and the like. Services 1630 may provide other common services for other software layers. The drivers 1632 may be responsible for controlling or interfacing with the underlying hardware. For example, depending on the hardware configuration, the drivers 1632 may include display drivers, camera drivers, Bluetooth® drivers, flash memory drivers, serial communication drivers (e.g., Universal Serial Bus (USB) drivers), Wi-Fi® drivers, audio drivers, power management drivers, and the like.
The libraries 1616 may provide a common infrastructure that may be used by the applications 1620 or other components or layers. The libraries 1616 generally allow other software modules to perform tasks in an easier manner than by interfacing directly with the underlying operating system 1614 functions (e.g., kernel 1628, services 1630, or drivers 1632). The libraries 1616 may include a system library 1634 (e.g., a C standard library), which may provide functions such as memory allocation functions, string manipulation functions, math functions, and the like. Further, the libraries 1616 may include API libraries 1636, such as media libraries (e.g., libraries that support the presentation and manipulation of various media formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG), graphics libraries (e.g., an OpenGL framework that may be used to render 2D and 3D graphical content on a display), database libraries (e.g., SQLite, which may provide various relational database functions), web libraries (e.g., WebKit, which may provide web browsing functionality), and so forth. The libraries 1616 may also include a variety of other libraries 1638 to provide many other APIs to the applications 1620 and other software components/modules.
Framework/middleware 1618 (also sometimes referred to as middleware) may provide a high-level, common infrastructure that can be used by applications 1620 or other software components/modules. For example, the framework/middleware 1618 may provide various Graphical User Interface (GUI) functions, advanced resource management, advanced location services, and the like. The framework/middleware 1618 may provide a wide variety of other APIs for use by applications 1620 or other software components/modules, some of which may be specific to a particular operating system 1614 or platform.
The applications 1620 include a built-in application 1640, a third-party application 1642, a LKAS module 1660, and a positioning module 1662. In some aspects, the LKAS module 1660 may comprise suitable circuitry, logic, interfaces, or code and may be operable to perform one or more functions associated with the LKAS module 600 of fig. 6 and described in conjunction with fig. 1-15. The positioning module 1662 may comprise suitable circuitry, logic, interfaces, or code and may be operable to perform one or more of the vehicle positioning functions associated with the positioning module 1200 of FIG. 12 and described in connection with FIGS. 10-14.
Examples of representative built-in applications 1640 may include, but are not limited to, a contacts application, a browser application, a reader application, a location application, a media application, a messaging application, or a gaming application. The third-party applications 1642 may include any of the built-in applications 1640 as well as a wide variety of other applications. In a particular example, a third-party application 1642 (e.g., an application developed using the Android™ or iOS™ Software Development Kit (SDK) by an entity other than the vendor of the particular platform) may be mobile software running on a mobile operating system such as iOS™, Android™, Windows® Phone, or another mobile operating system. In this example, the third-party application 1642 may invoke the API calls 1624 provided by the mobile operating system (e.g., operating system 1614) to facilitate implementing the functionality described herein.
Applications 1620 may create user interfaces using built-in operating system functions (e.g., kernel 1628, services 1630, and drivers 1632), libraries (e.g., system library 1634, API library 1636, and other libraries 1638), and framework/middleware 1618 to interact with system users. Alternatively or additionally, in some systems, the user may be interacted with through a presentation layer (e.g., presentation layer 1644). In these systems, the application/module "logic" may be separate from aspects of the application/module that interact with the user.
Some software architectures use virtual machines. In the example of FIG. 16, the virtual machine is illustrated by virtual machine 1648. The virtual machine creates a software environment in which applications/modules can execute as if they were executing on a hardware machine (e.g., device 1700 of fig. 17). Virtual machine 1648 is hosted by a host operating system (e.g., operating system 1614) and typically (although not always) has a virtual machine monitor 1646. The virtual machine monitor 1646 is used to manage the operation of the virtual machine 1648 and its connection to the host operating system (i.e., operating system 1614). A software architecture executes within the virtual machine 1648, such as an operating system 1650, libraries 1652, framework/middleware 1654, applications 1656, or a presentation layer 1658. These software architecture layers executing within virtual machine 1648 may or may not be the same as the corresponding layers described above.
Fig. 17 is a block diagram of circuitry of an apparatus implementing an algorithm and performing a method provided by some example embodiments. Not all components need be used in various embodiments. For example, the client, server, and cloud-based network device may each use a different set of components, or, in the case of a server, a larger storage device, for example.
An example computing device in the form of a computer 1700 (also referred to as computing device 1700 or computer system 1700) may include a processor 1705, memory 1710, removable storage 1715, non-removable storage 1720, an input interface 1725, an output interface 1730, and a communication interface 1735, all connected via a bus 1740. While the example computing device is shown and described as computer 1700, the computing device may take different forms in different embodiments.
Memory 1710 may include volatile memory 1745 and nonvolatile memory 1750, and may store programs 1755. Computing device 1700 may include or have access to a computing environment that includes a variety of computer-readable media, such as volatile memory 1745, non-volatile memory 1750, removable storage 1715, and non-removable storage 1720. Computer storage includes random-access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), and electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium capable of storing computer-readable instructions.
Computer-readable instructions stored in a computer-readable medium (e.g., programs 1755 stored in memory 1710) are executable by processor 1705 of computing device 1700. Hard drives, CD-ROMs, and RAMs are some examples of articles of manufacture that include a non-transitory computer-readable medium, such as a storage device. The terms "computer-readable medium" and "storage device" do not include carrier waves that are considered too transient. "computer-readable non-transitory media" includes all types of computer-readable media, including magnetic storage media, optical storage media, flash memory media, and solid state storage media. It should be understood that the software may be installed in and sold with a computer. Alternatively, the software may be acquired and loaded into a computer, including by way of physical media or a distribution system, including, for example, from a server owned by the software author or from a server not owned but used by the software author. For example, the software may be stored in a server for distribution over the internet. The terms "computer-readable medium" and "machine-readable medium" as used herein are interchangeable.
Program 1755 may use a customer preference structure that uses modules described herein, such as LKAS module 1760 and positioning module 1765, which may be the same as LKAS module 1660 and positioning module 1662 of fig. 16, respectively.
Any one or more of the modules described herein may be implemented using hardware (e.g., a processor of a machine, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or any suitable combination thereof). Further, any two or more of these modules may be combined into a single module, and the functionality of the single module described herein may be subdivided among multiple modules. Further, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
In some aspects, the LKAS module 1760, the location module 1765, and one or more other modules that are part of the process 1755 may be integrated into a single module, performing the corresponding functions of the integrated module.
Although several embodiments have been described in detail above, other modifications may be made. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be deleted, from, the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.
In an example embodiment, computer 1700 includes: an extraction module for extracting a plurality of features from a plurality of image frames obtained from a camera within a moving vehicle; a sequence module for generating a correlation sequence between a plurality of features; a reduction module to perform a dimension reduction of the correlation sequence to generate a modified correlation sequence; a road parameter module for generating road parameter estimation data associated with the mobile vehicle from the modified correlation sequence; and a warning module for providing Lane Keep Assist System (LKAS) warning according to the road parameter estimation data. In some embodiments, computer 1700 may include other modules or additional modules for performing any one or combination of the steps described in the embodiments. Moreover, it is contemplated that any additional or alternative embodiments or aspects of the methods as shown in any of the figures or described in any of the claims can include similar modules.
In an example embodiment, computer system 1700 is provided to perform autonomous level functions associated with a moving vehicle. The computer system 1700 includes a memory 1710 that stores instructions, and one or more processors 1705 in communication with the memory 1710. The one or more processors 1705 execute the instructions to extract a plurality of features from a plurality of image frames obtained from a camera within a moving vehicle using a convolutional layer of a convolutional neural network; generate a correlation sequence between the plurality of features using a first Fully Connected (FC) layer of the convolutional neural network; perform a dimension reduction of the correlation sequence using a spatial long short-term memory (LSTM) to generate a modified correlation sequence; generate road parameter estimation data associated with the moving vehicle from the modified correlation sequence using a second FC layer of the convolutional neural network; and provide a Lane Keep Assist System (LKAS) warning according to the road parameter estimation data.
It will also be appreciated that software comprising one or more computer-executable instructions which facilitate the processes and operations described above in connection with any or all of the steps of the present invention may be installed in and sold with one or more computing devices consistent with the present invention. Alternatively, the software may be acquired and loaded into one or more computing devices, including acquiring the software through a physical medium or distribution system, including, for example, acquiring the software from a server owned by the software author or from a server not owned but used by the software author. For example, the software may be stored in a server for distribution over the internet.
Furthermore, it is to be understood by those skilled in the art that the present invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the following drawings. The embodiments herein are capable of other embodiments and of being practiced or of being carried out in various ways. Also, it is to be understood that the phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," or "having" and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless limited otherwise, the terms "connected," "coupled," and "mounted," and variations thereof herein are used broadly and encompass direct and indirect connections, couplings, and mountings. In addition, the terms "connected" and "coupled" and variations thereof are not restricted to physical or mechanical connections or couplings. Furthermore, terms such as "upper," "lower," "bottom," and "top" are relative and are used to aid in the description, but are not limiting.
The components of the illustrative apparatus, systems, and methods used in accordance with the described embodiments may be implemented at least partially in digital electronic circuitry, analog electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. For example, these components may be implemented as a computer program product (e.g., a computer program, program code, or computer instructions) tangibly embodied in an information carrier, or in a machine-readable storage device, for execution by, or to control the operation of, data processing apparatus (e.g., a programmable processor, a computer, or multiple computers).
A computer program can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program can be deployed to be executed at one site in one computer or on multiple computers or distributed across multiple sites and interconnected by a communication network. Furthermore, functional programs, codes, and code segments for implementing the techniques described herein are readily understood by programmers skilled in the art to which the techniques described herein pertain to be within the scope of the claims. Method steps associated with the illustrative embodiments may be performed by one or more programmable processors executing a computer program, code, or instructions to perform functions such as operating on input data or generating output. For example, method steps can also be performed by, and apparatus for performing, special purpose logic circuitry, e.g., a Field Programmable Gate Array (FPGA) or an application-specific integrated circuit (ASIC).
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an ASIC, FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other similar configuration.
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including, for example, semiconductor memory devices, such as electrically programmable read-only memory or Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), flash memory devices, or data storage disks (e.g., magnetic disks, internal hard disks, removable magnetic disks, magneto-optical disks, or CD-ROM/DVD-ROM disks). The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Those of skill in the art would understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
As used herein, a "machine-readable medium" (or "computer-readable medium") includes devices capable of storing instructions and data, either temporarily or permanently, and may include, but is not limited to, random-access memory (RAM), read-only memory (ROM), cache memory, flash memory, optical media, magnetic media, cache memory, other types of memory (e.g., erasable programmable read-only memory (EEPROM)), or any suitable combination thereof. The term "machine-readable medium" should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) that are capable of storing the processor instructions. The term "machine-readable medium" shall also be taken to include any medium, or combination of media, that is capable of storing instructions for execution by one or more processors, which when executed by the one or more processors, cause the one or more processors to perform any one or more of the methodologies described herein. Accordingly, "machine-readable medium" refers to a single storage apparatus or device, as well as a "cloud-based" storage system or storage network that includes multiple storage apparatuses or devices. The term "machine-readable medium" as used herein does not include a signal per se.
Furthermore, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or described as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the scope disclosed herein.
While the invention has been described with reference to specific features and embodiments thereof, it will be apparent that various modifications and combinations of the invention can be made without departing from the invention. For example, other components may be added to or removed from the described systems. Accordingly, the specification and figures are to be regarded only as illustrative of the invention as defined in the appended claims and any and all modifications, variations, combinations, or equivalents that fall within the scope of the invention are contemplated. Other aspects may be within the scope of the following claims. Finally, as used herein, the conjunction "or" refers to a non-exclusive "or" unless specifically stated otherwise.

Claims (25)

1. A computer-implemented method for performing autonomous level functions associated with a mobile vehicle, the method comprising:
extracting a plurality of features from a plurality of image frames obtained from a camera within the moving vehicle using convolutional layers of a convolutional neural network;
generating a correlation sequence between the plurality of features using a first Fully Connected (FC) layer of the convolutional neural network;
performing a dimension reduction of the correlation sequence using a spatial long short-term memory (LSTM) to generate a modified correlation sequence;
generating road parameter estimation data associated with the moving vehicle from the modified correlation sequence using a second FC layer of the convolutional neural network;
and providing Lane Keep Assist System (LKAS) warning according to the road parameter estimation data.
2. The computer-implemented method of claim 1, further comprising:
detecting one or more time series data patterns in the road parameter estimation data using a temporal LSTM.
3. The computer-implemented method of claim 2, further comprising:
modifying the road parameter estimation data according to the detected one or more time series data patterns using the temporal LSTM to generate modified road parameter estimation data.
4. The computer-implemented method of claim 3, wherein the modified road parameter estimation data includes lane parameter information and lane context information.
5. The computer-implemented method of claim 4, wherein the lane parameter information comprises one or more of:
marking a course angle by a lane;
lane marking offset;
lane marking curvature;
lane marker curvature derivative;
lane marker type.
6. The computer-implemented method of claim 4, wherein the lane context information comprises one or more of:
a relative road distance representing a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames;
a lane number indication of a road lane in which the mobile vehicle is traveling;
a number of lanes associated with a road on which the moving vehicle is traveling.
7. The computer-implemented method of any of claims 1-6, further comprising: extracting geographic location information of the moving vehicle using the plurality of image frames obtained by the camera using a convolutional layer of a second convolutional neural network.
8. The computer-implemented method of claim 7, further comprising:
applying one or more spatial constraints to the geographic location information using Bayesian filtering to generate updated geographic location information for the moving vehicle, the one or more spatial constraints being based on the road parameter estimation data;
outputting the updated geographic location information of the moving vehicle.
9. A system for performing autonomous level functions associated with a mobile vehicle, the system comprising:
a memory storing instructions;
one or more processors in communication with the memory, the one or more processors executing the instructions to:
extracting a plurality of features from a plurality of image frames obtained from a camera within the moving vehicle using convolutional layers of a convolutional neural network;
generating a correlation sequence between the plurality of features using a first Fully Connected (FC) layer of the convolutional neural network;
performing a dimension reduction of the correlation sequence using a spatial long short-term memory (LSTM) to generate a modified correlation sequence;
generating road parameter estimation data associated with the moving vehicle from the modified correlation sequence using a second FC layer of the convolutional neural network;
and providing Lane Keep Assist System (LKAS) warning according to the road parameter estimation data.
10. The system of claim 9, wherein the one or more processors are further to execute the instructions to:
detecting one or more time series data patterns in the road parameter estimation data using a temporal LSTM.
11. The system according to claim 10, wherein the one or more processors further execute the instructions to:
modifying the road parameter estimation data according to the detected one or more time series data patterns using the temporal LSTM to generate modified road parameter estimation data.
12. The system of claim 11, wherein the modified road parameter estimation data comprises lane parameter information and lane context information.
13. The system of claim 12, wherein the lane parameter information comprises one or more of:
marking a course angle by a lane;
lane marking offset;
lane marking curvature;
lane marker curvature derivative;
lane marker type.
14. The system of claim 12, wherein the lane context information comprises one or more of:
a relative road distance representing a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames;
a lane number indication of a road lane in which the mobile vehicle is traveling;
a number of lanes associated with a road on which the moving vehicle is traveling.
15. The system according to any one of claims 9 to 14, wherein the one or more processors are further to execute the instructions to:
extracting geographic location information of the moving vehicle using the plurality of image frames obtained by the camera using a convolutional layer of a second convolutional neural network.
16. The system according to claim 15, wherein the one or more processors further execute the instructions to:
applying one or more spatial constraints to the geographic location information using Bayesian filtering to generate updated geographic location information for the moving vehicle, the one or more spatial constraints being based on the road parameter estimation data;
outputting the updated geographic location information of the moving vehicle.
17. A computer-readable medium storing computer instructions for performing autonomous level functions associated with a moving vehicle, wherein the instructions, when executed by one or more processors of a computing device, cause the one or more processors to perform operations comprising:
extracting a plurality of features from a plurality of image frames obtained from a camera within the moving vehicle using convolutional layers of a convolutional neural network;
generating a correlation sequence between the plurality of features using a first Fully Connected (FC) layer of the convolutional neural network;
performing a dimension reduction of the correlation sequence using a spatial long short-term memory (LSTM) to generate a modified correlation sequence;
generating road parameter estimation data associated with the moving vehicle from the modified correlation sequence using a second FC layer of the convolutional neural network;
and providing Lane Keep Assist System (LKAS) warning according to the road parameter estimation data.
18. The computer-readable medium of claim 17, wherein the instructions further cause the one or more processors to:
detecting one or more time series data patterns in the road parameter estimation data using a temporal LSTM.
19. The computer-readable medium of claim 18, wherein the instructions further cause the one or more processors to:
modifying the road parameter estimation data according to the detected one or more time series data patterns using the temporal LSTM to generate modified road parameter estimation data.
20. The computer-readable medium of claim 19, wherein the modified road parameter estimation data includes lane parameter information and lane context information.
21. The computer-readable medium of claim 20, wherein the lane parameter information comprises one or more of:
marking a course angle by a lane;
lane marking offset;
lane marking curvature;
lane marker curvature derivative;
lane marker type.
22. The computer-readable medium of claim 20, wherein the lane context information comprises one or more of:
a relative road distance representing a distance traveled by the moving vehicle between two consecutive frames of the plurality of image frames;
a lane number indication of a road lane in which the mobile vehicle is traveling;
a number of lanes associated with a road on which the moving vehicle is traveling.
23. The computer-readable medium of any one of claims 17 to 22, wherein the instructions further cause the one or more processors to:
extracting geographic location information of the moving vehicle using the plurality of image frames obtained by the camera using a convolutional layer of a second convolutional neural network.
24. The computer-readable medium of claim 23, wherein the instructions further cause the one or more processors to:
applying one or more spatial constraints to the geographic location information using Bayesian filtering to generate updated geographic location information for the moving vehicle, the one or more spatial constraints being based on the road parameter estimation data;
outputting the updated geographic location information of the moving vehicle.
25. A system for performing autonomous level functions associated with a moving vehicle, the system comprising:
an extraction module to extract a plurality of features from a plurality of image frames obtained from a camera within the moving vehicle;
a correlation module to generate a correlation sequence between the plurality of features;
a dimension reduction module to perform a dimension reduction of the correlation sequence to generate a modified correlation sequence;
an estimation module to generate road parameter estimation data associated with the moving vehicle from the modified correlation sequence;
and a notification module to provide a Lane Keep Assist System (LKAS) warning according to the road parameter estimation data.
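As a rough illustration of how the claim-25 modules could chain together, the toy driver below wires the earlier sketches end to end; it relies on the hypothetical RoadParameterEstimator, TemporalRefiner, and lkas_warning defined above and is not a definitive implementation.

```python
def run_lkas_pipeline(frames, estimator: "RoadParameterEstimator",
                      refiner: "TemporalRefiner") -> bool:
    """frames: (1, seq_len, 3, H, W) tensor from the in-vehicle camera."""
    road_params = estimator(frames)                    # extraction, correlation,
                                                       # dimension reduction, estimation
    refined = refiner(road_params.unsqueeze(1))[:, -1] # optional temporal refinement
    return lkas_warning(refined[0])                    # notification module: LKAS warning
```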
CN201980098224.1A 2019-07-08 2019-07-08 Vehicle autonomous level function Pending CN114127810A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2019/040833 WO2021006870A1 (en) 2019-07-08 2019-07-08 Vehicular autonomy-level functions

Publications (1)

Publication Number Publication Date
CN114127810A true CN114127810A (en) 2022-03-01

Family

ID=67441719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980098224.1A Pending CN114127810A (en) 2019-07-08 2019-07-08 Vehicle autonomous level function

Country Status (2)

Country Link
CN (1) CN114127810A (en)
WO (1) WO2021006870A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113264043A (en) * 2021-05-17 2021-08-17 北京工业大学 Unmanned driving layered motion decision control method based on deep reinforcement learning
CN115240150A (en) * 2022-06-21 2022-10-25 佛山仙湖实验室 Lane departure warning method, system, device and medium based on monocular camera
CN115203457B (en) * 2022-07-15 2023-11-14 小米汽车科技有限公司 Image retrieval method, device, vehicle, storage medium and chip
DE102022121670A1 (en) 2022-08-26 2024-02-29 Connaught Electronics Ltd. Lane recognition and driving a vehicle

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3495993A1 (en) * 2017-12-11 2019-06-12 Continental Automotive GmbH Road marking determining apparatus for automated driving

Also Published As

Publication number Publication date
WO2021006870A1 (en) 2021-01-14

Similar Documents

Publication Publication Date Title
US10579058B2 (en) Apparatus and method for generating training data to train neural network determining information associated with road included in image
CN111670468B (en) Moving body behavior prediction device and moving body behavior prediction method
US11131993B2 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
Hou et al. Interactive trajectory prediction of surrounding road users for autonomous driving using structural-LSTM network
JP7338052B2 (en) Trajectory prediction method, device, equipment and storage media resource
US10627818B2 (en) Temporal prediction model for semantic intent understanding
US20200242375A1 (en) Neural networks for object detection and characterization
CN111771135B (en) LIDAR positioning using RNN and LSTM for time smoothing in autonomous vehicles
WO2019199873A1 (en) Techniques for considering uncertainty in use of artificial intelligence models
WO2019199880A1 (en) User interface for presenting decisions
WO2019199876A1 (en) Dynamically controlling sensor behavior
CN114127810A (en) Vehicle autonomous level function
CN111971574A (en) Deep learning based feature extraction for LIDAR localization of autonomous vehicles
JP2021515724A (en) LIDAR positioning to infer solutions using 3DCNN network in self-driving cars
JP2017027599A (en) Turning prediction
Niranjan et al. Deep learning based object detection model for autonomous driving research using carla simulator
US11875680B2 (en) Systems and methods for augmenting perception data with supplemental information
JP7345577B2 (en) Dynamic model evaluation package for autonomous vehicles
US20220161830A1 (en) Dynamic Scene Representation
US11810365B1 (en) Perception error modeling
Villagra et al. Motion prediction and risk assessment
Tas et al. High-definition map update framework for intelligent autonomous transfer vehicles
Salzmann et al. Online Path Generation from Sensor Data for Highly Automated Driving Functions
US20220326714A1 (en) Unmapped u-turn behavior prediction using machine learning
US20240208546A1 (en) Predictive models for autonomous vehicles based on object interactions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination