US20190235520A1 - Cognitive mapping for vehicles - Google Patents

Cognitive mapping for vehicles

Info

Publication number
US20190235520A1
US20190235520A1 (application US15/881,228)
Authority
US
United States
Prior art keywords
vehicle
cognitive map
image
cognitive
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US15/881,228
Other versions
US10345822B1 (en)
Inventor
Mostafa Parchami
Vahid Taimouri
Gintaras Vincent Puskorius
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ford Global Technologies LLC
Original Assignee
Ford Global Technologies LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ford Global Technologies LLC filed Critical Ford Global Technologies LLC
Priority to US15/881,228 (critical), patent US10345822B1
Assigned to FORD GLOBAL TECHNOLOGIES, LLC. Assignment of assignors interest (see document for details). Assignors: PUSKORIUS, GINTARAS VINCENT; PARCHAMI, MOSTAFA; TAIMOURI, VAHID
Priority to CN201910068684.1A, patent CN110084091A
Priority to DE102019101938.9A, patent DE102019101938A1
Application granted granted Critical
Publication of US10345822B1 (critical)
Publication of US20190235520A1 (critical)
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/08 Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W30/09 Taking automatic action to avoid collision, e.g. braking and steering
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0248 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means in combination with a laser
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0251 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting 3D information from a plurality of images taken from different locations, e.g. stereo vision
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24143 Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G06K9/00791
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Definitions

  • Vehicles can be equipped to operate in both autonomous and occupant piloted mode.
  • Vehicles can be equipped with computing devices, networks, sensors and controllers to acquire information regarding the vehicle's environment and to operate the vehicle based on the information.
  • Safe and comfortable operation of the vehicle can depend upon determining predicted vehicle trajectories based on accurate and timely information regarding the vehicle's environment.
  • For example, safe and comfortable operation of the vehicle can depend upon acquiring accurate and timely information regarding objects in a vehicle's environment while the vehicle is being operated on a roadway. It is a problem to provide accurate and timely information regarding objects near or around a vehicle to support operation of the vehicle.
  • FIG. 1 is a block diagram of an example vehicle.
  • FIG. 2 is a diagram of an example image of a traffic scene.
  • FIG. 3 is a diagram of an example cognitive map.
  • FIG. 4 is a diagram of an example convolutional neural network.
  • FIG. 5 is a flowchart diagram of an example process to operate a vehicle based on a cognitive map.
  • FIG. 6 is a flowchart diagram of an example process to train a convolutional neural network to output a cognitive map.
  • Vehicles can be equipped to operate in both autonomous and occupant piloted mode.
  • By a semi- or fully-autonomous mode, we mean a mode of operation wherein a vehicle can be piloted by a computing device as part of a vehicle information system having sensors and controllers. The vehicle can be occupied or unoccupied, but in either case the vehicle can be piloted without assistance of an occupant.
  • an autonomous mode is defined as one in which each of vehicle propulsion (e.g., via a powertrain including an internal combustion engine and/or electric motor), braking, and steering are controlled by one or more vehicle computers; in a semi-autonomous mode the vehicle computer(s) control(s) one or two of vehicle propulsion, braking, and steering. In a non-autonomous vehicle, none of these are controlled by a computer.
  • An estimate of a location, e.g., according to geo-coordinates, of a vehicle with respect to a map can be used by a computing device to operate a vehicle on a roadway from a current location to a determined destination, for example.
  • the map can be a cognitive map.
  • a cognitive map in the context of this disclosure is a top-down view, 2D representation of the physical environment around a vehicle.
  • the cognitive map can include a top-down, 2D representation of the roadway ahead of a current vehicle location and in a direction of current vehicle travel.
  • the direction of current vehicle travel is based on the current vehicle trajectory, which includes speed, direction, longitudinal acceleration, and lateral acceleration.
  • the cognitive map can include a roadway and objects such as lanes, barriers, shoulders, and lane markers, vehicles and pedestrians, for example.
  • a cognitive map is a mental representation of the physical environment. For example, humans and animals use cognitive maps to find their way around their environment.
  • a cognitive map is used by a computing device to operate a vehicle, including actuating vehicle components including powertrain, steering and braking to direct the vehicle from a current location to a destination location in a safe and comfortable fashion.
  • the cognitive map can be used by the computing device to determine predicted vehicle trajectories based on determined locations of lanes and determined locations and trajectories of other vehicles in the cognitive map, for example.
  • a cognitive map can depict semantic segmentation of objects viewed from a top-down perspective and accurately illustrate the distance from vehicle 110 to each point.
  • a method including acquiring an image of a vehicle environment, determining a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operating the vehicle based on the cognitive map.
  • the vehicle environment can include a roadway and objects including other vehicles and pedestrians.
  • the cognitive map can include locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle.
  • the image can be a monocular video frame.
  • the cognitive map of the vehicle environment can be based on processing the image with a convolutional neural network.
  • the convolutional neural network can be trained based on ground truth data prior to determining the cognitive map.
  • the ground truth data can be based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
  • Training the convolutional neural network can be based on prediction images included in the convolutional neural network.
  • the prediction images can be based on ground truth data.
  • the neural network learns how to transform input RGB images into estimates of cognitive maps.
  • the estimated cognitive maps can be combined with intermediate cognitive maps and compared against the prediction images to determine similarity.
  • the similarity between the combined cognitive maps and the prediction images can be determined by calculating a cost function.
  • the cost function can be based on a weighted cross entropy function based on comparing the estimated cognitive maps and the intermediate cognitive maps with the prediction images.
  • the prediction images can be based on LIDAR data.
  • a computer readable medium storing program instructions for executing some or all of the above method steps.
  • a computer programmed for executing some or all of the above method steps including a computer apparatus, programmed to acquire an image of a vehicle environment, determine a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operate the vehicle based on the cognitive map.
  • the vehicle environment can include a roadway and objects including other vehicles and pedestrians.
  • the cognitive map can include locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle.
  • the image can be a monocular video frame.
  • the cognitive map of the vehicle environment can be based on processing the image with a convolutional neural network.
  • the convolutional neural network can be trained based on ground truth data prior to determining the cognitive map.
  • the ground truth data can be based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
  • the computer can be further programmed to train the convolutional neural network based on prediction images included in the convolutional neural network.
  • the prediction images can be based on ground truth data.
  • the prediction images can transform estimated results into estimated cognitive maps.
  • the estimated cognitive maps can be combined with intermediate cognitive maps to determine similarity.
  • the similarity between the estimated cognitive maps and the prediction images can be determined by calculating a cost function.
  • the cost function can be based on a weighted cross entropy function based on comparing the estimated cognitive maps combined with the intermediate cognitive maps and prediction images.
  • the prediction images can be based on LIDAR data.
  • FIG. 1 is a diagram of a vehicle information system 100 that includes a vehicle 110 operable in autonomous (“autonomous” by itself in this disclosure means “fully autonomous”) and occupant piloted (also referred to as non-autonomous) mode.
  • Vehicle 110 also includes one or more computing devices 115 for performing computations for piloting the vehicle 110 during autonomous operation.
  • Computing devices 115 can receive information regarding the operation of the vehicle from sensors 116 .
  • the computing device 115 includes a processor and a memory such as are known. Further, the memory includes one or more forms of computer-readable media, and stores instructions executable by the processor for performing various operations, including as disclosed herein.
  • the computing device 115 may include programming to operate one or more of vehicle brakes, propulsion (e.g., control of acceleration in the vehicle 110 by controlling one or more of an internal combustion engine, electric motor, hybrid engine, etc.), steering, climate control, interior and/or exterior lights, etc., as well as to determine whether and when the computing device 115 , as opposed to a human operator, is to control such operations.
  • the computing device 115 may include or be communicatively coupled to, e.g., via a vehicle communications bus as described further below, more than one computing devices, e.g., controllers or the like included in the vehicle 110 for monitoring and/or controlling various vehicle components, e.g., a powertrain controller 112 , a brake controller 113 , a steering controller 114 , etc.
  • the computing device 115 is generally arranged for communications on a vehicle communication network, e.g., including a bus in the vehicle 110 such as a controller area network (CAN) or the like; the vehicle 110 network can additionally or alternatively include wired or wireless communication mechanism such as are known, e.g., Ethernet or other communication protocols.
  • the computing device 115 may transmit messages to various devices in the vehicle and/or receive messages from the various devices, e.g., controllers, actuators, sensors, etc., including sensors 116 .
  • the vehicle communication network may be used for communications between devices represented as the computing device 115 in this disclosure.
  • various controllers or sensing elements such as sensors 116 may provide data to the computing device 115 via the vehicle communication network.
  • the computing device 115 may be configured for communicating through a vehicle-to-infrastructure (V-to-I) interface 111 with a remote server computer 120, e.g., a cloud server, via a network 130, as described below.
  • a vehicle-to-infrastructure (V-to-I) interface 111 includes hardware, firmware, and software that permits computing device 115 to communicate with a remote server computer 120 via a network 130 such as wireless Internet (Wi-Fi) or cellular networks.
  • V-to-I interface 111 may accordingly include processors, memory, transceivers, etc., configured to utilize various wired and/or wireless networking technologies, e.g., cellular, BLUETOOTH® and wired and/or wireless packet networks.
  • Computing device 115 may be configured for communicating with other vehicles through V-to-I interface 111 using vehicle-to-vehicle (V-to-V) networks, e.g., according to Dedicated Short Range Communications (DSRC) and/or the like, e.g., formed on an ad hoc basis among nearby vehicles 110 or formed through infrastructure-based networks including the Internet via cellular networks or Wi-Fi, for example.
  • the computing device 115 also includes nonvolatile memory such as is known.
  • Computing device 115 can log, i.e., store in a memory, information by storing the information in nonvolatile memory for later retrieval and transmittal via the vehicle communication network and a vehicle to infrastructure (V-to-I) interface 111 to a server computer 120 or user mobile device 160 .
  • the computing device 115 may make various determinations and/or control various vehicle 110 components and/or operations without a driver to operate the vehicle 110 .
  • the computing device 115 may include programming to regulate vehicle 110 operational behaviors (i.e., physical manifestations of vehicle 110 operation) such as speed, acceleration, deceleration, steering, etc., as well as tactical behaviors (i.e., control of operational behaviors typically in a manner intended to achieve safe and efficient traversal of a route) such as a distance between vehicles and/or amount of time between vehicles, lane-change, minimum gap between vehicles, left-turn-across-path minimum, time-to-arrival at a particular location and intersection (without signal) minimum time-to-arrival to cross the intersection.
  • Controllers include computing devices that typically are programmed to control a specific vehicle subsystem. Examples include a powertrain controller 112 , a brake controller 113 , and a steering controller 114 .
  • a controller may be an electronic control unit (ECU) such as is known, possibly including additional programming as described herein.
  • the controllers may communicatively be connected to and receive instructions from the computing device 115 to actuate the subsystem according to the instructions.
  • the brake controller 113 may receive instructions from the computing device 115 to operate the brakes of the vehicle 110 .
  • the one or more controllers 112 , 113 , 114 for the vehicle 110 may include conventional electronic control units (ECUs) or the like including, as non-limiting examples, one or more powertrain controllers 112 , one or more brake controllers 113 and one or more steering controllers 114 .
  • Each of the controllers 112 , 113 , 114 may include respective processors and memories and one or more actuators.
  • the controllers 112 , 113 , 114 may be programmed and connected to a vehicle 110 communications bus, such as a controller area network (CAN) bus or local interconnect network (LIN) bus, to receive instructions from the computer 115 and control actuators based on the instructions.
  • Sensors 116 may include a variety of devices known to provide data via the vehicle communications bus.
  • a radar fixed to a front bumper (not shown) of the vehicle 110 may provide a distance from the vehicle 110 to a next vehicle in front of the vehicle 110
  • a global positioning system (GPS) sensor disposed in the vehicle 110 may provide geographical coordinates of the vehicle 110 .
  • the distance(s) provided by the radar and/or other sensors 116 and/or the geographical coordinates provided by the GPS sensor may be used by the computing device 115 to operate the vehicle 110 autonomously or semi-autonomously.
  • the vehicle 110 is generally a land-based autonomous vehicle 110 having three or more wheels, e.g., a passenger car, light truck, etc.
  • the vehicle 110 includes one or more sensors 116 , the V-to-I interface 111 , the computing device 115 and one or more controllers 112 , 113 , 114 .
  • the sensors 116 may be programmed to collect data related to the vehicle 110 and the environment in which the vehicle 110 is operating.
  • sensors 116 may include, e.g., altimeters, cameras, LIDAR, radar, ultrasonic sensors, infrared sensors, pressure sensors, accelerometers, gyroscopes, temperature sensors, pressure sensors, hall sensors, optical sensors, voltage sensors, current sensors, mechanical sensors such as switches, etc.
  • the sensors 116 may be used to sense the environment in which the vehicle 110 is operating, e.g., sensors 116 can detect phenomena such as weather conditions (precipitation, external ambient temperature, etc.), the grade of a road, the location of a road (e.g., using road edges, lane markings, etc.), or locations of target objects such as neighboring vehicles 110 .
  • the sensors 116 may further be used to collect data including dynamic vehicle 110 data related to operations of the vehicle 110 such as velocity, yaw rate, steering angle, engine speed, brake pressure, oil pressure, the power level applied to controllers 112 , 113 , 114 in the vehicle 110 , connectivity between components, and accurate and timely performance of components of the vehicle 110 .
  • FIG. 2 illustrates an image 200 of a traffic scene including a roadway 202 and other vehicles 204 , 206 , 208 , 210 .
  • the image 200 can be a monocular video frame acquired by computing device 115 from a video sensor 116 included in a vehicle 110 , for example.
  • a monocular video frame can include three color planes with a bit depth of eight bits each for a total of 24 bits corresponding to red, green, and blue (RGB) color components.
  • Image 200 can include a roadway 202 , lane marker 212 , barriers 224 , 226 , 228 and roadway shoulders or terrain adjacent to roadway 230 , 232 .
  • Computing device 115 can use image 200 to produce a cognitive map including roadway 202 and objects including other vehicles 204 , 206 , 208 , 210 , lane marker 212 , barriers 224 , 226 , 228 and roadway shoulders or terrain adjacent to roadway 230 , 232 and, based on the cognitive map including roadway 202 and objects, determine predicted trajectories for operating vehicle 110 .
  • FIG. 3 is a cognitive map 300 of a traffic scene including a roadway 302 (white) and objects including other vehicles 304, 306, 308, 310 (grid), lane marker 312 (black), barriers 314, 316, 318 (upward diagonal), and shoulders or adjacent terrain 320, 322 (cross-hatch), rendered with these fill patterns to denote different colors.
  • a cognitive map can include 20 or more channels, each including objects belonging to a single class, such as “roadway”, “vehicle”, “pedestrian”, “cyclist”, etc. (a minimal sketch of such a multi-channel, top-down grid appears following this list).
  • Cognitive map 300 can be created by inputting an image 200 into a convolutional neural network (CNN), configured and trained as described in relation to FIG. 4 , below, which, in response to the input, outputs a cognitive map 300 .
  • Computing device 115 can operate vehicle 110 based on cognitive map 300 .
  • Operating vehicle 110 can include actuating vehicle components such as powertrain, steering and braking via controllers 112 , 113 , 114 to determine vehicle location and trajectory based on predicted locations and trajectories. The predicted locations and trajectories can be determined based on the cognitive map 300 .
  • computing device 115 can operate vehicle 110 to follow predicted trajectories that locate vehicle 110 in the center of a lane, the lane determined based on lane marker 312 and barrier 314 while maintaining a predetermined distance between vehicle 110 and other vehicle 310 .
  • Computing device 115 can predict vehicle trajectories that can be used to actuate powertrain, steering and braking components based on distances to and locations of objects in the cognitive map 300 relative to the location of vehicle 110 , for example.
  • Predicted trajectories of objects, including other vehicles 304, 306, 308, 310, can be determined by comparing the locations of the objects in successive cognitive maps 300 created at successive time intervals from images 200 acquired at successive time intervals. Trajectories of other vehicles 304, 306, 308, 310 can be determined by locating the other vehicles 304, 306, 308, 310 in successive cognitive maps 300, fitting a curve to the location points, and calculating vectors equal to the first and second derivatives of each curve in the 2D plane of the cognitive map 300 (see the trajectory sketch following this list).
  • the magnitude of the first derivative is speed and the angle is direction.
  • the second derivatives are directional derivatives parallel to the first derivative direction (longitudinal acceleration) and perpendicular to the first derivative direction (lateral acceleration).
  • FIG. 4 is a diagram of an example CNN 400 configured to input an image 200 and output a cognitive map 300 .
  • the image 200 can be a monocular RGB video image acquired from a video sensor 116 included in a vehicle 110 that includes a scene depicting the physical environment near vehicle 110 .
  • the cognitive map 300 is a 2D representation of the physical environment near vehicle 110 including 20 or more channels, each including a single class of objects present in the scene, identified by type, distance and 3D pose relative to vehicle 110, where 3D pose is defined as the orientation of an object in 3D space relative to a frame of reference, expressed as three rotation angles.
  • Information regarding object type, distance and 3D pose included in cognitive map 300 as a top-down view can permit computing device 115 to determine trajectories to operate vehicle 110 safely by traveling on the roadway and avoiding collisions.
  • CNN 400 is a program in memory executing on a processor included in computing device 115 and includes a set of ten convolutional layers C 1 -C 10 (3D boxes) configured to input 402 an image 200 to convolutional layer C 1 .
  • Convolutional layer C 1 produces an intermediate result 406 , represented by the arrow between convolutional layer C 1 and convolutional layer C 2 .
  • Each convolutional layer C 2 -C 10 receives an intermediate result 406 and outputs an intermediate result 406 represented by the arrows between adjacent convolutional layers C 1 -C 10 , representing forward propagation of intermediate results 406 .
  • Convolutional layers C 1 -C 10 each output an intermediate result 406 at an output spatial resolution equal to the input spatial resolution or at an output spatial resolution reduced from the input spatial resolution.
  • Bit depth per resolution element increases for intermediate results as spatial resolution decreases, as described in Table 1, below.
  • This repeats for convolutional layers C 2 -C 9 which produce intermediate results 406 , represented by the dark arrows between convolutional layers C 2 -C 9 at successively lower resolutions.
  • Convolutional layers C 1 -C 9 can reduce resolution by pooling, wherein an adjacent group of pixels, which can be a 2×2 neighborhood, for example, is combined to form a single pixel according to a predetermined equation. Combining a group of pixels by selecting the maximum value among them, called “max pooling”, can reduce resolution while retaining information in intermediate results 406 (a minimal encoder-decoder sketch with pooling, deconvolution, and skip connections appears following this list).
  • convolutional layer C 10 outputs intermediate result 406 to first deconvolutional layer D 1 , which can deconvolve and upsample intermediate result 406 to produce intermediate cognitive map 408 , represented by the arrows between each of deconvolutional layers D 1 -D 10 .
  • Deconvolution is convolution performed with a kernel that is, at least in part, an inverse of another kernel previously used to convolve a function and can partially invert the effects of the previous convolution.
  • deconvolutional layers D 1 -D 10 can increase spatial resolution of intermediate cognitive map 408 while decreasing the bit depth according to Table 1, below.
  • Convolutional layer C 10 also outputs estimated feature maps 412 to prediction image p 6 , which, when training CNN 400 , combines estimated feature maps 412 from convolutional layer C 10 with ground truth-based information regarding objects that transforms the estimated feature maps 412 into an estimated cognitive map 414 .
  • the estimated cognitive map 414 is combined with the intermediate feature maps 408 output from deconvolution layer D 1 when training CNN 400 . This is shown by the “+” signs on the intermediate cognitive map 408 arrow between deconvolution layers D 1 -D 2 . Comparing the intermediate cognitive map 408 based on input image I with ground truth-based information including object detection, pixel-wise segmentation, 3D object poses, and relative distances is used for training the convolutional neural network.
  • the “+” sign on the intermediate cognitive map 408 between deconvolution layers D 1 -D 2 also indicates combining intermediate feature map 408 and predicted cognitive map 414 with skip connection results 410 from convolutional layer C 7 received via skip connections.
  • Skip connection results 410 are intermediate results 406 forward propagated via skip connections as input to an upsampling deconvolution layer D 2 , D 4 , D 6 , D 8 , D 10 .
  • Skip connection results 410 can be combined with intermediate feature maps 408 to increase resolution of intermediate feature map 408 by upsampling to pass onto succeeding deconvolutional layers D 3 , D 5 , D 7 , D 9 .
  • Skip connections can forward propagate skip connection results 410 at the same resolution as the deconvolutional layers D 2 , D 4 , D 6 , D 8 , D 10 receiving the information.
  • Deconvolutional layers D 1 -D 10 include prediction images p 2 -p 6 .
  • Prediction images p 2 -p 6 are used for training CNN 400 to produce cognitive maps 300 from image 200 input.
  • Prediction images p 2 -p 6 are determined based on ground truth images developed independently of CNN 400 .
  • Ground truth refers to information regarding the physical environment near vehicle 110 . Accordingly, ground truth data in the present context can include distance and pose information determined using sensors 116 including multi-camera video sensors 116 , LIDAR sensors 116 , and radar sensors 116 , location data from GPS sensors 116 , INS sensors 116 , and odometry sensors 116 .
  • Ground truth data in the present context can also include map data stored in a memory of computing device 115 , and/or from a server computer 120 , combined with information regarding object classification determined using CNN-based object classification programs.
  • Such CNN-based object classification programs typically receive as input images 200 , and then output images 200 segmented into regions that include objects such as roadways, lane markings, barriers, lanes, shoulders or adjacent terrain, other vehicles including type and model, and other objects including pedestrians, animals, bicycles, etc.
  • Prediction images p 2 -p 6 combine distance information with segmentation information to transform estimated results 412 from convolutional layer C 10 and deconvolutional layers D 2 , D 4 , D 6 and D 8 into estimated cognitive maps 414. The transformation orthographically projects the estimated results 412 onto a 2D ground plane, based on distance information to segmented objects, and colors the estimated cognitive map 414 based on information regarding object detection, pixel-wise segmentation, 3D object poses, and relative distances included in prediction images p 2 -p 6 (a minimal projection sketch appears following this list).
  • Prediction images p 2 -p 6 are used to train CNN 400 to output a cognitive map 300 in response to inputting an image 200 by outputting estimated cognitive maps 414 , to be combined with the intermediate cognitive maps 408 output by deconvolutional layers D 1 , D 3 , D 5 , D 7 , D 9 .
  • This combination is denoted by the “+” signs on the intermediate cognitive maps 408 between deconvolution layers D 1 -D 2 , D 3 -D 4 , D 5 -D 6 , D 7 -D 8 and D 9 -D 10 .
  • Prediction images p 2 -p 6 can be based on ground truth including semantic segmentation applied to an input image 200 .
  • Multiple monocular images 200 acquired at different locations can be processed using optical flow techniques, for example, to determine distances to objects detected by semantic segmentation.
  • Data from a sensor 116 can be combined with semantic segmentation information to determine distances to objects.
  • a top-down view can be generated by homography, where depictions of objects detected in an input image 200 are orthographically projected onto a plane parallel with a ground plane or roadway based on their estimated 3D shape and 3D pose. Once projected onto the plane representing an estimated cognitive map 414 , objects can retain their class or type, as indicated by color.
  • Multiple prediction images p 2 -p 6 are used to train CNN 400 with the goal that each prediction image p 2 -p 6 is combined with the intermediate cognitive map 408 at the appropriate resolution.
  • Combining estimated cognitive maps 414 with intermediate cognitive maps 408 can include scoring positively (rewarding) output from deconvolutional layers D 1 , D 3 , D 5 , D 7 , D 9 based on the similarity between the intermediate cognitive maps 408 and the estimated cognitive maps 414 .
  • CNN 400 can be trained to output 404 a cognitive map 300 from deconvolution layer D 10 .
  • Trained CNN 400 will output 404 a cognitive map 300 based on recognizing visual similarities between an input image 200 and input images 200 processed as part of a training set.
  • Similarity between the intermediate cognitive map 408 and the estimated cognitive map 414 can be determined based on a cost function given by the equation:
  • Cost(I, M) = W * Cross_Entropy(M, M_Rec) + neighborhood_cost(M, M_Rec)   (1)
  • W is a weight for each class of objects calculated based on the number of available training pixels for each class
  • I is the input image 200
  • M is the estimated cognitive map 414
  • M_Rec is the intermediate cognitive map 408 (a loss-function sketch based on these terms appears following this list)
  • the Cross_Entropy loss function is calculated as:
  • the neighborhood similarity cost term can be determined by considering the agreement between a pixel and its neighboring pixels in the cognitive map predictions p 2 -p 6 and 300 .
  • Calculation of a neighborhood cost function can be simplified by applying a Gaussian filter to the cross-entropy of a 3 ⁇ 3 block of pixels for the estimated cognitive map and ground truth. Applying a neighborhood cost function in this manner can improve the convergence speed of training and result in better predictions.
  • a CNN 400 can process input images 200 to produce cognitive maps 300 without inputting prediction images p 2 -p 6 .
  • Convolutional layers C 1 -C 10 can convolve and down-sample intermediate results 406 that get passed to deconvolutional layers D 1 -D 10 to deconvolve and upsample intermediate cognitive maps 408 with input from convolutional layers C 1 , C 2 , C 4 , C 6 , C 7 via skip connection results 410 .
  • Cognitive maps 300 produced by CNN 400 can be used by computing device 115 to operate vehicle 110 by permitting computing device to predict vehicle trajectories based on the cognitive map 300 .
  • multiple CNNs 400 can be trained to determine cognitive maps 300 based on ground truth including multiple monocular image inputs, LIDAR, and radar, and the results can be combined by adding a fusion layer to the CNNs 400.
  • Temporal information can be included in the CNN 400 by adding recurrent convolutional layers to process temporal information.
  • Cognitive maps 300 output from CNN 400 can be combined with other information available to computing device 115 from sensors 116, including GPS, INS and odometry location information, LIDAR, radar, and multi-camera information regarding distances, and with map information stored at computing device 115 or downloaded from a server computer 120, for example, to improve the accuracy of cognitive map 300 and of distances to objects therein.
  • a recorded image 200 along with recorded ground truth information can be used to update CNN 400 by providing additional training.
  • the re-trained CNN 400 can be stored in computing device 115 memory for future use.
  • a trained CNN 400 can be recalled from memory and executed by computing device 115 to produce cognitive maps 300 from image 200 input in real time as required for operation of a vehicle 110 on a roadway with traffic, for example.
  • FIG. 5 is a diagram of a flowchart, described in relation to FIGS. 1-4 , of a process 500 for operating a vehicle based on a cognitive map.
  • Process 500 can be implemented by a processor of computing device 115 , taking as input information from sensors 116 , and executing commands and sending control signals via controllers 112 , 113 , 114 , for example.
  • Process 500 includes multiple steps taken in the disclosed order.
  • Process 500 also includes implementations with fewer steps, or with the steps taken in different orders.
  • Process 500 begins at step 502 , where a computing device 115 included in a vehicle 110 acquires an image 200 as described above in relation to FIG. 2 .
  • the image 200 can be an RGB color video image acquired by a video sensor 116 included in vehicle 110 .
  • the image 200 can depict the physical environment near vehicle 110 , including a roadway 202 and objects including other vehicles 204 , 206 , 208 , 210 .
  • computing device 115 inputs image 200 to a trained CNN 400 as discussed in relation to FIG. 4 , above.
  • trained CNN 400 produces a cognitive map 300 including a roadway 302 and objects including other vehicles 304 , 306 , 308 , 310 .
  • Training CNN 400 will be discussed in relation to FIG. 6 .
  • computing device 115 operates a vehicle 110 based on cognitive map 300 .
  • Computing device 115 can operate vehicle 110 based on cognitive map 300 by determining predicted vehicle trajectories based on lanes and objects including other vehicles.
  • Computing device 115 can combine cognitive maps 300 with map data from multi-camera sensors 116 , LIDAR sensors 116 , and radar sensors 116 , location data from GPS, INS and odometry and map data from a server computer 120 , for example, to improve the accuracy of cognitive map 300 .
  • the computing device 115 can provide instructions to one or more of the powertrain controller 112 , brake controller 113 , and steering controller 114 .
  • the computing device may be programmed to take certain actions concerning adjusting or maintaining speed, acceleration, and/or steering based on objects such as other vehicles 304 - 310 ; the cognitive map 300 advantageously can provide more accurate data for such actions than was previously available (a sketch of this per-frame control flow appears following this list). Vehicle 110 safety and/or efficiency can thereby be improved by the cognitive map 300. Following this step, process 500 ends.
  • FIG. 6 is a diagram of a flowchart, described in relation to FIGS. 1-4 , of a process 600 for training a CNN 400 based on ground-truth.
  • Process 600 can be implemented by a processor of computing device 115 , taking as input information from sensors 116 , and executing commands and sending control signals via controllers 112 , 113 , 114 , for example.
  • Process 600 includes multiple steps taken in the disclosed order.
  • Process 600 also includes implementations with fewer steps, or with the steps taken in different orders.
  • Process 600 begins at step 602 , where a computing device 115 included in a vehicle 110 acquires and records one or more images 200 as described above in relation to FIG. 2 .
  • the images 200 can be RGB color video images acquired by a video sensor 116 included in vehicle 110 .
  • the image 200 can depict the physical environment near vehicle 110 , including a roadway 202 and objects including other vehicles 204 , 206 , 208 , 210 .
  • computing device 115 records ground truth data based on object detection, pixel-wise segmentation, 3D object poses, and relative distances, all determined based on the recorded images 200 , distance data, location data, and map data as discussed above in relation to FIG. 4 , corresponding to the images 200 recorded at step 602 .
  • the computing device inputs images 200 to CNN 400 while constructing prediction images p 2 -p 6 to train CNN 400 according to the cost functions in equations 1 and 2, above (a training-loop sketch appears following this list).
  • Prediction images p 2 -p 6 are constructed to include the recorded ground truth data based on object detection, pixel-wise segmentation, 3D object poses, and relative distances.
  • Prediction images p 2 -p 6 can be created by homographic projection of ground truth data and used to transform estimated results 412 into top-down view, estimated cognitive maps 414 that can be used to train CNN 400 to output a cognitive map 300 in response to inputting an image 200 as discussed above in relation to FIG. 4 .
  • CNN 400 can be trained to output a cognitive map 300 in response to an input image 200 .
  • the trained CNN 400 is output to be stored in memory included in computing device 115 .
  • Computing device 115 can recall the trained CNN 400 from memory, input an acquired image 200 to the trained CNN 400 and receive as output a cognitive map 300 , to be used to operate a vehicle 110 , without having to input ground truth data. Following this step, process 600 ends.
  • Computing devices such as those discussed herein generally each include commands executable by one or more computing devices such as those identified above, and for carrying out blocks or steps of processes described above.
  • process blocks discussed above may be embodied as computer-executable commands.
  • Computer-executable commands may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Visual Basic, JavaScript, Perl, HTML, etc.
  • a processor (e.g., a microprocessor) receives commands, e.g., from a memory, a computer-readable medium, etc., and executes these commands, thereby performing one or more processes, including one or more of the processes described herein.
  • commands and other data may be stored in files and transmitted using a variety of computer-readable media.
  • a file in a computing device is generally a collection of data stored on a computer readable medium, such as a storage medium, a random access memory, etc.
  • a computer-readable medium includes any medium that participates in providing data (e.g., commands), which may be read by a computer. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, etc.
  • Non-volatile media include, for example, optical or magnetic disks and other persistent memory.
  • Volatile media include dynamic random access memory (DRAM), which typically constitutes a main memory.
  • Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
  • exemplary is used herein in the sense of signifying an example, e.g., a reference to an “exemplary widget” should be read as simply referring to an example of a widget.
  • adverb “approximately” modifying a value or result means that a shape, structure, measurement, value, determination, calculation, etc. may deviate from an exact described geometry, distance, measurement, value, determination, calculation, etc., because of imperfections in materials, machining, manufacturing, sensor measurements, computations, processing time, communications time, etc.
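
Below is a minimal Python sketch, not taken from the patent, of a top-down cognitive map stored as one channel per object class, as described above for cognitive map 300. The class list, grid extent, and cell size are assumptions chosen only for illustration.

# Illustrative sketch: a multi-channel, top-down cognitive-map grid with one
# channel per object class; class names, grid size, and resolution are assumed.
import numpy as np

CLASSES = ["roadway", "vehicle", "pedestrian", "cyclist", "lane_marker",
           "barrier", "shoulder"]     # the patent describes 20 or more classes
GRID_M = 80.0          # meters of environment covered by the map (assumed)
CELL_M = 0.25          # meters per grid cell (assumed)
N = int(GRID_M / CELL_M)

def empty_cognitive_map():
    """One binary channel per class over an N x N top-down grid."""
    return np.zeros((len(CLASSES), N, N), dtype=np.uint8)

def mark_object(cmap, cls, x_m, y_m, length_m, width_m):
    """Rasterize an axis-aligned object footprint into its class channel.
    x_m is lateral offset from the host vehicle, y_m is distance ahead."""
    c = CLASSES.index(cls)
    col0 = int((x_m - width_m / 2 + GRID_M / 2) / CELL_M)
    col1 = int((x_m + width_m / 2 + GRID_M / 2) / CELL_M)
    row0 = int(y_m / CELL_M)
    row1 = int((y_m + length_m) / CELL_M)
    cmap[c, max(row0, 0):min(row1, N), max(col0, 0):min(col1, N)] = 1
    return cmap

cmap = empty_cognitive_map()
cmap = mark_object(cmap, "vehicle", x_m=1.5, y_m=20.0, length_m=4.5, width_m=1.8)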
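
The trajectory computation described above (fit a curve to an object's locations from successive cognitive maps, then take first and second derivatives to obtain speed, heading, and longitudinal/lateral acceleration) might be sketched as follows; the quadratic fit, the time step, and the sample locations are assumptions used only for illustration.

import numpy as np

def trajectory_from_locations(xs, ys, dt):
    """xs, ys: an object's map-plane locations (meters) in successive cognitive maps, dt seconds apart."""
    t = np.arange(len(xs)) * dt
    cx, cy = np.polyfit(t, xs, 2), np.polyfit(t, ys, 2)   # fit a curve to the location points
    t_now = t[-1]
    vx = np.polyval(np.polyder(cx, 1), t_now)             # first derivative components
    vy = np.polyval(np.polyder(cy, 1), t_now)
    ax = np.polyval(np.polyder(cx, 2), t_now)             # second derivative components
    ay = np.polyval(np.polyder(cy, 2), t_now)
    speed = float(np.hypot(vx, vy))                       # magnitude of first derivative = speed
    heading = float(np.arctan2(vy, vx))                   # angle of first derivative = direction
    unit_v = np.array([vx, vy]) / (speed + 1e-9)          # along-track direction
    unit_n = np.array([-unit_v[1], unit_v[0]])            # cross-track direction
    accel = np.array([ax, ay])
    return speed, heading, float(accel @ unit_v), float(accel @ unit_n)  # longitudinal, lateral

# three successive map locations of another vehicle, 0.1 s apart (illustrative numbers)
print(trajectory_from_locations([0.0, 0.2, 0.5], [20.0, 22.0, 24.1], dt=0.1))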
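
The encoder-decoder structure of CNN 400 in FIG. 4 (convolutional layers that downsample by max pooling, deconvolutional layers that upsample, and skip connections from encoder to decoder) might be sketched in PyTorch roughly as follows. The layer count, channel widths, and 20-class output head are simplifications and assumptions, not the patent's exact ten-layer design, and the sketch omits the prediction images p 2 -p 6 used during training.

import torch
import torch.nn as nn

class CognitiveMapNet(nn.Module):
    def __init__(self, n_classes=20):
        super().__init__()
        def enc(cin, cout):      # conv + 2x2 max pooling: resolution halves, depth grows
            return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                                 nn.ReLU(inplace=True), nn.MaxPool2d(2))
        def dec(cin, cout):      # transposed convolution: resolution doubles, depth shrinks
            return nn.Sequential(nn.ConvTranspose2d(cin, cout, 2, stride=2),
                                 nn.ReLU(inplace=True))
        self.e1, self.e2, self.e3 = enc(3, 32), enc(32, 64), enc(64, 128)
        self.d1 = dec(128, 64)       # decoder mirrors the encoder
        self.d2 = dec(64 + 64, 32)   # "+64": skip connection from e2
        self.d3 = dec(32 + 32, 32)   # "+32": skip connection from e1
        self.head = nn.Conv2d(32, n_classes, 1)   # per-class cognitive-map channels

    def forward(self, x):                     # x: (B, 3, H, W) monocular RGB frame
        s1 = self.e1(x)                       # (B, 32, H/2, W/2)
        s2 = self.e2(s1)                      # (B, 64, H/4, W/4)
        s3 = self.e3(s2)                      # (B, 128, H/8, W/8)
        u1 = self.d1(s3)                      # (B, 64, H/4, W/4)
        u2 = self.d2(torch.cat([u1, s2], 1))  # skip connection, as in FIG. 4
        u3 = self.d3(torch.cat([u2, s1], 1))
        return self.head(u3)                  # (B, n_classes, H, W) per-class logits

logits = CognitiveMapNet()(torch.randn(1, 3, 128, 256))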
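
Prediction images p 2 -p 6 are described above as orthographically projecting segmented, detected objects onto a 2D ground plane using distance information. A minimal sketch of that idea for a single detection follows; the pinhole-bearing geometry, field of view, and grid parameters are assumptions rather than the patent's method.

import math
import numpy as np

CELL_M, GRID_M = 0.25, 80.0
N = int(GRID_M / CELL_M)

def project_detection(top_down, class_id, pixel_col, image_width,
                      horizontal_fov_deg, distance_m):
    """Place one detection into channel class_id of a (C, N, N) top-down grid.

    pixel_col:  column of the object's centroid in the input image
    distance_m: estimated range to the object (e.g., from optical flow or LIDAR)
    """
    # bearing of the object relative to the camera's optical axis
    frac = (pixel_col - image_width / 2) / (image_width / 2)
    bearing = math.radians(frac * horizontal_fov_deg / 2)
    # ground-plane coordinates: x lateral, y ahead of the vehicle
    x_m = distance_m * math.sin(bearing)
    y_m = distance_m * math.cos(bearing)
    row = int(y_m / CELL_M)
    col = int((x_m + GRID_M / 2) / CELL_M)
    if 0 <= row < N and 0 <= col < N:
        top_down[class_id, row, col] = 1
    return top_down

grid = np.zeros((20, N, N), dtype=np.uint8)
grid = project_detection(grid, class_id=1, pixel_col=900, image_width=1280,
                         horizontal_fov_deg=90.0, distance_m=25.0)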
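
Equation (1) above combines a class-weighted cross entropy with a neighborhood cost computed by Gaussian filtering the cross entropy over small pixel blocks. The PyTorch sketch below follows that structure under stated assumptions: equation (2) is not reproduced in the text above, so the standard pixel-wise cross entropy is used, and the inverse-frequency class weights and 3×3 Gaussian kernel are illustrative choices.

import torch
import torch.nn.functional as F

def class_weights(target, n_classes):
    """Weight each class inversely to its number of available training pixels."""
    counts = torch.bincount(target.flatten(), minlength=n_classes).float()
    return counts.sum() / (counts + 1.0)

def cognitive_map_cost(logits, target, n_classes=20):
    # logits: (B, C, H, W) estimated cognitive map; target: (B, H, W) class ids
    w = class_weights(target, n_classes)
    # per-pixel weighted cross entropy (reduction deferred for the neighborhood term)
    ce = F.cross_entropy(logits, target, weight=w, reduction="none")   # (B, H, W)
    # neighborhood term: smooth the per-pixel cross entropy with a 3x3 Gaussian
    g = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]]) / 16.0
    g = g.view(1, 1, 3, 3).to(ce)
    neighborhood = F.conv2d(ce.unsqueeze(1), g, padding=1).squeeze(1)
    return ce.mean() + neighborhood.mean()

loss = cognitive_map_cost(torch.randn(2, 20, 64, 64),
                          torch.randint(0, 20, (2, 64, 64)))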
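
Process 500 (FIG. 5) can be summarized as: acquire an image 200, run the trained CNN 400 to obtain a cognitive map 300, and operate vehicle 110 by sending commands to controllers 112, 113, 114. The sketch below shows that per-frame flow; the planner and controller callables (and the stand-in network) are hypothetical placeholders, not interfaces from the patent.

import numpy as np
import torch
import torch.nn as nn

def control_step(frame_rgb, cnn, plan_trajectory, send_controls):
    """One pass of process 500: image 200 -> cognitive map 300 -> actuation commands."""
    x = torch.from_numpy(frame_rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        cognitive_map = cnn(x)                   # trained CNN 400 output (cognitive map 300)
    trajectory = plan_trajectory(cognitive_map)  # predicted vehicle trajectory from lanes/objects
    send_controls(trajectory)                    # powertrain, brake, steering via controllers 112-114

# toy stand-ins so the sketch runs; a real system would wire in the trained network,
# the trajectory planner, and the vehicle controllers
control_step(np.zeros((128, 256, 3), dtype=np.uint8),
             cnn=nn.Conv2d(3, 20, 1),
             plan_trajectory=lambda m: {"steer": 0.0, "accel": 0.0},
             send_controls=lambda cmd: None)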
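
Process 600 (FIG. 6) records images and ground truth, constructs prediction images, and trains CNN 400 against the cost functions. A minimal training-loop sketch follows; for self-containment it uses PyTorch's built-in cross entropy as a stand-in for the full cost of equation (1), and the batch size, optimizer, and tensor layout are assumptions.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def train_cognitive_map_cnn(cnn, images, target_maps, epochs=10, lr=1e-4):
    """images: (N, 3, H, W) recorded frames; target_maps: (N, H, W) class ids rasterized
    from ground truth (detections, segmentation, 3D pose, distance) into top-down targets."""
    loader = DataLoader(TensorDataset(images, target_maps), batch_size=8, shuffle=True)
    opt = torch.optim.Adam(cnn.parameters(), lr=lr)
    cnn.train()
    for _ in range(epochs):
        for x, target in loader:
            opt.zero_grad()
            # stand-in for equation (1); the weighted cross entropy plus neighborhood
            # cost sketched earlier could be substituted here
            loss = F.cross_entropy(cnn(x), target)
            loss.backward()
            opt.step()
    return cnn   # the trained CNN 400 is then stored for use by process 500

# illustrative call with random data and a tiny stand-in network
net = torch.nn.Conv2d(3, 20, 1)
train_cognitive_map_cnn(net, torch.randn(16, 3, 64, 64),
                        torch.randint(0, 20, (16, 64, 64)), epochs=1)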

Abstract

A system, comprising a processor, and a memory, the memory including instructions to be executed by the processor to acquire an image of a vehicle environment, determine a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operate the vehicle based on the cognitive map.

Description

    BACKGROUND
  • Vehicles can be equipped to operate in both autonomous and occupant piloted mode. Vehicles can be equipped with computing devices, networks, sensors and controllers to acquire information regarding the vehicle's environment and to operate the vehicle based on the information. Safe and comfortable operation of the vehicle can depend upon determining predicted vehicle trajectories based on accurate and timely information regarding the vehicle's environment. For example, safe and comfortable operation of the vehicle can depend upon acquiring accurate and timely information regarding objects in a vehicle's environment while the vehicle is being operated on a roadway. It is a problem to provide accurate and timely information regarding objects near or around a vehicle to support operation of the vehicle.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a block diagram of an example vehicle.
  • FIG. 2 is a diagram of an example image of a traffic scene.
  • FIG. 3 is a diagram of an example cognitive map.
  • FIG. 4 is a diagram of an example convolutional neural network.
  • FIG. 5 is a flowchart diagram of an example process to operate a vehicle based on a cognitive map.
  • FIG. 6 is a flowchart diagram of an example process to train a convolutional neural network to output a cognitive map.
  • DETAILED DESCRIPTION
  • Vehicles can be equipped to operate in both autonomous and occupant piloted mode. By a semi- or fully-autonomous mode, we mean a mode of operation wherein a vehicle can be piloted by a computing device as part of a vehicle information system having sensors and controllers. The vehicle can be occupied or unoccupied, but in either case the vehicle can be piloted without assistance of an occupant. For purposes of this disclosure, an autonomous mode is defined as one in which each of vehicle propulsion (e.g., via a powertrain including an internal combustion engine and/or electric motor), braking, and steering are controlled by one or more vehicle computers; in a semi-autonomous mode the vehicle computer(s) control(s) one or two of vehicle propulsion, braking, and steering. In a non-autonomous vehicle, none of these are controlled by a computer.
  • An estimate of a location, e.g., according to geo-coordinates, of a vehicle with respect to a map can be used by a computing device to operate a vehicle on a roadway from a current location to a determined destination, for example. The map can be a cognitive map. A cognitive map in the context of this disclosure is a top-down view, 2D representation of the physical environment around a vehicle. In examples where a vehicle is in motion, for example, operating on a roadway, the cognitive map can include a top-down, 2D representation of the roadway ahead of a current vehicle location and in a direction of current vehicle travel. The direction of current vehicle travel is based on the current vehicle trajectory, which includes speed, direction, longitudinal acceleration, and lateral acceleration. The cognitive map can include a roadway and objects such as lanes, barriers, shoulders, and lane markers, vehicles and pedestrians, for example.
  • In the field of psychology, a cognitive map is a mental representation of the physical environment. For example, humans and animals use cognitive maps to find their way around their environment. In the present disclosure, a cognitive map is used by a computing device to operate a vehicle, including actuating vehicle components including powertrain, steering and braking to direct the vehicle from a current location to a destination location in a safe and comfortable fashion. The cognitive map can be used by the computing device to determine predicted vehicle trajectories based on determined locations of lanes and determined locations and trajectories of other vehicles in the cognitive map, for example. A cognitive map can depict semantic segmentation of objects viewed from top-down view and accurately illustrate a distance to each point from vehicle 110.
  • Disclosed herein is a method, including acquiring an image of a vehicle environment, determining a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operating the vehicle based on the cognitive map. The vehicle environment can include a roadway and objects including other vehicles and pedestrians. The cognitive map can include locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle. The image can be a monocular video frame. The cognitive map of the vehicle environment can be based on processing the image with a convolutional neural network. The convolutional neural network can be trained based on ground truth data prior to determining the cognitive map. The ground truth data can be based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
  • Training the convolutional neural network can be based on prediction images included in the convolutional neural network. The prediction images can be based on ground truth data. The neural network learns how to transform input RGB images into estimates of cognitive maps. The estimated cognitive maps can be combined with intermediate cognitive maps and compared against the prediction images to determine similarity. The similarity between the combined cognitive maps and the prediction images can be determined by calculating a cost function. The cost function can be based on a weighted cross entropy function based on comparing the estimated cognitive maps and the intermediate cognitive maps with the prediction images. The prediction images can be based on LIDAR data.
  • Further disclosed is a computer readable medium, storing program instructions for executing some or all of the above method steps. Further disclosed is a computer programmed for executing some or all of the above method steps, including a computer apparatus, programmed to acquire an image of a vehicle environment, determine a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operate the vehicle based on the cognitive map. The vehicle environment can include a roadway and objects including other vehicles and pedestrians. The cognitive map can include locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle. The image can be a monocular video frame. The cognitive map of the vehicle environment can be based on processing the image with a convolutional neural network. The convolutional neural network can be trained based on ground truth data prior to determining the cognitive map. The ground truth data can be based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
  • The computer can be further programmed to train the convolutional neural network based on prediction images included in the convolutional neural network. The prediction images can be based on ground truth data. The prediction images can transform estimated results into estimated cognitive maps. The estimated cognitive maps can be combined with intermediate cognitive maps to determine similarity. The similarity between the estimated cognitive maps and the prediction images can be determined by calculating a cost function. The cost function can be based on a weighted cross entropy function based on comparing the estimated cognitive maps combined with the intermediate cognitive maps and prediction images. The prediction images can be based on LIDAR data.
  • FIG. 1 is a diagram of a vehicle information system 100 that includes a vehicle 110 operable in autonomous (“autonomous” by itself in this disclosure means “fully autonomous”) and occupant piloted (also referred to as non-autonomous) mode. Vehicle 110 also includes one or more computing devices 115 for performing computations for piloting the vehicle 110 during autonomous operation. Computing devices 115 can receive information regarding the operation of the vehicle from sensors 116.
  • The computing device 115 includes a processor and a memory such as are known. Further, the memory includes one or more forms of computer-readable media, and stores instructions executable by the processor for performing various operations, including as disclosed herein. For example, the computing device 115 may include programming to operate one or more of vehicle brakes, propulsion (e.g., control of acceleration in the vehicle 110 by controlling one or more of an internal combustion engine, electric motor, hybrid engine, etc.), steering, climate control, interior and/or exterior lights, etc., as well as to determine whether and when the computing device 115, as opposed to a human operator, is to control such operations.
  • The computing device 115 may include or be communicatively coupled to, e.g., via a vehicle communications bus as described further below, more than one computing device, e.g., controllers or the like included in the vehicle 110 for monitoring and/or controlling various vehicle components, e.g., a powertrain controller 112, a brake controller 113, a steering controller 114, etc. The computing device 115 is generally arranged for communications on a vehicle communication network, e.g., including a bus in the vehicle 110 such as a controller area network (CAN) or the like; the vehicle 110 network can additionally or alternatively include wired or wireless communication mechanisms such as are known, e.g., Ethernet or other communication protocols.
  • Via the vehicle network, the computing device 115 may transmit messages to various devices in the vehicle and/or receive messages from the various devices, e.g., controllers, actuators, sensors, etc., including sensors 116. Alternatively, or additionally, in cases where the computing device 115 actually comprises multiple devices, the vehicle communication network may be used for communications between devices represented as the computing device 115 in this disclosure. Further, as mentioned below, various controllers or sensing elements such as sensors 116 may provide data to the computing device 115 via the vehicle communication network.
  • In addition, the computing device 115 may be configured for communicating through a vehicle-to-infrastructure (V-to-I) interface 111 with a remote server computer 120, e.g., a cloud server, via a network 130, as described below. The V-to-I interface 111 includes hardware, firmware, and software that permit the computing device 115 to communicate with the remote server computer 120 via the network 130, e.g., wireless Internet (Wi-Fi) or cellular networks. The V-to-I interface 111 may accordingly include processors, memory, transceivers, etc., configured to utilize various wired and/or wireless networking technologies, e.g., cellular, BLUETOOTH® and wired and/or wireless packet networks. Computing device 115 may be configured for communicating with other vehicles through V-to-I interface 111 using vehicle-to-vehicle (V-to-V) networks, e.g., according to Dedicated Short Range Communications (DSRC) and/or the like, e.g., formed on an ad hoc basis among nearby vehicles 110 or formed through infrastructure-based networks including the Internet via cellular networks or Wi-Fi, for example. The computing device 115 also includes nonvolatile memory such as is known. Computing device 115 can log, i.e., store in nonvolatile memory, information for later retrieval and transmittal via the vehicle communication network and the V-to-I interface 111 to a server computer 120 or user mobile device 160.
  • As already mentioned, generally included in instructions stored in the memory and executable by the processor of the computing device 115 is programming for operating one or more vehicle 110 components, e.g., braking, steering, propulsion, etc., without intervention of a human operator. Using data received in the computing device 115, e.g., the sensor data from the sensors 116, the server computer 120, etc., the computing device 115 may make various determinations and/or control various vehicle 110 components and/or operations without a driver to operate the vehicle 110. For example, the computing device 115 may include programming to regulate vehicle 110 operational behaviors (i.e., physical manifestations of vehicle 110 operation) such as speed, acceleration, deceleration, steering, etc., as well as tactical behaviors (i.e., control of operational behaviors typically in a manner intended to achieve safe and efficient traversal of a route) such as a distance between vehicles and/or an amount of time between vehicles, lane changes, a minimum gap between vehicles, a left-turn-across-path minimum, time-to-arrival at a particular location, and a minimum time-to-arrival to cross an intersection (without a signal).
  • Controllers, as that term is used herein, include computing devices that typically are programmed to control a specific vehicle subsystem. Examples include a powertrain controller 112, a brake controller 113, and a steering controller 114. A controller may be an electronic control unit (ECU) such as is known, possibly including additional programming as described herein. The controllers may communicatively be connected to and receive instructions from the computing device 115 to actuate the subsystem according to the instructions. For example, the brake controller 113 may receive instructions from the computing device 115 to operate the brakes of the vehicle 110.
  • The one or more controllers 112, 113, 114 for the vehicle 110 may include conventional electronic control units (ECUs) or the like including, as non-limiting examples, one or more powertrain controllers 112, one or more brake controllers 113 and one or more steering controllers 114. Each of the controllers 112, 113, 114 may include respective processors and memories and one or more actuators. The controllers 112, 113, 114 may be programmed and connected to a vehicle 110 communications bus, such as a controller area network (CAN) bus or local interconnect network (LIN) bus, to receive instructions from the computer 115 and control actuators based on the instructions.
  • Sensors 116 may include a variety of devices known to provide data via the vehicle communications bus. For example, a radar fixed to a front bumper (not shown) of the vehicle 110 may provide a distance from the vehicle 110 to a next vehicle in front of the vehicle 110, or a global positioning system (GPS) sensor disposed in the vehicle 110 may provide geographical coordinates of the vehicle 110. The distance(s) provided by the radar and/or other sensors 116 and/or the geographical coordinates provided by the GPS sensor may be used by the computing device 115 to operate the vehicle 110 autonomously or semi-autonomously.
  • The vehicle 110 is generally a land-based autonomous vehicle 110 having three or more wheels, e.g., a passenger car, light truck, etc. The vehicle 110 includes one or more sensors 116, the V-to-I interface 111, the computing device 115 and one or more controllers 112, 113, 114.
  • The sensors 116 may be programmed to collect data related to the vehicle 110 and the environment in which the vehicle 110 is operating. By way of example, and not limitation, sensors 116 may include, e.g., altimeters, cameras, LIDAR, radar, ultrasonic sensors, infrared sensors, pressure sensors, accelerometers, gyroscopes, temperature sensors, pressure sensors, hall sensors, optical sensors, voltage sensors, current sensors, mechanical sensors such as switches, etc. The sensors 116 may be used to sense the environment in which the vehicle 110 is operating, e.g., sensors 116 can detect phenomena such as weather conditions (precipitation, external ambient temperature, etc.), the grade of a road, the location of a road (e.g., using road edges, lane markings, etc.), or locations of target objects such as neighboring vehicles 110. The sensors 116 may further be used to collect data including dynamic vehicle 110 data related to operations of the vehicle 110 such as velocity, yaw rate, steering angle, engine speed, brake pressure, oil pressure, the power level applied to controllers 112, 113, 114 in the vehicle 110, connectivity between components, and accurate and timely performance of components of the vehicle 110.
  • FIG. 2 illustrates an image 200 of a traffic scene including a roadway 202 and other vehicles 204, 206, 208, 210. The image 200 can be a monocular video frame acquired by computing device 115 from a video sensor 116 included in a vehicle 110, for example. A monocular video frame can include three color planes with a bit depth of eight bits each for a total of 24 bits corresponding to red, green, and blue (RGB) color components. Image 200 can include a roadway 202, lane marker 212, barriers 224, 226, 228 and roadway shoulders or terrain adjacent to roadway 230, 232. Computing device 115 can use image 200 to produce a cognitive map including roadway 202 and objects including other vehicles 204, 206, 208, 210, lane marker 212, barriers 224, 226, 228 and roadway shoulders or terrain adjacent to roadway 230, 232 and, based on the cognitive map including roadway 202 and objects, determine predicted trajectories for operating vehicle 110.
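  • By way of example, and not limitation, the following sketch illustrates how such a 24-bit RGB frame can be represented in memory as three stacked 8-bit color planes; the 1920×1080 resolution and array layout are assumptions for illustration:

    import numpy as np

    HEIGHT, WIDTH = 1080, 1920                             # assumed example resolution
    frame = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)   # red, green and blue planes
    bits_per_pixel = frame.dtype.itemsize * 8 * frame.shape[2]
    print(bits_per_pixel)                                  # 24 bits per pixel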
  • FIG. 3 is a cognitive map 300 of a traffic scene including a roadway 302 (white) and objects including other vehicles 304, 306, 308, 310 (grid), rendered in white and grid, respectively, to denote different colors. Likewise, lane marker 312 (black), barriers 314, 316, 318 (upward diagonal) and shoulders or adjacent terrain 320, 322 (cross-hatch) are each rendered to denote different colors, where each different color represents an object class or type and will each occupy a separate channel or plane in cognitive map 300. For example, a cognitive map can include 20 or more channels, each including objects belonging to a single class, such as "roadway", "vehicle", "pedestrian", "cyclist", etc. Vehicle 110 trajectory with respect to cognitive map 300 is denoted by arrow 324. Cognitive map 300 can be created by inputting an image 200 into a convolutional neural network (CNN), configured and trained as described in relation to FIG. 4, below, which, in response to the input, outputs a cognitive map 300.
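  • By way of example, and not limitation, the following sketch illustrates one possible in-memory layout for a multi-channel cognitive map such as cognitive map 300, with one binary plane per object class on a shared top-down grid; the class list and grid size are assumptions for illustration:

    import numpy as np

    # Illustrative subset of classes; a cognitive map 300 may use 20 or more channels.
    CLASSES = ["roadway", "vehicle", "pedestrian", "cyclist",
               "lane_marker", "barrier", "shoulder"]
    GRID_H, GRID_W = 256, 256          # assumed top-down grid resolution

    def empty_cognitive_map():
        # One all-zero binary plane per object class, sharing the same grid.
        return np.zeros((len(CLASSES), GRID_H, GRID_W), dtype=np.uint8)

    def mark_object(cog_map, class_name, rows, cols):
        # Mark the grid cells occupied by one detected object in its class channel.
        cog_map[CLASSES.index(class_name), rows, cols] = 1
        return cog_map

    # Example: a vehicle occupying a small rectangular footprint ahead and to the left.
    cmap = mark_object(empty_cognitive_map(), "vehicle", slice(100, 110), slice(80, 90))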
  • Computing device 115 can operate vehicle 110 based on cognitive map 300. Operating vehicle 110 can include actuating vehicle components such as powertrain, steering and braking via controllers 112, 113, 114 to determine vehicle location and trajectory based on predicted locations and trajectories. The predicted locations and trajectories can be determined based on the cognitive map 300. For example, computing device 115 can operate vehicle 110 to follow predicted trajectories that locate vehicle 110 in the center of a lane, the lane determined based on lane marker 312 and barrier 314 while maintaining a predetermined distance between vehicle 110 and other vehicle 310. Computing device 115 can predict vehicle trajectories that can be used to actuate powertrain, steering and braking components based on distances to and locations of objects in the cognitive map 300 relative to the location of vehicle 110, for example.
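  • By way of example, and not limitation, the following sketch illustrates how quantities taken from a cognitive map 300, such as an offset from lane center and a gap to a lead vehicle, could be turned into simple steering and speed commands; the gains and thresholds are assumptions and do not represent a disclosed control law:

    def simple_lane_and_gap_commands(lane_center_offset_m, lead_gap_m,
                                     desired_gap_m=30.0, current_speed_mps=20.0):
        # Return (steering_correction, speed_command) from two map-derived measurements.
        k_steer = 0.1                                  # assumed proportional steering gain
        steering_correction = -k_steer * lane_center_offset_m
        if lead_gap_m < desired_gap_m:
            # Slow down in proportion to how far inside the desired gap the lead vehicle is.
            speed_command = current_speed_mps * (lead_gap_m / desired_gap_m)
        else:
            speed_command = current_speed_mps
        return steering_correction, speed_command

    # Example: vehicle 0.5 m left of lane center, lead vehicle 20 m ahead.
    print(simple_lane_and_gap_commands(0.5, 20.0))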
  • Predicted trajectories of objects including other vehicles 304, 306, 308, 310 can be determined by comparing the locations of the objects in successive cognitive maps 300 created at successive time intervals, from images 200 acquired at successive time intervals. Trajectories of other vehicles 304, 306, 308, 310 can be determined by determining the locations of other vehicles 304, 306, 308, 310 in successive cognitive maps 300 created at successive time intervals, fitting a curve to the location points, and calculating vectors equal to the first and second derivatives of each curve in the 2D plane of the cognitive map 300. The magnitude of the first derivative is speed and its angle is direction. The second derivatives are directional derivatives parallel to the first derivative direction (longitudinal acceleration) and perpendicular to the first derivative direction (latitudinal acceleration).
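  • By way of example, and not limitation, the following sketch estimates speed, heading, and longitudinal/latitudinal acceleration for an object from its locations in successive cognitive maps 300 by fitting polynomials to the location points and evaluating first and second derivatives, as described above; the sample times and locations are assumed values:

    import numpy as np

    def trajectory_estimate(times, xs, ys, degree=2):
        # Fit x(t) and y(t) polynomials to map locations and evaluate derivatives
        # at the latest time.
        px, py = np.polyfit(times, xs, degree), np.polyfit(times, ys, degree)
        t = times[-1]
        vx, vy = np.polyval(np.polyder(px, 1), t), np.polyval(np.polyder(py, 1), t)
        ax, ay = np.polyval(np.polyder(px, 2), t), np.polyval(np.polyder(py, 2), t)
        speed = np.hypot(vx, vy)                       # magnitude of the first derivative
        heading = np.arctan2(vy, vx)                   # direction of the first derivative
        # Project acceleration onto the heading (longitudinal) and its normal (latitudinal).
        a_long = (ax * vx + ay * vy) / max(speed, 1e-6)
        a_lat = (ay * vx - ax * vy) / max(speed, 1e-6)
        return speed, heading, a_long, a_lat

    # Example: three locations sampled 0.1 s apart (assumed units of meters and seconds).
    print(trajectory_estimate([0.0, 0.1, 0.2], [0.0, 1.0, 2.1], [0.0, 0.05, 0.2]))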
  • FIG. 4 is a diagram of an example CNN 400 configured to input an image 200 and output a cognitive map 300. The image 200 can be a monocular RGB video image acquired from a video sensor 116 included in a vehicle 110 that includes a scene depicting the physical environment near vehicle 110. The cognitive map 300 is a 2D representation of the physical environment near vehicle 110 including 20 or more channels each including a single class of objects present in the scene, identified by type, distance and 3D pose relative to vehicle 110, where 3D pose is defined as the orientation of an object in 3D space relative to a frame of reference expressed as angles ρ, φ, and θ. Information regarding object type, distance and 3D pose included in cognitive map 300 as a top-down view can permit computing device 115 to determine trajectories to operate vehicle 110 safely by traveling on the roadway and avoiding collisions.
  • CNN 400 is a program in memory executing on a processor included in computing device 115 and includes a set of ten convolutional layers C1-C10 (3D boxes) configured to input 402 an image 200 to convolutional layer C1. Convolutional layer C1 produces an intermediate result 406, represented by the arrow between convolutional layer C1 and convolutional layer C2. Each convolutional layer C2-C10 receives an intermediate result 406 and outputs an intermediate result 406, represented by the arrows between adjacent convolutional layers C1-C10, representing forward propagation of intermediate results 406. Convolutional layers C1-C10 each output an intermediate result 406 at an output spatial resolution equal to the input spatial resolution or at an output spatial resolution reduced from the input spatial resolution. Bit depth per resolution element increases for intermediate results 406 as spatial resolution decreases, as described in Table 1, below. This repeats for convolutional layers C2-C9, which produce intermediate results 406, represented by the dark arrows between convolutional layers C2-C9, at successively lower resolutions. Convolutional layers C1-C9 can reduce resolution by pooling, wherein an adjacent group of pixels, which can be a 2×2 neighborhood, for example, is combined to form a single pixel according to a predetermined equation. Combining a group of pixels by selecting a maximum value among them, called “max pooling”, can reduce resolution while retaining information in intermediate results 406. Following convolutional layers C1-C10, convolutional layer C10 outputs intermediate result 406 to first deconvolutional layer D1, which can deconvolve and upsample intermediate result 406 to produce intermediate cognitive map 408, represented by the arrows between each of deconvolutional layers D1-D10. Deconvolution is convolution performed with a kernel that is, at least in part, an inverse of another kernel previously used to convolve a function and can partially invert the effects of the previous convolution. For example, deconvolutional layers D1-D10 can increase the spatial resolution of intermediate cognitive map 408 while decreasing the bit depth according to Table 1, below.
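  • By way of example, and not limitation, the following sketch shows the encoder-decoder pattern described above at a much smaller scale than CNN 400: convolutional layers with max pooling reduce spatial resolution while increasing depth, and transposed (deconvolutional) layers restore resolution toward an output with one channel per object class; the layer counts and sizes are assumptions:

    import torch
    import torch.nn as nn

    class TinyEncoderDecoder(nn.Module):
        def __init__(self, num_classes=20):
            super().__init__()
            # Encoder: each stage halves spatial resolution and increases depth,
            # loosely mirroring the C1-C10 progression of Table 1 at a smaller scale.
            self.enc = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            # Decoder: transposed convolutions ("deconvolution") restore resolution
            # while reducing depth, ending in one plane per object class.
            self.dec = nn.Sequential(
                nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(64, num_classes, kernel_size=2, stride=2),
            )

        def forward(self, image):
            return self.dec(self.enc(image))

    # Example: one 3-channel frame produces a 20-channel map at the input resolution.
    out = TinyEncoderDecoder()(torch.zeros(1, 3, 256, 256))    # shape (1, 20, 256, 256)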
  • Convolutional layer C10 also outputs estimated results 412 to prediction image p6, which, when training CNN 400, combines the estimated results 412 from convolutional layer C10 with ground truth-based information regarding objects, transforming the estimated results 412 into an estimated cognitive map 414. The estimated cognitive map 414 is combined with the intermediate cognitive map 408 output from deconvolutional layer D1 when training CNN 400. This is shown by the “+” sign on the intermediate cognitive map 408 arrow between deconvolutional layers D1-D2. Comparing the intermediate cognitive map 408 based on the input image 200 with ground truth-based information, including object detection, pixel-wise segmentation, 3D object poses, and relative distances, is used to train the convolutional neural network.
  • The “+” sign on the intermediate cognitive map 408 between deconvolutional layers D1-D2 also indicates combining the intermediate cognitive map 408 and the estimated cognitive map 414 with skip connection results 410 from convolutional layer C7 received via skip connections. Skip connection results 410 are intermediate results 406 forward propagated via skip connections as input to an upsampling deconvolutional layer D2, D4, D6, D8, D10. Skip connection results 410 can be combined with intermediate cognitive maps 408 to increase the resolution of the intermediate cognitive map 408 by upsampling before passing it on to succeeding deconvolutional layers D3, D5, D7, D9. This is shown by the “+” signs on the intermediate cognitive maps 408 between deconvolutional layers D1-D2, D3-D4, D5-D6, D7-D8 and D9-D10. Skip connections can forward propagate skip connection results 410 at the same resolution as the deconvolutional layers D2, D4, D6, D8, D10 receiving the information.
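  • By way of example, and not limitation, the following sketch shows the “+” combination described above: a decoder output is upsampled to the resolution of a skip connection result and added to it; the tensor shapes and channel counts are assumptions:

    import torch
    import torch.nn.functional as F

    def combine_with_skip(decoder_features, skip_features):
        # Upsample the decoder output to the skip connection's spatial size and add
        # the two feature maps; both are assumed to have the same channel count.
        upsampled = F.interpolate(decoder_features, size=skip_features.shape[-2:],
                                  mode="bilinear", align_corners=False)
        return upsampled + skip_features

    # Example: a coarse 32x32 decoder map combined with a 64x64 skip connection result.
    combined = combine_with_skip(torch.zeros(1, 64, 32, 32), torch.zeros(1, 64, 64, 64))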
  • Deconvolutional layers D1-D10 include prediction images p2-p6. Prediction images p2-p6 are used for training CNN 400 to produce cognitive maps 300 from image 200 input. Prediction images p2-p6 are determined based on ground truth images developed independently of CNN 400. Ground truth refers to information regarding the physical environment near vehicle 110. Accordingly, ground truth data in the present context can include distance and pose information determined using sensors 116 including multi-camera video sensors 116, LIDAR sensors 116, and radar sensors 116, location data from GPS sensors 116, INS sensors 116, and odometry sensors 116. Ground truth data in the present context can also include map data stored in a memory of computing device 115, and/or from a server computer 120, combined with information regarding object classification determined using CNN-based object classification programs. Such CNN-based object classification programs typically receive as input images 200, and then output images 200 segmented into regions that include objects such as roadways, lane markings, barriers, lanes, shoulders or adjacent terrain, other vehicles including type and model, and other objects including pedestrians, animals, bicycles, etc. Prediction images p2-p6 combine distance information with segmentation information to transform estimated results 412 from convolutional layer C10 and deconvolutional layers D2, D4, D6 and D8 into estimated cognitive maps 414 by orthographically projecting the estimated results 412 onto a 2D ground plane based on distance information to segmented objects and coloring the estimated cognitive map 414 based on information regarding object detection, pixel-wise segmentation, 3D object poses, and relative distances included in prediction images p2-p6.
  • Prediction images p2-p6 are used to train CNN 400 to output a cognitive map 300 in response to inputting an image 200 by outputting estimated cognitive maps 414, to be combined with the intermediate cognitive maps 408 output by deconvolutional layers D1, D3, D5, D7, D9. This combination is denoted by the “+” signs on the intermediate cognitive maps 408 between deconvolution layers D1-D2, D3-D4, D5-D6, D7-D8 and D9-D10. Prediction images p2-p6 can be based on ground truth including semantic segmentation applied to an input image 200. Multiple monocular images 200 acquired at different locations can be processed using optical flow techniques, for example, to determine distances to objects detected by semantic segmentation. Data from a sensor 116 can be combined with semantic segmentation information to determine distances to objects. Once distances to objects are determined and a 3D shape is estimated, a top-down view can be generated by homography, where depictions of objects detected in an input image 200 are orthographically projected onto a plane parallel with a ground plane or roadway based on their estimated 3D shape and 3D pose. Once projected onto the plane representing an estimated cognitive map 414, objects can retain their class or type, as indicated by color.
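  • By way of example, and not limitation, the following sketch projects a detected object onto a top-down ground-plane grid given its distance and bearing from the vehicle, which corresponds to the projection step described above; the grid size, cell size, and geometry are assumptions:

    import numpy as np

    GRID_H, GRID_W = 256, 256          # assumed top-down grid (rows ahead, columns lateral)
    METERS_PER_CELL = 0.25             # assumed grid resolution

    def ground_plane_cell(distance_m, bearing_rad):
        # Map a distance/bearing measured from the vehicle to a grid cell, with the
        # vehicle at the bottom-center of the grid facing "up" the rows.
        forward = distance_m * np.cos(bearing_rad)
        lateral = distance_m * np.sin(bearing_rad)
        row = GRID_H - 1 - int(forward / METERS_PER_CELL)
        col = GRID_W // 2 + int(lateral / METERS_PER_CELL)
        if 0 <= row < GRID_H and 0 <= col < GRID_W:
            return row, col
        return None                    # object falls outside the mapped area

    # Example: an object 20 m ahead and 5 degrees to the right of the vehicle.
    print(ground_plane_cell(20.0, np.deg2rad(5.0)))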
  • Multiple prediction images p2-p6 are used to train CNN 400 with the goal that each prediction image p2-p6 is combined with the intermediate cognitive map 408 at the appropriate resolution. Combining estimated cognitive maps 414 with intermediate cognitive maps 408 can include scoring positively (rewarding) output from deconvolutional layers D1, D3, D5, D7, D9 based on the similarity between the intermediate cognitive maps 408 and the estimated cognitive maps 414. By positively rewarding deconvolutional layers D1, D3, D5, D7, D9 in this fashion, CNN 400 can be trained to output 404 a cognitive map 300 from deconvolutional layer D10. Once deconvolutional layers D1, D3, D5, D7, D9 have been trained to output intermediate cognitive maps 408, input from prediction images p2-p6 is no longer required to output 404 a cognitive map 300 based on an input image 200. Trained CNN 400 will output 404 a cognitive map 300 based on recognizing visual similarities between an input image 200 and the input images 200 processed as part of a training set.
  • Similarity between the intermediate cognitive map 408 and the estimated cognitive map 414 can be determined based on a cost function calculated by the equation:

  • Cost(I, M) = W*CrossEntropy(M, M_Rec) + neighborhood_cost(M, M_Rec)   (1)
  • where W is a weight for each object class calculated based on the number of available training pixels for each class of objects, I is the input image 200, M is the estimated cognitive map 414, and M_Rec is the intermediate cognitive map 408. The CrossEntropy loss function is calculated as:

  • H(M, M_Rec) = −Σ_i (M_Rec_i*log(M_i) + (1 − M_Rec_i)*log(1 − M_i))   (2)
  • where i is the ith pixel in the image. The neighborhood similarity cost term can be determined by considering the agreement between a pixel and its neighboring pixels in the cognitive map predictions p2-p6 and 300. Calculation of a neighborhood cost function can be simplified by applying a Gaussian filter to the cross-entropy of a 3×3 block of pixels for the estimated cognitive map and ground truth. Applying a neighborhood cost function in this manner can improve the convergence speed of training and result in better predictions.
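  • By way of example, and not limitation, the following sketch computes a cost of the form of equations (1) and (2): a weighted per-pixel cross entropy between an estimated cognitive map M and an intermediate cognitive map M_Rec, plus a neighborhood term obtained by Gaussian-filtering the per-pixel cross entropy; the class weight, filter width, and sample values are assumptions:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    EPS = 1e-7

    def cross_entropy_map(m, m_rec):
        # Per-pixel cross entropy between estimated map M and intermediate map M_Rec,
        # i.e. the summand of equation (2) before summation over pixels.
        m = np.clip(m, EPS, 1.0 - EPS)
        return -(m_rec * np.log(m) + (1.0 - m_rec) * np.log(1.0 - m))

    def cognitive_map_cost(m, m_rec, class_weight, sigma=1.0):
        # Weighted cross entropy plus a neighborhood term, in the spirit of equation (1);
        # the Gaussian filter stands in for the 3x3 block neighborhood cost.
        ce = cross_entropy_map(m, m_rec)
        return class_weight * ce.sum() + gaussian_filter(ce, sigma=sigma).sum()

    # Example on a single 4x4 channel with an assumed class weight of 2.0.
    print(cognitive_map_cost(np.full((4, 4), 0.8), np.eye(4), class_weight=2.0))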
  • Table 1 is a table of convolutional layers 402 C1-C10, deconvolutional layers 404 D1-D10, cognitive map 300 (p1) and prediction images p2-p6, with their respective sizes expressed as fractions of the height and width of the input RGB image 200 I, along with a bit depth, wherein the input RGB image is size W×H×3, with each of the RGB color planes having a bit depth of eight bits, with W=1920, H=1080 and bit depth of 24, for example.
  • TABLE 1
    Sizes and bit depths for convolutional layers C1-C10, deconvolutional
    layers D1-D10, cognitive map 300 (p1) and prediction images p2-p6.

    Layer   C1-C10                 D1-D10                 p1-p6
    1       W/2 × H/2 × 64         W/32 × H/32 × 512      W × H × 24
    2       W/4 × H/4 × 128        W/32 × H/32 × 512      W/4 × H/4 × 24
    3       W/8 × H/8 × 256        W/16 × H/16 × 256      W/8 × H/8 × 24
    4       W/8 × H/8 × 256        W/16 × H/16 × 256      W/16 × H/16 × 24
    5       W/16 × H/16 × 512      W/8 × H/8 × 128        W/32 × H/32 × 24
    6       W/16 × H/16 × 512      W/8 × H/8 × 128        W/64 × H/64 × 24
    7       W/32 × H/32 × 512      W/4 × H/4 × 64
    8       W/32 × H/32 × 512      W/4 × H/4 × 64
    9       W/64 × H/64 × 1024     W/2 × H/2 × 32
    10      W/64 × H/64 × 1024     W/2 × H/2 × 32
  • Once trained using ground truth-based prediction images p2-p6, a CNN 400 can process input images 200 to produce cognitive maps 300 without inputting prediction images p2-p6. Convolutional layers C1-C10 can convolve and downsample intermediate results 406 that are passed to deconvolutional layers D1-D10, which deconvolve and upsample intermediate cognitive maps 408 with input from convolutional layers C1, C2, C4, C6, C7 via skip connection results 410. Cognitive maps 300 produced by CNN 400 can be used by computing device 115 to operate vehicle 110 by permitting the computing device 115 to predict vehicle trajectories based on the cognitive map 300.
  • In other examples, multiple CNNs 400 can be trained to determine cognitive maps 300 based on ground truth including multiple monocular image inputs, LIDAR and radar, and the results combined by adding a fusion layer to the CNNs 400. Temporal information can be included in the CNN 400 by adding recurrent convolutional layers to process temporal information. Cognitive maps 300 output from CNN 400 can be combined with other information available to computing device 115 from sensors 116, including GPS, INS and odometry location information, LIDAR, radar, and multi-camera information regarding distances, and map information stored at computing device 115 or downloaded from a server computer 120, for example, to improve the accuracy of the cognitive map 300 (p1) and the distances to objects therein.
  • In other examples, in cases where other information available to computing device 115 including GPS, INS and odometry location information, LIDAR, radar, and multi-camera information regarding distances and map information stored at computing device 115 or downloaded from a server computer 120, provides information that does not agree with the cognitive map 300 p1, a recorded image 200 along with recorded ground truth information can be used to update CNN 400 by providing additional training. The re-trained CNN 400 can be stored in computing device 115 memory for future use. A trained CNN 400 can be recalled from memory and executed by computing device 115 to produce cognitive maps 300 from image 200 input in real time as required for operation of a vehicle 110 on a roadway with traffic, for example.
  • FIG. 5 is a diagram of a flowchart, described in relation to FIGS. 1-4, of a process 500 for operating a vehicle based on a cognitive map. Process 500 can be implemented by a processor of computing device 115, taking as input information from sensors 116, and executing commands and sending control signals via controllers 112, 113, 114, for example. Process 500 includes multiple steps taken in the disclosed order. Process 500 also includes implementations including fewer steps or can include the steps taken in different orders.
  • Process 500 begins at step 502, where a computing device 115 included in a vehicle 110 acquires an image 200 as described above in relation to FIG. 2. The image 200 can be an RGB color video image acquired by a video sensor 116 included in vehicle 110. The image 200 can depict the physical environment near vehicle 110, including a roadway 202 and objects including other vehicles 204, 206, 208, 210.
  • At step 504 computing device 115 inputs image 200 to a trained CNN 400 as discussed above in relation to FIG. 4, above. In response to inputting image 200, trained CNN 400 produces a cognitive map 300 including a roadway 302 and objects including other vehicles 304, 306, 308, 310. Training CNN 400 will be discussed in relation to FIG. 6.
  • At step 506 computing device 115 operates a vehicle 110 based on cognitive map 300. Computing device 115 can operate vehicle 110 based on cognitive map 300 by determining predicted vehicle trajectories based on lanes and objects including other vehicles. Computing device 115 can combine cognitive maps 300 with data from multi-camera sensors 116, LIDAR sensors 116, and radar sensors 116, location data from GPS, INS and odometry, and map data from a server computer 120, for example, to improve the accuracy of cognitive map 300. Thus, based on the cognitive map 300, the computing device 115 can provide instructions to one or more of the powertrain controller 112, brake controller 113, and steering controller 114. For example, the computing device may be programmed to take certain actions concerning adjusting or maintaining speed, acceleration, and/or steering based on objects such as other vehicles 304-310; the cognitive map 300 advantageously can provide more accurate data for such actions than was previously available. Vehicle 110 safety and/or efficiency can thereby be improved by the cognitive map 300. Following this step process 500 ends.
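  • By way of example, and not limitation, the following sketch outlines the run-time cycle of process 500: acquire a frame, run the trained CNN to obtain a cognitive map, and pass the map to planning and control; the camera and controller interfaces shown are placeholders rather than disclosed APIs:

    import torch

    def run_cognitive_mapping(camera, cnn, controller):
        # One cycle: image in, cognitive map out, control commands issued.
        frame = camera.read()                          # RGB frame, H x W x 3 (assumed interface)
        tensor = torch.as_tensor(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            cognitive_map = cnn(tensor)                # (1, num_classes, h, w)
        trajectory = controller.plan(cognitive_map)    # placeholder planning step
        controller.apply(trajectory)                   # steering/brake/propulsion commands
        return cognitive_map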
  • FIG. 6 is a diagram of a flowchart, described in relation to FIGS. 1-4, of a process 600 for training a CNN 400 based on ground-truth. Process 600 can be implemented by a processor of computing device 115, taking as input information from sensors 116, and executing commands and sending control signals via controllers 112, 113, 114, for example. Process 600 includes multiple steps taken in the disclosed order. Process 600 also includes implementations including fewer steps or can include the steps taken in different orders.
  • Process 600 begins at step 602, where a computing device 115 included in a vehicle 110 acquires and records one or more images 200 as described above in relation to FIG. 2. The images 200 can be RGB color video images acquired by a video sensor 116 included in vehicle 110. The images 200 can depict the physical environment near vehicle 110, including a roadway 202 and objects including other vehicles 204, 206, 208, 210.
  • At step 604 computing device 115 records ground truth data based on object detection, pixel-wise segmentation, 3D object poses, and relative distances, all determined based on the recorded images 200, distance data, location data, and map data as discussed above in relation to FIG. 4, corresponding to the images 200 recorded at step 602.
  • At step 606 the computing device 115 inputs images 200 to CNN 400 while constructing prediction images p2-p6 to train CNN 400 according to the cost functions in equations 1 and 2, above. Prediction images p2-p6 are constructed to include the recorded ground truth data based on object detection, pixel-wise segmentation, 3D object poses, and relative distances. Prediction images p2-p6 can be created by homographic projection of ground truth data and used to transform estimated results 412 into top-down view, estimated cognitive maps 414 that can be used to train CNN 400 to output a cognitive map 300 in response to inputting an image 200 as discussed above in relation to FIG. 4. By comparing the intermediate cognitive maps 408 output by deconvolutional layers D1, D3, D5, D7, and D9 with the estimated cognitive maps 414 and back propagating the results of a cost function as described in relation to equations 1 and 2, CNN 400 can be trained to output a cognitive map 300 in response to an input image 200.
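  • By way of example, and not limitation, the following sketch outlines a single training step consistent with process 600: images and ground truth-based target maps are compared and the resulting loss is back-propagated; the loss function shown is a simplified stand-in for equations (1) and (2), and the optimizer and batch handling are assumptions:

    import torch
    import torch.nn.functional as F

    def training_step(cnn, optimizer, image_batch, target_maps):
        # One optimization step against ground truth-based target cognitive maps.
        optimizer.zero_grad()
        prediction = cnn(image_batch)                  # final estimated cognitive map
        # Supervise the final output; a fuller version would also tap the intermediate
        # decoder outputs against prediction images p2-p6 (deep supervision) and use
        # the weighted cross entropy plus neighborhood cost of equations (1) and (2).
        loss = F.binary_cross_entropy_with_logits(prediction, target_maps)
        loss.backward()
        optimizer.step()
        return loss.item()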
  • At step 608 the trained CNN 400 is output to be stored at memory included in computing device 115. Computing device 115 can recall the trained CNN 400 from memory, input an acquired image 200 to the trained CNN 400 and receive as output a cognitive map 300, to be used to operate a vehicle 110, without having to input ground truth data. Following this step process 600 ends.
  • Computing devices such as those discussed herein generally each include commands executable by one or more computing devices such as those identified above, and for carrying out blocks or steps of processes described above. For example, process blocks discussed above may be embodied as computer-executable commands.
  • Computer-executable commands may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Visual Basic, Java Script, Perl, HTML, etc. In general, a processor (e.g., a microprocessor) receives commands, e.g., from a memory, a computer-readable medium, etc., and executes these commands, thereby performing one or more processes, including one or more of the processes described herein. Such commands and other data may be stored in files and transmitted using a variety of computer-readable media. A file in a computing device is generally a collection of data stored on a computer readable medium, such as a storage medium, a random access memory, etc.
  • A computer-readable medium includes any medium that participates in providing data (e.g., commands), which may be read by a computer. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, etc. Non-volatile media include, for example, optical or magnetic disks and other persistent memory. Volatile media include dynamic random access memory (DRAM), which typically constitutes a main memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
  • All terms used in the claims are intended to be given their plain and ordinary meanings as understood by those skilled in the art unless an explicit indication to the contrary is made herein. In particular, use of the singular articles such as “a,” “the,” “said,” etc. should be read to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.
  • The term “exemplary” is used herein in the sense of signifying an example, e.g., a reference to an “exemplary widget” should be read as simply referring to an example of a widget.
  • The adverb “approximately” modifying a value or result means that a shape, structure, measurement, value, determination, calculation, etc. may deviate from an exact described geometry, distance, measurement, value, determination, calculation, etc., because of imperfections in materials, machining, manufacturing, sensor measurements, computations, processing time, communications time, etc.
  • In the drawings, the same reference numbers indicate the same elements. Further, some or all of these elements could be changed. With regard to the media, processes, systems, methods, etc. described herein, it should be understood that, although the steps of such processes, etc. have been described as occurring according to a certain ordered sequence, such processes could be practiced with the described steps performed in an order other than the order described herein. It further should be understood that certain steps could be performed simultaneously, that other steps could be added, or that certain steps described herein could be omitted. In other words, the descriptions of processes herein are provided for the purpose of illustrating certain embodiments, and should in no way be construed so as to limit the claimed invention.

Claims (20)

1. A method, comprising:
acquiring, from an image sensor, an image of a vehicle environment;
determining, by executing programming in a processor, a cognitive map as output from a convolutional neural network (CNN) that accepts the image as input, the cognitive map including a plurality of objects, including a class, location, and pose of each object in a top-down view of the vehicle environment, wherein the cognitive map includes a plurality of planes, each of the planes including at most a single class of object; and
operating the vehicle based on the cognitive map.
2. The method of claim 1, wherein the vehicle environment includes a roadway, and the objects include other vehicles and pedestrians.
3. The method of claim 2, further comprising determining the cognitive map including locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle.
4. The method of claim 1, wherein the image is a monocular video frame.
5. (canceled)
6. The method of claim 1, further comprising training the convolutional neural network based on ground truth data prior to determining the cognitive map.
7. The method of claim 6, wherein ground truth data is based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
8. The method of claim 7, wherein training the convolutional neural network is based on prediction images included in the convolutional neural network.
9. The method of claim 8, wherein the prediction images are based on ground truth data.
10. A system, comprising a processor; and a memory, the memory including instructions to be executed by the processor to:
acquire an image of a vehicle environment;
determine a cognitive map as output from a convolutional neural network (CNN) that accepts the image as input, the cognitive map including a plurality of objects, including a class, location, and pose of each object in a top-down view of the vehicle environment, wherein the cognitive map includes a plurality of planes, each of the planes including at most a single class of object; and
operate the vehicle based on the cognitive map.
11. The processor of claim 10, wherein the vehicle environment includes a roadway, and the objects include other vehicles and pedestrians.
12. The processor of claim 11, the instructions further including instructions to determine the cognitive map including locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle.
13. The processor of claim 10, wherein the image is a monocular video frame.
14. (canceled)
15. The processor of claim 10, wherein the convolutional neural network is trained based on ground truth data prior to determining the cognitive map.
16. The processor of claim 15, wherein ground truth data includes object detection, pixel-wise segmentation, 3D object pose, and relative distance.
17. The processor of claim 16, wherein training the convolutional neural network is based on prediction images included in the convolutional neural network.
18. The processor of claim 17, wherein the prediction images are based on ground truth data.
19. A system, comprising:
a video sensor operative to acquire an image of a vehicle environment;
vehicle components operative to operate a vehicle;
a processor; and a memory, the memory including instructions to be executed by the processor to:
acquire the image of the vehicle environment;
determine a cognitive map as output from a convolutional neural network (CNN) that accepts the image as input, the cognitive map including a plurality of objects, including a class, location, and pose of each object in a top-down view of the vehicle environment, wherein the cognitive map includes a plurality of planes, each of the planes including at most a single class of object; and
operate the vehicle based on the cognitive map.
20. The system of claim 19, wherein the vehicle environment includes a roadway, and the objects include other vehicles and pedestrians.
US15/881,228 2018-01-26 2018-01-26 Cognitive mapping for vehicles Active US10345822B1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US15/881,228 US10345822B1 (en) 2018-01-26 2018-01-26 Cognitive mapping for vehicles
CN201910068684.1A CN110084091A (en) 2018-01-26 2019-01-24 Cognition for vehicle maps
DE102019101938.9A DE102019101938A1 (en) 2018-01-26 2019-01-25 Creation of cognitive maps for vehicles

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US15/881,228 US10345822B1 (en) 2018-01-26 2018-01-26 Cognitive mapping for vehicles

Publications (2)

Publication Number Publication Date
US10345822B1 US10345822B1 (en) 2019-07-09
US20190235520A1 true US20190235520A1 (en) 2019-08-01

Family

ID=67106346

Family Applications (1)

Application Number Title Priority Date Filing Date
US15/881,228 Active US10345822B1 (en) 2018-01-26 2018-01-26 Cognitive mapping for vehicles

Country Status (3)

Country Link
US (1) US10345822B1 (en)
CN (1) CN110084091A (en)
DE (1) DE102019101938A1 (en)


Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733506B1 (en) * 2016-12-14 2020-08-04 Waymo Llc Object detection neural network
GB2601644B (en) * 2017-04-28 2023-02-08 FLIR Belgium BVBA Video and image chart fusion systems and methods
CN107589552B (en) 2017-10-17 2023-08-04 歌尔光学科技有限公司 Optical module assembly equipment
EP3904835A4 (en) * 2018-12-24 2022-10-05 LG Electronics Inc. Route providing device and route providing method thereof
US10635938B1 (en) * 2019-01-30 2020-04-28 StradVision, Inc. Learning method and learning device for allowing CNN having trained in virtual world to be used in real world by runtime input transformation using photo style transformation, and testing method and testing device using the same
US10762393B2 (en) * 2019-01-31 2020-09-01 StradVision, Inc. Learning method and learning device for learning automatic labeling device capable of auto-labeling image of base vehicle using images of nearby vehicles, and testing method and testing device using the same
US11150664B2 (en) * 2019-02-01 2021-10-19 Tesla, Inc. Predicting three-dimensional features for autonomous driving
US10997461B2 (en) 2019-02-01 2021-05-04 Tesla, Inc. Generating ground truth for machine learning from time series elements
US11341614B1 (en) * 2019-09-24 2022-05-24 Ambarella International Lp Emirror adaptable stitching
CN112711249B (en) * 2019-10-24 2023-01-03 科沃斯商用机器人有限公司 Robot positioning method and device, intelligent robot and storage medium
CN111275249A (en) * 2020-01-15 2020-06-12 吉利汽车研究院(宁波)有限公司 Driving behavior optimization method based on DQN neural network and high-precision positioning
US11511576B2 (en) * 2020-01-24 2022-11-29 Ford Global Technologies, Llc Remote trailer maneuver assist system
KR20210124603A (en) * 2020-04-06 2021-10-15 현대자동차주식회사 Apparatus for controlling autonomous driving of a vehicle, system having the same and method thereof
CN111959495B (en) * 2020-06-29 2021-11-12 阿波罗智能技术(北京)有限公司 Vehicle control method and device and vehicle
EP4192714A1 (en) * 2020-09-11 2023-06-14 Waymo Llc Estimating ground truth object keypoint labels for sensor readings
CN113312438B (en) * 2021-03-09 2023-09-15 中南大学 Marine target position prediction method integrating route extraction and trend judgment
DE102021209786A1 (en) 2021-09-06 2023-03-09 Robert Bosch Gesellschaft mit beschränkter Haftung Method for positioning a map representation of an area surrounding a vehicle in a semantic road map
US11541910B1 (en) * 2022-01-07 2023-01-03 Plusai, Inc. Methods and apparatus for navigation of an autonomous vehicle based on a location of the autonomous vehicle relative to shouldered objects
US11840257B2 (en) * 2022-03-25 2023-12-12 Embark Trucks Inc. Lane change determination for vehicle on shoulder

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3164860A4 (en) * 2014-07-03 2018-01-17 GM Global Technology Operations LLC Vehicle cognitive radar methods and systems
US10099615B2 (en) * 2014-09-29 2018-10-16 Ambarella, Inc. All-round view monitoring system for a motor vehicle
US10133947B2 (en) * 2015-01-16 2018-11-20 Qualcomm Incorporated Object detection using location data and scale space representations of image data
CN105260699B (en) 2015-09-10 2018-06-26 百度在线网络技术(北京)有限公司 A kind of processing method and processing device of lane line data
CN105488534B (en) 2015-12-04 2018-12-07 中国科学院深圳先进技术研究院 Traffic scene deep analysis method, apparatus and system
US10181195B2 (en) * 2015-12-28 2019-01-15 Facebook, Inc. Systems and methods for determining optical flow
EP3206184A1 (en) * 2016-02-11 2017-08-16 NXP USA, Inc. Apparatus, method and system for adjusting predefined calibration data for generating a perspective view
CN106125730B (en) 2016-07-10 2019-04-30 北京工业大学 A kind of robot navigation's map constructing method based on mouse cerebral hippocampal spatial cell
CN106372577A (en) 2016-08-23 2017-02-01 北京航空航天大学 Deep learning-based traffic sign automatic identifying and marking method
CN106558058B (en) 2016-11-29 2020-10-09 北京图森未来科技有限公司 Segmentation model training method, road segmentation method, vehicle control method and device
US10067509B1 (en) * 2017-03-10 2018-09-04 TuSimple System and method for occluding contour detection
CN107169421B (en) 2017-04-20 2020-04-28 华南理工大学 Automobile driving scene target detection method based on deep convolutional neural network
US10474908B2 (en) 2017-07-06 2019-11-12 GM Global Technology Operations LLC Unified deep convolutional neural net for free-space estimation, object detection and object pose estimation

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190340522A1 (en) * 2017-01-23 2019-11-07 Panasonic Intellectual Property Management Co., Ltd. Event prediction system, event prediction method, recording media, and moving body
US11068724B2 (en) * 2018-10-11 2021-07-20 Baidu Usa Llc Deep learning continuous lane lines detection system for autonomous vehicles
US20210101624A1 (en) * 2019-10-02 2021-04-08 Zoox, Inc. Collision avoidance perception system
US11726492B2 (en) * 2019-10-02 2023-08-15 Zoox, Inc. Collision avoidance perception system
US11994866B2 (en) 2019-10-02 2024-05-28 Zoox, Inc. Collision avoidance perception system
US11180080B2 (en) * 2019-12-13 2021-11-23 Continental Automotive Systems, Inc. Door opening aid systems and methods

Also Published As

Publication number Publication date
CN110084091A (en) 2019-08-02
DE102019101938A1 (en) 2019-08-01
US10345822B1 (en) 2019-07-09

Similar Documents

Publication Publication Date Title
US10345822B1 (en) Cognitive mapping for vehicles
US10853670B2 (en) Road surface characterization using pose observations of adjacent vehicles
US11312372B2 (en) Vehicle path prediction
US10733510B2 (en) Vehicle adaptive learning
US10981564B2 (en) Vehicle path planning
US11783707B2 (en) Vehicle path planning
US10528055B2 (en) Road sign recognition
US11460851B2 (en) Eccentricity image fusion
US9672446B1 (en) Object detection for an autonomous vehicle
US20200020117A1 (en) Pose estimation
US20170316684A1 (en) Vehicle lane map estimation
US10769799B2 (en) Foreground detection
US11521494B2 (en) Vehicle eccentricity mapping
US11055859B2 (en) Eccentricity maps
US11030774B2 (en) Vehicle object tracking
US11662741B2 (en) Vehicle visual odometry
US11138452B2 (en) Vehicle neural network training
US11119491B2 (en) Vehicle steering control
CN111791814A (en) Vehicle capsule network
US10599146B2 (en) Action-conditioned vehicle control
US20230186587A1 (en) Three-dimensional object detection
US11610412B2 (en) Vehicle neural network training
US20240037961A1 (en) Systems and methods for detecting lanes using a segmented image and semantic context
US20230368541A1 (en) Object attention network

Legal Events

Date Code Title Description
AS Assignment

Owner name: FORD GLOBAL TECHNOLOGIES, LLC, MICHIGAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PARCHAMI, MOSTAFA;TAIMOURI, VAHID;PUSKORIUS, GINTARAS VINCENT;SIGNING DATES FROM 20180116 TO 20180125;REEL/FRAME:044742/0375

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4