US20190235520A1 - Cognitive mapping for vehicles - Google Patents
- Publication number
- US20190235520A1 (application US 15/881,228)
- Authority
- US
- United States
- Prior art keywords
- vehicle
- cognitive map
- image
- cognitive
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/08—Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
- B60W30/09—Taking automatic action to avoid collision, e.g. braking and steering
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/0088—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0246—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
- G05D1/0248—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means in combination with a laser
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0231—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
- G05D1/0246—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
- G05D1/0251—Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting 3D information from a plurality of images taken from different locations, e.g. stereo vision
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24143—Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
-
- G06K9/00791—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/70—Labelling scene content, e.g. deriving syntactic or semantic representations
Definitions
- Vehicles can be equipped to operate in both autonomous and occupant piloted mode.
- Vehicles can be equipped with computing devices, networks, sensors and controllers to acquire information regarding the vehicle's environment and to operate the vehicle based on the information.
- Safe and comfortable operation of the vehicle can depend upon determining predicted vehicle trajectories based on accurate and timely information regarding the vehicle's environment.
- safe and comfortable operation of the vehicle can depend upon acquiring accurate and timely information regarding objects in a vehicle's environment while the vehicle is being operated on a roadway. It is a problem to provide accurate and timely information regarding objects near or around a vehicle to support operation of the vehicle.
- FIG. 1 is a block diagram of an example vehicle.
- FIG. 2 is a diagram of an example image of a traffic scene.
- FIG. 3 is a diagram of an example cognitive map.
- FIG. 4 is a diagram of an example convolutional neural network.
- FIG. 5 is a flowchart diagram of an example process to operate a vehicle based on a cognitive map.
- FIG. 6 is a flowchart diagram of an example process to train a convolutional neural network to output a cognitive map.
- Vehicles can be equipped to operate in both autonomous and occupant piloted mode.
- By a semi- or fully-autonomous mode, we mean a mode of operation wherein a vehicle can be piloted by a computing device as part of a vehicle information system having sensors and controllers. The vehicle can be occupied or unoccupied, but in either case the vehicle can be piloted without assistance of an occupant.
- An autonomous mode is defined as one in which each of vehicle propulsion (e.g., via a powertrain including an internal combustion engine and/or electric motor), braking, and steering is controlled by one or more vehicle computers; in a semi-autonomous mode the vehicle computer(s) control(s) one or two of vehicle propulsion, braking, and steering. In a non-autonomous vehicle, none of these is controlled by a computer.
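- The three-way definition above can be illustrated with a minimal sketch. The names `Mode` and `classify_mode` are hypothetical and do not appear in the disclosure; the sketch only counts how many of propulsion, braking, and steering are under computer control.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical labels for the three modes defined in the text.
    AUTONOMOUS = "autonomous"
    SEMI_AUTONOMOUS = "semi-autonomous"
    NON_AUTONOMOUS = "non-autonomous"


def classify_mode(propulsion: bool, braking: bool, steering: bool) -> Mode:
    """Classify a mode by how many of the three functions a computer controls."""
    controlled = sum([propulsion, braking, steering])
    if controlled == 3:
        return Mode.AUTONOMOUS
    if controlled >= 1:  # one or two of the three functions
        return Mode.SEMI_AUTONOMOUS
    return Mode.NON_AUTONOMOUS
```

- For example, `classify_mode(True, True, False)` yields the semi-autonomous label, because steering remains with the occupant.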
- An estimate of a location, e.g., according to geo-coordinates, of a vehicle with respect to a map can be used by a computing device to operate a vehicle on a roadway from a current location to a determined destination, for example.
- the map can be a cognitive map.
- a cognitive map in the context of this disclosure is a top-down view, 2D representation of the physical environment around a vehicle.
- the cognitive map can include a top-down, 2D representation of the roadway ahead of a current vehicle location and in a direction of current vehicle travel.
- the direction of current vehicle travel is based on the current vehicle trajectory, which includes speed, direction, longitudinal acceleration, and lateral acceleration.
- the cognitive map can include a roadway and objects such as lanes, barriers, shoulders, and lane markers, vehicles and pedestrians, for example.
- a cognitive map is a mental representation of the physical environment. For example, humans and animals use cognitive maps to find their way around their environment.
- a cognitive map is used by a computing device to operate a vehicle, including actuating vehicle components including powertrain, steering and braking to direct the vehicle from a current location to a destination location in a safe and comfortable fashion.
- the cognitive map can be used by the computing device to determine predicted vehicle trajectories based on determined locations of lanes and determined locations and trajectories of other vehicles in the cognitive map, for example.
- a cognitive map can depict semantic segmentation of objects viewed from top-down view and accurately illustrate a distance to each point from vehicle 110 .
- a method including acquiring an image of a vehicle environment, determining a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operating the vehicle based on the cognitive map.
- the vehicle environment can include a roadway and objects including other vehicles and pedestrians.
- the cognitive map can include locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle.
- the image can be a monocular video frame.
- the cognitive map of the vehicle environment can be based on processing the image with a convolutional neural network.
- the convolutional neural network can be trained based on ground truth data prior to determining the cognitive map.
- the ground truth data can be based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
- Training the convolutional neural network can be based on prediction images included in the convolutional neural network.
- the prediction images can be based on ground truth data.
- the neural network learns how to transform input RGB images to estimation of cognitive maps.
- the estimated cognitive maps can be combined with intermediate estimations of cognitive maps and compared against the prediction images to determine similarity.
- the similarity between the combined estimated cognitive maps and the prediction images can be determined by calculating a cost function.
- the cost function can be based on a weighted cross entropy function based on comparing the estimated cognitive maps and the intermediate cognitive maps with the prediction images.
- the prediction images can be based on LIDAR data.
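- A weighted cross entropy cost of the kind named above might be sketched as follows. This is an illustration under stated assumptions, not the disclosed cost function: the per-class weighting scheme, the flat list-of-cells layout, and the name `weighted_cross_entropy` are all assumptions.

```python
import math


def weighted_cross_entropy(predicted, target, class_weights, eps=1e-12):
    """Mean weighted cross entropy over map cells.

    predicted:     one list of class probabilities per cell (estimated map)
    target:        one ground-truth class index per cell (prediction image)
    class_weights: one weight per class (assumed weighting scheme)
    """
    total = 0.0
    for probs, label in zip(predicted, target):
        # Clamp the probability so log() never receives zero.
        total += -class_weights[label] * math.log(max(probs[label], eps))
    return total / len(target)
```

- Giving rare classes such as pedestrians a larger weight than the roadway class is one plausible reason to weight the cost, since roadway cells typically dominate a top-down map.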
- a computer readable medium storing program instructions for executing some or all of the above method steps.
- a computer programmed for executing some or all of the above method steps including a computer apparatus, programmed to acquire an image of a vehicle environment, determine a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operate the vehicle based on the cognitive map.
- the vehicle environment can include a roadway and objects including other vehicles and pedestrians.
- the cognitive map can include locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle.
- the image can be a monocular video frame.
- the cognitive map of the vehicle environment can be based on processing the image with a convolutional neural network.
- the convolutional neural network can be trained based on ground truth data prior to determining the cognitive map.
- the ground truth data can be based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
- the computer can be further programmed to train the convolutional neural network based on prediction images included in the convolutional neural network.
- the prediction images can be based on ground truth data.
- the prediction images can transform estimated results into estimated cognitive maps.
- the estimated cognitive maps can be combined with intermediate cognitive maps to determine similarity.
- the similarity between the estimated cognitive maps and the prediction images can be determined by calculating a cost function.
- the cost function can be based on a weighted cross entropy function based on comparing the estimated cognitive maps combined with the intermediate cognitive maps and prediction images.
- the prediction images can be based on LIDAR data.
- FIG. 1 is a diagram of a vehicle information system 100 that includes a vehicle 110 operable in autonomous (“autonomous” by itself in this disclosure means “fully autonomous”) and occupant piloted (also referred to as non-autonomous) mode.
- Vehicle 110 also includes one or more computing devices 115 for performing computations for piloting the vehicle 110 during autonomous operation.
- Computing devices 115 can receive information regarding the operation of the vehicle from sensors 116 .
- the computing device 115 includes a processor and a memory such as are known. Further, the memory includes one or more forms of computer-readable media, and stores instructions executable by the processor for performing various operations, including as disclosed herein.
- the computing device 115 may include programming to operate one or more of vehicle brakes, propulsion (e.g., control of acceleration in the vehicle 110 by controlling one or more of an internal combustion engine, electric motor, hybrid engine, etc.), steering, climate control, interior and/or exterior lights, etc., as well as to determine whether and when the computing device 115 , as opposed to a human operator, is to control such operations.
- the computing device 115 may include or be communicatively coupled to, e.g., via a vehicle communications bus as described further below, more than one computing device, e.g., controllers or the like included in the vehicle 110 for monitoring and/or controlling various vehicle components, e.g., a powertrain controller 112, a brake controller 113, a steering controller 114, etc.
- the computing device 115 is generally arranged for communications on a vehicle communication network, e.g., including a bus in the vehicle 110 such as a controller area network (CAN) or the like; the vehicle 110 network can additionally or alternatively include wired or wireless communication mechanisms such as are known, e.g., Ethernet or other communication protocols.
- the computing device 115 may transmit messages to various devices in the vehicle and/or receive messages from the various devices, e.g., controllers, actuators, sensors, etc., including sensors 116 .
- the vehicle communication network may be used for communications between devices represented as the computing device 115 in this disclosure.
- various controllers or sensing elements such as sensors 116 may provide data to the computing device 115 via the vehicle communication network.
- the computing device 115 may be configured for communicating through a vehicle-to-infrastructure (V-to-I) interface 111 with a remote server computer 120, e.g., a cloud server, via a network 130, as described below.
- a vehicle-to-infrastructure (V-to-I) interface 111 includes hardware, firmware, and software that permits computing device 115 to communicate with a remote server computer 120 via a network 130 such as wireless Internet (Wi-Fi) or cellular networks.
- V-to-I interface 111 may accordingly include processors, memory, transceivers, etc., configured to utilize various wired and/or wireless networking technologies, e.g., cellular, BLUETOOTH® and wired and/or wireless packet networks.
- Computing device 115 may be configured for communicating with other vehicles through V-to-I interface 111 using vehicle-to-vehicle (V-to-V) networks, e.g., according to Dedicated Short Range Communications (DSRC) and/or the like, e.g., formed on an ad hoc basis among nearby vehicles 110 or formed through infrastructure-based networks including the Internet via cellular networks or Wi-Fi, for example.
- the computing device 115 also includes nonvolatile memory such as is known.
- Computing device 115 can log, i.e., store in a memory, information by storing the information in nonvolatile memory for later retrieval and transmittal via the vehicle communication network and a vehicle to infrastructure (V-to-I) interface 111 to a server computer 120 or user mobile device 160 .
- the computing device 115 may make various determinations and/or control various vehicle 110 components and/or operations without a driver to operate the vehicle 110 .
- the computing device 115 may include programming to regulate vehicle 110 operational behaviors (i.e., physical manifestations of vehicle 110 operation) such as speed, acceleration, deceleration, steering, etc., as well as tactical behaviors (i.e., control of operational behaviors typically in a manner intended to achieve safe and efficient traversal of a route) such as a distance between vehicles and/or amount of time between vehicles, lane-change, minimum gap between vehicles, left-turn-across-path minimum, time-to-arrival at a particular location and intersection (without signal) minimum time-to-arrival to cross the intersection.
- Controllers include computing devices that typically are programmed to control a specific vehicle subsystem. Examples include a powertrain controller 112 , a brake controller 113 , and a steering controller 114 .
- a controller may be an electronic control unit (ECU) such as is known, possibly including additional programming as described herein.
- the controllers may communicatively be connected to and receive instructions from the computing device 115 to actuate the subsystem according to the instructions.
- the brake controller 113 may receive instructions from the computing device 115 to operate the brakes of the vehicle 110 .
- the one or more controllers 112 , 113 , 114 for the vehicle 110 may include conventional electronic control units (ECUs) or the like including, as non-limiting examples, one or more powertrain controllers 112 , one or more brake controllers 113 and one or more steering controllers 114 .
- Each of the controllers 112 , 113 , 114 may include respective processors and memories and one or more actuators.
- the controllers 112, 113, 114 may be programmed and connected to a vehicle 110 communications bus, such as a controller area network (CAN) bus or local interconnect network (LIN) bus, to receive instructions from the computing device 115 and control actuators based on the instructions.
- Sensors 116 may include a variety of devices known to provide data via the vehicle communications bus.
- for example, a radar fixed to a front bumper (not shown) of the vehicle 110 may provide a distance from the vehicle 110 to a next vehicle in front of the vehicle 110, or a global positioning system (GPS) sensor disposed in the vehicle 110 may provide geographical coordinates of the vehicle 110.
- the distance(s) provided by the radar and/or other sensors 116 and/or the geographical coordinates provided by the GPS sensor may be used by the computing device 115 to operate the vehicle 110 autonomously or semi-autonomously.
- the vehicle 110 is generally a land-based autonomous vehicle 110 having three or more wheels, e.g., a passenger car, light truck, etc.
- the vehicle 110 includes one or more sensors 116 , the V-to-I interface 111 , the computing device 115 and one or more controllers 112 , 113 , 114 .
- the sensors 116 may be programmed to collect data related to the vehicle 110 and the environment in which the vehicle 110 is operating.
- sensors 116 may include, e.g., altimeters, cameras, LIDAR, radar, ultrasonic sensors, infrared sensors, pressure sensors, accelerometers, gyroscopes, temperature sensors, Hall sensors, optical sensors, voltage sensors, current sensors, mechanical sensors such as switches, etc.
- the sensors 116 may be used to sense the environment in which the vehicle 110 is operating, e.g., sensors 116 can detect phenomena such as weather conditions (precipitation, external ambient temperature, etc.), the grade of a road, the location of a road (e.g., using road edges, lane markings, etc.), or locations of target objects such as neighboring vehicles 110 .
- the sensors 116 may further be used to collect data including dynamic vehicle 110 data related to operations of the vehicle 110 such as velocity, yaw rate, steering angle, engine speed, brake pressure, oil pressure, the power level applied to controllers 112 , 113 , 114 in the vehicle 110 , connectivity between components, and accurate and timely performance of components of the vehicle 110 .
- FIG. 2 illustrates an image 200 of a traffic scene including a roadway 202 and other vehicles 204 , 206 , 208 , 210 .
- the image 200 can be a monocular video frame acquired by computing device 115 from a video sensor 116 included in a vehicle 110 , for example.
- a monocular video frame can include three color planes with a bit depth of eight bits each for a total of 24 bits corresponding to red, green, and blue (RGB) color components.
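- The 24-bit pixel layout described above can be made concrete with a short sketch. The packing order (red in the high byte) and the names `pack_rgb` and `unpack_rgb` are assumptions for illustration; the disclosure states only that three 8-bit planes total 24 bits.

```python
from typing import Tuple


def pack_rgb(r: int, g: int, b: int) -> int:
    """Pack three 8-bit color components into one 24-bit integer."""
    for value in (r, g, b):
        if not 0 <= value <= 255:
            raise ValueError("each component must fit in 8 bits")
    # Assumed order: red in bits 16-23, green in bits 8-15, blue in bits 0-7.
    return (r << 16) | (g << 8) | b


def unpack_rgb(pixel: int) -> Tuple[int, int, int]:
    """Recover the (r, g, b) components from a packed 24-bit pixel."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF
```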
- Image 200 can include a roadway 202 , lane marker 212 , barriers 224 , 226 , 228 and roadway shoulders or terrain adjacent to roadway 230 , 232 .
- Computing device 115 can use image 200 to produce a cognitive map including roadway 202 and objects including other vehicles 204 , 206 , 208 , 210 , lane marker 212 , barriers 224 , 226 , 228 and roadway shoulders or terrain adjacent to roadway 230 , 232 and, based on the cognitive map including roadway 202 and objects, determine predicted trajectories for operating vehicle 110 .
- FIG. 3 is a cognitive map 300 of a traffic scene including a roadway 302 (white) and objects including other vehicles 304, 306, 308, 310 (grid), lane marker 312 (black), barriers 314, 316, 318 (upward diagonal), and shoulders or adjacent terrain 320, 322 (cross-hatch), each rendered with a different fill pattern to denote a different color.
- a cognitive map can include 20 or more channels, each including objects belonging to a single class, such as “roadway”, “vehicle”, “pedestrian”, “cyclist”, etc.
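- One way to picture the per-class channels is as a stack of binary top-down grids, one per class. The sketch below is hypothetical: the grid-of-class-names input and the name `to_channels` are assumptions, and only the four classes named in the text are listed, whereas a real map would have 20 or more.

```python
CLASSES = ["roadway", "vehicle", "pedestrian", "cyclist"]  # subset named in the text


def to_channels(label_grid, classes):
    """Split a 2D grid of class names into one binary 2D channel per class."""
    return {
        name: [[1 if cell == name else 0 for cell in row] for row in label_grid]
        for name in classes
    }
```

- Each channel then marks only the cells occupied by its class, so a planner can query, for instance, the vehicle channel without decoding the others.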
- Cognitive map 300 can be created by inputting an image 200 into a convolutional neural network (CNN), configured and trained as described in relation to FIG. 4 , below, which, in response to the input, outputs a cognitive map 300 .
- Computing device 115 can operate vehicle 110 based on cognitive map 300 .
- Operating vehicle 110 can include actuating vehicle components such as powertrain, steering and braking via controllers 112 , 113 , 114 to determine vehicle location and trajectory based on predicted locations and trajectories. The predicted locations and trajectories can be determined based on the cognitive map 300 .
- computing device 115 can operate vehicle 110 to follow predicted trajectories that locate vehicle 110 in the center of a lane, the lane determined based on lane marker 312 and barrier 314 while maintaining a predetermined distance between vehicle 110 and other vehicle 310 .
- Computing device 115 can predict vehicle trajectories that can be used to actuate powertrain, steering and braking components based on distances to and locations of objects in the cognitive map 300 relative to the location of vehicle 110 , for example.
- Predicted trajectories of objects including other vehicles 304, 306, 308, 310 can be determined by comparing the locations of the objects in successive cognitive maps 300 created at successive time intervals, from images 200 acquired at successive time intervals. Trajectories of other vehicles 304, 306, 308, 310 can be determined by determining the locations of other vehicles 304, 306, 308, 310 in successive cognitive maps 300 created at successive time intervals, fitting a curve to the location points and calculating vectors equal to the first and second derivatives of each curve in the 2D plane of the cognitive map 300.
- the magnitude of the first derivative is speed and the angle is direction.
- the second derivatives are directional derivatives parallel to the first derivative direction (longitudinal acceleration) and perpendicular to the first derivative direction (lateral acceleration).
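- The derivative-based trajectory estimate described above can be sketched for the simplest case. The sketch assumes exactly three equally spaced location samples and fits a parabola through them, whose derivatives at the middle sample reduce to central differences; the disclosure does not specify the curve type or sample count, and the name `trajectory_from_three_points` is hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def trajectory_from_three_points(
    p0: Point, p1: Point, p2: Point, dt: float
) -> Tuple[float, float, float, float]:
    """Return (speed, direction, longitudinal accel, lateral accel) at p1.

    p0, p1, p2 are map-plane locations sampled dt seconds apart.
    Direction is in radians, measured from the map x-axis.
    """
    # First derivative of the interpolating parabola at the middle sample.
    vx = (p2[0] - p0[0]) / (2.0 * dt)
    vy = (p2[1] - p0[1]) / (2.0 * dt)
    # Second derivative of the same parabola (constant along the curve).
    ax = (p2[0] - 2.0 * p1[0] + p0[0]) / dt ** 2
    ay = (p2[1] - 2.0 * p1[1] + p0[1]) / dt ** 2

    speed = math.hypot(vx, vy)
    if speed == 0.0:
        raise ValueError("direction is undefined at zero speed")
    direction = math.atan2(vy, vx)

    ux, uy = vx / speed, vy / speed      # unit vector along travel
    a_long = ax * ux + ay * uy           # component parallel to travel
    a_lat = -ax * uy + ay * ux           # component along the left normal
    return speed, direction, a_long, a_lat
```

- With more than three samples, a least-squares fit would smooth measurement noise better than this exact interpolation, at the cost of some lag.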
- FIG. 4 is a diagram of an example CNN 400 configured to input an image 200 and output a cognitive map 300 .
- the image 200 can be a monocular RGB video image acquired from a video sensor 116 included in a vehicle 110 that includes a scene depicting the physical environment near vehicle 110 .
- the cognitive map 300 is a 2D representation of the physical environment near vehicle 110 including 20 or more channels, each including a single class of objects present in the scene, identified by type, distance and 3D pose relative to vehicle 110, where 3D pose is defined as the orientation of an object in 3D space relative to a frame of reference, expressed as three angles.
- Information regarding object type, distance and 3D pose included in cognitive map 300 as a top-down view can permit computing device 115 to determine trajectories to operate vehicle 110 safely by traveling on the roadway and avoiding collisions.
- CNN 400 is a program in memory executing on a processor included in computing device 115 and includes a set of ten convolutional layers C 1 -C 10 (3D boxes) configured to input 402 an image 200 to convolutional layer C 1 .
- Convolutional layer C 1 produces an intermediate result 406 , represented by the arrow between convolutional layer C 1 and convolutional layer C 2 .
- Each convolutional layer C 2 -C 10 receives an intermediate result 406 and outputs an intermediate result 406 represented by the arrows between adjacent convolutional layers C 1 -C 10 , representing forward propagation of intermediate results 406 .
- Convolutional layers C 1 -C 10 each output an intermediate result 406 at an output spatial resolution equal to the input spatial resolution or at an output spatial resolution reduced from the input spatial resolution.
- Bit depth per resolution element increases for intermediate results as spatial resolution decreases, as described in Table 1, below.
- This repeats for convolutional layers C 2 -C 9 which produce intermediate results 406 , represented by the dark arrows between convolutional layers C 2 -C 9 at successively lower resolutions.
- Convolutional layers C1-C9 can reduce resolution by pooling, wherein an adjacent group of pixels, which can be a 2×2 neighborhood, for example, is combined to form a single pixel according to a predetermined equation. Combining a group of pixels by selecting a maximum value among them, called “max pooling”, can reduce resolution while retaining information in intermediate results 406.
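- Max pooling over non-overlapping 2×2 neighborhoods can be sketched in a few lines. The function name `max_pool_2x2` and the list-of-rows input are assumptions, and the sketch handles a single channel with even dimensions only.

```python
def max_pool_2x2(image):
    """Halve each spatial dimension by keeping the maximum of every 2x2 block."""
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("this sketch requires even height and width")
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]
```

- A 4×4 input therefore becomes a 2×2 output, with each output pixel carrying the strongest response from its block.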
- Convolutional layer C 10 outputs intermediate result 406 to first deconvolutional layer D 1 , which can deconvolve and upsample intermediate result 406 to produce intermediate cognitive map 408 , represented by the arrows between each of deconvolutional layers D 1 -D 10 .
- Deconvolution is convolution performed with a kernel that is, at least in part, an inverse of another kernel previously used to convolve a function and can partially invert the effects of the previous convolution.
- Deconvolutional layers D 1 -D 10 can increase spatial resolution of intermediate cognitive map 408 while decreasing the bit depth according to Table 1, below.
- Convolutional layer C 10 also outputs estimated feature maps 412 to prediction image p 6 , which, when training CNN 400 , combines estimated feature maps 412 from convolutional layer C 10 with ground truth-based information regarding objects that transforms the estimated feature maps 412 into an estimated cognitive map 414 .
- The estimated cognitive map 414 is combined with the intermediate cognitive map 408 output from deconvolution layer D 1 when training CNN 400 . This is shown by the “+” signs on the intermediate cognitive map 408 arrow between deconvolution layers D 1 -D 2 . Comparing the intermediate cognitive map 408 based on input image I with ground truth-based information including object detection, pixel-wise segmentation, 3D object poses, and relative distances is used for training the convolutional neural network.
- The “+” sign on the intermediate cognitive map 408 between deconvolution layers D 1 -D 2 also indicates combining intermediate cognitive map 408 and estimated cognitive map 414 with skip connection results 410 from convolutional layer C 7 received via skip connections.
- Skip connection results 410 are intermediate results 406 forward propagated via skip connections as input to an upsampling deconvolution layer D 2 , D 4 , D 6 , D 8 , D 10 .
- Skip connection results 410 can be combined with intermediate cognitive maps 408 to increase resolution of intermediate cognitive maps 408 by upsampling to pass onto succeeding deconvolutional layers D 3 , D 5 , D 7 , D 9 .
- Skip connections can forward propagate skip connection results 410 at the same resolution as the deconvolutional layers D 2 , D 4 , D 6 , D 8 , D 10 receiving the information.
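- The combination of an upsampled map with a same-resolution skip connection result can be sketched as follows. Two simplifications are assumed here and are not stated in the disclosure: the learned deconvolution is replaced by nearest-neighbour upsampling, and the “+” combination is taken to be element-wise addition. The function names are hypothetical.

```python
def upsample_2x(grid):
    """Nearest-neighbour upsampling: every cell becomes a 2x2 block."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_skip(decoder_map, skip_result):
    """Element-wise sum; both inputs must share the same resolution."""
    if (len(decoder_map) != len(skip_result)
            or len(decoder_map[0]) != len(skip_result[0])):
        raise ValueError("skip connection must match the decoder resolution")
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(decoder_map, skip_result)]


coarse = [[1, 2],
          [3, 4]]                      # low-resolution decoder output
skip = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]]                  # encoder result at the target resolution
combined = add_skip(upsample_2x(coarse), skip)
```

- The resolution check mirrors the requirement that skip connection results arrive at the same resolution as the layer receiving them.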
- Deconvolutional layers D 1 -D 10 include prediction images p 2 -p 6 .
- Prediction images p 2 -p 6 are used for training CNN 400 to produce cognitive maps 300 from image 200 input.
- Prediction images p 2 -p 6 are determined based on ground truth images developed independently of CNN 400 .
- Ground truth refers to information regarding the physical environment near vehicle 110 . Accordingly, ground truth data in the present context can include distance and pose information determined using sensors 116 including multi-camera video sensors 116 , LIDAR sensors 116 , and radar sensors 116 , location data from GPS sensors 116 , INS sensors 116 , and odometry sensors 116 .
- Ground truth data in the present context can also include map data stored in a memory of computing device 115 , and/or from a server computer 120 , combined with information regarding object classification determined using CNN-based object classification programs.
- Such CNN-based object classification programs typically receive as input images 200 , and then output images 200 segmented into regions that include objects such as roadways, lane markings, barriers, lanes, shoulders or adjacent terrain, other vehicles including type and model, and other objects including pedestrians, animals, bicycles, etc.
- Prediction images p 2 -p 6 combine distance information with segmentation information to transform estimated results 412 from convolutional layer C 10 and deconvolutional layers D 2 , D 4 , D 6 and D 8 into estimated cognitive maps 414 by orthographically projecting the estimated results 412 onto a 2D ground plane based on distance information to segmented objects and coloring the estimated cognitive map 414 based on information regarding object detection, pixel-wise segmentation, 3D object poses, and relative distances included in prediction images p 2 -p 6 .
- Prediction images p 2 -p 6 are used to train CNN 400 to output a cognitive map 300 in response to inputting an image 200 by outputting estimated cognitive maps 414 , to be combined with the intermediate cognitive maps 408 output by deconvolutional layers D 1 , D 3 , D 5 , D 7 , D 9 .
- This combination is denoted by the “+” signs on the intermediate cognitive maps 408 between deconvolution layers D 1 -D 2 , D 3 -D 4 , D 5 -D 6 , D 7 -D 8 and D 9 -D 10 .
- Prediction images p 2 -p 6 can be based on ground truth including semantic segmentation applied to an input image 200 .
- Multiple monocular images 200 acquired at different locations can be processed using optical flow techniques, for example, to determine distances to objects detected by semantic segmentation.
- Data from a sensor 116 can be combined with semantic segmentation information to determine distances to objects.
- A top-down view can be generated by homography, where depictions of objects detected in an input image 200 are orthographically projected onto a plane parallel with a ground plane or roadway based on their estimated 3D shape and 3D pose. Once projected onto the plane representing an estimated cognitive map 414 , objects can retain their class or type, as indicated by color.
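- Mapping an image point to a top-down plane with a homography can be sketched as below. The 3×3 matrices are illustrative placeholders rather than calibrated values; in practice such a matrix would typically come from camera calibration, which the disclosure does not detail. The function name is hypothetical.

```python
def apply_homography(H, u, v):
    """Map pixel (u, v) to plane coordinates using a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w  # divide by w to leave homogeneous coordinates


# Placeholder matrices (assumptions): a scale-and-shift and a perspective term.
H_affine = [[2, 0, 1],
            [0, 2, -1],
            [0, 0, 1]]
H_perspective = [[1, 0, 0],
                 [0, 1, 0],
                 [0, 0.5, 1]]

ground_a = apply_homography(H_affine, 3, 4)
ground_b = apply_homography(H_perspective, 2, 2)
```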
- Multiple prediction images p 2 -p 6 are used to train CNN 400 with the goal that each prediction image p 2 -p 6 is combined with the intermediate cognitive map 408 at the appropriate resolution.
- Combining estimated cognitive maps 414 with intermediate cognitive maps 408 can include scoring positively (rewarding) output from deconvolutional layers D 1 , D 3 , D 5 , D 7 , D 9 based on the similarity between the intermediate cognitive maps 408 and the estimated cognitive maps 414 .
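- Comparing maps at several resolutions can be sketched as follows. This is only an illustration under stated assumptions: the per-resolution targets are produced by 2×2 max pooling of a full-resolution occupancy grid, and similarity is scored as the fraction of matching cells, whereas the disclosure scores similarity with the cost function of equation (1). All names are hypothetical.

```python
def downsample_2x(grid):
    """Halve resolution by taking the maximum of each 2x2 block."""
    return [[max(grid[r][c], grid[r][c + 1],
                 grid[r + 1][c], grid[r + 1][c + 1])
             for c in range(0, len(grid[0]), 2)]
            for r in range(0, len(grid), 2)]


def target_pyramid(full_res, levels):
    """Return targets ordered from coarsest to finest resolution."""
    pyramid = [full_res]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid[::-1]


def agreement(a, b):
    """Fraction of cells on which two equally sized grids agree."""
    matches = [x == y for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b)]
    return sum(matches) / len(matches)


full = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]
targets = target_pyramid(full, levels=3)        # 1x1, 2x2, 4x4
intermediate = [[[1]], [[1, 0], [0, 1]], full]  # stand-in decoder outputs
scores = [agreement(m, t) for m, t in zip(intermediate, targets)]
```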
- CNN 400 can be trained to output 404 a cognitive map 300 from deconvolution layer D 10 .
- Trained CNN 400 will output 404 a cognitive map 300 based on recognizing visual similarities between an input image 200 and input images 200 processed as part of a training set.
- Similarity between the intermediate cognitive map 408 and the estimated cognitive map 414 can be determined based on a cost function given by the equation:
- $$\text{Cost}(I, M) = W \cdot \text{Cross\_Entropy}(M, M_{Rec}) + \text{neighborhood\_cost}(M, M_{Rec}) \qquad (1)$$
- W is a weight of each object calculated based on the number of available training pixels for each class of objects
- I is the input image 200
- M is the estimated cognitive map 414
- M_Rec is the intermediate cognitive map 408 .
- The Cross_Entropy loss function is calculated as:
- The neighborhood similarity cost term can be determined by considering the agreement between a pixel and its neighboring pixels in the cognitive map predictions p 2 -p 6 and 300 .
- Calculation of a neighborhood cost function can be simplified by applying a Gaussian filter to the cross-entropy of a 3×3 block of pixels for the estimated cognitive map and ground truth. Applying a neighborhood cost function in this manner can improve the convergence speed of training and result in better predictions.
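- A cost of the form given in equation (1) can be sketched as below. Several details are assumptions made for illustration and are not taken from the disclosure: M is treated as a grid of class indices and M_Rec as per-pixel class probabilities, the class weight W is computed as inverse class frequency, the cross-entropy is the standard negative log-probability of the target class, the Gaussian kernel is the common 1-2-1 approximation, and borders are handled by clamping. All function names are hypothetical.

```python
import math

GAUSS_3X3 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # coefficients sum to 16


def class_weights(m, num_classes):
    """Assumed weighting: rarer classes receive proportionally larger weights."""
    counts = [0] * num_classes
    for row in m:
        for k in row:
            counts[k] += 1
    total = sum(counts)
    return [total / (num_classes * n) if n else 0.0 for n in counts]


def pixel_cross_entropy(m, m_rec, eps=1e-12):
    """Per-pixel negative log-probability assigned to the target class."""
    return [[-math.log(max(m_rec[r][c][k], eps)) for c, k in enumerate(row)]
            for r, row in enumerate(m)]


def gaussian_3x3(grid):
    """Smooth a 2D grid with the 3x3 kernel, clamping indices at the borders."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += GAUSS_3X3[dr + 1][dc + 1] * grid[rr][cc]
            out[r][c] = acc / 16.0
    return out


def cost(m, m_rec, weights):
    """Weighted cross-entropy plus a smoothed (neighbourhood) cross-entropy."""
    ce = pixel_cross_entropy(m, m_rec)
    n = len(ce) * len(ce[0])
    weighted = sum(weights[k] * ce[r][c]
                   for r, row in enumerate(m)
                   for c, k in enumerate(row)) / n
    neighbourhood = sum(map(sum, gaussian_3x3(ce))) / n
    return weighted + neighbourhood


m = [[0, 1], [1, 0]]
uniform = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
perfect = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
cost_uniform = cost(m, uniform, [1.0, 1.0])
cost_perfect = cost(m, perfect, [1.0, 1.0])
```

- With a uniform two-class prediction, both terms equal ln 2, so the total is 2 ln 2; a perfect prediction yields a cost of zero.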
- A CNN 400 can process input images 200 to produce cognitive maps 300 without inputting prediction images p 2 -p 6 .
- Convolutional layers C 1 -C 10 can convolve and down-sample intermediate results 406 that get passed to deconvolutional layers D 1 -D 10 to deconvolve and upsample intermediate cognitive maps 408 with input from convolutional layers C 1 , C 2 , C 4 , C 6 , C 7 via skip connection results 410 .
- Cognitive maps 300 produced by CNN 400 can be used by computing device 115 to operate vehicle 110 by permitting computing device to predict vehicle trajectories based on the cognitive map 300 .
- Multiple CNNs 400 can be trained to determine cognitive maps 300 based on ground truth including multiple monocular image inputs, LIDAR and radar, and the results combined by adding a fusion layer to the CNNs 400 .
- Temporal information can be included in the CNN 400 by adding recurrent convolutional layers to process temporal information.
- Cognitive maps 300 output from CNN 400 can be combined with other information available to computing device 115 from sensors 116, including GPS, INS and odometry location information, LIDAR, radar, and multi-camera information regarding distances, and map information stored at computing device 115 or downloaded from a server computer 120, for example, to improve the accuracy of cognitive map 300 and of distances to objects therein.
- A recorded image 200 along with recorded ground truth information can be used to update CNN 400 by providing additional training.
- The re-trained CNN 400 can be stored in computing device 115 memory for future use.
- A trained CNN 400 can be recalled from memory and executed by computing device 115 to produce cognitive maps 300 from image 200 input in real time as required for operation of a vehicle 110 on a roadway with traffic, for example.
- FIG. 5 is a diagram of a flowchart, described in relation to FIGS. 1-4 , of a process 500 for operating a vehicle based on a cognitive map.
- Process 500 can be implemented by a processor of computing device 115 , taking as input information from sensors 116 , and executing commands and sending control signals via controllers 112 , 113 , 114 , for example.
- Process 500 includes multiple steps taken in the disclosed order.
- Process 500 also includes implementations with fewer steps or with the steps taken in different orders.
- Process 500 begins at step 502 , where a computing device 115 included in a vehicle 110 acquires an image 200 as described above in relation to FIG. 2 .
- The image 200 can be an RGB color video image acquired by a video sensor 116 included in vehicle 110.
- The image 200 can depict the physical environment near vehicle 110, including a roadway 202 and objects including other vehicles 204, 206, 208, 210.
- Computing device 115 inputs image 200 to a trained CNN 400 as discussed above in relation to FIG. 4.
- Trained CNN 400 produces a cognitive map 300 including a roadway 302 and objects including other vehicles 304, 306, 308, 310.
- Training CNN 400 will be discussed in relation to FIG. 6 .
- Computing device 115 operates a vehicle 110 based on cognitive map 300.
- Computing device 115 can operate vehicle 110 based on cognitive map 300 by determining predicted vehicle trajectories based on lanes and objects including other vehicles.
- Computing device 115 can combine cognitive maps 300 with map data from multi-camera sensors 116 , LIDAR sensors 116 , and radar sensors 116 , location data from GPS, INS and odometry and map data from a server computer 120 , for example, to improve the accuracy of cognitive map 300 .
- The computing device 115 can provide instructions to one or more of the powertrain controller 112, brake controller 113, and steering controller 114.
- The computing device may be programmed to take certain actions concerning adjusting or maintaining speed, acceleration, and/or steering based on objects such as other vehicles 304 - 310; the cognitive map 300 advantageously can provide more accurate data for such actions than was previously available. Vehicle 110 safety and/or efficiency can thereby be improved by the cognitive map 300. Following this step, process 500 ends.
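- The overall flow of process 500 (acquire an image, produce a cognitive map, act on it) can be sketched as follows. Everything here is a hypothetical stand-in: the camera and CNN are stubs returning fixed data, the cognitive map is reduced to a single occupancy grid with the vehicle at the bottom-centre, and the `plan_speed` rule is an invented example of an action based on the map, not the method of the disclosure.

```python
def process_500(acquire_image, trained_cnn, plan):
    """Acquire an image, convert it to a cognitive map, then choose an action."""
    image = acquire_image()              # image 200 from a video sensor
    cognitive_map = trained_cnn(image)   # cognitive map 300 from the CNN
    return plan(cognitive_map)           # decision used to operate the vehicle


def plan_speed(cognitive_map, lookahead_rows=3):
    """Illustrative rule: brake if the ego column is occupied close ahead."""
    ego_col = len(cognitive_map[0]) // 2
    nearest = cognitive_map[-lookahead_rows:]  # rows closest to the vehicle
    return "brake" if any(row[ego_col] for row in nearest) else "maintain"


def stub_camera():
    return [[0]]  # placeholder image data


def stub_cnn_blocked(image):
    # Placeholder output: an object two cells ahead in the ego lane.
    return [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]


def stub_cnn_clear(image):
    # Placeholder output: an object far ahead, outside the lookahead window.
    return [[0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]


command_blocked = process_500(stub_camera, stub_cnn_blocked, plan_speed)
command_clear = process_500(stub_camera, stub_cnn_clear, plan_speed)
```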
- FIG. 6 is a diagram of a flowchart, described in relation to FIGS. 1-4 , of a process 600 for training a CNN 400 based on ground-truth.
- Process 600 can be implemented by a processor of computing device 115 , taking as input information from sensors 116 , and executing commands and sending control signals via controllers 112 , 113 , 114 , for example.
- Process 600 includes multiple steps taken in the disclosed order.
- Process 600 also includes implementations with fewer steps or with the steps taken in different orders.
- Process 600 begins at step 602 , where a computing device 115 included in a vehicle 110 acquires and records one or more images 200 as described above in relation to FIG. 2 .
- The images 200 can be RGB color video images acquired by a video sensor 116 included in vehicle 110.
- The images 200 can depict the physical environment near vehicle 110, including a roadway 202 and objects including other vehicles 204, 206, 208, 210.
- Computing device 115 records ground truth data based on object detection, pixel-wise segmentation, 3D object poses, and relative distances, all determined based on the recorded images 200, distance data, location data, and map data as discussed above in relation to FIG. 4, corresponding to the images 200 recorded at step 602.
- Computing device 115 inputs images 200 to CNN 400 while constructing prediction images p 2 -p 6 to train CNN 400 according to cost functions in equations 1 and 2, above.
- Prediction images p 2 -p 6 are constructed to include the recorded ground truth data based on object detection, pixel-wise segmentation, 3D object poses, and relative distances.
- Prediction images p 2 -p 6 can be created by homographic projection of ground truth data and used to transform estimated results 412 into top-down view, estimated cognitive maps 414 that can be used to train CNN 400 to output a cognitive map 300 in response to inputting an image 200 as discussed above in relation to FIG. 4 .
- CNN 400 can be trained to output a cognitive map 300 in response to an input image 200 .
- The trained CNN 400 is output to be stored at memory included in computing device 115.
- Computing device 115 can recall the trained CNN 400 from memory, input an acquired image 200 to the trained CNN 400 and receive as output a cognitive map 300, to be used to operate a vehicle 110, without having to input ground truth data. Following this step, process 600 ends.
- Computing devices such as those discussed herein generally each include commands executable by one or more computing devices such as those identified above, and for carrying out blocks or steps of processes described above.
- Process blocks discussed above may be embodied as computer-executable commands.
- Computer-executable commands may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Visual Basic, JavaScript, Perl, HTML, etc.
- A processor (e.g., a microprocessor) receives commands, e.g., from a memory, a computer-readable medium, etc., and executes these commands, thereby performing one or more processes, including one or more of the processes described herein.
- Commands and other data may be stored in files and transmitted using a variety of computer-readable media.
- A file in a computing device is generally a collection of data stored on a computer readable medium, such as a storage medium, a random access memory, etc.
- A computer-readable medium includes any medium that participates in providing data (e.g., commands), which may be read by a computer. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, etc.
- Non-volatile media include, for example, optical or magnetic disks and other persistent memory.
- Volatile media include dynamic random access memory (DRAM), which typically constitutes a main memory.
- Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
- The term “exemplary” is used herein in the sense of signifying an example, e.g., a reference to an “exemplary widget” should be read as simply referring to an example of a widget.
- The adverb “approximately” modifying a value or result means that a shape, structure, measurement, value, determination, calculation, etc. may deviate from an exact described geometry, distance, measurement, value, determination, calculation, etc., because of imperfections in materials, machining, manufacturing, sensor measurements, computations, processing time, communications time, etc.
Description
- Vehicles can be equipped to operate in both autonomous and occupant piloted mode. Vehicles can be equipped with computing devices, networks, sensors and controllers to acquire information regarding the vehicle's environment and to operate the vehicle based on the information. Safe and comfortable operation of the vehicle can depend upon determining predicted vehicle trajectories based on accurate and timely information regarding the vehicle's environment. For example, safe and comfortable operation of the vehicle can depend upon acquiring accurate and timely information regarding objects in a vehicle's environment while the vehicle is being operated on a roadway. It is a problem to provide accurate and timely information regarding objects near or around a vehicle to support operation of the vehicle.
- FIG. 1 is a block diagram of an example vehicle.
- FIG. 2 is a diagram of an example image of a traffic scene.
- FIG. 3 is a diagram of an example cognitive map.
- FIG. 4 is a diagram of an example convolutional neural network.
- FIG. 5 is a flowchart diagram of an example process to operate a vehicle based on a cognitive map.
- FIG. 6 is a flowchart diagram of an example process to train a convolutional neural network to output a cognitive map.
- Vehicles can be equipped to operate in both autonomous and occupant piloted mode. By a semi- or fully-autonomous mode, we mean a mode of operation wherein a vehicle can be piloted by a computing device as part of a vehicle information system having sensors and controllers. The vehicle can be occupied or unoccupied, but in either case the vehicle can be piloted without assistance of an occupant. For purposes of this disclosure, an autonomous mode is defined as one in which each of vehicle propulsion (e.g., via a powertrain including an internal combustion engine and/or electric motor), braking, and steering are controlled by one or more vehicle computers; in a semi-autonomous mode the vehicle computer(s) control(s) one or two of vehicle propulsion, braking, and steering. In a non-autonomous vehicle, none of these are controlled by a computer.
- An estimate of a location, e.g., according to geo-coordinates, of a vehicle with respect to a map can be used by a computing device to operate a vehicle on a roadway from a current location to a determined destination, for example. The map can be a cognitive map. A cognitive map in the context of this disclosure is a top-down view, 2D representation of the physical environment around a vehicle. In examples where a vehicle is in motion, for example, operating on a roadway, the cognitive map can include a top-down, 2D representation of the roadway ahead of a current vehicle location and in a direction of current vehicle travel. The direction of current vehicle travel is based on the current vehicle trajectory, which includes speed, direction, longitudinal acceleration, and lateral acceleration. The cognitive map can include a roadway and objects such as lanes, barriers, shoulders, and lane markers, vehicles and pedestrians, for example.
- In the field of psychology, a cognitive map is a mental representation of the physical environment. For example, humans and animals use cognitive maps to find their way around their environment. In the present disclosure, a cognitive map is used by a computing device to operate a vehicle, including actuating vehicle components including powertrain, steering and braking to direct the vehicle from a current location to a destination location in a safe and comfortable fashion. The cognitive map can be used by the computing device to determine predicted vehicle trajectories based on determined locations of lanes and determined locations and trajectories of other vehicles in the cognitive map, for example. A cognitive map can depict semantic segmentation of objects viewed from top-down view and accurately illustrate a distance to each point from vehicle 110.
- Disclosed herein is a method, including acquiring an image of a vehicle environment, determining a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operating the vehicle based on the cognitive map. The vehicle environment can include a roadway and objects including other vehicles and pedestrians. The cognitive map can include locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle. The image can be a monocular video frame. The cognitive map of the vehicle environment can be based on processing the image with a convolutional neural network. The convolutional neural network can be trained based on ground truth data prior to determining the cognitive map. The ground truth data can be based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
- Training the convolutional neural network can be based on prediction images included in the convolutional neural network. The prediction images can be based on ground truth data. The neural network learns how to transform input RGB images to estimations of cognitive maps. The estimated cognitive maps can be combined with intermediate estimations of cognitive maps and compared against the prediction images to determine similarity. The similarity between the estimated combined cognitive maps can be determined by calculating a cost function. The cost function can be based on a weighted cross entropy function based on comparing the estimated cognitive maps and the intermediate cognitive maps with the prediction images. The prediction images can be based on LIDAR data.
- Further disclosed is a computer readable medium, storing program instructions for executing some or all of the above method steps. Further disclosed is a computer programmed for executing some or all of the above method steps, including a computer apparatus, programmed to acquire an image of a vehicle environment, determine a cognitive map, which includes a top-down view of the vehicle environment, based on the image, and operate the vehicle based on the cognitive map. The vehicle environment can include a roadway and objects including other vehicles and pedestrians. The cognitive map can include locations of the objects including at least one of other vehicles and pedestrians, relative to the vehicle. The image can be a monocular video frame. The cognitive map of the vehicle environment can be based on processing the image with a convolutional neural network. The convolutional neural network can be trained based on ground truth data prior to determining the cognitive map. The ground truth data can be based on object detection, pixel-wise segmentation, 3D object pose, and relative distance.
- The computer can be further programmed to train the convolutional neural network based on prediction images included in the convolutional neural network. The prediction images can be based on ground truth data. The prediction images can transform estimated results into estimated cognitive maps. The estimated cognitive maps can be combined with intermediate cognitive maps to determine similarity. The similarity between the estimated cognitive maps and the prediction images can be determined by calculating a cost function. The cost function can be based on a weighted cross entropy function based on comparing the estimated cognitive maps combined with the intermediate cognitive maps and prediction images. The prediction images can be based on LIDAR data.
-
FIG. 1 is a diagram of avehicle information system 100 that includes avehicle 110 operable in autonomous (“autonomous” by itself in this disclosure means “fully autonomous”) and occupant piloted (also referred to as non-autonomous) mode.Vehicle 110 also includes one ormore computing devices 115 for performing computations for piloting thevehicle 110 during autonomous operation.Computing devices 115 can receive information regarding the operation of the vehicle fromsensors 116. - The
computing device 115 includes a processor and a memory such as are known. Further, the memory includes one or more forms of computer-readable media, and stores instructions executable by the processor for performing various operations, including as disclosed herein. For example, thecomputing device 115 may include programming to operate one or more of vehicle brakes, propulsion (e.g., control of acceleration in thevehicle 110 by controlling one or more of an internal combustion engine, electric motor, hybrid engine, etc.), steering, climate control, interior and/or exterior lights, etc., as well as to determine whether and when thecomputing device 115, as opposed to a human operator, is to control such operations. - The
computing device 115 may include or be communicatively coupled to, e.g., via a vehicle communications bus as described further below, more than one computing devices, e.g., controllers or the like included in thevehicle 110 for monitoring and/or controlling various vehicle components, e.g., apowertrain controller 112, abrake controller 113, asteering controller 114, etc. Thecomputing device 115 is generally arranged for communications on a vehicle communication network, e.g., including a bus in thevehicle 110 such as a controller area network (CAN) or the like; thevehicle 110 network can additionally or alternatively include wired or wireless communication mechanism such as are known, e.g., Ethernet or other communication protocols. - Via the vehicle network, the
computing device 115 may transmit messages to various devices in the vehicle and/or receive messages from the various devices, e.g., controllers, actuators, sensors, etc., includingsensors 116. Alternatively, or additionally, in cases where thecomputing device 115 actually comprises multiple devices, the vehicle communication network may be used for communications between devices represented as thecomputing device 115 in this disclosure. Further, as mentioned below, various controllers or sensing elements such assensors 116 may provide data to thecomputing device 115 via the vehicle communication network. - In addition, the
computing device 115 may be configured for communicating through a vehicle-to-infrastructure (V-to-I) interface 111 with aremote server computer 120, e.g., a cloud server, via anetwork 130, which, as described below. A vehicle-to-infrastructure (V-to-I) interface 111 includes hardware, firmware, and software that permitscomputing device 115 to communicate with aremote server computer 120 via anetwork 130 such as wireless Internet (Wi-Fi) or cellular networks. V-to-I interface 111 may accordingly include processors, memory, transceivers, etc., configured to utilize various wired and/or wireless networking technologies, e.g., cellular, BLUETOOTH® and wired and/or wireless packet networks.Computing device 115 may be configured for communicating with other vehicles through V-to-I interface 111 using vehicle-to-vehicle (V-to-V) networks, e.g., according to Dedicated Short Range Communications (DSRC) and/or the like, e.g., formed on an ad hoc basis amongnearby vehicles 110 or formed through infrastructure-based networks including the Internet via cellular networks or Wi-Fi, for example. Thecomputing device 115 also includes nonvolatile memory such as is known.Computing device 115 can log, i.e., store in a memory, information by storing the information in nonvolatile memory for later retrieval and transmittal via the vehicle communication network and a vehicle to infrastructure (V-to-I) interface 111 to aserver computer 120 or user mobile device 160. - As already mentioned, generally included in instructions stored in the memory and executable by the processor of the
computing device 115 is programming for operating one ormore vehicle 110 components, e.g., braking, steering, propulsion, etc., without intervention of a human operator. Using data received in thecomputing device 115, e.g., the sensor data from thesensors 116, theserver computer 120, etc., thecomputing device 115 may make various determinations and/or controlvarious vehicle 110 components and/or operations without a driver to operate thevehicle 110. For example, thecomputing device 115 may include programming to regulatevehicle 110 operational behaviors (i.e., physical manifestations ofvehicle 110 operation) such as speed, acceleration, deceleration, steering, etc., as well as tactical behaviors (i.e., control of operational behaviors typically in a manner intended to achieve safe and efficient traversal of a route) such as a distance between vehicles and/or amount of time between vehicles, lane-change, minimum gap between vehicles, left-turn-across-path minimum, time-to-arrival at a particular location and intersection (without signal) minimum time-to-arrival to cross the intersection. - Controllers, as that term is used herein, include computing devices that typically are programmed to control a specific vehicle subsystem. Examples include a
powertrain controller 112, abrake controller 113, and asteering controller 114. A controller may be an electronic control unit (ECU) such as is known, possibly including additional programming as described herein. The controllers may communicatively be connected to and receive instructions from thecomputing device 115 to actuate the subsystem according to the instructions. For example, thebrake controller 113 may receive instructions from thecomputing device 115 to operate the brakes of thevehicle 110. - The one or
more controllers vehicle 110 may include conventional electronic control units (ECUs) or the like including, as non-limiting examples, one ormore powertrain controllers 112, one ormore brake controllers 113 and one ormore steering controllers 114. Each of thecontrollers controllers vehicle 110 communications bus, such as a controller area network (CAN) bus or local interconnect network (LIN) bus, to receive instructions from thecomputer 115 and control actuators based on the instructions. -
Sensors 116 may include a variety of devices known to provide data via the vehicle communications bus. For example, a radar fixed to a front bumper (not shown) of thevehicle 110 may provide a distance from thevehicle 110 to a next vehicle in front of thevehicle 110, or a global positioning system (GPS) sensor disposed in thevehicle 110 may provide geographical coordinates of thevehicle 110. The distance(s) provided by the radar and/orother sensors 116 and/or the geographical coordinates provided by the GPS sensor may be used by thecomputing device 115 to operate thevehicle 110 autonomously or semi-autonomously. - The
vehicle 110 is generally a land-based autonomous vehicle 110 having three or more wheels, e.g., a passenger car, light truck, etc. The vehicle 110 includes one or more sensors 116, the V-to-I interface 111, the computing device 115 and one or more controllers 112, 113, 114. - The
sensors 116 may be programmed to collect data related to the vehicle 110 and the environment in which the vehicle 110 is operating. By way of example, and not limitation, sensors 116 may include, e.g., altimeters, cameras, LIDAR, radar, ultrasonic sensors, infrared sensors, pressure sensors, accelerometers, gyroscopes, temperature sensors, hall sensors, optical sensors, voltage sensors, current sensors, mechanical sensors such as switches, etc. The sensors 116 may be used to sense the environment in which the vehicle 110 is operating, e.g., sensors 116 can detect phenomena such as weather conditions (precipitation, external ambient temperature, etc.), the grade of a road, the location of a road (e.g., using road edges, lane markings, etc.), or locations of target objects such as neighboring vehicles 110. The sensors 116 may further be used to collect data including dynamic vehicle 110 data related to operations of the vehicle 110 such as velocity, yaw rate, steering angle, engine speed, brake pressure, oil pressure, the power level applied to controllers 112, 113, 114 in the vehicle 110, connectivity between components, and accurate and timely performance of components of the vehicle 110. -
FIG. 2 illustrates an image 200 of a traffic scene including a roadway 202 and other vehicles. The image 200 can be a monocular video frame acquired by computing device 115 from a video sensor 116 included in a vehicle 110, for example. A monocular video frame can include three color planes with a bit depth of eight bits each, for a total of 24 bits, corresponding to red, green, and blue (RGB) color components. Image 200 can include a roadway 202, a lane marker 212, barriers, and terrain adjacent to the roadway. Computing device 115 can use image 200 to produce a cognitive map including roadway 202 and objects including other vehicles, lane marker 212, barriers, and terrain adjacent to the roadway and, based on the cognitive map including roadway 202 and objects, determine predicted trajectories for operating vehicle 110. -
FIG. 3 is a cognitive map 300 of a traffic scene including a roadway 302 (white) and objects including other vehicles 304-310, barriers, and adjacent terrain 320, 322 (cross-hatch). These are each rendered to denote different colors, where each different color represents an object class or type and occupies a separate channel or plane in cognitive map 300. For example, a cognitive map can include 20 or more channels each including objects belonging to a single class, such as “roadway”, “vehicle”, “pedestrian”, “cyclist”, etc. Vehicle 110 trajectory with respect to cognitive map 300 is denoted by arrow 324. Cognitive map 300 can be created by inputting an image 200 into a convolutional neural network (CNN), configured and trained as described in relation to FIG. 4, below, which, in response to the input, outputs a cognitive map 300. -
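The one-channel-per-class layout described above can be illustrated with a minimal sketch. It assumes a toy top-down grid of class names; the grid contents, the `CLASSES` list and the `to_channels` helper are hypothetical and are not taken from FIG. 3.

```python
# Hypothetical class list: the text names these four and says a map can hold 20 or more.
CLASSES = ["roadway", "vehicle", "pedestrian", "cyclist"]

def to_channels(label_grid, classes=CLASSES):
    """Split a 2D grid of class names into one binary plane per class."""
    return {
        name: [[1 if cell == name else 0 for cell in row] for row in label_grid]
        for name in classes
    }

# Toy 2 x 3 top-down grid (assumed data for illustration only).
label_grid = [
    ["roadway", "vehicle", "roadway"],
    ["roadway", "roadway", "cyclist"],
]
channels = to_channels(label_grid)
```

Each plane in `channels` marks only the cells of one class, so a class that is absent from the scene (here "pedestrian") simply yields an all-zero plane.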
Computing device 115 can operate vehicle 110 based on cognitive map 300. Operating vehicle 110 can include actuating vehicle components such as powertrain, steering and braking via controllers 112, 113, 114 based on cognitive map 300. For example, computing device 115 can operate vehicle 110 to follow predicted trajectories that locate vehicle 110 in the center of a lane, the lane determined based on lane marker 312 and barrier 314, while maintaining a predetermined distance between vehicle 110 and other vehicle 310. Computing device 115 can predict vehicle trajectories that can be used to actuate powertrain, steering and braking components based on distances to and locations of objects in the cognitive map 300 relative to the location of vehicle 110, for example. - Predicted trajectories of objects including
other vehicles 304-310 can be determined based on cognitive maps 300 created at successive time intervals, from images 200 acquired at successive time intervals. Trajectories of other vehicles 304-310 can be determined by locating the other vehicles 304-310 in cognitive maps 300 created at successive time intervals, fitting a curve to the location points and calculating vectors equal to the first and second derivatives of each curve in the 2D plane of the cognitive map 300. The magnitude of the first derivative is speed and the angle is direction. The second derivatives are directional derivatives parallel to the first derivative direction (longitudinal acceleration) and perpendicular to the first derivative direction (lateral acceleration). -
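As an illustration of the derivative calculation described above, the following sketch fits a parabola through three successive locations of one object and evaluates its derivatives at the middle sample. The choice of a three-point parabola, equal time spacing, and the function name `motion_from_track` are assumptions of this sketch; the text does not specify the curve type.

```python
import math

def motion_from_track(p0, p1, p2, dt):
    """Estimate motion at the middle of three equally spaced 2D locations.

    The parabola through the three points has these first and second
    derivatives at the middle sample (they equal the central differences).
    """
    vx = (p2[0] - p0[0]) / (2.0 * dt)
    vy = (p2[1] - p0[1]) / (2.0 * dt)
    ax = (p0[0] - 2.0 * p1[0] + p2[0]) / (dt * dt)
    ay = (p0[1] - 2.0 * p1[1] + p2[1]) / (dt * dt)

    speed = math.hypot(vx, vy)    # magnitude of the first derivative
    heading = math.atan2(vy, vx)  # angle of the first derivative, in radians
    if speed == 0.0:
        # Direction is undefined for a stationary object; this sketch
        # reports zero acceleration components in that case.
        return speed, heading, 0.0, 0.0

    ux, uy = vx / speed, vy / speed   # unit vector along the direction of travel
    a_long = ax * ux + ay * uy        # component parallel to travel
    a_lat = -ax * uy + ay * ux        # component along the left-hand normal
    return speed, heading, a_long, a_lat

# An object speeding up along the x axis of the map, sampled once per second.
speed, heading, a_long, a_lat = motion_from_track((0.0, 0.0), (1.0, 0.0), (3.0, 0.0), 1.0)
```

With more than three samples, a least-squares fit over a longer window would smooth measurement noise, at the cost of a slower response to maneuvers.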
FIG. 4 is a diagram of an example CNN 400 configured to input an image 200 and output a cognitive map 300. The image 200 can be a monocular RGB video image acquired from a video sensor 116 included in a vehicle 110 that includes a scene depicting the physical environment near vehicle 110. The cognitive map 300 is a 2D representation of the physical environment near vehicle 110 including 20 or more channels each including a single class of objects present in the scene, identified by type, distance and 3D pose relative to vehicle 110, where 3D pose is defined as the orientation of an object in 3D space relative to a frame of reference expressed as angles ρ, φ, and θ. Information regarding object type, distance and 3D pose included in cognitive map 300 as a top-down view can permit computing device 115 to determine trajectories to operate vehicle 110 safely by traveling on the roadway and avoiding collisions. -
CNN 400 is a program in memory executing on a processor included in computing device 115 and includes a set of ten convolutional layers C1-C10 (3D boxes) configured to input 402 an image 200 to convolutional layer C1. Convolutional layer C1 produces an intermediate result 406, represented by the arrow between convolutional layer C1 and convolutional layer C2. Each convolutional layer C2-C10 receives an intermediate result 406 and outputs an intermediate result 406 represented by the arrows between adjacent convolutional layers C1-C10, representing forward propagation of intermediate results 406. Convolutional layers C1-C10 each output an intermediate result 406 at an output spatial resolution equal to the input spatial resolution or at an output spatial resolution reduced from the input spatial resolution. Bit depth per resolution element increases for intermediate results as spatial resolution decreases, as described in Table 1, below. This repeats for convolutional layers C2-C9, which produce intermediate results 406, represented by the dark arrows between convolutional layers C2-C9, at successively lower resolutions. Convolutional layers C1-C9 can reduce resolution by pooling, wherein an adjacent group of pixels, which can be a 2×2 neighborhood, for example, is combined to form a single pixel according to a predetermined equation. Combining a group of pixels by selecting a maximum value among them, called “max pooling”, can reduce resolution while retaining information in intermediate results 406. Following convolutional layers C1-C10, convolutional layer C10 outputs intermediate result 406 to first deconvolutional layer D1, which can deconvolve and upsample intermediate result 406 to produce intermediate cognitive map 408, represented by the arrows between each of deconvolutional layers D1-D10.
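The 2×2 max pooling step mentioned above can be sketched on a single plane of values as follows. The helper name `max_pool_2x2` and the sample data are illustrative assumptions; a trailing odd row or column is simply dropped in this sketch.

```python
def max_pool_2x2(plane):
    """Halve each dimension by keeping the maximum of every 2 x 2 block."""
    rows, cols = len(plane), len(plane[0])
    return [
        [
            max(plane[r][c], plane[r][c + 1], plane[r + 1][c], plane[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]

plane = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(plane)  # [[4, 2], [2, 8]]
```

The 4×4 input becomes a 2×2 output, and the strongest response in each neighborhood survives, which is the sense in which pooling reduces resolution while retaining information.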
Deconvolution is convolution performed with a kernel that is, at least in part, an inverse of another kernel previously used to convolve a function and can partially invert the effects of the previous convolution. For example, deconvolutional layers D1-D10 can increase spatial resolution of intermediate cognitive map 408 while decreasing the bit depth according to Table 1, below. - Convolutional layer C10 also outputs estimated feature maps 412 to prediction image p6, which, when training
CNN 400, combines estimated feature maps 412 from convolutional layer C10 with ground truth-based information regarding objects that transforms the estimated feature maps 412 into an estimated cognitive map 414. The estimated cognitive map 414 is combined with the intermediate feature maps 408 output from deconvolution layer D1 when training CNN 400. This is shown by the “+” signs on the intermediate cognitive map 408 arrow between deconvolution layers D1-D2. Comparing the intermediate cognitive map 408 based on input image I with ground truth-based information including object detection, pixel-wise segmentation, 3D object poses, and relative distances is used for training the convolutional neural network. - The “+” sign on the intermediate
cognitive map 408 between deconvolution layers D1-D2 also indicates combining intermediate feature map 408 and predicted cognitive map 414 with skip connection results 410 from convolutional layer C7 received via skip connections. Skip connection results 410 are intermediate results 406 forward propagated via skip connections as input to an upsampling deconvolution layer D2, D4, D6, D8, D10. Skip connection results 410 can be combined with intermediate feature maps 408 to increase resolution of intermediate feature map 408 by upsampling to pass onto succeeding deconvolutional layers D3, D5, D7, D9. This is shown by the “+” signs on the intermediate results 408 between deconvolution layers D1-D2, D3-D4, D5-D6, D7-D8 and D9-D10. Skip connections can forward propagate skip connection results 410 at the same resolution as the deconvolutional layers D2, D4, D6, D8, D10 receiving the information. - Deconvolutional layers D1-D10 include prediction images p2-p6. Prediction images p2-p6 are used for training
CNN 400 to produce cognitive maps 300 from image 200 input. Prediction images p2-p6 are determined based on ground truth images developed independently of CNN 400. Ground truth refers to information regarding the physical environment near vehicle 110. Accordingly, ground truth data in the present context can include distance and pose information determined using sensors 116 including multi-camera video sensors 116, LIDAR sensors 116, and radar sensors 116, and location data from GPS sensors 116, INS sensors 116, and odometry sensors 116. Ground truth data in the present context can also include map data stored in a memory of computing device 115, and/or from a server computer 120, combined with information regarding object classification determined using CNN-based object classification programs. Such CNN-based object classification programs typically receive as input images 200, and then output images 200 segmented into regions that include objects such as roadways, lane markings, barriers, lanes, shoulders or adjacent terrain, other vehicles including type and model, and other objects including pedestrians, animals, bicycles, etc. Prediction images p2-p6 combine distance information with segmentation information to transform estimated results 412 from convolutional layer C10 and deconvolutional layers D2, D4, D6 and D8 into estimated cognitive maps 414 by orthographically projecting the estimated results 412 onto a 2D ground plane based on distance information to segmented objects and coloring the estimated cognitive map 414 based on information regarding object detection, pixel-wise segmentation, 3D object poses, and relative distances included in prediction images p2-p6. - Prediction images p2-p6 are used to train
CNN 400 to output a cognitive map 300 in response to inputting an image 200 by outputting estimated cognitive maps 414, to be combined with the intermediate cognitive maps 408 output by deconvolutional layers D1, D3, D5, D7, D9. This combination is denoted by the “+” signs on the intermediate cognitive maps 408 between deconvolution layers D1-D2, D3-D4, D5-D6, D7-D8 and D9-D10. Prediction images p2-p6 can be based on ground truth including semantic segmentation applied to an input image 200. Multiple monocular images 200 acquired at different locations can be processed using optical flow techniques, for example, to determine distances to objects detected by semantic segmentation. Data from a sensor 116 can be combined with semantic segmentation information to determine distances to objects. Once distances to objects are determined and a 3D shape is estimated, a top-down view can be generated by homography, where depictions of objects detected in an input image 200 are orthographically projected onto a plane parallel with a ground plane or roadway based on their estimated 3D shape and 3D pose. Once projected onto the plane representing an estimated cognitive map 414, objects can retain their class or type, as indicated by color. - Multiple prediction images p2-p6 are used to train
CNN 400 with the goal that each prediction image p2-p6 is combined with the intermediate cognitive map 408 at the appropriate resolution. Combining estimated cognitive maps 414 with intermediate cognitive maps 408 can include scoring positively (rewarding) output from deconvolutional layers D1, D3, D5, D7, D9 based on the similarity between the intermediate cognitive maps 408 and the estimated cognitive maps 414. By positively rewarding deconvolutional layers D1, D3, D5, D7, D9 in this fashion, CNN 400 can be trained to output 404 a cognitive map 300 from deconvolution layer D10. Once deconvolutional layers D1, D3, D5, D7, D9 have been trained to output intermediate cognitive maps 408, input from prediction images p2-p6 is no longer required to output 404 a cognitive map 300 based on input image 200. Trained CNN 400 will output 404 a cognitive map 300 based on recognizing visual similarities between an input image 200 and input images 200 processed as part of a training set. - Similarity between the intermediate
cognitive map 408 and the estimated cognitive map 414 can be determined based on a cost function that measures the similarity of the intermediate cognitive map 408 to the estimated cognitive map 414 by the equation: -
$$\mathrm{Cost}(I, M) = W \cdot \mathrm{CrossEntropy}(M, M_{Rec}) + \mathrm{neighborhood\_cost}(M, M_{Rec}) \qquad (1)$$
- where W is a weight of each object calculated based on the number of available training pixels for each class of objects, I is the
input image 200, M is the estimated cognitive map 414, and M_Rec is the intermediate cognitive map 408. The CrossEntropy loss function H is calculated as: -
$$H(M, M_{Rec}) = -\sum_i \left( M_{Rec,i} \log(M_i) + (1 - M_{Rec,i}) \log(1 - M_i) \right) \qquad (2)$$
- where i is the ith pixel in the image. The neighborhood similarity cost term can be determined by considering the agreement between a pixel and its neighboring pixels in the cognitive map predictions p2-p6 and cognitive map 300. Calculation of a neighborhood cost function can be simplified by applying a Gaussian filter to the cross-entropy of a 3×3 block of pixels for the estimated cognitive map and ground truth. Applying a neighborhood cost function in this manner can improve the convergence speed of training and result in better predictions.
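A minimal sketch of how the two cost terms could be computed for one channel is given below. It follows equation (2) with the argument roles as printed, treats W as a single scalar for the channel, and makes several assumptions that the text does not specify: the 3×3 kernel weights, the zero contribution from neighbors outside the map, the clamping constant `EPS`, and summing the filtered values into one number.

```python
import math

EPS = 1e-7  # assumed clamp that keeps log() finite

# Assumed 3 x 3 Gaussian-like kernel, normalized so the weights sum to 1.
KERNEL = [
    [1 / 16, 2 / 16, 1 / 16],
    [2 / 16, 4 / 16, 2 / 16],
    [1 / 16, 2 / 16, 1 / 16],
]

def pixel_cross_entropy(m, m_rec):
    """One term of equation (2), with m clamped away from 0 and 1."""
    m = min(max(m, EPS), 1.0 - EPS)
    return -(m_rec * math.log(m) + (1.0 - m_rec) * math.log(1.0 - m))

def cross_entropy_map(M, M_rec):
    """Per-pixel cross-entropy for two equally sized 2D maps."""
    return [
        [pixel_cross_entropy(a, b) for a, b in zip(row_m, row_rec)]
        for row_m, row_rec in zip(M, M_rec)
    ]

def neighborhood_cost(M, M_rec):
    """Sum of the kernel-filtered per-pixel cross-entropy over the map."""
    ce = cross_entropy_map(M, M_rec)
    rows, cols = len(ce), len(ce[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += KERNEL[dr + 1][dc + 1] * ce[rr][cc]
    return total

def cost(M, M_rec, W):
    """Equation (1) for a single channel with scalar weight W."""
    ce_sum = sum(sum(row) for row in cross_entropy_map(M, M_rec))
    return W * ce_sum + neighborhood_cost(M, M_rec)

M = [[0.9, 0.1], [0.1, 0.9]]
agree = [[1.0, 0.0], [0.0, 1.0]]
disagree = [[0.0, 1.0], [1.0, 0.0]]
low, high = cost(M, agree, 1.0), cost(M, disagree, 1.0)
```

In this toy case the cost is lower when the two maps agree than when they disagree, which is the property training relies on.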
- Table 1 is a table of
convolutional layers 402 C1-C10, deconvolutional layers 404 D1-D10, cognitive map 300 (p1) and prediction images p2-p6, with their respective sizes expressed as fractions of the height and width of the input RGB image 200 I, along with a bit depth, wherein the input RGB image is size W×H×3, with each of the RGB color planes having a bit depth of eight bits, with W=1920, H=1080 and bit depth of 24, for example. -
TABLE 1 Sizes and bit depth for convolutional layers C1-C10, deconvolutional layers D1-D10, cognitive map 300 (p1) and prediction images p2-p6.
Index | C1-C10 | D1-D10 | p1-p6 |
---|---|---|---|
1 | W/2 × H/2 × 64 | W/32 × H/32 × 512 | W × H × 24 |
2 | W/4 × H/4 × 128 | W/32 × H/32 × 512 | W/4 × H/4 × 24 |
3 | W/8 × H/8 × 256 | W/16 × H/16 × 256 | W/8 × H/8 × 24 |
4 | W/8 × H/8 × 256 | W/16 × H/16 × 256 | W/16 × H/16 × 24 |
5 | W/16 × H/16 × 512 | W/8 × H/8 × 128 | W/32 × H/32 × 24 |
6 | W/16 × H/16 × 512 | W/8 × H/8 × 128 | W/64 × H/64 × 24 |
7 | W/32 × H/32 × 512 | W/4 × H/4 × 64 | |
8 | W/32 × H/32 × 512 | W/4 × H/4 × 64 | |
9 | W/64 × H/64 × 1024 | W/2 × H/2 × 32 | |
10 | W/64 × H/64 × 1024 | W/2 × H/2 × 32 | |
- Once trained using ground-truth based prediction images p2-p6, a
CNN 400 can process input images 200 to produce cognitive maps 300 without inputting prediction images p2-p6. Convolutional layers C1-C10 can convolve and down-sample intermediate results 406 that get passed to deconvolutional layers D1-D10 to deconvolve and upsample intermediate cognitive maps 408 with input from convolutional layers C1, C2, C4, C6, C7 via skip connection results 410. Cognitive maps 300 produced by CNN 400 can be used by computing device 115 to operate vehicle 110 by permitting computing device 115 to predict vehicle trajectories based on the cognitive map 300. - In other examples,
multiple CNNs 400 can be trained to determine cognitive maps 300 based on ground truth including multiple monocular image inputs, LIDAR and radar, and the results combined by adding a fusion layer to the CNNs 400. Temporal information can be included in the CNN 400 by adding recurrent convolutional layers to process temporal information. Cognitive maps 300 output from CNN 400 can be combined with other information available to computing device 115 from sensors 116 including GPS, INS and odometry location information, LIDAR, radar, and multi-camera information regarding distances and map information stored at computing device 115 or downloaded from a server computer 120, for example, to improve the accuracy of cognitive map 300 p1 and distances to objects therein. - In other examples, in cases where other information available to
computing device 115 including GPS, INS and odometry location information, LIDAR, radar, and multi-camera information regarding distances and map information stored at computing device 115 or downloaded from a server computer 120, provides information that does not agree with the cognitive map 300 p1, a recorded image 200 along with recorded ground truth information can be used to update CNN 400 by providing additional training. The re-trained CNN 400 can be stored in computing device 115 memory for future use. A trained CNN 400 can be recalled from memory and executed by computing device 115 to produce cognitive maps 300 from image 200 input in real time as required for operation of a vehicle 110 on a roadway with traffic, for example. -
FIG. 5 is a diagram of a flowchart, described in relation to FIGS. 1-4, of a process 500 for operating a vehicle based on a cognitive map. Process 500 can be implemented by a processor of computing device 115, taking as input information from sensors 116, and executing commands and sending control signals via controllers 112, 113, 114. Process 500 includes multiple steps taken in the disclosed order. Process 500 also includes implementations including fewer steps or can include the steps taken in different orders. -
Process 500 begins at step 502, where a computing device 115 included in a vehicle 110 acquires an image 200 as described above in relation to FIG. 2. The image 200 can be an RGB color video image acquired by a video sensor 116 included in vehicle 110. The image 200 can depict the physical environment near vehicle 110, including a roadway 202 and objects including other vehicles. - At
step 504 computing device 115 inputs image 200 to a trained CNN 400 as discussed above in relation to FIG. 4. In response to inputting image 200, trained CNN 400 produces a cognitive map 300 including a roadway 302 and objects including other vehicles 304-310. Training CNN 400 will be discussed in relation to FIG. 6. - At
step 506 computing device 115 operates a vehicle 110 based on cognitive map 300. Computing device 115 can operate vehicle 110 based on cognitive map 300 by determining predicted vehicle trajectories based on lanes and objects including other vehicles. Computing device 115 can combine cognitive maps 300 with distance data from multi-camera sensors 116, LIDAR sensors 116, and radar sensors 116, location data from GPS, INS and odometry, and map data from a server computer 120, for example, to improve the accuracy of cognitive map 300. Thus, based on the cognitive map 300, the computing device 115 can provide instructions to one or more of the powertrain controller 112, brake controller 113, and steering controller 114. For example, the computing device 115 may be programmed to take certain actions concerning adjusting or maintaining speed, acceleration, and/or steering based on objects such as other vehicles 304-310; the cognitive map 300 advantageously can provide more accurate data for such actions than was previously available. Vehicle 110 safety and/or efficiency can thereby be improved by the cognitive map 300. Following this step, process 500 ends. -
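To make the step from cognitive map to controller instruction concrete, here is a deliberately simple sketch of choosing a speed action from the gap to the vehicle ahead. The function name `speed_action`, the dead band `tolerance_m`, and the three action labels are hypothetical; the text only says that speed may be adjusted or maintained based on objects such as other vehicles.

```python
def speed_action(gap_m, target_gap_m, tolerance_m=2.0):
    """Pick a coarse speed action from the gap to the vehicle ahead.

    gap_m: distance to the lead vehicle read from the cognitive map, in meters.
    target_gap_m: the predetermined following distance, in meters.
    tolerance_m: hypothetical dead band that avoids constant switching.
    """
    if gap_m < target_gap_m - tolerance_m:
        return "brake"        # would be sent to the brake controller
    if gap_m > target_gap_m + tolerance_m:
        return "accelerate"   # would be sent to the powertrain controller
    return "maintain"

actions = [speed_action(gap, 30.0) for gap in (20.0, 30.0, 40.0)]
```

A real controller would work with continuous commands and predicted trajectories rather than three labels; the sketch only shows where a distance taken from the map enters the decision.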
FIG. 6 is a diagram of a flowchart, described in relation to FIGS. 1-4, of a process 600 for training a CNN 400 based on ground truth. Process 600 can be implemented by a processor of computing device 115, taking as input information from sensors 116, and executing commands and sending control signals via controllers 112, 113, 114. Process 600 includes multiple steps taken in the disclosed order. Process 600 also includes implementations including fewer steps or can include the steps taken in different orders. -
Process 600 begins at step 602, where a computing device 115 included in a vehicle 110 acquires and records one or more images 200 as described above in relation to FIG. 2. The images 200 can be RGB color video images acquired by a video sensor 116 included in vehicle 110. The images 200 can depict the physical environment near vehicle 110, including a roadway 202 and objects including other vehicles. - At
step 604 computing device 115 records ground truth data based on object detection, pixel-wise segmentation, 3D object poses, and relative distances, all determined based on the recorded images 200, distance data, location data, and map data as discussed above in relation to FIG. 4, corresponding to the images 200 recorded at step 602. - At
step 606 computing device 115 inputs images 200 to CNN 400 while constructing prediction images p2-p6 to train CNN 400 according to cost functions in equations 1 and 2, above. Prediction images p2-p6 are constructed to include the recorded ground truth data based on object detection, pixel-wise segmentation, 3D object poses, and relative distances. Prediction images p2-p6 can be created by homographic projection of ground truth data and used to transform estimated results 412 into top-down view, estimated cognitive maps 414 that can be used to train CNN 400 to output a cognitive map 300 in response to inputting an image 200 as discussed above in relation to FIG. 4. By comparing the intermediate cognitive maps 408 output by deconvolution layers D1, D3, D5, D7, and D9 with the estimated cognitive maps 414 and back propagating the results of a cost function as described in relation to equations 1 and 2, CNN 400 can be trained to output a cognitive map 300 in response to an input image 200. - At
step 608 the trained CNN 400 is output to be stored at memory included in computing device 115. Computing device 115 can recall the trained CNN 400 from memory, input an acquired image 200 to the trained CNN 400 and receive as output a cognitive map 300, to be used to operate a vehicle 110, without having to input ground truth data. Following this step, process 600 ends. - Computing devices such as those discussed herein generally each include commands executable by one or more computing devices such as those identified above, and for carrying out blocks or steps of processes described above. For example, process blocks discussed above may be embodied as computer-executable commands.
- Computer-executable commands may be compiled or interpreted from computer programs created using a variety of programming languages and/or technologies, including, without limitation, and either alone or in combination, Java™, C, C++, Visual Basic, JavaScript, Perl, HTML, etc. In general, a processor (e.g., a microprocessor) receives commands, e.g., from a memory, a computer-readable medium, etc., and executes these commands, thereby performing one or more processes, including one or more of the processes described herein. Such commands and other data may be stored in files and transmitted using a variety of computer-readable media. A file in a computing device is generally a collection of data stored on a computer-readable medium, such as a storage medium, a random access memory, etc.
- A computer-readable medium includes any medium that participates in providing data (e.g., commands), which may be read by a computer. Such a medium may take many forms, including, but not limited to, non-volatile media, volatile media, etc. Non-volatile media include, for example, optical or magnetic disks and other persistent memory. Volatile media include dynamic random access memory (DRAM), which typically constitutes a main memory. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
- All terms used in the claims are intended to be given their plain and ordinary meanings as understood by those skilled in the art unless an explicit indication to the contrary is made herein. In particular, use of the singular articles such as “a,” “the,” “said,” etc. should be read to recite one or more of the indicated elements unless a claim recites an explicit limitation to the contrary.
- The term “exemplary” is used herein in the sense of signifying an example, e.g., a reference to an “exemplary widget” should be read as simply referring to an example of a widget.
- The adverb “approximately” modifying a value or result means that a shape, structure, measurement, value, determination, calculation, etc. may deviate from an exact described geometry, distance, measurement, value, determination, calculation, etc., because of imperfections in materials, machining, manufacturing, sensor measurements, computations, processing time, communications time, etc.
- In the drawings, the same reference numbers indicate the same elements. Further, some or all of these elements could be changed. With regard to the media, processes, systems, methods, etc. described herein, it should be understood that, although the steps of such processes, etc. have been described as occurring according to a certain ordered sequence, such processes could be practiced with the described steps performed in an order other than the order described herein. It further should be understood that certain steps could be performed simultaneously, that other steps could be added, or that certain steps described herein could be omitted. In other words, the descriptions of processes herein are provided for the purpose of illustrating certain embodiments, and should in no way be construed so as to limit the claimed invention.
Claims (20)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/881,228 US10345822B1 (en) | 2018-01-26 | 2018-01-26 | Cognitive mapping for vehicles |
CN201910068684.1A CN110084091A (en) | 2018-01-26 | 2019-01-24 | Cognition for vehicle maps |
DE102019101938.9A DE102019101938A1 (en) | 2018-01-26 | 2019-01-25 | Creation of cognitive maps for vehicles |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/881,228 US10345822B1 (en) | 2018-01-26 | 2018-01-26 | Cognitive mapping for vehicles |
Publications (2)
Publication Number | Publication Date |
---|---|
US10345822B1 US10345822B1 (en) | 2019-07-09 |
US20190235520A1 true US20190235520A1 (en) | 2019-08-01 |
Family
ID=67106346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/881,228 Active US10345822B1 (en) | 2018-01-26 | 2018-01-26 | Cognitive mapping for vehicles |
Country Status (3)
Country | Link |
---|---|
US (1) | US10345822B1 (en) |
CN (1) | CN110084091A (en) |
DE (1) | DE102019101938A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190340522A1 (en) * | 2017-01-23 | 2019-11-07 | Panasonic Intellectual Property Management Co., Ltd. | Event prediction system, event prediction method, recording media, and moving body |
US20210101624A1 (en) * | 2019-10-02 | 2021-04-08 | Zoox, Inc. | Collision avoidance perception system |
US11068724B2 (en) * | 2018-10-11 | 2021-07-20 | Baidu Usa Llc | Deep learning continuous lane lines detection system for autonomous vehicles |
US11180080B2 (en) * | 2019-12-13 | 2021-11-23 | Continental Automotive Systems, Inc. | Door opening aid systems and methods |
US11994866B2 (en) | 2019-10-02 | 2024-05-28 | Zoox, Inc. | Collision avoidance perception system |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10733506B1 (en) * | 2016-12-14 | 2020-08-04 | Waymo Llc | Object detection neural network |
GB2601644B (en) * | 2017-04-28 | 2023-02-08 | FLIR Belgium BVBA | Video and image chart fusion systems and methods |
CN107589552B (en) | 2017-10-17 | 2023-08-04 | 歌尔光学科技有限公司 | Optical module assembly equipment |
EP3904835A4 (en) * | 2018-12-24 | 2022-10-05 | LG Electronics Inc. | Route providing device and route providing method thereof |
US10635938B1 (en) * | 2019-01-30 | 2020-04-28 | StradVision, Inc. | Learning method and learning device for allowing CNN having trained in virtual world to be used in real world by runtime input transformation using photo style transformation, and testing method and testing device using the same |
US10762393B2 (en) * | 2019-01-31 | 2020-09-01 | StradVision, Inc. | Learning method and learning device for learning automatic labeling device capable of auto-labeling image of base vehicle using images of nearby vehicles, and testing method and testing device using the same |
US11150664B2 (en) * | 2019-02-01 | 2021-10-19 | Tesla, Inc. | Predicting three-dimensional features for autonomous driving |
US10997461B2 (en) | 2019-02-01 | 2021-05-04 | Tesla, Inc. | Generating ground truth for machine learning from time series elements |
US11341614B1 (en) * | 2019-09-24 | 2022-05-24 | Ambarella International Lp | Emirror adaptable stitching |
CN112711249B (en) * | 2019-10-24 | 2023-01-03 | 科沃斯商用机器人有限公司 | Robot positioning method and device, intelligent robot and storage medium |
CN111275249A (en) * | 2020-01-15 | 2020-06-12 | 吉利汽车研究院(宁波)有限公司 | Driving behavior optimization method based on DQN neural network and high-precision positioning |
US11511576B2 (en) * | 2020-01-24 | 2022-11-29 | Ford Global Technologies, Llc | Remote trailer maneuver assist system |
KR20210124603A (en) * | 2020-04-06 | 2021-10-15 | 현대자동차주식회사 | Apparatus for controlling autonomous driving of a vehicle, system having the same and method thereof |
CN111959495B (en) * | 2020-06-29 | 2021-11-12 | 阿波罗智能技术(北京)有限公司 | Vehicle control method and device and vehicle |
EP4192714A1 (en) * | 2020-09-11 | 2023-06-14 | Waymo Llc | Estimating ground truth object keypoint labels for sensor readings |
CN113312438B (en) * | 2021-03-09 | 2023-09-15 | 中南大学 | Marine target position prediction method integrating route extraction and trend judgment |
DE102021209786A1 (en) | 2021-09-06 | 2023-03-09 | Robert Bosch Gesellschaft mit beschränkter Haftung | Method for positioning a map representation of an area surrounding a vehicle in a semantic road map |
US11541910B1 (en) * | 2022-01-07 | 2023-01-03 | Plusai, Inc. | Methods and apparatus for navigation of an autonomous vehicle based on a location of the autonomous vehicle relative to shouldered objects |
US11840257B2 (en) * | 2022-03-25 | 2023-12-12 | Embark Trucks Inc. | Lane change determination for vehicle on shoulder |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3164860A4 (en) * | 2014-07-03 | 2018-01-17 | GM Global Technology Operations LLC | Vehicle cognitive radar methods and systems |
US10099615B2 (en) * | 2014-09-29 | 2018-10-16 | Ambarella, Inc. | All-round view monitoring system for a motor vehicle |
US10133947B2 (en) * | 2015-01-16 | 2018-11-20 | Qualcomm Incorporated | Object detection using location data and scale space representations of image data |
CN105260699B (en) | 2015-09-10 | 2018-06-26 | 百度在线网络技术(北京)有限公司 | A kind of processing method and processing device of lane line data |
CN105488534B (en) | 2015-12-04 | 2018-12-07 | 中国科学院深圳先进技术研究院 | Traffic scene deep analysis method, apparatus and system |
US10181195B2 (en) * | 2015-12-28 | 2019-01-15 | Facebook, Inc. | Systems and methods for determining optical flow |
EP3206184A1 (en) * | 2016-02-11 | 2017-08-16 | NXP USA, Inc. | Apparatus, method and system for adjusting predefined calibration data for generating a perspective view |
CN106125730B (en) | 2016-07-10 | 2019-04-30 | 北京工业大学 | A kind of robot navigation's map constructing method based on mouse cerebral hippocampal spatial cell |
CN106372577A (en) | 2016-08-23 | 2017-02-01 | 北京航空航天大学 | Deep learning-based traffic sign automatic identifying and marking method |
CN106558058B (en) | 2016-11-29 | 2020-10-09 | 北京图森未来科技有限公司 | Segmentation model training method, road segmentation method, vehicle control method and device |
US10067509B1 (en) * | 2017-03-10 | 2018-09-04 | TuSimple | System and method for occluding contour detection |
CN107169421B (en) | 2017-04-20 | 2020-04-28 | 华南理工大学 | Automobile driving scene target detection method based on deep convolutional neural network |
US10474908B2 (en) | 2017-07-06 | 2019-11-12 | GM Global Technology Operations LLC | Unified deep convolutional neural net for free-space estimation, object detection and object pose estimation |
- 2018-01-26: US application US15/881,228, publication US10345822B1, status Active
- 2019-01-24: CN application CN201910068684.1A, publication CN110084091A, status Pending
- 2019-01-25: DE application DE102019101938.9A, publication DE102019101938A1, status Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190340522A1 (en) * | 2017-01-23 | 2019-11-07 | Panasonic Intellectual Property Management Co., Ltd. | Event prediction system, event prediction method, recording media, and moving body |
US11068724B2 (en) * | 2018-10-11 | 2021-07-20 | Baidu Usa Llc | Deep learning continuous lane lines detection system for autonomous vehicles |
US20210101624A1 (en) * | 2019-10-02 | 2021-04-08 | Zoox, Inc. | Collision avoidance perception system |
US11726492B2 (en) * | 2019-10-02 | 2023-08-15 | Zoox, Inc. | Collision avoidance perception system |
US11994866B2 (en) | 2019-10-02 | 2024-05-28 | Zoox, Inc. | Collision avoidance perception system |
US11180080B2 (en) * | 2019-12-13 | 2021-11-23 | Continental Automotive Systems, Inc. | Door opening aid systems and methods |
Also Published As
Publication number | Publication date |
---|---|
CN110084091A (en) | 2019-08-02 |
DE102019101938A1 (en) | 2019-08-01 |
US10345822B1 (en) | 2019-07-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10345822B1 (en) | | Cognitive mapping for vehicles |
US10853670B2 (en) | | Road surface characterization using pose observations of adjacent vehicles |
US11312372B2 (en) | | Vehicle path prediction |
US10733510B2 (en) | | Vehicle adaptive learning |
US10981564B2 (en) | | Vehicle path planning |
US11783707B2 (en) | | Vehicle path planning |
US10528055B2 (en) | | Road sign recognition |
US11460851B2 (en) | | Eccentricity image fusion |
US9672446B1 (en) | | Object detection for an autonomous vehicle |
US20200020117A1 (en) | | Pose estimation |
US20170316684A1 (en) | | Vehicle lane map estimation |
US10769799B2 (en) | | Foreground detection |
US11521494B2 (en) | | Vehicle eccentricity mapping |
US11055859B2 (en) | | Eccentricity maps |
US11030774B2 (en) | | Vehicle object tracking |
US11662741B2 (en) | | Vehicle visual odometry |
US11138452B2 (en) | | Vehicle neural network training |
US11119491B2 (en) | | Vehicle steering control |
CN111791814A (en) | | Vehicle capsule network |
US10599146B2 (en) | | Action-conditioned vehicle control |
US20230186587A1 (en) | | Three-dimensional object detection |
US11610412B2 (en) | | Vehicle neural network training |
US20240037961A1 (en) | | Systems and methods for detecting lanes using a segmented image and semantic context |
US20230368541A1 (en) | | Object attention network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: FORD GLOBAL TECHNOLOGIES, LLC, MICHIGAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: PARCHAMI, MOSTAFA; TAIMOURI, VAHID; PUSKORIUS, GINTARAS VINCENT; SIGNING DATES FROM 20180116 TO 20180125; REEL/FRAME: 044742/0375 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |